The same repo (so a monorepo approach). All packages would be in the
"airflow_integrations" directory. It's mainly about moving the
operators/hooks/sensors files to a different directory structure.

It might be done pretty much without changing the current
installation/development model:

1) We can add a command to the main setup.py that installs all the packages
in -e mode (to make it easier to install all deps in one go).
2) We can add dependencies in setup.py extras to install the appropriate
packages. For example, the [google] extra will require the
'apache-airflow-integrations-providers-google' package - or
'apache-airflow-providers-google' if we decide to skip "-integrations" in
the package name to make it shorter.
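
To make point 2 a bit more concrete, here is a minimal sketch of how the
extras could be generated. It is only an illustration: the "protocols" extra,
the PROVIDER_PACKAGES mapping and the build_extras_require helper are names I
made up for this example, and pinning every package to the same version is an
assumption.

```python
# Hypothetical mapping of extra name -> provider distribution. Only the
# [google] extra and the two package names come from this thread; the
# "protocols" extra name is an assumption made for this sketch.
PROVIDER_PACKAGES = {
    "google": "apache-airflow-integrations-providers-google",
    "protocols": "apache-airflow-integrations-protocols",
}


def build_extras_require(version):
    """Return an extras_require mapping pinning each provider to `version`."""
    return {
        extra: [f"{package}=={version}"]
        for extra, package in PROVIDER_PACKAGES.items()
    }


# In the main setup.py this could be passed to setuptools as:
#     setup(name="apache-airflow", version=VERSION,
#           extras_require=build_extras_require(VERSION), ...)
```

With something like this, "pip install apache-airflow[google]" would pull in
the provider package alongside the core one.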

The only potential drawback I see is a bit more involved setup of the IDE.

This way the installation method for both dev and prod remains simple.
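
The development install from point 1 could be sketched roughly as below. This
is not an existing setup.py command, just an assumption of how one might
collect the "pip install -e" invocations; the editable_install_commands name
is hypothetical, and it assumes every provider package has its own setup.py
somewhere under "airflow_integrations".

```python
import sys
from pathlib import Path


def editable_install_commands(repo_root):
    """Build the 'pip install -e' commands for core and all provider packages.

    The core package (repo root) comes first, followed by every directory
    under "airflow_integrations" that contains a setup.py.
    """
    root = Path(repo_root)
    commands = [[sys.executable, "-m", "pip", "install", "-e", str(root)]]
    for setup_file in sorted((root / "airflow_integrations").rglob("setup.py")):
        commands.append(
            [sys.executable, "-m", "pip", "install", "-e", str(setup_file.parent)]
        )
    return commands
```

Each returned command could then be run with subprocess.check_call from the
custom setup.py command.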

In the future we can have a separate release schedule for the packages
(AIP-8), but for now we can stick to the same version for 'apache-airflow'
and the 'apache-airflow-integrations-*' packages (+ a separate release
schedule for backporting needs).

Here again is the structure of the repo (we will likely be able to use
native namespaces, so I removed some needless __init__.py files).

|-- airflow
|   |- __init__.py
|   |- operators -> fundamental operators are here
|-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
|-- setup.py -> setup.py for the "apache-airflow" package
|-- airflow_integrations
|   |-providers
|   | |-google
|   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
|   |   |-airflow_integrations
|   |     |-providers
|   |       |-google
|   |         |-__init__.py
|   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
|   | |-__init__.py
|   |-protocols
|     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
|     |-airflow_integrations
|        |-protocols
|          |-__init__.py
|          |-tests -> tests for the "apache-airflow-integrations-protocols" package

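To illustrate why native namespaces allow dropping the intermediate
__init__.py files, here is a small self-contained sketch. It builds two
throwaway "distributions" in a temp directory (the "google-dist" and
"protocols-dist" names and both helper functions are made up for the demo)
and imports sub-packages of "airflow_integrations" from both of them.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root, dist_name, dotted_package):
    """Create a fake distribution directory containing one regular package.

    Only the innermost directory gets an __init__.py; the parents
    (e.g. "airflow_integrations") are left as native namespace packages.
    """
    dist_dir = Path(root) / dist_name
    package_dir = dist_dir.joinpath(*dotted_package.split("."))
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(f"NAME = {dotted_package!r}\n")
    return str(dist_dir)


def import_from_two_distributions():
    """Import sub-packages of one namespace that live in two directories."""
    with tempfile.TemporaryDirectory() as root:
        paths = [
            make_distribution(
                root, "google-dist", "airflow_integrations.providers.google"
            ),
            make_distribution(
                root, "protocols-dist", "airflow_integrations.protocols"
            ),
        ]
        sys.path[:0] = paths
        try:
            importlib.invalidate_caches()
            google = importlib.import_module(
                "airflow_integrations.providers.google"
            )
            protocols = importlib.import_module("airflow_integrations.protocols")
            return google.NAME, protocols.NAME
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for path in paths:
                sys.path.remove(path)
            stale = [
                name
                for name in sys.modules
                if name.split(".")[0] == "airflow_integrations"
            ]
            for name in stale:
                del sys.modules[name]
```

Both imports succeed only because neither directory ships an
"airflow_integrations/__init__.py"; this is the property we cannot get for
"airflow" itself.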

J.

On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> So create another package in a different repo? Or the same repo with a
> separate setup.py file that has airflow as a dependency?
>
>
>
>
> On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
> > TL;DR; I did some more testing on how namespaces work. I still believe
> > the only way to use namespaces is to have a separate (for example
> > "airflow_integrations") package for all backportable packages.
> >
> > I am not sure if anyone has used namespaces before, but after reading and
> > trying it out, the main blocker seems to be that we have non-trivial code
> > in airflow's "__init__.py" (including class definitions, imported
> > sub-packages and plugin initialisation).
> >
> > Details are in
> > https://packaging.python.org/guides/packaging-namespace-packages/ but
> > it's a long one so let me summarize my findings:
> >
> >    - In order to use the "airflow.providers" package we would have to
> >    declare "airflow" as a namespace
> >    - It can be done in three different ways:
> >       - omitting __init__.py in this package (native/implicit namespace)
> >       - making the __init__.py of the "airflow" package in main airflow
> >       (and in the other packages) contain only "*__path__ =
> >       __import__('pkgutil').extend_path(__path__, __name__)*" (pkgutil
> >       style)
> >       - making it contain only
> >       "*__import__('pkg_resources').declare_namespace(__name__)*"
> >       (pkg_resources style)
> >
> > The first is not possible (we already have __init__.py in "airflow").
> > The second and third are not possible because we already have quite a lot
> > in airflow's "__init__.py" and both pkgutil and pkg_resources style state:
> >
> > "*Every* distribution that uses the namespace package must include an
> > identical *__init__.py*. If any distribution does not, it will cause the
> > namespace logic to fail and the other sub-packages will not be
> > importable. *Any additional code in __init__.py will be inaccessible."*
> >
> > I even tried to add those pkgutil/pkg_resources declarations to airflow
> > and experimented with them - but it does not work. Pip install fails at
> > the plugins_manager as "airflow.plugins" is not accessible (kind of
> > expected), but I am sure there will be other problems as well. :(
> >
> > Basically - we cannot turn "airflow" into a namespace because it has some
> > "__init__.py" logic :(.
> >
> > So I think it still holds that if we want to use namespaces, we should
> > use another package. The *"airflow_integrations"* is the current
> > candidate, but we can think of some nicer/shorter one: "airflow_ext",
> > "airflow_int", "airflow_x", "airflow_mod", "airflow_next", "airflow_xt",
> > "airflow_", "ext_airflow", ....  Interestingly "airflow_" is the form
> > suggested by PEP8 to avoid conflicts with Python names (which is a
> > different case but kind of close).
> >
> > What do you think?
> >
> > J.
> >
> > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > The namespace feature looks promising and from your tests, it looks
> > > like it would work well from Airflow 2.0 onwards.
> > >
> > > I will look at it in depth and see if I have more suggestions or
> > > opinions on it
> > >
> > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk <jarek.pot...@polidea.com
> >
> > > wrote:
> > >
> > > > TL;DR; We did some testing about namespaces and packaging (and
> > > > potential backporting options for 1.10.* python3 Airflows) and we
> > > > think it's best to move to namespaces quickly and use a different
> > > > package name, "airflow-integrations", for all non-fundamental
> > > > integrations.
> > > >
> > > > Unless we missed some tricks, we cannot use airflow.* sub-packages for
> > > > the 1.10.* backportable packages. Example:
> > > >
> > > >    - "*apache-airflow"* package provides: "airflow.*" (this is what we
> > > >    have today)
> > > >    - "*apache-airflow-providers-google*": provides
> > > >    "airflow.providers.google.*" packages
> > > >
> > > > If we install both packages (old apache-airflow 1.10.6 and new
> > > > apache-airflow-providers-google from 2.0) - it seems that
> > > > the "airflow.providers.google.*" package cannot be imported. This is a
> > > > bit of a problem if we would like to backport the operators from
> > > > Airflow 2.0 to Airflow 1.10 in a way that will be forward-compatible.
> > > > We really want users who started using backported operators in 1.10.*
> > > > not to have to change imports in their DAGs to run them in Airflow 2.0.
> > > >
> > > > We discussed it internally in our team and considered several options,
> > > > but we think the best way will be to go straight to "namespaces" in
> > > > Airflow 2.0 and to have the integrations (as discussed in the AIP-21
> > > > discussion) in a separate "*airflow_integrations*" package.  It might
> > > > even be a step towards the AIP-8 implementation and it plays together
> > > > very well with the "stewardship" discussed in AIP-21 now. But we will
> > > > still keep (for now) a single release process for all packages for 2.0
> > > > (except for the backporting, which can be done per-provider before the
> > > > 2.0 release) and provide a foundation for more complex release cycles
> > > > in future versions.
> > > >
> > > > Here is how the new Airflow 2.0 repository could look (I only show a
> > > > subset of dirs but they are representative). For those whose email
> > > > client corrupts the fixed-width font, here is an image of this
> > > > structure: https://pasteboard.co/IEesTih.png
> > > >
> > > > |-- airflow
> > > > |   |- __init__.py
> > > > |   |- operators -> fundamental operators are here
> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > > |-- airflow_integrations
> > > > |   |-providers
> > > > |   | |-google
> > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > > |   |   |-airflow_integrations
> > > > |   |     |-__init__.py
> > > > |   |     |-providers
> > > > |   |       |-__init__.py
> > > > |   |       |-google
> > > > |   |         |-__init__.py
> > > > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > > |   | |-__init__.py
> > > > |   |-protocols
> > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > > |     |-airflow_integrations
> > > > |        |-protocols
> > > > |          |-__init__.py
> > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > > >
> > > > There are a number of pros for this solution:
> > > >
> > > >    - We could use the standard namespaces feature of python to build
> > > >    multiple packages:
> > > >    https://packaging.python.org/guides/packaging-namespace-packages/
> > > >    - Installation for users will be the same as previously. We could
> > > >    install the needed packages automatically when particular extras are
> > > >    used (pip install apache-airflow[google] could install both
> > > >    "apache-airflow" and "apache-airflow-integrations-providers-google")
> > > >    - We could have a custom setup.py installation process for
> > > >    developers that could install all the packages in development
> > > >    ("-e ." mode) in a single operation.
> > > >    - In case of transfer packages we could have nice error messages
> > > >    informing that the other package needs to be installed (for example
> > > >    the S3->GCS operator would import
> > > >    "airflow_integrations.providers.amazon.*" and if that fails it could
> > > >    raise "Please install [amazon] extra to use me.")
> > > >    - We could implement numerous optimisations in the way we run tests
> > > >    in CI (for example run all the "providers" tests only with sqlite,
> > > >    run tests in parallel etc.)
> > > >    - We could implement it gradually - we do not have to take a "big
> > > >    bang" approach - we can implement it "provider-by-provider" and test
> > > >    it with one provider (Google) first to make sure that all the
> > > >    mechanisms are working
> > > >    - For now we could have the monorepo approach where all the packages
> > > >    will be developed in concert - for now avoiding the dependency
> > > >    problems (but allowing for back-portability to 1.10).
> > > >    - We will have clear boundaries between packages and the ability to
> > > >    test for some unwanted/hidden dependencies between packages.
> > > >    - We could switch to the (much better) sphinx-apidoc package to
> > > >    continue building a single documentation for all of those (sphinx
> > > >    apidoc has support for namespaces).
> > > >
> > > > As we are working on the GCP move from contrib to core, we could make
> > > > all the effort to test it and try it before we merge it to master so
> > > > that it will be ready for others (and we could help with most of the
> > > > moves afterwards). It seems complex, but in fact in most cases it will
> > > > be a very simple move between the packages and can be done
> > > > incrementally, so I think there is little risk in doing this.
> > > >
> > > > J.
> > > >
> > > >
> > > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com>
> wrote:
> > > >
> > > > > Tomasz and Ash got good points about the overhead of having separate
> > > > > repos. But as we grow bigger and more mature, I would prefer to have
> > > > > what was described in AIP-8. It shouldn't be extremely hard for us to
> > > > > come up with good strategies to handle the overhead. AIP-8 already
> > > > > talked about how it can benefit us. IMO on a high level, having a
> > > > > clear separation of core vs. hooks/operators would make the project
> > > > > much more scalable and the gains would outweigh the cost we pay.
> > > > >
> > > > > That being said, I'm supportive of this "moving towards AIP-8 while
> > > > > learning" approach, quite a good practice for tackling a big project.
> > > > > Looking forward to reading the AIP.
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Kevin Y
> > > > >
> > > > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
> > jarek.pot...@polidea.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > > We are checking how we can use namespaces in a back-portable way
> > > > > > > and we will have a POC soon so that we will all be able to see how
> > > > > > > it will look.
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
> a...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > I'll have to read your proposal in detail (sorry, no time right
> > > > now!),
> > > > > > but
> > > > > > > I'm broadly in favour of this approach, and I think keeping
> them
> > > _in_
> > > > > the
> > > > > > > same repo is the best plan -- that makes writing and  testing
> > > > > > cross-cutting
> > > > > > > changes  easier.
> > > > > > >
> > > > > > > -a
> > > > > > >
> > > > > > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
> > > > > tomasz.urbas...@polidea.com
> > > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > I think utilizing namespaces should reduce a lot of problems
> > > raised
> > > > > by
> > > > > > > > using separate repos (who will manage it? how to release?
> where
> > > > > should
> > > > > > be
> > > > > > > > the repo?).
> > > > > > > >
> > > > > > > > Bests,
> > > > > > > > Tomek
> > > > > > > >
> > > > > > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> > > > > > jarek.pot...@polidea.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> Thanks Bas for comments! Let me share my thoughts below.
> > > > > > > >>
> > > > > > > >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> > > > > > > >> basharens...@godatadriven.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >>> Hi Jarek, I definitely see a future in creating separate
> > > > > installable
> > > > > > > >>> packages for various operators/hooks/etc (as in AIP-8).
> This
> > > > would
> > > > > > IMO
> > > > > > > >>> strip the “core” Airflow to only what’s needed and result
> in
> > a
> > > > > small
> > > > > > > >>> package without a ton of dependencies (and make it more
> > > > > maintainable,
> > > > > > > >>> shorter tests, etc etc etc). Not exactly sure though what
> > > you’re
> > > > > > > >> proposing
> > > > > > > >>> in your e-mail, is it a new AIP for an intermediate step
> > > towards
> > > > > > AIP-8?
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >> It's a new AIP I am proposing.  For now it's only for
> > > backporting
> > > > > the
> > > > > > > new
> > > > > > > >> 2.0 import paths to 1.10.* series.
> > > > > > > >>
> > > > > > > >> It's more of "incremental going in direction of AIP-8 and
> > > learning
> > > > > > some
> > > > > > > >> difficulties involved" than implementing AIP-8 fully. We are
> > > > taking
> > > > > > > >> advantage of changes in import paths from AIP-21 which make
> it
> > > > > > possible
> > > > > > > to
> > > > > > > >> have both old and new (optional) operators available in
> 1.10.*
> > > > > series
> > > > > > of
> > > > > > > >> Airflow. I think there is a lot more to do for full
> > > implementation
> > > > > of
> > > > > > > >> AIP-8: decisions how to maintain, install those operator
> > groups
> > > > > > > separately,
> > > > > > > >> stewardship model/organisation for the separate groups, how
> to
> > > > > manage
> > > > > > > >> cross-dependencies, procedures for releasing the packages
> etc.
> > > > > > > >>
> > > > > > > >> I think about this new AIP also as a learning effort - we
> > would
> > > > > learn
> > > > > > > more
> > > > > > > >> how separate packaging works and then we can follow up with
> > > AIP-8
> > > > > full
> > > > > > > >> implementation for "modular" Airflow. Then AIP-8 could be
> > > > > implemented
> > > > > > in
> > > > > > > >> Airflow 2.1 for example - or 3.0 if we start following
> > semantic
> > > > > > > versioning
> > > > > > > >> - based on those learnings. It's a bit of good example of
> > having
> > > > > cake
> > > > > > > and
> > > > > > > >> eating it too. We can try out modularity in 1.10.* while
> > cutting
> > > > the
> > > > > > > scope
> > > > > > > >> of 2.0 and not implementing full management/release
> procedure
> > > for
> > > > > > AIP-8
> > > > > > > >> yet.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>> Thinking about this, I think there are still a few grey
> areas
> > > > > (which
> > > > > > > >> would
> > > > > > > >>> be good to discuss in a new AIP, or continue on AIP-8):
> > > > > > > >>>
> > > > > > > > >>>  *   In your email you speak only about the 3 big cloud
> > > > > > > > >>> providers (btw I made a PR for migrating all AWS components ->
> > > > > > > > >>> https://github.com/apache/airflow/pull/6439). Is there a plan
> > > > > > > > >>> for splitting components other than Google/AWS/Azure?
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >> We could add more groups as part of this new AIP indeed (as
> an
> > > > > > > extension to
> > > > > > > >> AIP-21 and pre-requisite to AIP-8). We already see how
> > > > > > > moving/deprecation
> > > > > > > >> works for the providers package - it works for GCP/Google
> > rather
> > > > > > nicely.
> > > > > > > >> But there is nothing to prevent us from extending it to
> cover
> > > > other
> > > > > > > groups
> > > > > > > >> of operators/hooks. If you look at the current structure of
> > > > > > > documentation
> > > > > > > >> done by Kamil, we can follow the structure there and move
> the
> > > > > > > >> operators/hooks accordingly (
> > > > > > > >>
> > > > >
> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html
> > > > > > ):
> > > > > > > >>
> > > > > > > >>      Fundamentals, ASF: Apache Software Foundation, Azure:
> > > > Microsoft
> > > > > > > >> Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform,
> > > > Service
> > > > > > > >> integrations, Software integrations, Protocol integrations.
> > > > > > > >>
> > > > > > > >> I am happy to include that in the AIP - if others agree
> it's a
> > > > good
> > > > > > > idea.
> > > > > > > >> Out of those groups -  I think only Fundamentals should not
> be
> > > > > > > back-ported.
> > > > > > > >> Others should be rather easy to port (if we decide to). We
> > > already
> > > > > > have
> > > > > > > >> quite a lot of those in the new GCP operators for 2.0. So
> > > starting
> > > > > > with
> > > > > > > >> GCP/Google group is a good idea. Also following with Cloud
> > > > Providers
> > > > > > > first
> > > > > > > >> is a good thing. For example we have now support from Google
> > > > > Composer
> > > > > > > team
> > > > > > > >> to do this separation for GCP (and we learn from it) and
> then
> > we
> > > > can
> > > > > > > claim
> > > > > > > >> the stewardship in our team for releasing the python 3/
> > Airflow
> > > > > > > >> 1.10-compatible "airflow-google" packages. Possibly other
> > Cloud
> > > > > > > >> Providers/teams might follow this (if they see the value in
> > it)
> > > > and
> > > > > > > there
> > > > > > > >> could be different stewards for those. And then we can do
> > other
> > > > > groups
> > > > > > > if
> > > > > > > >> we decide to. I think this way we can learn whether AIP-8 is
> > > > > > manageable
> > > > > > > and
> > > > > > > >> what real problems we are going to face.
> > > > > > > >>
> > > > > > > >>  *   Each “plugin” e.g. GCP would be a separate repo, should
> > we
> > > > > create
> > > > > > > >>> some sort of blueprint for such packages?
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >> I think we do not need separate repos (at all) but in this
> new
> > > AIP
> > > > > we
> > > > > > > can
> > > > > > > >> test it before we decide to go for AIP-8. IMHO - monorepo
> > > approach
> > > > > > will
> > > > > > > >> work here rather nicely. We could use python-3 native
> > namespaces
> > > > > > > >> <
> > > > https://packaging.python.org/guides/packaging-namespace-packages/>
> > > > > > for
> > > > > > > >> the
> > > > > > > >> sub-packages when we go full AIP-8. For now we could simply
> > > > package
> > > > > > the
> > > > > > > new
> > > > > > > >> operators in separate pip package for Python 3 version
> 1.10.*
> > > > series
> > > > > > > only.
> > > > > > > >> We only need to test if it works well with another package
> > > > providing
> > > > > > > >> 'airflow.providers.*' after apache-airflow is installed
> > > (providing
> > > > > > > >> 'airflow' package). But I think we can make it work. I don't
> > > think
> > > > > we
> > > > > > > >> really need to split the repos, namespaces will work just
> fine
> > > and
> > > > > has
> > > > > > > >> easier management of cross-repository dependencies (but we
> can
> > > > learn
> > > > > > > >> otherwise). For sure we will not need it for the new
> proposed
> > > AIP
> > > > of
> > > > > > > >> backporting groups to 1.10 and we can defer that decision to
> > > AIP-8
> > > > > > > >> implementation time.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>>  *   In which Airflow version do we start raising
> deprecation
> > > > > > warnings
> > > > > > > >>> and in which version would we remove the original?
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >> I think we should do what we did in GCP case already. Those
> > old
> > > > > > > "imports"
> > > > > > > >> for operators can be made as deprecated in Airflow 2.0 (and
> > > > removed
> > > > > in
> > > > > > > 2.1
> > > > > > > >> or 3.0 if we start following semantic versioning). We can
> > > however
> > > > do
> > > > > > it
> > > > > > > >> before in 1.10.7 or 1.10.8 if we release those (without
> > removing
> > > > the
> > > > > > old
> > > > > > > >> operators yet - just raise deprecation warnings and inform
> > that
> > > > for
> > > > > > > python3
> > > > > > > >> the new "airflow-google", "airflow-aws" etc. packages can be
> > > > > installed
> > > > > > > and
> > > > > > > >> users can switch to it).
> > > > > > > >>
> > > > > > > >> J.
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>>
> > > > > > > >>> Cheers,
> > > > > > > >>> Bas
> > > > > > > >>>
> > > > > > > >>> On 27 Oct 2019, at 08:33, Jarek Potiuk <
> > > jarek.pot...@polidea.com
> > > > > > > <mailto:
> > > > > > > >>> jarek.pot...@polidea.com>> wrote:
> > > > > > > >>>
> > > > > > > >>> Hello - any comments on that? I am happy to make it into an
> > AIP
> > > > :)?
> > > > > > > >>>
> > > > > > > >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <
> > > > > > jarek.pot...@polidea.com
> > > > > > > >>> <mailto:jarek.pot...@polidea.com>>
> > > > > > > >>> wrote:
> > > > > > > >>>
> > > > > > > >>> *Motivation*
> > > > > > > >>>
> > > > > > > >>> I think we really should start thinking about making it
> > easier
> > > to
> > > > > > > migrate
> > > > > > > >>> to 2.0 for our users. After implementing some recent
> changes
> > > > > related
> > > > > > to
> > > > > > > >>> AIP-21-
> > > > > > > >>> Changes in import paths
> > > > > > > >>> <
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths
> > > > > > > >>>
> > > > > > > >>> I
> > > > > > > >>> think I have an idea that might help with it.
> > > > > > > >>>
> > > > > > > >>> *Proposal*
> > > > > > > >>>
> > > > > > > >>> We could package some of the new and improved 2.0 operators
> > > > (moved
> > > > > to
> > > > > > > >>> "providers" package) and let them be used in Python 3
> > > environment
> > > > > of
> > > > > > > >>> airflow 1.10.x.
> > > > > > > >>>
> > > > > > > >>> This can be done case-by-case per "cloud provider". It
> should
> > > not
> > > > > be
> > > > > > > >>> obligatory, should be largely driven by each provider. It's
> > not
> > > > yet
> > > > > > > full
> > > > > > > >>> AIP-8
> > > > > > > >>> Split Hooks/Operators into separate packages
> > > > > > > >>> <
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303
> > > > > > > >>> .
> > > > > > > >>> It's
> > > > > > > >>> merely backporting of some operators/hooks to get it work
> in
> > > > 1.10.
> > > > > > But
> > > > > > > by
> > > > > > > >>> doing it we might try out the concept of splitting, learn
> > about
> > > > > > > >> maintenance
> > > > > > > >>> problems and maybe implement full *AIP-8 *approach in 2.1
> > > > > > consistently
> > > > > > > >>> across the board.
> > > > > > > >>>
> > > > > > > >>> *Context*
> > > > > > > >>>
> > > > > > > >>> Part of the AIP-21 was to move import paths for Cloud
> > providers
> > > > to
> > > > > > > >>> separate providers/<PROVIDER> package. An example for that
> > (the
> > > > > first
> > > > > > > >>> provider we already almost migrated) was providers/google
> > > package
> > > > > > > >> (further
> > > > > > > >>> divided into gcp/gsuite etc).
> > > > > > > >>>
> > > > > > > >>> We've done a massive migration of all the Google-related
> > > > operators,
> > > > > > > >>> created a few missing ones and retrofitted some old
> operators
> > > to
> > > > > > follow
> > > > > > > >> GCP
> > > > > > > >>> best practices and fixing a number of problems - also
> > > > implementing
> > > > > > > >> Python3
> > > > > > > >>> and Pylint compatibility. Some of these operators/hooks are
> > not
> > > > > > > backwards
> > > > > > > >>> compatible. Those that are compatible are still available
> via
> > > the
> > > > > old
> > > > > > > >>> imports with deprecation warning.
> > > > > > > >>>
> > > > > > > >>> We've added missing tests (including system tests) and
> > missing
> > > > > > > features -
> > > > > > > >>> improving some of the Google operators - giving the users
> > more
> > > > > > > >> capabilities
> > > > > > > >>> and fixing some issues. Those operators should pretty much
> > > "just
> > > > > > work"
> > > > > > > in
> > > > > > > >>> Airflow 1.10.x (any recent version) for Python 3. We should
> > be
> > > > able
> > > > > > to
> > > > > > > >>> release a separate pip-installable package for those
> > operators
> > > > that
> > > > > > > users
> > > > > > > >>> should be able to install in Airflow 1.10.x.
> > > > > > > >>>
> > > > > > > >>> Any user will be able to install this separate package in
> > their
> > > > > > Airflow
> > > > > > > >>> 1.10.x installation and start using those new "provider"
> > > > operators
> > > > > in
> > > > > > > >>> parallel to the old 1.10.x operators. Other providers
> > > > ("microsoft",
> > > > > > > >>> "amazon") might follow the same approach if they want. We
> > could
> > > > > even
> > > > > > at
> > > > > > > >>> some point decide to move some of the core operators in
> > similar
> > > > > > fashion
> > > > > > > >>> (for example following the structure proposed in the latest
> > > > > > > >> documentation:
> > > > > > > >>> fundamentals / software / etc.
> > > > > > > >>>
> > > > > >
> > > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> > > > > > > >>>
> > > > > > > >>> *Pros and cons*
> > > > > > > >>>
> > > > > > > >>> There are a number of pros:
> > > > > > > >>>
> > > > > > > >>>  - Users will have an easier migration path if they are
> > deeply
> > > > > vested
> > > > > > > >>>  into 1.10.* version
> > > > > > > >>>  - It's possible to migrate in stages for people who are
> also
> > > > > vested
> > > > > > in
> > > > > > > >>>  py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators
> (1.10)
> > > ->
> > > > > py3
> > > > > > +
> > > > > > > >>>  2.0*
> > > > > > > >>>  - Moving to new operators in py3 + new operators can be
> done
> > > > > > > >>>  gradually. Old operators will continue to work while new
> can
> > > be
> > > > > used
> > > > > > > >> more
> > > > > > > >>>  and more
> > > > > > > >>>  - People will get incentivised to migrate to python 3
> before
> > > 2.0
> > > > > is
> > > > > > > >>>  out (by using new operators)
> > > > > > > >>>  - Each provider "package" can have independent release
> > > schedule
> > > > -
> > > > > > and
> > > > > > > >>>  add functionality in already released Airflow versions.
> > > > > > > >>>  - We do not take out any functionality from the users - we
> > > just
> > > > > add
> > > > > > > >>>  more options
> > > > > > > >>>  - The releases can be - similarly as main airflow
> releases -
> > > > voted
> > > > > > > >>>  separately by PMC after "stewards" of the package (per
> > > provider)
> > > > > > > >> perform
> > > > > > > >>>  round of testing on 1.10.* versions.
> > > > > > > >>>  - Users will start migrating to new operators earlier and
> > have
> > > > > > > >>>  smoother switch to 2.0 later
> > > > > > > >>>  - The latest improved operators will start
> > > > > > > >>>
> > > > > > > >>> There are three cons I could think of:
> > > > > > > >>>
> > > > > > > >>>  - There will be quite a lot of duplication between old and
> > new
> > > > > > > >>>  operators (they will co-exist in 1.10). That might lead to
> > > > > confusion
> > > > > > > of
> > > > > > > >>>  users and problems with cooperation between different
> > > > > > operators/hooks
> > > > > > > >>>  - Having new operators in 1.10 python 3 might keep people
> > from
> > > > > > > >>>  migrating to 2.0
> > > > > > > >>>  - It will require some maintenance and separate release
> > > > overhead.
> > > > > > > >>>
> > > > > > > >>> I already spoke to Composer team @Google and they are very
> > > > positive
> > > > > > > about
> > > > > > > >>> this. I also spoke to Ash and seems it might also be OK for
> > > > > > Astronomer
> > > > > > > >>> team. We have Google's backing and support, and we can
> > provide
> > > > > > > >> maintenance
> > > > > > > >>> and support for those packages - being an example for other
> > > > > providers
> > > > > > > how
> > > > > > > >>> they can do it.
> > > > > > > >>>
> > > > > > > >>> Let me know what you think - and whether I should make it
> > into
> > > an
> > > > > > > >> official
> > > > > > > >>> AIP maybe?
> > > > > > > >>>
> > > > > > > >>> J.
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>>
> > > > > > > >>> Jarek Potiuk
> > > > > > > >>> Polidea <https://www.polidea.com/> | Principal Software
> > > Engineer
> > > > > > > >>>
> > > > > > > >>> M: +48 660 796 129 <+48660796129>
> > > > > > > >>> [image: Polidea] <https://www.polidea.com/>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>> --
> > > > > > > >>>
> > > > > > > >>> Jarek Potiuk
> > > > > > > >>> Polidea <https://www.polidea.com/> | Principal Software
> > > Engineer
> > > > > > > >>>
> > > > > > > >>> M: +48 660 796 129 <+48660796129>
> > > > > > > >>> [image: Polidea] <https://www.polidea.com/>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >>
> > > > > > > >> Jarek Potiuk
> > > > > > > >> Polidea <https://www.polidea.com/> | Principal Software
> > > Engineer
> > > > > > > >>
> > > > > > > >> M: +48 660 796 129 <+48660796129>
> > > > > > > >> [image: Polidea] <https://www.polidea.com/>
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Tomasz Urbaszek
> > > > > > > > Polidea <https://www.polidea.com/> | Junior Software
> Engineer
> > > > > > > >
> > > > > > > > M: +48 505 628 493 <+48505628493>
> > > > > > > > E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com
> >
> > > > > > > >
> > > > > > > > Unique Tech
> > > > > > > > Check out our projects! <https://www.polidea.com/our-work>
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Jarek Potiuk
> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > >
> > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
