Thanks Ash! It might indeed work. I will take it from there and try to make
a POC PR with Airflow.

It's a slightly different approach from the google-python libraries (they keep
all the libraries as separate sub-packages/mini-projects inside the main
project). The approach you propose is far less invasive in terms of changing
the structure of the main repo. I like it this way much more. It makes it much
easier to import the project in an IDE, even if it is less modular in nature.

From what I understand, with this structure - if it works - we have two
options:

(1) For Airflow 2.0 we will be able to install Airflow and all
"integrations" in a single (apache-airflow == 2.0.0) package and build
separate backported integration packages for 1.10.* only.
(2) We will split Airflow 2.0 into separate "core" and "integration"
packages as well while preparing the packages.

I think (1) is a bit more reasonable for now, until we work out the full
AIP-8 solution (including solving the dependency hell). Let me know what you
think (and others as well).

J.

On Mon, Nov 4, 2019 at 9:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> https://github.com/ashb/airflow-submodule-test
>
> That seems to work in any order things are installed, at least on python
> 3.7. I've had a stressful few days so I may have missed something. Please
> tell me if there's a case I've missed, or if this is not a suitable proxy
> for our situation.
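The kind of test Ash describes can be reproduced self-contained. The package name "notairflow" appears in the log output quoted later in this thread; everything else here - the file layout, the submodule names - is made up for illustration. Two directories on sys.path both ship a pkgutil-style "notairflow" package, each contributing a different subpackage, and both halves import regardless of which entry comes first:

```python
# Hedged sketch (not the actual contents of the linked repo): simulate two
# installed distributions that both ship a pkgutil-style "notairflow"
# package, each contributing its own subpackage.
import os
import sys
import tempfile

PKGUTIL_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"

def make_dist(root, subpkg):
    """Create <root>/notairflow/<subpkg>/__init__.py with a marker constant."""
    pkg_dir = os.path.join(root, "notairflow", subpkg)
    os.makedirs(pkg_dir)
    with open(os.path.join(root, "notairflow", "__init__.py"), "w") as f:
        f.write(PKGUTIL_INIT)  # identical in every distribution
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(f"NAME = {subpkg!r}\n")

dist_a, dist_b = tempfile.mkdtemp(), tempfile.mkdtemp()
make_dist(dist_a, "core")
make_dist(dist_b, "providers")
sys.path[:0] = [dist_a, dist_b]

# Both subpackages are importable even though they live in different
# sys.path entries - extend_path merges the two "notairflow" directories.
import notairflow.core
import notairflow.providers
```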
>
> -a
>
> > On 4 Nov 2019, at 20:08, Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > Pretty hard pass from me on airflow_ext. If it's released by Airflow I want it to live under airflow.* (anyone else is free to release packages under any namespace they choose)
> >
> > That said I think I've got something that works:
> >
> >
> /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/__init__.py
> module level code running
> >
> /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/providers/gcp/__init__.py
> module level code running
> >
> > Let me test it again in a few different cases etc.
> >
> > -a
> >
> > On 4 November 2019 14:00:24 GMT, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > Hey Ash,
> >
> > Thanks for the offer. I must admit pkgutil and package namespaces are not the best-documented part of Python.
> >
> > I dug a bit deeper and I found a similar problem - https://github.com/pypa/setuptools/issues/895. It seems that even if it is not explicitly explained in the pkgutil documentation, this comment (assuming it is right) explains everything:
> >
> > *"That's right. All parents of a namespace package must also be namespace
> > packages, as they will necessarily share that parent name space (farm and
> > farm.deps in this example)."*
> >
> > There are a few possibilities mentioned in the issue on how this can be worked around, but those are by far not perfect solutions. They would require patching the already installed airflow's __init__.py - to manipulate the search path. Still, from my tests I do not know if this would be possible at all because of the non-trivial __init__.py we have (and use) in the *airflow* package.
> >
> > We have a few PRs now waiting for a decision on that one I think, so maybe we can simply agree that we should use another package (I really like *"airflow_ext"* :D) and use it from now on? What do you (and others) think?
> >
> > I'd love to start voting on it soon.
> >
> > J.
> >
> >
> >
> > On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > Let me run some tests too - I've used them a bit in the past. I thought
> > since we only want to make airflow.providers a namespace package it might
> > work for us.
> >
> > Will report back next week.
> >
> > -ash
> >
> > On 31 October 2019 15:58:22 GMT, Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> > The same repo (so the mono-repo approach). All packages would be in the "airflow_integrations" directory. It's mainly about moving the operator/hook/sensor files to a different directory structure.
> >
> > It might be done pretty much without changing the current installation/development model:
> >
> > 1) We can add a setup.py command to the main setup.py to install all the packages in -e mode (to make it easier to install all deps in one go).
> > 2) We can add dependencies in setup.py extras to install the appropriate packages. For example, the [google] extra will require the 'apache-airflow-integrations-providers-google' package - or 'apache-airflow-providers-google' if we decide to drop -integrations from the package name to make it shorter.
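The extras idea in (2) could be as small as a mapping in the main setup.py. A minimal sketch - the distribution names below follow this thread's proposal and are not published packages:

```python
# Sketch only: the main setup.py would pass this mapping as extras_require
# so that `pip install apache-airflow[google]` also pulls in the
# corresponding provider distribution (names are proposals, not final).
EXTRAS_REQUIRE = {
    "google": ["apache-airflow-integrations-providers-google"],
    "amazon": ["apache-airflow-integrations-providers-amazon"],
    "protocols": ["apache-airflow-integrations-protocols"],
}

# A real setup.py would then call:
# setuptools.setup(..., extras_require=EXTRAS_REQUIRE)
```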
> >
> > The only potential drawback I see is a bit more involved setup of the
> > IDE.
> >
> > This way installation method for both dev and prod remains simple.
> >
> > In the future we can have a separate release schedule for the packages (AIP-8) but for now we can stick to the same version for 'apache-airflow' and the 'apache-airflow-integrations-*' packages (+ a separate release schedule for backporting needs).
> >
> > Here again is the structure of the repo (we will likely be able to use native namespaces, so I removed some needless __init__.py files):
> >
> > |-- airflow
> > |   |- __init__.py
> > |   |- operators -> fundamental operators are here
> > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > |-- setup.py -> setup.py for the "apache-airflow" package
> > |-- airflow_integrations
> > |   |-providers
> > |   | |-google
> > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > |   |   |-airflow_integrations
> > |   |     |-providers
> > |   |       |-google
> > |   |         |-__init__.py
> > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > |   |-protocols
> > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > |     |-airflow_integrations
> > |        |-protocols
> > |          |-__init__.py
> > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> >
> >
> > J.
> >
> > On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > So create another package in a different repo? Or the same repo with a separate setup.py file that has airflow as a dependency?
> >
> >
> >
> >
> > On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > TL;DR; I did some more testing on how namespaces work. I still believe the only way to use namespaces is to have a separate (for example "airflow_integrations") package for all backportable packages.
> >
> > I am not sure if someone has used namespaces before, but after reading and trying things out, the main blocker seems to be that we have non-trivial code in airflow's "__init__.py" (including class definitions, imported sub-packages and plugin initialisation).
> >
> > Details are in https://packaging.python.org/guides/packaging-namespace-packages/ but it's a long one, so let me summarize my findings:
> >
> >    - In order to use the "airflow.providers" package we would have to declare "airflow" as a namespace.
> >    - It can be done in three different ways:
> >      - omitting __init__.py in this package (native/implicit namespace),
> >      - making the __init__.py of the "airflow" package in main airflow (and other packages) be "*__path__ = __import__('pkgutil').extend_path(__path__, __name__)*" (pkgutil style), or
> >      - "*__import__('pkg_resources').declare_namespace(__name__)*" (pkg_resources style).
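For the first option, here is a quick self-contained illustration of how native (PEP 420) namespaces behave - the package names are invented for the demo, this is not Airflow code:

```python
# With native namespaces there is simply NO __init__.py at the namespace
# levels: Python merges all same-named directories it finds across
# sys.path entries into one package.
import os
import sys
import tempfile

root_a, root_b = tempfile.mkdtemp(), tempfile.mkdtemp()
for root, sub in ((root_a, "gcp"), (root_b, "aws")):
    leaf = os.path.join(root, "demo_ns", "providers", sub)
    os.makedirs(leaf)  # note: no __init__.py in demo_ns/ or providers/
    with open(os.path.join(leaf, "__init__.py"), "w") as f:
        f.write(f"NAME = {sub!r}\n")
sys.path[:0] = [root_a, root_b]

# Both portions import, merged under the same namespace:
from demo_ns.providers import aws, gcp
```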
> >
> > The first is not possible (we already have __init__.py in "airflow").
> > The second and third are not possible because we already have quite a lot in airflow's "__init__.py", and both the pkgutil and pkg_resources styles state:
> >
> > "*Every* distribution that uses the namespace package must include
> > an
> > identical *__init__.py*. If any distribution does not, it will
> > cause the
> > namespace logic to fail and the other sub-packages will not be
> > importable.
> > *Any
> > additional code in __init__.py will be inaccessible."*
> >
> > I even tried to add those pkgutil/pkg_resources declarations to airflow and do some experimenting with it - but it does not work. Pip install fails at the plugins_manager as "airflow.plugins" is not accessible (kind of expected), but I am sure there would be other problems as well. :(
> >
> > Basically - we cannot turn "airflow" into a namespace because it has some "__init__.py" logic :(.
> >
> > So I think it still holds that if we want to use namespaces, we should use another package. "*airflow_integrations*" is the current candidate, but we can think of a nicer/shorter one: "airflow_ext", "airflow_int", "airflow_x", "airflow_mod", "airflow_next", "airflow_xt", "airflow_", "ext_airflow", ....  Interestingly, "airflow_" is the one suggested by PEP8 to avoid conflicts with Python names (which is a different case but kind of close).
> >
> > What do you think?
> >
> > J.
> >
> > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > The namespace feature looks promising and, from your tests, it looks like it would work well from Airflow 2.0 onwards.
> >
> > I will look at it in-depth and see if I have more suggestions or opinions on it
> >
> > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > TL;DR; We did some testing on namespaces and packaging (and potential backporting options for 1.10.* Python 3 Airflows) and we think it's best to move to namespaces quickly and use a different package name, "airflow-integrations", for all non-fundamental integrations.
> >
> > Unless we missed some tricks, we cannot use airflow.* sub-packages for the 1.10.* backportable packages. Example:
> >
> >    - the "*apache-airflow*" package provides "airflow.*" (this is what we have today)
> >    - "*apache-airflow-providers-google*" provides "airflow.providers.google.*" packages
> >
> > If we install both packages (old apache-airflow 1.10.6 and new apache-airflow-providers-google from 2.0) - it seems that the "airflow.providers.google.*" package cannot be imported. This is a bit of a problem if we would like to backport the operators from Airflow 2.0 to Airflow 1.10 in a way that is forward-compatible. We really want users who started using backported operators in 1.10.* not to have to change imports in their DAGs to run them in Airflow 2.0.
> >
> > We discussed it internally in our team and considered several options, but we think the best way will be to go straight to "namespaces" in Airflow 2.0 and to have the integrations (as discussed in the AIP-21 discussion) in a separate "*airflow_integrations*" package. It might even be a step towards the AIP-8 implementation, and it plays together very well with the "stewardship" discussed in AIP-21 now. But we will still keep (for now) a single release process for all packages for 2.0 (except for the backporting, which can be done per-provider before the 2.0 release) and provide a foundation for more complex release cycles in future versions.
> >
> > Here is how the new Airflow 2.0 repository could look (I only show a subset of dirs but they are representative). For those whose email client corrupts fixed-width/colored fonts, here is an image of this structure: https://pasteboard.co/IEesTih.png
> >
> > |-- airflow
> > |   |- __init__.py
> > |   |- operators -> fundamental operators are here
> > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > |-- setup.py -> setup.py for the "apache-airflow" package
> > |-- airflow_integrations
> > |   |-__init__.py
> > |   |-providers
> > |   | |-google
> > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > |   |   |-airflow_integrations
> > |   |     |-__init__.py
> > |   |     |-providers
> > |   |       |-__init__.py
> > |   |       |-google
> > |   |         |-__init__.py
> > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > |   |-protocols
> > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > |     |-airflow_integrations
> > |        |-protocols
> > |          |-__init__.py
> > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> >
> > There are a number of pros for this solution:
> >
> >    - We could use the standard namespaces feature of Python to build multiple packages: https://packaging.python.org/guides/packaging-namespace-packages/
> >    - Installation for users will be the same as previously. We could install the needed packages automatically when particular extras are used (pip install apache-airflow[google] could install both "apache-airflow" and "apache-airflow-integrations-providers-google").
> >    - We could have a custom setup.py installation process for developers that could install all the packages in development ("-e ." mode) in a single operation.
> >    - In the case of transfer packages, we could have nice error messages informing that the other package needs to be installed (for example the S3->GCS operator would import "airflow_integrations.providers.amazon.*" and if that fails it could raise "Please install [amazon] extra to use me.").
> >    - We could implement numerous optimisations in the way we run tests in CI (for example run all the "providers" tests only with sqlite, run tests in parallel etc.).
> >    - We could implement it gradually - we do not have to have a "big bang" approach - we can implement it provider-by-provider and test it with one provider (Google) first to make sure that all the mechanisms are working.
> >    - For now we could have the monorepo approach where all the packages are developed in concert - for now avoiding the dependency problems (but allowing for back-portability to 1.10).
> >    - We will have clear boundaries between packages and the ability to test for unwanted/hidden dependencies between packages.
> >    - We could switch to the (much better) sphinx-apidoc package to continue building single documentation for all of those (sphinx-apidoc has support for namespaces).
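The "nice error message" idea for transfer operators could look roughly like this. The module path is the thread's proposed layout rather than an existing package, and the wording is invented, so treat this purely as a sketch:

```python
# Sketch of the import guard a transfer operator module could use. Since
# the proposed provider package does not exist in this environment, the
# except branch is what actually runs here.
def load_s3_hook():
    """Import the (hypothetical) S3 hook, turning a bare ImportError
    into an actionable hint about the missing extra."""
    try:
        from airflow_integrations.providers.amazon.hooks.s3 import S3Hook
    except ImportError as err:
        raise ImportError(
            "The S3-to-GCS transfer operator requires the [amazon] extra: "
            "pip install 'apache-airflow[amazon]'"
        ) from err
    return S3Hook
```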
> >
> > As we are working on the GCP move from contrib to core, we could make the effort to test and try it before we merge it to master, so that it will be ready for others (and we could help with most of the moves afterwards). It seems complex, but in fact in most cases it will be a very simple move between the packages and it can be done incrementally, so there is little risk in doing this I think.
> >
> > J.
> >
> >
> > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com> wrote:
> >
> > Tomasz and Ash made good points about the overhead of having separate repos. But as we grow bigger and more mature, I would prefer to have what was described in AIP-8. It shouldn't be extremely hard for us to come up with good strategies to handle the overhead. AIP-8 already talked about how it can benefit us. IMO, on a high level, having a clear separation of core vs. hooks/operators would make the project much more scalable, and the gains would outweigh the cost we pay.
> >
> > That being said, I'm supportive of this "moving towards AIP-8 while learning" approach - quite a good practice for tackling a big project. Looking forward to reading the AIP.
> >
> >
> > Cheers,
> > Kevin Y
> >
> > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > We are checking how we can use namespaces in a back-portable way and we will have a POC soon, so that we will all be able to see how it will look.
> >
> > J.
> >
> > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > I'll have to read your proposal in detail (sorry, no time
> > right
> > now!),
> > but
> > I'm broadly in favour of this approach, and I think
> > keeping
> > them
> > _in_
> > the
> > same repo is the best plan -- that makes writing and
> > testing
> > cross-cutting
> > changes  easier.
> >
> > -a
> >
> > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <tomasz.urbas...@polidea.com> wrote:
> >
> > I think utilizing namespaces should reduce a lot of the problems raised by using separate repos (who will manage it? how to release? where should the repo be?).
> >
> > Bests,
> > Tomek
> >
> > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > Thanks Bas for comments! Let me share my thoughts
> > below.
> >
> > On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <basharens...@godatadriven.com> wrote:
> >
> > Hi Jarek, I definitely see a future in creating separate installable packages for various operators/hooks/etc (as in AIP-8). This would IMO strip the "core" Airflow to only what's needed and result in a small package without a ton of dependencies (and make it more maintainable, shorter tests, etc etc etc). Not exactly sure though what you're proposing in your e-mail - is it a new AIP for an intermediate step towards AIP-8?
> >
> >
> > It's a new AIP I am proposing.  For now it's only for
> > backporting
> > the
> > new
> > 2.0 import paths to 1.10.* series.
> >
> > It's more of "incremental going in direction of AIP-8
> > and
> > learning
> > some
> > difficulties involved" than implementing AIP-8 fully.
> > We are
> > taking
> > advantage of changes in import paths from AIP-21 which
> > make
> > it
> > possible
> > to
> > have both old and new (optional) operators available
> > in
> > 1.10.*
> > series
> > of
> > Airflow. I think there is a lot more to do for full
> > implementation
> > of
> > AIP-8: decisions how to maintain, install those
> > operator
> > groups
> > separately,
> > stewardship model/organisation for the separate
> > groups, how
> > to
> > manage
> > cross-dependencies, procedures for releasing the
> > packages
> > etc.
> >
> > I think about this new AIP also as a learning effort - we would learn more about how separate packaging works and then we can follow up with the full AIP-8 implementation for "modular" Airflow. Then AIP-8 could be implemented in Airflow 2.1 for example - or 3.0 if we start following semantic versioning - based on those learnings. It's a good example of having your cake and eating it too. We can try out modularity in 1.10.* while cutting the scope of 2.0 and not implementing the full management/release procedure for AIP-8 yet.
> >
> >
> > Thinking about this, I think there are still a few grey areas (which would be good to discuss in a new AIP, or continue on AIP-8):
> >
> >  *   In your email you speak only about the 3 big cloud providers (btw I made a PR for migrating all AWS components -> https://github.com/apache/airflow/pull/6439). Is there a plan for splitting other components than Google/AWS/Azure?
> >
> >
> > We could indeed add more groups as part of this new AIP (as an extension to AIP-21 and a pre-requisite to AIP-8). We already see how moving/deprecation works for the providers package - it works for GCP/Google rather nicely. But there is nothing to prevent us from extending it to cover other groups of operators/hooks. If you look at the current structure of the documentation done by Kamil, we can follow the structure there and move the operators/hooks accordingly (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> >
> >      Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service integrations, Software integrations, Protocol integrations.
> >
> > I am happy to include that in the AIP - if others agree it's a good idea. Out of those groups, I think only Fundamentals should not be back-ported. The others should be rather easy to port (if we decide to). We already have quite a lot of those in the new GCP operators for 2.0, so starting with the GCP/Google group is a good idea. Also, following with cloud providers first is a good thing. For example, we now have support from the Google Composer team to do this separation for GCP (and we learn from it), and then we can claim stewardship in our team for releasing the Python 3 / Airflow 1.10-compatible "airflow-google" packages. Possibly other cloud providers/teams might follow this (if they see the value in it) and there could be different stewards for those. And then we can do other groups if we decide to. I think this way we can learn whether AIP-8 is manageable and what real problems we are going to face.
> >
> >  *   Each “plugin”, e.g. GCP, would be a separate repo - should we create some sort of blueprint for such packages?
> >
> >
> > I think we do not need separate repos (at all), but in this new AIP we can test that before we decide to go for AIP-8. IMHO the monorepo approach will work here rather nicely. We could use Python 3 native namespaces <https://packaging.python.org/guides/packaging-namespace-packages/> for the sub-packages when we go full AIP-8. For now we could simply package the new operators in a separate pip package for the Python 3 version of the 1.10.* series only. We only need to test if it works well with another package providing 'airflow.providers.*' after apache-airflow is installed (providing the 'airflow' package). But I think we can make it work. I don't think we really need to split the repos - namespaces will work just fine and allow easier management of cross-repository dependencies (but we may learn otherwise). For sure we will not need it for the newly proposed AIP of backporting groups to 1.10, and we can defer that decision to AIP-8 implementation time.
> >
> >
> > *   In which Airflow version do we start raising deprecation warnings, and in which version would we remove the originals?
> >
> >
> > I think we should do what we did in the GCP case already. The old "imports" for operators can be marked as deprecated in Airflow 2.0 (and removed in 2.1, or 3.0 if we start following semantic versioning). We can however do it before, in 1.10.7 or 1.10.8 if we release those (without removing the old operators yet - just raise deprecation warnings and inform users that for Python 3 the new "airflow-google", "airflow-aws" etc. packages can be installed and they can switch to them).
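The deprecation mechanics could be a thin alias module. This is a generic sketch only - the class name and old import path below are illustrative, not the actual Airflow shim:

```python
# Generic sketch: the module at the OLD import path re-exports the class
# from its new home and emits a DeprecationWarning on import.
import warnings

class GcsHook:
    """Stand-in for the class living at its new import path."""

def deprecated_alias(new_cls, old_path):
    """Warn that old_path is deprecated, then hand back the new class."""
    warnings.warn(
        f"{old_path} is deprecated; import {new_cls.__name__} "
        "from its new location instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_cls

# What the legacy module would do at import time:
GoogleCloudStorageHook = deprecated_alias(GcsHook, "airflow.contrib.hooks.gcs_hook")
```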
> >
> > J.
> >
> >
> >
> > Cheers,
> > Bas
> >
> > On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > Hello - any comments on that? I am happy to make it into an AIP :)?
> >
> > On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > *Motivation*
> >
> > I think we really should start thinking about making it easier for our users to migrate to 2.0. After implementing some recent changes related to AIP-21 - Changes in import paths (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths), I think I have an idea that might help with it.
> >
> > *Proposal*
> >
> > We could package some of the new and improved 2.0 operators (moved to the "providers" package) and let them be used in a Python 3 environment of Airflow 1.10.x.
> >
> > This can be done case-by-case per "cloud provider". It should not be obligatory and should be largely driven by each provider. It's not yet the full AIP-8 - Split Hooks/Operators into separate packages (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303). It's merely backporting some operators/hooks to get them to work in 1.10. But by doing it we might try out the concept of splitting, learn about maintenance problems, and maybe implement the full *AIP-8* approach in 2.1 consistently across the board.
> >
> > *Context*
> >
> > Part of AIP-21 was to move import paths for cloud providers to a separate providers/<PROVIDER> package. An example of that (the first provider we have already almost migrated) is the providers/google package (further divided into gcp/gsuite etc).
> >
> > We've done a massive migration of all the Google-related operators, created a few missing ones, retrofitted some old operators to follow GCP best practices and fixed a number of problems - also implementing Python 3 and Pylint compatibility. Some of these operators/hooks are not backwards compatible. Those that are compatible are still available via the old imports, with a deprecation warning.
> >
> > We've added missing tests (including system tests) and missing features - improving some of the Google operators, giving the users more capabilities and fixing some issues. Those operators should pretty much "just work" in Airflow 1.10.x (any recent version) for Python 3. We should be able to release a separate pip-installable package for those operators that users can install in Airflow 1.10.x.
> >
> > Any user will be able to install this separate package in their Airflow 1.10.x installation and start using those new "provider" operators in parallel with the old 1.10.x operators. Other providers ("microsoft", "amazon") might follow the same approach if they want. We could even at some point decide to move some of the core operators in a similar fashion (for example following the structure proposed in the latest documentation: fundamentals / software / etc. - https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> >
> > *Pros and cons*
> >
> > There are a number of pros:
> >
> >  - Users will have an easier migration path if they are deeply vested in the 1.10.* version
> >  - It's possible to migrate in stages for people who are also vested in py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 + 2.0*
> >  - Moving to py3 + new operators can be done gradually. Old operators will continue to work while the new ones can be used more and more
> >  - People will be incentivised to migrate to Python 3 before 2.0 is out (by using the new operators)
> >  - Each provider "package" can have an independent release schedule - and add functionality to already-released Airflow versions
> >  - We do not take any functionality away from the users - we just add more options
> >  - The releases can be - similarly to the main Airflow releases - voted separately by the PMC after "stewards" of the package (per provider) perform a round of testing on 1.10.* versions
> >  - Users will start migrating to the new operators earlier and have a smoother switch to 2.0 later
> >  - The latest improved operators will start
> > There are three cons I could think of:
> >
> >  - There will be quite a lot of duplication between the old and new operators (they will co-exist in 1.10). That might lead to confusion for users and problems with cooperation between different operators/hooks
> >  - Having the new operators in 1.10 Python 3 might keep people from migrating to 2.0
> >  - It will require some maintenance and separate release overhead.
> >
> > I already spoke to the Composer team @Google and they are very positive about this. I also spoke to Ash and it seems it might also be OK for the Astronomer team. We have Google's backing and support, and we can provide maintenance and support for those packages - being an example for other providers of how they can do it.
> >
> > Let me know what you think - and whether I should maybe make it into an official AIP?
> >
> > J.
> >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
> >
> >
> >
> > --
> >
> > Tomasz Urbaszek
> > Polidea <https://www.polidea.com/> | Junior Software Engineer
> >
> > M: +48 505 628 493 <+48505628493>
> > E: tomasz.urbas...@polidea.com
> >
> > Unique Tech
> > Check out our projects! <https://www.polidea.com/our-work>
> >
> >
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
