Thanks Ash! It seems it works really well and is super simple!

I have a POC working for Airflow:
https://github.com/apache/airflow/pull/6507

I managed to build and pip-install two packages:

1) apache-airflow 2.0 -> the same as today - containing everything,
including providers and GCP.

2) apache-airflow-providers-google - a package which has apache-airflow 1.10.*
as an installation prerequisite.

I managed to actually schedule the example_gcp_pubsub dag from
airflow.providers.google.example_dags - which uses the
airflow.providers.google.cloud.operators.pubsub operators - and the results
are attached (hope you can see the pictures).
It worked very nicely - when I just did 'pip install
apache-airflow-providers-google', it downloaded and installed from PyPI
apache-airflow 1.10.6 + all prerequisites from the [gcp] extra (which I
added as needed for the google package).
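
For reference, here is a minimal sketch of what the setup.py of such a
backport package could look like - the package list, version pins and the
use of find_namespace_packages are my assumptions for illustration, not the
exact contents of the PR:

    # setup.py - illustrative sketch of a backport provider package (NOT the
    # exact file from the PR). It ships only the airflow.providers.google tree
    # and pulls in Airflow 1.10.* plus the [gcp]-style dependencies.
    from setuptools import find_namespace_packages, setup

    setup(
        name="apache-airflow-providers-google",
        version="0.0.1",
        # package only the provider sub-tree; "airflow" itself stays owned by
        # the apache-airflow distribution
        packages=find_namespace_packages(
            include=["airflow.providers.google", "airflow.providers.google.*"]
        ),
        install_requires=[
            "apache-airflow>=1.10.6,<2.0.0",
            # the same libraries the [gcp] extra needs (names/pins assumed)
            "google-api-python-client>=1.6.0",
            "google-cloud-storage>=1.16.0",
        ],
        python_requires=">=3.6",
    )

With something like that in place, 'pip install apache-airflow-providers-google'
resolves apache-airflow 1.10.* and the GCP libraries in one go, which matches
what I observed above.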

So we seem to have a working solution now. I will cast a final vote on what
I think is now the consensus, as an update to AIP-21 (there is no point in
creating a separate AIP).

J.


On Tue, Nov 5, 2019 at 11:34 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yes let's just do (1) for now.
>
>
>
> On Tue, Nov 5, 2019, 08:48 Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
> > Thanks Ash! It might indeed work. I will take it from there and try to
> make
> > a POC PR with airflow.
> >
> > It's a bit different approach from the google-python libraries (they keep
> > all the libraries as separate sub-packages/mini projects inside the main
> > project). The approach you propose is far less invasive in terms of
> > changing the structure of the main repo. I like it this way much more. It
> > makes it much easier to import the project in an IDE, even if it is less
> > modular in nature.
> >
> > From what I understand with this structure - if it works - we have two
> > options:
> >
> > (1) For Airflow 2.0 we will be able to install Airflow and all
> > "integrations" in single (apache-airflow == 2.0.0) package and build
> > separate backporting integration packages for 1.10.* only.
> > (2) We will split Airflow 2.0 into separate "core" and "integration"
> > packages as well while preparing packages.
> >
> > I think (1) is a bit more reasonable for now, until we work out the full
> > AIP-8 solution (including solving the dependency hell). Let me know what
> > you think (and others as well).
> >
> > J.
> >
> > On Mon, Nov 4, 2019 at 9:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > > https://github.com/ashb/airflow-submodule-test <
> > > https://github.com/ashb/airflow-submodule-test>
> > >
> > > That seems to work in any order things are installed, at least on
> python
> > > 3.7. I've had a stressful few days so I may have missed something.
> Please
> > > tell me if there's a case I've missed, or if this is not a suitable
> proxy
> > > for our situation.
> > >
> > > -a
> > >
> > > > On 4 Nov 2019, at 20:08, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > >
> > > > Pretty hard pass from me on airflow_ext. If it's released by Airflow I
> > > > want it to live under airflow.* (anyone else is free to release
> > > > packages under any namespace they choose).
> > > >
> > > > That said I think I've got something that works:
> > > >
> > > >
> > > > /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/__init__.py
> > > > module level code running
> > > >
> > > > /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/providers/gcp/__init__.py
> > > > module level code running
> > > >
> > > > Let me test it again in a few different cases etc.
> > > >
> > > > -a
> > > >
> > > > On 4 November 2019 14:00:24 GMT, Jarek Potiuk <
> > jarek.pot...@polidea.com>
> > > wrote:
> > > > Hey Ash,
> > > >
> > > > Thanks for the offer. I must admit pkgutil and package namespaces are
> > > > not the best documented part of Python.
> > > >
> > > > I dug a bit deeper and found a similar problem -
> > > > https://github.com/pypa/setuptools/issues/895. It seems that even if it
> > > > is not explicitly explained in the pkgutil documentation, this comment
> > > > (assuming it is right) explains everything:
> > > >
> > > > *"That's right. All parents of a namespace package must also be
> > namespace
> > > > packages, as they will necessarily share that parent name space (farm
> > and
> > > > farm.deps in this example)."*
> > > >
> > > > There are a few possibilities mentioned in the issue on how this can be
> > > > worked around, but they are far from perfect solutions. They would
> > > > require patching the already-installed airflow __init__.py to
> > > > manipulate the search path. Still, from my tests I do not know if this
> > > > would be possible at all because of the non-trivial __init__.py we have
> > > > (and use) in the *airflow* package.
> > > >
> > > > We have a few PRs now waiting for a decision on that one I think, so
> > > > maybe we can simply agree that we should use another package (I really
> > > > like *"airflow_ext"* :D) and use it from now on? What do you (and
> > > > others) think?
> > > >
> > > > I'd love to start voting on it soon.
> > > >
> > > > J.
> > > >
> > > >
> > > >
> > > > On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <a...@apache.org>
> > > wrote:
> > > >
> > > > Let me run some tests too - I've used them a bit in the past. I
> thought
> > > > since we only want to make airflow.providers a namespace package it
> > might
> > > > work for us.
> > > >
> > > > Will report back next week.
> > > >
> > > > -ash
> > > >
> > > > On 31 October 2019 15:58:22 GMT, Jarek Potiuk <
> > jarek.pot...@polidea.com>
> > > > wrote:
> > > > The same repo (so a mono-repo approach). All packages would be in the
> > > > "airflow_integrations" directory. It's mainly about moving the
> > > > operator/hook/sensor files to a different directory structure.
> > > >
> > > > It might be done pretty much without changing the current
> > > > installation/development model:
> > > >
> > > > 1) We can add a setup.py command to install all the packages in -e mode
> > > > in the main setup.py (to make it easier to install all deps in one go).
> > > > 2) We can add dependencies in setup.py extras to install the appropriate
> > > > packages. For example, the [google] extra will require the
> > > > 'apache-airflow-integrations-providers-google' package - or
> > > > apache-airflow-providers-google if we decide to drop -integrations from
> > > > the package name to make it shorter (a rough sketch follows below).
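> > > >
> > > > To illustrate 2), here is a rough sketch of the extras wiring in the
> > > > main setup.py (the package names are placeholders, not a final decision):
> > > >
> > > >     # illustrative snippet of the main apache-airflow setup.py - the
> > > >     # [google] extra just pulls in the separately built provider package
> > > >     from setuptools import setup
> > > >
> > > >     setup(
> > > >         name="apache-airflow",
> > > >         version="2.0.0.dev0",
> > > >         # ... regular packages/install_requires arguments elided here ...
> > > >         extras_require={
> > > >             "google": ["apache-airflow-integrations-providers-google"],
> > > >             # other providers would follow the same pattern, e.g.:
> > > >             # "amazon": ["apache-airflow-integrations-providers-amazon"],
> > > >         },
> > > >     )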
> > > >
> > > > The only potential drawback I see is a slightly more involved setup of
> > > > the IDE.
> > > >
> > > > This way the installation method for both dev and prod remains simple.
> > > >
> > > > In the future we can have a separate release schedule for the packages
> > > > (AIP-8), but for now we can stick to the same version for
> > > > 'apache-airflow' and the 'apache-airflow-integrations-*' packages (+ a
> > > > separate release schedule for backporting needs).
> > > > Here again is the structure of the repo (we will likely be able to use
> > > > native namespaces so I removed some needless __init__.py files):
> > > >
> > > > |-- airflow
> > > > |   |- __init__.py
> > > > |   |- operators -> fundamental operators are here
> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > > |-- airflow_integrations
> > > > |   |-providers
> > > > |   | |-google
> > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > > |   |   |-airflow_integrations
> > > > |   |     |-providers
> > > > |   |       |-google
> > > > |   |         |-__init__.py
> > > > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > > |   |-__init__.py
> > > > |   |-protocols
> > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > > |     |-airflow_integrations
> > > > |        |-protocols
> > > > |          |-__init__.py
> > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > > >
> > > >
> > > > J.
> > > >
> > > > On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <kaxiln...@gmail.com>
> > wrote:
> > > >
> > > > So create another package in a different repo? Or the same repo with a
> > > > separate setup.py file that has airflow as a dependency?
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
> > > > <jarek.pot...@polidea.com>
> > > > wrote:
> > > >
> > > > TL;DR; I did some more testing on how namespaces work. I still believe
> > > > the only way to use namespaces is to have a separate (for example
> > > > "airflow_integrations") package for all backportable packages.
> > > >
> > > > I am not sure if someone has used namespaces before, but after reading
> > > > and trying it out, the main blocker seems to be that we have non-trivial
> > > > code in airflow's "__init__.py" (including class definitions, imported
> > > > sub-packages and plugin initialisation).
> > > >
> > > > Details are in
> > > > https://packaging.python.org/guides/packaging-namespace-packages/ but
> > > > it's a long one so let me summarize my findings:
> > > >
> > > >    - In order to use the "airflow.providers" package we would have to
> > > >      declare "airflow" as a namespace.
> > > >    - This can be done in three different ways:
> > > >      - omitting __init__.py in this package (native/implicit namespace),
> > > >      - making the __init__.py of the "airflow" package in main airflow
> > > >        (and all other packages) contain only "*__path__ =
> > > >        __import__('pkgutil').extend_path(__path__, __name__)*"
> > > >        (pkgutil style), or
> > > >      - "*__import__('pkg_resources').declare_namespace(__name__)*"
> > > >        (pkg_resources style)
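> > > >
> > > > For reference, a minimal sketch of those styles, based on the packaging
> > > > guide (file locations are illustrative):
> > > >
> > > >     # pkgutil style: every distribution sharing the "airflow" namespace
> > > >     # ships an identical airflow/__init__.py containing ONLY this line:
> > > >     __path__ = __import__('pkgutil').extend_path(__path__, __name__)
> > > >
> > > >     # pkg_resources style: the equivalent single line for that file is:
> > > >     # __import__('pkg_resources').declare_namespace(__name__)
> > > >
> > > >     # native/implicit style (PEP 420): no airflow/__init__.py at all, and
> > > >     # each distribution's setup.py lists only its own sub-tree, e.g.:
> > > >     #     from setuptools import find_namespace_packages, setup
> > > >     #     setup(packages=find_namespace_packages(include=["airflow.providers.*"]))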
> > > >
> > > > The first is not possible (we already have an __init__.py in "airflow").
> > > > The other two are not possible because we already have quite a lot in
> > > > airflow's "__init__.py", and both the pkgutil and pkg_resources style
> > > > docs state:
> > > >
> > > > "*Every* distribution that uses the namespace package must include
> > > > an
> > > > identical *__init__.py*. If any distribution does not, it will
> > > > cause the
> > > > namespace logic to fail and the other sub-packages will not be
> > > > importable.
> > > > *Any
> > > > additional code in __init__.py will be inaccessible."*
> > > >
> > > > I even tried to add those pkgutil/pkg_resources declarations to airflow
> > > > and did some experimenting with it - but it does not work. Pip install
> > > > fails at the plugins_manager as "airflow.plugins" is not accessible
> > > > (kind of expected), but I am sure there would be other problems as
> > > > well. :(
> > > >
> > > > Basically - we cannot turn "airflow" into a namespace because it has
> > > > some "__init__.py" logic :(.
> > > >
> > > > So I think it still holds that if we want to use namespaces, we should
> > > > use another package. *"airflow_integrations"* is the current candidate,
> > > > but we can think of some nicer/shorter one: "airflow_ext", "airflow_int",
> > > > "airflow_x", "airflow_mod", "airflow_next", "airflow_xt", "airflow_",
> > > > "ext_airflow", .... Interestingly, a trailing underscore ("airflow_") is
> > > > the convention suggested by PEP 8 to avoid conflicts with Python names
> > > > (which is a different case but kind of close).
> > > >
> > > > What do you think?
> > > >
> > > > J.
> > > >
> > > > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <kaxiln...@gmail.com>
> > > > wrote:
> > > >
> > > > The namespace feature looks promising and from your tests, it
> > > > looks
> > > > like
> > > > it
> > > > would work well from Airflow 2.0 and onwards.
> > > >
> > > > I will look at it in-depth and see if I have more suggestions or
> > > > opinions on it.
> > > >
> > > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk
> > > > <jarek.pot...@polidea.com
> > > >
> > > > wrote:
> > > >
> > > > TL;DR; We did some testing on namespaces and packaging (and potential
> > > > backporting options for 1.10.* Python 3 Airflows) and we think it's best
> > > > to use namespaces quickly and use a different package name,
> > > > "airflow-integrations", for all non-fundamental integrations.
> > > >
> > > > Unless we missed some tricks, we cannot use airflow.* sub-packages for
> > > > the 1.10.* backportable packages. Example:
> > > >
> > > >    - "*apache-airflow*" package provides "airflow.*" (this is what we
> > > >      have today)
> > > >    - "*apache-airflow-providers-google*" provides
> > > >      "airflow.providers.google.*" packages
> > > >
> > > > If we install both packages (old apache-airflow 1.10.6 and new
> > > > apache-airflow-providers-google from 2.0) - it seems that
> > > > the "airflow.providers.google.*" package cannot be imported. This is a
> > > > bit of a problem if we would like to backport the operators from Airflow
> > > > 2.0 to Airflow 1.10 in a way that is forward-compatible. We really want
> > > > users who started using backported operators in 1.10.* to not have to
> > > > change imports in their DAGs to run them in Airflow 2.0.
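> > > >
> > > > To make the goal concrete: the very same DAG import line should work on
> > > > 1.10.* (with the backport package installed) and, unchanged, on 2.0 -
> > > > for example (module path shown for illustration; operator class names
> > > > omitted on purpose):
> > > >
> > > >     # works on Airflow 2.0, and on 1.10.* once the backport package is
> > > >     # installed - no DAG changes needed when upgrading
> > > >     from airflow.providers.google.cloud.operators import pubsub  # noqa: F401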
> > > >
> > > > We discussed it internally in our team and considered several options,
> > > > but we think the best way will be to go straight to "namespaces" in
> > > > Airflow 2.0 and to have the integrations (as discussed in AIP-21) live
> > > > in a separate "*airflow_integrations*" package. It moves even more
> > > > towards the AIP-8 implementation and plays together very well in terms
> > > > of the "stewardship" discussed in AIP-21 now. But we will still keep
> > > > (for now) a single release process for all packages for 2.0 (except for
> > > > the backporting, which can be done per-provider before the 2.0 release)
> > > > and provide a foundation for more complex release cycles in future
> > > > versions.
> > > >
> > > > Here is how the new Airflow 2.0 repository could look (I only show a
> > > > subset of dirs but they are representative). For those whose email
> > > > client corrupts the fixed-width/colored font, here is an image of this
> > > > structure: https://pasteboard.co/IEesTih.png
> > > >
> > > > |-- airflow
> > > > |   |- __init__.py
> > > > |   |- operators -> fundamental operators are here
> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > > |-- airflow_integrations
> > > > |   |-providers
> > > > |   | |-google
> > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > > |   |   |-airflow_integrations
> > > > |   |     |-__init__.py
> > > > |   |     |-providers
> > > > |   |       |-__init__.py
> > > > |   |       |-google
> > > > |   |         |-__init__.py
> > > > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > > |   |-__init__.py
> > > > |   |-protocols
> > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > > |     |-airflow_integrations
> > > > |        |-protocols
> > > > |          |-__init__.py
> > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > > >
> > > > There are a number of pros for this solution:
> > > >
> > > >    - We could use the standard namespaces feature of Python to build
> > > >      multiple packages:
> > > >      https://packaging.python.org/guides/packaging-namespace-packages/
> > > >    - Installation for users will be the same as previously. We could
> > > >      install the needed packages automatically when particular extras
> > > >      are used (pip install apache-airflow[google] could install both
> > > >      "apache-airflow" and "apache-airflow-integrations-providers-google").
> > > >    - We could have a custom setup.py installation process for developers
> > > >      that could install all the packages in development ("-e ." mode) in
> > > >      a single operation.
> > > >    - In the case of transfer packages we could have nice error messages
> > > >      informing that the other package needs to be installed (for example
> > > >      the S3->GCS operator would import
> > > >      "airflow_integrations.providers.amazon.*" and if it fails it could
> > > >      raise "Please install the [amazon] extra to use me.") - see the
> > > >      sketch right after this list.
> > > >    - We could implement numerous optimisations in the way we run tests
> > > >      in CI (for example run all the "providers" tests only with sqlite,
> > > >      run tests in parallel, etc.).
> > > >    - We could implement it gradually - we do not have to have a "big
> > > >      bang" approach - we can implement it provider-by-provider and test
> > > >      it with one provider (Google) first to make sure that all the
> > > >      mechanisms are working.
> > > >    - For now we could have the monorepo approach where all the packages
> > > >      will be developed in concert - for now avoiding the dependency
> > > >      problems (but allowing for back-portability to 1.10).
> > > >    - We will have clear boundaries between packages and the ability to
> > > >      test for unwanted/hidden dependencies between packages.
> > > >    - We could switch to the (much better) sphinx-apidoc package to
> > > >      continue building single documentation for all of those (sphinx
> > > >      apidoc has support for namespaces).
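> > > >
> > > > As mentioned in the transfer-packages point, a rough sketch of what such
> > > > a friendly error could look like (the module and hook paths below are
> > > > made up for illustration, not a final layout):
> > > >
> > > >     # sketch only: a transfer operator module in the google package that
> > > >     # needs the amazon package; all paths/names here are illustrative
> > > >     try:
> > > >         from airflow_integrations.providers.amazon.aws.hooks.s3 import S3Hook
> > > >     except ImportError:
> > > >         raise ImportError(
> > > >             "This operator requires the Amazon integration package. "
> > > >             "Please install the [amazon] extra to use it."
> > > >         )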
> > > >
> > > > As we are working on the GCP move from contrib to core, we could make
> > > > the effort to test it and try it before we merge it to master, so that
> > > > it will be ready for others (and we could help with most of the moves
> > > > afterwards). It seems complex, but in fact in most cases it will be a
> > > > very simple move between the packages and can be done incrementally, so
> > > > there is little risk in doing this I think.
> > > >
> > > > J.
> > > >
> > > >
> > > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com>
> > > > wrote:
> > > >
> > > > Tomasz and Ash have good points about the overhead of having separate
> > > > repos. But as we grow bigger and more mature, I would prefer to have
> > > > what was described in AIP-8. It shouldn't be extremely hard for us to
> > > > come up with good strategies to handle the overhead. AIP-8 already
> > > > talked about how it can benefit us. IMO, on a high level, having a clear
> > > > separation of core vs. hooks/operators would make the project much more
> > > > scalable, and the gains would outweigh the cost we pay.
> > > >
> > > > That being said, I'm supportive of this "move towards AIP-8 while
> > > > learning" approach - quite a good practice for tackling a big project.
> > > > Looking forward to reading the AIP.
> > > >
> > > >
> > > > Cheers,
> > > > Kevin Y
> > > >
> > > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
> > > > jarek.pot...@polidea.com
> > > >
> > > > wrote:
> > > >
> > > > We are checking how we can use namespaces in a back-portable way and we
> > > > will have a POC soon, so that we will all be able to see how it will
> > > > look.
> > > >
> > > > J.
> > > >
> > > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
> > > > a...@apache.org>
> > > > wrote:
> > > >
> > > > I'll have to read your proposal in detail (sorry, no time
> > > > right
> > > > now!),
> > > > but
> > > > I'm broadly in favour of this approach, and I think
> > > > keeping
> > > > them
> > > > _in_
> > > > the
> > > > same repo is the best plan -- that makes writing and
> > > > testing
> > > > cross-cutting
> > > > changes  easier.
> > > >
> > > > -a
> > > >
> > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
> > > > tomasz.urbas...@polidea.com
> > > >
> > > > wrote:
> > > >
> > > > I think utilizing namespaces should reduce a lot of the problems raised
> > > > by using separate repos (who will manage it? how to release? where
> > > > should the repo be?).
> > > >
> > > > Bests,
> > > > Tomek
> > > >
> > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> > > > jarek.pot...@polidea.com>
> > > > wrote:
> > > >
> > > > Thanks Bas for comments! Let me share my thoughts
> > > > below.
> > > >
> > > > On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> > > > basharens...@godatadriven.com>
> > > > wrote:
> > > >
> > > > Hi Jarek, I definitely see a future in creating
> > > > separate
> > > > installable
> > > > packages for various operators/hooks/etc (as in
> > > > AIP-8).
> > > > This
> > > > would
> > > > IMO
> > > > strip the “core” Airflow to only what’s needed and
> > > > result
> > > > in
> > > > a
> > > > small
> > > > package without a ton of dependencies (and make it
> > > > more
> > > > maintainable,
> > > > shorter tests, etc etc etc). Not exactly sure though
> > > > what
> > > > you’re
> > > > proposing
> > > > in your e-mail, is it a new AIP for an intermediate
> > > > step
> > > > towards
> > > > AIP-8?
> > > >
> > > >
> > > > It's a new AIP I am proposing. For now it's only for backporting the
> > > > new 2.0 import paths to the 1.10.* series.
> > > >
> > > > It's more of an "incremental move in the direction of AIP-8, learning
> > > > some of the difficulties involved" than implementing AIP-8 fully. We are
> > > > taking advantage of the changes in import paths from AIP-21 which make
> > > > it possible to have both old and new (optional) operators available in
> > > > the 1.10.* series of Airflow. I think there is a lot more to do for a
> > > > full implementation of AIP-8: decisions on how to maintain and install
> > > > those operator groups separately, a stewardship model/organisation for
> > > > the separate groups, how to manage cross-dependencies, procedures for
> > > > releasing the packages, etc.
> > > >
> > > > I think about this new AIP also as a learning effort - we would learn
> > > > more about how separate packaging works and then we can follow up with a
> > > > full AIP-8 implementation for "modular" Airflow. Then AIP-8 could be
> > > > implemented in Airflow 2.1 for example - or 3.0 if we start following
> > > > semantic versioning - based on those learnings. It's a bit of a "have
> > > > your cake and eat it too" case. We can try out modularity in 1.10.*
> > > > while cutting the scope of 2.0 and not implementing the full
> > > > management/release procedure for AIP-8 yet.
> > > >
> > > >
> > > > Thinking about this, I think there are still a few grey areas (which
> > > > would be good to discuss in a new AIP, or continue on AIP-8):
> > > >
> > > >  *   In your email you speak only about the 3 big cloud providers
> > > > (btw I made a PR for migrating all AWS components ->
> > > > https://github.com/apache/airflow/pull/6439). Is there a plan for
> > > > splitting components other than Google/AWS/Azure?
> > > >
> > > >
> > > > We could add more groups as part of this new AIP indeed (as an
> > > > extension to AIP-21 and a pre-requisite to AIP-8). We already see how
> > > > moving/deprecation works for the providers package - it works for
> > > > GCP/Google rather nicely. But there is nothing to prevent us from
> > > > extending it to cover other groups of operators/hooks. If you look at
> > > > the current structure of the documentation done by Kamil, we can follow
> > > > the structure there and move the operators/hooks accordingly
> > > > (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> > > >
> > > >      Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
> > > >      Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform,
> > > >      Service integrations, Software integrations, Protocol integrations.
> > > >
> > > > I am happy to include that in the AIP - if others
> > > > agree
> > > > it's a
> > > > good
> > > > idea.
> > > > Out of those groups -  I think only Fundamentals
> > > > should not
> > > > be
> > > > back-ported.
> > > > Others should be rather easy to port (if we decide
> > > > to). We
> > > > already
> > > > have
> > > > quite a lot of those in the new GCP operators for 2.0. So starting with
> > > > the GCP/Google group is a good idea. Also, following with Cloud
> > > > Providers first is a good thing. For example, we now have support from
> > > > the Google Composer team to do this separation for GCP (and we learn
> > > > from it) and then we can claim the stewardship in our team for releasing
> > > > the Python 3 / Airflow 1.10-compatible "airflow-google" packages. Possibly
> > > > other
> > > > Cloud
> > > > Providers/teams might follow this (if they see the
> > > > value in
> > > > it)
> > > > and
> > > > there
> > > > could be different stewards for those. And then we can
> > > > do
> > > > other
> > > > groups
> > > > if
> > > > we decide to. I think this way we can learn whether
> > > > AIP-8 is
> > > > manageable
> > > > and
> > > > what real problems we are going to face.
> > > >
> > > >  *   Each “plugin” e.g. GCP would be a separate repo,
> > > > should
> > > > we
> > > > create
> > > > some sort of blueprint for such packages?
> > > >
> > > >
> > > > I think we do not need separate repos (at all), but in this new AIP we
> > > > can test it before we decide to go for AIP-8. IMHO the monorepo approach
> > > > will work here rather nicely. We could use Python 3 native namespaces
> > > > (https://packaging.python.org/guides/packaging-namespace-packages/) for
> > > > the sub-packages when we go full AIP-8. For now we could simply package
> > > > the new operators in a separate pip package for the Python 3 1.10.*
> > > > series only. We only need to test if it works well with another package
> > > > providing 'airflow.providers.*' after apache-airflow is installed
> > > > (providing the 'airflow' package). But I think we can make it work. I
> > > > don't think we really need to split the repos; namespaces will work just
> > > > fine and give easier management of cross-repository dependencies (but we
> > > > can learn otherwise). For sure we will not need it for the newly
> > > > proposed AIP of backporting groups to 1.10, and we can defer that
> > > > decision to AIP-8 implementation time.
> > > >
> > > >
> > > > *   In which Airflow version do we start raising
> > > > deprecation
> > > > warnings
> > > > and in which version would we remove the original?
> > > >
> > > >
> > > > I think we should do what we did in the GCP case already. Those old
> > > > "imports" for operators can be marked as deprecated in Airflow 2.0 (and
> > > > removed in 2.1, or 3.0 if we start following semantic versioning). We
> > > > can, however, do it earlier in 1.10.7 or 1.10.8 if we release those
> > > > (without removing the old operators yet - just raising deprecation
> > > > warnings and informing users that for Python 3 the new "airflow-google",
> > > > "airflow-aws" etc. packages can be installed and they can switch to
> > > > them).
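> > > >
> > > > A rough sketch of what such a deprecation shim could look like, kept at
> > > > the old import path (the old module location is only an example of the
> > > > pattern):
> > > >
> > > >     # e.g. airflow/contrib/operators/pubsub_operator.py kept as a thin shim
> > > >     # (path chosen for illustration). It re-exports the new module and warns.
> > > >     import warnings
> > > >
> > > >     # re-export everything from the new location so old imports keep working
> > > >     from airflow.providers.google.cloud.operators.pubsub import *  # noqa: F401,F403
> > > >
> > > >     warnings.warn(
> > > >         "This module is deprecated. Please import from "
> > > >         "airflow.providers.google.cloud.operators.pubsub instead.",
> > > >         DeprecationWarning,
> > > >         stacklevel=2,
> > > >     )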
> > > >
> > > > J.
> > > >
> > > >
> > > >
> > > > Cheers,
> > > > Bas
> > > >
> > > > On 27 Oct 2019, at 08:33, Jarek Potiuk <
> > > > jarek.pot...@polidea.com
> > > > <mailto:
> > > > jarek.pot...@polidea.com>> wrote:
> > > >
> > > > Hello - any comments on that? I am happy to make it
> > > > into an
> > > > AIP
> > > > :)?
> > > >
> > > > On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <
> > > > jarek.pot...@polidea.com
> > > > <mailto:jarek.pot...@polidea.com>>
> > > > wrote:
> > > >
> > > > *Motivation*
> > > >
> > > > I think we really should start thinking about making it easier for our
> > > > users to migrate to 2.0. After implementing some recent changes related
> > > > to AIP-21 - Changes in import paths
> > > > (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths)
> > > > I think I have an idea that might help with it.
> > > >
> > > > *Proposal*
> > > >
> > > > We could package some of the new and improved 2.0 operators (moved to
> > > > the "providers" package) and let them be used in a Python 3 environment
> > > > of Airflow 1.10.x.
> > > >
> > > > This can be done case-by-case per "cloud provider". It should not be
> > > > obligatory and should be largely driven by each provider. It's not yet
> > > > the full AIP-8 - Split Hooks/Operators into separate packages
> > > > (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303).
> > > > It's merely backporting some operators/hooks to get them to work in
> > > > 1.10. But by doing it we might try out the concept of splitting, learn
> > > > about maintenance problems and maybe implement the full *AIP-8* approach
> > > > in 2.1 consistently across the board.
> > > >
> > > > *Context*
> > > >
> > > > Part of AIP-21 was to move import paths for Cloud providers to a
> > > > separate providers/<PROVIDER> package. An example of that (the first
> > > > provider we have already almost migrated) was the providers/google
> > > > package (further divided into gcp/gsuite etc.).
> > > >
> > > > We've done a massive migration of all the Google-related operators,
> > > > created a few missing ones and retrofitted some old operators to follow
> > > > GCP best practices, fixing a number of problems along the way - also
> > > > implementing Python 3 and Pylint compatibility. Some of these
> > > > operators/hooks are not backwards compatible. Those that are compatible
> > > > are still available via the old imports with a deprecation warning.
> > > >
> > > > We've added missing tests (including system tests)
> > > > and
> > > > missing
> > > > features -
> > > > improving some of the Google operators - giving the
> > > > users
> > > > more
> > > > capabilities
> > > > and fixing some issues. Those operators should pretty
> > > > much
> > > > "just
> > > > work"
> > > > in
> > > > Airflow 1.10.x (any recent version) for Python 3. We
> > > > should
> > > > be
> > > > able
> > > > to
> > > > release a separate pip-installable package for those
> > > > operators
> > > > that
> > > > users
> > > > should be able to install in Airflow 1.10.x.
> > > >
> > > > Any user will be able to install this separate package in their Airflow
> > > > 1.10.x installation and start using those new "provider" operators in
> > > > parallel to the old 1.10.x operators. Other providers ("microsoft",
> > > > "amazon") might follow the same approach if they want. We could even at
> > > > some point decide to move some of the core operators in a similar
> > > > fashion (for example following the structure proposed in the latest
> > > > documentation: fundamentals / software / etc. -
> > > > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> > > >
> > > > *Pros and cons*
> > > >
> > > > There are a number of pros:
> > > >
> > > >  - Users will have an easier migration path if they are deeply vested
> > > >    in the 1.10.* version
> > > >  - It's possible to migrate in stages for people who are also vested in
> > > >    py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 +
> > > >    2.0*
> > > >  - Moving to the new operators in py3 can be done gradually. Old
> > > >    operators will continue to work while the new ones can be used more
> > > >    and more
> > > >  - People will get incentivised to migrate to Python 3 before 2.0 is out
> > > >    (by using new operators)
> > > >  - Each provider "package" can have an independent release schedule -
> > > >    and add functionality in already released Airflow versions.
> > > >  - We do not take out any functionality from the users - we just add
> > > >    more options
> > > >  - The releases can be - similarly to the main Airflow releases - voted
> > > >    on separately by the PMC after "stewards" of the package (per
> > > >    provider) perform a round of testing on 1.10.* versions.
> > > >  - Users will start migrating to new operators earlier and have a
> > > >    smoother switch to 2.0 later
> > > >  - The latest improved operators will start
> > > >
> > > > There are three cons I could think of:
> > > >
> > > >  - There will be quite a lot of duplication between old and new
> > > >    operators (they will co-exist in 1.10). That might lead to confusion
> > > >    of users and problems with cooperation between different
> > > >    operators/hooks
> > > >  - Having new operators in 1.10 Python 3 might keep people from
> > > >    migrating to 2.0
> > > >  - It will require some maintenance and separate release overhead.
> > > >
> > > > I already spoke to the Composer team @Google and they are very positive
> > > > about this. I also spoke to Ash and it seems it might also be OK for the
> > > > Astronomer team. We have Google's backing and support, and we can
> > > > provide maintenance and support for those packages - being an example
> > > > for other providers of how they can do it.
> > > >
> > > > Let me know what you think - and whether I should
> > > > make it
> > > > into
> > > > an
> > > > official
> > > > AIP maybe?
> > > >
> > > > J.
> > > >
> > > >
> > > >
> > >
> > >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
