Hey Ash,

Thanks for the offer. I must admit pkgutil and package namespaces are not
the best-documented part of Python.

I dug a bit deeper and found a similar problem -
https://github.com/pypa/setuptools/issues/895. It seems that even though it
is not explicitly explained in the pkgutil documentation, this comment
(assuming it is right) explains everything:

*"That's right. All parents of a namespace package must also be namespace
packages, as they will necessarily share that parent name space (farm and
farm.deps in this example)."*

There are a few possible workarounds mentioned in the issue, but they are
far from perfect. They would require patching the already installed
airflow's __init__.py to manipulate the search path. Still, from my tests I
do not know whether this would be possible at all because of the non-trivial
__init__.py we have (and use) in the *airflow* package.
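
To make the search-path point concrete, here is a minimal sketch (not a
test against real Airflow). It builds two throwaway "distributions" in a
temp dir and checks whether the nested provider package can be imported.
The names provider_is_importable, demo_plain and demo_patched are
hypothetical, and the toy __init__.py has none of the real airflow
initialisation logic, so it only illustrates the mechanism:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The pkgutil-style declaration from the packaging guide.
PKGUTIL_LINE = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def provider_is_importable(top_name, core_init_source):
    """Lay out two distributions on disk and try `<top_name>.providers.google`."""
    with tempfile.TemporaryDirectory() as tmp:
        core_root = Path(tmp) / "core_dist"          # stands in for apache-airflow
        provider_root = Path(tmp) / "provider_dist"  # stands in for a provider package

        # Core distribution: a regular package with its own __init__.py.
        (core_root / top_name).mkdir(parents=True)
        (core_root / top_name / "__init__.py").write_text(core_init_source)

        # Provider distribution: ships only <top_name>/providers/google.
        google_dir = provider_root / top_name / "providers" / "google"
        google_dir.mkdir(parents=True)
        (google_dir / "__init__.py").write_text("LOADED = True\n")

        entries = [str(core_root), str(provider_root)]
        sys.path[:0] = entries
        importlib.invalidate_caches()
        try:
            importlib.import_module(f"{top_name}.providers.google")
            return True
        except ModuleNotFoundError:
            return False
        finally:
            # Always restore sys.path, whether or not the import worked.
            for entry in entries:
                sys.path.remove(entry)


if __name__ == "__main__":
    plain = "VERSION = 'demo'\n"
    # Regular parent package: its __path__ only covers core_dist.
    print(provider_is_importable("demo_plain", plain))
    # Same parent, but with its search path extended via pkgutil.
    print(provider_is_importable("demo_patched", PKGUTIL_LINE + plain))
```

On Python 3 the first call should come back False (the regular parent
package hides the provider portion) and the second True. That matches the
comment quoted above, but it says nothing about whether the same patch
survives the real, non-trivial airflow __init__.py.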

We have a few PRs now waiting for a decision on that one, I think, so maybe
we can simply agree that we should use another package (I really like
*"airflow_ext"* :D) and use it from now on? What do you (and others) think?

I'd love to start voting on it soon.

J.



On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> Let me run some tests too - I've used them a bit in the past. I thought
> since we only want to make airflow.providers a namespace package it might
> work for us.
>
> Will report back next week.
>
> -ash
>
> On 31 October 2019 15:58:22 GMT, Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
> >The same repo (so mono-repo approach). All packages would be in
> >"airflow_integrations" directory. It's mainly about moving the
> >operators/hooks/sensor files to different directory structure.
> >
> >It might be done pretty much without changing the current
> >installation/development model:
> >
> >1) We can add a setup.py command to install all the packages in -e mode in
> >the main setup.py (to make it easier to install all deps in one go).
> >2) We can add dependencies in setup.py extras to install appropriate
> >packages. For example, the [google] extra will require the
> >'apache-airflow-integrations-providers-google' package - or
> >'apache-airflow-providers-google' if we decide to skip -integrations from
> >the package name to make it shorter.
> >
> >The only potential drawback I see is a bit more involved setup of the
> >IDE.
> >
> >This way installation method for both dev and prod remains simple.
> >
> >In the future we can have a separate release schedule for the packages
> >(AIP-8) but for now we can stick to the same version for the
> >'apache-airflow' and 'apache-airflow-integrations-*' packages (+ separate
> >release schedule for backporting needs).
> >
> >Here again is the structure of the repo (we will likely be able to use
> >native namespaces so I removed some needless __init__.py files).
> >
> >|-- airflow
> >|   |- __init__.py
> >|   |- operators -> fundamental operators are here
> >|-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> >|-- setup.py -> setup.py for the "apache-airflow" package
> >|-- airflow_integrations
> >|   |-providers
> >|   | |-google
> >|   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> >|   |   |-airflow_integrations
> >|   |     |-providers
> >|   |       |-google
> >|   |         |-__init__.py
> >|   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> >|   |           |-__init__.py
> >|   |-protocols
> >|     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> >|     |-airflow_integrations
> >|        |-protocols
> >|          |-__init__.py
> >|          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> >
> >J.
> >
> >On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> >> So create another package in a different repo? Or the same repo with a
> >> separate setup.py file that has airflow as a dependency?
> >>
> >>
> >>
> >>
> >> On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
> ><jarek.pot...@polidea.com>
> >> wrote:
> >>
> >> > TL;DR; I did some more testing on how namespaces work. I still believe
> >> > the only way to use namespaces is to have a separate (for example
> >> > "airflow_integrations") package for all backportable packages.
> >> >
> >> > I am not sure if anyone has used namespaces before, but after reading
> >> > and trying them out, the main blocker seems to be that we have
> >> > non-trivial code in airflow's "__init__.py" (including class
> >> > definitions, imported sub-packages and plugin initialisation).
> >> >
> >> > Details are in
> >> > https://packaging.python.org/guides/packaging-namespace-packages/ but
> >> > it's a long one so let me summarize my findings:
> >> >
> >> >    - In order to use the "airflow.providers" package we would have to
> >> >    declare "airflow" as a namespace
> >> >    - It can be done in three different ways:
> >> >       - omitting __init__.py in this package (native/implicit namespace)
> >> >       - making the __init__.py of the "airflow" package in main airflow
> >> >       (and in the other packages) contain only "*__path__ =
> >> >       __import__('pkgutil').extend_path(__path__, __name__)*" (pkgutil
> >> >       style)
> >> >       - or only
> >> >       "*__import__('pkg_resources').declare_namespace(__name__)*"
> >> >       (pkg_resources style)
> >> >
> >> > The first is not possible (we already have __init__.py in "airflow").
> >> > The pkgutil and pkg_resources styles are not possible because we already
> >> > have quite a lot in airflow's "__init__.py" and the docs for both styles
> >> > state:
> >> >
> >> > "*Every* distribution that uses the namespace package must include an
> >> > identical *__init__.py*. If any distribution does not, it will cause the
> >> > namespace logic to fail and the other sub-packages will not be
> >> > importable. *Any additional code in __init__.py will be inaccessible."*
> >> >
> >> > I even tried to add the pkgutil/pkg_resources declarations to airflow
> >> > and experiment with them - but it does not work. Pip install fails at
> >> > the plugins_manager as "airflow.plugins" is not accessible (kind of
> >> > expected), but I am sure there will be other problems as well. :(
> >> >
> >> > Basically - we cannot turn "airflow" into a namespace because it has
> >> > some "__init__.py" logic :(.
> >> >
> >> > So I think it still holds that if we want to use namespaces, we should
> >> > use another package. *"airflow_integrations"* is the current candidate,
> >> > but we can think of some nicer/shorter one: "airflow_ext",
> >> > "airflow_int", "airflow_x", "airflow_mod", "airflow_next", "airflow_xt",
> >> > "airflow_", "ext_airflow", ....  Interestingly "airflow_" is the one
> >> > suggested by PEP8 to avoid conflicts with Python names (which is a
> >> > different case but kind of close).
> >> >
> >> > What do you think?
> >> >
> >> > J.
> >> >
> >> > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <kaxiln...@gmail.com>
> >wrote:
> >> >
> >> > > The namespace feature looks promising and from your tests, it looks
> >> > > like it would work well from Airflow 2.0 onwards.
> >> > >
> >> > > I will look at it in depth and see if I have more suggestions or
> >> > > opinions on it.
> >> > >
> >> > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk
> ><jarek.pot...@polidea.com
> >> >
> >> > > wrote:
> >> > >
> >> > > > TL;DR; We did some testing about namespaces and packaging (and
> >> > > > potential backporting options for 1.10.* python3 Airflows) and we
> >> > > > think it's best to adopt namespaces quickly and use a different
> >> > > > package name "airflow-integrations" for all non-fundamental
> >> > > > integrations.
> >> > > >
> >> > > > Unless we missed some tricks, we cannot use airflow.* sub-packages
> >> > > > for the 1.10.* backportable packages. Example:
> >> > > >
> >> > > >    - "*apache-airflow"* package provides: "airflow.*" (this is what
> >> > > >    we have today)
> >> > > >    - "*apache-airflow-providers-google*": provides
> >> > > >    "airflow.providers.google.*" packages
> >> > > >
> >> > > > If we install both packages (old apache-airflow 1.10.6 and new
> >> > > > apache-airflow-providers-google from 2.0) - it seems that
> >> > > > the "airflow.providers.google.*" package cannot be imported. This is
> >> > > > a bit of a problem if we would like to backport the operators from
> >> > > > Airflow 2.0 to Airflow 1.10 in a way that will be forward-compatible.
> >> > > > We really want users who started using backported operators in
> >> > > > 1.10.* not to have to change imports in their DAGs to run them in
> >> > > > Airflow 2.0.
> >> > > >
> >> > > > We discussed it internally in our team and considered several
> >> > > > options, but we think the best way will be to go straight to
> >> > > > "namespaces" in Airflow 2.0 and to have the integrations (as
> >> > > > discussed in the AIP-21 discussion) in a separate
> >> > > > "*airflow_integrations*" package. It might even be a step towards
> >> > > > the AIP-8 implementation and plays together very well in terms of
> >> > > > "stewardship" discussed in AIP-21 now. But we will still keep (for
> >> > > > now) a single release process for all packages for 2.0 (except for
> >> > > > the backporting which can be done per-provider before 2.0 release)
> >> > > > and provide a foundation for more complex release cycles in future
> >> > > > versions.
> >> > > >
> >> > > > Here is what the new Airflow 2.0 repository could look like (I only
> >> > > > show a subset of dirs but they are representative). For those whose
> >> > > > email fixed-width font gets corrupted, here is an image of this
> >> > > > structure: https://pasteboard.co/IEesTih.png
> >> > > >
> >> > > > |-- airflow
> >> > > > |   |- __init__.py
> >> > > > |   |- operators -> fundamental operators are here
> >> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> >> > > > |-- setup.py -> setup.py for the "apache-airflow" package
> >> > > > |-- airflow_integrations
> >> > > > |   |-providers
> >> > > > |   | |-google
> >> > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> >> > > > |   |   |-airflow_integrations
> >> > > > |   |     |-__init__.py
> >> > > > |   |     |-providers
> >> > > > |   |       |-__init__.py
> >> > > > |   |       |-google
> >> > > > |   |         |-__init__.py
> >> > > > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> >> > > > |   |           |-__init__.py
> >> > > > |   |-protocols
> >> > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> >> > > > |     |-airflow_integrations
> >> > > > |        |-protocols
> >> > > > |          |-__init__.py
> >> > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> >> > > >
> >> > > > There are a number of pros for this solution:
> >> > > >
> >> > > >    - We could use the standard namespaces feature of python to build
> >> > > >    multiple packages:
> >> > > >    https://packaging.python.org/guides/packaging-namespace-packages/
> >> > > >    - Installation for users will be the same as previously. We could
> >> > > >    install the needed packages automatically when particular extras
> >> > > >    are used (pip install apache-airflow[google] could install both
> >> > > >    "apache-airflow" and "apache-airflow-integrations-providers-google")
> >> > > >    - We could have a custom setup.py installation process for
> >> > > >    developers that could install all the packages in development
> >> > > >    ("-e ." mode) in a single operation.
> >> > > >    - In case of transfer packages we could have nice error messages
> >> > > >    informing that the other package needs to be installed (for
> >> > > >    example the S3->GCS operator would import
> >> > > >    "airflow_integrations.providers.amazon.*" and if that fails it
> >> > > >    could raise an error saying "Please install [amazon] extra to use
> >> > > >    me.")
> >> > > >    - We could implement numerous optimisations in the way we run
> >> > > >    tests in CI (for example run all the "providers" tests only with
> >> > > >    sqlite, run tests in parallel etc.)
> >> > > >    - We could implement it gradually - we do not have to have a "big
> >> > > >    bang" approach - we can implement it in a "provider-by-provider"
> >> > > >    way and test it with one provider (Google) first to make sure that
> >> > > >    all the mechanisms are working
> >> > > >    - For now we could have the monorepo approach where all the
> >> > > >    packages will be developed in concert - for now avoiding the
> >> > > >    dependency problems (but allowing for back-portability to 1.10).
> >> > > >    - We will have clear boundaries between packages and the ability
> >> > > >    to test for some unwanted/hidden dependencies between packages.
> >> > > >    - We could switch to the (much better) sphinx-apidoc package to
> >> > > >    continue building single documentation for all of those
> >> > > >    (sphinx-apidoc has support for namespaces).
> >> > > >
> >> > > > As we are working on the GCP move from contrib to core, we could
> >> > > > make all the effort to test it and try it before we merge it to
> >> > > > master so that it will be ready for others (and we could help with
> >> > > > most of the moves afterwards). It seems complex, but in fact in most
> >> > > > cases it will be a very simple move between the packages and can be
> >> > > > done incrementally so there is little risk in doing this I think.
> >> > > >
> >> > > > J.
> >> > > >
> >> > > >
> >> > > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com>
> >> wrote:
> >> > > >
> >> > > > > Tomasz and Ash made good points about the overhead of having
> >> > > > > separate repos. But as we grow bigger and more mature, I would
> >> > > > > prefer to have what was described in AIP-8. It shouldn't be
> >> > > > > extremely hard for us to come up with good strategies to handle
> >> > > > > the overhead. AIP-8 already talked about how it can benefit us.
> >> > > > > IMO on a high level, having a clear separation of core vs.
> >> > > > > hooks/operators would make the project much more scalable and the
> >> > > > > gains would outweigh the cost we pay.
> >> > > > >
> >> > > > > That being said, I'm supportive of this "moving towards AIP-8
> >> > > > > while learning" approach, quite a good practice for tackling a big
> >> > > > > project. Looking forward to reading the AIP.
> >> > > > >
> >> > > > >
> >> > > > > Cheers,
> >> > > > > Kevin Y
> >> > > > >
> >> > > > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
> >> > jarek.pot...@polidea.com
> >> > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > We are checking how we can use namespaces in a back-portable
> >> > > > > > way and we will have a POC soon so that we will all be able to
> >> > > > > > see what it will look like.
> >> > > > > >
> >> > > > > > J.
> >> > > > > >
> >> > > > > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
> >> a...@apache.org>
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > > I'll have to read your proposal in detail (sorry, no time
> >> > > > > > > right now!), but I'm broadly in favour of this approach, and
> >> > > > > > > I think keeping them _in_ the same repo is the best plan --
> >> > > > > > > that makes writing and testing cross-cutting changes easier.
> >> > > > > > >
> >> > > > > > > -a
> >> > > > > > >
> >> > > > > > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
> >> > > > > tomasz.urbas...@polidea.com
> >> > > > > > >
> >> > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > I think utilizing namespaces should reduce a lot of the
> >> > > > > > > > problems raised by using separate repos (who will manage
> >> > > > > > > > it? how to release? where should the repo be?).
> >> > > > > > > >
> >> > > > > > > > Bests,
> >> > > > > > > > Tomek
> >> > > > > > > >
> >> > > > > > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> >> > > > > > jarek.pot...@polidea.com>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > >> Thanks Bas for comments! Let me share my thoughts
> >below.
> >> > > > > > > >>
> >> > > > > > > >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> >> > > > > > > >> basharens...@godatadriven.com>
> >> > > > > > > >> wrote:
> >> > > > > > > >>
> >> > > > > > > >>> Hi Jarek, I definitely see a future in creating
> >separate
> >> > > > > installable
> >> > > > > > > >>> packages for various operators/hooks/etc (as in
> >AIP-8).
> >> This
> >> > > > would
> >> > > > > > IMO
> >> > > > > > > >>> strip the “core” Airflow to only what’s needed and
> >result
> >> in
> >> > a
> >> > > > > small
> >> > > > > > > >>> package without a ton of dependencies (and make it
> >more
> >> > > > > maintainable,
> >> > > > > > > >>> shorter tests, etc etc etc). Not exactly sure though
> >what
> >> > > you’re
> >> > > > > > > >> proposing
> >> > > > > > > >>> in your e-mail, is it a new AIP for an intermediate
> >step
> >> > > towards
> >> > > > > > AIP-8?
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >> It's a new AIP I am proposing.  For now it's only for
> >> > > backporting
> >> > > > > the
> >> > > > > > > new
> >> > > > > > > >> 2.0 import paths to 1.10.* series.
> >> > > > > > > >>
> >> > > > > > > >> It's more of "incremental going in direction of AIP-8
> >and
> >> > > learning
> >> > > > > > some
> >> > > > > > > >> difficulties involved" than implementing AIP-8 fully.
> >We are
> >> > > > taking
> >> > > > > > > >> advantage of changes in import paths from AIP-21 which
> >make
> >> it
> >> > > > > > possible
> >> > > > > > > to
> >> > > > > > > >> have both old and new (optional) operators available
> >in
> >> 1.10.*
> >> > > > > series
> >> > > > > > of
> >> > > > > > > >> Airflow. I think there is a lot more to do for full
> >> > > implementation
> >> > > > > of
> >> > > > > > > >> AIP-8: decisions how to maintain, install those
> >operator
> >> > groups
> >> > > > > > > separately,
> >> > > > > > > >> stewardship model/organisation for the separate
> >groups, how
> >> to
> >> > > > > manage
> >> > > > > > > >> cross-dependencies, procedures for releasing the
> >packages
> >> etc.
> >> > > > > > > >>
> >> > > > > > > >> I think about this new AIP also as a learning effort -
> >we
> >> > would
> >> > > > > learn
> >> > > > > > > more
> >> > > > > > > >> how separate packaging works and then we can follow up
> >with
> >> > > AIP-8
> >> > > > > full
> >> > > > > > > >> implementation for "modular" Airflow. Then AIP-8 could
> >be
> >> > > > > implemented
> >> > > > > > in
> >> > > > > > > >> Airflow 2.1 for example - or 3.0 if we start following
> >> > semantic
> >> > > > > > > versioning
> >> > > > > > > >> - based on those learnings. It's a bit of good example
> >of
> >> > having
> >> > > > > cake
> >> > > > > > > and
> >> > > > > > > >> eating it too. We can try out modularity in 1.10.*
> >while
> >> > cutting
> >> > > > the
> >> > > > > > > scope
> >> > > > > > > >> of 2.0 and not implementing full management/release
> >> procedure
> >> > > for
> >> > > > > > AIP-8
> >> > > > > > > >> yet.
> >> > > > > > > >>
> >> > > > > > > >>
> >> > > > > > > >>> Thinking about this, I think there are still a few grey
> >> > > > > > > >>> areas (which would be good to discuss in a new AIP, or
> >> > > > > > > >>> continue on AIP-8):
> >> > > > > > > >>>
> >> > > > > > > >>>  *   In your email you speak only about the 3 big cloud
> >> > > > > > > >>> providers (btw I made a PR for migrating all AWS
> >> > > > > > > >>> components ->
> >> > > > > > > >>> https://github.com/apache/airflow/pull/6439). Is there a
> >> > > > > > > >>> plan for splitting other components than Google/AWS/Azure?
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >> We could add more groups as part of this new AIP indeed
> >> > > > > > > >> (as an extension to AIP-21 and pre-requisite to AIP-8). We
> >> > > > > > > >> already see how moving/deprecation works for the providers
> >> > > > > > > >> package - it works for GCP/Google rather nicely. But there
> >> > > > > > > >> is nothing to prevent us from extending it to cover other
> >> > > > > > > >> groups of operators/hooks. If you look at the current
> >> > > > > > > >> structure of documentation done by Kamil, we can follow
> >> > > > > > > >> the structure there and move the operators/hooks
> >> > > > > > > >> accordingly
> >> > > > > > > >> (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> >> > > > > > > >>
> >> > > > > > > >>      Fundamentals, ASF: Apache Software Foundation, Azure:
> >> > > > > > > >> Microsoft Azure, AWS: Amazon Web Services, GCP: Google
> >> > > > > > > >> Cloud Platform, Service integrations, Software
> >> > > > > > > >> integrations, Protocol integrations.
> >> > > > > > > >>
> >> > > > > > > >> I am happy to include that in the AIP - if others
> >agree
> >> it's a
> >> > > > good
> >> > > > > > > idea.
> >> > > > > > > >> Out of those groups -  I think only Fundamentals
> >should not
> >> be
> >> > > > > > > back-ported.
> >> > > > > > > >> Others should be rather easy to port (if we decide
> >to). We
> >> > > already
> >> > > > > > have
> >> > > > > > > >> quite a lot of those in the new GCP operators for 2.0.
> >So
> >> > > starting
> >> > > > > > with
> >> > > > > > > >> GCP/Google group is a good idea. Also following with
> >Cloud
> >> > > > Providers
> >> > > > > > > first
> >> > > > > > > >> is a good thing. For example we have now support from
> >Google
> >> > > > > Composer
> >> > > > > > > team
> >> > > > > > > >> to do this separation for GCP (and we learn from it)
> >and
> >> then
> >> > we
> >> > > > can
> >> > > > > > > claim
> >> > > > > > > >> the stewardship in our team for releasing the python
> >3/
> >> > Airflow
> >> > > > > > > >> 1.10-compatible "airflow-google" packages. Possibly
> >other
> >> > Cloud
> >> > > > > > > >> Providers/teams might follow this (if they see the
> >value in
> >> > it)
> >> > > > and
> >> > > > > > > there
> >> > > > > > > >> could be different stewards for those. And then we can
> >do
> >> > other
> >> > > > > groups
> >> > > > > > > if
> >> > > > > > > >> we decide to. I think this way we can learn whether
> >AIP-8 is
> >> > > > > > manageable
> >> > > > > > > and
> >> > > > > > > >> what real problems we are going to face.
> >> > > > > > > >>
> >> > > > > > > >>  *   Each “plugin” e.g. GCP would be a separate repo,
> >should
> >> > we
> >> > > > > create
> >> > > > > > > >>> some sort of blueprint for such packages?
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >> I think we do not need separate repos (at all) but in this
> >> > > > > > > >> new AIP we can test it before we decide to go for AIP-8.
> >> > > > > > > >> IMHO - monorepo approach will work here rather nicely. We
> >> > > > > > > >> could use python-3 native namespaces
> >> > > > > > > >> <https://packaging.python.org/guides/packaging-namespace-packages/>
> >> > > > > > > >> for the sub-packages when we go full AIP-8. For now we
> >> > > > > > > >> could simply package the new operators in a separate pip
> >> > > > > > > >> package for Python 3 version 1.10.* series only. We only
> >> > > > > > > >> need to test if it works well with another package
> >> > > > > > > >> providing 'airflow.providers.*' after apache-airflow is
> >> > > > > > > >> installed (providing 'airflow' package). But I think we
> >> > > > > > > >> can make it work. I don't think we really need to split
> >> > > > > > > >> the repos, namespaces will work just fine and make
> >> > > > > > > >> management of cross-repository dependencies easier (but we
> >> > > > > > > >> can learn otherwise). For sure we will not need it for the
> >> > > > > > > >> new proposed AIP of backporting groups to 1.10 and we can
> >> > > > > > > >> defer that decision to AIP-8 implementation time.
> >> > > > > > > >>
> >> > > > > > > >>
> >> > > > > > > >>>  *   In which Airflow version do we start raising
> >> deprecation
> >> > > > > > warnings
> >> > > > > > > >>> and in which version would we remove the original?
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >> I think we should do what we did in the GCP case already.
> >> > > > > > > >> Those old "imports" for operators can be marked as
> >> > > > > > > >> deprecated in Airflow 2.0 (and removed in 2.1 or 3.0 if we
> >> > > > > > > >> start following semantic versioning). We can however do it
> >> > > > > > > >> earlier, in 1.10.7 or 1.10.8 if we release those (without
> >> > > > > > > >> removing the old operators yet - just raise deprecation
> >> > > > > > > >> warnings and inform that for python3 the new
> >> > > > > > > >> "airflow-google", "airflow-aws" etc. packages can be
> >> > > > > > > >> installed and users can switch to them).
> >> > > > > > > >>
> >> > > > > > > >> J.
> >> > > > > > > >>
> >> > > > > > > >>
> >> > > > > > > >>>
> >> > > > > > > >>> Cheers,
> >> > > > > > > >>> Bas
> >> > > > > > > >>>
> >> > > > > > > >>> On 27 Oct 2019, at 08:33, Jarek Potiuk <
> >> > > jarek.pot...@polidea.com
> >> > > > > > > <mailto:
> >> > > > > > > >>> jarek.pot...@polidea.com>> wrote:
> >> > > > > > > >>>
> >> > > > > > > >>> Hello - any comments on that? I am happy to make it
> >into an
> >> > AIP
> >> > > > :)?
> >> > > > > > > >>>
> >> > > > > > > >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <
> >> > > > > > jarek.pot...@polidea.com
> >> > > > > > > >>> <mailto:jarek.pot...@polidea.com>>
> >> > > > > > > >>> wrote:
> >> > > > > > > >>>
> >> > > > > > > >>> *Motivation*
> >> > > > > > > >>>
> >> > > > > > > >>> I think we really should start thinking about making it
> >> > > > > > > >>> easier to migrate to 2.0 for our users. After implementing
> >> > > > > > > >>> some recent changes related to AIP-21 - Changes in import
> >> > > > > > > >>> paths
> >> > > > > > > >>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> >> > > > > > > >>> I think I have an idea that might help with it.
> >> > > > > > > >>>
> >> > > > > > > >>> *Proposal*
> >> > > > > > > >>>
> >> > > > > > > >>> We could package some of the new and improved 2.0
> >operators
> >> > > > (moved
> >> > > > > to
> >> > > > > > > >>> "providers" package) and let them be used in Python 3
> >> > > environment
> >> > > > > of
> >> > > > > > > >>> airflow 1.10.x.
> >> > > > > > > >>>
> >> > > > > > > >>> This can be done case-by-case per "cloud provider". It
> >> > > > > > > >>> should not be obligatory, should be largely driven by each
> >> > > > > > > >>> provider. It's not yet full AIP-8 Split Hooks/Operators
> >> > > > > > > >>> into separate packages
> >> > > > > > > >>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> >> > > > > > > >>> It's merely backporting of some operators/hooks to get
> >> > > > > > > >>> them to work in 1.10. But by doing it we might try out the
> >> > > > > > > >>> concept of splitting, learn about maintenance problems and
> >> > > > > > > >>> maybe implement the full *AIP-8* approach in 2.1
> >> > > > > > > >>> consistently across the board.
> >> > > > > > > >>>
> >> > > > > > > >>> *Context*
> >> > > > > > > >>>
> >> > > > > > > >>> Part of the AIP-21 was to move import paths for Cloud
> >> > providers
> >> > > > to
> >> > > > > > > >>> separate providers/<PROVIDER> package. An example for
> >that
> >> > (the
> >> > > > > first
> >> > > > > > > >>> provider we already almost migrated) was
> >providers/google
> >> > > package
> >> > > > > > > >> (further
> >> > > > > > > >>> divided into gcp/gsuite etc).
> >> > > > > > > >>>
> >> > > > > > > >>> We've done a massive migration of all the
> >Google-related
> >> > > > operators,
> >> > > > > > > >>> created a few missing ones and retrofitted some old
> >> operators
> >> > > to
> >> > > > > > follow
> >> > > > > > > >> GCP
> >> > > > > > > >>> best practices and fixing a number of problems - also
> >> > > > implementing
> >> > > > > > > >> Python3
> >> > > > > > > >>> and Pylint compatibility. Some of these
> >operators/hooks are
> >> > not
> >> > > > > > > backwards
> >> > > > > > > >>> compatible. Those that are compatible are still
> >available
> >> via
> >> > > the
> >> > > > > old
> >> > > > > > > >>> imports with deprecation warning.
> >> > > > > > > >>>
> >> > > > > > > >>> We've added missing tests (including system tests)
> >and
> >> > missing
> >> > > > > > > features -
> >> > > > > > > >>> improving some of the Google operators - giving the
> >users
> >> > more
> >> > > > > > > >> capabilities
> >> > > > > > > >>> and fixing some issues. Those operators should pretty
> >much
> >> > > "just
> >> > > > > > work"
> >> > > > > > > in
> >> > > > > > > >>> Airflow 1.10.x (any recent version) for Python 3. We
> >should
> >> > be
> >> > > > able
> >> > > > > > to
> >> > > > > > > >>> release a separate pip-installable package for those
> >> > operators
> >> > > > that
> >> > > > > > > users
> >> > > > > > > >>> should be able to install in Airflow 1.10.x.
> >> > > > > > > >>>
> >> > > > > > > >>> Any user will be able to install this separate package in
> >> > > > > > > >>> their Airflow 1.10.x installation and start using those
> >> > > > > > > >>> new "provider" operators in parallel to the old 1.10.x
> >> > > > > > > >>> operators. Other providers ("microsoft", "amazon") might
> >> > > > > > > >>> follow the same approach if they want. We could even at
> >> > > > > > > >>> some point decide to move some of the core operators in
> >> > > > > > > >>> similar fashion (for example following the structure
> >> > > > > > > >>> proposed in the latest documentation: fundamentals /
> >> > > > > > > >>> software / etc.
> >> > > > > > > >>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> >> > > > > > > >>>
> >> > > > > > > >>> *Pros and cons*
> >> > > > > > > >>>
> >> > > > > > > >>> There are a number of pros:
> >> > > > > > > >>>
> >> > > > > > > >>>  - Users will have an easier migration path if they
> >are
> >> > deeply
> >> > > > > invested
> >> > > > > > > >>>  in the 1.10.* version
> >> > > > > > > >>>  - It's possible to migrate in stages for people who
> >are
> >> also
> >> > > > > invested
> >> > > > > > in
> >> > > > > > > >>>  py2: *py2 (1.10) -> py3 (1.10) -> py3 + new
> >operators
> >> (1.10)
> >> > > ->
> >> > > > > py3
> >> > > > > > +
> >> > > > > > > >>>  2.0*
> >> > > > > > > >>>  - Moving to new operators in py3 + new operators can
> >be
> >> done
> >> > > > > > > >>>  gradually. Old operators will continue to work while
> >new
> >> can
> >> > > be
> >> > > > > used
> >> > > > > > > >> more
> >> > > > > > > >>>  and more
> >> > > > > > > >>>  - People will get incentivised to migrate to python
> >3
> >> before
> >> > > 2.0
> >> > > > > is
> >> > > > > > > >>>  out (by using new operators)
> >> > > > > > > >>>  - Each provider "package" can have independent
> >release
> >> > > schedule
> >> > > > -
> >> > > > > > and
> >> > > > > > > >>>  add functionality in already released Airflow
> >versions.
> >> > > > > > > >>>  - We do not take out any functionality from the
> >users - we
> >> > > just
> >> > > > > add
> >> > > > > > > >>>  more options
> >> > > > > > > >>>  - The releases can be - similarly to the main Airflow
> >> releases -
> >> > > > voted
> >> > > > > > > >>>  separately by PMC after "stewards" of the package
> >(per
> >> > > provider)
> >> > > > > > > >> perform
> >> > > > > > > >>>  a round of testing on 1.10.* versions.
> >> > > > > > > >>>  - Users will start migrating to new operators
> >earlier and
> >> > have
> >> > > > > > > >>>  smoother switch to 2.0 later
> >> > > > > > > >>>  - The latest improved operators will start
> >> > > > > > > >>>
> >> > > > > > > >>> There are three cons I could think of:
> >> > > > > > > >>>
> >> > > > > > > >>>  - There will be quite a lot of duplication between
> >old and
> >> > new
> >> > > > > > > >>>  operators (they will co-exist in 1.10). That might
> >lead to
> >> > > > > confusion
> >> > > > > > > of
> >> > > > > > > >>>  users and problems with cooperation between
> >different
> >> > > > > > operators/hooks
> >> > > > > > > >>>  - Having new operators in 1.10 python 3 might keep
> >people
> >> > from
> >> > > > > > > >>>  migrating to 2.0
> >> > > > > > > >>>  - It will require some maintenance and separate
> >release
> >> > > > overhead.
> >> > > > > > > >>>
> >> > > > > > > >>> I already spoke to Composer team @Google and they are
> >very
> >> > > > positive
> >> > > > > > > about
> >> > > > > > > >>> this. I also spoke to Ash and it seems it might also be
> >OK for
> >> > > > > > Astronomer
> >> > > > > > > >>> team. We have Google's backing and support, and we
> >can
> >> > provide
> >> > > > > > > >> maintenance
> >> > > > > > > >>> and support for those packages - being an example for
> >other
> >> > > > > providers
> >> > > > > > > how
> >> > > > > > > >>> they can do it.
> >> > > > > > > >>>
> >> > > > > > > >>> Let me know what you think - and whether I should
> >make it
> >> > into
> >> > > an
> >> > > > > > > >> official
> >> > > > > > > >>> AIP maybe?
> >> > > > > > > >>>
> >> > > > > > > >>> J.
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>> --
> >> > > > > > > >>>
> >> > > > > > > >>> Jarek Potiuk
> >> > > > > > > >>> Polidea <https://www.polidea.com/> | Principal
> >Software
> >> > > Engineer
> >> > > > > > > >>>
> >> > > > > > > >>> M: +48 660 796 129 <+48660796129>
> >> > > > > > > >>> [image: Polidea] <https://www.polidea.com/>
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > >
> >> > > > > > > > Tomasz Urbaszek
> >> > > > > > > > Polidea <https://www.polidea.com/> | Junior Software
> >> Engineer
> >> > > > > > > >
> >> > > > > > > > M: +48 505 628 493 <+48505628493>
> >> > > > > > > > E: tomasz.urbas...@polidea.com
> ><tomasz.urbasz...@polidea.com
> >> >
> >> > > > > > > >
> >> > > > > > > > Unique Tech
> >> > > > > > > > Check out our projects!
> ><https://www.polidea.com/our-work>
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >> >
> >>
> >
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
