Let me run some tests too - I've used them a bit in the past. I thought since 
we only want to make airflow.providers a namespace package it might work for us.

Will report back next week.

-ash

On 31 October 2019 15:58:22 GMT, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>The same repo (so mono-repo approach). All packages would be in
>"airflow_integrations" directory. It's mainly about moving the
>operators/hooks/sensor files to different directory structure.
>
>It might be done pretty much without changing the current
>installation/development model:
>
>1) We can add setup.py command to install all the packages in -e mode
>in
>the main setup.py (to make it easier to install all deps in one go).
>2) We can add dependencies in setup.py extras to install appropriate
>packages. For example [google] extra will 'require
>apache-airflow-integrations-providers-google' package - or
>apache-airflow-providers-google if we decide to skip -integrations from
>the
>package name to make it shorter.
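
To make point 2 concrete, here is a minimal sketch of how the extras could map to the integration packages. The helper names and the provider list are only illustrative assumptions; the one real API involved is the `extras_require` keyword of setuptools' `setup()`.

```python
# Hypothetical helpers: only the distribution-name patterns come from the thread.
PROVIDER_EXTRAS = ["google", "amazon", "microsoft"]


def integration_requirement(provider, with_integrations_prefix=True):
    """Return the distribution name that a given extra would pull in."""
    if with_integrations_prefix:
        prefix = "apache-airflow-integrations-providers-"
    else:
        # Shorter variant, if "-integrations" is dropped from the name.
        prefix = "apache-airflow-providers-"
    return prefix + provider


def build_extras_require(providers, with_integrations_prefix=True):
    """Build a dict suitable for setup(..., extras_require=...)."""
    return {
        provider: [integration_requirement(provider, with_integrations_prefix)]
        for provider in providers
    }


EXTRAS_REQUIRE = build_extras_require(PROVIDER_EXTRAS)
```

With something like this in the main setup.py, `pip install apache-airflow[google]` would resolve the extra to the matching integrations package.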
>
>The only potential drawback I see is a bit more involved setup of the
>IDE.
>
>This way installation method for both dev and prod remains simple.
>
>In the future we can have separate release schedule for the packages
>(AIP-8) but for now we can stick to the same version for
>'apache-airflow'
>and 'apache-airflow-integrations-*' package (+ separate release
>schedule
>for backporting needs)
>Here again is the structure of the repo (we will likely be able to use native
>namespaces so I removed some needless __init__.py files).
>
>|-- airflow
>|   |- __init__.py
>|   |- operators -> fundamental operators are here
>|-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
>|-- setup.py -> setup.py for the "apache-airflow" package
>|-- airflow_integrations
>|   |-providers
>|   | |-google
>|   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
>|   |   |-airflow_integrations
>|   |     |-providers
>|   |       |-google
>|   |         |-__init__.py
>|   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
>|   |-__init__.py
>|   |-protocols
>|     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
>|     |-airflow_integrations
>|        |-protocols
>|          |-__init__.py
>|          |-tests -> tests for the "apache-airflow-integrations-protocols" package
>
>
>J.
>
>On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
>> So create another package in a different repo? or the same repo with
>a
>> separate setup.py file that has airflow as a dependency?
>>
>>
>>
>>
>> On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
><jarek.pot...@polidea.com>
>> wrote:
>>
>> > TL;DR; I did some more testing on how namespaces work. I still
>believe
>> the
>> > only way to use namespaces is to have separate (for example
>> > "airflow_integrations") package for all backportable packages.
>> >
>> > I am not sure if anyone has used namespaces before, but after reading and
>> > trying it out, the main blocker seems to be that we have non-trivial
>code
>> in
>> > airflow's "__init__.py"  (including class definitions, imported
>> > sub-packages and plugin initialisation).
>> >
>> > Details are in
>> > https://packaging.python.org/guides/packaging-namespace-packages/
>but
>> it's
>> > a long one so let me summarize my findings:
>> >
>> >    - In order to use "airflow.providers" package we would have to
>declare
>> >    "airflow" as namespace
>> >    - It can be done in three different ways:
>> >       - omitting __init__.py in this package (native/implicit
>namespace)
>> >       - making the __init__.py of the "airflow" package in main airflow
>> >       (and in all other packages) contain only
>> >       "__path__ = __import__('pkgutil').extend_path(__path__, __name__)"
>> >       (pkgutil style) or
>> >       "__import__('pkg_resources').declare_namespace(__name__)"
>> >       (pkg_resources style)
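
For anyone who has not used the pkgutil style before, a small self-contained sketch of how it behaves. It builds two throwaway "distributions" in temp directories that share one namespace; the name `ns_demo_pkgutil` and the sub-packages `core` and `providers` are made up for the demo and have nothing to do with the real airflow layout.

```python
import importlib
import os
import sys
import tempfile

NAMESPACE = "ns_demo_pkgutil"  # hypothetical name, chosen to avoid clashes
# The identical __init__.py that every distribution must ship (pkgutil style).
PKGUTIL_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root, subpackage):
    """Create <root>/<NAMESPACE>/<subpackage>/ with the shared namespace __init__.py."""
    ns_dir = os.path.join(root, NAMESPACE)
    sub_dir = os.path.join(ns_dir, subpackage)
    os.makedirs(sub_dir)
    with open(os.path.join(ns_dir, "__init__.py"), "w") as f:
        f.write(PKGUTIL_INIT)
    with open(os.path.join(sub_dir, "__init__.py"), "w") as f:
        f.write("NAME = %r\n" % subpackage)
    return root


# Two separate install locations, both contributing to the same namespace.
dist_a = make_distribution(tempfile.mkdtemp(), "core")
dist_b = make_distribution(tempfile.mkdtemp(), "providers")
sys.path[:0] = [dist_a, dist_b]
importlib.invalidate_caches()

core = importlib.import_module(NAMESPACE + ".core")
providers = importlib.import_module(NAMESPACE + ".providers")
```

Both imports succeed only because each copy of `__init__.py` contains nothing but the `extend_path` line, which is exactly the constraint that airflow's current `__init__.py` cannot satisfy.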
>> >
>> > The first is not possible (we already have __init__.py in "airflow").
>> > The second case is not possible because we already have quite a lot
>in
>> the
>> > airflow's "__init__.py" and both pkgutil and pkg_resources style
>state:
>> >
>> > "*Every* distribution that uses the namespace package must include
>an
>> > identical *__init__.py*. If any distribution does not, it will
>cause the
>> > namespace logic to fail and the other sub-packages will not be
>> importable.
>> > *Any
>> > additional code in __init__.py will be inaccessible."*
>> >
>> > I even tried to add those pkgutil/pkg_resources to airflow and do
>some
>> > experimenting with it - but it does not work. Pip install fails at
>the
>> > plugins_manager as "airflow.plugins" is not accessible (kind of
>> expected),
>> > but I am sure there will be other problems as well. :(
>> >
>> > Basically - we cannot turn "airflow" into namespace because it has
>some
>> > "__init__.py" logic :(.
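
A rough sketch of the underlying import behaviour, assuming the core package and the provider package end up in two different directories on sys.path (as they would with "-e" installs). `shadow_demo_pkg` is a made-up stand-in for "airflow"; nothing here touches a real Airflow installation.

```python
import importlib
import os
import sys
import tempfile

PKG = "shadow_demo_pkg"  # hypothetical stand-in for the "airflow" package


def write_file(path, text=""):
    """Create a file (and its parent directories) with the given content."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


# "Core" distribution: a regular package, i.e. it has its own __init__.py.
core_root = tempfile.mkdtemp()
write_file(os.path.join(core_root, PKG, "__init__.py"), "VERSION = 'core'\n")

# "Provider" distribution: ships only <PKG>/providers/google, no <PKG>/__init__.py.
provider_root = tempfile.mkdtemp()
write_file(os.path.join(provider_root, PKG, "providers", "google", "__init__.py"))

sys.path[:0] = [core_root, provider_root]
importlib.invalidate_caches()


def can_import(name):
    """Return True if the module imports, False on ImportError."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


core_ok = can_import(PKG)
provider_ok = can_import(PKG + ".providers.google")
```

Here `core_ok` is true and `provider_ok` is false: once a regular package with an `__init__.py` is found, the import system only searches that package's own directory and never merges in the provider's portion.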
>> >
>> > So I think it still holds that if we want to use namespaces, we
>should
>> use
>> > another package. The *"airflow_integrations"* is current candidate,
>but
>> we
>> > can think of some nicer/shorter one: "airflow_ext", "airflow_int",
>> > "airflow_x", "airflow_mod", "airflow_next", "airflow_xt",
>"airflow_",
>> > "ext_airflow", ....  Interestingly "airflow_" is the one suggested
>by
>> PEP8
>> > to avoid conflicts with Python names (which is a different case but
>kind
>> of
>> > close).
>> >
>> > What do you think?
>> >
>> > J.
>> >
>> > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <kaxiln...@gmail.com>
>wrote:
>> >
>> > > The namespace feature looks promising and from your tests, it
>looks
>> like
>> > it
>> > > would work well from Airflow 2.0 and onwards.
>> > >
>> > > I will look at it in-depth and see if I have more suggestions or
>> opinion
>> > on
>> > > it
>> > >
>> > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk
><jarek.pot...@polidea.com
>> >
>> > > wrote:
>> > >
>> > > > TL;DR; We did some testing about namespaces and packaging (and
>> > potential
>> > > > backporting options for 1.10.* python3 Airflows) and we think
>it's
>> best
>> > > to
>> > > > use namespaces quickly and use different package name
>> > > > "airflow-integrations" for all non-fundamental integrations.
>> > > >
>> > > > Unless we missed some tricks, we cannot use airflow.*
>sub-packages
>> for
>> > > the
>> > > > 1.10.* backportable packages. Example:
>> > > >
>> > > >    - "*apache-airflow"* package provides: "airflow.*" (this is
>what
>> we
>> > > have
>> > > >    today)
>> > > >    - "*apache-airflow-providers-google*": provides
>> > > >    "airflow.providers.google.*" packages
>> > > >
>> > > > If we install both packages (old apache-airflow 1.10.6  and new
>> > > > apache-airflow-providers-google from 2.0) - it seems that
>> > > > the "airflow.providers.google.*" package cannot be imported.
>This is
>> a
>> > > bit
>> > > > of a problem if we would like to backport the operators from
>Airflow
>> > 2.0
>> > > to
>> > > > Airflow 1.10 in a way that will be forward-compatible. We really
>want
>> > > users
>> > > > who started using backported operators in 1.10.* do not have to
>> change
>> > > > imports in their DAGs to run them in Airflow 2.0.
>> > > >
>> > > > We discussed it internally in our team and considered several
>> options,
>> > > but
>> > > > we think the best way will be to go straight to "namespaces" in
>> Airflow
>> > > 2.0
>> > > > and to have the integrations (as discussed in AIP-21
>discussion) to
>> be
>> > > in a
>> > > > separate "*airflow_integrations*" package.  It might be even
>more
>> > towards
>> > > > the AIP-8 implementation and plays together very well in terms
>of
>> > > > "stewardship" discussed in AIP-21 now. But we will still keep
>(for
>> now)
>> > > > single release process for all packages for 2.0 (except for the
>> > > backporting
>> > > > which can be done per-provider before 2.0 release) and provide
>a
>> > > foundation
>> > > > for future more complex release cycles in future versions.
>> > > >
>> > > > Here is how the new Airflow 2.0 repository could look (I only show a
>> > > > subset of dirs but they are representative). For those whose email
>> > > > fixed-width font formatting gets corrupted, here is an image of this
>> > > > structure: https://pasteboard.co/IEesTih.png
>> > > >
>> > > > |-- airflow
>> > > > |   |- __init__.py
>> > > > |   |- operators -> fundamental operators are here
>> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
>> > > > |-- setup.py -> setup.py for the "apache-airflow" package
>> > > > |-- airflow_integrations
>> > > > |   |-providers
>> > > > |   | |-google
>> > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
>> > > > |   |   |-airflow_integrations
>> > > > |   |     |-__init__.py
>> > > > |   |     |-providers
>> > > > |   |       |-__init__.py
>> > > > |   |       |-google
>> > > > |   |         |-__init__.py
>> > > > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
>> > > > |   |-__init__.py
>> > > > |   |-protocols
>> > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
>> > > > |     |-airflow_integrations
>> > > > |        |-protocols
>> > > > |          |-__init__.py
>> > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
>> > > >
>> > > > There are a number of pros for this solution:
>> > > >
>> > > >    - We could use the standard namespaces feature of python to
>build
>> > > >    multiple packages:
>> > > >   
>https://packaging.python.org/guides/packaging-namespace-packages/
>> > > >    - Installation for users will be the same as previously. We
>could
>> > > >    install the needed packages automatically when particular
>extras
>> are
>> > > > used
>> > > >    (pip install apache-airflow[google] could install both
>> > > "apache-airflow"
>> > > > and
>> > > >    "apache-airflow-integrations-providers-google")
>> > > >    - We could have custom setup.py installation process for
>> developers
>> > > that
>> > > >    could install all the packages in development ("-e ." mode)
>in a
>> > > single
>> > > >    operation.
>> > > >    - In case of transfer packages we could have nice error
>messages
>> > > >    informing that the other package needs to be installed (for
>> example
>> > > > S3->GCS
>> > > >    operator would import "airflow_integrations.providers.amazon.*"
>> and
>> > if
>> > > > it
>> > > >    fails it could raise ("Please install [amazon] extra to use
>me.")
>> > > >    - We could implement numerous optimisations in the way how
>we run
>> > > tests
>> > > >    in CI (for example run all the "providers" tests only with
>sqlite,
>> > run
>> > > >    tests in parallel etc.)
>> > > >    - We could implement it gradually - we do not have to have a
>"big
>> > > bang"
>> > > >    approach - we can implement it in "provider-by-provider" way
>and
>> > test
>> > > it
>> > > >    with one provider (Google) first to make sure that all the
>> > mechanisms
>> > > > are
>> > > >    working
>> > > >    - For now we could have the monorepo approach where all the
>> packages
>> > > >    will be developed in concert - for now avoiding the
>dependency
>> > > problems
>> > > >    (but allowing for back-portability to 1.10).
>> > > >    - We will have clear boundaries between packages and ability
>to
>> test
>> > > for
>> > > >    some unwanted/hidden dependencies between packages.
>> > > >    - We could switch to (much better) sphinx-apidoc package to
>> continue
>> > > >    building single documentation for all of those (sphinx
>apidoc has
>> > > > support
>> > > >    for namespaces).
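
The "nice error message" idea for transfer operators could be as small as the sketch below. The helper name `import_provider` is hypothetical and the message wording is only an example; `importlib.import_module` is the one real call involved.

```python
import importlib


def import_provider(module_name, extra):
    """Import a provider module, or fail with a hint about which extra to install.

    Hypothetical helper: a transfer operator (for example S3 -> GCS) could call
    this for the "other side" of the transfer instead of a bare import.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            "Could not import %r. Please install the [%s] extra to use this operator."
            % (module_name, extra)
        ) from err
```

An S3->GCS operator would then call something like `import_provider("airflow_integrations.providers.amazon", "amazon")` and users without the amazon package get an actionable message instead of a bare traceback.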
>> > > >
>> > > > As we are working on GCP move from contrib to core, we could
>make all
>> > the
>> > > > effort to test it and try it before we merge it to master so
>that it
>> > will
>> > > > be ready for others (and we could help with most of the moves
>> > > afterwards).
>> > > > It seems complex, but in fact in most cases it will be very
>simple
>> move
>> > > > between the packages and can be done incrementally so there is
>little
>> > > risk
>> > > > in doing this I think.
>> > > >
>> > > > J.
>> > > >
>> > > >
>> > > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com>
>> wrote:
>> > > >
>> > > > > Tomasz and Ash got good points about the overhead of having
>> separate
>> > > > repos.
>> > > > > But while we grow bigger and more mature, I would prefer to
>have
>> what
>> > > was
>> > > > > described in AIP-8. It shouldn't be extremely hard for us to
>come
>> up
>> > > with
>> > > > > good strategies to handle the overhead. AIP-8 already talked
>about
>> > how
>> > > it
>> > > > > can benefit us. IMO on a high level, having a clear separation on
>> > core
>> > > > vs.
>> > > > > hooks/operators would make the project much more scalable and
>the
>> > gains
>> > > > > would outweigh the cost we pay.
>> > > > >
>> > > > > That being said, I'm supportive to this moving towards AIP-8
>while
>> > > > learning
>> > > > > approach, quite a good practise to tackle a big project.
>Looking
>> > > forward
>> > > > to
>> > > > > read the AIP.
>> > > > >
>> > > > >
>> > > > > Cheers,
>> > > > > Kevin Y
>> > > > >
>> > > > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
>> > jarek.pot...@polidea.com
>> > > >
>> > > > > wrote:
>> > > > >
>> > > > > > We are checking how we can use namespaces in back-portable
>way
>> and
>> > we
>> > > > > will
>> > > > > > have POC soon so that we all will be able to see how it
>will look
>> > > like.
>> > > > > >
>> > > > > > J.
>> > > > > >
>> > > > > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
>> a...@apache.org>
>> > > > > wrote:
>> > > > > >
>> > > > > > > I'll have to read your proposal in detail (sorry, no time
>right
>> > > > now!),
>> > > > > > but
>> > > > > > > I'm broadly in favour of this approach, and I think
>keeping
>> them
>> > > _in_
>> > > > > the
>> > > > > > > same repo is the best plan -- that makes writing and 
>testing
>> > > > > > cross-cutting
>> > > > > > > changes  easier.
>> > > > > > >
>> > > > > > > -a
>> > > > > > >
>> > > > > > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
>> > > > > tomasz.urbas...@polidea.com
>> > > > > > >
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > I think utilizing namespaces should reduce a lot of
>problems
>> > > raised
>> > > > > by
>> > > > > > > > using separate repos (who will manage it? how to
>release?
>> where
>> > > > > should
>> > > > > > be
>> > > > > > > > the repo?).
>> > > > > > > >
>> > > > > > > > Bests,
>> > > > > > > > Tomek
>> > > > > > > >
>> > > > > > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
>> > > > > > jarek.pot...@polidea.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > >> Thanks Bas for comments! Let me share my thoughts
>below.
>> > > > > > > >>
>> > > > > > > >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
>> > > > > > > >> basharens...@godatadriven.com>
>> > > > > > > >> wrote:
>> > > > > > > >>
>> > > > > > > >>> Hi Jarek, I definitely see a future in creating
>separate
>> > > > > installable
>> > > > > > > >>> packages for various operators/hooks/etc (as in
>AIP-8).
>> This
>> > > > would
>> > > > > > IMO
>> > > > > > > >>> strip the “core” Airflow to only what’s needed and
>result
>> in
>> > a
>> > > > > small
>> > > > > > > >>> package without a ton of dependencies (and make it
>more
>> > > > > maintainable,
>> > > > > > > >>> shorter tests, etc etc etc). Not exactly sure though
>what
>> > > you’re
>> > > > > > > >> proposing
>> > > > > > > >>> in your e-mail, is it a new AIP for an intermediate
>step
>> > > towards
>> > > > > > AIP-8?
>> > > > > > > >>>
>> > > > > > > >>
>> > > > > > > >> It's a new AIP I am proposing.  For now it's only for
>> > > backporting
>> > > > > the
>> > > > > > > new
>> > > > > > > >> 2.0 import paths to 1.10.* series.
>> > > > > > > >>
>> > > > > > > >> It's more of "incremental going in direction of AIP-8
>and
>> > > learning
>> > > > > > some
>> > > > > > > >> difficulties involved" than implementing AIP-8 fully.
>We are
>> > > > taking
>> > > > > > > >> advantage of changes in import paths from AIP-21 which
>make
>> it
>> > > > > > possible
>> > > > > > > to
>> > > > > > > >> have both old and new (optional) operators available
>in
>> 1.10.*
>> > > > > series
>> > > > > > of
>> > > > > > > >> Airflow. I think there is a lot more to do for full
>> > > implementation
>> > > > > of
>> > > > > > > >> AIP-8: decisions how to maintain, install those
>operator
>> > groups
>> > > > > > > separately,
>> > > > > > > >> stewardship model/organisation for the separate
>groups, how
>> to
>> > > > > manage
>> > > > > > > >> cross-dependencies, procedures for releasing the
>packages
>> etc.
>> > > > > > > >>
>> > > > > > > >> I think about this new AIP also as a learning effort -
>we
>> > would
>> > > > > learn
>> > > > > > > more
>> > > > > > > >> how separate packaging works and then we can follow up
>with
>> > > AIP-8
>> > > > > full
>> > > > > > > >> implementation for "modular" Airflow. Then AIP-8 could
>be
>> > > > > implemented
>> > > > > > in
>> > > > > > > >> Airflow 2.1 for example - or 3.0 if we start following
>> > semantic
>> > > > > > > versioning
>> > > > > > > >> - based on those learnings. It's a bit of good example
>of
>> > having
>> > > > > cake
>> > > > > > > and
>> > > > > > > >> eating it too. We can try out modularity in 1.10.*
>while
>> > cutting
>> > > > the
>> > > > > > > scope
>> > > > > > > >> of 2.0 and not implementing full management/release
>> procedure
>> > > for
>> > > > > > AIP-8
>> > > > > > > >> yet.
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>> Thinking about this, I think there are still a few
>grey
>> areas
>> > > > > (which
>> > > > > > > >> would
>> > > > > > > >>> be good to discuss in a new AIP, or continue on
>AIP-8):
>> > > > > > > >>>
>> > > > > > > >>>  *   In your email you speak only about the 3
>big
>> cloud
>> > > > > > providers
>> > > > > > > >>> (btw I made a PR for migrating all AWS components ->
>> > > > > > > >>> https://github.com/apache/airflow/pull/6439). Is
>there a
>> > plan
>> > > > for
>> > > > > > > >>> splitting other components than Google/AWS/Azure?
>> > > > > > > >>>
>> > > > > > > >>
>> > > > > > > >> We could add more groups as part of this new AIP
>indeed (as
>> an
>> > > > > > > extension to
>> > > > > > > >> AIP-21 and pre-requisite to AIP-8). We already see how
>> > > > > > > moving/deprecation
>> > > > > > > >> works for the providers package - it works for
>GCP/Google
>> > rather
>> > > > > > nicely.
>> > > > > > > >> But there is nothing to prevent us from extending it
>to
>> cover
>> > > > other
>> > > > > > > groups
>> > > > > > > >> of operators/hooks. If you look at the current
>structure of
>> > > > > > > documentation
>> > > > > > > >> done by Kamil, we can follow the structure there and
>move
>> the
>> > > > > > > >> operators/hooks accordingly (
>> > > > > > > >>
>> > > > >
>> >
>https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html
>> > > > > > ):
>> > > > > > > >>
>> > > > > > > >>      Fundamentals, ASF: Apache Software Foundation,
>Azure:
>> > > > Microsoft
>> > > > > > > >> Azure, AWS: Amazon Web Services, GCP: Google Cloud
>Platform,
>> > > > Service
>> > > > > > > >> integrations, Software integrations, Protocol
>integrations.
>> > > > > > > >>
>> > > > > > > >> I am happy to include that in the AIP - if others
>agree
>> it's a
>> > > > good
>> > > > > > > idea.
>> > > > > > > >> Out of those groups -  I think only Fundamentals
>should not
>> be
>> > > > > > > back-ported.
>> > > > > > > >> Others should be rather easy to port (if we decide
>to). We
>> > > already
>> > > > > > have
>> > > > > > > >> quite a lot of those in the new GCP operators for 2.0.
>So
>> > > starting
>> > > > > > with
>> > > > > > > >> GCP/Google group is a good idea. Also following with
>Cloud
>> > > > Providers
>> > > > > > > first
>> > > > > > > >> is a good thing. For example we have now support from
>Google
>> > > > > Composer
>> > > > > > > team
>> > > > > > > >> to do this separation for GCP (and we learn from it)
>and
>> then
>> > we
>> > > > can
>> > > > > > > claim
>> > > > > > > >> the stewardship in our team for releasing the python
>3/
>> > Airflow
>> > > > > > > >> 1.10-compatible "airflow-google" packages. Possibly
>other
>> > Cloud
>> > > > > > > >> Providers/teams might follow this (if they see the
>value in
>> > it)
>> > > > and
>> > > > > > > there
>> > > > > > > >> could be different stewards for those. And then we can
>do
>> > other
>> > > > > groups
>> > > > > > > if
>> > > > > > > >> we decide to. I think this way we can learn whether
>AIP-8 is
>> > > > > > manageable
>> > > > > > > and
>> > > > > > > >> what real problems we are going to face.
>> > > > > > > >>
>> > > > > > > >>  *   Each “plugin” e.g. GCP would be a separate repo,
>should
>> > we
>> > > > > create
>> > > > > > > >>> some sort of blueprint for such packages?
>> > > > > > > >>>
>> > > > > > > >>
>> > > > > > > >> I think we do not need separate repos (at all) but in
>this
>> new
>> > > AIP
>> > > > > we
>> > > > > > > can
>> > > > > > > >> test it before we decide to go for AIP-8. IMHO -
>monorepo
>> > > approach
>> > > > > > will
>> > > > > > > >> work here rather nicely. We could use python-3 native
>> > namespaces
>> > > > > > > >> <
>> > > >
>https://packaging.python.org/guides/packaging-namespace-packages/>
>> > > > > > for
>> > > > > > > >> the
>> > > > > > > >> sub-packages when we go full AIP-8. For now we could
>simply
>> > > > package
>> > > > > > the
>> > > > > > > new
>> > > > > > > >> operators in separate pip package for Python 3 version
>> 1.10.*
>> > > > series
>> > > > > > > only.
>> > > > > > > >> We only need to test if it works well with another
>package
>> > > > providing
>> > > > > > > >> 'airflow.providers.*' after apache-airflow is
>installed
>> > > (providing
>> > > > > > > >> 'airflow' package). But I think we can make it work. I
>don't
>> > > think
>> > > > > we
>> > > > > > > >> really need to split the repos, namespaces will work
>just
>> fine
>> > > and
>> > > > > has
>> > > > > > > >> easier management of cross-repository dependencies
>(but we
>> can
>> > > > learn
>> > > > > > > >> otherwise). For sure we will not need it for the new
>> proposed
>> > > AIP
>> > > > of
>> > > > > > > >> backporting groups to 1.10 and we can defer that
>decision to
>> > > AIP-8
>> > > > > > > >> implementation time.
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>>  *   In which Airflow version do we start raising
>> deprecation
>> > > > > > warnings
>> > > > > > > >>> and in which version would we remove the original?
>> > > > > > > >>>
>> > > > > > > >>
>> > > > > > > >> I think we should do what we did in GCP case already.
>Those
>> > old
>> > > > > > > "imports"
>> > > > > > > >> for operators can be made as deprecated in Airflow 2.0
>(and
>> > > > removed
>> > > > > in
>> > > > > > > 2.1
>> > > > > > > >> or 3.0 if we start following semantic versioning). We
>can
>> > > however
>> > > > do
>> > > > > > it
>> > > > > > > >> before in 1.10.7 or 1.10.8 if we release those
>(without
>> > removing
>> > > > the
>> > > > > > old
>> > > > > > > >> operators yet - just raise deprecation warnings and
>inform
>> > that
>> > > > for
>> > > > > > > python3
>> > > > > > > >> the new "airflow-google", "airflow-aws" etc. packages
>can be
>> > > > > installed
>> > > > > > > and
>> > > > > > > >> users can switch to it).
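
For reference, the "keep the old import but warn" pattern could look roughly like this. Both class names are invented for the illustration and are not the actual GCP operators; only `warnings.warn` with `DeprecationWarning` is standard library behaviour.

```python
import warnings


class NewGoogleOperator:
    """Hypothetical operator living at its new "providers" import path."""

    def __init__(self, task_id):
        self.task_id = task_id


class OldGoogleOperator(NewGoogleOperator):
    """Hypothetical old import path, kept as a thin deprecated alias."""

    def __init__(self, *args, **kwargs):
        # stacklevel=2 points the warning at the user's DAG file, not at this shim.
        warnings.warn(
            "This class is deprecated. Please use the operator from the new "
            "providers package instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Existing DAGs keep working unchanged, and the warning tells users which import to switch to before the old path is removed.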
>> > > > > > > >>
>> > > > > > > >> J.
>> > > > > > > >>
>> > > > > > > >>
>> > > > > > > >>>
>> > > > > > > >>> Cheers,
>> > > > > > > >>> Bas
>> > > > > > > >>>
>> > > > > > > >>> On 27 Oct 2019, at 08:33, Jarek Potiuk <
>> > > jarek.pot...@polidea.com
>> > > > > > > <mailto:
>> > > > > > > >>> jarek.pot...@polidea.com>> wrote:
>> > > > > > > >>>
>> > > > > > > >>> Hello - any comments on that? I am happy to make it
>into an
>> > AIP
>> > > > :)?
>> > > > > > > >>>
>> > > > > > > >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <
>> > > > > > jarek.pot...@polidea.com
>> > > > > > > >>> <mailto:jarek.pot...@polidea.com>>
>> > > > > > > >>> wrote:
>> > > > > > > >>>
>> > > > > > > >>> *Motivation*
>> > > > > > > >>>
>> > > > > > > >>> I think we really should start thinking about making
>it
>> > easier
>> > > to
>> > > > > > > migrate
>> > > > > > > >>> to 2.0 for our users. After implementing some recent
>> changes
>> > > > > related
>> > > > > > to
>> > > > > > > >>> AIP-21-
>> > > > > > > >>> Changes in import paths
>> > > > > > > >>> <
>> > > > > > > >>>
>> > > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths
>> > > > > > > >>>
>> > > > > > > >>> I
>> > > > > > > >>> think I have an idea that might help with it.
>> > > > > > > >>>
>> > > > > > > >>> *Proposal*
>> > > > > > > >>>
>> > > > > > > >>> We could package some of the new and improved 2.0
>operators
>> > > > (moved
>> > > > > to
>> > > > > > > >>> "providers" package) and let them be used in Python 3
>> > > environment
>> > > > > of
>> > > > > > > >>> airflow 1.10.x.
>> > > > > > > >>>
>> > > > > > > >>> This can be done case-by-case per "cloud provider".
>It
>> should
>> > > not
>> > > > > be
>> > > > > > > >>> obligatory, should be largely driven by each
>provider. It's
>> > not
>> > > > yet
>> > > > > > > full
>> > > > > > > >>> AIP-8
>> > > > > > > >>> Split Hooks/Operators into separate packages
>> > > > > > > >>> <
>> > > > > > > >>>
>> > > > > > > >>
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303
>> > > > > > > >>> .
>> > > > > > > >>> It's
>> > > > > > > >>> merely backporting of some operators/hooks to get it
>work
>> in
>> > > > 1.10.
>> > > > > > But
>> > > > > > > by
>> > > > > > > >>> doing it we might try out the concept of splitting,
>learn
>> > about
>> > > > > > > >> maintenance
>> > > > > > > >>> problems and maybe implement full *AIP-8 *approach in
>2.1
>> > > > > > consistently
>> > > > > > > >>> across the board.
>> > > > > > > >>>
>> > > > > > > >>> *Context*
>> > > > > > > >>>
>> > > > > > > >>> Part of the AIP-21 was to move import paths for Cloud
>> > providers
>> > > > to
>> > > > > > > >>> separate providers/<PROVIDER> package. An example for
>that
>> > (the
>> > > > > first
>> > > > > > > >>> provider we already almost migrated) was
>providers/google
>> > > package
>> > > > > > > >> (further
>> > > > > > > >>> divided into gcp/gsuite etc).
>> > > > > > > >>>
>> > > > > > > >>> We've done a massive migration of all the
>Google-related
>> > > > operators,
>> > > > > > > >>> created a few missing ones and retrofitted some old
>> operators
>> > > to
>> > > > > > follow
>> > > > > > > >> GCP
>> > > > > > > >>> best practices and fixing a number of problems - also
>> > > > implementing
>> > > > > > > >> Python3
>> > > > > > > >>> and Pylint compatibility. Some of these
>operators/hooks are
>> > not
>> > > > > > > backwards
>> > > > > > > >>> compatible. Those that are compatible are still
>available
>> via
>> > > the
>> > > > > old
>> > > > > > > >>> imports with deprecation warning.
>> > > > > > > >>>
>> > > > > > > >>> We've added missing tests (including system tests)
>and
>> > missing
>> > > > > > > features -
>> > > > > > > >>> improving some of the Google operators - giving the
>users
>> > more
>> > > > > > > >> capabilities
>> > > > > > > >>> and fixing some issues. Those operators should pretty
>much
>> > > "just
>> > > > > > work"
>> > > > > > > in
>> > > > > > > >>> Airflow 1.10.x (any recent version) for Python 3. We
>should
>> > be
>> > > > able
>> > > > > > to
>> > > > > > > >>> release a separate pip-installable package for those
>> > operators
>> > > > that
>> > > > > > > users
>> > > > > > > >>> should be able to install in Airflow 1.10.x.
>> > > > > > > >>>
>> > > > > > > >>> Any user will be able to install this separate
>package in
>> > their
>> > > > > > Airflow
>> > > > > > > >>> 1.10.x installation and start using those new
>"provider"
>> > > > operators
>> > > > > in
>> > > > > > > >>> parallel to the old 1.10.x operators. Other providers
>> > > > ("microsoft",
>> > > > > > > >>> "amazon") might follow the same approach if they
>want. We
>> > could
>> > > > > even
>> > > > > > at
>> > > > > > > >>> some point decide to move some of the core operators
>in
>> > similar
>> > > > > > fashion
>> > > > > > > >>> (for example following the structure proposed in the
>latest
>> > > > > > > >> documentation:
>> > > > > > > >>> fundamentals / software / etc.
>> > > > > > > >>>
>> > > > > >
>> > >
>https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
>> > > > > > > >>>
>> > > > > > > >>> *Pros and cons*
>> > > > > > > >>>
>> > > > > > > >>> There are a number of pros:
>> > > > > > > >>>
>> > > > > > > >>>  - Users will have an easier migration path if they
>are
>> > deeply
>> > > > > vested
>> > > > > > > >>>  into 1.10.* version
>> > > > > > > >>>  - It's possible to migrate in stages for people who
>are
>> also
>> > > > > vested
>> > > > > > in
>> > > > > > > >>>  py2: *py2 (1.10) -> py3 (1.10) -> py3 + new
>operators
>> (1.10)
>> > > ->
>> > > > > py3
>> > > > > > +
>> > > > > > > >>>  2.0*
>> > > > > > > >>>  - Moving to new operators in py3 + new operators can
>be
>> done
>> > > > > > > >>>  gradually. Old operators will continue to work while
>new
>> can
>> > > be
>> > > > > used
>> > > > > > > >> more
>> > > > > > > >>>  and more
>> > > > > > > >>>  - People will get incentivised to migrate to python
>3
>> before
>> > > 2.0
>> > > > > is
>> > > > > > > >>>  out (by using new operators)
>> > > > > > > >>>  - Each provider "package" can have independent
>release
>> > > schedule
>> > > > -
>> > > > > > and
>> > > > > > > >>>  add functionality in already released Airflow
>versions.
>> > > > > > > >>>  - We do not take out any functionality from the
>users - we
>> > > just
>> > > > > add
>> > > > > > > >>>  more options
>> > > > > > > >>>  - The releases can be - similarly as main airflow
>> releases -
>> > > > voted
>> > > > > > > >>>  separately by PMC after "stewards" of the package
>(per
>> > > provider)
>> > > > > > > >> perform
>> > > > > > > >>>  round of testing on 1.10.* versions.
>> > > > > > > >>>  - Users will start migrating to new operators
>earlier and
>> > have
>> > > > > > > >>>  smoother switch to 2.0 later
>> > > > > > > >>>  - The latest improved operators will start
>> > > > > > > >>>
>> > > > > > > >>> There are three cons I could think of:
>> > > > > > > >>>
>> > > > > > > >>>  - There will be quite a lot of duplication between
>old and
>> > new
>> > > > > > > >>>  operators (they will co-exist in 1.10). That might
>lead to
>> > > > > confusion
>> > > > > > > of
>> > > > > > > >>>  users and problems with cooperation between
>different
>> > > > > > operators/hooks
>> > > > > > > >>>  - Having new operators in 1.10 python 3 might keep
>people
>> > from
>> > > > > > > >>>  migrating to 2.0
>> > > > > > > >>>  - It will require some maintenance and separate
>release
>> > > > overhead.
>> > > > > > > >>>
>> > > > > > > >>> I already spoke to Composer team @Google and they are
>very
>> > > > positive
>> > > > > > > about
>> > > > > > > >>> this. I also spoke to Ash and seems it might also be
>OK for
>> > > > > > Astronomer
>> > > > > > > >>> team. We have Google's backing and support, and we
>can
>> > provide
>> > > > > > > >> maintenance
>> > > > > > > >>> and support for those packages - being an example for
>other
>> > > > > providers
>> > > > > > > how
>> > > > > > > >>> they can do it.
>> > > > > > > >>>
>> > > > > > > >>> Let me know what you think - and whether I should
>make it
>> > into
>> > > an
>> > > > > > > >> official
>> > > > > > > >>> AIP maybe?
>> > > > > > > >>>
>> > > > > > > >>> J.
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > >>> --
>> > > > > > > >>>
>> > > > > > > >>> Jarek Potiuk
>> > > > > > > >>> Polidea <https://www.polidea.com/> | Principal
>Software
>> > > Engineer
>> > > > > > > >>>
>> > > > > > > >>> M: +48 660 796 129 <+48660796129>
>> > > > > > > >>> [image: Polidea] <https://www.polidea.com/>
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > >>>
>> > > > > > > >>
>> > > > > > > >> --
>> > > > > > > >>
>> > > > > > > >> Jarek Potiuk
>> > > > > > > >> Polidea <https://www.polidea.com/> | Principal
>Software
>> > > Engineer
>> > > > > > > >>
>> > > > > > > >> M: +48 660 796 129 <+48660796129>
>> > > > > > > >> [image: Polidea] <https://www.polidea.com/>
>> > > > > > > >>
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > >
>> > > > > > > > Tomasz Urbaszek
>> > > > > > > > Polidea <https://www.polidea.com/> | Junior Software
>> Engineer
>> > > > > > > >
>> > > > > > > > M: +48 505 628 493 <+48505628493>
>> > > > > > > > E: tomasz.urbas...@polidea.com
><tomasz.urbasz...@polidea.com
>> >
>> > > > > > > >
>> > > > > > > > Unique Tech
>> > > > > > > > Check out our projects!
><https://www.polidea.com/our-work>
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > > --
>> > > > > >
>> > > > > > Jarek Potiuk
>> > > > > > Polidea <https://www.polidea.com/> | Principal Software
>Engineer
>> > > > > >
>> > > > > > M: +48 660 796 129 <+48660796129>
>> > > > > > [image: Polidea] <https://www.polidea.com/>
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > >
>> > > > Jarek Potiuk
>> > > > Polidea <https://www.polidea.com/> | Principal Software
>Engineer
>> > > >
>> > > > M: +48 660 796 129 <+48660796129>
>> > > > [image: Polidea] <https://www.polidea.com/>
>> > > >
>> > >
>> >
>> >
>> > --
>> >
>> > Jarek Potiuk
>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> >
>> > M: +48 660 796 129 <+48660796129>
>> > [image: Polidea] <https://www.polidea.com/>
>> >
>>
>
>
>-- 
>
>Jarek Potiuk
>Polidea <https://www.polidea.com/> | Principal Software Engineer
>
>M: +48 660 796 129 <+48660796129>
>[image: Polidea] <https://www.polidea.com/>
