Yes let's just do (1) for now.


On Tue, Nov 5, 2019, 08:48 Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> Thanks Ash! It might indeed work. I will take it from there and try to make
> a POC PR with airflow.
>
> It's a bit different approach than the google-python libraries (they keep
> all the libraries as separate sub-packages/mini-projects inside the main
> project). The approach you propose is far less invasive in terms of
> changing the structure of the main repo. I like it this way much more. It
> makes it much easier to import the project in an IDE, even if it is less
> modular in nature.
>
> From what I understand, with this structure - if it works - we have two
> options:
>
> (1) For Airflow 2.0 we will be able to install Airflow and all
> "integrations" in a single (apache-airflow == 2.0.0) package and build
> separate backporting integration packages for 1.10.* only.
> (2) We will split Airflow 2.0 into separate "core" and "integration"
> packages as well when preparing the packages.
>
> I think (1) is a bit more reasonable for now, until we work out the full
> AIP-8 solution (including solving dependency hell). Let me know what you
> think (and others as well).
>
> J.
>
> On Mon, Nov 4, 2019 at 9:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > https://github.com/ashb/airflow-submodule-test
> >
> > That seems to work in any order things are installed, at least on python
> > 3.7. I've had a stressful few days so I may have missed something. Please
> > tell me if there's a case I've missed, or if this is not a suitable proxy
> > for our situation.
> >
> > -a
> >
> > > On 4 Nov 2019, at 20:08, Ash Berlin-Taylor <a...@apache.org> wrote:
> > >
> > > Pretty hard pass from me on airflow_ext. If it's released by Airflow I
> > > want it to live under airflow.* (Anyone else is free to release packages
> > > under any namespace they choose.)
> > >
> > > That said I think I've got something that works:
> > >
> > >
> > > /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/__init__.py
> > > module level code running
> > >
> > > /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/providers/gcp/__init__.py
> > > module level code running
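> > >
> > > One layout that can make this work (a guess - the point being that only
> > > the "providers" level is a namespace package, while the top-level
> > > package keeps its regular __init__.py with real code in it):
> > >
> > >   # distribution 1 ("core"):
> > >   notairflow/__init__.py                 <- regular package, real code allowed
> > >   notairflow/providers/__init__.py       <- only the pkgutil namespace line
> > >
> > >   # distribution 2 ("provider"):
> > >   notairflow/providers/__init__.py       <- identical one-line file
> > >   notairflow/providers/gcp/__init__.py   <- provider code lives here
> > >
> > >   # where the one-line providers/__init__.py is:
> > >   __path__ = __import__('pkgutil').extend_path(__path__, __name__)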
> > >
> > > Let me test it again in a few different cases etc.
> > >
> > > -a
> > >
> > > On 4 November 2019 14:00:24 GMT, Jarek Potiuk <
> jarek.pot...@polidea.com>
> > wrote:
> > > Hey Ash,
> > >
> > > Thanks for the offer. I must admit pkgutil and package namespaces are
> > > not the best-documented part of Python.
> > >
> > > I dug a bit deeper and found a similar problem -
> > > https://github.com/pypa/setuptools/issues/895. It seems that even if it
> > > is not explicitly explained in the pkgutil documentation, this comment
> > > (assuming it is right) explains everything:
> > >
> > > *"That's right. All parents of a namespace package must also be
> namespace
> > > packages, as they will necessarily share that parent name space (farm
> and
> > > farm.deps in this example)."*
> > >
> > > There are a few possibilities mentioned in the issue on how this can be
> > > worked around, but those are far from perfect solutions. They would
> > > require patching the already-installed airflow's __init__.py to work -
> > > to manipulate the search path. Still, from my tests I do not know if
> > > this would be possible at all, because of the non-trivial __init__.py
> > > we have (and use) in the *airflow* package.
> > >
> > > We have a few PRs now waiting for a decision on that one I think, so
> > > maybe we can simply agree that we should use another package (I really
> > > like *"airflow_ext"* :D) and use it from now on? What do you (and
> > > others) think?
> > >
> > > I'd love to start voting on it soon.
> > >
> > > J.
> > >
> > >
> > >
> > > On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <a...@apache.org>
> > wrote:
> > >
> > > Let me run some tests too - I've used them a bit in the past. I thought
> > > since we only want to make airflow.providers a namespace package it
> might
> > > work for us.
> > >
> > > Will report back next week.
> > >
> > > -ash
> > >
> > > On 31 October 2019 15:58:22 GMT, Jarek Potiuk <
> jarek.pot...@polidea.com>
> > > wrote:
> > > The same repo (so the mono-repo approach). All packages would be in the
> > > "airflow_integrations" directory. It's mainly about moving the
> > > operators/hooks/sensors files to a different directory structure.
> > >
> > > It might be done pretty much without changing the current
> > > installation/development model:
> > >
> > > 1) We can add a setup.py command to install all the packages in -e mode
> > > in the main setup.py (to make it easier to install all deps in one go).
> > > 2) We can add dependencies in setup.py extras to install the appropriate
> > > packages. For example the [google] extra will require the
> > > "apache-airflow-integrations-providers-google" package - or
> > > apache-airflow-providers-google if we decide to drop -integrations from
> > > the package name to make it shorter.
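> > >
> > > A rough sketch of what 2) could look like in the main setup.py (the
> > > distribution names and the version pin are only assumptions for
> > > illustration):
> > >
> > > from setuptools import setup, find_packages
> > >
> > > # Assumption: provider distributions follow the naming proposed above
> > > # and are released in lockstep with the core package.
> > > extras_require = {
> > >     'google': ['apache-airflow-integrations-providers-google==2.0.0'],
> > >     'amazon': ['apache-airflow-integrations-providers-amazon==2.0.0'],
> > > }
> > >
> > > setup(
> > >     name='apache-airflow',
> > >     version='2.0.0',
> > >     packages=find_packages(include=['airflow', 'airflow.*']),
> > >     extras_require=extras_require,
> > > )
> > >
> > > With that, "pip install apache-airflow[google]" would pull in both the
> > > core and the Google integrations package.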
> > >
> > > The only potential drawback I see is a bit more involved setup of the
> > > IDE.
> > >
> > > This way the installation method for both dev and prod remains simple.
> > >
> > > In the future we can have a separate release schedule for the packages
> > > (AIP-8), but for now we can stick to the same version for
> > > 'apache-airflow' and the 'apache-airflow-integrations-*' packages (+ a
> > > separate release schedule for backporting needs).
> > >
> > > Here again is the structure of the repo (we will likely be able to use
> > > native namespaces so I removed some needless __init__.py files):
> > >
> > > |-- airflow
> > > |   |- __init__.py
> > > |   |- operators -> fundamental operators are here
> > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > |-- airflow_integrations
> > > |   |-providers
> > > |   | |-google
> > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > |   |   |-airflow_integrations
> > > |   |     |-providers
> > > |   |       |-google
> > > |   |         |-__init__.py
> > > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > |   | |-__init__.py
> > > |   |-protocols
> > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > |     |-airflow_integrations
> > > |        |-protocols
> > > |          |-__init__.py
> > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > >
> > >
> > > J.
> > >
> > > On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <kaxiln...@gmail.com>
> wrote:
> > >
> > > So create another package in a different repo? Or the same repo with a
> > > separate setup.py file that has airflow as a dependency?
> > >
> > >
> > >
> > >
> > > On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
> > > <jarek.pot...@polidea.com>
> > > wrote:
> > >
> > > TL;DR; I did some more testing on how namespaces work. I still believe
> > > the only way to use namespaces is to have a separate (for example
> > > "airflow_integrations") package for all backportable packages.
> > >
> > > I am not sure if someone has used namespaces before, but after reading
> > > and trying them out, the main blocker seems to be that we have
> > > non-trivial code in airflow's "__init__.py" (including class
> > > definitions, imported sub-packages and plugin initialisation).
> > >
> > > Details are in
> > > https://packaging.python.org/guides/packaging-namespace-packages/ but
> > > it's a long one so let me summarize my findings:
> > >
> > >    - In order to use the "airflow.providers" package we would have to
> > >      declare "airflow" as a namespace.
> > >    - It can be done in three different ways: omitting __init__.py in
> > >      this package (native/implicit namespace), or making the
> > >      __init__.py of the "airflow" package in the main airflow (and
> > >      other packages) be "*__path__ =
> > >      __import__('pkgutil').extend_path(__path__, __name__)*" (pkgutil
> > >      style) or "*__import__('pkg_resources').declare_namespace(__name__)*"
> > >      (pkg_resources style).
> > >
> > > The first is not possible (we already have an __init__.py in
> > > "airflow"). The second case is not possible because we already have
> > > quite a lot in airflow's "__init__.py", and both the pkgutil and
> > > pkg_resources styles state:
> > >
> > > "*Every* distribution that uses the namespace package must include an
> > > identical *__init__.py*. If any distribution does not, it will cause
> > > the namespace logic to fail and the other sub-packages will not be
> > > importable. *Any additional code in __init__.py will be inaccessible.*"
> > >
> > > I even tried to add those pkgutil/pkg_resources lines to airflow and do
> > > some experimenting with it - but it does not work. Pip install fails at
> > > the plugins_manager as "airflow.plugins" is not accessible (kind of
> > > expected), but I am sure there will be other problems as well. :(
> > >
> > > Basically - we cannot turn "airflow" into a namespace because it has
> > > some "__init__.py" logic :(.
> > >
> > > So I think it still holds that if we want to use namespaces, we should
> > > use another package. *"airflow_integrations"* is the current candidate,
> > > but we can think of some nicer/shorter one: "airflow_ext",
> > > "airflow_int", "airflow_x", "airflow_mod", "airflow_next", "airflow_xt",
> > > "airflow_", "ext_airflow", ....  Interestingly, "airflow_" is the style
> > > suggested by PEP 8 to avoid conflicts with Python keywords (which is a
> > > different case but kind of close).
> > >
> > > What do you think?
> > >
> > > J.
> > >
> > > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <kaxiln...@gmail.com>
> > > wrote:
> > >
> > > The namespace feature looks promising and, from your tests, it looks
> > > like it would work well from Airflow 2.0 onwards.
> > >
> > > I will look at it in-depth and see if I have more suggestions or
> > > opinions on it.
> > >
> > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk
> > > <jarek.pot...@polidea.com
> > >
> > > wrote:
> > >
> > > TL;DR; We did some testing about namespaces and packaging (and
> > > potential backporting options for 1.10.* Python 3 Airflows) and we
> > > think it's best to use namespaces quickly and use a different package
> > > name - "airflow-integrations" - for all non-fundamental integrations.
> > >
> > > Unless we missed some tricks, we cannot use airflow.* sub-packages for
> > > the 1.10.* backportable packages. Example:
> > >
> > >    - the "*apache-airflow*" package provides "airflow.*" (this is what
> > >      we have today)
> > >    - "*apache-airflow-providers-google*" provides the
> > >      "airflow.providers.google.*" packages
> > >
> > > If we install both packages (old apache-airflow 1.10.6 and the new
> > > apache-airflow-providers-google from 2.0) - it seems that the
> > > "airflow.providers.google.*" package cannot be imported. This is a bit
> > > of a problem if we would like to backport the operators from Airflow
> > > 2.0 to Airflow 1.10 in a way that will be forward-compatible. We really
> > > want users who started using backported operators in 1.10.* to not have
> > > to change imports in their DAGs to run them in Airflow 2.0.
> > >
> > > We discussed it internally in our team and considered several options,
> > > but we think the best way will be to go straight to "namespaces" in
> > > Airflow 2.0 and to have the integrations (as discussed in the AIP-21
> > > discussion) in a separate "*airflow_integrations*" package. It might be
> > > even more towards the AIP-8 implementation, and it plays together very
> > > well with the "stewardship" discussed in AIP-21 now. But we will still
> > > keep (for now) a single release process for all packages for 2.0
> > > (except for the backporting, which can be done per-provider before the
> > > 2.0 release) and provide a foundation for more complex release cycles
> > > in future versions.
> > >
> > > Here is how the new Airflow 2.0 repository could look (I only show a
> > > subset of dirs but they are representative). For those whose email
> > > client corrupts the fixed-width/colour font, here is an image of this
> > > structure: https://pasteboard.co/IEesTih.png
> > >
> > > |-- airflow
> > > |   |- __init__.py
> > > |   |- operators -> fundamental operators are here
> > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > |-- airflow_integrations
> > > |   |-providers
> > > |   | |-google
> > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > |   |   |-airflow_integrations
> > > |   |     |-__init__.py
> > > |   |     |-providers
> > > |   |       |-__init__.py
> > > |   |       |-google
> > > |   |         |-__init__.py
> > > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > |   | |-__init__.py
> > > |   |-protocols
> > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > |     |-airflow_integrations
> > > |        |-protocols
> > > |          |-__init__.py
> > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > >
> > > There are a number of pros for this solution:
> > >
> > >    - We could use the standard namespaces feature of Python to build
> > >      multiple packages:
> > >      https://packaging.python.org/guides/packaging-namespace-packages/
> > >    - Installation for users will be the same as previously. We could
> > >      install the needed packages automatically when particular extras
> > >      are used (pip install apache-airflow[google] could install both
> > >      "apache-airflow" and "apache-airflow-integrations-providers-google").
> > >    - We could have a custom setup.py installation process for
> > >      developers that could install all the packages in development
> > >      ("-e ." mode) in a single operation.
> > >    - In case of transfer packages we could have nice error messages
> > >      informing that the other package needs to be installed (for
> > >      example the S3->GCS operator would import
> > >      "airflow_integrations.providers.amazon.*" and if it fails it could
> > >      raise "Please install [amazon] extra to use me." - see the sketch
> > >      right after this list).
> > >    - We could implement numerous optimisations in the way we run tests
> > >      in CI (for example run all the "providers" tests only with sqlite,
> > >      run tests in parallel etc.).
> > >    - We could implement it gradually - we do not have to have a "big
> > >      bang" approach - we can implement it in a "provider-by-provider"
> > >      way and test it with one provider (Google) first to make sure that
> > >      all the mechanisms are working.
> > >    - For now we could have the monorepo approach where all the packages
> > >      will be developed in concert - for now avoiding the dependency
> > >      problems (but allowing for back-portability to 1.10).
> > >    - We will have clear boundaries between packages and the ability to
> > >      test for unwanted/hidden dependencies between packages.
> > >    - We could switch to the (much better) sphinx-apidoc package to
> > >      continue building single documentation for all of those
> > >      (sphinx-apidoc has support for namespaces).
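> > >
> > > A rough sketch of the error-message idea for a transfer operator module
> > > (the module path, hook name and wording are assumptions, not an agreed
> > > API):
> > >
> > > # hypothetical S3 -> GCS transfer operator module
> > > try:
> > >     from airflow_integrations.providers.amazon.hooks.s3 import S3Hook
> > > except ImportError:
> > >     raise ImportError(
> > >         "This operator needs the Amazon integrations package. "
> > >         "Please install the [amazon] extra to use me, e.g. "
> > >         "pip install apache-airflow[amazon]."
> > >     )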
> > >
> > > As we are working on the GCP move from contrib to core, we could make
> > > all the effort to test it and try it before we merge it to master, so
> > > that it will be ready for others (and we could help with most of the
> > > moves afterwards). It seems complex, but in fact in most cases it will
> > > be a very simple move between the packages, and it can be done
> > > incrementally, so there is little risk in doing this I think.
> > >
> > > J.
> > >
> > >
> > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com>
> > > wrote:
> > >
> > > Tomasz and Ash made good points about the overhead of having separate
> > > repos. But as we grow bigger and more mature, I would prefer to have
> > > what was described in AIP-8. It shouldn't be extremely hard for us to
> > > come up with good strategies to handle the overhead. AIP-8 already
> > > talked about how it can benefit us. IMO on a high level, having a clear
> > > separation of core vs. hooks/operators would make the project much more
> > > scalable, and the gains would outweigh the cost we pay.
> > >
> > > That being said, I'm supportive of this "moving towards AIP-8 while
> > > learning" approach - quite a good practice to tackle a big project.
> > > Looking forward to reading the AIP.
> > >
> > >
> > > Cheers,
> > > Kevin Y
> > >
> > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
> > > jarek.pot...@polidea.com
> > >
> > > wrote:
> > >
> > > We are checking how we can use namespaces in a back-portable way, and
> > > we will have a POC soon so that we will all be able to see what it will
> > > look like.
> > >
> > > J.
> > >
> > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
> > > a...@apache.org>
> > > wrote:
> > >
> > > I'll have to read your proposal in detail (sorry, no time right now!),
> > > but I'm broadly in favour of this approach, and I think keeping them
> > > _in_ the same repo is the best plan -- that makes writing and testing
> > > cross-cutting changes easier.
> > >
> > > -a
> > >
> > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
> > > tomasz.urbas...@polidea.com
> > >
> > > wrote:
> > >
> > > I think utilizing namespaces should reduce a lot of the problems raised
> > > by using separate repos (who will manage it? how to release? where
> > > should the repo be?).
> > >
> > > Bests,
> > > Tomek
> > >
> > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> > > jarek.pot...@polidea.com>
> > > wrote:
> > >
> > > Thanks Bas for comments! Let me share my thoughts
> > > below.
> > >
> > > On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> > > basharens...@godatadriven.com>
> > > wrote:
> > >
> > > Hi Jarek, I definitely see a future in creating separate installable
> > > packages for various operators/hooks/etc. (as in AIP-8). This would IMO
> > > strip the “core” Airflow to only what’s needed and result in a small
> > > package without a ton of dependencies (and make it more maintainable,
> > > shorter tests, etc. etc. etc.). Not exactly sure though what you’re
> > > proposing in your e-mail - is it a new AIP for an intermediate step
> > > towards AIP-8?
> > >
> > >
> > > It's a new AIP I am proposing. For now it's only for backporting the
> > > new 2.0 import paths to the 1.10.* series.
> > >
> > > It's more of "incrementally going in the direction of AIP-8 and
> > > learning some of the difficulties involved" than implementing AIP-8
> > > fully. We are taking advantage of the changes in import paths from
> > > AIP-21, which make it possible to have both the old and the new
> > > (optional) operators available in the 1.10.* series of Airflow. I think
> > > there is a lot more to do for a full implementation of AIP-8: decisions
> > > on how to maintain and install those operator groups separately, a
> > > stewardship model/organisation for the separate groups, how to manage
> > > cross-dependencies, procedures for releasing the packages etc.
> > >
> > > I think about this new AIP also as a learning effort - we would learn
> > > more about how separate packaging works and then we can follow up with
> > > the full AIP-8 implementation for "modular" Airflow. Then AIP-8 could
> > > be implemented in Airflow 2.1 for example - or 3.0 if we start
> > > following semantic versioning - based on those learnings. It's a bit of
> > > a good example of having your cake and eating it too. We can try out
> > > modularity in 1.10.* while cutting the scope of 2.0 and not
> > > implementing the full management/release procedure for AIP-8 yet.
> > >
> > >
> > > Thinking about this, I think there are still a few grey areas (which
> > > would be good to discuss in a new AIP, or to continue on AIP-8):
> > >
> > >  *   In your email you speak only about the 3 big cloud providers (btw
> > >      I made a PR for migrating all AWS components ->
> > >      https://github.com/apache/airflow/pull/6439). Is there a plan for
> > >      splitting components other than Google/AWS/Azure?
> > >
> > >
> > > We could indeed add more groups as part of this new AIP (as an
> > > extension to AIP-21 and a pre-requisite to AIP-8). We already see how
> > > moving/deprecation works for the providers package - it works for
> > > GCP/Google rather nicely. But there is nothing to prevent us from
> > > extending it to cover other groups of operators/hooks. If you look at
> > > the current structure of the documentation done by Kamil, we can follow
> > > the structure there and move the operators/hooks accordingly
> > > (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> > >
> > >      Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
> > >      Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform,
> > >      Service integrations, Software integrations, Protocol
> > >      integrations.
> > >
> > > I am happy to include that in the AIP - if others agree it's a good
> > > idea. Out of those groups I think only Fundamentals should not be
> > > back-ported. Others should be rather easy to port (if we decide to). We
> > > already have quite a lot of those in the new GCP operators for 2.0, so
> > > starting with the GCP/Google group is a good idea. Also, following with
> > > the Cloud Providers first is a good thing. For example we now have
> > > support from the Google Composer team to do this separation for GCP
> > > (and we learn from it), and then our team can claim the stewardship for
> > > releasing the Python 3 / Airflow 1.10-compatible "airflow-google"
> > > packages. Possibly other Cloud Providers/teams might follow this (if
> > > they see the value in it) and there could be different stewards for
> > > those. And then we can do other groups if we decide to. I think this
> > > way we can learn whether AIP-8 is manageable and what real problems we
> > > are going to face.
> > >
> > >  *   Each “plugin”, e.g. GCP, would be a separate repo - should we
> > >      create some sort of blueprint for such packages?
> > >
> > >
> > > I think we do not need separate repos (at all), but in this new AIP we
> > > can test it before we decide to go for AIP-8. IMHO the monorepo
> > > approach will work here rather nicely. We could use Python 3 native
> > > namespaces
> > > (https://packaging.python.org/guides/packaging-namespace-packages/) for
> > > the sub-packages when we go full AIP-8. For now we could simply package
> > > the new operators in a separate pip package for the Python 3 version of
> > > the 1.10.* series only. We only need to test that it works well with
> > > another package providing 'airflow.providers.*' after apache-airflow
> > > (providing the 'airflow' package) is installed. But I think we can make
> > > it work. I don't think we really need to split the repos; namespaces
> > > will work just fine and give easier management of cross-repository
> > > dependencies (but we may learn otherwise). For sure we will not need it
> > > for the newly proposed AIP of backporting groups to 1.10, and we can
> > > defer that decision to AIP-8 implementation time.
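> > >
> > > For illustration, a rough sketch of what the setup.py of such a
> > > separate pip package could look like with native namespace packages
> > > (the name, version and dependency pin below are assumptions only):
> > >
> > > from setuptools import setup, find_namespace_packages
> > >
> > > # The backport distribution would ship airflow/providers/google/...
> > > # with no __init__.py at the shared "airflow"/"providers" levels.
> > > setup(
> > >     name='apache-airflow-providers-google',
> > >     version='1.10.0',
> > >     packages=find_namespace_packages(include=['airflow.providers*']),
> > >     install_requires=['apache-airflow>=1.10.6'],
> > > )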
> > >
> > >
> > > *   In which Airflow version do we start raising deprecation warnings,
> > >     and in which version would we remove the original?
> > >
> > >
> > > I think we should do what we already did in the GCP case. Those old
> > > "imports" for operators can be marked as deprecated in Airflow 2.0 (and
> > > removed in 2.1, or 3.0 if we start following semantic versioning). We
> > > can however do it earlier, in 1.10.7 or 1.10.8 if we release those
> > > (without removing the old operators yet - just raise deprecation
> > > warnings and inform users that for Python 3 the new "airflow-google",
> > > "airflow-aws" etc. packages can be installed and they can switch to
> > > them).
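> > >
> > > A minimal sketch of how such a deprecated import path could keep
> > > working (module and class names below are illustrative only):
> > >
> > > # old-path module kept as a thin shim, e.g. airflow/contrib/operators/gcs_operator.py
> > > import warnings
> > >
> > > # re-export from the new location so existing DAG imports keep working
> > > from airflow.providers.google.cloud.operators.gcs import GCSCreateBucketOperator
> > >
> > > warnings.warn(
> > >     "This module is deprecated. Please use "
> > >     "airflow.providers.google.cloud.operators.gcs instead.",
> > >     DeprecationWarning,
> > >     stacklevel=2,
> > > )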
> > >
> > > J.
> > >
> > >
> > >
> > > Cheers,
> > > Bas
> > >
> > > On 27 Oct 2019, at 08:33, Jarek Potiuk <
> > > jarek.pot...@polidea.com
> > > <mailto:
> > > jarek.pot...@polidea.com>> wrote:
> > >
> > > Hello - any comments on that? I am happy to make it into an AIP :)?
> > >
> > > On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <
> > > jarek.pot...@polidea.com
> > > <mailto:jarek.pot...@polidea.com>>
> > > wrote:
> > >
> > > *Motivation*
> > >
> > > I think we really should start thinking about making it easier for our
> > > users to migrate to 2.0. After implementing some recent changes related
> > > to AIP-21 - Changes in import paths
> > > (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths)
> > > I think I have an idea that might help with it.
> > >
> > > *Proposal*
> > >
> > > We could package some of the new and improved 2.0 operators (moved to
> > > the "providers" package) and let them be used in a Python 3 environment
> > > of Airflow 1.10.x.
> > >
> > > This can be done case-by-case per "cloud provider". It should not be
> > > obligatory and should be largely driven by each provider. It's not yet
> > > the full AIP-8 - Split Hooks/Operators into separate packages
> > > (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303).
> > > It's merely backporting some operators/hooks to get them to work in
> > > 1.10. But by doing it we might try out the concept of splitting, learn
> > > about maintenance problems and maybe implement the full *AIP-8*
> > > approach in 2.1 consistently across the board.
> > >
> > > *Context*
> > >
> > > Part of AIP-21 was to move the import paths for Cloud providers to a
> > > separate providers/<PROVIDER> package. An example of that (the first
> > > provider we have already almost migrated) is the providers/google
> > > package (further divided into gcp/gsuite etc.).
> > >
> > > We've done a massive migration of all the Google-related operators,
> > > created a few missing ones and retrofitted some old operators to follow
> > > GCP best practices, fixing a number of problems along the way - also
> > > implementing Python 3 and Pylint compatibility. Some of these
> > > operators/hooks are not backwards compatible. Those that are compatible
> > > are still available via the old imports with a deprecation warning.
> > >
> > > We've added missing tests (including system tests) and missing features
> > > - improving some of the Google operators - giving the users more
> > > capabilities and fixing some issues. Those operators should pretty much
> > > "just work" in Airflow 1.10.x (any recent version) for Python 3. We
> > > should be able to release a separate pip-installable package for those
> > > operators that users should be able to install in Airflow 1.10.x.
> > >
> > > Any user will be able to install this separate package in their Airflow
> > > 1.10.x installation and start using those new "provider" operators in
> > > parallel to the old 1.10.x operators. Other providers ("microsoft",
> > > "amazon") might follow the same approach if they want. We could even at
> > > some point decide to move some of the core operators in a similar
> > > fashion (for example following the structure proposed in the latest
> > > documentation: fundamentals / software / etc. -
> > > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html).
> > >
> > > *Pros and cons*
> > >
> > > There are a number of pros:
> > >
> > >  - Users will have an easier migration path if they are deeply vested
> > >    in a 1.10.* version
> > >  - It's possible to migrate in stages for people who are also vested in
> > >    py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 +
> > >    2.0*
> > >  - Moving to the new operators in the "py3 + new operators" stage can
> > >    be done gradually. Old operators will continue to work while the new
> > >    ones can be used more and more
> > >  - People will get incentivised to migrate to Python 3 before 2.0 is
> > >    out (by using the new operators)
> > >  - Each provider "package" can have an independent release schedule -
> > >    and add functionality in already-released Airflow versions.
> > >  - We do not take any functionality away from the users - we just add
> > >    more options
> > >  - The releases can be - similarly to the main Airflow releases - voted
> > >    separately by the PMC after the "stewards" of the package (per
> > >    provider) perform a round of testing on 1.10.* versions.
> > >  - Users will start migrating to the new operators earlier and have a
> > >    smoother switch to 2.0 later
> > >  - The latest improved operators will start
> > >
> > > There are three cons I can think of:
> > >
> > >  - There will be quite a lot of duplication between the old and new
> > >    operators (they will co-exist in 1.10). That might lead to confusion
> > >    for users and problems with cooperation between different
> > >    operators/hooks
> > >  - Having the new operators in 1.10 Python 3 might keep people from
> > >    migrating to 2.0
> > >  - It will require some maintenance and separate release overhead.
> > >
> > > I already spoke to the Composer team @Google and they are very positive
> > > about this. I also spoke to Ash and it seems it might also be OK for
> > > the Astronomer team. We have Google's backing and support, and we can
> > > provide maintenance and support for those packages - being an example
> > > for other providers of how they can do it.
> > >
> > > Let me know what you think - and whether I should make it into an
> > > official AIP maybe?
> > >
> > > J.
> > >
> > >
> > >
> >
> >
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
