Thanks Ash! It seems it works really well and is super simple! I have a POC working for Airflow: https://github.com/apache/airflow/pull/6507
I managed to build and pip-install two packages:

1) apache_airflow 2.0, which is the same as today: it contains everything, including providers and gcp.
2) apache-airflow-providers-google, which has apache-airflow-1.10.* as an installation prerequisite.

I managed to actually schedule the example_gcp_pubsub DAG from airflow.providers.google.example_dags, which uses the airflow.providers.google.cloud.operators.pubsub operators, and the results are attached (hope you can see the pictures). It worked very nicely: when I just did 'pip install apache-airflow-providers-google', it downloaded and installed apache-airflow-1.10.6 from pip, plus all prerequisites from the [gcp] extra (which I added as needed for the google package).

So we seem to have a working solution now. I will cast a final vote for what I think is now the consensus, as an update to AIP-21 (there is no point in creating a separate AIP).

J.

On Tue, Nov 5, 2019 at 11:34 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yes let's just do (1) for now.
>
> On Tue, Nov 5, 2019, 08:48 Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
> > Thanks Ash! It might indeed work. I will take it from there and try to make a POC PR with airflow.
> >
> > It's a bit different approach than google-python libraries (they keep all the libraries as separate sub-packages/mini projects inside the main project). The approach you propose is far less invasive in terms of changing structure of the main repo. I like it this way much more. It makes it much easier to import the project in an IDE even if it is less modular in nature.
> >
> > From what I understand with this structure - if it works - we have two options:
> >
> > (1) For Airflow 2.0 we will be able to install Airflow and all "integrations" in a single (apache-airflow == 2.0.0) package and build separate backporting integration packages for 1.10.* only.
> > (2) We will split Airflow 2.0 into separate "core" and "integration" packages as well while preparing packages.
> >
> > I think (1) is a bit more reasonable for now, until we work out the full AIP-8 solution (including solving dependency hell). Let me know what you think (and others as well).
> >
> > J.
> >
> > On Mon, Nov 4, 2019 at 9:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > > https://github.com/ashb/airflow-submodule-test
> > >
> > > That seems to work in any order things are installed, at least on python 3.7. I've had a stressful few days so I may have missed something. Please tell me if there's a case I've missed, or if this is not a suitable proxy for our situation.
> > >
> > > -a
> > >
> > > > On 4 Nov 2019, at 20:08, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > >
> > > > Pretty hard pass from me on airflow_ext. If it's released by airflow I want it to live under airflow.* (Anyone else is free to release packages under any namespace they choose)
> > > >
> > > > That said I think I've got something that works:
> > > >
> > > > /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/__init__.py module level code running
> > > > /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/providers/gcp/__init__.py module level code running
> > > >
> > > > Let me test it again in a few different cases etc.
> > > >
> > > > -a
> > > >
> > > > On 4 November 2019 14:00:24 GMT, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > > Hey Ash,
> > > >
> > > > Thanks for the offer. I must admit pkgutil and package namespaces are not the best documented part of python.
> > > >
> > > > I dug a bit deeper and I found a similar problem - https://github.com/pypa/setuptools/issues/895.
> > > > Seems that even if it is not explicitly explained in pkgutil documentation, this comment (assuming it is right) explains everything:
> > > >
> > > > *"That's right. All parents of a namespace package must also be namespace packages, as they will necessarily share that parent name space (farm and farm.deps in this example)."*
> > > >
> > > > There are a few possibilities mentioned in the issue on how this can be worked around, but those are by far not perfect solutions. They would require patching already installed airflow's __init__.py to work - to manipulate the search path. Still, from my tests I do not know if this would be possible at all because of the non-trivial __init__.py we have (and use) in the *airflow* package.
> > > >
> > > > We have a few PRs now waiting for decision on that one I think, so maybe we can simply agree that we should use another package (I really like *"airflow_ext"* :D) and use it from now on? What do you (and others) think?
> > > >
> > > > I'd love to start voting on it soon.
> > > >
> > > > J.
> > > >
> > > > On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > >
> > > > Let me run some tests too - I've used them a bit in the past. I thought since we only want to make airflow.providers a namespace package it might work for us.
> > > >
> > > > Will report back next week.
> > > >
> > > > -ash
> > > >
> > > > On 31 October 2019 15:58:22 GMT, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > > The same repo (so mono-repo approach). All packages would be in "airflow_integrations" directory. It's mainly about moving the operators/hooks/sensor files to a different directory structure.
> > > >
> > > > It might be done pretty much without changing the current installation/development model:
> > > >
> > > > 1) We can add a setup.py command to install all the packages in -e mode in the main setup.py (to make it easier to install all deps in one go).
> > > > 2) We can add dependencies in setup.py extras to install appropriate packages. For example the [google] extra will require the 'apache-airflow-integrations-providers-google' package - or apache-airflow-providers-google if we decide to skip -integrations from the package name to make it shorter.
> > > >
> > > > The only potential drawback I see is a bit more involved setup of the IDE.
> > > >
> > > > This way installation method for both dev and prod remains simple.
> > > >
> > > > In the future we can have separate release schedule for the packages (AIP-8) but for now we can stick to the same version for 'apache-airflow' and 'apache-airflow-integrations-*' package (+ separate release schedule for backporting needs)
> > > >
> > > > Here again the structure of repo (we will likely be able to use native namespaces so I removed some needless __init__.py).
> > > >
> > > > |-- airflow
> > > > |   |- __init__.py
> > > > |   |- operators -> fundamental operators are here
> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > > |-- airflow_integrations
> > > > |   |-providers
> > > > |   |   |-google
> > > > |   |       |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > > |   |       |-airflow_integrations
> > > > |   |       |   |-providers
> > > > |   |       |       |-google
> > > > |   |       |           |-__init__.py
> > > > |   |       |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > > |   |           |-__init__.py
> > > > |   |-protocols
> > > > |       |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > > |       |-airflow_integrations
> > > > |       |   |-protocols
> > > > |       |       |-__init__.py
> > > > |       |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > > >
> > > > J.
> > > >
> > > > On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > >
> > > > So create another package in a different repo? or the same repo with a separate setup.py file that has airflow as dependency?
> > > >
> > > > On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > >
> > > > TL;DR; I did some more testing on how namespaces work. I still believe the only way to use namespaces is to have a separate (for example "airflow_integrations") package for all backportable packages.
> > > >
> > > > I am not sure if someone used namespaces before, but after reading and trying out, the main blocker seems to be that we have non-trivial code in airflow's "__init__.py" (including class definitions, imported sub-packages and plugin initialisation).
> > > >
> > > > Details are in https://packaging.python.org/guides/packaging-namespace-packages/ but it's a long one so let me summarize my findings:
> > > >
> > > > - In order to use "airflow.providers" package we would have to declare "airflow" as namespace
> > > > - It can be done in three different ways:
> > > >   - omitting __init__.py in this package (native/implicit namespace)
> > > >   - making the __init__.py of the "airflow" package in main airflow (and in the other packages) contain only "*__path__ = __import__('pkgutil').extend_path(__path__, __name__)*" (pkgutil style) or "*__import__('pkg_resources').declare_namespace(__name__)*" (pkg_resources style)
> > > >
> > > > The first is not possible (we already have __init__.py in "airflow"). The second case is not possible because we already have quite a lot in airflow's "__init__.py" and both pkgutil and pkg_resources style state:
> > > >
> > > > "*Every* distribution that uses the namespace package must include an identical *__init__.py*. If any distribution does not, it will cause the namespace logic to fail and the other sub-packages will not be importable. *Any additional code in __init__.py will be inaccessible."*
> > > >
> > > > I even tried to add those pkgutil/pkg_resources to airflow and do some experimenting with it - but it does not work. Pip install fails at the plugins_manager as "airflow.plugins" is not accessible (kind of expected), but I am sure there will be other problems as well. :(
> > > >
> > > > Basically - we cannot turn "airflow" into namespace because it has some "__init__.py" logic :(.
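
A minimal sketch of the pkgutil-style mechanism summarized above, assuming two hypothetical distributions that both ship the same top-level package. The names `demo_ns`, `core`, `providers`, `make_distribution` and `import_from_two_distributions` are illustrative only; they are not taken from Airflow or from the POC. Each distribution's `demo_ns/__init__.py` contains nothing but the `extend_path` line, which is the constraint the packaging guide describes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The single line that every distribution sharing the namespace ships (pkgutil style).
PKGUTIL_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root: Path, subpackage: str) -> None:
    """Create root/demo_ns/<subpackage>/ with a pkgutil-style namespace __init__.py."""
    namespace_dir = root / "demo_ns"
    package_dir = namespace_dir / subpackage
    package_dir.mkdir(parents=True)
    (namespace_dir / "__init__.py").write_text(PKGUTIL_INIT)
    (package_dir / "__init__.py").write_text(f"NAME = {subpackage!r}\n")


def import_from_two_distributions() -> list:
    """Import demo_ns.core and demo_ns.providers from two separate sys.path entries."""
    with tempfile.TemporaryDirectory() as tmp:
        dist_a = Path(tmp) / "dist_a"  # hypothetical "core" distribution
        dist_b = Path(tmp) / "dist_b"  # hypothetical "providers" distribution
        make_distribution(dist_a, "core")
        make_distribution(dist_b, "providers")
        sys.path[:0] = [str(dist_a), str(dist_b)]
        try:
            importlib.invalidate_caches()
            core = importlib.import_module("demo_ns.core")
            providers = importlib.import_module("demo_ns.providers")
            return [core.NAME, providers.NAME]
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(str(dist_a))
            sys.path.remove(str(dist_b))
            for name in [m for m in sys.modules if m == "demo_ns" or m.startswith("demo_ns.")]:
                del sys.modules[name]


if __name__ == "__main__":
    print(import_from_two_distributions())
```

Only the first `demo_ns/__init__.py` found on `sys.path` is executed, so any extra code placed in another distribution's copy would never run. That is the situation the thread describes for a package whose `__init__.py` carries real logic.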
> > > >
> > > > So I think it still holds that if we want to use namespaces, we should use another package. The *"airflow_integrations"* is the current candidate, but we can think of some nicer/shorter one: "airflow_ext", "airflow_int", "airflow_x", "airflow_mod", "airflow_next", "airflow_xt", "airflow_", "ext_airflow", .... Interestingly "airflow_" is the one suggested by PEP8 to avoid conflicts with Python names (which is a different case but kind of close).
> > > >
> > > > What do you think?
> > > >
> > > > J.
> > > >
> > > > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > >
> > > > The namespace feature looks promising and from your tests, it looks like it would work well from Airflow 2.0 and onwards.
> > > >
> > > > I will look at it in-depth and see if I have more suggestions or opinion on it
> > > >
> > > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > >
> > > > TL;DR; We did some testing about namespaces and packaging (and potential backporting options for 1.10.* python3 Airflows) and we think it's best to use namespaces quickly and use a different package name "airflow-integrations" for all non-fundamental integrations.
> > > >
> > > > Unless we missed some tricks, we cannot use airflow.* sub-packages for the 1.10.* backportable packages.
> > > > Example:
> > > >
> > > > - "*apache-airflow*" package provides: "airflow.*" (this is what we have today)
> > > > - "*apache-airflow-providers-google*": provides "airflow.providers.google.*" packages
> > > >
> > > > If we install both packages (old apache-airflow 1.10.6 and new apache-airflow-providers-google from 2.0) - it seems that the "airflow.providers.google.*" package cannot be imported. This is a bit of a problem if we would like to backport the operators from Airflow 2.0 to Airflow 1.10 in a way that will be forward-compatible. We really want users who started using backported operators in 1.10.* not to have to change imports in their DAGs to run them in Airflow 2.0.
> > > >
> > > > We discussed it internally in our team and considered several options, but we think the best way will be to go straight to "namespaces" in Airflow 2.0 and to have the integrations (as discussed in AIP-21 discussion) in a separate "*airflow_integrations*" package. It might be even more towards the AIP-8 implementation and plays together very well in terms of "stewardship" discussed in AIP-21 now. But we will still keep (for now) a single release process for all packages for 2.0 (except for the backporting which can be done per-provider before 2.0 release) and provide a foundation for more complex release cycles in future versions.
> > > >
> > > > Here is how the new Airflow 2.0 repository could look (I only show a subset of dirs but they are representative).
> > > > For those whose email fixed/color font will get corrupted, here is an image of this structure: https://pasteboard.co/IEesTih.png
> > > >
> > > > |-- airflow
> > > > |   |- __init__.py
> > > > |   |- operators -> fundamental operators are here
> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > > |-- airflow_integrations
> > > > |   |-providers
> > > > |   |   |-google
> > > > |   |       |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > > |   |       |-airflow_integrations
> > > > |   |       |   |-__init__.py
> > > > |   |       |   |-providers
> > > > |   |       |       |-__init__.py
> > > > |   |       |       |-google
> > > > |   |       |           |-__init__.py
> > > > |   |       |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > > |   |           |-__init__.py
> > > > |   |-protocols
> > > > |       |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > > |       |-airflow_integrations
> > > > |       |   |-protocols
> > > > |       |       |-__init__.py
> > > > |       |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > > >
> > > > There are a number of pros for this solution:
> > > >
> > > > - We could use the standard namespaces feature of python to build multiple packages: https://packaging.python.org/guides/packaging-namespace-packages/
> > > > - Installation for users will be the same as previously.
> > > >   We could install the needed packages automatically when particular extras are used (pip install apache-airflow[google] could install both "apache-airflow" and "apache-airflow-integrations-providers-google")
> > > > - We could have a custom setup.py installation process for developers that could install all the packages in development ("-e ." mode) in a single operation.
> > > > - In case of transfer packages we could have nice error messages informing that the other package needs to be installed (for example the S3->GCS operator would import "airflow_integrations.providers.amazon.*" and if it fails it could raise "Please install [amazon] extra to use me.")
> > > > - We could implement numerous optimisations in the way we run tests in CI (for example run all the "providers" tests only with sqlite, run tests in parallel etc.)
> > > > - We could implement it gradually - we do not have to have a "big bang" approach - we can implement it in a "provider-by-provider" way and test it with one provider (Google) first to make sure that all the mechanisms are working
> > > > - For now we could have the monorepo approach where all the packages will be developed in concert - for now avoiding the dependency problems (but allowing for back-portability to 1.10).
> > > > - We will have clear boundaries between packages and ability to test for some unwanted/hidden dependencies between packages.
> > > > - We could switch to (much better) sphinx-apidoc package to continue building single documentation for all of those (sphinx apidoc has support for namespaces).
> > > >
> > > > As we are working on GCP move from contrib to core, we could make all the effort to test it and try it before we merge it to master so that it will be ready for others (and we could help with most of the moves afterwards). It seems complex, but in fact in most cases it will be a very simple move between the packages and can be done incrementally, so there is little risk in doing this I think.
> > > >
> > > > J.
> > > >
> > > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com> wrote:
> > > >
> > > > Tomasz and Ash got good points about the overhead of having separate repos. But while we grow bigger and more mature, I would prefer to have what was described in AIP-8. It shouldn't be extremely hard for us to come up with good strategies to handle the overhead. AIP-8 already talked about how it can benefit us. IMO on a high level, having a clear separation of core vs. hooks/operators would make the project much more scalable and the gains would outweigh the cost we pay.
> > > >
> > > > That being said, I'm supportive of this "moving towards AIP-8 while learning" approach, quite a good practice to tackle a big project. Looking forward to reading the AIP.
> > > >
> > > > Cheers,
> > > > Kevin Y
> > > >
> > > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > >
> > > > We are checking how we can use namespaces in a back-portable way and we will have a POC soon so that we all will be able to see what it will look like.
> > > >
> > > > J.
> > > >
> > > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> > > >
> > > > I'll have to read your proposal in detail (sorry, no time right now!), but I'm broadly in favour of this approach, and I think keeping them _in_ the same repo is the best plan -- that makes writing and testing cross-cutting changes easier.
> > > >
> > > > -a
> > > >
> > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <tomasz.urbas...@polidea.com> wrote:
> > > >
> > > > I think utilizing namespaces should reduce a lot of problems raised by using separate repos (who will manage it? how to release? where should the repo be?).
> > > >
> > > > Bests,
> > > > Tomek
> > > >
> > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > >
> > > > Thanks Bas for comments! Let me share my thoughts below.
> > > >
> > > > On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <basharens...@godatadriven.com> wrote:
> > > >
> > > > Hi Jarek, I definitely see a future in creating separate installable packages for various operators/hooks/etc (as in AIP-8).
> > > > This would IMO strip the “core” Airflow to only what’s needed and result in a small package without a ton of dependencies (and make it more maintainable, shorter tests, etc etc etc). Not exactly sure though what you’re proposing in your e-mail, is it a new AIP for an intermediate step towards AIP-8?
> > > >
> > > > It's a new AIP I am proposing. For now it's only for backporting the new 2.0 import paths to 1.10.* series.
> > > >
> > > > It's more of "incremental going in direction of AIP-8 and learning some difficulties involved" than implementing AIP-8 fully. We are taking advantage of changes in import paths from AIP-21 which make it possible to have both old and new (optional) operators available in 1.10.* series of Airflow. I think there is a lot more to do for full implementation of AIP-8: decisions how to maintain, install those operator groups separately, stewardship model/organisation for the separate groups, how to manage cross-dependencies, procedures for releasing the packages etc.
> > > >
> > > > I think about this new AIP also as a learning effort - we would learn more how separate packaging works and then we can follow up with AIP-8 full implementation for "modular" Airflow. Then AIP-8 could be implemented in Airflow 2.1 for example - or 3.0 if we start following semantic versioning - based on those learnings.
> > > > It's a bit of a good example of having cake and eating it too. We can try out modularity in 1.10.* while cutting the scope of 2.0 and not implementing full management/release procedure for AIP-8 yet.
> > > >
> > > > Thinking about this, I think there are still a few grey areas (which would be good to discuss in a new AIP, or continue on AIP-8):
> > > >
> > > > * In your email you speak only about the 3 big cloud providers (btw I made a PR for migrating all AWS components -> https://github.com/apache/airflow/pull/6439). Is there a plan for splitting other components than Google/AWS/Azure?
> > > >
> > > > We could add more groups as part of this new AIP indeed (as an extension to AIP-21 and pre-requisite to AIP-8). We already see how moving/deprecation works for the providers package - it works for GCP/Google rather nicely. But there is nothing to prevent us from extending it to cover other groups of operators/hooks.
> > > > If you look at the current structure of documentation done by Kamil, we can follow the structure there and move the operators/hooks accordingly (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> > > >
> > > > Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service integrations, Software integrations, Protocol integrations.
> > > >
> > > > I am happy to include that in the AIP - if others agree it's a good idea. Out of those groups - I think only Fundamentals should not be back-ported. Others should be rather easy to port (if we decide to). We already have quite a lot of those in the new GCP operators for 2.0. So starting with GCP/Google group is a good idea. Also following with Cloud Providers first is a good thing. For example we have now support from Google Composer team to do this separation for GCP (and we learn from it) and then we can claim the stewardship in our team for releasing the python 3/Airflow 1.10-compatible "airflow-google" packages. Possibly other Cloud Providers/teams might follow this (if they see the value in it) and there could be different stewards for those. And then we can do other groups if we decide to. I think this way we can learn whether AIP-8 is manageable and what real problems we are going to face.
> > > >
> > > > * Each “plugin” e.g. GCP would be a separate repo, should we create some sort of blueprint for such packages?
> > > >
> > > > I think we do not need separate repos (at all) but in this new AIP we can test it before we decide to go for AIP-8. IMHO - monorepo approach will work here rather nicely. We could use python-3 native namespaces <https://packaging.python.org/guides/packaging-namespace-packages/> for the sub-packages when we go full AIP-8. For now we could simply package the new operators in a separate pip package for Python 3 version 1.10.* series only. We only need to test if it works well with another package providing 'airflow.providers.*' after apache-airflow is installed (providing 'airflow' package). But I think we can make it work. I don't think we really need to split the repos, namespaces will work just fine and have easier management of cross-repository dependencies (but we can learn otherwise). For sure we will not need it for the new proposed AIP of backporting groups to 1.10 and we can defer that decision to AIP-8 implementation time.
> > > >
> > > > * In which Airflow version do we start raising deprecation warnings and in which version would we remove the original?
> > > >
> > > > I think we should do what we did in GCP case already.
> > > > Those old "imports" for operators can be made deprecated in Airflow 2.0 (and removed in 2.1 or 3.0 if we start following semantic versioning). We can however do it before in 1.10.7 or 1.10.8 if we release those (without removing the old operators yet - just raise deprecation warnings and inform that for python3 the new "airflow-google", "airflow-aws" etc. packages can be installed and users can switch to it).
> > > >
> > > > J.
> > > >
> > > > Cheers,
> > > > Bas
> > > >
> > > > On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > >
> > > > Hello - any comments on that? I am happy to make it into an AIP :)?
> > > >
> > > > On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> > > >
> > > > *Motivation*
> > > >
> > > > I think we really should start thinking about making it easier to migrate to 2.0 for our users. After implementing some recent changes related to AIP-21 - Changes in import paths <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths> I think I have an idea that might help with it.
> > > >
> > > > *Proposal*
> > > >
> > > > We could package some of the new and improved 2.0 operators (moved to "providers" package) and let them be used in Python 3 environment of airflow 1.10.x.
> > > >
> > > > This can be done case-by-case per "cloud provider". It should not be obligatory, should be largely driven by each provider. It's not yet full AIP-8 Split Hooks/Operators into separate packages <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>. It's merely backporting of some operators/hooks to get them to work in 1.10. But by doing it we might try out the concept of splitting, learn about maintenance problems and maybe implement full *AIP-8* approach in 2.1 consistently across the board.
> > > >
> > > > *Context*
> > > >
> > > > Part of the AIP-21 was to move import paths for Cloud providers to separate providers/<PROVIDER> package. An example for that (the first provider we already almost migrated) was providers/google package (further divided into gcp/gsuite etc).
> > > >
> > > > We've done a massive migration of all the Google-related operators, created a few missing ones and retrofitted some old operators to follow GCP best practices, fixing a number of problems - also implementing Python3 and Pylint compatibility.
> > > > Some of these operators/hooks are not backwards compatible. Those
> > > > that are compatible are still available via the old imports with
> > > > deprecation warning.
> > > >
> > > > We've added missing tests (including system tests) and missing
> > > > features - improving some of the Google operators - giving the
> > > > users more capabilities and fixing some issues. Those operators
> > > > should pretty much "just work" in Airflow 1.10.x (any recent
> > > > version) for Python 3. We should be able to release a separate
> > > > pip-installable package for those operators that users should be
> > > > able to install in Airflow 1.10.x.
> > > >
> > > > Any user will be able to install this separate package in their
> > > > Airflow 1.10.x installation and start using those new "provider"
> > > > operators in parallel to the old 1.10.x operators. Other providers
> > > > ("microsoft", "amazon") might follow the same approach if they
> > > > want. We could even at some point decide to move some of the core
> > > > operators in similar fashion (for example following the structure
> > > > proposed in the latest documentation: fundamentals / software / etc.
> > > > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> > > >
> > > > *Pros and cons*
> > > >
> > > > There are a number of pros:
> > > >
> > > > - Users will have an easier migration path if they are deeply
> > > >   invested in 1.10.* version
> > > > - It's possible to migrate in stages for people who are also
> > > >   invested in py2: *py2 (1.10) -> py3 (1.10) -> py3 + new
> > > >   operators (1.10) -> py3 + 2.0*
> > > > - Moving to new operators in py3 + new operators can be done
> > > >   gradually. Old operators will continue to work while new can be
> > > >   used more and more
> > > > - People will get incentivised to migrate to python 3 before 2.0
> > > >   is out (by using new operators)
> > > > - Each provider "package" can have independent release schedule -
> > > >   and add functionality in already released Airflow versions.
> > > > - We do not take out any functionality from the users - we just
> > > >   add more options
> > > > - The releases can be - similarly as main airflow releases - voted
> > > >   separately by PMC after "stewards" of the package (per provider)
> > > >   perform round of testing on 1.10.* versions.
> > > > - Users will start migrating to new operators earlier and have
> > > >   smoother switch to 2.0 later
> > > > - The latest improved operators will start
> > > >
> > > > There are three cons I could think of:
> > > >
> > > > - There will be quite a lot of duplication between old and new
> > > >   operators (they will co-exist in 1.10).
> > > >   That might lead to confusion of users and problems with
> > > >   cooperation between different operators/hooks
> > > > - Having new operators in 1.10 python 3 might keep people from
> > > >   migrating to 2.0
> > > > - It will require some maintenance and separate release overhead.
> > > >
> > > > I already spoke to Composer team @Google and they are very
> > > > positive about this. I also spoke to Ash and seems it might also
> > > > be OK for Astronomer team. We have Google's backing and support,
> > > > and we can provide maintenance and support for those packages -
> > > > being an example for other providers how they can do it.
> > > >
> > > > Let me know what you think - and whether I should make it into an
> > > > official AIP maybe?
> > > >
> > > > J.
> > > >
> > > > --
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/>
> > > >
> > > > --
> > > > Tomasz Urbaszek
> > > > Polidea
> > > > <https://www.polidea.com/> | Junior Software Engineer
> > > >
> > > > M: +48 505 628 493 <+48505628493>
> > > > E: tomasz.urbas...@polidea.com

--
Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea]
<https://www.polidea.com/>
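
The deprecation approach mentioned in the thread (old import paths keep working but raise a warning pointing at the new "providers" path) could look roughly like the sketch below. It is only an illustration under assumptions: the module names `demo_providers_pubsub` and `demo_contrib_pubsub`, the placeholder `PubSubPublishOperator` class and the `make_deprecated_alias` helper are hypothetical stand-ins, not Airflow's actual code or layout.

```python
import sys
import types
import warnings

# Hypothetical stand-in for a module at the NEW "providers" import path.
new_module = types.ModuleType("demo_providers_pubsub")


class PubSubPublishOperator:
    """Placeholder operator; the real one lives in Airflow, not here."""

    def __init__(self, task_id):
        self.task_id = task_id


new_module.PubSubPublishOperator = PubSubPublishOperator
sys.modules[new_module.__name__] = new_module


def make_deprecated_alias(old_name, new_mod):
    """Register a module under the OLD path that forwards to the new one.

    Every public attribute access emits a DeprecationWarning and then
    returns the object from the new module, so old imports keep working.
    """
    old_mod = types.ModuleType(old_name)

    def __getattr__(attr):
        # Do not warn on dunder probes made by the import machinery.
        if attr.startswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            f"{old_name}.{attr} is deprecated; "
            f"use {new_mod.__name__}.{attr} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(new_mod, attr)

    # Module-level __getattr__ (PEP 562, Python 3.7+) handles missing names.
    old_mod.__getattr__ = __getattr__
    sys.modules[old_name] = old_mod
    return old_mod


# Hypothetical old path, kept alive only as a warning-raising alias.
old_module = make_deprecated_alias("demo_contrib_pubsub", new_module)
```

With this in place, code that still does `from demo_contrib_pubsub import PubSubPublishOperator` gets the same class as the new path plus a `DeprecationWarning`, which is the "keep old imports, warn, remove later" behaviour discussed above.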