Hey Ash,

Thanks for the offer. I must admit that pkgutil and package namespaces are not the best-documented part of Python.
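To make the behaviour concrete, here is a minimal sketch (not Airflow code) of what happens when two distributions share a top-level package. The package names `demo_ok` and `demo_bad`, the directory names, and the helper functions are all made up for illustration. The only real API used is the pkgutil-style `extend_path` one-liner. The sketch assumes the distribution with ordinary code in its `__init__.py` comes first on `sys.path`, which is roughly the situation with an already installed "airflow" package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# pkgutil-style namespace declaration (the one-liner from the packaging guide).
PKGUTIL_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def write_package(dist_dir: Path, dotted_name: str, top_init: str) -> None:
    """Create the package dotted_name under dist_dir (hypothetical helper).

    The top-level __init__.py receives top_init; nested ones are left empty.
    """
    pkg_dir = dist_dir
    for depth, part in enumerate(dotted_name.split(".")):
        pkg_dir = pkg_dir / part
        pkg_dir.mkdir(parents=True, exist_ok=True)
        (pkg_dir / "__init__.py").write_text(top_init if depth == 0 else "")


def can_import(name: str) -> bool:
    """Return True if the module can be imported, False on ImportError."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def run_demo() -> dict:
    """Build four throwaway 'distributions' and report which imports work."""
    names = ("demo_ok.core", "demo_ok.providers",
             "demo_bad.core", "demo_bad.providers")
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Case 1: both distributions ship the identical pkgutil-style __init__.py.
        write_package(root / "ok_core", "demo_ok.core", PKGUTIL_INIT)
        write_package(root / "ok_extra", "demo_ok.providers", PKGUTIL_INIT)
        # Case 2: the "core" distribution has ordinary code in __init__.py,
        # so the top-level package is not a namespace package.
        write_package(root / "bad_core", "demo_bad.core", "VERSION = '1.10'\n")
        write_package(root / "bad_extra", "demo_bad.providers", PKGUTIL_INIT)

        added = [str(root / d)
                 for d in ("ok_core", "ok_extra", "bad_core", "bad_extra")]
        sys.path[:0] = added  # the "core" directories come first
        importlib.invalidate_caches()
        try:
            for name in names:
                results[name] = can_import(name)
        finally:
            # Undo the changes to the interpreter state.
            for entry in added:
                sys.path.remove(entry)
            for mod in [m for m in sys.modules
                        if m.split(".")[0] in ("demo_ok", "demo_bad")]:
                del sys.modules[mod]
    return results


if __name__ == "__main__":
    for module_name, ok in run_demo().items():
        print(f"{module_name}: {'importable' if ok else 'NOT importable'}")
```

In this sketch only `demo_bad.providers` fails to import: the regular `demo_bad/__init__.py` is found first and its `__path__` is never extended, so the second distribution's portion is invisible.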
I dug a bit deeper and found a similar problem: https://github.com/pypa/setuptools/issues/895. Even though it is not explicitly explained in the pkgutil documentation, this comment (assuming it is right) explains everything: *"That's right. All parents of a namespace package must also be namespace packages, as they will necessarily share that parent name space (farm and farm.deps in this example)."*

There are a few workarounds mentioned in the issue, but they are far from perfect. They would require patching the already installed airflow's __init__.py to manipulate the search path. From my tests, I still do not know whether this would be possible at all, because of the non-trivial __init__.py we have (and use) in the *airflow* package.

We have a few PRs now waiting for a decision on this one, I think, so maybe we can simply agree that we should use another package (I really like *"airflow_ext"* :D) and use it from now on? What do you (and others) think? I'd love to start voting on it soon.

J.

On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> Let me run some tests too - I've used them a bit in the past. I thought
> since we only want to make airflow.providers a namespace package it might
> work for us.
>
> Will report back next week.
>
> -ash
>
> On 31 October 2019 15:58:22 GMT, Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
> >The same repo (so mono-repo approach). All packages would be in
> >"airflow_integrations" directory. It's mainly about moving the
> >operators/hooks/sensor files to different directory structure.
> >
> >It might be done pretty much without changing the current
> >installation/development model:
> >
> >1) We can add setup.py command to install all the packages in -e mode
> >in the main setup.py (to make it easier to install all deps in one go).
> >2) We can add dependencies in setup.py extras to install appropriate
> >packages.
For example [google] extra will 'require > >apache-airflow-integrations-providers-google' package - or > >apache-airflow-providers-google if we decide to skip -integrations from > >the > >package name to make it shorter. > > > >The only potential drawback I see is a bit more involved setup of the > >IDE. > > > >This way installation method for both dev and prod remains simple. > > > >In the future we can have separate release schedule for the packages > >(AIP-8) but for now we can stick to the same version for > >'apache-airflow' > >and 'apache-airflow-integrations-*' package (+ separate release > >schedule > >for backporting needs) > >Here again the structure of repo (we will likely be able to use native > >namespaces so I removed some needles __init__.py). > > > >|-- airflow > >| |- __init__.py| |- operators -> fundamental operators are here > >|-- tests -> tests for core airflow are here (optionally we can move > >them under "airflow")|-- setup.py -> setup.py for the "apache-airflow" > >package|-- airflow_integrations > >| |-providers > >| | |-google > >| | |-setup.py -> setup.py for the > >"apache-airflow-integrations-providers-google" package > >| | |-airflow_integrations > >| | |-providers > >| | |-google > >| | |-__init__.py > >| | | tests -> tests for the > >"apache-airflow-integrations-providers-google" package| | > >|-__init__.py| |-protocols > >| |-setup.py -> setup.py for the > >"apache-airflow-integrations-protocols" package > >| |-airflow_integrations > >| |-protocols > >| |-__init__.py| |-tests -> tests for the > >"apache-airflow-integrations-protocols" package > > > > > >J. > > > >On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <kaxiln...@gmail.com> wrote: > > > >> So create another package in a different repo? or the same repo with > >a > >> separate setup.py file that has airflow has dependency? 
> >> > >> > >> > >> > >> On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk > ><jarek.pot...@polidea.com> > >> wrote: > >> > >> > TL;DR; I did some more testing on how namespaces work. I still > >believe > >> the > >> > only way to use namespaces is to have separate (for example > >> > "airflow_integrations") package for all backportable packages. > >> > > >> > I am not sue if someone used namespaces before, but after reading > >and > >> > trying out , the main blocker seems to be that we have non-trivial > >code > >> in > >> > airflow's "__init__.py" (including class definitions, imported > >> > sub-packages and plugin initialisation). > >> > > >> > Details are in > >> > https://packaging.python.org/guides/packaging-namespace-packages/ > >but > >> it's > >> > a long one so let me summarize my findings: > >> > > >> > - In order to use "airflow.providers" package we would have to > >declare > >> > "airflow" as namespace > >> > - It can be done in three different ways: > >> > - omitting __init__.py in this package (native/implicit > >namespace) > >> > - making __init__.py of the "airflow" package in main > >airflow (and > >> > other packages) must be "*__path__ = > >> > __import__('pkgutil').extend_path(__path__, __name__)*" > >(pkgutil > >> > style) or > >> "*__import__('pkg_resources').declare_namespace(__name__)*" > >> > (pkg_resources style) > >> > > >> > The first is not possible (we already have __init__.py in > >"airflow". > >> > The second case is not possible because we already have quite a lot > >in > >> the > >> > airflow's "__init__.py" and both pkgutil and pkg_resources style > >state: > >> > > >> > "*Every* distribution that uses the namespace package must include > >an > >> > identical *__init__.py*. If any distribution does not, it will > >cause the > >> > namespace logic to fail and the other sub-packages will not be > >> importable. 
> >> > *Any > >> > additional code in __init__.py will be inaccessible."* > >> > > >> > I even tried to add those pkgutil/pkg_resources to airflow and do > >some > >> > experimenting with it - but it does not work. Pip install fails at > >the > >> > plugins_manager as "airflow.plugins" is not accessible (kind of > >> expected), > >> > but I am sure there will be other problems as well. :( > >> > > >> > Basically - we cannot turn "airflow" into namespace because it has > >some > >> > "__init__.py" logic :(. > >> > > >> > So I think it still holds that if we want to use namespaces, we > >should > >> use > >> > another package. The *"airflow_integrations"* is current candidate, > >but > >> we > >> > can think of some nicer/shorter one: "airflow_ext", "airflow_int", > >> > "airflow_x", "airflow_mod", "airlfow_next", "airflow_xt", > >"airflow_", > >> > "ext_airflow", .... Interestingly "airflow_" is the one suggested > >by > >> PEP8 > >> > to avoid conflicts with Python names (which is a different case but > >kind > >> of > >> > close). > >> > > >> > What do you think? > >> > > >> > J. > >> > > >> > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <kaxiln...@gmail.com> > >wrote: > >> > > >> > > The namespace feature looks promising and from your tests, it > >looks > >> like > >> > it > >> > > would work well from Airflow 2.0 and onwards. > >> > > > >> > > I will look at it in-depth and see if I have more suggestions or > >> opinion > >> > on > >> > > it > >> > > > >> > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk > ><jarek.pot...@polidea.com > >> > > >> > > wrote: > >> > > > >> > > > TL;DR; We did some testing about namespaces and packaging (and > >> > potential > >> > > > backporting options for 1.10.* python3 Airflows) and we think > >it's > >> best > >> > > to > >> > > > use namespaces quickly and use different package name > >> > > > "airflow-integrations" for all non-fundamental integrations. 
> >> > > > > >> > > > Unless we missed some tricks, we cannot use airflow.* > >sub-packages > >> for > >> > > the > >> > > > 1.10.* backportable packages. Example: > >> > > > > >> > > > - "*apache-airflow"* package provides: "airflow.*" (this is > >what > >> we > >> > > have > >> > > > today) > >> > > > - "*apache-airflow-providers-google*": provides > >> > > > "airflow.providers.google.*" packages > >> > > > > >> > > > If we install both packages (old apache-airflow 1.10.6 and new > >> > > > apache-airflow-providers-google from 2.0) - it seems that > >> > > > the "airflow.providers.google.*" package cannot be imported. > >This is > >> a > >> > > bit > >> > > > of a problem if we would like to backport the operators from > >Airflow > >> > 2.0 > >> > > to > >> > > > Airflow 1.10 in a way that will be forward-compatible We really > >want > >> > > users > >> > > > who started using backported operators in 1.10.* do not have to > >> change > >> > > > imports in their DAGs to run them in Airflow 2.0. > >> > > > > >> > > > We discussed it internally in our team and considered several > >> options, > >> > > but > >> > > > we think the best way will be to go straight to "namespaces" in > >> Airflow > >> > > 2.0 > >> > > > and to have the integrations (as discussed in AIP-21 > >discussion) to > >> be > >> > > in a > >> > > > separate "*airflow_integrations*" package. It might be even > >more > >> > towards > >> > > > the AIP-8 implementation and plays together very well in terms > >of > >> > > > "stewardship" discussed in AIP-21 now. But we will still keep > >(for > >> now) > >> > > > single release process for all packages for 2.0 (except for the > >> > > backporting > >> > > > which can be done per-provider before 2.0 release) and provide > >a > >> > > foundation > >> > > > for future more complex release cycles in future versions. 
> >> > > > > >> > > > Herre is the way how the new Airflow 2.0 repository could look > >like > >> (i > >> > > only > >> > > > show subset of dirs but they are representative). For those > >whose > >> email > >> > > > fixed/colorfont will get corrupted here is an image of this > >structure > >> > > > https://pasteboard.co/IEesTih.png: > >> > > > > >> > > > |-- airflow > >> > > > | |- __init__.py| |- operators -> fundamental operators are > >here > >> > > > |-- tests -> tests for core airflow are here (optionally we can > >move > >> > > > them under "airflow")|-- setup.py -> setup.py for the > >> "apache-airflow" > >> > > > package|-- airflow_integrations > >> > > > | |-providers > >> > > > | | |-google > >> > > > | | |-setup.py -> setup.py for the > >> > > > "apache-airflow-integrations-providers-google" package > >> > > > | | |-airflow_integrations > >> > > > | | |-__init__.py > >> > > > | | |-providers > >> > > > | | |-__init__.py > >> > > > | | |-google > >> > > > | | |-__init__.py > >> > > > | | | tests -> tests for the > >> > > > "apache-airflow-integrations-providers-google" package| | > >> > > > |-__init__.py| |-protocols > >> > > > | |-setup.py -> setup.py for the > >> > > > "apache-airflow-integrations-protocols" package > >> > > > | |-airflow_integrations > >> > > > | |-protocols > >> > > > | |-__init__.py| |-tests -> tests for the > >> > > > "apache-airflow-integrations-protocols" package > >> > > > > >> > > > There are a number of pros for this solution: > >> > > > > >> > > > - We could use the standard namespaces feature of python to > >build > >> > > > multiple packages: > >> > > > > >https://packaging.python.org/guides/packaging-namespace-packages/ > >> > > > - Installation for users will be the same as previously. 
We > >could > >> > > > install the needed packages automatically when particular > >extras > >> are > >> > > > used > >> > > > (pip install apache-airflow[google] could install both > >> > > "apache-airflow" > >> > > > and > >> > > > "apache-airflow-integrations-providers-google") > >> > > > - We could have custom setup.py installation process for > >> developers > >> > > that > >> > > > could install all the packages in development ("-e ." mode) > >in a > >> > > single > >> > > > operation. > >> > > > - In case of transfer packages we could have nice error > >messages > >> > > > informing that the other package needs to be installed (for > >> example > >> > > > S3->GCS > >> > > > operator would import > >"airflow-integrations.providers.amazon.*" > >> and > >> > if > >> > > > it > >> > > > fails it could raise ("Please install [amazon] extra to use > >me.") > >> > > > - We could implement numerous optimisations in the way how > >we run > >> > > tests > >> > > > in CI (for example run all the "providers" tests only with > >sqlite, > >> > run > >> > > > tests in parallel etc.) > >> > > > - We could implement it gradually - we do not have to have a > >"big > >> > > bang" > >> > > > approach - we can implement it in "provider-by-provider" way > >and > >> > test > >> > > it > >> > > > with one provider (Google) first to make sure that all the > >> > mechanisms > >> > > > are > >> > > > working > >> > > > - For now we could have the monorepo approach where all the > >> packages > >> > > > will be developed in concert - for now avoiding the > >dependency > >> > > problems > >> > > > (but allowing for back-portability to 1.10). > >> > > > - We will have clear boundaries between packages and ability > >to > >> test > >> > > for > >> > > > some unwanted/hidden dependencies between packages. 
> >> > > > - We could switch to (much better) sphinx-apidoc package to > >> continue > >> > > > building single documentation for all of those (sphinx > >apidoc has > >> > > > support > >> > > > for namespaces). > >> > > > > >> > > > As we are working on GCP move from contrib to core, we could > >make all > >> > the > >> > > > effort to test it and try it before we merge it to master so > >that it > >> > will > >> > > > be ready for others (and we could help with most of the moves > >> > > afterwards). > >> > > > It seems complex, but in fact in most cases it will be very > >simple > >> move > >> > > > between the packages and can be done incrementally so there is > >little > >> > > risk > >> > > > in doing this I think. > >> > > > > >> > > > J. > >> > > > > >> > > > > >> > > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com> > >> wrote: > >> > > > > >> > > > > Tomasz and Ash got good points about the overhead of having > >> separate > >> > > > repos. > >> > > > > But while we grow bigger and more mature, I would prefer to > >have > >> what > >> > > was > >> > > > > described in AIP-8. It shouldn't be extremely hard for us to > >come > >> up > >> > > with > >> > > > > good strategies to handle the overhead. AIP-8 already talked > >about > >> > how > >> > > it > >> > > > > can benefit us. IMO on a high level, having clearly > >seperation on > >> > core > >> > > > vs. > >> > > > > hooks/operators would make the project much more scalable and > >the > >> > gains > >> > > > > would outweigh the cost we pay. > >> > > > > > >> > > > > That being said, I'm supportive to this moving towards AIP-8 > >while > >> > > > learning > >> > > > > approach, quite a good practise to tackle a big project. > >Looking > >> > > forward > >> > > > to > >> > > > > read the AIP. 
> >> > > > > > >> > > > > > >> > > > > Cheers, > >> > > > > Kevin Y > >> > > > > > >> > > > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk < > >> > jarek.pot...@polidea.com > >> > > > > >> > > > > wrote: > >> > > > > > >> > > > > > We are checking how we can use namespaces in back-portable > >way > >> and > >> > we > >> > > > > will > >> > > > > > have POC soon so that we all will be able to see how it > >will look > >> > > like. > >> > > > > > > >> > > > > > J. > >> > > > > > > >> > > > > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor < > >> a...@apache.org> > >> > > > > wrote: > >> > > > > > > >> > > > > > > I'll have to read your proposal in detail (sorry, no time > >right > >> > > > now!), > >> > > > > > but > >> > > > > > > I'm broadly in favour of this approach, and I think > >keeping > >> them > >> > > _in_ > >> > > > > the > >> > > > > > > same repo is the best plan -- that makes writing and > >testing > >> > > > > > cross-cutting > >> > > > > > > changes easier. > >> > > > > > > > >> > > > > > > -a > >> > > > > > > > >> > > > > > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek < > >> > > > > tomasz.urbas...@polidea.com > >> > > > > > > > >> > > > > > > wrote: > >> > > > > > > > > >> > > > > > > > I think utilizing namespaces should reduce a lot of > >problems > >> > > raised > >> > > > > by > >> > > > > > > > using separate repos (who will manage it? how to > >release? > >> where > >> > > > > should > >> > > > > > be > >> > > > > > > > the repo?). > >> > > > > > > > > >> > > > > > > > Bests, > >> > > > > > > > Tomek > >> > > > > > > > > >> > > > > > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk < > >> > > > > > jarek.pot...@polidea.com> > >> > > > > > > > wrote: > >> > > > > > > > > >> > > > > > > >> Thanks Bas for comments! Let me share my thoughts > >below. 
> >> > > > > > > >> > >> > > > > > > >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak < > >> > > > > > > >> basharens...@godatadriven.com> > >> > > > > > > >> wrote: > >> > > > > > > >> > >> > > > > > > >>> Hi Jarek, I definitely see a future in creating > >separate > >> > > > > installable > >> > > > > > > >>> packages for various operators/hooks/etc (as in > >AIP-8). > >> This > >> > > > would > >> > > > > > IMO > >> > > > > > > >>> strip the “core” Airflow to only what’s needed and > >result > >> in > >> > a > >> > > > > small > >> > > > > > > >>> package without a ton of dependencies (and make it > >more > >> > > > > maintainable, > >> > > > > > > >>> shorter tests, etc etc etc). Not exactly sure though > >what > >> > > you’re > >> > > > > > > >> proposing > >> > > > > > > >>> in your e-mail, is it a new AIP for an intermediate > >step > >> > > towards > >> > > > > > AIP-8? > >> > > > > > > >>> > >> > > > > > > >> > >> > > > > > > >> It's a new AIP I am proposing. For now it's only for > >> > > backporting > >> > > > > the > >> > > > > > > new > >> > > > > > > >> 2.0 import paths to 1.10.* series. > >> > > > > > > >> > >> > > > > > > >> It's more of "incremental going in direction of AIP-8 > >and > >> > > learning > >> > > > > > some > >> > > > > > > >> difficulties involved" than implementing AIP-8 fully. > >We are > >> > > > taking > >> > > > > > > >> advantage of changes in import paths from AIP-21 which > >make > >> it > >> > > > > > possible > >> > > > > > > to > >> > > > > > > >> have both old and new (optional) operators available > >in > >> 1.10.* > >> > > > > series > >> > > > > > of > >> > > > > > > >> Airflow. 
I think there is a lot more to do for full > >> > > implementation > >> > > > > of > >> > > > > > > >> AIP-8: decisions how to maintain, install those > >operator > >> > groups > >> > > > > > > separately, > >> > > > > > > >> stewardship model/organisation for the separate > >groups, how > >> to > >> > > > > manage > >> > > > > > > >> cross-dependencies, procedures for releasing the > >packages > >> etc. > >> > > > > > > >> > >> > > > > > > >> I think about this new AIP also as a learning effort - > >we > >> > would > >> > > > > learn > >> > > > > > > more > >> > > > > > > >> how separate packaging works and then we can follow up > >with > >> > > AIP-8 > >> > > > > full > >> > > > > > > >> implementation for "modular" Airflow. Then AIP-8 could > >be > >> > > > > implemented > >> > > > > > in > >> > > > > > > >> Airflow 2.1 for example - or 3.0 if we start following > >> > semantic > >> > > > > > > versioning > >> > > > > > > >> - based on those learnings. It's a bit of good example > >of > >> > having > >> > > > > cake > >> > > > > > > and > >> > > > > > > >> eating it too. We can try out modularity in 1.10.* > >while > >> > cutting > >> > > > the > >> > > > > > > scope > >> > > > > > > >> of 2.0 and not implementing full management/release > >> procedure > >> > > for > >> > > > > > AIP-8 > >> > > > > > > >> yet. > >> > > > > > > >> > >> > > > > > > >> > >> > > > > > > >>> Thinking about this, I think there are still a few > >grey > >> areas > >> > > > > (which > >> > > > > > > >> would > >> > > > > > > >>> be good to discuss in a new AIP, or continue on > >AIP-8): > >> > > > > > > >>> > >> > > > > > > >>> * In your email you only speak only about the 3 > >big > >> cloud > >> > > > > > providers > >> > > > > > > >>> (btw I made a PR for migrating all AWS components -> > >> > > > > > > >>> https://github.com/apache/airflow/pull/6439). Is > >there a > >> > plan > >> > > > for > >> > > > > > > >>> splitting other components than Google/AWS/Azure? 
> >> > > > > > > >>> > >> > > > > > > >> > >> > > > > > > >> We could add more groups as part of this new AIP > >indeed (as > >> an > >> > > > > > > extension to > >> > > > > > > >> AIP-21 and pre-requisite to AIP-8). We already see how > >> > > > > > > moving/deprecation > >> > > > > > > >> works for the providers package - it works for > >GCP/Google > >> > rather > >> > > > > > nicely. > >> > > > > > > >> But there is nothing to prevent us from extending it > >to > >> cover > >> > > > other > >> > > > > > > groups > >> > > > > > > >> of operators/hooks. If you look at the current > >structure of > >> > > > > > > documentation > >> > > > > > > >> done by Kamil, we can follow the structure there and > >move > >> the > >> > > > > > > >> operators/hooks accordingly ( > >> > > > > > > >> > >> > > > > > >> > > >https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html > >> > > > > > ): > >> > > > > > > >> > >> > > > > > > >> Fundamentals, ASF: Apache Software Foundation, > >Azure: > >> > > > Microsoft > >> > > > > > > >> Azure, AWS: Amazon Web Services, GCP: Google Cloud > >Platform, > >> > > > Service > >> > > > > > > >> integrations, Software integrations, Protocol > >integrations. > >> > > > > > > >> > >> > > > > > > >> I am happy to include that in the AIP - if others > >agree > >> it's a > >> > > > good > >> > > > > > > idea. > >> > > > > > > >> Out of those groups - I think only Fundamentals > >should not > >> be > >> > > > > > > back-ported. > >> > > > > > > >> Others should be rather easy to port (if we decide > >to). We > >> > > already > >> > > > > > have > >> > > > > > > >> quite a lot of those in the new GCP operators for 2.0. > >So > >> > > starting > >> > > > > > with > >> > > > > > > >> GCP/Google group is a good idea. Also following with > >Cloud > >> > > > Providers > >> > > > > > > first > >> > > > > > > >> is a good thing. 
For example we have now support from > >Google > >> > > > > Composer > >> > > > > > > team > >> > > > > > > >> to do this separation for GCP (and we learn from it) > >and > >> then > >> > we > >> > > > can > >> > > > > > > claim > >> > > > > > > >> the stewardship in our team for releasing the python > >3/ > >> > Airflow > >> > > > > > > >> 1.10-compatible "airflow-google" packages. Possibly > >other > >> > Cloud > >> > > > > > > >> Providers/teams might follow this (if they see the > >value in > >> > it) > >> > > > and > >> > > > > > > there > >> > > > > > > >> could be different stewards for those. And then we can > >do > >> > other > >> > > > > groups > >> > > > > > > if > >> > > > > > > >> we decide to. I think this way we can learn whether > >AIP-8 is > >> > > > > > manageable > >> > > > > > > and > >> > > > > > > >> what real problems we are going to face. > >> > > > > > > >> > >> > > > > > > >> * Each “plugin” e.g. GCP would be a separate repo, > >should > >> > we > >> > > > > create > >> > > > > > > >>> some sort of blueprint for such packages? > >> > > > > > > >>> > >> > > > > > > >> > >> > > > > > > >> I think we do not need separate repos (at all) but in > >this > >> new > >> > > AIP > >> > > > > we > >> > > > > > > can > >> > > > > > > >> test it before we decide to go for AIP-8. IMHO - > >monorepo > >> > > approach > >> > > > > > will > >> > > > > > > >> work here rather nicely. We could use python-3 native > >> > namespaces > >> > > > > > > >> < > >> > > > > >https://packaging.python.org/guides/packaging-namespace-packages/> > >> > > > > > for > >> > > > > > > >> the > >> > > > > > > >> sub-packages when we go full AIP-8. For now we could > >simply > >> > > > package > >> > > > > > the > >> > > > > > > new > >> > > > > > > >> operators in separate pip package for Python 3 version > >> 1.10.* > >> > > > series > >> > > > > > > only. 
> >> > > > > > > >> We only need to test if it works well with another > >package > >> > > > providing > >> > > > > > > >> 'airflow.providers.*' after apache-airflow is > >installed > >> > > (providing > >> > > > > > > >> 'airflow' package). But I think we can make it work. I > >don't > >> > > think > >> > > > > we > >> > > > > > > >> really need to split the repos, namespaces will work > >just > >> fine > >> > > and > >> > > > > has > >> > > > > > > >> easier management of cross-repository dependencies > >(but we > >> can > >> > > > learn > >> > > > > > > >> otherwise). For sure we will not need it for the new > >> proposed > >> > > AIP > >> > > > of > >> > > > > > > >> backporting groups to 1.10 and we can defer that > >decision to > >> > > AIP-8 > >> > > > > > > >> implementation time. > >> > > > > > > >> > >> > > > > > > >> > >> > > > > > > >>> * In which Airflow version do we start raising > >> deprecation > >> > > > > > warnings > >> > > > > > > >>> and in which version would we remove the original? > >> > > > > > > >>> > >> > > > > > > >> > >> > > > > > > >> I think we should do what we did in GCP case already. > >Those > >> > old > >> > > > > > > "imports" > >> > > > > > > >> for operators can be made as deprecated in Airflow 2.0 > >(and > >> > > > removed > >> > > > > in > >> > > > > > > 2.1 > >> > > > > > > >> or 3.0 if we start following semantic versioning). We > >can > >> > > however > >> > > > do > >> > > > > > it > >> > > > > > > >> before in 1.10.7 or 1.10.8 if we release those > >(without > >> > removing > >> > > > the > >> > > > > > old > >> > > > > > > >> operators yet - just raise deprecation warnings and > >inform > >> > that > >> > > > for > >> > > > > > > python3 > >> > > > > > > >> the new "airflow-google", "airflow-aws" etc. packages > >can be > >> > > > > installed > >> > > > > > > and > >> > > > > > > >> users can switch to it). > >> > > > > > > >> > >> > > > > > > >> J. 
> >> > > > > > > >> > >> > > > > > > >> > >> > > > > > > >>> > >> > > > > > > >>> Cheers, > >> > > > > > > >>> Bas > >> > > > > > > >>> > >> > > > > > > >>> On 27 Oct 2019, at 08:33, Jarek Potiuk < > >> > > jarek.pot...@polidea.com > >> > > > > > > <mailto: > >> > > > > > > >>> jarek.pot...@polidea.com>> wrote: > >> > > > > > > >>> > >> > > > > > > >>> Hello - any comments on that? I am happy to make it > >into an > >> > AIP > >> > > > :)? > >> > > > > > > >>> > >> > > > > > > >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk < > >> > > > > > jarek.pot...@polidea.com > >> > > > > > > >>> <mailto:jarek.pot...@polidea.com>> > >> > > > > > > >>> wrote: > >> > > > > > > >>> > >> > > > > > > >>> *Motivation* > >> > > > > > > >>> > >> > > > > > > >>> I think we really should start thinking about making > >it > >> > easier > >> > > to > >> > > > > > > migrate > >> > > > > > > >>> to 2.0 for our users. After implementing some recent > >> changes > >> > > > > related > >> > > > > > to > >> > > > > > > >>> AIP-21- > >> > > > > > > >>> Changes in import paths > >> > > > > > > >>> < > >> > > > > > > >>> > >> > > > > > > >> > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths > >> > > > > > > >>> > >> > > > > > > >>> I > >> > > > > > > >>> think I have an idea that might help with it. > >> > > > > > > >>> > >> > > > > > > >>> *Proposal* > >> > > > > > > >>> > >> > > > > > > >>> We could package some of the new and improved 2.0 > >operators > >> > > > (moved > >> > > > > to > >> > > > > > > >>> "providers" package) and let them be used in Python 3 > >> > > environment > >> > > > > of > >> > > > > > > >>> airflow 1.10.x. > >> > > > > > > >>> > >> > > > > > > >>> This can be done case-by-case per "cloud provider". > >It > >> should > >> > > not > >> > > > > be > >> > > > > > > >>> obligatory, should be largely driven by each > >provider. 
It's > >> > not > >> > > > yet > >> > > > > > > full > >> > > > > > > >>> AIP-8 > >> > > > > > > >>> Split Hooks/Operators into separate packages > >> > > > > > > >>> < > >> > > > > > > >>> > >> > > > > > > >> > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303 > >> > > > > > > >>> . > >> > > > > > > >>> It's > >> > > > > > > >>> merely backporting of some operators/hooks to get it > >work > >> in > >> > > > 1.10. > >> > > > > > But > >> > > > > > > by > >> > > > > > > >>> doing it we might try out the concept of splitting, > >learn > >> > about > >> > > > > > > >> maintenance > >> > > > > > > >>> problems and maybe implement full *AIP-8 *approach in > >2.1 > >> > > > > > consistently > >> > > > > > > >>> across the board. > >> > > > > > > >>> > >> > > > > > > >>> *Context* > >> > > > > > > >>> > >> > > > > > > >>> Part of the AIP-21 was to move import paths for Cloud > >> > providers > >> > > > to > >> > > > > > > >>> separate providers/<PROVIDER> package. An example for > >that > >> > (the > >> > > > > first > >> > > > > > > >>> provider we already almost migrated) was > >providers/google > >> > > package > >> > > > > > > >> (further > >> > > > > > > >>> divided into gcp/gsuite etc). > >> > > > > > > >>> > >> > > > > > > >>> We've done a massive migration of all the > >Google-related > >> > > > operators, > >> > > > > > > >>> created a few missing ones and retrofitted some old > >> operators > >> > > to > >> > > > > > follow > >> > > > > > > >> GCP > >> > > > > > > >>> best practices and fixing a number of problems - also > >> > > > implementing > >> > > > > > > >> Python3 > >> > > > > > > >>> and Pylint compatibility. Some of these > >operators/hooks are > >> > not > >> > > > > > > backwards > >> > > > > > > >>> compatible. 
Those that are compatible are still > >available > >> via > >> > > the > >> > > > > old > >> > > > > > > >>> imports with deprecation warning. > >> > > > > > > >>> > >> > > > > > > >>> We've added missing tests (including system tests) > >and > >> > missing > >> > > > > > > features - > >> > > > > > > >>> improving some of the Google operators - giving the > >users > >> > more > >> > > > > > > >> capabilities > >> > > > > > > >>> and fixing some issues. Those operators should pretty > >much > >> > > "just > >> > > > > > work" > >> > > > > > > in > >> > > > > > > >>> Airflow 1.10.x (any recent version) for Python 3. We > >should > >> > be > >> > > > able > >> > > > > > to > >> > > > > > > >>> release a separate pip-installable package for those > >> > operators > >> > > > that > >> > > > > > > users > >> > > > > > > >>> should be able to install in Airflow 1.10.x. > >> > > > > > > >>> > >> > > > > > > >>> Any user will be able to install this separate > >package in > >> > their > >> > > > > > Airflow > >> > > > > > > >>> 1.10.x installation and start using those new > >"provider" > >> > > > operators > >> > > > > in > >> > > > > > > >>> parallel to the old 1.10.x operators. Other providers > >> > > > ("microsoft", > >> > > > > > > >>> "amazon") might follow the same approach if they > >want. We > >> > could > >> > > > > even > >> > > > > > at > >> > > > > > > >>> some point decide to move some of the core operators > >in > >> > similar > >> > > > > > fashion > >> > > > > > > >>> (for example following the structure proposed in the > >latest > >> > > > > > > >> documentation: > >> > > > > > > >>> fundamentals / software / etc. 
> >> > > > > > > >>> > >> > > > > > > >> > > > >https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html) > >> > > > > > > >>> > >> > > > > > > >>> *Pros and cons* > >> > > > > > > >>> > >> > > > > > > >>> There are a number of pros: > >> > > > > > > >>> > >> > > > > > > >>> - Users will have an easier migration path if they > >are > >> > deeply > >> > > > > vested > >> > > > > > > >>> into 1.10.* version > >> > > > > > > >>> - It's possible to migrate in stages for people who > >are > >> also > >> > > > > vested > >> > > > > > in > >> > > > > > > >>> py2: *py2 (1.10) -> py3 (1.10) -> py3 + new > >operators > >> (1.10) > >> > > -> > >> > > > > py3 > >> > > > > > + > >> > > > > > > >>> 2.0* > >> > > > > > > >>> - Moving to new operators in py3 + new operators can > >be > >> done > >> > > > > > > >>> gradually. Old operators will continue to work while > >new > >> can > >> > > be > >> > > > > used > >> > > > > > > >> more > >> > > > > > > >>> and more > >> > > > > > > >>> - People will get incentivised to migrate to python > >3 > >> before > >> > > 2.0 > >> > > > > is > >> > > > > > > >>> out (by using new operators) > >> > > > > > > >>> - Each provider "package" can have independent > >release > >> > > schedule > >> > > > - > >> > > > > > and > >> > > > > > > >>> add functionality in already released Airflow > >versions. > >> > > > > > > >>> - We do not take out any functionality from the > >users - we > >> > > just > >> > > > > add > >> > > > > > > >>> more options > >> > > > > > > >>> - The releases can be - similarly as main airflow > >> releases - > >> > > > voted > >> > > > > > > >>> separately by PMC after "stewards" of the package > >(per > >> > > provider) > >> > > > > > > >> perform > >> > > > > > > >>> round of testing on 1.10.* versions. 
> - Users will start migrating to new operators earlier and have
>   smoother switch to 2.0 later
> - The latest improved operators will start
>
> There are three cons I could think of:
>
> - There will be quite a lot of duplication between old and new
>   operators (they will co-exist in 1.10). That might lead to confusion of
>   users and problems with cooperation between different operators/hooks
> - Having new operators in 1.10 python 3 might keep people from
>   migrating to 2.0
> - It will require some maintenance and separate release overhead.
>
> I already spoke to Composer team @Google and they are very positive about
> this. I also spoke to Ash and seems it might also be OK for Astronomer
> team. We have Google's backing and support, and we can provide maintenance
> and support for those packages - being an example for other providers how
> they can do it.
>
> Let me know what you think - and whether I should make it into an official
> AIP maybe?
>
> J.
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
> --
>
> Tomasz Urbaszek
> Polidea <https://www.polidea.com/> | Junior Software Engineer
>
> M: +48 505 628 493 <+48505628493>
> E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>
>
> Unique Tech
> Check out our projects!
> <https://www.polidea.com/our-work>

--

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>