Let me run some tests too - I've used them a bit in the past. I thought since we only want to make airflow.providers a namespace package it might work for us.
Will report back next week.

-ash

On 31 October 2019 15:58:22 GMT, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>The same repo (so mono-repo approach). All packages would be in the
>"airflow_integrations" directory. It's mainly about moving the
>operators/hooks/sensor files to a different directory structure.
>
>It might be done pretty much without changing the current
>installation/development model:
>
>1) We can add a setup.py command to the main setup.py that installs all
>the packages in -e mode (to make it easier to install all deps in one go).
>2) We can add dependencies in setup.py extras to install the appropriate
>packages. For example, the [google] extra will require the
>'apache-airflow-integrations-providers-google' package - or
>'apache-airflow-providers-google' if we decide to drop "-integrations"
>from the package name to make it shorter.
>
>The only potential drawback I see is a slightly more involved IDE setup.
>
>This way the installation method for both dev and prod remains simple.
>
>In the future we can have a separate release schedule for the packages
>(AIP-8), but for now we can stick to the same version for the
>'apache-airflow' and 'apache-airflow-integrations-*' packages (+ a
>separate release schedule for backporting needs).
>
>Here again is the structure of the repo (we will likely be able to use
>native namespaces, so I removed some needless __init__.py files).
>
>|-- airflow
>|   |- __init__.py
>|   |- operators -> fundamental operators are here
>|-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
>|-- setup.py -> setup.py for the "apache-airflow" package
>|-- airflow_integrations
>|   |-providers
>|   |   |-google
>|   |      |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
>|   |      |-airflow_integrations
>|   |         |-providers
>|   |            |-google
>|   |               |-__init__.py
>|   |      |-tests -> tests for the "apache-airflow-integrations-providers-google" package
>|   |      |-__init__.py
>|   |-protocols
>|      |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
>|      |-airflow_integrations
>|         |-protocols
>|            |-__init__.py
>|      |-tests -> tests for the "apache-airflow-integrations-protocols" package
>
>
>J.
>
>On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
>> So create another package in a different repo? Or the same repo with a
>> separate setup.py file that has airflow as a dependency?
>>
>> On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> wrote:
>>
>> > TL;DR; I did some more testing on how namespaces work. I still believe
>> > the only way to use namespaces is to have a separate package (for
>> > example "airflow_integrations") for all backportable packages.
>> >
>> > I am not sure if anyone has used namespaces before, but after reading
>> > and trying things out, the main blocker seems to be that we have
>> > non-trivial code in airflow's "__init__.py" (including class
>> > definitions, imported sub-packages and plugin initialisation).
>> >
>> > Details are in
>> > https://packaging.python.org/guides/packaging-namespace-packages/ but
>> > it's a long one, so let me summarize my findings:
>> >
>> >    - In order to use the "airflow.providers" package we would have to
>> >    declare "airflow" as a namespace
>> >    - It can be done in three different ways:
>> >       - omitting __init__.py in this package (native/implicit namespace)
>> >       - making the __init__.py of the "airflow" package in main airflow
>> >       (and in the other packages) consist solely of "*__path__ =
>> >       __import__('pkgutil').extend_path(__path__, __name__)*" (pkgutil
>> >       style) or
>> >       "*__import__('pkg_resources').declare_namespace(__name__)*"
>> >       (pkg_resources style)
>> >
>> > The first is not possible (we already have __init__.py in "airflow").
>> > The second case is not possible because we already have quite a lot in
>> > airflow's "__init__.py", and the guide states for both the pkgutil and
>> > pkg_resources styles:
>> >
>> > "*Every* distribution that uses the namespace package must include an
>> > identical *__init__.py*. If any distribution does not, it will cause the
>> > namespace logic to fail and the other sub-packages will not be
>> > importable. *Any additional code in __init__.py will be inaccessible.*"
>> >
>> > I even tried to add those pkgutil/pkg_resources lines to airflow and
>> > experiment with them - but it does not work. Pip install fails at the
>> > plugins_manager as "airflow.plugins" is not accessible (kind of
>> > expected), but I am sure there will be other problems as well. :(
>> >
>> > Basically - we cannot turn "airflow" into a namespace because it has
>> > some "__init__.py" logic :(.
>> >
>> > So I think it still holds that if we want to use namespaces, we should
>> > use another package. *"airflow_integrations"* is the current candidate,
>> > but we can think of some nicer/shorter one: "airflow_ext",
>> > "airflow_int", "airflow_x", "airflow_mod", "airflow_next", "airflow_xt",
>> > "airflow_", "ext_airflow", .... Interestingly, "airflow_" is the one
>> > suggested by PEP8 to avoid conflicts with Python names (which is a
>> > different case but kind of close).
>> >
>> > What do you think?
>> >
>> > J.
>> >
>> > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> >
>> > > The namespace feature looks promising and, from your tests, it looks
>> > > like it would work well from Airflow 2.0 onwards.
>> > >
>> > > I will look at it in depth and see if I have more suggestions or
>> > > opinions on it.
>> > >
>> > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk <jarek.pot...@polidea.com>
>> > > wrote:
>> > >
>> > > > TL;DR; We did some testing of namespaces and packaging (and potential
>> > > > backporting options for 1.10.* python3 Airflows) and we think it's
>> > > > best to adopt namespaces quickly and use a different package name,
>> > > > "airflow-integrations", for all non-fundamental integrations.
>> > > >
>> > > > Unless we missed some tricks, we cannot use airflow.* sub-packages
>> > > > for the 1.10.* backportable packages. Example:
>> > > >
>> > > >    - "*apache-airflow*" package provides: "airflow.*" (this is what
>> > > >    we have today)
>> > > >    - "*apache-airflow-providers-google*" package provides:
>> > > >    "airflow.providers.google.*" packages
>> > > >
>> > > > If we install both packages (old apache-airflow 1.10.6 and new
>> > > > apache-airflow-providers-google from 2.0), it seems that the
>> > > > "airflow.providers.google.*" package cannot be imported. This is a
>> > > > bit of a problem if we would like to backport the operators from
>> > > > Airflow 2.0 to Airflow 1.10 in a way that will be
>> > > > forward-compatible. We really want users who started using
>> > > > backported operators in 1.10.* not to have to change imports in
>> > > > their DAGs to run them in Airflow 2.0.
>> > > >
>> > > > We discussed it internally in our team and considered several
>> > > > options, but we think the best way will be to go straight to
>> > > > "namespaces" in Airflow 2.0 and to have the integrations (as
>> > > > discussed in the AIP-21 discussion) in a separate
>> > > > "*airflow_integrations*" package. It might even be a step towards
>> > > > the AIP-8 implementation, and it fits very well with the
>> > > > "stewardship" idea now being discussed in AIP-21. But we will still
>> > > > keep (for now) a single release process for all packages for 2.0
>> > > > (except for the backporting, which can be done per-provider before
>> > > > the 2.0 release) and provide a foundation for more complex release
>> > > > cycles in future versions.
>> > > >
>> > > > Here is how the new Airflow 2.0 repository could look (I only show a
>> > > > subset of dirs but they are representative). For those whose email
>> > > > client corrupts the fixed-width layout, here is an image of this
>> > > > structure: https://pasteboard.co/IEesTih.png
>> > > >
>> > > > |-- airflow
>> > > > |   |- __init__.py
>> > > > |   |- operators -> fundamental operators are here
>> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
>> > > > |-- setup.py -> setup.py for the "apache-airflow" package
>> > > > |-- airflow_integrations
>> > > > |   |-providers
>> > > > |   |   |-google
>> > > > |   |      |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
>> > > > |   |      |-airflow_integrations
>> > > > |   |         |-__init__.py
>> > > > |   |         |-providers
>> > > > |   |            |-__init__.py
>> > > > |   |            |-google
>> > > > |   |               |-__init__.py
>> > > > |   |      |-tests -> tests for the "apache-airflow-integrations-providers-google" package
>> > > > |   |      |-__init__.py
>> > > > |   |-protocols
>> > > > |      |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
>> > > > |      |-airflow_integrations
>> > > > |         |-protocols
>> > > > |            |-__init__.py
>> > > > |      |-tests -> tests for the "apache-airflow-integrations-protocols" package
>> > > >
>> > > > There are a number of pros for this solution:
>> > > >
>> > > >    - We could use the standard namespaces feature of python to build
>> > > >    multiple packages:
>> > > >    https://packaging.python.org/guides/packaging-namespace-packages/
>> > > >    - Installation for users will be the same as previously. We could
>> > > >    install the needed packages automatically when particular extras
>> > > >    are used (pip install apache-airflow[google] could install both
>> > > >    "apache-airflow" and
>> > > >    "apache-airflow-integrations-providers-google")
>> > > >    - We could have a custom setup.py installation process for
>> > > >    developers that could install all the packages in development
>> > > >    ("-e ." mode) in a single operation.
>> > > >    - In the case of transfer packages we could have nice error
>> > > >    messages informing the user that the other package needs to be
>> > > >    installed (for example, the S3->GCS operator would import
>> > > >    "airflow_integrations.providers.amazon.*" and, if that fails, it
>> > > >    could raise "Please install [amazon] extra to use me.")
>> > > >    - We could implement numerous optimisations in the way we run
>> > > >    tests in CI (for example run all the "providers" tests only with
>> > > >    sqlite, run tests in parallel etc.)
>> > > >    - We could implement it gradually - we do not have to take a "big
>> > > >    bang" approach - we can implement it "provider-by-provider" and
>> > > >    test it with one provider (Google) first to make sure that all
>> > > >    the mechanisms are working
>> > > >    - For now we could have the monorepo approach where all the
>> > > >    packages will be developed in concert - for now avoiding the
>> > > >    dependency problems (but allowing for back-portability to 1.10).
>> > > >    - We will have clear boundaries between packages and the ability
>> > > >    to test for unwanted/hidden dependencies between packages.
>> > > >    - We could switch to the (much better) sphinx-apidoc package to
>> > > >    continue building a single documentation for all of those (sphinx
>> > > >    apidoc has support for namespaces).
>> > > >
>> > > > As we are working on the GCP move from contrib to core, we could make
>> > > > the effort to test it and try it before we merge it to master, so
>> > > > that it will be ready for others (and we could help with most of the
>> > > > moves afterwards). It seems complex, but in most cases it will be a
>> > > > very simple move between the packages and can be done incrementally,
>> > > > so I think there is little risk in doing this.
>> > > >
>> > > > J.
>> > > >
>> > > >
>> > > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yrql...@gmail.com> wrote:
>> > > >
>> > > > > Tomasz and Ash made good points about the overhead of having
>> > > > > separate repos. But as we grow bigger and more mature, I would
>> > > > > prefer to have what was described in AIP-8. It shouldn't be
>> > > > > extremely hard for us to come up with good strategies to handle
>> > > > > the overhead. AIP-8 already talked about how it can benefit us.
>> > > > > IMO, at a high level, having a clear separation of core vs.
>> > > > > hooks/operators would make the project much more scalable and the
>> > > > > gains would outweigh the cost we pay.
>> > > > >
>> > > > > That being said, I'm supportive of this "moving towards AIP-8
>> > > > > while learning" approach; it's quite a good practice for tackling
>> > > > > a big project. Looking forward to reading the AIP.
>> > > > >
>> > > > >
>> > > > > Cheers,
>> > > > > Kevin Y
>> > > > >
>> > > > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk
>> > > > > <jarek.pot...@polidea.com> wrote:
>> > > > >
>> > > > > > We are checking how we can use namespaces in a back-portable way
>> > > > > > and we will have a POC soon so that we will all be able to see
>> > > > > > what it will look like.
>> > > > > >
>> > > > > > J.
>> > > > > >
>> > > > > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <a...@apache.org>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > I'll have to read your proposal in detail (sorry, no time right
>> > > > > > > now!), but I'm broadly in favour of this approach, and I think
>> > > > > > > keeping them _in_ the same repo is the best plan -- that makes
>> > > > > > > writing and testing cross-cutting changes easier.
>> > > > > > >
>> > > > > > > -a
>> > > > > > >
>> > > > > > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek
>> > > > > > > > <tomasz.urbas...@polidea.com> wrote:
>> > > > > > > >
>> > > > > > > > I think utilizing namespaces should reduce a lot of the
>> > > > > > > > problems raised by using separate repos (who will manage it?
>> > > > > > > > how to release? where should the repo be?).
>> > > > > > > > >> > > > > > > > Bests, >> > > > > > > > Tomek >> > > > > > > > >> > > > > > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk < >> > > > > > jarek.pot...@polidea.com> >> > > > > > > > wrote: >> > > > > > > > >> > > > > > > >> Thanks Bas for comments! Let me share my thoughts >below. >> > > > > > > >> >> > > > > > > >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak < >> > > > > > > >> basharens...@godatadriven.com> >> > > > > > > >> wrote: >> > > > > > > >> >> > > > > > > >>> Hi Jarek, I definitely see a future in creating >separate >> > > > > installable >> > > > > > > >>> packages for various operators/hooks/etc (as in >AIP-8). >> This >> > > > would >> > > > > > IMO >> > > > > > > >>> strip the “core” Airflow to only what’s needed and >result >> in >> > a >> > > > > small >> > > > > > > >>> package without a ton of dependencies (and make it >more >> > > > > maintainable, >> > > > > > > >>> shorter tests, etc etc etc). Not exactly sure though >what >> > > you’re >> > > > > > > >> proposing >> > > > > > > >>> in your e-mail, is it a new AIP for an intermediate >step >> > > towards >> > > > > > AIP-8? >> > > > > > > >>> >> > > > > > > >> >> > > > > > > >> It's a new AIP I am proposing. For now it's only for >> > > backporting >> > > > > the >> > > > > > > new >> > > > > > > >> 2.0 import paths to 1.10.* series. >> > > > > > > >> >> > > > > > > >> It's more of "incremental going in direction of AIP-8 >and >> > > learning >> > > > > > some >> > > > > > > >> difficulties involved" than implementing AIP-8 fully. >We are >> > > > taking >> > > > > > > >> advantage of changes in import paths from AIP-21 which >make >> it >> > > > > > possible >> > > > > > > to >> > > > > > > >> have both old and new (optional) operators available >in >> 1.10.* >> > > > > series >> > > > > > of >> > > > > > > >> Airflow. 
I think there is a lot more to do for full >> > > implementation >> > > > > of >> > > > > > > >> AIP-8: decisions how to maintain, install those >operator >> > groups >> > > > > > > separately, >> > > > > > > >> stewardship model/organisation for the separate >groups, how >> to >> > > > > manage >> > > > > > > >> cross-dependencies, procedures for releasing the >packages >> etc. >> > > > > > > >> >> > > > > > > >> I think about this new AIP also as a learning effort - >we >> > would >> > > > > learn >> > > > > > > more >> > > > > > > >> how separate packaging works and then we can follow up >with >> > > AIP-8 >> > > > > full >> > > > > > > >> implementation for "modular" Airflow. Then AIP-8 could >be >> > > > > implemented >> > > > > > in >> > > > > > > >> Airflow 2.1 for example - or 3.0 if we start following >> > semantic >> > > > > > > versioning >> > > > > > > >> - based on those learnings. It's a bit of good example >of >> > having >> > > > > cake >> > > > > > > and >> > > > > > > >> eating it too. We can try out modularity in 1.10.* >while >> > cutting >> > > > the >> > > > > > > scope >> > > > > > > >> of 2.0 and not implementing full management/release >> procedure >> > > for >> > > > > > AIP-8 >> > > > > > > >> yet. >> > > > > > > >> >> > > > > > > >> >> > > > > > > >>> Thinking about this, I think there are still a few >grey >> areas >> > > > > (which >> > > > > > > >> would >> > > > > > > >>> be good to discuss in a new AIP, or continue on >AIP-8): >> > > > > > > >>> >> > > > > > > >>> * In your email you only speak only about the 3 >big >> cloud >> > > > > > providers >> > > > > > > >>> (btw I made a PR for migrating all AWS components -> >> > > > > > > >>> https://github.com/apache/airflow/pull/6439). Is >there a >> > plan >> > > > for >> > > > > > > >>> splitting other components than Google/AWS/Azure? 
>> > > > > > > >>> >> > > > > > > >> >> > > > > > > >> We could add more groups as part of this new AIP >indeed (as >> an >> > > > > > > extension to >> > > > > > > >> AIP-21 and pre-requisite to AIP-8). We already see how >> > > > > > > moving/deprecation >> > > > > > > >> works for the providers package - it works for >GCP/Google >> > rather >> > > > > > nicely. >> > > > > > > >> But there is nothing to prevent us from extending it >to >> cover >> > > > other >> > > > > > > groups >> > > > > > > >> of operators/hooks. If you look at the current >structure of >> > > > > > > documentation >> > > > > > > >> done by Kamil, we can follow the structure there and >move >> the >> > > > > > > >> operators/hooks accordingly ( >> > > > > > > >> >> > > > > >> > >https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html >> > > > > > ): >> > > > > > > >> >> > > > > > > >> Fundamentals, ASF: Apache Software Foundation, >Azure: >> > > > Microsoft >> > > > > > > >> Azure, AWS: Amazon Web Services, GCP: Google Cloud >Platform, >> > > > Service >> > > > > > > >> integrations, Software integrations, Protocol >integrations. >> > > > > > > >> >> > > > > > > >> I am happy to include that in the AIP - if others >agree >> it's a >> > > > good >> > > > > > > idea. >> > > > > > > >> Out of those groups - I think only Fundamentals >should not >> be >> > > > > > > back-ported. >> > > > > > > >> Others should be rather easy to port (if we decide >to). We >> > > already >> > > > > > have >> > > > > > > >> quite a lot of those in the new GCP operators for 2.0. >So >> > > starting >> > > > > > with >> > > > > > > >> GCP/Google group is a good idea. Also following with >Cloud >> > > > Providers >> > > > > > > first >> > > > > > > >> is a good thing. 
For example we have now support from >Google >> > > > > Composer >> > > > > > > team >> > > > > > > >> to do this separation for GCP (and we learn from it) >and >> then >> > we >> > > > can >> > > > > > > claim >> > > > > > > >> the stewardship in our team for releasing the python >3/ >> > Airflow >> > > > > > > >> 1.10-compatible "airflow-google" packages. Possibly >other >> > Cloud >> > > > > > > >> Providers/teams might follow this (if they see the >value in >> > it) >> > > > and >> > > > > > > there >> > > > > > > >> could be different stewards for those. And then we can >do >> > other >> > > > > groups >> > > > > > > if >> > > > > > > >> we decide to. I think this way we can learn whether >AIP-8 is >> > > > > > manageable >> > > > > > > and >> > > > > > > >> what real problems we are going to face. >> > > > > > > >> >> > > > > > > >> * Each “plugin” e.g. GCP would be a separate repo, >should >> > we >> > > > > create >> > > > > > > >>> some sort of blueprint for such packages? >> > > > > > > >>> >> > > > > > > >> >> > > > > > > >> I think we do not need separate repos (at all) but in >this >> new >> > > AIP >> > > > > we >> > > > > > > can >> > > > > > > >> test it before we decide to go for AIP-8. IMHO - >monorepo >> > > approach >> > > > > > will >> > > > > > > >> work here rather nicely. We could use python-3 native >> > namespaces >> > > > > > > >> < >> > > > >https://packaging.python.org/guides/packaging-namespace-packages/> >> > > > > > for >> > > > > > > >> the >> > > > > > > >> sub-packages when we go full AIP-8. For now we could >simply >> > > > package >> > > > > > the >> > > > > > > new >> > > > > > > >> operators in separate pip package for Python 3 version >> 1.10.* >> > > > series >> > > > > > > only. >> > > > > > > >> We only need to test if it works well with another >package >> > > > providing >> > > > > > > >> 'airflow.providers.*' after apache-airflow is >installed >> > > (providing >> > > > > > > >> 'airflow' package). 
But I think we can make it work. I >don't >> > > think >> > > > > we >> > > > > > > >> really need to split the repos, namespaces will work >just >> fine >> > > and >> > > > > has >> > > > > > > >> easier management of cross-repository dependencies >(but we >> can >> > > > learn >> > > > > > > >> otherwise). For sure we will not need it for the new >> proposed >> > > AIP >> > > > of >> > > > > > > >> backporting groups to 1.10 and we can defer that >decision to >> > > AIP-8 >> > > > > > > >> implementation time. >> > > > > > > >> >> > > > > > > >> >> > > > > > > >>> * In which Airflow version do we start raising >> deprecation >> > > > > > warnings >> > > > > > > >>> and in which version would we remove the original? >> > > > > > > >>> >> > > > > > > >> >> > > > > > > >> I think we should do what we did in GCP case already. >Those >> > old >> > > > > > > "imports" >> > > > > > > >> for operators can be made as deprecated in Airflow 2.0 >(and >> > > > removed >> > > > > in >> > > > > > > 2.1 >> > > > > > > >> or 3.0 if we start following semantic versioning). We >can >> > > however >> > > > do >> > > > > > it >> > > > > > > >> before in 1.10.7 or 1.10.8 if we release those >(without >> > removing >> > > > the >> > > > > > old >> > > > > > > >> operators yet - just raise deprecation warnings and >inform >> > that >> > > > for >> > > > > > > python3 >> > > > > > > >> the new "airflow-google", "airflow-aws" etc. packages >can be >> > > > > installed >> > > > > > > and >> > > > > > > >> users can switch to it). >> > > > > > > >> >> > > > > > > >> J. >> > > > > > > >> >> > > > > > > >> >> > > > > > > >>> >> > > > > > > >>> Cheers, >> > > > > > > >>> Bas >> > > > > > > >>> >> > > > > > > >>> On 27 Oct 2019, at 08:33, Jarek Potiuk < >> > > jarek.pot...@polidea.com >> > > > > > > <mailto: >> > > > > > > >>> jarek.pot...@polidea.com>> wrote: >> > > > > > > >>> >> > > > > > > >>> Hello - any comments on that? I am happy to make it >into an >> > AIP >> > > > :)? 
>> > > > > > > >>> >> > > > > > > >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk < >> > > > > > jarek.pot...@polidea.com >> > > > > > > >>> <mailto:jarek.pot...@polidea.com>> >> > > > > > > >>> wrote: >> > > > > > > >>> >> > > > > > > >>> *Motivation* >> > > > > > > >>> >> > > > > > > >>> I think we really should start thinking about making >it >> > easier >> > > to >> > > > > > > migrate >> > > > > > > >>> to 2.0 for our users. After implementing some recent >> changes >> > > > > related >> > > > > > to >> > > > > > > >>> AIP-21- >> > > > > > > >>> Changes in import paths >> > > > > > > >>> < >> > > > > > > >>> >> > > > > > > >> >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths >> > > > > > > >>> >> > > > > > > >>> I >> > > > > > > >>> think I have an idea that might help with it. >> > > > > > > >>> >> > > > > > > >>> *Proposal* >> > > > > > > >>> >> > > > > > > >>> We could package some of the new and improved 2.0 >operators >> > > > (moved >> > > > > to >> > > > > > > >>> "providers" package) and let them be used in Python 3 >> > > environment >> > > > > of >> > > > > > > >>> airflow 1.10.x. >> > > > > > > >>> >> > > > > > > >>> This can be done case-by-case per "cloud provider". >It >> should >> > > not >> > > > > be >> > > > > > > >>> obligatory, should be largely driven by each >provider. It's >> > not >> > > > yet >> > > > > > > full >> > > > > > > >>> AIP-8 >> > > > > > > >>> Split Hooks/Operators into separate packages >> > > > > > > >>> < >> > > > > > > >>> >> > > > > > > >> >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > >> > >> >https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303 >> > > > > > > >>> . >> > > > > > > >>> It's >> > > > > > > >>> merely backporting of some operators/hooks to get it >work >> in >> > > > 1.10. 
>> > > > > > But >> > > > > > > by >> > > > > > > >>> doing it we might try out the concept of splitting, >learn >> > about >> > > > > > > >> maintenance >> > > > > > > >>> problems and maybe implement full *AIP-8 *approach in >2.1 >> > > > > > consistently >> > > > > > > >>> across the board. >> > > > > > > >>> >> > > > > > > >>> *Context* >> > > > > > > >>> >> > > > > > > >>> Part of the AIP-21 was to move import paths for Cloud >> > providers >> > > > to >> > > > > > > >>> separate providers/<PROVIDER> package. An example for >that >> > (the >> > > > > first >> > > > > > > >>> provider we already almost migrated) was >providers/google >> > > package >> > > > > > > >> (further >> > > > > > > >>> divided into gcp/gsuite etc). >> > > > > > > >>> >> > > > > > > >>> We've done a massive migration of all the >Google-related >> > > > operators, >> > > > > > > >>> created a few missing ones and retrofitted some old >> operators >> > > to >> > > > > > follow >> > > > > > > >> GCP >> > > > > > > >>> best practices and fixing a number of problems - also >> > > > implementing >> > > > > > > >> Python3 >> > > > > > > >>> and Pylint compatibility. Some of these >operators/hooks are >> > not >> > > > > > > backwards >> > > > > > > >>> compatible. Those that are compatible are still >available >> via >> > > the >> > > > > old >> > > > > > > >>> imports with deprecation warning. >> > > > > > > >>> >> > > > > > > >>> We've added missing tests (including system tests) >and >> > missing >> > > > > > > features - >> > > > > > > >>> improving some of the Google operators - giving the >users >> > more >> > > > > > > >> capabilities >> > > > > > > >>> and fixing some issues. Those operators should pretty >much >> > > "just >> > > > > > work" >> > > > > > > in >> > > > > > > >>> Airflow 1.10.x (any recent version) for Python 3. 
We >should >> > be >> > > > able >> > > > > > to >> > > > > > > >>> release a separate pip-installable package for those >> > operators >> > > > that >> > > > > > > users >> > > > > > > >>> should be able to install in Airflow 1.10.x. >> > > > > > > >>> >> > > > > > > >>> Any user will be able to install this separate >package in >> > their >> > > > > > Airflow >> > > > > > > >>> 1.10.x installation and start using those new >"provider" >> > > > operators >> > > > > in >> > > > > > > >>> parallel to the old 1.10.x operators. Other providers >> > > > ("microsoft", >> > > > > > > >>> "amazon") might follow the same approach if they >want. We >> > could >> > > > > even >> > > > > > at >> > > > > > > >>> some point decide to move some of the core operators >in >> > similar >> > > > > > fashion >> > > > > > > >>> (for example following the structure proposed in the >latest >> > > > > > > >> documentation: >> > > > > > > >>> fundamentals / software / etc. >> > > > > > > >>> >> > > > > > >> > > >https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html) >> > > > > > > >>> >> > > > > > > >>> *Pros and cons* >> > > > > > > >>> >> > > > > > > >>> There are a number of pros: >> > > > > > > >>> >> > > > > > > >>> - Users will have an easier migration path if they >are >> > deeply >> > > > > vested >> > > > > > > >>> into 1.10.* version >> > > > > > > >>> - It's possible to migrate in stages for people who >are >> also >> > > > > vested >> > > > > > in >> > > > > > > >>> py2: *py2 (1.10) -> py3 (1.10) -> py3 + new >operators >> (1.10) >> > > -> >> > > > > py3 >> > > > > > + >> > > > > > > >>> 2.0* >> > > > > > > >>> - Moving to new operators in py3 + new operators can >be >> done >> > > > > > > >>> gradually. 
Old operators will continue to work while >new >> can >> > > be >> > > > > used >> > > > > > > >> more >> > > > > > > >>> and more >> > > > > > > >>> - People will get incentivised to migrate to python >3 >> before >> > > 2.0 >> > > > > is >> > > > > > > >>> out (by using new operators) >> > > > > > > >>> - Each provider "package" can have independent >release >> > > schedule >> > > > - >> > > > > > and >> > > > > > > >>> add functionality in already released Airflow >versions. >> > > > > > > >>> - We do not take out any functionality from the >users - we >> > > just >> > > > > add >> > > > > > > >>> more options >> > > > > > > >>> - The releases can be - similarly as main airflow >> releases - >> > > > voted >> > > > > > > >>> separately by PMC after "stewards" of the package >(per >> > > provider) >> > > > > > > >> perform >> > > > > > > >>> round of testing on 1.10.* versions. >> > > > > > > >>> - Users will start migrating to new operators >earlier and >> > have >> > > > > > > >>> smoother switch to 2.0 later >> > > > > > > >>> - The latest improved operators will start >> > > > > > > >>> >> > > > > > > >>> There are three cons I could think of: >> > > > > > > >>> >> > > > > > > >>> - There will be quite a lot of duplication between >old and >> > new >> > > > > > > >>> operators (they will co-exist in 1.10). That might >lead to >> > > > > confusion >> > > > > > > of >> > > > > > > >>> users and problems with cooperation between >different >> > > > > > operators/hooks >> > > > > > > >>> - Having new operators in 1.10 python 3 might keep >people >> > from >> > > > > > > >>> migrating to 2.0 >> > > > > > > >>> - It will require some maintenance and separate >release >> > > > overhead. >> > > > > > > >>> >> > > > > > > >>> I already spoke to Composer team @Google and they are >very >> > > > positive >> > > > > > > about >> > > > > > > >>> this. I also spoke to Ash and seems it might also be >OK for >> > > > > > Astronomer >> > > > > > > >>> team. 
>> > > > > > > >>> We have Google's backing and support, and we can provide
>> > > > > > > >>> maintenance and support for those packages - being an example
>> > > > > > > >>> for other providers of how they can do it.
>> > > > > > > >>>
>> > > > > > > >>> Let me know what you think - and whether I should make it into
>> > > > > > > >>> an official AIP maybe?
>> > > > > > > >>>
>> > > > > > > >>> J.
>> > > > > > > >>>
>> > > > > > > >>> --
>> > > > > > > >>>
>> > > > > > > >>> Jarek Potiuk
>> > > > > > > >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>> > > > > > > >>>
>> > > > > > > >>> M: +48 660 796 129 <+48660796129>
>> > > > > > > >>> [image: Polidea] <https://www.polidea.com/>
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > >
>> > > > > > > > Tomasz Urbaszek
>> > > > > > > > Polidea <https://www.polidea.com/> | Junior Software Engineer
>> > > > > > > >
>> > > > > > > > M: +48 505 628 493 <+48505628493>
>> > > > > > > > E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>
>> > > > > > > >
>> > > > > > > > Unique Tech
>> > > > > > > > Check out our projects! <https://www.polidea.com/our-work>
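
For reference, the native-namespace layout discussed in this thread (a separate "airflow_integrations" package with no __init__.py at the namespace levels) can be tried out with a few lines of Python. This is only a sketch under assumptions: the two temporary directories stand in for two separately installed distributions, and the helper names make_distribution and demo are hypothetical, not anything that exists in Airflow.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, provider: str) -> None:
    """Write root/airflow_integrations/providers/<provider>/__init__.py.

    Neither "airflow_integrations" nor "providers" gets an __init__.py,
    so Python treats both as native (PEP 420) namespace packages.
    """
    package_dir = root / "airflow_integrations" / "providers" / provider
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text(f"NAME = {provider!r}\n")


def demo() -> dict:
    """Build two fake distributions and import a provider from each."""
    base = Path(tempfile.mkdtemp())
    roots = [base / "dist_google", base / "dist_amazon"]
    make_distribution(roots[0], "google")
    make_distribution(roots[1], "amazon")

    # Each root plays the role of a separate install location (assumption).
    sys.path[:0] = [str(root) for root in roots]
    importlib.invalidate_caches()

    google = importlib.import_module("airflow_integrations.providers.google")
    amazon = importlib.import_module("airflow_integrations.providers.amazon")
    namespace = importlib.import_module("airflow_integrations")
    return {
        "google": google.NAME,
        "amazon": amazon.NAME,
        # Number of directories contributing to the namespace package.
        "portions": len(list(namespace.__path__)),
    }


if __name__ == "__main__":
    print(demo())
```

Both providers import even though they live under different roots, because no directory along the namespace path contains an __init__.py. Adding a non-trivial __init__.py at the top level of either root would turn that directory into a regular package, which is the situation described above for "airflow" itself.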