Tomasz and Ash made good points about the overhead of having separate repos. But as we grow bigger and more mature, I would prefer to have what was described in AIP-8. It shouldn't be extremely hard for us to come up with good strategies to handle the overhead. AIP-8 already covers how it can benefit us. IMO, at a high level, having a clear separation of core vs. hooks/operators would make the project much more scalable, and the gains would outweigh the cost we pay.
That being said, I'm supportive of this approach of moving towards AIP-8 while learning; it is quite a good practice for tackling a big project. Looking forward to reading the AIP.

Cheers,
Kevin Y

On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> We are checking how we can use namespaces in back-portable way and we will
> have POC soon so that we all will be able to see how it will look like.
>
> J.
>
> On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > I'll have to read your proposal in detail (sorry, no time right now!),
> > but I'm broadly in favour of this approach, and I think keeping them
> > _in_ the same repo is the best plan -- that makes writing and testing
> > cross-cutting changes easier.
> >
> > -a
> >
> > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek
> > > <tomasz.urbas...@polidea.com> wrote:
> > >
> > > I think utilizing namespaces should reduce a lot of problems raised by
> > > using separate repos (who will manage it? how to release? where should
> > > be the repo?).
> > >
> > > Bests,
> > > Tomek
> > >
> > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk
> > > <jarek.pot...@polidea.com> wrote:
> > >
> > >> Thanks Bas for comments! Let me share my thoughts below.
> > >>
> > >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak
> > >> <basharens...@godatadriven.com> wrote:
> > >>
> > >>> Hi Jarek, I definitely see a future in creating separate installable
> > >>> packages for various operators/hooks/etc (as in AIP-8). This would
> > >>> IMO strip the “core” Airflow to only what’s needed and result in a
> > >>> small package without a ton of dependencies (and make it more
> > >>> maintainable, shorter tests, etc etc etc). Not exactly sure though
> > >>> what you’re proposing in your e-mail, is it a new AIP for an
> > >>> intermediate step towards AIP-8?
> > >>>
> > >>
> > >> It's a new AIP I am proposing. For now it's only for backporting the
> > >> new 2.0 import paths to 1.10.* series.
> > >>
> > >> It's more of "incremental going in direction of AIP-8 and learning
> > >> some difficulties involved" than implementing AIP-8 fully. We are
> > >> taking advantage of changes in import paths from AIP-21 which make it
> > >> possible to have both old and new (optional) operators available in
> > >> 1.10.* series of Airflow. I think there is a lot more to do for full
> > >> implementation of AIP-8: decisions how to maintain, install those
> > >> operator groups separately, stewardship model/organisation for the
> > >> separate groups, how to manage cross-dependencies, procedures for
> > >> releasing the packages etc.
> > >>
> > >> I think about this new AIP also as a learning effort - we would learn
> > >> more how separate packaging works and then we can follow up with
> > >> AIP-8 full implementation for "modular" Airflow. Then AIP-8 could be
> > >> implemented in Airflow 2.1 for example - or 3.0 if we start following
> > >> semantic versioning - based on those learnings. It's a bit of good
> > >> example of having cake and eating it too. We can try out modularity
> > >> in 1.10.* while cutting the scope of 2.0 and not implementing full
> > >> management/release procedure for AIP-8 yet.
> > >>
> > >>
> > >>> Thinking about this, I think there are still a few grey areas (which
> > >>> would be good to discuss in a new AIP, or continue on AIP-8):
> > >>>
> > >>> * In your email you only speak only about the 3 big cloud providers
> > >>>   (btw I made a PR for migrating all AWS components ->
> > >>>   https://github.com/apache/airflow/pull/6439). Is there a plan for
> > >>>   splitting other components than Google/AWS/Azure?
> > >>>
> > >>
> > >> We could add more groups as part of this new AIP indeed (as an
> > >> extension to AIP-21 and pre-requisite to AIP-8). We already see how
> > >> moving/deprecation works for the providers package - it works for
> > >> GCP/Google rather nicely.
> > >> But there is nothing to prevent us from extending it to cover other
> > >> groups of operators/hooks. If you look at the current structure of
> > >> documentation done by Kamil, we can follow the structure there and
> > >> move the operators/hooks accordingly
> > >> (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> > >>
> > >> Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
> > >> Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service
> > >> integrations, Software integrations, Protocol integrations.
> > >>
> > >> I am happy to include that in the AIP - if others agree it's a good
> > >> idea. Out of those groups - I think only Fundamentals should not be
> > >> back-ported. Others should be rather easy to port (if we decide to).
> > >> We already have quite a lot of those in the new GCP operators for
> > >> 2.0. So starting with GCP/Google group is a good idea. Also following
> > >> with Cloud Providers first is a good thing. For example we have now
> > >> support from Google Composer team to do this separation for GCP (and
> > >> we learn from it) and then we can claim the stewardship in our team
> > >> for releasing the python 3/ Airflow 1.10-compatible "airflow-google"
> > >> packages. Possibly other Cloud Providers/teams might follow this (if
> > >> they see the value in it) and there could be different stewards for
> > >> those. And then we can do other groups if we decide to. I think this
> > >> way we can learn whether AIP-8 is manageable and what real problems
> > >> we are going to face.
> > >>
> > >>> * Each “plugin” e.g. GCP would be a separate repo, should we create
> > >>>   some sort of blueprint for such packages?
> > >>>
> > >>
> > >> I think we do not need separate repos (at all) but in this new AIP we
> > >> can test it before we decide to go for AIP-8. IMHO - monorepo
> > >> approach will work here rather nicely.
> > >> We could use python-3 native namespaces
> > >> <https://packaging.python.org/guides/packaging-namespace-packages/>
> > >> for the sub-packages when we go full AIP-8. For now we could simply
> > >> package the new operators in separate pip package for Python 3
> > >> version 1.10.* series only. We only need to test if it works well
> > >> with another package providing 'airflow.providers.*' after
> > >> apache-airflow is installed (providing 'airflow' package). But I
> > >> think we can make it work. I don't think we really need to split the
> > >> repos, namespaces will work just fine and has easier management of
> > >> cross-repository dependencies (but we can learn otherwise). For sure
> > >> we will not need it for the new proposed AIP of backporting groups to
> > >> 1.10 and we can defer that decision to AIP-8 implementation time.
> > >>
> > >>
> > >>> * In which Airflow version do we start raising deprecation warnings
> > >>>   and in which version would we remove the original?
> > >>>
> > >>
> > >> I think we should do what we did in GCP case already. Those old
> > >> "imports" for operators can be made as deprecated in Airflow 2.0 (and
> > >> removed in 2.1 or 3.0 if we start following semantic versioning). We
> > >> can however do it before in 1.10.7 or 1.10.8 if we release those
> > >> (without removing the old operators yet - just raise deprecation
> > >> warnings and inform that for python3 the new "airflow-google",
> > >> "airflow-aws" etc. packages can be installed and users can switch to
> > >> it).
> > >>
> > >> J.
> > >>
> > >>
> > >>>
> > >>> Cheers,
> > >>> Bas
> > >>>
> > >>> On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com
> > >>> <mailto:jarek.pot...@polidea.com>> wrote:
> > >>>
> > >>> Hello - any comments on that? I am happy to make it into an AIP :)?
> > >>>
> > >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk
> > >>> <jarek.pot...@polidea.com <mailto:jarek.pot...@polidea.com>> wrote:
> > >>>
> > >>> *Motivation*
> > >>>
> > >>> I think we really should start thinking about making it easier to
> > >>> migrate to 2.0 for our users. After implementing some recent changes
> > >>> related to AIP-21- Changes in import paths
> > >>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > >>> I think I have an idea that might help with it.
> > >>>
> > >>> *Proposal*
> > >>>
> > >>> We could package some of the new and improved 2.0 operators (moved
> > >>> to "providers" package) and let them be used in Python 3 environment
> > >>> of airflow 1.10.x.
> > >>>
> > >>> This can be done case-by-case per "cloud provider". It should not be
> > >>> obligatory, should be largely driven by each provider. It's not yet
> > >>> full AIP-8 Split Hooks/Operators into separate packages
> > >>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> > >>> It's merely backporting of some operators/hooks to get it work in
> > >>> 1.10. But by doing it we might try out the concept of splitting,
> > >>> learn about maintenance problems and maybe implement full *AIP-8*
> > >>> approach in 2.1 consistently across the board.
> > >>>
> > >>> *Context*
> > >>>
> > >>> Part of the AIP-21 was to move import paths for Cloud providers to
> > >>> separate providers/<PROVIDER> package. An example for that (the
> > >>> first provider we already almost migrated) was providers/google
> > >>> package (further divided into gcp/gsuite etc).
> > >>>
> > >>> We've done a massive migration of all the Google-related operators,
> > >>> created a few missing ones and retrofitted some old operators to
> > >>> follow GCP best practices and fixing a number of problems - also
> > >>> implementing Python3 and Pylint compatibility. Some of these
> > >>> operators/hooks are not backwards compatible. Those that are
> > >>> compatible are still available via the old imports with deprecation
> > >>> warning.
> > >>>
> > >>> We've added missing tests (including system tests) and missing
> > >>> features - improving some of the Google operators - giving the users
> > >>> more capabilities and fixing some issues. Those operators should
> > >>> pretty much "just work" in Airflow 1.10.x (any recent version) for
> > >>> Python 3. We should be able to release a separate pip-installable
> > >>> package for those operators that users should be able to install in
> > >>> Airflow 1.10.x.
> > >>>
> > >>> Any user will be able to install this separate package in their
> > >>> Airflow 1.10.x installation and start using those new "provider"
> > >>> operators in parallel to the old 1.10.x operators. Other providers
> > >>> ("microsoft", "amazon") might follow the same approach if they want.
> > >>> We could even at some point decide to move some of the core
> > >>> operators in similar fashion (for example following the structure
> > >>> proposed in the latest documentation: fundamentals / software / etc.
> > >>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> > >>>
> > >>> *Pros and cons*
> > >>>
> > >>> There are a number of pros:
> > >>>
> > >>> - Users will have an easier migration path if they are deeply vested
> > >>>   into 1.10.* version
> > >>> - It's possible to migrate in stages for people who are also vested
> > >>>   in py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) ->
> > >>>   py3 + 2.0*
> > >>> - Moving to new operators in py3 + new operators can be done
> > >>>   gradually. Old operators will continue to work while new can be
> > >>>   used more and more
> > >>> - People will get incentivised to migrate to python 3 before 2.0 is
> > >>>   out (by using new operators)
> > >>> - Each provider "package" can have independent release schedule -
> > >>>   and add functionality in already released Airflow versions.
> > >>> - We do not take out any functionality from the users - we just add
> > >>>   more options
> > >>> - The releases can be - similarly as main airflow releases - voted
> > >>>   separately by PMC after "stewards" of the package (per provider)
> > >>>   perform round of testing on 1.10.* versions.
> > >>> - Users will start migrating to new operators earlier and have
> > >>>   smoother switch to 2.0 later
> > >>> - The latest improved operators will start
> > >>>
> > >>> There are three cons I could think of:
> > >>>
> > >>> - There will be quite a lot of duplication between old and new
> > >>>   operators (they will co-exist in 1.10). That might lead to
> > >>>   confusion of users and problems with cooperation between different
> > >>>   operators/hooks
> > >>> - Having new operators in 1.10 python 3 might keep people from
> > >>>   migrating to 2.0
> > >>> - It will require some maintenance and separate release overhead.
> > >>>
> > >>> I already spoke to Composer team @Google and they are very positive
> > >>> about this. I also spoke to Ash and seems it might also be OK for
> > >>> Astronomer team.
> > >>> We have Google's backing and support, and we can provide
> > >>> maintenance and support for those packages - being an example for
> > >>> other providers how they can do it.
> > >>>
> > >>> Let me know what you think - and whether I should make it into an
> > >>> official AIP maybe?
> > >>>
> > >>> J.
> > >>>
> > >>> --
> > >>>
> > >>> Jarek Potiuk
> > >>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >>>
> > >>> M: +48 660 796 129
> > >
> > > --
> > >
> > > Tomasz Urbaszek
> > > Polidea <https://www.polidea.com/> | Junior Software Engineer
> > >
> > > M: +48 505 628 493
> > > E: tomasz.urbas...@polidea.com