Thanks Bas for the comments! Let me share my thoughts below.

On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <basharens...@godatadriven.com> wrote:

> Hi Jarek,
>
> I definitely see a future in creating separate installable
> packages for various operators/hooks/etc (as in AIP-8). This would IMO
> strip the “core” Airflow to only what’s needed and result in a small
> package without a ton of dependencies (and make it more maintainable,
> shorter tests, etc etc etc). Not exactly sure though what you’re proposing
> in your e-mail, is it a new AIP for an intermediate step towards AIP-8?
>

It's a new AIP I am proposing. For now it's only for backporting the new 2.0 import paths to the 1.10.* series. It's more an incremental step in the direction of AIP-8, learning some of the difficulties involved, than a full implementation of AIP-8. We are taking advantage of the changes in import paths from AIP-21, which make it possible to have both old and new (optional) operators available in the 1.10.* series of Airflow.

I think there is a lot more to do for a full implementation of AIP-8: decisions on how to maintain and install those operator groups separately, the stewardship model/organisation for the separate groups, how to manage cross-dependencies, procedures for releasing the packages, etc.

I think about this new AIP also as a learning effort - we would learn more about how separate packaging works and then we can follow up with the full AIP-8 implementation for "modular" Airflow. Then AIP-8 could be implemented in Airflow 2.1 for example - or 3.0 if we start following semantic versioning - based on those learnings.

It's a good example of having our cake and eating it too. We can try out modularity in 1.10.* while cutting the scope of 2.0 and not implementing the full management/release procedure for AIP-8 yet.

> Thinking about this, I think there are still a few grey areas (which would
> be good to discuss in a new AIP, or continue on AIP-8):
>
>   * In your email you speak only about the 3 big cloud providers
>   (btw I made a PR for migrating all AWS components ->
>   https://github.com/apache/airflow/pull/6439).
>   Is there a plan for
>   splitting other components than Google/AWS/Azure?
>

We could indeed add more groups as part of this new AIP (as an extension to AIP-21 and a prerequisite to AIP-8). We already see how moving/deprecation works for the providers package - it works for GCP/Google rather nicely. But there is nothing to prevent us from extending it to cover other groups of operators/hooks. If you look at the current structure of the documentation done by Kamil, we can follow the structure there and move the operators/hooks accordingly (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):

   - Fundamentals
   - ASF: Apache Software Foundation
   - Azure: Microsoft Azure
   - AWS: Amazon Web Services
   - GCP: Google Cloud Platform
   - Service integrations
   - Software integrations
   - Protocol integrations

I am happy to include that in the AIP - if others agree it's a good idea. Out of those groups, I think only Fundamentals should not be back-ported. Others should be rather easy to port (if we decide to). We already have quite a lot of those in the new GCP operators for 2.0, so starting with the GCP/Google group is a good idea. Continuing with the Cloud Providers first is also a good thing. For example, we now have support from the Google Composer team to do this separation for GCP (and we learn from it), and then we can claim the stewardship in our team for releasing the Python 3 / Airflow 1.10-compatible "airflow-google" packages. Possibly other Cloud Providers/teams might follow this (if they see the value in it) and there could be different stewards for those. And then we can do other groups if we decide to. I think this way we can learn whether AIP-8 is manageable and what real problems we are going to face.

>   * Each “plugin” e.g. GCP would be a separate repo, should we create
>   some sort of blueprint for such packages?
>

I think we do not need separate repos (at all) but in this new AIP we can test it before we decide to go for AIP-8. IMHO - monorepo approach will work here rather nicely.
We could use Python 3 native namespaces <https://packaging.python.org/guides/packaging-namespace-packages/> for the sub-packages when we go full AIP-8. For now we could simply package the new operators in a separate pip package for the Python 3 version of the 1.10.* series only. We only need to test whether it works well with another package providing 'airflow.providers.*' after apache-airflow is installed (providing the 'airflow' package). But I think we can make it work. I don't think we really need to split the repos: namespaces will work just fine and make cross-repository dependencies easier to manage (but we can learn otherwise). For sure we will not need it for the newly proposed AIP of backporting groups to 1.10, and we can defer that decision to AIP-8 implementation time.

>   * In which Airflow version do we start raising deprecation warnings
>   and in which version would we remove the original?
>

I think we should do what we already did in the GCP case. Those old "imports" for operators can be marked as deprecated in Airflow 2.0 (and removed in 2.1, or 3.0 if we start following semantic versioning). We can, however, do it earlier, in 1.10.7 or 1.10.8 if we release those (without removing the old operators yet - just raising deprecation warnings and informing users that for Python 3 the new "airflow-google", "airflow-aws" etc. packages can be installed and that they can switch to them).

J.

>
> Cheers,
> Bas
>
> On 27 Oct 2019, at 08:33, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
> Hello - any comments on that? I am happy to make it into an AIP :)?
>
> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
> *Motivation*
>
> I think we really should start thinking about making it easier to migrate
> to 2.0 for our users.
> After implementing some recent changes related to
> AIP-21 - Changes in import paths
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> I think I have an idea that might help with it.
>
> *Proposal*
>
> We could package some of the new and improved 2.0 operators (moved to
> "providers" package) and let them be used in Python 3 environment of
> airflow 1.10.x.
>
> This can be done case-by-case per "cloud provider". It should not be
> obligatory, should be largely driven by each provider. It's not yet full
> AIP-8 - Split Hooks/Operators into separate packages
> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> It's merely backporting of some operators/hooks to get them to work in
> 1.10. But by doing it we might try out the concept of splitting, learn
> about maintenance problems and maybe implement full *AIP-8* approach in
> 2.1 consistently across the board.
>
> *Context*
>
> Part of the AIP-21 was to move import paths for Cloud providers to
> separate providers/<PROVIDER> package. An example for that (the first
> provider we already almost migrated) was providers/google package (further
> divided into gcp/gsuite etc).
>
> We've done a massive migration of all the Google-related operators,
> created a few missing ones and retrofitted some old operators to follow GCP
> best practices, fixing a number of problems and also implementing Python 3
> and Pylint compatibility. Some of these operators/hooks are not backwards
> compatible. Those that are compatible are still available via the old
> imports with deprecation warning.
>
> We've added missing tests (including system tests) and missing features -
> improving some of the Google operators - giving the users more capabilities
> and fixing some issues. Those operators should pretty much "just work" in
> Airflow 1.10.x (any recent version) for Python 3.
> We should be able to
> release a separate pip-installable package for those operators that users
> should be able to install in Airflow 1.10.x.
>
> Any user will be able to install this separate package in their Airflow
> 1.10.x installation and start using those new "provider" operators in
> parallel to the old 1.10.x operators. Other providers ("microsoft",
> "amazon") might follow the same approach if they want. We could even at
> some point decide to move some of the core operators in similar fashion
> (for example following the structure proposed in the latest documentation:
> fundamentals / software / etc.
> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
>
> *Pros and cons*
>
> There are a number of pros:
>
>   - Users will have an easier migration path if they are deeply invested
>   in 1.10.* version
>   - It's possible to migrate in stages for people who are also invested in
>   py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 +
>   2.0*
>   - Moving to new operators in py3 + new operators can be done
>   gradually. Old operators will continue to work while new can be used more
>   and more
>   - People will get incentivised to migrate to python 3 before 2.0 is
>   out (by using new operators)
>   - Each provider "package" can have independent release schedule - and
>   add functionality in already released Airflow versions.
>   - We do not take out any functionality from the users - we just add
>   more options
>   - The releases can be - similarly as main airflow releases - voted
>   separately by PMC after "stewards" of the package (per provider) perform
>   round of testing on 1.10.* versions.
>   - Users will start migrating to new operators earlier and have
>   smoother switch to 2.0 later
>   - The latest improved operators will start
>
> There are three cons I could think of:
>
>   - There will be quite a lot of duplication between old and new
>   operators (they will co-exist in 1.10).
>   That might lead to confusion of
>   users and problems with cooperation between different operators/hooks
>   - Having new operators in 1.10 python 3 might keep people from
>   migrating to 2.0
>   - It will require some maintenance and separate release overhead.
>
> I already spoke to Composer team @Google and they are very positive about
> this. I also spoke to Ash and seems it might also be OK for Astronomer
> team. We have Google's backing and support, and we can provide maintenance
> and support for those packages - being an example for other providers how
> they can do it.
>
> Let me know what you think - and whether I should make it into an official
> AIP maybe?
>
> J.
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
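
To make the deprecation approach from the reply above more concrete (old import paths keep working but raise a deprecation warning that points users to the new location), here is a minimal Python sketch. It does not use Airflow itself: `NewStyleOperator`, `OldStyleOperator` and the wording of the warning are hypothetical stand-ins chosen for illustration, and the actual shims in Airflow may well be implemented differently.

```python
import warnings


class NewStyleOperator:
    """Hypothetical operator living at the new "providers" import path."""

    def __init__(self, task_id):
        self.task_id = task_id


class OldStyleOperator(NewStyleOperator):
    """Hypothetical alias kept at the old import path for backwards compatibility."""

    def __init__(self, *args, **kwargs):
        # Emit a warning whenever the old name is used, then delegate to the
        # new implementation so that existing code keeps working unchanged.
        warnings.warn(
            "OldStyleOperator is deprecated; "
            "use NewStyleOperator from the new providers package instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(*args, **kwargs)


def instantiate_and_collect_warnings():
    """Create the old-style operator and return it with the warnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        operator = OldStyleOperator(task_id="example_task")
    return operator, caught
```

Calling `instantiate_and_collect_warnings()` returns a working operator together with the recorded `DeprecationWarning`, which mirrors the idea of keeping the old operators usable while nudging users towards the new packages.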