One more day to go. I would love to see some opinions on this AIP-21 update :).

Executive summary:

* We will be moving a number of integrations to sub-packages of airflow.
* They will be backportable to 1.10.*. There will be 'apache-airflow-[package]-backport' PyPI packages, installable with Python 3, that will make Airflow 2.0 operators/hooks etc. available in Airflow 1.10.* environments.
* The current proposal for sub-packages is "protocols/software/providers" (but if you think merging protocols and software makes sense, please express your opinion).
* We are not moving "fundamental" operators/hooks etc.
* Airflow 2.0 is still going to be installed as a single package with all operators (so we are not yet implementing AIP-8).

J.

On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk wrote:

> I think all these cases are valid but maybe I was not super-clear. It's only the transfer operators that we need to decide where to put - not hooks. Usually the complexity of communication with particular storages is (or at least should be) in the Hooks rather than Operators.
>
> Operators should be just thin wrappers over the logic in the hooks. Hooks are going to stay where they belong - S3 Hooks in amazon, GCS Hooks in google.cloud, GoogleSheet Hooks in google.gsuite.
>
> Since we actually have a mono-repo, it will be no problem (and no cross-dependency problem) to have the S3 -> GCS operator in google and use hooks from both google and amazon.
>
> I hope this alleviates your concern Daniel ?
>
> J.
>
>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would put in the target, i.e. the storage? But GoogleSheetsToSftp would be in the google sheets operators file? The complexity, and the shared code, are in the gsheet component -- not in the storage destination.
>>
>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk wrote:
>>
>> > Hello Airflow Community,
>> >
>> > This email calls for a vote to update AIP-21 Changes in import paths <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths> with the changes described below. The vote will last till Saturday 8th 2am CEST (72 hours). Committers have a binding vote but everyone from the community is encouraged to cast an advisory vote.
>> >
>> > *Summary*:
>> >
>> > The proposal is to update AIP-21 to move all non-core operators/hooks/sensors (and related files) to sub-packages within airflow (protocols/software/providers) or (software/providers). I am also happy to merge protocols+software, so if you have a strong opinion on it, please state it with your vote and we can decide based on majority.
>> >
>> > Those packages will be separately released (schedule/process TBD) and will be backportable to the 1.10.* Airflow series, so that users can install them and start using the new Airflow 2.0 operators in their Python 3 Airflow 1.10 environments (only Python 3.5+ is supported).
>> >
>> > We will proceed with migrating the providers package to the already agreed paths without waiting for the final vote (following the current version of AIP-21). Since we have a working POC, we know the agreed paths will work for us.
>> >
>> > *Previous discussions:*
>> >
>> > - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
>> > - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
>> >
>> > *More Details*:
>> >
>> > 1) We are going in the direction of AIP-8 but not yet reaching it; the focus is on separating out backportable packages installable in Airflow 1.10.* releases. Airflow 2.0 will still be installed as a whole and all the source will be kept in one repo, but we now have a way to build backportable packages for groups of operators. POC available here: https://github.com/apache/airflow/pull/6507 (based on Ash's https://github.com/ashb/airflow-submodule-test)
>> >
>> > 2) We move all integrations to new packages (keeping deprecated import aliases in the old places). The following split (according to "stewardship" over the integrations):
>> >
>> >    - *fundamentals* - core of airflow - they are really part of Apache Airflow. Stewards - core Airflow team. Not backportable/separated out.
>> >    - *protocols* - are not owned by anyone, they are public and the implementation is fully "open". There are no particular stewards (no need). Users of particular protocols should mainly maintain those and add support for different versions of the protocols.
>> >    - *software* - both API and software are controlled by someone outside of Airflow (commercial or open-source project), but the deployment of that software is "owned" by the user installing Airflow. The stewards might also be the users, but the controlling party (Oracle for example) might be interested in maintaining those operators as well.
>> >    - *providers* - API/software/deployments are fully controlled by a 3rd party.
>> >      Here most likely the "provider" will be interested in maintaining the operators (and, like Google for example, provide integration guidelines <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978> for their hooks/operators/sensors).
>> >
>> > 3) Between-providers transfer operators should be kept at the "target" rather than the "source". For example S3 -> GCS should be in the "google" provider, but GCS -> S3 should be in "amazon".
>> >
>> > 4) One-side provider transfer operators should be kept at the "provider" regardless of whether it is the target or the source. For example GCS -> SFTP or SFTP -> GCS should be in the "google" provider.
>> >
>> > 5) If in doubt we will discuss individual cases separately.
>> >
>> > J.
>> >
>> > --
>> > Jarek Potiuk
>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> > M: +48 660 796 129
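
P.S. To make rules 3) and 4) of the proposal concrete, here is a minimal sketch in Python of how the placement of a transfer operator could be decided. It is only an illustration, not part of AIP-21: the `SERVICES` table and the function name `transfer_operator_home` are hypothetical, and classifying SFTP as a "protocol" and Oracle as "software" is my assumption based on the examples in the thread.

```python
from typing import Optional

# Hypothetical lookup table: service name -> (group, package).
# Only the group names and the amazon/google examples come from the proposal;
# the sftp and oracle classifications are assumptions made for illustration.
SERVICES = {
    "s3": ("providers", "amazon"),
    "gcs": ("providers", "google"),
    "sftp": ("protocols", "sftp"),
    "oracle": ("software", "oracle"),
}


def transfer_operator_home(source: str, target: str) -> Optional[str]:
    """Return the provider package that should host a source -> target operator.

    Returns None when neither side is a provider, i.e. the case that
    rule 5) says should be discussed individually.
    """
    src_group, src_pkg = SERVICES[source]
    dst_group, dst_pkg = SERVICES[target]

    if dst_group == "providers":
        # Rule 3 (both sides are providers: keep at the target) and
        # rule 4 when the provider happens to be the target.
        return dst_pkg
    if src_group == "providers":
        # Rule 4 when the provider is the source.
        return src_pkg
    # Rule 5: no provider involved, decide case by case.
    return None


if __name__ == "__main__":
    for src, dst in [("s3", "gcs"), ("gcs", "s3"), ("gcs", "sftp"), ("sftp", "gcs")]:
        print(f"{src} -> {dst}: {transfer_operator_home(src, dst)}")
```

With this table, S3 -> GCS lands in "google", GCS -> S3 in "amazon", and both GCS -> SFTP and SFTP -> GCS in "google", matching the examples in the proposal.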
