I really like the idea of making everything a provider. That's a great idea! This way everything "backportable" will have to be in the "providers" package. It gives a really nice and clean separation (and less mess in "airflow"). And we will not need any artificial grouping (we can still group them at the documentation level).
We do not need "backport" in the name. I think it's more of a technical detail of package naming, which we can work out while reviewing PRs, and we can agree on the final naming of the released packages at the PMC level (PMCs will have to vote on releasing those). The thinking was that the intention is really only to backport to 1.10 - we are not going (yet) to use the packages in Airflow 2.*, so I thought that naming them "backport" would express that intent more clearly.

So let me clarify the structure of folders we are going to have if we follow it (I just added some examples), including the already agreed changes from AIP-21:

providers/
    google
        cloud
            operators
            hooks
            sensors
        gsuite
            operators
            ...
    amazon
        aws
            operators
            ...
    microsoft
        azure
            operators
            ...
    operators
        sqlite.py
        oracle.py
        docker.py
    hooks
        hdfs.py
        sqlite.py
    sensors
        http.py
        sql.py

J.

On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <a...@apache.org> wrote:

> Do we need to include `-backport`? What was the thinking behind that?
>
> I think software and protocol should be merged. I would also say
> _everything_ is a provider, so airflow.providers.ssh.SSHOperator for
> instance is what I would prefer
>
> -a
>
> On 8 November 2019 08:32:42 GMT, Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
> >One more day to go. I would love to see some opinions on this AIP-21
> >update :).
> >
> >Executive summary:
> >
> >* we will be moving a number of integrations to sub-packages of airflow.
> >* they will be backportable to 1.10.*. There will be
> >'apache-airflow-[package]-backport' pypi installable with python 3 that
> >will make Airflow 2.0 operators/hooks etc. available with 1.10*
> >operators.
> >* the current proposal for sub-packages is "protocols/software/providers/"
> >(but if you think merging protocols and software makes sense - please
> >express your opinion)
> >* we are not moving "fundamental" operators/hooks etc.
> >* Airflow 2.0 is still going to be installed as a single package with all
> >operators (so we are not yet implementing AIP-8)
> >
> >J.
> >
> >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <jarek.pot...@polidea.com>
> >wrote:
> >
> >> I think all these cases are valid but maybe I was not super-clear. It's
> >> only the transfer operators that we need to decide where to put - not
> >> hooks.
> >> Usually the complexity of communication with particular storages is (or at
> >> least should be) in the Hooks rather than Operators.
> >>
> >> Operators should be just thin wrappers over the logic in the hooks.
> >> Hooks are going to stay where they belong - S3 Hooks in amazon, GCS Hooks
> >> in google.cloud, GoogleSheet Hooks in google.gsuite.
> >>
> >> Since we actually have a mono-repo, it will be no problem (and no
> >> cross-dependency problem) to have the S3 -> GCS operator in google and use
> >> hooks from both google/amazon.
> >>
> >> I hope this alleviates your concern, Daniel?
> >>
> >> J.
> >>
> >>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would put in
> >>> the target, i.e. the storage? But GoogleSheetsToSftp would be in the
> >>> google sheets operators file? The complexity, and the shared code, are in
> >>> the gsheet component -- not in the storage destination.
> >>>
> >>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk <jarek.pot...@polidea.com>
> >>> wrote:
> >>>
> >>> > Hello Airflow Community,
> >>> >
> >>> > This email calls for a vote to update AIP-21 Changes in import paths
> >>> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> >>> > with the changes described below. The vote will last till Saturday 8th 2am
> >>> > CEST (72 hours). Committers have a binding vote but everyone from the
> >>> > community is encouraged to cast an advisory vote.
> >>> >
> >>> > *Summary*:
> >>> >
> >>> > The proposal is to update AIP-21 to move all non-core
> >>> > operators/hooks/sensors (and related files) to sub-packages within airflow
> >>> > (protocols/software/providers) or (software/providers).
> >>> > I am also happy to merge protocols+software, so if you have a strong
> >>> > opinion on it - please state it with your vote and we can decide based on
> >>> > majority.
> >>> >
> >>> > Those packages will be separately released (schedule/process TBD) and will
> >>> > be backportable to the 1.10.* airflow series, so that users can install them
> >>> > and start using new Airflow 2.0 operators in their Python 3 Airflow 1.10
> >>> > environments (only Python 3.5+ is supported).
> >>> >
> >>> > We will proceed with migrating the providers package to the already agreed
> >>> > paths without waiting for the final vote (following the current version of
> >>> > AIP-21). Since we have a working POC - we know the agreed paths will work
> >>> > for us.
> >>> >
> >>> > *Previous discussions:*
> >>> >
> >>> > - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
> >>> > - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
> >>> >
> >>> > *More Details*:
> >>> >
> >>> > 1) Information that we are going in the direction of AIP-8 but not yet
> >>> > reaching it - focusing on separating out backportable packages installable
> >>> > in Airflow releases 1.10.*. Airflow 2.0 will still be installed as a whole
> >>> > and all the source will be kept in one repo, but we now have a way to build
> >>> > backportable packages for groups of operators.
> >>> > POC available here:
> >>> > https://github.com/apache/airflow/pull/6507 (based on Ash's
> >>> > https://github.com/ashb/airflow-submodule-test)
> >>> >
> >>> > 2) We move all integrations to new packages (keeping deprecated import
> >>> > aliases in the old places). The following split (according to "stewardship"
> >>> > over the integrations):
> >>> >
> >>> > - *fundamentals* - core of Airflow - they are really part of Apache
> >>> >   Airflow. Stewards - core Airflow team. Not backportable/separated out.
> >>> > - *protocols* - are not owned by anyone, they are public and the
> >>> >   implementation is fully "open". There are no particular stewards (no
> >>> >   need). Users of particular protocols should mainly maintain those and
> >>> >   add support for different versions of the protocols.
> >>> > - *software* - both API and software are controlled by someone outside
> >>> >   of Airflow (commercial or open-source project), but the deployment of
> >>> >   that software is "owned" by the user installing Airflow. The "stewards"
> >>> >   might also be the users, but the controlling party (Oracle for example)
> >>> >   might be interested in maintaining those operators as well.
> >>> > - *providers* - API/software/deployments are fully controlled by a 3rd
> >>> >   party. Here most likely the "provider" will be interested in maintaining
> >>> >   the operators (and for example like Google - provide integration guidelines
> >>> >   <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
> >>> >   for their hooks/operators/sensors)
> >>> >
> >>> > 3) Between-providers transfer operators should be kept at the "target"
> >>> > rather than "source".
> >>> > For example S3 -> GCS should be in "google" provider, but GCS -> S3 should
> >>> > be in "amazon".
> >>> >
> >>> > 4) One-side provider transfer operators should be kept at the "provider"
> >>> > regardless of whether they are target or source.
> >>> > For example GCS -> SFTP or SFTP -> GCS should be in "google" provider.
> >>> >
> >>> > 5) If in doubt we will discuss individual cases separately.
> >>> >
> >>> > J.
> >>> >
> >>> > --
> >>> >
> >>> > Jarek Potiuk
> >>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>> >
> >>> > M: +48 660 796 129 <+48660796129>
> >>> > [image: Polidea] <https://www.polidea.com/>

--

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
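
PS. To make rules 3) and 4) from the quoted vote email a bit more concrete, here is a minimal Python sketch of how the "home" package of a transfer operator could be decided. It is only an illustration: the function name `transfer_operator_home` and the `PROVIDERS` set are hypothetical and not part of Airflow, and a side that is not a provider (e.g. SFTP) is represented as `None`.

```python
from typing import Optional

# Hypothetical list of provider packages, for illustration only.
PROVIDERS = {"google", "amazon", "microsoft"}


def transfer_operator_home(source: Optional[str], target: Optional[str]) -> str:
    """Return the provider package that should host a transfer operator.

    `source` / `target` name the provider owning that side of the transfer,
    or are None when that side is not a provider (e.g. plain SFTP).
    """
    if target in PROVIDERS:
        # Rule 3 (both sides are providers -> keep at the target) and
        # rule 4 when the single provider happens to be the target.
        return target
    if source in PROVIDERS:
        # Rule 4: only one side is a provider, and it is the source.
        return source
    # Rule 5: no provider involved, so the case needs a separate discussion.
    raise ValueError("no provider on either side - discuss this case separately")
```

With these assumptions, S3 -> GCS resolves to "google", GCS -> S3 to "amazon", and both GCS -> SFTP and SFTP -> GCS to "google", matching the examples in the vote email.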