+1 for Python and Bash being in the stock install -- they are just _so_ commonly used that I think it makes sense to keep them in the base install. (And the virtualenv module is not an onerous dep, and has not caused us any problems. Yet.)

Kubernetes is also a slightly funny one, since the deps for that will be in "core" anyway thanks to the Kube executor, but I think it probably makes sense to have `from airflow.providers.kubernetes.operators import KubernetesOperator`. Is that the pattern we are going with for the "one-level" providers, or will it be `from airflow.providers.kubernetes.operators.pod_operator import KubernetesOperator`?

Possibly more an AIP-8 question: with moving Azure Blob/S3/GCS to separate packages we might have to look at how we enable remote log storage.

-a

> On 11 Nov 2019, at 15:53, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
> On Mon, Nov 11, 2019 at 4:22 PM Kamil Breguła <kamil.breg...@polidea.com> wrote:
>
>> One more question. Are you sure you want to move Python and Bash from
>> core? These are the elements that are installed in every environment
>> because they are required by Airflow, so moving them to a separate
>> installed package is pointless in my opinion.
>
> I have no problem with moving them to "fundamentals", but I am not sure if
> they are really required? I looked through the code and, other than a few
> examples and tests, they are not really "required". Maybe that's enough to
> keep them in fundamentals.
> Also, the Python operator has some dependencies - virtualenv - which is only
> required for this operator, so maybe it's worth keeping it separate from
> "fundamentals".
>
>> On Mon, Nov 11, 2019 at 3:07 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>
>>> I am fine with this list +1
>>>
>>> On Mon, Nov 11, 2019 at 1:27 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>
>>>> I am all for it Kamil!
>>>>
>>>> Super happy to treat Apache projects in the same way as "proprietary"
>>>> providers :). Does anyone else have other comments?
>>>>
>>>> J.
>>>>
>>>> On Mon, Nov 11, 2019 at 2:17 PM Kamil Breguła <kamil.breg...@polidea.com> wrote:
>>>>
>>>>> I looked at this list and I'm only worried about two operators.
>>>>>
>>>>> airflow.contrib.operators.vertica_to_hive
>>>>> airflow.contrib.operators.s3_to_hive
>>>>>
>>>>> If we want the operators to be grouped according to destination, then
>>>>> these operators should be in the apache package. It is the members of the
>>>>> Apache community who will care most about these operators being of high
>>>>> quality. Apache can be treated equally with other large cloud
>>>>> providers, such as GCP and AWS. I can imagine that a new Apache product
>>>>> will appear and will want to promote itself the same way as products of
>>>>> cloud providers are promoted: by creating a large number of
>>>>> integrations that allow you to copy data into its operating range.
>>>>> There's another case - building a strong Apache community. As members
>>>>> of the Apache community, we should promote Apache products to
>>>>> ensure that the development of the community is correct, and therefore
>>>>> also promote integration of our product with other Apache products.
>>>>>
>>>>> On Mon, Nov 11, 2019 at 12:28 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>>>>
>>>>>> Just to select the "packages" for this update. Does anyone have objections
>>>>>> to this structure (details, including transfer operators, in
>>>>>> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0)?
>>>>>>
>>>>>> *Fundamentals (no change)*
>>>>>>
>>>>>> providers
>>>>>>     google
>>>>>>         cloud
>>>>>>         gsuite
>>>>>>         marketing_platform
>>>>>>     amazon
>>>>>>         aws
>>>>>>     microsoft
>>>>>>         azure
>>>>>>     apache
>>>>>>         cassandra
>>>>>>         druid
>>>>>>         hadoop
>>>>>>         hive
>>>>>>         pig
>>>>>>         pinot
>>>>>>         spark
>>>>>>         sqoop
>>>>>>     mysql
>>>>>>     jira
>>>>>>     databricks
>>>>>>     datadog
>>>>>>     dingding
>>>>>>     discord
>>>>>>     cloudant
>>>>>>     jenkins
>>>>>>     opsgenie
>>>>>>     qubole
>>>>>>     salesforce
>>>>>>     segment
>>>>>>     slack
>>>>>>     snowflake
>>>>>>     vertica
>>>>>>     zendesk
>>>>>>     celery
>>>>>>     docker
>>>>>>     bash
>>>>>>     kubernetes
>>>>>>     mssql
>>>>>>     mongodb
>>>>>>     mysql
>>>>>>     openfaas
>>>>>>     oracle
>>>>>>     papermill
>>>>>>     postgres
>>>>>>     presto
>>>>>>     python
>>>>>>     redis
>>>>>>     samba
>>>>>>     sqlite
>>>>>>     imap
>>>>>>     ssh
>>>>>>     filesystem
>>>>>>     sftp
>>>>>>     ftp
>>>>>>     http
>>>>>>     grpc
>>>>>>     smtp
>>>>>>     jdbc
>>>>>>     winrm
>>>>>>
>>>>>> On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>>>>
>>>>>>> Let me then cancel this vote and I will restart it next week.
>>>>>>>
>>>>>>> Yeah. It's a bit like re-opening Pandora's box, but now that we know
>>>>>>> that we can do it, and we are unblocked in moving to google (which is now
>>>>>>> the biggest move in progress), we can spend more time on getting a better
>>>>>>> (and more final) consensus.
>>>>>>> I decided to go through the list from the docs (once again Kamil - great
>>>>>>> that you did it) and prepared this spreadsheet showing the structure. I
>>>>>>> went through ALL the operators and put them in the right place where our
>>>>>>> current rules place them.
>>>>>>>
>>>>>>> After this exercise, I think that makes sense:
>>>>>>> - put all the stuff except fundamentals in *"providers"* (everything
>>>>>>> in "providers" will be potentially backportable).
>>>>>>> - grouping apache projects under *"apache"* - similar to
>>>>>>> google/amazon/microsoft (different kind of ownership but still it is an
>>>>>>> ownership)
>>>>>>> - for the rest I think what we can do is really to put the operators in
>>>>>>> folders per "service/company" (without sub-packages). That includes
>>>>>>> sftp/ssh/ftp etc. (should we group [ftp and sftp] or [ssh and sftp]??).
>>>>>>> There is no "ownership" there and no reason to group them. That will put
>>>>>>> "operators/hooks/sensors" at different levels in the directory tree but we
>>>>>>> already have that for fundamentals and I am not too worried about that. We
>>>>>>> do not have to have everything at the same level.
>>>>>>> - I put transfer operators according to the rule where the "to" side is more
>>>>>>> important unless the other side is a public protocol (so sftp -> gcs and
>>>>>>> gcs -> sftp both go to google/gcp). I did not have any doubt where to put
>>>>>>> which transfer operator, so this is a good sign:
>>>>>>>
>>>>>>> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
>>>>>>>
>>>>>>> Can you please take a look and express your opinions here so that we can
>>>>>>> have final voting next week (for those who are not yet tired of the
>>>>>>> discussion ;)).
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>> On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes, that makes sense.
>>>>>>>>
>>>>>>>> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.breg...@polidea.com> wrote:
>>>>>>>>
>>>>>>>>> In the case of Hadoop, it is published by Apache, so it can be in the
>>>>>>>>> apache directory. This will mimic the grouping presented in the
>>>>>>>>> documentation.
>>>>>>>>>
>>>>>>>>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
>>>>>>>>>
>>>>>>>>> On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I think we should keep the vote open at least until mid next week to have
>>>>>>>>>> more thought and inputs on this one.
>>>>>>>>>>
>>>>>>>>>> In general, I am happy with the approach but operators/hooks and sensors
>>>>>>>>>> shouldn't be a provider. "hadoop" can be its provider and hdfs can be a
>>>>>>>>>> part of it.
>>>>>>>>>>
>>>>>>>>>> providers/
>>>>>>>>>>     google
>>>>>>>>>>         cloud
>>>>>>>>>>             operators
>>>>>>>>>>             hooks
>>>>>>>>>>             sensors
>>>>>>>>>>         gsuite
>>>>>>>>>>             operators
>>>>>>>>>>             ...
>>>>>>>>>>     amazon
>>>>>>>>>>         aws
>>>>>>>>>>             operators
>>>>>>>>>>             ...
>>>>>>>>>>     microsoft
>>>>>>>>>>         azure
>>>>>>>>>>             operators
>>>>>>>>>>             ...
>>>>>>>>>>     hadoop
>>>>>>>>>>         hdfs
>>>>>>>>>>             operators
>>>>>>>>>>             ...
>>>>>>>>>>
>>>>>>>>>> We can also define what is a "provider" so we know what to add in it in the
>>>>>>>>>> future. SSH/FTP/SFTP belong to the same family group. Do we want to have
>>>>>>>>>> separate providers for each one of them???
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Kaxil
>>>>>>>>>>
>>>>>>>>>> On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I really like to make everything a provider. That's a great idea! This way
>>>>>>>>>>> everything "backportable" will have to be in the "providers" package. Really
>>>>>>>>>>> nice and clean separation (and less mess in "airflow"). And we will not
>>>>>>>>>>> have to have any artificial grouping (we can still group them at the
>>>>>>>>>>> documentation level).
>>>>>>>>>>>
>>>>>>>>>>> We do not need backport in the name. And I think it's more of a technical detail
>>>>>>>>>>> on naming the package which we can work out while reviewing PRs, and we can
>>>>>>>>>>> agree final naming of the released packages on PMC level (PMCs will have to
>>>>>>>>>>> vote on releasing those).
>>>>>>>>>>>
>>>>>>>>>>> The thinking is that its intention is really to be only backported to 1.10
>>>>>>>>>>> - we are not going (yet) to use the packages in Airflow 2.*, so I thought
>>>>>>>>>>> by naming them backport we can express that intent more clearly.
>>>>>>>>>>>
>>>>>>>>>>> So let me clarify the structure of folders we are going to have if we
>>>>>>>>>>> follow it (I just added some examples), including the already agreed changes
>>>>>>>>>>> from AIP-21:
>>>>>>>>>>>
>>>>>>>>>>> providers/
>>>>>>>>>>>     google
>>>>>>>>>>>         cloud
>>>>>>>>>>>             operators
>>>>>>>>>>>             hooks
>>>>>>>>>>>             sensors
>>>>>>>>>>>         gsuite
>>>>>>>>>>>             operators
>>>>>>>>>>>             ...
>>>>>>>>>>>     amazon
>>>>>>>>>>>         aws
>>>>>>>>>>>             operators
>>>>>>>>>>>             ...
>>>>>>>>>>>     microsoft
>>>>>>>>>>>         azure
>>>>>>>>>>>             operators
>>>>>>>>>>>             ...
>>>>>>>>>>>     operators
>>>>>>>>>>>         sqlite.py
>>>>>>>>>>>         oracle.py
>>>>>>>>>>>         docker.py
>>>>>>>>>>>     hooks
>>>>>>>>>>>         hdfs.py
>>>>>>>>>>>         sqlite.py
>>>>>>>>>>>     sensors
>>>>>>>>>>>         http.py
>>>>>>>>>>>         sql.py
>>>>>>>>>>>
>>>>>>>>>>> J.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Do we need to include `-backport,`? What was the thinking behind that?
>>>>>>>>>>>>
>>>>>>>>>>>> I think software and protocol should be merged. I would also say
>>>>>>>>>>>> _everything_ is a provider, so airflow.providers.ssh.SSHOperator for
>>>>>>>>>>>> instance is what I would prefer
>>>>>>>>>>>>
>>>>>>>>>>>> -a
>>>>>>>>>>>>
>>>>>>>>>>>> On 8 November 2019 08:32:42 GMT, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>>>>>>>>>>> One more day to go. I would love to see some opinions on this AIP-21
>>>>>>>>>>>>> update :).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Executive summary:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * we will be moving a number of integrations to sub-packages of airflow.
>>>>>>>>>>>>> * they will be backportable to 1.10.*.
>>>>>>>>>>>>> There will be 'apache-airflow-[package]-backport' pypi installable with
>>>>>>>>>>>>> python 3 that will make Airflow 2.0 operators/hooks etc. available with
>>>>>>>>>>>>> 1.10* operators.
>>>>>>>>>>>>> * the current proposal for sub-packages is "protocols/software/providers/"
>>>>>>>>>>>>> (but if you think merging protocols and software makes sense - please
>>>>>>>>>>>>> express your opinion)
>>>>>>>>>>>>> * we are not moving "fundamental" operators/hooks etc.
>>>>>>>>>>>>> * Airflow 2.0 is still going to be installed as a single package with all
>>>>>>>>>>>>> operators (so we are not yet implementing AIP-8)
>>>>>>>>>>>>>
>>>>>>>>>>>>> J.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think all these cases are valid but maybe I was not super-clear. It's
>>>>>>>>>>>>>> only the transfer operators that we need to decide where to put - not
>>>>>>>>>>>>>> hooks.
>>>>>>>>>>>>>> Usually the complexity of communication with particular storages is (or at
>>>>>>>>>>>>>> least should be) in the Hooks rather than Operators.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Operators should be just thin wrappers over the logic in the hooks.
>>>>>>>>>>>>>> Hooks are going to stay where they belong - S3 Hooks in amazon, GCS Hooks
>>>>>>>>>>>>>> in google.cloud, GoogleSheet Hooks in google.gsuite.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since we actually have a mono-repo - this will be no problem (and no cross
>>>>>>>>>>>>>> dependencies problem) to have the S3 -> GCS operator in google and use hooks
>>>>>>>>>>>>>> from both google/amazon.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I hope this alleviates your concern, Daniel?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> J.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would put in
>>>>>>>>>>>>>>> the target, i.e. the storage? But GoogleSheetsToSftp would be in the google
>>>>>>>>>>>>>>> sheets operators file? The complexity, and the shared code, are in the
>>>>>>>>>>>>>>> gsheet component -- not in the storage destination.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello Airflow Community,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The email calls for a vote to update AIP-21 Changes in import paths
>>>>>>>>>>>>>>>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
>>>>>>>>>>>>>>>> with the changes described below. The vote will last till Saturday 8th 2am CEST
>>>>>>>>>>>>>>>> (72 hours). Committers have a binding vote but everyone from the community
>>>>>>>>>>>>>>>> is encouraged to cast an advisory vote.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Summary*:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The proposal is to update AIP-21 to move all non-core
>>>>>>>>>>>>>>>> operators/hooks/sensors (and related files) to sub-packages within airflow
>>>>>>>>>>>>>>>> (protocols/software/providers) or (software/providers).
>>>>>>>>>>>>>>>> I am also happy to merge protocols+software, so if you have a strong
>>>>>>>>>>>>>>>> opinion on it - please state it with your vote and we can decide based on
>>>>>>>>>>>>>>>> majority.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Those packages will be separately released (schedule/process TBD) and will
>>>>>>>>>>>>>>>> be backportable to the 1.10.* airflow series, so that users can install them and
>>>>>>>>>>>>>>>> start using new Airflow 2.0 operators in their Python 3 Airflow 1.10
>>>>>>>>>>>>>>>> environments (only Python 3.5+ is supported).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We will proceed with migrating the providers package to already agreed
>>>>>>>>>>>>>>>> paths without waiting for the final vote (following the current version of
>>>>>>>>>>>>>>>> AIP-21). Since we have a working POC - we know the agreed paths will work for
>>>>>>>>>>>>>>>> us.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Previous discussions:*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
>>>>>>>>>>>>>>>> - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *More Details*:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Information that we are going in the direction of AIP-8 but not yet
>>>>>>>>>>>>>>>> reaching it - focusing on separating out backportable packages installable
>>>>>>>>>>>>>>>> in Airflow releases 1.10.*. Airflow 2.0 will still be installed as a whole
>>>>>>>>>>>>>>>> and all the source will be kept in one repo, but we now have a way to build
>>>>>>>>>>>>>>>> backportable packages for groups of operators. POC available here:
>>>>>>>>>>>>>>>> https://github.com/apache/airflow/pull/6507 (based on Ash's
>>>>>>>>>>>>>>>> https://github.com/ashb/airflow-submodule-test)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) We move all integrations to new packages (keeping deprecated import
>>>>>>>>>>>>>>>> aliases in the old places). The following split (according to "stewardship"
>>>>>>>>>>>>>>>> over the integrations):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - *fundamentals* - core of Airflow - they are really part of Apache
>>>>>>>>>>>>>>>>    Airflow. Stewards - core Airflow team. Not backportable/separated out.
>>>>>>>>>>>>>>>>    - *protocols* - are not owned by anyone, they are public and the
>>>>>>>>>>>>>>>>    implementation is fully "open". There are no particular stewards (no
>>>>>>>>>>>>>>>>    need). Users of particular protocols should mainly maintain those and add
>>>>>>>>>>>>>>>>    support for different versions of the protocols.
>>>>>>>>>>>>>>>>    - *software* - both API and software are controlled by someone outside
>>>>>>>>>>>>>>>>    of Airflow (commercial or open-source project), but the deployment of
>>>>>>>>>>>>>>>>    that software is "owned" by the user installing Airflow. The "stewardship"
>>>>>>>>>>>>>>>>    might be also the users but the controlling party (Oracle for example)
>>>>>>>>>>>>>>>>    might be interested in maintaining those operators as well.
>>>>>>>>>>>>>>>>    - *providers* - API/software/deployments are fully controlled by a 3rd
>>>>>>>>>>>>>>>>    party. Here most likely the "provider" will be interested in maintaining the
>>>>>>>>>>>>>>>>    operators (and for example like Google - provide integration guidelines
>>>>>>>>>>>>>>>>    <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
>>>>>>>>>>>>>>>>    for their hooks/operators/sensors)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 3) Between-providers transfer operators should be kept at the "target"
>>>>>>>>>>>>>>>> rather than "source".
>>>>>>>>>>>>>>>> For example S3 -> GCS should be in the "google" provider, but GCS -> S3 should
>>>>>>>>>>>>>>>> be in "amazon".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 4) One-side provider transfer operators should be kept at the "provider"
>>>>>>>>>>>>>>>> regardless of whether they are target or source.
>>>>>>>>>>>>>>>> For example GCS -> SFTP or SFTP -> GCS should be in the "google" provider.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 5) If in doubt we will discuss individual cases separately.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> J.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jarek Potiuk
>>>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> M: +48 660 796 129 <+48660796129>
>>>>>>>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
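
For reference, rules 3) and 4) from the original vote email quoted above (where a transfer operator should live) can be written down as a small function. This is only a sketch to make the rule concrete: `transfer_operator_home` and the `PROTOCOLS` set are made-up names for illustration, not anything that exists in Airflow, and which names count as "public protocols" is an assumption here. The spreadsheet linked in the thread remains the actual source of truth.

```python
# Hypothetical helper, NOT part of Airflow: illustrates rules 3) and 4) only.

# Assumption: integrations treated as public protocols with no owning provider.
PROTOCOLS = {"sftp", "ftp", "ssh", "http", "imap", "smtp"}


def transfer_operator_home(source: str, target: str) -> str:
    """Return the package that should host a `source -> target` transfer operator."""
    source_is_protocol = source in PROTOCOLS
    target_is_protocol = target in PROTOCOLS
    if target_is_protocol and not source_is_protocol:
        # Rule 4: only the source side is a provider, so the operator lives there.
        return source
    # Rule 3 (and the remaining half of rule 4): the "target" side wins.
    return target


if __name__ == "__main__":
    for src, dst in [("amazon", "google"), ("google", "amazon"),
                     ("google", "sftp"), ("sftp", "google")]:
        print(f"{src} -> {dst}: lives in {transfer_operator_home(src, dst)}")
```

With these assumptions, S3 -> GCS lands in "google", GCS -> S3 lands in "amazon", and both GCS -> SFTP and SFTP -> GCS land in "google", which matches the examples given in the vote email.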