Hadoop is published by Apache, so it can go in the apache directory. This will mimic the grouping presented in the documentation:
https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks

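To make the layout under discussion concrete, here is a minimal sketch of how a directory position in the proposed tree would map to an import path. The helper name, the module names ("gcs", "hdfs") and the exact nesting for Hadoop are assumptions for illustration only; the thread had not settled them.

```python
# Hypothetical helper: maps a position in the proposed "providers" tree
# to a dotted import path. Nothing here is part of Airflow itself.
KINDS = ("operators", "hooks", "sensors")


def provider_module_path(groups, kind, module):
    """Build a dotted import path under airflow.providers.

    groups: nested directory names, e.g. ("google", "cloud")
    kind:   one of KINDS
    module: module file name without ".py" (illustrative in the calls below)
    """
    if kind not in KINDS:
        raise ValueError("unknown kind: {!r}".format(kind))
    return ".".join(("airflow", "providers") + tuple(groups) + (kind, module))


# google > cloud > operators, as in the tree quoted below
print(provider_module_path(("google", "cloud"), "operators", "gcs"))
# One possible reading of "Hadoop in the apache directory" (an assumption)
print(provider_module_path(("apache", "hadoop", "hdfs"), "hooks", "hdfs"))
```
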
On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxiln...@gmail.com> wrote:

> I think we should keep the vote open at least until mid next week to have
> more thought and inputs on this one.
>
> In general, I am happy with the approach but operators/hooks and sensors
> shouldn't be a provider. "hadoop" can be its provider and hdfs can be a
> part of it.
>
> providers/
>     google
>         cloud
>             operators
>             hooks
>             sensors
>         gsuite
>             operators
>             ...
>     amazon
>         aws
>             operators
>             ...
>     microsoft
>         azure
>             operators
>             ...
>     hadoop
>         hdfs
>             operators
>             ...
>
> We can also define what is a "provider" so we know what to add in it in the
> future. SSH/FTP/SFTP belongs to the same family group. Do we want to have
> separate providers for each one of them ???
>
> Regards,
> Kaxil
>
> On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
> > I really like to make everything a provider. That's a great idea ! This way
> > everything "backportable" will have to be in "providers" package. Really
> > nice and clean separation (and less mess in "airflow"). And we will not
> > have to have any artificial grouping (we can still group them at the
> > documentation level).
> >
> > We do not need backport in name. And I think it's more of technical detail
> > on naming the package which we can work out while reviewing PRs and we can
> > agree final naming of the released packages on PMC level (PMCs will have to
> > vote on releasing those).
> >
> > The thinking is that its intention is really to be only backported to 1.10
> > - we are not going (yet) to use the packages in Airflow 2.*. so I thought
> > by naming them backport we can express that intent more clearly.
> >
> > So let me clarify the structure of folders we are going to have if we
> > follow it (I just added some examples) including the already agreed changes
> > from AIP-21:
> >
> > providers/
> >     google
> >         cloud
> >             operators
> >             hooks
> >             sensors
> >         gsuite
> >             operators
> >             ...
> >     amazon
> >         aws
> >             operators
> >             ...
> >     microsoft
> >         azure
> >             operators
> >             ...
> >     operators
> >         sqlite.py
> >         oracle.py
> >         docker.py
> >     hooks
> >         hdfs.py
> >         sqlite.py
> >     sensors
> >         http.py
> >         sql.py
> >
> > J.
> >
> > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <a...@apache.org> wrote:
> >
> > > Do we need to include `-backport`? What was the thinking behind that?
> > >
> > > I think software and protocol should be merged. I would also say
> > > _everything_ is a provider, so airflow.providers.ssh.SSHOperator for
> > > instance is what I would prefer
> > >
> > > -a
> > >
> > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk <jarek.pot...@polidea.com>
> > > wrote:
> > > >One more day to go. I would love to see some opinions on this AIP-21
> > > >update :).
> > > >
> > > >Executive summary:
> > > >
> > > >* we will be moving a number of integrations to sub-packages of airflow.
> > > >* they will be backportable to 1.10.*. There will be
> > > >'apache-airflow-[package]-backport' pypi installable with python 3 that
> > > >will make Airflow 2.0 operators/hooks etc. available with 1.10*
> > > >operators.
> > > >* the current proposal for sub-packages is "protocols/software/providers/"
> > > >(but if you think merging protocols and software makes sense - please
> > > >express your opinion)
> > > >* we are not moving "fundamental" operators/hooks etc..
> > > >* Airflow 2.0 is still going to be installed as a single package with all
> > > >operators (so we are not yet implementing AIP-8)
> > > >
> > > >J.
> > > >
> > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > > >wrote:
> > > >
> > > >> I think all these cases are valid but maybe I was not super-clear. It's
> > > >> only the transfer operators that we need to decide where to put - not
> > > >> hooks.
> > > >> Usually the complexity of communication with particular storages is (or
> > > >> at least should be) in the Hooks rather than Operators.
> > > >>
> > > >> Operators should be just thin wrappers over the logic in the hooks.
> > > >> Hooks are going to stay where they belong - S3 Hooks in amazon, GCS Hooks
> > > >> in google.cloud, GoogleSheet Hooks in google.gsuite.
> > > >>
> > > >> Since we actually have mono-repo - this will be no problem (and no cross
> > > >> dependencies problem) to have S3 -> GCS operator in google and use hooks
> > > >> from both google/amazon.
> > > >>
> > > >> I hope this alleviates your concern Daniel ?
> > > >>
> > > >> J.
> > > >>
> > > >>> What about GoogleSheetsToS3? GoogleSheetsToGCS? These you would put in
> > > >>> the target, i.e. the storage? But GoogleSheetsToSftp would be in google
> > > >>> sheets operators file? The complexity, and the shared code, are in the
> > > >>> gsheet component -- not into the storage destination.
> > > >>>
> > > >>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > > >>> wrote:
> > > >>>
> > > >>> > Hello Airflow Community,
> > > >>> >
> > > >>> > The email calls for a vote to update AIP-21 Changes in import paths
> > > >>> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > > >>> > with the changes described below. The vote will last till Saturday 8th
> > > >>> > 2am CEST (72 hours). Committers have a binding vote but everyone from
> > > >>> > the community is encouraged to cast an advisory vote.
> > > >>> >
> > > >>> > *Summary*:
> > > >>> >
> > > >>> > The proposal is to update AIP-21 to move all non-core
> > > >>> > operators/hooks/sensors (and related files) to sub-packages within
> > > >>> > airflow (protocols/software/providers) or (software/providers).
> > > >>> > I am also happy to merge protocols+software, so if you have a strong
> > > >>> > opinion on it - please state it with your vote and we can decide based
> > > >>> > on majority.
> > > >>> >
> > > >>> > Those packages will be separately released (schedule/process TBD) and
> > > >>> > will be backportable to 1.10.* airflow series, so that users can
> > > >>> > install them and start using new Airflow 2.0 operators in their
> > > >>> > Python 3 Airflow 1.10 environments (only Python 3.5+ is supported).
> > > >>> >
> > > >>> > We will proceed with migrating the providers package to already agreed
> > > >>> > paths without waiting for the final vote (following current version of
> > > >>> > AIP-21). Since we have working POC - we know the agreed paths will work
> > > >>> > for us.
> > > >>> >
> > > >>> > *Previous discussions:*
> > > >>> >
> > > >>> >    - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
> > > >>> >    - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
> > > >>> >
> > > >>> > *More Details*:
> > > >>> >
> > > >>> > 1) Information that we are going in the direction of AIP-8 but not yet
> > > >>> > reaching it - focusing on separating out backportable packages
> > > >>> > installable in Airflow releases 1.10.*. Airflow 2.0 will still be
> > > >>> > installed as a whole and all the source will be kept in one repo, but
> > > >>> > we now have a way to build backportable packages for groups of
> > > >>> > operators. POC available here:
> > > >>> > https://github.com/apache/airflow/pull/6507 (based on Ash's
> > > >>> > https://github.com/ashb/airflow-submodule-test)
> > > >>> >
> > > >>> > 2) We move all integrations to new packages (keeping deprecated import
> > > >>> > aliases in the old places). The following split (according to
> > > >>> > "stewardship" over the integrations):
> > > >>> >
> > > >>> >    - *fundamentals* - core of Airflow - they are really part of Apache
> > > >>> >    Airflow. Stewards - core Airflow team. Not backportable/separated
> > > >>> >    out.
> > > >>> >    - *protocols* - are not owned by anyone, they are public and the
> > > >>> >    implementation is fully "open". There are no particular stewards (no
> > > >>> >    need). Users of particular protocols should mainly maintain those
> > > >>> >    and add support for different versions of the protocols.
> > > >>> >    - *software* - both API and software are controlled by someone
> > > >>> >    outside of Airflow (commercial or open-source project), but the
> > > >>> >    deployment of that software is "owned" by the user installing
> > > >>> >    Airflow. The "stewardship" might be also the users but the
> > > >>> >    controlling party (Oracle for example) might be interested in
> > > >>> >    maintaining those operators as well.
> > > >>> >    - *providers* - API/software/deployments are fully controlled by a
> > > >>> >    3rd party. Here most likely "provider" will be interested in
> > > >>> >    maintaining the operators (and for example like Google - provide
> > > >>> >    integration guidelines
> > > >>> >    <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
> > > >>> >    for their hooks/operators/sensors)
> > > >>> >
> > > >>> > 3) Between-providers transfer operators should be kept at the "target"
> > > >>> > rather than "source".
> > > >>> > For example S3 -> GCS should be in "google" provider, but GCS -> S3
> > > >>> > should be in "amazon".
> > > >>> >
> > > >>> > 4) One-side provider transfer operators should be kept at the
> > > >>> > "provider" regardless if they are target or source.
> > > >>> > For example GCS -> SFTP or SFTP -> GCS should be in "google" provider.
> > > >>> >
> > > >>> > 5) If in doubt we will discuss individual cases separately.
> > > >>> >
> > > >>> > J.
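
Rules 3) to 5) of the original proposal can be read as a small decision procedure. The following is only a sketch of that reading: the function name and the provider set are hypothetical, and `None` stands for a side that is a protocol or software rather than a provider (SFTP in the proposal's example).

```python
# Hypothetical set of provider names, taken from the examples in the thread.
PROVIDERS = {"google", "amazon", "microsoft"}


def transfer_operator_home(source_owner, target_owner):
    """Return the provider package for a source -> target transfer operator.

    Each argument is the provider owning that side, or None when the side
    is a protocol/software integration. Returns None when no rule applies.
    """
    source_is_provider = source_owner in PROVIDERS
    target_is_provider = target_owner in PROVIDERS

    if source_is_provider and target_is_provider:
        # Rule 3: between-providers transfers live with the target.
        return target_owner
    if target_is_provider:
        # Rule 4: only one side is a provider, so it owns the operator.
        return target_owner
    if source_is_provider:
        # Rule 4 again, with the provider on the source side.
        return source_owner
    # Rule 5: neither rule applies, discuss the case separately.
    return None


print(transfer_operator_home("amazon", "google"))  # S3 -> GCS
print(transfer_operator_home("google", None))      # GCS -> SFTP
```

Note that this encodes the rules as first proposed; the later replies in this thread suggest treating everything as a provider, which would leave only rule 3 and the fallback.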
> > > >>> >
> > > >>> > --
> > > >>> >
> > > >>> > Jarek Potiuk
> > > >>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >>> >
> > > >>> > M: +48 660 796 129 <+48660796129>
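
Point 2) of the original proposal keeps deprecated import aliases in the old places. One way this could work is sketched below, assuming the old location simply re-exports a thin subclass that warns on use. The helper, the stand-in operator class and both paths are illustrative assumptions, not the mechanism the proposal or Airflow specifies.

```python
import warnings


class SSHOperator:
    """Stand-in for an operator that has moved to its new package."""

    def __init__(self, command):
        self.command = command


def deprecated_alias(new_cls, old_path, new_path):
    """Return a subclass of new_cls that warns whenever it is instantiated."""

    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                "{} is deprecated; import from {} instead.".format(old_path, new_path),
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            super().__init__(*args, **kwargs)

    _Deprecated.__name__ = new_cls.__name__
    return _Deprecated


# What the module left at the old location might export (paths are examples).
OldSSHOperator = deprecated_alias(
    SSHOperator,
    old_path="airflow.contrib.operators.ssh_operator",
    new_path="airflow.providers.ssh",
)
```

Existing DAGs that import from the old path would keep working, and the warning tells users where the class now lives.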