Just to select the "packages" for this update: does anyone have objections
to this structure? Details, including transfer operators, are in

https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0

*Fundamentals (no change)*

providers/
    google
         cloud
         gsuite
         marketing_platform
    amazon
         aws
    microsoft
         azure
    apache
         cassandra
         druid
         hadoop
         hive
         pig
         pinot
         spark
         sqoop
    mysql
    jira
    databricks
    datadog
    dingding
    discord
    cloudant
    jenkins
    opsgenie
    qubole
    salesforce
    segment
    slack
    snowflake
    vertica
    zendesk
    celery
    docker
    bash
    kubernetes
    mssql
    mongodb
    mysql
    openfaas
    oracle
    papermill
    postgres
    presto
    python
    redis
    samba
    sqlite
    imap
    ssh
    filesystem
    sftp
    ftp
    http
    grpc
    smtp
    jdbc
    winrm
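
To make the resulting package paths concrete, here is a minimal sketch (not
part of the proposal itself) that flattens a subset of the list above into
dotted package names. It assumes the root is `airflow.providers`, as in the
`airflow.providers.ssh.SSHOperator` example quoted further down; `PROVIDERS`
and `package_paths` are hypothetical names used only for illustration.

```python
# Hypothetical subset of the package list; nothing here is final.
PROVIDERS = {
    "google": ["cloud", "gsuite", "marketing_platform"],
    "amazon": ["aws"],
    "microsoft": ["azure"],
    "apache": ["cassandra", "druid", "hadoop", "hive",
               "pig", "pinot", "spark", "sqoop"],
    # Entries without sub-packages sit directly under "providers".
    "ssh": [],
    "sftp": [],
    "http": [],
}


def package_paths(tree, root="airflow.providers"):
    """Return one dotted package path per leaf of the tree."""
    paths = []
    for name, children in tree.items():
        if children:
            # Grouped by owner: airflow.providers.<owner>.<service>
            paths.extend(f"{root}.{name}.{child}" for child in children)
        else:
            # No owner grouping: airflow.providers.<service>
            paths.append(f"{root}.{name}")
    return paths


for path in package_paths(PROVIDERS):
    print(path)
```

The point of the sketch is only that grouped and ungrouped packages end up
at different depths, which is the trade-off discussed in the quoted message.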



On Fri, Nov 8, 2019 at 5:47 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Let me then cancel this vote and I will restart it next week.
>
> Yeah. It's a bit like re-opening Pandora's box, but now that we know
> that we can do it, and we are unblocked in moving to google (which is now
> the biggest move in progress), we can spend more time on getting a better
> (and more final) consensus.
> I decided to go through the list from the docs (once again Kamil - great
> that you did it) and prepared this spreadsheet showing the structure. I
> went through ALL the operators and put them in the right place where our
> current rules place them.
>
> After this exercise, I think the following makes sense:
> - put everything except fundamentals in *"providers"* (everything
> in "providers" will be potentially backportable).
> - group apache projects under *"apache"*, similar to
> google/amazon/microsoft (a different kind of ownership, but still
> ownership).
> - for the rest, I think we can simply put the operators in
> folders per "service/company" (without sub-packages). That includes
> sftp/ssh/ftp etc. (should we group [ftp and sftp] or [ssh and sftp]?).
> There is no "ownership" there and no reason to group them. That will put
> "operators/hooks/sensors" at different levels in the directory tree, but we
> already have that for fundamentals and I am not too worried about that. We
> do not have to have everything at the same level.
> - I placed transfer operators according to the rule that the "to" side is
> more important unless the other side is a public protocol (so sftp -> gcs
> and gcs -> sftp both go to google/gcp). I did not have any doubt where to
> put any transfer operator, so this is a good sign:
>
>
> https://docs.google.com/spreadsheets/d/17zA5t2JVxnDdg5Cs1Cg_Mb1GXvGctmesfg2L089QSOk/edit#gid=0
>
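
The transfer-operator rule quoted just above can be sketched as a small
function. This is only an illustration under stated assumptions: the
`PROTOCOLS` set, the `OWNER` mapping and the name `transfer_home` are
hypothetical and do not come from the spreadsheet or from AIP-21.

```python
# Hypothetical lookup tables, for illustration only.
PROTOCOLS = {"sftp", "ftp", "ssh", "http", "imap", "smtp"}
OWNER = {"gcs": "google", "s3": "amazon", "sftp": "sftp", "ftp": "ftp"}


def transfer_home(source, target):
    """Pick the package that should hold a source -> target transfer operator."""
    if target in PROTOCOLS and source not in PROTOCOLS:
        # The "to" side is a public protocol, so the operator stays
        # with the package that owns the source service.
        return OWNER[source]
    # Otherwise the "to" side decides.
    return OWNER[target]


print(transfer_home("sftp", "gcs"))
print(transfer_home("gcs", "sftp"))
print(transfer_home("s3", "gcs"))
print(transfer_home("gcs", "s3"))
```

With these assumed tables, both sftp -> gcs and gcs -> sftp resolve to the
google package, while s3 -> gcs and gcs -> s3 resolve to google and amazon
respectively, matching the examples given in the thread.
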
> Can you please take a look and express your opinions here so that we can
> have a final vote next week (for those who are not yet tired of the
> discussion ;)).
>
> J.
>
> On Fri, Nov 8, 2019 at 4:38 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
>> Yes, that makes sense.
>>
>> On Fri, Nov 8, 2019 at 3:22 PM Kamil Breguła <kamil.breg...@polidea.com>
>> wrote:
>>
>> > In the case of Hadoop, it is published by Apache, so it can be in the
>> > apache directory.  This will mimic the grouping presented in the
>> > documentation:
>> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html#software-operators-and-hooks
>> >
>> > On Fri, Nov 8, 2019 at 3:47 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
>> > >
>> > > I think we should keep the vote open at least until mid next week to
>> > > have more thought and input on this one.
>> > >
>> > > In general, I am happy with the approach, but operators/hooks and
>> > > sensors shouldn't be a provider. "hadoop" can be its own provider and
>> > > hdfs can be a part of it.
>> > >
>> > > providers/
>> > >     google
>> > >          cloud
>> > >              operators
>> > >              hooks
>> > >              sensors
>> > >          gsuite
>> > >              operators
>> > >              ...
>> > >     amazon
>> > >          aws
>> > >              operators
>> > >              ...
>> > >     microsoft
>> > >          azure
>> > >              operators
>> > >              ...
>> > >     hadoop
>> > >         hdfs
>> > >              operators
>> > >              ...
>> > >
>> > > We can also define what a "provider" is, so we know what to add to it
>> > > in the future. SSH/FTP/SFTP belong to the same family. Do we want to
>> > > have separate providers for each one of them?
>> > >
>> > > Regards,
>> > > Kaxil
>> > >
>> > > On Fri, Nov 8, 2019 at 9:08 AM Jarek Potiuk <jarek.pot...@polidea.com>
>> > > wrote:
>> > >
>> > > > I really like making everything a provider. That's a great idea!
>> > > > This way everything "backportable" will have to be in the
>> > > > "providers" package. Really nice and clean separation (and less mess
>> > > > in "airflow"). And we will not have to have any artificial grouping
>> > > > (we can still group them at the documentation level).
>> > > >
>> > > > We do not need "backport" in the name. I think it's more of a
>> > > > technical detail of naming the package, which we can work out while
>> > > > reviewing PRs, and we can agree on the final naming of the released
>> > > > packages at the PMC level (PMCs will have to vote on releasing those).
>> > > >
>> > > > The thinking is that the intention is really for them to be
>> > > > backported only to 1.10 - we are not going (yet) to use the packages
>> > > > in Airflow 2.*, so I thought that by naming them backport we can
>> > > > express that intent more clearly.
>> > > >
>> > > > So let me clarify the structure of folders we are going to have if
>> > > > we follow it (I just added some examples), including the already
>> > > > agreed changes from AIP-21:
>> > > >
>> > > > providers/
>> > > >     google
>> > > >          cloud
>> > > >              operators
>> > > >              hooks
>> > > >              sensors
>> > > >          gsuite
>> > > >              operators
>> > > >              ...
>> > > >     amazon
>> > > >          aws
>> > > >              operators
>> > > >              ...
>> > > >     microsoft
>> > > >          azure
>> > > >              operators
>> > > >              ...
>> > > >     operators
>> > > >          sqlite.py
>> > > >          oracle.py
>> > > >          docker.py
>> > > >     hooks
>> > > >          hdfs.py
>> > > >          sqlite.py
>> > > >     sensors
>> > > >          http.py
>> > > >          sql.py
>> > > >
>> > > >
>> > > > J.
>> > > >
>> > > > On Fri, Nov 8, 2019 at 9:43 AM Ash Berlin-Taylor <a...@apache.org>
>> > > > wrote:
>> > > >
>> > > > > Do we need to include `-backport`? What was the thinking behind
>> > > > > that?
>> > > > >
>> > > > > I think software and protocol should be merged. I would also say
>> > > > > _everything_ is a provider, so airflow.providers.ssh.SSHOperator,
>> > > > > for instance, is what I would prefer.
>> > > > >
>> > > > > -a
>> > > > >
>> > > > > On 8 November 2019 08:32:42 GMT, Jarek Potiuk
>> > > > > <jarek.pot...@polidea.com> wrote:
>> > > > > >One more day to go. I would love to see some opinions on this
>> > > > > >AIP-21 update :).
>> > > > > >
>> > > > > >Executive summary:
>> > > > > >
>> > > > > >* we will be moving a number of integrations to sub-packages of
>> > > > > >airflow.
>> > > > > >* they will be backportable to 1.10.*. There will be
>> > > > > >'apache-airflow-[package]-backport' packages, installable from
>> > > > > >PyPI with Python 3, that will make Airflow 2.0 operators/hooks
>> > > > > >etc. available with 1.10* operators.
>> > > > > >* the current proposal for sub-packages is
>> > > > > >"protocols/software/providers/" (but if you think merging
>> > > > > >protocols and software makes sense, please express your opinion).
>> > > > > >* we are not moving "fundamental" operators/hooks etc.
>> > > > > >* Airflow 2.0 is still going to be installed as a single package
>> > > > > >with all operators (so we are not yet implementing AIP-8).
>> > > > > >
>> > > > > >J.
>> > > > > >
>> > > > > >On Wed, Nov 6, 2019 at 10:07 AM Jarek Potiuk
>> > > > > ><jarek.pot...@polidea.com> wrote:
>> > > > > >
>> > > > > >> I think all these cases are valid, but maybe I was not super
>> > > > > >> clear. It's only the transfer operators that we need to decide
>> > > > > >> where to put - not hooks.
>> > > > > >> Usually the complexity of communication with particular storages
>> > > > > >> is (or at least should be) in the Hooks rather than Operators.
>> > > > > >>
>> > > > > >> Operators should be just thin wrappers over the logic in the
>> > > > > >> hooks. Hooks are going to stay where they belong - S3 Hooks in
>> > > > > >> amazon, GCS Hooks in google.cloud, GoogleSheet Hooks in
>> > > > > >> google.gsuite.
>> > > > > >>
>> > > > > >> Since we actually have a mono-repo, it will be no problem (and
>> > > > > >> no cross-dependency problem) to have the S3 -> GCS operator in
>> > > > > >> google and use hooks from both google/amazon.
>> > > > > >>
>> > > > > >> I hope this alleviates your concern, Daniel?
>> > > > > >>
>> > > > > >> J.
>> > > > > >>
>> > > > > >>
>> > > > > >>> What about GoogleSheetsToS3?  GoogleSheetsToGCS?  These you
>> > > > > >>> would put in the target, i.e. the storage?  But
>> > > > > >>> GoogleSheetsToSftp would be in the google sheets operators
>> > > > > >>> file?  The complexity, and the shared code, are in the gsheet
>> > > > > >>> component -- not in the storage destination.
>> > > > > >>>
>> > > > > >>>
>> > > > > >>
>> > > > > >>
>> > > > > >>
>> > > > > >>> On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk
>> > > > > >>> <jarek.pot...@polidea.com> wrote:
>> > > > > >>>
>> > > > > >>> > Hello Airflow Community,
>> > > > > >>> >
>> > > > > >>> > This email calls for a vote to update AIP-21 Changes in
>> > > > > >>> > import paths
>> > > > > >>> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
>> > > > > >>> > with the changes described below. The vote will last till
>> > > > > >>> > Saturday 8th 2am CEST (72 hours). Committers have a binding
>> > > > > >>> > vote but everyone from the community is encouraged to cast
>> > > > > >>> > an advisory vote.
>> > > > > >>> >
>> > > > > >>> > *Summary*:
>> > > > > >>> >
>> > > > > >>> > The proposal is to update AIP-21 to move all non-core
>> > > > > >>> > operators/hooks/sensors (and related files) to sub-packages
>> > > > > >>> > within airflow (protocols/software/providers) or
>> > > > > >>> > (software/providers).
>> > > > > >>> > I am also happy to merge protocols+software, so if you have
>> > > > > >>> > a strong opinion on it, please state it with your vote and
>> > > > > >>> > we can decide based on the majority.
>> > > > > >>> >
>> > > > > >>> > Those packages will be released separately (schedule/process
>> > > > > >>> > TBD) and will be backportable to the 1.10.* airflow series,
>> > > > > >>> > so that users can install them and start using new Airflow
>> > > > > >>> > 2.0 operators in their Python 3 Airflow 1.10 environments
>> > > > > >>> > (only Python 3.5+ is supported).
>> > > > > >>> >
>> > > > > >>> > We will proceed with migrating the providers package to the
>> > > > > >>> > already agreed paths without waiting for the final vote
>> > > > > >>> > (following the current version of AIP-21). Since we have a
>> > > > > >>> > working POC, we know the agreed paths will work for us.
>> > > > > >>> >
>> > > > > >>> > *Previous discussions:*
>> > > > > >>> >
>> > > > > >>> >    - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
>> > > > > >>> >    - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
>> > > > > >>> >
>> > > > > >>> > *More Details*:
>> > > > > >>> >
>> > > > > >>> > 1) Information that we are going in the direction of AIP-8
>> > > > > >>> > but not yet reaching it - focusing on separating out
>> > > > > >>> > backportable packages installable in Airflow releases
>> > > > > >>> > 1.10.*. Airflow 2.0 will still be installed as a whole and
>> > > > > >>> > all the source will be kept in one repo, but we now have a
>> > > > > >>> > way to build backportable packages for groups of operators.
>> > > > > >>> > POC available here:
>> > > > > >>> > https://github.com/apache/airflow/pull/6507 (based on Ash's
>> > > > > >>> > https://github.com/ashb/airflow-submodule-test)
>> > > > > >>> >
>> > > > > >>> > 2) We move all integrations to new packages (keeping
>> > > > > >>> > deprecated import aliases in the old places), with the
>> > > > > >>> > following split (according to "stewardship" over the
>> > > > > >>> > integrations):
>> > > > > >>> >
>> > > > > >>> >    - *fundamentals* - core of airflow - they are really
>> > > > > >>> >    part of Apache Airflow. Stewards - core Airflow team.
>> > > > > >>> >    Not backportable/separated out.
>> > > > > >>> >    - *protocols* - are not owned by anyone, they are public
>> > > > > >>> >    and the implementation is fully "open". There are no
>> > > > > >>> >    particular stewards (no need). Users of particular
>> > > > > >>> >    protocols should mainly maintain those and add support
>> > > > > >>> >    for different versions of the protocols.
>> > > > > >>> >    - *software* - both API and software are controlled by
>> > > > > >>> >    someone outside of Airflow (commercial or open-source
>> > > > > >>> >    project), but the deployment of that software is "owned"
>> > > > > >>> >    by the user installing Airflow. The "stewards" might also
>> > > > > >>> >    be the users, but the controlling party (Oracle for
>> > > > > >>> >    example) might be interested in maintaining those
>> > > > > >>> >    operators as well.
>> > > > > >>> >    - *providers* - API/software/deployments are fully
>> > > > > >>> >    controlled by a 3rd party. Here most likely the
>> > > > > >>> >    "provider" will be interested in maintaining the
>> > > > > >>> >    operators (and, for example, like Google, provide
>> > > > > >>> >    integration guidelines
>> > > > > >>> >    <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
>> > > > > >>> >    for their hooks/operators/sensors)
>> > > > > >>> >
>> > > > > >>> >
>> > > > > >>> > 3) Between-providers transfer operators should be kept at
>> > > > > >>> > the "target" rather than the "source".
>> > > > > >>> > For example S3 -> GCS should be in the "google" provider,
>> > > > > >>> > but GCS -> S3 should be in "amazon".
>> > > > > >>> >
>> > > > > >>> > 4) One-side provider transfer operators should be kept at
>> > > > > >>> > the "provider" regardless of whether they are target or
>> > > > > >>> > source.
>> > > > > >>> > For example GCS -> SFTP or SFTP -> GCS should be in the
>> > > > > >>> > "google" provider.
>> > > > > >>> >
>> > > > > >>> > 5) If in doubt we will discuss individual cases separately.
>> > > > > >>> >
>> > > > > >>> > J.
>> > > > > >>> >
>> > > > > >>> > --
>> > > > > >>> >
>> > > > > >>> > Jarek Potiuk
>> > > > > >>> > Polidea <https://www.polidea.com/> | Principal Software
>> > Engineer
>> > > > > >>> >
>> > > > > >>> > M: +48 660 796 129 <+48660796129>
>> > > > > >>> > [image: Polidea] <https://www.polidea.com/>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
