
> For example S3 -> GCS should be in "google" provider, but GCS -> S3
> should be in "amazon".

So if there were a BigQueryToS3 or SnowflakeToS3 operator, would you put
those in AWS?

I feel like storage should be a secondary consideration when it comes to
naming and placing these operators.

Using snowflake as an example, we might have export operator variations
like SnowflakeToS3, SnowflakeToGCS, SnowflakeToAzureBlobStorage.  In my
view these would make sense in the same file as a BaseSnowflakeOperator, in
a snowflake operators module -- not in the target.  The storage component
for this kind of operator is secondary.
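
To make the grouping concrete, here is a minimal sketch of what I mean
(the module path and class bodies are hypothetical, just to show the
shape):

    # airflow/providers/snowflake/operators/snowflake.py (hypothetical path)
    from airflow.models import BaseOperator

    class BaseSnowflakeOperator(BaseOperator):
        """Shared Snowflake connection and export logic lives here."""

    class SnowflakeToS3Operator(BaseSnowflakeOperator):
        """Export from Snowflake to S3 -- only the upload step differs."""

    class SnowflakeToGCSOperator(BaseSnowflakeOperator):
        """Export from Snowflake to GCS."""

    class SnowflakeToAzureBlobStorageOperator(BaseSnowflakeOperator):
        """Export from Snowflake to Azure Blob Storage."""

All the variants share the Snowflake export logic; only the thin upload
step differs per storage backend.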

What about GoogleSheetsToS3?  GoogleSheetsToGCS?  These you would put in
the target, i.e. the storage?  But GoogleSheetsToSftp would be in a google
sheets operators file?  The complexity, and the shared code, are in the
gsheet component -- not in the storage destination.
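
To make the scattering concrete (the module paths here are hypothetical,
only meant to illustrate the two options):

    Grouped by target/storage:
        airflow/providers/amazon/...  : GoogleSheetsToS3Operator
        airflow/providers/google/...  : GoogleSheetsToGCSOperator
        airflow/providers/google/...  : GoogleSheetsToSftpOperator
                                        (per rule 4, since sftp is not
                                        a provider)

    Grouped by source:
        airflow/providers/google/...  : all GoogleSheetsTo* operators,
                                        next to the shared Google Sheets
                                        hook code

The first layout splits one family of operators across providers; the
second keeps the shared gsheet code in one place.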

On Tue, Nov 5, 2019 at 5:46 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Hello Airflow Community,
>
> The email calls for a vote to update AIP-21 Changes in import paths
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> with the changes described below. The vote will last until Saturday the
> 8th, 2 am CEST (72 hours). Committers have a binding vote but everyone
> from the community is encouraged to cast an advisory vote.
>
> *Summary*:
>
> The proposal is to update AIP-21 to move all non-core
> operators/hooks/sensors (and related files) to sub-packages within airflow
> (protocols/software/providers) or (software/providers).
> I am also happy to merge protocols+software, so if you have a strong
> opinion on it - please state it with your vote and we can decide based on
> majority.
>
> Those packages will be separately released (schedule/process TBD) and will
> be backportable to the 1.10.* Airflow series, so that users can install
> them and start using the new Airflow 2.0 operators in their Python 3
> Airflow 1.10 environments (only Python 3.5+ is supported).
>
> We will proceed with migrating the providers package to the already agreed
> paths without waiting for the final vote (following the current version of
> AIP-21). Since we have a working POC - we know the agreed paths will work
> for us.
>
> *Previous discussions: *
>
>    - https://lists.apache.org/thread.html/b07a93c9114e3d3c55d4ee514955bac79bc012c7a00db627c6b4c55f@%3Cdev.airflow.apache.org%3E
>    - https://lists.apache.org/thread.html/e25ddc546e367a4af3e594fecbd4431959bd5a89045e748e4206e7ff@%3Cdev.airflow.apache.org%3E
>
> *More Details*:
>
> 1) Information that we are going in the direction of AIP-8 but not yet
> reaching it - focusing on separating out backportable packages installable
> in Airflow 1.10.* releases. Airflow 2.0 will still be installed as a whole
> and all the source will be kept in one repo, but we now have a way to build
> backportable packages for groups of operators. POC available here:
> https://github.com/apache/airflow/pull/6507 (based on Ash's
> https://github.com/ashb/airflow-submodule-test)
>
> 2) We move all integrations to new packages (keeping deprecated import
> aliases in the old places). The split is as follows (according to
> "stewardship" over the integrations):
>
>    - *fundamentals* - core of Airflow - they are really part of Apache
>    Airflow. Stewards - core Airflow team. Not backportable/separated out.
>    - *protocols* - are not owned by anyone, they are public and the
>    implementation is fully "open". There are no particular stewards (no
>    need). Users of particular protocols should mainly maintain those and
>    add support for different versions of the protocols.
>    - *software* - both the API and the software are controlled by someone
>    outside of Airflow (a commercial or open-source project), but the
>    deployment of that software is "owned" by the user installing Airflow.
>    The "stewardship" might also lie with the users, but the controlling
>    party (Oracle, for example) might be interested in maintaining those
>    operators as well.
>    - *providers* - API/software/deployments are fully controlled by a 3rd
>    party. Here most likely the "provider" will be interested in maintaining
>    the operators (and, like Google for example, provide integration
>    guidelines
>    <https://docs.google.com/document/d/1_rTdJSLCt0eyrAylmmgYc3yZr-_h51fVlnvMmWqhCkY/edit?usp=drive_web&ouid=112320280470690058978>
>    for their hooks/operators/sensors)
>
>
> 3) Between-providers transfer operators should be kept at the "target"
> rather than the "source".
> For example S3 -> GCS should be in the "google" provider, but GCS -> S3
> should be in the "amazon" provider.
>
> 4) One-side provider transfer operators should be kept at the "provider"
> regardless of whether they are the target or the source.
> For example GCS -> SFTP or SFTP -> GCS should be in the "google" provider.
>
> 5) If in doubt we will discuss individual cases separately.
>
> J.
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
>
