Also, Ansible has something similar:
https://github.com/ansible/ansible/tree/devel/lib/ansible/modules

Generally, I have been inspired by how Terraform and Ansible have
implemented this, and both can serve as inspiration for us.

On Tue, Oct 29, 2019 at 12:51 PM Ash Berlin-Taylor <a...@apache.org> wrote:

> Also providers and SAAS could be merged (taking inspiration from Terraform
> here: https://www.terraform.io/docs/providers/index.html - ignore the menu
> on the left, that is just for the Docs layout, which we could do too -- Docs
> grouping doesn't have to match code grouping 100%)
>
> I would favour fewer sub-packages rather than more. My only reason for
> suggesting providers was to make it clear, when looking at the code, what
> the purpose of a module is. If "everything" lived under
> airflow.providers.{$major_cloud,core} I would be okay with that.
>
> Can we talk in specifics here too? What package namespaces are you
> suggesting?
>
> -ash
>
>
> On 29 October 2019 12:02:54 GMT, "Driesprong, Fokko" <fo...@driesprong.frl>
> wrote:
> Thanks Jarek for clearing that up.
>
> Personally I would omit the Apache one. We should not step into the same
> fallacy as before, where it was never clear whether something belonged in
> contrib or not. I would even consider merging software and protocols, as it
> is not entirely clear what counts as a protocol. In the end, everything is
> a protocol; it might be high-level (FTP) or low-level (FS).
>
> Cheers, Fokko
>
> On Tue, 29 Oct 2019 at 12:45, Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
>  Yep. We should definitely discuss the split!
>
>  For me these are the criteria:
>
>     - fundamentals - all the operators/hooks/sensors that are the
>     "core" of Airflow (base, dbapi), allow you to run the basic examples,
>     implement the basic logic of Airflow (subdags, branch, etc.), plus
>     generic operators that serve as bases for others (like generic
>     transfer/sql)
>     - providers - integrations with cloud providers (PAAS)
>     - apache - integrations with other Apache Software Foundation projects
>     - software - integrations with other software, proprietary or
>     open-source, that you can install on-premises (or in the cloud)
>     - protocols - integrations with protocols that can be implemented by
>     any software (SFTP/mail/etc.)
>     - services - integrations with SAAS solutions
>
>  From the above list I only have doubts about the "apache" one - the
>  question is whether, as part of the Apache community, we want to group
>  those somehow.
>
>  J.
>
>
>  On Tue, Oct 29, 2019 at 11:19 AM Bas Harenslak <
>  basharens...@godatadriven.com> wrote:
>
>    1.  Sounds good to me
>    2.  Also fine
>    3.  We should have some consensus here. E.g. I’m not sure what groups
>  “fundamentals” and “software” are meant to be :-)
>
>  While we’re at it: we should really move the BaseOperator out of models.
>  The BaseOperator has no representation in the DB and should live together
>  with the code where it belongs, i.e. something like
>  airflow.operators.base_operator.
>
>  Bas
>
>  On 29 Oct 2019, at 10:43, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
>  After some consideration, and seeing the actual move in practice, I want
>  to propose a 3rd amendment ;) to AIP-21.
>  I have a few observations from the discussions and from watching the
>  actual moving process. I have the following proposals:
>
>  *1) Between-providers transfer operators should be kept at the "target"
>  rather than the "source"*
>
>  If we end up splitting operators by groups (AIP-8 and the proposed
>  backporting to Airflow 1.10), I think it makes more sense to keep transfer
>  operators in the "target" package - for example, the "S3 to GCS" operator
>  in the "providers/google" package - simply because the individuals working
>  on the pure "GCP" services are more likely to also be interested in
>  getting data in from other cloud providers, and those providers will
>  likely even have transfer services that can be used for that purpose
>  (rather than using a worker to transfer the data). In the particular
>  S3 -> GCS case we have GCP's
>  https://cloud.google.com/storage-transfer/docs/overview which allows
>  transferring data from any other cloud provider to GCS. The same applies
>  if we imagine, for example, Athena -> BigQuery. At least that's the
>  feeling I have. I can imagine that the kind of "stewardship" over those
>  groups of operators can be somewhat influenced, and maybe even performed,
>  by those cloud providers themselves. The corresponding hooks, of course,
>  should be in different "groups".
>
>  2) *One-side provider-neutral transfer operators should be kept at the
>  "provider", regardless of whether it is the target or the source.*
>
>  For example GCS -> SFTP or SFTP -> GCS. There the hook for SFTP should be
>  in the "core" package, but both operators should be in "providers/google".
>  The reason is much the same as above - the "stewardship" over all these
>  operators can be done by the "provider" group.
>
>  *3) Grouping non-provider operators/hooks according to their purpose.*
>
>  I think it is also the right time to move the other operators/hooks into
>  different groups within core. We already have some reasonable and nice
>  groups proposed in the new documentation by Kamil,
>  https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html,
>  and it only makes sense to move those now (Fundamentals, ASF: Apache
>  Software Foundation, Azure: Microsoft Azure, AWS: Amazon Web Services,
>  GCP: Google Cloud Platform, Service integrations, Software integrations,
>  Protocol integrations). I think it would make sense to use the same
>  approach in the code: we could have
>  fundamentals / asf / azure (microsoft/azure?) / aws (amazon/aws?) /
>  google / services / software / protocols
>  packages.
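As a sketch, one possible source layout under that proposal (the package
names are still open questions in this thread, so these are placeholders,
not a decided structure):

```
airflow/
├── fundamentals/   # core operators/hooks/sensors: base, dbapi, branch, subdag
├── asf/            # other Apache Software Foundation projects
├── aws/            # or amazon/aws?
├── azure/          # or microsoft/azure?
├── google/         # gcp, gsuite, ... sub-packages
├── services/       # SAAS integrations
├── software/       # on-premises software integrations
└── protocols/      # SFTP, mail, FS, ...
```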
>
>  There will probably be a few exceptions, but we can handle them on a
>  case-by-case basis.
>
>  J.
>
>  On Fri, Oct 11, 2019 at 3:11 PM Jarek Potiuk <jarek.pot...@polidea.com>
>  wrote:
>
>  Hello everyone. I updated AIP-21 and its examples.
>
>
>  Point D. of AIP-21 is now as follows:
>
>
>
>  D. Group operators/sensors/hooks in
>  airflow/providers/<PROVIDER>/operators (sensors, hooks).
>
>  Each provider can define its own internal structure for that package. For
>  example, in the case of the "google" provider the packages will be further
>  grouped into "gcp", "gsuite", and "core" sub-packages.
>
>  In the case of transfer operators where two providers are involved, the
>  transfer operators will be moved to the "source" of the transfer. When
>  there is only one provider as the target, but the source is a database or
>  another non-provider source, the operator is put in the target provider.
>
>  Non-cloud-provider ones are moved to airflow/operators (sensors/hooks).
>  Drop the prefix.
>
>  Examples:
>
>  AWS operator:
>
>    - airflow/contrib/operators/sns_publish_operator.py
>      becomes airflow/providers/aws/operators/sns_publish_operator.py
>
>  Google GCP operator:
>
>    - airflow/contrib/operators/dataproc_operator.py
>      becomes airflow/providers/google/gcp/operators/dataproc_operator.py
>
>  Previously GCP-prefixed operator:
>
>    - airflow/contrib/operators/gcp_bigtable_operator.py
>      becomes airflow/providers/google/gcp/operators/bigtable_operator.py
>
>  Transfer from GCP:
>
>    - airflow/contrib/operators/gcs_to_s3_operator.py
>      becomes airflow/providers/google/gcp/operators/gcs_to_s3_operator.py
>
>  MySQL to GCS:
>
>    - airflow/contrib/operators/mysql_to_gcs_operator.py
>      becomes airflow/providers/google/gcp/operators/mysql_to_gcs_operator.py
>
>  SSH operator:
>
>    - airflow/contrib/operators/ssh_operator.py
>      becomes airflow/operators/ssh_operator.py
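Renames like these usually ship with backwards-compatibility shims left at
the old import paths. A self-contained sketch of that pattern follows; the
module and class names are stand-ins for illustration, not the actual
AIP-21 implementation:

```python
import sys
import types
import warnings

# Register a stand-in for the relocated module so this sketch runs without
# Airflow installed; in reality this would be the real moved module.
new_mod = types.ModuleType("new_ssh_module")

class SSHOperator:  # placeholder for the relocated class
    pass

new_mod.SSHOperator = SSHOperator
sys.modules["new_ssh_module"] = new_mod

# Body of the shim left behind at the old path: emit a DeprecationWarning,
# then re-export the names from their new location.
def old_path_shim():
    warnings.warn(
        "this module has moved; import from the new providers package",
        DeprecationWarning,
        stacklevel=2,
    )
    from new_ssh_module import SSHOperator
    return SSHOperator

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = old_path_shim()

print(cls is SSHOperator)           # old path still yields the new class
print(caught[0].category.__name__)  # users see a DeprecationWarning
```

Old DAGs keep importing from the old path and get the relocated class,
while the warning nudges them toward the new package.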
>
>
>  On Fri, Oct 4, 2019 at 6:22 PM Jarek Potiuk <jarek.pot...@polidea.com>
>  wrote:
>
>  Yeah. I think the important point is that the latest doc changes by Kamil
>  index all available operators and hooks nicely and make them easy to find.
>
>  That also includes (as of today) an automated CI check that newly added
>  operators and hooks are also added to the documentation:
>
>
>
> https://github.com/apache/airflow/commit/104a151d6a19b1ba1281cb00c66a2c3409e1bb13
>
>  J.
>
>  On Fri, Oct 4, 2019 at 5:21 PM Chris Palmer <ch...@crpalmer.com> wrote:
>
>  It's not obvious to me why an S3ToMsSQLOperator in the aws package is
>  "silly". Why do you say it made sense to create a MsSqlFromS3Operator?
>
>  Basically all of these operators could be thought of as "move data from A
>  to B" or "move data to B from A". I think what feels natural to each
>  individual will depend on what their frame of reference is, and where
>  their main focus is. If you are largely focused on MsSql then I can
>  understand that it's natural to think "What MsSql operators are there?"
>  and to not see S3ToMsSqlOperator as one of those MsSql operators. That's
>  exactly the point I made in my earlier response; I was so focused on
>  BigQuery that I didn't think to look under the Cloud Storage documentation
>  for the GoogleCloudStorageToBigQueryOperator.
>
>  I think it is too hard to draw a distinct line between what is just
>  "storage" and what is more. There are going to be fuzzy edge cases, so
>  picking a single convention is going to be much less hassle in my view. As
>  long as that convention is well documented, and the documentation is
>  improved so that it's easier to find all operators that relate to BigQuery
>  or MsSql etc. in one place (as is being done by Kamil), then that is the
>  best we can do.
>
>  Chris
>
>
>
>  On Fri, Oct 4, 2019 at 10:55 AM Daniel Standish <dpstand...@gmail.com>
>  wrote:
>
>  One case popped up for us recently, where it made sense to make a
>  MsSqlFromS3Operator.
>
>  I think using "source" makes sense in general, but in this case calling
>  this an S3ToMsSqlOperator and putting it under AWS seems silly, even
>  though you could say S3 is the "source" here.
>
>  I think in most of these cases we say "let's use source" because source is
>  where the actual work is done and the destination is just storage.
>
>  Does a guideline saying "ignore storage" or "storage is secondary in
>  object location" make sense?
>
>
>
>  On Fri, Oct 4, 2019 at 6:42 AM Jarek Potiuk <jarek.pot...@polidea.com>
>  wrote:
>
>  It looks like we have general consensus about putting transfer operators
>  into the "source provider" package.
>  That's great for me as well.
>
>  Since I will be updating AIP-21 to reflect the "google" vs. "gcp" case, I
>  will also update it to add this decision.
>
>  If no one objects (Lazy Consensus
>  <https://community.apache.org/committers/lazyConsensus.html>) until
>  Monday, the 7th of October, 3.20 CEST, we will update AIP-21 with the
>  information that transfer operators should be placed in the "source"
>  provider module.
>
>  J.
>
>  On Tue, Sep 24, 2019 at 1:34 PM Kamil Breguła <kamil.breg...@polidea.com>
>  wrote:
>
>  On Mon, Sep 23, 2019 at 7:42 PM Chris Palmer <ch...@crpalmer.com>
>  wrote:
>
>  On Mon, Sep 23, 2019 at 1:22 PM Kamil Breguła <kamil.breg...@polidea.com>
>  wrote:
>
>  On Mon, Sep 23, 2019 at 7:04 PM Chris Palmer <ch...@crpalmer.com> wrote:
>
>  Is there a reason why we can't use symlinks to have copies of the files
>  show up in both subpackages? So that `gcs_to_s3.py` would be under both
>  `aws/operators/` and `gcp/operators`. I could imagine there may be
>  technical reasons why this is a bad idea, but just thought I would ask.
>
>  Symlinks are not supported by git.
>
>
>  Why do you say that? This blog post
>  <https://www.mokacoding.com/blog/symliks-in-git/> details how you can use
>  them, and the caveats with regard to needing relative links, not absolute
>  ones. The example repo he links to at the end includes a symlink which
>  worked fine for me when I cloned it. But maybe not relevant given the
>  below:
>
>  We still have to check whether Python packages can contain links, but I'm
>  wary of this mechanism. It is not a popular approach and may have
>  unexpected consequences.
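For what it's worth, CPython itself imports happily through filesystem
symlinks. A small self-contained check (assuming an OS and filesystem that
permit creating symlinks; the file names are just examples):

```python
import importlib.util
import os
import tempfile

# Create a real module file, a symlink pointing at it, and then import the
# module through the symlink path.
tmp = tempfile.mkdtemp()
real = os.path.join(tmp, "gcs_to_s3.py")
with open(real, "w") as f:
    f.write("ANSWER = 42\n")

link = os.path.join(tmp, "gcs_to_s3_link.py")
os.symlink(real, link)  # raises OSError where symlinks are unsupported

spec = importlib.util.spec_from_file_location("gcs_to_s3_link", link)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
print(mod.ANSWER)  # 42
```

That said, the caution above is reasonable for other reasons: checkouts on
Windows without symlink support materialize the link as a plain text file,
and wheels are zip archives, which do not preserve symlinks.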
>
>
>  Likewise, someone who spends 99% of their time working in AWS, and using
>  all the operators in that subpackage, might not think to look in the GCP
>  package the first time they need a GCS to S3 operator. I'm admittedly
>  terrible at documentation, but if duplicating the files via symlinks isn't
>  an option, then is there an easy way we could duplicate the documentation
>  for those operators so they are easily findable in both doc sections?
>
>
>  Recently, I updated the documentation:
>  https://airflow.readthedocs.io/en/latest/integration.html
>  We have a list of all integrations in AWS, Azure, and GCP. If an operator
>  concerns two cloud providers, it is repeated in both places. That's fine
>  for documentation; the DRY rule only applies to source code.
>  I am working on documentation for the other operators.
>  My work is part of this ticket:
>  https://issues.apache.org/jira/browse/AIRFLOW-5431
>
>
>  This updated documentation looks great, and is definitely heading in a
>  direction that makes things easier and addresses my concerns. (Although
>  it took me a while to realize those tables can be scrolled horizontally!)
>
>  I'm working on a redesign of the documentation theme. It's part of AIP-11:
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-11+Create+a+Landing+Page+for+Apache+Airflow
>
>  We are currently at the stage of collecting comments from the first
>  phase - we sent materials to the community, but also conducted tests
>  with real users:
>
>
> https://lists.apache.org/thread.html/6fa1cdceb97ed17752978a8d4202bf1ff1a86c6b50bbc9d09f694166@%3Cdev.airflow.apache.org%3E
>
>
>
>  --
>
>  Jarek Potiuk
>  Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> Principal Software Engineer
>
>  M: +48 660 796 129 <+48660796129>
>  [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/>>
>
