It's not obvious to me why an S3ToMsSqlOperator in the aws package is
"silly". Why do you say it made sense to create a MsSqlFromS3Operator?

Basically all of these operators could be thought of as "move data from A
to B" or "move data to B from A". I think what feels natural to each
individual will depend on what their frame of reference is, and where their
main focus is. If you are largely focused on MsSql then I can understand
that it's natural to think "What MsSql operators are there?" and to not see
S3ToMsSqlOperator as one of those MsSql operators. That's exactly the point
I made with my
earlier response; I was so focused on BigQuery that I didn't think to look
under Cloud Storage documentation for the
GoogleCloudStorageToBigQueryOperator.

I think it is too hard to draw a very distinct line between what is just
"storage" and what is more. There are going to be fuzzy edge cases, so
picking a single convention is going to be much less hassle in my view. As
long as that convention is well documented and the documentation is
improved so that it's easier to find all operators that relate to BigQuery
or MsSql etc in one place (as is being done by Kamil) then that is the best
we can do.

Chris



On Fri, Oct 4, 2019 at 10:55 AM Daniel Standish <dpstand...@gmail.com>
wrote:

> One case popped up for us recently, where it made sense to make a MsSql
> *From*S3Operator .
>
> I think using "source" makes sense in general, but in this case calling
> this a S3ToMsSqlOperator and putting it under AWS seems silly, even though
> you could say s3 is "source" here.
>
> I think in most of these cases we say "let's use source" because source is
> where the actual work is done and destination is just storage.
>
> Does a guideline saying "ignore storage" or "storage is secondary in object
> location" make sense?
>
>
>
> On Fri, Oct 4, 2019 at 6:42 AM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
> > It looks like we have general consensus about putting transfer operators
> > into "source provider" package.
> > That's great for me as well.
> >
> > Since I will be updating AIP-21 to reflect the "google" vs. "gcp" case, I
> > will also update it to add this decision.
> >
> > If no-one objects (Lazy Consensus
> > <https://community.apache.org/committers/lazyConsensus.html>) till
> > Monday 7th of October, 3.20 CEST, we will update AIP-21 with information
> > that transfer operators should be placed in the "source" provider module.
> >
> > J.
> >
> > On Tue, Sep 24, 2019 at 1:34 PM Kamil Breguła <kamil.breg...@polidea.com
> >
> > wrote:
> >
> > > On Mon, Sep 23, 2019 at 7:42 PM Chris Palmer <ch...@crpalmer.com>
> wrote:
> > > >
> > > > On Mon, Sep 23, 2019 at 1:22 PM Kamil Breguła <
> > kamil.breg...@polidea.com
> > > >
> > > > wrote:
> > > >
> > > > > On Mon, Sep 23, 2019 at 7:04 PM Chris Palmer <ch...@crpalmer.com>
> > > wrote:
> > > > > >
> > > > > > > Is there a reason why we can't use symlinks to have copies of the
> > > > > > > files show up in both subpackages? So that `gcs_to_s3.py` would be
> > > > > > > under both `aws/operators/` and `gcp/operators`. I could imagine
> > > > > > > there may be technical reasons why this is a bad idea, but just
> > > > > > > thought I would ask.
> > > > > >
> > > > > > Symlinks are not supported by git.
> > > > >
> > > > >
> > > > > Why do you say that? This blog post
> > > > > <https://www.mokacoding.com/blog/symliks-in-git/> details how you can
> > > > > use them, and the caveats with regards to needing relative links not
> > > > > absolute. The example repo he links to at the end includes a symlink
> > > > > which worked fine for me when I cloned it. But maybe not relevant given
> > > > > the below:
> > >
> > > We still have to check if python packages can have links, but I'm
> > > afraid of this mechanism. This is not popular and may cause unexpected
> > > consequences.
> > >
> > >
> > > > > > > Likewise, someone who spends 99% of their time working in AWS and
> > > > > > > using all the operators in that subpackage, might not think to look
> > > > > > > in the GCP package the first time they need a GCS to S3 operator.
> > > > > > > I'm admittedly terrible at documentation, but if duplicating the
> > > > > > > files via symlinks isn't an option, then is there an easy way we
> > > > > > > could duplicate the documentation for those operators so they are
> > > > > > > easily findable in both doc sections?
> > > > > >
> > > > >
> > > > > Recently, I updated the documentation:
> > > > > https://airflow.readthedocs.io/en/latest/integration.html
> > > > > > We have a list of all integrations in AWS, Azure, GCP. If the operator
> > > > > > concerns two cloud providers, it is repeated in two places. That's good
> > > > > > for documentation. The DRY rule is only valid for source code.
> > > > > I am working on documentation for other operators.
> > > > > My work is part of this ticket:
> > > > > https://issues.apache.org/jira/browse/AIRFLOW-5431
> > > > >
> > > > >
> > > > > This updated documentation looks great, definitely heading in a
> > > > > direction that makes it easier and addresses my concerns. (Although it
> > > > > took me a while to realize those tables can be scrolled horizontally!).
> > > >
> > > > I'm working on a redesign of the documentation theme. It's part of AIP-11
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-11+Create+a+Landing+Page+for+Apache+Airflow
> > > We are currently at the stage of collecting comments from the first
> > > phase - we sent materials to the community, but also conducted tests
> > > with real users
> > >
> > >
> >
> https://lists.apache.org/thread.html/6fa1cdceb97ed17752978a8d4202bf1ff1a86c6b50bbc9d09f694166@%3Cdev.airflow.apache.org%3E
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>
