Are there IOs that would be desired for a generic transfer operator that
don't already exist in https://beam.apache.org/documentation/io/built-in/ ?
The coverage there is pretty solid.

Beam is getting to the point where even Python Beam can leverage the Java
IOs, which increases the range of IOs (and performance).



On Tue, Sep 1, 2020 at 3:24 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> But I believe those two ideas are separate, as Tomek explained :)
>
> On Wed, Sep 2, 2020 at 12:03 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>
> > I love the idea of connecting the projects more closely!
> >
> > I've recently been helping, as a consultant, to improve the Apache Beam
> > build infrastructure (in many parts based on my Airflow experience and
> > GitHub Actions; they even recently adopted the "cancel" action I developed
> > for Apache Airflow). https://github.com/apache/beam/pull/12729
> >
> > Synergies in Apache projects are cool.
> >
> > J.
> >
> >
> > On Tue, Sep 1, 2020 at 11:16 PM Gerard Casas Saez <gcasass...@twitter.com.invalid> wrote:
> >
> >> Agreed on keeping those separate; I just chimed in because I believe it's a
> >> great idea. But let's keep @beam and @spark for a separate thread.
> >>
> >>
> >> Gerard Casas Saez
> >> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> >>
> >>
> >> On Tue, Sep 1, 2020 at 2:14 PM Tomasz Urbaszek <turbas...@apache.org> wrote:
> >>
> >> > Daniel is right: we have a few Apache Beam committers at Polidea, so we
> >> > will ask them for advice. However, I would be highly in favor of having it
> >> > as a @beam decorator, as Gerard suggested. This is something we should
> >> > put into another AIP together with the mentioned @spark decorator.
> >> >
> >> > Our proposal of transfer operators was mainly to create something
> >> > Airflow-native that works out of the box and simplifies reading from and
> >> > writing to external sources. Thus, it requires no external dependency
> >> > other than the library used to communicate with the API. In the case of
> >> > Beam, I think we need more than that.
> >> >
> >> > Additionally, the ideas of Source and Destination play nicely with
> >> > data lineage and may bring more interest to this feature of Airflow.
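A minimal sketch of why explicit Source and Destination objects suit data lineage: if each endpoint is identified by a URI (an assumption made here for illustration), a list of transfers already forms a lineage graph that can be walked upstream. `Source`, `Destination`, `Transfer`, `build_lineage`, and `upstream` are hypothetical names, not Airflow APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Source:
    uri: str  # hypothetical: each endpoint is identified by a URI string


@dataclass(frozen=True)
class Destination:
    uri: str


@dataclass(frozen=True)
class Transfer:
    source: Source
    destination: Destination


def build_lineage(transfers: List[Transfer]) -> Dict[str, Set[str]]:
    """Map each destination URI to the set of source URIs that feed it."""
    graph: Dict[str, Set[str]] = {}
    for t in transfers:
        graph.setdefault(t.destination.uri, set()).add(t.source.uri)
    return graph


def upstream(uri: str, graph: Dict[str, Set[str]]) -> Set[str]:
    """Return every URI that transitively feeds `uri` (cycle-safe)."""
    seen: Set[str] = set()
    stack = [uri]
    while stack:
        for parent in graph.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


transfers = [
    Transfer(Source("s3://raw/users.csv"), Destination("postgres://staging/users")),
    Transfer(Source("postgres://staging/users"), Destination("bigquery://dwh/users")),
]
graph = build_lineage(transfers)
print(sorted(upstream("bigquery://dwh/users", graph)))
# ['postgres://staging/users', 's3://raw/users.csv']
```

The point of the design is that no transfer logic has to be inspected: the lineage edge falls out of the two declared endpoints.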
> >> >
> >> > Cheers,
> >> > Tomek
> >> >
> >> >
> >> > On Tue, Sep 1, 2020 at 9:31 PM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >> > >
> >> > > Nice. Just a note here: we will need to make sure that those "Source"
> >> > > and "Destination" objects are serializable.
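One way the serializability requirement could be met, sketched under the assumption that a Source or Destination is fully described by a type name plus plain JSON-compatible parameters (for example a connection id rather than a live client). `Endpoint`, `REGISTRY`, `register`, `PostgresSource`, and `GCSDestination` are hypothetical names chosen for this illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Type

# Hypothetical registry so a serialized endpoint can be rebuilt from its type name.
REGISTRY: Dict[str, Type["Endpoint"]] = {}


def register(cls):
    """Class decorator that records the class under its name."""
    REGISTRY[cls.__name__] = cls
    return cls


@dataclass(frozen=True)
class Endpoint:
    """Base for Source/Destination descriptions made only of plain fields."""

    def serialize(self) -> str:
        # Only JSON-compatible fields are stored; a live client or open
        # connection as an attribute would make this fail.
        return json.dumps({"type": type(self).__name__, "params": asdict(self)})

    @staticmethod
    def deserialize(payload: str) -> "Endpoint":
        data = json.loads(payload)
        return REGISTRY[data["type"]](**data["params"])


@register
@dataclass(frozen=True)
class PostgresSource(Endpoint):
    conn_id: str
    sql: str


@register
@dataclass(frozen=True)
class GCSDestination(Endpoint):
    conn_id: str
    bucket: str
    object_name: str


src = PostgresSource(conn_id="postgres_default", sql="SELECT * FROM users")
payload = src.serialize()
print(payload)
# {"type": "PostgresSource", "params": {"conn_id": "postgres_default", "sql": "SELECT * FROM users"}}
print(Endpoint.deserialize(payload) == src)  # True
```

Keeping live resources out of the objects and resolving them only at execution time (e.g. from the connection id) is what makes the round trip possible.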
> >> > >
> >> > > On Tue, Sep 1, 2020, 20:00 Daniel Imberman <daniel.imber...@gmail.com> wrote:
> >> > >
> >> > > > Interesting! Could Beam also potentially allow transfers within Dask or
> >> > > > any other system with a Java/Python SDK? I think @jarek and Polidea do a
> >> > > > lot of work with Beam as well, so I’d love their thoughts on whether this
> >> > > > is a good use case.
> >> > > >
> >> > > > On Tue, Sep 1, 2020 at 11:46 AM, Gerard Casas Saez <gcasass...@twitter.com.invalid> wrote:
> >> > > > I would be highly in favour of having a generic Beam operator, similar
> >> > > > to the @spark_task decorator: something where you can easily define and
> >> > > > wrap a Beam pipeline and convert it into an Airflow operator.
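A rough sketch of the decorator shape described above, using stand-ins so it runs without Beam or Airflow installed: `beam_task` and `PipelineOperator` are hypothetical names, `PipelineOperator` is a plain class rather than Airflow's `BaseOperator`, and the decorated function returns a description of the pipeline instead of building a real Beam pipeline.

```python
import functools
from typing import Any, Callable, Dict, Optional


class PipelineOperator:
    """Stand-in for an Airflow operator that runs a wrapped pipeline function."""

    def __init__(self, task_id: str, fn: Callable[..., Any], runner: str, kwargs: Dict[str, Any]):
        self.task_id = task_id
        self.fn = fn
        self.runner = runner
        self.kwargs = kwargs

    def execute(self, context: Dict[str, Any]) -> Any:
        # A real implementation would build the Beam pipeline with the chosen
        # runner (e.g. DirectRunner or DataflowRunner) and wait for it to finish.
        return self.fn(runner=self.runner, **self.kwargs)


def beam_task(runner: str = "DirectRunner"):
    """Hypothetical decorator: calling the decorated function yields an operator."""

    def decorator(fn: Callable[..., Any]):
        @functools.wraps(fn)
        def make_operator(task_id: Optional[str] = None, **kwargs: Any) -> PipelineOperator:
            return PipelineOperator(task_id or fn.__name__, fn, runner, kwargs)

        return make_operator

    return decorator


@beam_task(runner="DirectRunner")
def word_count(runner: str, input_path: str, output_path: str) -> Dict[str, str]:
    # Placeholder for pipeline-building code such as
    # `with beam.Pipeline(runner=runner) as p: ...`.
    return {"runner": runner, "input": input_path, "output": output_path}


op = word_count(input_path="gs://bucket/in.txt", output_path="gs://bucket/out")
print(op.task_id)  # word_count
print(op.execute(context={}))
# {'runner': 'DirectRunner', 'input': 'gs://bucket/in.txt', 'output': 'gs://bucket/out'}
```

Deferring execution until `execute` is called mirrors how an operator is declared in a DAG first and run later by the scheduler.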
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Tue, Sep 1, 2020 at 12:44 PM Austin Bennett <whatwouldausti...@gmail.com> wrote:
> >> > > >
> >> > > > > Are you guys familiar with Beam <https://beam.apache.org>? Especially
> >> > > > > if you are not doing transforms, it might be rather straightforward to
> >> > > > > rely on the ecosystem of connectors in that Apache project as the
> >> > > > > foundation for a generic transfer operator.
> >> > > > >
> >> > > > > On Tue, Sep 1, 2020 at 11:05 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >> > > > >
> >> > > > > > +1
> >> > > > > >
> >> > > > > > On Tue, Sep 1, 2020 at 1:35 PM Kamil Olszewski <kamil.olszew...@polidea.com> wrote:
> >> > > > > >
> >> > > > > > > Hello all,
> >> > > > > > > since there have been no new comments in the POC doc
> >> > > > > > > <https://docs.google.com/document/d/1o7Ph7RRNqLWkTbe7xkWjb100eFaK1Apjv27LaqHgNkE/edit>
> >> > > > > > > for a couple of days, I will proceed with creating an AIP for this
> >> > > > > > > feature, if that is OK with everybody.
> >> > > > > > > Best regards,
> >> > > > > > > Kamil
> >> > > > > > > On Thu, Aug 27, 2020 at 10:50 AM Tomasz Urbaszek <turbas...@apache.org> wrote:
> >> > > > > > >
> >> > > > > > > > I like the approach, as it introduces another interesting
> >> > > > > > > > standardization of operator interfaces. It would be awesome to hear
> >> > > > > > > > more opinions :)
> >> > > > > > > >
> >> > > > > > > > Cheers,
> >> > > > > > > > Tomek
> >> > > > > > > >
> >> > > > > > > > On Wed, Aug 19, 2020 at 8:10 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >> > > > > > > >
> >> > > > > > > > > I like the idea a lot. Similar things have been discussed before, but
> >> > > > > > > > > I think the proposal is rather pragmatic and solves a real problem
> >> > > > > > > > > (and it does not seem too complex to implement).
> >> > > > > > > > >
> >> > > > > > > > > There is already some discussion about it in the document (those
> >> > > > > > > > > interested, please chime in), but here are a few points on why I
> >> > > > > > > > > like it:
> >> > > > > > > > >
> >> > > > > > > > > - performance and optimization are not the focus here. For generic stuff
> >> > > > > > > > > it is usually hard to write an "optimal" solution, but once you admit you
> >> > > > > > > > > are not going to focus on optimisation, you come up with simpler and
> >> > > > > > > > > easier-to-use solutions
> >> > > > > > > > >
> >> > > > > > > > > - on the other hand, it uses a very "Pythonic" approach built on
> >> > > > > > > > > Airflow's familiar concepts (connection, transfer) and has the potential
> >> > > > > > > > > of easily plugging into the hundreds of hooks we already have, leveraging
> >> > > > > > > > > all the "providers" richness of Airflow.
> >> > > > > > > > >
> >> > > > > > > > > - it aims to make a "quick start" easy: if you have a number of different
> >> > > > > > > > > sources/targets and, as a data scientist, you would like to quickly start
> >> > > > > > > > > transferring data between them, you can do it easily with only basic
> >> > > > > > > > > Python knowledge and a simple DAG structure.
> >> > > > > > > > >
> >> > > > > > > > > - it should be possible to plug it into our new functional approach, as
> >> > > > > > > > > well as into future lineage discussions, since it connects sources and
> >> > > > > > > > > targets
> >> > > > > > > > >
> >> > > > > > > > > - it opens up the possibility of adding simple and flexible data
> >> > > > > > > > > transformation on-transfer. It is not a replacement for any of the
> >> > > > > > > > > external services that Airflow should use (Airflow is an orchestrator, not
> >> > > > > > > > > a data processing solution), but for the kind of quick-start scenarios in
> >> > > > > > > > > which I foresee it being most useful, letting a data scientist apply simple
> >> > > > > > > > > data transformations on the fly might be a big plus.
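A minimal sketch of the on-transfer transformation idea from the last point, assuming rows move between source and target as a list of dicts (the thread suggests a Pandas DataFrame for this role; plain lists keep the example dependency-free). `transfer`, `read_rows`, `write_rows`, and `adults_only_uppercased` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


def transfer(
    read_rows: Callable[[], Iterable[Row]],
    write_rows: Callable[[List[Row]], None],
    transform: Optional[Callable[[List[Row]], List[Row]]] = None,
) -> int:
    """Read from a source, optionally transform, write to a target.

    Returns the number of rows written.
    """
    rows = list(read_rows())
    if transform is not None:
        # Simple, user-supplied, in-memory transformation; heavy processing
        # belongs in an external system, not in the orchestrator.
        rows = transform(rows)
    write_rows(rows)
    return len(rows)


# Usage with in-memory stand-ins for a source and a target.
source_rows: List[Row] = [{"name": "ada", "age": 36}, {"name": "alan", "age": 17}]
target: List[Row] = []


def adults_only_uppercased(rows: List[Row]) -> List[Row]:
    # Filter and reshape rows on the fly, during the transfer.
    return [{**r, "name": str(r["name"]).upper()} for r in rows if r["age"] >= 18]


written = transfer(lambda: source_rows, target.extend, adults_only_uppercased)
print(written, target)  # 1 [{'name': 'ADA', 'age': 36}]
```

Making `transform` optional keeps the plain copy path as the default, so the transformation stays a convenience rather than a data-processing layer.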
> >> > > > > > > > >
> >> > > > > > > > > Suggestion: a Pandas DataFrame as the format of the "data" component.
> >> > > > > > > > >
> >> > > > > > > > > Kamil - you should have access now.
> >> > > > > > > > >
> >> > > > > > > > > J.
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Tue, Aug 18, 2020 at 6:53 PM Kamil Olszewski <kamil.olszew...@polidea.com> wrote:
> >> > > > > > > > >
> >> > > > > > > > > > Hello all,
> >> > > > > > > > > > at Polidea we have come up with an idea for a generic transfer operator
> >> > > > > > > > > > that would be able to transport data between two destinations of various
> >> > > > > > > > > > types (file, database, storage, etc.). Please find the link to a short
> >> > > > > > > > > > doc with a POC
> >> > > > > > > > > > <https://docs.google.com/document/d/1o7Ph7RRNqLWkTbe7xkWjb100eFaK1Apjv27LaqHgNkE/edit?usp=sharing>
> >> > > > > > > > > > where we can initially discuss the design. Once we reach an initial
> >> > > > > > > > > > conclusion, I can create an AIP on cWiki. Can I ask for permission to do
> >> > > > > > > > > > so (my id is 'kamil.olszewski')? I believe that during the discussion we
> >> > > > > > > > > > should definitely aim for this feature to be released only after Airflow
> >> > > > > > > > > > 2.0 is out.
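A minimal, stdlib-only sketch of the proposal's general shape: a source and a destination behind small shared interfaces, and a transfer that only knows those interfaces, shown here moving a CSV "file" into a database table. `Source`, `Destination`, `CsvFileSource`, `SqliteDestination`, and `TransferSketch` are hypothetical names, not taken from the POC doc; a real operator would presumably subclass Airflow's `BaseOperator` and use existing hooks instead of raw `sqlite3`.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, TextIO


class Source(ABC):
    @abstractmethod
    def read(self) -> Iterator[Dict[str, str]]:
        """Yield rows as dicts keyed by column name."""


class Destination(ABC):
    @abstractmethod
    def write(self, rows: List[Dict[str, str]]) -> None:
        """Persist the given rows."""


class CsvFileSource(Source):
    def __init__(self, fileobj: TextIO):
        self.fileobj = fileobj

    def read(self) -> Iterator[Dict[str, str]]:
        # csv.DictReader uses the header row as the dict keys.
        yield from csv.DictReader(self.fileobj)


class SqliteDestination(Destination):
    def __init__(self, conn: sqlite3.Connection, table: str):
        self.conn = conn
        self.table = table  # assumed trusted: identifiers are not escaped here

    def write(self, rows: List[Dict[str, str]]) -> None:
        if not rows:
            return
        keys = list(rows[0])
        cols = ", ".join(keys)
        placeholders = ", ".join("?" for _ in keys)
        self.conn.execute(f"CREATE TABLE IF NOT EXISTS {self.table} ({cols})")
        self.conn.executemany(
            f"INSERT INTO {self.table} ({cols}) VALUES ({placeholders})",
            [tuple(row[k] for k in keys) for row in rows],
        )
        self.conn.commit()


class TransferSketch:
    """Moves rows from any Source to any Destination via the shared interfaces."""

    def __init__(self, source: Source, destination: Destination):
        self.source = source
        self.destination = destination

    def execute(self) -> int:
        rows = list(self.source.read())
        self.destination.write(rows)
        return len(rows)


# Usage: an in-memory CSV "file" transferred into an in-memory SQLite table.
csv_file = io.StringIO("id,name\n1,ada\n2,alan\n")
conn = sqlite3.connect(":memory:")
count = TransferSketch(CsvFileSource(csv_file), SqliteDestination(conn, "users")).execute()
print(count)  # 2
print(conn.execute("SELECT id, name FROM users ORDER BY id").fetchall())
# [('1', 'ada'), ('2', 'alan')]
```

Adding a new storage type then means writing one `Source` or `Destination` class, not one operator per source/target pair; loading everything into memory is a deliberate simplification in line with the non-optimized, quick-start goal.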
> >> > > > > > > > > >
> >> > > > > > > > > > What do you think about this idea? Would you find such an operator helpful
> >> > > > > > > > > > in your pipelines? Maybe you already use a similar solution or know
> >> > > > > > > > > > packages that could be used to implement it?
> >> > > > > > > > > >
> >> > > > > > > > > > Best regards,
> >> > > > > > > > > > --
> >> > > > > > > > > >
> >> > > > > > > > > > Kamil Olszewski
> >> > > > > > > > > > Polidea <https://www.polidea.com> | Software Engineer
> >> > > > > > > > > >
> >> > > > > > > > > > M: +48 503 361 783
> >> > > > > > > > > > E: kamil.olszew...@polidea.com
> >> > > > > > > > > >
> >> > > > > > > > > > Unique Tech
> >> > > > > > > > > > Check out our projects! <https://www.polidea.com/our-work>
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > --
> >> > > > > > > > >
> >> > > > > > > > > Jarek Potiuk
> >> > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> > > > > > > > >
> >> > > > > > > > > M: +48 660 796 129 <+48660796129>
> >> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
> >> > > > > > > > >
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> >
> >> >
> >> >
> >> > --
> >> >
> >> > Tomasz Urbaszek
> >> > Polidea | Software Engineer
> >> >
> >> > M: +48 505 628 493
> >> > E: tomasz.urbas...@polidea.com
> >> >
> >> > Unique Tech
> >> > Check out our projects!
> >> >
> >>
> >
> >
> >
> >
>
>
