But I believe those two ideas are separate ones as Tomek explained :)

On Wed, Sep 2, 2020 at 12:03 AM Jarek Potiuk <[email protected]>
wrote:

> I love the idea of connecting the projects more closely!
>
> I've recently been helping, as a consultant, to improve the Apache Beam
> build infrastructure (largely based on my experience with Airflow and
> GitHub Actions; they even recently adopted the "cancel" action I developed
> for Apache Airflow): https://github.com/apache/beam/pull/12729
>
> Synergies in Apache projects are cool.
>
> J.
>
>
> On Tue, Sep 1, 2020 at 11:16 PM Gerard Casas Saez
> <[email protected]> wrote:
>
>> Agreed on keeping those separate; I just chimed in because I believe it's a
>> great idea. But let's keep @beam and @spark for a separate thread.
>>
>>
>> Gerard Casas Saez
>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>>
>>
>> On Tue, Sep 1, 2020 at 2:14 PM Tomasz Urbaszek <[email protected]>
>> wrote:
>>
>> > Daniel is right, we have a few Apache Beam committers at Polidea, so we
>> > will ask for advice. However, I would be highly in favor of having it
>> > as a @beam decorator, as Gerard suggested. This is something we should
>> > put into another AIP together with the mentioned @spark decorator.
>> >
>> > Our proposition of transfer operators was mainly to create something
>> > Airflow-native that works out of the box and allows us to simplify
>> > read/write from external sources. Thus, it requires no external
>> > dependency other than the library to communicate with the API. In the
>> > case of Beam we need more than that I think.
>> >
>> > Additionally, the ideas of Source and Destination play nicely with
>> > data lineage and may bring more interest to this feature of Airflow.
>> >
>> > Cheers,
>> > Tomek
>> >
>> >
>> > On Tue, Sep 1, 2020 at 9:31 PM Kaxil Naik <[email protected]> wrote:
>> > >
>> > > Nice. Just a note here: we will need to make sure that "Source" and
>> > > "Destination" are serializable.
>> > >
>> > > On Tue, Sep 1, 2020, 20:00 Daniel Imberman <[email protected]>
>> > > wrote:
>> > >
>> > > > Interesting! Could Beam also potentially allow transfers within Dask or
>> > > > any other system with a Java/Python SDK? I think @jarek and Polidea do a
>> > > > lot of work with Beam as well, so I'd love their thoughts on whether this
>> > > > is a good use case.
>> > > >
>> > > > On Tue, Sep 1, 2020 at 11:46 AM, Gerard Casas Saez <[email protected]>
>> > > > wrote:
>> > > > I would be highly in favour of having a generic Beam operator, similar
>> > > > to the @spark_task decorator: something where you can easily define and
>> > > > wrap a Beam pipeline and convert it to an Airflow operator.
>> > > >
>> > > > Gerard Casas Saez
>> > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>> > > >
>> > > >
>> > > > On Tue, Sep 1, 2020 at 12:44 PM Austin Bennett <
>> > > > [email protected]>
>> > > > wrote:
>> > > >
>> > > > > Are you guys familiar with Beam <https://beam.apache.org>? Especially
>> > > > > if you are not doing transforms, it might be rather straightforward to
>> > > > > rely on the ecosystem of connectors in that Apache project as the
>> > > > > foundation for a generic transfer operator.
>> > > > >
>> > > > > On Tue, Sep 1, 2020 at 11:05 AM Jarek Potiuk <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > +1
>> > > > > >
>> > > > > > On Tue, Sep 1, 2020 at 1:35 PM Kamil Olszewski <
>> > > > > > [email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hello all,
>> > > > > > > since no new comments have been shared in the POC doc
>> > > > > > > <https://docs.google.com/document/d/1o7Ph7RRNqLWkTbe7xkWjb100eFaK1Apjv27LaqHgNkE/edit>
>> > > > > > > for a couple of days, I will proceed with creating an AIP for this
>> > > > > > > feature, if that is OK with everybody.
>> > > > > > > Best regards,
>> > > > > > > Kamil
>> > > > > > > On Thu, Aug 27, 2020 at 10:50 AM Tomasz Urbaszek <[email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > I like the approach, as it introduces another interesting
>> > > > > > > > standardization of the operators' interface. It would be awesome to
>> > > > > > > > hear more opinions :)
>> > > > > > > >
>> > > > > > > > Cheers,
>> > > > > > > > Tomek
>> > > > > > > >
>> > > > > > > > On Wed, Aug 19, 2020 at 8:10 PM Jarek Potiuk <[email protected]>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > I like the idea a lot. Similar things have been discussed before, but
>> > > > > > > > > the proposal is, I think, rather pragmatic and solves a real problem
>> > > > > > > > > (and it does not seem to be too complex to implement).
>> > > > > > > > >
>> > > > > > > > > There is already some discussion about it in the document (those
>> > > > > > > > > interested, please chime in), but here are a few points on why I like it:
>> > > > > > > > >
>> > > > > > > > > - performance and optimization are not the focus here. For generic
>> > > > > > > > > stuff it is usually hard to write an "optimal" solution, but once you
>> > > > > > > > > admit you are not going to focus on optimisation, you come up with
>> > > > > > > > > simpler and easier-to-use solutions
>> > > > > > > > >
>> > > > > > > > > - on the other hand, it uses a very "Pythonic" approach built on
>> > > > > > > > > Airflow's familiar concepts (connection, transfer) and has the
>> > > > > > > > > potential to plug easily into the 100s of hooks we already have,
>> > > > > > > > > leveraging all the "providers" richness of Airflow.
>> > > > > > > > >
>> > > > > > > > > - it aims to make a "quick start" easy: if you have a number of
>> > > > > > > > > different sources/targets and, as a data scientist, you would like to
>> > > > > > > > > quickly start transferring data between them, you can do it easily
>> > > > > > > > > with only basic Python knowledge and a simple DAG structure.
>> > > > > > > > >
>> > > > > > > > > - it should be possible to plug it into our new functional approach as
>> > > > > > > > > well as into future lineage discussions, as it makes a connection
>> > > > > > > > > between sources and targets
>> > > > > > > > >
>> > > > > > > > > - it opens up possibilities of adding simple and flexible data
>> > > > > > > > > transformation on transfer. It is not a replacement for any of the
>> > > > > > > > > external services that Airflow should use (Airflow is an orchestrator,
>> > > > > > > > > not a data processing solution), but for the kind of quick-start
>> > > > > > > > > scenarios where I foresee it being most useful, being able to apply
>> > > > > > > > > simple data transformations on the fly might be a big plus for a data
>> > > > > > > > > scientist.
>> > > > > > > > >
>> > > > > > > > > Suggestion: pandas DataFrame as the format of the "data" component
>> > > > > > > > >
>> > > > > > > > > Kamil - you should have access now.
>> > > > > > > > >
>> > > > > > > > > J.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Tue, Aug 18, 2020 at 6:53 PM Kamil Olszewski <
>> > > > > > > > > [email protected]>
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Hello all,
>> > > > > > > > > > at Polidea we have come up with an idea for a generic transfer operator
>> > > > > > > > > > that would be able to transport data between two destinations of
>> > > > > > > > > > various types (file, database, storage, etc.). Please find the link to
>> > > > > > > > > > a short doc with a POC
>> > > > > > > > > > <https://docs.google.com/document/d/1o7Ph7RRNqLWkTbe7xkWjb100eFaK1Apjv27LaqHgNkE/edit?usp=sharing>
>> > > > > > > > > > where we can discuss the design initially. Once we come to an initial
>> > > > > > > > > > conclusion, I can create an AIP on cWiki. Can I ask for permission to
>> > > > > > > > > > do so (my id is 'kamil.olszewski')? I believe that during the
>> > > > > > > > > > discussion we should definitely aim for this feature to be released
>> > > > > > > > > > only after Airflow 2.0 is out.
>> > > > > > > > > >
>> > > > > > > > > > What do you think about this idea? Would you find such an operator
>> > > > > > > > > > helpful in your pipelines? Maybe you already use a similar solution or
>> > > > > > > > > > know of packages that could be used to implement it?
>> > > > > > > > > >
>> > > > > > > > > > Best regards,
>> > > > > > > > > > --
>> > > > > > > > > >
>> > > > > > > > > > Kamil Olszewski
>> > > > > > > > > > Polidea <https://www.polidea.com> | Software Engineer
>> > > > > > > > > >
>> > > > > > > > > > M: +48 503 361 783
>> > > > > > > > > > E: [email protected]
>> > > > > > > > > >
>> > > > > > > > > > Unique Tech
>> > > > > > > > > > Check out our projects! <
>> https://www.polidea.com/our-work>
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > --
>> > > > > > > > >
>> > > > > > > > > Jarek Potiuk
>> > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software
>> > Engineer
>> > > > > > > > >
>> > > > > > > > > M: +48 660 796 129 <+48660796129>
>> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > >
>> > > > > > > Kamil Olszewski
>> > > > > > > Polidea <https://www.polidea.com> | Software Engineer
>> > > > > > >
>> > > > > > > M: +48 503 361 783
>> > > > > > > E: [email protected]
>> > > > > > >
>> > > > > > > Unique Tech
>> > > > > > > Check out our projects! <https://www.polidea.com/our-work>
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > >
>> > > > > > Jarek Potiuk
>> > > > > > Polidea <https://www.polidea.com/> | Principal Software
>> Engineer
>> > > > > >
>> > > > > > M: +48 660 796 129 <+48660796129>
>> > > > > > [image: Polidea] <https://www.polidea.com/>
>> > > > > >
>> > > > >
>> >
>> >
>> >
>> > --
>> >
>> > Tomasz Urbaszek
>> > Polidea | Software Engineer
>> >
>> > M: +48 505 628 493
>> > E: [email protected]
>> >
>> > Unique Tech
>> > Check out our projects!
>> >
>>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
