I spoke to Davor from the Beam team about this today at Apache Big Data.

In the bigger picture, multiple DSLs and language-specific SDKs are
translated into a language-independent representation, which the runner
then translates for the execution engine. It seems possible to pass
hints or annotations that can be accessed at the runner level and used for
optimizations. There is also the notion of hierarchical constructs, similar
to our modules.
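
To make the hint idea concrete, here is a minimal sketch in plain Java. It
is not Beam or Apex code: StepSpec, SketchRunner and the "partitions" hint
key are hypothetical names invented for illustration. It only shows how an
engine-neutral step description could carry advisory hints that a runner
reads during translation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical engine-neutral description of one pipeline step.
// None of these types come from Beam or Apex.
final class StepSpec {
    final String name;
    final Map<String, String> hints = new HashMap<>();

    StepSpec(String name) {
        this.name = name;
    }

    // Attach an advisory hint; returns this so calls can be chained.
    StepSpec hint(String key, String value) {
        hints.put(key, value);
        return this;
    }
}

// Hypothetical runner that consults hints while building an engine plan.
final class SketchRunner {
    static List<String> translate(List<StepSpec> steps) {
        List<String> plan = new ArrayList<>();
        for (StepSpec step : steps) {
            // A hint is optional: fall back to a default when it is absent.
            String partitions = step.hints.getOrDefault("partitions", "1");
            plan.add(step.name + " x" + partitions);
        }
        return plan;
    }
}

final class HintSketch {
    static List<String> demo() {
        List<StepSpec> steps = new ArrayList<>();
        steps.add(new StepSpec("read"));
        steps.add(new StepSpec("parse").hint("partitions", "4"));
        return SketchRunner.translate(steps);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The hint stays advisory here: a runner that does not understand
"partitions" could ignore it and still produce a valid plan.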

I have also contacted the Beam folks for a follow-up on how we can
collaborate on this.

Thanks,
Thomas


On Mon, May 9, 2016 at 1:08 PM, Siyuan Hua <[email protected]> wrote:

> Hey Ilya,
>
> Since I'm working on the Java high-level API, I also looked at Apache Beam.
> Some questions have come up, such as whether the high-level API can be
> replaced by Apache Beam, or whether we can just follow the Apache Beam API,
> which is based on the Google Dataflow model. Here is what I found:
>
> 1. Beam provides a whole set of classes to define the DAG and the options
> for how to run it. There is no easy way to extend their DAG API or to
> implement those classes on your own.
>
> 2. The way to use the Beam API is to construct a DAG with whatever they
> provide, get the graph data structure, convert it to an Apex DAG, and run
> it with our engine. Beam follows the visitor design pattern, similar to
> ASM. There are 2 core parts to running a Beam application in Apex. One is
> the Pipeline, which is the DAG structure in Beam:
>
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java
> The other is the Visitor interface, which defines callback functions that
> are invoked when you visit each node in the DAG. Here is an example, the
> Flink translator (visitor):
>
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/translation/FlinkStreamingTransformTranslators.java
>
> 3. Although I think the Dataflow model is a very good and complete model
> for stream processing to follow, I don't think the Beam API is very
> declarative or expressive. I still suggest we build our own set of APIs
> that deliver the same features as the Dataflow model but are more
> Stream-like (as in Java streams) and SQL-like.
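
As a rough illustration of the visitor-based translation described in
point 2, here is a self-contained sketch in plain Java. It does not use the
Beam or Apex APIs: Node, NodeVisitor and DagTranslator are hypothetical
stand-ins, and the linear chaining of operators is a simplifying
assumption. The graph walks itself and calls back into a visitor, which
records one operator per primitive node and one stream between consecutive
operators.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a node in a pipeline graph (not Beam's class).
final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }

    // Adds a child and returns this node so calls can be chained.
    Node add(Node child) {
        children.add(child);
        return this;
    }

    // Depth-first traversal that calls back into the visitor.
    void accept(NodeVisitor visitor) {
        if (children.isEmpty()) {
            visitor.visitPrimitive(this);
            return;
        }
        visitor.enterComposite(this);
        for (Node child : children) {
            child.accept(visitor);
        }
        visitor.leaveComposite(this);
    }
}

// Hypothetical callback interface, one method per traversal event.
interface NodeVisitor {
    void enterComposite(Node node);
    void leaveComposite(Node node);
    void visitPrimitive(Node node);
}

// Hypothetical translator: every primitive node becomes one operator, and
// consecutive operators are connected by a stream (assumed linear chain).
final class DagTranslator implements NodeVisitor {
    final List<String> operators = new ArrayList<>();
    final List<String> streams = new ArrayList<>();

    @Override
    public void enterComposite(Node node) { }

    @Override
    public void leaveComposite(Node node) { }

    @Override
    public void visitPrimitive(Node node) {
        if (!operators.isEmpty()) {
            String previous = operators.get(operators.size() - 1);
            streams.add(previous + "->" + node.name);
        }
        operators.add(node.name);
    }
}

final class VisitorSketch {
    static DagTranslator demo() {
        Node root = new Node("pipeline")
            .add(new Node("read"))
            .add(new Node("count")
                .add(new Node("groupByKey"))
                .add(new Node("sum")))
            .add(new Node("write"));
        DagTranslator translator = new DagTranslator();
        root.accept(translator);
        return translator;
    }

    public static void main(String[] args) {
        DagTranslator translator = demo();
        System.out.println(translator.operators);
        System.out.println(translator.streams);
    }
}
```

In this sketch the composite "count" node is expanded into its primitives
rather than translated as a unit. A real translator could instead
intercept a composite in enterComposite and map it to a single operator.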
>
> In summary, I think the integration just means running a Beam DAG on a
> different engine (Storm, Flink, Spark or Apex). But if you want to mix the
> Beam API with other APIs, that is not very easy.
>
> Also, I think what we need to work on is not only the translation but
> also implementing operators that provide the missing features of the
> Dataflow model. Those operators can also be used in the high-level API.
>
> Regards,
> Siyuan
>
> On Mon, May 9, 2016 at 11:51 AM, Thomas Weise <[email protected]>
> wrote:
>
> > Hi Ilya,
> >
> > Absolutely, this has been discussed and is "on the roadmap". A quick
> > search reveals that a JIRA was already created for it:
> > https://issues.apache.org/jira/browse/BEAM-261
> >
> > We are currently discussing the windowing semantics in the context of
> > the high-level stream API; perhaps Siyuan can post his notes here?
> >
> > Thanks,
> > Thomas
> >
> >
> > On Mon, May 9, 2016 at 11:25 AM, Ganelin, Ilya <
> > [email protected]>
> > wrote:
> >
> > > Hello, all – Google has just published a new blog announcing the first
> > > complete integration of an open source project (Apache Flink) with
> > > Apache Beam:
> > >
> > > https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective
> > >
> > > http://data-artisans.com/why-apache-beam/
> > >
> > > Apache Beam is a unifying framework that allows users to leverage
> > > disparate streaming computational frameworks such as Storm, DataFlow,
> > > Flink, or Spark using a single API. This integration demands that the
> > > framework conform to the Beam programming model
> > > (http://vldb.org/pvldb/vol8/p1792-Akidau.pdf) and provide the
> > > appropriate APIs.
> > >
> > > Although integrating with Beam is cumbersome, the benefit is tremendous.
> > > Since there is no single framework that solves all streaming problems
> > > for all use cases, the ability to combine frameworks at will makes
> > > developing end-state applications much more straightforward. I believe
> > > that many projects will choose to leverage Apache Beam to take advantage
> > > of this, and if Apex does not support Beam, it will fall behind,
> > > replaced by those frameworks that fit the easy-to-use model of Beam.
> > >
> > > If we become early adopters, we have a unique opportunity to become
> > > part of what will quite possibly become a very large community of users
> > > and to capitalize on the inherent name recognition of Google to elevate
> > > the Apex project and expose it to many who would otherwise not be aware
> > > of it.
> > >
> > > I think integration with Beam can pair with the recent work on
> > > developing a high-level API for Apex and is a natural evolution towards
> > > making Apex more accessible and more usable by a broader technical
> > > community.
> > >
> > > If there is compelling interest around making this effort a reality, I
> > > would love to get this conversation started and work on translating
> > > this into a concrete plan of action.
> > >
> >
>
