Hi Ismael,

Basically, we adopted Beam's design for partitioning (https://github.com/seznam/euphoria/issues/160) and implemented sorting manually (https://github.com/seznam/euphoria/issues/158). I'm not aware of any differences in the time model (Euphoria supports ingestion time and event time; we deliberately don't support processing time). As for other differences, going by the Beam capability matrix, I'd say that:

 - we don't support stateful FlatMap (i.e. ParDo) for now (https://github.com/seznam/euphoria/issues/192)

 - we don't support side inputs (a deliberate decision for now, which might be reconsidered) or side outputs (https://github.com/seznam/euphoria/issues/124)

 - we support complete event-time windows (non-merging, merging, aligned, unaligned) and time control

 - we don't support processing time by decision (might be reconsidered if a valid use-case is found)

 - we support window triggering based on both time and data, including discarding and accumulating (without accumulating & retracting)
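
To make the discarding vs. accumulating distinction in the last point concrete, here is a minimal plain-Java sketch. It is not Euphoria or Beam API: the class PaneModes, the method firings and the count-based (data-driven) trigger are all hypothetical names and assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not Euphoria or Beam API. */
class PaneModes {

  enum Mode { DISCARDING, ACCUMULATING }

  /**
   * Sums the elements of a single window. The (assumed) trigger fires after
   * every {@code fireEvery} elements; the returned list holds the value
   * emitted at each firing.
   */
  static List<Integer> firings(List<Integer> windowElements, int fireEvery, Mode mode) {
    List<Integer> emitted = new ArrayList<>();
    int state = 0;
    int sinceLastFiring = 0;
    for (int element : windowElements) {
      state += element;
      sinceLastFiring++;
      if (sinceLastFiring == fireEvery) {
        emitted.add(state);
        if (mode == Mode.DISCARDING) {
          state = 0; // discarding: forget what has already been emitted
        }
        sinceLastFiring = 0;
      }
    }
    return emitted;
  }

  public static void main(String[] args) {
    List<Integer> window = Arrays.asList(1, 2, 3, 4);
    System.out.println(firings(window, 2, Mode.DISCARDING));   // [3, 7]
    System.out.println(firings(window, 2, Mode.ACCUMULATING)); // [3, 10]
  }
}
```

In discarding mode each firing carries only the new data; in accumulating mode each firing carries the whole window state so far. Retractions would additionally have to withdraw the previously emitted value, which this sketch does not attempt.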

All our executors (runners) - Flink, Spark and Local - implement the complete model, which we enforce using an "operator test kit" that all executors must pass. The Spark executor supports bounded sources only (for now). As David said, we currently don't have a serialization abstraction, so there is some work to be done in that regard.

Our intention is to completely supersede Euphoria. We would like to consider the possibility of using executors that do not rely on Beam, but that is optional for now and should be straightforward.

We'd be happy to answer any more questions you might have and thanks a lot!

Best,

 Jan


On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
Hi,

It is great to see that you guys have reached the level of maturity
needed to propose this. Congratulations on your work and on the idea of
contributing it to Beam.

I remember a previous discussion with Jan about the model mismatch
between Euphoria and Beam, caused by some design decisions in both
projects. I remember you guys had some issues with the way Beam's
sources do partitioning, as well as with Beam's lack of sorted data
(on shuffle, à la Hadoop). Also, if I remember correctly, the 'time'
model of Euphoria was simpler than Beam's. I mention all of this because
I am curious about which parts of the Euphoria model you guys had to
sacrifice to support Beam, and which parts of Beam's model should still
be integrated into Euphoria (and whether there is a straightforward path
to do it).

If I understand correctly, if this gets merged into Apache, it means
that Euphoria's current implementation would be superseded by this DSL?
I am curious because I would like to understand your level of investment
in supporting the future of this DSL.

Thanks and congrats again!
Ismaël

On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <j...@nanthrax.net> 
wrote:
Depending on the donation, you would need an ICLA for each contributor, and
a CCLA in addition to the SGA.

We can sync with Davor and me for the legal stuff.
However, I would wait a little bit just to have feedback from the whole team
and start a formal vote.

I would be happy to start the formal vote.

Regards
JB

On 12/18/2017 10:03 AM, David Morávek wrote:
Hello,

Thanks for the awesome feedback!

Romain:

We already use the Java Stream API in all operators where it makes sense
(e.g. ReduceByKey). We are still not sure whether it was a good choice, but
it can easily be converted to an iterator anyway.
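
As a rough illustration of why the Stream vs. iterator choice is easy to revisit, here is a small plain-Java sketch. It only uses JDK calls; the class ReducerStyles and its method names are hypothetical and are not part of Euphoria.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical illustration only; not Euphoria API. */
class ReducerStyles {

  /** A reducer written in the Stream style: all values of one key arrive as a Stream. */
  static long sumOfStream(Stream<Long> values) {
    return values.mapToLong(Long::longValue).sum();
  }

  /** The same reducer written against an Iterator. */
  static long sumOfIterator(Iterator<Long> values) {
    long sum = 0;
    while (values.hasNext()) {
      sum += values.next();
    }
    return sum;
  }

  /** Wraps an Iterator as a sequential Stream without copying the data. */
  static <T> Stream<T> asStream(Iterator<T> iterator) {
    return StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED), false);
  }

  public static void main(String[] args) {
    // Stream -> Iterator: every Stream exposes iterator().
    System.out.println(sumOfIterator(Stream.of(1L, 2L, 3L).iterator())); // 6
    // Iterator -> Stream: via the adapter above.
    System.out.println(sumOfStream(asStream(Arrays.asList(1L, 2L, 3L).iterator()))); // 6
  }
}
```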

Side output support is coming soon; we have already done some initial work
on this.

Side inputs are not supported in the way you are used to from Beam, because
they can be replaced by a Join operator on the same key (if annotated with
broadcastHashJoin, it will be turned into a map-side join).
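
For readers unfamiliar with the term, this is roughly what a map-side (broadcast hash) join does, sketched in plain Java. It is not Euphoria's Join operator: the class MapSideJoin, the join method and the sample data are hypothetical, and the small side is simply assumed to fit in memory on every worker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not Euphoria API. */
class MapSideJoin {

  /**
   * Inner-joins every key of the large input against a small lookup table.
   * The small side plays the role a side input would play in Beam: it is
   * assumed to be copied (broadcast) to every worker, so no shuffle of the
   * large side is needed.
   */
  static List<String> join(List<String> largeSide, Map<String, String> smallSide) {
    List<String> joined = new ArrayList<>();
    for (String key : largeSide) {
      String match = smallSide.get(key); // hash lookup instead of a shuffle
      if (match != null) {
        joined.add(key + "=" + match);
      }
    }
    return joined;
  }

  public static void main(String[] args) {
    Map<String, String> countries = new HashMap<>();
    countries.put("cz", "Czechia");
    countries.put("fr", "France");

    List<String> events = Arrays.asList("cz", "de", "fr", "cz");
    System.out.println(join(events, countries)); // [cz=Czechia, fr=France, cz=Czechia]
  }
}
```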

The only significant difference from Beam is that we decided not to abstract
serialization, so we need to add support for type hints because of type
erasure.
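
To show what type erasure takes away and how a type hint can bring it back, here is a minimal sketch of the well-known "super type token" idiom. The TypeHint class below is a hypothetical stand-in written for this example; it is not the Euphoria or Beam type-hint API.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not Euphoria or Beam API. */
class TypeErasureDemo {
  public static void main(String[] args) {
    List<String> strings = new ArrayList<>();
    List<Integer> numbers = new ArrayList<>();
    // At runtime both lists have the same class: the element type is erased.
    System.out.println(strings.getClass() == numbers.getClass()); // true

    // An anonymous subclass keeps its generic superclass in the class file,
    // so the full type can be read back via reflection.
    Type hint = new TypeHint<List<String>>() {}.getType();
    System.out.println(hint.getTypeName());
  }
}

/** Captures the type argument T of its (anonymous) subclass. */
abstract class TypeHint<T> {
  private final Type type;

  protected TypeHint() {
    // Works when instantiated as `new TypeHint<Something>() {}`.
    ParameterizedType superclass = (ParameterizedType) getClass().getGenericSuperclass();
    this.type = superclass.getActualTypeArguments()[0];
  }

  Type getType() {
    return type;
  }
}
```

A runner could use such a hint to pick a serializer for the element type, which is the information that is otherwise lost at runtime.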

Fluent API:

The API is fluent within one operator. It is designed to "lead the
programmer", which means that they will only be offered methods that make
sense after the last method they used (e.g. in ReduceByKey, we know that
after keyBy a reduceBy method should come). It is implemented as a series of
builders.
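
A minimal sketch of such a "series of builders" in plain Java follows. The names (MiniReduceByKey, KeyByStage, ReduceByStage, OutputStage) and the exact set of stages are hypothetical and heavily simplified; this is not the actual Euphoria ReduceByKey builder chain, only the general idea that each stage returns a type exposing just the next valid step.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

/** Hypothetical illustration only; not the Euphoria API. */
final class MiniReduceByKey {

  private MiniReduceByKey() {}

  static <IN> KeyByStage<IN> of(List<IN> input) {
    return new KeyByStage<>(input);
  }

  /** Stage 1: the only method offered is keyBy. */
  static final class KeyByStage<IN> {
    private final List<IN> input;

    KeyByStage(List<IN> input) {
      this.input = input;
    }

    <K> ReduceByStage<IN, K> keyBy(Function<IN, K> keyExtractor) {
      return new ReduceByStage<>(input, keyExtractor);
    }
  }

  /** Stage 2: after keyBy, the only method offered is reduceBy. */
  static final class ReduceByStage<IN, K> {
    private final List<IN> input;
    private final Function<IN, K> keyExtractor;

    ReduceByStage(List<IN> input, Function<IN, K> keyExtractor) {
      this.input = input;
      this.keyExtractor = keyExtractor;
    }

    OutputStage<IN, K> reduceBy(BinaryOperator<IN> reducer) {
      return new OutputStage<>(input, keyExtractor, reducer);
    }
  }

  /** Stage 3: the operator is fully specified and can produce its output. */
  static final class OutputStage<IN, K> {
    private final List<IN> input;
    private final Function<IN, K> keyExtractor;
    private final BinaryOperator<IN> reducer;

    OutputStage(List<IN> input, Function<IN, K> keyExtractor, BinaryOperator<IN> reducer) {
      this.input = input;
      this.keyExtractor = keyExtractor;
      this.reducer = reducer;
    }

    Map<K, IN> output() {
      Map<K, IN> result = new LinkedHashMap<>();
      for (IN element : input) {
        // Combine the element with whatever was already reduced for its key.
        result.merge(keyExtractor.apply(element), element, reducer);
      }
      return result;
    }
  }

  public static void main(String[] args) {
    Map<Integer, String> byLength =
        MiniReduceByKey.of(Arrays.asList("a", "bb", "cc", "d"))
            .keyBy(String::length)
            .reduceBy((left, right) -> left + right)
            .output();
    System.out.println(byLength); // {1=ad, 2=bbcc}
  }
}
```

Because each stage is a separate type, calling reduceBy before keyBy (or output before reduceBy) simply does not compile, and IDE completion shows only the valid next step.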

Davor:

Thanks, I'll contact you, and will start the process of having all the
necessary paperwork signed on our side, so we can get things moving.


On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

     Hi guys

     A DSL would be very welcome, in particular if fluent.

     Open question: did you look into implementing the Stream API (surely
     extending it to have a BeamStream and a few more features like sides
     etc.)? It would be very natural, easy to integrate anywhere, and would
     avoid having to discover a new API.

     Hazelcast Jet did it, so I don't see why Beam couldn't.

     On Dec 18, 2017 07:26, "Davor Bonaci" <da...@apache.org> wrote:

         Hi David,
         As JB noted, merging these two projects is a great idea. In fact,
         some of us have had those discussions in the past.

         Legally, nothing in particular is strictly necessary, as the code
         seems to already be Apache 2.0 licensed. We don't, however, want to
         be perceived as making hostile forks, so it would be great to file
         a Software Grant Agreement with the ASF Secretary. I can help with
         the process, as necessary.

         Project alignment-wise, there aren't any particular blockers that I
         am aware of. We welcome DSLs.

         Technically, the code would start in a feature branch. During this
         stage, we'd need to validate a few things, including confirming that
         the code and dependencies match the ASF policy, automating testing
         in Beam's tooling, etc. At that point, we'd take a community vote to
         accept the component into master, and consider the author(s) for
         committership in the overall project.

         Welcome to the ASF and Beam -- we are thrilled to have you! Hope
         this helps, and please reach out if anybody on our end can help,
         including JB or myself.

         Davor


         On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

             Hi David,

             Generally speaking, having different fluent DSLs on top of the
             Beam SDK is great.

             I would like to take a look at your wordcount examples to give
             you complete feedback. I like the idea, and a fluent Java DSL is
             valuable.

             Let's wait for feedback from others. If we have a consensus,
             then I would be more than happy to help you with the donation
             (I worked on the Camel Java DSL a while ago, so I have some
             experience here).

             Thanks !
             Regards
             JB

             On 12/17/2017 07:00 PM, David Morávek wrote:

                 Hello,


                 First of all, thanks for the amazing work the Apache Beam
                 community is doing!


                 In 2014, we started development of a runtime-independent
                 Java 8 API that helps us create unified big-data processing
                 flows. It has been used as a core building block of the
                 Seznam.cz web crawler data infrastructure ever since. Its
                 design principles and execution model are very similar to
                 those of Apache Beam.


                 This API was open sourced in 2016, under the name Euphoria
                 API:

                 https://github.com/seznam/euphoria


                 As it is very similar to Apache Beam, we feel that it is
                 not worth duplicating effort in terms of developing new
                 runtimes and fine-tuning current ones.


                 The main blocker for us to switch to Apache Beam is the
                 lack of a Java 8 API. We propose integrating the Euphoria
                 API into Apache Beam as a Java 8 DSL, in order to share our
                 effort with the community.


                 A simple example of Euphoria API usage can be found here:

                 https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount


                 If you feel that the Beam community could benefit from our
                 work, we would love to start working on Euphoria integration
                 into Apache Beam (we already have a working POC, with a few
                 basic operators implemented).


                 I look forward to hearing from you,

                 David


             --
             Jean-Baptiste Onofré
             jbono...@apache.org
             http://blog.nanthrax.net
             Talend - http://www.talend.com





--
Best regards,

David Morávek

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
