Hello JB, can we help in any way to move things forward?

Thanks,
D.

On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <[email protected]> wrote:

> Thanks Jan,
>
> It makes sense.
>
> Let me take a look at the code to understand the "interaction".
>
> Regards
> JB
>
> On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>
>> Hi JB,
>>
>> basically, you are not wrong. The project started about three or four
>> years ago with the goal of unifying batch and streaming processing into a
>> single portable, executor-independent API. Because of that, it is currently
>> "close" to Beam in this sense. But we don't see much added value in keeping
>> it as a separate project whose key difference would be the API (not the
>> model itself), so we would like to focus on translating the Euphoria API to
>> Beam's SDK. That's why we would like to see it as a DSL, so that the
>> Euphoria API could be used with Beam's runners as natively as possible.
>>
>> I hope I didn't make the subject even more unclear; if I did, I'll be happy
>> to explain anything in more detail. :-)
>>
>> Jan
>>
>> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>
>>> Hi Jan,
>>>
>>> Thanks for your answers.
>>>
>>> However, they confused me ;)
>>>
>>> Based on what you replied, Euphoria seems more like a programming
>>> model/SDK "close" to Beam than a DSL on top of an existing Beam SDK.
>>>
>>> Am I wrong?
>>>
>>> Regards
>>> JB
>>>
>>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>>
>>>> Hi Ismael,
>>>>
>>>> basically, we adopted Beam's design regarding partitioning
>>>> (https://github.com/seznam/euphoria/issues/160) and implemented the
>>>> sorting manually (https://github.com/seznam/euphoria/issues/158). I'm
>>>> not aware of any time model differences (Euphoria supports ingestion and
>>>> event time; we don't support processing time by decision). Regarding other
>>>> differences (looking at the Beam capability matrix), I'd say that:
>>>>
>>>> - we don't support stateful FlatMap (i.e. ParDo) for now
>>>>   (https://github.com/seznam/euphoria/issues/192)
>>>>
>>>> - we don't support side inputs (by decision for now, but this might be
>>>>   reconsidered) or side outputs
>>>>   (https://github.com/seznam/euphoria/issues/124)
>>>>
>>>> - we support complete event-time windows (non-merging, merging,
>>>>   aligned, unaligned) and time control
>>>>
>>>> - we don't support processing time by decision (this might be
>>>>   reconsidered if a valid use case is found)
>>>>
>>>> - we support window triggering based on both time and data, including
>>>>   discarding and accumulating modes (but not accumulating & retracting)
>>>>
>>>> All our executors (runners: Flink, Spark and Local) implement the
>>>> complete model, which we enforce using an "operator test kit" that all
>>>> executors must pass. The Spark executor supports bounded sources only (for
>>>> now). As David said, we currently don't have a serialization abstraction,
>>>> so there is some work to be done in that regard.
>>>>
>>>> Our intention is to completely supersede Euphoria. We would like to
>>>> consider the possibility of using executors that do not rely on Beam, but
>>>> that is optional for now and should be straightforward.
>>>>
>>>> We'd be happy to answer any more questions you might have, and thanks a
>>>> lot!
>>>>
>>>> Best,
>>>>
>>>> Jan
>>>>
>>>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> It is great to see that you guys have reached the maturity to propose
>>>>> this. Congratulations on your work and on the idea of contributing it to
>>>>> Beam.
>>>>>
>>>>> I remember a previous discussion with Jan about the model mismatch
>>>>> between Euphoria and Beam, caused by some design decisions of both
>>>>> projects. I remember you guys had some issues with the way Beam's sources
>>>>> do partitioning, as well as with Beam's lack of sorted data (on shuffle,
>>>>> à la Hadoop). Also, if I remember well, Euphoria's 'time' model was
>>>>> simpler than Beam's.
>>>>> I bring all of this up because I am curious about which parts of the
>>>>> Euphoria model you guys had to sacrifice to support Beam, and which parts
>>>>> of Beam's model should still be integrated into Euphoria (and whether
>>>>> there is a straightforward path to do it).
>>>>>
>>>>> If I understand correctly, if this gets merged into Apache, Euphoria's
>>>>> current implementation would be superseded by this DSL? I ask because I
>>>>> would like to understand your level of investment in supporting the
>>>>> future of this DSL.
>>>>>
>>>>> Thanks and congrats again!
>>>>> Ismaël
>>>>>
>>>>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Depending on the donation, you would need an ICLA for each contributor,
>>>>>> and a CCLA in addition to the SGA.
>>>>>>
>>>>>> Davor and I can sync with you on the legal stuff.
>>>>>> However, I would wait a little bit to get feedback from the whole team
>>>>>> and start a formal vote.
>>>>>>
>>>>>> I would be happy to start the formal vote.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Thanks for the awesome feedback!
>>>>>>>
>>>>>>> Romain:
>>>>>>>
>>>>>>> We already use the Java Stream API in all operators where it makes
>>>>>>> sense (e.g. ReduceByKey). I'm still not sure it was a good choice, but
>>>>>>> it can easily be converted to an iterator anyway.
>>>>>>>
>>>>>>> Side output support is coming soon; we have already done some initial
>>>>>>> work on this.
>>>>>>>
>>>>>>> Side inputs are not supported the way you are used to from Beam,
>>>>>>> because they can be replaced by a Join operator on the same key (if
>>>>>>> annotated with broadcastHashJoin, it will be turned into a map-side
>>>>>>> join).
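The broadcastHashJoin idea David mentions can be illustrated with a minimal plain-Java sketch, assuming the lookup side is small enough to fit in memory on every worker. The names `BroadcastHashJoinSketch`, `join` and `kv` are hypothetical and this is not Euphoria's implementation: the small side is loaded into a `HashMap` once, and the large side is scanned past it without a shuffle, which covers the typical "enrich with lookup data" use of a side input.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a broadcast-hash (map-side) join; not Euphoria's implementation.
final class BroadcastHashJoinSketch {

  // Small helper for building key/value pairs.
  static <A, B> Map.Entry<A, B> kv(A key, B value) {
    return new SimpleEntry<>(key, value);
  }

  /**
   * Inner-joins the large side with the small side on the key.
   * The small side is materialised into a HashMap (the copy each worker would hold
   * after a broadcast), so the large side is scanned once and never shuffled.
   */
  static <K, L, R> List<Map.Entry<L, R>> join(
      List<Map.Entry<K, L>> large, List<Map.Entry<K, R>> small) {
    Map<K, R> lookup = new HashMap<>();
    for (Map.Entry<K, R> entry : small) {
      lookup.put(entry.getKey(), entry.getValue()); // assumes unique keys; a later duplicate wins
    }
    List<Map.Entry<L, R>> joined = new ArrayList<>();
    for (Map.Entry<K, L> entry : large) {
      if (lookup.containsKey(entry.getKey())) { // unmatched large-side elements are dropped
        joined.add(kv(entry.getValue(), lookup.get(entry.getKey())));
      }
    }
    return joined;
  }

  public static void main(String[] args) {
    // Large side: (userId, page); small side: (userId, country).
    List<Map.Entry<String, String>> visits =
        Arrays.asList(kv("u1", "/a"), kv("u2", "/b"), kv("u3", "/c"));
    List<Map.Entry<String, String>> countries = Arrays.asList(kv("u1", "CZ"), kv("u2", "FR"));
    System.out.println(join(visits, countries)); // [/a=CZ, /b=FR]
  }
}
```

If the small side outgrows memory, a regular shuffle-based join on the key is the fallback; the broadcast variant only pays off when one side is tiny.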
>>>>>>>
>>>>>>> The only significant difference from Beam is that we decided not to
>>>>>>> abstract serialization, so we need to add support for Type Hints
>>>>>>> because of type erasure.
>>>>>>>
>>>>>>> Fluent API:
>>>>>>>
>>>>>>> The API is fluent within one operator. It is designed to "lead the
>>>>>>> programmer", which means that they will only be offered methods that
>>>>>>> make sense after the last method they used (e.g. in ReduceByKey, we
>>>>>>> know that after keyBy, a method such as reduceBy should come next). It
>>>>>>> is implemented as a series of builders.
>>>>>>>
>>>>>>> Davor:
>>>>>>>
>>>>>>> Thanks, I'll contact you and start the process of getting all the
>>>>>>> necessary paperwork signed on our side, so we can get things moving.
>>>>>>>
>>>>>>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>     Hi guys
>>>>>>>
>>>>>>>     A DSL would be very welcome, in particular if fluent.
>>>>>>>
>>>>>>>     Open question: did you consider implementing the Stream API
>>>>>>>     (surely extending it to have a BeamStream and a few more features
>>>>>>>     like side inputs/outputs, etc.)? It would be very natural, easy to
>>>>>>>     integrate anywhere, and would avoid having to discover a new API.
>>>>>>>
>>>>>>>     Hazelcast Jet did it, so I don't see why Beam couldn't.
>>>>>>>
>>>>>>>     On Dec 18, 2017 07:26, "Davor Bonaci" <[email protected]> wrote:
>>>>>>>
>>>>>>>         Hi David,
>>>>>>>         As JB noted, merging these two projects is a great idea. In
>>>>>>>         fact, some of us have had those discussions in the past.
>>>>>>>
>>>>>>>         Legally, nothing in particular is strictly necessary, as the
>>>>>>>         code seems to already be Apache 2.0 licensed.
>>>>>>>         We don't, however, want to be perceived as making hostile
>>>>>>>         forks, so it would be great to file a Software Grant Agreement
>>>>>>>         with the ASF Secretary. I can help with the process, as
>>>>>>>         necessary.
>>>>>>>
>>>>>>>         Project alignment-wise, there aren't any particular blockers
>>>>>>>         that I am aware of. We welcome DSLs.
>>>>>>>
>>>>>>>         Technically, the code would start in a feature branch. During
>>>>>>>         this stage, we'd need to validate a few things, including
>>>>>>>         confirming that the code and dependencies match the ASF policy,
>>>>>>>         automating testing in Beam's tooling, etc. At that point, we'd
>>>>>>>         take a community vote to accept the component into master, and
>>>>>>>         consider the author(s) for committership in the overall project.
>>>>>>>
>>>>>>>         Welcome to the ASF and Beam. We are thrilled to have you!
>>>>>>>         Hope this helps, and please reach out if anybody on our end can
>>>>>>>         help, including JB or myself.
>>>>>>>
>>>>>>>         Davor
>>>>>>>
>>>>>>>         On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>>>>>>         <[email protected]> wrote:
>>>>>>>
>>>>>>>             Hi David,
>>>>>>>
>>>>>>>             Generally speaking, having different fluent DSLs on top of
>>>>>>>             the Beam SDK is great.
>>>>>>>
>>>>>>>             I would like to take a look at your wordcount examples to
>>>>>>>             give you complete feedback. I like the idea, and a fluent
>>>>>>>             Java DSL is valuable.
>>>>>>>
>>>>>>>             Let's wait for feedback from others. If we have a
>>>>>>>             consensus, I would be more than happy to help you with the
>>>>>>>             donation (I worked on the Camel Java DSL a while ago, so I
>>>>>>>             have some experience here).
>>>>>>>
>>>>>>>             Thanks!
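JB's interest in a fluent Java DSL, and David's description earlier in the thread of the API as "a series of builders" that only offers methods that make sense after the previous call, correspond to a pattern often called a step builder. The sketch below is a hypothetical, in-memory imitation: `ReduceByKeySketch`, `of`, `keyBy`, `reduceBy` and `output` are illustrative names and signatures, not Euphoria's actual API. Each step returns a different interface, so calling `output()` straight after `of(...)` does not compile, and `reduceBy` receives each key's values as a `java.util.stream.Stream`, echoing David's remark that ReduceByKey uses the Java Stream API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical step builder imitating a fluent ReduceByKey; names and signatures are illustrative only.
final class ReduceByKeySketch {

  // Step 1: the only thing you can do with a fresh operator is choose a key.
  interface KeyByStep<IN> {
    <K> ReduceByStep<IN, K> keyBy(Function<IN, K> keyExtractor);
  }

  // Step 2: once a key exists, the only next step is the reduction.
  interface ReduceByStep<IN, K> {
    <OUT> OutputStep<K, OUT> reduceBy(Function<Stream<IN>, OUT> reducer);
  }

  // Step 3: terminal step; this sketch simply evaluates in memory.
  interface OutputStep<K, OUT> {
    Map<K, OUT> output();
  }

  static <IN> KeyByStep<IN> of(List<IN> input) {
    return new KeyByStep<IN>() {
      @Override
      public <K> ReduceByStep<IN, K> keyBy(Function<IN, K> keyExtractor) {
        return new ReduceByStep<IN, K>() {
          @Override
          public <OUT> OutputStep<K, OUT> reduceBy(Function<Stream<IN>, OUT> reducer) {
            return () -> {
              // Group elements by key, keeping first-seen key order.
              Map<K, List<IN>> groups = new LinkedHashMap<>();
              for (IN element : input) {
                groups.computeIfAbsent(keyExtractor.apply(element), k -> new ArrayList<>()).add(element);
              }
              // Hand each group to the reducer as a Stream.
              Map<K, OUT> result = new LinkedHashMap<>();
              groups.forEach((key, values) -> result.put(key, reducer.apply(values.stream())));
              return result;
            };
          }
        };
      }
    };
  }

  public static void main(String[] args) {
    Map<Integer, Long> countsByLength = of(Arrays.asList("a", "bb", "cc", "d", "eee"))
        .keyBy(String::length)
        .reduceBy(words -> words.count())
        .output();
    System.out.println(countsByLength); // {1=2, 2=2, 3=1}
    // of(...).output() would not compile: KeyByStep declares no output() method.
  }
}
```

The design choice is that ordering mistakes surface at compile time and in IDE completion rather than at pipeline construction time; the cost is one interface per step.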
>>>>>>>             Regards
>>>>>>>             JB
>>>>>>>
>>>>>>>             On 12/17/2017 07:00 PM, David Morávek wrote:
>>>>>>>
>>>>>>>                 Hello,
>>>>>>>
>>>>>>>                 First of all, thanks for the amazing work the Apache
>>>>>>>                 Beam community is doing!
>>>>>>>
>>>>>>>                 In 2014, we started developing a runtime-independent
>>>>>>>                 Java 8 API that helps us create unified big-data
>>>>>>>                 processing flows. It has been used as a core building
>>>>>>>                 block of the Seznam.cz web crawler data infrastructure
>>>>>>>                 ever since. Its design principles and execution model
>>>>>>>                 are very similar to Apache Beam's.
>>>>>>>
>>>>>>>                 This API was open sourced in 2016 under the name
>>>>>>>                 Euphoria API:
>>>>>>>
>>>>>>>                 https://github.com/seznam/euphoria
>>>>>>>
>>>>>>>                 As it is very similar to Apache Beam, we feel that it
>>>>>>>                 is not worth duplicating effort in terms of developing
>>>>>>>                 new runtimes and fine-tuning the current ones.
>>>>>>>
>>>>>>>                 The main blocker for us to switch to Apache Beam is the
>>>>>>>                 lack of a Java 8 API. We propose integrating the
>>>>>>>                 Euphoria API into Apache Beam as a Java 8 DSL, in order
>>>>>>>                 to share our effort with the community.
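On the Type Hints David mentions in his reply to Romain: because Java erases generic type arguments at runtime, an executor that picks a serializer per dataset cannot recover, say, `List<String>` from a value's class alone. One common workaround, shown here as an assumption about the general approach rather than Euphoria's actual Type Hint API, is a "super type token": an anonymous subclass records its type argument in the class file, where reflection can read it back. `TypeHint` and `TypeHintDemo` are hypothetical names.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical "super type token" type hint; not Euphoria's or Beam's actual class.
abstract class TypeHint<T> {
  private final Type type;

  protected TypeHint() {
    // For `new TypeHint<List<String>>() {}`, the anonymous subclass's generic superclass
    // is TypeHint<List<String>>, and that type argument survives erasure in the class file.
    Type superclass = getClass().getGenericSuperclass();
    if (!(superclass instanceof ParameterizedType)) {
      throw new IllegalStateException("create TypeHint as an anonymous subclass with a type argument");
    }
    type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
  }

  Type getType() {
    return type;
  }
}

final class TypeHintDemo {
  public static void main(String[] args) {
    // Erasure: lists with different element types share one runtime class.
    boolean sameClass = new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();
    System.out.println(sameClass); // true

    // The hint still carries the full generic type, e.g. for choosing a serializer.
    Type hinted = new TypeHint<List<String>>() {}.getType();
    System.out.println(hinted.getTypeName()); // java.util.List<java.lang.String>
  }
}
```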
>>>>>>>
>>>>>>>                 A simple example of Euphoria API usage can be found
>>>>>>>                 here:
>>>>>>>
>>>>>>>                 https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>>>>>>
>>>>>>>                 If you feel that the Beam community could benefit from
>>>>>>>                 our work, we would love to start working on integrating
>>>>>>>                 Euphoria into Apache Beam (we already have a working
>>>>>>>                 POC with a few basic operators implemented).
>>>>>>>
>>>>>>>                 I look forward to hearing from you,
>>>>>>>
>>>>>>>                 David
>>>>>>>
>>>>>>>             --
>>>>>>>             Jean-Baptiste Onofré
>>>>>>>             [email protected]
>>>>>>>             http://blog.nanthrax.net
>>>>>>>             Talend - http://www.talend.com
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>>
>>>>>>> David Morávek

--
Best regards,

David Morávek
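For comparison with the wordcount example linked above, and with Romain's question about the Stream API, here is the classic word count written against plain `java.util.stream`. It runs in a single JVM only and is not the Euphoria or Beam API; the class name `WordCountSketch` and the tokenisation rule (lower-casing, then splitting on non-word characters) are assumptions made for the illustration.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain JDK 8 word count in a single JVM; not the Euphoria or Beam API.
final class WordCountSketch {

  static Map<String, Long> countWords(Stream<String> lines) {
    return lines
        // Assumed tokenisation: lower-case, then split on runs of non-word characters.
        .flatMap(line -> Arrays.stream(line.toLowerCase(Locale.ROOT).split("\\W+")))
        .filter(word -> !word.isEmpty()) // a leading separator yields an empty first token
        // Group identical words and count them; TreeMap keeps the output sorted by word.
        .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
  }

  public static void main(String[] args) {
    System.out.println(countWords(Stream.of("To be, or not to be", "that is the question")));
    // {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
  }
}
```

The distributed versions differ mainly in where the grouping happens: here `groupingBy` runs in memory, whereas a runner has to shuffle words by key across workers.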

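Jan's note that Euphoria's triggering supports "discarding and accumulating" but not "accumulating & retracting" refers to what happens to a window's pane after each trigger firing. A simplified, hypothetical model of a single window makes the difference concrete. It sums integers, fires once per batch of arrivals, and is not Euphoria's or Beam's code; `PaneModeSketch`, `Mode` and `firePanes` are illustrative names. Discarding mode emits only what arrived since the previous firing, while accumulating mode re-emits everything seen so far. A retracting mode would additionally emit a retraction of each previously emitted pane, which this sketch deliberately does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of one window's trigger firings; not Euphoria's or Beam's code.
final class PaneModeSketch {

  enum Mode { DISCARDING, ACCUMULATING }

  /**
   * Each inner list is the data that arrives before one trigger firing.
   * Every firing emits the sum of the current pane's contents.
   */
  static List<Integer> firePanes(List<List<Integer>> arrivalsPerFiring, Mode mode) {
    List<Integer> emitted = new ArrayList<>();
    int pane = 0; // buffered state of the (single) window
    for (List<Integer> batch : arrivalsPerFiring) {
      for (int value : batch) {
        pane += value;
      }
      emitted.add(pane); // the trigger fires and emits the current pane
      if (mode == Mode.DISCARDING) {
        pane = 0; // discarding: drop what was already emitted
      }
      // accumulating: keep the state, so the next pane repeats earlier data
    }
    return emitted;
  }

  public static void main(String[] args) {
    List<List<Integer>> arrivals =
        Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
    System.out.println(firePanes(arrivals, Mode.DISCARDING));   // [3, 3, 9]
    System.out.println(firePanes(arrivals, Mode.ACCUMULATING)); // [3, 6, 15]
  }
}
```

Summing the discarding panes (3 + 3 + 9) gives the same total as the last accumulating pane (15), which is why downstream consumers must know which mode produced the results they are combining.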