+1, I'm supportive of seeing this move forward. What remaining concrete concerns are there?
-Tyler

On Tue, Jan 2, 2018 at 8:35 AM David Morávek <david.mora...@gmail.com> wrote:

> Hello JB,
>
> can we help in any way to move things forward?
>
> Thanks,
> D.
>
> On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Thanks Jan,
>>
>> It makes sense.
>>
>> Let me take a look at the code to understand the "interaction".
>>
>> Regards
>> JB
>>
>> On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>>
>>> Hi JB,
>>>
>>> basically you are not wrong. The project started about three or four
>>> years ago with a goal to unify batch and streaming processing into a
>>> single portable, executor-independent API. Because of that, it is
>>> currently "close" to Beam in this sense. But we don't see much added
>>> value in keeping this as a separate project, with one of the key
>>> differences being the API (not the model itself), so we would like to
>>> focus on translation from the Euphoria API to Beam's SDK. That's why we
>>> would like to see it as a DSL, so that it would be possible to use the
>>> Euphoria API with Beam's runners as natively as possible.
>>>
>>> I hope I didn't make the subject even more unclear; if so, I'll be
>>> happy to explain anything in more detail. :-)
>>>
>>> Jan
>>>
>>> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>>
>>>> Hi Jan,
>>>>
>>>> Thanks for your answers.
>>>>
>>>> However, they confused me ;)
>>>>
>>>> Regarding what you replied, Euphoria seems like a programming
>>>> model/SDK "close" to Beam more than a DSL on top of an existing Beam
>>>> SDK.
>>>>
>>>> Am I wrong?
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>>>
>>>>> Hi Ismael,
>>>>>
>>>>> basically we adopted Beam's design regarding partitioning
>>>>> (https://github.com/seznam/euphoria/issues/160) and implemented the
>>>>> sorting manually (https://github.com/seznam/euphoria/issues/158).
>>>>> I'm not aware of the time model differences (Euphoria supports
>>>>> ingestion and event time; we don't support processing time, by
>>>>> decision). Regarding other differences (looking at the Beam
>>>>> capability matrix), I'd say that:
>>>>>
>>>>>  - we don't support stateful FlatMap (i.e. ParDo) for now
>>>>>    (https://github.com/seznam/euphoria/issues/192)
>>>>>
>>>>>  - we don't support side inputs (by decision now, but might be
>>>>>    reconsidered) and outputs
>>>>>    (https://github.com/seznam/euphoria/issues/124)
>>>>>
>>>>>  - we support complete event-time windows (non-merging, merging,
>>>>>    aligned, unaligned) and time control
>>>>>
>>>>>  - we don't support processing time by decision (might be
>>>>>    reconsidered if a valid use-case is found)
>>>>>
>>>>>  - we support window triggering based on both time and data,
>>>>>    including discarding and accumulating (without accumulating &
>>>>>    retracting)
>>>>>
>>>>> All our executors (runners) - Flink, Spark and Local - implement the
>>>>> complete model, which we enforce using an "operator test kit" that
>>>>> all executors must pass. The Spark executor supports bounded sources
>>>>> only (for now). As David said, we currently don't have a
>>>>> serialization abstraction, so there is some work to be done in that
>>>>> regard.
>>>>>
>>>>> Our intention is to completely supersede Euphoria. We would like to
>>>>> consider the possibility of using executors that would not rely on
>>>>> Beam, but that is optional for now and should be straightforward.
>>>>>
>>>>> We'd be happy to answer any more questions you might have, and
>>>>> thanks a lot!
>>>>>
>>>>> Best,
>>>>>
>>>>> Jan
>>>>>
>>>>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It is great to see that you guys have achieved a maturity point to
>>>>>> propose this. Congratulations on your work and the idea to
>>>>>> contribute it to Beam.
>>>>>>
>>>>>> I remember a previous discussion with Jan about the model mismatch
>>>>>> between Euphoria and Beam, caused by some design decisions of both
>>>>>> projects. I remember you guys had some issues with the way Beam's
>>>>>> sources do partitioning, as well as Beam's lack of sorted data (on
>>>>>> shuffle, a la Hadoop). Also, if I remember well, the 'time' model of
>>>>>> Euphoria was simpler than Beam's. I talk about all of this because I
>>>>>> am curious about what parts of the Euphoria model you guys had to
>>>>>> sacrifice to support Beam, and what parts of Beam's model should
>>>>>> still be integrated into Euphoria (and if there is a
>>>>>> straightforward path to do it).
>>>>>>
>>>>>> If I understand correctly, if this gets merged into Apache, this
>>>>>> means that Euphoria's current implementation would be superseded by
>>>>>> this DSL? I am curious because I would like to understand your
>>>>>> level of investment in supporting the future of this DSL.
>>>>>>
>>>>>> Thanks and congrats again!
>>>>>> Ismaël
>>>>>>
>>>>>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré
>>>>>> <j...@nanthrax.net> wrote:
>>>>>>
>>>>>>> Depending on the donation, you would need an ICLA for each
>>>>>>> contributor, and a CCLA in addition to the SGA.
>>>>>>>
>>>>>>> We can sync with Davor and me for the legal stuff.
>>>>>>> However, I would wait a little bit just to have feedback from the
>>>>>>> whole team and start a formal vote.
>>>>>>>
>>>>>>> I would be happy to start the formal vote.
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Thanks for the awesome feedback!
>>>>>>>>
>>>>>>>> Romain:
>>>>>>>>
>>>>>>>> We already use the Java Stream API in all operators where it
>>>>>>>> makes sense (e.g. ReduceByKey). Still not sure if it was a good
>>>>>>>> choice, but it can be easily converted to an iterator anyway.
>>>>>>>>
>>>>>>>> Side outputs support is coming soon; we have already done some
>>>>>>>> initial work on this.
>>>>>>>>
>>>>>>>> Side inputs are not supported in the way you are used to from
>>>>>>>> Beam, because they can be replaced by a Join operator on the same
>>>>>>>> key (if annotated with broadcastHashJoin, it will be turned into
>>>>>>>> a map-side join).
>>>>>>>>
>>>>>>>> The only significant difference from Beam is that we decided not
>>>>>>>> to abstract serialization, so we need to add support for Type
>>>>>>>> Hints, because of type erasure.
>>>>>>>>
>>>>>>>> Fluent API:
>>>>>>>>
>>>>>>>> The API is fluent within one operator. It is designed to "lead
>>>>>>>> the programmer", which means that he will only be offered methods
>>>>>>>> that make sense after the last method he used (e.g. in
>>>>>>>> ReduceByKey, we know that after keyBy either reduceBy method
>>>>>>>> should come). It is implemented as a series of builders.
>>>>>>>>
>>>>>>>> Davor:
>>>>>>>>
>>>>>>>> Thanks, I'll contact you, and will start the process of having
>>>>>>>> all the necessary paperwork signed on our side, so we can get
>>>>>>>> things moving.
>>>>>>>>
>>>>>>>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>>>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi guys
>>>>>>>>
>>>>>>>> A DSL would be very welcome, in particular if fluent.
>>>>>>>>
>>>>>>>> Open question: did you consider implementing the Stream API
>>>>>>>> (surely extending it to have a BeamStream and a few more features
>>>>>>>> like sides etc)? It would be very natural, easy to integrate
>>>>>>>> anywhere, and would avoid having to discover a new API.
>>>>>>>>
>>>>>>>> Hazelcast Jet did it, so I don't see why Beam couldn't.
>>>>>>>>
>>>>>>>> On Dec 18, 2017 at 07:26, "Davor Bonaci" <da...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi David,
>>>>>>>> As JB noted, merging of these two projects is a great idea. In
>>>>>>>> fact, some of us have had those discussions in the past.
>>>>>>>>
>>>>>>>> Legally, nothing particular is strictly necessary as the code
>>>>>>>> seems to already be Apache 2.0 licensed. We don't, however, want
>>>>>>>> to be perceived as making hostile forks, so it would be great to
>>>>>>>> file a Software Grant Agreement with the ASF Secretary. I can
>>>>>>>> help with the process, as necessary.
>>>>>>>>
>>>>>>>> Project alignment-wise, there aren't any particular blockers that
>>>>>>>> I am aware of. We welcome DSLs.
>>>>>>>>
>>>>>>>> Technically, the code would start in a feature branch. During
>>>>>>>> this stage, we'd need to validate a few things, including
>>>>>>>> confirmation that the code and dependencies match the ASF policy,
>>>>>>>> automate testing in Beam's tooling, etc. At that point, we'd take
>>>>>>>> a community vote to accept the component into master, and
>>>>>>>> consider author(s) for committership in the overall project.
>>>>>>>>
>>>>>>>> Welcome to the ASF and Beam -- we are thrilled to have you! Hope
>>>>>>>> this helps, and please reach out if anybody on our end can help,
>>>>>>>> including JB or myself.
>>>>>>>>
>>>>>>>> Davor
>>>>>>>>
>>>>>>>> On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>>>>>>> <j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>> Hi David,
>>>>>>>>
>>>>>>>> Generally speaking, having different fluent DSLs on top of the
>>>>>>>> Beam SDK is great.
>>>>>>>>
>>>>>>>> I would like to take a look at your wordcount examples to give
>>>>>>>> you complete feedback. I like the idea, and a fluent Java DSL is
>>>>>>>> valuable.
>>>>>>>>
>>>>>>>> Let's wait for feedback from others. If we have a consensus, then
>>>>>>>> I would be more than happy to help you with the donation (I
>>>>>>>> worked on the Camel Java DSL a while ago, so I have some
>>>>>>>> experience here).
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 12/17/2017 07:00 PM, David Morávek wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> First of all, thanks for the amazing work the Apache Beam
>>>>>>>> community is doing!
>>>>>>>>
>>>>>>>> In 2014, we started development of a runtime-independent Java 8
>>>>>>>> API that helps us create unified big-data processing flows. It
>>>>>>>> has been used as a core building block of the Seznam.cz web
>>>>>>>> crawler data infrastructure ever since. Its design principles and
>>>>>>>> execution model are very similar to Apache Beam.
>>>>>>>>
>>>>>>>> This API was open sourced in 2016, under the name Euphoria API:
>>>>>>>>
>>>>>>>> https://github.com/seznam/euphoria
>>>>>>>>
>>>>>>>> As it is very similar to Apache Beam, we feel that it is not
>>>>>>>> worth duplicating effort in terms of development of new runtimes
>>>>>>>> and fine-tuning of current ones.
>>>>>>>>
>>>>>>>> The main blocker for us to switch to Apache Beam is the lack of a
>>>>>>>> Java 8 API. We propose the integration of the Euphoria API into
>>>>>>>> Apache Beam as a Java 8 DSL, in order to share our effort with
>>>>>>>> the community.
>>>>>>>>
>>>>>>>> A simple example of Euphoria API usage can be found here:
>>>>>>>>
>>>>>>>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>>>>>>>
>>>>>>>> If you feel that the Beam community could benefit from our work,
>>>>>>>> we would love to start working on Euphoria integration into
>>>>>>>> Apache Beam (we already have a working POC, with a few basic
>>>>>>>> operators implemented).
>>>>>>>>
>>>>>>>> I look forward to hearing from you,
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jean-Baptiste Onofré
>>>>>>>> jbono...@apache.org
>>>>>>>> http://blog.nanthrax.net
>>>>>>>> Talend - http://www.talend.com
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> David Morávek
>>>>>>>
>>>>>>> --
>>>>>>> Jean-Baptiste Onofré
>>>>>>> jbono...@apache.org
>>>>>>> http://blog.nanthrax.net
>>>>>>> Talend - http://www.talend.com
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>
> --
> Regards
>
> David Morávek