Hello JB,

Perfect! I'm already on the Beam Slack workspace; I'll contact you once I get to the office.

Thanks!
D.

On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi David,
>
> absolutely !! Let's move forward on the preparation steps.
>
> Are you on Slack and/or hangout to plan this ?
>
> Thanks,
> Regards
> JB
>
> On 01/02/2018 05:35 PM, David Morávek wrote:
>
>> Hello JB,
>>
>> can we help in any way to move things forward?
>>
>> Thanks,
>> D.
>>
>> On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Thanks Jan,
>>
>> It makes sense.
>>
>> Let me take a look at the code to understand the "interaction".
>>
>> Regards
>> JB
>>
>> On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>>
>> Hi JB,
>>
>> basically you are not wrong. The project started about three or four years ago with a goal to unify batch and streaming processing into a single portable, executor-independent API. Because of that, it is currently "close" to Beam in this sense. But we don't see much added value in keeping this as a separate project, with one of the key differences being the API (not the model itself), so we would like to focus on translation from the Euphoria API to Beam's SDK. That's why we would like to see it as a DSL, so that it would be possible to use the Euphoria API with Beam's runners as natively as possible.
>>
>> I hope I didn't make the subject even more unclear; if so, I'll be happy to explain anything in more detail. :-)
>>
>> Jan
>>
>> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>
>> Hi Jan,
>>
>> Thanks for your answers.
>>
>> However, they confused me ;)
>>
>> Regarding what you replied, Euphoria seems like a programming model/SDK "close" to Beam more than a DSL on top of an existing Beam SDK.
>>
>> Am I wrong ?
>>
>> Regards
>> JB
>>
>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>
>> Hi Ismael,
>>
>> basically we adopted Beam's design regarding partitioning (https://github.com/seznam/euphoria/issues/160) and implemented the sorting manually (https://github.com/seznam/euphoria/issues/158). I'm not aware of the time model differences (Euphoria supports ingestion and event time, we don't support processing time by decision). Regarding other differences (looking into the Beam capability matrix, I'd say that):
>>
>> - we don't support stateful FlatMap (i.e. ParDo) for now (https://github.com/seznam/euphoria/issues/192)
>>
>> - we don't support side inputs (by decision now, but might be reconsidered) and outputs (https://github.com/seznam/euphoria/issues/124)
>>
>> - we support complete event-time windows (non-merging, merging, aligned, unaligned) and time control
>>
>> - we don't support processing time by decision (might be reconsidered if a valid use-case is found)
>>
>> - we support window triggering based on both time and data, including discarding and accumulating (without accumulating & retracting)
>>
>> All our executors (runners) - Flink, Spark and Local - implement the complete model, which we enforce using an "operator test kit" that all executors must pass. The Spark executor supports bounded sources only (for now). As David said, we currently don't have a serialization abstraction, so there is some work to be done in that regard.
>>
>> Our intention is to completely supersede Euphoria; we would like to consider the possibility of using executors that would not rely on Beam, but that is optional now and should be straightforward.
>>
>> We'd be happy to answer any more questions you might have and thanks a lot!
>>
>> Best,
>>
>> Jan
>>
>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>
>> Hi,
>>
>> It is great to see that you guys have achieved a maturity point to propose this. Congratulations on your work and the idea to contribute it into Beam.
>>
>> I remember from a previous discussion with Jan about the model mismatch between Euphoria and Beam, because of some design decisions of both projects. I remember you guys had some issues with the way Beam's sources do partitioning, as well as Beam's lack of sorted data (on shuffle a la hadoop). Also, if I remember well, the 'time' model of Euphoria was simpler than Beam's. I talk about all of this because I am curious about what parts of the Euphoria model you guys had to sacrifice to support Beam, and what parts of Beam's model should still be integrated into Euphoria (and if there is a straightforward path to do it).
>>
>> If I understand well, if this gets merged into Apache, this means that Euphoria's current implementation would be superseded by this DSL? I am curious because I would like to understand your level of investment in supporting the future of this DSL.
>>
>> Thanks and congrats again !
>> Ismaël
>>
>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Depending on the donation, you would need an ICLA for each contributor, and a CCLA in addition to the SGA.
>>
>> We can sync with Davor and I for the legal stuff. However, I would wait a little bit just to have feedback from the whole team and start a formal vote.
>>
>> I would be happy to start the formal vote.
>>
>> Regards
>> JB
>>
>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>
>> Hello,
>>
>> Thanks for the awesome feedback!
>>
>> Romain:
>>
>> We already use the Java Stream API in all operators where it makes sense (e.g. ReduceByKey). Still not sure if it was a good choice, but it can be easily converted to an iterator anyway.
>>
>> Side outputs support is coming soon; we have already done some initial work on this.
>>
>> Side inputs are not supported in the way you are used to from Beam, because they can be replaced by the Join operator on the same key (if annotated with broadcastHashJoin, it will be turned into a map-side join).
>>
>> The only significant difference from Beam is that we decided not to abstract serialization, so we need to add support for Type Hints, because of type erasure.
>>
>> Fluent API:
>>
>> The API is fluent within one operator. It is designed to "lead the programmer", which means that he will only be offered methods that make sense after the last method he used (e.g. in ReduceByKey, we know that after keyBy either reduceBy method should come). It is implemented as a series of builders.
>>
>> Davor:
>>
>> Thanks, I'll contact you, and will start the process of having all the necessary paperwork signed on our side, so we can get things moving.
>>
>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>
>> Hi guys
>>
>> A DSL would be very welcome, in particular if fluent.
>>
>> Open question: did you study implementing the Stream API (surely extending it to have a BeamStream and a few more features like sides etc)? It would be very natural and easily integrable anywhere, and would avoid a new API discovery.
>>
>> Hazelcast Jet did it so I don't see why Beam couldn't.
>>
>> On 18 Dec 2017 07:26, "Davor Bonaci" <da...@apache.org> wrote:
>>
>> Hi David,
>> As JB noted, merging of these two projects is a great idea. In fact, some of us have had those discussions in the past.
>>
>> Legally, nothing particular is strictly necessary as the code seems to already be Apache 2.0 licensed. We don't, however, want to be perceived as making hostile forks, so it would be great to file a Software Grant Agreement with the ASF Secretary. I can help with the process, as necessary.
>>
>> Project alignment-wise, there aren't any particular blockers that I am aware of. We welcome DSLs.
>>
>> Technically, the code would start in a feature branch. During this stage, we'd need to validate a few things, including confirmation the code and dependencies match the ASF policy, automate testing in Beam's tooling, etc. At that point, we'd take a community vote to accept the component into master, and consider author(s) for committership in the overall project.
>>
>> Welcome to the ASF and Beam -- we are thrilled to have you! Hope this helps, and please reach out if anybody on our end can help, including JB or myself.
>>
>> Davor
>>
>> On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Hi David,
>>
>> Generally speaking, having different fluent DSLs on top of the Beam SDK is great.
>>
>> I would like to take a look at your wordcount examples to give you complete feedback. I like the idea and a fluent Java DSL is valuable.
>>
>> Let's wait for feedback from others. If we have a consensus, then I would be more than happy to help you with the donation (I worked on the Camel Java DSL a while ago, so I have some experience here).
>>
>> Thanks !
>> Regards
>> JB
>>
>> On 12/17/2017 07:00 PM, David Morávek wrote:
>>
>> Hello,
>>
>> First of all, thanks for the amazing work the Apache Beam community is doing!
>>
>> In 2014, we started development of a runtime-independent Java 8 API that helps us create unified big-data processing flows. It has been used as a core building block of the Seznam.cz web crawler data infrastructure ever since. Its design principles and execution model are very similar to Apache Beam.
>>
>> This API was open sourced in 2016, under the name Euphoria API:
>>
>> https://github.com/seznam/euphoria
>>
>> As it is very similar to Apache Beam, we feel that it is not worth duplicating effort in terms of development of new runtimes and fine-tuning of current ones.
>>
>> The main blocker for us to switch to Apache Beam is the lack of a Java 8 API. We propose the integration of the Euphoria API into Apache Beam as a Java 8 DSL, in order to share our effort with the community.
>>
>> A simple example of Euphoria API usage can be found here:
>>
>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>
>> If you feel that the Beam community could benefit from our work, we would love to start working on Euphoria integration into Apache Beam (we already have a working POC, with a few basic operators implemented).
>>
>> I look forward to hearing from you,
>>
>> David
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> --
>> Regards
>>
>> David Morávek
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> --
>> Regards
>>
>> David Morávek
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com