Hi,

It is great to see that you have reached the maturity to propose this. Congratulations on your work and on the idea of contributing it to Beam.

I remember a previous discussion with Jan about the model mismatch between Euphoria and Beam, caused by some design decisions in both projects. I remember you had some issues with the way Beam's sources do partitioning, as well as with Beam's lack of sorted data (on shuffle, à la Hadoop). Also, if I remember correctly, the 'time' model of Euphoria was simpler than Beam's.

I bring all of this up because I am curious about what parts of the Euphoria model you had to sacrifice to support Beam, and what parts of Beam's model should still be integrated into Euphoria (and whether there is a straightforward path to do it).

If I understand correctly, once this gets merged into Apache, Euphoria's current implementation would be superseded by this DSL? I am curious because I would like to understand your level of investment in supporting the future of this DSL.

Thanks and congrats again!

Ismaël

On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <[email protected]> wrote:
> Depending on the donation, you would need an ICLA for each contributor, and a CCLA in addition to the SGA.
>
> We can sync with Davor and me on the legal stuff.
> However, I would wait a little bit just to have feedback from the whole team and start a formal vote.
>
> I would be happy to start the formal vote.
>
> Regards
> JB
>
> On 12/18/2017 10:03 AM, David Morávek wrote:
>>
>> Hello,
>>
>> Thanks for the awesome feedback!
>>
>> Romain:
>>
>> We already use the Java Stream API in all operators where it makes sense (e.g. ReduceByKey). I am still not sure it was a good choice, but it can easily be converted to an iterator anyway.
>>
>> Side outputs support is coming soon; we have already done some initial work on this.
>>
>> Side inputs are not supported in the way you are used to from Beam, because they can be replaced by a Join operator on the same key (if annotated with broadcastHashJoin, it will be turned into a map-side join).
>>
>> The only significant difference from Beam is that we decided not to abstract serialization, so we need to add support for type hints because of type erasure.
>>
>> Fluent API:
>>
>> The API is fluent within one operator. It is designed to "lead the programmer", which means that they will only be offered methods that make sense after the last method they used (e.g. in ReduceByKey, we know that after keyBy either reduceBy method should come). It is implemented as a series of builders.
>>
>> Davor:
>>
>> Thanks, I'll contact you and start the process of getting all the necessary paperwork signed on our side, so we can get things moving.
>>
>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <[email protected]> wrote:
>>
>> Hi guys
>>
>> A DSL would be very welcome, in particular if fluent.
>>
>> Open question: did you consider implementing the Stream API (surely extending it to have a BeamStream and a few more features like sides etc)? It would be very natural, easy to integrate anywhere, and would avoid discovering a new API.
>>
>> Hazelcast Jet did it, so I don't see why Beam couldn't.
>>
>> On Dec 18, 2017 07:26, "Davor Bonaci" <[email protected]> wrote:
>>
>> Hi David,
>> As JB noted, merging these two projects is a great idea. In fact, some of us have had those discussions in the past.
>>
>> Legally, nothing particular is strictly necessary, as the code seems to already be Apache 2.0 licensed. We don't, however, want to be perceived as making hostile forks, so it would be great to file a Software Grant Agreement with the ASF Secretary. I can help with the process, as necessary.
>>
>> Project alignment-wise, there aren't any particular blockers that I am aware of. We welcome DSLs.
>>
>> Technically, the code would start in a feature branch. During this stage, we'd need to validate a few things, including confirming that the code and dependencies match the ASF policy, automating testing in Beam's tooling, etc. At that point, we'd take a community vote to accept the component into master, and consider author(s) for committership in the overall project.
>>
>> Welcome to the ASF and Beam -- we are thrilled to have you! Hope this helps, and please reach out if anybody on our end can help, including JB or myself.
>>
>> Davor
>>
>> On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <[email protected]> wrote:
>>
>> Hi David,
>>
>> Generally speaking, having different fluent DSLs on top of the Beam SDK is great.
>>
>> I would like to take a look at your wordcount examples to give you complete feedback. I like the idea, and a fluent Java DSL is valuable.
>>
>> Let's wait for feedback from others. If we have a consensus, then I would be more than happy to help you with the donation (I worked on the Camel Java DSL a while ago, so I have some experience here).
>>
>> Thanks!
>> Regards
>> JB
>>
>> On 12/17/2017 07:00 PM, David Morávek wrote:
>>
>> Hello,
>>
>> First of all, thanks for the amazing work the Apache Beam community is doing!
>>
>> In 2014, we started development of a runtime-independent Java 8 API that helps us create unified big-data processing flows. It has been used as a core building block of the Seznam.cz web crawler data infrastructure ever since. Its design principles and execution model are very similar to Apache Beam.
>>
>> This API was open sourced in 2016, under the name Euphoria API:
>>
>> https://github.com/seznam/euphoria
>>
>> As it is very similar to Apache Beam, we feel that it is not worth duplicating effort in terms of development of new runtimes and fine-tuning of current ones.
>>
>> The main blocker for us to switch to Apache Beam is the lack of a Java 8 API. We propose the integration of the Euphoria API into Apache Beam as a Java 8 DSL, in order to share our effort with the community.
>>
>> A simple example of Euphoria API usage can be found here:
>>
>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>
>> If you feel that the Beam community could benefit from our work, we would love to start working on Euphoria integration into Apache Beam (we already have a working POC, with a few basic operators implemented).
>>
>> I look forward to hearing from you,
>>
>> David
>>
>> --
>> Jean-Baptiste Onofré
>> [email protected]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> --
>> Regards
>>
>> David Morávek
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
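
As a side note on the fluent API David describes in the quoted thread (an operator assembled from a series of builders, where only the methods that make sense next are offered), here is a minimal sketch of that staged-builder idea in plain Java. It is not Euphoria's actual code: the class StagedBuilderSketch, the stage interfaces, of() and output() are hypothetical names made up for illustration, and it runs on an in-memory list rather than a distributed dataset. Only the "keyBy, then reduceBy" ordering is taken from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these names come from Euphoria or Beam.
public class StagedBuilderSketch {

    /** Stage 1: the only method on offer is keyBy. */
    public interface KeyByStage<IN> {
        <K> ReduceByStage<IN, K> keyBy(Function<IN, K> keyFn);
    }

    /** Stage 2: reachable only after keyBy; the only method on offer is reduceBy. */
    public interface ReduceByStage<IN, K> {
        <OUT> OutputStage<K, OUT> reduceBy(Function<Stream<IN>, OUT> reducer);
    }

    /** Stage 3: the operator is fully specified and can be evaluated. */
    public interface OutputStage<K, OUT> {
        Map<K, OUT> output();
    }

    /** Entry point: returns the first stage, so the compiler enforces the call order. */
    public static <IN> KeyByStage<IN> of(List<IN> input) {
        return new KeyByStage<IN>() {
            @Override
            public <K> ReduceByStage<IN, K> keyBy(Function<IN, K> keyFn) {
                return new ReduceByStage<IN, K>() {
                    @Override
                    public <OUT> OutputStage<K, OUT> reduceBy(Function<Stream<IN>, OUT> reducer) {
                        // Nothing is computed until output() is called.
                        return () -> evaluate(input, keyFn, reducer);
                    }
                };
            }
        };
    }

    /** Groups the input by key in memory, then reduces each group's values as a Stream. */
    private static <IN, K, OUT> Map<K, OUT> evaluate(
            List<IN> input, Function<IN, K> keyFn, Function<Stream<IN>, OUT> reducer) {
        Map<K, List<IN>> groups = new LinkedHashMap<>();
        for (IN element : input) {
            groups.computeIfAbsent(keyFn.apply(element), unused -> new ArrayList<>()).add(element);
        }
        Map<K, OUT> result = new LinkedHashMap<>();
        groups.forEach((key, values) -> result.put(key, reducer.apply(values.stream())));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = of(Arrays.asList("beam", "euphoria", "beam"))
                .keyBy(word -> word)
                .reduceBy(words -> words.count())
                .output();
        System.out.println(counts); // prints {beam=2, euphoria=1}
    }
}
```

With this shape, calling reduceBy directly on the result of of(...) does not compile, because KeyByStage does not declare it; an IDE's autocompletion therefore only ever suggests the next valid step.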
