I may have missed things, but any update on the progress of this donation?

On Tue, Jan 2, 2018 at 10:52 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Great!
>
> Thanks!
> Regards
> JB
>
> On 01/03/2018 07:29 AM, David Morávek wrote:
>
>> Hello JB,
>>
>> Perfect! I'm already on the Beam Slack workspace, I'll contact you once I
>> get to the office.
>>
>> Thanks!
>> D.
>>
>> On Wed, Jan 3, 2018 at 6:19 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Hi David,
>>
>> absolutely!! Let's move forward on the preparation steps.
>>
>> Are you on Slack and/or hangout to plan this?
>>
>> Thanks,
>> Regards
>> JB
>>
>> On 01/02/2018 05:35 PM, David Morávek wrote:
>>
>> Hello JB,
>>
>> can we help in any way to move things forward?
>>
>> Thanks,
>> D.
>>
>> On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Thanks Jan,
>>
>> It makes sense.
>>
>> Let me take a look at the code to understand the "interaction".
>>
>> Regards
>> JB
>>
>> On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>>
>> Hi JB,
>>
>> basically you are not wrong. The project started about three or four
>> years ago with the goal of unifying batch and streaming processing into
>> a single portable, executor-independent API. Because of that, it is
>> currently "close" to Beam in this sense. But we don't see much added
>> value in keeping this as a separate project, with one of the key
>> differences being the API (not the model itself), so we would like to
>> focus on translation from the Euphoria API to Beam's SDK. That's why we
>> would like to see it as a DSL, so that it would be possible to use the
>> Euphoria API with Beam's runners as natively as possible.
>>
>> I hope I didn't make the subject even more unclear; if so, I'll be happy
>> to explain anything in more detail. :-)
>>
>> Jan
>>
>> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>
>> Hi Jan,
>>
>> Thanks for your answers.
>>
>> However, they confused me ;)
>>
>> Regarding what you replied, Euphoria seems like a programming
>> model/SDK "close" to Beam more than a DSL on top of an existing Beam
>> SDK.
>>
>> Am I wrong?
>>
>> Regards
>> JB
>>
>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>
>> Hi Ismael,
>>
>> basically we adopted Beam's design regarding partitioning
>> (https://github.com/seznam/euphoria/issues/160) and implemented
>> the sorting manually
>> (https://github.com/seznam/euphoria/issues/158). I'm not aware
>> of the time model differences (Euphoria supports ingestion and
>> event time, we don't support processing time by decision).
>> Regarding other differences (looking into the Beam capability
>> matrix), I'd say that:
>>
>> - we don't support stateful FlatMap (i.e.
>> ParDo) for now
>> (https://github.com/seznam/euphoria/issues/192)
>>
>> - we don't support side inputs (by decision now, but might be
>> reconsidered) and outputs
>> (https://github.com/seznam/euphoria/issues/124)
>>
>> - we support complete event-time windows (non-merging,
>> merging, aligned, unaligned) and time control
>>
>> - we don't support processing time by decision (might be
>> reconsidered if a valid use-case is found)
>>
>> - we support window triggering based on both time and data,
>> including discarding and accumulating (without accumulating &
>> retracting)
>>
>> All our executors (runners) - Flink, Spark and Local - implement
>> the complete model, which we enforce using an "operator test kit"
>> that all executors must pass. The Spark executor supports bounded
>> sources only (for now). As David said, we currently don't have a
>> serialization abstraction, so there is some work to be done in
>> that regard.
>>
>> Our intention is to completely supersede Euphoria; we would like
>> to consider the possibility of using executors that would not rely on
>> Beam, but that is optional now and should be straightforward.
>>
>> We'd be happy to answer any more questions you might have and
>> thanks a lot!
>>
>> Best,
>>
>> Jan
>>
>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>
>> Hi,
>>
>> It is great to see that you guys have achieved a maturity point to
>> propose this. Congratulations for your work and the idea to contribute
>> it into Beam.
>>
>> I remember from a previous discussion with Jan about the model
>> mismatch between Euphoria and Beam, because of some design decisions
>> of both projects. I remember you guys had some issues with the way
>> Beam's sources do partitioning, as well as Beam's lack of sorted data
>> (on shuffle a la hadoop). Also, if I remember well, the 'time' model of
>> Euphoria was simpler than Beam's. I talk about all of this because I
>> am curious about what parts of the Euphoria model you guys had to
>> sacrifice to support Beam, and what parts of Beam's model should still
>> be integrated into Euphoria (and if there is a straightforward path to
>> do it).
>>
>> If I understand well, if this gets merged into Apache this means that
>> Euphoria's current implementation would be superseded by this DSL? I
>> am curious because I would like to understand your level of investment
>> in supporting the future of this DSL.
>>
>> Thanks and congrats again!
>> Ismaël
>>
>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Depending on the donation, you would need an ICLA for each
>> contributor, and a CCLA in addition to the SGA.
>>
>> We can sync with Davor and me for the legal stuff.
>> However, I would wait a little bit just to have feedback from the
>> whole team and start a formal vote.
>>
>> I would be happy to start the formal vote.
>>
>> Regards
>> JB
>>
>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>
>> Hello,
>>
>> Thanks for the awesome feedback!
>>
>> Romain:
>>
>> We already use the Java Stream API in all operators where it makes
>> sense (eg.: ReduceByKey). Still not sure if it was a good choice, but
>> it can be easily converted to an iterator anyway.
>>
>> Side outputs support is coming soon, we have already done some initial
>> work on this.
>>
>> Side inputs are not supported in the way you are used to from Beam,
>> because they can be replaced by a Join operator on the same key (if
>> annotated with broadcastHashJoin, it will be turned into a map-side
>> join).
>>
>> The only significant difference from Beam is that we decided not to
>> abstract serialization, so we need to add support for Type Hints,
>> because of type erasure.
>>
>> Fluent API:
>>
>> The API is fluent within one operator. It is designed to "lead the
>> programmer", which means that he will only be offered methods that
>> make sense after the last method he used (eg.: in ReduceByKey, we know
>> that after keyBy either reduceBy method should come). It is
>> implemented as a series of builders.
>>
>> Davor:
>>
>> Thanks, I'll contact you, and will start the process of having all the
>> necessary paperwork signed on our side, so we can get things moving.
>>
>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau <rmannibu...@gmail.com> wrote:
>>
>> Hi guys
>>
>> A DSL would be very welcome, in particular if fluent.
>>
>> Open question: did you study implementing the Stream API (surely
>> extending it to have a BeamStream and a few more features like sides
>> etc)? It would be very natural, easily integrable anywhere, and would
>> avoid a new API discovery.
>>
>> Hazelcast Jet did it so I don't see why Beam couldn't.
>>
>> On Dec 18, 2017 at 07:26, "Davor Bonaci" <da...@apache.org> wrote:
>>
>> Hi David,
>> As JB noted, merging of these two projects is a great idea. In fact,
>> some of us have had those discussions in the past.
>>
>> Legally, nothing particular is strictly necessary as the code seems to
>> already be Apache 2.0 licensed. We don't, however, want to be perceived
>> as making hostile forks, so it would be great to file a Software Grant
>> Agreement with the ASF Secretary. I can help with the process, as
>> necessary.
>>
>> Project alignment-wise, there aren't any particular blockers that I am
>> aware of. We welcome DSLs.
>>
>> Technically, the code would start in a feature branch. During this
>> stage, we'd need to validate a few things, including confirmation that
>> the code and dependencies match the ASF policy, automate testing in
>> Beam's tooling, etc. At that point, we'd take a community vote to
>> accept the component into master, and consider author(s) for
>> committership in the overall project.
>>
>> Welcome to the ASF and Beam -- we are thrilled to have you! Hope this
>> helps, and please reach out if anybody on our end can help, including
>> JB or myself.
>>
>> Davor
>>
>> On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Hi David,
>>
>> Generally speaking, having different fluent DSLs on top of the Beam
>> SDK is great.
>>
>> I would like to take a look at your wordcount examples to give you
>> complete feedback. I like the idea and a fluent Java DSL is valuable.
>>
>> Let's wait for feedback from others. If we have a consensus, then I
>> would be more than happy to help you with the donation (I worked on
>> the Camel Java DSL a while ago, so I have some experience here).
>>
>> Thanks!
>> Regards
>> JB
>>
>> On 12/17/2017 07:00 PM, David Morávek wrote:
>>
>> Hello,
>>
>> First of all, thanks for the amazing work the Apache Beam
>> community is doing!
>>
>> In 2014, we started development of a runtime-independent
>> Java 8 API that helps us create unified big-data processing
>> flows. It has been used as a core building block of the Seznam.cz
>> web crawler data infrastructure ever since. Its design
>> principles and execution model are very similar to Apache Beam.
>>
>> This API was open sourced in 2016, under the name Euphoria API:
>>
>> https://github.com/seznam/euphoria
>>
>> As it is very similar to Apache Beam, we feel that it is not
>> worth duplicating effort in terms of development of new
>> runtimes and fine-tuning of current ones.
>>
>> The main blocker for us to switch to Apache Beam is the lack of a
>> Java 8 API. We propose the integration of the Euphoria API into
>> Apache Beam as a Java 8 DSL, in order to share our effort with
>> the community.
>>
>> A simple example of Euphoria API usage can be found here:
>>
>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>
>> If you feel that the Beam community could benefit from our work,
>> we would love to start working on Euphoria integration into
>> Apache Beam (we already have a working POC, with a few basic
>> operators implemented).
>>
>> I look forward to hearing from you,
>>
>> David
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> --
>> Regards,
>>
>> David Morávek