Hello JB,

can we help in any way to move things forward?

Thanks,
D.

On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <[email protected]>
wrote:

> Thanks Jan,
>
> It makes sense.
>
> Let me take a look at the code to understand the "interaction".
>
> Regards
> JB
>
>
> On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>
>> Hi JB,
>>
>> basically you are not wrong. The project started about three or four
>> years ago with the goal of unifying batch and streaming processing into a
>> single portable, executor-independent API. Because of that, it is currently
>> "close" to Beam in this sense. But we don't see much added value in keeping
>> this as a separate project, one of the key differences being the API (not
>> the model itself), so we would like to focus on translating the Euphoria
>> API to Beam's SDK. That's why we would like to see it as a DSL, so that it
>> would be possible to use the Euphoria API with Beam's runners as natively
>> as possible.
>>
>> I hope I didn't make the subject even more unclear; if I did, I'll be happy
>> to explain anything in more detail. :-)
>>
>>    Jan
>>
>>
>> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>
>>> Hi Jan,
>>>
>>> Thanks for your answers.
>>>
>>> However, they confused me ;)
>>>
>>> Based on your reply, Euphoria seems more like a programming model/SDK
>>> "close" to Beam than a DSL on top of an existing Beam SDK.
>>>
>>> Am I wrong?
>>>
>>> Regards
>>> JB
>>>
>>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>>
>>>> Hi Ismael,
>>>>
>>>> basically we adopted Beam's design regarding partitioning
>>>> (https://github.com/seznam/euphoria/issues/160) and implemented sorting
>>>> manually (https://github.com/seznam/euphoria/issues/158). I'm not aware of
>>>> any differences in the time model (Euphoria supports ingestion and event
>>>> time; we deliberately don't support processing time). Regarding other
>>>> differences (looking at the Beam capability matrix), I'd say that:
>>>>
>>>>   - we don't support stateful FlatMap (i.e. ParDo) for now (
>>>> https://github.com/seznam/euphoria/issues/192)
>>>>
>>>>   - we don't support side inputs (a deliberate decision for now, but it
>>>> might be reconsidered) or side outputs
>>>> (https://github.com/seznam/euphoria/issues/124)
>>>>
>>>>   - we support complete event-time windows (non-merging, merging,
>>>> aligned, unaligned) and time control
>>>>
>>>>   - we deliberately don't support processing time (this might be
>>>> reconsidered if a valid use case is found)
>>>>
>>>>   - we support window triggering based on both time and data, including
>>>> discarding and accumulating modes (but not accumulating & retracting)
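The difference between the discarding and accumulating modes in the list above can be illustrated with a small plain-Java sketch: a per-window sum whose elements arrive in three batches, each followed by a trigger firing. The names `AccumulationModesSketch`, `Mode` and `fire` are hypothetical and not part of Euphoria or Beam; no real windowing or triggering machinery is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of trigger accumulation modes for a per-window sum. */
public final class AccumulationModesSketch {

  public enum Mode { DISCARDING, ACCUMULATING }

  private AccumulationModesSketch() {}

  // Each inner list holds the elements that arrived between two trigger firings
  // of the same window; the method returns the value emitted at each firing.
  public static List<Integer> fire(List<List<Integer>> panes, Mode mode) {
    List<Integer> emitted = new ArrayList<>();
    int runningTotal = 0;
    for (List<Integer> pane : panes) {
      int paneSum = 0;
      for (int x : pane) {
        paneSum += x;
      }
      runningTotal += paneSum;
      // Accumulating re-emits everything seen so far; discarding emits only the new part.
      emitted.add(mode == Mode.ACCUMULATING ? runningTotal : paneSum);
    }
    return emitted;
  }

  public static void main(String[] args) {
    List<List<Integer>> panes =
        Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
    System.out.println(fire(panes, Mode.DISCARDING));   // [3, 3, 9]
    System.out.println(fire(panes, Mode.ACCUMULATING)); // [3, 6, 15]
  }
}
```

In the accumulating & retracting mode, which the list says is not supported, each updated result would additionally be accompanied by a retraction of the previously emitted value (e.g. retracting 3 when 6 is emitted).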
>>>>
>>>> All our executors (runners: Flink, Spark and Local) implement the
>>>> complete model, which we enforce using an "operator test kit" that all
>>>> executors must pass. The Spark executor supports bounded sources only (for
>>>> now). As David said, we currently don't have a serialization abstraction,
>>>> so there is some work to be done in that regard.
>>>>
>>>> Our intention is to completely supersede Euphoria with this DSL. We would
>>>> like to consider the possibility of using executors that do not rely on
>>>> Beam, but that is optional for now and should be straightforward.
>>>>
>>>> We'd be happy to answer any further questions you might have, and thanks
>>>> a lot!
>>>>
>>>> Best,
>>>>
>>>>   Jan
>>>>
>>>>
>>>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> It is great to see that you guys have reached the maturity point to
>>>>> propose this. Congratulations on your work and on the idea of
>>>>> contributing it to Beam.
>>>>>
>>>>> I remember a previous discussion with Jan about the model mismatch
>>>>> between Euphoria and Beam, caused by some design decisions in both
>>>>> projects. I remember you guys had some issues with the way Beam's
>>>>> sources do partitioning, as well as with Beam's lack of sorted data (on
>>>>> shuffle, à la Hadoop). Also, if I remember correctly, Euphoria's 'time'
>>>>> model was simpler than Beam's. I bring all of this up because I am
>>>>> curious about which parts of the Euphoria model you had to sacrifice to
>>>>> support Beam, and which parts of Beam's model should still be integrated
>>>>> into Euphoria (and whether there is a straightforward path to do it).
>>>>>
>>>>> If I understand correctly, if this gets merged into Apache, Euphoria's
>>>>> current implementation would be superseded by this DSL? I am curious
>>>>> because I would like to understand your level of investment in
>>>>> supporting the future of this DSL.
>>>>>
>>>>> Thanks and congrats again!
>>>>> Ismaël
>>>>>
>>>>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <[email protected]> wrote:
>>>>>
>>>>>> Depending on the donation, you would need an ICLA for each contributor,
>>>>>> and a CCLA in addition to the SGA.
>>>>>>
>>>>>> You can sync with Davor and me for the legal stuff.
>>>>>> However, I would wait a little bit to get feedback from the whole team
>>>>>> and then start a formal vote.
>>>>>>
>>>>>> I would be happy to start the formal vote.
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Thanks for the awesome feedback!
>>>>>>>
>>>>>>> Romain:
>>>>>>>
>>>>>>> We already use the Java Stream API in all operators where it makes
>>>>>>> sense (e.g. ReduceByKey). We're still not sure it was a good choice,
>>>>>>> but it can easily be converted to an iterator anyway.
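As a rough illustration of why a Stream-based reducer is easy to bridge to an iterator-based runtime, here is a minimal sketch using only JDK calls (`StreamSupport.stream`, `Spliterators.spliteratorUnknownSize`, `Stream.iterator`). The reducer shape `Function<Stream<V>, R>` and the names `StreamReducerSketch`, `SUM`, `applyToIterator` and `toIterator` are assumptions made for this example, not Euphoria's actual signatures.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical reducer written against java.util.stream, adapted to and from Iterator. */
public final class StreamReducerSketch {

  private StreamReducerSketch() {}

  // A reducer using the Stream API, e.g. summing all values of one key.
  public static final Function<Stream<Integer>, Integer> SUM =
      values -> values.reduce(0, Integer::sum);

  // If the runtime hands over an Iterator, wrap it lazily as a sequential Stream.
  public static <V, R> R applyToIterator(Function<Stream<V>, R> reducer, Iterator<V> values) {
    Stream<V> stream = StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(values, Spliterator.ORDERED), false);
    return reducer.apply(stream);
  }

  // The other direction is built in: every Stream exposes an Iterator.
  public static <V> Iterator<V> toIterator(Stream<V> values) {
    return values.iterator();
  }

  public static void main(String[] args) {
    System.out.println(applyToIterator(SUM, Arrays.asList(1, 2, 3, 4).iterator())); // 10
    Iterator<Integer> it = toIterator(Stream.of(5, 6));
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```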
>>>>>>>
>>>>>>> Support for side outputs is coming soon; we have already done some
>>>>>>> initial work on this.
>>>>>>>
>>>>>>> Side inputs are not supported in the way you are used to from Beam,
>>>>>>> because they can be replaced by a Join operator on the same key (if
>>>>>>> annotated with broadcastHashJoin, it will be turned into a map-side
>>>>>>> join).
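A map-side (broadcast hash) join can be sketched in plain Java: the smaller input is loaded into an in-memory hash table, which a runner would ship to every worker, and the larger input is joined against it element by element without a shuffle. This is a hypothetical inner-join sketch (`BroadcastHashJoinSketch` and `broadcastHashJoin` are made-up names) that assumes at most one small-side value per key; it does not reproduce Euphoria's Join operator or its broadcastHashJoin hint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical broadcast hash join over in-memory lists (not Euphoria's Join API). */
public final class BroadcastHashJoinSketch {

  private BroadcastHashJoinSketch() {}

  public static <L, R, K, O> List<O> broadcastHashJoin(
      List<L> large, List<R> small,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiFunction<L, R, O> joiner) {
    // Build side: index the small input by key (assumes one value per key).
    Map<K, R> index = new HashMap<>();
    for (R r : small) {
      index.put(rightKey.apply(r), r);
    }
    // Probe side: each large-side element is joined locally; unmatched keys are dropped.
    List<O> out = new ArrayList<>();
    for (L l : large) {
      R match = index.get(leftKey.apply(l));
      if (match != null) {
        out.add(joiner.apply(l, match));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> clicks = Arrays.asList("cz:home", "de:search", "cz:mail");
    List<String> countries = Arrays.asList("cz=Czechia", "de=Germany");
    List<String> joined = broadcastHashJoin(
        clicks, countries,
        click -> click.split(":")[0],
        country -> country.split("=")[0],
        (click, country) -> country.split("=")[1] + "/" + click.split(":")[1]);
    System.out.println(joined); // [Czechia/home, Germany/search, Czechia/mail]
  }
}
```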
>>>>>>>
>>>>>>> The only significant difference from Beam is that we decided not to
>>>>>>> abstract serialization, so we need to add support for Type Hints,
>>>>>>> because of type erasure.
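Type erasure means a `List<String>` is just an `ArrayList` at runtime, so a serializer cannot discover element types on its own. One common workaround, shown here as an assumption rather than Euphoria's confirmed design, is the "super type token" pattern: an anonymous subclass keeps the full generic type in its class signature, where reflection can read it back. Beam's `TypeDescriptor` relies on a similar anonymous-subclass trick. The `TypeHint` class below is hypothetical.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical type hint: the type argument of an anonymous subclass survives erasure. */
public abstract class TypeHint<T> {

  private final Type type;

  protected TypeHint() {
    Type superclass = getClass().getGenericSuperclass();
    if (!(superclass instanceof ParameterizedType)) {
      throw new IllegalStateException(
          "Create a TypeHint as an anonymous subclass, e.g. new TypeHint<List<String>>() {}");
    }
    this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
  }

  /** The full generic type, which a serializer could inspect to pick coders. */
  public Type getType() {
    return type;
  }

  public static void main(String[] args) {
    // An ordinary instance only knows its raw class at runtime...
    List<String> plain = new ArrayList<>();
    System.out.println(plain.getClass()); // class java.util.ArrayList

    // ...while the anonymous subclass keeps the complete type argument.
    TypeHint<Map<String, List<Long>>> hint = new TypeHint<Map<String, List<Long>>>() {};
    System.out.println(hint.getType());
  }
}
```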
>>>>>>>
>>>>>>> Fluent API:
>>>>>>>
>>>>>>> The API is fluent within one operator. It is designed to "lead the
>>>>>>> programmer", which means that he will only be offered methods that
>>>>>>> make sense after the last method he used (e.g. in ReduceByKey, we know
>>>>>>> that a reduceBy method should come after keyBy). It is implemented as
>>>>>>> a series of builders.
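The "series of builders" idea can be sketched as a step builder in which every call returns a different type, so code completion only offers the method that is valid next: a fresh builder only exposes `keyBy`, after which only `reduceBy` is available, and only then `output`. The names (`ReduceByKeySketch`, `KeyByStep`, `ReduceByStep`, `OutputStep`) are hypothetical, evaluation runs eagerly over an in-memory `List`, and the reducer receives one key's values as a `java.util.stream.Stream`. It is a sketch of the pattern, not Euphoria's actual ReduceByKey API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

/** Hypothetical step builder in the spirit of a "lead the programmer" API. */
public final class ReduceByKeySketch {

  private ReduceByKeySketch() {}

  // Entry point: the only thing a fresh builder offers is keyBy.
  public static <T> KeyByStep<T> of(List<T> input) {
    return new KeyByStep<>(input);
  }

  public static final class KeyByStep<T> {
    private final List<T> input;

    private KeyByStep(List<T> input) {
      this.input = input;
    }

    public <K> ReduceByStep<T, K> keyBy(Function<T, K> keyFn) {
      return new ReduceByStep<>(input, keyFn);
    }
  }

  public static final class ReduceByStep<T, K> {
    private final List<T> input;
    private final Function<T, K> keyFn;

    private ReduceByStep(List<T> input, Function<T, K> keyFn) {
      this.input = input;
      this.keyFn = keyFn;
    }

    // After keyBy, reduceBy is the next step; each key's values arrive as a Stream.
    public <V> OutputStep<K, V> reduceBy(Function<Stream<T>, V> reducer) {
      Map<K, List<T>> groups = new LinkedHashMap<>();
      for (T element : input) {
        groups.computeIfAbsent(keyFn.apply(element), k -> new ArrayList<>()).add(element);
      }
      Map<K, V> result = new LinkedHashMap<>();
      groups.forEach((key, values) -> result.put(key, reducer.apply(values.stream())));
      return new OutputStep<>(result);
    }
  }

  public static final class OutputStep<K, V> {
    private final Map<K, V> result;

    private OutputStep(Map<K, V> result) {
      this.result = result;
    }

    public Map<K, V> output() {
      return result;
    }
  }

  public static void main(String[] args) {
    Map<String, Long> counts = ReduceByKeySketch.of(Arrays.asList("beam", "euphoria", "beam"))
        .keyBy(word -> word)
        .reduceBy(words -> words.count())
        .output();
    System.out.println(counts); // {beam=2, euphoria=1}
  }
}
```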
>>>>>>>
>>>>>>> Davor:
>>>>>>>
>>>>>>> Thanks, I'll contact you and start the process of getting all the
>>>>>>> necessary paperwork signed on our side, so we can get things moving.
>>>>>>>
>>>>>>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>      Hi guys
>>>>>>>
>>>>>>>      A DSL would be very welcome, in particular if fluent.
>>>>>>>
>>>>>>>      Open question: did you consider implementing the Stream API (surely
>>>>>>>      extending it to have a BeamStream and a few more features like side
>>>>>>>      inputs/outputs etc.)? It would be very natural, easy to integrate
>>>>>>>      anywhere, and would avoid having to discover a new API.
>>>>>>>
>>>>>>>      Hazelcast Jet did it, so I don't see why Beam couldn't.
>>>>>>>
>>>>>>>      On 18 Dec 2017 07:26, "Davor Bonaci" <[email protected]> wrote:
>>>>>>>
>>>>>>>          Hi David,
>>>>>>>          As JB noted, merging these two projects is a great idea. In fact,
>>>>>>>          some of us have had those discussions in the past.
>>>>>>>
>>>>>>>          Legally, nothing in particular is strictly necessary, as the code
>>>>>>>          seems to already be Apache 2.0 licensed. We don't, however, want
>>>>>>>          to be perceived as making hostile forks, so it would be great to
>>>>>>>          file a Software Grant Agreement with the ASF Secretary. I can help
>>>>>>>          with the process, as necessary.
>>>>>>>
>>>>>>>          Project alignment-wise, there aren't any particular blockers that
>>>>>>>          I am aware of. We welcome DSLs.
>>>>>>>
>>>>>>>          Technically, the code would start in a feature branch. During this
>>>>>>>          stage, we'd need to validate a few things, including confirming
>>>>>>>          that the code and dependencies match the ASF policy, automating
>>>>>>>          testing in Beam's tooling, etc. At that point, we'd take a
>>>>>>>          community vote to accept the component into master, and consider
>>>>>>>          the author(s) for committership in the overall project.
>>>>>>>
>>>>>>>          Welcome to the ASF and Beam; we are thrilled to have you! Hope
>>>>>>>          this helps, and please reach out if anybody on our end can help,
>>>>>>>          including JB or myself.
>>>>>>>
>>>>>>>          Davor
>>>>>>>
>>>>>>>
>>>>>>>          On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>>>>>>          <[email protected]> wrote:
>>>>>>>
>>>>>>>              Hi David,
>>>>>>>
>>>>>>>              Generally speaking, having different fluent DSLs on top of the
>>>>>>>              Beam SDK is great.
>>>>>>>
>>>>>>>              I would like to take a look at your wordcount examples to give
>>>>>>>              you complete feedback. I like the idea, and a fluent Java DSL is
>>>>>>>              valuable.
>>>>>>>
>>>>>>>              Let's wait for feedback from others. If we have a consensus,
>>>>>>>              then I would be more than happy to help you with the donation
>>>>>>>              (I worked on the Camel Java DSL a while ago, so I have some
>>>>>>>              experience here).
>>>>>>>
>>>>>>>              Thanks!
>>>>>>>              Regards
>>>>>>>              JB
>>>>>>>
>>>>>>>              On 12/17/2017 07:00 PM, David Morávek wrote:
>>>>>>>
>>>>>>>                  Hello,
>>>>>>>
>>>>>>>
>>>>>>>                  First of all, thanks for the amazing work the Apache Beam
>>>>>>>                  community is doing!
>>>>>>>
>>>>>>>
>>>>>>>                  In 2014, we started developing a runtime-independent Java 8
>>>>>>>                  API that helps us create unified big-data processing flows.
>>>>>>>                  It has been used as a core building block of the Seznam.cz
>>>>>>>                  web crawler data infrastructure ever since. Its design
>>>>>>>                  principles and execution model are very similar to Apache
>>>>>>>                  Beam.
>>>>>>>
>>>>>>>
>>>>>>>                  This API was open sourced in 2016 under the name Euphoria
>>>>>>>                  API:
>>>>>>>
>>>>>>>                  https://github.com/seznam/euphoria
>>>>>>>
>>>>>>>
>>>>>>>                  As it is very similar to Apache Beam, we feel that it is not
>>>>>>>                  worth duplicating effort in terms of developing new runtimes
>>>>>>>                  and fine-tuning the current ones.
>>>>>>>
>>>>>>>
>>>>>>>                  The main blocker for us to switch to Apache Beam is the lack
>>>>>>>                  of a Java 8 API. We propose integrating the Euphoria API into
>>>>>>>                  Apache Beam as a Java 8 DSL, in order to share our effort
>>>>>>>                  with the community.
>>>>>>>
>>>>>>>
>>>>>>>                  A simple example of Euphoria API usage can be found here:
>>>>>>>
>>>>>>>                  https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>>>>>>
>>>>>>>
>>>>>>>                  If you feel that the Beam community could benefit from our
>>>>>>>                  work, we would love to start working on integrating Euphoria
>>>>>>>                  into Apache Beam (we already have a working POC with a few
>>>>>>>                  basic operators implemented).
>>>>>>>
>>>>>>>
>>>>>>>                  I look forward to hearing from you,
>>>>>>>
>>>>>>>                  David
>>>>>>>
>>>>>>>
>>>>>>>              --
>>>>>>>              Jean-Baptiste Onofré
>>>>>>>              [email protected]
>>>>>>>              http://blog.nanthrax.net
>>>>>>>              Talend - http://www.talend.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards
>>>>>>>
>>>>>>> David Morávek
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



