+1, I'm supportive of seeing this move forward. What remaining concrete
concerns are there?

-Tyler


On Tue, Jan 2, 2018 at 8:35 AM David Morávek <david.mora...@gmail.com>
wrote:

> Hello JB,
>
> can we help in any way to move things forward?
>
> Thanks,
> D.
>
> On Mon, Dec 18, 2017 at 4:28 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Thanks Jan,
>>
>> It makes sense.
>>
>> Let me take a look at the code to understand the "interaction".
>>
>> Regards
>> JB
>>
>>
>> On 12/18/2017 04:26 PM, Jan Lukavský wrote:
>>
>>> Hi JB,
>>>
>>> basically you are not wrong. The project started about three or four
>>> years ago with the goal of unifying batch and stream processing in a
>>> single portable, executor-independent API. Because of that, it is currently
>>> "close" to Beam in this sense. But we don't see much added value in keeping
>>> it as a separate project when one of the key differences is the API
>>> (not the model itself), so we would like to focus on translation from the
>>> Euphoria API to Beam's SDK. That's why we would like to see it as a DSL, so
>>> that it would be possible to use the Euphoria API with Beam's runners as
>>> natively as possible.
>>>
>>> I hope I didn't make the subject even less clear. If so, I'll be happy
>>> to explain anything in more detail. :-)
>>>
>>>    Jan
>>>
>>>
>>> On 12/18/2017 04:08 PM, Jean-Baptiste Onofré wrote:
>>>
>>>> Hi Jan,
>>>>
>>>> Thanks for your answers.
>>>>
>>>> However, they confused me ;)
>>>>
>>>> Regarding what you replied, Euphoria seems like a programming model/SDK
>>>> "close" to Beam more than a DSL on top of an existing Beam SDK.
>>>>
>>>> Am I wrong ?
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 12/18/2017 03:44 PM, Jan Lukavský wrote:
>>>>
>>>>> Hi Ismael,
>>>>>
>>>>> basically we adopted Beam's design regarding partitioning (
>>>>> https://github.com/seznam/euphoria/issues/160) and implemented the
>>>>> sorting manually (https://github.com/seznam/euphoria/issues/158). I'm
>>>>> not aware of any time model differences (Euphoria supports ingestion and
>>>>> event time; we don't support processing time, by decision). Regarding other
>>>>> differences, looking at Beam's capability matrix, I'd say that:
>>>>>
>>>>>   - we don't support stateful FlatMap (i.e. ParDo) for now (
>>>>> https://github.com/seznam/euphoria/issues/192)
>>>>>
>>>>>   - we don't support side inputs (by decision now, but might be
>>>>> reconsidered) and outputs (
>>>>> https://github.com/seznam/euphoria/issues/124)
>>>>>
>>>>>   - we support complete event-time windows (non-merging, merging,
>>>>> aligned, unaligned) and time control
>>>>>
>>>>>   - we don't support processing time by decision (might be
>>>>> reconsidered if a valid use-case is found)
>>>>>
>>>>>   - we support window triggering based on both time and data,
>>>>> including discarding and accumulating (without accumulating & retracting)
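To make the difference between discarding and accumulating panes in the last bullet more concrete, here is a minimal sketch in plain Java. It uses neither the Euphoria nor the Beam API; the class, the enum and the "fire every N elements" trigger are hypothetical names chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Euphoria or Beam code.
final class PaneModeSketch {

  enum Mode { DISCARDING, ACCUMULATING }

  // Sums the elements of a single window. A data-driven trigger fires after
  // every `fireEvery` elements, and each firing emits one pane.
  static List<Integer> panes(List<Integer> window, int fireEvery, Mode mode) {
    List<Integer> emitted = new ArrayList<>();
    int state = 0;
    int sinceLastFire = 0;
    for (int value : window) {
      state += value;
      sinceLastFire++;
      if (sinceLastFire == fireEvery) {
        emitted.add(state);
        sinceLastFire = 0;
        if (mode == Mode.DISCARDING) {
          state = 0; // forget what has already been emitted
        }
      }
    }
    return emitted;
  }

  public static void main(String[] args) {
    List<Integer> window = Arrays.asList(1, 2, 3, 4);
    System.out.println(panes(window, 2, Mode.DISCARDING));   // [3, 7]
    System.out.println(panes(window, 2, Mode.ACCUMULATING)); // [3, 10]
  }
}
```

In discarding mode each pane holds only what arrived since the previous firing, while in accumulating mode each pane repeats everything seen so far. The sketch does not emit a final pane for leftover elements and does not model retractions.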
>>>>>
>>>>> All our executors (runners) - Flink, Spark and Local - implement the
>>>>> complete model, which we enforce using an "operator test kit" that all
>>>>> executors must pass. The Spark executor supports bounded sources only (for
>>>>> now). As David said, we currently don't have a serialization abstraction,
>>>>> so there is some work to be done in that regard.
>>>>>
>>>>> Our intention is to completely supersede Euphoria. We would like to
>>>>> consider the possibility of using executors that do not rely on Beam,
>>>>> but that is optional for now and should be straightforward.
>>>>>
>>>>> We'd be happy to answer any more questions you might have and thanks a
>>>>> lot!
>>>>>
>>>>> Best,
>>>>>
>>>>>   Jan
>>>>>
>>>>>
>>>>> On 12/18/2017 03:19 PM, Ismaël Mejía wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It is great to see that you guys have reached the maturity needed to
>>>>>> propose this. Congratulations on your work and on the idea of
>>>>>> contributing it to Beam.
>>>>>>
>>>>>> I remember a previous discussion with Jan about the model
>>>>>> mismatch between Euphoria and Beam, caused by some design decisions
>>>>>> of both projects. I remember you guys had some issues with the way
>>>>>> Beam's sources do partitioning, as well as Beam's lack of sorted data
>>>>>> (on shuffle, à la Hadoop). Also, if I remember well, the 'time' model of
>>>>>> Euphoria was simpler than Beam's. I bring all of this up because I
>>>>>> am curious about what parts of the Euphoria model you guys had to
>>>>>> sacrifice to support Beam, and what parts of Beam's model should still
>>>>>> be integrated into Euphoria (and if there is a straightforward path to
>>>>>> do it).
>>>>>>
>>>>>> If I understand correctly, if this gets merged into Apache, Euphoria's
>>>>>> current implementation would be superseded by this DSL? I am curious
>>>>>> because I would like to understand your level of investment in
>>>>>> supporting the future of this DSL.
>>>>>>
>>>>>> Thanks and congrats again !
>>>>>> Ismaël
>>>>>>
>>>>>> On Mon, Dec 18, 2017 at 10:12 AM, Jean-Baptiste Onofré <
>>>>>> j...@nanthrax.net> wrote:
>>>>>>
>>>>>>> Depending on the donation, you would need an ICLA for each contributor,
>>>>>>> and a CCLA in addition to the SGA.
>>>>>>>
>>>>>>> You can sync with Davor and me on the legal stuff.
>>>>>>> However, I would wait a little bit, just to get feedback from the
>>>>>>> whole team, and then start a formal vote.
>>>>>>>
>>>>>>> I would be happy to start the formal vote.
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On 12/18/2017 10:03 AM, David Morávek wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Thanks for the awesome feedback!
>>>>>>>>
>>>>>>>> Romain:
>>>>>>>>
>>>>>>>> We already use the Java Stream API in all operators where it makes
>>>>>>>> sense (e.g. ReduceByKey). We are still not sure if it was a good
>>>>>>>> choice, but it can easily be converted to an iterator anyway.
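As a rough illustration of why the Stream-versus-iterator choice is easy to revisit, here is a small sketch using only the JDK. The reducer shape (a function from a Stream of one key's values to a result) is an assumption made for this example, not the actual ReduceByKey signature.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical illustration only; not the Euphoria operator API.
final class StreamReducerSketch {

  // Assumed reducer shape: all values of one key arrive as a Stream.
  static final Function<Stream<Integer>, Integer> SUM =
      values -> values.mapToInt(Integer::intValue).sum();

  // Stream -> Iterator is a single call on the stream itself.
  static <V> Iterator<V> asIterator(Stream<V> values) {
    return values.iterator();
  }

  // Iterator -> Stream goes through a spliterator of unknown size.
  static <V> Stream<V> asStream(Iterator<V> values) {
    return StreamSupport.stream(
        Spliterators.spliteratorUnknownSize(values, Spliterator.ORDERED),
        false); // sequential, not parallel
  }

  public static void main(String[] args) {
    Iterator<Integer> fromRunner = Arrays.asList(1, 2, 3).iterator();
    System.out.println(SUM.apply(asStream(fromRunner))); // 6
  }
}
```

Both conversions are lazy, so a runner that hands over an iterator can still feed a Stream-based reducer without materializing the values first.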
>>>>>>>>
>>>>>>>> Side output support is coming soon; we have already done some initial
>>>>>>>> work on this.
>>>>>>>>
>>>>>>>> Side inputs are not supported in the way you are used to from Beam,
>>>>>>>> because they can be replaced by the Join operator on the same key (if
>>>>>>>> annotated with broadcastHashJoin, it will be turned into a map-side
>>>>>>>> join).
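For readers less familiar with map-side joins, here is a minimal sketch of the general technique in plain Java: the small input is held in a hash map and probed once per record of the large input, so the large side never needs a shuffle. The method name and signature are hypothetical; this is not how Euphoria's Join operator is actually declared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical illustration only; not the Euphoria Join operator.
final class BroadcastJoinSketch {

  // Inner join of a large input with a small keyed input. The small side is
  // assumed to fit in memory on every worker (this is the "broadcast").
  static <L, K, R, O> List<O> broadcastHashJoin(
      List<L> large,
      Function<L, K> largeKey,
      Map<K, R> smallSide,
      BiFunction<L, R, O> joiner) {
    Map<K, R> broadcast = new HashMap<>(smallSide);
    List<O> out = new ArrayList<>();
    for (L left : large) {
      R right = broadcast.get(largeKey.apply(left));
      if (right != null) { // inner join: unmatched records are dropped
        out.add(joiner.apply(left, right));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> countries = new HashMap<>();
    countries.put("cz", "Czechia");
    countries.put("fr", "France");
    List<String> cities = Arrays.asList("cz:praha", "fr:paris", "xx:none");
    System.out.println(broadcastHashJoin(
        cities,
        city -> city.substring(0, 2),
        countries,
        (city, country) -> city.substring(3) + "@" + country));
    // [praha@Czechia, paris@France]
  }
}
```

The small map plays the role a side input would play in Beam: every record of the main input can look it up locally.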
>>>>>>>>
>>>>>>>> The only significant difference from Beam is that we decided not to
>>>>>>>> abstract serialization, so we need to add support for type hints
>>>>>>>> because of type erasure.
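A short sketch of the erasure problem and of one common way to work around it (a "super type token") may help here. The TypeHint class below is hypothetical and written only for this example; it is not the type hint mechanism planned for Euphoria.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Euphoria or Beam code.
final class TypeHintSketch {

  // An anonymous subclass of TypeHint records its type argument in class
  // metadata, which erasure does not remove.
  abstract static class TypeHint<T> {
    private final Type type;

    protected TypeHint() {
      ParameterizedType superType =
          (ParameterizedType) getClass().getGenericSuperclass();
      this.type = superType.getActualTypeArguments()[0];
    }

    Type getType() {
      return type;
    }
  }

  // After erasure, both lists share one runtime class, so the element type
  // cannot be recovered from an instance.
  static boolean sameRuntimeClass() {
    List<String> strings = new ArrayList<>();
    List<Integer> numbers = new ArrayList<>();
    return strings.getClass() == numbers.getClass();
  }

  // The hint still carries the full generic type List<String>.
  static Type hintedListOfString() {
    return new TypeHint<List<String>>() {}.getType();
  }

  public static void main(String[] args) {
    System.out.println(sameRuntimeClass());
    System.out.println(hintedListOfString().getTypeName());
  }
}
```

With a hint like this attached to an operator's output, a runner could pick a serializer for the element type even though the lambda that produced the elements carries no usable generic information at runtime.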
>>>>>>>>
>>>>>>>> Fluent API:
>>>>>>>>
>>>>>>>> The API is fluent within one operator. It is designed to "lead the
>>>>>>>> programmer", which means that they will only be offered methods that
>>>>>>>> make sense after the last method they used (e.g. in ReduceByKey, we
>>>>>>>> know that after keyBy one of the reduceBy methods should come). It is
>>>>>>>> implemented as a series of builders.
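The "series of builders" idea can be sketched roughly as follows, assuming one small class per step so that the compiler only offers the methods that are valid next. The names mirror the keyBy/reduceBy wording above, but the classes and signatures are hypothetical and are not copied from Euphoria.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical illustration only; not the Euphoria ReduceByKey builder.
final class FluentSketch {

  // Step 1: the only entry point is of(...).
  static <T> KeyByStep<T> of(List<T> input) {
    return new KeyByStep<>(input);
  }

  // Step 2: after of(...), only keyBy(...) is available.
  static final class KeyByStep<T> {
    private final List<T> input;

    private KeyByStep(List<T> input) {
      this.input = input;
    }

    <K> ReduceByStep<T, K> keyBy(Function<T, K> keyFn) {
      return new ReduceByStep<>(input, keyFn);
    }
  }

  // Step 3: after keyBy(...), only reduceBy(...) is available.
  static final class ReduceByStep<T, K> {
    private final List<T> input;
    private final Function<T, K> keyFn;

    private ReduceByStep(List<T> input, Function<T, K> keyFn) {
      this.input = input;
      this.keyFn = keyFn;
    }

    // Terminal step: fold all elements sharing a key with the reducer.
    Map<K, T> reduceBy(BinaryOperator<T> reducer) {
      Map<K, T> out = new LinkedHashMap<>();
      for (T element : input) {
        out.merge(keyFn.apply(element), element, reducer);
      }
      return out;
    }
  }

  public static void main(String[] args) {
    Map<Integer, String> byLength = FluentSketch
        .of(Arrays.asList("a", "bb", "cc", "d"))
        .keyBy(String::length)
        .reduceBy((left, right) -> left + right);
    System.out.println(byLength); // {1=ad, 2=bbcc}
  }
}
```

Calling reduceBy directly on the result of of(...) does not compile, because KeyByStep has no such method. That compile-time guidance is the design choice the paragraph above describes.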
>>>>>>>>
>>>>>>>> Davor:
>>>>>>>>
>>>>>>>> Thanks, I'll contact you and start the process of getting all the
>>>>>>>> necessary paperwork signed on our side, so we can get things moving.
>>>>>>>>
>>>>>>>> On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
>>>>>>>> <rmannibu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>      Hi guys
>>>>>>>>
>>>>>>>>      A DSL would be very welcome, in particular if fluent.
>>>>>>>>
>>>>>>>>      Open question: did you consider implementing the Stream API
>>>>>>>>      (surely extending it to have a BeamStream and a few more features
>>>>>>>>      like sides etc.)? It would be very natural, easy to integrate
>>>>>>>>      anywhere, and would avoid having to discover a new API.
>>>>>>>>
>>>>>>>>      Hazelcast Jet did it, so I don't see why Beam couldn't.
>>>>>>>>
>>>>>>>>      On 18 Dec 2017 at 07:26, "Davor Bonaci" <da...@apache.org> wrote:
>>>>>>>>
>>>>>>>>          Hi David,
>>>>>>>>          As JB noted, merging these two projects is a great idea. In
>>>>>>>>          fact, some of us have had those discussions in the past.
>>>>>>>>
>>>>>>>>          Legally, nothing in particular is strictly necessary, as the
>>>>>>>>          code seems to already be Apache 2.0 licensed. We don't,
>>>>>>>>          however, want to be perceived as making hostile forks, so it
>>>>>>>>          would be great to file a Software Grant Agreement with the ASF
>>>>>>>>          Secretary. I can help with the process, as necessary.
>>>>>>>>
>>>>>>>>          Project alignment-wise, there aren't any particular blockers
>>>>>>>>          that I am aware of. We welcome DSLs.
>>>>>>>>
>>>>>>>>          Technically, the code would start in a feature branch. During
>>>>>>>>          this stage, we'd need to validate a few things, including
>>>>>>>>          confirming that the code and dependencies match ASF policy,
>>>>>>>>          automating testing in Beam's tooling, etc. At that point, we'd
>>>>>>>>          take a community vote to accept the component into master, and
>>>>>>>>          consider the author(s) for committership in the overall
>>>>>>>>          project.
>>>>>>>>
>>>>>>>>          Welcome to the ASF and Beam -- we are thrilled to have you!
>>>>>>>>          Hope this helps, and please reach out if anybody on our end
>>>>>>>>          can help, including JB or myself.
>>>>>>>>
>>>>>>>>          Davor
>>>>>>>>
>>>>>>>>
>>>>>>>>          On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
>>>>>>>>          <j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>              Hi David,
>>>>>>>>
>>>>>>>>              Generally speaking, having different fluent DSLs on top
>>>>>>>>              of the Beam SDK is great.
>>>>>>>>
>>>>>>>>              I would like to take a look at your wordcount examples to
>>>>>>>>              give you complete feedback. I like the idea, and a fluent
>>>>>>>>              Java DSL is valuable.
>>>>>>>>
>>>>>>>>              Let's wait for feedback from others. If we have a
>>>>>>>>              consensus, then I would be more than happy to help you
>>>>>>>>              with the donation (I worked on the Camel Java DSL a while
>>>>>>>>              ago, so I have some experience here).
>>>>>>>>
>>>>>>>>              Thanks !
>>>>>>>>              Regards
>>>>>>>>              JB
>>>>>>>>
>>>>>>>>              On 12/17/2017 07:00 PM, David Morávek wrote:
>>>>>>>>
>>>>>>>>                  Hello,
>>>>>>>>
>>>>>>>>
>>>>>>>>                  First of all, thanks for the amazing work the Apache
>>>>>>>>                  Beam community is doing!
>>>>>>>>
>>>>>>>>
>>>>>>>>                  In 2014, we started development of a
>>>>>>>>                  runtime-independent Java 8 API that helps us create
>>>>>>>>                  unified big-data processing flows. It has been used as
>>>>>>>>                  a core building block of the Seznam.cz web crawler
>>>>>>>>                  data infrastructure ever since. Its design principles
>>>>>>>>                  and execution model are very similar to Apache Beam's.
>>>>>>>>
>>>>>>>>
>>>>>>>>                  This API was open sourced in 2016, under the name
>>>>>>>>                  Euphoria API:
>>>>>>>>
>>>>>>>>                  https://github.com/seznam/euphoria
>>>>>>>>
>>>>>>>>
>>>>>>>>                  As it is very similar to Apache Beam, we feel that it
>>>>>>>>                  is not worth duplicating effort on the development of
>>>>>>>>                  new runtimes and the fine-tuning of current ones.
>>>>>>>>
>>>>>>>>
>>>>>>>>                  The main blocker for us to switch to Apache Beam is
>>>>>>>>                  the lack of a Java 8 API. We propose integrating the
>>>>>>>>                  Euphoria API into Apache Beam as a Java 8 DSL, in
>>>>>>>>                  order to share our effort with the community.
>>>>>>>>
>>>>>>>>
>>>>>>>>                  A simple example of Euphoria API usage can be found
>>>>>>>>                  here:
>>>>>>>>
>>>>>>>> https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
>>>>>>>>
>>>>>>>>                  If you feel that the Beam community could benefit from
>>>>>>>>                  our work, we would love to start working on Euphoria
>>>>>>>>                  integration into Apache Beam (we already have a
>>>>>>>>                  working POC, with a few basic operators implemented).
>>>>>>>>
>>>>>>>>
>>>>>>>>                  I look forward to hearing from you,
>>>>>>>>
>>>>>>>>                  David
>>>>>>>>
>>>>>>>>
>>>>>>>>              --
>>>>>>>>              Jean-Baptiste Onofré
>>>>>>>>              jbono...@apache.org
>>>>>>>>              http://blog.nanthrax.net
>>>>>>>>              Talend - http://www.talend.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards
>>>>>>>>
>>>>>>>> David Morávek
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jean-Baptiste Onofré
>>>>>>> jbono...@apache.org
>>>>>>> http://blog.nanthrax.net
>>>>>>> Talend - http://www.talend.com
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>
>
> --
> Best regards
>
> David Morávek
>
