Hello,
Thanks for the awesome feedback!
Romain:
We already use the Java Stream API in all operators where it makes
sense (e.g. ReduceByKey). We are still not sure it was a good choice,
but it can be easily converted to an iterator anyway.
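To illustrate the Stream/iterator point, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not the Euphoria API; it only shows that a reducer written against Stream can be adapted to an iterator-based caller and back.

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical illustration only; none of these names come from Euphoria.
final class StreamReducerSketch {

  private StreamReducerSketch() {}

  /** A reducer written against Stream: sums all values seen for one key. */
  static final Function<Stream<Long>, Long> SUM_STREAM =
      values -> values.mapToLong(Long::longValue).sum();

  /** Adapts a Stream-based reducer so an iterator-based caller can use it. */
  static <V, R> Function<Iterator<V>, R> asIteratorReducer(
      Function<Stream<V>, R> reducer) {
    return it -> reducer.apply(
        StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED),
            false)); // false = sequential stream
  }

  /** The other direction is a single call: Stream#iterator(). */
  static long sumWithIterator(Stream<Long> values) {
    long sum = 0L;
    for (Iterator<Long> it = values.iterator(); it.hasNext(); ) {
      sum += it.next();
    }
    return sum;
  }
}
```

Either form consumes the values of a key lazily, so switching between them should not change how the operator is executed.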
Side outputs support is coming soon; we have already done some initial
work on this.
Side inputs are not supported in the way you are used to from Beam,
because they can be replaced by a Join operator on the same key (if
annotated with broadcastHashJoin, it will be turned into a map-side
join).
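As a rough illustration of what a map-side (broadcast hash) join does, here is a sketch in plain Java. It is not the Euphoria Join operator; the names are hypothetical, and it assumes the small side fits in memory and holds at most one value per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical illustration only; not the Euphoria Join operator.
final class BroadcastJoinSketch {

  private BroadcastJoinSketch() {}

  /**
   * Inner join where the small side is held as an in-memory hash map
   * (the "broadcast" side) and the large side is streamed past it.
   * No shuffle of the large side by key is needed.
   */
  static <K, L, R, O> List<O> mapSideJoin(
      Iterable<Map.Entry<K, L>> largeSide,
      Map<K, R> broadcastSide,
      BiFunction<L, R, O> joiner) {
    List<O> out = new ArrayList<>();
    for (Map.Entry<K, L> element : largeSide) {
      // Look up the matching small-side value by key.
      R right = broadcastSide.get(element.getKey());
      if (right != null) {
        out.add(joiner.apply(element.getValue(), right));
      }
      // Elements without a match are dropped (inner join semantics).
    }
    return out;
  }
}
```

This plays the same role as a side input: the small data set is available to every element of the main input through a key lookup.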
The only significant difference from Beam is that we decided not to
abstract serialization, so we need to add support for type hints
because of type erasure.
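For context on the type erasure problem, one common way to carry generic type information to runtime in Java is the "super type token" pattern, sketched below. This is only an assumption about what a type hint could look like; the class name is hypothetical and is not taken from Euphoria or Beam.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical illustration of the super type token pattern.
abstract class TypeHintSketch<T> {

  private final Type type;

  /**
   * Must be called through an anonymous subclass, for example
   * new TypeHintSketch<List<String>>() {}, so that the type argument
   * is recorded in the subclass's generic superclass signature.
   */
  protected TypeHintSketch() {
    Type superclass = getClass().getGenericSuperclass();
    if (!(superclass instanceof ParameterizedType)) {
      throw new IllegalStateException(
          "Create the hint as an anonymous subclass with a type argument");
    }
    this.type = ((ParameterizedType) superclass).getActualTypeArguments()[0];
  }

  /** Returns the full generic type, which survives erasure. */
  Type getType() {
    return type;
  }
}
```

A lambda passed to an operator loses its type arguments at runtime, so a hint like this would let the user state the output type explicitly where a serializer has to be chosen.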
Fluent API:
The API is fluent within one operator. It is designed to "lead the
programmer", which means that they will only be offered methods that
make sense after the last method they used (e.g. in ReduceByKey, we
know that after keyBy one of the reduceBy methods should come). It is
implemented as a series of builders.
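A minimal sketch of the "series of builders" idea, in plain Java. The stage class names, the in-memory List input and the output() method are assumptions made for the example; only the keyBy and reduceBy names come from the text above, and the real operator signatures may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Stream;

// Hypothetical illustration of a staged builder; not the Euphoria classes.
final class ReduceByKeySketch {

  private ReduceByKeySketch() {}

  /** Entry point: the only thing offered next is keyBy. */
  static <IN> KeyByStage<IN> of(List<IN> input) {
    return new KeyByStage<>(input);
  }

  static final class KeyByStage<IN> {
    private final List<IN> input;

    KeyByStage(List<IN> input) {
      this.input = input;
    }

    /** After keyBy, the only thing offered next is reduceBy. */
    <K> ReduceByStage<IN, K> keyBy(Function<IN, K> keyFn) {
      return new ReduceByStage<>(input, keyFn);
    }
  }

  static final class ReduceByStage<IN, K> {
    private final List<IN> input;
    private final Function<IN, K> keyFn;

    ReduceByStage(List<IN> input, Function<IN, K> keyFn) {
      this.input = input;
      this.keyFn = keyFn;
    }

    <OUT> OutputStage<IN, K, OUT> reduceBy(Function<Stream<IN>, OUT> reducer) {
      return new OutputStage<>(input, keyFn, reducer);
    }
  }

  static final class OutputStage<IN, K, OUT> {
    private final List<IN> input;
    private final Function<IN, K> keyFn;
    private final Function<Stream<IN>, OUT> reducer;

    OutputStage(List<IN> input, Function<IN, K> keyFn,
        Function<Stream<IN>, OUT> reducer) {
      this.input = input;
      this.keyFn = keyFn;
      this.reducer = reducer;
    }

    /** Groups the input by key and applies the reducer to each group. */
    Map<K, OUT> output() {
      Map<K, List<IN>> groups = new LinkedHashMap<>();
      for (IN element : input) {
        groups.computeIfAbsent(keyFn.apply(element), k -> new ArrayList<>())
            .add(element);
      }
      Map<K, OUT> result = new LinkedHashMap<>();
      groups.forEach((key, values) -> result.put(key, reducer.apply(values.stream())));
      return result;
    }
  }
}
```

With this shape, a call such as ReduceByKeySketch.of(words).keyBy(w -> w).reduceBy(s -> s.count()).output() compiles, while calling reduceBy before keyBy does not, because the first stage simply has no such method.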
Davor:
Thanks, I'll contact you, and will start the process of having all the
necessary paperwork signed on our side, so we can get things moving.
On Mon, Dec 18, 2017 at 7:46 AM, Romain Manni-Bucau
<rmannibu...@gmail.com> wrote:
Hi guys
A DSL would be very welcome, in particular if fluent.
Open question: did you look into implementing the Stream API (surely
extending it to have a BeamStream and a few more features like sides
etc.)? It would be very natural, easy to integrate anywhere, and would
avoid having to discover a new API.
Hazelcast Jet did it, so I don't see why Beam couldn't.
On Dec 18, 2017 at 07:26, "Davor Bonaci" <da...@apache.org> wrote:
Hi David,
As JB noted, merging these two projects is a great idea. In fact,
some of us have had those discussions in the past.
Legally, nothing in particular is strictly necessary, as the code
seems to already be Apache 2.0 licensed. We don't, however, want to be
perceived as making hostile forks, so it would be great to file a
Software Grant Agreement with the ASF Secretary. I can help with the
process, as necessary.
Project alignment-wise, there aren't any particular blockers that I am
aware of. We welcome DSLs.
Technically, the code would start in a feature branch. During this
stage, we'd need to validate a few things, including confirming that
the code and dependencies match the ASF policy, automating testing in
Beam's tooling, etc. At that point, we'd take a community vote to
accept the component into master, and consider the author(s) for
committership in the overall project.
Welcome to the ASF and Beam -- we are thrilled to have you! Hope this
helps, and please reach out if anybody on our end can help, including
JB or myself.
Davor
On Sun, Dec 17, 2017 at 10:13 AM, Jean-Baptiste Onofré
<j...@nanthrax.net> wrote:
Hi David,
Generally speaking, having different fluent DSLs on top of the Beam
SDK is great.
I would like to take a look at your wordcount examples to give you
complete feedback. I like the idea, and a fluent Java DSL is valuable.
Let's wait for feedback from others. If we have a consensus, then I
would be more than happy to help you with the donation (I worked on
the Camel Java DSL a while ago, so I have some experience here).
Thanks !
Regards
JB
On 12/17/2017 07:00 PM, David Morávek wrote:
Hello,
First of all, thanks for the amazing work the Apache Beam community is
doing!
In 2014, we started development of a runtime-independent Java 8 API
that helps us create unified big-data processing flows. It has been
used as a core building block of the Seznam.cz web crawler data
infrastructure ever since. Its design principles and execution model
are very similar to Apache Beam.
This API was open sourced in 2016 under the name Euphoria API:
https://github.com/seznam/euphoria
As it is very similar to Apache Beam, we feel that it is not worth
duplicating effort in terms of development of new runtimes and
fine-tuning of current ones.
The main blocker for us to switch to Apache Beam is the lack of a
Java 8 API. We propose the integration of the Euphoria API into Apache
Beam as a Java 8 DSL, in order to share our effort with the community.
A simple example of the Euphoria API usage can be found here:
https://github.com/seznam/euphoria/tree/master/euphoria-examples/src/main/java/cz/seznam/euphoria/examples/wordcount
If you feel that the Beam community could benefit from our work, we
would love to start working on the Euphoria integration into Apache
Beam (we already have a working POC, with a few basic operators
implemented).
I look forward to hearing from you,
David
-- Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
--
Best regards,
David Morávek