Hi Etienne,

That's great news, and good job!

By "having Nexmark on Beam", I guess you mean the translation of the NEXMark queries into Beam, not NEXMark itself, right?

If you mean the latter, I'm not sure, as NEXMark is not Beam-related (it's more generic) and it could be tricky on the legal side (license, SGA, ...).

Regards
JB

On 03/21/2017 05:38 PM, Etienne Chauchot wrote:
Hi all,

Ismael and I are working on upgrading the Nexmark implementation for Beam. See
https://github.com/iemejia/beam/tree/BEAM-160-nexmark and
https://issues.apache.org/jira/browse/BEAM-160. We are continuing the work done
by Mark Shields. See https://github.com/apache/beam/pull/366 for the original 
PR.

The PR contains queries that cover a wide part of the Beam model and that
represent realistic end-user use cases (some come from client experience on
Google Cloud Dataflow).

So far, we have upgraded the implementation to the latest Beam snapshot, and we
are able to execute a good subset of the queries on the different runners. To do
so, we upgraded the Nexmark drivers: the direct driver (upgraded from
inProcessDriver) and the Flink driver, and we added a new one for Spark.

There is still a good amount of work to do, and we would like to know whether
you think this contribution could eventually have a place in Beam.

The benefits of having Nexmark on Beam that we have seen so far are:

- Rich batch/streaming test

- A-B testing of runners or runtimes (non-regression, performance comparison
between versions ...)

- Integration testing (sdk/runners, runner/runtime, ...)

- Validation of the Beam capability matrix

- It can be used as part of the ongoing PerfKit work (if there is any interest).

As a final note, we are tracking the issues in the same repo. If anyone is
interested in contributing or has more ideas, you are welcome :)

Etienne


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
