Flink on Tez

Kostas Tzoumas Fri, 07 Nov 2014 10:03:57 -0800

Hello Flink and Tez,

I would like to point you to a first version of Flink running on
Tez. This is a Flink subproject (to be initially contributed
to flink-addons) that allows you to run unmodified Flink programs on
top of Apache Tez.


You can get the code here:
https://github.com/ktzoumas/incubator-flink/tree/tez_support

If you want to give it a spin, some basic instructions are here:
https://github.com/ktzoumas/incubator-flink/tree/tez_support/flink-addons/flink-tez


Be warned that this is still a work in progress, so you may encounter
bugs, and it has not yet been optimized for performance.

A few words on how it works and the motivation:

Programs pass through the Flink compiler as usual and use the
Flink runtime operators (map, reduce, join, etc., including the Flink
facilities for sorting and hashing). Instead of generating a Flink
distributed program (called "JobGraph" in Flink), we can now also
generate a Tez program (called "DAG" in Tez).
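
To make the idea concrete, here is a toy sketch in Java of one compiled
plan being handed to either of two interchangeable backends. It is only
an illustration under my own assumptions: BackendSketch, OptimizedPlan,
Backend, JobGraphBackend and TezDagBackend are hypothetical names made up
for this example, not classes from Flink or Tez, and the "translation"
here just produces a descriptive string.

```java
import java.util.Arrays;
import java.util.List;

// Toy illustration only: none of these types are real Flink or Tez classes.
public class BackendSketch {

    // Hypothetical stand-in for the plan produced by the compiler.
    static final class OptimizedPlan {
        final List<String> operators;

        OptimizedPlan(List<String> operators) {
            this.operators = operators;
        }
    }

    // Hypothetical backend interface: turns a plan into an
    // engine-specific program description.
    interface Backend {
        String translate(OptimizedPlan plan);
    }

    // Stand-in for the path that generates a Flink "JobGraph".
    static final class JobGraphBackend implements Backend {
        @Override
        public String translate(OptimizedPlan plan) {
            return "JobGraph[" + String.join(" -> ", plan.operators) + "]";
        }
    }

    // Stand-in for the path that generates a Tez "DAG".
    static final class TezDagBackend implements Backend {
        @Override
        public String translate(OptimizedPlan plan) {
            return "DAG[" + String.join(" -> ", plan.operators) + "]";
        }
    }

    // The program and its operators stay the same; only the backend differs.
    static String compile(List<String> operators, Backend backend) {
        OptimizedPlan plan = new OptimizedPlan(operators);
        return backend.translate(plan);
    }

    public static void main(String[] args) {
        List<String> program = Arrays.asList("map", "reduce", "join");
        System.out.println(compile(program, new JobGraphBackend()));
        System.out.println(compile(program, new TezDagBackend()));
    }
}
```

The point of the sketch is that the user program (the operator list) is
untouched when the backend changes, which mirrors running unmodified
Flink programs on Tez.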

I have been asked why we would want to do that, given that Flink has
its own execution engine. There are two reasons, in my opinion.

First, Tez follows design choices that are geared towards resource
elasticity, whereas the design choices behind Flink's engine are
geared more towards low-latency querying and iterative
processing. Therefore, the two engines can really complement each
other. Users can run their Flink programs on the engine that better
fits their use case and setup.

Second, in Flink we have put a lot of effort into separating program
assembly from program execution and architecting the system in layers
(APIs, common API, compiler, data processing runtime, distributed
execution engine). The possibility to swap execution engines is a good
showcase of the benefits of such a layered architecture.

Of course, trying it out and reporting bugs or contributing is very
welcome!

Best,
Kostas
