Re: Flink Streaming Hangout

Márton Balassi Fri, 12 Dec 2014 12:39:16 -0800

The hangout was not recorded, so I'm providing a short write-up on the
issues and decisions. The discussion was 2 hours long, so please feel free
to add the important statements I have missed.

The initial ideas are listed on the project wiki as the Flink Streaming
roadmap [1]. The hangout yielded the following additions:

   * Fault tolerance: We have a (mostly) working prototype, not yet merged,
for at-least-once semantics that works similarly to Storm. A missing
feature on the streaming side is vertex restarts in the ExecutionGraph,
which will be made easier by Ufuk's intermediate results pull request [2],
to be merged after the 0.8 release. As for exactly-once semantics, the
preferred option was upstream backup, which is conceptually the same as
backtracking until an intermediate result is found, given that
intermediate results are stored at every vertex.

   * A common pipeline architecture for batch and streaming: The original
idea was to have just one ExecutionEnvironment which can convert DataSets
to DataStreams and vice versa. Gyula hacked together a small prototype
where a DataSet was fed into a DataStream, but seamless integration would
have required a large refactoring. Stephan stepped in with the idea that
most likely only the DataSet to DataStream direction should be supported,
and that initially we should do it by materializing the batch result in
some in-memory abstraction or even in files. This would result in building
separate batch and streaming JobGraphs, and thus addressing optimization,
fault tolerance etc. separately. Gyula mentioned Chiwan Park's pending PR
[3] on using HDFS updates as a streaming source as a possible solution for
feeding the results of recurring batch jobs into streaming.

   * API integrations: We've just added Java 8 support to the streaming API
and started working on the Scala API as well, which seems to be low-hanging
fruit, standing on Aljoscha's shoulders. A next step would be to also add
the Python API and, building on that, to provide a notebook-like "IDE",
e.g. with Zeppelin [4]. This is also (in fact mainly) interesting for batch
processing. For further integrations a Scala shell would be really useful.
According to Stephan the latter should not be too challenging; mostly API
and some Scheduler work is required.

   * Multiparadigm (batch & streaming) ML: Opening up toward machine
learning, Paris and Vasia took up the SAMOA [5] integration issue, which
would provide streaming machine learning support and also comparability
with Storm, S4 and Samza. Kostas mentioned that the Mahout port to Flink is
also an ongoing effort.
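
To make the upstream backup idea from the fault tolerance item a bit more
concrete, here is a minimal conceptual sketch in Python. It is not Flink
code and does not reflect the prototype: the UpstreamBuffer class, its
method names and the sequence-number acknowledgement scheme are all
assumptions made up for illustration. The upstream vertex keeps every
emitted record until the downstream vertex acknowledges it, and replays
the unacknowledged ones after a downstream restart.

```python
from collections import OrderedDict


class UpstreamBuffer:
    """Hypothetical upstream vertex: keeps records until they are acknowledged."""

    def __init__(self):
        self._pending = OrderedDict()  # sequence number -> record
        self._next_seq = 0

    def emit(self, record):
        # Remember the record before handing it downstream.
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = record
        return seq, record

    def ack(self, seq):
        # Downstream confirms everything up to and including seq; drop it.
        for s in [s for s in self._pending if s <= seq]:
            del self._pending[s]

    def replay(self):
        # After a downstream restart: everything not yet acknowledged, in order.
        return list(self._pending.items())


def run_demo():
    buf = UpstreamBuffer()
    processed = []

    for value in ["a", "b", "c", "d", "e"]:
        buf.emit(value)

    # Downstream handles the first three records and acknowledges them.
    for seq, rec in buf.replay()[:3]:
        processed.append(rec)
    buf.ack(2)

    # Downstream fails here. After the restart, upstream replays the rest.
    for seq, rec in buf.replay():
        processed.append(rec)
        buf.ack(seq)

    return processed, buf.replay()
```

In this toy run no record is lost or duplicated because the failure happens
right after an acknowledgement. If the downstream vertex failed after
processing a record but before acknowledging it, the replay would deliver
that record again, so exactly-once results would additionally need
deduplication or restoring downstream state on top of the replay.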

Further topics included the state of the 0.8 release, for which the first
release candidate should come next week; the streaming windowing rework
led by Jonas [6]; and a conceptual comparison of Spark and Flink initiated
by Henry.

Special thanks to Mayur for tuning in despite it being around midnight in
India and for providing valuable insight on Tachyon.

[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming
[2] https://github.com/apache/incubator-flink/pull/254
[3] https://github.com/apache/incubator-flink/pull/226
[4] https://github.com/NFLabs/zeppelin/blob/master/README.md
[5] http://samoa-project.net/
[6] http://mail-archives.apache.org/mod_mbox/incubator-flink-dev/201412.mbox/%3CCANBGL8uzpthoapQRZPK1v8seFcTM%3DCFA2-MRECkfiNg4LXmbLA%40mail.gmail.com%3E

Cheers,

Marton

On Fri, Dec 12, 2014 at 8:25 PM, Henry Saputra <[email protected]>
wrote:
>
> Related to Zeppelin [1], looks like it is sponsored/ developed by a
> company in Korea [2] that has nothing to do with football
> unfortunately (I thought they were the same team that does
> http://www.nfl.com/stats/statslab),
> I was kinda disappointed at the beginning =P
>
> But anyway seemed like integration with Flink would be considered as
> potential next one  =)
>
> Just want to make sure I clear up my comments in the hangout.
>
> - Henry
>
> [1] https://github.com/NFLabs/zeppelin/blob/master/README.md
> [2] http://www.nflabs.com
>
> On Fri, Dec 12, 2014 at 3:22 AM, Gyula Fóra <[email protected]> wrote:
> > Hey All,
> >
> > I have created a google hangout for today's Streaming discussion:
> > https://plus.google.com/hangouts/_/gws3f77u5fee5euehtw7zwob2qa
> >
> > As we have agreed we will start today at 17:30 CET / 08:30 PST and I think
> > we'll be around for those who can only join later as well.
> >
> > Please find the topics here
> > <https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming>.
> >
> > Gyula
>
