The hangout was not recorded, so I'm providing a short write-up of the issues and decisions. The discussion was 2 hours long, so please feel free to add any important points I have missed.
The initial ideas are listed on the project wiki as the Flink Streaming roadmap [1]. The hangout yielded the following additions:

* Fault tolerance: We have a (mostly) working prototype, not yet merged, for at-least-once semantics that works similarly to Storm. A missing feature on the streaming side is vertex restarts in the ExecutionGraph, which will be made easier by Ufuk's intermediate results pull request [2], to be merged after the 0.8 release. As for exactly-once semantics, the preferred option was upstream backup, which is conceptually the same as backtracking until an intermediate result is found, given that intermediate results are stored at every vertex.

* A common pipeline architecture for batch and streaming: The original idea was to have just one ExecutionEnvironment which can convert DataSets to DataStreams and vice versa. Gyula hacked together a small prototype where a DataSet was fed into a DataStream, but seamless integration would have needed a large refactor. Stephan stepped in with the idea that most likely only the DataSet to DataStream direction should be supported, and that initially we should implement it by materializing the batch result in some in-memory abstraction or even in files. This would result in building separate batch and streaming JobGraphs, and thus addressing optimization, fault tolerance etc. separately. Gyula mentioned Chiwan Park's pending PR on using HDFS updates as a streaming source as a possible solution for feeding the results of recurring batch jobs into streaming.

* API integrations: We've just added Java 8 support to the streaming API and started working on the Scala API as well, which seems to be low-hanging fruit thanks to Aljoscha's earlier work. A next step would be adding the Python API as well and, building on that, providing a notebook-like "IDE", e.g. with Zeppelin [4]. This is also (in fact mainly) interesting for batch processing. For further integrations a Scala shell should be really useful.
According to Stephan, the latter should not be too challenging; mostly API and some Scheduler work is required.

* Multiparadigm (batch & streaming) ML: Opening towards machine learning, Paris and Vasia took up the SAMOA [5] integration issue, which would provide streaming machine learning support and also comparability with Storm, S4 and Samza. Kostas mentioned that the Mahout port to Flink is also an ongoing effort.

Further topics included the state of the 0.8 release, for which the first release candidate should come next week; the streaming windowing rework led by Jonas [6]; and a conceptual comparison of Spark and Flink initiated by Henry.

Special thanks to Mayur for tuning in despite it being around midnight in India and for providing valuable insight on Tachyon.

[1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming
[2] https://github.com/apache/incubator-flink/pull/254
[3] https://github.com/apache/incubator-flink/pull/226
[4] https://github.com/NFLabs/zeppelin/blob/master/README.md
[5] http://samoa-project.net/
[6] http://mail-archives.apache.org/mod_mbox/incubator-flink-dev/201412.mbox/%3CCANBGL8uzpthoapQRZPK1v8seFcTM%3DCFA2-MRECkfiNg4LXmbLA%40mail.gmail.com%3E

Cheers,
Marton

On Fri, Dec 12, 2014 at 8:25 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
>
> Related to Zeppelin [1], looks like it is sponsored/ developed by a
> company in Korea [2] that has nothing to do with football
> unfortunately (I thought they were the same team that does
> http://www.nfl.com/stats/statslab),
> I was kinda disappointed at the beginning =P
>
> But anyway seemed like integration with Flink would be considered as
> potential next one =)
>
> Just want to make sure I clear up my comments in the hangout.
>
> - Henry
>
> [1] https://github.com/NFLabs/zeppelin/blob/master/README.md
> [2] http://www.nflabs.com
>
> On Fri, Dec 12, 2014 at 3:22 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > Hey All,
> >
> > I have created a google hangout for today's Streaming discussion:
> > https://plus.google.com/hangouts/_/gws3f77u5fee5euehtw7zwob2qa
> >
> > As we have agreed we will start today at 17:30 CET / 08:30 PST and I think
> > we'll be around for those who can only join later as well.
> >
> > Please find the topics here
> > <https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming>.
> >
> > Gyula