Thanks for the recap, and sorry for being unable to join in the end. If there is anything I can do to help with the integration of SAMOA, don't hesitate to ask.
Cheers,

-- Gianmarco

On 12 December 2014 at 21:35, Márton Balassi <balassi.mar...@gmail.com> wrote:
>
> The hangout was not recorded, so I'm providing a short write-up on the
> issues and decisions. The discussion was 2 hours long, so please feel free
> to add any important statements I have missed.
>
> The initial ideas are listed on the project wiki as the Flink Streaming
> roadmap. [1] The hangout yielded the following additions:
>
> * Fault tolerance: We have a (mostly) working prototype, not yet merged,
> for at-least-once semantics that works similarly to Storm. A missing
> feature on the streaming side is vertex restarts in the ExecutionGraph,
> which will be made easier by Ufuk's intermediate results [2] pull
> request, which will be merged after the 0.8 release. As for exactly-once
> semantics, the preferred option was upstream backup, which is conceptually
> the same as backtracking until an intermediate result is found, given that
> intermediate results are stored at every vertex.
>
> * A common pipeline architecture for batch and streaming: The original
> idea was to have just one ExecutionEnvironment which can convert DataSets
> to DataStreams and vice versa. Gyula hacked together a small prototype
> where a DataSet was fed into a DataStream, but seamless integration
> would have needed a large refactor. Stephan stepped in with the idea
> that most likely only the DataSet to DataStream option should be
> supported, and that initially we should work it through by materializing
> the batch result in some in-memory abstraction or even files. This would
> result in building separate batch and streaming JobGraphs, and thus
> addressing optimization, fault tolerance etc. separately. Gyula mentioned
> Chiwan Park's pending PR on using HDFS updates as a streaming source as a
> possible solution for feeding the results of recurring batch jobs into
> streaming.
>
> * API integrations: We've just added Java 8 support to the streaming API
> and started working on the Scala API as well, which seems to be
> low-hanging fruit standing on Aljoscha's shoulders. A next step would be
> adding the Python API and, building on that, providing a notebook-like
> "IDE" with e.g. Zeppelin. [4] This is also (in fact mainly) interesting
> for batch processing. For further integrations a Scala shell should be
> really useful. According to Stephan the latter should not be too
> challenging; mostly API and some Scheduler work is required.
>
> * Multiparadigm (batch & streaming) ML: Opening to the machine learning
> direction, Paris and Vasia took up the SAMOA [5] integration issue, which
> would provide streaming machine learning support and also comparability
> with Storm, S4 and Samza. Kostas mentioned that the Mahout port to Flink is
> also an ongoing effort.
>
> Further topics included the state of the 0.8 release, for which the first
> release candidate should come next week; the streaming windowing rework
> led by Jonas [6]; and a conceptual comparison of Spark and Flink initiated
> by Henry.
>
> Special thanks to Mayur for tuning in despite it being around midnight in
> India and providing valuable insight on Tachyon.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming
> [2] https://github.com/apache/incubator-flink/pull/254
> [3] https://github.com/apache/incubator-flink/pull/226
> [4] https://github.com/NFLabs/zeppelin/blob/master/README.md
> [5] http://samoa-project.net/
> [6] http://mail-archives.apache.org/mod_mbox/incubator-flink-dev/201412.mbox/%3CCANBGL8uzpthoapQRZPK1v8seFcTM%3DCFA2-MRECkfiNg4LXmbLA%40mail.gmail.com%3E
>
> Cheers,
>
> Marton
>
> On Fri, Dec 12, 2014 at 8:25 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> >
> > Related to Zeppelin [1], looks like it is sponsored/developed by a
> > company in Korea [2] that has nothing to do with football
> > unfortunately (I thought they were the same team that does
> > http://www.nfl.com/stats/statslab),
> > I was kinda disappointed at the beginning =P
> >
> > But anyway seemed like integration with Flink would be considered as
> > potential next one =)
> >
> > Just want to make sure I clear up my comments in the hangout.
> >
> > - Henry
> >
> > [1] https://github.com/NFLabs/zeppelin/blob/master/README.md
> > [2] http://www.nflabs.com
> >
> > On Fri, Dec 12, 2014 at 3:22 AM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > > Hey All,
> > >
> > > I have created a google hangout for today's Streaming discussion:
> > > https://plus.google.com/hangouts/_/gws3f77u5fee5euehtw7zwob2qa
> > >
> > > As we have agreed we will start today at 17:30 CET / 08:30 PST and I think
> > > we'll be around for those who can only join later as well.
> > >
> > > Please find the topics here
> > > <https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming>.
> > >
> > > Gyula
> > >