Thanks for the recap, and sorry I couldn't join in the end.

If there is anything I can do to help with the integration of SAMOA, don't
hesitate to ask.

Cheers,



--
Gianmarco

On 12 December 2014 at 21:35, Márton Balassi <balassi.mar...@gmail.com>
wrote:
>
> The hangout was not recorded, so I'm providing a short write-up of the
> issues and decisions. The discussion was two hours long, so please feel
> free to add any important points I have missed.
>
> The initial ideas are listed on the project wiki as the Flink Streaming
> roadmap [1]. The hangout yielded the following additions:
>
>    * Fault tolerance: We have a (mostly) working, not-yet-merged prototype
> for at-least-once semantics that works similarly to Storm. A missing
> feature on the streaming side is vertex restarts in the ExecutionGraph,
> which will become easier with Ufuk's intermediate results pull request
> [2], to be merged after the 0.8 release. For exactly-once semantics, the
> preferred option was upstream backup, which is conceptually the same as
> backtracking until an intermediate result is found, given that
> intermediate results are stored at every vertex.
>
>    * A common pipeline architecture for batch and streaming: The original
> idea was to have just one ExecutionEnvironment that can convert DataSets
> to DataStreams and vice versa. Gyula hacked together a small prototype
> where a DataSet was fed into a DataStream, but seamless integration
> would have required a large refactoring. Stephan stepped in with the idea
> that most likely only the DataSet-to-DataStream direction should be
> supported, and that initially we should make it work by materializing the
> batch result in some in-memory abstraction or even in files. This would
> result in building separate batch and streaming JobGraphs, and thus
> addressing optimization, fault tolerance, etc. separately. Gyula mentioned
> Chiwan Park's pending PR on using HDFS updates as a streaming source as a
> possible solution for feeding the results of recurring batch jobs into
> streaming.
>
>    * API integrations: We've just added Java 8 support to the streaming
> API and started working on the Scala API as well, which seems to be
> low-hanging fruit, standing on Aljoscha's shoulders. A next step would be
> to add the Python API as well and, building on that, to provide a
> notebook-like "IDE", e.g. with Zeppelin [4]. This is also (in fact mainly)
> interesting for batch processing. For further integrations, a Scala shell
> would be really useful. According to Stephan, the latter should not be too
> challenging: mostly API work and some scheduler work is required.
>
>    * Multiparadigm (batch & streaming) ML: Moving in the machine learning
> direction, Paris and Vasia took up the SAMOA [5] integration issue, which
> would provide streaming machine learning support as well as comparability
> with Storm, S4 and Samza. Kostas mentioned that the Mahout port to Flink
> is also an ongoing effort.
>
> Further topics included the state of the 0.8 release, for which the first
> release candidate should come next week; the streaming windowing rework
> led by Jonas [6]; and a conceptual comparison of Spark and Flink initiated
> by Henry.
>
> Special thanks to Mayur for tuning in despite it being around midnight in
> India, and for providing valuable insight on Tachyon.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming
> [2] https://github.com/apache/incubator-flink/pull/254
> [3] https://github.com/apache/incubator-flink/pull/226
> [4] https://github.com/NFLabs/zeppelin/blob/master/README.md
> [5] http://samoa-project.net/
> [6] http://mail-archives.apache.org/mod_mbox/incubator-flink-dev/201412.mbox/%3CCANBGL8uzpthoapQRZPK1v8seFcTM%3DCFA2-MRECkfiNg4LXmbLA%40mail.gmail.com%3E
>
> Cheers,
>
> Marton
>
> On Fri, Dec 12, 2014 at 8:25 PM, Henry Saputra <henry.sapu...@gmail.com>
> wrote:
> >
> > Related to Zeppelin [1], it looks like it is sponsored/developed by a
> > company in Korea [2] that has nothing to do with football,
> > unfortunately (I thought they were the same team that does
> > http://www.nfl.com/stats/statslab).
> > I was kinda disappointed at the beginning =P
> >
> > But anyway, it seemed like integration with Flink would be considered as
> > a potential next one =)
> >
> > Just wanted to make sure I cleared up my comments from the hangout.
> >
> > - Henry
> >
> > [1] https://github.com/NFLabs/zeppelin/blob/master/README.md
> > [2] http://www.nflabs.com
> >
> > On Fri, Dec 12, 2014 at 3:22 AM, Gyula Fóra <gyula.f...@gmail.com>
> > wrote:
> > > Hey All,
> > >
> > > I have created a Google Hangout for today's Streaming discussion:
> > > https://plus.google.com/hangouts/_/gws3f77u5fee5euehtw7zwob2qa
> > >
> > > As agreed, we will start today at 17:30 CET / 08:30 PST, and I think
> > > we'll be around for those who can only join later as well.
> > >
> > > Please find the topics here
> > > <https://cwiki.apache.org/confluence/display/FLINK/Flink+Streaming>.
> > >
> > > Gyula
> >
>
