Thanks Paris! This is really exciting and we should work on that definitely.
Cheers, Albert On Mon, Feb 8, 2016 at 9:57 AM, Paris Carbone <[email protected]> wrote: > Hi Albert, > > Perhaps the common denominator is time-based sliding and tumbling windows > based on event-time. Flink, Beam and Storm by now produce watermarks to > trigger event time windows consistently. One immediate difference I see > between Flink and Storm for example is that the default Flink windows operate > on a partitioned data stream (by key). A baseline solution would be to start > with task level windows which can also be achieved in Flink by > keyBy(partitionId) for example. There are several ways to go around every > each of these differences. > > Count windows are also supported in the task level and as far as I remember > they are used in Samoa by several operators (e.g. the ingestion of the VHT > model). > > There are quite a few unique features in each system (e.g. additional > triggers and custom windows) but it is safe to ignore them for now. I am not > following Samza much lately, perhaps someone from their community can tell us > more. I do remember seeing discussions around a similar window scheme, not > sure if something is merged yet [1]. > > Paris > > [1] > https://issues.apache.org/jira/browse/SAMZA-552?jql=project%20%3D%20SAMZA%20AND%20text%20~%20%22window%22<https://issues.apache.org/jira/browse/SAMZA-552?jql=project%20=%20SAMZA%20AND%20text%20~%20"window"> > > On 08 Feb 2016, at 09:21, Albert Bifet > <[email protected]<mailto:[email protected]>> wrote: > > Thanks Paris! As Gianmarco said, it could be nice to re-work on > windowing in the near future. What are the differences in windowing in > Google Data Flow, Flink and Storm right now? Any hint on how this is > going to evolve in the future? > > Cheers, Albert > > On Sun, Feb 7, 2016 at 3:23 PM, tarush grover > <[email protected]<mailto:[email protected]>> wrote: > Looking forward to be the part of this roadmap. > > Regards, > Tarush > > On Sunday 7 February 2016, Gianmarco De Francisci Morales > <[email protected]<mailto:[email protected]>> > wrote: > > Thanks for the pointer, Paris. > Finding the right abstraction level for distributed streaming ML is > definitely a worthy (and non-trivial) task. > > We are currently working on some improvements for VHT. > Once that's done, re-working it on a window-based abstraction with proper > support for iterations could be a nice project. > We wound need to drop support for S4 (not sure about Samza), but that's on > the roadmap anyway. > > Cheers, > > -- Gianmarco > > On Sat, Feb 6, 2016 at 1:42 PM, Márton Balassi > <[email protected]<mailto:[email protected]> > <javascript:;>> wrote: > > Great suggestion, Paris. I would love to see Samoa building on these > concept once they are stable enough in the supported data processing > engines. > > On Fri, Feb 5, 2016 at 6:15 PM, Paris Carbone > <[email protected]<mailto:[email protected]> > <javascript:;>> wrote: > > Hello Samoans, > > It seems that system semantics in stream processing are converging > lately. > Apache Storm has now explicit state and windows [1], almost identical > to > Flink and Beam. Samza is also moving in a similar direction. > > This is really exciting and it feels natural to start moving the Samoa > programming model a level up on top these establishing concepts. For > example, there is no more need for custom buffering to implement > windowing > and ML models etc. can be re-defined and engineered as operator state > to > be > durable. There are quite many cool things to be done and I believe > there > can be a very attractive roadmap for Samoa in that direction. What do > you > think? > > [1] > > > https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html > > Paris > > > >
