Thanks Paris! This is really exciting, and we should definitely work on it.

Cheers, Albert

On Mon, Feb 8, 2016 at 9:57 AM, Paris Carbone <[email protected]> wrote:
> Hi Albert,
>
> Perhaps the common denominator is time-based sliding and tumbling windows 
> based on event-time. Flink, Beam and Storm by now produce watermarks to 
> trigger event time windows consistently. One immediate difference I see 
> between Flink and Storm for example is that the default Flink windows operate 
> on a partitioned data stream (by key). A baseline solution would be to start 
> with task-level windows, which can also be achieved in Flink with 
> keyBy(partitionId), for example. There are several ways to work around 
> each of these differences.
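
As a rough illustration of the event-time tumbling and sliding windows described above, here is a minimal Python sketch. It is not taken from Flink, Storm, or Beam: the function names (tumbling_window, sliding_windows, window_by_key) are hypothetical, timestamps are assumed to be non-negative integers, and watermarks and triggers are left out entirely. Grouping by a key stands in for something like keyBy(partitionId).

```python
from collections import defaultdict


def tumbling_window(ts, size):
    """Return the [start, end) tumbling window that contains event time ts."""
    start = ts - (ts % size)
    return (start, start + size)


def sliding_windows(ts, size, slide):
    """Return every [start, end) sliding window that contains event time ts.

    Windows are listed from the latest start to the earliest.
    """
    windows = []
    start = ts - (ts % slide)  # latest window start that is <= ts
    while start > ts - size:   # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return windows


def window_by_key(events, size):
    """Group (key, ts, value) events into per-key tumbling windows.

    The key plays the role of a partition id (assumption for illustration).
    """
    panes = defaultdict(list)
    for key, ts, value in events:
        panes[(key, tumbling_window(ts, size))].append(value)
    return dict(panes)
```

With a window size of 5, an event at time 7 falls into the tumbling window (5, 10). With size 10 and slide 5, the same event belongs to two sliding windows, (5, 15) and (0, 10).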
>
> Count windows are also supported at the task level and, as far as I remember, 
> they are used in Samoa by several operators (e.g. the ingestion of the VHT 
> model).
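
A task-level count window can be sketched in a few lines of Python. This is only an assumed illustration, not Samoa's or any engine's actual operator: count_windows is a hypothetical name, and the choice to drop an incomplete final batch is mine.

```python
def count_windows(items, size):
    """Yield consecutive, non-overlapping batches of exactly `size` items.

    A trailing batch with fewer than `size` items is never emitted
    (an assumption made for this sketch).
    """
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == size:
            yield buffer
            buffer = []  # start a fresh window
```

For example, list(count_windows(range(7), 3)) yields two full windows and leaves the seventh item unemitted.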
>
> There are quite a few unique features in each system (e.g. additional 
> triggers and custom windows) but it is safe to ignore them for now. I am not 
> following Samza much lately, perhaps someone from their community can tell us 
> more. I do remember seeing discussions around a similar window scheme, not 
> sure if something is merged yet [1].
>
> Paris
>
> [1] https://issues.apache.org/jira/browse/SAMZA-552?jql=project%20%3D%20SAMZA%20AND%20text%20~%20%22window%22
>
> On 08 Feb 2016, at 09:21, Albert Bifet <[email protected]> wrote:
>
> Thanks Paris! As Gianmarco said, it could be nice to rework
> windowing in the near future. What are the differences in windowing between
> Google Cloud Dataflow, Flink and Storm right now? Any hint on how this is
> going to evolve in the future?
>
> Cheers, Albert
>
> On Sun, Feb 7, 2016 at 3:23 PM, tarush grover <[email protected]> wrote:
> Looking forward to being part of this roadmap.
>
> Regards,
> Tarush
>
> On Sunday 7 February 2016, Gianmarco De Francisci Morales 
> <[email protected]> wrote:
>
> Thanks for the pointer, Paris.
> Finding the right abstraction level for distributed streaming ML is
> definitely a worthy (and non-trivial) task.
>
> We are currently working on some improvements for VHT.
> Once that's done, re-working it on a window-based abstraction with proper
> support for iterations could be a nice project.
> We would need to drop support for S4 (not sure about Samza), but that's on
> the roadmap anyway.
>
> Cheers,
>
> -- Gianmarco
>
> On Sat, Feb 6, 2016 at 1:42 PM, Márton Balassi <[email protected]> wrote:
>
> Great suggestion, Paris. I would love to see Samoa building on these
> concepts once they are stable enough in the supported data processing
> engines.
>
> On Fri, Feb 5, 2016 at 6:15 PM, Paris Carbone <[email protected]> wrote:
>
> Hello Samoans,
>
> It seems that system semantics in stream processing are converging
> lately. Apache Storm now has explicit state and windows [1], almost
> identical to Flink and Beam. Samza is also moving in a similar direction.
>
> This is really exciting, and it feels natural to start moving the Samoa
> programming model a level up, on top of these established concepts. For
> example, there is no more need for custom buffering to implement
> windowing, and ML models etc. can be redefined and engineered as operator
> state to be durable. There are quite a few cool things to be done, and I
> believe there can be a very attractive roadmap for Samoa in that
> direction. What do you think?
>
> [1] https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html
>
> Paris
>
>
>
>
