Windowed stream operations -- These are too lazy for some use cases

2015-08-20 Thread Justin Grimes
We are aggregating real time logs of events, and want to do windows of 30 minutes. However, since the computation doesn't start until 30 minutes have passed, there is a ton of data built up that processing could've already started on. When it comes time to actually process the data, there is too

Re: Windowed stream operations -- These are too lazy for some use cases

2015-08-20 Thread Iulian DragoČ™
On Thu, Aug 20, 2015 at 6:58 PM, Justin Grimes jgri...@adzerk.com wrote: We are aggregating real time logs of events, and want to do windows of 30 minutes. However, since the computation doesn't start until 30 minutes have passed, there is a ton of data built up that processing could've already

Re: Windowed stream operations -- These are too lazy for some use cases

2015-08-20 Thread Justin Grimes
I tried something like that. When I tried just doing count() on the DStream, it didn't seem like it was actually forcing the computation. What (sort of) worked was doing a forEachRDD((rdd) = rdd.count()), or doing a print() on the DStream. The only problem was this seemed to add a lot of