We are aggregating real-time logs of events, and want to do windows of 30
minutes. However, since the computation doesn't start until 30 minutes have
passed, there is a ton of data built up that processing could've already
started on. When it comes time to actually process the data, there is too
much of it to get through at once.
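One common way around this (sketched below in plain Java, not the Spark API; the class and method names are hypothetical) is to aggregate incrementally: fold events into small per-minute partials as they arrive, so the 30-minute window is assembled from 30 already-computed partial counts rather than a backlog of raw events.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: eager partial aggregation into one-minute buckets.
// A 30-minute window then just merges 30 partials instead of reprocessing
// every raw event when the window closes.
class WindowBuckets {
    private final Map<Integer, Long> perMinute = new HashMap<>();

    // Called as each batch of events arrives; work happens up front.
    void record(int minute, long count) {
        perMinute.merge(minute, count, Long::sum);
    }

    // A window of `sizeMinutes` ending at `endMinute` sums the partials.
    long windowTotal(int endMinute, int sizeMinutes) {
        long total = 0;
        for (int m = endMinute - sizeMinutes + 1; m <= endMinute; m++) {
            total += perMinute.getOrDefault(m, 0L);
        }
        return total;
    }
}
```

This is the same idea as running a small batch interval and combining batches into the larger window, so no single burst of work is deferred to the window boundary.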
On Thu, Aug 20, 2015 at 6:58 PM, Justin Grimes jgri...@adzerk.com wrote:
> We are aggregating real time logs of events, and want to do windows of 30
> minutes. However, since the computation doesn't start until 30 minutes have
> passed, there is a ton of data built up that processing could've already
> started on.
I tried something like that. When I tried just doing count() on the
DStream, it didn't seem like it was actually forcing the computation.
What (sort of) worked was doing a foreachRDD(rdd => rdd.count()), or
doing a print() on the DStream. The only problem was this seemed to add a
lot of overhead.
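The behavior described above comes from lazy evaluation: transformations only record what to do, and nothing executes until an output operation (an "action") forces the pipeline. A minimal plain-Java analogy, assuming nothing about the Spark internals, using `java.util.stream` (which is lazy the same way):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Plain-Java analogy (not Spark): like DStream transformations, a Stream
// pipeline only records the work; the mapped lambda does not run until a
// terminal operation forces it.
class LazyDemo {
    static final AtomicInteger ran = new AtomicInteger();

    static IntStream build() {
        // map() is lazy: the lambda has not executed yet at this point.
        return IntStream.rangeClosed(1, 10).map(x -> {
            ran.incrementAndGet();
            return x * 2;
        });
    }

    static long force(IntStream s) {
        return s.sum(); // terminal op: only now does the lambda run
    }
}
```

This is also why a cheap-looking action still costs something: forcing the pipeline runs every queued transformation, which is the overhead observed when adding foreachRDD or print just to trigger execution.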