Re: streaming window not behaving as advertised (v1.0.1)

2014-08-05 Thread Tathagata Das
1. udpateStateByKey should be called on all keys even if there is not data corresponding to that key. There is a unit test for that. https://github.com/apache/spark/blob/master/streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala#L337 2. I am increasing the priority for

Re: streaming window not behaving as advertised (v1.0.1)

2014-08-01 Thread Venkat Subramanian
TD, We are seeing the same issue. We struggled through this until we found this post and the work around. A quick fix in the Spark Streaming software will help a lot for others who are encountering this and pulling their hair out on why RDD on some partitions are not computed (we ended up

Re: streaming window not behaving as advertised (v1.0.1)

2014-08-01 Thread RodrigoB
Hi TD, I've also been fighting this issue only to find the exact same solution you are suggesting. Too bad I didn't find either the post or the issue sooner. I'm using a 1 second batch with N amount of kafka events (1 to 1 with the state objects) per batch and only calling the updatestatebykey

Re: streaming window not behaving as advertised (v1.0.1)

2014-07-26 Thread Tathagata Das
Yeah, maybe I should bump the issue to major. Now that I thought about to give my previous answer, this should be easy to fix just by doing a foreachRDD on all the input streams within the system (rather than explicitly doing it like I asked you to do). Thanks Alan, for testing this out and

Re: streaming window not behaving as advertised (v1.0.1)

2014-07-23 Thread Alan Ngai
foreachRDD is how I extracted values in the first place, so that’s not going to make a difference. I don’t think it’s related to SPARK-1312 because I’m generating data every second in the first place and I’m using foreachRDD right after the window operation. The code looks something like val

Re: streaming window not behaving as advertised (v1.0.1)

2014-07-23 Thread Alan Ngai
TD, it looks like your instincts were correct. I misunderstood what you meant. If I force an eval on the inputstream using foreachRDD, the windowing will work correctly. If I don’t do that, lazy eval somehow screws with window batches I eventually receive. Any reason the bug is categorized

streaming window not behaving as advertised (v1.0.1)

2014-07-22 Thread Alan Ngai
I have a sample application pumping out records 1 per second. The batch interval is set to 5 seconds. Here’s a list of “observed window intervals” vs what was actually set window=25, slide=25 : observed-window=25, overlapped-batches=0 window=25, slide=20 : observed-window=20,

Re: streaming window not behaving as advertised (v1.0.1)

2014-07-22 Thread Tathagata Das
It could be related to this bug that is currently open. https://issues.apache.org/jira/browse/SPARK-1312 Here is a workaround. Can you put a inputStream.foreachRDD(rdd = { }) and try these combos again? TD On Tue, Jul 22, 2014 at 6:01 PM, Alan Ngai a...@opsclarity.com wrote: I have a sample