I am seeing what I would call unexpected behaviour when using window on a stream. I have two windowed streams on a context with a 5s batch interval. One is window(5s, 5s) = smallWindow and the other is window(10s, 5s) = bigWindow. What I've noticed is that the first RDD produced by bigWindow is incorrect: it covers 5s of data, not 10s. So instead of waiting 10s and producing one RDD covering 10s, Spark produces the first bigWindow RDD after 5s, and it covers only 5s. Why is this happening? To me it looks like a bug; Matei or TD, can you verify whether this is correct behaviour?
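To make the observation concrete, here is a minimal plain-Scala sketch (no Spark involved) of which 5s batches a window ending at time t would cover. It assumes that a window is simply clipped to the batches that exist since the stream started; that is my guess at the rule, not something I have confirmed in the Spark source. WindowModel and batchesInWindow are hypothetical names made up for this illustration.

```scala
// Plain-Scala model of window coverage; nothing here is Spark API.
object WindowModel {
  // End times (in seconds) of the batches covered by a window that ends at `t`.
  // Assumption: batches from before the stream start (time 0) do not exist,
  // so the window is clipped rather than delayed until it is full.
  def batchesInWindow(t: Int, windowSec: Int, batchSec: Int): Seq[Int] =
    ((t - windowSec + batchSec) to t by batchSec).filter(_ > 0)

  def main(args: Array[String]): Unit = {
    for (t <- 5 to 15 by 5) {
      val small = batchesInWindow(t, windowSec = 5, batchSec = 5)
      val big = batchesInWindow(t, windowSec = 10, batchSec = 5)
      println(s"t=${t}s small=$small big=$big")
    }
  }
}
```

Under that assumption, the first bigWindow output at t=5s contains a single batch, and only from t=10s onward does it contain two, which matches what I am seeing.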
I have the following code:

    val ssc = new StreamingContext(conf, Seconds(5))
    val smallWindowStream = ssc.queueStream(smallWindowRddQueue)
    val bigWindowStream = ssc.queueStream(bigWindowRddQueue)
    val smallWindow = smallWindowReshapedStream.window(Seconds(5), Seconds(5))
      .reduceByKey((t1, t2) => (t1._1, t1._2, t1._3 + t2._3))
    val bigWindow = bigWindowReshapedStream.window(Seconds(10), Seconds(5))
      .reduceByKey((t1, t2) => (t1._1, t1._2, t1._3 + t2._3))

-Adrian