I have what I would call unexpected behaviour when using window on a stream.
I have 2 windowed streams with a 5s batch interval. One window stream is 
(5s,5s)=smallWindow and the other (10s,5s)=bigWindow
What I've noticed is that the 1st RDD produced by bigWindow is incorrect and is 
of the size 5s not 10s. So instead of waiting 10s and producing 1 RDD with size 
10s, Spark produced the 1st 10s RDD of size 5s.
Why is this happening? To me it looks like a bug; Matei or TD can you verify 
that this is correct behaviour?


I have the following code
val ssc = new StreamingContext(conf, Seconds(5))

val smallWindowStream = ssc.queueStream(smallWindowRddQueue)
val bigWindowStream = ssc.queueStream(bigWindowRddQueue)

val smallWindow = smallWindowReshapedStream.window(Seconds(5), Seconds(5))
      .reduceByKey((t1, t2) => (t1._1, t1._2, t1._3 + t2._3))
val bigWindow = bigWindowReshapedStream.window(Seconds(10), Seconds(5))
        .reduceByKey((t1, t2) => (t1._1, t1._2, t1._3 + t2._3))

-Adrian

Reply via email to