Hi everyone, I'm trying to understand how windowed operations work. What I want to achieve is the following:
val ssc = new StreamingContext(sc, Seconds(1))
// the input is assumed to be one number per line, so parse each line to Long
val lines = ssc.socketTextStream("localhost", 9999).map(_.toLong)

// the sum of all the numbers in a sliding window of 5 seconds, every 5 seconds
val window5 = lines.window(Seconds(5), Seconds(5)).reduce((time1: Long, time2: Long) => time1 + time2)

// here I want to reuse the RDDs already computed by window5,
// so I expect 2 RDDs from window5
val window10 = window5.window(Seconds(10), Seconds(10)).reduce((time1: Long, time2: Long) => time1 + time2)

// the same as window10, but here I expect two RDDs from window10
val window20 = window10.window(Seconds(20), Seconds(20)).reduce((time1: Long, time2: Long) => time1 + time2)

The problem is that sometimes window10 does not receive any RDD. Looking at the logs and the code, it seems the time interval becomes invalid. In the code I see that WindowedDStream computes the interval as follows:

val currentWindow = new Interval(validTime - windowDuration + parent.slideDuration, validTime)
val rddsInWindow = parent.slice(currentWindow)

In the case of window10, the resulting slice runs from validTime - 10 seconds + 5 seconds to validTime. I don't understand why the parent's slideDuration is taken into account in this calculation. Could you please help me understand this logic? Do you see anything wrong in the code? Is there another way to achieve the same thing?

Thanks!
Pablo.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/WindowedDStreams-and-hierarchies-tp15029.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
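P.S. To sanity-check my reading of the interval arithmetic, I wrote the small standalone sketch below. It is only my interpretation, not Spark code: I'm assuming each RDD of a DStream is identified by its end time and covers (endTime - slideDuration, endTime], and that slice(from, to) returns the RDDs whose end times fall in [from, to]; under that assumption the left edge is shifted by parent.slideDuration because the first RDD inside the window ends one parent slide after the window opens. The object and helper names are my own, not Spark API:

```scala
// Standalone sketch of WindowedDStream's interval arithmetic (hypothetical
// helper names, not Spark API). Assumption: each parent RDD is identified by
// its end time and covers (endTime - parentSlide, endTime], and
// slice(from, to) returns the RDDs whose end times fall in [from, to].
object WindowSliceDemo {
  // End times (in seconds) of the parent RDDs inside the window ending at
  // validTime. The left edge is validTime - windowDuration + parentSlide:
  // the first RDD fully inside the window ends one parent slide after the
  // window opens.
  def rddEndTimesInWindow(validTime: Long, windowDuration: Long, parentSlide: Long): Seq[Long] =
    (validTime - windowDuration + parentSlide) to validTime by parentSlide

  def main(args: Array[String]): Unit = {
    // window10 over window5 at validTime = 20s: I expect the two window5
    // RDDs ending at 15s and 20s
    println(rddEndTimesInWindow(20L, 10L, 5L).mkString(", "))  // prints "15, 20"
  }
}
```

Under this reading, window10 should see two RDDs from window5 as long as its validTime lines up with window5's slide; that's what I'd like confirmed.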