Dear all,

Can someone please explain to me how Spark Streaming executes the window()
operation? From the Spark 1.6.1 documentation, it seems that windowed
batches are automatically cached in memory, but when I look at the web UI it
appears that operations already executed in previous batches are executed
again. For your convenience, I attach a screenshot of my running application
below.
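To make the question concrete, here is a simplified Scala sketch of my pipeline
(the source, the names and the batch interval are placeholders, not my real job):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    object WindowSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WindowSketch")
        val ssc  = new StreamingContext(conf, Seconds(30))

        // Placeholder source; my real input is different.
        val lines = ssc.socketTextStream("localhost", 9999)

        // Upstream transformations; the last one before window() is flatMapValues().
        val prepared = lines
          .map(line => (line.take(1), line))
          .flatMapValues(v => v.split("\\s+"))

        // 2-hour window sliding every 5 minutes.
        val windowed = prepared.window(Minutes(120), Minutes(5))
        windowed.count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }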

Looking at the web UI, the flatMapValues() RDDs appear to be cached
(green dot - flatMapValues() is the last operation executed before I call
window() on the DStream), but, at the same time, all the transformations
leading up to flatMapValues() in previous batches also appear to be executed
again. If that is the case, the window() operation may cause a huge
performance penalty, especially with a window duration of 1 or 2 hours (as
I expect for my application). Do you think that checkpointing the DStream at
that point would help? Consider that the expected slide interval is about 5
minutes.
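If checkpointing is indeed the way to go, I assume it would boil down to
something like this, on top of the sketch above (the checkpoint directory and
the interval are guesses on my side):

    import org.apache.spark.streaming.Minutes

    // ssc and windowed are the StreamingContext and the windowed DStream
    // from the sketch above.
    ssc.checkpoint("/tmp/spark-checkpoints")  // placeholder checkpoint directory
    windowed.checkpoint(Minutes(10))          // interval is a guess; the idea is to
                                              // cut the lineage that would otherwise
                                              // be recomputed on every slide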

Hope someone can clarify this point.

Thanks,
Marco
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27041/window.png> 



