reduceByKeyAndWindow with initial state

Imran Alam Fri, 10 Jul 2015 02:08:26 -0700

We have a streaming job that makes use of reduceByKeyAndWindow
<https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L334-L341>.
We want this to work with an initial state. The idea is to avoid losing
state if the streaming job is restarted, also to take historical data into
account for the windows. But reduceByKeyAndWindow doesn't accept any
"initialRDD" parameter like updateStateByKey
<https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L435-L445>
does.


The plan is to extend reduceByKeyAndWindow to accept an "initalRDDs"
parameter, so that the DStream starts with those RDDs as initial value of
"generatedRDD" rather than an empty map. But the "generatedRDD" is a
private variable, so I'm bit confused on how to proceed with the plan.

reduceByKeyAndWindow with initial state

Reply via email to