We have a streaming job that makes use of reduceByKeyAndWindow
<https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L334-L341>.
We want this to work with an initial state. The idea is to avoid losing
state if the streaming job is restarted, also to take historical data into
account for the windows. But reduceByKeyAndWindow doesn't accept any
"initialRDD" parameter like updateStateByKey
<https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L435-L445>
does.

The plan is to extend reduceByKeyAndWindow to accept an "initalRDDs"
parameter, so that the DStream starts with those RDDs as initial value of
"generatedRDD" rather than an empty map. But the "generatedRDD" is a
private variable, so I'm bit confused on how to proceed with the plan.

Reply via email to