Re: How to initialize StateDStream

qihong Sat, 13 Sep 2014 12:18:07 -0700

I'm not sure what you mean by "previous run". Do you mean the previous
batch, or the previous run of spark-submit?


If you mean the previous batch (Spark Streaming creates a batch every batch
interval), then there's nothing to do: the state is carried over from one
batch to the next.

If you mean the previous run of spark-submit (assuming you are able to save
the result somewhere), then I can think of two possible ways to do it:

1. Read the saved result as an RDD (do this just once), and join that RDD
with each RDD of the stateStream.

2. Add extra logic to updateFunction: when the previous state is None (one
of the two Option values), look up the saved state for the given key in the
saved result (however you stored it), then apply your original logic to
create the new state object from Seq[V] and the previous state. Note that
you need to use this version of updateFunction: "updateFunc: (Iterator[(K,
Seq[V], Option[S])]) => Iterator[(K, S)]", which makes the key available to
the update function.
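
To make option 2 more concrete, here is a minimal sketch in plain Scala (no
Spark dependency, so it can be run on its own). It assumes the state is a
running count per key (K = String, V = Long, S = Long) and that the result
of the previous spark-submit run has already been loaded into an in-memory
Map. The names SeededUpdate and savedCounts are hypothetical and only there
for illustration; how you actually load the saved result is up to you.

```scala
object SeededUpdate {
  // Hypothetical stand-in for the result saved by the previous
  // spark-submit run, already loaded into memory (an assumption).
  val savedCounts: Map[String, Long] = Map("a" -> 10L, "b" -> 3L)

  // Key-aware update function with the shape quoted above:
  // (Iterator[(K, Seq[V], Option[S])]) => Iterator[(K, S)]
  val updateFunc: Iterator[(String, Seq[Long], Option[Long])] => Iterator[(String, Long)] =
    iter => iter.map { case (key, values, prev) =>
      // Previous state is None: seed from the saved result for this key,
      // falling back to 0 if the key was never saved.
      val base = prev.getOrElse(savedCounts.getOrElse(key, 0L))
      // Original logic: combine the new values with the (possibly seeded) state.
      (key, base + values.sum)
    }

  def main(args: Array[String]): Unit = {
    // First batch after a restart: no in-stream state exists yet.
    val firstBatch: Iterator[(String, Seq[Long], Option[Long])] =
      Iterator(("a", Seq(1L, 2L), None), ("c", Seq(5L), None))
    println(updateFunc(firstBatch).toList) // List((a,13), (c,5))
  }
}
```

In a real job, a function of this shape would be passed to the
updateStateByKey overload that takes the iterator-based update function
together with a Partitioner and a rememberPartitioner flag. Once a key has
state in the stream, prev is Some(...) and the saved result is no longer
consulted for that key.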





