I need to implement the following logic in a Spark Streaming app: for the incoming DStream, apply some transformation and invoke updateStateByKey to update the state object for each key (marking data entries that were updated as dirty for the next step). Then let each state object produce event(s) based on its dirty flags. After that, I need to update the state objects again to clear the dirty flags, so they won't produce the same event(s) in the next batch. The pseudo code is as follows:

1.  val dstream0 = ...
2.  val state = dstream0.updateStateByKey[...](...)
3.  val events = state.flatMapValues(stateobj => stateobj.produceEvents(...))
4.  ?? how to update state again ??
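
For concreteness, here is a rough, self-contained sketch of what I mean. StateObj, its fields, and produceEvents are simplified placeholders for my real state class, and the socket source is just for illustration:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Simplified placeholder for my real per-key state: a set of entries
// plus a dirty flag marking data updated in the current batch.
case class StateObj(entries: Seq[String], dirty: Boolean) {
  // Produce events only when the state was updated in this batch.
  def produceEvents(): Seq[String] =
    if (dirty) entries.map(e => "event(" + e + ")") else Seq.empty
}

object StatefulApp {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("StatefulApp"), Seconds(10))
    ssc.checkpoint("/tmp/checkpoint")  // updateStateByKey requires checkpointing

    // Step 1: some keyed input (placeholder source and keying).
    val dstream0 = ssc.socketTextStream("localhost", 9999)
      .map(line => (line.split(",")(0), line))

    // Step 2: merge new values into the state and mark it dirty.
    val state = dstream0.updateStateByKey[StateObj] {
      (newValues: Seq[String], old: Option[StateObj]) =>
        if (newValues.isEmpty) old  // no new data: keep state unchanged
        else Some(StateObj(
          old.map(_.entries).getOrElse(Seq.empty) ++ newValues, dirty = true))
    }

    // Step 3: let each state object produce events based on its dirty flag.
    val events = state.flatMapValues(stateObj => stateObj.produceEvents())
    events.print()

    // Step 4: ?? here I need to clear the dirty flags on `state` ??

    ssc.start()
    ssc.awaitTermination()
  }
}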

Is it possible to do this? If so, how? Or is there an alternative way to achieve the same thing?

I tried updating the state object inside the stateobj.produceEvents(...) method, but the state object at line #2 in the next batch doesn't contain the change made at line #3 in the previous batch.
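
In other words, I made the state object mutable and cleared the flag as a side effect of produceEvents, roughly like this (again a simplified placeholder), but the cleared flag never shows up in the state that updateStateByKey sees in the next batch:

// Mutable variant I tried: clear the flag while producing events.
class MutableStateObj(var entries: Seq[String], var dirty: Boolean) {
  def produceEvents(): Seq[String] = {
    val out = if (dirty) entries.map(e => "event(" + e + ")") else Seq.empty
    dirty = false  // side effect inside flatMapValues; lost by the next batch
    out
  }
}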

Any suggestions?

Thanks!



