I need to implement the following logic in a Spark Streaming app: for the incoming DStream, do some transformation, then invoke updateStateByKey to update a state object for each key (marking the data entries that were updated as dirty for the next step), then let the state objects produce event(s) based on the dirty flag. After that, I need to update the state object again to clear the dirty flags so it won't produce the same event(s) in the next batch. The pseudo code is as follows:
1. val dstream0 = ...
2. val state = dstream0.updateStateByKey[...](...)
3. val events = state.flatMapValues(stateobj => stateobj.produceEvents(...))
4. ?? how to update state again ??

Is it possible to do this? If so, how? Or is there an alternative way to achieve the same thing?

I tried updating the state object inside the stateobj.produceEvents(...) method, but the state object at line #2 in the next batch doesn't contain the change made at line #3 (in the previous batch).

Any suggestion? Thanks!
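For reference, below is a minimal runnable sketch of the pipeline described above. The concrete StateObj case class, the socket source, the checkpoint path, and the batch interval are all assumptions made for illustration; none of them come from the original post. The sketch also shows one possible way to handle step 4 without a second updateStateByKey pass: because updateStateByKey invokes the update function for every key that already has state in each batch (with an empty newValues Seq when no new data arrived), the dirty flag set in one batch can be cleared at the start of the following batch inside the same update function.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical state class: keeps an accumulated total plus a dirty flag that
// records whether the key received new data in the current batch.
case class StateObj(total: Int = 0, dirty: Boolean = false) {
  // Emit an event only when the entry was updated in this batch.
  def produceEvents(): Seq[String] =
    if (dirty) Seq(s"updated, total=$total") else Seq.empty
}

object StateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StateSketch").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/spark-checkpoint")   // updateStateByKey requires checkpointing

    // 1. incoming key/value stream (socket source used only as a stand-in)
    val dstream0 = ssc.socketTextStream("localhost", 9999).map(word => (word, 1))

    // 2. update state per key; mark the entry dirty when new values arrived,
    //    and clear the flag in the next batch in which nothing arrived
    val state = dstream0.updateStateByKey[StateObj] {
      (newValues: Seq[Int], old: Option[StateObj]) =>
        val prev = old.getOrElse(StateObj())
        if (newValues.isEmpty) Some(prev.copy(dirty = false))
        else Some(StateObj(prev.total + newValues.sum, dirty = true))
    }

    // 3. produce events only for entries marked dirty in this batch
    val events = state.flatMapValues(stateObj => stateObj.produceEvents())
    events.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

This is only a sketch of one approach under the stated assumptions, not a definitive answer: if the dirty flag must be cleared within the same batch that produced the events, deferring the cleanup to the next batch's update function may or may not fit your semantics.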