I've recently been thinking about continuous deployment for our Spark
Streaming service.

We have a streaming application that does sessionization via
`mapWithState`, aggregating sessions in memory until they are ready to be
emitted downstream.
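
To make the setup concrete, here's a minimal sketch of the kind of pipeline I
mean; the `Event`/`Session` types, the socket source, the 30-minute idle
timeout and the checkpoint path are placeholders rather than our actual code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, State, StateSpec, StreamingContext}

// Hypothetical event/session shapes -- our real ones are richer.
case class Event(userId: String, payload: String, timestamp: Long)
case class Session(events: Seq[Event], lastSeen: Long)

object Sessionization {

  // Fold each incoming event into the per-user session; emit the session
  // only when its idle timeout fires, i.e. when it is "ready".
  def trackSession(userId: String,
                   event: Option[Event],
                   state: State[Session]): Option[(String, Session)] = {
    if (state.isTimingOut()) {
      Some((userId, state.get()))           // session complete, push downstream
    } else {
      val e = event.get                     // non-timeout calls always carry a value
      val current = state.getOption().getOrElse(Session(Seq.empty, e.timestamp))
      state.update(Session(current.events :+ e, e.timestamp))
      None
    }
  }

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("sessionization"), Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/sessionization")   // required by mapWithState

    // Placeholder source; in reality this would be Kafka or similar.
    val events = ssc.socketTextStream("localhost", 9999)
      .map { line =>
        val Array(user, payload) = line.split(",", 2)
        (user, Event(user, payload, System.currentTimeMillis()))
      }

    val completedSessions = events
      .mapWithState(StateSpec.function(trackSession _).timeout(Minutes(30)))
      .flatMap(_.toSeq)                     // keep only sessions that timed out

    completedSessions.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The point being that the only durable copy of the in-flight sessions is
whatever `mapWithState` has written to the checkpoint directory.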

Now, as I see things, we have two cases here:

1. The Spark Streaming DAG didn't change - In this case there shouldn't be a
problem, as the DAG is checkpointed every X seconds, so most of our state
should be saved and restored when the job comes back up (see the restart
sketch after this list).

2. The Spark Streaming DAG changed - As far as I understand, if the DAG
changes between releases, the checkpointed data cannot be read and used to
reinitialize the job. This means we'd actually lose all the state that was
saved up until we terminated the job.
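
To make the restart path concrete, here's a rough sketch of how I picture the
recovery working with `StreamingContext.getOrCreate`; the checkpoint path, app
name and batch interval are placeholders, not our actual configuration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SessionizationApp {
  // Hypothetical checkpoint location -- the same one the job has been writing to.
  val checkpointDir = "hdfs:///checkpoints/sessionization"

  // All DStream wiring has to happen inside this factory, so a fresh DAG is
  // built only when no usable checkpoint exists.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("sessionization")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... build the mapWithState sessionization graph here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Case 1: the checkpoint is compatible with the deployed code -> the DAG
    // and the mapWithState state are restored and processing resumes.
    // Case 2: the DAG (or the serialized classes) changed between releases ->
    // recovery fails on deserialization and the checkpoint is effectively unusable.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the code that builds the graph is unchanged, restarting the same jar should
resume from the checkpoint with the `mapWithState` state intact (case 1); the
moment the graph or the serialized classes change, that recovery fails
(case 2). A possible workaround might be to also persist sessions to an
external store and seed the redeployed job via `StateSpec.initialState`, but
that's extra machinery I'd rather avoid if there's a simpler pattern.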

Has anyone thought about this scenario? How do you guys deal with this in
production?

Yuval.


