I've been thinking recently about continuous deployment for our Spark Streaming service.
We have a streaming application that does sessionization via `mapWithState`, aggregating sessions in memory until they are ready to be emitted. As I see it, we have two cases here:

1. The Spark Streaming DAG didn't change - in this case there shouldn't be a problem, since the DAG is checkpointed every X seconds, so most of our state should be saved and restored on restart.

2. The Spark Streaming DAG changed - as far as I understand, if the DAG changes between releases, the checkpointed data cannot be read back and reinitialized. That means we'd lose all the state accumulated up to the point we terminated the job.

Has anyone thought about this scenario? How do you deal with this in production?

Yuval.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Continuous-deployment-to-Spark-Streaming-application-with-sessionization-tp26409.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
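For concreteness, the per-key update logic behind this kind of `mapWithState` sessionization can be sketched as below. The names (`Event`, `Session`, `sessionGapMs`, `updateSession`) and the 30-minute inactivity gap are illustrative, not the actual job's code, and Spark's `State[_]` wrapper is left out so the core logic stands alone:

```scala
// Hypothetical event and session types for a sessionization job.
case class Event(userId: String, timestampMs: Long)
case class Session(userId: String, startMs: Long, lastSeenMs: Long, events: Int)

// Assumed inactivity gap that closes a session (illustrative value).
val sessionGapMs: Long = 30 * 60 * 1000L

// Returns the new state plus an optional just-closed session to emit.
// In a real StateSpec function, `current` would come from State.getOption
// and the new state would be written back with State.update.
def updateSession(event: Event, current: Option[Session]): (Session, Option[Session]) =
  current match {
    case Some(s) if event.timestampMs - s.lastSeenMs <= sessionGapMs =>
      // Event falls inside the gap: extend the existing session.
      (s.copy(lastSeenMs = event.timestampMs, events = s.events + 1), None)
    case prev =>
      // Gap exceeded, or no prior state: start fresh and emit the old session.
      (Session(event.userId, event.timestampMs, event.timestampMs, 1), prev)
  }
```

This is exactly the kind of in-memory state that is lost if the checkpoint cannot be restored.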
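For case 1 to work, the job has to be wired up through `StreamingContext.getOrCreate` so that a restart recovers the context (including `mapWithState` state) from the checkpoint. A minimal wiring sketch, where the checkpoint path, app name, and batch interval are placeholders rather than the actual configuration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumed checkpoint location; any fault-tolerant filesystem works.
val checkpointDir = "hdfs:///tmp/sessionization-checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("sessionization")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... build the DStream graph with mapWithState here ...
  ssc
}

// On restart, getOrCreate rebuilds the context from the checkpoint if one
// exists; only when there is no checkpoint does it call createContext().
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

If the DAG built inside `createContext` no longer matches the serialized graph in the checkpoint (case 2), this recovery path fails, which is exactly the state-loss problem described above.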