Solution 2 sounds better to me.  You aren't always going to have graceful
shutdowns.

On Mon, Sep 14, 2015 at 1:49 PM, Ofir Kerker <ofir.ker...@gmail.com> wrote:

> Hi,
> My Spark Streaming application consumes messages (events) from Kafka every
> 10 seconds using the direct stream approach, aggregates them into hourly
> aggregations (to answer analytics questions like "How many users from
> Paris visited page X between 8PM and 9PM?"), and saves the data to
> Cassandra.
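For reference, the hourly bucketing described here can be sketched in plain Python; the tuple layout and names are illustrative, not the poster's actual schema:

```python
from collections import Counter

def hour_bucket(epoch_seconds):
    """Truncate an event timestamp to the start of its hour (epoch seconds)."""
    return epoch_seconds - (epoch_seconds % 3600)

def aggregate_hourly(events):
    """events: iterable of (epoch_seconds, city, page) tuples.

    Returns a Counter keyed by (hour_start, city, page), i.e. one row per
    hourly aggregation bucket.
    """
    counts = Counter()
    for ts, city, page in events:
        counts[(hour_bucket(ts), city, page)] += 1
    return counts
```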
>
> I was wondering if there's a good practice for handling a code change in a
> Spark Streaming application that uses stateful transformations
> (updateStateByKey, for example), because the new application code will not
> be able to use the data that was checkpointed by the former application.
> I have thought of a few solutions for this issue and was hoping some of
> you have experience with such a case and can suggest other solutions or
> give feedback on the ones below:
> *Solution #1*: On a graceful shutdown, in addition to the current Kafka
> offsets, persist the current aggregated data into Cassandra tables
> (separate from the regular aggregation tables) so it can easily be read
> back when the new application starts, in order to build the initial state.
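A minimal sketch of the restore side of this idea, assuming hypothetical snapshot rows read back from those Cassandra tables. The resulting mapping could then be parallelized and supplied as the starting state (the Scala API has an updateStateByKey overload that accepts an initialRDD; worth checking what your Spark version supports):

```python
def restore_initial_state(snapshot_rows):
    """Turn rows read back from the (hypothetical) snapshot tables into the
    key -> count mapping used to seed updateStateByKey after an upgrade.

    snapshot_rows: iterable of dicts with hour/city/page/count fields.
    """
    return {
        (row["hour"], row["city"], row["page"]): row["count"]
        for row in snapshot_rows
    }
```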
> *Solution #2*: When an hour is "complete" (i.e. no more events with a
> timestamp in that hour are expected), record the last-complete-hour
> somewhere persistent (DB / shared file). When the new application starts,
> it can read all the events from Kafka from the beginning of the retention
> period (the last X hours) and ignore events with a timestamp smaller than
> or equal to the last-complete-hour.
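The replay-and-ignore step of this solution reduces to a timestamp filter, sketched here in plain Python (event tuples start with an epoch-seconds timestamp; the names are illustrative):

```python
def replay_filter(events, last_complete_hour):
    """Drop replayed events that belong to an already-aggregated hour.

    last_complete_hour is the epoch-seconds start of the last fully
    aggregated hour; only events from strictly later hours are kept, so
    re-reading Kafka from the start of retention doesn't double-count.
    """
    cutoff = last_complete_hour + 3600  # first second after that hour
    return [e for e in events if e[0] >= cutoff]
```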
>
> I'll be happy to get your feedback!
>
> Thanks,
> Ofir
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-application-code-change-and-stateful-transformations-tp24692.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
