Re: Spark Streaming application code change and stateful transformations

2015-09-17 Thread Adrian Tanase
…by cleaning the checkpoint in between upgrades, data is loaded only once.

Hope this helps,
-adrian

From: Ofir Kerker
Date: Wednesday, September 16, 2015 at 6:12 PM
To: Cody Koeninger
Cc: user@spark.apache.org
Subject: Re: Spark Streaming application code change and stateful transformations

Re: Spark Streaming application code change and stateful transformations

2015-09-17 Thread Cody Koeninger
> …initialRdd with the values preloaded from DB
> 2. By cleaning the checkpoint in between upgrades, data is loaded only once
>
> Hope this helps,
> -adrian
>
> From: Ofir Kerker
> Date: Wednesday, September 16, 2015 at 6:12 PM
> To: Cody Koeninger
> Cc: user@spark.apache.org
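[Editor's note] The advice quoted above maps onto the initialRDD overload of updateStateByKey. Below is a minimal sketch of that pattern, not code from this thread: loadStateFromDb, the (key, count) state shape, the socket source, and the checkpoint path are all placeholders for illustration. The point is that the state is seeded from the DB once at startup, so the checkpoint directory can be cleaned between code upgrades without losing it.

import org.apache.spark.{HashPartitioner, SparkConf}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulUpgradeSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("stateful-upgrade-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoints/stateful-upgrade")  // cleaned between code upgrades

    // Hypothetical helper: rebuild the last known state from an external store, once, at startup.
    def loadStateFromDb(): RDD[(String, Long)] =
      ssc.sparkContext.parallelize(Seq.empty[(String, Long)])  // placeholder for a real DB read

    val initialState: RDD[(String, Long)] = loadStateFromDb()

    // Stand-in source; the application in this thread uses the Kafka direct stream instead.
    val events = ssc.socketTextStream("localhost", 9999).map(e => (e, 1L))

    // Running count per key: newValues are this batch's values, state is the accumulated count.
    val updateFunc: (Seq[Long], Option[Long]) => Option[Long] =
      (newValues, state) => Some(state.getOrElse(0L) + newValues.sum)

    // initialRDD overload: the state is seeded from the DB, so the data is loaded only once
    // even though the checkpoint is wiped when the application code changes.
    val counts = events.updateStateByKey(
      updateFunc,
      new HashPartitioner(ssc.sparkContext.defaultParallelism),
      initialState)

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}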

Re: Spark Streaming application code change and stateful transformations

2015-09-16 Thread Ofir Kerker
Thanks Cody! The 2nd solution is safer but seems wasteful :/ I'll try to optimize it by keeping, in addition to the 'last-complete-hour' marker, the corresponding offsets that bound the incomplete data, so that in the worst case only the last couple of hours need to be fast-forwarded.

On Mon, Sep 14, 2015 at 22:14, Cody Koeninger wrote: …
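[Editor's note] The optimization described above corresponds to the fromOffsets overload of KafkaUtils.createDirectStream in the Spark 1.x / Kafka 0.8 integration. Here is a minimal sketch under stated assumptions: loadSavedOffsets is a hypothetical helper returning the offsets stored alongside the 'last-complete-hour' marker, and the broker list, topic name, and offset values are placeholders.

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ResumeFromOffsetsSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("resume-from-offsets-sketch"), Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // placeholder brokers

    // Hypothetical helper: offsets persisted together with the 'last-complete-hour' marker,
    // bounding the incomplete data that still needs reprocessing.
    def loadSavedOffsets(): Map[TopicAndPartition, Long] =
      Map(
        TopicAndPartition("events", 0) -> 12345L,  // placeholder topic/partition/offset values
        TopicAndPartition("events", 1) -> 67890L)

    // Keep both key and value; any projection of MessageAndMetadata works here.
    val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

    // fromOffsets overload: each partition starts exactly at the saved offset, so only the
    // last couple of incomplete hours are replayed instead of the whole topic.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, loadSavedOffsets(), messageHandler)

    stream.count().print()  // stand-in for the real processing

    ssc.start()
    ssc.awaitTermination()
  }
}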

Re: Spark Streaming application code change and stateful transformations

2015-09-14 Thread Cody Koeninger
Solution 2 sounds better to me. You aren't always going to have graceful shutdowns.

On Mon, Sep 14, 2015 at 1:49 PM, Ofir Kerker wrote:
> Hi,
> My Spark Streaming application consumes messages (events) from Kafka every
> 10 seconds using the direct stream approach and …
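[Editor's note] For context, a minimal sketch of the baseline setup the question describes (10-second batches, Kafka direct stream, simple-API overload). The topic name, broker list, and the trivial processing at the end are placeholders, not details from the thread.

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamBaselineSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("events-consumer-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))                 // 10-second batches, as in the question

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")   // placeholder brokers
    val topics = Set("events")                                        // placeholder topic

    // Simple-API overload: Spark tracks the consumed offsets itself (in the checkpoint, if enabled).
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream.map(_._2).print()  // values only; the real processing and stateful transformations go here

    ssc.start()
    ssc.awaitTermination()
  }
}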