initialRdd with the values preloaded from DB
2. By cleaning the checkpoint in between upgrades, data is loaded only once
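A minimal sketch of option 1, assuming the `updateStateByKey` transformation with its `initialRDD` parameter (the state-update function itself is pure, so it is shown runnable; the Spark wiring is in comments, and `load_counts_from_db` plus the pair shapes are hypothetical placeholders):

```python
# Sketch: seed streaming state from the DB so that, after the checkpoint is
# cleaned between upgrades, counts resume from persisted values rather than
# from zero. Only the pure update function runs here.

def update_func(new_values, running_count):
    # updateStateByKey-style update: fold the batch's new values for a key
    # into its running state; running_count is None the first time a key
    # appears, so treat missing state as 0.
    return sum(new_values) + (running_count or 0)

# With a StreamingContext `ssc` and a DStream `pairs` of (key, count) tuples,
# the seeding would look roughly like (hypothetical helper):
#   initial = ssc.sparkContext.parallelize(load_counts_from_db())
#   state = pairs.updateStateByKey(update_func, initialRDD=initial)
```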
Hope this helps,
-adrian
From: Ofir Kerker
Date: Wednesday, September 16, 2015 at 6:12 PM
To: Cody Koeninger
Cc: user@spark.apache.org
Subject: Re: Spark Streaming application code change and stateful
transformations
Thanks Cody!
The 2nd solution is safer but seems wasteful :/
I'll try to optimize it by keeping in addition to the 'last-complete-hour'
the corresponding offsets that bound the incomplete data to try and
fast-forward only the last couple of hours in the worst case.
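That bookkeeping could be sketched as follows, assuming a checkpoint record that stores, next to 'last-complete-hour', the Kafka offset each topic-partition had reached when that hour closed (the record shape and names are assumptions, not the poster's actual code):

```python
# Sketch: on restart, fast-forward the direct stream to the saved offsets
# instead of replaying the whole retained log. `saved` maps
# (topic, partition) -> offset stored with the last complete hour;
# `earliest` maps every partition to its earliest available offset.

def resume_offsets(saved, earliest):
    # Resume from the saved offset where we have one; fall back to the
    # earliest available offset for new partitions, or when retention has
    # already expired past the saved position (max picks the later of the two).
    return {tp: max(saved.get(tp, off), off) for tp, off in earliest.items()}

# With KafkaUtils.createDirectStream, the resulting map would be passed as
# the fromOffsets argument when recreating the stream after an upgrade.
```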
On Mon, Sep 14, 2015 at 22:14, Cody Koeninger wrote:
Solution 2 sounds better to me. You aren't always going to have graceful
shutdowns.
On Mon, Sep 14, 2015 at 1:49 PM, Ofir Kerker wrote:
> Hi,
> My Spark Streaming application consumes messages (events) from Kafka every
> 10 seconds using the direct stream approach and