Hi Radu,

The problem is not checkpointing the data itself: if your operations are 
stateless, then you are right that only the Kafka offsets are checkpointed. 
The problem is that the metadata is also checkpointed, including the actual 
code and serialized Java classes, which is why you see serialization/deserialization 
exceptions on restart after an upgrade.

If you’re not using stateful operations, you might get away with using the old 
Kafka receiver without the WAL, but then you accept at-least-once semantics. As 
soon as you add in the WAL, you are forced to checkpoint, and at that point you’re 
better off with the direct stream approach.
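
For reference, a minimal sketch of what the direct approach looks like against the 
Spark 1.x Kafka API (the broker list, topic name, and ssc are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // assumes ssc: StreamingContext already exists
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("events")  // placeholder topic

    // no receiver, no WAL: each batch reads a deterministic offset range,
    // and only those offsets (plus job metadata) end up in the checkpoint
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)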

I believe the simplest way around this is to run 2 versions in parallel, with 
some app-level control of a barrier (e.g. v1 reads events up to 3:00am, v2 after 
that; see the sketch after the list below). Manual state management is also 
supported by the framework, but it’s harder to control because:

  *   you’re not guaranteed to shut down gracefully
  *   you may have a bug that prevents the state from being saved, and then you 
can’t restart the app without upgrading
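
To illustrate the barrier idea (not code from this thread): assuming your events 
carry their own timestamps and a hypothetical extractTimestamp helper, each 
version filters to its side of the cutover:

    // hypothetical cutover point agreed on by both versions, e.g. 3:00am
    val barrierMs: Long = barrierTimestampMillis  // e.g. read from config

    // v1 keeps processing only events before the barrier...
    val v1Events = stream.filter { case (_, value) => extractTimestamp(value) < barrierMs }

    // ...while v2, started in parallel, takes everything from the barrier onwards
    val v2Events = stream.filter { case (_, value) => extractTimestamp(value) >= barrierMs }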

Less than ideal, yes :)

-adrian

From: Radu Brumariu
Date: Friday, September 25, 2015 at 1:31 AM
To: Cody Koeninger
Cc: "user@spark.apache.org<mailto:user@spark.apache.org>"
Subject: Re: kafka direct streaming with checkpointing

Would changing the direct stream API to support committing the offsets to 
Kafka's ZK (like a regular consumer) as a fallback mechanism, in case 
recovering from the checkpoint fails, be an accepted solution?

On Thursday, September 24, 2015, Cody Koeninger <c...@koeninger.org> wrote:
This has been discussed numerous times; TD's response has consistently been 
that it's unlikely to be possible.

On Thu, Sep 24, 2015 at 12:26 PM, Radu Brumariu <bru...@gmail.com> wrote:
It seems to me that this scenario I'm facing is quite common for Spark 
jobs using Kafka.
Is there a ticket to add this sort of semantics to checkpointing? Does it even 
make sense to add it there?

Thanks,
Radu


On Thursday, September 24, 2015, Cody Koeninger <c...@koeninger.org> wrote:
No, you can't use checkpointing across code changes. Either store offsets 
yourself, or start up your new app code and let it catch up before killing the 
old one.
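
For illustration, a minimal sketch of the first option against the Spark 1.x 
direct stream API; loadOffsets and saveOffsets are hypothetical stand-ins for 
your own durable store (ZK, a database, etc.):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    // start from the offsets saved by the previous run (loadOffsets is your own code)
    val fromOffsets: Map[TopicAndPartition, Long] = loadOffsets()
    val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets, messageHandler)

    stream.foreachRDD { rdd =>
      // the direct stream's RDDs know their exact offset ranges
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process rdd ...
      saveOffsets(offsetRanges)  // your own code; ideally atomic with the results
    }

Because the offsets live outside the checkpoint, a new version of the app can 
pick up where the old one left off.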

On Thu, Sep 24, 2015 at 8:40 AM, Radu Brumariu <bru...@gmail.com> wrote:
Hi,
in my application I use Kafka direct streaming, and I have also enabled 
checkpointing.
This seems to work fine if the application is restarted. However, if I change 
the code and resubmit the application, it cannot start because the 
checkpointed data is of a different class version.
Is there any way I can use checkpointing that survives across application 
version changes?

Thanks,
Radu


