In my understand, here I have only three options to keep the DStream state
between redeploys (yes, I'm using updateStateByKey):

1. Use checkpoint.
2. Use my own database.
3. Use both.

But none of  these options are great:

1. Use checkpoint: I cannot load it after code change. Or I need to keep
the structure of the classes, which seems to be impossible in a developing
project.
2. Use my own database: there may be failure between the program read data
from Kafka and save the DStream to database. So there may have data lose.
3. Use both: Will the data load two times? How can I know in which
situation I should use the which one?

The power of checkpoint seems to be very limited. Is there any plan to
support checkpoint while class is changed, like the discussion you gave me
pointed out?



Akhil Das <ak...@sigmoidanalytics.com>于2015年9月17日周四 下午3:26写道:

> Any kind of changes to the jvm classes will make it fail. By checkpointing
> the data you mean using checkpoint with updateStateByKey? Here's a similar
> discussion happened earlier which will clear your doubts i guess
> http://mail-archives.us.apache.org/mod_mbox/spark-user/201507.mbox/%3CCA+AHuK=xoy8dsdaobmgm935goqytaaqkpqsvdaqpmojottj...@mail.gmail.com%3E
>
> Thanks
> Best Regards
>
> On Thu, Sep 17, 2015 at 10:01 AM, Bin Wang <wbi...@gmail.com> wrote:
>
>> And here is another question. If I load the DStream from database every
>> time I start the job, will the data be loaded when the job is failed and
>> auto restart? If so, both the checkpoint data and database data are loaded,
>> won't this a problem?
>>
>>
>>
>> Bin Wang <wbi...@gmail.com>于2015年9月16日周三 下午8:40写道:
>>
>>> Will StreamingContex.getOrCreate do this work?What kind of code change
>>> will make it cannot load?
>>>
>>> Akhil Das <ak...@sigmoidanalytics.com>于2015年9月16日周三 20:20写道:
>>>
>>>> You can't really recover from checkpoint if you alter the code. A
>>>> better approach would be to use some sort of external storage (like a db or
>>>> zookeeper etc) to keep the state (the indexes etc) and then when you deploy
>>>> new code they can be easily recovered.
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Wed, Sep 16, 2015 at 3:52 PM, Bin Wang <wbi...@gmail.com> wrote:
>>>>
>>>>> I'd like to know if there is a way to recovery dstream from checkpoint.
>>>>>
>>>>> Because I stores state in DStream, I'd like the state to be recovered
>>>>> when I restart the application and deploy new code.
>>>>>
>>>>
>>>>
>

Reply via email to