[jira] [Closed] (SPARK-9947) Separate Metadata and State Checkpoint Data

Cody Koeninger (JIRA) Wed, 12 Oct 2016 16:23:37 -0700

     [ https://issues.apache.org/jira/browse/SPARK-9947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Cody Koeninger closed SPARK-9947.
---------------------------------
    Resolution: Won't Fix

The direct DStream API already gives access to offsets, and it seems clear that
most future work on streaming checkpointing will focus on Structured Streaming
(see SPARK-15406).

> Separate Metadata and State Checkpoint Data
> -------------------------------------------
>
>                 Key: SPARK-9947
>                 URL: https://issues.apache.org/jira/browse/SPARK-9947
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 1.4.1
>            Reporter: Dan Dutrow
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Problem: When updating an application that has checkpointing enabled (to
> support updateStateByKey and 24/7 operation), you may want to keep the state
> data between restarts but delete the metadata that holds the execution state.
> If checkpoint data survives a code redeployment, the program may not execute
> properly, or at all. My current workaround is to wrap updateStateByKey with
> my own function that persists the state after every update to a separate
> directory of my own. (This lets me delete the checkpoint, along with its
> metadata, before redeploying.) Then, when I restart the application, I
> initialize the state from this persisted data. This incurs extra overhead
> because the same data is persisted twice: once in the checkpoint and once in
> my own data folder.
> If Kafka Direct API offsets could be stored in their own separate checkpoint
> directory, that would also help avoid having to blow them away between code
> redeployments.
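
The reporter's workaround (keeping an application-owned copy of the state, and ideally the Kafka offsets, in a directory separate from the checkpoint) can be modeled in a few lines. This is a minimal plain-Python sketch, not Spark code: `update_func` only mimics the shape of an updateStateByKey update function, and the directory layout, the `snapshot.json` file name, and all helper names are hypothetical. In an actual Spark Streaming job, the reloaded state would presumably be fed back through the `initialRDD` overload of updateStateByKey, and the reloaded offsets through the `fromOffsets` argument of the direct Kafka stream.

```python
import json
import os
import shutil
import tempfile


def update_func(new_values, old_state):
    """Running sum, shaped like an updateStateByKey update function:
    (new values for a key, previous state or None) -> new state or None."""
    return sum(new_values) + (old_state or 0)


def apply_batch(state, batch):
    """Apply update_func to every key seen in the old state or the new batch."""
    new_state = dict(state)
    for key in set(state) | set(batch):
        updated = update_func(batch.get(key, []), state.get(key))
        if updated is None:
            new_state.pop(key, None)  # returning None drops the key
        else:
            new_state[key] = updated
    return new_state


def save_snapshot(state_dir, state, offsets):
    """Write state and offsets to their own directory, outside the checkpoint.
    Writing a temp file and renaming it avoids leaving a half-written snapshot."""
    os.makedirs(state_dir, exist_ok=True)
    tmp_path = os.path.join(state_dir, "snapshot.json.tmp")
    with open(tmp_path, "w") as f:
        json.dump({"state": state, "offsets": offsets}, f)
    os.replace(tmp_path, os.path.join(state_dir, "snapshot.json"))


def load_snapshot(state_dir):
    """Return (state, offsets); both are empty on the very first start."""
    path = os.path.join(state_dir, "snapshot.json")
    if not os.path.exists(path):
        return {}, {}
    with open(path) as f:
        snapshot = json.load(f)
    return snapshot["state"], snapshot["offsets"]


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    checkpoint_dir = os.path.join(root, "checkpoint")  # stands in for Spark's checkpoint
    state_dir = os.path.join(root, "state")            # hypothetical app-owned directory
    os.makedirs(checkpoint_dir)

    state, offsets = load_snapshot(state_dir)
    state = apply_batch(state, {"a": [1, 2], "b": [5]})
    save_snapshot(state_dir, state, {"events-0": 3})  # hypothetical partition key

    # Redeploy: wipe the metadata checkpoint but keep the state snapshot.
    shutil.rmtree(checkpoint_dir)
    state, offsets = load_snapshot(state_dir)
    print(state, offsets)
    shutil.rmtree(root)
```

Because the snapshot lives outside the checkpoint directory, deleting the checkpoint before a redeploy no longer loses state. The cost, as the reporter notes, is writing the same data twice.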


