[ 
https://issues.apache.org/jira/browse/SPARK-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516279#comment-14516279
 ] 

Apache Spark commented on SPARK-7139:
-------------------------------------

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/5732

> Allow received block metadata to be saved to WAL and recovered on driver 
> failure
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-7139
>                 URL: https://issues.apache.org/jira/browse/SPARK-7139
>             Project: Spark
>          Issue Type: Improvement
>          Components: Streaming
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Critical
>
> The received API allows arbitrary metadata to be added for each block. 
> However that information is not saved in the WAL as part of the block 
> information in the driver. 
> To fix this, the following needs to be done. 
> 1. Forward the metadata to the ReceivedBlockTracker in the driver.
> 2. ReceivedBlockTracker saves the metadata and recovers it on restart. 
> However there is one tricky thing. The ReceivedBlockTracker WAL is enabled 
> only when `spark.streaming.receiver.writeAheadLog.enable = true`. This means 
> that only when  receiver WAL is enabled is the driver WAL enabled. This is 
> not desired as the one may want to save and recovered block metadata 
> information (especially information like Kafka offsets or Kinesis sequence 
> numbers) that can be used to recover data without actually saving the data to 
> the receiver WAL. So we have to always enable the tracker WAL. 
> 3. Always enable the ReceivedBlockTracker WAL. However, make sure that the 
> blockIds are not recovered as they will not be useful after driver restart 
> (the blocks are gone!). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to