[ https://issues.apache.org/jira/browse/SPARK-7139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516279#comment-14516279 ]
Apache Spark commented on SPARK-7139: ------------------------------------- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/5732 > Allow received block metadata to be saved to WAL and recovered on driver > failure > -------------------------------------------------------------------------------- > > Key: SPARK-7139 > URL: https://issues.apache.org/jira/browse/SPARK-7139 > Project: Spark > Issue Type: Improvement > Components: Streaming > Reporter: Tathagata Das > Assignee: Tathagata Das > Priority: Critical > > The received API allows arbitrary metadata to be added for each block. > However that information is not saved in the WAL as part of the block > information in the driver. > To fix this, the following needs to be done. > 1. Forward the metadata to the ReceivedBlockTracker in the driver. > 2. ReceivedBlockTracker saves the metadata and recovers it on restart. > However there is one tricky thing. The ReceivedBlockTracker WAL is enabled > only when `spark.streaming.receiver.writeAheadLog.enable = true`. This means > that only when receiver WAL is enabled is the driver WAL enabled. This is > not desired as the one may want to save and recovered block metadata > information (especially information like Kafka offsets or Kinesis sequence > numbers) that can be used to recover data without actually saving the data to > the receiver WAL. So we have to always enable the tracker WAL. > 3. Always enable the ReceivedBlockTracker WAL. However, make sure that the > blockIds are not recovered as they will not be useful after driver restart > (the blocks are gone!). -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org