[ 
https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782196#comment-16782196
 ] 

Sean Owen commented on SPARK-26425:
-----------------------------------

[~kabhwan] I think you're welcome to work on this.

> Add more constraint checks in file streaming source to avoid checkpoint 
> corruption
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-26425
>                 URL: https://issues.apache.org/jira/browse/SPARK-26425
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>
> Two issues observed in production. 
> - HDFSMetadataLog.getLatest() tries to read older versions when it is not 
> able to read the latest listed version file. It is unclear why this fallback 
> exists, but it should be removed. If the latest listed file is not readable, 
> then something is horribly wrong and we should fail rather than report an 
> older version, as that can completely corrupt the checkpoint directory. 
> - FileStreamSource should check whether adding a new batch to the 
> FileStreamSourceLog succeeded or not (similar to how StreamExecution checks 
> for the OffsetSeqLog).
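
The two checks described above can be illustrated with a small toy model. This is only a sketch in Python, not Spark's Scala code: ToyMetadataLog, corrupt() and commit_batch() are hypothetical names invented for the illustration, and an in-memory dict stands in for the files in the checkpoint directory. It assumes that add() reports failure by returning False and that an unreadable file is still listed.

```python
from typing import Dict, Optional, Tuple


class ToyMetadataLog:
    """In-memory stand-in for a batch-id -> metadata log (hypothetical, not Spark code)."""

    def __init__(self) -> None:
        # A value of None models a file that is listed but cannot be read.
        self._files: Dict[int, Optional[str]] = {}

    def add(self, batch_id: int, metadata: str) -> bool:
        # Refuse to overwrite an existing batch; the caller must check the result.
        if batch_id in self._files:
            return False
        self._files[batch_id] = metadata
        return True

    def corrupt(self, batch_id: int) -> None:
        # Illustration helper: keep the file listed but make it unreadable.
        self._files[batch_id] = None

    def get_latest(self) -> Optional[Tuple[int, str]]:
        if not self._files:
            return None
        latest = max(self._files)
        metadata = self._files[latest]
        if metadata is None:
            # First proposal: fail fast instead of falling back to an older batch.
            raise RuntimeError(f"latest batch {latest} is listed but unreadable")
        return latest, metadata


def commit_batch(log: ToyMetadataLog, batch_id: int, metadata: str) -> None:
    # Second proposal: treat a failed add as an error rather than ignoring it.
    if not log.add(batch_id, metadata):
        raise RuntimeError(f"batch {batch_id} already exists in the log")
```

In this model, a caller that ignored the return value of add() would continue as if the batch had been recorded, and a get_latest() that fell back to an older batch would silently rewind the log. Both cases raise an error here instead.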



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
