[ https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782196#comment-16782196 ]
Sean Owen commented on SPARK-26425:
-----------------------------------

[~kabhwan] I think you're welcome to work on this.

> Add more constraint checks in file streaming source to avoid checkpoint
> corruption
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-26425
>                 URL: https://issues.apache.org/jira/browse/SPARK-26425
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>
> Two issues observed in production.
> - HDFSMetadataLog.getLatest() tries to read older versions when it is not
> able to read the latest listed version file. Not sure why this was done but
> this should not be done. If the latest listed file is not readable, then
> something is horribly wrong and we should fail rather than report an older
> version, as that can completely corrupt the checkpoint directory.
> - FileStreamSource should check whether adding a new batch to the
> FileStreamSourceLog succeeded or not (similar to how StreamExecution checks
> for the OffsetSeqLog).
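
For readers skimming the thread, the two checks requested in the issue can be sketched roughly as below. This is a toy model in Python, not Spark's code: ToyMetadataLog, commit_batch, and the one-file-per-batch layout are hypothetical names and assumptions made for illustration, and only the two behaviours (fail when the latest listed file is unreadable, fail when adding a batch does not succeed) come from the issue text.

```python
import os
import tempfile


class ToyMetadataLog:
    """Toy stand-in for a batch metadata log: one file per batch id.

    Hypothetical illustration only; it does not mirror HDFSMetadataLog.
    """

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def add(self, batch_id, metadata):
        """Write the file for batch_id; return False if it already exists."""
        try:
            # Mode "x" creates the file exclusively and fails if it exists.
            with open(os.path.join(self.path, str(batch_id)), "x") as f:
                f.write(metadata)
            return True
        except FileExistsError:
            return False

    def get(self, batch_id):
        """Return the stored metadata, or None if the file cannot be read."""
        try:
            with open(os.path.join(self.path, str(batch_id))) as f:
                return f.read()
        except OSError:
            return None

    def get_latest(self):
        """Return (batch_id, metadata) for the newest listed batch.

        Raises instead of silently falling back to an older batch when the
        newest listed entry cannot be read.
        """
        batch_ids = sorted(int(n) for n in os.listdir(self.path) if n.isdigit())
        if not batch_ids:
            return None
        latest = batch_ids[-1]
        metadata = self.get(latest)
        if metadata is None:
            raise RuntimeError(f"latest listed batch {latest} is not readable")
        return latest, metadata


def commit_batch(log, batch_id, metadata):
    """Fail loudly when the log write did not succeed (assumed policy)."""
    if not log.add(batch_id, metadata):
        raise RuntimeError(f"batch {batch_id} was already written to the log")


if __name__ == "__main__":
    log = ToyMetadataLog(tempfile.mkdtemp())
    commit_batch(log, 0, "files-for-batch-0")
    print(log.get_latest())
```

The point of raising in both places is that a caller can never continue from stale or partially written state; whether Spark should throw the same kind of error is a decision for the actual patch.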