[jira] [Assigned] (SPARK-26425) Add more constraint checks in file streaming source to avoid checkpoint corruption

2020-09-16 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-26425:


Assignee: Jungtaek Lim  (was: Tathagata Das)

> Add more constraint checks in file streaming source to avoid checkpoint 
> corruption
> --
>
> Key: SPARK-26425
> URL: https://issues.apache.org/jira/browse/SPARK-26425
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Tathagata Das
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.1.0
>
>
> Two issues observed in production. 
> - HDFSMetadataLog.getLatest() tries to read older versions when it is not 
> able to read the latest listed version file. Not sure why this was done but 
> this should not be done. If the latest listed file is not readable, then 
> something is horribly wrong and we should fail rather than report an older 
> version as that can completely corrupt the checkpoint directory. 
> - FileStreamSource should check whether adding the a new batch to the 
> FileStreamSourceLog succeeded or not (similar to how StreamExecution checks 
> for the OffsetSeqLog)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26425) Add more constraint checks in file streaming source to avoid checkpoint corruption

2020-04-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26425:


Assignee: Tathagata Das  (was: Apache Spark)

> Add more constraint checks in file streaming source to avoid checkpoint 
> corruption
> --
>
> Key: SPARK-26425
> URL: https://issues.apache.org/jira/browse/SPARK-26425
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Major
>
> Two issues observed in production. 
> - HDFSMetadataLog.getLatest() tries to read older versions when it is not 
> able to read the latest listed version file. Not sure why this was done but 
> this should not be done. If the latest listed file is not readable, then 
> something is horribly wrong and we should fail rather than report an older 
> version as that can completely corrupt the checkpoint directory. 
> - FileStreamSource should check whether adding the a new batch to the 
> FileStreamSourceLog succeeded or not (similar to how StreamExecution checks 
> for the OffsetSeqLog)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26425) Add more constraint checks in file streaming source to avoid checkpoint corruption

2020-04-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26425:


Assignee: Apache Spark  (was: Tathagata Das)

> Add more constraint checks in file streaming source to avoid checkpoint 
> corruption
> --
>
> Key: SPARK-26425
> URL: https://issues.apache.org/jira/browse/SPARK-26425
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Tathagata Das
>Assignee: Apache Spark
>Priority: Major
>
> Two issues observed in production. 
> - HDFSMetadataLog.getLatest() tries to read older versions when it is not 
> able to read the latest listed version file. Not sure why this was done but 
> this should not be done. If the latest listed file is not readable, then 
> something is horribly wrong and we should fail rather than report an older 
> version as that can completely corrupt the checkpoint directory. 
> - FileStreamSource should check whether adding the a new batch to the 
> FileStreamSourceLog succeeded or not (similar to how StreamExecution checks 
> for the OffsetSeqLog)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org