[ 
https://issues.apache.org/jira/browse/SPARK-55058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jerry Zheng updated SPARK-55058:
--------------------------------
    Description: The {{metadata}} file holds the streaming query ID and should 
exist whenever the commit and offset files are non-empty. If it is missing, 
a sink such as DeltaSink, which uses the streaming query ID to deduplicate 
commits for the same batch, can write duplicates and produce incorrect 
results downstream. If the metadata file isn’t there but the commit and offset 
files are, we should throw an error, as the checkpoint is in an inconsistent state.  
(was: The {{metadata}} file holds the streaming query ID, and should be 
existent if the commit and offset files are non-empty. This file not existing 
will result in duplicates and incorrectness downstream if using exactly-once 
sinks like DeltaSink which uses the streaming query ID to dedup commits for the 
same batch. If the metadata file isn’t there, but the commit and offset files 
are there, we should throw an error as the checkpoint is in an inconsistent 
state.)

> Throw an error if the /metadata file is not present, but offset or commit 
> directories are non-empty
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-55058
>                 URL: https://issues.apache.org/jira/browse/SPARK-55058
>             Project: Spark
>          Issue Type: Task
>          Components: Structured Streaming
>    Affects Versions: 4.2.0
>            Reporter: Jerry Zheng
>            Priority: Major
>
> The {{metadata}} file holds the streaming query ID and should exist whenever 
> the commit and offset files are non-empty. If it is missing, a sink such as 
> DeltaSink, which uses the streaming query ID to deduplicate commits for the 
> same batch, can write duplicates and produce incorrect results downstream. 
> If the metadata file isn’t there but the commit and offset files are, we 
> should throw an error, as the checkpoint is in an inconsistent state.
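
The proposed check can be illustrated with a minimal sketch. This is not
Spark's implementation: it is written in Python against a local filesystem,
whereas Spark would go through its own checkpoint file APIs. The directory
names offsets and commits, the function name validate_checkpoint, and the
exception InconsistentCheckpointError are assumptions made for illustration.
Following the issue title, the sketch raises when either directory is
non-empty and the metadata file is absent.

```python
from pathlib import Path


class InconsistentCheckpointError(RuntimeError):
    """Hypothetical error type for this sketch; not a Spark class."""


def _is_non_empty_dir(path: Path) -> bool:
    # A directory that does not exist is treated the same as an empty one.
    return path.is_dir() and any(path.iterdir())


def validate_checkpoint(checkpoint_root: str) -> None:
    """Raise if progress files exist but the metadata file does not.

    Assumed layout (illustrative): <root>/metadata, <root>/offsets/,
    <root>/commits/.
    """
    root = Path(checkpoint_root)
    has_metadata = (root / "metadata").is_file()
    has_progress = any(
        _is_non_empty_dir(root / name) for name in ("offsets", "commits")
    )
    if has_progress and not has_metadata:
        raise InconsistentCheckpointError(
            f"Checkpoint {root} has offset or commit files but no metadata "
            "file, so the streaming query ID cannot be recovered."
        )
```

A brand-new checkpoint (no metadata, no offsets, no commits) passes the check,
so a query starting for the first time is unaffected; only a checkpoint that
already records progress without a query ID is rejected.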



