[ https://issues.apache.org/jira/browse/HUDI-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-6416:
---------------------------------
    Labels: pull-request-available  (was: )

> Completion Markers for handling spark retries
> ---------------------------------------------
>
>                 Key: HUDI-6416
>                 URL: https://issues.apache.org/jira/browse/HUDI-6416
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Balajee Nagasubramaniam
>            Priority: Major
>              Labels: pull-request-available
>
> During Spark stage retries, the Spark driver may have all the information it
> needs to reconcile the commit and proceed with the next steps, while a stray
> executor may still be writing to a data file and complete later (before the
> JVM exits).
> Extra files left on the dataset, excluded from the reconcile commit step,
> could show up as a data quality issue (duplicate records) for query engines.
> This change introduces completion markers, which try to prevent the dataset
> from experiencing data quality issues in such corner cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)