[jira] [Updated] (HUDI-6416) Completion Markers for handling spark retries
[ https://issues.apache.org/jira/browse/HUDI-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-6416:
--------------------------------------
    Epic Link: HUDI-7967

> Completion Markers for handling spark retries
> ---------------------------------------------
>
>                 Key: HUDI-6416
>                 URL: https://issues.apache.org/jira/browse/HUDI-6416
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Balajee Nagasubramaniam
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>
> During Spark stage retries, the Spark driver may already have all the
> information it needs to reconcile the commit and proceed with the next
> steps, while a stray executor is still writing to a data file and finishes
> later (before the JVM exits).
> Such extra files, left on the dataset but excluded from the reconcile
> commit step, could show up as a data quality issue for query engines in the
> form of duplicate records.
> This change introduces completion markers, which try to prevent the dataset
> from experiencing data quality issues in such corner-case scenarios.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
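The race described in the issue can be illustrated with a small in-memory model. This is only a sketch of one plausible reading of "completion markers" and is not taken from the Hudi patch: the `Table` class, `finish_write`, `reconcile`, and the file names are all hypothetical, and the real implementation may differ substantially.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy stand-in for a dataset directory (hypothetical, not a Hudi class)."""
    data_files: set = field(default_factory=set)
    completion_markers: set = field(default_factory=set)
    markers_open: bool = True  # False once the driver has reconciled the commit


def finish_write(table: Table, path: str) -> bool:
    """Executor side: the data file is written, then a completion marker is attempted."""
    table.data_files.add(path)
    if not table.markers_open:
        # Assumed behavior: the commit was already reconciled, so this write
        # is a stray and the executor removes its own file.
        table.data_files.discard(path)
        return False
    table.completion_markers.add(path)
    return True


def reconcile(table: Table, committed: set) -> None:
    """Driver side: keep only committed files, then stop accepting markers."""
    table.data_files &= committed
    table.completion_markers &= committed
    table.markers_open = False


table = Table()
finish_write(table, "f1_retry.parquet")          # attempt that the driver commits
reconcile(table, {"f1_retry.parquet"})           # driver reconciles the commit
late = finish_write(table, "f1_stray.parquet")   # stray executor finishes afterwards
```

In this model the stray write is rejected and only the committed file remains, so a query engine listing the table would not see the duplicate. On real storage the check and the marker creation would need to be atomic, or be backed by a later cleanup, which this sketch does not attempt to model.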
[jira] [Updated] (HUDI-6416) Completion Markers for handling spark retries
[ https://issues.apache.org/jira/browse/HUDI-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-6416:
--------------------------------------
    Status: In Progress  (was: Open)
[jira] [Updated] (HUDI-6416) Completion Markers for handling spark retries
[ https://issues.apache.org/jira/browse/HUDI-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-6416:
--------------------------------------
    Sprint: 2024/06/17-30
[jira] [Updated] (HUDI-6416) Completion Markers for handling spark retries
[ https://issues.apache.org/jira/browse/HUDI-6416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-6416:
---------------------------------
    Labels: pull-request-available  (was: )