nbalajee commented on code in PR #9035:
URL: https://github.com/apache/hudi/pull/9035#discussion_r1258566304


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java:
##########
@@ -612,6 +612,20 @@ public class HoodieWriteConfig extends HoodieConfig {
       .sinceVersion("0.10.0")
       .withDocumentation("File Id Prefix provider class, that implements `org.apache.hudi.fileid.FileIdPrefixProvider`");
 
+  public static final ConfigProperty<String> ENFORCE_COMPLETION_MARKER_CHECKS = ConfigProperty
+      .key("hoodie.markers.enforce.completion.checks")
+      .defaultValue("false")
+      .sinceVersion("0.10.0")
+      .withDocumentation("Prevents the creation of duplicate data files, when multiple spark tasks are racing to "
+          + "create data files and a completed data file is already present");
+
+  public static final ConfigProperty<String> ENFORCE_FINALIZE_WRITE_CHECK = ConfigProperty
+      .key("hoodie.markers.enforce.finalize.write.check")
+      .defaultValue("false")
+      .sinceVersion("0.10.0")
+      .withDocumentation("When WriteStatus obj is lost due to engine related failures, then recomputing would involve "
+          + "re-writing all the data files. When this check is enabled it would block the rewrite from happening.");

Review Comment:
   Write statuses are persisted to disk. In large jobs, if a large number of
the containers running the code are lost, the RDD blocks persisted in the
local storage of those containers are lost with them. In this scenario, the
Spark execution engine tries to rebuild the lost blocks by retrying the
stage/tasks.
   
   I have renamed the completion marker config to "Optimize Task retries with
Markers". Similarly, this one could be renamed to "Fail retries after finalize
markers", with the key "hoodie.markers.fail.retries.after.finalize".
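
   To make the intended semantics of the proposed name concrete, here is a
rough sketch. It is not the code in this PR: `FinalizeRetryGuard` and its
methods are hypothetical names, and an in-memory set stands in for the
finalize markers that would actually live on storage. The key string is the
one proposed above and may still change.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration only; not Hudi's implementation. */
public class FinalizeRetryGuard {
  // Key proposed in this comment; the final name is still under discussion.
  public static final String FAIL_RETRIES_AFTER_FINALIZE_KEY =
      "hoodie.markers.fail.retries.after.finalize";

  private final boolean enabled;
  // Stand-in for finalize markers; a real check would consult storage.
  private final Set<String> finalizedInstants = new HashSet<>();

  public FinalizeRetryGuard(boolean enabled) {
    this.enabled = enabled;
  }

  /** Records that finalize-write has completed for the given instant. */
  public void markFinalized(String instantTime) {
    finalizedInstants.add(instantTime);
  }

  /**
   * Called at the start of a (possibly retried) write task. Fails fast when
   * the check is enabled and the instant was already finalized, so the engine
   * does not re-write all data files just to rebuild lost write statuses.
   */
  public void checkWriteAllowed(String instantTime) {
    if (enabled && finalizedInstants.contains(instantTime)) {
      throw new IllegalStateException(
          "Write retried after finalize for instant " + instantTime);
    }
  }
}
```

   With the check disabled (the default in the diff above), a retried task
passes through and the data files are rewritten as they are today.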
   
   Thoughts? @danny0405 @nsivabalan 


