nbalajee commented on code in PR #9035: URL: https://github.com/apache/hudi/pull/9035#discussion_r1258658841
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java:
##########
@@ -138,9 +139,35 @@ protected Path makeNewFilePath(String partitionPath, String fileName) {
    *
    * @param partitionPath Partition path
    */
-  protected void createMarkerFile(String partitionPath, String dataFileName) {
-    WriteMarkersFactory.get(config.getMarkersType(), hoodieTable, instantTime)
-        .create(partitionPath, dataFileName, getIOType(), config, fileId, hoodieTable.getMetaClient().getActiveTimeline());
+  protected void createInProgressMarkerFile(String partitionPath, String dataFileName, String markerInstantTime) {
+    WriteMarkers writeMarkers = WriteMarkersFactory.get(config.getMarkersType(), hoodieTable, instantTime);
+    if (!writeMarkers.doesMarkerDirExist()) {
+      throw new HoodieIOException(String.format("Marker root directory absent : %s/%s (%s)",
+          partitionPath, dataFileName, markerInstantTime));
+    }
+    if (config.enforceFinalizeWriteCheck()
+      && writeMarkers.markerExists(writeMarkers.getCompletionMarkerPath("", "FINALIZE_WRITE", markerInstantTime, IOType.CREATE))) {

Review Comment:
Consider a job that has:

(a) passed through the write stage and created the data files,
(b) started a commit and finalized the writes (keeping the files that are part of the write statuses and removing the duplicate files), and
(c) while updating the MDT for RLI (or before updating the MDT), found that the write status information (RDD blocks that are also persisted in the containers' local storage) was lost due to lost/failed containers.

With this flag turned on, the job would be forced to fail instead of retrying the tasks/stages to recreate the data files associated with the missing write statuses.
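To make the concern concrete, here is a minimal, self-contained sketch of the decision being discussed. It is not Hudi code: `MarkerStore`, `Outcome`, and `onTaskRetry` are hypothetical names invented for illustration, and the sketch assumes that the truncated `if` branch in the diff ends by failing the write when the FINALIZE_WRITE completion marker is present.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Illustrative model of the finalize-write check; none of these types are Hudi APIs.
 */
public class FinalizeWriteCheckSketch {

  /** Hypothetical stand-in for the marker directory: records instants whose writes were finalized. */
  static final class MarkerStore {
    private final Set<String> finalizedInstants = new HashSet<>();

    void markFinalized(String instantTime) {
      finalizedInstants.add(instantTime);
    }

    boolean isFinalized(String instantTime) {
      return finalizedInstants.contains(instantTime);
    }
  }

  enum Outcome { RECREATE_DATA_FILE, FAIL_JOB }

  /**
   * Models a task re-attempt that tries to create a new data file.
   * Assumption: with the flag on, a retry after finalize fails the job;
   * otherwise the retry is allowed to recreate the data file.
   */
  static Outcome onTaskRetry(MarkerStore markers, String instantTime, boolean enforceFinalizeWriteCheck) {
    if (enforceFinalizeWriteCheck && markers.isFinalized(instantTime)) {
      return Outcome.FAIL_JOB;
    }
    return Outcome.RECREATE_DATA_FILE;
  }

  public static void main(String[] args) {
    MarkerStore markers = new MarkerStore();
    String instant = "20230710120000"; // made-up instant time

    // Step (a): a retry during the write stage, before finalize, is always allowed.
    System.out.println(onTaskRetry(markers, instant, true));  // RECREATE_DATA_FILE

    // Step (b): the commit has finalized its writes.
    markers.markFinalized(instant);

    // Step (c): write statuses are lost and a task is retried.
    System.out.println(onTaskRetry(markers, instant, true));  // FAIL_JOB
    System.out.println(onTaskRetry(markers, instant, false)); // RECREATE_DATA_FILE
  }
}
```

The last two calls show the trade-off raised above: the same lost-container scenario either fails the job or recomputes the missing files, depending only on the flag.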