yihua commented on issue #6686:
URL: https://github.com/apache/hudi/issues/6686#issuecomment-1254272868

   @asankadarshana007  When enabled, the consistency check runs while invalid 
data files are removed: (1) check that all paths to delete exist, (2) delete 
them, (3) wait for all paths to disappear, to account for eventual consistency.  
This logic is not needed on storage with strong consistency.  Because the 
invalid data files are now determined from the markers, a marker can be created 
before its data file has started being written, so check (1) fails even though 
this situation is okay.  Given that there is currently no use case for eventual 
consistency, we don't maintain this logic.
   
   Let me know if turning off `hoodie.consistency.check.enabled` solves your 
problem.  You can close the ticket if all is good.
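
As a minimal sketch of turning the check off, the key can be set to `false` in the properties handed to the writer. The class and method names here are hypothetical, and how the properties reach the writer (for example as datasource options) depends on your setup; only the config key comes from this thread.

```java
import java.util.Properties;

// Hypothetical helper, not part of Hudi: collects writer properties in one place.
class ConsistencyCheckConfig {

    // Returns properties with the consistency check disabled.
    static Properties writerProps() {
        Properties props = new Properties();
        // Config key taken from the comment above; "false" turns the check off.
        props.setProperty("hoodie.consistency.check.enabled", "false");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(writerProps());
    }
}
```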
   
   ```
         if (!invalidDataPaths.isEmpty()) {
           LOG.info("Removing duplicate data files created due to task retries before committing. Paths=" + invalidDataPaths);
           Map<String, List<Pair<String, String>>> invalidPathsByPartition = invalidDataPaths.stream()
               .map(dp -> Pair.of(new Path(basePath, dp).getParent().toString(), new Path(basePath, dp).toString()))
               .collect(Collectors.groupingBy(Pair::getKey));
   
           // Ensure all files in delete list is actually present. This is mandatory for an eventually consistent FS.
           // Otherwise, we may miss deleting such files. If files are not found even after retries, fail the commit
           if (consistencyCheckEnabled) {
             // This will either ensure all files to be deleted are present.
             waitForAllFiles(context, invalidPathsByPartition, FileVisibility.APPEAR);
           }
   
           // Now delete partially written files
           context.setJobStatus(this.getClass().getSimpleName(), "Delete all partially written files: " + config.getTableName());
           deleteInvalidFilesByPartitions(context, invalidPathsByPartition);
   
           // Now ensure the deleted files disappear
           if (consistencyCheckEnabled) {
             // This will either ensure all files to be deleted are absent.
             waitForAllFiles(context, invalidPathsByPartition, FileVisibility.DISAPPEAR);
           }
         }
   ```
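
To illustrate why check (1) can fail when a marker exists but its data file was never written, here is a rough standalone sketch of a bounded visibility wait. It is not Hudi's `waitForAllFiles`: the class `VisibilityWait`, its methods, and the retry parameters are all hypothetical, and it polls the local filesystem through `java.nio.file` rather than a Hadoop `FileSystem`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; names and retry policy are assumptions.
class VisibilityWait {
    enum Visibility { APPEAR, DISAPPEAR }

    // Polls until every path matches the target visibility or retries run out.
    static boolean waitForAll(List<Path> paths, Visibility target, int maxRetries, long sleepMs) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            boolean allMatch = paths.stream()
                .allMatch(p -> Files.exists(p) == (target == Visibility.APPEAR));
            if (allMatch) {
                return true;
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    // Runs the three steps on a real temp file, plus the "marker only" case.
    static boolean[] demo() {
        try {
            Path dataFile = Files.createTempFile("invalid-data-file", ".parquet");
            // Stands in for a path derived from a marker whose data file was never written.
            Path neverWritten = dataFile.resolveSibling("never-written-" + System.nanoTime() + ".parquet");

            boolean appeared = waitForAll(Collections.singletonList(dataFile), Visibility.APPEAR, 2, 10);
            Files.delete(dataFile);
            boolean disappeared = waitForAll(Collections.singletonList(dataFile), Visibility.DISAPPEAR, 2, 10);
            boolean missingAppeared = waitForAll(Collections.singletonList(neverWritten), Visibility.APPEAR, 2, 10);
            return new boolean[] {appeared, disappeared, missingAppeared};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

In this sketch the APPEAR wait on the never-written path exhausts its retries and returns false, which corresponds to the failure of check (1) described above.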


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

