voonhous commented on PR #9922:
URL: https://github.com/apache/hudi/pull/9922#issuecomment-1782187627

   > Thanks for the fix. From a high level, I kind of think we should avoid relying on Spark mechanisms for any rollback/cleaning improvement here; it's hacky to maintain and not tenable for all engines.
   
   Agreed. However, if we want to address this, we would need a mechanism for ignoring corrupted files created by zombie tasks, which is not trivial to implement at this stage.
   
   In the most vanilla Hudi deployment (no MDT), a "VALID" base file is basically the file with the largest timestamp in its file group, provided that file group is not part of any replacecommit.
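
A minimal Python sketch of that vanilla rule, for illustration only: it is not Hudi's file-system view code, and `BaseFile`, `latest_valid_base_files` and the sample file names and instant times are hypothetical. Instant times are assumed to be fixed-width digit strings, so plain string comparison matches chronological order.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class BaseFile:
    """Hypothetical stand-in for a base file on storage."""
    file_group_id: str
    instant_time: str  # assumed fixed-width, e.g. "20231027101500000"
    path: str


def latest_valid_base_files(
    files: Iterable[BaseFile], replaced_file_groups: Set[str]
) -> Dict[str, BaseFile]:
    """Pick, per file group, the base file with the largest instant time,
    skipping file groups that a replacecommit has replaced."""
    latest: Dict[str, BaseFile] = {}
    for f in files:
        if f.file_group_id in replaced_file_groups:
            continue  # the whole file group was replaced, so none of its files count
        current = latest.get(f.file_group_id)
        # Fixed-width instant strings compare lexicographically in time order.
        if current is None or f.instant_time > current.instant_time:
            latest[f.file_group_id] = f
    return latest


files = [
    BaseFile("fg1", "20231026090000000", "fg1_old.parquet"),
    BaseFile("fg1", "20231027101500000", "fg1_new.parquet"),
    BaseFile("fg2", "20231027101500000", "fg2.parquet"),
]
valid = latest_valid_base_files(files, replaced_file_groups={"fg2"})
print({fg: f.path for fg, f in valid.items()})  # {'fg1': 'fg1_new.parquet'}
```

Nothing in this sketch looks at which task wrote a file, so a file left behind by a zombie task with the same file group and timestamp would be indistinguishable from the committed one.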
   
   If we want to change this at a high level, we will need to modify the heuristics for determining what counts as a "VALID" base file.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.