nsivabalan commented on code in PR #10915:
URL: https://github.com/apache/hudi/pull/10915#discussion_r1571247566


##########
hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java:
##########
@@ -114,6 +114,24 @@ public Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> extractCDCFileSplits() {
     ValidationUtils.checkState(commits != null, "Empty commits");
 
     Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> fgToCommitChanges = new HashMap<>();
+

Review Comment:
   No. What I am saying is that HoodieCDCExtractor.java has an inherent bug, which I am trying to fix here.
   
   Let's say the timeline is as follows:
   
   dc1
   dc2
   rc3
   dc4
   clean5 // cleans up the data files from dc1 and dc2, since they were replaced by rc3
   
   On master, HoodieCDCExtractor goes over the commit metadata in the active timeline and tries to deduce the base files for the log files it finds. In this case, all data files from dc1 and dc2 have already been deleted by clean5, but HoodieCDCExtractor still tries to parse them, so we might hit a file-not-found error.
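
To make the scenario concrete, here is a minimal, self-contained sketch of the failure mode. It is a toy model, not Hudi code: the class `CdcTimelineSketch`, its methods, and the file names are all hypothetical, and it assumes for illustration that dc1 and dc2 each wrote one data file to a file group that rc3 later replaced. It only shows that walking commit metadata without checking what the cleaner removed yields references to files that no longer exist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the timeline above; none of these names are Hudi APIs.
class CdcTimelineSketch {

  // Data files recorded in the commit metadata of each instant, in timeline order.
  static Map<String, List<String>> timeline() {
    Map<String, List<String>> t = new LinkedHashMap<>();
    t.put("dc1", Arrays.asList("fg1_dc1.parquet"));
    t.put("dc2", Arrays.asList("fg1_dc2.parquet"));
    t.put("rc3", Arrays.asList("fg2_rc3.parquet")); // assumed to replace file group fg1
    t.put("dc4", Arrays.asList("fg2_dc4.parquet"));
    return t;
  }

  // Files still on storage after clean5 removed the files replaced by rc3.
  static Set<String> storageAfterClean5() {
    return new HashSet<>(Arrays.asList("fg2_rc3.parquet", "fg2_dc4.parquet"));
  }

  // Walks commit metadata only and collects referenced files that are no longer on storage.
  static List<String> missingFiles(Map<String, List<String>> timeline, Set<String> storage) {
    List<String> missing = new ArrayList<>();
    for (List<String> files : timeline.values()) {
      for (String file : files) {
        if (!storage.contains(file)) {
          missing.add(file); // a reader opening this file would fail
        }
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    System.out.println(missingFiles(timeline(), storageAfterClean5()));
  }
}
```

In this toy model the files from dc1 and dc2 are the ones reported as missing, which corresponds to the file-not-found case described above.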
   
   
   
   


