nsivabalan commented on code in PR #10915:
URL: https://github.com/apache/hudi/pull/10915#discussion_r1571247566
########## hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java: ##########
@@ -114,6 +114,24 @@ public Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> extractCDCFileSplits() {
     ValidationUtils.checkState(commits != null, "Empty commits");
     Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> fgToCommitChanges = new HashMap<>();
+

Review Comment:
Nope. I am saying that HoodieCDCExtractor.java has an inherent bug, which I am trying to fix here. Let's say the timeline is as follows (dc = deltacommit, rc = replacecommit):

dc1
dc2
rc3
dc4
clean5  // cleans up the data files from dc1 and dc2, since rc3 replaced them

On master, HoodieCDCExtractor goes over the commit metadata in the active timeline and tries to deduce the base files for the log files it finds. In this case, all data files from dc1 and dc2 have already been deleted by clean5, but HoodieCDCExtractor still tries to parse the data files from dc1 and dc2, so on master we can hit a file-not-found error.
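To make the failure mode concrete, here is a minimal Java sketch that replays the example timeline and lists the data files that a naive pass over every commit in the active timeline would try to read even though clean5 has already deleted them. It is a toy model under stated assumptions: the `Instant` record, the `CdcTimelineSketch` class, and all file names are hypothetical and are not Hudi APIs, and rc3's replacement of file group fg1 is modeled only through the files clean5 deletes. It illustrates the problem, not the fix in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model (not a Hudi API): an instant writes some files and may delete others.
record Instant(String name, List<String> written, List<String> deleted) {}

final class CdcTimelineSketch {

  /** Replays the timeline in order and returns the data files still present on storage. */
  static Set<String> filesOnStorage(List<Instant> timeline) {
    Set<String> live = new LinkedHashSet<>();
    for (Instant instant : timeline) {
      live.addAll(instant.written());
      live.removeAll(instant.deleted());
    }
    return live;
  }

  /**
   * Mimics a naive pass that walks every instant on the active timeline and assumes
   * the files it wrote are still readable. Returns "instant:file" for each file such
   * a pass would fail to find.
   */
  static List<String> missingFilesForNaivePass(List<Instant> timeline) {
    Set<String> live = filesOnStorage(timeline);
    List<String> missing = new ArrayList<>();
    for (Instant instant : timeline) {
      for (String file : instant.written()) {
        if (!live.contains(file)) {
          missing.add(instant.name() + ":" + file);
        }
      }
    }
    return missing;
  }

  /** The timeline from the comment; file names are hypothetical. */
  static List<Instant> exampleTimeline() {
    return List.of(
        new Instant("dc1", List.of("fg1-base-dc1.parquet"), List.of()),
        new Instant("dc2", List.of("fg1-log-dc2.log"), List.of()),
        // rc3 rewrites fg1's data into a new file group fg2 (logically replacing fg1).
        new Instant("rc3", List.of("fg2-base-rc3.parquet"), List.of()),
        new Instant("dc4", List.of("fg2-log-dc4.log"), List.of()),
        // clean5 physically deletes the replaced fg1 files written by dc1 and dc2.
        new Instant("clean5", List.of(),
            List.of("fg1-base-dc1.parquet", "fg1-log-dc2.log")));
  }
}
```

With this timeline, `missingFilesForNaivePass(exampleTimeline())` reports the dc1 and dc2 files, which matches the file-not-found scenario described in the comment, while the files written by rc3 and dc4 are still on storage.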