kbuci commented on code in PR #18016:
URL: https://github.com/apache/hudi/pull/18016#discussion_r2886782962
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -241,6 +243,17 @@ private Stream<String> getPartitionsForInstants(HoodieInstant instant) {
} else {
HoodieCommitMetadata commitMetadata = hoodieTable.getActiveTimeline().readCommitMetadata(instant);
+ WriteOperationType operationType = commitMetadata.getOperationType();
+ if ((HoodieTimeline.COMMIT_ACTION.equals(instant.getAction()) && hoodieTable.getMetaClient().getTableType().equals(
+ HoodieTableType.COPY_ON_WRITE)) || (HoodieTimeline.DELTA_COMMIT_ACTION.equals(instant.getAction()) && hoodieTable.getMetaClient().getTableType().equals(
+ HoodieTableType.MERGE_ON_READ))) {
+ if (WriteOperationType.isUpsert(operationType) || WriteOperationType.isInsertWithoutReplace(operationType)) {
Review Comment:
I ended up removing the MOR deltacommit handling and limited this change to
COW.
While trying to figure out why a test was failing
(https://github.com/apache/hudi/pull/18016#issuecomment-3862359967), I looked a
bit further into the code, and I think MOR deltacommits (or at least write
operations) can actually create new base files during small file handling:
https://github.com/apache/hudi/blob/3dcf4c65b26179b50b70b1c59670823b74e03793/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/deltacommit/BaseSparkDeltaCommitActionExecutor.java#L78
So for the time being I just added a TODO comment explaining that, for MOR, we
would need to "parse" the deltacommit to handle these cases.
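
To illustrate the narrowed condition, here is a minimal, self-contained sketch. It is not the code in the PR: the enums and method names are hypothetical stand-ins for `HoodieTableType`, the timeline action constants, and `WriteOperationType`, and which operations count as "upsert" or "insert without replace" is an assumption made for the example.

```java
public class CowOnlyGateSketch {
    // Hypothetical stand-ins for Hudi's table type, instant action, and write operation.
    enum TableType { COPY_ON_WRITE, MERGE_ON_READ }
    enum Action { COMMIT, DELTA_COMMIT }
    enum Operation { UPSERT, INSERT, INSERT_OVERWRITE }

    // Assumption: only UPSERT and INSERT are treated as upsert / insert-without-replace here.
    static boolean isUpsertOrPlainInsert(Operation op) {
        return op == Operation.UPSERT || op == Operation.INSERT;
    }

    // Returns true only for a COW commit whose operation is an upsert or plain insert.
    static boolean qualifiesForCowBranch(TableType tableType, Action action, Operation op) {
        // TODO (MOR): a deltacommit may create new base files during small file
        // handling, so it would have to be inspected before it could qualify.
        boolean cowCommit = tableType == TableType.COPY_ON_WRITE && action == Action.COMMIT;
        return cowCommit && isUpsertOrPlainInsert(op);
    }

    public static void main(String[] args) {
        // COW commit with an upsert: qualifies.
        System.out.println(qualifiesForCowBranch(
            TableType.COPY_ON_WRITE, Action.COMMIT, Operation.UPSERT));
        // MOR deltacommit with an upsert: excluded for now.
        System.out.println(qualifiesForCowBranch(
            TableType.MERGE_ON_READ, Action.DELTA_COMMIT, Operation.UPSERT));
        // COW commit with a replace-style operation: excluded.
        System.out.println(qualifiesForCowBranch(
            TableType.COPY_ON_WRITE, Action.COMMIT, Operation.INSERT_OVERWRITE));
    }
}
```

The MOR case is left as an explicit `false` rather than guessed at, which mirrors the TODO: until the deltacommit is inspected for newly created base files, it is treated as not qualifying.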