prashantwason commented on code in PR #18092:
URL: https://github.com/apache/hudi/pull/18092#discussion_r2843936490
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/PartitionAwareClusteringPlanStrategy.java:
##########
@@ -122,7 +122,10 @@ protected Pair<Stream<HoodieClusteringGroup>, Boolean>
buildClusteringGroupsForP
* Return list of partition paths to be considered for clustering.
*/
public Pair<List<String>, List<String>>
filterPartitionPaths(HoodieWriteConfig writeConfig, List<String> partitions) {
- return ClusteringPlanPartitionFilter.filter(partitions, getWriteConfig());
+ Pair<List<String>, List<String>> result =
ClusteringPlanPartitionFilter.filter(partitions, getWriteConfig());
Review Comment:
Thanks for the review!
**Why sort here vs generateClusteringPlan():**
The sorting is intentionally placed in `filterPartitionPaths()` rather than
near the limit in `generateClusteringPlan()` because:
- `filterPartitionPaths()` is the source of partition paths for clustering -
sorting here ensures consistency regardless of how partitions are used
downstream
- The `clusteringMaxNumGroups` limit in `generateClusteringPlan()` operates
on clustering groups (not partitions directly), so sorting partitions there
would be too late
- Sorting at the filter level is more defensive and ensures deterministic
behavior in all code paths that consume the filtered partitions
**Compaction:**
Looking at the compaction code, the partition-aware strategies already
handle sorting:
- `BoundedPartitionAwareCompactionStrategy.filterPartitionPaths()` sorts
with `Comparator.reverseOrder()` (line 73-74)
- `UnBoundedPartitionAwareCompactionStrategy.filterPartitionPaths()` also
sorts in reverse order
So compaction doesn't have this issue - the bounded strategies already
include sorting as part of their logic.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]