[ https://issues.apache.org/jira/browse/HUDI-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
satish updated HUDI-5318:
-------------------------
    Fix Version/s: 0.13.0
                       (was: 0.12.2)

> Clustering scheduling now will list all partitions in table when PARTITION_SELECTED is set
> ------------------------------------------------------------------------------------------
>
>                 Key: HUDI-5318
>                 URL: https://issues.apache.org/jira/browse/HUDI-5318
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: clustering
>            Reporter: Qijun Fu
>            Assignee: Qijun Fu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> Currently, PartitionAwareClusteringPlanStrategy lists all partitions in the
> table whether or not PARTITION_SELECTED is set. Listing all partitions in the
> dataset is a very expensive operation when the number of partitions is huge.
> We can skip listing all partitions when PARTITION_SELECTED is set, so that
> clustering scheduling benefits from partition pruning.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
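
The proposal in the description can be illustrated with a minimal, self-contained sketch. This is not Hudi's actual code: the class `PartitionPruningSketch`, the method `candidatePartitions`, the comma-separated format assumed for the PARTITION_SELECTED value, and the `Supplier` standing in for the expensive full partition listing are all hypothetical names and assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; not the real PartitionAwareClusteringPlanStrategy.
final class PartitionPruningSketch {

    // selectedPartitions: assumed comma-separated value of PARTITION_SELECTED,
    //                     or null/blank when the option is not set.
    // listAllPartitions:  stand-in for the expensive full-table partition listing.
    static List<String> candidatePartitions(String selectedPartitions,
                                            Supplier<List<String>> listAllPartitions) {
        if (selectedPartitions == null || selectedPartitions.trim().isEmpty()) {
            // Nothing was selected, so the full listing cannot be avoided.
            return listAllPartitions.get();
        }
        // Partitions were selected: use them directly and never call the full listing.
        List<String> result = new ArrayList<>();
        for (String partition : selectedPartitions.split(",")) {
            String trimmed = partition.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] fullListings = {0}; // counts how often the expensive path runs
        Supplier<List<String>> fullListing = () -> {
            fullListings[0]++;
            return Arrays.asList("2022/12/01", "2022/12/02", "2022/12/03");
        };

        System.out.println(candidatePartitions("2022/12/02", fullListing));
        System.out.println("full listings so far: " + fullListings[0]);
        System.out.println(candidatePartitions(null, fullListing));
        System.out.println("full listings so far: " + fullListings[0]);
    }
}
```

The counter in `main` is only there to make visible that the full listing is skipped whenever a selection is supplied, which is the saving the issue is after.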