ad1happy2go opened a new pull request, #18968: URL: https://github.com/apache/hudi/pull/18968
### Describe the issue this Pull Request addresses

Consistent hashing bucket-index clustering (bucket resizing) fails on **non-partitioned** tables. For a non-partitioned table, the partition path stored in the clustering group metadata is an empty string (`""`). `SingleSparkJobConsistentHashingExecutionStrategy` validated the partition with a "not null **or empty**" guard and threw `IllegalArgumentException: Partition should not be null or empty` before any clustering work could run. As a result, split/merge resizing was impossible on non-partitioned tables.

JIRA: HUDI-18161

### Summary and Changelog

- Relax the partition guard in `SingleSparkJobConsistentHashingExecutionStrategy` (both the merge and split paths) from `!StringUtils.isNullOrEmpty(partition)` to `partition != null`. An empty partition path is valid for non-partitioned tables; a genuinely absent metadata key (`null`) is still rejected.
- Add a parameterized test `testResizingNonPartitioned` in `TestSparkConsistentBucketClustering`, mirroring the existing `testResizing`. It covers split and merge resizing on a non-partitioned table across the single-job and multi-job execution strategies and the row-writer on/off paths. A `setup(..., boolean nonPartitioned)` overload configures the non-partition key generator and an empty partition-path field.

### Impact

Consistent hashing clustering now works on non-partitioned tables. There is no behavior change for partitioned tables: when the partition path is non-empty (as it is for every partitioned table), the guard evaluates identically and the code path is unchanged. No public API or config changes.

### Risk Level

low

Partitioned-table behavior is byte-for-byte identical (the relaxed guard only differs when the partition path is the empty string).
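To make the guard change concrete, here is a minimal, self-contained Java sketch comparing the old and new checks on three inputs. It does not use Hudi's classes: `isNullOrEmpty` is a local stand-in assumed to match the semantics of the `StringUtils.isNullOrEmpty` call named above (true for `null` or a zero-length string), and `PartitionGuardSketch`, `oldGuardAccepts`, `newGuardAccepts`, and the sample path `2024/01/01` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class PartitionGuardSketch {

  // Local stand-in for the "null or empty" check named in the PR.
  // Assumed semantics: true for null or a zero-length string.
  static boolean isNullOrEmpty(String s) {
    return s == null || s.isEmpty();
  }

  // Old guard (!isNullOrEmpty(partition)): rejects both a missing
  // partition (null) and the empty string used by non-partitioned tables.
  static boolean oldGuardAccepts(String partition) {
    return !isNullOrEmpty(partition);
  }

  // New guard (partition != null): rejects only a genuinely absent value.
  static boolean newGuardAccepts(String partition) {
    return partition != null;
  }

  public static void main(String[] args) {
    // A partitioned path, the non-partitioned empty path, and a missing key.
    List<String> samples = Arrays.asList("2024/01/01", "", null);
    for (String p : samples) {
      String label = (p == null) ? "null" : "\"" + p + "\"";
      System.out.println(label + " -> old: " + oldGuardAccepts(p)
          + ", new: " + newGuardAccepts(p));
    }
  }
}
```

Running `main` prints one line per sample. Only the empty-string case flips from rejected to accepted; a non-empty path is accepted and `null` is rejected under both guards, which is why partitioned tables see no change.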
Verified via the new parameterized test and an end-to-end `spark-shell` run on Spark 4.0.2 against a non-partitioned MOR table with a consistent-hashing bucket index: a follow-up write triggered inline split clustering through `SingleSparkJobConsistentHashingExecutionStrategy`, increasing the bucket count from 2 to 4 with all records preserved.

### Documentation Update

none

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
