vamshikrishnakyatham opened a new pull request, #14138:
URL: https://github.com/apache/hudi/pull/14138
### Describe the issue this Pull Request addresses
When downgrading a Hudi table from version 9 to version 8, column stats V2
partitions are correctly deleted, but partition stats partitions are left
behind even though they should also be deleted. This happens because:
1. In some cases, partition_stats exists as a metadata partition but has no
entry in the index definitions file (`.hoodie/.index_defs/index.json`).
2. Even after the column stats partition is dropped, its index definition
remains in index.json.
After the downgrade, this leaves the table in an inconsistent state:
- partition_stats directory still present in .hoodie/metadata/
- partition_stats still listed in hoodie.properties metadata partitions
- Stale index definitions in index.json
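To make the three symptoms concrete, here is a hypothetical post-downgrade check. It is a sketch under stated assumptions, not part of this PR: the class and method names are invented, the inputs are plain string sets rather than anything read from the file system, and it assumes hoodie.properties stores metadata partitions as a comma-separated list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical check for leftovers after a 9 -> 8 downgrade.
 * All names are illustrative; inputs are plain collections.
 */
public class DowngradeStateCheck {

  /** Parses an assumed comma-separated metadata-partitions value from hoodie.properties. */
  static Set<String> parsePartitions(String csv) {
    Set<String> result = new TreeSet<>();
    for (String s : csv.split(",")) {
      if (!s.isBlank()) {
        result.add(s.trim());
      }
    }
    return result;
  }

  /** Returns one message per place where a partition that should be gone still appears. */
  static List<String> findLeftovers(Set<String> dropped,
                                    Set<String> metadataDirs,
                                    Set<String> propertiesPartitions,
                                    Set<String> indexDefinitions) {
    List<String> problems = new ArrayList<>();
    // Iterate in sorted order so the report is deterministic.
    for (String p : new TreeSet<>(dropped)) {
      if (metadataDirs.contains(p)) {
        problems.add(p + ": directory still present in .hoodie/metadata/");
      }
      if (propertiesPartitions.contains(p)) {
        problems.add(p + ": still listed in hoodie.properties");
      }
      if (indexDefinitions.contains(p)) {
        problems.add(p + ": stale definition in index.json");
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    // The buggy state described above: partition_stats survives, column_stats definition lingers.
    List<String> problems = findLeftovers(
        Set.of("column_stats", "partition_stats"),
        Set.of("files", "partition_stats"),
        parsePartitions("files,partition_stats"),
        Set.of("column_stats"));
    problems.forEach(System.out::println);
  }
}
```

With the fix applied, the same check run against a downgraded table would be expected to return an empty list.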
### Summary and Changelog
1. Enhanced `UpgradeDowngradeUtils.dropNonV1IndexPartitions()` to detect when
column stats V2 is being deleted during downgrade. If column stats V2 is in the
deletion list and partition stats exists among the metadata partitions (even
without an index definition), partition stats is now added to the deletion
list as well. This fixes the case where partition stats has no index
definition entry.
2. Enhanced `BaseHoodieWriteClient.dropIndex()` to also remove index
definitions for COLUMN_STATS and PARTITION_STATS partitions. Previously, only
secondary indexes and expression indexes had their definitions removed from
index.json; now, when column stats or partition stats is dropped (e.g., during
downgrade), its index definition is deleted as well.
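The two changes can be sketched as follows. This is a hypothetical, self-contained model, not the code in `UpgradeDowngradeUtils` or `BaseHoodieWriteClient`: the partition names, the `secondary_index_` and `expr_index_` prefixes, and the map standing in for index.json are all assumptions for illustration, and the V2 version check is assumed to have already happened before a partition reaches the deletion list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of the two downgrade fixes. Names, prefixes and
 * data shapes are assumptions, not Hudi's actual API.
 */
public class DowngradeCleanupSketch {

  // Assumed metadata partition names and prefixes.
  static final String COLUMN_STATS = "column_stats";
  static final String PARTITION_STATS = "partition_stats";
  static final String SECONDARY_INDEX_PREFIX = "secondary_index_";
  static final String EXPRESSION_INDEX_PREFIX = "expr_index_";

  /**
   * Fix 1: if column stats (already known to be V2) is scheduled for deletion
   * and partition_stats exists among the metadata partitions, schedule
   * partition_stats too, whether or not it has an index definition.
   */
  static List<String> expandDeletionList(List<String> toDelete, Set<String> metadataPartitions) {
    List<String> result = new ArrayList<>(toDelete);
    if (result.contains(COLUMN_STATS)
        && metadataPartitions.contains(PARTITION_STATS)
        && !result.contains(PARTITION_STATS)) {
      result.add(PARTITION_STATS);
    }
    return result;
  }

  /**
   * Fix 2: whether dropping a partition should also remove its index
   * definition. Before the fix, only the two prefix checks applied.
   */
  static boolean shouldRemoveIndexDefinition(String partition) {
    return partition.startsWith(SECONDARY_INDEX_PREFIX)
        || partition.startsWith(EXPRESSION_INDEX_PREFIX)
        || partition.equals(COLUMN_STATS)      // newly covered
        || partition.equals(PARTITION_STATS);  // newly covered
  }

  /** Removes definitions (keyed by partition name) for the dropped partitions. */
  static void dropIndexDefinitions(Map<String, String> indexDefs, List<String> dropped) {
    for (String partition : dropped) {
      if (shouldRemoveIndexDefinition(partition)) {
        indexDefs.remove(partition);
      }
    }
  }

  public static void main(String[] args) {
    // partition_stats exists in metadata but has no index definition entry.
    Set<String> metadataPartitions = Set.of("files", COLUMN_STATS, PARTITION_STATS);
    Map<String, String> indexDefs = new HashMap<>(Map.of(COLUMN_STATS, "colstats-v2-def"));

    List<String> toDelete = expandDeletionList(List.of(COLUMN_STATS), metadataPartitions);
    dropIndexDefinitions(indexDefs, toDelete);

    System.out.println(toDelete);   // [column_stats, partition_stats]
    System.out.println(indexDefs);  // {}
  }
}
```

Keeping the partition_stats expansion keyed off the metadata partition list, rather than the index definitions, is what lets the sketch handle the case where index.json has no partition_stats entry at all.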
### Impact
Users downgrading from table version 9 to 8 will no longer be left with a
stale partition_stats partition or stale index definitions in index.json.
### Risk Level
low
### Documentation Update
None; this is a bug fix.
### Contributor's checklist
- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable