suryaprasanna opened a new pull request, #17935:
URL: https://github.com/apache/hudi/pull/17935

   ### Describe the issue this Pull Request addresses
   
   The metadata table currently inherits its cleaning policy from the data 
table, which may not suit the metadata table's own workload. This PR 
introduces a dedicated cleaning policy config for the metadata table that can 
be set independently of the data table.
   
   ### Summary and Changelog
   
   This PR adds support for configuring the metadata table's cleaning policy 
independently from the data table. Users can now set 
`hoodie.metadata.clean.policy` to control how the metadata table performs 
cleaning operations.
   
   **Changes:**
   - Added a new config `hoodie.metadata.clean.policy` in 
`HoodieMetadataConfig` with default value `KEEP_LATEST_FILE_VERSIONS`
   - Added a `getCleanerPolicy()` getter in `HoodieMetadataConfig` to retrieve 
the configured policy
   - Added a `withCleanerPolicy()` builder method in 
`HoodieMetadataConfig.Builder` to set the policy
   - Modified `HoodieMetadataWriteUtils.createMetadataWriteConfig()` to use the 
metadata table's own cleaning policy instead of inheriting the data table's
   - The retention value for the selected policy (commits, file versions, or 
hours) is calculated as 1.2x the data table's configured value
   
   The metadata table now uses its own cleaning policy configuration while 
still maintaining sensible defaults that scale with the data table's retention 
settings.
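
   As an illustration of the retention scaling described above, here is a 
minimal, self-contained sketch. It does not use Hudi classes: 
`CleaningPolicySketch`, `MetadataRetentionSketch`, `scaleRetention`, and 
`metadataRetention` are hypothetical names, and rounding the scaled value up 
is an assumption, since the description only states the 1.2x factor.

```java
/** Hypothetical helper illustrating the 1.2x retention scaling; not Hudi code. */
final class MetadataRetentionSketch {

  private MetadataRetentionSketch() {}

  /**
   * Scales a non-negative data table retention value by 1.2 (that is, 6/5).
   * Rounding up is an assumption; integer math avoids floating-point drift.
   */
  static int scaleRetention(int dataTableValue) {
    if (dataTableValue < 0) {
      throw new IllegalArgumentException("retention must be non-negative");
    }
    return (dataTableValue * 6 + 4) / 5; // integer ceiling of value * 1.2
  }

  /** Picks the data table value that matches the metadata policy, then scales it. */
  static int metadataRetention(CleaningPolicySketch policy,
                               int commitsRetained,
                               int fileVersionsRetained,
                               int hoursRetained) {
    switch (policy) {
      case KEEP_LATEST_COMMITS:
        return scaleRetention(commitsRetained);
      case KEEP_LATEST_FILE_VERSIONS:
        return scaleRetention(fileVersionsRetained);
      case KEEP_LATEST_BY_HOURS:
        return scaleRetention(hoursRetained);
      default:
        throw new IllegalArgumentException("unsupported policy: " + policy);
    }
  }

  public static void main(String[] args) {
    // Example data table settings (illustrative): 10 commits, 3 file versions, 24 hours.
    for (CleaningPolicySketch policy : CleaningPolicySketch.values()) {
      System.out.println(policy + " -> " + metadataRetention(policy, 10, 3, 24));
    }
  }
}

/** Stand-in for Hudi's cleaning policy enum, limited to the values named in this PR. */
enum CleaningPolicySketch {
  KEEP_LATEST_COMMITS,
  KEEP_LATEST_FILE_VERSIONS,
  KEEP_LATEST_BY_HOURS
}
```

   With the example inputs, the sketch yields 12 commits, 4 file versions, and 
29 hours. If the actual implementation truncates instead of rounding up, the 
last two would be 3 and 28.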
   
   ### Impact
   
   Users can now configure metadata table cleaning independently of the data 
table. The default policy (`KEEP_LATEST_FILE_VERSIONS`) suits most metadata 
table use cases because it bounds the number of file versions kept per file 
group, regardless of the data table's cleaning strategy.
   
   ### Risk Level
   
   **low** - This change is backward compatible. The default policy 
(`KEEP_LATEST_FILE_VERSIONS`) ensures stable metadata table behavior, and 
retention values are automatically scaled from data table settings.
   
   ### Documentation Update
   
   **New config:**
   - `hoodie.metadata.clean.policy` (advanced): Determines the cleaner policy 
for the metadata table. Default: `KEEP_LATEST_FILE_VERSIONS`. Supported values: 
`KEEP_LATEST_COMMITS`, `KEEP_LATEST_FILE_VERSIONS`, `KEEP_LATEST_BY_HOURS`. The 
retention value for the chosen policy (commits, file versions, or hours) is 
automatically calculated as 1.2x the data table's configured value.
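
   A minimal sketch of how the new key might be set and validated in a plain 
`java.util.Properties` object follows. Only the key name, the default, and the 
supported values come from this description; `MetadataCleanPolicyConfigSketch` 
and `resolvePolicy` are hypothetical and do not mirror Hudi's own config 
parsing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

/** Hypothetical helper showing the documented key, default, and supported values. */
final class MetadataCleanPolicyConfigSketch {

  static final String POLICY_KEY = "hoodie.metadata.clean.policy";
  static final String DEFAULT_POLICY = "KEEP_LATEST_FILE_VERSIONS";
  static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
      "KEEP_LATEST_COMMITS", "KEEP_LATEST_FILE_VERSIONS", "KEEP_LATEST_BY_HOURS"));

  private MetadataCleanPolicyConfigSketch() {}

  /** Returns the configured policy name, or the documented default when the key is unset. */
  static String resolvePolicy(Properties props) {
    String value = props.getProperty(POLICY_KEY, DEFAULT_POLICY).trim();
    if (!SUPPORTED.contains(value)) {
      throw new IllegalArgumentException("Unsupported " + POLICY_KEY + ": " + value);
    }
    return value;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(resolvePolicy(props)); // key unset, so the default applies

    props.setProperty(POLICY_KEY, "KEEP_LATEST_COMMITS");
    System.out.println(resolvePolicy(props)); // explicit override
  }
}
```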
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable

