prashantwason opened a new pull request, #18384: URL: https://github.com/apache/hudi/pull/18384
## Summary

- Adds `hoodie.meta.fields.to.exclude` config for selective meta field population
- Excluded meta fields are written as **null** (not empty string) for optimal Parquet storage savings
- Covers all 4 write paths: Avro file writers, Spark InternalRow, Spark SQL row-writer, Flink
- Uses pre-computed `boolean[5]` indexed by meta field ordinal for zero-overhead per-row checks
- Disables bloom filter when `_hoodie_record_key` is excluded
- Fixes null safety in Flink `AbstractHoodieRowData.getString()`

## Motivation

Closes https://github.com/apache/hudi/issues/18383
Discussion: https://github.com/apache/hudi/discussions/17959

Users currently face a trade-off: `hoodie.populate.meta.fields` is all-or-nothing. Disabling it saves storage but loses incremental query capability (requires `_hoodie_commit_time`). Fields like `_hoodie_record_key`, `_hoodie_partition_path`, and `_hoodie_file_name` can be virtualized and don't need physical storage.

This PR adds a middle ground: selectively exclude virtualizable meta fields while keeping essential ones like `_hoodie_commit_time`.

**Example config:**

```
hoodie.populate.meta.fields=true
hoodie.meta.fields.to.exclude=_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,_hoodie_commit_seqno
```

## Test plan

- [ ] Verify compilation across all modules (Spark, Flink, Avro)
- [ ] Run existing `populateMetaFields` tests for regression (`TestHoodieRowCreateHandle`, `TestHoodieDatasetBulkInsertHelper`)
- [ ] Add test with selective exclusion verifying excluded fields are null in written Parquet files
- [ ] Verify non-excluded fields have correct values
- [ ] Verify all-excluded behavior matches `populateMetaFields=false`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
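For readers unfamiliar with the pre-computed `boolean[5]` approach mentioned in the Summary, here is a minimal sketch of the idea. It is not the code from this PR: the class `MetaFieldMask`, its method names, the assumed ordinal order of the meta fields, and the choice to reject unknown field names are all illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;

public class MetaFieldMask {
    // Assumption: ordinal order of the five meta fields (commit time first, file name last).
    static final List<String> META_FIELDS = List.of(
        "_hoodie_commit_time",
        "_hoodie_commit_seqno",
        "_hoodie_record_key",
        "_hoodie_partition_path",
        "_hoodie_file_name");

    // Parse the comma-separated config value once, up front, into a mask indexed by ordinal.
    static boolean[] buildExcludeMask(String csv) {
        boolean[] excluded = new boolean[META_FIELDS.size()];
        if (csv == null || csv.trim().isEmpty()) {
            return excluded; // nothing excluded
        }
        for (String name : csv.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int ordinal = META_FIELDS.indexOf(trimmed);
            if (ordinal < 0) {
                // Sketch-only choice: fail fast on unknown names.
                throw new IllegalArgumentException("Unknown meta field: " + trimmed);
            }
            excluded[ordinal] = true;
        }
        return excluded;
    }

    // Per-row check: a single array lookup, no string comparison or set lookup.
    // Excluded fields become null rather than an empty string.
    static String valueFor(boolean[] excluded, int ordinal, String value) {
        return excluded[ordinal] ? null : value;
    }

    public static void main(String[] args) {
        boolean[] mask = buildExcludeMask(
            "_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name,_hoodie_commit_seqno");
        System.out.println(Arrays.toString(mask)); // [false, true, true, true, true]
        System.out.println(valueFor(mask, 0, "20240101000000")); // kept: 20240101000000
        System.out.println(valueFor(mask, 2, "key-1"));          // excluded: null
    }
}
```

The point of the design is that the config string is parsed once per writer, so the hot write path pays only for an array index per meta field.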
