ad1happy2go opened a new pull request, #18722: URL: https://github.com/apache/hudi/pull/18722
### Describe the issue this Pull Request addresses

Parquet base files written by Hudi use the parquet-mr ZSTD codec at its hard-coded default level (3). Users who want a higher compression ratio (or a faster level for ingest) currently have to push `parquet.compression.codec.zstd.level` through raw Hadoop conf, which bypasses Hudi's config plumbing and isn't surfaced anywhere in `HoodieStorageConfig`.

### Summary and Changelog

- Added `hoodie.parquet.compression.codec.zstd.level` (default `3`, valid range `-22..22`, matching parquet-mr's `ZstandardCodec`).
- Threaded the level through `HoodieParquetConfig` (new field, with the backward-compatible 8-arg constructor preserved) and on to parquet-mr via `parquet.compression.codec.zstd.level` on a defensive `Configuration` copy, so the shared task-level conf isn't mutated.
- Wired the new setting through every parquet writer entry point: `HoodieAvroFileWriterFactory`, `HoodieSparkFileWriterFactory`, `HoodieRowDataFileWriterFactory`, `HoodieInternalRowFileWriterFactory`, `HoodieBaseParquetWriter`, `HoodieParquetStreamWriter`, `HoodieSparkParquetStreamWriter`, `HoodieRowDataParquetOutputStreamWriter`.
- The level is only stamped onto the conf when the active codec is `ZSTD`; non-ZSTD writes are unaffected.

Validation: `mvn -pl hudi-hadoop-common -am -Dtest=TestHoodieParquetConfig,TestHoodieStorageConfig -DfailIfNoTests=false -DskipITs=true -DskipFTs=true test`

### Impact

New user-facing config only. No public API change: existing callers compile unchanged via the backward-compatible `HoodieParquetConfig` constructor, which defaults the level to 3 (parquet-mr's existing default). Default behavior for existing tables is unchanged.

### Risk Level

Low. The level is only applied when the codec is ZSTD, and only on a defensive `Configuration` copy. All other codecs and the existing default ZSTD level are unaffected.

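For reviewers who want the intended behavior at a glance, the level handling described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class `ZstdLevelSketch` and the methods `resolveLevel` and `withZstdLevel` are hypothetical names, and a plain `Map<String, String>` stands in for Hadoop's `Configuration` so the snippet runs without Hudi or Hadoop on the classpath. The two config keys, the default of 3, and the `-22..22` range are taken from the description above. Copying the conf even for non-ZSTD codecs is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the PR's plumbing; not actual Hudi code.
public class ZstdLevelSketch {
    // Key names, default, and range as stated in the PR description.
    static final String HUDI_KEY = "hoodie.parquet.compression.codec.zstd.level";
    static final String PARQUET_KEY = "parquet.compression.codec.zstd.level";
    static final int DEFAULT_LEVEL = 3;
    static final int MIN_LEVEL = -22;
    static final int MAX_LEVEL = 22;

    // Reads the Hudi-side setting, falling back to the default and
    // rejecting values outside the documented range.
    static int resolveLevel(Map<String, String> hudiProps) {
        String raw = hudiProps.get(HUDI_KEY);
        int level = (raw == null) ? DEFAULT_LEVEL : Integer.parseInt(raw.trim());
        if (level < MIN_LEVEL || level > MAX_LEVEL) {
            throw new IllegalArgumentException(
                HUDI_KEY + " must be in [" + MIN_LEVEL + ", " + MAX_LEVEL + "], got " + level);
        }
        return level;
    }

    // Returns a copy of the shared conf (the Map here is a stand-in for a
    // Hadoop Configuration). The level is stamped only when the codec is
    // ZSTD, and the shared conf itself is never mutated.
    static Map<String, String> withZstdLevel(Map<String, String> sharedConf, String codec, int level) {
        Map<String, String> copy = new HashMap<>(sharedConf);
        if ("ZSTD".equalsIgnoreCase(codec)) {
            copy.put(PARQUET_KEY, Integer.toString(level));
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();
        Map<String, String> hudiProps = new HashMap<>();
        hudiProps.put(HUDI_KEY, "9");

        int level = resolveLevel(hudiProps);
        System.out.println("zstd conf:   " + withZstdLevel(shared, "ZSTD", level));
        System.out.println("snappy conf: " + withZstdLevel(shared, "SNAPPY", level));
        System.out.println("shared conf: " + shared);
    }
}
```

Returning a copy rather than mutating in place mirrors the "defensive `Configuration` copy" point: writers in the same task that share a conf do not see each other's ZSTD level.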
### Documentation Update

The new config (`hoodie.parquet.compression.codec.zstd.level`) is declared with description and `withDocumentation` in `HoodieStorageConfig`, so it will be picked up by the auto-generated configurations page.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
