ad1happy2go opened a new pull request, #18722:
URL: https://github.com/apache/hudi/pull/18722

   ### Describe the issue this Pull Request addresses
   
   Parquet base files written by Hudi use the parquet-mr ZSTD codec at its 
hard-coded default level (3). Users who want a higher compression ratio (or a 
faster level for ingest) currently have to push 
`parquet.compression.codec.zstd.level` through raw Hadoop conf, which bypasses 
Hudi's config plumbing and isn't surfaced anywhere in `HoodieStorageConfig`.
   
   ### Summary and Changelog
   
   - Added `hoodie.parquet.compression.codec.zstd.level` (default `3`, valid 
range `-22..22`, matching parquet-mr's `ZstandardCodec`).
   - Threaded the level through `HoodieParquetConfig` (new field; the existing 
8-arg constructor is preserved for backward compatibility) and on to parquet-mr 
by setting `parquet.compression.codec.zstd.level` on a defensive 
`Configuration` copy, so the shared task-level conf isn't mutated.
   - Wired the new setting through every parquet writer entry point: 
`HoodieAvroFileWriterFactory`, `HoodieSparkFileWriterFactory`, 
`HoodieRowDataFileWriterFactory`, `HoodieInternalRowFileWriterFactory`, 
`HoodieBaseParquetWriter`, `HoodieParquetStreamWriter`, 
`HoodieSparkParquetStreamWriter`, `HoodieRowDataParquetOutputStreamWriter`.
   - The level is only stamped onto the conf when the active codec is `ZSTD`; 
non-ZSTD writes are unaffected.
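
The copy-then-stamp behavior described in the bullets above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `ZstdLevelSketch` and `withZstdLevel` are hypothetical names, a plain `Map` stands in for Hadoop's `Configuration`, and checking the range only on the ZSTD path is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in only; these are not Hudi's actual classes. */
final class ZstdLevelSketch {
    // Key that parquet-mr reads, as named in this PR description.
    static final String PARQUET_ZSTD_LEVEL_KEY = "parquet.compression.codec.zstd.level";
    // Range as stated in this PR description.
    static final int MIN_LEVEL = -22;
    static final int MAX_LEVEL = 22;

    private ZstdLevelSketch() {}

    /**
     * Returns a copy of sharedConf. The copy carries the ZSTD level only when
     * the codec is ZSTD; sharedConf itself is never modified.
     */
    static Map<String, String> withZstdLevel(Map<String, String> sharedConf,
                                             String codecName,
                                             int level) {
        // Defensive copy, so the caller's shared conf stays untouched.
        Map<String, String> copy = new HashMap<>(sharedConf);
        if ("ZSTD".equalsIgnoreCase(codecName)) {
            // Assumption: the range is validated only when the level is used.
            if (level < MIN_LEVEL || level > MAX_LEVEL) {
                throw new IllegalArgumentException("ZSTD level out of range: " + level);
            }
            copy.put(PARQUET_ZSTD_LEVEL_KEY, Integer.toString(level));
        }
        return copy;
    }
}
```

For example, `ZstdLevelSketch.withZstdLevel(conf, "ZSTD", 9)` returns a new map containing the level key, while passing `"SNAPPY"` returns a plain copy without it; in both cases `conf` is left as it was.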
   
   Validation:
   
   `mvn -pl hudi-hadoop-common -am 
-Dtest=TestHoodieParquetConfig,TestHoodieStorageConfig -DfailIfNoTests=false 
-DskipITs=true -DskipFTs=true test`
   
   ### Impact
   
   New user-facing config only. No public API change — existing callers compile 
unchanged via the backward-compatible `HoodieParquetConfig` constructor that 
defaults the level to 3 (parquet-mr's existing default). Default behavior for 
existing tables is unchanged.
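
The backward-compatibility argument above rests on a delegating constructor overload. A reduced sketch of that pattern, assuming a hypothetical `ParquetConfigSketch` class with far fewer fields than the real `HoodieParquetConfig`, might look like this:

```java
/** Hypothetical, reduced stand-in; the real config class has more fields. */
final class ParquetConfigSketch {
    // Default level as stated in this PR description.
    static final int DEFAULT_ZSTD_LEVEL = 3;

    private final String codecName;
    private final int blockSize;
    private final int zstdLevel;

    /** Pre-existing shape: callers written against it keep compiling. */
    ParquetConfigSketch(String codecName, int blockSize) {
        this(codecName, blockSize, DEFAULT_ZSTD_LEVEL);
    }

    /** New shape that takes an explicit ZSTD level. */
    ParquetConfigSketch(String codecName, int blockSize, int zstdLevel) {
        this.codecName = codecName;
        this.blockSize = blockSize;
        this.zstdLevel = zstdLevel;
    }

    String getCodecName() {
        return codecName;
    }

    int getBlockSize() {
        return blockSize;
    }

    int getZstdLevel() {
        return zstdLevel;
    }
}
```

Callers that use the shorter constructor get level 3 without any source change, and only callers that opt in pass a level explicitly.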
   
   ### Risk Level
   
   Low. The level is only applied when the codec is ZSTD, and only on a 
defensive `Configuration` copy. All other codecs and the existing default ZSTD 
level are unaffected.
   
   ### Documentation Update
   
   The new config (`hoodie.parquet.compression.codec.zstd.level`) is declared 
with description and `withDocumentation` in `HoodieStorageConfig`, so it will 
be picked up by the auto-generated configurations page.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
