hudi-bot opened a new issue, #15253:
URL: https://github.com/apache/hudi/issues/15253
Currently, Hudi has to write Parquet in legacy format
(`spark.sql.parquet.writeLegacyFormat`) whenever the schema contains decimals,
because it relies on `AvroParquetReader`, which cannot read decimals in the
non-legacy format (i.e. it can only read decimals encoded as
`FIXED_LEN_BYTE_ARRAY`, not as `INT32`/`INT64`).
This leads to a suboptimal storage footprint; on some datasets it can bloat
file sizes by 10% or more.
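For context, here is a small Python sketch (not Hudi or Spark code) of the decimal encoding rules from the Parquet format specification: a non-legacy writer may store `DECIMAL` values in a compact integer physical type when the precision fits, whereas the legacy path always uses a fixed-length byte array sized to the precision.

```python
# Sketch of the Parquet spec's DECIMAL physical-type rules (illustrative only).

def non_legacy_physical_type(precision: int) -> str:
    """Physical type a spec-compliant (non-legacy) writer may choose."""
    if precision <= 9:    # max value 10^9 - 1 fits in a signed 32-bit int
        return "INT32"
    if precision <= 18:   # fits in a signed 64-bit int
        return "INT64"
    return "FIXED_LEN_BYTE_ARRAY"

def legacy_byte_width(precision: int) -> int:
    """Minimum FIXED_LEN_BYTE_ARRAY width for a precision: the smallest n
    such that 2^(8n - 1) - 1 >= 10^precision - 1 (per the Parquet spec)."""
    n = 1
    while (1 << (8 * n - 1)) - 1 < 10 ** precision - 1:
        n += 1
    return n

# Example: DECIMAL(9) can be stored as a plain INT32 in the non-legacy
# format instead of a 4-byte fixed array.
print(non_legacy_physical_type(9), legacy_byte_width(9))
```

Beyond the raw byte width, the integer physical types also benefit from Parquet's integer-oriented encodings (dictionary, RLE/bit-packing), which is where much of the storage savings comes from.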
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-4321
- Type: Bug
- Epic: https://issues.apache.org/jira/browse/HUDI-3217
- Fix version(s):
- 1.1.0
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]