hudi-bot opened a new issue, #15253:
URL: https://github.com/apache/hudi/issues/15253
Currently, Hudi has to write Parquet in legacy format
(`spark.sql.parquet.writeLegacyFormat`) whenever the schema contains decimals,
because it relies on `AvroParquetReader`, which cannot read decimals in the
non-legacy format (i.e. it can only read decimals encoded as
`FIXED_LEN_BYTE_ARRAY`, not as `INT32`/`INT64`).
This leads to a suboptimal storage footprint; on some datasets it can bloat
file sizes by 10% or more.
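For context, here is a small Python sketch (not Hudi or Spark code) of the decimal encoding rules from the Parquet format specification: a non-legacy writer may store `DECIMAL` values in a compact integer physical type when the precision fits, whereas the legacy path always uses a fixed-length byte array sized to the precision.

```python
# Sketch of the Parquet spec's DECIMAL physical-type rules (illustrative only).

def non_legacy_physical_type(precision: int) -> str:
    """Physical type a spec-compliant (non-legacy) writer may choose."""
    if precision <= 9:    # max value 10^9 - 1 fits in a signed 32-bit int
        return "INT32"
    if precision <= 18:   # fits in a signed 64-bit int
        return "INT64"
    return "FIXED_LEN_BYTE_ARRAY"

def legacy_byte_width(precision: int) -> int:
    """Minimum FIXED_LEN_BYTE_ARRAY width for a precision: the smallest n
    such that 2^(8n - 1) - 1 >= 10^precision - 1 (per the Parquet spec)."""
    n = 1
    while (1 << (8 * n - 1)) - 1 < 10 ** precision - 1:
        n += 1
    return n

# Example: DECIMAL(9) can be stored as a plain INT32 in the non-legacy
# format instead of a 4-byte fixed array.
print(non_legacy_physical_type(9), legacy_byte_width(9))
```

Beyond the raw byte width, the integer physical types also benefit from Parquet's integer-oriented encodings (dictionary, RLE/bit-packing), which is where much of the storage savings comes from.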
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-4321
- Type: Bug
- Epic: https://issues.apache.org/jira/browse/HUDI-3217
- Fix version(s):
- 1.1.0
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]