swat1234 opened a new issue, #8713:
URL: https://github.com/apache/iceberg/issues/8713
Iceberg tables are not compressing Parquet files in S3. When the table
properties below are used for compression, the file size increases
compared with the uncompressed file. Can someone please assist with this?
1. File with UNCOMPRESSED codec.
00000-0-0129ba78-17f6-466f-b57b-695c678d64d5-00001.parquet === size 682 bytes
},
"properties" : {
"codec" : "UNCOMPRESSED",
-------------------------------
2. File with gzip codec 733 bytes
00000-0-e6f22c0e-2e16-43aa-8a5f-efabee995876-00001.parquet
"properties" : {
"codec" : "GZIP",
-------------------------------
3. File with snappy codec 686 bytes.
00000-0-36fd4aad-8c38-40f5-8241-78ffe4f0a032-00001.parquet
"codec" : "SNAPPY",
"path" : {
--------------------------------------------------------------
Table Properties:
"parquet.compression": "SNAPPY"
"read.parquet.vectorization.batch-size": "5000"
"read.split.target-size": "134217728"
"read.parquet.vectorization.enabled": "true"
"write.parquet.page-size-bytes": "1048576"
"write.parquet.row-group-size-bytes": "134217728"
"write_compression": "SNAPPY"
"write.parquet.compression-codec": "snappy"
"write.metadata.metrics.max-inferred-column-defaults": "100"
"write.parquet.compression-level": "4"
"write.target-file-size-bytes": "536870912"
"write.delete.target-file-size-bytes": "67108864"
"write.parquet.page-row-limit": "20000"
"write.format.default": "parquet"
"write.metadata.compression-codec": "gzip"
"write.compression": "SNAPPY"
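
One plausible reading of the sizes reported above (682 bytes uncompressed, 686 bytes with snappy, 733 bytes with gzip), offered as an assumption rather than a confirmed diagnosis, is that the files hold too little data for compression to pay off. Parquet compresses data pages, not the file footer, so in a file this small the fixed per-codec overhead can exceed any savings. The sketch below uses only Python's standard gzip and zlib modules, not Iceberg or Parquet, and the payloads are made up for illustration.

```python
import gzip
import zlib


def compressed_sizes(payload: bytes) -> dict:
    """Return the raw size and the gzip/zlib compressed sizes of a payload."""
    return {
        "raw": len(payload),
        "gzip": len(gzip.compress(payload)),
        "zlib": len(zlib.compress(payload)),
    }


# Hypothetical payloads: a single tiny record versus the same record repeated.
tiny = b"id=1,name=a\n"
large = tiny * 10000

tiny_sizes = compressed_sizes(tiny)
large_sizes = compressed_sizes(large)

# For the tiny payload the codec header and trailer dominate, so the
# compressed output is larger than the input. For the large, repetitive
# payload compression shrinks the data substantially.
print("tiny :", tiny_sizes)
print("large:", large_sizes)
```

If this is the cause, writing a larger batch of rows with the same table properties and comparing file sizes again should show the codecs taking effect.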
Thanks in advance!!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]