Guosmilesmile commented on PR #13609:
URL: https://github.com/apache/iceberg/pull/13609#issuecomment-3138994794

   
   @b-rick Thank you very much for pointing this out. The default compression 
codec for Parquet is configured in a somewhat unusual way.
   
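   Since version 1.4.0, the default Parquet compression codec for new tables 
is zstd instead of gzip. The change was made by explicitly setting the codec 
in the table properties of newly created tables, so it applies only to new 
tables; when the property is absent, the fallback is still gzip.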
   
   
https://github.com/apache/iceberg/blob/1bd8d5e2de56d05180030b856ce2c50c66ef1f13/core/src/main/java/org/apache/iceberg/TableMetadata.java#L94-L96
   
   The regular (non-dynamic) sinks can load the table in advance, but dynamic 
sinks only get the table information at runtime. So, in the regular sinks, the 
table is passed to `SinkUtil.writeProperties` and the compression codec 
resolves to zstd from the table properties, but `DynamicIcebergSink` passes 
null, so the codec falls back to gzip.
   
   
https://github.com/apache/iceberg/blob/1bd8d5e2de56d05180030b856ce2c50c66ef1f13/flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicIcebergSink.java#L340-L341
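
   To illustrate the fallback described above, here is a minimal, 
self-contained sketch. It is not the real `SinkUtil` code: the class 
`CodecFallbackSketch` and the method `resolveCodec` are hypothetical, and the 
table is modelled as a plain property map. Only the property key and the two 
codec names come from the discussion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the codec lookup; NOT the actual SinkUtil implementation.
class CodecFallbackSketch {
  static final String CODEC_KEY = "write.parquet.compression-codec";
  // Fallback used when no table property is available.
  static final String FALLBACK_CODEC = "gzip";

  // Returns the codec from the table properties, or the fallback if there are none.
  static String resolveCodec(Map<String, String> tableProperties) {
    if (tableProperties == null) {
      return FALLBACK_CODEC;
    }
    return tableProperties.getOrDefault(CODEC_KEY, FALLBACK_CODEC);
  }

  public static void main(String[] args) {
    // A table created on 1.4.0 or later carries the codec explicitly.
    Map<String, String> newTableProperties = new HashMap<>();
    newTableProperties.put(CODEC_KEY, "zstd");

    System.out.println(resolveCodec(newTableProperties)); // regular sink: table is known
    System.out.println(resolveCodec(null));               // dynamic sink: null is passed
  }
}
```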
   
   In my opinion, instead of passing null to `SinkUtil.writeProperties`, we 
could pass a Map with the property `write.parquet.compression-codec` set to 
`zstd`. This would solve the issue in this scenario. However, if new default 
table properties are added in the future, this code would also need to be 
updated accordingly, which is not very convenient.
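
   A rough sketch of that idea, under the same caveat that none of these 
names exist in Iceberg: `DynamicSinkDefaultsSketch`, `newTableDefaults` and 
`resolveCodec` are made up for illustration, and the real 
`SinkUtil.writeProperties` signature may differ. The hard-coded map is exactly 
the part that would need maintenance whenever a new default is introduced.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the proposal; not actual Iceberg code.
class DynamicSinkDefaultsSketch {
  static final String CODEC_KEY = "write.parquet.compression-codec";

  // Defaults that new tables receive explicitly; must be kept in sync by hand.
  static Map<String, String> newTableDefaults() {
    Map<String, String> defaults = new HashMap<>();
    defaults.put(CODEC_KEY, "zstd");
    return defaults;
  }

  // Simplified lookup: use the supplied properties, else fall back to gzip.
  static String resolveCodec(Map<String, String> properties) {
    if (properties == null) {
      return "gzip";
    }
    return properties.getOrDefault(CODEC_KEY, "gzip");
  }

  public static void main(String[] args) {
    System.out.println(resolveCodec(null));               // current behaviour
    System.out.println(resolveCodec(newTableDefaults())); // proposed behaviour
  }
}
```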
   
   @pvary @mxm Do you have any other suggestions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

