Guosmilesmile commented on PR #13609: URL: https://github.com/apache/iceberg/pull/13609#issuecomment-3138994794
@b-rick Thank you very much for pointing this out.

The configuration of Parquet's default compression codec is somewhat special. Since version 1.4.0, the default codec has been zstd instead of gzip, but the change was made by explicitly setting default table properties that apply only to newly created tables.

https://github.com/apache/iceberg/blob/1bd8d5e2de56d05180030b856ce2c50c66ef1f13/core/src/main/java/org/apache/iceberg/TableMetadata.java#L94-L96

The common (non-dynamic) sinks can obtain the table in advance, whereas the dynamic sink only gets the table information at runtime. So in the common sinks the table is passed to `SinkUtil.writeProperties` and the compression codec resolves to zstd, but `DynamicIcebergSink` passes null, so the codec falls back to gzip.

https://github.com/apache/iceberg/blob/1bd8d5e2de56d05180030b856ce2c50c66ef1f13/flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicIcebergSink.java#L340-L341

In my opinion, instead of passing null to `SinkUtil.writeProperties`, we could pass a Map containing the property `write.parquet.compression-codec: zstd`. That would solve the issue in this scenario. However, if new default table properties are added in the future, this place would need to be updated as well, which is not very convenient.

@pvary @mxm Do you have any other suggestions?
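To make the fallback concrete, here is a minimal, self-contained sketch of the behavior described in this comment. It does not use Iceberg's classes: `resolveCodec` and `newTableDefaults` are hypothetical stand-ins, not the real `SinkUtil.writeProperties`, and the only details taken from the discussion are the property key `write.parquet.compression-codec` and the gzip/zstd values.

```java
import java.util.HashMap;
import java.util.Map;

public class CodecFallbackSketch {
  // Property key quoted in the comment.
  static final String PARQUET_COMPRESSION = "write.parquet.compression-codec";
  // Value used when the property is absent (the pre-1.4.0 default).
  static final String LEGACY_DEFAULT = "gzip";

  // Hypothetical stand-in for the codec lookup. With no table properties
  // available (null), nothing can override the legacy default.
  static String resolveCodec(Map<String, String> tableProperties) {
    if (tableProperties == null) {
      return LEGACY_DEFAULT;
    }
    return tableProperties.getOrDefault(PARQUET_COMPRESSION, LEGACY_DEFAULT);
  }

  // Hypothetical helper mirroring the proposal: a map carrying the default
  // that newly created tables receive since 1.4.0.
  static Map<String, String> newTableDefaults() {
    Map<String, String> defaults = new HashMap<>();
    defaults.put(PARQUET_COMPRESSION, "zstd");
    return defaults;
  }

  public static void main(String[] args) {
    // Dynamic sink case: null is passed, so the codec falls back to gzip.
    System.out.println(resolveCodec(null));
    // Proposed change: pass the defaults map, so the codec resolves to zstd.
    System.out.println(resolveCodec(newTableDefaults()));
  }
}
```

The sketch also shows the maintenance concern: any future default table property would have to be added to `newTableDefaults` by hand.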
