770120041 commented on code in PR #15549:
URL: https://github.com/apache/iceberg/pull/15549#discussion_r2905808502


##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -143,7 +143,7 @@ private TableProperties() {}
 
   public static final String PARQUET_COMPRESSION = 
"write.parquet.compression-codec";
   public static final String DELETE_PARQUET_COMPRESSION = 
"write.delete.parquet.compression-codec";
-  public static final String PARQUET_COMPRESSION_DEFAULT = "gzip";
+  public static final String PARQUET_COMPRESSION_DEFAULT = "zstd";
   public static final String PARQUET_COMPRESSION_DEFAULT_SINCE_1_4_0 = "zstd";

Review Comment:
   Thanks for the review! You're right that new tables already get ZSTD via 
`persistedProperties()` in `TableMetadata`, which explicitly sets 
`write.parquet.compression-codec = zstd` at creation time.
   
   However, `PARQUET_COMPRESSION_DEFAULT` is still used as the fallback in 
`Parquet.java`, `SparkWriteConf`, and `FlinkWriteConf` for tables that don't 
have the property set (e.g., tables created by other implementations like 
pyiceberg, or older tables). I see from the [issue 
discussion](https://github.com/apache/iceberg/issues/15236#issuecomment-3854510554)
 that this was intentional to avoid changing behavior for existing tables on 
upgrade.
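
   A minimal sketch of the fallback behavior being discussed (hypothetical standalone code, not the actual `Parquet.java` / `SparkWriteConf` implementation): the writer resolves the codec from the table's properties map and only falls back to `PARQUET_COMPRESSION_DEFAULT` when the property is absent, which is why changing the constant only affects tables without an explicit setting.

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Hypothetical illustration of the fallback lookup, not Iceberg's code.
   public class CompressionFallbackSketch {
     static final String PARQUET_COMPRESSION = "write.parquet.compression-codec";
     // The default this PR proposes changing from "gzip" to "zstd".
     static final String PARQUET_COMPRESSION_DEFAULT = "zstd";

     static String compressionCodec(Map<String, String> tableProperties) {
       // Tables created by recent Iceberg versions carry the property
       // explicitly (via persistedProperties()); older tables or tables
       // created by other implementations may not, and only those hit
       // the default.
       return tableProperties.getOrDefault(
           PARQUET_COMPRESSION, PARQUET_COMPRESSION_DEFAULT);
     }

     public static void main(String[] args) {
       Map<String, String> newTable = new HashMap<>();
       newTable.put(PARQUET_COMPRESSION, "zstd"); // set at creation time
       Map<String, String> legacyTable = new HashMap<>(); // property unset

       System.out.println(compressionCodec(newTable));
       System.out.println(compressionCodec(legacyTable)); // uses the default
     }
   }
   ```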
   
   Given that context, this change would affect existing tables that lack an 
explicit compression property. Do we want to update those tables, or keep the 
existing behavior? I'm open to closing this PR if the current default is 
intentional.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
