singhpk234 commented on code in PR #5437:
URL: https://github.com/apache/iceberg/pull/5437#discussion_r939650527
##########
aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java:
##########
@@ -210,25 +211,24 @@ void initialize(
   protected TableOperations newTableOps(TableIdentifier tableIdentifier) {
     if (catalogProperties != null) {
       Map<String, String> tableSpecificCatalogProperties =
-          ImmutableMap.<String, String>builder()
-              .putAll(catalogProperties)
-              .put(
-                  AwsProperties.LAKE_FORMATION_DB_NAME,
-                  IcebergToGlueConverter.getDatabaseName(
-                      tableIdentifier, awsProperties.glueCatalogSkipNameValidation()))
-              .put(
-                  AwsProperties.LAKE_FORMATION_TABLE_NAME,
-                  IcebergToGlueConverter.getTableName(
-                      tableIdentifier, awsProperties.glueCatalogSkipNameValidation()))
-              .build();
+          Maps.newHashMapWithExpectedSize(catalogProperties.size() + 2);
+      tableSpecificCatalogProperties.putAll(catalogProperties);
+      tableSpecificCatalogProperties.put(
+          AwsProperties.LAKE_FORMATION_DB_NAME,
+          IcebergToGlueConverter.getDatabaseName(
+              tableIdentifier, awsProperties.glueCatalogSkipNameValidation()));
+      tableSpecificCatalogProperties.put(
+          AwsProperties.LAKE_FORMATION_TABLE_NAME,
+          IcebergToGlueConverter.getTableName(
+              tableIdentifier, awsProperties.glueCatalogSkipNameValidation()));
       // FileIO initialization depends on tableSpecificCatalogProperties, so a new FileIO is
       // initialized each time
       return new GlueTableOperations(
           glue,
           lockManager,
           catalogName,
           awsProperties,
-          initializeFileIO(tableSpecificCatalogProperties),
+          initializeFileIO(Collections.unmodifiableMap(tableSpecificCatalogProperties)),
Review Comment:
+1, having a UT will certainly help us catch this if it happens again.
> To test this, you could do newTableOps on a GlueCatalog and then ensure that the resulting FileIO can be round-trip serialized through Kryo (which there's a helper method for, I believe).

+1, I mostly did the same and just wrote a new helper for it (still thinking about other alternatives), since at present the Kryo helper depends on Spark, which in the background pulls in twitter/chill serializers. newTableOps currently has protected access, so I ran into issues writing a Spark UT.
The new helper takes a dependency on twitter-chill, which Spark uses and, I think, Flink does as well (I may be wrong here; ref
[link](https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html#kryo)).
All suggestions are welcome here :) !
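For illustration only: a minimal, self-contained sketch of what such a round-trip helper could look like, using plain Java serialization as a stand-in for the Kryo/twitter-chill helper discussed above. The class name, helper signature, and property keys are all hypothetical placeholders, not the PR's actual helper; the point is only that a `HashMap` wrapped in `Collections.unmodifiableMap` (as in the patch) survives a serialize/deserialize round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class RoundTripCheck {

  // Hypothetical helper: serializes an object to bytes and reads it back.
  // The PR's helper would do the equivalent through a Kryo instance instead.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(obj);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (T) in.readObject();
    }
  }

  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    // Mirror the map construction from the patch: a plain HashMap wrapped in
    // Collections.unmodifiableMap, instead of a Guava ImmutableMap.
    // The keys/values here are placeholders, not real AwsProperties constants.
    Map<String, String> props = new HashMap<>();
    props.put("lake-formation.db-name", "db");
    props.put("lake-formation.table-name", "tbl");

    Map<String, String> copy =
        (Map<String, String>) roundTrip((Serializable) Collections.unmodifiableMap(props));
    System.out.println(copy.equals(props)); // prints true
  }
}
```

A real UT in the repo would instead call newTableOps on a GlueCatalog and round-trip the resulting FileIO, as suggested in the quoted comment.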
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]