steveloughran commented on issue #14439:
URL: https://github.com/apache/iceberg/issues/14439#issuecomment-3712129356

   in hadoop 3.4.3 we moved to configuring checksum calculation and validation as follows
   
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/DefaultS3ClientFactory.java#L217
   
   ```
       if (parameters.isMd5HeaderEnabled()) {
         LOG.debug("MD5 header enabled");
         builder.addPlugin(LegacyMd5Plugin.create());
       }
   
       //when to calculate request checksums.
       final RequestChecksumCalculation checksumCalculation =
           parameters.isChecksumCalculationEnabled()
               ? RequestChecksumCalculation.WHEN_SUPPORTED
               : RequestChecksumCalculation.WHEN_REQUIRED;
       LOG.debug("Using checksum calculation policy: {}", checksumCalculation);
       builder.requestChecksumCalculation(checksumCalculation);
   
       // response checksum validation. Slow, even with CRC32 checksums.
       final ResponseChecksumValidation checksumValidation;
       checksumValidation = parameters.isChecksumValidationEnabled()
           ? ResponseChecksumValidation.WHEN_SUPPORTED
           : ResponseChecksumValidation.WHEN_REQUIRED;
       LOG.debug("Using checksum validation policy: {}", checksumValidation);
       builder.responseChecksumValidation(checksumValidation);
   ```
   with defaults of: MD5 enabled; checksum validation down to WHEN_REQUIRED
   
   this works with all external stores tested (dell, gcs). Enabling the MD5 
plugin did cause problems with S3-express, but the SDK team came up with a 
workaround (https://github.com/aws/aws-sdk-java-v2/issues/6459). 
   
   to summarise, on AWS SDK 2.35.4+
   * restoring md5 plugin works everywhere, aws and non-aws
   * changing request and response checksums to when required works everywhere
   
   Given the connections to all production systems are via TLS, adding extra 
checksums seems, well, needless. 
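
To make the defaults concrete, here is a minimal sketch of how the three switches resolve to client settings. It is not the Hadoop code: the enums are local stand-ins named after the AWS SDK types so that it runs without the SDK on the classpath, and the class name, method names, and boolean parameters are hypothetical. It also assumes checksum calculation defaults to off, which the summary above implies but the defaults line does not state.

```java
// Sketch only: mirrors the flag-to-policy mapping in the snippet above.
final class ChecksumPolicySketch {

  // Local stand-ins named after the SDK enums; not the SDK types themselves.
  enum RequestChecksumCalculation { WHEN_SUPPORTED, WHEN_REQUIRED }
  enum ResponseChecksumValidation { WHEN_SUPPORTED, WHEN_REQUIRED }

  // Enabled: checksum every request that supports one; otherwise only when the API requires it.
  static RequestChecksumCalculation requestPolicy(boolean calculationEnabled) {
    return calculationEnabled
        ? RequestChecksumCalculation.WHEN_SUPPORTED
        : RequestChecksumCalculation.WHEN_REQUIRED;
  }

  // Enabled: validate every response that carries a checksum; otherwise only when required.
  static ResponseChecksumValidation responsePolicy(boolean validationEnabled) {
    return validationEnabled
        ? ResponseChecksumValidation.WHEN_SUPPORTED
        : ResponseChecksumValidation.WHEN_REQUIRED;
  }

  // One-line summary of the effective settings for a given set of switches.
  static String describe(boolean md5Header, boolean calculation, boolean validation) {
    return "md5=" + md5Header
        + " request=" + requestPolicy(calculation)
        + " response=" + responsePolicy(validation);
  }

  public static void main(String[] args) {
    // Assumed defaults: MD5 header on, both checksum switches off.
    System.out.println(describe(true, false, false));
  }
}
```

With the real SDK, the two resolved values are what get passed to `builder.requestChecksumCalculation(...)` and `builder.responseChecksumValidation(...)`, and the MD5 switch decides whether `LegacyMd5Plugin.create()` is added, as in the snippet above.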


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]