plusplusjiajia commented on PR #3120:
URL: https://github.com/apache/iceberg-python/pull/3120#issuecomment-4102841635

   > thanks for the PR! i understand this is to align with the java SigV4 
implementation. Could you help me understand the specific scenario in which 
this is currently breaking? (i dont know much about sigv4)
   
   That's a great question. Let me walk through the context.
   The root cause is in how the Java Iceberg SDK computes the 
x-amz-content-sha256 header. It uses AWS SDK v2's SignerChecksumParams with 
Algorithm.SHA256 and sets the checksumHeaderName to 
[X-Amz-Content-SHA256](https://github.com/apache/iceberg/blob/fec9800bc/aws/src/main/java/org/apache/iceberg/aws/RESTSigV4AuthSession.java#L100-L104).
 Internally, the AWS SDK's 
[AbstractAws4Signer](https://github.com/aws/aws-sdk-java-v2/blob/master/core/auth/src/main/java/software/amazon/awssdk/auth/signer/internal/AbstractAws4Signer.java)
 applies BinaryUtils.toBase64() to the checksum before writing it into the 
specified header. This is part of the flexible checksum mechanism rather than 
standard SigV4 behavior.
   So the base64 encoding in x-amz-content-sha256 is essentially a side effect 
of Java Iceberg leveraging the flexible checksum API. For empty bodies, the 
Java side already has a special case at 
[RESTSigV4AuthSession.java#L119-L121](https://github.com/apache/iceberg/blob/fec9800bc/aws/src/main/java/org/apache/iceberg/aws/RESTSigV4AuthSession.java#L119-L121)
 that overrides this with the standard hex value, but for non-empty bodies the 
base64 value is left as-is (confirmed by the test at 
[TestRESTSigV4AuthSession.java#L174](https://github.com/apache/iceberg/blob/a89f1f9aa/aws/src/test/java/org/apache/iceberg/aws/TestRESTSigV4AuthSession.java#L174)).
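   To make the two encodings concrete, here is a minimal Python sketch of the 
behavior described above. The helper names (`content_sha256_hex`, 
`content_sha256_java_style`) are hypothetical and are not taken from either 
codebase; the sketch only assumes that the header carries base64 of the raw 
SHA-256 digest for non-empty bodies and the standard hex digest for empty ones.

```python
import base64
import hashlib

# Hex SHA-256 of the empty string: the standard SigV4 payload hash for empty bodies.
EMPTY_BODY_SHA256_HEX = hashlib.sha256(b"").hexdigest()


def content_sha256_hex(body: bytes) -> str:
    """Standard SigV4 style: lowercase hex of the SHA-256 digest (64 chars)."""
    return hashlib.sha256(body).hexdigest()


def content_sha256_java_style(body: bytes) -> str:
    """Hypothetical helper mirroring the Java behavior described above.

    Empty body     -> standard hex digest (the override on the Java side).
    Non-empty body -> base64 of the raw 32-byte digest (44 chars).
    """
    if not body:
        return EMPTY_BODY_SHA256_HEX
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


if __name__ == "__main__":
    body = b'{"namespace": ["example"]}'  # illustrative request body
    print("hex:   ", content_sha256_hex(body))
    print("base64:", content_sha256_java_style(body))
```

   Both strings encode the same 32 digest bytes, so neither is "more correct" 
as a checksum; they just differ as header values.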
   Since x-amz-content-sha256 is a signed header, its value participates in the 
canonical request construction. When a REST catalog server built with the Java 
Iceberg SDK verifies incoming signatures, it expects the same base64-encoded 
value. If the Python client sends a hex-encoded value instead, the canonical 
headers won't match during server-side signature verification, resulting in a 
signature mismatch.
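   The effect on signing can be sketched as follows. This is a simplified 
illustration, not the code from either implementation: the `canonical_request` 
helper, the host, and the path are made up for the example, and the trailing 
payload-hash line is held constant so that only the header value varies.

```python
import base64
import hashlib


def canonical_request(method: str, path: str, headers: dict, payload_hash: str) -> str:
    """Build a simplified SigV4 canonical request (empty query string)."""
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    names = sorted(lowered)
    # Each canonical header line is "name:value\n"; all given headers are signed.
    canonical_headers = "".join(f"{name}:{lowered[name]}\n" for name in names)
    signed_headers = ";".join(names)
    return "\n".join([method, path, "", canonical_headers, signed_headers, payload_hash])


body = b'{"namespace": ["example"]}'  # illustrative request body
digest = hashlib.sha256(body).digest()
hex_value = digest.hex()
b64_value = base64.b64encode(digest).decode("ascii")


def request_hash(header_value: str) -> str:
    """Hash of the canonical request, which feeds into the string to sign."""
    headers = {
        "Host": "catalog.example.com",  # hypothetical host
        "X-Amz-Content-SHA256": header_value,
    }
    request = canonical_request("POST", "/v1/namespaces", headers, hex_value)
    return hashlib.sha256(request.encode("utf-8")).hexdigest()


client_side = request_hash(hex_value)   # client sends the hex-encoded header
server_side = request_hash(b64_value)   # server expects the base64-encoded header
print("canonical request hashes match:", client_side == server_side)
```

   Because the hash of the canonical request goes into the string to sign, any 
difference in a signed header value changes the final signature.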
   This PR aligns the Python implementation with the Java SDK's current 
behavior to ensure interoperability. That said, I agree it would be worth 
discussing whether the Java side should also be updated to use standard hex 
encoding, but that would need to be a coordinated change across both 
implementations. Happy to hear your thoughts on this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

