steveloughran commented on PR #15171:
URL: https://github.com/apache/iceberg/pull/15171#issuecomment-3993187579

   ok, this is what I think would work
   
   A immutable set of the headers not to sign visible to all
   ```
     public static final Set<String> UNSIGNED_HEADERS =
         ImmutableSet.of(
             "Content-Type",
             "amz-sdk-invocation-id",
             "amz-sdk-retry",
             "if-match",
             "if-modified-since",
             "if-none-match",
             "if-unmodified-since",
             "range",
             "referer",
             "user-agent",
             "x-amz-date",
             "x-amz-content-sha256"
             );
   ```
   
   (we should check with some s3/aws people about x-amz-content. If it's there 
and is a checksum, it signs the payload, so a different payload would fail. But 
as the only POST with a payload is bulk DELETE, it's not really an issue)
   
   The key to the cache becomes: request with the unsigned headers stripped
   the value in the cache becomes: response with the unsigned headers stripped.
   
   
   lookup and prepare becomes
   1. strip out unsigned headers for lookup
   2. look for the cached entry
   3. if found, combine the full request with the cached value
   4. if not: issue full request
   
   
   Tests to make sure that none of the stripped headers from request 1 are in 
request 2, and that all signed values are in. 
   
   (extension 1) Maybe, when a response comes back, client scans the auth 
header/uri and if any of these have been signed warns & doesn't cache, 
otherwise strips them out from the cache. That's for safe coexistence with 
other implementations with a smaller list of headers to strip before signing. 
(Or we could ignore this as extra complexity, tests, maintenance?)
   
   
   (extension 2) I think I'd consider adding an emergency switch to disable 
cacheing. I know it hurts the catalog, but what if some corner case does 
surface: you don't want to break every client request.
   
   
   @adutra I could try this tomorrow building my work on top of yours. I've 
done that bit of the auth header parsing I propose in extension 1, but it's 
untested and will be extra complexity. I'd leave it out of the first iteration
   
   thoughts? 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to