johnclara opened a new issue #1755:
URL: https://github.com/apache/iceberg/issues/1755


   Hi @danielcweeks ,
   
   I tested switching from S3A (Hadoop 2.8.4) to the new iceberg-aws module in 
our local tests.
   
   Most things worked out of the box. The differences were:
   - instead of exceptions getting wrapped in an 
org.apache.hadoop.fs.s3a.AWSClientIOException, the raw S3 exception is thrown 
back up
   - the switch from AWS SDK v1 to v2 pulled in a new Jackson version and 
other dependencies that had to be tweaked
   - custom schemes aren't allowed
   
   (prepare for some hackiness):
   The custom schemes came up because s3a caches clients based on scheme + 
authority (bucket). We were abusing different schemes in order to cache an 
unencrypted and encrypted client without having to implement the AmazonS3 
interface.
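
   The caching behaviour described above can be sketched roughly as follows. This is a simplified illustration, not Hadoop's actual code: `ClientCache` and `cacheKey` are hypothetical names, the `s3e` scheme in the usage note is made up, and Hadoop's real `FileSystem` cache key includes more than scheme and authority (for example the user).

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: one cached client per (scheme, authority) pair.
final class ClientCache<C> {
    private final Map<String, C> clients = new ConcurrentHashMap<>();
    private final Function<URI, C> factory;

    ClientCache(Function<URI, C> factory) {
        this.factory = factory;
    }

    // The key ignores the path, so every object in a bucket shares a client
    // unless the scheme differs.
    static String cacheKey(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    C clientFor(String location) {
        URI uri = URI.create(location);
        // Create the client on first use, then reuse it for the same key.
        return clients.computeIfAbsent(cacheKey(uri), key -> factory.apply(uri));
    }
}
```

   With a cache like this, `s3a://bucket/a` and `s3a://bucket/b` resolve to the same client, while `s3e://bucket/a` gets its own entry, which is what makes the scheme trick work.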
   
   The motivation for using an encrypted and unencrypted client was the 
following:
   
   We use application-level encryption from Iceberg's libraries. My 
understanding is that it encrypts every data file and saves a DEK (data 
encryption key) per file in the manifest file. This led to us making a lot of 
calls to KMS. We were hoping to have just one DEK per manifest file. Our method 
for doing this was to keep using Iceberg's encryption but give it an 
unencrypted DEK (just random bits) and then switch to an encrypted S3 client 
whenever interacting with the manifest file.
   
   We switched to the encrypted client for manifest files by string-replacing 
the scheme every time we recognized a common prefix in the path. Since Hadoop 
caches the file system per scheme, it's able to pick up the encrypted file 
system this way.
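
   A minimal sketch of that scheme swap, under stated assumptions: the scheme names `s3a` and `s3e`, the class `SchemeRewriter`, and the prefix passed in are all hypothetical, since the actual names are not given here.

```java
import java.net.URI;

// Hypothetical sketch of rewriting the scheme for paths under a known prefix.
final class SchemeRewriter {
    // Assumed scheme names, chosen only for illustration.
    static final String PLAIN_SCHEME = "s3a";
    static final String ENCRYPTED_SCHEME = "s3e";

    // Returns the location with the encrypted scheme if its path starts with
    // encryptedPathPrefix; otherwise returns the location unchanged.
    static String rewrite(String location, String encryptedPathPrefix) {
        URI uri = URI.create(location);
        String path = uri.getPath();
        if (PLAIN_SCHEME.equals(uri.getScheme())
                && path != null
                && path.startsWith(encryptedPathPrefix)) {
            // Swap only the scheme; bucket and key stay the same.
            return ENCRYPTED_SCHEME + location.substring(PLAIN_SCHEME.length());
        }
        return location;
    }
}
```

   For example, `SchemeRewriter.rewrite("s3a://bucket/tbl/metadata/m0.avro", "/tbl/metadata/")` yields an `s3e://` location, while a data file path outside the prefix is returned as-is.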
   
   This isn't fully rolled out and there's room for us to change it. I assume 
we should either implement the S3Client/AmazonS3 client interface and switch on 
the path in there, or implement a way to do this within Iceberg instead of at 
the client level.
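
   The "switch on the path inside the client" option could look roughly like the sketch below. It assumes a tiny hypothetical `ObjectReader` interface standing in for the much larger real S3 client interfaces, and `RoutingObjectReader` is likewise an illustrative name, not an existing class.

```java
import java.util.function.Predicate;

// Hypothetical minimal stand-in for the part of an S3 client used here.
interface ObjectReader {
    byte[] read(String bucket, String key);
}

// Delegates each call to an encrypted or a plain client based on the key.
final class RoutingObjectReader implements ObjectReader {
    private final ObjectReader plain;
    private final ObjectReader encrypted;
    private final Predicate<String> needsEncryption;

    RoutingObjectReader(ObjectReader plain,
                        ObjectReader encrypted,
                        Predicate<String> needsEncryption) {
        this.plain = plain;
        this.encrypted = encrypted;
        this.needsEncryption = needsEncryption;
    }

    @Override
    public byte[] read(String bucket, String key) {
        // The routing decision lives in one place instead of in the URI scheme.
        ObjectReader delegate = needsEncryption.test(key) ? encrypted : plain;
        return delegate.read(bucket, key);
    }
}
```

   Compared with the scheme swap, this keeps a single scheme in all stored paths and moves the manifest-versus-data decision into the predicate.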

