johnclara opened a new issue #1755: URL: https://github.com/apache/iceberg/issues/1755
Hi @danielcweeks,

I tested switching from S3A (Hadoop 2.8.4) to the new iceberg-aws module with our local tests. Most things worked out of the box. The differences were:

- Instead of being wrapped in an org.apache.hadoop.fs.s3a.AWSClientIOException, the raw S3 exception is thrown back up.
- It switches from AWS SDK v1 to SDK v2, which pulled in a new Jackson version and other things that had to be tweaked.
- It doesn't allow custom schemes (prepare for some hackiness).

The custom schemes came up because S3A caches clients based on scheme + authority (bucket). We were abusing different schemes in order to cache an unencrypted and an encrypted client without having to implement the AmazonS3 interface.

The motivation for using both an encrypted and an unencrypted client was the following. We use application-level encryption from Iceberg's libraries. My understanding is that it encrypts all your data files and saves a DEK (data encryption key) per file in the manifest file. This led to us making a lot of calls to KMS. We were hoping to have just one DEK per manifest file. Our method for doing this was to continue using Iceberg's encryption but give it an unencrypted DEK (just random bits), and then switch to an encrypted S3 client whenever interacting with the manifest file.

We switched to an encrypted manifest file by string-replacing the scheme every time we recognized a common prefix in the path. Since Hadoop caches the file system per scheme, it's able to pick up the encrypted file system this way.

This isn't fully rolled out and there's room for us to change it. I assume we should just implement the S3Client/AmazonS3 client and switch on the path in there, or implement a way to do this within Iceberg instead of at the client level.
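
A minimal sketch of the scheme-switching trick described above, using only the JDK. It is an illustration under stated assumptions, not the actual implementation: the scheme name s3e, the prefix /warehouse/metadata/, and the string labels standing in for real S3 clients are all hypothetical, and the cache here only mimics the scheme + authority keying described above.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SchemeSwitchSketch {
    // Hypothetical names, chosen only for this example.
    static final String PLAIN_SCHEME = "s3a";
    static final String ENCRYPTED_SCHEME = "s3e";
    static final String MANIFEST_PREFIX = "/warehouse/metadata/";

    // Stand-in for a client cache keyed by scheme + authority (bucket).
    // The String values are placeholders for real client objects.
    private static final Map<String, String> CLIENTS = new ConcurrentHashMap<>();

    /** Swaps in the encrypted scheme when the path starts with the manifest prefix. */
    static URI rewriteScheme(URI location) {
        String path = location.getPath();
        if (path == null || !path.startsWith(MANIFEST_PREFIX)) {
            return location; // data files keep the plain scheme
        }
        try {
            return new URI(ENCRYPTED_SCHEME, location.getAuthority(), path,
                    location.getQuery(), location.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rewrite " + location, e);
        }
    }

    /** Cache key built from scheme and authority, as described in the text. */
    static String cacheKey(URI location) {
        return location.getScheme() + "://" + location.getAuthority();
    }

    /** Returns the cached client label for a location, creating it on first use. */
    static String clientFor(URI location) {
        return CLIENTS.computeIfAbsent(cacheKey(location), key ->
                ENCRYPTED_SCHEME.equals(location.getScheme())
                        ? "encrypted-client"
                        : "plain-client");
    }

    public static void main(String[] args) {
        URI manifest = URI.create(PLAIN_SCHEME + "://my-bucket/warehouse/metadata/m0.avro");
        URI data = URI.create(PLAIN_SCHEME + "://my-bucket/warehouse/data/f0.parquet");
        System.out.println(rewriteScheme(manifest) + " uses " + clientFor(rewriteScheme(manifest)));
        System.out.println(rewriteScheme(data) + " uses " + clientFor(rewriteScheme(data)));
    }
}
```

Because the same bucket yields two different cache keys once the scheme is rewritten, both clients can coexist in the cache. Switching on the path inside a single client wrapper, as suggested at the end of the issue, would remove the need for the extra scheme.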
