Hi Steve,

Thank you for sharing the work done to make Amazon STS tokens work with the s3a connector. This works for direct HDFS-to-S3-bucket interaction. Your statement is also spot on: containers running in YARN have no mechanism to update the triple of session credentials. If I am not mistaken, an Amazon STS token is not renewable and has a maximum lifetime of 12 hours, so a new token must be obtained for the AWS role for long-running containers. There are a number of ways to fix session issues for YARN:
1. The RM keeps track of the session and login secrets, and periodically injects the STS token into the container's running environment. (A nasty hack: it means modifying the environment variables of a running process.)
2. Transport the client's access key and secret key to the container, and have the container perform the re-login itself.
3. If the user's home directory contains ~/.aws/credentials on all nodes, this works without code change, but it is an operational nightmare.
4. Streamline token handling to use OIDC JWT tokens, so that client libraries always check with the OIDC server to keep the token fresh.

Options 1-3 might work with the existing s3a connector, with some modification to applications as well. Option 4 aims to modify the Hadoop libraries so that authentication and token renewal happen transparently. That would let existing applications work by swapping jar files only, with no further code modification. It would also improve security, because session expiration would be synchronized.

I am leaning toward addressing the fundamental problem, and I know the community has spent years of improvement to get to this point. However, Hadoop needs a way forward. This discussion helps determine whether it is essential to support OIDC as an alternate security mechanism, how to do it using existing code, and how not to break existing code.

regards,
Eric

On Thu, May 21, 2020 at 9:22 AM Steve Loughran <ste...@cloudera.com.invalid> wrote:

> On Wed, 6 May 2020 at 23:32, Eric Yang <eric...@gmail.com> wrote:
>
> > Hi all,
> >
> > 4. Passing different forms of tokens does not work well with cloud
> > provider security mechanisms. For example, passing an AWS STS token for
> > an S3 bucket. There is no renewal mechanism, nor a good way to identify
> > when the token would expire.
>
> Well, HADOOP-14556 does it fairly well, supporting session and role
> tokens. We even know when they expire, because we ask for a duration when
> we request the session/role creds.
> See org.apache.hadoop.fs.s3a.auth.delegation.AbstractS3ATokenIdentifier
> for the core of what we marshall, including encryption secrets.
>
> The main issue there is that YARN can't refresh those tokens, because a
> new triple of session credentials is required; currently, token renewal
> assumes the token is unchanged, and a request is made to the service to
> update its table of issued tokens. But even if the RM could get back a
> new token from a refresh call, we are left with the problem of "how to
> get an updated set of creds to each process".
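To make option 4 a bit more concrete: the client-side half of "keep the token fresh" is just reading the standard JWT `exp` claim and refreshing ahead of expiry. Below is a minimal, hypothetical sketch in Python (Hadoop's actual implementation would be Java); the JWT layout is from RFC 7519, but `needs_refresh` and its skew parameter are illustrative names, not an existing Hadoop or OIDC library API, and the actual refresh call to the OIDC token endpoint is deliberately left out.

```python
import base64
import json
import time


def jwt_expiry(token: str) -> int:
    """Return the 'exp' claim (Unix seconds) from a JWT, without verifying it."""
    payload_b64 = token.split(".")[1]
    # Restore base64url padding stripped by JWT encoding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return int(claims["exp"])


def needs_refresh(token: str, skew_seconds: int = 300, now: float = None) -> bool:
    """True when the token is within skew_seconds of expiry (or already expired)."""
    if now is None:
        now = time.time()
    return jwt_expiry(token) - now <= skew_seconds


# A long-running container would call needs_refresh() before each request and,
# when it returns True, fetch a fresh token from the OIDC token endpoint
# (e.g. a refresh_token or client_credentials grant) -- not shown here.
```

The point of checking `exp` locally is that, unlike an STS session triple, the client can tell when the credentials will die and act before they do, which is exactly the information Steve notes is missing from the plain-token renewal path.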