XJDKC commented on code in PR #1506:
URL: https://github.com/apache/polaris/pull/1506#discussion_r2070930235


##########
spec/polaris-management-service.yml:
##########
@@ -938,6 +940,34 @@ components:
           format: password
           description: Bearer token (input-only)
 
+    SigV4AuthenticationParameters:
+      type: object
+      description: AWS Signature Version 4 authentication
+      allOf:
+        - $ref: '#/components/schemas/AuthenticationParameters'
+      properties:
+        roleArn:
+          type: string
+          description: The AWS IAM role ARN to assume when signing requests

Review Comment:
   Yes, this assumes the use of STS. While SigV4 can technically work with just 
a keyID/keySecret pair, that's not how it works in Polaris.
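
For reference, here is a minimal Python sketch of the SigV4 signing-key derivation (the HMAC-SHA256 chain over date, region, and service). It shows why a key ID and secret alone are enough to sign a request, and that STS-issued temporary credentials sign the same way but also have to send a session token (`X-Amz-Security-Token`). The function names are hypothetical, building the canonical request and `string_to_sign` is left to the caller, and this is not Polaris code.

```python
import hashlib
import hmac
from typing import Dict, Optional


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    # One step of the SigV4 key-derivation chain.
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_access_key: str, date_stamp: str,
                       region: str, service: str) -> bytes:
    # date_stamp is YYYYMMDD. The secret itself is never sent over the wire;
    # only a signature derived from it is.
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def signing_headers(access_key_id: str, secret_access_key: str,
                    date_stamp: str, amz_date: str, region: str, service: str,
                    signed_headers: str, string_to_sign: str,
                    session_token: Optional[str] = None) -> Dict[str, str]:
    # Hypothetical helper: string_to_sign must already be built by the caller
    # (canonical request hashing is omitted from this sketch).
    key = derive_signing_key(secret_access_key, date_stamp, region, service)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    headers = {
        "X-Amz-Date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key_id}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }
    if session_token is not None:
        # Temporary (STS) credentials: the session token travels with the request.
        headers["X-Amz-Security-Token"] = session_token
    return headers
```

With a long-lived key pair, `session_token` stays `None`; with credentials obtained from STS, the same signing code is used and the token is attached.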
   
   Let me break it down a bit:
   
   Polaris acts as the service provider, and it has its own IAM user with an 
AWS credential (key ID and key secret). However, this IAM user is owned by the 
Polaris service, not by the Polaris user. **Since IAM doesn't allow one AWS 
account to grant privileges directly to an IAM user belonging to another AWS 
account**, Polaris wouldn't be able to access the Polaris user's Glue catalog 
that way.
   
   So how does Polaris access a user's Glue catalog? By assuming the IAM role 
provided by the Polaris user. This lets the Polaris service temporarily inherit 
the permissions tied to that role, essentially gaining the necessary access 
without the user having to expose long-lived credentials.
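
A rough sketch of that flow, assuming a boto3-style STS client (`boto3.client("sts")`); Polaris itself is written in Java, so this only illustrates the shape of the exchange. `build_trust_policy` shows the trust policy the user would attach to their role so the Polaris principal may assume it, and `assume_user_role` trades Polaris's own long-lived key for short-lived credentials. Both helper names and any ARNs are hypothetical. For a cross-account call, the Polaris IAM user also needs an identity policy that allows `sts:AssumeRole` on that role.

```python
import json
from typing import Any, Dict


def build_trust_policy(polaris_principal_arn: str) -> str:
    # Trust policy on the role in the *user's* account. It names the Polaris
    # service principal (hypothetical ARN) as allowed to call sts:AssumeRole.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": polaris_principal_arn},
            "Action": "sts:AssumeRole",
        }],
    }
    return json.dumps(policy)


def assume_user_role(sts_client: Any, role_arn: str, session_name: str,
                     duration_seconds: int = 3600) -> Dict[str, Any]:
    # sts_client is assumed to behave like boto3.client("sts"). The call is
    # signed with Polaris's own key; AWS checks the role's trust policy
    # before issuing credentials.
    resp = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    creds = resp["Credentials"]
    # Short-lived credentials: the session token must accompany every request.
    return {
        "access_key_id": creds["AccessKeyId"],
        "secret_access_key": creds["SecretAccessKey"],
        "session_token": creds["SessionToken"],
        "expiration": creds["Expiration"],
    }
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable with a stub and leaves region and credential configuration to the caller.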
   
   Now, why are there S3 implementations that don't use STS?
   That's very common on the query engine side. In that case, the query engine 
is fully managed by the Polaris user themselves. They can create an IAM user with 
an access key pair, grant that IAM user privileges on their storage, and configure 
their engine to use that credential directly. No need for STS in that scenario.
   
   So here's the key difference:
   1. AWS user credentials (key ID/secret) are long-lived and tied to a user. 
IAM roles have no long-lived credentials of their own; they're essentially a set 
of permissions. To access resources, Polaris assumes a role and obtains 
short-lived temporary credentials via STS. **This is much more secure.**
   2. In the query engine use case, both the engine and the storage are owned 
by the same identity (the Polaris user), so they're free to use long-lived user 
credentials if they want. But the Polaris service provider and Polaris users 
are two different parties.
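
To make point 1 concrete, a toy model of the two credential kinds. The `AwsCredentials` class is hypothetical and not an AWS SDK type: long-lived IAM user keys carry no built-in expiry, while STS credentials carry a session token and an expiration time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class AwsCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: Optional[str] = None    # set only for STS-issued credentials
    expiration: Optional[datetime] = None  # None: long-lived IAM user key

    @property
    def is_temporary(self) -> bool:
        return self.session_token is not None

    def expired(self, now: datetime) -> bool:
        # Long-lived keys never expire on their own (they must be rotated or
        # deleted); STS credentials stop working at their expiration time.
        return self.expiration is not None and now >= self.expiration
```

A leaked long-lived key stays usable until someone revokes it, whereas a leaked STS credential stops working once `expiration` passes.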
   
   
   The IAM role assumption model is actually consistent with how Polaris 
accesses S3 storage in general. In Polaris S3 storage configs, users provide an 
IAM role, Polaris assumes that role, and gets subscoped temporary credentials 
via STS to access the storage.
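
As a sketch of what "subscoped" can mean: `AssumeRole` accepts an optional inline session policy (its `Policy` parameter), and the resulting credentials are limited to the intersection of the role's identity-based policies and that session policy. The helper below builds such a policy for a single table location. The function name, bucket, and prefix are hypothetical, and the exact policy Polaris generates may differ.

```python
import json
from typing import List


def s3_session_policy(bucket: str, prefix: str, allow_write: bool = False) -> str:
    # Hypothetical helper: builds a session policy JSON string that could be
    # passed as AssumeRole's Policy parameter to narrow the role's access.
    prefix = prefix.strip("/")
    key_pattern = f"{prefix}/*" if prefix else "*"
    actions: List[str] = ["s3:GetObject"]
    if allow_write:
        actions += ["s3:PutObject", "s3:DeleteObject"]
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Object access only under the table's location.
                "Effect": "Allow",
                "Action": actions,
                "Resource": f"arn:aws:s3:::{bucket}/{key_pattern}",
            },
            {   # Listing restricted to the same prefix.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": key_pattern}},
            },
        ],
    }
    return json.dumps(policy)
```

Even if the role itself can read the whole bucket, credentials issued with this session policy can only touch the given prefix.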
   
   AWS Glue also recommends using IAM roles to manage access:
   https://docs.aws.amazon.com/glue/latest/dg/configure-iam-for-glue.html
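
For the Glue case, the permissions attached to the user's role might look like the read-only policy sketched below. This is an illustrative assumption, not a policy taken from this PR: Glue checks access on the catalog, the database, and the table, so all three ARNs are listed. The helper name is hypothetical.

```python
import json


def glue_read_policy(region: str, account_id: str, database: str) -> str:
    # Hypothetical helper: read-only Glue access to one database, intended as
    # the permissions policy on the role that Polaris assumes.
    base = f"arn:aws:glue:{region}:{account_id}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetTables"],
            "Resource": [
                f"{base}:catalog",                  # the account's Data Catalog
                f"{base}:database/{database}",      # the database itself
                f"{base}:table/{database}/*",       # every table in it
            ],
        }],
    }
    return json.dumps(policy)
```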



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
