Sbaia opened a new issue, #60501:
URL: https://github.com/apache/doris/issues/60501

   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/doris/issues) 
and found no similar issues.
   
   ### Version
   
   4.0.3-rc01
   
   ### What's Wrong
   
   After upgrading from **4.0.2** to **4.0.3**, `CREATE STORAGE VAULT` with 
`s3.role_arn` returns **403 Forbidden** when the Doris BE runs on EKS with IRSA 
(IAM Roles for Service Accounts).
   
   The same configuration worked perfectly on 4.0.2.
   
   ### Root Cause
   
   The regression was introduced by PR #59082 (`[feat](catalog) Support passing 
credentials_provider_type to BE for S3 access`, author: @zy-kkk, merged by 
@morningman), cherry-picked to branch-4.0 via #59246.
   
   The refactoring in `be/src/util/s3_util.cpp` changed 
`_get_aws_credentials_provider_v2` to use 
`_create_credentials_provider(s3_conf.cred_provider_type)` as the base provider 
for the STS AssumeRole client.
   
   **Before (4.0.2):**
   ```cpp
   auto stsClient = std::make_shared<Aws::STS::STSClient>(
       std::make_shared<CustomAwsCredentialsProviderChain>(),
       clientConfiguration);
   ```
   
   `CustomAwsCredentialsProviderChain` tries providers in order:
   1. `STSAssumeRoleWebIdentityCredentialsProvider` ← **IRSA/EKS**
   2. `TaskRoleCredentialsProvider` ← ECS
   3. `InstanceProfileCredentialsProvider` ← EC2 IMDS
   4. Environment, Profile, Process, SSO, Anonymous
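   
   The ordered fallback described above can be illustrated with a minimal, 
   self-contained sketch. It does not use the AWS SDK: `Credentials`, 
   `Provider`, `make_provider`, and `resolve` are hypothetical names chosen for 
   illustration, and each provider's availability is simulated with a flag 
   rather than a real lookup.
   
   ```cpp
   #include <functional>
   #include <optional>
   #include <string>
   #include <vector>

   // Hypothetical stand-in for a credentials object; not the AWS SDK type.
   struct Credentials {
       std::string source;  // name of the provider that produced it
   };

   // A provider yields credentials when its source is reachable, else nothing.
   using Provider = std::function<std::optional<Credentials>()>;

   // Build a fake provider whose availability is fixed up front.
   Provider make_provider(const std::string& name, bool available) {
       return [name, available]() -> std::optional<Credentials> {
           if (!available) {
               return std::nullopt;
           }
           return Credentials{name};
       };
   }

   // Walk the chain in order and return the first credentials found.
   std::optional<Credentials> resolve(const std::vector<Provider>& chain) {
       for (const auto& provider : chain) {
           if (auto creds = provider()) {
               return creds;
           }
       }
       return std::nullopt;
   }
   ```
   
   With a chain of `make_provider("web_identity", true)` followed by 
   `make_provider("instance_profile", false)`, `resolve` returns the web 
   identity entry. A list holding only the unavailable instance-profile entry 
   resolves to nothing, which is the shape of the failure reported here. 
   Modelling IMDS as simply unavailable is a simplification made for the sketch.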
   
   **After (4.0.3):**
   ```cpp
   auto baseProvider = _create_credentials_provider(s3_conf.cred_provider_type);
   auto stsClient = std::make_shared<Aws::STS::STSClient>(baseProvider,
                                                          clientConfiguration);
   ```
   
   Since `cred_provider_type` is hardcoded to `INSTANCE_PROFILE` when 
`role_arn` is set (both in the FE at `S3Properties.java:634` and in the 
meta-service at `meta_service_resource.cpp:1173`), the new 
`_create_credentials_provider` resolves to **only** 
`InstanceProfileCredentialsProvider`.
   
   This means the STS client can **only** obtain base credentials from EC2 
IMDS. On EKS with IRSA, credentials are injected via 
`AWS_WEB_IDENTITY_TOKEN_FILE` and require 
`STSAssumeRoleWebIdentityCredentialsProvider`, which is no longer in the chain.
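   
   To confirm that a given BE process is actually running under IRSA, one 
   option is to check for the injected environment variables. The helper below 
   is a hypothetical diagnostic sketch, not Doris code. Besides 
   `AWS_WEB_IDENTITY_TOKEN_FILE`, it checks `AWS_ROLE_ARN`, which is not 
   mentioned above but is also set on IRSA-enabled pods by EKS.
   
   ```cpp
   #include <cstdlib>

   // Hypothetical helper: true when both IRSA variables are set and non-empty.
   bool irsa_env_present() {
       const char* token_file = std::getenv("AWS_WEB_IDENTITY_TOKEN_FILE");
       const char* role_arn = std::getenv("AWS_ROLE_ARN");  // assumption: set by EKS
       return token_file != nullptr && *token_file != '\0' &&
              role_arn != nullptr && *role_arn != '\0';
   }
   ```
   
   If this returns true inside the BE container while the vault still fails, 
   the base provider selection described above is the likely cause rather than 
   a missing ServiceAccount annotation.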
   
   ### How to Reproduce
   
   1. Deploy Doris 4.0.3 on EKS with a ServiceAccount that has an IAM role 
annotation (IRSA)
   2. The ServiceAccount's IAM role has permission to `sts:AssumeRole` into a 
target role
   3. Create a storage vault:
   ```sql
   CREATE STORAGE VAULT IF NOT EXISTS s3_vault PROPERTIES (
       "type" = "S3",
       "s3.endpoint" = "https://s3.eu-west-1.amazonaws.com",
       "s3.region" = "eu-west-1",
       "s3.bucket" = "my-bucket",
       "s3.root.path" = "data",
       "s3.role_arn" = "arn:aws:iam::123456789:role/my-target-role",
       "provider" = "S3",
       "use_path_style" = "false"
   );
   ```
   4. Any operation that uses the vault returns **403 Forbidden**
   
   ### Suggested Fix
   
   In `be/src/util/s3_util.cpp`, `_get_aws_credentials_provider_v2`, when 
`role_arn` is present, the base provider for the STS client should use 
`CustomAwsCredentialsProviderChain` (or `Default`) instead of the literal 
`InstanceProfileCredentialsProvider`, to restore the 4.0.2 behavior:
   
   ```cpp
   // Line ~375 in _get_aws_credentials_provider_v2
   // Current (broken):
   // auto baseProvider = _create_credentials_provider(s3_conf.cred_provider_type);
   
   // Suggested fix: use the full chain when the type is InstanceProfile.
   auto baseProvider =
       (s3_conf.cred_provider_type == CredProviderType::InstanceProfile)
           ? std::static_pointer_cast<Aws::Auth::AWSCredentialsProvider>(
                 std::make_shared<CustomAwsCredentialsProviderChain>())
           : _create_credentials_provider(s3_conf.cred_provider_type);
   ```
   
   This restores the full credentials chain for the STS client when assuming a 
role, supporting IRSA, ECS TaskRole, and EC2 IMDS transparently.
   
   Alternatively, a more complete fix could introduce a new `CredProviderType` 
(e.g., `AssumeRole`) distinct from `InstanceProfile`, so that the 
FE/meta-service can express "use AssumeRole with the default chain" instead of 
conflating it with `InstanceProfile`.
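   
   A rough sketch of how that alternative might look is shown below. It is 
   illustrative only: the enum is a toy mirror with just two members, the 
   `AssumeRole` member is the proposed addition, and `base_provider_for` is a 
   hypothetical helper that returns the provider's name as a string instead of 
   constructing a real SDK object.
   
   ```cpp
   #include <string>

   // Toy mirror of the provider-type enum; AssumeRole is the proposed member.
   enum class CredProviderType { InstanceProfile, AssumeRole };

   // Hypothetical helper: names the base provider the STS client would use.
   std::string base_provider_for(CredProviderType type) {
       switch (type) {
       case CredProviderType::AssumeRole:
           // Full chain, so IRSA, ECS task roles, and EC2 IMDS all work.
           return "CustomAwsCredentialsProviderChain";
       case CredProviderType::InstanceProfile:
           // Literal meaning preserved: EC2 IMDS only.
           return "InstanceProfileCredentialsProvider";
       }
       return "CustomAwsCredentialsProviderChain";  // unreachable fallback
   }
   ```
   
   The point of the split is that `InstanceProfile` keeps its literal meaning, 
   while the FE and meta-service would send `AssumeRole` whenever `role_arn` is 
   set.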
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Anything else?
   
   - This also affects **3.1.4** since PR #59082 was cherry-picked there as 
well (#59158).
   - The `_get_aws_credentials_provider_v1` path has the same issue but was not 
changed by this PR (it already used only `InstanceProfileCredentialsProvider`). 
The v1 path likely never worked with IRSA for AssumeRole.
   - There is no SQL-level workaround: the FE hardcodes `INSTANCE_PROFILE` when 
`role_arn` is present, and the meta-service validates this constraint.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

