morningman commented on PR #60498:
URL: https://github.com/apache/doris/pull/60498#issuecomment-4392537483

   # Improvement Suggestions for PR #60498
   
   **PR**: [\[feat\](Iceberg) Rest & S3Table Support 
Iam-role](https://github.com/apache/doris/pull/60498)
   **Review Date**: 2026-05-06
   
   ---
   
   The suggestions below are ordered by priority. The first three are worth 
landing before merge; the rest can ship as small follow-ups.
   
   ## 1. (medium) Validate region up front when an IAM role is configured
   
   `IcebergAwsAssumeRoleProperties.putAssumeRoleProperties` writes `aws.region` 
and `client.assume-role.region` directly from `s3Properties.getRegion()`. If 
the region is blank, the empty value is still written, and the failure surfaces 
later as an obscure AWS SDK error. Fail fast with a clear message instead:
   
   ```java
   public static void putAssumeRoleProperties(Map<String, String> target, 
S3Properties s3Properties) {
       if (StringUtils.isBlank(s3Properties.getS3IAMRole())) {
           return;
       }
       Preconditions.checkArgument(
               StringUtils.isNotBlank(s3Properties.getRegion()),
               "s3.region must be set when s3.role_arn is configured for 
Iceberg AWS clients");
       ...
   }
   ```
   
   Equivalently, the assertion can live in the `initNormalizeAndCheckProps` of 
`IcebergRestProperties` and `IcebergS3TablesMetaStoreProperties`, before the 
helper is invoked.
   
   ## 2. (medium) Reject the underscore variant of `iceberg.rest.external-id`
   
   `ICEBERG_REST_EXTERNAL_ID` is defined as `iceberg.rest.external-id` 
(hyphenated), but most other Doris properties β€” including `s3.external_id` β€” 
use underscores. Users will inevitably try `iceberg.rest.external_id`, and that 
key currently passes through silently because it is neither rejected nor 
recognized as an alias of `s3ExternalId`.
   
   Either reject both spellings:
   
   ```java
   private static final String ICEBERG_REST_EXTERNAL_ID = 
"iceberg.rest.external-id";
   private static final String ICEBERG_REST_EXTERNAL_ID_UNDERSCORE = 
"iceberg.rest.external_id";
   
   rejectUnsupportedAwsAssumeRoleProperty(ICEBERG_REST_EXTERNAL_ID);
   rejectUnsupportedAwsAssumeRoleProperty(ICEBERG_REST_EXTERNAL_ID_UNDERSCORE);
   ```
   
   …or pick one canonical naming style and document it explicitly. Right now 
`iceberg.rest.role_arn` (underscore) and `iceberg.rest.external-id` (hyphen) 
coexist, which is confusing.
   
   ## 3. (medium) Reject `s3.role_arn` when signing-name is neither `glue` nor 
`s3tables`
   
   If a user mistypes `iceberg.rest.signing-name` (for example `s3table` 
instead of `s3tables`), `shouldUseS3PropertiesForRestCredentials()` returns 
false, and any configured `s3.role_arn` is silently dropped. The catalog then 
falls back to the default provider chain, which is an extremely difficult 
misconfiguration to debug.
   
   Surface the conflict explicitly in `buildRules()`:
   
   ```java
   if (StringUtils.isNotBlank(origProps.get(S3Properties.ROLE_ARN))
           && !shouldUseS3PropertiesForRestCredentials()) {
       throw new IllegalArgumentException(
               "s3.role_arn is only supported when iceberg.rest.signing-name is 
"
                       + "'glue' or 's3tables'. Got signing-name='" + 
icebergRestSigningName + "'");
   }
   ```
   
   ## 4. (low) Use a class reference instead of a string literal in 
`getV2ClassName(mode)`
   
   The new single-argument overload returns a literal string for the `DEFAULT` 
case:
   
   ```java
   case DEFAULT:
       return 
"software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider";
   ```
   
   while every other branch uses `XxxClass.class.getName()`. The literal will 
silently rot if the AWS SDK ever moves the class. Prefer:
   
   ```java
   case DEFAULT:
       return 
software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider.class.getName();
   ```
   
   This is consistent with the rest of the file and gets compile-time 
validation.
   
   ## 5. (low) Rename `putS3FileIOCredentialProperties` or split it
   
   The method name advertises "S3 FileIO credentials," but it actually does two 
things: it writes the standard `S3FileIOProperties.*` keys *and* layers 
AssumeRole / provider-chain configuration on top via the internal 
`CredentialType` switch. Either rename it to something like 
`populateS3FileIOAndCredentials`, or split it into `populateS3FileIO` + 
`populateCredentials` so callers that need only the FileIO portion can reuse it 
without inheriting the credential-resolution logic.
   
   ## 6. (low) Widen the try/catch in 
`IcebergS3TablesMetaStoreProperties.initCatalog`
   
   The new `S3TablesClient client = buildS3TablesClient(catalogProps);` line 
lives outside the existing try block, so an exception from there (`Region.of` 
rejecting an invalid region, `HttpClientProperties` parse errors, etc.) 
bypasses the helpful "Failed to initialize S3TablesCatalog" wrapper. Move the 
assignment inside the try:
   
   ```java
   try {
       S3TablesClient client = buildS3TablesClient(catalogProps);
       S3TablesCatalog catalog = new S3TablesCatalog();
       catalog.initialize(catalogName, catalogProps, client);
       return catalog;
   } catch (Exception e) {
       throw new RuntimeException("Failed to initialize S3TablesCatalog for 
Iceberg. ...", e);
   }
   ```
   
   ## 7. (low) Replace `assertTrue(false)` with `Assertions.fail` in the 
regression test
   
   `test_iceberg_s3tables_catalog_credentials_provider.groovy` uses 
`assertTrue(false)` as a "should have thrown" sentinel. When the test does 
fail, the message is just `expected true got false`, which is hard to 
interpret. Use:
   
   ```groovy
   try {
       sql """show databases;"""
       Assertions.fail("Expected exception when web identity is missing")
   } catch (Exception e) {
       assertTrue(e.getMessage().contains(expectedMessage))
   }
   ```
   
   ## 8. (suggested) Add the missing unit tests
   
   The current FE coverage is strong but leaves a few gaps:
   
   - **Session token round-trip**: add a `testGlueWithSessionToken` (or 
`s3tables` equivalent) in `IcebergRestPropertiesTest` that sets 
`iceberg.rest.session-token` and asserts the final `rest.session-token` value 
in the catalog properties.
   - **REST + IAM Role via `S3Properties`**: add 
`testGlueWithIamRoleViaS3Properties` that sets `s3.role_arn` (and optionally 
`s3.external_id`) with `signing-name=glue`, then asserts the catalog properties 
contain `client.assume-role.arn`, `client.assume-role.external-id`, and 
`client.factory=AssumeRoleAwsClientFactory`.
   - **`AbstractIcebergProperties.toS3FileIOProperties` AssumeRole branch**: 
build a storage list containing a single `S3Properties` with `s3.role_arn` set, 
call `toFileIOProperties`, and assert the resulting map contains 
`client.assume-role.arn` and friends.
   
   ## 9. (suggested) Expand the PR description / release note
   
   The current PR description is one line ("FYI #59893"). Reviewers and 
downstream users will benefit from explicit notes on:
   
   - **New properties**: `iceberg.rest.session-token`, 
`iceberg.rest.credentials_provider_type` (the latter exists since #59893 but 
its behavior changes here).
   - **Behavior changes**: REST `signing-name=glue` no longer requires 
`iceberg.rest.access-key-id` and `iceberg.rest.secret-access-key`; 
`s3.role_arn` and `s3.external_id` now drive STS AssumeRole; 
`iceberg.rest.role_arn` and `iceberg.rest.external-id` are now explicitly 
rejected.
   - **Removal**: 
`org.apache.doris.datasource.iceberg.s3tables.CustomAwsCredentialsProvider` is 
deleted. It was an internal helper, but the removal should still be called out 
in case anyone depends on it.
   
   ---
   
   ## Overall
   
   The direction, abstraction, and tests are all above average. The main 
remaining work is making error paths louder and more specific (items 1–3 
above): validate region, reject the underscore variant of `external-id`, and 
refuse `s3.role_arn` when the signing-name is misconfigured. The remaining 
items are non-blocking polish and can ship as follow-ups.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to