Re: [PR] feat: AWS Cross-Account IAM Authentication for Aurora [superset]

via GitHub Fri, 30 Jan 2026 14:00:18 -0800


codeant-ai-for-open-source[bot] commented on code in PR #37585:
URL: https://github.com/apache/superset/pull/37585#discussion_r2748138059



##########
superset/db_engine_specs/aws_iam.py:
##########
@@ -0,0 +1,632 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+AWS IAM Authentication Mixin for database engine specs.
+
+This mixin provides cross-account IAM authentication support for AWS databases
+(Aurora PostgreSQL, Aurora MySQL, Redshift). It handles:
+- Assuming IAM roles via STS AssumeRole
+- Generating RDS IAM auth tokens
+- Generating Redshift Serverless credentials
+- Configuring SSL (required for IAM auth)
+- Caching STS credentials to reduce API calls
+"""
+
+from __future__ import annotations
+
+import logging
+import threading
+from typing import Any, TYPE_CHECKING, TypedDict
+
+from cachetools import TTLCache
+
+from superset.databases.utils import make_url_safe
+from superset.errors import ErrorLevel, SupersetError, SupersetErrorType
+from superset.exceptions import SupersetSecurityException
+
+if TYPE_CHECKING:
+    from superset.models.core import Database
+
+logger = logging.getLogger(__name__)
+
+# Default session duration for STS AssumeRole (1 hour)
+DEFAULT_SESSION_DURATION = 3600
+
+# Default ports
+DEFAULT_POSTGRES_PORT = 5432
+DEFAULT_MYSQL_PORT = 3306
+DEFAULT_REDSHIFT_PORT = 5439
+
+# Cache STS credentials: key = (role_arn, region, external_id), TTL = 50 min
+_credentials_cache: TTLCache[tuple[str, str, str | None], dict[str, Any]] = 
TTLCache(
+    maxsize=100, ttl=3000

Review Comment:
   **Suggestion:** The STS credentials cache uses a fixed 50‑minute TTL (3000 
seconds) regardless of the configured `session_duration`, so if a shorter 
session duration (for example 900 seconds, as used in tests and allowed by the 
API) is configured, expired STS credentials will still be pulled from the cache 
and reused, causing intermittent authentication failures until the cache entry 
ages out; the cache TTL should not exceed the minimum session duration the code 
intends to support, so reducing the TTL to match a conservative lower bound 
(e.g. 900 seconds) prevents reuse of expired credentials. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ IAM-auth DB connections intermittently fail.
   - ❌ RDS token generation errors in auth flow.
   - ⚠️ Cross-account Redshift credential refresh fails.
   - ⚠️ Connection retries hit expired cached credentials.
   ```
   </details>
   
   ```suggestion
   # Cache STS credentials: key = (role_arn, region, external_id), TTL = 15 min
   # Using a TTL no longer than the minimum supported session duration avoids
   # reusing expired STS credentials when a shorter session_duration is 
configured.
   _credentials_cache: TTLCache[tuple[str, str, str | None], dict[str, Any]] = 
TTLCache(
       maxsize=100, ttl=900
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a Database encrypted_extra to enable AWS IAM with a short 
session_duration
   (e.g., 900s) — the code reads this in _apply_iam_authentication at
   `superset/db_engine_specs/aws_iam.py:416-419` where session_duration =
   iam_config.get("session_duration", DEFAULT_SESSION_DURATION).
   
   2. Trigger a connection attempt that uses IAM auth (any code path that calls
   `_apply_iam_authentication` — see calls in engine specs; the mixin's method 
is defined at
   `superset/db_engine_specs/aws_iam.py:388-496`). During that call, 
`get_iam_credentials()`
   is invoked at `superset/db_engine_specs/aws_iam.py:466-472` (call site).
   
   3. On first call, `get_iam_credentials()` assumes the role and stores the 
returned
   credentials in the module-level TTLCache defined at
   `superset/db_engine_specs/aws_iam.py:54-57` (cache key computed at `:142`). 
The cache TTL
   is fixed to 3000 seconds (50 minutes).
   
   4. If the configured session_duration is shorter (900s), the STS credentials 
will expire
   after 900s, but the cache entry will remain valid for 3000s. Subsequent 
connection
   attempts that occur after the STS credentials expired but before the 3000s 
cache TTL
   elapses will return the cached (expired) credentials from 
`_credentials_cache` at
   `superset/db_engine_specs/aws_iam.py:144-147`. These expired credentials 
will be used to
   generate RDS/Redshift tokens (e.g., `generate_rds_auth_token` at `:474-482` 
or
   `generate_redshift_cluster_credentials` at `:616-622`), triggering 
authentication failures
   from AWS APIs.
   
   5. Observe intermittent failures: calls to 
`rds_client.generate_db_auth_token()`
   (`:257-262`) or Redshift credential APIs (`:371-377` / `:312-316`) will fail 
with expired
   token errors until the cache entry ages out after 3000s. This reproduces the 
bug
   deterministically when session_duration < 3000s and multiple connection 
attempts span the
   credential expiry window.
   
   6. Note: the cache insertion/lookup is protected by `_credentials_lock` 
(`:58`, used at
   `:144-147` and `:177-179`), so race conditions are mitigated, but the core 
logic reuses
   expired credentials due to the TTL mismatch.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/aws_iam.py
   **Line:** 54:56
   **Comment:**
        *Logic Error: The STS credentials cache uses a fixed 50‑minute TTL 
(3000 seconds) regardless of the configured `session_duration`, so if a shorter 
session duration (for example 900 seconds, as used in tests and allowed by the 
API) is configured, expired STS credentials will still be pulled from the cache 
and reused, causing intermittent authentication failures until the cache entry 
ages out; the cache TTL should not exceed the minimum session duration the code 
intends to support, so reducing the TTL to match a conservative lower bound 
(e.g. 900 seconds) prevents reuse of expired credentials.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/mysql.py:
##########
@@ -294,6 +303,49 @@ class MySQLEngineSpec(BasicParametersMixin, 
BaseEngineSpec):
         "mysqlconnector": {"allow_local_infile": 0},
     }
 
+    # Sensitive fields that should be masked in encrypted_extra
+    encrypted_extra_sensitive_fields = {
+        "$.aws_iam.external_id",
+        "$.aws_iam.role_arn",
+    }

Review Comment:
   **Suggestion:** Overriding `encrypted_extra_sensitive_fields` to only 
`$.aws_iam.external_id` and `$.aws_iam.role_arn` removes the BaseEngineSpec 
default of masking all `encrypted_extra` fields (`$.*`), so any other secrets 
stored there (e.g. credentials or tokens) will now be rendered in clear text in 
the UI, which is a regression in secrecy handling. To avoid exposing unrelated 
sensitive values, this class should extend the base sensitive-field set with 
the IAM-specific paths rather than replacing it outright. [security]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ UI may display non-IAM secrets in DB config.
   - ⚠️ Admins could accidentally expose credentials.
   - ⚠️ Compliance auditors may flag secret leakage.
   ```
   </details>
   
   ```suggestion
       encrypted_extra_sensitive_fields = 
BaseEngineSpec.encrypted_extra_sensitive_fields.union(
           {
               "$.aws_iam.external_id",
               "$.aws_iam.role_arn",
           }
       )
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Inspect the class-level constant in the PR:
   MySQLEngineSpec.encrypted_extra_sensitive_fields is set explicitly at
   superset/db_engine_specs/mysql.py:306-310.
   
   2. Verify BaseEngineSpec provides a default mask set (in upstream codebase
   BaseEngineSpec.encrypted_extra_sensitive_fields typically includes a general 
"$.*" or
   other defaults). By replacing the attribute rather than extending it, the 
MySQL spec will
   no longer inherit those defaults.
   
   3. Reproduce in the Admin UI: create a Database with encrypted_extra 
containing other
   secrets (e.g., "$.secret_token" or "$.password") and open the DB 
configuration or test
   connection. Because the MySQL class overwrote the mask set, those non-IAM 
secrets may
   render in cleartext in UI views that use encrypted_extra_sensitive_fields to 
redact
   values.
   
   4. Observe sensitive values shown where previously the base mask would have 
hidden them.
   This is a regression in secrecy handling.
   
   Note: The improved change unions the BaseEngineSpec default set with 
IAM-specific keys to
   preserve prior masking behavior.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/mysql.py
   **Line:** 307:310
   **Comment:**
        *Security: Overriding `encrypted_extra_sensitive_fields` to only 
`$.aws_iam.external_id` and `$.aws_iam.role_arn` removes the BaseEngineSpec 
default of masking all `encrypted_extra` fields (`$.*`), so any other secrets 
stored there (e.g. credentials or tokens) will now be rendered in clear text in 
the UI, which is a regression in secrecy handling. To avoid exposing unrelated 
sensitive values, this class should extend the base sensitive-field set with 
the IAM-specific paths rather than replacing it outright.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/postgres.py:
##########
@@ -461,6 +467,43 @@ def adjust_engine_params(
 
         return uri, connect_args
 
+    @staticmethod
+    def update_params_from_encrypted_extra(
+        database: Database,
+        params: dict[str, Any],
+    ) -> None:
+        """
+        Extract sensitive parameters from encrypted_extra.
+
+        Handles AWS IAM authentication if configured, then merges any
+        remaining encrypted_extra keys into params (standard behavior).
+        """
+        if not database.encrypted_extra:
+            return
+
+        try:
+            encrypted_extra = json.loads(database.encrypted_extra)
+        except json.JSONDecodeError as ex:
+            logger.error(ex, exc_info=True)
+            raise
+
+        # Handle AWS IAM auth: pop the key so it doesn't reach create_engine()
+        iam_config = encrypted_extra.pop("aws_iam", None)
+        if iam_config and iam_config.get("enabled"):
+            from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin
+
+            AWSIAMAuthMixin._apply_iam_authentication(
+                database,
+                params,
+                iam_config,
+                ssl_args={"sslmode": "require"},
+                default_port=5432,
+            )

Review Comment:
   **Suggestion:** When AWS IAM auth is enabled, this code always forces 
`sslmode="require"` and overwrites any stricter SSL mode (like `verify-full`) 
that may have been configured earlier (e.g. via a server certificate), silently 
downgrading TLS verification and weakening security for Postgres connections 
using IAM. [security]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ TLS verification downgraded for IAM Postgres connections.
   - ⚠️ Server certificate verification bypassed for verified databases.
   - ⚠️ Affects engine creation path in PostgresEngineSpec.
   ```
   </details>
   
   ```suggestion
               # Preserve a stricter existing sslmode (e.g. verify-full) if 
present
               connect_args = params.get("connect_args") or {}
               previous_sslmode = connect_args.get("sslmode")
   
               AWSIAMAuthMixin._apply_iam_authentication(
                   database,
                   params,
                   iam_config,
                   ssl_args={"sslmode": "require"},
                   default_port=5432,
               )
   
               if previous_sslmode == "verify-full":
                   params.setdefault("connect_args", {})["sslmode"] = 
previous_sslmode
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Create a Database in Superset with encrypted_extra containing:
   
      - "aws_iam": {"enabled": true, ...}
   
      - and a server certificate set on the Database (Database.server_cert 
populated).
   
      This configuration is stored on the Database object and later passed to 
engine creation
      code.
   
   2. When Superset builds engine params,
   PostgresEngineSpec.update_params_from_encrypted_extra()
   
      is executed (see superset/db_engine_specs/postgres.py). The function 
starting at
   
      superset/db_engine_specs/postgres.py: update_params_from_encrypted_extra 
(lines
      ~470-520)
   
      reaches the section at superset/db_engine_specs/postgres.py:490 which 
contains the
      shown code.
   
   3. Before this IAM branch, Database.server_cert handling sets
   connect_args["sslmode"]="verify-full"
   
      (see get_extra_params in the same file which sets sslmode to 
"verify-full" when
      server_cert exists).
   
   4. The current code calls AWSIAMAuthMixin._apply_iam_authentication(...,
   ssl_args={"sslmode": "require"}, ...)
   
      at superset/db_engine_specs/postgres.py:495-501. That call will result in
      params/connect_args having sslmode="require",
   
      effectively overwriting the previously-set "verify-full". After this 
function returns,
      the final params
   
      passed to SQLAlchemy will have sslmode=require, downgrading TLS 
verification.
   
   5. Reproduce by adding a Database with a valid server_cert and aws_iam 
enabled, then
   attempt to connect.
   
      Observe that the connection uses sslmode=require (no full certificate 
verification)
      instead of verify-full,
   
      which weakens TLS verification compared to the configured server 
certificate.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/postgres.py
   **Line:** 495:501
   **Comment:**
        *Security: When AWS IAM auth is enabled, this code always forces 
`sslmode="require"` and overwrites any stricter SSL mode (like `verify-full`) 
that may have been configured earlier (e.g. via a server certificate), silently 
downgrading TLS verification and weakening security for Postgres connections 
using IAM.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/redshift.py:
##########
@@ -201,6 +202,45 @@ def normalize_table_name_for_upload(
             schema_name.lower() if schema_name else None,
         )
 
+    # Sensitive fields that should be masked in encrypted_extra
+    encrypted_extra_sensitive_fields = {
+        "$.aws_iam.external_id",
+        "$.aws_iam.role_arn",
+    }

Review Comment:
   **Suggestion:** Overriding the default `encrypted_extra_sensitive_fields` to 
only mask the AWS IAM fields narrows the masking scope and can cause other 
secrets stored in `encrypted_extra` (which used to be fully masked by the base 
class) to be displayed in the UI, leaking sensitive configuration values; 
instead, extend the base/default sensitive set rather than replacing it. 
[security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Database edit UI may show non-AWS secrets.
   - ❌ Other Redshift DB configs may leak credentials.
   - ⚠️ Tests only assert presence of AWS keys, not full masking.
   - ⚠️ Secret redaction behavior differs from other engines.
   ```
   </details>
   
   ```suggestion
       encrypted_extra_sensitive_fields = 
BasicParametersMixin.encrypted_extra_sensitive_fields.union(
           {"$.aws_iam.external_id", "$.aws_iam.role_arn"}
       )
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Observe BaseEngineSpec default masking: open file 
`superset/db_engine_specs/base.py`
   and note `encrypted_extra_sensitive_fields = {"$.*"}` at line 533 (shows 
by-default
   everything in encrypted_extra is masked).
   
   2. Observe Redshift override: open `superset/db_engine_specs/redshift.py` 
and see the new
   class attribute at lines 205-209 where `encrypted_extra_sensitive_fields` is 
set to only
   `$.aws_iam.external_id` and `$.aws_iam.role_arn`.
   
   3. Locate code paths that apply masking/unmasking: inspect
   `superset/db_engine_specs/base.py` around lines ~2383-2419 where the 
class-level
   `encrypted_extra_sensitive_fields` is used to redact/reveal encrypted_extra 
when rendering
   or saving database configuration (the code references
   `cls.encrypted_extra_sensitive_fields` in those lines).
   
   4. Concrete reproduction:
   
      - Create or update a Database row whose `encrypted_extra` JSON contains 
secrets that
      are not under `$.aws_iam.*` (for example `{"connect_args": 
{"credentials": "SECRET"},
      "password": "SECRET_PW"}`).
   
      - Open the Database edit UI (the UI code path that calls the backend to 
reveal/redact
      `encrypted_extra` triggers the BaseEngineSpec masking logic referenced at
      `superset/db_engine_specs/base.py:2383-2419`).
   
      - Because RedshiftEngineSpec now overrides 
`encrypted_extra_sensitive_fields` to only
      AWS IAM JSON paths (lines 205-209), the base default `"$.*"` masking no 
longer applies
      for Redshift databases, so other secret fields (e.g., 
`$.connect_args.credentials`,
      `$.password`) will not be masked and may be shown in the UI or logs where 
the
      redact/reveal routines expect mask patterns.
   
   5. Verify tests and expectations: unit tests in
   `tests/unit_tests/db_engine_specs/test_redshift_iam.py` (referenced in 
repository search)
   expect the AWS IAM fields to be present in
   `RedshiftEngineSpec.encrypted_extra_sensitive_fields`. Those tests do not 
assert that
   overriding preserves base masking — thus the change can silently remove 
global masking
   behavior for Redshift without breaking those tests.
   
   6. Explanation why this is a real issue: BaseEngineSpec intentionally 
defaulted to `"$.*"`
   (mask everything) at `base.py:533`. Replacing that with a small set in
   `redshift.py:205-209` narrows masking scope and changes behavior on a 
reachable code path
   (UI/database edit and any code that relies on 
`cls.encrypted_extra_sensitive_fields` at
   `base.py:2383-2419`), causing potential secret exposure for any 
encrypted_extra keys that
   previously were masked by the base default.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/redshift.py
   **Line:** 206:209
   **Comment:**
        *Security: Overriding the default `encrypted_extra_sensitive_fields` to 
only mask the AWS IAM fields narrows the masking scope and can cause other 
secrets stored in `encrypted_extra` (which used to be fully masked by the base 
class) to be displayed in the UI, leaking sensitive configuration values; 
instead, extend the base/default sensitive set rather than replacing it.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
tests/unit_tests/db_engine_specs/test_aws_iam.py:
##########
@@ -0,0 +1,1002 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# pylint: disable=import-outside-toplevel, protected-access
+
+from __future__ import annotations
+
+from typing import Any
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+from superset.exceptions import SupersetSecurityException
+
+
+def test_get_iam_credentials_success() -> None:
+    from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin
+
+    mock_credentials = {
+        "AccessKeyId": "ASIA...",
+        "SecretAccessKey": "secret...",
+        "SessionToken": "token...",
+        "Expiration": "2025-01-01T00:00:00Z",
+    }
+
+    with patch("boto3.client") as mock_boto3_client:
+        mock_sts = MagicMock()
+        mock_sts.assume_role.return_value = {"Credentials": mock_credentials}
+        mock_boto3_client.return_value = mock_sts
+
+        credentials = AWSIAMAuthMixin.get_iam_credentials(
+            role_arn="arn:aws:iam::123456789012:role/TestRole",
+            region="us-east-1",
+        )
+
+        assert credentials == mock_credentials
+        mock_boto3_client.assert_called_once_with("sts", 
region_name="us-east-1")
+        mock_sts.assume_role.assert_called_once_with(
+            RoleArn="arn:aws:iam::123456789012:role/TestRole",
+            RoleSessionName="superset-iam-session",
+            DurationSeconds=3600,
+        )
+
+
+def test_get_iam_credentials_with_external_id() -> None:
+    from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin
+
+    mock_credentials = {
+        "AccessKeyId": "ASIA...",
+        "SecretAccessKey": "secret...",
+        "SessionToken": "token...",
+    }
+
+    with patch("boto3.client") as mock_boto3_client:
+        mock_sts = MagicMock()
+        mock_sts.assume_role.return_value = {"Credentials": mock_credentials}
+        mock_boto3_client.return_value = mock_sts
+
+        credentials = AWSIAMAuthMixin.get_iam_credentials(
+            role_arn="arn:aws:iam::123456789012:role/TestRole",
+            region="us-west-2",
+            external_id="external-id-12345",
+            session_duration=900,
+        )
+
+        assert credentials == mock_credentials
+        mock_sts.assume_role.assert_called_once_with(
+            RoleArn="arn:aws:iam::123456789012:role/TestRole",
+            RoleSessionName="superset-iam-session",
+            DurationSeconds=900,
+            ExternalId="external-id-12345",
+        )
+
+
+def test_get_iam_credentials_access_denied() -> None:
+    from botocore.exceptions import ClientError
+
+    from superset.db_engine_specs.aws_iam import (
+        _credentials_cache,
+        _credentials_lock,
+        AWSIAMAuthMixin,
+    )
+
+    with _credentials_lock:
+        _credentials_cache.clear()
+
+    with patch("boto3.client") as mock_boto3_client:
+        mock_sts = MagicMock()
+        mock_sts.assume_role.side_effect = ClientError(
+            {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}},
+            "AssumeRole",
+        )
+        mock_boto3_client.return_value = mock_sts
+
+        with pytest.raises(SupersetSecurityException) as exc_info:
+            AWSIAMAuthMixin.get_iam_credentials(
+                role_arn="arn:aws:iam::123456789012:role/TestRole",
+                region="us-east-1",
+            )
+
+        assert "Unable to assume IAM role" in str(exc_info.value)
+
+
+def test_get_iam_credentials_external_id_mismatch() -> None:
+    from botocore.exceptions import ClientError
+
+    from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin
+
+    with patch("boto3.client") as mock_boto3_client:
+        mock_sts = MagicMock()
+        mock_sts.assume_role.side_effect = ClientError(
+            {
+                "Error": {
+                    "Code": "AccessDenied",
+                    "Message": "The external id does not match",
+                }
+            },
+            "AssumeRole",
+        )
+        mock_boto3_client.return_value = mock_sts
+
+        with pytest.raises(SupersetSecurityException) as exc_info:
+            AWSIAMAuthMixin.get_iam_credentials(
+                role_arn="arn:aws:iam::123456789012:role/TestRole",
+                region="us-east-1",
+                external_id="wrong-id",
+            )
+
+        assert "External ID mismatch" in str(exc_info.value)
+
+
+def test_generate_rds_auth_token() -> None:
+    from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin
+
+    credentials = {
+        "AccessKeyId": "ASIA...",
+        "SecretAccessKey": "secret...",
+        "SessionToken": "token...",
+    }
+
+    with patch("boto3.client") as mock_boto3_client:
+        mock_rds = MagicMock()
+        mock_rds.generate_db_auth_token.return_value = "iam-token-12345"
+        mock_boto3_client.return_value = mock_rds
+
+        token = AWSIAMAuthMixin.generate_rds_auth_token(
+            credentials=credentials,
+            hostname="mydb.cluster-xyz.us-east-1.rds.amazonaws.com",
+            port=5432,
+            username="superset_user",
+            region="us-east-1",
+        )
+
+        assert token == "iam-token-12345"  # noqa: S105
+        mock_boto3_client.assert_called_once_with(
+            "rds",
+            region_name="us-east-1",
+            aws_access_key_id="ASIA...",
+            aws_secret_access_key="secret...",  # noqa: S106
+            aws_session_token="token...",  # noqa: S106
+        )
+        mock_rds.generate_db_auth_token.assert_called_once_with(
+            DBHostname="mydb.cluster-xyz.us-east-1.rds.amazonaws.com",
+            Port=5432,
+            DBUsername="superset_user",
+        )
+
+
+def test_apply_iam_authentication() -> None:
+    from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig
+
+    mock_database = MagicMock()
+    mock_database.sqlalchemy_uri_decrypted = (
+        
"postgresql://[email protected]:5432/mydb"
+    )
+
+    iam_config: AWSIAMConfig = {
+        "enabled": True,
+        "role_arn": "arn:aws:iam::123456789012:role/TestRole",
+        "region": "us-east-1",
+        "db_username": "superset_iam_user",
+    }
+
+    params: dict[str, Any] = {}
+
+    with (
+        patch.object(
+            AWSIAMAuthMixin,
+            "get_iam_credentials",
+            return_value={
+                "AccessKeyId": "ASIA...",
+                "SecretAccessKey": "secret...",
+                "SessionToken": "token...",
+            },
+        ) as mock_get_creds,
+        patch.object(
+            AWSIAMAuthMixin,
+            "generate_rds_auth_token",
+            return_value="iam-auth-token",
+        ) as mock_gen_token,
+    ):
+        AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, 
iam_config)
+
+    mock_get_creds.assert_called_once_with(
+        role_arn="arn:aws:iam::123456789012:role/TestRole",
+        region="us-east-1",
+        external_id=None,
+        session_duration=3600,
+    )
+
+    mock_gen_token.assert_called_once()
+    token_call_kwargs = mock_gen_token.call_args[1]
+    assert (
+        token_call_kwargs["hostname"] == 
"mydb.cluster-xyz.us-east-1.rds.amazonaws.com"
+    )
+    assert token_call_kwargs["port"] == 5432
+    assert token_call_kwargs["username"] == "superset_iam_user"
+
+    assert params["connect_args"]["password"] == "iam-auth-token"  # noqa: S105
+    assert params["connect_args"]["user"] == "superset_iam_user"
+    assert params["connect_args"]["sslmode"] == "require"
+
+
+def test_apply_iam_authentication_with_external_id() -> None:
+    from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig
+
+    mock_database = MagicMock()
+    mock_database.sqlalchemy_uri_decrypted = (
+        "postgresql://[email protected]:5432/mydb"
+    )
+
+    iam_config: AWSIAMConfig = {
+        "enabled": True,
+        "role_arn": "arn:aws:iam::222222222222:role/CrossAccountRole",
+        "external_id": "superset-prod-12345",
+        "region": "us-west-2",
+        "db_username": "iam_user",
+        "session_duration": 1800,
+    }
+
+    params: dict[str, Any] = {}
+
+    with (
+        patch.object(
+            AWSIAMAuthMixin,
+            "get_iam_credentials",
+            return_value={
+                "AccessKeyId": "ASIA...",
+                "SecretAccessKey": "secret...",
+                "SessionToken": "token...",
+            },
+        ) as mock_get_creds,
+        patch.object(
+            AWSIAMAuthMixin,
+            "generate_rds_auth_token",
+            return_value="iam-auth-token",
+        ),
+    ):
+        AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, 
iam_config)
+
+    mock_get_creds.assert_called_once_with(
+        role_arn="arn:aws:iam::222222222222:role/CrossAccountRole",
+        region="us-west-2",
+        external_id="superset-prod-12345",
+        session_duration=1800,
+    )
+
+
+def test_apply_iam_authentication_missing_role_arn() -> None:
+    from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig
+
+    mock_database = MagicMock()
+    mock_database.sqlalchemy_uri_decrypted = (
+        "postgresql://[email protected]:5432/mydb"
+    )
+
+    iam_config: AWSIAMConfig = {
+        "enabled": True,
+        "region": "us-east-1",
+        "db_username": "superset_iam_user",
+    }
+
+    params: dict[str, Any] = {}
+
+    with pytest.raises(SupersetSecurityException) as exc_info:
+        AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, 
iam_config)
+
+    assert "role_arn" in str(exc_info.value)
+
+
+def test_apply_iam_authentication_missing_db_username() -> None:
+    from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig
+
+    mock_database = MagicMock()
+    mock_database.sqlalchemy_uri_decrypted = (
+        "postgresql://[email protected]:5432/mydb"
+    )
+
+    iam_config: AWSIAMConfig = {
+        "enabled": True,
+        "role_arn": "arn:aws:iam::123456789012:role/TestRole",
+        "region": "us-east-1",
+    }
+
+    params: dict[str, Any] = {}
+
+    with pytest.raises(SupersetSecurityException) as exc_info:
+        AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, 
iam_config)
+
+    assert "db_username" in str(exc_info.value)
+
+
+def test_apply_iam_authentication_default_port() -> None:
+    from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig
+
+    mock_database = MagicMock()
+    # URI without explicit port
+    mock_database.sqlalchemy_uri_decrypted = (
+        "postgresql://[email protected]/mydb"
+    )
+
+    iam_config: AWSIAMConfig = {
+        "enabled": True,
+        "role_arn": "arn:aws:iam::123456789012:role/TestRole",
+        "region": "us-east-1",
+        "db_username": "superset_iam_user",
+    }
+
+    params: dict[str, Any] = {}
+
+    with (
+        patch.object(
+            AWSIAMAuthMixin,
+            "get_iam_credentials",
+            return_value={
+                "AccessKeyId": "ASIA...",
+                "SecretAccessKey": "secret...",
+                "SessionToken": "token...",
+            },
+        ),
+        patch.object(
+            AWSIAMAuthMixin,
+            "generate_rds_auth_token",
+            return_value="iam-auth-token",
+        ) as mock_gen_token,
+    ):
+        AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, 
iam_config)
+
+    # Should use default port 5432
+    token_call_kwargs = mock_gen_token.call_args[1]
+    assert token_call_kwargs["port"] == 5432
+
+
+def test_get_iam_credentials_boto3_not_installed() -> None:
+    import sys
+
+    from superset.db_engine_specs.aws_iam import (
+        _credentials_cache,
+        _credentials_lock,
+        AWSIAMAuthMixin,
+    )
+
+    with _credentials_lock:
+        _credentials_cache.clear()
+
+    # Temporarily hide boto3
+    boto3_module = sys.modules.get("boto3")
+    sys.modules["boto3"] = None  # type: ignore
+
+    try:
+        with pytest.raises(SupersetSecurityException) as exc_info:
+            AWSIAMAuthMixin.get_iam_credentials(
+                role_arn="arn:aws:iam::123456789012:role/TestRole",
+                region="us-east-1",
+            )
+
+        assert "boto3 is required" in str(exc_info.value)
+    finally:
+        # Restore boto3
+        if boto3_module is not None:
+            sys.modules["boto3"] = boto3_module
+        else:
+            del sys.modules["boto3"]

Review Comment:
   **Suggestion:** The `test_get_iam_credentials_boto3_not_installed` test 
simulates the absence of `boto3` by setting `sys.modules["boto3"] = None`, but 
in Python this makes `import boto3` succeed and bind `boto3` to `None` instead 
of raising `ImportError`, so the test will not trigger the intended error path 
and can fail with an unexpected `AttributeError` when `boto3.client` is 
accessed. The fix is to patch Python's import mechanism so that importing 
`boto3` actually raises `ImportError`, which reliably exercises the "boto3 not 
installed" branch regardless of the environment. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Unit test may raise AttributeError causing CI failure.
   - ⚠️ Test incorrectly simulates missing boto3 environment.
   - ⚠️ CI images with boto3 installed trigger the failure.
   ```
   </details>
   
   ```suggestion
       import builtins
   
       from superset.db_engine_specs.aws_iam import (
           _credentials_cache,
           _credentials_lock,
           AWSIAMAuthMixin,
       )
   
       with _credentials_lock:
           _credentials_cache.clear()
   
       real_import = builtins.__import__
   
       def fake_import(name, *args, **kwargs):  # type: ignore[override]
           if name == "boto3":
               raise ImportError("No module named 'boto3'")
           return real_import(name, *args, **kwargs)
   
       with patch("builtins.__import__", side_effect=fake_import):
           with pytest.raises(SupersetSecurityException) as exc_info:
               AWSIAMAuthMixin.get_iam_credentials(
                   role_arn="arn:aws:iam::123456789012:role/TestRole",
                   region="us-east-1",
               )
   
       assert "boto3 is required" in str(exc_info.value)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Run the single failing test: pytest
   
tests/unit_tests/db_engine_specs/test_aws_iam.py::test_get_iam_credentials_boto3_not_installed.
   
   2. The test sets sys.modules['boto3'] = None (tests file:
   tests/unit_tests/db_engine_specs/test_aws_iam.py: lines 368-372 and 
381-383), then calls
   AWSIAMAuthMixin.get_iam_credentials().
   
   3. In this environment, executing "import boto3" will succeed and bind boto3 
to the None
   object (because sys.modules contains the name), so when the production code 
tries to
   access boto3.client it raises AttributeError instead of ImportError. The 
test intends to
   exercise the "boto3 not installed" branch that should raise 
SupersetSecurityException but
   instead gets an unexpected AttributeError.
   
   4. Observed outcome: test fails with AttributeError (or a different 
exception) rather than
   asserting the expected SupersetSecurityException. Reproduced by running the 
test in an
   environment where boto3 is present in site-packages (common CI images).
   
   5. The proposed fix (replace sys.modules manipulation with patching 
builtins.__import__ to
   raise ImportError for 'boto3') ensures importing boto3 actually fails, 
reliably exercising
   the intended code path and allowing the test to assert 
SupersetSecurityException.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/db_engine_specs/test_aws_iam.py
   **Line:** 369:397
   **Comment:**
        *Logic Error: The `test_get_iam_credentials_boto3_not_installed` test 
simulates the absence of `boto3` by setting `sys.modules["boto3"] = None`, but 
in Python this makes `import boto3` succeed and bind `boto3` to `None` instead 
of raising `ImportError`, so the test will not trigger the intended error path 
and can fail with an unexpected `AttributeError` when `boto3.client` is 
accessed. The fix is to patch Python's import mechanism so that importing 
`boto3` actually raises `ImportError`, which reliably exercises the "boto3 not 
installed" branch regardless of the environment.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/db_engine_specs/mysql.py:
##########
@@ -294,6 +303,49 @@ class MySQLEngineSpec(BasicParametersMixin, 
BaseEngineSpec):
         "mysqlconnector": {"allow_local_infile": 0},
     }
 
+    # Sensitive fields that should be masked in encrypted_extra
+    encrypted_extra_sensitive_fields = {
+        "$.aws_iam.external_id",
+        "$.aws_iam.role_arn",
+    }
+
+    @staticmethod
+    def update_params_from_encrypted_extra(
+        database: Database,
+        params: dict[str, Any],
+    ) -> None:
+        """
+        Extract sensitive parameters from encrypted_extra.
+
+        Handles AWS IAM authentication if configured, then merges any
+        remaining encrypted_extra keys into params.
+        """
+        if not database.encrypted_extra:
+            return
+
+        try:
+            encrypted_extra = json.loads(database.encrypted_extra)
+        except json.JSONDecodeError as ex:
+            logger.error(ex, exc_info=True)
+            raise
+
+        # Handle AWS IAM auth: pop the key so it doesn't reach create_engine()
+        iam_config = encrypted_extra.pop("aws_iam", None)
+        if iam_config and iam_config.get("enabled"):
+            from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin
+
+            AWSIAMAuthMixin._apply_iam_authentication(
+                database,
+                params,
+                iam_config,
+                ssl_args={"ssl_mode": "REQUIRED"},
+                default_port=3306,

Review Comment:
   **Suggestion:** The IAM integration currently passes `ssl_mode="REQUIRED"` 
into `connect_args`, but common MySQL drivers used by Superset (e.g. 
`mysqlclient`, `mysql-connector-python`) do not accept an `ssl_mode` keyword 
and will raise a `TypeError` like "connect() got an unexpected keyword argument 
'ssl_mode'", causing IAM-authenticated MySQL connections to fail at runtime. 
Instead of injecting a driver-specific SSL option here, rely on the engine's 
existing SSL configuration and avoid passing unsupported SSL arguments. [type 
error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ IAM MySQL/Aurora connections fail with TypeError at connect.
   - ⚠️ Database connection tests error in Admin UI.
   - ⚠️ ETL/queries using IAM-enabled MySQL fail to start.
   ```
   </details>
   
   ```suggestion
                   ssl_args={},
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Prepare a Database record with encrypted_extra containing 
aws_iam.enabled=true (see
   superset/db_engine_specs/mysql.py:332-344 in PR).
   
   2. Trigger a connection creation that uses
   MySQLEngineSpec.update_params_from_encrypted_extra — this is executed during 
engine
   creation paths that call BaseEngineSpec.update_params_from_encrypted_extra 
(the
   MySQL-specific override is at superset/db_engine_specs/mysql.py:312-348 and 
runs before
   create_engine()).
   
   3. The override calls AWSIAMAuthMixin._apply_iam_authentication(...) passing
   ssl_args={"ssl_mode": "REQUIRED"} 
(superset/db_engine_specs/mysql.py:338-343).
   
   4. When SQLAlchemy/DBAPI constructs a MySQL connection, common drivers 
(mysqlclient /
   mysql-connector-python) accept driver-specific ssl options (e.g. ssl or 
ssl_ca) but do not
   accept ssl_mode in connect()/connect_args; the driver will raise TypeError: 
"connect() got
   an unexpected keyword argument 'ssl_mode'".
   
   5. Observe connection attempt fails with TypeError at runtime while 
establishing DB
   connection (visible in worker logs or UI connection test), blocking 
IAM-authenticated
   MySQL/Aurora connections.
   
   Note: The final file shows the ssl_args is injected here (mysql.py:338-343). 
The proposed
   change removes ssl_mode to avoid passing unsupported args.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/db_engine_specs/mysql.py
   **Line:** 341:342
   **Comment:**
        *Type Error: The IAM integration currently passes `ssl_mode="REQUIRED"` 
into `connect_args`, but common MySQL drivers used by Superset (e.g. 
`mysqlclient`, `mysql-connector-python`) do not accept an `ssl_mode` keyword 
and will raise a `TypeError` like "connect() got an unexpected keyword argument 
'ssl_mode'", causing IAM-authenticated MySQL connections to fail at runtime. 
Instead of injecting a driver-specific SSL option here, rely on the engine's 
existing SSL configuration and avoid passing unsupported SSL arguments.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: AWS Cross-Account IAM Authentication for Aurora [superset]

Reply via email to