codeant-ai-for-open-source[bot] commented on code in PR #37585: URL: https://github.com/apache/superset/pull/37585#discussion_r2748138059
########## superset/db_engine_specs/aws_iam.py: ########## @@ -0,0 +1,632 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +""" +AWS IAM Authentication Mixin for database engine specs. + +This mixin provides cross-account IAM authentication support for AWS databases +(Aurora PostgreSQL, Aurora MySQL, Redshift). It handles: +- Assuming IAM roles via STS AssumeRole +- Generating RDS IAM auth tokens +- Generating Redshift Serverless credentials +- Configuring SSL (required for IAM auth) +- Caching STS credentials to reduce API calls +""" + +from __future__ import annotations + +import logging +import threading +from typing import Any, TYPE_CHECKING, TypedDict + +from cachetools import TTLCache + +from superset.databases.utils import make_url_safe +from superset.errors import ErrorLevel, SupersetError, SupersetErrorType +from superset.exceptions import SupersetSecurityException + +if TYPE_CHECKING: + from superset.models.core import Database + +logger = logging.getLogger(__name__) + +# Default session duration for STS AssumeRole (1 hour) +DEFAULT_SESSION_DURATION = 3600 + +# Default ports +DEFAULT_POSTGRES_PORT = 5432 +DEFAULT_MYSQL_PORT = 3306 +DEFAULT_REDSHIFT_PORT = 5439 + +# Cache STS credentials: key = (role_arn, region, external_id), TTL = 50 min +_credentials_cache: TTLCache[tuple[str, str, str | None], dict[str, Any]] = TTLCache( + maxsize=100, ttl=3000 Review Comment: **Suggestion:** The STS credentials cache uses a fixed 50‑minute TTL (3000 seconds) regardless of the configured `session_duration`, so if a shorter session duration (for example 900 seconds, as used in tests and allowed by the API) is configured, expired STS credentials will still be pulled from the cache and reused, causing intermittent authentication failures until the cache entry ages out; the cache TTL should not exceed the minimum session duration the code intends to support, so reducing the TTL to match a conservative lower bound (e.g. 900 seconds) prevents reuse of expired credentials. [logic error] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ IAM-auth DB connections intermittently fail. - ❌ RDS token generation errors in auth flow. - ⚠️ Cross-account Redshift credential refresh fails. - ⚠️ Connection retries hit expired cached credentials. ``` </details> ```suggestion # Cache STS credentials: key = (role_arn, region, external_id), TTL = 15 min # Using a TTL no longer than the minimum supported session duration avoids # reusing expired STS credentials when a shorter session_duration is configured. _credentials_cache: TTLCache[tuple[str, str, str | None], dict[str, Any]] = TTLCache( maxsize=100, ttl=900 ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Configure a Database encrypted_extra to enable AWS IAM with a short session_duration (e.g., 900s) — the code reads this in _apply_iam_authentication at `superset/db_engine_specs/aws_iam.py:416-419` where session_duration = iam_config.get("session_duration", DEFAULT_SESSION_DURATION). 2. Trigger a connection attempt that uses IAM auth (any code path that calls `_apply_iam_authentication` — see calls in engine specs; the mixin's method is defined at `superset/db_engine_specs/aws_iam.py:388-496`). During that call, `get_iam_credentials()` is invoked at `superset/db_engine_specs/aws_iam.py:466-472` (call site). 3. On first call, `get_iam_credentials()` assumes the role and stores the returned credentials in the module-level TTLCache defined at `superset/db_engine_specs/aws_iam.py:54-57` (cache key computed at `:142`). The cache TTL is fixed to 3000 seconds (50 minutes). 4. If the configured session_duration is shorter (900s), the STS credentials will expire after 900s, but the cache entry will remain valid for 3000s. Subsequent connection attempts that occur after the STS credentials expired but before the 3000s cache TTL elapses will return the cached (expired) credentials from `_credentials_cache` at `superset/db_engine_specs/aws_iam.py:144-147`. These expired credentials will be used to generate RDS/Redshift tokens (e.g., `generate_rds_auth_token` at `:474-482` or `generate_redshift_cluster_credentials` at `:616-622`), triggering authentication failures from AWS APIs. 5. Observe intermittent failures: calls to `rds_client.generate_db_auth_token()` (`:257-262`) or Redshift credential APIs (`:371-377` / `:312-316`) will fail with expired token errors until the cache entry ages out after 3000s. This reproduces the bug deterministically when session_duration < 3000s and multiple connection attempts span the credential expiry window. 6. Note: the cache insertion/lookup is protected by `_credentials_lock` (`:58`, used at `:144-147` and `:177-179`), so race conditions are mitigated, but the core logic reuses expired credentials due to the TTL mismatch. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/db_engine_specs/aws_iam.py **Line:** 54:56 **Comment:** *Logic Error: The STS credentials cache uses a fixed 50‑minute TTL (3000 seconds) regardless of the configured `session_duration`, so if a shorter session duration (for example 900 seconds, as used in tests and allowed by the API) is configured, expired STS credentials will still be pulled from the cache and reused, causing intermittent authentication failures until the cache entry ages out; the cache TTL should not exceed the minimum session duration the code intends to support, so reducing the TTL to match a conservative lower bound (e.g. 900 seconds) prevents reuse of expired credentials. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset/db_engine_specs/mysql.py: ########## @@ -294,6 +303,49 @@ class MySQLEngineSpec(BasicParametersMixin, BaseEngineSpec): "mysqlconnector": {"allow_local_infile": 0}, } + # Sensitive fields that should be masked in encrypted_extra + encrypted_extra_sensitive_fields = { + "$.aws_iam.external_id", + "$.aws_iam.role_arn", + } Review Comment: **Suggestion:** Overriding `encrypted_extra_sensitive_fields` to only `$.aws_iam.external_id` and `$.aws_iam.role_arn` removes the BaseEngineSpec default of masking all `encrypted_extra` fields (`$.*`), so any other secrets stored there (e.g. credentials or tokens) will now be rendered in clear text in the UI, which is a regression in secrecy handling. To avoid exposing unrelated sensitive values, this class should extend the base sensitive-field set with the IAM-specific paths rather than replacing it outright. [security] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ UI may display non-IAM secrets in DB config. - ⚠️ Admins could accidentally expose credentials. - ⚠️ Compliance auditors may flag secret leakage. ``` </details> ```suggestion encrypted_extra_sensitive_fields = BaseEngineSpec.encrypted_extra_sensitive_fields.union( { "$.aws_iam.external_id", "$.aws_iam.role_arn", } ) ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Inspect the class-level constant in the PR: MySQLEngineSpec.encrypted_extra_sensitive_fields is set explicitly at superset/db_engine_specs/mysql.py:306-310. 2. Verify BaseEngineSpec provides a default mask set (in upstream codebase BaseEngineSpec.encrypted_extra_sensitive_fields typically includes a general "$.*" or other defaults). By replacing the attribute rather than extending it, the MySQL spec will no longer inherit those defaults. 3. Reproduce in the Admin UI: create a Database with encrypted_extra containing other secrets (e.g., "$.secret_token" or "$.password") and open the DB configuration or test connection. Because the MySQL class overwrote the mask set, those non-IAM secrets may render in cleartext in UI views that use encrypted_extra_sensitive_fields to redact values. 4. Observe sensitive values shown where previously the base mask would have hidden them. This is a regression in secrecy handling. Note: The improved change unions the BaseEngineSpec default set with IAM-specific keys to preserve prior masking behavior. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/db_engine_specs/mysql.py **Line:** 307:310 **Comment:** *Security: Overriding `encrypted_extra_sensitive_fields` to only `$.aws_iam.external_id` and `$.aws_iam.role_arn` removes the BaseEngineSpec default of masking all `encrypted_extra` fields (`$.*`), so any other secrets stored there (e.g. credentials or tokens) will now be rendered in clear text in the UI, which is a regression in secrecy handling. To avoid exposing unrelated sensitive values, this class should extend the base sensitive-field set with the IAM-specific paths rather than replacing it outright. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset/db_engine_specs/postgres.py: ########## @@ -461,6 +467,43 @@ def adjust_engine_params( return uri, connect_args + @staticmethod + def update_params_from_encrypted_extra( + database: Database, + params: dict[str, Any], + ) -> None: + """ + Extract sensitive parameters from encrypted_extra. + + Handles AWS IAM authentication if configured, then merges any + remaining encrypted_extra keys into params (standard behavior). + """ + if not database.encrypted_extra: + return + + try: + encrypted_extra = json.loads(database.encrypted_extra) + except json.JSONDecodeError as ex: + logger.error(ex, exc_info=True) + raise + + # Handle AWS IAM auth: pop the key so it doesn't reach create_engine() + iam_config = encrypted_extra.pop("aws_iam", None) + if iam_config and iam_config.get("enabled"): + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin + + AWSIAMAuthMixin._apply_iam_authentication( + database, + params, + iam_config, + ssl_args={"sslmode": "require"}, + default_port=5432, + ) Review Comment: **Suggestion:** When AWS IAM auth is enabled, this code always forces `sslmode="require"` and overwrites any stricter SSL mode (like `verify-full`) that may have been configured earlier (e.g. via a server certificate), silently downgrading TLS verification and weakening security for Postgres connections using IAM. [security] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ TLS verification downgraded for IAM Postgres connections. - ⚠️ Server certificate verification bypassed for verified databases. - ⚠️ Affects engine creation path in PostgresEngineSpec. ``` </details> ```suggestion # Preserve a stricter existing sslmode (e.g. verify-full) if present connect_args = params.get("connect_args") or {} previous_sslmode = connect_args.get("sslmode") AWSIAMAuthMixin._apply_iam_authentication( database, params, iam_config, ssl_args={"sslmode": "require"}, default_port=5432, ) if previous_sslmode == "verify-full": params.setdefault("connect_args", {})["sslmode"] = previous_sslmode ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Create a Database in Superset with encrypted_extra containing: - "aws_iam": {"enabled": true, ...} - and a server certificate set on the Database (Database.server_cert populated). This configuration is stored on the Database object and later passed to engine creation code. 2. When Superset builds engine params, PostgresEngineSpec.update_params_from_encrypted_extra() is executed (see superset/db_engine_specs/postgres.py). The function starting at superset/db_engine_specs/postgres.py: update_params_from_encrypted_extra (lines ~470-520) reaches the section at superset/db_engine_specs/postgres.py:490 which contains the shown code. 3. Before this IAM branch, Database.server_cert handling sets connect_args["sslmode"]="verify-full" (see get_extra_params in the same file which sets sslmode to "verify-full" when server_cert exists). 4. The current code calls AWSIAMAuthMixin._apply_iam_authentication(..., ssl_args={"sslmode": "require"}, ...) at superset/db_engine_specs/postgres.py:495-501. That call will result in params/connect_args having sslmode="require", effectively overwriting the previously-set "verify-full". After this function returns, the final params passed to SQLAlchemy will have sslmode=require, downgrading TLS verification. 5. Reproduce by adding a Database with a valid server_cert and aws_iam enabled, then attempt to connect. Observe that the connection uses sslmode=require (no full certificate verification) instead of verify-full, which weakens TLS verification compared to the configured server certificate. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/db_engine_specs/postgres.py **Line:** 495:501 **Comment:** *Security: When AWS IAM auth is enabled, this code always forces `sslmode="require"` and overwrites any stricter SSL mode (like `verify-full`) that may have been configured earlier (e.g. via a server certificate), silently downgrading TLS verification and weakening security for Postgres connections using IAM. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset/db_engine_specs/redshift.py: ########## @@ -201,6 +202,45 @@ def normalize_table_name_for_upload( schema_name.lower() if schema_name else None, ) + # Sensitive fields that should be masked in encrypted_extra + encrypted_extra_sensitive_fields = { + "$.aws_iam.external_id", + "$.aws_iam.role_arn", + } Review Comment: **Suggestion:** Overriding the default `encrypted_extra_sensitive_fields` to only mask the AWS IAM fields narrows the masking scope and can cause other secrets stored in `encrypted_extra` (which used to be fully masked by the base class) to be displayed in the UI, leaking sensitive configuration values; instead, extend the base/default sensitive set rather than replacing it. [security] <details> <summary><b>Severity Level:</b> Critical 🚨</summary> ```mdx - ❌ Database edit UI may show non-AWS secrets. - ❌ Other Redshift DB configs may leak credentials. - ⚠️ Tests only assert presence of AWS keys, not full masking. - ⚠️ Secret redaction behavior differs from other engines. ``` </details> ```suggestion encrypted_extra_sensitive_fields = BasicParametersMixin.encrypted_extra_sensitive_fields.union( {"$.aws_iam.external_id", "$.aws_iam.role_arn"} ) ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Observe BaseEngineSpec default masking: open file `superset/db_engine_specs/base.py` and note `encrypted_extra_sensitive_fields = {"$.*"}` at line 533 (shows by-default everything in encrypted_extra is masked). 2. Observe Redshift override: open `superset/db_engine_specs/redshift.py` and see the new class attribute at lines 205-209 where `encrypted_extra_sensitive_fields` is set to only `$.aws_iam.external_id` and `$.aws_iam.role_arn`. 3. Locate code paths that apply masking/unmasking: inspect `superset/db_engine_specs/base.py` around lines ~2383-2419 where the class-level `encrypted_extra_sensitive_fields` is used to redact/reveal encrypted_extra when rendering or saving database configuration (the code references `cls.encrypted_extra_sensitive_fields` in those lines). 4. Concrete reproduction: - Create or update a Database row whose `encrypted_extra` JSON contains secrets that are not under `$.aws_iam.*` (for example `{"connect_args": {"credentials": "SECRET"}, "password": "SECRET_PW"}`). - Open the Database edit UI (the UI code path that calls the backend to reveal/redact `encrypted_extra` triggers the BaseEngineSpec masking logic referenced at `superset/db_engine_specs/base.py:2383-2419`). - Because RedshiftEngineSpec now overrides `encrypted_extra_sensitive_fields` to only AWS IAM JSON paths (lines 205-209), the base default `"$.*"` masking no longer applies for Redshift databases, so other secret fields (e.g., `$.connect_args.credentials`, `$.password`) will not be masked and may be shown in the UI or logs where the redact/reveal routines expect mask patterns. 5. Verify tests and expectations: unit tests in `tests/unit_tests/db_engine_specs/test_redshift_iam.py` (referenced in repository search) expect the AWS IAM fields to be present in `RedshiftEngineSpec.encrypted_extra_sensitive_fields`. Those tests do not assert that overriding preserves base masking — thus the change can silently remove global masking behavior for Redshift without breaking those tests. 6. Explanation why this is a real issue: BaseEngineSpec intentionally defaulted to `"$.*"` (mask everything) at `base.py:533`. Replacing that with a small set in `redshift.py:205-209` narrows masking scope and changes behavior on a reachable code path (UI/database edit and any code that relies on `cls.encrypted_extra_sensitive_fields` at `base.py:2383-2419`), causing potential secret exposure for any encrypted_extra keys that previously were masked by the base default. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/db_engine_specs/redshift.py **Line:** 206:209 **Comment:** *Security: Overriding the default `encrypted_extra_sensitive_fields` to only mask the AWS IAM fields narrows the masking scope and can cause other secrets stored in `encrypted_extra` (which used to be fully masked by the base class) to be displayed in the UI, leaking sensitive configuration values; instead, extend the base/default sensitive set rather than replacing it. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## tests/unit_tests/db_engine_specs/test_aws_iam.py: ########## @@ -0,0 +1,1002 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# pylint: disable=import-outside-toplevel, protected-access + +from __future__ import annotations + +from typing import Any +from unittest.mock import MagicMock, patch + +import pytest + +from superset.exceptions import SupersetSecurityException + + +def test_get_iam_credentials_success() -> None: + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin + + mock_credentials = { + "AccessKeyId": "ASIA...", + "SecretAccessKey": "secret...", + "SessionToken": "token...", + "Expiration": "2025-01-01T00:00:00Z", + } + + with patch("boto3.client") as mock_boto3_client: + mock_sts = MagicMock() + mock_sts.assume_role.return_value = {"Credentials": mock_credentials} + mock_boto3_client.return_value = mock_sts + + credentials = AWSIAMAuthMixin.get_iam_credentials( + role_arn="arn:aws:iam::123456789012:role/TestRole", + region="us-east-1", + ) + + assert credentials == mock_credentials + mock_boto3_client.assert_called_once_with("sts", region_name="us-east-1") + mock_sts.assume_role.assert_called_once_with( + RoleArn="arn:aws:iam::123456789012:role/TestRole", + RoleSessionName="superset-iam-session", + DurationSeconds=3600, + ) + + +def test_get_iam_credentials_with_external_id() -> None: + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin + + mock_credentials = { + "AccessKeyId": "ASIA...", + "SecretAccessKey": "secret...", + "SessionToken": "token...", + } + + with patch("boto3.client") as mock_boto3_client: + mock_sts = MagicMock() + mock_sts.assume_role.return_value = {"Credentials": mock_credentials} + mock_boto3_client.return_value = mock_sts + + credentials = AWSIAMAuthMixin.get_iam_credentials( + role_arn="arn:aws:iam::123456789012:role/TestRole", + region="us-west-2", + external_id="external-id-12345", + session_duration=900, + ) + + assert credentials == mock_credentials + mock_sts.assume_role.assert_called_once_with( + RoleArn="arn:aws:iam::123456789012:role/TestRole", + RoleSessionName="superset-iam-session", + DurationSeconds=900, + ExternalId="external-id-12345", + ) + + +def test_get_iam_credentials_access_denied() -> None: + from botocore.exceptions import ClientError + + from superset.db_engine_specs.aws_iam import ( + _credentials_cache, + _credentials_lock, + AWSIAMAuthMixin, + ) + + with _credentials_lock: + _credentials_cache.clear() + + with patch("boto3.client") as mock_boto3_client: + mock_sts = MagicMock() + mock_sts.assume_role.side_effect = ClientError( + {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}}, + "AssumeRole", + ) + mock_boto3_client.return_value = mock_sts + + with pytest.raises(SupersetSecurityException) as exc_info: + AWSIAMAuthMixin.get_iam_credentials( + role_arn="arn:aws:iam::123456789012:role/TestRole", + region="us-east-1", + ) + + assert "Unable to assume IAM role" in str(exc_info.value) + + +def test_get_iam_credentials_external_id_mismatch() -> None: + from botocore.exceptions import ClientError + + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin + + with patch("boto3.client") as mock_boto3_client: + mock_sts = MagicMock() + mock_sts.assume_role.side_effect = ClientError( + { + "Error": { + "Code": "AccessDenied", + "Message": "The external id does not match", + } + }, + "AssumeRole", + ) + mock_boto3_client.return_value = mock_sts + + with pytest.raises(SupersetSecurityException) as exc_info: + AWSIAMAuthMixin.get_iam_credentials( + role_arn="arn:aws:iam::123456789012:role/TestRole", + region="us-east-1", + external_id="wrong-id", + ) + + assert "External ID mismatch" in str(exc_info.value) + + +def test_generate_rds_auth_token() -> None: + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin + + credentials = { + "AccessKeyId": "ASIA...", + "SecretAccessKey": "secret...", + "SessionToken": "token...", + } + + with patch("boto3.client") as mock_boto3_client: + mock_rds = MagicMock() + mock_rds.generate_db_auth_token.return_value = "iam-token-12345" + mock_boto3_client.return_value = mock_rds + + token = AWSIAMAuthMixin.generate_rds_auth_token( + credentials=credentials, + hostname="mydb.cluster-xyz.us-east-1.rds.amazonaws.com", + port=5432, + username="superset_user", + region="us-east-1", + ) + + assert token == "iam-token-12345" # noqa: S105 + mock_boto3_client.assert_called_once_with( + "rds", + region_name="us-east-1", + aws_access_key_id="ASIA...", + aws_secret_access_key="secret...", # noqa: S106 + aws_session_token="token...", # noqa: S106 + ) + mock_rds.generate_db_auth_token.assert_called_once_with( + DBHostname="mydb.cluster-xyz.us-east-1.rds.amazonaws.com", + Port=5432, + DBUsername="superset_user", + ) + + +def test_apply_iam_authentication() -> None: + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig + + mock_database = MagicMock() + mock_database.sqlalchemy_uri_decrypted = ( + "postgresql://[email protected]:5432/mydb" + ) + + iam_config: AWSIAMConfig = { + "enabled": True, + "role_arn": "arn:aws:iam::123456789012:role/TestRole", + "region": "us-east-1", + "db_username": "superset_iam_user", + } + + params: dict[str, Any] = {} + + with ( + patch.object( + AWSIAMAuthMixin, + "get_iam_credentials", + return_value={ + "AccessKeyId": "ASIA...", + "SecretAccessKey": "secret...", + "SessionToken": "token...", + }, + ) as mock_get_creds, + patch.object( + AWSIAMAuthMixin, + "generate_rds_auth_token", + return_value="iam-auth-token", + ) as mock_gen_token, + ): + AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, iam_config) + + mock_get_creds.assert_called_once_with( + role_arn="arn:aws:iam::123456789012:role/TestRole", + region="us-east-1", + external_id=None, + session_duration=3600, + ) + + mock_gen_token.assert_called_once() + token_call_kwargs = mock_gen_token.call_args[1] + assert ( + token_call_kwargs["hostname"] == "mydb.cluster-xyz.us-east-1.rds.amazonaws.com" + ) + assert token_call_kwargs["port"] == 5432 + assert token_call_kwargs["username"] == "superset_iam_user" + + assert params["connect_args"]["password"] == "iam-auth-token" # noqa: S105 + assert params["connect_args"]["user"] == "superset_iam_user" + assert params["connect_args"]["sslmode"] == "require" + + +def test_apply_iam_authentication_with_external_id() -> None: + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig + + mock_database = MagicMock() + mock_database.sqlalchemy_uri_decrypted = ( + "postgresql://[email protected]:5432/mydb" + ) + + iam_config: AWSIAMConfig = { + "enabled": True, + "role_arn": "arn:aws:iam::222222222222:role/CrossAccountRole", + "external_id": "superset-prod-12345", + "region": "us-west-2", + "db_username": "iam_user", + "session_duration": 1800, + } + + params: dict[str, Any] = {} + + with ( + patch.object( + AWSIAMAuthMixin, + "get_iam_credentials", + return_value={ + "AccessKeyId": "ASIA...", + "SecretAccessKey": "secret...", + "SessionToken": "token...", + }, + ) as mock_get_creds, + patch.object( + AWSIAMAuthMixin, + "generate_rds_auth_token", + return_value="iam-auth-token", + ), + ): + AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, iam_config) + + mock_get_creds.assert_called_once_with( + role_arn="arn:aws:iam::222222222222:role/CrossAccountRole", + region="us-west-2", + external_id="superset-prod-12345", + session_duration=1800, + ) + + +def test_apply_iam_authentication_missing_role_arn() -> None: + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig + + mock_database = MagicMock() + mock_database.sqlalchemy_uri_decrypted = ( + "postgresql://[email protected]:5432/mydb" + ) + + iam_config: AWSIAMConfig = { + "enabled": True, + "region": "us-east-1", + "db_username": "superset_iam_user", + } + + params: dict[str, Any] = {} + + with pytest.raises(SupersetSecurityException) as exc_info: + AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, iam_config) + + assert "role_arn" in str(exc_info.value) + + +def test_apply_iam_authentication_missing_db_username() -> None: + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig + + mock_database = MagicMock() + mock_database.sqlalchemy_uri_decrypted = ( + "postgresql://[email protected]:5432/mydb" + ) + + iam_config: AWSIAMConfig = { + "enabled": True, + "role_arn": "arn:aws:iam::123456789012:role/TestRole", + "region": "us-east-1", + } + + params: dict[str, Any] = {} + + with pytest.raises(SupersetSecurityException) as exc_info: + AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, iam_config) + + assert "db_username" in str(exc_info.value) + + +def test_apply_iam_authentication_default_port() -> None: + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin, AWSIAMConfig + + mock_database = MagicMock() + # URI without explicit port + mock_database.sqlalchemy_uri_decrypted = ( + "postgresql://[email protected]/mydb" + ) + + iam_config: AWSIAMConfig = { + "enabled": True, + "role_arn": "arn:aws:iam::123456789012:role/TestRole", + "region": "us-east-1", + "db_username": "superset_iam_user", + } + + params: dict[str, Any] = {} + + with ( + patch.object( + AWSIAMAuthMixin, + "get_iam_credentials", + return_value={ + "AccessKeyId": "ASIA...", + "SecretAccessKey": "secret...", + "SessionToken": "token...", + }, + ), + patch.object( + AWSIAMAuthMixin, + "generate_rds_auth_token", + return_value="iam-auth-token", + ) as mock_gen_token, + ): + AWSIAMAuthMixin._apply_iam_authentication(mock_database, params, iam_config) + + # Should use default port 5432 + token_call_kwargs = mock_gen_token.call_args[1] + assert token_call_kwargs["port"] == 5432 + + +def test_get_iam_credentials_boto3_not_installed() -> None: + import sys + + from superset.db_engine_specs.aws_iam import ( + _credentials_cache, + _credentials_lock, + AWSIAMAuthMixin, + ) + + with _credentials_lock: + _credentials_cache.clear() + + # Temporarily hide boto3 + boto3_module = sys.modules.get("boto3") + sys.modules["boto3"] = None # type: ignore + + try: + with pytest.raises(SupersetSecurityException) as exc_info: + AWSIAMAuthMixin.get_iam_credentials( + role_arn="arn:aws:iam::123456789012:role/TestRole", + region="us-east-1", + ) + + assert "boto3 is required" in str(exc_info.value) + finally: + # Restore boto3 + if boto3_module is not None: + sys.modules["boto3"] = boto3_module + else: + del sys.modules["boto3"] Review Comment: **Suggestion:** The `test_get_iam_credentials_boto3_not_installed` test simulates the absence of `boto3` by setting `sys.modules["boto3"] = None`, but in Python this makes `import boto3` succeed and bind `boto3` to `None` instead of raising `ImportError`, so the test will not trigger the intended error path and can fail with an unexpected `AttributeError` when `boto3.client` is accessed. The fix is to patch Python's import mechanism so that importing `boto3` actually raises `ImportError`, which reliably exercises the "boto3 not installed" branch regardless of the environment. [logic error] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ Unit test may raise AttributeError causing CI failure. - ⚠️ Test incorrectly simulates missing boto3 environment. - ⚠️ CI images with boto3 installed trigger the failure. ``` </details> ```suggestion import builtins from superset.db_engine_specs.aws_iam import ( _credentials_cache, _credentials_lock, AWSIAMAuthMixin, ) with _credentials_lock: _credentials_cache.clear() real_import = builtins.__import__ def fake_import(name, *args, **kwargs): # type: ignore[override] if name == "boto3": raise ImportError("No module named 'boto3'") return real_import(name, *args, **kwargs) with patch("builtins.__import__", side_effect=fake_import): with pytest.raises(SupersetSecurityException) as exc_info: AWSIAMAuthMixin.get_iam_credentials( role_arn="arn:aws:iam::123456789012:role/TestRole", region="us-east-1", ) assert "boto3 is required" in str(exc_info.value) ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Run the single failing test: pytest tests/unit_tests/db_engine_specs/test_aws_iam.py::test_get_iam_credentials_boto3_not_installed. 2. The test sets sys.modules['boto3'] = None (tests file: tests/unit_tests/db_engine_specs/test_aws_iam.py: lines 368-372 and 381-383), then calls AWSIAMAuthMixin.get_iam_credentials(). 3. In this environment, executing "import boto3" will succeed and bind boto3 to the None object (because sys.modules contains the name), so when the production code tries to access boto3.client it raises AttributeError instead of ImportError. The test intends to exercise the "boto3 not installed" branch that should raise SupersetSecurityException but instead gets an unexpected AttributeError. 4. Observed outcome: test fails with AttributeError (or a different exception) rather than asserting the expected SupersetSecurityException. Reproduced by running the test in an environment where boto3 is present in site-packages (common CI images). 5. The proposed fix (replace sys.modules manipulation with patching builtins.__import__ to raise ImportError for 'boto3') ensures importing boto3 actually fails, reliably exercising the intended code path and allowing the test to assert SupersetSecurityException. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** tests/unit_tests/db_engine_specs/test_aws_iam.py **Line:** 369:397 **Comment:** *Logic Error: The `test_get_iam_credentials_boto3_not_installed` test simulates the absence of `boto3` by setting `sys.modules["boto3"] = None`, but in Python this makes `import boto3` succeed and bind `boto3` to `None` instead of raising `ImportError`, so the test will not trigger the intended error path and can fail with an unexpected `AttributeError` when `boto3.client` is accessed. The fix is to patch Python's import mechanism so that importing `boto3` actually raises `ImportError`, which reliably exercises the "boto3 not installed" branch regardless of the environment. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> ########## superset/db_engine_specs/mysql.py: ########## @@ -294,6 +303,49 @@ class MySQLEngineSpec(BasicParametersMixin, BaseEngineSpec): "mysqlconnector": {"allow_local_infile": 0}, } + # Sensitive fields that should be masked in encrypted_extra + encrypted_extra_sensitive_fields = { + "$.aws_iam.external_id", + "$.aws_iam.role_arn", + } + + @staticmethod + def update_params_from_encrypted_extra( + database: Database, + params: dict[str, Any], + ) -> None: + """ + Extract sensitive parameters from encrypted_extra. + + Handles AWS IAM authentication if configured, then merges any + remaining encrypted_extra keys into params. + """ + if not database.encrypted_extra: + return + + try: + encrypted_extra = json.loads(database.encrypted_extra) + except json.JSONDecodeError as ex: + logger.error(ex, exc_info=True) + raise + + # Handle AWS IAM auth: pop the key so it doesn't reach create_engine() + iam_config = encrypted_extra.pop("aws_iam", None) + if iam_config and iam_config.get("enabled"): + from superset.db_engine_specs.aws_iam import AWSIAMAuthMixin + + AWSIAMAuthMixin._apply_iam_authentication( + database, + params, + iam_config, + ssl_args={"ssl_mode": "REQUIRED"}, + default_port=3306, Review Comment: **Suggestion:** The IAM integration currently passes `ssl_mode="REQUIRED"` into `connect_args`, but common MySQL drivers used by Superset (e.g. `mysqlclient`, `mysql-connector-python`) do not accept an `ssl_mode` keyword and will raise a `TypeError` like "connect() got an unexpected keyword argument 'ssl_mode'", causing IAM-authenticated MySQL connections to fail at runtime. Instead of injecting a driver-specific SSL option here, rely on the engine's existing SSL configuration and avoid passing unsupported SSL arguments. [type error] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ IAM MySQL/Aurora connections fail with TypeError at connect. - ⚠️ Database connection tests error in Admin UI. - ⚠️ ETL/queries using IAM-enabled MySQL fail to start. ``` </details> ```suggestion ssl_args={}, ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Prepare a Database record with encrypted_extra containing aws_iam.enabled=true (see superset/db_engine_specs/mysql.py:332-344 in PR). 2. Trigger a connection creation that uses MySQLEngineSpec.update_params_from_encrypted_extra — this is executed during engine creation paths that call BaseEngineSpec.update_params_from_encrypted_extra (the MySQL-specific override is at superset/db_engine_specs/mysql.py:312-348 and runs before create_engine()). 3. The override calls AWSIAMAuthMixin._apply_iam_authentication(...) passing ssl_args={"ssl_mode": "REQUIRED"} (superset/db_engine_specs/mysql.py:338-343). 4. When SQLAlchemy/DBAPI constructs a MySQL connection, common drivers (mysqlclient / mysql-connector-python) accept driver-specific ssl options (e.g. ssl or ssl_ca) but do not accept ssl_mode in connect()/connect_args; the driver will raise TypeError: "connect() got an unexpected keyword argument 'ssl_mode'". 5. Observe connection attempt fails with TypeError at runtime while establishing DB connection (visible in worker logs or UI connection test), blocking IAM-authenticated MySQL/Aurora connections. Note: The final file shows the ssl_args is injected here (mysql.py:338-343). The proposed change removes ssl_mode to avoid passing unsupported args. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/db_engine_specs/mysql.py **Line:** 341:342 **Comment:** *Type Error: The IAM integration currently passes `ssl_mode="REQUIRED"` into `connect_args`, but common MySQL drivers used by Superset (e.g. `mysqlclient`, `mysql-connector-python`) do not accept an `ssl_mode` keyword and will raise a `TypeError` like "connect() got an unexpected keyword argument 'ssl_mode'", causing IAM-authenticated MySQL connections to fail at runtime. Instead of injecting a driver-specific SSL option here, rely on the engine's existing SSL configuration and avoid passing unsupported SSL arguments. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
