codeant-ai-for-open-source[bot] commented on code in PR #38469: URL: https://github.com/apache/superset/pull/38469#discussion_r2895294791
########## superset/migrations/versions/2026-03-06_13-00_b2c3d4e5f6a7_fix_upstream_oauth_tokens_column_types.py: ########## @@ -0,0 +1,81 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +"""fix_upstream_oauth_tokens_column_types + +Recreate upstream_oauth_tokens with EncryptedType columns instead of plain Text. +The previous migration incorrectly used sa.Text() for access_token and +refresh_token, causing a TypeError when SQLAlchemy tried to read them back. + +Revision ID: b2c3d4e5f6a7 +Revises: a1b2c3d4e5f6 +Create Date: 2026-03-06 13:00:00.000000 + +""" + +import sqlalchemy as sa +from alembic import op +from sqlalchemy_utils import EncryptedType + +# revision identifiers, used by Alembic. +revision = "b2c3d4e5f6a7" +down_revision = "a1b2c3d4e5f6" + + +def upgrade() -> None: + # Drop and recreate the table with correct EncryptedType columns. + # The table is empty at this point (token saves were failing due to the + # previous bug), so no data migration is needed. + op.drop_index( + "idx_upstream_oauth_tokens_user_provider", + table_name="upstream_oauth_tokens", + ) + op.drop_table("upstream_oauth_tokens") + + op.create_table( + "upstream_oauth_tokens", + sa.Column("id", sa.Integer(), nullable=False), + sa.Column("user_id", sa.Integer(), nullable=False), + sa.Column("provider", sa.String(256), nullable=False), + sa.Column("access_token", EncryptedType(), nullable=True), + sa.Column("access_token_expiration", sa.DateTime(), nullable=True), + sa.Column("refresh_token", EncryptedType(), nullable=True), Review Comment: **Suggestion:** This follow-up migration again declares `access_token` and `refresh_token` as `EncryptedType()` with no constructor arguments, which is invalid for `sqlalchemy_utils.EncryptedType` and will cause the Alembic migration to fail at runtime; switch these to a concrete SQLAlchemy column type such as `sa.Text()` so the upgrade can run successfully. [type error] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ `superset db upgrade` fails at this migration. - ❌ Upstream OAuth tokens feature cannot be initialized. - ⚠️ Suggestion reintroduces Text columns, losing intended encryption. - ⚠️ Suggestion may revive prior TypeError described in migration docstring. ``` </details> ```suggestion sa.Column("access_token", sa.Text(), nullable=True), sa.Column("refresh_token", sa.Text(), nullable=True), ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Start a Superset instance that includes this migration file at `superset/migrations/versions/2026-03-06_13-00_b2c3d4e5f6a7_fix_upstream_oauth_tokens_column_types.py`. 2. Run the database upgrade command (which uses Alembic), e.g. `superset db upgrade` or `alembic upgrade head`, which loads this revision (revision ID `b2c3d4e5f6a7`, down_revision `a1b2c3d4e5f6`). 3. Alembic calls `upgrade()` in this file (lines 38–74). Execution reaches the `op.create_table(...)` call where `access_token` and `refresh_token` columns are defined at lines 53 and 55 using `EncryptedType()` with no arguments. 4. At runtime, Python attempts to instantiate `EncryptedType()` (from `sqlalchemy_utils`) but its `__init__` requires at least the underlying type and an encryption key; the missing required positional arguments cause a `TypeError` and the migration fails, aborting the upgrade. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/migrations/versions/2026-03-06_13-00_b2c3d4e5f6a7_fix_upstream_oauth_tokens_column_types.py **Line:** 53:55 **Comment:** *Type Error: This follow-up migration again declares `access_token` and `refresh_token` as `EncryptedType()` with no constructor arguments, which is invalid for `sqlalchemy_utils.EncryptedType` and will cause the Alembic migration to fail at runtime; switch these to a concrete SQLAlchemy column type such as `sa.Text()` so the upgrade can run successfully. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38469&comment_hash=6b94fefaebf9889d10456e124bbf83d18867ce5c80f8262bb9fbfc7646aa8327&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38469&comment_hash=6b94fefaebf9889d10456e124bbf83d18867ce5c80f8262bb9fbfc7646aa8327&reaction=dislike'>👎</a> ########## superset/migrations/versions/2026-03-06_12-00_a1b2c3d4e5f6_add_upstream_oauth_tokens.py: ########## @@ -0,0 +1,74 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +"""add_upstream_oauth_tokens + +Revision ID: a1b2c3d4e5f6 +Revises: f5b5f88d8526 +Create Date: 2026-03-06 12:00:00.000000 + +""" + +import sqlalchemy as sa +from alembic import op +from sqlalchemy_utils import EncryptedType + +# revision identifiers, used by Alembic. +revision = "a1b2c3d4e5f6" +down_revision = "f5b5f88d8526" + + +def upgrade() -> None: + op.create_table( + "upstream_oauth_tokens", + sa.Column("id", sa.Integer(), nullable=False), + sa.Column("user_id", sa.Integer(), nullable=False), + sa.Column("provider", sa.String(256), nullable=False), + sa.Column("access_token", EncryptedType(), nullable=True), + sa.Column("access_token_expiration", sa.DateTime(), nullable=True), + sa.Column("refresh_token", EncryptedType(), nullable=True), Review Comment: **Suggestion:** The migration defines `access_token` and `refresh_token` columns using `EncryptedType()` without the required constructor arguments (e.g. underlying type and key), which will raise a `TypeError` when Alembic runs this migration; use a concrete SQLAlchemy type like `sa.Text()` for the column instead so the migration can execute. [type error] <details> <summary><b>Severity Level:</b> Critical 🚨</summary> ```mdx - ❌ Alembic upgrade to revision a1b2c3d4e5f6 always fails. - ❌ New `upstream_oauth_tokens` table cannot be created. - ❌ Feature using upstream OAuth tokens cannot be deployed. - ❌ Superset instances cannot progress past this migration. ``` </details> ```suggestion sa.Column("access_token", sa.Text(), nullable=True), sa.Column("refresh_token", sa.Text(), nullable=True), ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Start from a Superset environment where Alembic is configured to use the migrations under `superset/migrations/versions/`. 2. Run a database upgrade that reaches revision `a1b2c3d4e5f6` (e.g. `superset db upgrade`), which causes Alembic to import `superset/migrations/versions/2026-03-06_12-00_a1b2c3d4e5f6_add_upstream_oauth_tokens.py`. 3. During import, `upgrade()` is evaluated and the `op.create_table(...)` definition at lines 35–61 is constructed, including `sa.Column("access_token", EncryptedType(), nullable=True)` at line 40 and `sa.Column("refresh_token", EncryptedType(), nullable=True)` at line 42. 4. Instantiating `EncryptedType()` without the required constructor arguments (underlying type and key) raises `TypeError: __init__() missing required positional arguments` before the migration can execute, causing the Alembic upgrade command to fail and preventing the schema from being applied. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/migrations/versions/2026-03-06_12-00_a1b2c3d4e5f6_add_upstream_oauth_tokens.py **Line:** 40:42 **Comment:** *Type Error: The migration defines `access_token` and `refresh_token` columns using `EncryptedType()` without the required constructor arguments (e.g. underlying type and key), which will raise a `TypeError` when Alembic runs this migration; use a concrete SQLAlchemy type like `sa.Text()` for the column instead so the migration can execute. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38469&comment_hash=fc8487a19a3e4834b4db8ad38fde3fd5b8cf217f50f4ea4350b9960ba327fa23&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38469&comment_hash=fc8487a19a3e4834b4db8ad38fde3fd5b8cf217f50f4ea4350b9960ba327fa23&reaction=dislike'>👎</a> ########## superset/utils/oauth2.py: ########## @@ -276,3 +276,102 @@ def check_for_oauth2(database: Database) -> Iterator[None]: if database.is_oauth2_enabled() and database.db_engine_spec.needs_oauth2(ex): database.db_engine_spec.start_oauth2_dance(database) raise + + +def save_user_provider_token( + user_id: int, + provider: str, + token_response: dict[str, Any], +) -> None: + """ + Upsert an UpstreamOAuthToken row for the given user + provider. + """ + from superset.models.core import UpstreamOAuthToken + + token: UpstreamOAuthToken | None = ( + db.session.query(UpstreamOAuthToken) + .filter_by(user_id=user_id, provider=provider) + .one_or_none() + ) + if token is None: + token = UpstreamOAuthToken(user_id=user_id, provider=provider) + + token.access_token = token_response.get("access_token") + expires_in = token_response.get("expires_in") + token.access_token_expiration = ( + datetime.now() + timedelta(seconds=expires_in) if expires_in else None + ) + token.refresh_token = token_response.get("refresh_token") + db.session.add(token) + db.session.commit() + + +def get_upstream_provider_token(provider: str, user_id: int) -> str | None: + """ + Retrieve a valid access token for the given provider and user. + + If the token is expired and a refresh token exists, attempt to refresh it. + Returns None if no valid token is available. + """ + from superset.models.core import UpstreamOAuthToken + + token: UpstreamOAuthToken | None = ( + db.session.query(UpstreamOAuthToken) + .filter_by(user_id=user_id, provider=provider) + .one_or_none() + ) + if token is None: + return None + + now = datetime.now() + if token.access_token_expiration and token.access_token_expiration > now: Review Comment: **Suggestion:** When an upstream OAuth token is saved without an explicit expiration (no `expires_in`), `access_token_expiration` is set to None, but `get_upstream_provider_token` treats a None expiration as "expired" and immediately refreshes or deletes the token, so non-expiring or unknown-expiry tokens are unusable. The fix is to treat a missing expiration as a non-expiring token and return it directly instead of going down the expiry path. [logic error] <details> <summary><b>Severity Level:</b> Major ⚠️</summary> ```mdx - ❌ Upstream OAuth reuse fails for tokens without expiry. - ⚠️ Affected users see repeated provider re-authentication. - ⚠️ DATABASE_OAUTH2_UPSTREAM_PROVIDERS flows misbehave with such IdPs. ``` </details> ```suggestion # If no expiration is set, treat the token as non-expiring if token.access_token_expiration is None: return token.access_token now = datetime.now() if token.access_token_expiration > now: ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Call `save_user_provider_token()` in `superset/utils/oauth2.py:281-307` with a `token_response` dict that includes `"access_token"` but omits `"expires_in"` for a given `user_id` and `provider`; this writes an `UpstreamOAuthToken` row in `superset/models/core.py:1372-1392` with `access_token_expiration = None`. 2. Later in the same request lifecycle or a subsequent one, call `get_upstream_provider_token(provider, user_id)` in `superset/utils/oauth2.py:309-336` for that same `user_id` and `provider`. 3. Inside `get_upstream_provider_token`, the ORM query at lines 317-321 returns the previously stored `UpstreamOAuthToken` with `access_token_expiration is None`. 4. At lines 326-335, the conditional `if token.access_token_expiration and token.access_token_expiration > now:` evaluates to `False` when `access_token_expiration` is `None`, so the function unconditionally treats the token as expired and either refreshes it (if `refresh_token` is present) or deletes it and returns `None`, making a non-expiring / unknown-expiry token unusable on normal calls. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/utils/oauth2.py **Line:** 326:327 **Comment:** *Logic Error: When an upstream OAuth token is saved without an explicit expiration (no `expires_in`), `access_token_expiration` is set to None, but `get_upstream_provider_token` treats a None expiration as "expired" and immediately refreshes or deletes the token, so non-expiring or unknown-expiry tokens are unusable. The fix is to treat a missing expiration as a non-expiring token and return it directly instead of going down the expiry path. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38469&comment_hash=03d95db5fb576f09f63eb32e4f654de621de79dac4a826c02ee2f96ec7fe7b6a&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38469&comment_hash=03d95db5fb576f09f63eb32e4f654de621de79dac4a826c02ee2f96ec7fe7b6a&reaction=dislike'>👎</a> ########## superset/security/manager.py: ########## @@ -480,6 +480,59 @@ def request_loader(self, request: Request) -> Optional[User]: return self.get_guest_user_from_request(request) return None + def set_oauth_session(self, provider: str, oauth_response: dict[str, Any]) -> None: + """ + Override to persist the full OAuth token response before FAB reduces it to + ``session["oauth"] = (access_token, secret)`` tuple. + """ + # pylint: disable=import-outside-toplevel + from flask import session + + super().set_oauth_session(provider, oauth_response) + # FAB stores only (access_token, secret) in session["oauth"]. + # We need the full response (expires_in, refresh_token, …) for upstream forwarding. + session["oauth_full_token"] = dict(oauth_response) + + def auth_user_oauth(self, userinfo: dict[str, Any]) -> Any: + """ + Override to save the upstream OAuth token when a user logs in via OAuth. + + If ``save_token: True`` is set in the matching OAUTH_PROVIDERS entry and + ``DATABASE_OAUTH2_UPSTREAM_PROVIDERS`` maps a database to this provider, + the token will be forwarded to that database instead of triggering a + separate OAuth2 dance. + """ + # pylint: disable=import-outside-toplevel + from flask import current_app as flask_app, session + + user = super().auth_user_oauth(userinfo) + if user: + provider = session.get("oauth_provider") + # Use the full token dict saved by set_oauth_session, not the + # (access_token, secret) tuple that FAB stores in session["oauth"]. + token = session.get("oauth_full_token") + if token and provider: + provider_config = next( + ( + p + for p in flask_app.config.get("OAUTH_PROVIDERS", []) + if p.get("name") == provider + ), + None, + ) + if provider_config and provider_config.get("save_token"): + from superset.utils.oauth2 import save_user_provider_token + + try: + save_user_provider_token(user.id, provider, token) + except Exception: # pylint: disable=broad-except + logger.warning( + "Failed to save upstream OAuth token for provider %s", + provider, + exc_info=True, + ) + return user Review Comment: **Suggestion:** The full upstream OAuth token, which may include a long-lived refresh token, is stored in the Flask session under a generic key and never cleared after login, so it persists in the user session longer than necessary and increases the risk of token leakage if the session cookie or store is compromised; after using it to persist the token server-side, it should be removed from the session. [security] <details> <summary><b>Severity Level:</b> Critical 🚨</summary> ```mdx - ❌ OAuth login via SupersetSecurityManager stores full upstream token. - ⚠️ Session store compromise exposes long-lived upstream refresh tokens. ``` </details> ```suggestion def auth_user_oauth(self, userinfo: dict[str, Any]) -> Any: """ Override to save the upstream OAuth token when a user logs in via OAuth. If ``save_token: True`` is set in the matching OAUTH_PROVIDERS entry and ``DATABASE_OAUTH2_UPSTREAM_PROVIDERS`` maps a database to this provider, the token will be forwarded to that database instead of triggering a separate OAuth2 dance. """ # pylint: disable=import-outside-toplevel from flask import current_app as flask_app, session user = super().auth_user_oauth(userinfo) if user: provider = session.get("oauth_provider") # Use the full token dict saved by set_oauth_session, not the # (access_token, secret) tuple that FAB stores in session["oauth"]. token = session.get("oauth_full_token") if token and provider: provider_config = next( ( p for p in flask_app.config.get("OAUTH_PROVIDERS", []) if p.get("name") == provider ), None, ) if provider_config and provider_config.get("save_token"): from superset.utils.oauth2 import save_user_provider_token try: save_user_provider_token(user.id, provider, token) except Exception: # pylint: disable=broad-except logger.warning( "Failed to save upstream OAuth token for provider %s", provider, exc_info=True, ) # Once we've used the full OAuth token (or determined we don't need it), # remove it from the session to avoid keeping sensitive data longer than necessary. session.pop("oauth_full_token", None) return user ``` <details> <summary><b>Steps of Reproduction ✅ </b></summary> ```mdx 1. Start Superset with the PR code, which uses `SupersetSecurityManager` defined in `superset/security/manager.py` and registers it as the app's security manager (default for Superset). 2. Configure OAuth login with an upstream provider that supports refresh tokens and sets `save_token: True` in `OAUTH_PROVIDERS` (per PR description); log in via OAuth so that `SupersetSecurityManager.set_oauth_session()` at `superset/security/manager.py:483-494` runs and stores the full token dict in `session["oauth_full_token"]`. 3. During the same OAuth login flow, `SupersetSecurityManager.auth_user_oauth()` at `superset/security/manager.py:496-534` is invoked by the FAB OAuth auth view; it reads `session["oauth_full_token"]` and persists it via `save_user_provider_token(...)` but never clears `session["oauth_full_token"]`. 4. On any subsequent request in that browser session (for example, loading `/superset/welcome/` after login), inspect the Flask session contents on the server (e.g., via a debugger or logging in a custom view): `session.get("oauth_full_token")` is still populated with the full upstream OAuth token, including long-lived fields like `refresh_token`, demonstrating that the token is retained for the entire session lifetime rather than being cleared after use. ``` </details> <details> <summary><b>Prompt for AI Agent 🤖 </b></summary> ```mdx This is a comment left during a code review. **Path:** superset/security/manager.py **Line:** 496:534 **Comment:** *Security: The full upstream OAuth token, which may include a long-lived refresh token, is stored in the Flask session under a generic key and never cleared after login, so it persists in the user session longer than necessary and increases the risk of token leakage if the session cookie or store is compromised; after using it to persist the token server-side, it should be removed from the session. Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise. ``` </details> <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38469&comment_hash=548790d15f803080162554a0ba1c2f05bc6670b7254e087c77611f0176ec8c95&reaction=like'>👍</a> | <a href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38469&comment_hash=548790d15f803080162554a0ba1c2f05bc6670b7254e087c77611f0176ec8c95&reaction=dislike'>👎</a> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
