codeant-ai-for-open-source[bot] commented on code in PR #36950:
URL: https://github.com/apache/superset/pull/36950#discussion_r2668161471
##########
superset/commands/streaming_export/base.py:
##########
@@ -34,6 +35,31 @@
logger = logging.getLogger(__name__)
+@contextmanager
+def preserve_g_context(
+ captured_g: dict[str, Any],
+) -> Generator[None, None, None]:
+ """
+ Context manager that restores captured flask.g attributes.
+
+ This is needed for streaming responses where the generator runs in a new
+ app context but needs access to request-scoped data (like tenant info
+ in multi-tenant setups).
+
+ Args:
+ captured_g: Dictionary of g attributes captured before context switch
+ """
+ for key, value in captured_g.items():
+ setattr(g, key, value)
+ try:
+ yield
+ finally:
+ # Clean up the attributes we set
+ for key in captured_g:
+ if hasattr(g, key):
+ delattr(g, key)
Review Comment:
**Suggestion:** The cleanup in `preserve_g_context` deletes attributes on
`flask.g` unconditionally, which will remove attributes that existed before the
context manager ran and cause state loss across contexts; instead capture the
original presence/value for each key and restore original values (or delete
only keys that were not present originally) and guard attribute
restores/deletions to avoid raising during cleanup. [logic error]
**Severity Level:** Minor ⚠️
```suggestion
# Record original state for keys we may override so we can restore it
original_values: dict[str, Any] = {}
original_present: set[str] = set()
for key, value in captured_g.items():
if hasattr(g, key):
try:
original_values[key] = getattr(g, key)
original_present.add(key)
except Exception:
# If reading the original raises, treat as not present
pass
# set the captured value for the new context
try:
setattr(g, key, value)
except Exception:
# If setting fails, skip that attribute
continue
try:
yield
finally:
# Restore original values if they existed, otherwise remove keys we
set
for key in captured_g:
if key in original_present:
try:
setattr(g, key, original_values[key])
except Exception:
# Best-effort restore; swallow to avoid breaking cleanup
pass
else:
try:
if hasattr(g, key):
delattr(g, key)
except Exception:
# Best-effort deletion; swallow to avoid breaking cleanup
pass
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The current implementation sets captured values onto flask.g then
indiscriminately deletes those attributes on exit, which will remove any
attributes that originally existed and cause state loss across contexts. The
proposed change records original presence/values and restores them (or deletes
only what was introduced), fixing a real logic bug that can break
request-scoped state.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/streaming_export/base.py
**Line:** 53:60
**Comment:**
*Logic Error: The cleanup in `preserve_g_context` deletes attributes on
`flask.g` unconditionally, which will remove attributes that existed before the
context manager ran and cause state loss across contexts; instead capture the
original presence/value for each key and restore original values (or delete
only keys that were not present originally) and guard attribute
restores/deletions to avoid raising during cleanup.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/commands/streaming_export/base.py:
##########
@@ -58,6 +84,15 @@ def __init__(self, chunk_size: int = 1000):
"""
self._chunk_size = chunk_size
self._current_app = app._get_current_object()
+ # Capture flask.g attributes for multi-tenant support.
+ # When streaming responses run in a new app context, we need to
+ # preserve request-scoped data like tenant/workspace info.
+ self._captured_g: dict[str, Any] = {}
+ if has_app_context():
+ # Capture all g attributes (excludes internal Flask attributes)
+ self._captured_g = {
+ key: getattr(g, key) for key in dir(g) if not
key.startswith("_")
+ }
Review Comment:
**Suggestion:** Capturing attributes from `g` using `dir(g)` will include
methods, descriptors and other non-data attributes (and may include objects
that can't/shouldn't be moved between contexts); capture only actual stored
data attributes (e.g. from `g.__dict__`) and skip callables and private
attributes, and avoid sharing mutable objects by shallow-copying container
types. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
# Capture only actual data attributes stored on g (avoid
methods/descriptors)
try:
g_dict = getattr(g, "__dict__", {}) or {}
captured: dict[str, Any] = {}
for key, val in g_dict.items():
if key.startswith("_"):
continue
# skip callables / methods to avoid overriding behavior
on g
if callable(val):
continue
# shallow copy mutable containers to avoid sharing
references
try:
if hasattr(val, "copy"):
captured[key] = val.copy()
else:
captured[key] = val
except Exception:
# Skip attributes that cannot be copied/read safely
continue
self._captured_g = captured
except Exception:
# If capturing fails for any reason, fall back to empty
capture
self._captured_g = {}
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Using dir(g) pulls in methods and descriptors and may capture non-data
attributes; additionally, storing references to mutable objects can cause
surprising shared-state when restored in another context. Restricting capture
to g.__dict__, skipping callables/private names, and shallow-copying container
types addresses real correctness concerns.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/streaming_export/base.py
**Line:** 92:95
**Comment:**
*Possible Bug: Capturing attributes from `g` using `dir(g)` will
include methods, descriptors and other non-data attributes (and may include
objects that can't/shouldn't be moved between contexts); capture only actual
stored data attributes (e.g. from `g.__dict__`) and skip callables and private
attributes, and avoid sharing mutable objects by shallow-copying container
types.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/commands/streaming_export/base.py:
##########
@@ -192,23 +227,27 @@ def run(self) -> Callable[[], Generator[str, None, None]]:
# to avoid DetachedInstanceError
sql, database = self._get_sql_and_database()
limit = self._get_row_limit()
+ # Capture g attributes for restoration in the new context
+ captured_g = self._captured_g
Review Comment:
**Suggestion:** Capturing request-scoped `g` attributes at object
construction time can be too early (no request or different request), causing
stale or missing values during streaming; capture `g` lazily at the time
`run()` is invoked (when an app/request context is available) if nothing was
captured earlier. [state inconsistency]
**Severity Level:** Minor ⚠️
```suggestion
# If nothing was captured at init time, capture lazily here when an
app/request
# context is available so we don't store stale or partial
request-scoped state.
captured_g = self._captured_g
if not captured_g and has_app_context():
try:
g_dict = getattr(g, "__dict__", {}) or {}
captured: dict[str, Any] = {}
for key, val in g_dict.items():
if key.startswith("_"):
continue
if callable(val):
continue
try:
captured[key] = val.copy() if hasattr(val, "copy")
else val
except Exception:
continue
captured_g = captured
except Exception:
captured_g = {}
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Capturing request-scoped state at object construction (when an app/request
context may not exist) can produce empty or stale captures. Capturing lazily in
run() (when an app context is available) avoids missing or stale values during
streaming. This change prevents state inconsistencies for long-lived objects
created outside request handling.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/commands/streaming_export/base.py
**Line:** 232:232
**Comment:**
*State Inconsistency: Capturing request-scoped `g` attributes at object
construction time can be too early (no request or different request), causing
stale or missing values during streaming; capture `g` lazily at the time
`run()` is invoked (when an app/request context is available) if nothing was
captured earlier.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]