codeant-ai-for-open-source[bot] commented on code in PR #36950:
URL: https://github.com/apache/superset/pull/36950#discussion_r2668161471


##########
superset/commands/streaming_export/base.py:
##########
@@ -34,6 +35,31 @@
 logger = logging.getLogger(__name__)
 
 
+@contextmanager
+def preserve_g_context(
+    captured_g: dict[str, Any],
+) -> Generator[None, None, None]:
+    """
+    Context manager that restores captured flask.g attributes.
+
+    This is needed for streaming responses where the generator runs in a new
+    app context but needs access to request-scoped data (like tenant info
+    in multi-tenant setups).
+
+    Args:
+        captured_g: Dictionary of g attributes captured before context switch
+    """
+    for key, value in captured_g.items():
+        setattr(g, key, value)
+    try:
+        yield
+    finally:
+        # Clean up the attributes we set
+        for key in captured_g:
+            if hasattr(g, key):
+                delattr(g, key)

Review Comment:
   **Suggestion:** The cleanup in `preserve_g_context` deletes attributes on 
`flask.g` unconditionally, which will remove attributes that existed before the 
context manager ran and cause state loss across contexts; instead capture the 
original presence/value for each key and restore original values (or delete 
only keys that were not present originally) and guard attribute 
restores/deletions to avoid raising during cleanup. [logic error]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
       # Record original state for keys we may override so we can restore it
       original_values: dict[str, Any] = {}
       original_present: set[str] = set()
       for key, value in captured_g.items():
           if hasattr(g, key):
               try:
                   original_values[key] = getattr(g, key)
                   original_present.add(key)
               except Exception:
                   # If reading the original raises, treat as not present
                   pass
           # set the captured value for the new context
           try:
               setattr(g, key, value)
           except Exception:
               # If setting fails, skip that attribute
               continue
       try:
           yield
       finally:
           # Restore original values if they existed, otherwise remove keys we 
set
           for key in captured_g:
               if key in original_present:
                   try:
                       setattr(g, key, original_values[key])
                   except Exception:
                       # Best-effort restore; swallow to avoid breaking cleanup
                       pass
               else:
                   try:
                       if hasattr(g, key):
                           delattr(g, key)
                   except Exception:
                       # Best-effort deletion; swallow to avoid breaking cleanup
                       pass
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   The current implementation sets captured values onto flask.g then 
indiscriminately deletes those attributes on exit, which will remove any 
attributes that originally existed and cause state loss across contexts. The 
proposed change records original presence/values and restores them (or deletes 
only what was introduced), fixing a real logic bug that can break 
request-scoped state.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/streaming_export/base.py
   **Line:** 53:60
   **Comment:**
        *Logic Error: The cleanup in `preserve_g_context` deletes attributes on 
`flask.g` unconditionally, which will remove attributes that existed before the 
context manager ran and cause state loss across contexts; instead capture the 
original presence/value for each key and restore original values (or delete 
only keys that were not present originally) and guard attribute 
restores/deletions to avoid raising during cleanup.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/streaming_export/base.py:
##########
@@ -58,6 +84,15 @@ def __init__(self, chunk_size: int = 1000):
         """
         self._chunk_size = chunk_size
         self._current_app = app._get_current_object()
+        # Capture flask.g attributes for multi-tenant support.
+        # When streaming responses run in a new app context, we need to
+        # preserve request-scoped data like tenant/workspace info.
+        self._captured_g: dict[str, Any] = {}
+        if has_app_context():
+            # Capture all g attributes (excludes internal Flask attributes)
+            self._captured_g = {
+                key: getattr(g, key) for key in dir(g) if not 
key.startswith("_")
+            }

Review Comment:
   **Suggestion:** Capturing attributes from `g` using `dir(g)` will include 
methods, descriptors and other non-data attributes (and may include objects 
that can't/shouldn't be moved between contexts); capture only actual stored 
data attributes (e.g. from `g.__dict__`) and skip callables and private 
attributes, and avoid sharing mutable objects by shallow-copying container 
types. [possible bug]
   
   **Severity Level:** Critical 🚨
   ```suggestion
               # Capture only actual data attributes stored on g (avoid 
methods/descriptors)
               try:
                   g_dict = getattr(g, "__dict__", {}) or {}
                   captured: dict[str, Any] = {}
                   for key, val in g_dict.items():
                       if key.startswith("_"):
                           continue
                       # skip callables / methods to avoid overriding behavior 
on g
                       if callable(val):
                           continue
                       # shallow copy mutable containers to avoid sharing 
references
                       try:
                           if hasattr(val, "copy"):
                               captured[key] = val.copy()
                           else:
                               captured[key] = val
                       except Exception:
                           # Skip attributes that cannot be copied/read safely
                           continue
                   self._captured_g = captured
               except Exception:
                   # If capturing fails for any reason, fall back to empty 
capture
                   self._captured_g = {}
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Using dir(g) pulls in methods and descriptors and may capture non-data 
attributes; additionally, storing references to mutable objects can cause 
surprising shared-state when restored in another context. Restricting capture 
to g.__dict__, skipping callables/private names, and shallow-copying container 
types addresses real correctness concerns.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/streaming_export/base.py
   **Line:** 92:95
   **Comment:**
        *Possible Bug: Capturing attributes from `g` using `dir(g)` will 
include methods, descriptors and other non-data attributes (and may include 
objects that can't/shouldn't be moved between contexts); capture only actual 
stored data attributes (e.g. from `g.__dict__`) and skip callables and private 
attributes, and avoid sharing mutable objects by shallow-copying container 
types.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/commands/streaming_export/base.py:
##########
@@ -192,23 +227,27 @@ def run(self) -> Callable[[], Generator[str, None, None]]:
         # to avoid DetachedInstanceError
         sql, database = self._get_sql_and_database()
         limit = self._get_row_limit()
+        # Capture g attributes for restoration in the new context
+        captured_g = self._captured_g
 

Review Comment:
   **Suggestion:** Capturing request-scoped `g` attributes at object 
construction time can be too early (no request or different request), causing 
stale or missing values during streaming; capture `g` lazily at the time 
`run()` is invoked (when an app/request context is available) if nothing was 
captured earlier. [state inconsistency]
   
   **Severity Level:** Minor ⚠️
   ```suggestion
           # If nothing was captured at init time, capture lazily here when an 
app/request
           # context is available so we don't store stale or partial 
request-scoped state.
           captured_g = self._captured_g
           if not captured_g and has_app_context():
               try:
                   g_dict = getattr(g, "__dict__", {}) or {}
                   captured: dict[str, Any] = {}
                   for key, val in g_dict.items():
                       if key.startswith("_"):
                           continue
                       if callable(val):
                           continue
                       try:
                           captured[key] = val.copy() if hasattr(val, "copy") 
else val
                       except Exception:
                           continue
                   captured_g = captured
               except Exception:
                   captured_g = {}
   ```
   <details>
   <summary><b>Why it matters? ⭐ </b></summary>
   
   Capturing request-scoped state at object construction (when an app/request 
context may not exist) can produce empty or stale captures. Capturing lazily in 
run() (when an app context is available) avoids missing or stale values during 
streaming. This change prevents state inconsistencies for long-lived objects 
created outside request handling.
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/commands/streaming_export/base.py
   **Line:** 232:232
   **Comment:**
        *State Inconsistency: Capturing request-scoped `g` attributes at object 
construction time can be too early (no request or different request), causing 
stale or missing values during streaming; capture `g` lazily at the time 
`run()` is invoked (when an app/request context is available) if nothing was 
captured earlier.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to