codeant-ai-for-open-source[bot] commented on code in PR #37570:
URL: https://github.com/apache/superset/pull/37570#discussion_r2745238510


##########
superset/result_set.py:
##########
@@ -116,8 +141,13 @@ def __init__(  # pylint: disable=too-many-locals  # noqa: 
C901
 
         if cursor_description:
             # get deduped list of column names
+            # Use normalize_column_name to handle None/empty names from 
databases
+            # like MSSQL that return empty strings for unnamed columns
             column_names = dedup(
-                [convert_to_string(col[0]) for col in cursor_description]
+                [
+                    normalize_column_name(col[0], idx)
+                    for idx, col in enumerate(cursor_description)
+                ]
             )

Review Comment:
   **Suggestion:** Accessing `col[0]` without validating that `col` is a 
sequence (or non-None) can raise TypeError/IndexError if a cursor description 
entry is None or not indexable; iterate and safely extract the 0th element 
before calling `normalize_column_name`. [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ SupersetResultSet construction fails on malformed cursor descriptions.
   - ⚠️ Query rendering flows may error when cursor.description malformed.
   - ⚠️ Alerts/dimension creation pipelines using result set fail.
   ```
   </details>
   
   ```suggestion
               column_names_values: list[str] = []
               for idx, col in enumerate(cursor_description):
                   try:
                       name_value = col[0]
                   except Exception:
                       # description entry missing or not indexable; treat as 
unnamed
                       name_value = None
                   column_names_values.append(normalize_column_name(name_value, 
idx))
               column_names = dedup(column_names_values)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. In a Python REPL or unit test import the class: open 
`superset/result_set.py` and
   locate `SupersetResultSet.__init__` at `superset/result_set.py:128` 
(constructor start).
   
   2. Construct a malformed cursor description and invoke the constructor 
directly:
   
      - Create `cursor_description = [None]` or `cursor_description = [123]` in 
the test.
   
      - Call `SupersetResultSet(data=[], cursor_description=cursor_description,
      db_engine_spec=BaseEngineSpec)` (the call site is the `__init__` at
      `superset/result_set.py:128`).
   
   3. When the list comprehension at `superset/result_set.py:146-150` executes, 
it tries to
   evaluate `col[0]` (the call to `normalize_column_name(col[0], idx)` is at
   `superset/result_set.py:148`), which raises:
   
      - `TypeError: 'NoneType' object is not subscriptable` for `None`, or
   
      - `TypeError: 'int' object is not subscriptable` for `123`.
   
   4. Observe the exception bubbles out of `SupersetResultSet.__init__` and 
prevents
   result-set construction; this reproduces the failure caused by unguarded 
`col[0]` access.
   
      - Note: This is reproducible without other parts of Superset by directly 
calling the
      constructor described above.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/result_set.py
   **Line:** 146:151
   **Comment:**
        *Possible Bug: Accessing `col[0]` without validating that `col` is a 
sequence (or non-None) can raise TypeError/IndexError if a cursor description 
entry is None or not indexable; iterate and safely extract the 0th element 
before calling `normalize_column_name`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/result_set.py:
##########
@@ -99,6 +99,31 @@ def convert_to_string(value: Any) -> str:
     return str(value)
 
 
+def normalize_column_name(value: Any, index: int) -> str:
+    """
+    Normalize a column name from the cursor description.
+
+    Some databases (e.g., MSSQL) return empty strings for unnamed columns
+    (e.g., SELECT COUNT(*) without an alias). This function ensures every
+    column has a valid, non-empty name by generating a positional fallback
+    name when needed.
+
+    :param value: The column name from cursor.description (can be str, bytes, 
None, etc.)
+    :param index: The 0-based column position, used to generate fallback names
+    :return: A non-empty string column name
+    """
+    if value is None:
+        return f"_col{index}"
+
+    name = convert_to_string(value)

Review Comment:
   **Suggestion:** Decoding column names can raise UnicodeDecodeError when 
`convert_to_string` tries to decode non-UTF-8 bytes; catch decoding errors and 
fallback to a safe decode (errors='replace') or str conversion so the 
normalization step never raises. [possible bug]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ SupersetResultSet fails on non-UTF8 column names.
   - ⚠️ Query display can error for weird driver encodings.
   - ⚠️ Downstream callers may see exceptions during result parsing.
   ```
   </details>
   
   ```suggestion
       try:
           name = convert_to_string(value)
       except UnicodeDecodeError:
           # Fallback for bytes that are not valid UTF-8: replace invalid bytes
           if isinstance(value, (bytes, bytearray)):
               name = value.decode("utf-8", errors="replace")
           else:
               name = str(value)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open `superset/result_set.py` and find `convert_to_string` at
   `superset/result_set.py:89-99` and `normalize_column_name` at
   `superset/result_set.py:102-124`.
   
   2. Create a cursor description entry containing non-UTF8 bytes, e.g. 
`cursor_description =
   [(b"\xff", None, None, None, None, None, None)]`.
   
   3. Instantiate `SupersetResultSet` in a test or REPL:
   
      `SupersetResultSet(data=[], cursor_description=cursor_description,
      db_engine_spec=BaseEngineSpec)` (`__init__` begins at 
`superset/result_set.py:128`).
   
   4. During name normalization the code calls `convert_to_string` (invoked from
   `normalize_column_name` at `superset/result_set.py:118`), which executes the 
bytes decode
   at `superset/result_set.py:96-97` and raises `UnicodeDecodeError` for 
invalid UTF-8 bytes,
   causing `SupersetResultSet.__init__` to error and abort result construction.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/result_set.py
   **Line:** 118:118
   **Comment:**
        *Possible Bug: Decoding column names can raise UnicodeDecodeError when 
`convert_to_string` tries to decode non-UTF-8 bytes; catch decoding errors and 
fallback to a safe decode (errors='replace') or str conversion so the 
normalization step never raises.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to