codeant-ai-for-open-source[bot] commented on code in PR #37570:
URL: https://github.com/apache/superset/pull/37570#discussion_r2745238510
##########
superset/result_set.py:
##########
@@ -116,8 +141,13 @@ def __init__( # pylint: disable=too-many-locals # noqa:
C901
if cursor_description:
# get deduped list of column names
+ # Use normalize_column_name to handle None/empty names from
databases
+ # like MSSQL that return empty strings for unnamed columns
column_names = dedup(
- [convert_to_string(col[0]) for col in cursor_description]
+ [
+ normalize_column_name(col[0], idx)
+ for idx, col in enumerate(cursor_description)
+ ]
)
Review Comment:
**Suggestion:** Accessing `col[0]` without validating that `col` is a
sequence (or non-None) can raise TypeError/IndexError if a cursor description
entry is None or not indexable; iterate and safely extract the 0th element
before calling `normalize_column_name`. [possible bug]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ SupersetResultSet construction fails on malformed cursor descriptions.
- ⚠️ Query rendering flows may error when cursor.description malformed.
- ⚠️ Alerts/dimension creation pipelines using result set fail.
```
</details>
```suggestion
column_names_values: list[str] = []
for idx, col in enumerate(cursor_description):
try:
name_value = col[0]
except Exception:
# description entry missing or not indexable; treat as
unnamed
name_value = None
column_names_values.append(normalize_column_name(name_value,
idx))
column_names = dedup(column_names_values)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. In a Python REPL or unit test import the class: open
`superset/result_set.py` and
locate `SupersetResultSet.__init__` at `superset/result_set.py:128`
(constructor start).
2. Construct a malformed cursor description and invoke the constructor
directly:
- Create `cursor_description = [None]` or `cursor_description = [123]` in
the test.
- Call `SupersetResultSet(data=[], cursor_description=cursor_description,
db_engine_spec=BaseEngineSpec)` (the call site is the `__init__` at
`superset/result_set.py:128`).
3. When the list comprehension at `superset/result_set.py:146-150` executes,
it tries to
evaluate `col[0]` (the call to `normalize_column_name(col[0], idx)` is at
`superset/result_set.py:148`), which raises:
- `TypeError: 'NoneType' object is not subscriptable` for `None`, or
- `TypeError: 'int' object is not subscriptable` for `123`.
4. Observe the exception bubbles out of `SupersetResultSet.__init__` and
prevents
result-set construction; this reproduces the failure caused by unguarded
`col[0]` access.
- Note: This is reproducible without other parts of Superset by directly
calling the
constructor described above.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/result_set.py
**Line:** 146:151
**Comment:**
*Possible Bug: Accessing `col[0]` without validating that `col` is a
sequence (or non-None) can raise TypeError/IndexError if a cursor description
entry is None or not indexable; iterate and safely extract the 0th element
before calling `normalize_column_name`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/result_set.py:
##########
@@ -99,6 +99,31 @@ def convert_to_string(value: Any) -> str:
return str(value)
+def normalize_column_name(value: Any, index: int) -> str:
+ """
+ Normalize a column name from the cursor description.
+
+ Some databases (e.g., MSSQL) return empty strings for unnamed columns
+ (e.g., SELECT COUNT(*) without an alias). This function ensures every
+ column has a valid, non-empty name by generating a positional fallback
+ name when needed.
+
+ :param value: The column name from cursor.description (can be str, bytes,
None, etc.)
+ :param index: The 0-based column position, used to generate fallback names
+ :return: A non-empty string column name
+ """
+ if value is None:
+ return f"_col{index}"
+
+ name = convert_to_string(value)
Review Comment:
**Suggestion:** Decoding column names can raise UnicodeDecodeError when
`convert_to_string` tries to decode non-UTF-8 bytes; catch decoding errors and
fallback to a safe decode (errors='replace') or str conversion so the
normalization step never raises. [possible bug]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ SupersetResultSet fails on non-UTF8 column names.
- ⚠️ Query display can error for weird driver encodings.
- ⚠️ Downstream callers may see exceptions during result parsing.
```
</details>
```suggestion
try:
name = convert_to_string(value)
except UnicodeDecodeError:
# Fallback for bytes that are not valid UTF-8: replace invalid bytes
if isinstance(value, (bytes, bytearray)):
name = value.decode("utf-8", errors="replace")
else:
name = str(value)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open `superset/result_set.py` and find `convert_to_string` at
`superset/result_set.py:89-99` and `normalize_column_name` at
`superset/result_set.py:102-124`.
2. Create a cursor description entry containing non-UTF8 bytes, e.g.
`cursor_description =
[(b"\xff", None, None, None, None, None, None)]`.
3. Instantiate `SupersetResultSet` in a test or REPL:
`SupersetResultSet(data=[], cursor_description=cursor_description,
db_engine_spec=BaseEngineSpec)` (`__init__` begins at
`superset/result_set.py:128`).
4. During name normalization the code calls `convert_to_string` (invoked from
`normalize_column_name` at `superset/result_set.py:118`), which executes the
bytes decode
at `superset/result_set.py:96-97` and raises `UnicodeDecodeError` for
invalid UTF-8 bytes,
causing `SupersetResultSet.__init__` to error and abort result construction.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/result_set.py
**Line:** 118:118
**Comment:**
*Possible Bug: Decoding column names can raise UnicodeDecodeError when
`convert_to_string` tries to decode non-UTF-8 bytes; catch decoding errors and
fallback to a safe decode (errors='replace') or str conversion so the
normalization step never raises.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]