codeant-ai-for-open-source[bot] commented on code in PR #35962:
URL: https://github.com/apache/superset/pull/35962#discussion_r2586268297
##########
tests/unit_tests/result_set_test.py:
##########
@@ -185,3 +185,48 @@ def test_get_column_description_from_empty_data_using_cursor_description(
)
assert any(col.get("column_name") == "__time" for col in result_set.columns)
logger.exception.assert_not_called()
+
+
+def test_empty_result_set_preserves_column_metadata() -> None:
+ """
+ Test that column metadata is preserved when query returns zero rows.
+
+ When a query returns no data but has a valid cursor description, the
+ column names and types from cursor_description should be preserved
+ in the result set. This allows downstream consumers (like the UI)
+ to display column headers even for empty result sets.
+ """
+ data: DbapiResult = []
+ description = [
+ ("id", "int", None, None, None, None, True),
+ ("name", "varchar", None, None, None, None, True),
+ ("created_at", "timestamp", None, None, None, None, True),
+ ]
+
+ result_set = SupersetResultSet(
+ data,
+ description, # type: ignore
+ BaseEngineSpec,
+ )
+
+ # Verify column count
+ assert len(result_set.columns) == 3
+
+ # Verify column names are preserved
+ column_names = [col["column_name"] for col in result_set.columns]
+ assert column_names == ["id", "name", "created_at"]
+
+ # Verify types from cursor_description are used
+ assert result_set.columns[0]["type"] == "INT"
+ assert result_set.columns[1]["type"] == "VARCHAR"
+ assert result_set.columns[2]["type"] == "TIMESTAMP"
+
+ # Verify the PyArrow table has the correct schema
+ assert result_set.table.num_rows == 0
+ assert result_set.table.num_columns == 3
+ assert result_set.table.column_names == ["id", "name", "created_at"]
Review Comment:
**Suggestion:** The test directly compares `result_set.table.num_columns`
and `result_set.table.column_names` to Python primitives; depending on the
pyarrow version, `column_names` may not be exactly a list (or `num_columns`
access semantics may vary). Use `len(result_set.table.column_names)` and cast
`column_names` with `list(...)` to make the assertions robust and avoid
spurious failures. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
assert len(result_set.table.column_names) == 3
    assert list(result_set.table.column_names) == ["id", "name", "created_at"]
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Using len(...) and casting column_names to list(...) is more tolerant of pyarrow API differences (some versions expose sequence-like objects). The change improves test robustness without altering semantics.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/result_set_test.py
**Line:** 226:227
**Comment:**
*Possible Bug: The test directly compares
`result_set.table.num_columns` and `result_set.table.column_names` to Python
primitives; depending on pyarrow versions `column_names` may not be exactly a
list (or `num_columns` access semantics may vary). Use
`len(result_set.table.column_names)` and cast `column_names` to `list(...)` to
make assertions robust and avoid false negatives.
Validate the correctness of the flagged issue. If it is correct, how can I
resolve it? If you propose a fix, implement it concisely.
```
</details>
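To illustrate the pattern the suggestion describes (this is a standalone sketch, not Superset or pyarrow code): a sequence-like object compares unequal to a list even when the elements match, while `len(...)` and `list(...)` are agnostic to the container type.

```python
# Hypothetical stand-in for a sequence-like column_names attribute;
# some APIs return tuples or custom sequences rather than lists.
names_as_tuple = ("id", "name", "created_at")

# Direct comparison to a list fails for a tuple with identical elements.
assert names_as_tuple != ["id", "name", "created_at"]

# Casting to list and using len(...) are robust to the container type.
assert list(names_as_tuple) == ["id", "name", "created_at"]
assert len(names_as_tuple) == 3
```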
##########
tests/unit_tests/result_set_test.py:
##########
@@ -185,3 +185,48 @@ def test_get_column_description_from_empty_data_using_cursor_description(
)
assert any(col.get("column_name") == "__time" for col in result_set.columns)
logger.exception.assert_not_called()
+
+
+def test_empty_result_set_preserves_column_metadata() -> None:
+ """
+ Test that column metadata is preserved when query returns zero rows.
+
+ When a query returns no data but has a valid cursor description, the
+ column names and types from cursor_description should be preserved
+ in the result set. This allows downstream consumers (like the UI)
+ to display column headers even for empty result sets.
+ """
+ data: DbapiResult = []
+ description = [
+ ("id", "int", None, None, None, None, True),
+ ("name", "varchar", None, None, None, None, True),
+ ("created_at", "timestamp", None, None, None, None, True),
+ ]
+
+ result_set = SupersetResultSet(
+ data,
+ description, # type: ignore
+ BaseEngineSpec,
+ )
+
+ # Verify column count
+ assert len(result_set.columns) == 3
+
+ # Verify column names are preserved
+ column_names = [col["column_name"] for col in result_set.columns]
+ assert column_names == ["id", "name", "created_at"]
+
+ # Verify types from cursor_description are used
+ assert result_set.columns[0]["type"] == "INT"
+ assert result_set.columns[1]["type"] == "VARCHAR"
+ assert result_set.columns[2]["type"] == "TIMESTAMP"
Review Comment:
**Suggestion:** The test asserts exact uppercase SQL type strings ("INT",
"VARCHAR", "TIMESTAMP") but the actual mapping comes from the engine spec via
`get_datatype`; hardcoding the expected strings makes the test brittle and can
fail if `BaseEngineSpec.get_datatype` returns a different canonical
representation. Use `BaseEngineSpec.get_datatype` on the description's type
entries to derive the expected values. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
    # Verify types from cursor_description are used (derive expected values from the engine spec)
    assert result_set.columns[0]["type"] == BaseEngineSpec.get_datatype(description[0][1])
    assert result_set.columns[1]["type"] == BaseEngineSpec.get_datatype(description[1][1])
    assert result_set.columns[2]["type"] == BaseEngineSpec.get_datatype(description[2][1])
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
The suggestion reduces brittleness: the test currently assumes BaseEngineSpec.get_datatype will always return those exact uppercased strings. Deriving expected values from the same engine spec implementation ties the assertion to the actual mapping and prevents false failures if the canonical representation changes. It's a pragmatic, low-risk improvement.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/result_set_test.py
**Line:** 219:222
**Comment:**
*Possible Bug: The test asserts exact uppercase SQL type strings
("INT", "VARCHAR", "TIMESTAMP") but the actual mapping comes from the engine
spec via `get_datatype`; hardcoding the expected strings makes the test brittle
and can fail if `BaseEngineSpec.get_datatype` returns a different canonical
representation. Use `BaseEngineSpec.get_datatype` on the description's type
entries to derive the expected values.
Validate the correctness of the flagged issue. If it is correct, how can I
resolve it? If you propose a fix, implement it concisely.
```
</details>
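The derive-don't-hardcode pattern can be sketched in isolation. Here `fake_get_datatype` is a hypothetical stand-in for `BaseEngineSpec.get_datatype`, used only to show the shape of the assertion; it is not the real implementation.

```python
def fake_get_datatype(type_code):
    """Hypothetical stand-in for BaseEngineSpec.get_datatype."""
    return str(type_code).upper() if type_code is not None else None


# Derive expected types from the same description tuples the test feeds in,
# so the assertions track whatever canonical form the mapping produces.
description = [
    ("id", "int", None, None, None, None, True),
    ("name", "varchar", None, None, None, None, True),
]
expected = [fake_get_datatype(col[1]) for col in description]
assert expected == ["INT", "VARCHAR"]
```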
##########
tests/unit_tests/result_set_test.py:
##########
@@ -185,3 +185,48 @@ def test_get_column_description_from_empty_data_using_cursor_description(
)
assert any(col.get("column_name") == "__time" for col in result_set.columns)
logger.exception.assert_not_called()
+
+
+def test_empty_result_set_preserves_column_metadata() -> None:
+ """
+ Test that column metadata is preserved when query returns zero rows.
+
+ When a query returns no data but has a valid cursor description, the
+ column names and types from cursor_description should be preserved
+ in the result set. This allows downstream consumers (like the UI)
+ to display column headers even for empty result sets.
+ """
+ data: DbapiResult = []
+ description = [
+ ("id", "int", None, None, None, None, True),
+ ("name", "varchar", None, None, None, None, True),
+ ("created_at", "timestamp", None, None, None, None, True),
+ ]
+
+ result_set = SupersetResultSet(
+ data,
+ description, # type: ignore
+ BaseEngineSpec,
+ )
+
+ # Verify column count
+ assert len(result_set.columns) == 3
+
+ # Verify column names are preserved
+ column_names = [col["column_name"] for col in result_set.columns]
+ assert column_names == ["id", "name", "created_at"]
+
+ # Verify types from cursor_description are used
+ assert result_set.columns[0]["type"] == "INT"
+ assert result_set.columns[1]["type"] == "VARCHAR"
+ assert result_set.columns[2]["type"] == "TIMESTAMP"
+
+ # Verify the PyArrow table has the correct schema
+ assert result_set.table.num_rows == 0
+ assert result_set.table.num_columns == 3
+ assert result_set.table.column_names == ["id", "name", "created_at"]
+
+ # Verify DataFrame conversion works
+ df = result_set.to_pandas_df()
+ assert len(df) == 0
+ assert list(df.columns) == ["id", "name", "created_at"]
Review Comment:
**Suggestion:** The DataFrame column names produced by pyarrow/pandas
conversion can sometimes be bytes or non-string types; asserting exact list
equality can fail. Normalize column names to strings before comparing to avoid
flaky failures. [possible bug]
**Severity Level:** Critical 🚨
```suggestion
assert list(map(str, df.columns)) == ["id", "name", "created_at"]
```
<details>
<summary><b>Why it matters? ⭐ </b></summary>
Normalizing column names to strings (e.g., list(map(str, df.columns))) prevents flakiness when column names come through as bytes or other non-str types. It's a small defensive tweak that reduces sporadic test failures.
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/result_set_test.py
**Line:** 232:232
**Comment:**
*Possible Bug: The DataFrame column names produced by pyarrow/pandas
conversion can sometimes be bytes or non-string types; asserting exact list
equality can fail. Normalize column names to strings before comparing to avoid
flaky failures.
Validate the correctness of the flagged issue. If it is correct, how can I
resolve it? If you propose a fix, implement it concisely.
```
</details>
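One caveat with the suggested `map(str, ...)` normalization: in Python 3, `str()` applied to bytes yields the repr (`"b'id'"`), not the decoded text, so if bytes column names are the concern, a decode-aware helper is safer. A minimal sketch in plain Python, independent of pandas/pyarrow (`normalize_name` is an illustrative helper, not Superset code):

```python
def normalize_name(name):
    # str(b"id") gives "b'id'" in Python 3, so decode bytes explicitly.
    if isinstance(name, (bytes, bytearray)):
        return bytes(name).decode("utf-8")
    return str(name)


# Why plain str() is not enough for bytes names:
assert str(b"id") == "b'id'"

# The decode-aware helper yields the intended comparison values.
assert [normalize_name(n) for n in (b"id", "name", "created_at")] == [
    "id",
    "name",
    "created_at",
]
```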
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]