codeant-ai-for-open-source[bot] commented on code in PR #37698:
URL: https://github.com/apache/superset/pull/37698#discussion_r2768835798
##########
tests/integration_tests/query_context_tests.py:
##########
@@ -1143,6 +1143,61 @@ def test_time_grain_and_time_offset_with_base_axis(app_context, physical_dataset
 )
+@only_sqlite
+def test_dataset_offset_with_adhoc_temporal_column(app_context, physical_dataset):
+    """
+    Test that dataset offset is applied to adhoc SQL expression temporal columns.
+    Regression test for Issue #23167.
+    """
+    # Set dataset offset to 3 hours
+    physical_dataset.offset = 3
+    db.session.commit()
+
+    adhoc_temporal_column: AdhocColumn = {
+        "label": "col6",
+        "sqlExpression": "col6",
+        "columnType": "BASE_AXIS",
+        "isColumnReference": True,
+    }
+
+    qc = QueryContextFactory().create(
+        datasource={
+            "type": physical_dataset.type,
+            "id": physical_dataset.id,
+        },
+        queries=[
+            {
+                "columns": [adhoc_temporal_column],
+                "metrics": [
+                    {
+                        "label": "COUNT(*)",
+                        "expressionType": "SQL",
+                        "sqlExpression": "COUNT(*)",
+                    }
+                ],
+                "granularity": "col6",
+                "time_range": "2002-01-01 : 2002-01-02",
+            }
+        ],
+        result_type=ChartDataResultType.FULL,
+        force=True,
+    )
+    query_object = qc.queries[0]
+    df = qc.get_df_payload(query_object)["df"]
+
+    # Verify offset was applied: timestamps should be shifted by 3 hours
+    # Original data:      2002-01-01 00:00:00
+    # With 3 hour offset: 2002-01-01 03:00:00
+    assert len(df) > 0
+    first_timestamp = df["col6"].iloc[0]
Review Comment:
**Suggestion:** The assertion assumes that `df["col6"].iloc[0]` is a
datetime-like object with an `.hour` attribute, but on SQLite this column can
be a string (as noted elsewhere in this file), which would raise an
AttributeError instead of correctly testing the offset behavior. [type error]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Test fails immediately on SQLite backend.
- ⚠️ CI sqlite job reports deterministic failure.
- ⚠️ Regression test for Issue #23167 not validated.
```
</details>
```suggestion
    first_timestamp = pd.to_datetime(df["col6"].iloc[0])
```
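For illustration, here is a minimal, self-contained sketch of why the conversion keeps the assertion backend-agnostic; the toy DataFrame stands in for the real payload, and the asserted hour follows the 3-hour offset described in the test's comments:
```python
import pandas as pd

# Toy stand-in for the payload DataFrame: on SQLite the temporal column
# comes back as a plain string rather than a Timestamp.
df = pd.DataFrame({"col6": ["2002-01-01 03:00:00"]})

# pd.to_datetime() accepts both strings and Timestamps, so .hour is safe
# regardless of the backend's return type.
first_timestamp = pd.to_datetime(df["col6"].iloc[0])
assert first_timestamp.hour == 3  # 00:00:00 shifted by the 3-hour dataset offset
```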
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Locate the test `test_dataset_offset_with_adhoc_temporal_column` in
   tests/integration_tests/query_context_tests.py (decorated with `@only_sqlite`
   at line 1146 in the PR hunk).
2. Run the test under the SQLite backend (pytest will execute this test because
   of the `@only_sqlite` decorator). The test obtains its DataFrame via
   QueryContextFactory and `qc.get_df_payload(...)` earlier in the test (see
   lines 1163-1186 in the same test body).
3. On SQLite the test data in this file is known to return timestamp columns as
   strings (see prior tests in this file that explicitly check for string
   timestamps under sqlite, e.g. the `if query_object.datasource.database.backend
   == "sqlite":  # sqlite returns string as timestamp column` checks used in
   earlier tests). Thus `df["col6"].iloc[0]` will be a string, not a datetime.
4. Executing the lines at 1192-1194 (`first_timestamp = df["col6"].iloc[0]` then
   `first_timestamp.hour`) raises AttributeError because a Python str has no
   `.hour` attribute. Converting with `pd.to_datetime()` (as in the improved
   code) ensures a Timestamp with `.hour` exists and the assertion correctly
   validates offset behavior (a short sketch of this conversion follows after
   this section).
```
</details>
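Step 4 above hinges on `pd.to_datetime()` normalizing both representations; a tiny sketch of that property (the values are illustrative):
```python
import pandas as pd

# Both the SQLite string form and an already-parsed Timestamp normalize to a
# pandas Timestamp exposing .hour, whereas calling .hour on the raw string
# would raise AttributeError.
from_string = pd.to_datetime("2002-01-01 03:00:00")
from_timestamp = pd.to_datetime(pd.Timestamp("2002-01-01 03:00:00"))
assert from_string.hour == from_timestamp.hour == 3
```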
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/integration_tests/query_context_tests.py
**Line:** 1192:1192
**Comment:**
*Type Error: The assertion assumes that `df["col6"].iloc[0]` is a
datetime-like object with an `.hour` attribute, but on SQLite this column can
be a string (as noted elsewhere in this file), which would raise an
AttributeError instead of correctly testing the offset behavior.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
##########
superset/models/helpers.py:
##########
@@ -1245,16 +1245,27 @@ def _get_timestamp_format(column: str | None) -> str | None:
            return str(formatter)
        return None
+    def _is_temporal_column(label: str) -> bool:
+        """Check if a column is temporal (physical or adhoc)."""
+        # First check if it's a physical column in metadata
+        if hasattr(self, "get_column") and (col := self.get_column(label)):
+            return col.get("is_dttm") if isinstance(col, dict) else col.is_dttm
Review Comment:
**Suggestion:** The `_is_temporal_column` helper is intended to detect
temporal columns, but for physical columns it only checks `is_dttm`, whereas
elsewhere in this file temporal-ness is determined via the `is_temporal`
attribute (or the `is_temporal` key in dicts). Valid temporal columns that are
not marked `is_dttm` but are `is_temporal` (e.g. secondary datetime fields)
will therefore be excluded from `labels`, so `DateColumn` normalization and
dataset offsets will not be applied to them, leading to inconsistent time
handling. The fix is to treat a column as temporal if either `is_temporal` or
`is_dttm` is truthy, for both object and dict representations. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ Charts using secondary temporal columns show wrong timestamps.
- ⚠️ Time offsets not applied for non-main temporal fields.
- ⚠️ Time comparison joins/offsets miss expected columns.
- ⚠️ Query normalization pipeline inconsistent for temporal columns.
```
</details>
```suggestion
            if isinstance(col, dict):
                # Some connectors expose column metadata as dicts
                return bool(col.get("is_temporal") or col.get("is_dttm"))
            # TableColumn objects expose temporal info via is_temporal / is_dttm
            return bool(
                getattr(col, "is_temporal", False)
                or getattr(col, "is_dttm", False)
            )
```
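Taken out of context, the combined check can be exercised against both representations; a self-contained sketch (the `SimpleNamespace` object below merely mimics a TableColumn-like attribute container and is illustrative only):
```python
from types import SimpleNamespace

def is_temporal(col) -> bool:
    """Temporal if either is_temporal or is_dttm is truthy (dict or object)."""
    if isinstance(col, dict):
        return bool(col.get("is_temporal") or col.get("is_dttm"))
    return bool(getattr(col, "is_temporal", False) or getattr(col, "is_dttm", False))

# A secondary datetime field: temporal, but not the dataset's main dttm column.
secondary = SimpleNamespace(is_temporal=True, is_dttm=False)
assert is_temporal(secondary)              # combined check includes it
assert is_temporal({"is_temporal": True})  # dict form, is_dttm absent
assert not is_temporal({"is_dttm": False}) # non-temporal columns stay excluded
```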
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Ensure a dataset has a secondary/adhoc temporal column that is not the main
   dttm: the dataset's TableColumn (returned by
   superset/connectors/sqla/models.py:get_column at lines 611-617) has the
   attribute is_temporal=True but is_dttm=False for that column name.
2. Create a QueryObject that includes that column as the x-axis or granularity
   so it appears in the labels tuple built in normalize_df
   (superset/models/helpers.py at lines 1261-1269, inside normalize_df ->
   labels = tuple(... if label and _is_temporal_column(label))).
3. Execute the query path that processes results: Query -> get_query_result(...)
   which calls normalize_df(...). normalize_df calls the local helper at
   superset/models/helpers.py:1248-1259 (_is_temporal_column). Because the
   helper currently checks only is_dttm on physical columns, it returns False
   for the TableColumn that has is_temporal=True/is_dttm=False.
4. Observed behavior: the column name is excluded from labels (see the labels
   tuple at 1261-1269), so no DateColumn is created for that temporal column and
   normalization/offset application (DateColumn construction uses self.offset at
   lines 1271-1276) is not applied to it (a sketch of this filtering effect
   follows after this section).
```
</details>
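To make the effect on `labels` concrete, here is a small self-contained sketch of how an is_dttm-only filter drops the secondary temporal column while the combined check keeps it; the names and metadata dict below are illustrative, not Superset's actual normalize_df internals:
```python
# Hypothetical column metadata keyed by label; purely illustrative.
columns = {
    "ds": {"is_temporal": True, "is_dttm": True},     # main dttm column
    "col6": {"is_temporal": True, "is_dttm": False},  # secondary temporal column
}

def temporal_labels(check) -> tuple:
    """Filter labels the way a temporal-column check would."""
    return tuple(label for label, col in columns.items() if check(col))

def old_check(col) -> bool:  # current behavior: is_dttm only
    return bool(col.get("is_dttm"))

def new_check(col) -> bool:  # suggested behavior: is_temporal OR is_dttm
    return bool(col.get("is_temporal") or col.get("is_dttm"))

assert temporal_labels(old_check) == ("ds",)         # col6 silently dropped
assert temporal_labels(new_check) == ("ds", "col6")  # offset/normalization applied
```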
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/models/helpers.py
**Line:** 1252:1252
**Comment:**
*Logic Error: The `_is_temporal_column` helper is intended to detect
temporal columns, but for physical columns it only checks `is_dttm`, whereas
elsewhere in this file temporal-ness is determined via the `is_temporal`
attribute (or the `is_temporal` key in dicts). Valid temporal columns that are
not marked `is_dttm` but are `is_temporal` (e.g. secondary datetime fields)
will therefore be excluded from `labels`, so `DateColumn` normalization and
dataset offsets will not be applied to them, leading to inconsistent time
handling. The fix is to treat a column as temporal if either `is_temporal` or
`is_dttm` is truthy, for both object and dict representations.
Validate the correctness of the flagged issue. If correct, how can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]