Re: [PR] fix: apply dataset offset to adhoc temporal columns [superset]

via GitHub Thu, 05 Feb 2026 04:12:23 -0800


codeant-ai-for-open-source[bot] commented on code in PR #37698:
URL: https://github.com/apache/superset/pull/37698#discussion_r2768835798



##########
tests/integration_tests/query_context_tests.py:
##########
@@ -1143,6 +1143,61 @@ def 
test_time_grain_and_time_offset_with_base_axis(app_context, physical_dataset
     )
 
 
+@only_sqlite
+def test_dataset_offset_with_adhoc_temporal_column(app_context, 
physical_dataset):
+    """
+    Test that dataset offset is applied to adhoc SQL expression temporal 
columns.
+    Regression test for Issue #23167.
+    """
+    # Set dataset offset to 3 hours
+    physical_dataset.offset = 3
+    db.session.commit()
+
+    adhoc_temporal_column: AdhocColumn = {
+        "label": "col6",
+        "sqlExpression": "col6",
+        "columnType": "BASE_AXIS",
+        "isColumnReference": True,
+    }
+    
+    qc = QueryContextFactory().create(
+        datasource={
+            "type": physical_dataset.type,
+            "id": physical_dataset.id,
+        },
+        queries=[
+            {
+                "columns": [adhoc_temporal_column],
+                "metrics": [
+                    {
+                        "label": "COUNT(*)",
+                        "expressionType": "SQL",
+                        "sqlExpression": "COUNT(*)",
+                    }
+                ],
+                "granularity": "col6",
+                "time_range": "2002-01-01 : 2002-01-02",
+            }
+        ],
+        result_type=ChartDataResultType.FULL,
+        force=True,
+    )
+    query_object = qc.queries[0]
+    df = qc.get_df_payload(query_object)["df"]
+    
+    # Verify offset was applied: timestamps should be shifted by 3 hours
+    # Original data: 2002-01-01 00:00:00
+    # With 3 hour offset:  2002-01-01 03:00:00
+    assert len(df) > 0
+    first_timestamp = df["col6"].iloc[0]

Review Comment:
   **Suggestion:** The assertion assumes that `df["col6"].iloc[0]` is a 
datetime-like object with an `.hour` attribute, but on SQLite this column can 
be a string (as noted elsewhere in this file), which would raise an 
AttributeError instead of correctly testing the offset behavior. [type error]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Test fails immediately on SQLite backend.
   - ⚠️ CI sqlite job reports deterministic failure.
   - ⚠️ Regression test for Issue #23167 not validated.
   ```
   </details>
   
   ```suggestion
       first_timestamp = pd.to_datetime(df["col6"].iloc[0])
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Locate the test `test_dataset_offset_with_adhoc_temporal_column` in
   tests/integration_tests/query_context_tests.py (decorated with 
`@only_sqlite` at line 1146
   in the PR hunk).
   
   2. Run the test under the SQLite backend (pytest will execute this test 
because of the
   `@only_sqlite` decorator). The test obtains its DataFrame via 
QueryContextFactory and
   `qc.get_df_payload(... )` earlier in the test (see lines 1163-1186 in the 
same test body).
   
   3. On SQLite the test data in this file is known to return timestamp columns 
as strings
   (see prior tests in this file that explicitly check for string timestamps 
under sqlite,
   e.g., the `if query_object.datasource.database.backend == "sqlite": # sqlite 
returns
   string as timestamp column` checks used in earlier tests). Thus 
df["col6"].iloc[0] will be
   a string, not a datetime.
   
   4. Executing the lines at 1192-1194 (`first_timestamp = df["col6"].iloc[0]` 
then
   `first_timestamp.hour`) raises AttributeError because a Python str has no 
`.hour`
   attribute. Converting with `pd.to_datetime()` (as in the improved code) 
ensures a
   Timestamp with `.hour` exists and the assertion correctly validates offset 
behavior.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/integration_tests/query_context_tests.py
   **Line:** 1192:1192
   **Comment:**
        *Type Error: The assertion assumes that `df["col6"].iloc[0]` is a 
datetime-like object with an `.hour` attribute, but on SQLite this column can 
be a string (as noted elsewhere in this file), which would raise an 
AttributeError instead of correctly testing the offset behavior.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



##########
superset/models/helpers.py:
##########
@@ -1245,16 +1245,27 @@ def _get_timestamp_format(column: str | None) -> str | 
None:
                 return str(formatter)
             return None
 
+        def _is_temporal_column(label: str) -> bool:
+            """Check if a column is temporal (physical or adhoc)."""
+            # First check if it's a physical column in metadata
+            if hasattr(self, "get_column") and (col := self.get_column(label)):
+                return col.get("is_dttm") if isinstance(col, dict) else 
col.is_dttm

Review Comment:
   **Suggestion:** The `_is_temporal_column` helper is intended to detect 
temporal columns, but for physical columns it only checks `is_dttm`, whereas 
elsewhere in this file temporal-ness is determined via `is_temporal` (or 
`is_temporal` in dicts). This means valid temporal columns that are not marked 
as `is_dttm` (but are `is_temporal`, e.g. secondary datetime fields) will be 
excluded from `labels`, so `DateColumn` normalization and dataset offsets will 
not be applied to them, leading to inconsistent time handling. The fix is to 
treat a column as temporal if either `is_temporal` or `is_dttm` is truthy, for 
both object and dict representations. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Charts using secondary temporal columns show wrong timestamps.
   - ⚠️ Time offsets not applied for non-main temporal fields.
   - ⚠️ Time comparison joins/offsets miss expected columns.
   - ⚠️ Query normalization pipeline inconsistent for temporal columns.
   ```
   </details>
   
   ```suggestion
                   if isinstance(col, dict):
                       # Some connectors expose column metadata as dicts
                       return bool(col.get("is_temporal") or col.get("is_dttm"))
                   # TableColumn objects expose temporal info via is_temporal / 
is_dttm
                   return bool(getattr(col, "is_temporal", False) or 
getattr(col, "is_dttm", False))
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Ensure a dataset has a secondary/adhoc temporal column that is not the 
main dttm:
   
      - The dataset's TableColumn (returned by 
superset/connectors/sqla/models.py:get_column
      at lines 611-617)
   
        has attribute is_temporal=True but is_dttm=False for that column name.
   
   2. Create a QueryObject that includes that column as the x-axis or 
granularity so it
   appears
   
      in the labels tuple built in normalize_df (superset/models/helpers.py at 
lines
      1261-1269,
   
      inside normalize_df -> labels = tuple(... if label and 
_is_temporal_column(label))).
   
   3. Execute the query path that processes results: Query -> 
get_query_result(...) which
   calls
   
      normalize_df(...). normalize_df calls the local helper at
      superset/models/helpers.py:1248-1259
   
      (_is_temporal_column). Because the helper currently checks only is_dttm 
on physical
      columns,
   
      it returns False for the TableColumn that has 
is_temporal=True/is_dttm=False.
   
   4. Observed behavior: the column name is excluded from labels (see labels 
tuple at
   1261-1269),
   
      so no DateColumn is created for that temporal column and 
normalization/offset
      application
   
      (DateColumn construction uses self.offset at lines 1271-1276) is not 
applied to it.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/models/helpers.py
   **Line:** 1252:1252
   **Comment:**
        *Logic Error: The `_is_temporal_column` helper is intended to detect 
temporal columns, but for physical columns it only checks `is_dttm`, whereas 
elsewhere in this file temporal-ness is determined via `is_temporal` (or 
`is_temporal` in dicts). This means valid temporal columns that are not marked 
as `is_dttm` (but are `is_temporal`, e.g. secondary datetime fields) will be 
excluded from `labels`, so `DateColumn` normalization and dataset offsets will 
not be applied to them, leading to inconsistent time handling. The fix is to 
treat a column as temporal if either `is_temporal` or `is_dttm` is truthy, for 
both object and dict representations.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: apply dataset offset to adhoc temporal columns [superset]

Reply via email to