fitzee commented on PR #40127:
URL: https://github.com/apache/superset/pull/40127#issuecomment-4473001323
**Griffen + Codex review (round 3) — 2026-05-18**
**Verdict: Needs changes**
Both reviewers flagged the same MEDIUM — the verdict diverged only on
framing (conditional LGTM vs. hard Needs changes), not on the finding itself.
---
## [MEDIUM] Mixed OOB + malformed column: invariant not proven by tests
`_convert_temporal_columns` is designed so malformed strings still raise —
only out-of-bounds timestamps should become NaT. This relies on
`pd.to_datetime(series)` raising `ParserError` for malformed strings **before**
raising `OutOfBoundsDatetime` for OOB values. Based on pandas'
parse-then-convert ordering, this likely holds. But the current test suite
doesn't prove it.
The missing test:
```python
def test_mixed_oob_and_malformed_raises(self) -> None:
"""Mixed OOB + malformed in same column: malformed must still raise."""
df = pd.DataFrame({"ts": ["3118-01-01", "not-a-date"]})
with pytest.raises((ValueError, pd.errors.ParserError)):
_convert_temporal_columns(df, {"ts": DateTime()})
```
**If the test passes:** the invariant is confirmed by pandas behavior. Merge
as-is (with nits below).
**If the test fails:** implement row-level fallback — after the broad
`errors="coerce"`, for each newly-NaT index retry that individual value with
`errors="raise"`, suppress only `OutOfBoundsDatetime`, re-raise everything
else. This converts the batch-coerce shortcut into a semantically precise
operation.
Why this matters: `load_data()` writes via `to_sql(...,
if_exists="replace")`. A malformed string that previously caused an import
validation failure would instead silently persist as NULL — durable data loss,
not a transient display issue.
---
## Nits (non-blocking, fix alongside)
- `pytest.raises(Exception)` in `test_malformed_dates_still_raise` →
`pytest.raises(ValueError)` or `pytest.raises(pandas.errors.ParserError)`. The
`# noqa: B017` suppresses the linter warning instead of fixing it; broad
`Exception` makes the test vacuous.
- `_make_dataset_col` and `_apply_temporal_conversion` are defined in
`utils_test.py` but never called by any test. Remove or wire them in.
---
## What's sound
The overall approach is correct: strict-first, catch only the specific
overflow exception, targeted fallback with warning. The `result_set.py`
one-liner is clean. The extraction into `_convert_temporal_columns` makes the
logic testable where it wasn't before. The warning on net-new NaTs gives
operators visibility into coercion at import time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]