fitzee commented on PR #40127:
URL: https://github.com/apache/superset/pull/40127#issuecomment-4483470825

   ## Griffen + Codex Review — Round 4 (2026-05-19)
   
   > **Verdict: Needs changes.** Same MEDIUM blocker from rounds 2 and 3. One 
new non-functional commit (`fix(pre-commit): fix ruff formatting and lint 
errors`) — no functional changes. The mixed-column test is still absent.
   
   ---
   
   ### The blocking issue (unresolved since round 2)
   
   `_convert_temporal_columns()` uses `try: pd.to_datetime(series)` / `except 
OutOfBoundsDatetime:` / fallback `pd.to_datetime(series, errors="coerce")`.
   
   For a column like `["3118-01-01", "not-a-date"]`:
   
   1. `pd.to_datetime(["3118-01-01", "not-a-date"])` — processes index 0 first 
→ `OutOfBoundsDatetime`
   2. `except OutOfBoundsDatetime` catches it
   3. Fallback `pd.to_datetime([...], errors="coerce")` runs on the whole 
series → `"not-a-date"` coerced to `NaT` silently
   4. No exception raised; malformed string written as `NaT` to the DB via 
`to_sql`
   
   The docstring says *"coercing only out-of-bounds values"* — but the 
implementation can't guarantee that when a column contains both OOB and 
malformed values. `test_malformed_dates_still_raise` tests `["not-a-date"]` 
alone (pure-malformed path that never enters the `except` branch). It proves 
nothing about the mixed case.
   
   **The fastest resolution:** add this parametrized test, run it locally, and 
report:
   
   ```python
   @pytest.mark.parametrize(
       "values",
       [["3118-01-01", "not-a-date"], ["not-a-date", "3118-01-01"]],
   )
   def test_mixed_out_of_bounds_and_malformed_still_raises(values: list[str]) 
-> None:
       from sqlalchemy import DateTime
       from superset.commands.dataset.importers.v1.utils import 
_convert_temporal_columns
   
       df = pd.DataFrame({"ts": values})
       with pytest.raises((ValueError, pd.errors.ParserError)):
           _convert_temporal_columns(df, {"ts": DateTime()})
   ```
   
   - **If both parametrize cases raise** → the invariant holds by pandas 
ordering → merge.
   - **If either case does not raise** → the invariant is broken → implement 
row-level fallback in the `except OutOfBoundsDatetime` branch (iterate values, 
coerce individually on `OutOfBoundsDatetime`, re-raise on all other errors).
   
   ---
   
   ### Non-blocking (fix alongside)
   
   1. **`pytest.raises(pd.errors.ParserError)` in 
`test_malformed_dates_still_raise`** — `pd.errors.ParserError` is for CSV file 
parsing; `pd.to_datetime` may raise `dateutil.parser.ParserError` or 
`ValueError` depending on pandas version. Use `pytest.raises((ValueError, 
pd.errors.ParserError))` for version safety.
   
   2. **Unused helpers in `utils_test.py`** — `_make_dataset_col` and 
`_apply_temporal_conversion` are defined but never called. Remove.
   
   3. **Indentation check** — the diff for `test_normal_dates_converted` shows 
`assert pd.api.types.is_datetime64_any_dtype(df["ts"])` without leading spaces. 
If that's accurate in the file (not a diff rendering artifact), the assertion 
is at module level and `df` is undefined → `NameError` on test collection. 
Verify the indentation in the actual file.
   
   ---
   
   ### What's solid
   
   The `result_set.py` one-liner (`errors="coerce"` on the tz-aware path) is 
correct and well-tested. The `_convert_temporal_columns` architecture 
(try-strict / except-OOB / fallback-coerce) is the right pattern. The warning 
with net-new NaT count is precise observability. The alternative-approach 
reasoning in the PR description (dismissing clamping and string conversion) is 
correct.
   
   This is one test away from being ready to merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to