steveahnahn opened a new pull request, #68869:
URL: https://github.com/apache/airflow/pull/68869

   Cursor (keyset) paginated REST list endpoints (`GET /dags/{dag_id}/dagRuns` 
and `GET .../taskInstances`) silently dropped rows when sorted by a nullable 
column such as `start_date`, `end_date`, `duration`, or `state`. The keyset 
predicate and the generated `ORDER BY` disagreed on where NULLs sort, so once a 
page boundary fell on the NULL/non-NULL edge, every row on one side of it was 
skipped, with no error.
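
The failure mode can be reproduced in miniature. The following is a minimal sketch, not Airflow's actual query: the `ti` table, the `make_demo_db` and `naive_keyset_ids` helpers, and the sample rows are all hypothetical, and it uses SQLite's default ordering (NULLs first when ascending). Once the last row of a page has a NULL `start_date`, every comparison against the cursor value evaluates to NULL and the remaining rows are never returned.

```python
import sqlite3


def make_demo_db():
    """Hypothetical in-memory table; rows 3 and 4 have not started yet."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ti (id INTEGER PRIMARY KEY, start_date TEXT)")
    conn.executemany(
        "INSERT INTO ti (id, start_date) VALUES (?, ?)",
        [(1, "2024-01-01"), (2, "2024-01-02"), (3, None), (4, None), (5, "2024-01-03")],
    )
    return conn


def naive_keyset_ids(conn, page_size=2):
    """Walk all pages with a plain (start_date, id) keyset predicate."""
    seen, cursor = [], None
    while True:
        if cursor is None:
            rows = conn.execute(
                "SELECT id, start_date FROM ti ORDER BY start_date, id LIMIT ?",
                (page_size,),
            ).fetchall()
        else:
            last_date, last_id = cursor
            # With last_date = NULL, both comparisons below evaluate to NULL,
            # so the WHERE clause matches nothing.
            rows = conn.execute(
                "SELECT id, start_date FROM ti "
                "WHERE start_date > ? OR (start_date = ? AND id > ?) "
                "ORDER BY start_date, id LIMIT ?",
                (last_date, last_date, last_id, page_size),
            ).fetchall()
        if not rows:
            return seen
        seen.extend(row[0] for row in rows)
        cursor = (rows[-1][1], rows[-1][0])


ids = naive_keyset_ids(make_demo_db())
print(ids)  # only the two NULL rows come back; ids 1, 2 and 5 are lost
```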
   
   This is reachable from the shipping web UI: the Task Instances and Dag Runs 
lists paginate by cursor and let you sort by clicking a column header, so 
sorting by **Start Date** while some queued (not yet started) task instances 
are present makes rows silently disappear from the grid.
   
   ### Fix
   
   `NULLS FIRST/LAST` is not portable (unsupported on MySQL and older SQLite). 
The cursor path therefore pins NULL placement with a portable `CASE`-based 
null-rank key that both the keyset `ORDER BY` and the keyset predicate share, 
so the two can no longer disagree on any backend. The rank follows the column's 
sort direction: NULLs sort as the largest value (last when ascending, first 
when descending), matching PostgreSQL's default. PostgreSQL result ordering is 
therefore unchanged; SQLite and MySQL NULL ordering shifts to align with 
PostgreSQL.
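
The idea can be sketched with the standard-library `sqlite3` module. This is an illustration under stated assumptions rather than the code in this PR: the `ti` table, the `NULL_RANK` expression, and the `fetch_page` / `paginate_all` helpers are hypothetical names, and the real implementation builds its expressions through SQLAlchemy. The point shown is that the same rank expression appears in both the `ORDER BY` and the predicate.

```python
import sqlite3

# Hypothetical rank expression: NULL counts as the largest value.
NULL_RANK = "CASE WHEN start_date IS NULL THEN 1 ELSE 0 END"


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ti (id INTEGER PRIMARY KEY, start_date TEXT)")
    conn.executemany(
        "INSERT INTO ti (id, start_date) VALUES (?, ?)",
        [(1, "2024-01-01"), (2, "2024-01-02"), (3, None), (4, None), (5, "2024-01-03")],
    )
    return conn


def fetch_page(conn, cursor=None, page_size=2, descending=False):
    """Fetch one page ordered by (null rank, start_date, id)."""
    op, direction = ("<", "DESC") if descending else (">", "ASC")
    order_by = (
        f"ORDER BY ({NULL_RANK}) {direction}, start_date {direction}, id {direction}"
    )
    if cursor is None:
        sql = f"SELECT id, start_date FROM ti {order_by} LIMIT :limit"
        params = {"limit": page_size}
    else:
        value, last_id = cursor
        sql = (
            "SELECT id, start_date FROM ti WHERE "
            f"({NULL_RANK}) {op} :rank "
            f"OR (({NULL_RANK}) = :rank AND start_date {op} :value) "
            # Equal ranks of 1 mean both sides are NULL, so they count as ties.
            f"OR (({NULL_RANK}) = :rank AND (:rank = 1 OR start_date = :value) "
            f"AND id {op} :last_id) "
            f"{order_by} LIMIT :limit"
        )
        params = {
            "rank": 1 if value is None else 0,  # derived from the cursor value
            "value": value,
            "last_id": last_id,
            "limit": page_size,
        }
    return conn.execute(sql, params).fetchall()


def paginate_all(conn, page_size=2, descending=False):
    """Follow cursors until a page comes back empty; return ids in order."""
    seen, cursor = [], None
    while True:
        rows = fetch_page(conn, cursor, page_size, descending)
        if not rows:
            return seen
        seen.extend(row[0] for row in rows)
        cursor = (rows[-1][1], rows[-1][0])


print(paginate_all(make_demo_db()))                   # NULL rows last
print(paginate_all(make_demo_db(), descending=True))  # NULL rows first
```

In this sketch every row is visited exactly once in either direction, including when a page boundary falls between a non-NULL and a NULL row.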
   
   - Cursor token format is unchanged (the rank is derived from the decoded 
value, not encoded), so existing cursors keep working.
   - Offset pagination is untouched.
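
Why the token can stay unchanged can be shown with a small sketch. The base64-encoded JSON token below is a stand-in chosen for illustration and is not claimed to be Airflow's real cursor encoding; `encode_cursor`, `decode_cursor`, and `with_null_ranks` are hypothetical helpers. The rank is computed after decoding, so nothing extra has to be stored in the token.

```python
import base64
import json


def encode_cursor(values):
    """Hypothetical token: the sort-key values of the last row, nothing else."""
    return base64.urlsafe_b64encode(json.dumps(values).encode()).decode()


def decode_cursor(token):
    return json.loads(base64.urlsafe_b64decode(token.encode()))


def with_null_ranks(values, nullable_positions):
    """Expand decoded values so each nullable one is preceded by its rank."""
    expanded = []
    for position, value in enumerate(values):
        if position in nullable_positions:
            expanded.append(1 if value is None else 0)  # NULL is the largest
        expanded.append(value)
    return expanded


token = encode_cursor([None, 4])      # same shape as before the fix
decoded = decode_cursor(token)        # [None, 4]
print(with_null_ranks(decoded, {0}))  # rank inserted at comparison time
```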
   
   The `CASE` in `ORDER BY` means the sort on a nullable column cannot use a 
plain column index. This is the cost of cross-backend portability; a 
PostgreSQL-only `NULLS LAST` fast path could be a future optimization.
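
The indexing cost can be observed on SQLite with `EXPLAIN QUERY PLAN`. This is only a rough sketch: the `ti` table, the index name, and the `query_plan` / `needs_sort` helpers are hypothetical, the exact plan wording varies between SQLite versions, and other backends plan differently. On the SQLite builds this was written against, the plain sort reads rows in index order while the `CASE` sort adds a separate sorting step.

```python
import sqlite3

NULL_RANK = "CASE WHEN start_date IS NULL THEN 1 ELSE 0 END"


def make_indexed_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ti (id INTEGER PRIMARY KEY, start_date TEXT)")
    conn.execute("CREATE INDEX idx_ti_start_date ON ti (start_date)")
    return conn


def query_plan(conn, order_by):
    """Return the human-readable detail strings of the query plan."""
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT id FROM ti ORDER BY {order_by}"
    ).fetchall()
    return [row[-1] for row in rows]  # the last column holds the detail text


def needs_sort(plan):
    """True if SQLite reports a temporary B-tree to satisfy the ORDER BY."""
    return any("TEMP B-TREE" in detail for detail in plan)


conn = make_indexed_db()
plain_plan = query_plan(conn, "start_date")
ranked_plan = query_plan(conn, f"{NULL_RANK}, start_date")
print(plain_plan)
print(ranked_plan)
```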
   
   ### Tests
   
   - Fail-first endpoint regression tests (`taskInstances` and `dagRuns`): 
paginating by a nullable column now returns every row; the tests fail without 
the fix.
   - Forward/backward cursor consistency over a nullable column.
   - Unit coverage for the keyset expansion (single and multiple nullable 
columns, no-NULLs case, rank derivation, nullability detection).
   
   closes: #68858
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes (Claude Code, Opus 4.8)
   
   Generated-by: Claude Code (Opus 4.8) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   

