manipatnam opened a new pull request, #67531:
URL: https://github.com/apache/airflow/pull/67531
## Problem
Sorting mapped task instances by `rendered_map_index` produced incorrect
**lexicographic order** instead of numeric order:
```
Got: "0", "1", "10", "100", "101", "11", "2", "3" ...
Wanted: 0, 1, 2, 3, ..., 10, 11, ..., 100, 101
```
Additionally, any request using cursor pagination with
`order_by=rendered_map_index` returned a **500 error**:
```
NotImplementedError: Cursor pagination does not support column-form
``to_replace`` mapping for ``rendered_map_index``.
```
## Fix
Map `rendered_map_index` to the integer `map_index` column via a **string
alias** in `SortParam.to_replace`:
```python
to_replace={
...
"rendered_map_index": "map_index", # routes ORDER BY through integer
column
}
```
A string alias goes through `getattr(row, "map_index")` in `row_value`, so
cursor pagination works correctly without any changes to `cursors.py`.
Also adds a `secondary_sort` parameter to `SortParam` as generic
infrastructure for appending tiebreaker columns to `ORDER BY`.
## ⚠️ Known side effect — input needed
For tasks with **custom labels** (set via `map_index_template`), sorting by
`rendered_map_index` now sorts by the underlying **integer creation index**
rather than alphabetically by label.
Example:
```python
@task(map_index_template="{{ user }}")
def process(user: str): ...
process.expand(user=["zebra", "apple", "mango", "cherry", "banana"])
```
| map_index | label | Sort by `rendered_map_index` (before) | Sort by
`rendered_map_index` (after) |
|---|---|---|---|
| 0 | zebra | apple | zebra |
| 1 | apple | banana | apple |
| 2 | mango | cherry | mango |
| 3 | cherry | mango | cherry |
| 4 | banana | zebra | banana |
**Before**: alphabetical by label ✓ but broken for numeric + 500 on cursor
pagination
**After**: integer creation order — same as `map_index` sort
## Alternatives considered
The correct solution for both cases simultaneously (numeric → integer order,
labeled → alphabetical) requires one of:
1. **Zero-padded string expression** — `CASE WHEN _rendered_map_index IS NOT
NULL THEN _rendered_map_index ELSE printf('%010d', map_index) END` — medium
complexity, requires DB-specific `printf`/`lpad` + a Python callable in
`row_value` for cursor encoding
2. **Multi-column cursor expansion** — expand `rendered_map_index` into
`(_rendered_map_index, map_index)` in `_resolve()`, with NULL-aware keyset
predicates in `cursors.py` — high complexity, touches correctness-critical
pagination code
3. **Pre-computed sort column** — store a ready-to-sort string in a new DB
column at TI creation time — cleanest long-term, but requires a migration
This PR takes the minimal safe approach and raises the question: **what
should `rendered_map_index` sort semantics be for labeled instances?** If
alphabetical label sort is important, option 1 or 3 above would be the next
step.
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Claude Sonnet 4.6
Generated-by: Claude Sonnet 4.6 following [the
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]