zclllyybb commented on issue #64172:
URL: https://github.com/apache/doris/issues/64172#issuecomment-4640858451

   Breakwater-GitHub-Analysis-Slot: slot_84437a4c8059
   
   Initial triage: this looks like a valid correctness bug in the partition 
top-n path for `rank()` / `dense_rank()`, not a data-layout issue.
   
   I checked current upstream `master` 
(`c27fd0ba968daffe8329186205315f8b1cd147b7`) and the reported control flow is 
still present:
   
   - `CreatePartitionTopNFromWindow` can push filters on `row_number()`, 
`rank()`, and `dense_rank()` into `LogicalPartitionTopN`.
   - `PartitionSortNode` maps `rank()` to `TopNAlgorithm::RANK` and 
`dense_rank()` to `TopNAlgorithm::DENSE_RANK`.
   - `PartitionSorter::_read_row_rank()` uses `_get_enough_data()` in two 
different meanings: "the rank/top-n limit has been reached" and "this sorter is 
fully exhausted".
   - For tied rows, those meanings diverge. With `WHERE rk = 1`, 
`_get_enough_data()` can become true after the first emitted row of the first 
rank group, while the same rank group still has rows left in the queue.
   - If the current output block reaches `batch_size` before the tie group is 
fully drained, the deferred EOS assignment sets `*eos = true`. 
`PartitionSortSourceOperatorX::get_sorted_block()` then advances `_sort_idx`, 
so the remaining rows in the current sorter are skipped permanently.
   
   This also explains why the row count follows `min(batch_size, 
tied_group_size)` with one pipeline instance, and why disabling 
`enable_partition_topn` avoids the wrong result by routing through the normal 
analytic path. `row_number()` is not affected in the same way because stopping 
after the configured number of rows is its intended semantics.
   
   The proposed fix direction is reasonable: `_read_row_rank()` needs a 
separate "finished" state, and EOS should only be reported when the input queue 
is exhausted or when the code has crossed into the next distinct rank group 
after the rank/top-n limit has already been satisfied. Reaching 
`_get_enough_data()` alone should not mark the current sorter as drained.
   
   Recommended next steps:
   
   - Treat this as a silent wrong-result bug with high priority.
   - Add coverage for both `dense_rank()` and `rank()` where the first tied 
rank group is larger than `batch_size`.
   - Include a control case with `enable_partition_topn = false` and at least 
one case with `batch_size` smaller than the tied group.
   - Please also confirm the affected release branches when preparing 
backports; the inspected upstream path matches the report, but I did not run 
the SQL repro locally in this triage.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to