924060929 opened a new pull request, #64306:
URL: https://github.com/apache/doris/pull/64306

   ### What problem does this PR solve?
   
   Problem Summary:
   
   `MergeTopNs` (the `MERGE_TOP_N` rewrite rule) merges a parent `TopN` into 
its child `TopN`
   when their order keys are compatible. When the parent `TopN` carries a 
non-zero `OFFSET`, the
   merged limit was computed as `min(parent.limit, child.limit)`, which ignores 
that the parent
   offset consumes rows from the child's output. The merged `TopN` therefore 
keeps too many rows
   and the query returns a wrong result.
   
   Example:
   
   ```sql
   SELECT * FROM (SELECT k, v FROM t ORDER BY k LIMIT 5) s ORDER BY k LIMIT 3 
OFFSET 4;
   ```
   
   The inner `ORDER BY k LIMIT 5` yields 5 rows; the outer `LIMIT 3 OFFSET 4` 
skips 4 of them, so
   only 1 row should remain. Before this PR the rule merged the two `TopN` into 
`OFFSET 4 LIMIT 3`
   (instead of `OFFSET 4 LIMIT 1`), so it returned 3 rows.
   
   Fix: clamp the merged limit by `max(child.limit - parent.offset, 0)`, the 
same semantics
   already used by `MergeLimits.mergeLimit` for consecutive limits. The bug 
only triggers when the
   outer `TopN` has a non-zero offset (offset = 0 makes both formulas equal).
   
   The existing unit test `MergeTopNsTest.testOffset` asserted the buggy value 
(`limit == 10`,
   while the correct value is `9`); this PR corrects that assertion as well.
   
   ### Release note
   
   Fix the wrong result produced by the `MERGE_TOP_N` optimization when an outer
   `ORDER BY ... LIMIT` carries a non-zero `OFFSET` over an inner `ORDER BY ... 
LIMIT`.
   
   ### Check List (For Author)
   
   - Test
       - [x] Regression test
       - [x] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
   
   - Behavior changed:
       - [ ] No.
       - [x] Yes. A stacked `TopN` whose outer `TopN` has a non-zero offset now 
returns the correct (fewer) rows instead of extra rows.
   
   - Does this need documentation?
       - [x] No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to