github-actions[bot] commented on code in PR #61535:
URL: https://github.com/apache/doris/pull/61535#discussion_r3025519977


##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -913,6 +955,24 @@ Status 
VCollectIterator::Level1Iterator::_merge_next(Block* block) {
             continuous_row_in_block = 0;
             pre_row_ref = cur_row;

Review Comment:
   `Level1Iterator::_merge_next()` now honors `preferred_block_size_bytes`, but 
`VCollectIterator::next(Block*)` can also route ordered storage reads through 
`_topn_next()` when `read_orderby_key_limit > 0` (for `DUP_KEYS` and 
`UNIQUE_KEYS` with MOW). That path still accumulates up to `_topn_limit` rows 
into one `MutableBlock` and returns it directly at the end, without consulting 
the new byte budget.
   
   A concrete case is `ORDER BY key LIMIT 100000` on wide rows: 
`rs_reader->next_batch(block)` may now produce smaller chunks, but 
`_topn_next()` keeps copying them into `mutable_block` until it has 
`_topn_limit` candidates, so the final returned block can still be far above 
`preferred_block_size_bytes`. This means the PR's new invariant does not hold 
on this parallel storage path yet.
   
   Please either add the same byte-budget enforcement here (and return partial 
TopN results across multiple calls), or explicitly scope the feature away from 
the ordered TopN path and add tests/documentation for that exception.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to