github-actions[bot] commented on code in PR #61535:
URL: https://github.com/apache/doris/pull/61535#discussion_r3025519977
##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -913,6 +955,24 @@ Status
VCollectIterator::Level1Iterator::_merge_next(Block* block) {
continuous_row_in_block = 0;
pre_row_ref = cur_row;
Review Comment:
`Level1Iterator::_merge_next()` now honors `preferred_block_size_bytes`, but
`VCollectIterator::next(Block*)` can also route ordered storage reads through
`_topn_next()` when `read_orderby_key_limit > 0` (for `DUP_KEYS` and
`UNIQUE_KEYS` with MOW). That path still accumulates up to `_topn_limit` rows
into one `MutableBlock` and returns it directly at the end, without consulting
the new byte budget.
A concrete case is `ORDER BY key LIMIT 100000` on wide rows:
`rs_reader->next_batch(block)` may now produce smaller chunks, but
`_topn_next()` keeps copying them into `mutable_block` until it has
`_topn_limit` candidates, so the final returned block can still be far above
`preferred_block_size_bytes`. This means the PR's new invariant does not hold
on this parallel storage path yet.
Please either add the same byte-budget enforcement here (and return partial
TopN results across multiple calls), or explicitly scope the feature away from
the ordered TopN path and add tests/documentation for that exception.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]