Re: [PR] [feat](storage) Implement adaptive batch size for SegmentIterator [doris]

via GitHub Wed, 01 Apr 2026 19:28:07 -0700


mrhhsg commented on code in PR #61535:
URL: https://github.com/apache/doris/pull/61535#discussion_r3025552834



##########
be/src/storage/iterator/vcollect_iterator.cpp:
##########
@@ -913,6 +955,24 @@ Status VCollectIterator::Level1Iterator::_merge_next(Block* block) {
             continuous_row_in_block = 0;
             pre_row_ref = cur_row;

Review Comment:
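      1. The output row count is already hard-capped: `_topn_next` ultimately emits only `_topn_limit` rows (usually small, e.g. LIMIT 20), and `sorted_row_pos` is always trimmed to `_topn_limit` entries (lines 440-445). That count is inherently far below any batch upper bound.
      2. The semantics differ: `_merge_next` is a streaming merge iterator that the upper layer calls repeatedly to produce blocks, so the number of rows produced per call directly affects memory usage downstream in the pipeline. `_topn_next`, by contrast, is a one-shot call: it traverses all rowsets, maintains a top-N heap, and emits the result once at the end, so there is no point at which the output could be split midway.
      3. Memory is already protected: line 448 has shrink logic (it compacts when `mutable_block.rows() > _topn_limit * 2`), so the intermediate `mutable_block` row count is also bounded at roughly 2 * `_topn_limit`.
      4. The call chain differs: `VCollectIterator::next(Block*)` takes the `_topn_next` path directly when `_topn_filter_is_set` is true, and the caller expects the complete top-N result. Forcibly truncating it would break semantic correctness.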
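
To illustrate points 1 and 3, here is a minimal, self-contained sketch of the "buffer up to about 2 * limit rows, then shrink" pattern. It is not the Doris code: `collect_topn`, `TopNResult`, and the use of plain `int` keys are hypothetical stand-ins for `_topn_next`, `mutable_block`, and real rows, and it assumes ascending order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a one-shot top-N collector (NOT the Doris code).
struct TopNResult {
    std::vector<int> rows;          // final output, ascending, size <= limit
    std::size_t peak_buffered = 0;  // largest number of rows held at once
};

// Consumes every input batch, keeps only the `limit` smallest keys, and
// returns them in a single result. The candidate buffer is compacted as soon
// as it exceeds 2 * limit rows, so it never holds more than 2 * limit + 1.
inline TopNResult collect_topn(const std::vector<std::vector<int>>& batches,
                               std::size_t limit) {
    TopNResult result;
    if (limit == 0) {
        return result;
    }
    std::vector<int> buffer;

    // Keep the `limit` smallest keys (sorted) and drop the rest.
    auto shrink = [&buffer, limit]() {
        if (buffer.size() > limit) {
            auto middle = buffer.begin() + static_cast<std::ptrdiff_t>(limit);
            std::partial_sort(buffer.begin(), middle, buffer.end());
            buffer.resize(limit);
        } else {
            std::sort(buffer.begin(), buffer.end());
        }
    };

    for (const auto& batch : batches) {
        for (int key : batch) {
            buffer.push_back(key);
            result.peak_buffered = std::max(result.peak_buffered, buffer.size());
            if (buffer.size() > limit * 2) {
                shrink();  // analogous in spirit to the shrink check in point 3
            }
        }
    }

    shrink();  // final trim: the output never exceeds `limit` rows
    assert(buffer.size() <= limit);
    result.rows = std::move(buffer);
    return result;
}
```

In this sketch both the output size and the intermediate buffer are bounded by the limit alone, independent of how many input rows are scanned, which is why an additional batch-size cap on this path would add nothing.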



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


