yiguolei commented on code in PR #61535:
URL: https://github.com/apache/doris/pull/61535#discussion_r3007194811


##########
be/src/storage/iterators.h:
##########
@@ -111,6 +111,13 @@ class StorageReadOptions {
     OlapReaderStatistics* stats = nullptr;
     bool use_page_cache = false;
     uint32_t block_row_max = 4096 - 32; // see https://github.com/apache/doris/pull/11816
+    // Adaptive batch size target (bytes). 0 = disabled.
+    size_t preferred_block_size_bytes = 0;
+    // Per-column byte limit for the EWMA predictor. 0 = no column limit.
+    size_t preferred_max_col_bytes = 1048576;
+    // True output columns of the final block. Used by AdaptiveBlockSizePredictor.
+    // Must match the order and count of columns in the block passed to next_batch().
+    std::vector<ColumnId> adaptive_batch_output_columns;

Review Comment:
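   I don't think this has anything to do with whether the columns are the true output columns.
   Our reads are layered, so each layer has its own output, and each layer should control the block size based on its own output.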
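   A minimal C++ sketch of the per-layer idea, assuming each reading layer already has a bytes-per-row estimate for the columns it outputs. `LayerOutput` and `rows_for_layer` are hypothetical names, not Doris APIs. Each layer divides the byte budget by its own row width, so a filtering layer that outputs one narrow predicate column reads many more rows than a layer that also materializes wide columns.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical: estimated bytes per row for the columns that ONE reading layer
// outputs (e.g. only the predicate columns for the filtering layer).
struct LayerOutput {
    std::vector<double> bytes_per_row;
};

// Rows this layer should read so that its own output block stays near the byte
// budget. Each layer passes its own outputs, not the final block's columns.
size_t rows_for_layer(const LayerOutput& layer, size_t preferred_block_size_bytes,
                      size_t block_row_max) {
    double row_bytes = std::accumulate(layer.bytes_per_row.begin(),
                                       layer.bytes_per_row.end(), 0.0);
    if (preferred_block_size_bytes == 0 || row_bytes <= 0.0) {
        return block_row_max;  // 0 = disabled, as in the option's comment
    }
    auto rows = static_cast<size_t>(static_cast<double>(preferred_block_size_bytes) / row_bytes);
    // Never read 0 rows and never exceed the fixed row cap.
    return std::clamp<size_t>(rows, 1, block_row_max);
}
```

   For example, with an 8192-byte budget, a layer that outputs a single 4-byte column would read 2048 rows, while a layer that additionally outputs a column of about 1 KiB per row would read 7.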



##########
be/src/storage/segment/adaptive_block_size_predictor.cpp:
##########
@@ -0,0 +1,138 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "storage/segment/adaptive_block_size_predictor.h"
+
+#include <algorithm>
+#include <cstddef>
+
+#include "core/block/block.h"
+#include "storage/segment/segment.h"
+#include "storage/tablet/tablet_schema.h"
+
+namespace doris::segment_v2 {
+
+AdaptiveBlockSizePredictor::AdaptiveBlockSizePredictor(size_t preferred_block_size_bytes,
+                                                       size_t preferred_max_col_bytes,
+                                                       const Segment& segment,

Review Comment:
   This interface is tightly coupled with the current internal table's logic, so we cannot use it in other cases, for example:
   1. scan next table
   2. stream load
   3. join operator?
   Maybe we should move the segment-specific logic out of this class and make this class a util.
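   One way to read this suggestion, sketched in C++ under the assumption that the predictor only needs observed row counts and per-column byte sizes. The class below (`AdaptiveBatchSizeUtil`, a hypothetical name) has no dependency on `Segment` or `TabletSchema`. It keeps an exponentially weighted moving average (EWMA) of bytes per row for each column and applies both the block-level and per-column byte limits. The smoothing factor `alpha` and its default are assumptions, since the PR's actual EWMA parameters are not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical utility: knows nothing about Segment or TabletSchema. Callers
// report the blocks they produced; the util predicts the next batch's row count.
class AdaptiveBatchSizeUtil {
public:
    AdaptiveBatchSizeUtil(size_t preferred_block_size_bytes, size_t preferred_max_col_bytes,
                          size_t max_rows, double alpha = 0.5)
            : _block_bytes(preferred_block_size_bytes),
              _max_col_bytes(preferred_max_col_bytes),
              _max_rows(max_rows),
              _alpha(alpha) {
        assert(max_rows > 0);
        assert(alpha > 0.0 && alpha <= 1.0);
    }

    // Record one produced batch: its row count and the byte size of each column.
    void update(size_t rows, const std::vector<size_t>& col_bytes) {
        if (rows == 0) return;
        if (_bytes_per_row.size() != col_bytes.size()) {
            // First call or column layout changed: restart the estimates.
            _bytes_per_row.assign(col_bytes.size(), 0.0);
            _initialized = false;
        }
        for (size_t i = 0; i < col_bytes.size(); ++i) {
            double sample = static_cast<double>(col_bytes[i]) / static_cast<double>(rows);
            // EWMA: new = alpha * sample + (1 - alpha) * old; the first sample seeds it.
            _bytes_per_row[i] = _initialized
                                        ? _alpha * sample + (1.0 - _alpha) * _bytes_per_row[i]
                                        : sample;
        }
        _initialized = true;
    }

    // Rows for the next batch, capped by max_rows, the block budget and, if
    // non-zero, the per-column budget. A 0 budget means "disabled", as in the PR.
    size_t predict_rows() const {
        if (!_initialized || _block_bytes == 0) return _max_rows;
        size_t rows = _max_rows;
        double row_bytes = 0.0;
        for (double b : _bytes_per_row) {
            row_bytes += b;
            if (_max_col_bytes != 0 && b > 0.0) {
                rows = std::min(rows, static_cast<size_t>(_max_col_bytes / b));
            }
        }
        if (row_bytes > 0.0) {
            rows = std::min(rows, static_cast<size_t>(_block_bytes / row_bytes));
        }
        return std::max<size_t>(rows, 1);  // always make progress
    }

private:
    size_t _block_bytes;
    size_t _max_col_bytes;
    size_t _max_rows;
    double _alpha;
    std::vector<double> _bytes_per_row;
    bool _initialized = false;
};
```

   Because callers only feed it the sizes of blocks they have already produced, the same object could in principle serve a segment iterator, other scan operators, stream load, or a join operator. Each caller would call `update()` after producing a block and `predict_rows()` before the next read.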



##########
be/src/storage/segment/segment_iterator.cpp:
##########
@@ -369,6 +369,14 @@ Status SegmentIterator::_init_impl(const StorageReadOptions& opts) {
     // Read options will not change, so that just resize here
     _block_rowids.resize(_opts.block_row_max);
 
+    // Adaptive batch size: snapshot the initial row limit and create predictor if enabled.

Review Comment:
   Also, please use AI to find all scan-like operators that may need to use adaptive block size, because we need to pick the scan-like optimizations to branch 4.0 and branch 4.1.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
