Re: [PR] [GLUTEN-10892][VL] Use `veloxPreferredBatchBytes` to control the max size of memory of batches combined [incubator-gluten]

via GitHub Mon, 01 Dec 2025 04:04:24 -0800


jinchengchenghh commented on code in PR #11140:
URL: https://github.com/apache/incubator-gluten/pull/11140#discussion_r2576794160



##########
backends-velox/src/test/scala/org/apache/gluten/execution/VeloxTPCHSuite.scala:
##########
@@ -73,6 +73,7 @@ abstract class VeloxTPCHSuite extends VeloxTPCHTableSupport {
       // for unexpected blank
       .replaceAll("Scan parquet ", "Scan parquet")
       // Spark QueryStageExec will take its id as argument, replace it with X
+      .replaceAll("Arguments: [0-9]+, [0-9]+, [0-9]+", "Arguments: X, X")

Review Comment:
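   This should be `Arguments: X, X, X`. The pattern matches three numbers, so the replacement needs three placeholders.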
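The test itself is Scala (`String.replaceAll`, Java regex syntax). Purely as an illustration, the same substitution written with C++ `std::regex_replace` shows why three placeholders are needed. `normalizeStageArguments` is a hypothetical helper name, not part of the suite.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring the test's normalization: replace the three
// numeric QueryStageExec arguments with placeholders so plans compare stably.
std::string normalizeStageArguments(const std::string& plan) {
  static const std::regex kArgs("Arguments: [0-9]+, [0-9]+, [0-9]+");
  return std::regex_replace(plan, kArgs, "Arguments: X, X, X");
}
```

With the diff's replacement string `Arguments: X, X`, the three matched numbers would collapse into two placeholders, so the normalized plan would misstate the argument count.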



##########
cpp/velox/utils/VeloxBatchResizer.cc:
##########
@@ -78,22 +80,41 @@ std::shared_ptr<ColumnarBatch> VeloxBatchResizer::next() {
   if (cb->numRows() < minOutputBatchSize_) {
     auto vb = VeloxColumnarBatch::from(pool_, cb);
     auto rv = vb->getRowVector();
+    auto vector = std::static_pointer_cast<facebook::velox::BaseVector>(rv);
+    uint64_t numBytes = cb->numBytes();
+    if (numBytes > preferredBatchBytes_) {
+      // Input batch is too large. Just return it as is.
+      return cb;
+    }
     auto buffer = facebook::velox::RowVector::createEmpty(rv->type(), pool_);
     buffer->append(rv.get());
 
-    for (auto nextCb = in_->next(); nextCb != nullptr; nextCb = in_->next()) {
-      auto nextVb = VeloxColumnarBatch::from(pool_, nextCb);
-      auto nextRv = nextVb->getRowVector();
-      if (buffer->size() + nextRv->size() > maxOutputBatchSize_) {
+    // Call reset manually to potentially release memory
+    vector.reset();

Review Comment:
   Also remove the following `rv` and batch `reset()` calls.
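The resizer change can be modeled without Velox as a minimal sketch, assuming a batch is characterized only by its row count and byte size. `BatchInfo`, `BatchCombiner`, and the peek-based carry-over are hypothetical. The hunk only shows the start of the merge loop, so applying the byte limit inside the loop is an assumption based on the PR title ("control the max size of memory of batches combined").

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>

// Hypothetical stand-in for a columnar batch: only its sizes matter here.
struct BatchInfo {
  std::size_t numRows;
  std::uint64_t numBytes;
};

// Simplified model of a resizer that merges small batches while keeping the
// merged result within both a row limit and a byte limit. All names and the
// peek-based carry-over are illustrative assumptions, not Gluten's code.
class BatchCombiner {
 public:
  BatchCombiner(std::deque<BatchInfo> input, std::size_t minRows,
                std::size_t maxRows, std::uint64_t preferredBytes)
      : in_(std::move(input)),
        minRows_(minRows),
        maxRows_(maxRows),
        preferredBytes_(preferredBytes) {
    assert(minRows_ <= maxRows_);
  }

  // Returns the next output batch, or std::nullopt once the input is drained.
  std::optional<BatchInfo> next() {
    if (in_.empty()) {
      return std::nullopt;
    }
    BatchInfo out = in_.front();
    in_.pop_front();
    // Batches with enough rows pass through; so do small batches that are
    // already over the byte budget ("too large, return it as is").
    if (out.numRows >= minRows_ || out.numBytes > preferredBytes_) {
      return out;
    }
    // Absorb following batches until either limit would be exceeded.
    while (!in_.empty()) {
      const BatchInfo& candidate = in_.front();
      if (out.numRows + candidate.numRows > maxRows_ ||
          out.numBytes + candidate.numBytes > preferredBytes_) {
        break;  // leave the candidate for the next call
      }
      out.numRows += candidate.numRows;
      out.numBytes += candidate.numBytes;
      in_.pop_front();
    }
    return out;
  }

 private:
  std::deque<BatchInfo> in_;
  std::size_t minRows_;
  std::size_t maxRows_;
  std::uint64_t preferredBytes_;
};
```

A real upstream iterator cannot peek, so an implementation has to hold the rejected batch until the next call; the deque here sidesteps that for brevity.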


