jinchengchenghh commented on code in PR #11140:
URL:
https://github.com/apache/incubator-gluten/pull/11140#discussion_r2568000854
##########
backends-velox/src/main/scala/org/apache/gluten/extension/AppendBatchResizeForShuffleInputAndOutput.scala:
##########
@@ -40,19 +40,20 @@ case class AppendBatchResizeForShuffleInputAndOutput()
extends Rule[SparkPlan] {
}
val range = VeloxConfig.get.veloxResizeBatchesShuffleInputOutputRange
+ val memoryThreshold = VeloxConfig.get.veloxPreferredBatchBytes
plan.transformUp {
case shuffle: ColumnarShuffleExchangeExec
if resizeBatchesShuffleInputEnabled &&
shuffle.shuffleWriterType.requiresResizingShuffleInput =>
val appendBatches =
- VeloxResizeBatchesExec(shuffle.child, range.min, range.max)
+ VeloxResizeBatchesExec(shuffle.child, range.min, range.max,
memoryThreshold)
Review Comment:
Please use the original config name preferredBatchBytes instead of a new name
##########
cpp/velox/utils/VeloxBatchResizer.cc:
##########
@@ -78,22 +80,43 @@ std::shared_ptr<ColumnarBatch> VeloxBatchResizer::next() {
if (cb->numRows() < minOutputBatchSize_) {
Review Comment:
Please use cb->numBytes() to estimate the batch size, and update the
numBytes(), get the numBytes does not need to flat the vector, just load the
vector
##########
backends-velox/src/test/resources/tpch-approved-plan/v1-bhj-ras/spark32/1.txt:
##########
@@ -60,7 +60,7 @@ Arguments: false
(7) VeloxResizeBatches
Input [18]: [hash_partition_key#X, l_returnflag#X, l_linestatus#X, sum#X,
isEmpty#X, sum#X, isEmpty#X, sum#X, isEmpty#X, sum#X, isEmpty#X, sum#X,
count#X, sum#X, count#X, sum#X, count#X, count#X]
-Arguments: X, X
+Arguments: X, X, 10485760
Review Comment:
Please change it to X, we don't need to update it if we change the default
value in the future
##########
backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala:
##########
@@ -680,6 +682,15 @@ object VeloxConfig extends ConfigRegistry {
.bytesConf(ByteUnit.BYTE)
.createWithDefaultString("10MB")
+ val ENABLE_LIMIT_BATCH_RESIZER = {
Review Comment:
This can be a default behavior, please remove this config, all the operator
should respect `preferredBatchBytes`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]