viirya commented on code in PR #213:
URL: https://github.com/apache/arrow-datafusion-comet/pull/213#discussion_r1548991785
##########
spark/src/main/scala/org/apache/spark/sql/comet/CometBroadcastExchangeExec.scala:
##########
@@ -191,7 +193,7 @@ case class CometBroadcastExchangeExec(originalPlan: SparkPlan, child: SparkPlan)

   override protected def doExecuteColumnar(): RDD[ColumnarBatch] = {
     val broadcasted = executeBroadcast[Array[ChunkedByteBuffer]]()
-    new CometBatchRDD(sparkContext, broadcasted.value.length, broadcasted)
+    new CometBatchRDD(sparkContext, childRDD.getNumPartitions, broadcasted)

Review Comment:
   The broadcast RDD must have the same number of partitions as the child RDD. Previously, all batches in a partition were serialized into a single `ChunkedByteBuffer`, so `broadcasted.value.length` was equal to the number of partitions. Now each batch is serialized into its own `ChunkedByteBuffer`, so we need to take the partition count from the child RDD instead.
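   To make the counting argument concrete, here is a minimal sketch in plain Scala (no Spark dependencies). The names `Batch`, `serializeAll`, and `serializeEach` are hypothetical stand-ins for the old and new serialization schemes, not the actual Comet code:

   ```scala
   // Hypothetical sketch: why the buffer count stops matching the
   // partition count once batches are serialized one-per-buffer.
   object PartitionCountSketch {
     type Batch = Array[Byte] // stand-in for a ColumnarBatch

     // Two partitions, three batches total.
     val partitions: Seq[Seq[Batch]] =
       Seq(Seq(Array[Byte](1), Array[Byte](2)), Seq(Array[Byte](3)))

     // Old scheme: all batches in a partition go into one buffer,
     // so buffers.length == number of partitions.
     def serializeAll(parts: Seq[Seq[Batch]]): Seq[Array[Byte]] =
       parts.map(_.flatten.toArray)

     // New scheme: each batch gets its own buffer, so
     // buffers.length == total number of batches, not partitions.
     def serializeEach(parts: Seq[Seq[Batch]]): Seq[Array[Byte]] =
       parts.flatten

     def main(args: Array[String]): Unit = {
       val numPartitions = partitions.length               // 2
       assert(serializeAll(partitions).length == numPartitions)
       assert(serializeEach(partitions).length == 3)       // 3 != 2
       println(s"partitions=$numPartitions, " +
         s"perBatchBuffers=${serializeEach(partitions).length}")
     }
   }
   ```

   Under the new scheme `broadcasted.value.length` counts batches rather than partitions, which is why the RDD constructor must be given `childRDD.getNumPartitions` explicitly.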