viirya commented on code in PR #213:
URL: https://github.com/apache/arrow-datafusion-comet/pull/213#discussion_r1548991785
##########
spark/src/main/scala/org/apache/spark/sql/comet/CometBroadcastExchangeExec.scala:
##########
@@ -191,7 +193,7 @@ case class CometBroadcastExchangeExec(originalPlan: SparkPlan, child: SparkPlan)

   override protected def doExecuteColumnar(): RDD[ColumnarBatch] = {
     val broadcasted = executeBroadcast[Array[ChunkedByteBuffer]]()
-    new CometBatchRDD(sparkContext, broadcasted.value.length, broadcasted)
+    new CometBatchRDD(sparkContext, childRDD.getNumPartitions, broadcasted)

Review Comment:
   The broadcast RDD must have the same number of partitions as the child RDD. Previously, all batches in a partition were serialized into a single `ChunkedByteBuffer`, so `broadcasted.value.length` was equal to the number of partitions. Now each batch is serialized into its own `ChunkedByteBuffer`, so we need to take the partition count from the child RDD instead.
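   To make the counting argument concrete, here is a minimal sketch in plain Scala (no Spark dependencies). The names `Batch`, `serializeAll`, and `serializeEach` are hypothetical stand-ins for the old and new serialization schemes, not the actual Comet code:

   ```scala
   // Hypothetical sketch: why the buffer count stops matching the
   // partition count once batches are serialized one-per-buffer.
   object PartitionCountSketch {
     type Batch = Array[Byte] // stand-in for a ColumnarBatch

     // Two partitions, three batches total.
     val partitions: Seq[Seq[Batch]] =
       Seq(Seq(Array[Byte](1), Array[Byte](2)), Seq(Array[Byte](3)))

     // Old scheme: all batches in a partition go into one buffer,
     // so buffers.length == number of partitions.
     def serializeAll(parts: Seq[Seq[Batch]]): Seq[Array[Byte]] =
       parts.map(_.flatten.toArray)

     // New scheme: each batch gets its own buffer, so
     // buffers.length == total number of batches, not partitions.
     def serializeEach(parts: Seq[Seq[Batch]]): Seq[Array[Byte]] =
       parts.flatten

     def main(args: Array[String]): Unit = {
       val numPartitions = partitions.length               // 2
       assert(serializeAll(partitions).length == numPartitions)
       assert(serializeEach(partitions).length == 3)       // 3 != 2
       println(s"partitions=$numPartitions, " +
         s"perBatchBuffers=${serializeEach(partitions).length}")
     }
   }
   ```

   Under the new scheme `broadcasted.value.length` counts batches rather than partitions, which is why the RDD constructor must be given `childRDD.getNumPartitions` explicitly.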