andygrove commented on PR #1298: URL: https://github.com/apache/datafusion-comet/pull/1298#issuecomment-2598503895
@viirya, I wonder if you could help me and @parthchandra understand why the test in this PR is failing. The error is:

```
Can't zip RDDs with unequal numbers of partitions: ArrayBuffer(10, 0)
java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: ArrayBuffer(10, 0)
	at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:58)
```

From the debug logging (shown below) we can see that `CometUnionExec` always reports its partitioning as `UnknownPartitioning(0)`, so that at least seems to be part of the issue.

Debug logging:

```
------------------
wrapped UnknownPartitioning(0)
CometScanExec num Parts = 5
class org.apache.spark.sql.comet.CometScanExec has UnknownPartitioning(5)
------------------
containsBroadcastInput = false
firstNonBroadcastPlanNumPartitions (CometScan parquet spark_catalog.default.dim_store) = 5
Setting broadcast partitions to 5
------------------
wrapped UnknownPartitioning(0)
CometScanExec num Parts = 26
class org.apache.spark.sql.comet.CometUnionExec has UnknownPartitioning(0); first child has UnknownPartitioning(26)
class org.apache.spark.sql.execution.adaptive.BroadcastQueryStageExec has UnknownPartitioning(0)
------------------
containsBroadcastInput = true
firstNonBroadcastPlanNumPartitions (CometUnion) = 0
Setting broadcast partitions to 0
------------------
class org.apache.spark.sql.comet.CometScanExec has UnknownPartitioning(26)
------------------
containsBroadcastInput = false
firstNonBroadcastPlanNumPartitions (CometScan parquet spark_catalog.default.fact_sk) = 26
Setting broadcast partitions to 26
------------------
wrapped UnknownPartitioning(0)
CometScanExec num Parts = 26
class org.apache.spark.sql.comet.CometScanExec has UnknownPartitioning(26)
------------------
containsBroadcastInput = false
firstNonBroadcastPlanNumPartitions (CometScan parquet spark_catalog.default.fact_stats) = 26
Setting broadcast partitions to 26
setNumPartitions(0)
```

If I modify `CometUnionExec`
to report the same partitioning as its first child, then I see:

```
Can't zip RDDs with unequal numbers of partitions: ArrayBuffer(10, 26)
```
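
For reference, the check that raises this error can be exercised outside Comet. The following is a minimal sketch, assuming a local Spark session; the object name and the RDD contents are hypothetical and are not taken from the failing test. It zips a 10-partition RDD with an empty RDD (0 partitions), which mirrors the `(10, 0)` case above in plain Spark.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone reproduction; not part of Comet or of the failing test.
object ZipPartitionsMismatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("zip-partitions-mismatch")
      .getOrCreate()
    val sc = spark.sparkContext

    try {
      val left = sc.parallelize(1 to 100, numSlices = 10) // 10 partitions
      val right = sc.emptyRDD[Int]                        // 0 partitions

      // zipPartitions is lazy; the partition counts are only compared
      // when the zipped RDD's partitions are first computed.
      val zipped = left.zipPartitions(right) { (a, b) => a ++ b }

      // Accessing `partitions` triggers the count check in ZippedPartitionsBaseRDD.
      println(zipped.partitions.length)
    } catch {
      case e: IllegalArgumentException =>
        // Expected path: the two inputs have different partition counts.
        println(s"zip failed: ${e.getMessage}")
    } finally {
      spark.stop()
    }
  }
}
```

The sketch only shows that every RDD passed to a zip must report the same number of partitions; it does not model how Comet chooses the partition count for broadcast inputs.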