andygrove commented on PR #1298:
URL: https://github.com/apache/datafusion-comet/pull/1298#issuecomment-2598503895

   @viirya, I wonder if you could help me and @parthchandra understand why the 
test in this PR is failing. 
   
   The error is:
   
   ```
   Can't zip RDDs with unequal numbers of partitions: ArrayBuffer(10, 0)
   java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: ArrayBuffer(10, 0)
        at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:58)
   ```
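
For anyone following along, the check that raises this error can be modeled in a few lines. This is only a sketch of the precondition as the message describes it (every zipped input must have the same number of partitions), not Spark's actual Scala code; `check_zip_partitions` is a hypothetical name used for illustration.

```python
from typing import Sequence


def check_zip_partitions(partition_counts: Sequence[int]) -> int:
    """Model of the zip precondition: all inputs must agree on partition count.

    Hypothetical helper for illustration; the real check lives in Spark's
    ZippedPartitionsBaseRDD.getPartitions.
    """
    if not partition_counts:
        raise ValueError("need at least one input")
    expected = partition_counts[0]
    if any(n != expected for n in partition_counts):
        raise ValueError(
            "Can't zip RDDs with unequal numbers of partitions: "
            f"{list(partition_counts)}"
        )
    return expected


# Matching counts pass; the (10, 0) pair from the error above does not.
print(check_zip_partitions([26, 26]))
try:
    check_zip_partitions([10, 0])
except ValueError as err:
    print(err)
```

Under this model, the failure only says that the two zipped sides disagree (10 versus 0); it does not say which side has the wrong count.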
   
   The debug logging below shows that `CometUnionExec` always reports its 
partitioning as `UnknownPartitioning(0)`, which appears to be at least part of 
the problem.
   
   Debug logging:
   
   ```
   ------------------
   wrapped UnknownPartitioning(0)
   CometScanExec num Parts = 5
   class org.apache.spark.sql.comet.CometScanExec has UnknownPartitioning(5)
   ------------------
   containsBroadcastInput = false
   firstNonBroadcastPlanNumPartitions (CometScan parquet spark_catalog.default.dim_store) = 5
   Setting broadcast partitions to 5
   ------------------
   wrapped UnknownPartitioning(0)
   CometScanExec num Parts = 26
   class org.apache.spark.sql.comet.CometUnionExec has UnknownPartitioning(0); first child has UnknownPartitioning(26)
   class org.apache.spark.sql.execution.adaptive.BroadcastQueryStageExec has UnknownPartitioning(0)
   ------------------
   containsBroadcastInput = true
   firstNonBroadcastPlanNumPartitions (CometUnion) = 0
   Setting broadcast partitions to 0
   ------------------
   class org.apache.spark.sql.comet.CometScanExec has UnknownPartitioning(26)
   ------------------
   containsBroadcastInput = false
   firstNonBroadcastPlanNumPartitions (CometScan parquet spark_catalog.default.fact_sk) = 26
   Setting broadcast partitions to 26
   ------------------
   wrapped UnknownPartitioning(0)
   CometScanExec num Parts = 26
   class org.apache.spark.sql.comet.CometScanExec has UnknownPartitioning(26)
   ------------------
   containsBroadcastInput = false
   firstNonBroadcastPlanNumPartitions (CometScan parquet spark_catalog.default.fact_stats) = 26
   Setting broadcast partitions to 26
   setNumPartitions(0)
   ```
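
One way to read the log above: the broadcast side appears to take its partition count from the first non-broadcast input, using whatever count that input reports. The sketch below models only that reading. The `Child` type and `first_non_broadcast_partitions` are hypothetical names, and the logic is inferred from the log lines rather than taken from the Comet source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Child:
    """Hypothetical stand-in for a plan node as it appears in the log."""
    name: str
    is_broadcast: bool
    reported_partitions: int  # the N in UnknownPartitioning(N)


def first_non_broadcast_partitions(children: List[Child]) -> Optional[int]:
    """Return the partition count reported by the first non-broadcast child."""
    for child in children:
        if not child.is_broadcast:
            return child.reported_partitions
    return None  # every input is a broadcast


# Second group in the log: the union reports 0 even though its first
# child has 26 partitions, so the broadcast side is set to 0.
children = [
    Child("CometUnion", is_broadcast=False, reported_partitions=0),
    Child("BroadcastQueryStage", is_broadcast=True, reported_partitions=0),
]
print(first_non_broadcast_partitions(children))
```

If this reading is right, any non-broadcast operator that reports `UnknownPartitioning(0)` would propagate 0 to the broadcast side, which matches the final `setNumPartitions(0)` line.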
   
   If I modify `CometUnionExec` to report the same partitioning as its first 
child, then I see:
   
   ```
   Can't zip RDDs with unequal numbers of partitions: ArrayBuffer(10, 26)
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

