andygrove commented on issue #2408:
URL: https://github.com/apache/datafusion-comet/issues/2408#issuecomment-3307466292
@wForget here you go:
```
py4j.protocol.Py4JJavaError: An error occurred while calling o250.collectToPython.
: org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:358)
    at org.apache.spark.sql.execution.SubqueryExec.executeCollect(basicPhysicalOperators.scala:865)
    at org.apache.spark.sql.execution.ScalarSubquery.updateResult(subquery.scala:83)
    at org.apache.spark.sql.comet.CometNativeExec.$anonfun$prepareSubqueries$3(operators.scala:183)
    at org.apache.spark.sql.comet.CometNativeExec.$anonfun$prepareSubqueries$3$adapted(operators.scala:182)
    at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619)
    at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:935)
    at org.apache.spark.sql.comet.CometNativeExec.prepareSubqueries(operators.scala:182)
    at org.apache.spark.sql.comet.CometNativeExec.doPrepare(operators.scala:165)
    at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:309)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$prepare$1(SparkPlan.scala:305)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$prepare$1$adapted(SparkPlan.scala:305)
    at scala.collection.immutable.Vector.foreach(Vector.scala:2125)
    at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:305)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:258)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:257)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$triggerFuture$1(ShuffleExchangeExec.scala:90)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$4(SQLExecution.scala:322)
    at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:272)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$3(SQLExecution.scala:320)
    at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withThreadLocalCaptured$2(SQLExecution.scala:316)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(17, 17, 23)
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:350)
    ... 27 more
Caused by: java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(17, 17, 23)
    at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:58)
    at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:301)
    at scala.Option.getOrElse(Option.scala:201)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:297)
    at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:317)
    at org.apache.spark.sql.comet.ZippedPartitionsRDD.<init>(ZippedPartitionsRDD.scala:41)
    at org.apache.spark.sql.comet.ZippedPartitionsRDD$.$anonfun$apply$1(ZippedPartitionsRDD.scala:62)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.sql.comet.ZippedPartitionsRDD$.withScope(ZippedPartitionsRDD.scala:66)
    at org.apache.spark.sql.comet.ZippedPartitionsRDD$.apply(ZippedPartitionsRDD.scala:62)
    at org.apache.spark.sql.comet.CometNativeExec.doExecuteColumnar(operators.scala:320)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnarRDD$1(SparkPlan.scala:222)
    at scala.util.Try$.apply(Try.scala:217)
    at org.apache.spark.util.Utils$.doTryWithCallerStacktrace(Utils.scala:1378)
    at org.apache.spark.util.Utils$.getTryWithCallerStacktrace(Utils.scala:1439)
    at org.apache.spark.util.LazyTry.get(LazyTry.scala:58)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnar$1(SparkPlan.scala:236)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:260)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:257)
    at org.apache.spark.sql.execution.SparkPlan.executeColumnar(SparkPlan.scala:232)
    at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.inputRDD$lzycompute(CometShuffleExchangeExec.scala:88)
    at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.inputRDD(CometShuffleExchangeExec.scala:86)
    at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.mapOutputStatisticsFuture$lzycompute(CometShuffleExchangeExec.scala:102)
    at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.mapOutputStatisticsFuture(CometShuffleExchangeExec.scala:101)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$triggerFuture$3(ShuffleExchangeExec.scala:97)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$triggerFuture$1(ShuffleExchangeExec.scala:97)
    ... 9 more
    Suppressed: org.apache.spark.util.Utils$OriginalTryStackTraceException: Full stacktrace of original doTryWithCallerStacktrace caller
        at org.apache.spark.rdd.ZippedPartitionsBaseRDD.getPartitions(ZippedPartitionsRDD.scala:58)
        at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:301)
        at scala.Option.getOrElse(Option.scala:201)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:297)
        at org.apache.spark.rdd.RDD.getNumPartitions(RDD.scala:317)
        at org.apache.spark.sql.comet.ZippedPartitionsRDD.<init>(ZippedPartitionsRDD.scala:41)
        at org.apache.spark.sql.comet.ZippedPartitionsRDD$.$anonfun$apply$1(ZippedPartitionsRDD.scala:62)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.sql.comet.ZippedPartitionsRDD$.withScope(ZippedPartitionsRDD.scala:66)
        at org.apache.spark.sql.comet.ZippedPartitionsRDD$.apply(ZippedPartitionsRDD.scala:62)
        at org.apache.spark.sql.comet.CometNativeExec.doExecuteColumnar(operators.scala:320)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeColumnarRDD$1(SparkPlan.scala:222)
        at scala.util.Try$.apply(Try.scala:217)
        at org.apache.spark.util.Utils$.doTryWithCallerStacktrace(Utils.scala:1378)
        at org.apache.spark.util.LazyTry.tryT$lzycompute(LazyTry.scala:46)
        at org.apache.spark.util.LazyTry.tryT(LazyTry.scala:46)
        ... 22 more
```
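For context, the root cause here is that Comet's `ZippedPartitionsRDD` zips child RDDs whose partition counts disagree (17, 17, and 23), and Spark's `ZippedPartitionsBaseRDD.getPartitions` rejects that. A minimal sketch of the same failure mode using plain Spark RDDs (not taken from this report; `ZipPartitionsRepro` and the partition counts are illustrative):

```scala
import org.apache.spark.sql.SparkSession

object ZipPartitionsRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("zip-repro").getOrCreate()
    val sc = spark.sparkContext

    // Two RDDs with deliberately different partition counts (17 vs 23),
    // mirroring the List(17, 17, 23) mismatch in the trace above.
    val a = sc.parallelize(1 to 1000, numSlices = 17)
    val b = sc.parallelize(1 to 1000, numSlices = 23)

    // ZippedPartitionsBaseRDD.getPartitions throws:
    //   java.lang.IllegalArgumentException:
    //   Can't zip RDDs with unequal numbers of partitions: List(17, 23)
    a.zipPartitions(b) { (ia, ib) => ia.zip(ib) }.count()

    spark.stop()
  }
}
```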