[I] [VL] Implement outputPartitioning for ColumnarPartialProjectExec [incubator-gluten]

via GitHub Fri, 07 Nov 2025 04:23:51 -0800


wForget opened a new issue, #11048:
URL: https://github.com/apache/incubator-gluten/issues/11048


   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   
   I encountered a `PartitioningCollection requires all of its partitionings have the same numPartitions` error. After debugging, I found that the cause is that `outputPartitioning` is not implemented for `ColumnarPartialProjectExec`.
   
   Error log:
   
   ```
   25/11/07 19:47:08 ERROR ExecuteStatement: Error operating ExecuteStatement: java.lang.IllegalArgumentException: requirement failed: PartitioningCollection requires all of its partitionings have the same numPartitions.
        at scala.Predef$.require(Predef.scala:281)
        at org.apache.spark.sql.catalyst.plans.physical.PartitioningCollection.<init>(partitioning.scala:500)
        at org.apache.gluten.execution.ColumnarShuffledJoin.outputPartitioning(JoinExecTransformer.scala:64)
        at org.apache.gluten.execution.ColumnarShuffledJoin.outputPartitioning$(JoinExecTransformer.scala:62)
        at org.apache.gluten.execution.ShuffledHashJoinExecTransformerBase.outputPartitioning(JoinExecTransformer.scala:325)
        at org.apache.spark.sql.execution.PartitioningPreservingUnaryExecNode.outputPartitioning(AliasAwareOutputExpression.scala:33)
        at org.apache.spark.sql.execution.PartitioningPreservingUnaryExecNode.outputPartitioning$(AliasAwareOutputExpression.scala:31)
        at org.apache.gluten.execution.ProjectExecTransformerBase.outputPartitioning(BasicPhysicalOperatorTransformer.scala:166)
        at org.apache.gluten.extension.FlushableHashAggregateRule$.org$apache$gluten$extension$FlushableHashAggregateRule$$isAggInputAlreadyDistributedWithAggKeys(FlushableHashAggregateRule.scala:139)
        at org.apache.gluten.extension.FlushableHashAggregateRule.$anonfun$replaceEligibleAggregates$1(FlushableHashAggregateRule.scala:93)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.gluten.extension.FlushableHashAggregateRule.$anonfun$replaceEligibleAggregates$1(FlushableHashAggregateRule.scala:102)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.Iterator.foreach(Iterator.scala:943)
        at scala.collection.Iterator.foreach$(Iterator.scala:943)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at org.apache.gluten.extension.FlushableHashAggregateRule.$anonfun$replaceEligibleAggregates$1(FlushableHashAggregateRule.scala:102)
        at org.apache.gluten.extension.FlushableHashAggregateRule.org$apache$gluten$extension$FlushableHashAggregateRule$$replaceEligibleAggregates(FlushableHashAggregateRule.scala:105)
        at org.apache.gluten.extension.FlushableHashAggregateRule$$anonfun$apply$2.applyOrElse(FlushableHashAggregateRule.scala:48)
        at org.apache.gluten.extension.FlushableHashAggregateRule$$anonfun$apply$2.applyOrElse(FlushableHashAggregateRule.scala:41)
        at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:515)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:515)
        at org.apache.gluten.extension.FlushableHashAggregateRule.apply(FlushableHashAggregateRule.scala:41)
        at org.apache.gluten.extension.FlushableHashAggregateRule.apply(FlushableHashAggregateRule.scala:35)
        at org.apache.gluten.extension.columnar.LoggedRule.$anonfun$apply$1(LoggedRule.scala:44)
        at org.apache.gluten.metrics.GlutenTimeMetric$.withNanoTime(GlutenTimeMetric.scala:41)
        at org.apache.gluten.metrics.GlutenTimeMetric$.withMillisTime(GlutenTimeMetric.scala:46)
        at org.apache.gluten.metrics.GlutenTimeMetric$.recordMillisTime(GlutenTimeMetric.scala:50)
        at org.apache.gluten.extension.columnar.LoggedRule.apply(LoggedRule.scala:44)
        at org.apache.gluten.extension.columnar.LoggedRule.apply(LoggedRule.scala:29)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:222)
        at scala.collection.IndexedSeqOptimized.foldLeft(IndexedSeqOptimized.scala:60)
        at scala.collection.IndexedSeqOptimized.foldLeft$(IndexedSeqOptimized.scala:68)
        at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:49)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:219)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:211)
        at scala.collection.immutable.List.foreach(List.scala:431)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:211)
   ```
   
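   Because `ColumnarPartialProjectExec` does not override `outputPartitioning`, it falls back to the `SparkPlan` default and reports `UnknownPartitioning(0)`. When such a node is an input of a shuffled join, `ColumnarShuffledJoin.outputPartitioning` puts this 0-partition value into a `PartitioningCollection` together with the other side's partitioning, whose partition count differs, so the requirement above fails. `ColumnarPartialProjectExec.outputPartitioning` returns `UnknownPartitioning(0)`: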
   
   
![Image](https://github.com/user-attachments/assets/794f2b74-94e8-497a-a4c6-8fa1b13acaff)
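To make the failure mode concrete, here is a minimal, self-contained Scala model, assuming the join combines both children's partitionings the way an inner shuffled join does in Spark. Every type in it (`HashShuffle`, `PartialProjectBefore`, `PartialProjectAfter`, `InnerShuffledJoin`, and the simplified partitioning classes) is a hypothetical stand-in, not Spark's or Gluten's real API, and the "after" variant is one plausible shape of a fix rather than the change Gluten will necessarily adopt.

```scala
// Hypothetical, self-contained model: these are NOT the real Spark/Gluten
// classes, only simplified stand-ins named after them for illustration.
object PartitioningModel {
  sealed trait Partitioning { def numPartitions: Int }

  // SparkPlan's default outputPartitioning is UnknownPartitioning(0).
  final case class UnknownPartitioning(numPartitions: Int) extends Partitioning

  final case class HashPartitioning(keys: Seq[String], numPartitions: Int)
      extends Partitioning

  // Enforces the same invariant (and message) as the error in the log.
  final case class PartitioningCollection(partitionings: Seq[Partitioning])
      extends Partitioning {
    require(
      partitionings.map(_.numPartitions).distinct.length == 1,
      "PartitioningCollection requires all of its partitionings have the same numPartitions.")
    def numPartitions: Int = partitionings.head.numPartitions
  }

  sealed trait Plan { def outputPartitioning: Partitioning }

  // Stand-in for a shuffle exchange hash-partitioned on `keys`.
  final case class HashShuffle(keys: Seq[String], numPartitions: Int) extends Plan {
    def outputPartitioning: Partitioning = HashPartitioning(keys, numPartitions)
  }

  // Current behaviour: no override, so the default 0-partition value leaks out.
  final case class PartialProjectBefore(child: Plan) extends Plan {
    def outputPartitioning: Partitioning = UnknownPartitioning(0)
  }

  // Assumed fix: forward the child's partitioning unchanged.
  final case class PartialProjectAfter(child: Plan) extends Plan {
    def outputPartitioning: Partitioning = child.outputPartitioning
  }

  // An inner shuffled join reports both sides' partitionings together.
  final case class InnerShuffledJoin(left: Plan, right: Plan) extends Plan {
    def outputPartitioning: Partitioning =
      PartitioningCollection(Seq(left.outputPartitioning, right.outputPartitioning))
  }
}
```

With a 200-partition shuffle on the other side, `InnerShuffledJoin(PartialProjectBefore(...), ...).outputPartitioning` throws an `IllegalArgumentException` carrying the same requirement message as the log, while the `PartialProjectAfter` variant yields a collection with 200 partitions. Forwarding the child's partitioning verbatim is only the simplest option: if the partial project renames or drops the partitioning columns, a real implementation would probably need alias-aware handling similar to the `PartitioningPreservingUnaryExecNode` trait that `ProjectExecTransformerBase` goes through in the trace above.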
   
   ### Gluten version
   
   Gluten-1.5
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

