Surbhi-Vijay opened a new issue, #11403:
URL: https://github.com/apache/incubator-gluten/issues/11403

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   The testcase below gets transformed into `HashAggregateExecTransformer` and then fails while binding a value reference. The most likely cause is that the plan does not fall back properly: the fallback tags are lost somewhere when the Spark node is copied.
   
   Note that since the testcase uses deprecated APIs, `<arg>-Wconf:cat=deprecation:wv,any:e</arg>` needs to be commented out in the Gluten pom.xml.
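   
   For reference, a sketch of that change, assuming the flag sits in the compiler `args` section of the pom (the exact plugin and location may differ):
   
   ```xml
   <!-- Hypothetical excerpt of Gluten's pom.xml; comment the arg out so
        the deprecated typed.sum API compiles without erroring: -->
   <configuration>
     <args>
       <!--
       <arg>-Wconf:cat=deprecation:wv,any:e</arg>
       -->
     </args>
   </configuration>
   ```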
   
   Repro testcase
   ```
   import org.apache.spark.sql.expressions.scalalang.typed
   
     test("typed aggregation: in project list") {
       withSQLConf(SQLConf.ANSI_ENABLED.key -> "false") {
         val ds = Seq(1, 3, 2, 5).toDS()
   
         checkDatasetUnorderly(ds.select(typed.sum((i: Int) => i)), 11.0)
         checkDatasetUnorderly(
           ds.select(typed.sum((i: Int) => i), typed.sum((i: Int) => i * 2)),
           11.0 -> 22.0)
       }
     }
   ```
   
   Failure:
   ```
   == Parsed Logical Plan ==
   'Project 
[unresolvedalias(typedsumdouble(org.apache.spark.sql.internal.TypedSumDouble@78723798,
 Some(unresolveddeserializer(assertnotnull(upcast(getcolumnbyordinal(0, 
IntegerType), IntegerType, - root class: "int")), value#62)), Some(int), 
Some(StructType(StructField(value,IntegerType,false))), input[0, double, false] 
AS value#94, value#94, 
unresolveddeserializer(assertnotnull(upcast(getcolumnbyordinal(0, DoubleType), 
DoubleType, - root class: "double")), value#94), input[0, double, false] AS 
value#95, DoubleType, DoubleType, false))]
   +- LocalRelation [value#62]
   
   == Analyzed Logical Plan ==
   TypedSumDouble(int): double
   Aggregate 
[typedsumdouble(org.apache.spark.sql.internal.TypedSumDouble@78723798, 
Some(assertnotnull(cast(value#62 as int))), Some(int), 
Some(StructType(StructField(value,IntegerType,false))), input[0, double, 
false], value#94, assertnotnull(cast(value#94 as double)), input[0, double, 
false], DoubleType, DoubleType, false) AS TypedSumDouble(int)#97]
   +- LocalRelation [value#62]
   
   == Optimized Logical Plan ==
   Aggregate 
[typedsumdouble(org.apache.spark.sql.internal.TypedSumDouble@78723798, 
Some(value#62), Some(int), 
Some(StructType(StructField(value,IntegerType,false))), input[0, double, 
false], value#94, value#94, input[0, double, false], DoubleType, DoubleType, 
false) AS TypedSumDouble(int)#97]
   +- LocalRelation [value#62]
   
   == Physical Plan ==
   AdaptiveSparkPlan isFinalPlan=false
   +- HashAggregate(keys=[], 
functions=[typedsumdouble(org.apache.spark.sql.internal.TypedSumDouble@78723798,
 Some(value#62), Some(int), 
Some(StructType(StructField(value,IntegerType,false))), input[0, double, 
false], value#94, value#94, input[0, double, false], DoubleType, DoubleType, 
false)], output=[TypedSumDouble(int)#97])
      +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=161]
         +- HashAggregate(keys=[], 
functions=[partial_typedsumdouble(org.apache.spark.sql.internal.TypedSumDouble@78723798,
 Some(value#62), Some(int), 
Some(StructType(StructField(value,IntegerType,false))), input[0, double, 
false], value#94, value#94, input[0, double, false], DoubleType, DoubleType, 
false)], output=[value#98])
            +- LocalTableScan [value#62]
   
   
   
   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
        at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
        at 
org.scalatest.funsuite.AnyFunSuite.newAssertionFailedException(AnyFunSuite.scala:1564)
        at org.scalatest.Assertions.fail(Assertions.scala:949)
        at org.scalatest.Assertions.fail$(Assertions.scala:945)
        ......
   Caused by: org.apache.spark.SparkException: [INTERNAL_ERROR] Couldn't find 
value#73 in [value#62] SQLSTATE: XX000
        at 
org.apache.spark.SparkException$.internalError(SparkException.scala:92)
        at 
org.apache.spark.SparkException$.internalError(SparkException.scala:96)
        at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:81)
        at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:470)
        at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:86)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:470)
        at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:446)
   ```
   
   ### Gluten version
   
   main branch
   
   ### Spark version
   
   spark-4.0.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

