[ https://issues.apache.org/jira/browse/SPARK-38528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505149#comment-17505149 ]
Bruce Robbins commented on SPARK-38528:
---------------------------------------

This is a bug in {{ExtractGenerator}} in which an array ({{projectExprs}}) is updated from within a closure passed to a map operation (the array is external to the closure). If the sequence of expressions on which the map operation is called is a {{Stream}}, the map operation is evaluated lazily, so the array is not fully updated before the rule completes.

> NullPointerException when selecting a generator in a Stream of aggregate
> expressions
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-38528
>                 URL: https://issues.apache.org/jira/browse/SPARK-38528
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.3, 3.2.1, 3.3.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> Assume this dataframe:
> {noformat}
> val df = Seq(1, 2, 3).toDF("v")
> {noformat}
> This works:
> {noformat}
> df.select(Seq(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
> {noformat}
> However, this doesn't:
> {noformat}
> df.select(Stream(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
> {noformat}
> It throws this error:
> {noformat}
> java.lang.NullPointerException
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.$anonfun$containsAggregates$1(Analyzer.scala:2516)
>   at scala.collection.immutable.List.flatMap(List.scala:366)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$.containsAggregates(Analyzer.scala:2515)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2509)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates$$anonfun$apply$31.applyOrElse(Analyzer.scala:2508)
> {noformat}
> The only difference between the two queries is that the first one uses
> {{Seq}} to specify the varargs, whereas the second one uses {{Stream}}.
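The failure mode in the comment can be reproduced without Spark. The sketch below is a minimal illustration, assuming Scala 2.12/2.13 semantics where {{Stream.map}} evaluates only the head eagerly and defers the rest. {{LazyMapDemo}}, {{fillSlots}} and {{fillSlotsStrict}} are hypothetical names; this is not the actual {{ExtractGenerator}} code, and materializing the input with {{toList}} is one possible mitigation, not necessarily the upstream fix.

```scala
// Spark-free sketch of the pattern described above. LazyMapDemo, fillSlots and
// fillSlotsStrict are hypothetical names, not Spark's ExtractGenerator code.
object LazyMapDemo {
  // Maps over `exprs` while writing each result into an array that lives
  // outside the closure (analogous to projectExprs in the comment).
  def fillSlots(exprs: Seq[String]): (Seq[String], Array[String]) = {
    val slots = new Array[String](exprs.length)
    val mapped = exprs.zipWithIndex.map { case (e, i) =>
      slots(i) = e.toUpperCase // side effect: deferred past the head for a Stream
      e.toUpperCase
    }
    (mapped, slots)
  }

  // Assumed mitigation: copy the input into a strict List before mapping,
  // so every side effect runs before fillSlots returns.
  def fillSlotsStrict(exprs: Seq[String]): (Seq[String], Array[String]) =
    fillSlots(exprs.toList)

  def main(args: Array[String]): Unit = {
    // Strict Seq: every slot is written by the time map returns.
    val (_, strict) = fillSlots(Seq("min", "max"))
    println(strict.mkString(",")) // MIN,MAX

    // Stream: the last slot is still null here, because map has not yet
    // run the closure for the tail.
    val (lazyMapped, lazySlots) = fillSlots(Stream("min", "max"))
    println(lazySlots.mkString(","))

    // Forcing the mapped Stream runs the remaining side effects.
    lazyMapped.toList
    println(lazySlots.mkString(",")) // MIN,MAX

    // Materializing the input first avoids the problem.
    val (_, fixed) = fillSlotsStrict(Stream("min", "max"))
    println(fixed.mkString(",")) // MIN,MAX
  }
}
```

On the user side, the same reasoning suggests converting the column sequence to a strict collection (for example with {{toList}}) before passing it as varargs, which matches the observation that the {{Seq}} version of the query works.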
--
This message was sent by Atlassian Jira
(v8.20.1#820001)