[ https://issues.apache.org/jira/browse/SPARK-38666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-38666.
---------------------------------
    Fix Version/s: 3.3.0
         Assignee: Bruce Robbins
       Resolution: Fixed

> Missing aggregate filter checks
> -------------------------------
>
>                 Key: SPARK-38666
>                 URL: https://issues.apache.org/jira/browse/SPARK-38666
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>             Fix For: 3.3.0
>
>
> h3. Window function in filter
> {noformat}
> select sum(a) filter (where nth_value(a, 2) over (order by b) > 1)
> from (select 1 a, '2' b);
> {noformat}
> This query should produce an analysis error, but instead produces a stack 
> overflow:
> {noformat}
> java.lang.StackOverflowError: null
>       at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1(TreeNode.scala:305) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
>       at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1$adapted(TreeNode.scala:305) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
>       at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:264) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
>       at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:265) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
>       at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:265) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
>       at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library.jar:?]
> ...
> {noformat}
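>
> One way the analyzer could catch this up front (a rough sketch only, not the
> actual fix; it assumes AggregateExpression.filter carries the FILTER clause as
> in 3.x) is to reject window expressions found inside the aggregate's filter:
> {noformat}
> // Hypothetical sketch, not the committed patch: reject window functions in an
> // aggregate's FILTER clause at analysis time.
> import org.apache.spark.sql.catalyst.expressions.WindowExpression
> import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
>
> def checkNoWindowInFilter(ae: AggregateExpression): Unit = {
>   ae.filter.foreach { f =>
>     if (f.find(_.isInstanceOf[WindowExpression]).isDefined) {
>       // In the real analyzer this would surface as an AnalysisException.
>       throw new IllegalArgumentException(
>         "FILTER expression contains a window function; " +
>           "it cannot be used in an aggregate function")
>     }
>   }
> }
> {noformat}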
> h3. Non-boolean filter expression
> {noformat}
> select sum(a) filter (where a) from (select 1 a, '2' b);
> {noformat}
> This query should produce an analysis error, but instead causes a projection 
> compilation error or whole-stage codegen error (depending on the datatype of 
> the expression):
> {noformat}
> 22/03/26 17:19:03 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 50, Column 6: Not a boolean expression
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 50, Column 6: Not a boolean expression
>       at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12021) ~[janino-3.0.16.jar:?]
>       at org.codehaus.janino.UnitCompiler.compileBoolean2(UnitCompiler.java:4049) ~[janino-3.0.16.jar:?]
>       at org.codehaus.janino.UnitCompiler.access$6300(UnitCompiler.java:226) ~[janino-3.0.16.jar:?]
>       at org.codehaus.janino.UnitCompiler$14.visitIntegerLiteral(UnitCompiler.java:4016) ~[janino-3.0.16.jar:?]
> ...
> 22/03/26 17:19:05 WARN MutableProjection: Expr codegen error and falling back to interpreter mode
> java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 15: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 15: Not a boolean expression
>       at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) ~[guava-14.0.1.jar:?]
>       at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) ~[guava-14.0.1.jar:?]
>       at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-14.0.1.jar:?]
>       at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) ~[guava-14.0.1.jar:?]
>       at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) ~[guava-14.0.1.jar:?]
>       at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) ~[guava-14.0.1.jar:?]
>       at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) ~[guava-14.0.1.jar:?]
>       at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257) ~[guava-14.0.1.jar:?]
>       at com.google.common.cache.LocalCache.get(LocalCache.java:4000) ~[guava-14.0.1.jar:?]
>       at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004) ~[guava-14.0.1.jar:?]
> ...
> NULL
> Time taken: 5.397 seconds, Fetched 1 row(s)
> {noformat}
> Interestingly, it also returns a result (NULL).
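>
> Here the missing check is on the predicate's type: like WHERE and HAVING, the
> FILTER predicate should be required to be boolean at analysis time. A hedged
> sketch of such a check (illustrative names, not the committed fix):
> {noformat}
> // Hypothetical sketch, not the committed patch: require a boolean FILTER predicate.
> import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
> import org.apache.spark.sql.types.BooleanType
>
> def checkFilterIsBoolean(ae: AggregateExpression): Unit = {
>   ae.filter.foreach { f =>
>     if (f.dataType != BooleanType) {
>       // In the real analyzer this would surface as an AnalysisException.
>       throw new IllegalArgumentException(
>         s"FILTER expression is of type ${f.dataType.simpleString}, not boolean")
>     }
>   }
> }
> {noformat}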
> h3. Aggregate expression in filter expression
> {noformat}
> select max(b) filter (where max(a) > 1) from (select 1 a, '2' b);
> {noformat}
> This query should produce an analysis error, but instead causes a projection 
> compilation error or whole-stage codegen error (depending on the datatype of 
> the expression being aggregated):
> {noformat}
> 22/03/26 17:26:38 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 2) (10.0.0.106 executor driver): org.apache.spark.SparkUnsupportedOperationException: Cannot evaluate expression: max(1)
>       at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:79)
>       at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:344)
>       at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:343)
>       at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.eval(interfaces.scala:99)
>       at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:593)
>       at org.apache.spark.sql.catalyst.expressions.If.eval(conditionalExpressions.scala:68)
> ...
> {noformat}
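>
> As with the other two cases, this could be rejected at analysis time by
> checking the FILTER clause for nested aggregate expressions. A hedged sketch
> along the same lines (illustrative only, not the committed fix):
> {noformat}
> // Hypothetical sketch, not the committed patch: reject aggregates nested inside
> // an aggregate's FILTER clause.
> import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
>
> def checkNoAggInFilter(ae: AggregateExpression): Unit = {
>   ae.filter.foreach { f =>
>     if (f.find(_.isInstanceOf[AggregateExpression]).isDefined) {
>       // In the real analyzer this would surface as an AnalysisException.
>       throw new IllegalArgumentException(
>         "FILTER expression contains an aggregate; " +
>           "it cannot be used in an aggregate function")
>     }
>   }
> }
> {noformat}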



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
