Bruce Robbins created SPARK-38666:
-------------------------------------

             Summary: Missing aggregate filter checks
                 Key: SPARK-38666
                 URL: https://issues.apache.org/jira/browse/SPARK-38666
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Bruce Robbins

h2. Window function in filter

{noformat}
select sum(a) filter (where nth_value(a, 2) over (order by b) > 1)
from (select 1 a, '2' b);
{noformat}

This query should produce an analysis error, but instead produces a stack overflow:

{noformat}
java.lang.StackOverflowError: null
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1(TreeNode.scala:305) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$collect$1$adapted(TreeNode.scala:305) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.trees.TreeNode.foreach(TreeNode.scala:264) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1(TreeNode.scala:265) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreach$1$adapted(TreeNode.scala:265) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
	at scala.collection.Iterator.foreach(Iterator.scala:943) ~[scala-library.jar:?]
...
{noformat}

h2. Non-boolean filter expression

{noformat}
select sum(a) filter (where a) from (select 1 a, '2' b);
{noformat}

This query should produce an analysis error, but instead causes a projection compilation error or whole-stage codegen error (depending on the datatype of the expression):

{noformat}
22/03/26 17:19:03 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 50, Column 6: Not a boolean expression
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 50, Column 6: Not a boolean expression
	at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12021) ~[janino-3.0.16.jar:?]
	at org.codehaus.janino.UnitCompiler.compileBoolean2(UnitCompiler.java:4049) ~[janino-3.0.16.jar:?]
	at org.codehaus.janino.UnitCompiler.access$6300(UnitCompiler.java:226) ~[janino-3.0.16.jar:?]
	at org.codehaus.janino.UnitCompiler$14.visitIntegerLiteral(UnitCompiler.java:4016) ~[janino-3.0.16.jar:?]
...
22/03/26 17:19:05 WARN MutableProjection: Expr codegen error and falling back to interpreter mode
java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 15: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 15: Not a boolean expression
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) ~[guava-14.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) ~[guava-14.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-14.0.1.jar:?]
	at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) ~[guava-14.0.1.jar:?]
	at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) ~[guava-14.0.1.jar:?]
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) ~[guava-14.0.1.jar:?]
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) ~[guava-14.0.1.jar:?]
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257) ~[guava-14.0.1.jar:?]
	at com.google.common.cache.LocalCache.get(LocalCache.java:4000) ~[guava-14.0.1.jar:?]
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004) ~[guava-14.0.1.jar:?]
...
NULL
Time taken: 5.397 seconds, Fetched 1 row(s)
{noformat}

Interestingly, it also returns a result (NULL).

h2. Aggregate expression in filter expression

{noformat}
select max(b) filter (where max(a) > 1) from (select 1 a, '2' b);
{noformat}

This query should produce an analysis error, but instead causes a projection compilation error or whole-stage codegen error (depending on the datatype of the expression being aggregated):

{noformat}
22/03/26 17:26:38 ERROR TaskSetManager: Task 0 in stage 3.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most recent failure: Lost task 0.0 in stage 3.0 (TID 2) (10.0.0.106 executor driver): org.apache.spark.SparkUnsupportedOperationException: Cannot evaluate expression: max(1)
	at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotEvaluateExpressionError(QueryExecutionErrors.scala:79)
	at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:344)
	at org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:343)
	at org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.eval(interfaces.scala:99)
	at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:593)
	at org.apache.spark.sql.catalyst.expressions.If.eval(conditionalExpressions.scala:68)
...
{noformat}
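
h2. Illustrative sketch of the three checks

The three cases above amount to three validations that the analyzer could run on the expression inside an aggregate's FILTER clause: no window function, no nested aggregate, and a boolean result type. The Python below is only a toy model of that idea, not Spark's analyzer code. The {{Expr}} class, the {{kind}} and {{data_type}} labels, and {{check_aggregate_filter}} are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Expr:
    """Hypothetical, simplified expression node (not a Catalyst class)."""
    kind: str        # e.g. "column", "literal", "comparison", "aggregate", "window"
    data_type: str   # e.g. "boolean", "int", "string"
    children: List["Expr"] = field(default_factory=list)


def contains(expr: Expr, kind: str) -> bool:
    """Return True if any node in the tree has the given kind (iterative walk)."""
    stack = [expr]
    while stack:
        node = stack.pop()
        if node.kind == kind:
            return True
        stack.extend(node.children)
    return False


def check_aggregate_filter(filter_expr: Expr) -> List[str]:
    """Collect analysis-time errors for an aggregate FILTER expression."""
    errors = []
    if contains(filter_expr, "window"):
        errors.append("window function not allowed in FILTER")
    if contains(filter_expr, "aggregate"):
        errors.append("aggregate function not allowed in FILTER")
    if filter_expr.data_type != "boolean":
        errors.append(f"FILTER expression must be boolean, got {filter_expr.data_type}")
    return errors


col_a = Expr("column", "int")
one = Expr("literal", "int")

# nth_value(a, 2) over (order by b) > 1
window_filter = Expr("comparison", "boolean", [Expr("window", "int", [col_a]), one])
# a
non_boolean_filter = col_a
# max(a) > 1
aggregate_filter = Expr("comparison", "boolean", [Expr("aggregate", "int", [col_a]), one])
# a > 1 (a valid filter, for contrast)
valid_filter = Expr("comparison", "boolean", [col_a, one])

if __name__ == "__main__":
    for name, expr in [("window", window_filter), ("non-boolean", non_boolean_filter),
                       ("aggregate", aggregate_filter), ("valid", valid_filter)]:
        print(name, check_aggregate_filter(expr))
```

In this model each of the three reported queries yields exactly one error at check time, which corresponds to the analysis error the report says each query should produce instead of failing later during code generation or execution.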