[ https://issues.apache.org/jira/browse/SPARK-38118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-38118:
------------------------------------

    Assignee: Apache Spark

> MEAN(Boolean) in the HAVING claus should throw data mismatch error
> ------------------------------------------------------------------
>
>                 Key: SPARK-38118
>                 URL: https://issues.apache.org/jira/browse/SPARK-38118
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Rui Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> {code:java}
> with t as (select true c)
> select t.c
> from t
> group by t.c
> having mean(t.c) > 0 {code}
> This query throws "Column 't.c' does not exist. Did you mean one of the
> following? [t.c]"
> However, mean(boolean) is not a supported function signature, so the error
> should be "cannot resolve 'mean(t.c)' due to data type mismatch:
> function average requires numeric or interval types, not boolean"
>
> This is because
> # The mean(boolean) in HAVING is not marked as resolved by the
> {{ResolveFunctions}} rule.
> # Thus in {{ResolveAggregateFunctions}}, the {{TempResolvedColumn}} wrapper
> in mean({{TempResolvedColumn}}(t.c)) cannot be removed (only a resolved
> aggregate function can have its TempResolvedColumn removed).
> # Thus, when a later batch of rules is applied, the {{TempResolvedColumn}}
> is reverted and the expression becomes mean(`t.c`), so mean loses the
> information about t.c.
> # Thus at the last step, the analyzer can only report that t.c was not found.
>
> mean(boolean) in HAVING is not marked as resolved by the {{ResolveFunctions}}
> rule because
> # It uses the default `resolved` implementation from Expression:
> {code:java}
> lazy val resolved: Boolean = childrenResolved &&
> checkInputDataTypes().isSuccess {code}
>
> # During analysis, mean(boolean) is mean(TempResolvedColumn(boolean)),
> thus childrenResolved is true.
> # However, checkInputDataTypes().isSuccess is false for a boolean input
> ([Average.scala#L55|https://github.com/apache/spark/blob/74ebef243c18e7a8f32bf90ea75ab6afed9e3132/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala#L55]).
> # Thus Average's `resolved` ends up false, which in turn leads to the wrong
> error message.
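
The interaction between childrenResolved and checkInputDataTypes() described in the issue can be sketched in plain Scala. This is a toy model under stated assumptions, not Spark's Catalyst code: the object ResolvedSketch and the classes Expr, Mean, TempResolvedColumn, TypeCheckSuccess and TypeCheckFailure are hypothetical stand-ins that only borrow Catalyst's names, and the type system is reduced to two types. Only the `resolved` definition is copied from the issue text.

```scala
// Toy model (NOT Spark's real classes): just enough structure to show why
// mean(boolean) has resolved children yet is itself unresolved.
object ResolvedSketch {
  sealed trait DataType
  case object BooleanType extends DataType
  case object DoubleType extends DataType

  sealed trait TypeCheckResult { def isSuccess: Boolean }
  case object TypeCheckSuccess extends TypeCheckResult { val isSuccess = true }
  final case class TypeCheckFailure(message: String) extends TypeCheckResult {
    val isSuccess = false
  }

  trait Expr {
    def children: Seq[Expr]
    def dataType: DataType
    // By default an expression accepts any input types.
    def checkInputDataTypes(): TypeCheckResult = TypeCheckSuccess
    def childrenResolved: Boolean = children.forall(_.resolved)
    // Same shape as the default quoted in the issue.
    lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess
  }

  // Hypothetical stand-in for the wrapper around a column used in HAVING.
  final case class TempResolvedColumn(name: String, dataType: DataType) extends Expr {
    val children: Seq[Expr] = Nil
  }

  // Hypothetical stand-in for the average aggregate: numeric input only.
  final case class Mean(child: Expr) extends Expr {
    val children: Seq[Expr] = Seq(child)
    val dataType: DataType = DoubleType
    override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
      case DoubleType  => TypeCheckSuccess
      case BooleanType =>
        TypeCheckFailure("function average requires numeric or interval types, not boolean")
    }
  }

  val meanOfBoolean: Mean = Mean(TempResolvedColumn("t.c", BooleanType))
  val meanOfDouble: Mean = Mean(TempResolvedColumn("t.x", DoubleType))
}
```

In this model meanOfBoolean.childrenResolved is true while meanOfBoolean.resolved is false, purely because the type check fails. The `resolved` flag alone therefore cannot tell a caller whether the function is unresolved because of a missing column or because of a type mismatch, which is the ambiguity behind the misleading "Column 't.c' does not exist" message.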