[ https://issues.apache.org/jira/browse/SPARK-38118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-38118:
------------------------------------

    Assignee:     (was: Apache Spark)

> MEAN(Boolean) in the HAVING clause should throw data mismatch error
> -------------------------------------------------------------------
>
>                 Key: SPARK-38118
>                 URL: https://issues.apache.org/jira/browse/SPARK-38118
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Rui Wang
>            Priority: Major
>
> {code:sql}
> with t as (select true c)
> select t.c
> from t
> group by t.c
> having mean(t.c) > 0 {code}
> This query fails with "Column 't.c' does not exist. Did you mean one of the
> following? [t.c]"
> However, mean(boolean) is not a supported function signature, so the error
> should instead be "cannot resolve 'mean(t.c)' due to data type mismatch:
> function average requires numeric or interval types, not boolean"
>  
> This happens because:
>  # The mean(boolean) in HAVING is not marked as resolved by the
> {{ResolveFunctions}} rule.
>  # Therefore, in {{ResolveAggregationFunctions}}, the {{TempResolvedColumn}}
> wrapper in mean({{TempResolvedColumn}}(t.c)) cannot be removed (only a
> resolved aggregate function can remove its {{TempResolvedColumn}}).
>  # When a later rule in the batch runs, the leftover {{TempResolvedColumn}} is
> reverted and the expression becomes mean(`t.c`), so mean loses its
> already-resolved reference to t.c.
>  # As a result, in the last step the analyzer can only report that t.c was
> not found.
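The four steps above can be illustrated with a small Scala toy model. It is a sketch under stated assumptions, not Spark's analyzer: `HavingResolution`, `Attr`, `UnresolvedAttr`, `TempResolved`, `Mean`, and the helper functions are hypothetical stand-ins for Catalyst attributes, `TempResolvedColumn`, `Average`, and the rules named above, and the set of accepted numeric type names is an assumed simplification.

```scala
// Toy model of the failure path; every name here is a hypothetical
// stand-in, not one of Spark's real Catalyst classes or rules.
object HavingResolution {
  sealed trait Expr
  // A column the analyzer has already resolved, with its type name.
  final case class Attr(name: String, dataType: String) extends Expr
  // A bare column name that still needs resolving.
  final case class UnresolvedAttr(name: String) extends Expr
  // Stand-in for TempResolvedColumn: wraps an already-resolved column.
  final case class TempResolved(child: Attr) extends Expr
  // Stand-in for mean/Average.
  final case class Mean(child: Expr) extends Expr

  // Assumed, simplified set of input types the aggregate accepts.
  private val numeric = Set("int", "bigint", "double", "decimal")

  def dataType(e: Expr): Option[String] = e match {
    case Attr(_, t)        => Some(t)
    case TempResolved(c)   => Some(c.dataType)
    case UnresolvedAttr(_) => None
    case Mean(_)           => Some("double")
  }

  // Simplified `resolved`: children resolved AND input type accepted.
  def resolved(e: Expr): Boolean = e match {
    case Attr(_, _)        => true
    case TempResolved(_)   => true
    case UnresolvedAttr(_) => false
    case Mean(c)           => resolved(c) && dataType(c).exists(numeric.contains)
  }

  // Step 2: only a resolved aggregate may drop its TempResolved wrapper.
  def stripIfResolved(e: Expr): Expr = e match {
    case m @ Mean(TempResolved(c)) if resolved(m) => Mean(c)
    case other                                    => other
  }

  // Step 3: a later rule reverts any leftover wrapper to a bare name.
  def revertLeftovers(e: Expr): Expr = e match {
    case TempResolved(c) => UnresolvedAttr(c.name)
    case Mean(c)         => Mean(revertLeftovers(c))
    case other           => other
  }

  // Step 4: the final check only sees an unknown column name.
  def report(e: Expr): String = e match {
    case Mean(UnresolvedAttr(n)) => s"Column '$n' does not exist."
    case m if resolved(m)        => "resolved"
    case _                       => "data type mismatch"
  }

  // Runs steps 2 to 4 on mean(TempResolved(col)).
  def analyze(col: Attr): String =
    report(revertLeftovers(stripIfResolved(Mean(TempResolved(col)))))
}
```

With a boolean column the wrapper survives step 2, is reverted in step 3, and the final report names a missing column; with an integer column the aggregate resolves normally.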
>  
> mean(boolean) in HAVING is not marked as resolved by the {{ResolveFunctions}}
> rule because:
>  # It uses the default {{resolved}} definition inherited from {{Expression}}:
> {code:scala}
> lazy val resolved: Boolean = childrenResolved && 
> checkInputDataTypes().isSuccess {code}
>  
>  # During analysis, mean(boolean) is actually mean(TempResolvedColumn(boolean)),
> so childrenResolved is true.
>  # However, checkInputDataTypes() does not succeed for a boolean input
> ([Average.scala#L55|https://github.com/apache/spark/blob/74ebef243c18e7a8f32bf90ea75ab6afed9e3132/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala#L55]).
>  # Thus Average's {{resolved}} ends up false, which in turn produces the wrong
> error message.
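The flag computation can be sketched the same way. This is a minimal toy re-creation under stated assumptions: `ResolvedFlag`, the local `TypeCheckResult` types, the stand-in `TempResolvedColumn`, and the hypothetical `whyUnresolved` helper are all defined here so the sketch runs without Spark, and the accepted type list is simplified. It shows that `resolved == false` alone does not say whether a child is unresolved or the input type check failed.

```scala
// Toy re-creation of the default `resolved` definition quoted above.
// These types only mimic Spark's shapes; they are defined locally.
object ResolvedFlag {
  sealed trait TypeCheckResult { def isSuccess: Boolean }
  case object TypeCheckSuccess extends TypeCheckResult { val isSuccess = true }
  final case class TypeCheckFailure(message: String) extends TypeCheckResult {
    val isSuccess = false
  }

  abstract class Expression {
    def children: Seq[Expression]
    def dataType: String
    def checkInputDataTypes(): TypeCheckResult = TypeCheckSuccess
    lazy val childrenResolved: Boolean = children.forall(_.resolved)
    // Same shape as the default quoted in the issue.
    lazy val resolved: Boolean = childrenResolved && checkInputDataTypes().isSuccess
  }

  // Stand-in for TempResolvedColumn: a leaf that is always resolved.
  final class TempResolvedColumn(val dataType: String) extends Expression {
    def children: Seq[Expression] = Nil
  }

  // Stand-in for Average with an assumed, simplified type check.
  final class Average(child: Expression) extends Expression {
    def children: Seq[Expression] = Seq(child)
    def dataType: String = "double"
    override def checkInputDataTypes(): TypeCheckResult =
      if (Set("int", "bigint", "double").contains(child.dataType)) TypeCheckSuccess
      else TypeCheckFailure(
        s"function average requires numeric or interval types, not ${child.dataType}")
  }

  // Hypothetical diagnostic: look at childrenResolved first, so a type
  // failure on resolved children is reported as a type mismatch.
  def whyUnresolved(e: Expression): Option[String] =
    if (e.resolved) None
    else if (!e.childrenResolved) Some("unresolved child")
    else e.checkInputDataTypes() match {
      case TypeCheckFailure(msg) => Some(s"data type mismatch: $msg")
      case TypeCheckSuccess      => None
    }
}
```

Checking `childrenResolved` before `checkInputDataTypes()` is one way to recover the type mismatch message; it is not necessarily how the issue was fixed in Spark.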
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
