[jira] [Commented] (SPARK-38099) Query using an aggregation on a literal value with an empty underlying dataframe returns null

L. C. Hsieh (Jira) Sat, 05 Feb 2022 00:54:04 -0800


    [ https://issues.apache.org/jira/browse/SPARK-38099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17487435#comment-17487435 ]


L. C. Hsieh commented on SPARK-38099:
-------------------------------------

As this is not a bug, I will close this ticket then.

> Query using an aggregation on a literal value with an empty underlying 
> dataframe returns null
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38099
>                 URL: https://issues.apache.org/jira/browse/SPARK-38099
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>         Environment: Windows 10, Spark 3.2.0, Java 11.
>            Reporter: Laurens Versluis
>            Priority: Major
>
> Running a query with an aggregate function such as average on a literal
> value, with an empty DataFrame in the FROM clause, causes Spark to return
> null.
> Minimal reproducible example using Spark 3.2.0 with Java 11:
>  
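> {code:java}
> import java.util.List;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.RowFactory;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
>
> // Assumes an existing SparkSession named sparkSession.
> // Register an empty DataFrame (no columns, no rows) as the view "empty".
> sparkSession.emptyDataFrame().createOrReplaceTempView("empty");
> // Register a one-row DataFrame with a single non-nullable string column "a" as "non_empty".
> StructType someSchema = new StructType(new StructField[]{
>     DataTypes.createStructField("a", DataTypes.StringType, false)});
> final Row aRow = RowFactory.create("a");
> sparkSession.createDataFrame(List.of(aRow), someSchema).createOrReplaceTempView("non_empty");
> sparkSession.sql("SELECT avg(1)").show();                 // standalone query works
> sparkSession.sql("SELECT avg(1) FROM empty").show();      // empty DF gives null
> sparkSession.sql("SELECT avg(1) FROM non_empty").show();  // works with any non-empty DF
> {code}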
> Output is as follows:
> {noformat}
> +------+
> |avg(1)|
> +------+
> |   1.0|
> +------+
> +------+
> |avg(1)|
> +------+
> |  null|
> +------+
> +------+
> |avg(1)|
> +------+
> |   1.0|
> +------+
> {noformat}
> I would expect that the second query also returns 1.0. It seems that any 
> non-empty DataFrame returns 1.0. 
>  
> Out of curiosity: is Spark Catalyst applying some empty-DataFrame
> optimization that affects the result?
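The null in the second query is consistent with standard SQL aggregate semantics, so no empty-input optimization is needed to explain it: avg(1) evaluates the literal once per input row, AVG over zero input values is NULL, and a SELECT without a FROM clause is evaluated over a single implicit row, which is why the standalone query returns 1.0. The Java sketch below models only those semantics under that assumption; the class SqlAvgSketch and its helpers literalPerRow, sqlAvg and sqlCount are hypothetical illustrations, not Spark or Catalyst APIs. It also shows that COUNT, unlike AVG, returns 0 rather than NULL for empty input.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of SQL aggregate semantics for illustration; not Spark's implementation.
final class SqlAvgSketch {

    // Models "SELECT avg(1) FROM t": the literal 1 is evaluated once per input row of t.
    static List<Integer> literalPerRow(int rowCount) {
        return Collections.nCopies(rowCount, 1);
    }

    // SQL AVG: skips NULL inputs and yields NULL (Java null) when no values remain.
    static Double sqlAvg(List<Integer> values) {
        long count = 0;
        double sum = 0.0;
        for (Integer v : values) {
            if (v != null) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? null : Double.valueOf(sum / count);
    }

    // SQL COUNT(expr): counts non-NULL inputs and yields 0, never NULL, for empty input.
    static long sqlCount(List<Integer> values) {
        long count = 0;
        for (Integer v : values) {
            if (v != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // No FROM clause: the query runs over a single implicit row.
        System.out.println("SELECT avg(1)                -> " + sqlAvg(literalPerRow(1)));
        // "empty" has zero rows, so AVG receives no input values.
        System.out.println("SELECT avg(1) FROM empty     -> " + sqlAvg(literalPerRow(0)));
        // "non_empty" has one row, so the literal is evaluated once.
        System.out.println("SELECT avg(1) FROM non_empty -> " + sqlAvg(literalPerRow(1)));
        // COUNT over the same empty input returns 0 instead of NULL.
        System.out.println("SELECT count(1) FROM empty   -> " + sqlCount(literalPerRow(0)));
    }
}
```

Running main prints 1.0, null and 1.0 for the three avg queries, mirroring the output in the report, followed by 0 for the count.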



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
