[ https://issues.apache.org/jira/browse/SPARK-27768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849078#comment-16849078 ]

Dilip Biswal commented on SPARK-27768:
--------------------------------------

[~dongjoon]
Thanks for trying out Presto.

Just want to share my 2 cents before we take a final call on it. I am okay with
whatever you guys decide :).
There seems to be a subtle difference between Presto and Spark: Spark returns
NULL in this case, whereas Presto returns an error. Because of this, I think
we should be more accommodating of data that is accepted in other systems. I am
afraid that, because of the "return NULL" semantics, during the ETL process we
will sometimes treat valid input from other systems as NULLs, and it is probably
hard for users to locate the bad record and fix it.
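
To make the concern concrete, here is a minimal Python sketch (an illustration under stated assumptions, not Spark's or Presto's actual code) contrasting the two behaviours described above: returning NULL for input the parser rejects versus raising an error. The strict parser is assumed to accept plain decimal numerals and only the exact, case-sensitive spellings `Infinity` and `NaN`; `cast_to_double`, its `on_error` parameter, and `find_silent_nulls` are hypothetical names.

```python
import math
import re

# Assumed strict grammar for this sketch: plain decimal numerals only
# (no hex floats, no type suffixes).
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

# Assumed strict special values: exact, case-sensitive spellings only.
_EXACT_SPECIALS = {
    "Infinity": math.inf,
    "+Infinity": math.inf,
    "-Infinity": -math.inf,
    "NaN": math.nan,
}


def cast_to_double(value, on_error="null"):
    """Hypothetical cast. on_error="null" returns None (NULL) for input the
    strict parser rejects; on_error="error" raises ValueError instead."""
    text = value.strip()
    if text in _EXACT_SPECIALS:
        return _EXACT_SPECIALS[text]
    if _DECIMAL.fullmatch(text):
        return float(text)
    if on_error == "null":
        return None
    raise ValueError(f"cannot cast {value!r} to DOUBLE")


def find_silent_nulls(values):
    """Indices of inputs that were present but came back as NULL."""
    return [i for i, v in enumerate(values)
            if v is not None and cast_to_double(v) is None]


rows = ["1", "Infinity", "infinity", "2.5"]
print([cast_to_double(v) for v in rows])  # [1.0, inf, None, 2.5]
print(find_silent_nulls(rows))            # [2]
try:
    cast_to_double("infinity", on_error="error")
except ValueError as exc:
    print(exc)                            # cannot cast 'infinity' to DOUBLE
```

In the NULL-returning mode the lower-case "infinity" disappears silently and has to be hunted down by comparing inputs with outputs, as `find_silent_nulls` does; in the error mode the same record fails loudly at the point of the cast.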

Let's say for a second that we decide to accept this case. Technically, we
will then not be portable with Hive and Presto, but we would be accepting more
than these two systems do, right? Do we think that some users would actually want
strings such as "infinity" to be treated as NULL and would be negatively
surprised to see the new behaviour?
Let me know what you think.

> Infinity, -Infinity, NaN should be recognized in a case insensitive manner
> --------------------------------------------------------------------------
>
>                 Key: SPARK-27768
>                 URL: https://issues.apache.org/jira/browse/SPARK-27768
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Xiao Li
>            Priority: Major
>
> When the inputs contain the constant 'infinity', Spark SQL does not generate 
> the expected results.
> {code:java}
> SELECT avg(CAST(x AS DOUBLE)), var_pop(CAST(x AS DOUBLE))
> FROM (VALUES ('1'), ('infinity')) v(x);
> SELECT avg(CAST(x AS DOUBLE)), var_pop(CAST(x AS DOUBLE))
> FROM (VALUES ('infinity'), ('1')) v(x);
> SELECT avg(CAST(x AS DOUBLE)), var_pop(CAST(x AS DOUBLE))
> FROM (VALUES ('infinity'), ('infinity')) v(x);
> SELECT avg(CAST(x AS DOUBLE)), var_pop(CAST(x AS DOUBLE))
> FROM (VALUES ('-infinity'), ('infinity')) v(x);{code}
> The root cause: Spark SQL does not recognize the special constants in a
> case-insensitive way, whereas PostgreSQL does.
> Link: https://www.postgresql.org/docs/9.3/datatype-numeric.html 
>  
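
The root cause can be illustrated with a minimal Python sketch, assuming a parser that looks the special constants up by their lower-cased spelling, which mirrors the case-insensitive recognition described in the linked PostgreSQL documentation. `parse_double` and its decimal-only numeral grammar are hypothetical; this is not Spark's `Cast` implementation.

```python
import math
import re

# Plain decimal numerals only (no hex, no type suffixes); an assumption for this sketch.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

# Special constants from the issue title, keyed by their lower-cased
# spelling so that the lookup is case-insensitive.
_SPECIALS = {
    "infinity": math.inf,
    "+infinity": math.inf,
    "-infinity": -math.inf,
    "nan": math.nan,
}


def parse_double(value):
    """Parse a string to float, recognizing Infinity, -Infinity and NaN in any
    letter case; return None for anything else that is not a plain numeral."""
    text = value.strip()
    special = _SPECIALS.get(text.lower())
    if special is not None:
        return special
    if _DECIMAL.fullmatch(text):
        return float(text)
    return None


values = ["1", "infinity", "-INFINITY", "NaN", "nan", "abc"]
print([parse_double(v) for v in values])  # [1.0, inf, -inf, nan, nan, None]

# Same arithmetic as avg() over ('1'), ('infinity') and over ('-infinity'), ('infinity').
a, b = parse_double("1"), parse_double("infinity")
print((a + b) / 2)  # inf
a, b = parse_double("-infinity"), parse_double("infinity")
print((a + b) / 2)  # nan
```

Python's built-in `float()` already accepts these tokens in any letter case (and also `inf`), so the explicit table mainly makes the accepted set visible; with this sketch, a spelling such as "inf" is still rejected.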



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
