[ 
https://issues.apache.org/jira/browse/SPARK-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-18772:
---------------------------------
    Description: 
It looks like we can avoid some unnecessary conversion attempts for special 
float values in JSON.

We could also support some additional spellings of these values, such as 
{{+INF}}, {{INF}} and {{-INF}}.
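
As a rough illustration of the idea (a minimal sketch, not the code in 
JacksonParser), the special tokens could be matched exactly and mapped straight 
to their values, so that no string-to-double conversion is attempted for 
anything else. The object name, the helper name and the exact set of accepted 
spellings are assumptions made for this example only.

```scala
object SpecialFloats {
  // Hypothetical helper: returns the special value for an exact token match,
  // or None so the caller can reject the input without calling toDouble.
  // The accepted spellings below are an assumption for illustration.
  def parseSpecialDouble(s: String): Option[Double] = s match {
    case "NaN" => Some(Double.NaN)
    case "Infinity" | "+Infinity" | "INF" | "+INF" => Some(Double.PositiveInfinity)
    case "-Infinity" | "-INF" => Some(Double.NegativeInfinity)
    case _ => None
  }
}
```

With such a helper, an input like {{"nan"}} is rejected by the match itself 
instead of reaching the number parser and raising a 
{{NumberFormatException}}.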

For an example of a conversion attempt that could be avoided, see the code 
below:

{code}
scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> spark.read.schema(StructType(Seq(StructField("a", DoubleType)))).option("mode", "FAILFAST").json(Seq("""{"a": "nan"}""").toDS).show()
17/05/12 11:30:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.NumberFormatException: For input string: "nan"
...
{code}




  was:
JacksonParser tests for infinite and NaN values in a way that is not supported 
by the underlying float/double parser. For example, the input string is always 
lowercased to check for {{-Infinity}} but the parser only supports titlecased 
values. So a {{-infinitY}} will pass the test but fail with a 
{{NumberFormatException}} when parsing. This exception is not caught anywhere 
and the task ends up failing.
A related issue is that the code checks for {{Inf}} but the parser only 
supports the long form of {{Infinity}}.
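
The mismatch described above can be reproduced outside Spark. The following is 
a minimal sketch under stated assumptions: {{looksSpecial}} is a hypothetical 
re-creation of a case-insensitive check, not the actual JacksonParser code, 
while {{convert}} relies on Scala's {{String.toDouble}}, which delegates to 
{{java.lang.Double.parseDouble}}.

```scala
import java.util.Locale
import scala.util.Try

object LowercaseCheckMismatch {
  // Hypothetical check: the input is lowercased before comparison, so any
  // capitalisation (and the short form "inf") is accepted here.
  def looksSpecial(s: String): Boolean = {
    val lower = s.toLowerCase(Locale.ROOT)
    Set("nan", "infinity", "-infinity", "inf", "-inf").contains(lower)
  }

  // The JVM parser is case sensitive and only knows the long form
  // "Infinity", so other spellings fail with NumberFormatException.
  def convert(s: String): Try[Double] = Try(s.toDouble)
}
```

Here {{looksSpecial("-infinitY")}} and {{looksSpecial("Inf")}} both return 
true, yet {{convert}} fails for both inputs, which is the uncaught exception 
the description refers to.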


> Unnecessary conversion try and some missing cases for special floats in JSON
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-18772
>                 URL: https://issues.apache.org/jira/browse/SPARK-18772
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>            Reporter: Nathan Howell
>            Priority: Minor


