GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/22019

    [SPARK-25040][SQL] Empty string for double and float types should be nulls in JSON

    ## What changes were proposed in this pull request?
    
    This PR proposes to treat empty strings for double and float types as 
`null` consistently. It looks like we missed this corner case; it is probably 
not serious, since the behavior appears to have changed between 1.x and 2.x 
and it is a fairly narrow edge case.
    
    As an easy reproducer, in the double case the code below raises an error:
    
    ```scala
    spark.read.option("mode", "FAILFAST").json(Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS).show()
    ```
    
    ```
    Caused by: java.lang.RuntimeException: Cannot parse  as double.
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7$$anonfun$apply$10.applyOrElse(JacksonParser.scala:163)
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7$$anonfun$apply$10.applyOrElse(JacksonParser.scala:152)
      at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$parseJsonToken(JacksonParser.scala:277)
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7.apply(JacksonParser.scala:152)
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7.apply(JacksonParser.scala:152)
      at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$convertObject(JacksonParser.scala:312)
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1$$anonfun$apply$2.applyOrElse(JacksonParser.scala:71)
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1$$anonfun$apply$2.applyOrElse(JacksonParser.scala:70)
      at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$parseJsonToken(JacksonParser.scala:277)
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1.apply(JacksonParser.scala:70)
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1.apply(JacksonParser.scala:70)
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$parse$2.apply(JacksonParser.scala:368)
      at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$parse$2.apply(JacksonParser.scala:363)
      at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2491)
      at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:363)
      at org.apache.spark.sql.DataFrameReader$$anonfun$5$$anonfun$6.apply(DataFrameReader.scala:450)
      at org.apache.spark.sql.DataFrameReader$$anonfun$5$$anonfun$6.apply(DataFrameReader.scala:450)
      at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:61)
      ... 24 more
    ```
    
    Unlike other numeric types, which already treat the empty string as `null`:
    
    ```scala
    spark.read.option("mode", "FAILFAST").json(Seq("""{"a":"", "b": ""}""", """{"a": 1, "b": 1}""").toDS).show()
    ```
    
    ```
    +----+----+
    |   a|   b|
    +----+----+
    |null|null|
    |   1|   1|
    +----+----+
    ```
    
    ## How was this patch tested?
    
    Unit tests were added, and the change was manually tested.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark double-float-empty

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22019.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22019
    
----
commit ef57fdd5b0a6f7f0b6343c91c6983d20bc67fb5b
Author: hyukjinkwon <gurwls223@...>
Date:   2018-08-07T05:23:43Z

    Empty string for double and float types should be nulls in JSON

----


---
