GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/22019
[SPARK-25040][SQL] Empty string for double and float types should be nulls in JSON

## What changes were proposed in this pull request?

This PR proposes to treat empty strings for double and float types as `null`, consistently with other types. It looks like we mistakenly missed this corner case. I guess it is not that serious, since this behavior appears to have changed between 1.x and 2.x and it is a pretty rare corner case.

For an easy reproducer, in the double case, the code below raises an error:

```scala
spark.read.option("mode", "FAILFAST").json(
  Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS).show()
```

```
Caused by: java.lang.RuntimeException: Cannot parse as double.
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7$$anonfun$apply$10.applyOrElse(JacksonParser.scala:163)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7$$anonfun$apply$10.applyOrElse(JacksonParser.scala:152)
  at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$parseJsonToken(JacksonParser.scala:277)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7.apply(JacksonParser.scala:152)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeConverter$7.apply(JacksonParser.scala:152)
  at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$convertObject(JacksonParser.scala:312)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1$$anonfun$apply$2.applyOrElse(JacksonParser.scala:71)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1$$anonfun$apply$2.applyOrElse(JacksonParser.scala:70)
  at org.apache.spark.sql.catalyst.json.JacksonParser.org$apache$spark$sql$catalyst$json$JacksonParser$$parseJsonToken(JacksonParser.scala:277)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1.apply(JacksonParser.scala:70)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$makeStructRootConverter$1.apply(JacksonParser.scala:70)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$parse$2.apply(JacksonParser.scala:368)
  at org.apache.spark.sql.catalyst.json.JacksonParser$$anonfun$parse$2.apply(JacksonParser.scala:363)
  at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2491)
  at org.apache.spark.sql.catalyst.json.JacksonParser.parse(JacksonParser.scala:363)
  at org.apache.spark.sql.DataFrameReader$$anonfun$5$$anonfun$6.apply(DataFrameReader.scala:450)
  at org.apache.spark.sql.DataFrameReader$$anonfun$5$$anonfun$6.apply(DataFrameReader.scala:450)
  at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:61)
  ... 24 more
```

Unlike other types:

```scala
spark.read.option("mode", "FAILFAST").json(
  Seq("""{"a":"", "b": ""}""", """{"a": 1, "b": 1}""").toDS).show()
```

```
+----+----+
|   a|   b|
+----+----+
|null|null|
|   1|   1|
+----+----+
```

## How was this patch tested?

Unit tests were added, and the change was manually tested.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark double-float-empty

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22019.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22019

----

commit ef57fdd5b0a6f7f0b6343c91c6983d20bc67fb5b
Author: hyukjinkwon <gurwls223@...>
Date: 2018-08-07T05:23:43Z

    Empty string for double and float types should be nulls in JSON

----
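For illustration, the conversion rule this PR proposes (an empty string value for a double field parses to `null` instead of raising an error, any other string is parsed as a number) can be sketched in plain Scala outside Spark. `toDoubleOrNull` below is a hypothetical helper written for this sketch, not part of Spark's `JacksonParser`:

```scala
// Hypothetical sketch of the proposed semantics, independent of Spark:
// an empty JSON string value for a double/float field becomes null;
// any other string is parsed as a double.
object EmptyStringAsNull {
  def toDoubleOrNull(s: String): java.lang.Double =
    if (s.isEmpty) null              // empty string -> null, as this PR proposes
    else java.lang.Double.valueOf(s) // non-empty string -> parsed value

  def main(args: Array[String]): Unit = {
    assert(toDoubleOrNull("") == null)  // previously this case raised an error
    assert(toDoubleOrNull("1.1") == 1.1)
    println("ok")
  }
}
```

The actual fix lives in the double/float converters of `JacksonParser`; the sketch only demonstrates the before/after behavior at the value level.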