Hyukjin Kwon created SPARK-25040:
------------------------------------

             Summary: Empty string for double and float types should be nulls in JSON
                 Key: SPARK-25040
                 URL: https://issues.apache.org/jira/browse/SPARK-25040
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0, 2.4.0
            Reporter: Hyukjin Kwon
The issue appears to be a behaviour change between Spark 1.6 and 2.x in whether an empty string is treated as null for double and float types when reading JSON.

Input file (Sanity4.json):

{code}
{"a":"a1","int":1,"other":4.4}
{"a":"a2","int":"","other":""}
{code}

Code:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val config = new SparkConf().setMaster("local[5]").setAppName("test")
val sc = SparkContext.getOrCreate(config)
val sql = new SQLContext(sc)
val file_path = this.getClass.getClassLoader.getResource("Sanity4.json").getFile
val df = sql.read.json(file_path)  // no user-specified schema; let it be inferred
df.show(30)
{code}

In Spark 1.6 the result is:

{code}
+---+----+-----+
|  a| int|other|
+---+----+-----+
| a1|   1|  4.4|
| a2|null| null|
+---+----+-----+
{code}

{code}
root
 |-- a: string (nullable = true)
 |-- int: long (nullable = true)
 |-- other: double (nullable = true)
{code}

but in Spark 2.2 the result is:

{code}
+----+----+-----+
|   a| int|other|
+----+----+-----+
|  a1|   1|  4.4|
|null|null| null|
+----+----+-----+
{code}

{code}
root
 |-- a: string (nullable = true)
 |-- int: long (nullable = true)
 |-- other: double (nullable = true)
{code}

Another easy reproducer:

{code}
import spark.implicits._  // needed for .toDS on Seq[String]

spark.read.schema("a DOUBLE, b FLOAT")
  .option("mode", "FAILFAST")
  .json(Seq("""{"a":"", "b": ""}""", """{"a": 1.1, "b": 1.1}""").toDS)
{code}
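The semantics the ticket asks for (empty string coerced to null for numeric columns, instead of failing the record or nulling out the whole row) can be illustrated outside Spark. The sketch below is not Spark's parser; it is a minimal Python stand-in, with a hypothetical `coerce_numeric_nulls` helper, showing the Spark 1.6-style behaviour applied to the ticket's sample input:

```python
import json

def coerce_numeric_nulls(record, numeric_fields):
    """Hypothetical helper: treat "" as null for the given numeric fields,
    mirroring the Spark 1.6 behaviour the ticket asks to restore."""
    out = dict(record)
    for field in numeric_fields:
        value = out.get(field)
        if value == "":
            out[field] = None          # empty string -> null, row otherwise intact
        elif isinstance(value, str):
            out[field] = float(value)  # numeric text -> double
    return out

lines = ['{"a":"a1","int":1,"other":4.4}', '{"a":"a2","int":"","other":""}']
rows = [coerce_numeric_nulls(json.loads(line), ["int", "other"]) for line in lines]
```

Under these semantics the second row keeps `a = "a2"` and gets null for `int` and `other`, matching the 1.6 output shown above, rather than the all-null row Spark 2.2 produces.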