Andy Grove created SPARK-38060:
----------------------------------

             Summary: Inconsistent behavior from JSON option allowNonNumericNumbers
                 Key: SPARK-38060
                 URL: https://issues.apache.org/jira/browse/SPARK-38060
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
         Environment: Running Spark 3.2.0 in local mode on Ubuntu 20.04.3 LTS
            Reporter: Andy Grove

The behavior of the JSON option allowNonNumericNumbers is not consistent: NaN and Infinity values are still parsed in some cases when the option is set to false.

h2. Input data

{code:java}
{ "number": "NaN" }
{ "number": NaN }
{ "number": "+INF" }
{ "number": +INF }
{ "number": "-INF" }
{ "number": -INF }
{ "number": "INF" }
{ "number": INF }
{ "number": Infinity }
{ "number": +Infinity }
{ "number": -Infinity }
{ "number": "Infinity" }
{ "number": "+Infinity" }
{ "number": "-Infinity" }
{code}

h2. Setup

{code:java}
import org.apache.spark.sql.types._

val schema = StructType(Seq(StructField("number", DataTypes.FloatType, false)))
{code}

h2. allowNonNumericNumbers = false

{code:java}
val df = spark.read.format("json").schema(schema).option("allowNonNumericNumbers", "false").json("nan_valid.json")

df.show

+---------+
|   number|
+---------+
|      NaN|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
| Infinity|
|     null|
|-Infinity|
+---------+
{code}

h2. allowNonNumericNumbers = true

{code:java}
val df = spark.read.format("json").schema(schema).option("allowNonNumericNumbers", "true").json("nan_valid.json")

df.show

+---------+
|   number|
+---------+
|      NaN|
|      NaN|
|     null|
| Infinity|
|     null|
|-Infinity|
|     null|
|     null|
| Infinity|
| Infinity|
|-Infinity|
| Infinity|
|     null|
|-Infinity|
+---------+
{code}
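
To make the inconsistency easier to see, the sketch below pairs each input line with the values shown in the two tables above. It is plain Scala and does not call Spark; it only transcribes the reported output. It assumes that the rows printed by df.show appear in the same order as the lines of the input file, and the names Observation and Spark38060Observations are hypothetical, introduced only for this illustration.

```scala
// Observations transcribed from the two df.show tables in this report.
// Assumption: output rows are in the same order as the input lines.
// None stands for a null result.
final case class Observation(
    input: String,             // JSON value as written in the input file
    whenFalse: Option[String], // result with allowNonNumericNumbers = false
    whenTrue: Option[String]   // result with allowNonNumericNumbers = true
) {
  // True when the JSON value is a quoted string rather than a bare token.
  def quoted: Boolean = input.startsWith("\"")
}

object Spark38060Observations {
  val observations: Seq[Observation] = Seq(
    Observation("\"NaN\"", Some("NaN"), Some("NaN")),
    Observation("NaN", None, Some("NaN")),
    Observation("\"+INF\"", None, None),
    Observation("+INF", None, Some("Infinity")),
    Observation("\"-INF\"", None, None),
    Observation("-INF", None, Some("-Infinity")),
    Observation("\"INF\"", None, None),
    Observation("INF", None, None),
    Observation("Infinity", None, Some("Infinity")),
    Observation("+Infinity", None, Some("Infinity")),
    Observation("-Infinity", None, Some("-Infinity")),
    Observation("\"Infinity\"", Some("Infinity"), Some("Infinity")),
    Observation("\"+Infinity\"", None, None),
    Observation("\"-Infinity\"", Some("-Infinity"), Some("-Infinity"))
  )

  // Inputs that still produce a non-null value although the option is false.
  val parsedWhenDisabled: Seq[Observation] =
    observations.filter(_.whenFalse.isDefined)

  // Inputs that stay null even though the option is true.
  val rejectedWhenEnabled: Seq[Observation] =
    observations.filter(_.whenTrue.isEmpty)

  def main(args: Array[String]): Unit = {
    println("Parsed although allowNonNumericNumbers = false:")
    parsedWhenDisabled.foreach(o => println(s"  ${o.input} -> ${o.whenFalse.get}"))
    println("Null although allowNonNumericNumbers = true:")
    rejectedWhenEnabled.foreach(o => println(s"  ${o.input}"))
  }
}
```

Under the ordering assumption, the values that are still parsed with the option set to false are exactly the quoted strings "NaN", "Infinity" and "-Infinity". With the option set to true, the quoted strings "+INF", "-INF", "INF" and "+Infinity" and the bare token INF remain null.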