Jork Zijlstra created SPARK-18269:
-------------------------------------

             Summary: NumberFormatException when reading csv for a nullable column
                 Key: SPARK-18269
                 URL: https://issues.apache.org/jira/browse/SPARK-18269
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.1
            Reporter: Jork Zijlstra


A schema with a nullable column throws a 
java.lang.NumberFormatException: null when neither the value nor its delimiter 
is present in the csv line.

Specifying the schema:
import org.apache.spark.sql.types._

val sourceSchema = StructType(Array(
  StructField("id", IntegerType, nullable = false),
  StructField("underlyingId", IntegerType, nullable = true)
))

Data (without a trailing delimiter to mark the second column):
1

Read the data:
sparkSession.read
    .schema(sourceSchema)
    .option("header", "false")
    .option("delimiter", """\t""")
    .csv(files(dates): _*)
    .rdd

Actual Result: 
java.lang.NumberFormatException: null
        at java.lang.Integer.parseInt(Integer.java:542)
        at java.lang.Integer.parseInt(Integer.java:615)
        at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
        at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
        at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:244)

Reason:
The csv line is parsed into a Map (indexSafeTokens), which is short of one 
value. So indexSafeTokens(index) has no token to return for the optional value 
and yields null.

That null is then given to CSVTypeCast.castTo(datum: String, .....) as the 
datum value.
The subsequent NumberFormatException is thrown because a null datum cannot be 
parsed into the Type (the trace ends in Integer.parseInt).
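
The failure mode can be shown without Spark. This is a minimal sketch, assuming 
the missing column reaches the cast as a null String; ParseNullDemo and 
castToInt are hypothetical stand-ins, not Spark code:

```scala
import scala.util.{Failure, Success, Try}

object ParseNullDemo {
  // Hypothetical stand-in for the integer branch of a CSV type cast:
  // it converts the raw token with toInt and performs no null check.
  def castToInt(datum: String): Int = datum.toInt

  def main(args: Array[String]): Unit = {
    // The line "1" yields a single token, so the second column has no token.
    val tokens: Array[String] = "1".split("\t", -1)
    val missing: String = if (tokens.length > 1) tokens(1) else null

    // Parsing the null token fails with a NumberFormatException.
    Try(castToInt(missing)) match {
      case Success(v) => println(s"parsed $v")
      case Failure(e) => println(s"${e.getClass.getName}: ${e.getMessage}")
    }
  }
}
```

The exact exception message depends on the JDK version; on the JDK used in the 
trace above it is "null".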

Possible fix:
- Use the provided schema to parse the line with the correct number of columns
- Since it's nullable, implement a try/catch around indexSafeTokens(index) in 
CSVRelation.csvParser
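
Both ideas can be sketched in plain Scala. This is only an illustration under 
assumptions: NullableCsvSketch, padToSchema and castNullableInt are 
hypothetical names, not Spark's API, and the second helper uses an explicit 
null check in place of the try/catch suggested above:

```scala
object NullableCsvSketch {
  // Idea 1: size the token array from the schema, filling missing
  // trailing columns with null so every field index is valid.
  def padToSchema(tokens: Array[String], fieldCount: Int): Array[String] =
    tokens.padTo(fieldCount, null: String)

  // Idea 2: treat a null token as a missing value when the column is
  // nullable, instead of handing it to toInt.
  def castNullableInt(datum: String, nullable: Boolean): Option[Int] =
    if (datum == null) {
      if (nullable) None
      else throw new IllegalArgumentException("null value in non-nullable column")
    } else {
      Some(datum.toInt)
    }
}
```

With the data above, padToSchema(Array("1"), 2) gives a two-element array whose 
second entry is null, and castNullableInt on that entry with nullable = true 
returns None rather than throwing.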



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
