[jira] [Comment Edited] (SPARK-18269) NumberFormatException when reading csv for a nullable column

Jork Zijlstra (JIRA) Fri, 04 Nov 2016 05:24:08 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636192#comment-15636192
 ]


Jork Zijlstra edited comment on SPARK-18269 at 11/4/16 12:22 PM:
-----------------------------------------------------------------

The error that is thrown is java.lang.NumberFormatException: null. In this case 
null is a null reference (a missing value), not the string "null".
I did try this before submitting this issue, but setting nullValue to "null" 
doesn't work, since the string "null" is not a null reference.

Apparently a null reference can be passed as a parameter of type String.
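
The distinction can be reproduced outside Spark. Below is a minimal sketch that 
assumes only the Scala standard library; the object name NullVsString and the 
describe helper are made up for illustration. It shows that a null reference and 
the string "null" are different values, and that calling toInt on a null String 
raises a NumberFormatException, as in the reported stack trace.

```scala
import scala.util.Try

object NullVsString {
  // A String-typed parameter accepts a null reference without any error.
  def describe(datum: String): String =
    if (datum == null) "null reference" else s"string of length ${datum.length}"

  def main(args: Array[String]): Unit = {
    val missing: String = null   // stands in for a missing CSV token
    val literal: String = "null" // the four-character string

    println(describe(missing))
    println(describe(literal))
    println(missing == literal) // Scala's == is null-safe, so this is false

    // toInt delegates to Integer.parseInt, which rejects a null argument
    // with NumberFormatException rather than NullPointerException.
    val parsed = Try(missing.toInt)
    println(parsed.failed.map(_.getClass.getName).getOrElse("no failure"))
  }
}
```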


> NumberFormatException when reading csv for a nullable column
> ------------------------------------------------------------
>
>                 Key: SPARK-18269
>                 URL: https://issues.apache.org/jira/browse/SPARK-18269
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.1
>            Reporter: Jork Zijlstra
>
> Having a schema with a nullable column throws a 
> java.lang.NumberFormatException: null when the data + delimiter isn't 
> specified in the csv.
> Specifying the schema:
> StructType(Array(
>   StructField("id", IntegerType, nullable = false),
>   StructField("underlyingId", IntegerType, true)
> ))
> Data (without trailing delimiter to specify the second column):
> 1
> Read the data:
> sparkSession.read
>     .schema(sourceSchema)
>     .option("header", "false")
>     .option("delimiter", """\t""")
>     .csv(files(dates): _*)
>     .rdd
> Actual Result: 
> java.lang.NumberFormatException: null
>       at java.lang.Integer.parseInt(Integer.java:542)
>       at java.lang.Integer.parseInt(Integer.java:615)
>       at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
>       at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
>       at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:244)
> Reason:
> The csv line is parsed into a Map (indexSafeTokens), which is short of one 
> value. So indexSafeTokens(index) yields null when reading the optional value, 
> which isn't in the Map.
> That null is then given to CSVTypeCast.castTo(datum: String, .....) as the 
> datum value.
> The subsequent NumberFormatException is thrown because null cannot be 
> parsed into the target type.
> Possible fix:
> - Use the provided schema to parse the line with the correct number of columns
> - Since it's nullable, implement a try/catch around indexSafeTokens(index) in 
> CSVRelation.csvParser
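
Both suggestions can be sketched outside Spark. This is only an illustration 
under stated assumptions, not Spark's actual code: padTokens and castToInt are 
hypothetical helpers, and plain tab splitting stands in for the real CSV parser. 
The first helper pads a short row with nulls up to the schema width; the second 
maps a null datum in a nullable column to None instead of trying to parse it.

```scala
object NullableCsvSketch {
  // Hypothetical helper: pad a short row with nulls up to the schema width
  // and drop any surplus tokens.
  def padTokens(tokens: Array[String], width: Int): Array[String] =
    tokens.take(width).padTo(width, null: String)

  // Hypothetical helper: null-safe integer cast. A null datum becomes None
  // for a nullable column and is rejected for a non-nullable one.
  def castToInt(datum: String, nullable: Boolean): Option[Int] =
    if (datum == null) {
      if (nullable) None
      else throw new IllegalArgumentException("null value for a non-nullable column")
    } else Some(datum.trim.toInt)

  def main(args: Array[String]): Unit = {
    // Nullability of id and underlyingId, as in the schema from the report.
    val nullableFlags = Array(false, true)
    // The single-column data line "1" from the report.
    val tokens = padTokens("1".split("\t"), nullableFlags.length)
    val row = tokens.zip(nullableFlags).map { case (t, n) => castToInt(t, n) }
    println(row.mkString(", "))
  }
}
```

Checking for null before parsing avoids relying on exception handling for the 
expected case of a missing trailing column.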



