Maxim Gekk created SPARK-25387:
----------------------------------

             Summary: Malformed CSV causes NPE
                 Key: SPARK-25387
                 URL: https://issues.apache.org/jira/browse/SPARK-25387
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1
            Reporter: Maxim Gekk
Loading a malformed CSV file or dataset can cause a NullPointerException. For example, the following code (run with an active SparkSession {{spark}}; the imports are needed for the schema types and the {{Dataset[String]}} encoder):

{code:scala}
import org.apache.spark.sql.types._
import spark.implicits._

val schema = StructType(StructField("a", IntegerType) :: Nil)
val input = spark.createDataset(Seq("\u0000\u0000\u0001234"))
spark.read.schema(schema).csv(input).collect()
{code}

crashes with the exception:

{code:java}
Caused by: java.lang.NullPointerException
  at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert(UnivocityParser.scala:219)
  at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:210)
  at org.apache.spark.sql.DataFrameReader$$anonfun$11$$anonfun$12.apply(DataFrameReader.scala:523)
  at org.apache.spark.sql.DataFrameReader$$anonfun$11$$anonfun$12.apply(DataFrameReader.scala:523)
  at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:68)
{code}

If the schema is not specified, the following exception is thrown instead:

{code:java}
java.lang.NullPointerException
  at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
  at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:192)
  at scala.collection.IndexedSeqOptimized$class.zipWithIndex(IndexedSeqOptimized.scala:99)
  at scala.collection.mutable.ArrayOps$ofRef.zipWithIndex(ArrayOps.scala:186)
  at org.apache.spark.sql.execution.datasources.csv.CSVDataSource.makeSafeHeader(CSVDataSource.scala:109)
  at org.apache.spark.sql.execution.datasources.csv.TextInputCSVDataSource$.inferFromDataset(CSVDataSource.scala:247)
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
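Both stack traces are consistent with the underlying tokenizer returning a null token array for a line it cannot parse, which downstream code then dereferences without a guard. The sketch below is a minimal, Spark-free illustration of the defensive pattern a fix could take; the names {{parseTokens}} and {{convertRow}} are hypothetical stand-ins for the tokenize/convert steps, not Spark APIs:

```scala
// Hedged sketch: a toy tokenizer that, like the real one on some malformed
// input, returns null instead of a token array, and a converter that guards
// against that null instead of dereferencing it (which is what raises the NPE).
object SafeCsvConvert {
  // Stand-in for the tokenize step: refuse lines containing control
  // characters by returning null, as happens for the reported input.
  def parseTokens(line: String): Array[String] =
    if (line.exists(_ < ' ')) null else line.split(",", -1)

  // Stand-in for the convert step: wrap the possibly-null token array in
  // Option, so a malformed line yields None rather than a NullPointerException.
  def convertRow(line: String): Option[Seq[String]] =
    Option(parseTokens(line)).map(_.toSeq)
}
```

With this guard, a malformed line such as the {{"\u0000\u0000\u0001234"}} string from the reproduction maps to {{None}} and can be dropped or reported as a bad record, instead of crashing the job.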