[ https://issues.apache.org/jira/browse/SPARK-25387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-25387:
-----------------------------------

    Assignee: Maxim Gekk

> Malformed CSV causes NPE
> ------------------------
>
>                 Key: SPARK-25387
>                 URL: https://issues.apache.org/jira/browse/SPARK-25387
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 2.4.0
>
>
> Loading malformed CSV files or a malformed dataset can cause a NullPointerException. For example, the code:
> {code:scala}
> import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
> import spark.implicits._ // Encoder[String] needed by createDataset
> val schema = StructType(StructField("a", IntegerType) :: Nil)
> val input = spark.createDataset(Seq("\u0000\u0000\u0001234"))
> spark.read.schema(schema).csv(input).collect()
> {code}
> crashes with the exception:
> {code:java}
> Caused by: java.lang.NullPointerException
> 	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.org$apache$spark$sql$execution$datasources$csv$UnivocityParser$$convert(UnivocityParser.scala:219)
> 	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:210)
> 	at org.apache.spark.sql.DataFrameReader$$anonfun$11$$anonfun$12.apply(DataFrameReader.scala:523)
> 	at org.apache.spark.sql.DataFrameReader$$anonfun$11$$anonfun$12.apply(DataFrameReader.scala:523)
> 	at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:68)
> {code}
> If the schema is not specified, the following exception is thrown instead:
> {code:java}
> java.lang.NullPointerException
> 	at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:192)
> 	at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:192)
> 	at scala.collection.IndexedSeqOptimized$class.zipWithIndex(IndexedSeqOptimized.scala:99)
> 	at scala.collection.mutable.ArrayOps$ofRef.zipWithIndex(ArrayOps.scala:186)
> 	at org.apache.spark.sql.execution.datasources.csv.CSVDataSource.makeSafeHeader(CSVDataSource.scala:109)
> 	at org.apache.spark.sql.execution.datasources.csv.TextInputCSVDataSource$.inferFromDataset(CSVDataSource.scala:247)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
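Both traces in the report fail while handling the parsed tokens of a line. The second one explicitly calls `ArrayOps.length` on a null array inside `makeSafeHeader`, and the first fails inside `UnivocityParser.convert`, which plausibly has the same cause. The following is a minimal, Spark-free sketch of a null guard, assuming the tokenizer can return `null` for a malformed line. `NullSafeCsvSketch`, `tokenize`, `parse`, `safeHeader`, `ParseResult`, `Parsed` and `BadRecord` are all hypothetical names, and the rule that a line starting with a NUL character yields `null` is an illustrative assumption, not documented univocity or Spark behavior.

{code:scala}
// Hypothetical, Spark-free model of the null-tokens failure in this issue.
object NullSafeCsvSketch {
  // Hypothetical result type standing in for Spark's bad-record handling.
  sealed trait ParseResult
  final case class Parsed(values: Seq[Option[Int]]) extends ParseResult
  final case class BadRecord(input: String, reason: String) extends ParseResult

  // Hypothetical tokenizer. ASSUMPTION for illustration only: it returns
  // null (instead of an array) for a line starting with a NUL character.
  def tokenize(line: String): Array[String] =
    if (line.startsWith("\u0000")) null else line.split(",", -1)

  // Wrap the possibly-null tokens in Option before any dereference, so a
  // malformed line becomes a BadRecord instead of a NullPointerException.
  def parse(line: String, numFields: Int): ParseResult =
    Option(tokenize(line)) match {
      case None =>
        BadRecord(line, "Malformed CSV record")
      case Some(tokens) if tokens.length != numFields =>
        BadRecord(line, s"expected $numFields fields, got ${tokens.length}")
      case Some(tokens) =>
        // Fields that are not integers become None.
        Parsed(tokens.toSeq.map(t => scala.util.Try(t.trim.toInt).toOption))
    }

  // Header sketch: return None when the first line yields no tokens, rather
  // than calling zipWithIndex on a null array; empty names get positions.
  def safeHeader(firstLine: String): Option[Seq[String]] =
    Option(tokenize(firstLine)).map { tokens =>
      tokens.toSeq.zipWithIndex.map { case (name, i) =>
        if (name.isEmpty) s"_c$i" else name
      }
    }
}
{code}

With this guard, `parse("\u0000\u0000\u0001234", 1)` yields a `BadRecord` rather than throwing, and `safeHeader("a,,c")` yields `Some(Seq("a", "_c1", "c"))`. The design choice is simply to convert the nullable array into an `Option` at the single point where it enters the code, so every later step works on a value that cannot be null.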