Takeshi Yamamuro created SPARK-21024:
----------------------------------------
             Summary: CSV parse mode handles Univocity parser exceptions
                 Key: SPARK-21024
                 URL: https://issues.apache.org/jira/browse/SPARK-21024
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.1
            Reporter: Takeshi Yamamuro
            Priority: Minor


The current master cannot skip illegal records that the Univocity parser throws exceptions on. This came up on the spark-user mailing list:
https://www.mail-archive.com/user@spark.apache.org/msg63985.html

{code}
scala> Seq("0,1", "0,1,2,3").toDF().write.text("/Users/maropu/Desktop/data")
scala> val df = spark.read.format("csv").schema("a int, b int").option("maxColumns", "3").load("/Users/maropu/Desktop/data")
scala> df.show
com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - 3
Hint: Number of columns processed may have exceeded limit of 3 columns. Use settings.setMaxColumns(int) to define the maximum number of columns your input can have
Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
Parser Configuration: CsvParserSettings:
	Auto configuration enabled=true
	Autodetect column delimiter=false
	Autodetect quotes=false
	Column reordering enabled=true
	Empty value=null
	Escape unquoted values=false
	...
	at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:339)
	at com.univocity.parsers.common.AbstractParser.handleEOF(AbstractParser.java:195)
	at com.univocity.parsers.common.AbstractParser.parseLine(AbstractParser.java:544)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:191)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
	at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:60)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
	at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
	at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	...
{code}

We could easily fix this, e.g. like:
https://github.com/apache/spark/compare/master...maropu:HandleExceptionInParser



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
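As a footnote, the intended behavior can be sketched in plain Scala. This is only an illustration of handling a parser exception per parse mode, not Spark's actual internals: `parseLine`, `safeParse`, and the `Mode` type below are hypothetical stand-ins for `UnivocityParser.parse` and the CSV `mode` option.

{code}
// Hypothetical sketch: route parser exceptions through the CSV parse mode
// instead of letting them abort the whole query.
object ParseModeSketch {
  sealed trait Mode
  case object Permissive    extends Mode
  case object DropMalformed extends Mode
  case object FailFast      extends Mode

  // Stand-in for the underlying parser, which may throw on bad input
  // (e.g. TextParsingException when maxColumns is exceeded).
  def parseLine(line: String, maxColumns: Int): Array[String] = {
    val tokens = line.split(",")
    if (tokens.length > maxColumns)
      throw new RuntimeException(s"exceeded limit of $maxColumns columns")
    tokens
  }

  // Catch the parser's exception and decide what to do based on the mode.
  def safeParse(line: String, maxColumns: Int, mode: Mode): Option[Array[String]] =
    try {
      Some(parseLine(line, maxColumns))
    } catch {
      case e: RuntimeException => mode match {
        case Permissive    => Some(Array(line)) // keep the raw record (e.g. in a corrupt-record column)
        case DropMalformed => None              // skip the malformed record
        case FailFast      => throw e           // surface the error immediately
      }
    }
}
{code}

With this shape, the example above under {{DropMalformed}} would simply drop "0,1,2,3" rather than failing the job with a {{TextParsingException}}.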