[ https://issues.apache.org/jira/browse/SPARK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-21024.
----------------------------------
    Resolution: Incomplete

> CSV parse mode handles Univocity parser exceptions
> --------------------------------------------------
>
>                 Key: SPARK-21024
>                 URL: https://issues.apache.org/jira/browse/SPARK-21024
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Takeshi Yamamuro
>            Priority: Minor
>              Labels: bulk-closed
>
> The current master cannot skip illegal records that the Univocity parser rejects.
> This comes from the spark-user mailing list:
> https://www.mail-archive.com/user@spark.apache.org/msg63985.html
> {code}
> scala> Seq("0,1", "0,1,2,3").toDF().write.text("/Users/maropu/Desktop/data")
> scala> val df = spark.read.format("csv").schema("a int, b int").option("maxColumns", "3").load("/Users/maropu/Desktop/data")
> scala> df.show
> com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - 3
> Hint: Number of columns processed may have exceeded limit of 3 columns. Use settings.setMaxColumns(int) to define the maximum number of columns your input can have
> Ensure your configuration is correct, with delimiters, quotes and escape sequences that match the input format you are trying to parse
> Parser Configuration: CsvParserSettings:
>     Auto configuration enabled=true
>     Autodetect column delimiter=false
>     Autodetect quotes=false
>     Column reordering enabled=true
>     Empty value=null
>     Escape unquoted values=false
>     ...
>     at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:339)
>     at com.univocity.parsers.common.AbstractParser.handleEOF(AbstractParser.java:195)
>     at com.univocity.parsers.common.AbstractParser.parseLine(AbstractParser.java:544)
>     at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:191)
>     at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
>     at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
>     at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:60)
>     at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
>     at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
>     at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>     at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>     ...
> {code}
> We could easily fix this like:
> https://github.com/apache/spark/compare/master...maropu:HandleExceptionInParser

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
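The essence of the proposed fix is to catch the parser exception per record so the configured parse mode (PERMISSIVE / DROPMALFORMED / FAILFAST) can decide what happens, instead of letting the TextParsingException escape and kill the query. Below is a minimal, self-contained Scala sketch of that failure-safe pattern; the names (`FailureSafeSketch`, `parseLine`, `parseAll`) are illustrative and are not Spark's actual internal APIs, and the toy `parseLine` stands in for the real Univocity call:

```scala
// Illustrative sketch of per-record exception handling governed by a parse
// mode. Not Spark internals: all names here are hypothetical.
object FailureSafeSketch {
  sealed trait ParseMode
  case object Permissive    extends ParseMode // keep the record, null out its fields
  case object DropMalformed extends ParseMode // silently skip the record
  case object FailFast      extends ParseMode // rethrow immediately

  /** Toy parser: split a CSV line into exactly `numFields` Int fields,
    * throwing when the column limit is exceeded (like the Univocity
    * TextParsingException in the report above). */
  def parseLine(line: String, numFields: Int): Seq[Option[Int]] = {
    val tokens = line.split(",")
    if (tokens.length > numFields)
      throw new RuntimeException(s"too many columns: ${tokens.length}")
    tokens.map(t => Some(t.trim.toInt)).toSeq.padTo(numFields, None)
  }

  /** Failure-safe wrapper: the per-record try/catch is the point of the
    * sketch. The mode decides whether a bad record is null-filled,
    * dropped, or fatal. */
  def parseAll(lines: Seq[String], numFields: Int, mode: ParseMode): Seq[Seq[Option[Int]]] =
    lines.flatMap { line =>
      try Some(parseLine(line, numFields))
      catch {
        case e: RuntimeException => mode match {
          case Permissive    => Some(Seq.fill(numFields)(None))
          case DropMalformed => None
          case FailFast      => throw e
        }
      }
    }
}
```

With the report's input `Seq("0,1", "0,1,2,3")` and two fields, `DropMalformed` yields only the good row, while `Permissive` keeps the bad row as all-null fields, which is the behavior users expected from `spark.read.option("mode", ...)` in the mailing-list thread.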