[ https://issues.apache.org/jira/browse/SPARK-21024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492333#comment-16492333 ]

Takeshi Yamamuro commented on SPARK-21024:
------------------------------------------

No, feel free to take over if someone has good ideas.

> CSV parse mode handles Univocity parser exceptions
> --------------------------------------------------
>
>                 Key: SPARK-21024
>                 URL: https://issues.apache.org/jira/browse/SPARK-21024
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Takeshi Yamamuro
>            Priority: Minor
>
> The current master cannot skip illegal records on which the Univocity parser throws exceptions:
> This comes from the spark-user mailing list:
> https://www.mail-archive.com/user@spark.apache.org/msg63985.html
> {code}
> scala> Seq("0,1", "0,1,2,3").toDF().write.text("/Users/maropu/Desktop/data")
> scala> val df = spark.read.format("csv").schema("a int, b int")
>          .option("maxColumns", "3").load("/Users/maropu/Desktop/data")
> scala> df.show
> com.univocity.parsers.common.TextParsingException: java.lang.ArrayIndexOutOfBoundsException - 3
> Hint: Number of columns processed may have exceeded limit of 3 columns. Use 
> settings.setMaxColumns(int) to define the maximum number of columns your 
> input can have
> Ensure your configuration is correct, with delimiters, quotes and escape 
> sequences that match the input format you are trying to parse
> Parser Configuration: CsvParserSettings:
>         Auto configuration enabled=true
>         Autodetect column delimiter=false
>         Autodetect quotes=false
>         Column reordering enabled=true
>         Empty value=null
>         Escape unquoted values=false
>         ...
> at com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:339)
> at com.univocity.parsers.common.AbstractParser.handleEOF(AbstractParser.java:195)
> at com.univocity.parsers.common.AbstractParser.parseLine(AbstractParser.java:544)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser.parse(UnivocityParser.scala:191)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$5.apply(UnivocityParser.scala:308)
> at org.apache.spark.sql.execution.datasources.FailureSafeParser.parse(FailureSafeParser.scala:60)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
> at org.apache.spark.sql.execution.datasources.csv.UnivocityParser$$anonfun$parseIterator$1.apply(UnivocityParser.scala:312)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> ...
> {code}
> We could easily fix this along these lines:
> https://github.com/apache/spark/compare/master...maropu:HandleExceptionInParser
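> For reference, a hedged sketch of the intended behavior using the documented CSV {{mode}} option (schema and paths reuse the repro above; the post-fix output is an assumption, since today the Univocity exception escapes the parse-mode handling, which is the point of this issue):
> {code}
> scala> // Once the parser exception is caught by the parse-mode logic,
> scala> // DROPMALFORMED should skip the record that overflows maxColumns
> scala> val df = spark.read.format("csv").schema("a int, b int")
>          .option("maxColumns", "3").option("mode", "DROPMALFORMED")
>          .load("/Users/maropu/Desktop/data")
> scala> df.show  // expected to keep only the well-formed record "0,1"
> {code}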



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
