[ https://issues.apache.org/jira/browse/CSV-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17652125#comment-17652125 ]

Damjan Jovanovic commented on CSV-141:
--------------------------------------

This kind of patch:
{code:java}
diff --git a/src/main/java/org/apache/commons/csv/Lexer.java b/src/main/java/org/apache/commons/csv/Lexer.java
index fd60b5ac..177f56d6 100644
--- a/src/main/java/org/apache/commons/csv/Lexer.java
+++ b/src/main/java/org/apache/commons/csv/Lexer.java
@@ -378,9 +378,15 @@ final class Lexer implements Closeable {
                     }
                 }
             } else if (isEndOfFile(c)) {
-                // error condition (end of file before end of token)
-                throw new IOException("(startline " + startLineNumber +
-                        ") EOF reached before encapsulated token finished");
+                if (allowTrailingText) {
+                    token.type = EOF;
+                    token.isReady = true; // There is data at EOF
+                    return token;
+                } else {
+                    // error condition (end of file before end of token)
+                    throw new IOException("(startline " + startLineNumber +
+                            ") EOF reached before encapsulated token finished");
+                }
             } else {
                 // consume character
                 token.content.append((char) c);
{code}
gets the EOF-implicitly-closes-unquoted-field feature to work too, and successfully parses the CSV snippet in the original comment in the same way as Excel.

I am not sure whether this should be activated by the same flag (allowTrailingText) as my PR, or whether it should be a separate setting users can toggle on and off. [~ggregory]?

> Handle malformed CSV files
> --------------------------
>
>                 Key: CSV-141
>                 URL: https://issues.apache.org/jira/browse/CSV-141
>             Project: Commons CSV
>          Issue Type: Wish
>          Components: Parser
>    Affects Versions: 1.0
>            Reporter: Nguyen Minh
>            Priority: Minor
>             Fix For: 1.x
>
>
> My java application has to handle thousands of CSV files uploaded by the
> client phones everyday. So, there some CSV files have the wrong format which
> I'm not sure why.
> Here is my sample CSV. Microsoft Excel parses it correctly, but both Common
> CSV and OpenCSV can't parse it.
> Open CSV can't parse line 2 (due to '\'
> character) and Common CSV will crash on line 3 and 4:
> "1414770317901","android.widget.EditText","pass sem1 _84*|*","0","pass sem1
> _8"
> "1414770318470","android.widget.EditText","pass sem1 _84:*|*","0","pass sem1
> _84:\"
> "1414770318327","android.widget.EditText","pass sem1
> "1414770318628","android.widget.EditText","pass sem1 _84*|*","0","pass sem1
> Line 3: java.io.IOException: (line 5) invalid char between encapsulated token
> and delimiter
>       at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
>       at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
> Line 4: java.io.IOException: (startline 5) EOF reached before encapsulated
> token finished
>       at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
>       at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)