[ https://issues.apache.org/jira/browse/CSV-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492340#comment-17492340 ]
Angus C edited comment on CSV-290 at 2/15/22, 3:52 AM:
-------------------------------------------------------

Basically, the "EOF reached" error always happens when quote-char = escape-char. Given the input string ("a"), Lexer.java treats the second (") as an escape char, reads the unescaped \r, and then complains about the missing ending quote ("):

{code:java}
CSVFormat.Builder.create().setEscape('"').build().parse(new StringReader("\"a\"")).getRecords();
{code}

I think setEscape() is meant for escaping special chars such as \r, \t, etc., as in Lexer.readEscape(), but not the quote-char. The quote-char should always be escaped by the quote-char, not the escape-char.

Your fix disables the escape-char inside a quoted string when it is equal to the quote-char. That can serve as a fail-safe, but I think we should remove the .setEscape(DOUBLE_QUOTE_CHAR) in POSTGRESQL_CSV. The javadoc says "special characters are escaped with quote", but I doubt that this is correct.


> Produced CSV using PostgreSQL format cannot be read
> ---------------------------------------------------
>
>                 Key: CSV-290
>                 URL: https://issues.apache.org/jira/browse/CSV-290
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.6, 1.9.0
>            Reporter: Anatoliy Artemenko
>            Priority: Major
>
> CSV, produced using printer:
>
> CSVPrinter printer = new CSVPrinter(sw, CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>
> cannot be read with the same format parser:
>
> CSVParser parser = new CSVParser(new StringReader(sw.toString()), CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>
> To reproduce:
>
> {code:java}
> StringWriter sw = new StringWriter();
> CSVPrinter printer = new CSVPrinter(sw, CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
> printer.printRecord("column1", "column2");
> printer.printRecord("v11", "v12");
> printer.printRecord("v21", "v22");
> printer.close();
> CSVParser parser = new CSVParser(new StringReader(sw.toString()), CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
> System.out.println("headers: " + Arrays.equals(parser.getHeaderNames().toArray(), new String[] {"column1", "column2"}));
> Iterator<CSVRecord> i = parser.iterator();
> System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new String[] {"v11", "v12"}));
> System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new String[] {"v21", "v22"}));
> {code}
> I'd expect the above code to work, but it fails:
> {code:java}
> java.io.IOException: (startline 1) EOF reached before encapsulated token finished
>     at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:371)
>     at org.apache.commons.csv.Lexer.nextToken(Lexer.java:285)
>     at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:701)
>     at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:480)
>     at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:432)
>     at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:398)
>     at Test.main(Test.java:25)
> {code}
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
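
The failure mode described in the comment can be illustrated with a small self-contained sketch. This is not the Commons CSV Lexer code: the class name QuoteEscapeSketch and the method readQuotedField are hypothetical, and the model only assumes what the comment describes, namely that inside a quoted token the escape-char is checked before the quote-char.

```java
import java.io.IOException;

public class QuoteEscapeSketch {

    /**
     * Toy model (hypothetical, not the real Lexer) of reading one quoted field.
     * Assumption: the escape-char test runs before the quote-char test.
     */
    public static String readQuotedField(String input, char quote, char escape) throws IOException {
        if (input.isEmpty() || input.charAt(0) != quote) {
            throw new IllegalArgumentException("input must start with the quote character");
        }
        StringBuilder content = new StringBuilder();
        int pos = 1; // skip the opening quote
        while (true) {
            if (pos >= input.length()) {
                // No closing quote was recognised before the input ran out.
                throw new IOException("EOF reached before encapsulated token finished");
            }
            char c = input.charAt(pos++);
            if (c == escape) {
                // When escape == quote, the closing quote lands here and the
                // character after it is swallowed as "escaped" content.
                if (pos < input.length()) {
                    content.append(input.charAt(pos++));
                }
            } else if (c == quote) {
                if (pos < input.length() && input.charAt(pos) == quote) {
                    // Doubled quote inside the field: keep one literal quote.
                    content.append(quote);
                    pos++;
                } else {
                    return content.toString(); // closing quote found
                }
            } else {
                content.append(c);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Distinct escape-char: the field is read normally.
        System.out.println(readQuotedField("\"a\"\n", '"', '\\'));
        // escape-char == quote-char: the closing quote is never seen as a quote.
        try {
            readQuotedField("\"a\"\n", '"', '"');
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model, the second call never reaches the quote branch, which is consistent with the suggestion above that the quote-char be escaped only by doubling it rather than through the escape-char.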