[ https://issues.apache.org/jira/browse/CSV-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086343#comment-18086343 ]

Gary D. Gregory commented on CSV-326:
-------------------------------------

Hello [~rickydong]

Thank you for your report.
Feel free to provide a PR on GitHub.

> CSVPrinter Reader printing with quote and escape can emit CSV that its parser cannot read back
> ----------------------------------------------------------------------------------------------
>
>                 Key: CSV-326
>                 URL: https://issues.apache.org/jira/browse/CSV-326
>             Project: Commons CSV
>          Issue Type: Bug
>            Reporter: Ruiqi Dong
>            Priority: Major
>
> *Summary*
> When printing ordinary `CharSequence` values with both a quote character and an
> escape character configured, `CSVFormat#printWithQuotes(Object, CharSequence,
> ...)` escapes both quote characters and escape characters: each one is preceded
> by the escape character.
> The `Reader` path does not do the same. `CSVFormat#printWithQuotes(Reader,
> ...)` only doubles quote characters and leaves escape characters unchanged.
> If the input stream contains an escape character immediately before a quote,
> the same format can no longer parse the generated CSV. For example, the input
> `\"` is written as `"\"""`. The parser reads `\"` as an escaped quote and `""` as a
> doubled quote, then reaches EOF while still inside the quoted token.
>  
> *Affected code*
> File: `src/main/java/org/apache/commons/csv/CSVFormat.java`
> The `CharSequence` path handles both quote and escape:
> {code:java}
> if (c == quoteChar || c == escapeChar) {
>     out.append(charSeq, start, pos);
>     out.append(escapeChar);
>     start = pos;
> } {code}
> The `Reader` path only handles quote:
> {code:java}
> private void printWithQuotes(final Reader reader, final Appendable appendable) throws IOException {
>     if (getQuoteMode() == QuoteMode.NONE) {
>         printWithEscapes(reader, appendable);
>         return;
>     }
>     final char quote = getQuoteCharacter().charValue();
>     append(quote, appendable);
>     int c;
>     while (EOF != (c = reader.read())) {
>         append((char) c, appendable);
>         if (c == quote) {
>             append(quote, appendable);
>         }
>     }
>     append(quote, appendable);
> } {code}
> *Reproducer*
> Add this test to `src/test/java/org/apache/commons/csv/CSVPrinterTest.java`:
> {code:java}
> @Test
> void testPrintReaderWithQuoteAndEscapeRoundTripsEscapeBeforeQuote() throws IOException {
>     final CSVFormat format = CSVFormat.DEFAULT.builder()
>             .setEscape(BACKSLASH)
>             .setQuote('"')
>             .get();
>     final StringWriter sw = new StringWriter();
>     try (CSVPrinter printer = new CSVPrinter(sw, format)) {
>         printer.printRecord(new StringReader("\\\""));
>     }
>     try (CSVParser parser = format.parse(new StringReader(sw.toString()))) {
>         assertEquals("\\\"", parser.getRecords().get(0).get(0));
>     }
> } {code}
> Run:
> {code:bash}
> mvn -q -Dtest=org.apache.commons.csv.CSVPrinterTest#testPrintReaderWithQuoteAndEscapeRoundTripsEscapeBeforeQuote test {code}
> *Observed behavior*
> The parser cannot read the printer's output:
> {code:java}
> java.io.UncheckedIOException: org.apache.commons.csv.CSVException:
> (startline 1) EOF reached before encapsulated token finished {code}
> *Expected behavior*
> Printing a `Reader` value should preserve the same escaping invariants as
> printing the equivalent `String` value. In particular, if an escape character
> is configured, the quoted `Reader` path should escape escape characters just as
> the `CharSequence` path does, so that an escape character in the data cannot
> change how the quote that follows it is parsed.
>  
> This is a semantic mismatch between two printer paths for the same logical
> value. Under the same `CSVFormat`, a streaming `Reader` value should never
> produce CSV that fails to parse when the corresponding in-memory
> `CharSequence` value round-trips correctly.
>  
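
One way the `Reader` path could mirror the `CharSequence` path is to emit the escape character before every quote or escape character it reads, instead of only doubling quotes. Below is a minimal standalone sketch of that rule, not the Commons CSV implementation. The class `ReaderQuotingSketch` and its `printWithQuotes` signature are hypothetical, and the sketch assumes that when no escape character is configured the quote character serves as the escape, which reduces to plain quote doubling.

{code:java}
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical standalone sketch (not the Commons CSV code): quotes the contents of a
 * Reader by preceding every quote or escape character with the escape character.
 */
public final class ReaderQuotingSketch {

    private ReaderQuotingSketch() {
    }

    /**
     * Writes the reader's contents between quote characters.
     *
     * @param escape the escape character, or null to fall back to doubling quotes
     */
    public static void printWithQuotes(final Reader reader, final char quote, final Character escape,
            final Appendable out) throws IOException {
        // Assumption of this sketch: with no escape configured, the quote acts as the
        // escape, so "escape before quote" becomes ordinary quote doubling.
        final char escapeChar = escape != null ? escape.charValue() : quote;
        out.append(quote); // opening quote
        int c;
        while ((c = reader.read()) != -1) {
            if (c == quote || c == escapeChar) {
                out.append(escapeChar); // escape quotes AND escape characters
            }
            out.append((char) c);
        }
        out.append(quote); // closing quote
    }

    public static void main(final String[] args) throws IOException {
        final StringBuilder sb = new StringBuilder();
        // Same input as the reproducer: a backslash followed by a double quote.
        printWithQuotes(new StringReader("\\\""), '"', '\\', sb);
        System.out.println(sb); // prints "\\\""
    }
}
{code}

For the reproducer's input the sketch writes `"\\\""`. Both the backslash and the quote are preceded by an escape, so a format using `\` as escape and `"` as quote should read the token back as `\"`, unlike the `"\"""` that the current `Reader` path emits.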



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
