[
https://issues.apache.org/jira/browse/CSV-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18086343#comment-18086343
]
Gary D. Gregory commented on CSV-326:
-------------------------------------
Hello [~rickydong]
Thank you for your report.
Feel free to provide a PR on GitHub.
> CSVPrinter Reader printing with quote and escape can emit CSV that its parser
> cannot read back
> ----------------------------------------------------------------------------------------------
>
> Key: CSV-326
> URL: https://issues.apache.org/jira/browse/CSV-326
> Project: Commons CSV
> Issue Type: Bug
> Reporter: Ruiqi Dong
> Priority: Major
>
> *Summary*
> When printing normal `CharSequence` values with both a quote character and an
> escape character configured, `CSVFormat#printWithQuotes(Object, CharSequence,
> ...)` escapes both quote characters and escape characters.
> The `Reader` path does not do the same. `CSVFormat#printWithQuotes(Reader,
> ...)` only doubles quote characters and leaves escape characters unchanged.
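> If the input stream contains an escape character immediately before a quote,
> the generated CSV can no longer be parsed by the same format. For example,
> with `\` as the escape and `"` as the quote, the input `\"` is printed as
> `"\"""`: the parser reads `\"` as an escaped quote, then reads the next two
> quotes as one doubled quote, so the intended closing quote is consumed and the
> field never ends.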
>
> *Affected code*
> File: `src/main/java/org/apache/commons/csv/CSVFormat.java`
> The `CharSequence` path handles both quote and escape:
> {code:java}
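> if (c == quoteChar || c == escapeChar) {
>     out.append(charSeq, start, pos); // flush the run before c
>     out.append(escapeChar);          // prefix c with the escape character
>     start = pos;                     // c is written as part of the next run
> } {code}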
> The `Reader` path only handles quote:
> {code:java}
> private void printWithQuotes(final Reader reader, final Appendable appendable) throws IOException {
>     if (getQuoteMode() == QuoteMode.NONE) {
>         printWithEscapes(reader, appendable);
>         return;
>     }
>     final char quote = getQuoteCharacter().charValue();
>     append(quote, appendable);
>     int c;
>     while (EOF != (c = reader.read())) {
>         append((char) c, appendable);
>         if (c == quote) {
>             append(quote, appendable); // quote characters are doubled...
>         }
>         // ...but escape characters are copied through unchanged
>     }
>     append(quote, appendable);
> } {code}
> *Reproducer*
> Add this test to `src/test/java/org/apache/commons/csv/CSVPrinterTest.java`:
> {code:java}
> @Test
> void testPrintReaderWithQuoteAndEscapeRoundTripsEscapeBeforeQuote() throws IOException {
>     final CSVFormat format = CSVFormat.DEFAULT.builder()
>             .setEscape(BACKSLASH)
>             .setQuote('"')
>             .get();
>     final StringWriter sw = new StringWriter();
>     try (CSVPrinter printer = new CSVPrinter(sw, format)) {
>         // the value is the two characters \ followed by "
>         printer.printRecord(new StringReader("\\\""));
>     }
>     try (CSVParser parser = format.parse(new StringReader(sw.toString()))) {
>         assertEquals("\\\"", parser.getRecords().get(0).get(0));
>     }
> } {code}
> Run:
> {code:bash}
> mvn -q -Dtest=org.apache.commons.csv.CSVPrinterTest#testPrintReaderWithQuoteAndEscapeRoundTripsEscapeBeforeQuote test {code}
> *Observed behavior*
> The parser cannot read the printer's output and fails with:
> {code:java}
> java.io.UncheckedIOException: org.apache.commons.csv.CSVException: (startline 1) EOF reached before encapsulated token finished {code}
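A simplified model can make the failure concrete. In this hypothetical sketch, `readerPathQuote` applies the quoted `Reader` loop from the report to a `String`, and `decode` is a cut-down decoder for a single encapsulated field (quote `"`, escape `\`) that treats `""`, `\"` and `\\` as one literal character each. Both names are illustrative; this is not Commons CSV's printer or lexer, which handle more cases.

```java
import java.util.Optional;

// Hypothetical simplified model, not Commons CSV's printer or lexer.
final class EncapsulatedFieldDemo {

    static final char QUOTE = '"';
    static final char ESCAPE = '\\';

    // Same rule as the quoted Reader loop in the report: only quotes are doubled.
    static String readerPathQuote(String value) {
        StringBuilder sb = new StringBuilder().append(QUOTE);
        for (char c : value.toCharArray()) {
            sb.append(c);
            if (c == QUOTE) {
                sb.append(QUOTE);
            }
        }
        return sb.append(QUOTE).toString();
    }

    // Decodes one field that starts with QUOTE; empty if the input ends before the closing quote.
    static Optional<String> decode(String field) {
        StringBuilder content = new StringBuilder();
        int i = 1; // skip the opening quote
        while (i < field.length()) {
            char c = field.charAt(i);
            boolean hasNext = i + 1 < field.length();
            if (c == QUOTE) {
                if (hasNext && field.charAt(i + 1) == QUOTE) {
                    content.append(QUOTE); // "" is one literal quote
                    i += 2;
                } else {
                    return Optional.of(content.toString()); // closing quote
                }
            } else if (c == ESCAPE && hasNext
                    && (field.charAt(i + 1) == QUOTE || field.charAt(i + 1) == ESCAPE)) {
                content.append(field.charAt(i + 1)); // \" or \\ is one literal character
                i += 2;
            } else {
                content.append(c);
                i++;
            }
        }
        return Optional.empty(); // input ended inside the field
    }

    public static void main(String[] args) {
        String encoded = readerPathQuote("\\\""); // the value \"
        System.out.println(encoded);              // "\"""
        System.out.println(decode(encoded));      // Optional.empty
        // CSV text "\\\"" : both the backslash and the quote prefixed with the escape
        System.out.println(decode("\"\\\\\\\"\"")); // Optional[\"]
    }
}
```

Under this model the `Reader` output `"\"""` runs out of input before a closing quote, which matches the `EOF reached before encapsulated token finished` error, while the escape-both form decodes back to `\"`.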
> *Expected behavior*
> Printing a `Reader` value should preserve the same escaping invariants as
> printing the equivalent `String` value. In particular, if an escape character
> is configured, the quoted `Reader` path should not leave escape characters
> unescaped when doing so changes how following quotes are parsed.
>
> This is a semantic mismatch between two printer paths for the same logical
> value. A streaming `Reader` value should not produce CSV that the same
> `CSVFormat` cannot parse when the corresponding in-memory `CharSequence` value
> produces CSV that it can.
>
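Gary's comment invites a PR; a minimal sketch of a quoted `Reader` loop that follows the `CharSequence` rule when an escape is configured, and keeps quote doubling otherwise, could look like this. `QuotedReaderPrinter` and its `printWithQuotes(Reader, Appendable, char, Character)` signature are hypothetical: the real method reads the quote, escape and quote mode from `CSVFormat`, and this sketch does not handle `QuoteMode.NONE`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch, not the Commons CSV API.
final class QuotedReaderPrinter {

    private static final int EOF = -1;

    // Writes the reader's content as one quoted field; escape may be null (none configured).
    static void printWithQuotes(Reader reader, Appendable out, char quote, Character escape)
            throws IOException {
        out.append(quote);
        int c;
        while (EOF != (c = reader.read())) {
            if (escape != null) {
                // Escape both quote and escape characters, like the CharSequence path.
                if (c == quote || c == escape.charValue()) {
                    out.append(escape.charValue());
                }
                out.append((char) c);
            } else {
                // No escape configured: keep the existing quote doubling.
                out.append((char) c);
                if (c == quote) {
                    out.append(quote);
                }
            }
        }
        out.append(quote);
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        printWithQuotes(new StringReader("\\\""), sb, '"', '\\'); // the value \"
        System.out.println(sb); // "\\\""
    }
}
```

With `\` as the escape, the value `\"` is written as `"\\\""`, the escape-both form of the `CharSequence` rule, which a parser using the same format should read back as `\"`.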
--
This message was sent by Atlassian Jira
(v8.20.10#820010)