Ruiqi Dong created CSV-326:
------------------------------
Summary: CSVPrinter Reader printing with quote and escape can emit CSV that its parser cannot read back
Key: CSV-326
URL: https://issues.apache.org/jira/browse/CSV-326
Project: Commons CSV
Issue Type: Bug
Reporter: Ruiqi Dong
*Summary*
When printing plain `CharSequence` values with both a quote character and an escape character configured, `CSVFormat#printWithQuotes(Object, CharSequence, ...)` prefixes both quote characters and escape characters with the escape character.
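The `Reader` path does not do the same: `CSVFormat#printWithQuotes(Reader, ...)` only doubles quote characters and copies escape characters through unchanged. If the input stream contains an escape character immediately before a quote, the generated CSV can no longer be parsed with the same format. For example, with escape `\` and quote `"`, the value `\"` is written as `"\"""`: the parser reads `\"` as an escaped quote and the following `""` as a doubled (literal) quote, so the field is never closed.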
*Affected code*
File: `src/main/java/org/apache/commons/csv/CSVFormat.java`
The `CharSequence` path handles both quote and escape:
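{code:java}
if (c == quoteChar || c == escapeChar) {
    out.append(charSeq, start, pos); // write the segment before the special character
    out.append(escapeChar);          // then the escape character
    start = pos;                     // the special character starts the next segment
}
{code}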
The `Reader` path only handles the quote character:
{code:java}
private void printWithQuotes(final Reader reader, final Appendable appendable) throws IOException {
    if (getQuoteMode() == QuoteMode.NONE) {
        printWithEscapes(reader, appendable);
        return;
    }
    final char quote = getQuoteCharacter().charValue();
    append(quote, appendable);
    int c;
    while (EOF != (c = reader.read())) {
        append((char) c, appendable);
        // Only the quote character is doubled; an escape character is
        // copied through unchanged, even when one is configured.
        if (c == quote) {
            append(quote, appendable);
        }
    }
    append(quote, appendable);
}
{code}
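One possible direction for a fix is to apply the same rule as the `CharSequence` path while streaming from the `Reader`. The sketch below is a standalone class, not a patch against `CSVFormat`; the class name, method names, and signature are hypothetical. When no escape character is configured, it uses the quote itself as the "escape", which reduces the rule to the current quote doubling.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/**
 * Hypothetical sketch (not a patch against CSVFormat): quote a value read
 * from a Reader using the same rule as the CharSequence path.
 */
public class ReaderQuoting {

    static void printWithQuotes(Reader reader, Appendable out, char quote, Character escape)
            throws IOException {
        // With no escape character configured, use the quote itself as the
        // "escape", which turns the rule below into plain quote doubling.
        final char esc = escape != null ? escape : quote;
        out.append(quote);
        int c;
        while ((c = reader.read()) != -1) {
            final char ch = (char) c;
            if (ch == quote || ch == esc) {
                out.append(esc); // prefix quote and escape characters alike
            }
            out.append(ch);
        }
        out.append(quote);
    }

    /** Convenience wrapper for in-memory input; StringReader/StringBuilder do not fail here. */
    static String quoteString(String value, char quote, Character escape) {
        final StringBuilder sb = new StringBuilder();
        try {
            printWithQuotes(new StringReader(value), sb, quote, escape);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteString("\\\"", '"', '\\')); // "\\\""
        System.out.println(quoteString("a\"b", '"', null));  // "a""b"
    }
}
```

Because each output character depends only on the character just read, this keeps the streaming behavior of the `Reader` path and needs no buffering.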
*Reproducer*
Add this test to `src/test/java/org/apache/commons/csv/CSVPrinterTest.java`:
{code:java}
@Test
void testPrintReaderWithQuoteAndEscapeRoundTripsEscapeBeforeQuote() throws IOException {
    final CSVFormat format = CSVFormat.DEFAULT.builder()
            .setEscape(BACKSLASH)
            .setQuote('"')
            .get();
    final StringWriter sw = new StringWriter();
    try (CSVPrinter printer = new CSVPrinter(sw, format)) {
        printer.printRecord(new StringReader("\\\""));
    }
    try (CSVParser parser = format.parse(new StringReader(sw.toString()))) {
        assertEquals("\\\"", parser.getRecords().get(0).get(0));
    }
}
{code}
Run:
{code:bash}
mvn -q -Dtest=org.apache.commons.csv.CSVPrinterTest#testPrintReaderWithQuoteAndEscapeRoundTripsEscapeBeforeQuote test
{code}
*Observed behavior*
The parser cannot read back the printer's output:
{code:java}
java.io.UncheckedIOException: org.apache.commons.csv.CSVException: (startline 1) EOF reached before encapsulated token finished
{code}
*Expected behavior*
Printing a `Reader` value should preserve the same escaping invariants as printing the equivalent `String` value. In particular, when an escape character is configured, the quoted `Reader` path should escape escape characters, as the `CharSequence` path does, so that an escape character in the value cannot change how a following quote is parsed.
This is a semantic mismatch between two printer paths for the same logical value. A streaming `Reader` value should not produce CSV that fails to round-trip when the corresponding in-memory `CharSequence` value round-trips under the same `CSVFormat`.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)