Ruiqi Dong created CSV-327:
------------------------------
Summary: CSVParser applies maxRows to record numbers instead of
rows produced when setRecordNumber(...) is used
Key: CSV-327
URL: https://issues.apache.org/jira/browse/CSV-327
Project: Commons CSV
Issue Type: Bug
Reporter: Ruiqi Dong
*Summary*
`CSVParser.Builder#setRecordNumber(...)` sets the next record number to assign.
This is useful when parsing resumes from the middle of a larger source.
When `CSVFormat.Builder#setMaxRows(...)` is also configured, the iterator
checks `format.useRow(recordNumber + 1)` before parsing the next row. If the
resumed record number is greater than `maxRows`, the parser returns no rows at
all, even though it has not produced any row in this parser instance.
*Affected code*
File: `src/main/java/org/apache/commons/csv/CSVParser.java`
{code:java}
private CSVRecord getNextRecord() {
CSVRecord record = null;
if (format.useRow(recordNumber + 1)) {
record = Uncheck.get(CSVParser.this::nextRecord);
}
return record;
} {code}
File: `src/main/java/org/apache/commons/csv/CSVFormat.java`
{code:java}
boolean useRow(final long rowNum) {
return !useMaxRows() || rowNum <= getMaxRows();
} {code}
The check uses the assigned record number as if it were a count of rows
produced by the current parser.
*Reproducer*
Add this test to `src/test/java/org/apache/commons/csv/CSVParserTest.java`
{code:java}
@Test
void testGetRecordsMaxRowsWithRecordNumberOffset() throws IOException {
try (CSVParser parser = CSVParser.builder()
.setReader(new StringReader("a,b\nc,d\n"))
.setFormat(CSVFormat.DEFAULT.builder().setMaxRows(1).get())
.setRecordNumber(2)
.get()) {
final List<CSVRecord> records = parser.getRecords();
assertEquals(1, records.size());
assertEquals(2, records.get(0).getRecordNumber());
assertValuesEquals(new String[] { "a", "b" }, records.get(0));
}
} {code}
Run:
{code:java}
mvn -q
-Dtest=org.apache.commons.csv.CSVParserTest#testGetRecordsMaxRowsWithRecordNumberOffset
test {code}
Observed behavior:
The test fails
{code:java}
expected: <1> but was: <0> {code}
*Expected behavior:*
`maxRows` should limit how many rows this parser/iterator produces. It should
not reject the first row solely because the caller configured the next assigned
record number to start at `2`.
`setRecordNumber(...)` is record-number metadata. `setMaxRows(...)` is a
row-production limit. Coupling these two values breaks resumed parsing: a
caller cannot both preserve original record numbers and limit the number of
rows returned.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)