OldTruckDriver opened a new pull request, #613: URL: https://github.com/apache/commons-csv/pull/613
[CSV-329] Fix byte tracking for supplementary delimiters

CSVParser with trackBytes enabled could throw CharacterCodingException when a multi-character delimiter contained a supplementary Unicode character. The failure happened while delimiter lookahead read a surrogate pair through ExtendedBufferedReader.read(char[]).

This change updates ExtendedBufferedReader byte-length accounting for char-buffer reads so surrogate pairs are evaluated with the correct previous character before lastChar is updated. This lets byte tracking remain metadata-only and not change parser correctness.

Tests cover trackBytes=true with a multi-character delimiter containing an emoji, including byte-position tracking across records.

Tests run:
- mvn -q -Dtest=org.apache.commons.csv.CSVParserTest#testGetBytePositionMultiCharacterDelimiterWithSupplementaryCharacter test
- mvn -q -Dtest=org.apache.commons.csv.CSVParserTest,org.apache.commons.csv.ExtendedBufferedReaderTest test
- mvn -q

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
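The byte-length accounting described in the pull request can be illustrated with a small standalone sketch. This is not the code from the PR or from ExtendedBufferedReader: the class SurrogateByteCounter and its methods account and encodedLength are hypothetical names, and UTF-8 is an assumed charset. It only shows the ordering the description relies on, namely that each char in a buffer is measured against the previous char before the last-char state is updated, so a surrogate pair is counted once, on its low surrogate, even when the pair is split across two reads.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names and structure are assumptions, not the library's code.
public class SurrogateByteCounter {
    private static final int UNDEFINED = -2; // no char seen yet

    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private int lastChar = UNDEFINED;
    private long bytesRead;

    // Encoded length of 'current', given the char that preceded it.
    private int encodedLength(char current, int previous) throws CharacterCodingException {
        if (!Character.isSurrogate(current)) {
            return encoder.encode(CharBuffer.wrap(new char[] {current})).remaining();
        }
        if (Character.isHighSurrogate(current)) {
            return 0; // counted together with the low surrogate that follows
        }
        if (previous >= 0 && Character.isHighSurrogate((char) previous)) {
            char[] pair = {(char) previous, current};
            return encoder.encode(CharBuffer.wrap(pair)).remaining();
        }
        throw new CharacterCodingException(); // low surrogate without a preceding high one
    }

    // Accounts for a char-buffer read: measure each char first, then update lastChar.
    public void account(char[] buf, int offset, int length) throws CharacterCodingException {
        for (int i = offset; i < offset + length; i++) {
            bytesRead += encodedLength(buf[i], lastChar);
            lastChar = buf[i];
        }
    }

    public long getBytesRead() {
        return bytesRead;
    }

    public static void main(String[] args) throws CharacterCodingException {
        SurrogateByteCounter counter = new SurrogateByteCounter();
        char[] chars = "a|\uD83D\uDE00|b".toCharArray(); // emoji between two pipes
        counter.account(chars, 0, 3);                // first read ends on the high surrogate
        counter.account(chars, 3, chars.length - 3); // second read starts on the low surrogate
        System.out.println(counter.getBytesRead());  // prints 8
    }
}
```

If lastChar were overwritten for the whole buffer before the lengths were computed, the low surrogate would be compared against the wrong predecessor and the sketch would take the CharacterCodingException branch, which is the kind of failure the description reports.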
