OldTruckDriver opened a new pull request, #613:
URL: https://github.com/apache/commons-csv/pull/613

   [CSV-329] Fix byte tracking for supplementary delimiters
   
   CSVParser with trackBytes enabled could throw CharacterCodingException when 
a multi-character delimiter contained a supplementary Unicode character. The 
failure happened while delimiter lookahead read a surrogate pair through 
ExtendedBufferedReader.read(char[]).
   
   This change updates the byte-length accounting in ExtendedBufferedReader for 
char-buffer reads so that each half of a surrogate pair is evaluated against 
the correct preceding character before lastChar is updated. Byte tracking 
therefore stays metadata-only and does not affect parser correctness.
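
   To illustrate why the preceding character matters, here is a minimal sketch 
of per-character byte counting over a char buffer. It is not the code from this 
pull request: the class SurrogateByteCount, the method encodedLength, and the 
NONE sentinel are hypothetical names, and UTF-8 is assumed as the charset. A 
low surrogate can only be encoded together with the high surrogate before it, 
which may have arrived in an earlier read.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the ExtendedBufferedReader implementation.
final class SurrogateByteCount {

    /** Sentinel for "no character read yet" (an assumption of this sketch). */
    static final int NONE = -1;

    /**
     * Counts the UTF-8 bytes for buf[offset .. offset + length), given the
     * character that was read immediately before this buffer (or NONE).
     */
    static long encodedLength(int lastChar, char[] buf, int offset, int length)
            throws CharacterCodingException {
        // A new encoder reports malformed input by throwing, which is the default.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        long bytes = 0;
        int previous = lastChar;
        for (int i = offset; i < offset + length; i++) {
            char current = buf[i];
            if (Character.isHighSurrogate(current)) {
                // Defer: the pair is counted when its low surrogate arrives.
                // (An unpaired high surrogate is silently skipped in this sketch.)
            } else if (Character.isLowSurrogate(current)) {
                // The pair must be completed with the character BEFORE this one,
                // so 'previous' may only be advanced after this check.
                if (previous == NONE || !Character.isHighSurrogate((char) previous)) {
                    throw new CharacterCodingException();
                }
                char[] pair = {(char) previous, current};
                bytes += encoder.encode(CharBuffer.wrap(pair)).remaining();
            } else {
                bytes += encoder.encode(CharBuffer.wrap(new char[] {current})).remaining();
            }
            previous = current;
        }
        return bytes;
    }
}
```

   In this sketch, a pair split across two reads is counted correctly only if 
the second call receives the high surrogate as lastChar. Passing any other 
preceding character makes the low surrogate look unpaired and raises 
CharacterCodingException.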
   
   Tests cover trackBytes=true with a multi-character delimiter containing an 
emoji, including byte-position tracking across records.
   
   Tests run:
   - mvn -q -Dtest=org.apache.commons.csv.CSVParserTest#testGetBytePositionMultiCharacterDelimiterWithSupplementaryCharacter test
   - mvn -q -Dtest=org.apache.commons.csv.CSVParserTest,org.apache.commons.csv.ExtendedBufferedReaderTest test
   - mvn -q


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
