Barry M. Caceres created CSV-296:
------------------------------------

Summary: Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
Key: CSV-296
URL: https://issues.apache.org/jira/browse/CSV-296
Project: Commons CSV
Issue Type: Bug
Components: Parser
Affects Versions: 1.9.0, 1.8
Environment:
+*macOS*:+
{code}
> uname -a
Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64
{code}
{code}
> java -version
openjdk version "11.0.14" 2022-01-18
OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode)
{code}
+*Linux*:+
{code}
> uname -a
Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
{code}
{code}
> java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)
{code}
Reporter: Barry M. Caceres
Attachments: csvfail.zip
I have my CSVFormat initialized with *{{withTrim(true)}}* set:
{code:java}
CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
        .withIgnoreEmptyLines(true)
        .withTrim(true);
{code}
However, a quoted value that is separated from the preceding delimiter by whitespace is not parsed correctly. For example:
{code}
GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
"Joe", "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
"John","Doe", "201 First Street; Las Vegas, NV 89102", "702-555-1313"
"Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
{code}
Notice the whitespace preceding *{{"Schmoe"}}* on the first record? It causes the parsed value to retain the quotation marks instead of having them stripped.

The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, NV 89102"}}*{color} on the second record causes it to be parsed as two values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 89102"}}{*}.

The third record is the only one that parses as expected.

I believe this is because trimming is applied *after* the value has been parsed, rather than the whitespace following the delimiter being consumed during parsing. Either that, or the check for a quoted string happens *before* the whitespace is consumed.

*NOTE:* I have attached a ZIP file (csvfail.zip) that reproduces the problem with the CSV file given above.
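The suspected ordering problem can be illustrated with a deliberately simplified tokenizer. This is a hypothetical sketch: {{TrimOrderDemo}} and its {{split}} method are invented for illustration and are *not* the Commons CSV {{Lexer}}. It assumes comma delimiters, double-quote quoting, plain spaces as the only whitespace, and single-line records. With {{skipLeadingSpace}} set to false, the quote check sees the raw first character of a field and {{trim()}} runs only afterwards (the order suspected above); with it set to true, the spaces after the delimiter are consumed before the quote check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy CSV line splitter for illustration; NOT the Commons CSV Lexer. */
public class TrimOrderDemo {

    /**
     * Splits one line on commas, honoring double-quoted fields ("" escapes a quote).
     * skipLeadingSpace == false: the quote check looks at the raw first character of
     *   the field, and trim() runs only after the field is parsed (the suspected order).
     * skipLeadingSpace == true: spaces after the delimiter are consumed before the
     *   quote check, so ', "..."' still opens a quoted field.
     */
    public static List<String> split(String line, boolean skipLeadingSpace) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (true) {
            if (skipLeadingSpace) {
                while (i < n && line.charAt(i) == ' ') {
                    i++;
                }
            }
            StringBuilder sb = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                while (i < n) {
                    char c = line.charAt(i++);
                    if (c != '"') {
                        sb.append(c);
                    } else if (i < n && line.charAt(i) == '"') {
                        sb.append('"'); // escaped quote ("")
                        i++;
                    } else {
                        break; // closing quote
                    }
                }
                // Simplification: discard anything between the closing quote and the delimiter.
                while (i < n && line.charAt(i) != ',') {
                    i++;
                }
            } else {
                // Unquoted field: read up to the next comma; quote characters are kept as text.
                while (i < n && line.charAt(i) != ',') {
                    sb.append(line.charAt(i++));
                }
            }
            fields.add(sb.toString().trim()); // trimming happens after the field is parsed
            if (i >= n) {
                break;
            }
            i++; // consume the delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        String r1 = "\"Joe\", \"Schmoe\",\"101 Main Street; Las Vegas, NV 89101\",\"702-555-1212\"";
        String r2 = "\"John\",\"Doe\", \"201 First Street; Las Vegas, NV 89102\", \"702-555-1313\"";
        System.out.println(split(r1, false)); // SURNAME keeps its quotes
        System.out.println(split(r2, false)); // ADDRESS is split into two values
        System.out.println(split(r2, true));  // four fields, quotes stripped
    }
}
```

In the trim-after-parse mode the toy reproduces both symptoms from the report: {{"Schmoe"}} keeps its quotes, and the second record's address splits at the embedded comma. For comparison, Commons CSV's {{CSVFormat}} also has a separate {{ignoreSurroundingSpaces}} option ({{withIgnoreSurroundingSpaces(true)}}) alongside {{trim}}. Checking how the attached sample behaves with that option enabled may help confirm or rule out the ordering explanation; this sketch does not exercise the library itself.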