[ https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gary D. Gregory closed CSV-296.
-------------------------------
    Fix Version/s: 1.9.0
       Resolution: Fixed

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --------------------------------------------------------------------------
>
>                 Key: CSV-296
>                 URL: https://issues.apache.org/jira/browse/CSV-296
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.8
>         Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64
> {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode)
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
> {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode)
> {code}
>            Reporter: Barry M. Caceres
>            Priority: Major
>             Fix For: 1.9.0
>
>         Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>     .withIgnoreEmptyLines(true).withTrim(true);
> {code}
>
> However, a quoted string that begins after a delimiter followed by
> whitespace is not properly parsed. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe", "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe", "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>
> * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on
> the first record. This leads to the actual value containing the quotation
> marks instead of them being stripped off.
> * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas,
> NV 89102"}}*{color} on the second record leads to it being parsed as two
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and
> {*}{{NV 89102"}}{*}.
> * The third record is the only one that parses as expected.
>
> I believe that this is because the trimming is done *after* the value is
> parsed rather than the whitespace following the delimiter being consumed
> during parsing. Either that, or the check for a quoted string occurs
> *before* the whitespace is consumed.
>
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with
> the CSV file given above.
> To build the attached project, use Apache Maven and then execute it using
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
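
The reporter's hypothesis (the quoted-string check running before the whitespace after a delimiter is consumed, with trimming applied only afterwards) can be illustrated with a toy field splitter. This is only a sketch under that assumption: it is not Commons CSV's lexer, the names `TrimOrderSketch`, `split` and `skipLeadingWhitespace` are hypothetical, and escaped quotes are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class TrimOrderSketch {

    /**
     * Splits a single comma-delimited line into fields.
     * When skipLeadingWhitespace is false, the quote check happens before any
     * whitespace is consumed and trimming is applied only after the value is read.
     * When it is true, spaces after a delimiter are consumed before the quote check.
     */
    public static List<String> split(String line, boolean skipLeadingWhitespace) {
        List<String> fields = new ArrayList<>();
        final int n = line.length();
        int i = 0;
        while (true) {
            if (skipLeadingWhitespace) {
                while (i < n && line.charAt(i) == ' ') {
                    i++;
                }
            }
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                // Quoted field: delimiters inside the quotes are data.
                i++;
                while (i < n && line.charAt(i) != '"') {
                    field.append(line.charAt(i++));
                }
                i++; // step past the closing quote
                while (i < n && line.charAt(i) != ',') {
                    i++; // ignore anything between the closing quote and the delimiter
                }
                fields.add(field.toString());
            } else {
                // Unquoted field: quotes are ordinary characters, commas end the field.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i++));
                }
                fields.add(field.toString().trim()); // trim happens after the value is read
            }
            if (i >= n) {
                break;
            }
            i++; // step past the delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        String first = "\"Joe\", \"Schmoe\",\"101 Main Street; Las Vegas, NV 89101\",\"702-555-1212\"";
        String second = "\"John\",\"Doe\", \"201 First Street; Las Vegas, NV 89102\", \"702-555-1313\"";
        for (String line : new String[] {first, second}) {
            System.out.println("quote check first : " + split(line, false));
            System.out.println("whitespace first  : " + split(line, true));
        }
    }
}
```

With `skipLeadingWhitespace` set to `false`, the sketch keeps the quotation marks on `"Schmoe"` and splits the second record's address at its embedded comma, which matches the two symptoms described in the report. With it set to `true`, both records yield four clean fields.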