[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17608438#comment-17608438
 ] 

Angus C commented on CSV-296:
-----------------------------

Use 

setIgnoreSurroundingSpaces(true)

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --------------------------------------------------------------------------
>
>                 Key: CSV-296
>                 URL: https://issues.apache.org/jira/browse/CSV-296
>             Project: Commons CSV
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: 1.8, 1.9.0
>         Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
> 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) 
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
> x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
> mode){code}
>            Reporter: Barry M. Caceres
>            Priority: Major
>         Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>         .withIgnoreEmptyLines(true).withTrim(true);{code}
>  
> However, a quoted string that begins after a delimiter followed by preceding 
> whitespace is not properly parsed. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>  
>  * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on 
> the first record?  This leads to the actual value containing the quotation 
> marks instead of them being stripped off.
>  * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, 
> NV 89102"}}*{color} on the second record leads to it to being parsed as two 
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
> 89102"}}{*}.
>  * The third record is the only one that parses as expected.
> I believe that this is because the trimming is done *after* the value is 
> being parsed rather than consuming the whitespace following the delimiter 
> during parsing.   Either that, or the check for a quoted string is occurring 
> *before* the whitespace is being consumed.
>  
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with 
> the CSV file given above.
> To build the attached project use Apache Maven and then execute using using 
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to