Barry M. Caceres created CSV-296:
------------------------------------

             Summary: Delimiter followed by Whitespace then by Quotes Failing 
with setTrim(true)
                 Key: CSV-296
                 URL: https://issues.apache.org/jira/browse/CSV-296
             Project: Commons CSV
          Issue Type: Bug
          Components: Parser
    Affects Versions: 1.9.0, 1.8
         Environment: +{*}macOS{*}:+
{code:java}
> uname -a
Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
{code:java}
> java -version
openjdk version "11.0.14" 2022-01-18
OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) {code}
{+}*Linux*{+}:
{code:java}
> uname -a
Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
x86_64 x86_64 x86_64 GNU/Linux {code}
{code:java}
> java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
mode){code}
            Reporter: Barry M. Caceres
         Attachments: csvfail.zip

I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set.

 
{code:java}
CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
        .withIgnoreEmptyLines(true).withTrim(true);{code}
 

 

However, a quoted string that is preceded by whitespace after a delimiter is 
not parsed properly.

For example:

 
{code:java}
GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
"Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
"John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
"Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
{code}
 

Notice the whitespace preceding *{{"Schmoe"}}* on the first record?  It causes 
the parsed value to keep its quotation marks instead of having them stripped 
off.

 

The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, NV 
89102"}}*{color} on the second record causes it to be parsed as two 
values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
89102"}}{*}.

 

The third record is the only one that parses as expected.

 

 

I believe that this is because the trimming is done *after* the value has been 
parsed, rather than the whitespace following the delimiter being consumed 
during parsing.   Either that, or the check for a quoted string occurs *before* 
the whitespace is consumed.
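
To make the second hypothesis concrete, here is a minimal sketch of a hand-rolled tokenizer. It is *not* the Commons CSV lexer, and the names {{TrimQuoteSketch}}, {{parseLine}} and {{skipLeadingWhitespace}} are hypothetical, chosen only for illustration. With the flag off, the quote check runs on the first raw character after the delimiter and trimming is applied to the finished token, which appears to give the same symptoms as described above. With the flag on, whitespace is consumed before the quote check.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer for illustration only; it does not reflect Commons CSV internals.
class TrimQuoteSketch {

    // Hypothetical helper: splits one line on ',' and honours '"' quoting.
    // skipLeadingWhitespace = false: quote check first, trim the token afterwards.
    // skipLeadingWhitespace = true:  consume whitespace first, then check for a quote.
    static List<String> parseLine(String line, boolean skipLeadingWhitespace) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (true) {
            if (skipLeadingWhitespace) {
                while (i < n && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
                    i++;
                }
            }
            StringBuilder token = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"' && i + 1 < n && line.charAt(i + 1) == '"') {
                        token.append('"'); // doubled quote is an escaped quote
                        i += 2;
                    } else if (c == '"') {
                        i++; // closing quote
                        break;
                    } else {
                        token.append(c);
                        i++;
                    }
                }
                // ignore anything between the closing quote and the next delimiter
                while (i < n && line.charAt(i) != ',') {
                    i++;
                }
            } else {
                // unquoted token: read up to the next delimiter, quotes included
                while (i < n && line.charAt(i) != ',') {
                    token.append(line.charAt(i));
                    i++;
                }
            }
            fields.add(token.toString().trim()); // trimming happens after tokenizing
            if (i >= n) {
                break;
            }
            i++; // skip the delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        String first = "\"Joe\",  \"Schmoe\",\"101 Main Street; Las Vegas, NV 89101\",\"702-555-1212\"";
        String second = "\"John\",\"Doe\",  \"201 First Street; Las Vegas, NV 89102\", \"702-555-1313\"";
        for (String line : new String[] {first, second}) {
            System.out.println("quote check first:      " + parseLine(line, false));
            System.out.println("whitespace skipped first: " + parseLine(line, true));
        }
    }
}
```

In this sketch, the first record keeps the quotes around {{Schmoe}} and the second record splits into five values when the quote check runs first, while both records yield four clean values once leading whitespace is skipped before the check. Whether the real parser behaves this way for the same reason is an assumption that would need to be confirmed against the library source.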

 

*NOTE:* I have attached a ZIP file that easily reproduces the problem with the 
CSV file given above.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
