[jira] [Comment Edited] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302
 ] 

Barry M. Caceres edited comment on CSV-296 at 11/1/22 7:35 PM:
---

{{setIgnoreSurroundingSpaces(true)}} only partly works when the text is quoted. 
If the opening quote immediately follows the comma (or other delimiter), the 
field is processed as a CSV quoted string and the value does not include the 
quotes. But if any spaces follow the comma before the opening quote, the quotes 
are kept as part of the value and the text is taken literally.
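Here is a minimal sketch of the behavior this comment expects. It uses plain Java with no Commons CSV dependency. The class `SurroundingSpaceSplitter` and its `split` method are hypothetical, are not part of Commons CSV, and do not describe how its parser is actually written. The key point is ordering: blanks after each delimiter are consumed *before* the opening-quote check, so a quoted field preceded by spaces is still treated as quoted. The sketch handles a single line only. Line breaks embedded inside quotes are out of scope.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Commons CSV code: splits one CSV line and ignores
 * blanks around each field. Leading blanks are consumed BEFORE the opening-quote
 * check, so a quoted field preceded by spaces is still treated as quoted.
 */
class SurroundingSpaceSplitter {

    private static boolean isBlank(char c) {
        return c == ' ' || c == '\t';
    }

    static List<String> split(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (true) {
            // 1. Skip blanks that follow the delimiter (or start the line).
            while (i < n && isBlank(line.charAt(i))) {
                i++;
            }
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                // 2a. Quoted field: read up to the closing quote; "" is an escaped quote.
                i++;
                boolean closed = false;
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"' && i + 1 < n && line.charAt(i + 1) == '"') {
                        field.append('"');
                        i += 2;
                    } else if (c == '"') {
                        i++;
                        closed = true;
                        break;
                    } else {
                        field.append(c);
                        i++;
                    }
                }
                if (!closed) {
                    throw new IllegalArgumentException("Unterminated quoted field");
                }
                // Blanks between the closing quote and the next delimiter are ignored.
                while (i < n && isBlank(line.charAt(i))) {
                    i++;
                }
                if (i < n && line.charAt(i) != delimiter) {
                    throw new IllegalArgumentException(
                            "Unexpected character after closing quote at index " + i);
                }
            } else {
                // 2b. Unquoted field: read up to the delimiter, then drop trailing blanks.
                while (i < n && line.charAt(i) != delimiter) {
                    field.append(line.charAt(i));
                    i++;
                }
                int end = field.length();
                while (end > 0 && isBlank(field.charAt(end - 1))) {
                    end--;
                }
                field.setLength(end);
            }
            fields.add(field.toString());
            if (i >= n) {
                return fields;
            }
            i++; // 3. Skip the delimiter and move on to the next field.
        }
    }

    public static void main(String[] args) {
        String line = "\"John\",\"Doe\",  \"201 First Street; Las Vegas, NV 89102\", \"702-555-1313\"";
        List<String> fields = split(line, ',');
        System.out.println(fields.size() + " fields: " + fields);
    }
}
```

With this ordering, the second sample record from the issue splits into four fields, and the quotes around the address and the phone number are removed even though each is preceded by spaces.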


was (Author: JIRAUSER289277):
{{setIgnoreSurroundingSpaces(true) only kinda works. If the text is quoted, it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.}}

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --
>
> Key: CSV-296
> URL: https://issues.apache.org/jira/browse/CSV-296
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.8, 1.9.0
> Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) 
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode){code}
>Reporter: Barry M. Caceres
>Priority: Major
> Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>         .withIgnoreEmptyLines(true).withTrim(true);{code}
>  
> However, a quoted string is not parsed correctly when whitespace separates it 
> from the preceding delimiter. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>  
>  * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on 
> the first record?  This causes the parsed value to keep the quotation marks 
> instead of having them stripped off.
>  * The whitespace preceding 
> {color:#0747a6}*{{"201 First Street; Las Vegas, NV 89102"}}*{color} on the 
> second record causes it to be parsed as two values: 
> {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and 
> {color:#0747a6}*{{NV 89102"}}*{color}.
>  * The third record is the only one that parses as expected.
> I believe this is because trimming is applied *after* the value is parsed, 
> rather than the whitespace following the delimiter being consumed during 
> parsing.   Either that, or the check for a quoted string happens *before* 
> the whitespace is consumed.
>  
> *NOTE:* I have attached a ZIP file that reproduces the problem using the CSV 
> file given above.
> To build the attached project, use Apache Maven, and then run it with Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}
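The reporter's two explanations both come down to the order of two steps: consuming the blanks after a delimiter, and checking whether the field starts with a quote. Below is a hypothetical toy model in plain Java that runs both orderings side by side. The class `TrimOrderingDemo` and its `split` method are made up for this illustration. The model does not claim to reflect how the Commons CSV lexer is written, and it leaves out `""` escaping to keep the comparison short. With `consumeBlanksBeforeQuoteCheck` set to `false`, it reproduces the symptoms described for the first two records. With `true`, every sample record parses into four clean fields.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model, not Commons CSV code, of the two orderings discussed
 * in the issue. Quote escaping ("") is deliberately left out.
 */
class TrimOrderingDemo {

    static List<String> split(String line, boolean consumeBlanksBeforeQuoteCheck) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (true) {
            if (consumeBlanksBeforeQuoteCheck) {
                // Ordering B: skip the blanks after the delimiter first...
                while (i < n && line.charAt(i) == ' ') {
                    i++;
                }
            }
            // ...then decide whether the field is quoted from the current character.
            StringBuilder raw = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // opening quote
                while (i < n && line.charAt(i) != '"') {
                    raw.append(line.charAt(i++));
                }
                i++; // closing quote
                // Toy simplification: anything between the closing quote and the comma is dropped.
                while (i < n && line.charAt(i) != ',') {
                    i++;
                }
            } else {
                // Unquoted: take everything literally (quotes included) up to the next comma.
                while (i < n && line.charAt(i) != ',') {
                    raw.append(line.charAt(i++));
                }
            }
            // Trimming happens only after the field has been read.
            fields.add(raw.toString().trim());
            if (i >= n) {
                return fields;
            }
            i++; // skip the comma
        }
    }

    public static void main(String[] args) {
        String line = "\"John\",\"Doe\",  \"201 First Street; Las Vegas, NV 89102\", \"702-555-1313\"";
        System.out.println("quote check, then trim: " + split(line, false));
        System.out.println("consume blanks first:   " + split(line, true));
    }
}
```

In the first ordering, a field that begins with a space never reaches the quoted-string branch. Its quotes are therefore read as literal characters, and any comma inside them ends the field early. Trimming afterwards removes only the spaces.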



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302
 ] 

Barry M. Caceres edited comment on CSV-296 at 11/1/22 7:27 PM:
---

{{setIgnoreSurroundingSpaces(true) only kinda works. If the text is quoted, it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.}}


was (Author: JIRAUSER289277):
{{setIgnoreSurroundingSpaces(true) only kinda works. If the text is quoted, it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.}}



[jira] [Comment Edited] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302
 ] 

Barry M. Caceres edited comment on CSV-296 at 11/1/22 7:27 PM:
---

{{setIgnoreSurroundingSpaces(true) only kinda works. If the text is quoted, it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.}}


was (Author: JIRAUSER289277):
setIgnoreSurroundingSpaces(true) only kinda works. If the text is quoted, it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.
