[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627358#comment-17627358
 ] 

Barry M. Caceres commented on CSV-296:
--

Once I made sure I was using the correct version of my library and upgraded to 
version v1.9.0 of Commons CSV, your suggestion worked as expected.

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --
>
> Key: CSV-296
> URL: https://issues.apache.org/jira/browse/CSV-296
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.8, 1.9.0
> Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
> 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) 
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
> x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
> mode){code}
>Reporter: Barry M. Caceres
>Priority: Major
> Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>         .withIgnoreEmptyLines(true).withTrim(true);{code}
>  
> However, a quoted string that begins after a delimiter followed by preceding 
> whitespace is not properly parsed. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>  
>  * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on 
> the first record?  This leads to the actual value containing the quotation 
> marks instead of them being stripped off.
>  * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, 
> NV 89102"}}*{color} on the second record leads to it to being parsed as two 
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
> 89102"}}{*}.
>  * The third record is the only one that parses as expected.
> I believe that this is because the trimming is done *after* the value is 
> being parsed rather than consuming the whitespace following the delimiter 
> during parsing.   Either that, or the check for a quoted string is occurring 
> *before* the whitespace is being consumed.
>  
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with 
> the CSV file given above.
> To build the attached project use Apache Maven and then execute using using 
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627302#comment-17627302
 ] 

Barry M. Caceres commented on CSV-296:
--

setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --
>
> Key: CSV-296
> URL: https://issues.apache.org/jira/browse/CSV-296
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.8, 1.9.0
> Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
> 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) 
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
> x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
> mode){code}
>Reporter: Barry M. Caceres
>Priority: Major
> Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>         .withIgnoreEmptyLines(true).withTrim(true);{code}
>  
> However, a quoted string that begins after a delimiter followed by preceding 
> whitespace is not properly parsed. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>  
>  * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on 
> the first record?  This leads to the actual value containing the quotation 
> marks instead of them being stripped off.
>  * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, 
> NV 89102"}}*{color} on the second record leads to it to being parsed as two 
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
> 89102"}}{*}.
>  * The third record is the only one that parses as expected.
> I believe that this is because the trimming is done *after* the value is 
> being parsed rather than consuming the whitespace following the delimiter 
> during parsing.   Either that, or the check for a quoted string is occurring 
> *before* the whitespace is being consumed.
>  
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with 
> the CSV file given above.
> To build the attached project use Apache Maven and then execute using using 
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-09-22 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608438#comment-17608438
 ] 

Angus C commented on CSV-296:
-

Use 

setIgnoreSurroundingSpaces(true)

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --
>
> Key: CSV-296
> URL: https://issues.apache.org/jira/browse/CSV-296
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.8, 1.9.0
> Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
> 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) 
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
> x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
> mode){code}
>Reporter: Barry M. Caceres
>Priority: Major
> Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>         .withIgnoreEmptyLines(true).withTrim(true);{code}
>  
> However, a quoted string that begins after a delimiter followed by preceding 
> whitespace is not properly parsed. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>  
>  * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on 
> the first record?  This leads to the actual value containing the quotation 
> marks instead of them being stripped off.
>  * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, 
> NV 89102"}}*{color} on the second record leads to it to being parsed as two 
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
> 89102"}}{*}.
>  * The third record is the only one that parses as expected.
> I believe that this is because the trimming is done *after* the value is 
> being parsed rather than consuming the whitespace following the delimiter 
> during parsing.   Either that, or the check for a quoted string is occurring 
> *before* the whitespace is being consumed.
>  
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with 
> the CSV file given above.
> To build the attached project use Apache Maven and then execute using using 
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-05-23 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541096#comment-17541096
 ] 

Barry M. Caceres commented on CSV-296:
--

Exactly my assessment – the trimming is only affecting the values after they 
have been parsed.  However, there is no way to ignore white space in the 
parsing phase.  If the "trim" was implemented in the parsing phase it would 
likely handle the problem I am dealing with it, but might not deal with quoted 
strings properly.

 

How do I get whitespace after the delimiter ignored during parsing?

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --
>
> Key: CSV-296
> URL: https://issues.apache.org/jira/browse/CSV-296
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.8, 1.9.0
> Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
> 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) 
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
> x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
> mode){code}
>Reporter: Barry M. Caceres
>Priority: Major
> Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>         .withIgnoreEmptyLines(true).withTrim(true);{code}
>  
> However, a quoted string that begins after a delimiter followed by preceding 
> whitespace is not properly parsed. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>  
>  * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on 
> the first record?  This leads to the actual value containing the quotation 
> marks instead of them being stripped off.
>  * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, 
> NV 89102"}}*{color} on the second record leads to it to being parsed as two 
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
> 89102"}}{*}.
>  * The third record is the only one that parses as expected.
> I believe that this is because the trimming is done *after* the value is 
> being parsed rather than consuming the whitespace following the delimiter 
> during parsing.   Either that, or the check for a quoted string is occurring 
> *before* the whitespace is being consumed.
>  
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with 
> the CSV file given above.
> To build the attached project use Apache Maven and then execute using using 
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-05-22 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17540582#comment-17540582
 ] 

Gary D. Gregory commented on CSV-296:
-

Trimming is only used on _values_ after parsing such that {{" foo "}} becomes 
{{{}"foo"{}}}. It seems you are dealing with a parsing issue, not a trimming 
issue.

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --
>
> Key: CSV-296
> URL: https://issues.apache.org/jira/browse/CSV-296
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.8, 1.9.0
> Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
> 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) 
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
> x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
> mode){code}
>Reporter: Barry M. Caceres
>Priority: Major
> Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>         .withIgnoreEmptyLines(true).withTrim(true);{code}
>  
> However, a quoted string that begins after a delimiter followed by preceding 
> whitespace is not properly parsed. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>  
>  * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on 
> the first record?  This leads to the actual value containing the quotation 
> marks instead of them being stripped off.
>  * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, 
> NV 89102"}}*{color} on the second record leads to it to being parsed as two 
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
> 89102"}}{*}.
>  * The third record is the only one that parses as expected.
> I believe that this is because the trimming is done *after* the value is 
> being parsed rather than consuming the whitespace following the delimiter 
> during parsing.   Either that, or the check for a quoted string is occurring 
> *before* the whitespace is being consumed.
>  
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with 
> the CSV file given above.
> To build the attached project use Apache Maven and then execute using using 
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)