[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627358#comment-17627358
 ] 

Barry M. Caceres commented on CSV-296:
--

Once I made sure I was using the correct version of my library and upgraded to 
version v1.9.0 of Commons CSV, your suggestion worked as expected.
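
For reference, a configuration along the following lines is presumably what is meant here. This is a sketch under assumptions: the notification does not quote the suggestion itself, and {{ignoreSurroundingSpaces}} is simply the only option named in this reporter's other comment on the issue. The builder calls are the Commons CSV 1.9.0 {{CSVFormat.Builder}} counterparts of the {{withXxx}} calls in the issue description.

```java
// Fragment only; assumes commons-csv 1.9.0 is on the classpath.
import org.apache.commons.csv.CSVFormat;

CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
        .setHeader()                       // no names given: header is taken from the first record
        .setSkipHeaderRecord(true)         // do not return the header row as a data record
        .setIgnoreEmptyLines(true)
        .setIgnoreSurroundingSpaces(true)  // assumption: the option the suggestion referred to
        .setTrim(true)
        .build();
```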

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --
>
> Key: CSV-296
> URL: https://issues.apache.org/jira/browse/CSV-296
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.8, 1.9.0
> Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
> 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) 
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
> x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
> mode){code}
>Reporter: Barry M. Caceres
>Priority: Major
> Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>         .withIgnoreEmptyLines(true).withTrim(true);{code}
>  
> However, a quoted string that is preceded by whitespace after a delimiter is 
> not parsed properly. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>  
>  * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on 
> the first record?  This leads to the actual value containing the quotation 
> marks instead of them being stripped off.
>  * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, 
> NV 89102"}}*{color} on the second record causes it to be parsed as two 
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
> 89102"}}{*}.
>  * The third record is the only one that parses as expected.
> I believe that this is because the trimming is done *after* the value has 
> been parsed, rather than the whitespace following the delimiter being 
> consumed during parsing.  Either that, or the check for a quoted string 
> occurs *before* the whitespace is consumed.
>  
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with 
> the CSV file given above.
> To build the attached project, use Apache Maven and then run it with 
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ https://issues.apache.org/jira/browse/CSV-296 ]


Barry M. Caceres deleted comment on CSV-296:
--

was (Author: JIRAUSER289277):
{{setIgnoreSurroundingSpaces(true)}} only kinda works.  If the text is quoted 
it maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.



[jira] [Comment Edited] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302
 ] 

Barry M. Caceres edited comment on CSV-296 at 11/1/22 7:35 PM:
---

{{setIgnoreSurroundingSpaces(true)}} only kinda works.  If the text is quoted 
it maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.


was (Author: JIRAUSER289277):
{{setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.}}



[jira] [Comment Edited] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302
 ] 

Barry M. Caceres edited comment on CSV-296 at 11/1/22 7:27 PM:
---

{{setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.}}


was (Author: JIRAUSER289277):
{{setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.}}



[jira] [Comment Edited] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302
 ] 

Barry M. Caceres edited comment on CSV-296 at 11/1/22 7:27 PM:
---

{{setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.}}


was (Author: JIRAUSER289277):
setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.



[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-11-01 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302
 ] 

Barry M. Caceres commented on CSV-296:
--

setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it 
maintains the quotes as part of the text, whereas if the quotes immediately 
follow the comma (or other separator) then the text is processed as a CSV 
quoted string and the value does not have the quotes.  But if any spaces follow 
the comma before the first quote then the quotes are included as part of the 
value and the text is taken literally.



[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-05-23 Thread Barry M. Caceres (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541096#comment-17541096
 ] 

Barry M. Caceres commented on CSV-296:
--

Exactly my assessment – the trimming only affects the values after they 
have been parsed.  However, there is no way to ignore whitespace in the 
parsing phase.  If the "trim" were implemented in the parsing phase it would 
likely handle the problem I am dealing with, but might not deal with quoted 
strings properly.

 

How do I get whitespace after the delimiter ignored during parsing?
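
The distinction raised here, trimming a value after it has been parsed versus skipping whitespace before the quote check, can be shown with a small stand-alone tokenizer. The following is only an illustrative sketch: the class {{LeadingSpaceSketch}}, the {{split}} method and the {{skipLeadingSpaces}} flag are hypothetical names, it uses only the JDK, and it is not the Commons CSV lexer.

```java
import java.util.ArrayList;
import java.util.List;

class LeadingSpaceSketch {

    /**
     * Splits one comma-separated line into fields (hypothetical helper).
     * With skipLeadingSpaces == true, spaces after a delimiter are consumed
     * BEFORE the quote check. With false, the quote check sees the space
     * first, so the field is read as unquoted text and only trimmed afterwards.
     */
    static List<String> split(String line, boolean skipLeadingSpaces) {
        List<String> fields = new ArrayList<>();
        final int n = line.length();
        int i = 0;
        while (i <= n) {
            if (skipLeadingSpaces) {
                while (i < n && line.charAt(i) == ' ') {
                    i++;
                }
            }
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                // Quoted field: read to the closing quote; "" is an escaped quote.
                i++;
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"') {
                        if (i + 1 < n && line.charAt(i + 1) == '"') {
                            field.append('"');
                            i += 2;
                            continue;
                        }
                        i++; // consume the closing quote
                        break;
                    }
                    field.append(c);
                    i++;
                }
                // Ignore anything between the closing quote and the next delimiter.
                while (i < n && line.charAt(i) != ',') {
                    i++;
                }
                fields.add(field.toString());
            } else {
                // Unquoted field: read to the next delimiter, then trim.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i));
                    i++;
                }
                fields.add(field.toString().trim());
            }
            i++; // step over the delimiter, or past the end of the line
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"John\",\"Doe\",  \"201 First Street; Las Vegas, NV 89102\", \"702-555-1313\"";
        System.out.println(String.join(" | ", split(line, false)));
        System.out.println(String.join(" | ", split(line, true)));
    }
}
```

Run against the second record of the sample file, the trim-after-parse mode yields five fields, with the address split at its embedded comma and the quotation marks retained, which matches the behaviour described in the report. The skip-first mode yields the four expected values.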




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-05-10 Thread Barry M. Caceres (Jira)


 [ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barry M. Caceres updated CSV-296:
-
Description: 
I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
{_}(see attached ZIP file){_}:
{code:java}
CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
        .withIgnoreEmptyLines(true).withTrim(true);{code}
 

However, a quoted string that is preceded by whitespace after a delimiter is 
not parsed properly. For example:
{code:java}
GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
"Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
"John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
"Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
{code}
 
 * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on the 
first record?  This leads to the actual value containing the quotation marks 
instead of them being stripped off.
 * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, NV 
89102"}}*{color} on the second record causes it to be parsed as two 
values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
89102"}}{*}.
 * The third record is the only one that parses as expected.

I believe that this is because the trimming is done *after* the value has been 
parsed, rather than the whitespace following the delimiter being consumed 
during parsing.  Either that, or the check for a quoted string occurs *before* 
the whitespace is consumed.

 

*NOTE:* I have attached a ZIP file that easily reproduces the problem with the 
CSV file given above.

To build the attached project, use Apache Maven and then run it with 
Java 11:
{code:java}
> unzip csvfail.zip
> cd csvfail
> mvn package
> java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}

  was:
I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set.

 
{code:java}
CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
        .withIgnoreEmptyLines(true).withTrim(true);{code}
 

 

However, a quoted string that is preceded by whitespace after a delimiter is 
not parsed properly.

For example:

 
{code:java}
GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
"Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
"John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
"Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
{code}
 

Notice the whitespace preceding *{{"Schmoe"}}* on the first record?  This leads 
to the actual value containing the quotation marks instead of them being 
stripped off.

 

The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, NV 
89102"}}*{color} on the second record causes it to be parsed as two 
values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
89102"}}{*}.

 

The third record is the only one that parses as expected.

 

 

I believe that this is because the trimming is done *after* the value has been 
parsed, rather than the whitespace following the delimiter being consumed 
during parsing.  Either that, or the check for a quoted string occurs *before* 
the whitespace is consumed.

 

*NOTE:* I have attached a ZIP file that easily reproduces the problem with the 
CSV file given above.



[jira] [Created] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-05-10 Thread Barry M. Caceres (Jira)
Barry M. Caceres created CSV-296:


 Summary: Delimiter followed by Whitespace then by Quotes Failing 
with setTrim(true)
 Key: CSV-296
 URL: https://issues.apache.org/jira/browse/CSV-296
 Project: Commons CSV
  Issue Type: Bug
  Components: Parser
Affects Versions: 1.9.0, 1.8
 Environment: +{*}macOS{*}:+
{code:java}
> uname -a
Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
{code:java}
> java -version
openjdk version "11.0.14" 2022-01-18
OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) {code}
{+}*Linux*{+}:
{code:java}
> uname -a
Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
x86_64 x86_64 x86_64 GNU/Linux {code}
{code:java}
> java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
mode){code}
Reporter: Barry M. Caceres
 Attachments: csvfail.zip

I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set.

 
{code:java}
CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
        .withIgnoreEmptyLines(true).withTrim(true);{code}
 

 

However, a quoted string that is preceded by whitespace after a delimiter is 
not parsed properly.

For example:

 
{code:java}
GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
"Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
"John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
"Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
{code}
 

Notice the whitespace preceding *{{"Schmoe"}}* on the first record?  This leads 
to the actual value containing the quotation marks instead of them being 
stripped off.

 

The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, NV 
89102"}}*{color} on the second record causes it to be parsed as two 
values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
89102"}}{*}.

 

The third record is the only one that parses as expected.

 

 

I believe that this is because the trimming is done *after* the value has been 
parsed, rather than the whitespace following the delimiter being consumed 
during parsing.  Either that, or the check for a quoted string occurs *before* 
the whitespace is consumed.

 

*NOTE:* I have attached a ZIP file that easily reproduces the problem with the 
CSV file given above.


