[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
[ https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627358#comment-17627358 ]

Barry M. Caceres commented on CSV-296:
--

Once I made sure I was using the correct version of my library and upgraded to version v1.9.0 of Commons CSV, your suggestion worked as expected.

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --
>
> Key: CSV-296
> URL: https://issues.apache.org/jira/browse/CSV-296
> Project: Commons CSV
> Issue Type: Bug
> Components: Parser
> Affects Versions: 1.8, 1.9.0
> Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar
> 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode)
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022
> x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed
> mode){code}
> Reporter: Barry M. Caceres
> Priority: Major
> Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
> .withIgnoreEmptyLines(true).withTrim(true);{code}
>
> However, a quoted string that begins after a delimiter followed by preceding
> whitespace is not properly parsed. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe", "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe", "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>
> * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on
> the first record? This leads to the actual value containing the quotation
> marks instead of them being stripped off.
> * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas,
> NV 89102"}}*{color} on the second record leads to it being parsed as two
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV
> 89102"}}{*}.
> * The third record is the only one that parses as expected.
> I believe that this is because the trimming is done *after* the value is
> being parsed rather than consuming the whitespace following the delimiter
> during parsing. Either that, or the check for a quoted string is occurring
> *before* the whitespace is being consumed.
>
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with
> the CSV file given above.
> To build the attached project use Apache Maven and then execute using
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
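The hypothesis in the quoted description (the quote check running before the whitespace after a delimiter is consumed) can be illustrated with a toy splitter. This is only a sketch under stated assumptions: `QuoteCheckOrderSketch` and `split` are hypothetical names invented for illustration, the code uses nothing but the JDK, and it is not the Commons CSV lexer. It compares "check for a quote first, trim the value afterwards" with "skip spaces first, then check for a quote".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy splitter for illustration only; NOT the Commons CSV lexer. */
class QuoteCheckOrderSketch {

    /**
     * Splits one record on commas, honoring double-quoted values.
     *
     * @param skipLeadingSpaces if true, spaces after a delimiter are consumed
     *        BEFORE the quote check; if false, the quote check runs first and
     *        unquoted values are merely trimmed after they have been read.
     */
    static List<String> split(String record, boolean skipLeadingSpaces) {
        List<String> values = new ArrayList<>();
        final int n = record.length();
        int i = 0;
        while (true) {
            if (skipLeadingSpaces) {
                while (i < n && record.charAt(i) == ' ') {
                    i++;
                }
            }
            StringBuilder value = new StringBuilder();
            if (i < n && record.charAt(i) == '"') {
                i++; // skip the opening quote
                while (i < n) {
                    char c = record.charAt(i);
                    if (c == '"') {
                        if (i + 1 < n && record.charAt(i + 1) == '"') {
                            value.append('"'); // doubled quote is an escaped quote
                            i += 2;
                            continue;
                        }
                        i++; // skip the closing quote
                        break;
                    }
                    value.append(c);
                    i++;
                }
                // Simplification: ignore anything between the closing quote and the delimiter.
                while (i < n && record.charAt(i) != ',') {
                    i++;
                }
                values.add(value.toString());
            } else {
                // Unquoted value: read up to the next comma, then trim (the "trim after parsing" case).
                while (i < n && record.charAt(i) != ',') {
                    value.append(record.charAt(i));
                    i++;
                }
                values.add(value.toString().trim());
            }
            if (i >= n) {
                return values;
            }
            i++; // skip the delimiter
        }
    }

    public static void main(String[] args) {
        String record =
            "\"John\",\"Doe\", \"201 First Street; Las Vegas, NV 89102\", \"702-555-1313\"";
        System.out.println(split(record, false)); // quote check first, trim afterwards
        System.out.println(split(record, true));  // spaces skipped before the quote check
    }
}
```

With `skipLeadingSpaces` set to `false`, the second sample record yields five values and the address is split at its embedded comma with the quotation marks left in place, which matches the symptom reported above. With `true`, it yields the expected four values. Whether the real parser is structured this way is an assumption of the sketch, not something the thread confirms.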
[jira] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
[ https://issues.apache.org/jira/browse/CSV-296 ] Barry M. Caceres deleted comment on CSV-296: -- was (Author: JIRAUSER289277): {{setIgnoreSurroundingSpaces(true) only kinda works If the text is quoted it maintains the quotes as part of the text, whereas if the quotes immediately follow the comma (or other separator) then the text is processed as a CSV quoted string and the value does not have the quotes. But if any spaces follow the comma before the first quote then the quotes are included as part of the value and the text is taken literally.}}
[jira] [Comment Edited] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
[ https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302 ] Barry M. Caceres edited comment on CSV-296 at 11/1/22 7:35 PM: --- {{setIgnoreSurroundingSpaces(true) only kinda works If the text is quoted it maintains the quotes as part of the text, whereas if the quotes immediately follow the comma (or other separator) then the text is processed as a CSV quoted string and the value does not have the quotes. But if any spaces follow the comma before the first quote then the quotes are included as part of the value and the text is taken literally.}} was (Author: JIRAUSER289277): {{setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it maintains the quotes as part of the text, whereas if the quotes immediately follow the comma (or other separator) then the text is processed as a CSV quoted string and the value does not have the quotes. But if any spaces follow the comma before the first quote then the quotes are included as part of the value and the text is taken literally.}}
[jira] [Comment Edited] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
[ https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302 ] Barry M. Caceres edited comment on CSV-296 at 11/1/22 7:27 PM: --- {{setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it maintains the quotes as part of the text, whereas if the quotes immediately follow the comma (or other separator) then the text is processed as a CSV quoted string and the value does not have the quotes. But if any spaces follow the comma before the first quote then the quotes are included as part of the value and the text is taken literally.}} was (Author: JIRAUSER289277): {{setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it maintains the quotes as part of the text, whereas if the quotes immediately follow the comma (or other separator) then the text is processed as a CSV quoted string and the value does not have the quotes. But if any spaces follow the comma before the first quote then the quotes are included as part of the value and the text is taken literally.}}
[jira] [Comment Edited] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
[ https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302 ] Barry M. Caceres edited comment on CSV-296 at 11/1/22 7:27 PM: --- {{setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it maintains the quotes as part of the text, whereas if the quotes immediately follow the comma (or other separator) then the text is processed as a CSV quoted string and the value does not have the quotes. But if any spaces follow the comma before the first quote then the quotes are included as part of the value and the text is taken literally.}} was (Author: JIRAUSER289277): setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it maintains the quotes as part of the text, whereas if the quotes immediately follow the comma (or other separator) then the text is processed as a CSV quoted string and the value does not have the quotes. But if any spaces follow the comma before the first quote then the quotes are included as part of the value and the text is taken literally.
[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
[ https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627302#comment-17627302 ] Barry M. Caceres commented on CSV-296: -- setIgnoreSurroundingSpaces(true) only kinda works I the text is quoted it maintains the quotes as part of the text, whereas if the quotes immediately follow the comma (or other separator) then the text is processed as a CSV quoted string and the value does not have the quotes. But if any spaces follow the comma before the first quote then the quotes are included as part of the value and the text is taken literally.
[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
[ https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17541096#comment-17541096 ] Barry M. Caceres commented on CSV-296: -- Exactly my assessment – the trimming only affects the values after they have been parsed. However, there is no way to ignore white space in the parsing phase. If the "trim" were implemented in the parsing phase it would likely handle the problem I am dealing with, but might not deal with quoted strings properly. How do I get whitespace after the delimiter ignored during parsing? -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Updated] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
[ https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry M. Caceres updated CSV-296: - Description: I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set {_}(see attached ZIP file){_}: {code:java} CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader() .withIgnoreEmptyLines(true).withTrim(true);{code} However, a quoted string that begins after a delimiter followed by preceding whitespace is not properly parsed. For example: {code:java} GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER "Joe", "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212" "John","Doe", "201 First Street; Las Vegas, NV 89102", "702-555-1313" "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414" {code} * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on the first record? This leads to the actual value containing the quotation marks instead of them being stripped off. * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, NV 89102"}}*{color} on the second record leads to it being parsed as two values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 89102"}}{*}. * The third record is the only one that parses as expected. I believe that this is because the trimming is done *after* the value is being parsed rather than consuming the whitespace following the delimiter during parsing. Either that, or the check for a quoted string is occurring *before* the whitespace is being consumed. *NOTE:* I have attached a ZIP file that easily reproduces the problem with the CSV file given above. To build the attached project use Apache Maven and then execute using Java 11: {code:java} > unzip csvfail.zip > cd csvfail > mvn package > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code} was: I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set. 
{code:java} CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader() .withIgnoreEmptyLines(true).withTrim(true);{code} However, a quoted string that begins after a delimiter followed by preceding whitespace is not properly parsed. For example: {code:java} GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER "Joe", "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212" "John","Doe", "201 First Street; Las Vegas, NV 89102", "702-555-1313" "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414" {code} Notice the whitespace preceding *{{"Schmoe"}}* on the first record? This leads to the actual value containing the quotation marks instead of them being stripped off. The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, NV 89102"}}*{color} on the second record leads to it being parsed as two values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 89102"}}{*}. The third record is the only one that parses as expected. I believe that this is because the trimming is done *after* the value is being parsed rather than consuming the whitespace following the delimiter during parsing. Either that, or the check for a quoted string is occurring *before* the whitespace is being consumed. *NOTE:* I have attached a ZIP file that easily reproduces the problem with the CSV file given above. 
[jira] [Created] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
Barry M. Caceres created CSV-296: Summary: Delimiter followed by Whitespace then by Quotes Failing with setTrim(true) Key: CSV-296 URL: https://issues.apache.org/jira/browse/CSV-296 Project: Commons CSV Issue Type: Bug Components: Parser Affects Versions: 1.9.0, 1.8 Environment: +{*}macOS{*}:+ {code:java} > uname -a Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code} {code:java} > java -version openjdk version "11.0.14" 2022-01-18 OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9) OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) {code} {+}*Linux*{+}: {code:java} > uname -a Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux {code} {code:java} > java -version openjdk version "11.0.11" 2021-04-20 OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9) OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed mode){code} Reporter: Barry M. Caceres Attachments: csvfail.zip I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set. {code:java} CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader() .withIgnoreEmptyLines(true).withTrim(true);{code} However, a quoted string that begins after a delimiter followed by preceding whitespace is not properly parsed. For example: {code:java} GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER "Joe", "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212" "John","Doe", "201 First Street; Las Vegas, NV 89102", "702-555-1313" "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414" {code} Notice the whitespace preceding *{{"Schmoe"}}* on the first record? This leads to the actual value containing the quotation marks instead of them being stripped off. 
The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, NV 89102"}}*{color} on the second record leads to it being parsed as two values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 89102"}}{*}. The third record is the only one that parses as expected. I believe that this is because the trimming is done *after* the value is being parsed rather than consuming the whitespace following the delimiter during parsing. Either that, or the check for a quoted string is occurring *before* the whitespace is being consumed. *NOTE:* I have attached a ZIP file that easily reproduces the problem with the CSV file given above. -- This message was sent by Atlassian Jira (v8.20.7#820007)