[jira] [Commented] (CSV-290) Produced CSV using PostgreSQL format cannot be read

2022-09-25 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609259#comment-17609259
 ] 

Angus C commented on CSV-290:
-

https://github.com/apache/commons-csv/pull/265

> Produced CSV using PostgreSQL format cannot be read
> ---
>
> Key: CSV-290
> URL: https://issues.apache.org/jira/browse/CSV-290
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.6, 1.9.0
>Reporter: Anatoliy Artemenko
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> // code placeholder
> {code}
> CSV, produced using printer:
>  
> CSVPrinter printer = new CSVPrinter(sw, 
> CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>  
> cannot be read with the same format parser:
>  
> CSVParser parser = new CSVParser(new StringReader(sw.toString()), 
> CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());
>  
> To reproduce: 
>  
> {code:java}
> StringWriter sw = new StringWriter(); 
> CSVPrinter printer = new CSVPrinter(sw, 
> CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());  
> printer.printRecord("column1", "column2"); 
> printer.printRecord("v11", "v12"); 
> printer.printRecord("v21", "v22");  
> printer.close();  
> CSVParser parser = new CSVParser(new StringReader(sw.toString()), 
> CSVFormat.POSTGRESQL_CSV.withFirstRecordAsHeader());  
> System.out.println("headers: " + 
> Arrays.equals(parser.getHeaderNames().toArray(), new String[] {"column1", 
> "column2"}));  
> Iterator<CSVRecord> i = parser.iterator(); 
> System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new 
> String[] {"v11", "v12"})); 
> System.out.println("row: " + Arrays.equals(i.next().toList().toArray(), new 
> String[] {"v21", "v22"}));{code}
> I'd expect the above code to work, but it fails:
> {code:java}
> java.io.IOException: (startline 1) EOF reached before encapsulated token finished
> at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:371) 
> at org.apache.commons.csv.Lexer.nextToken(Lexer.java:285) 
> at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:701) 
> at org.apache.commons.csv.CSVParser.createHeaders(CSVParser.java:480) 
> at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:432) 
> at org.apache.commons.csv.CSVParser.<init>(CSVParser.java:398) 
> at Test.main(Test.java:25)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CSV-290) Produced CSV using PostgreSQL format cannot be read

2022-09-25 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609257#comment-17609257
 ] 

Angus C commented on CSV-290:
-

I tested with psql 14.5 (Homebrew) on an M1 Mac.

For {{CSVFormat.POSTGRESQL_CSV}}, special characters are not escaped.
For {{CSVFormat.POSTGRESQL_TEXT}}, values are not quoted.
{code:sql}
create table COMMONS_CSV_PSQL_TEST (ID INTEGER, COL1 VARCHAR, COL2 VARCHAR, 
COL3 VARCHAR, COL4 VARCHAR);
insert into COMMONS_CSV_PSQL_TEST select 1, 'abc', 'test line 1' || chr(10) || 
'test line 2', null, '';
insert into COMMONS_CSV_PSQL_TEST select 2, 'xyz', '\b:' || chr(8) || ' \n:' || 
chr(10) || ' \r:' || chr(13), 'a', 'b';
insert into COMMONS_CSV_PSQL_TEST values (3, 'a', 'b,c,d', '"quoted"', 'e');
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.csv' with (FORMAT CSV);
copy COMMONS_CSV_PSQL_TEST to '/tmp/psql.tsv';{code}
{code:java}
cat /tmp/psql.csv
1,abc,"test line 1
test line 2",,""
2,xyz,"\b:^H \n:
\r:^M",a,b
3,a,"b,c,d","""quoted""",e{code}
{code:java}
cat /tmp/psql.tsv
1    abc    test line 1\ntest line 2               \N
2    xyz    \\b:\b \\n:\n \\r:\r       a           b
3    a      b,c,d                      "quoted"    e{code}
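
The quoting visible in /tmp/psql.csv above can be approximated in a few lines. This is only a sketch inferred from that output, not PostgreSQL's or Commons CSV's implementation. The names PsqlCsvSketch, encodeField and encodeRow are made up for illustration, and the sketch assumes a comma delimiter, a double quote as the quote char, and an unquoted empty field as the NULL marker.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PsqlCsvSketch {

    /** Encodes one value the way the CSV output above appears to (assumption, see lead-in). */
    static String encodeField(String value) {
        if (value == null) {
            return ""; // NULL is written as an unquoted empty field
        }
        // An empty string is quoted so that it can be told apart from NULL.
        boolean needsQuotes = value.isEmpty()
                || value.indexOf(',') >= 0
                || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0
                || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        // Inside quotes only the quote char itself is doubled; nothing else is escaped.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    static String encodeRow(List<String> row) {
        return row.stream().map(PsqlCsvSketch::encodeField).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(encodeRow(Arrays.asList("1", "abc", "test line 1\ntest line 2", null, "")));
        System.out.println(encodeRow(Arrays.asList("3", "a", "b,c,d", "\"quoted\"", "e")));
    }
}
```

The second call prints the same text as row 3 of /tmp/psql.csv. Backslashes and control characters pass through untouched, which is the "special characters are not escaped" observation.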



[jira] [Commented] (COLLECTIONS-814) CollectionUtils.removeAll() not throwing proper NullPointerException(NPE) if the first parameter is empty

2022-09-24 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/COLLECTIONS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609060#comment-17609060
 ] 

Angus C commented on COLLECTIONS-814:
-

https://github.com/apache/commons-collections/pull/340

> CollectionUtils.removeAll() not throwing proper NullPointerException(NPE) if 
> the first parameter is empty
> -
>
> Key: COLLECTIONS-814
> URL: https://issues.apache.org/jira/browse/COLLECTIONS-814
> Project: Commons Collections
>  Issue Type: Bug
>Affects Versions: 4.4
>Reporter: Elia Bertolina
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The CollectionUtils.removeAll(Collection collection, Collection remove) 
> method does not throw a NullPointerException (NPE) when the “remove” 
> parameter is null, but only when the “collection” parameter is empty.
> The documentation states that an NPE will be thrown if either of the 
> parameters is null.
> However, the following test case fails:
>  
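> {code:java}
> import java.util.Collection;
> import java.util.LinkedList;
> import org.apache.commons.collections4.CollectionUtils;
>
> public class CollectionUtils_failure_Test {
>     public void test() throws Throwable {
>         // Empty first argument, null second argument
>         LinkedList linkedList = new LinkedList();
>         try {
>             Collection collection =
>                 CollectionUtils.removeAll(
>                     (Collection) linkedList,
>                     (Collection) null);
>             // Reached only if no NPE was thrown
>             org.junit.Assert.fail();
>         } catch (java.lang.NullPointerException e) {
>             // Exception caught and test successful
>         }
>     }
> } {code}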
>  
> This is a special case (the first parameter needs to be empty and the second 
> needs to be null), but this behavior is not mentioned in the documentation. 
> While this behavior is arguably correct (removing a null Object from an empty 
> Collection should yield an empty Collection), I think throwing an NPE 
> would be more in line with the documentation provided.
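
The behavior requested in the description, an NPE for a null argument regardless of whether the collection is empty, can be sketched with eager null checks. This is a minimal JDK-only illustration under that assumption, not the Commons Collections implementation and not necessarily what the linked pull request does. RemoveAllSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.Objects;

class RemoveAllSketch {

    /** Returns the elements of collection that are not in remove; never modifies the inputs. */
    static <E> Collection<E> removeAll(Collection<E> collection, Collection<?> remove) {
        // Check both arguments up front so an empty collection cannot hide a null 'remove'.
        Objects.requireNonNull(collection, "collection");
        Objects.requireNonNull(remove, "remove");
        List<E> result = new ArrayList<>();
        for (E element : collection) {
            if (!remove.contains(element)) {
                result.add(element);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeAll(Arrays.asList("a", "b", "c"), Arrays.asList("b")));
        try {
            removeAll(new LinkedList<String>(), null);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

Without the up-front checks, a loop over an empty collection never touches `remove`, which is how the special case in the description arises.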





[jira] [Commented] (CSV-296) Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)

2022-09-22 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608438#comment-17608438
 ] 

Angus C commented on CSV-296:
-

Use {{setIgnoreSurroundingSpaces(true)}}.
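
The difference between trimming a value after it has been parsed and consuming the whitespace before the quote check can be sketched as follows. This is a simplified JDK-only illustration, not the Commons CSV lexer. SurroundingSpacesSketch and parseLine are made-up names, and the sketch assumes a comma delimiter, a double quote as the quote char, and spaces as the only whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class SurroundingSpacesSketch {

    /**
     * Splits one CSV line. When ignoreSurroundingSpaces is true, spaces after a
     * delimiter are consumed before checking whether the field starts with a quote.
     */
    static List<String> parseLine(String line, boolean ignoreSurroundingSpaces) {
        List<String> fields = new ArrayList<>();
        final int n = line.length();
        int i = 0;
        while (true) {
            if (ignoreSurroundingSpaces) {
                while (i < n && line.charAt(i) == ' ') {
                    i++;
                }
            }
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                while (i < n) {
                    char c = line.charAt(i);
                    if (c == '"' && i + 1 < n && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote stands for one literal quote
                        i += 2;
                    } else if (c == '"') {
                        i++; // closing quote
                        break;
                    } else {
                        field.append(c);
                        i++;
                    }
                }
                while (i < n && line.charAt(i) != ',') {
                    i++; // skip anything between the closing quote and the delimiter
                }
                fields.add(field.toString());
            } else {
                // Not recognized as quoted: every comma ends the field.
                while (i < n && line.charAt(i) != ',') {
                    field.append(line.charAt(i++));
                }
                fields.add(ignoreSurroundingSpaces ? field.toString().trim() : field.toString());
            }
            if (i >= n) {
                break;
            }
            i++; // skip the delimiter
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"John\",\"Doe\",  \"201 First Street; Las Vegas, NV 89102\", \"702-555-1313\"";
        System.out.println(parseLine(line, true).size());
        System.out.println(parseLine(line, false).size());
    }
}
```

With the flag off, the address field starts with a space, so it is read as unquoted text and split at the comma inside the quotes. Trimming the resulting values afterwards cannot undo that split.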

> Delimiter followed by Whitespace then by Quotes Failing with setTrim(true)
> --
>
> Key: CSV-296
> URL: https://issues.apache.org/jira/browse/CSV-296
> Project: Commons CSV
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 1.8, 1.9.0
> Environment: +{*}macOS{*}:+
> {code:java}
> > uname -a
> Darwin Senzing-MacBook-Pro.local 21.4.0 Darwin Kernel Version 21.4.0: Fri Mar 
> 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64 x86_64 {code}
> {code:java}
> > java -version
> openjdk version "11.0.14" 2022-01-18
> OpenJDK Runtime Environment Temurin-11.0.14+9 (build 11.0.14+9)
> OpenJDK 64-Bit Server VM Temurin-11.0.14+9 (build 11.0.14+9, mixed mode) 
> {code}
> {+}*Linux*{+}:
> {code:java}
> > uname -a
> Linux lnxdev 5.4.0-109-generic #123-Ubuntu SMP Fri Apr 8 09:10:54 UTC 2022 
> x86_64 x86_64 x86_64 GNU/Linux {code}
> {code:java}
> > java -version
> openjdk version "11.0.11" 2021-04-20
> OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
> OpenJDK 64-Bit Server VM AdoptOpenJDK-11.0.11+9 (build 11.0.11+9, mixed 
> mode){code}
>Reporter: Barry M. Caceres
>Priority: Major
> Attachments: csvfail.zip
>
>
> I have my CSVFormat initialized such that *{{withTrim(true)}}* has been set 
> {_}(see attached ZIP file){_}:
> {code:java}
> CSVFormat csvFormat = CSVFormat.DEFAULT.withFirstRecordAsHeader()
>         .withIgnoreEmptyLines(true).withTrim(true);{code}
>  
> However, a quoted string that is preceded by whitespace after a delimiter 
> is not parsed properly. For example:
> {code:java}
> GIVEN_NAME,SURNAME,ADDRESS,PHONE_NUMBER
> "Joe",  "Schmoe","101 Main Street; Las Vegas, NV 89101","702-555-1212"
> "John","Doe",  "201 First Street; Las Vegas, NV 89102", "702-555-1313"
> "Jane","Doe","301 Second Street; Las Vegas, NV 89103","702-555-1414"
> {code}
>  
>  * Notice the whitespace preceding {color:#0747a6}*{{"Schmoe"}}*{color} on 
> the first record?  This leads to the actual value containing the quotation 
> marks instead of them being stripped off.
>  * The whitespace preceding {color:#0747a6}*{{"201 First Street; Las Vegas, 
> NV 89102"}}*{color} on the second record leads to it being parsed as two 
> values: {color:#0747a6}*{{"201 First Street; Las Vegas}}*{color} and {*}{{NV 
> 89102"}}{*}.
>  * The third record is the only one that parses as expected.
> I believe that this is because the trimming is done *after* the value has 
> been parsed, rather than the whitespace following the delimiter being 
> consumed during parsing. Either that, or the check for a quoted string occurs 
> *before* the whitespace is consumed.
>  
> *NOTE:* I have attached a ZIP file that easily reproduces the problem with 
> the CSV file given above.
> To build the attached project, use Apache Maven and then execute it using 
> Java 11:
> {code:java}
> > unzip csvfail.zip
> > cd csvfail
> > mvn package
> > java -jar target/csv-fail-1.0-SNAPSHOT.jar{code}





[jira] [Commented] (CSV-295) Support for parallelism in CSVPrinter

2022-03-11 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505168#comment-17505168
 ] 

Angus C commented on CSV-295:
-

Is it a good idea to add {{synchronized}} to the core library, given that it 
would impose a performance penalty on all single-threaded applications? Could 
you just lock the CSVPrinter in your application instead?
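
Locking in the application could look roughly like the sketch below. It is JDK-only: RecordPrinter is a hypothetical stand-in for a printer that writes one record in several steps and is not thread-safe, and LockedPrinterSketch and writeConcurrently are made-up names. The only point is that callers synchronize on the shared printer around each record.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class LockedPrinterSketch {

    /** Hypothetical, deliberately unsynchronized printer: one record is several appends. */
    static final class RecordPrinter {
        private final StringBuilder out;

        RecordPrinter(StringBuilder out) {
            this.out = out;
        }

        void printRecord(Object... values) {
            for (int i = 0; i < values.length; i++) {
                if (i > 0) {
                    out.append(',');
                }
                out.append(values[i]);
            }
            out.append('\n');
        }
    }

    static String writeConcurrently(int threads, int rowsPerThread) {
        StringBuilder out = new StringBuilder();
        RecordPrinter printer = new RecordPrinter(out);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int r = 0; r < rowsPerThread; r++) {
                    // The application, not the library, serializes access per record.
                    synchronized (printer) {
                        printer.printRecord("t" + id, r, "end");
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeConcurrently(8, 3));
    }
}
```

Single-threaded callers pay nothing with this approach, since they simply skip the synchronized block.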

> Support for parallelism in CSVPrinter
> -
>
> Key: CSV-295
> URL: https://issues.apache.org/jira/browse/CSV-295
> Project: Commons CSV
>  Issue Type: Improvement
>  Components: Printer
>Affects Versions: 1.9.0
> Environment: 
> https://zio.dev/version-1.x/overview/overview_creating_effects#blocking-synchronous-side-effects
>Reporter: Zimo Li
>Priority: Major
> Fix For: 1.10.0
>
>
> I am trying to write the result of network IO to a CSV file using Scala and 
> the ZIO library. The order of the rows does not matter, so I decided to use a 
> concurrency of 8.
> Each thread calls {{CSVPrinter.printRecord}}, and this caused some rows 
> to be interleaved with others. Eventually, I decided to use a 
> [Semaphore|https://zio.dev/version-1.x/datatypes/concurrency/semaphore] to 
> fix it.
> A locking mechanism for {{printRecord}} can be implemented just like the 
> underlying {{FileWriter}} or {{PrintWriter}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (CSV-288) String delimiter (||) is not working as expected.

2022-02-15 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492428#comment-17492428
 ] 

Angus C edited comment on CSV-288 at 2/15/22, 8:15 AM:
---

In the line of Lexer.java shown below, isDelimiter() unintentionally advances 
the buffer pointer and "eats" the first "|" when the lexer reaches the "b" in 
"|b|" (lastChar is "|" and the next char is also "|", which together make 
"||"). That is why "a||bc||d " does not fail (lastChar is "|", but the next 
char is "c"), and why nothing fails when the delimiter consists of two 
different chars (e.g. |!).

The check is there to detect a trailing empty column (e.g. in "a,b,").
{code:java}
// did we reach eof during the last iteration already ? EOF
if (isEndOfFile(lastChar) || !isDelimiter(lastChar) && isEndOfFile(c)) { {code}
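
One way to avoid this kind of side effect is a delimiter check that only looks ahead and leaves consuming the delimiter to the caller. The sketch below is a JDK-only illustration of that idea, not the actual Lexer code or the eventual fix. MultiCharDelimiterSketch, isDelimiterAt and split are made-up names, and quoting is ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;

class MultiCharDelimiterSketch {

    /** Pure look-ahead: reports whether the delimiter starts at pos without moving anything. */
    static boolean isDelimiterAt(String input, int pos, String delimiter) {
        return input.startsWith(delimiter, pos);
    }

    static List<String> split(String input, String delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            if (isDelimiterAt(input, pos, delimiter)) {
                fields.add(field.toString());
                field.setLength(0);
                pos += delimiter.length(); // only the caller advances past the delimiter
            } else {
                field.append(input.charAt(pos++));
            }
        }
        fields.add(field.toString()); // last field, possibly empty (e.g. in "a,b,")
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("a||b||c||d||||f||g", "||"));
        System.out.println(split("a,b,", ","));
    }
}
```

Because the check can be called any number of times at the same position with the same result, using it in an end-of-file test cannot swallow part of the next delimiter.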


was (Author: JIRAUSER285196):
In below line in Lexar.java, the isDelimiter() unintentionally advance the 
buffer pointer  and "eat" the first "|" when it comes to the "b" in "|b|" 
(lastChar is "|", nextChar is also "|", make it "||"). So "a||bc||d " doesn't 
fail (lastChar is "|", but next char is "c"), neither if the delimiter is two 
different char (e.g. |!).

The checking is used to detect the last empty column (e.g. in "a,b,")
{code:java}
// did we reach eof during the last iteration already ? EOF
if (isEndOfFile(lastChar) || !isDelimiter(lastChar) && isEndOfFile(c)) { {code}

> String delimiter (||) is not working as expected.
> -
>
> Key: CSV-288
> URL: https://issues.apache.org/jira/browse/CSV-288
> Project: Commons CSV
>  Issue Type: Bug
>Reporter: Santhsoh
>Priority: Major
>
> Steps to reproduce  : 
> 1. Parse CSV file with || as delimiter and having empty columns
> 2. Print the CSVRecord resulting from CSVParser
>  
> //Expected : a,b,c,d,,f,g 
> // Actual : a,b|c,d,|f,g
> public static void main(String[] args) throws Exception {
>  String row = "a||b||c||d||||f||g";
>  StringBuilder stringBuilder = new StringBuilder();
>  try (CSVPrinter csvPrinter = new CSVPrinter(stringBuilder, 
> CSVFormat.EXCEL);
>   CSVParser csvParser = CSVParser.parse(new StringInputStream(row), 
> StandardCharsets.UTF_8, 
> CSVFormat.Builder.create().setDelimiter("||").build())) {
>  for (CSVRecord csvRecord : csvParser) {
>  for (int i = 0; i < csvRecord.size(); i++) {
>  csvPrinter.print(csvRecord.get(i));
>  }
>  System.out.println(stringBuilder.toString());
>  //Expected : a,b,c,d,,f,g
> // Actual : a,b|c,d,|f,g
>  }
>  }
>  }
> With the snippet provided above, actual value is not same as expected value





[jira] [Commented] (CSV-288) String delimiter (||) is not working as expected.

2022-02-15 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492428#comment-17492428
 ] 

Angus C commented on CSV-288:
-

In the line of Lexer.java shown below, isDelimiter() unintentionally advances 
the buffer pointer and "eats" the first "|" when the lexer reaches the "b" in 
"|b|" (lastChar is "|" and the next char is also "|", which together make 
"||"). That is why "a||bc||d " does not fail (lastChar is "|", but the next 
char is "c"), and why nothing fails when the delimiter consists of two 
different chars (e.g. |!).

The check is there to detect a trailing empty column (e.g. in "a,b,").
{code:java}
// did we reach eof during the last iteration already ? EOF
if (isEndOfFile(lastChar) || !isDelimiter(lastChar) && isEndOfFile(c)) { {code}



[jira] [Comment Edited] (CSV-290) Produced CSV using PostgreSQL format cannot be read

2022-02-14 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492340#comment-17492340
 ] 

Angus C edited comment on CSV-290 at 2/15/22, 3:52 AM:
---

Basically, the "EOF reached" error always happens when quote-char = escape-char. 
Consider the input string ("a"): Lexer.java treats the second (") as an 
escape char, reads the unescaped \r, and then complains about the missing 
ending quote (").
{code:java}
CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();
{code}
I think setEscape() is meant for escaping special chars such as \r, \t etc., 
as in Lexer.readEscape(), but not the quote-char. The quote-char should always 
be escaped by the quote-char, not by the escape-char.

Your fix disables the escape-char inside a quoted string when it is equal to 
the quote-char. That can serve as a fail-safe, but I think we should remove the 
.setEscape(DOUBLE_QUOTE_CHAR) from POSTGRESQL_CSV. The javadoc says "special 
characters are escaped with quote", but I doubt that this is correct.
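
The failure mode described above can be reproduced with a deliberately simplified reader. This is a sketch of the idea only, assuming a reader that tests for the escape char before it tests for the quote char. It is not the actual Lexer code, and EscapeEqualsQuoteSketch and readQuoted are made-up names.

```java
class EscapeEqualsQuoteSketch {

    /**
     * Reads one quoted field that starts at index 0 with the quote char.
     * A null escape means "no escape char"; a doubled quote is one literal quote.
     */
    static String readQuoted(String input, char quote, Character escape) {
        StringBuilder field = new StringBuilder();
        int pos = 1; // skip the opening quote
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (escape != null && c == escape.charValue()) {
                // Escape check comes first: the next char is taken literally.
                if (pos + 1 < input.length()) {
                    field.append(input.charAt(pos + 1));
                }
                pos += 2;
            } else if (c == quote) {
                boolean doubled = pos + 1 < input.length() && input.charAt(pos + 1) == quote;
                if (!doubled) {
                    return field.toString(); // closing quote
                }
                field.append(quote);
                pos += 2;
            } else {
                field.append(c);
                pos++;
            }
        }
        // The closing quote was never seen, e.g. because it was consumed as an escape.
        throw new IllegalStateException("EOF reached before encapsulated token finished");
    }

    public static void main(String[] args) {
        System.out.println(readQuoted("\"\"\"a\"\"\"", '"', null));
        try {
            readQuoted("\"a\"", '"', '"');
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With no escape char, the doubled-quote rule alone handles quotes inside a field. With the escape char set to the quote char, the closing quote is taken for an escape and the reader runs off the end of the input.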


was (Author: JIRAUSER285196):
Basically the "EOF reached" always happens if quote-char = escape-char. 
Considering  the input string ("a"), Lexer.java treats the second (") as an 
escape char and read the unescaped \r, and then complain for missing the 
ending-quote (")
{code:java}
CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();
{code}
I think the setEscape() is used for escaping special char like \r, \t etc. as 
in Lexer.readEscape() but not the quote-char.  The quote-char should be always 
escaped by quote-char, not the escape-char.

Your fix is to disable the escape-char in quoted-string if it is equal to 
quote-char.  It can be a fail-save but I think we should remove the 
.setEscape(DOUBLE_QUOTE_CHAR) in POSTGRESQL_CSV.  The javadoc says "special * 
characters are escaped with quote" but I doubt that it is correct or not



[jira] [Commented] (CSV-290) Produced CSV using PostgreSQL format cannot be read

2022-02-14 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492340#comment-17492340
 ] 

Angus C commented on CSV-290:
-

Basically, the "EOF reached" error always happens when quote-char = escape-char. 
Consider the input string ("a"): Lexer.java treats the second (") as an 
escape char, reads the unescaped \r, and then complains about the missing 
ending quote (").
{code:java}
CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();
{code}
I think setEscape() is meant for escaping special chars such as \r, \t etc., 
as in Lexer.readEscape(), but not the quote-char. The quote-char should always 
be escaped by the quote-char, not by the escape-char.

Your fix disables the escape-char inside a quoted string when it is equal to 
the quote-char. That can serve as a fail-safe, but I think we should remove the 
.setEscape(DOUBLE_QUOTE_CHAR) from POSTGRESQL_CSV. The javadoc says "special 
characters are escaped with quote", but I doubt that this is correct.



[jira] [Comment Edited] (CSV-294) CSVFormat does not support explicit " as escape char

2022-02-14 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492138#comment-17492138
 ] 

Angus C edited comment on CSV-294 at 2/15/22, 3:37 AM:
---

Even the input string ("a") will cause the exception, as Lexer.java treats the 
second (") as an escape char, reads the unescaped \r, and then complains about 
the missing ending quote (").

E.g.
{code:java}
CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();{code}
I think you cannot use the quote char as the escape char. commons-csv already 
implements the part of the RFC in which the quote char is escaped by a 
preceding quote char, but not by the escape char.

E.g.
{code:java}
System.out.println("1 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"a\"")).getRecords().get(0).get(0));
System.out.println("2 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"\"\"a\"\"\"")).getRecords().get(0).get(0));
System.out.println("3 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|a|")).getRecords().get(0).get(0));
System.out.println("4 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|||a|||")).getRecords().get(0).get(0));
{code}
Output
{code:java}
1 a
2 "a"
3 a
4 |a|
{code}
 

 


was (Author: JIRAUSER285196):
Even the input string ("a") will cause the exception as Lexer.java treats the 
second (") as an escape char and read the unescaped \r, and then complain for 
missing the ending-quote (")

E.g.
{code:java}
CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();{code}
I think you cannot use the quote char as escape char. commons-cvs already 
implement the RFC part that the quote char is escaped by preceding quote char, 
but not the escape char

E.g.
{code:java}
System.out.println("1 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"a\"")).getRecords().get(0).get(0));
System.out.println("2 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"\"\"a\"\"\"")).getRecords().get(0).get(0));
System.out.println("3 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|a|")).getRecords().get(0).get(0));
System.out.println("4 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|||a|||")).getRecords().get(0).get(0));
{code}
 

Output
{code:java}
1 a
2 "a"
3 a
4 |a|
{code}
 

 

> CSVFormat does not support explicit " as escape char
> 
>
> Key: CSV-294
> URL: https://issues.apache.org/jira/browse/CSV-294
> Project: Commons CSV
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Joern Huxhorn
>Priority: Major
> Attachments: JiraCsv294Test.java
>
>
> Reading data that contains " does not work if escape character is *manually 
> set to {{'"'}}* as specified in [RFC 
> 4180|https://datatracker.ietf.org/doc/html/rfc4180].
> *It works for other escape characters or if no escape character is explicitly 
> defined in the format.*
> This line in {{Lexer.java}} is responsible for the originally quite erroneous 
> ticket:
> {{this.escape = mapNullToDisabled(format.getEscapeCharacter());}}
> From this line I (wrongly) deduced that an unspecified escape character would 
> actually disable escaping. Because of that I wanted to enable it by setting 
> it to {{'"'}} which causes exceptions in the Lexer for perfectly valid input. 
> That in turn convinced me that this is a way bigger issue than it is. Sorry 
> about that.
> I don't think that the current situation is ideal, though.
> I would not have been this confused if {{CSVFormat}} would be more explicit 
> about the escape char that will be used, i.e. if {{toString()}} would show 
> the implicitly used quote character or print - in case of {{null}} - that 
> this means it's using the quote character. It is currently omitted from the 
> output if it is not set explicitly.
> There is also no documentation about what {{null}} as escape character 
> actually means - it may be documented somewhere but isn't documented for 
> {{CSVFormat.getEscapeCharacter()}} or {{CSVFormat.Builder.set/getEscape()}} 
> methods.
> And setting the escape character explicitly to the value specified in the RFC 
> should certainly not fail, even if setting it to that value is superfluous 
> since {{null}} behaves exactly the same. 
> h4. Relevant part of the RFC:
> 7. If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> h4. Related issue:
> https://issues.apache.org/jira/browse/CSV-150





[jira] [Comment Edited] (CSV-294) CSVFormat does not support explicit " as escape char

2022-02-14 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492138#comment-17492138
 ] 

Angus C edited comment on CSV-294 at 2/14/22, 6:28 PM:
---

Even the input string ("a") will cause the exception, as Lexer.java treats the 
second (") as an escape char, reads the unescaped \r, and then complains about 
the missing ending quote (").

E.g.
{code:java}
CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();
{code}
I think you cannot use the quote char as the escape char. commons-csv already 
implements the part of the RFC in which the quote char is escaped by a 
preceding quote char, but not by the escape char.

E.g.

 
{code:java}
System.out.println("1 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"a\"")).getRecords().get(0).get(0));
System.out.println("2 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"\"\"a\"\"\"")).getRecords().get(0).get(0));
System.out.println("3 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|a|")).getRecords().get(0).get(0));
System.out.println("4 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|||a|||")).getRecords().get(0).get(0));
{code}
 

Output
{code:java}
1 a
2 "a"
3 a
4 |a|
{code}
 

 


was (Author: JIRAUSER285196):
Even the input string ("a") will cause the exception as Lexer.java treats the 
second (") as an escape char and read the unescaped \r, and then complain for 
missing the ending-quote (")

E.g.

{{CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();}}

I think you cannot use the quote char as escape char. commons-cvs already 
implement the RFC part that the quote char is escaped by preceding quote char, 
but not the escape char

E.g.

{{System.{*}_out_{*}.println("1 " + 
CSVFormat.Builder.{_}create{_}().build().parse({*}new{*} 
StringReader("\"a\"")).getRecords().get(0).get(0));}}

{{System.{*}_out_{*}.println("2 " + 
CSVFormat.Builder.{_}create{_}().build().parse({*}new{*} 
StringReader("\"\"\"a\"\"\"")).getRecords().get(0).get(0));}}

{{System.{*}_out_{*}.println("3 " + 
CSVFormat.Builder.{_}create{_}().setQuote('|').build().parse({*}new{*} 
StringReader("|a|")).getRecords().get(0).get(0));}}

{{System.{*}_out_{*}.println("4 " + 
CSVFormat.Builder.{_}create{_}().setQuote('|').build().parse({*}new{*} 
StringReader("|||a|||")).getRecords().get(0).get(0));}}

Output

--

{{1 a}}

{{2 "a"}}

{{3 a}}

{{4 |a|}}

 

 

> CSVFormat does not support explicit " as escape char
> 
>
> Key: CSV-294
> URL: https://issues.apache.org/jira/browse/CSV-294
> Project: Commons CSV
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Joern Huxhorn
>Priority: Major
> Attachments: JiraCsv294Test.java
>
>
> Reading data that contains " does not work if escape character is *manually 
> set to {{'"'}}* as specified in [RFC 
> 4180|https://datatracker.ietf.org/doc/html/rfc4180].
> *It works for other escape characters or if no escape character is explicitly 
> defined in the format.*
> This line in {{Lexer.java}} is responsible for the originally quite erroneous 
> ticket:
> {{this.escape = mapNullToDisabled(format.getEscapeCharacter());}}
> From this line I (wrongly) deduced that an unspecified escape character would 
> actually disable escaping. Because of that I wanted to enable it by setting 
> it to {{'"'}} which causes exceptions in the Lexer for perfectly valid input. 
> That in turn convinced me that this is a way bigger issue than it is. Sorry 
> about that.
> I don't think that the current situation is ideal, though.
> I would not have been this confused if {{CSVFormat}} would be more explicit 
> about the escape char that will be used, i.e. if {{toString()}} would show 
> the implicitly used quote character or print - in case of {{null}} - that 
> this means it's using the quote character. It is currently omitted from the 
> output if it is not set explicitly.
> There is also no documentation about what {{null}} as escape character 
> actually means - it may be documented somewhere but isn't documented for 
> {{CSVFormat.getEscapeCharacter()}} or {{CSVFormat.Builder.set/getEscape()}} 
> methods.
> And setting the escape character explicitly to the value specified in the RFC 
> should certainly not fail, even if setting it to that value is superfluous 
> since {{null}} behaves exactly the same. 
> h4. Relevant part of the RFC:
> 7. If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> h4. Related issue:
> https://issues.apache.org/jira/browse/CSV-150




[jira] [Comment Edited] (CSV-294) CSVFormat does not support explicit " as escape char

2022-02-14 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492138#comment-17492138
 ] 

Angus C edited comment on CSV-294 at 2/14/22, 6:28 PM:
---

Even the input string ("a") causes the exception: Lexer.java treats the 
second (") as an escape char, reads the unescaped \r, and then complains 
about the missing ending quote (").

E.g.
{code:java}
CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();{code}
I think you cannot use the quote char as the escape char. commons-csv already 
implements the part of the RFC that says a quote char is escaped by a 
preceding quote char, but not by the escape char.

E.g.
{code:java}
System.out.println("1 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"a\"")).getRecords().get(0).get(0));
System.out.println("2 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"\"\"a\"\"\"")).getRecords().get(0).get(0));
System.out.println("3 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|a|")).getRecords().get(0).get(0));
System.out.println("4 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|||a|||")).getRecords().get(0).get(0));
{code}
 

Output
{code:java}
1 a
2 "a"
3 a
4 |a|
{code}
 

 








[jira] [Comment Edited] (CSV-294) CSVFormat does not support explicit " as escape char

2022-02-14 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492138#comment-17492138
 ] 

Angus C edited comment on CSV-294 at 2/14/22, 5:58 PM:
---

Even the input string ("a") causes the exception: Lexer.java treats the 
second (") as an escape char, reads the unescaped \r, and then complains 
about the missing ending quote (").

E.g.

{{CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();}}

I think you cannot use the quote char as the escape char. commons-csv already 
implements the part of the RFC that says a quote char is escaped by a 
preceding quote char, but not by the escape char.

E.g.

{{System.out.println("1 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"a\"")).getRecords().get(0).get(0));}}

{{System.out.println("2 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"\"\"a\"\"\"")).getRecords().get(0).get(0));}}

{{System.out.println("3 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|a|")).getRecords().get(0).get(0));}}

{{System.out.println("4 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|||a|||")).getRecords().get(0).get(0));}}

Output

--

{{1 a}}

{{2 "a"}}

{{3 a}}

{{4 |a|}}

 

 




[jira] [Commented] (CSV-294) CSVFormat does not support explicit " as escape char

2022-02-14 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492138#comment-17492138
 ] 

Angus C commented on CSV-294:
-

Even the input string ("a") causes the exception: Lexer.java treats the 
second (") as an escape char, reads the unescaped \r, and then complains 
about the missing ending quote (").

E.g.

{{CSVFormat.Builder.create().setEscape('"').build().parse(new 
StringReader("\"a\"")).getRecords();}}

I think you cannot use the quote char as the escape char. commons-csv already 
implements the part of the RFC that says a quote char is escaped by a 
preceding quote char, but not by the escape char.

E.g.

{{System.out.println("1 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"a\"")).getRecords().get(0).get(0));}}

{{System.out.println("2 " + CSVFormat.Builder.create().build().parse(new 
StringReader("\"\"\"a\"\"\"")).getRecords().get(0).get(0));}}

{{System.out.println("3 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|a|")).getRecords().get(0).get(0));}}

{{System.out.println("4 " + 
CSVFormat.Builder.create().setQuote('|').build().parse(new 
StringReader("|||a|||")).getRecords().get(0).get(0));}}

Output

--

{{1 a}}

{{2 "a"}}

{{3 a}}

{{4 |a|}}

 

 



