[jira] [Commented] (CSV-294) CSVFormat does not support explicit " as escape char

2022-02-14 Thread Angus C (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492138#comment-17492138
 ] 

Angus C commented on CSV-294:
-

Even the input string ("a") causes the exception: Lexer.java treats the 
second (") as an escape char, reads the unescaped \r, and then complains 
about the missing ending quote (").

E.g.

{{CSVFormat.Builder.create().setEscape('"').build().parse(new StringReader("\"a\"")).getRecords();}}
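
A minimal sketch of that failure mode, using only the JDK. It is not the Lexer.java source: the class and method names are hypothetical, and it assumes only what is described above, namely that the escape check runs before the quote check. It throws an unchecked exception to stay short.

```java
// Hypothetical sketch, not the Commons CSV Lexer source.
class EscapeCollisionSketch {

    /**
     * Reads one quoted field. The escape check runs BEFORE the quote check
     * (an assumption taken from the description above). Pass escape == null
     * to disable the separate escape character.
     */
    static String readQuoted(String input, char quote, Character escape) {
        if (input.isEmpty() || input.charAt(0) != quote) {
            throw new IllegalArgumentException("field must start with the quote char");
        }
        StringBuilder out = new StringBuilder();
        int i = 1; // skip the opening quote
        while (i < input.length()) {
            char c = input.charAt(i);
            if (escape != null && c == escape) {
                // Escape char: take the next char literally, if there is one.
                if (i + 1 < input.length()) {
                    out.append(input.charAt(i + 1));
                }
                i += 2;
            } else if (c == quote) {
                if (i + 1 < input.length() && input.charAt(i + 1) == quote) {
                    out.append(quote); // doubled quote = one literal quote
                    i += 2;
                } else {
                    return out.toString(); // closing quote
                }
            } else {
                out.append(c);
                i++;
            }
        }
        throw new IllegalStateException("EOF reached before closing quote");
    }

    public static void main(String[] args) {
        System.out.println(readQuoted("\"a\"", '"', null)); // prints: a
        System.out.println(readQuoted("\"a\"", '"', '\\')); // prints: a
        try {
            // The closing quote is consumed as an escape char, so the field never ends.
            readQuoted("\"a\"", '"', '"');
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage()); // prints: failed: EOF reached before closing quote
        }
    }
}
```

With the escape char equal to the quote char, the first branch always wins, so the closing quote can never be recognised.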

I think you cannot use the quote char as the escape char. commons-csv already 
implements the RFC rule that a quote char is escaped by a preceding quote char, 
rather than by the escape char.

E.g.

{{System.out.println("1 " + CSVFormat.Builder.create().build().parse(new StringReader("\"a\"")).getRecords().get(0).get(0));}}

{{System.out.println("2 " + CSVFormat.Builder.create().build().parse(new StringReader("\"\"\"a\"\"\"")).getRecords().get(0).get(0));}}

{{System.out.println("3 " + CSVFormat.Builder.create().setQuote('|').build().parse(new StringReader("|a|")).getRecords().get(0).get(0));}}

{{System.out.println("4 " + CSVFormat.Builder.create().setQuote('|').build().parse(new StringReader("|||a|||")).getRecords().get(0).get(0));}}

Output

--

{{1 a}}

{{2 "a"}}

{{3 a}}

{{4 |a|}}
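
The doubling rule behind these four results can be sketched without the library. This is an illustration only, not how commons-csv implements it: the class and method names are hypothetical, and the sketch handles a single, fully quoted field with no delimiters or line breaks.

```java
// Hypothetical sketch of the RFC 4180 doubling rule for one quoted field.
class QuoteDoublingSketch {

    /** Strips the enclosing quote chars and collapses each doubled quote char into one. */
    static String unquote(String field, char quote) {
        int n = field.length();
        if (n < 2 || field.charAt(0) != quote || field.charAt(n - 1) != quote) {
            throw new IllegalArgumentException("field is not enclosed in quote chars");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < n - 1; i++) {
            char c = field.charAt(i);
            if (c == quote) {
                // Inside the field, a quote char must be followed by a second one.
                if (i + 1 >= n - 1 || field.charAt(i + 1) != quote) {
                    throw new IllegalArgumentException("lone quote char at index " + i);
                }
                i++; // skip the second quote char of the pair
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("1 " + unquote("\"a\"", '"'));           // prints: 1 a
        System.out.println("2 " + unquote("\"\"\"a\"\"\"", '"'));   // prints: 2 "a"
        System.out.println("3 " + unquote("|a|", '|'));             // prints: 3 a
        System.out.println("4 " + unquote("|||a|||", '|'));         // prints: 4 |a|
    }
}
```

No separate escape char appears anywhere in the sketch: the quote char alone is enough to escape itself.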

 

 

> CSVFormat does not support explicit " as escape char
> 
>
> Key: CSV-294
> URL: https://issues.apache.org/jira/browse/CSV-294
> Project: Commons CSV
> Issue Type: Bug
> Affects Versions: 1.9.0
> Reporter: Joern Huxhorn
> Priority: Major
> Attachments: JiraCsv294Test.java
>
>
> Reading data that contains " does not work if escape character is *manually 
> set to {{'"'}}* as specified in [RFC 
> 4180|https://datatracker.ietf.org/doc/html/rfc4180].
> *It works for other escape characters or if no escape character is explicitly 
> defined in the format.*
> This line in {{Lexer.java}} is responsible for the originally quite erroneous 
> ticket:
> {{this.escape = mapNullToDisabled(format.getEscapeCharacter());}}
> From this line I (wrongly) deduced that an unspecified escape character would 
> actually disable escaping. Because of that I wanted to enable it by setting 
> it to {{'"'}} which causes exceptions in the Lexer for perfectly valid input. 
> That in turn convinced me that this is a way bigger issue than it is. Sorry 
> about that.
> I don't think that the current situation is ideal, though.
> I would not have been this confused if {{CSVFormat}} were more explicit 
> about the escape char that will be used, i.e. if {{toString()}} showed 
> the implicitly used quote character or printed - in case of {{null}} - that 
> this means it's using the quote character. The escape character is currently 
> omitted from the output if it is not set explicitly.
> There is also no documentation about what {{null}} as escape character 
> actually means - it may be documented somewhere but isn't documented for 
> {{CSVFormat.getEscapeCharacter()}} or {{CSVFormat.Builder.set/getEscape()}} 
> methods.
> And setting the escape character explicitly to the value specified in the RFC 
> should certainly not fail, even if setting it to that value is superfluous 
> since {{null}} behaves exactly the same. 
> h4. Relevant part of the RFC:
> 7. If double-quotes are used to enclose fields, then a double-quote
> appearing inside a field must be escaped by preceding it with
> another double quote. For example:
> "aaa","b""bb","ccc"
> h4. Related issue:
> https://issues.apache.org/jira/browse/CSV-150



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (CSV-294) CSVFormat does not support explicit " as escape char

2021-12-16 Thread Joern Huxhorn (Jira)


[ 
https://issues.apache.org/jira/browse/CSV-294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17460861#comment-17460861
 ] 

Joern Huxhorn commented on CSV-294:
---

Added a test reproducing the issue.
