[jira] [Commented] (CSV-67) UnicodeUnescapeReader should not be applied before parsing

Emmanuel Bourg (Commented) (JIRA) Fri, 16 Mar 2012 01:14:10 -0700

    [ 
https://issues.apache.org/jira/browse/CSV-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230965#comment-13230965
 ]


Emmanuel Bourg commented on CSV-67:
-----------------------------------

Good point, but I'm not sure it actually happens. So far the only application I 
have found supporting Unicode escapes is HSQLDB. It can read them but doesn't 
write them (I checked HSQL 1.8, I'll look at 2.x). I believe these Unicode 
escapes are typically created by a program like native2ascii, which converts 
only non-ASCII characters, so I believe the line separators are safe.
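
As a rough illustration of that reasoning, here is a minimal sketch of an
escaper that rewrites only characters above 0x7F. The class and the
escapeNonAscii helper are hypothetical and only assumed to approximate what a
native2ascii-like tool does; they are not part of [csv] or of the JDK tool.

```java
public class AsciiEscapeSketch {

    // Hypothetical native2ascii-like escaper (an assumption, not the real tool):
    // characters above 0x7F become a backslash-u escape, ASCII passes through.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two records separated by a real CRLF, each with one non-ASCII letter.
        String escaped = escapeNonAscii("caf\u00E9,1\r\nna\u00EFve,2");
        System.out.println(escaped.contains("\r\n"));    // true: CRLF is left as-is
        System.out.println(escaped.contains("\\u000D")); // false: CR is never escaped
    }
}
```

Under that assumption the record separators never appear in escaped form, so
unescaping such a file cannot introduce new line breaks.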

I agree on removing the Unicode escape setting from CSVFormat. I would prefer 
submitting the reader to [io] rather than making it public in [csv] though.
                
> UnicodeUnescapeReader should not be applied before parsing
> ----------------------------------------------------------
>
>                 Key: CSV-67
>                 URL: https://issues.apache.org/jira/browse/CSV-67
>             Project: Commons CSV
>          Issue Type: Bug
>            Reporter: Sebb
>
> The UnicodeUnescapeReader is currently applied before the input file is parsed.
> This means that unicode escapes are treated differently from other escapes.
> For example, the sequence <esc>r<esc>n is not treated as a new-line for the 
> purpose of recognising the end of a record, yet \u000D\u000A is converted to 
> CRLF and would terminate the record (unless embedded in a quoted string).
> The unicode escape processing (if selected) should occur as part of the 
> parsing, just as for ordinary escape processing.
> The class can be made public so the user can wrap the input if required; this 
> preserves the existing functionality should it be required, so there is no 
> need to introduce another setting.
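
The behaviour described in the issue can be sketched as follows. This is a
minimal illustration, not the Commons CSV code: unescapeUnicode is a
hypothetical stand-in for the reader, and countRecords is a deliberately naive
splitter that ignores quoting.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeOrderDemo {

    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Hypothetical stand-in for the reader: decodes every backslash-u escape.
    static String unescapeUnicode(String s) {
        Matcher m = UNICODE_ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Naive record count: splits on CRLF only and ignores quoted fields.
    static int countRecords(String input) {
        return input.split("\r\n", -1).length;
    }

    public static void main(String[] args) {
        // One line of input holding the escaped form of CR and LF.
        String raw = "a,b\\u000D\\u000Ac,d";
        System.out.println(countRecords(raw));                  // 1
        System.out.println(countRecords(unescapeUnicode(raw))); // 2
    }
}
```

Unescaping before the split turns one record into two, which is the
inconsistency with ordinary escapes that the issue points out.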

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
