[ https://issues.apache.org/jira/browse/CSV-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230965#comment-13230965 ]
Emmanuel Bourg commented on CSV-67:
-----------------------------------

Good point, but I'm not sure it actually happens in practice. So far the only application I have found that supports unicode escapes is HSQLDB. It can read them but doesn't write them (I checked HSQL 1.8; I'll look at 2.x). I believe these unicode escapes are typically created by a program like native2ascii, which converts only non-ASCII characters, so the line separators should be safe.

I agree on removing the unicode escape setting from CSVFormat. I would prefer submitting the reader to [io] rather than making it public in [csv], though.

> UnicodeUnescapeReader should not be applied before parsing
> ----------------------------------------------------------
>
>                 Key: CSV-67
>                 URL: https://issues.apache.org/jira/browse/CSV-67
>             Project: Commons CSV
>          Issue Type: Bug
>            Reporter: Sebb
>
> The UnicodeUnescapeReader is currently applied before the input file is parsed.
> This means that unicode escapes are treated differently from other escapes.
> For example, the sequence <esc>r<esc>n is not treated as a new-line for the
> purpose of recognising the end of a record, yet \u000D\u000A is converted to
> CRLF and would terminate the record (unless embedded in a quoted string).
> The unicode escape processing (if selected) should occur as part of the
> parsing, just as for ordinary escape processing.
> The class can be made public so the user can wrap the input if required; this
> preserves the existing functionality should it be required, so there is no
> need to introduce another setting.
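
To make the ordering problem described in the issue concrete, here is a minimal sketch in plain Java. It does not use the Commons CSV classes: UnicodeEscapeOrderDemo, unescapeUnicode, splitRecords and parseThenUnescape are hypothetical names made up for this illustration, and the record splitter is deliberately naive (no quoting, comma delimiter assumed). It only shows that unescaping before record splitting turns an escaped CRLF into a record terminator, whereas unescaping each field after splitting keeps the record intact.

```java
import java.util.ArrayList;
import java.util.List;

public class UnicodeEscapeOrderDemo {

    // Replaces each backslash-u escape (exactly four hex digits) with the
    // character it denotes; malformed sequences are copied through unchanged.
    public static String unescapeUnicode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 6 <= input.length() && input.charAt(i + 1) == 'u') {
                int code = 0;
                boolean valid = true;
                for (int k = i + 2; k < i + 6; k++) {
                    int digit = Character.digit(input.charAt(k), 16);
                    if (digit < 0) {
                        valid = false;
                        break;
                    }
                    code = code * 16 + digit;
                }
                if (valid) {
                    out.append((char) code);
                    i += 6;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    // Naive record splitter (an assumption for this sketch): CRLF, CR or LF
    // ends a record, and quoted strings are not supported.
    public static String[] splitRecords(String text) {
        return text.split("\r\n|\r|\n", -1);
    }

    // Splits into records and fields first, then unescapes each field.
    public static List<List<String>> parseThenUnescape(String text) {
        List<List<String>> records = new ArrayList<>();
        for (String record : splitRecords(text)) {
            List<String> fields = new ArrayList<>();
            for (String field : record.split(",", -1)) {
                fields.add(unescapeUnicode(field));
            }
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) {
        // One line of input containing an escaped CR and LF in the second field.
        String raw = "a,b\\u000D\\u000Ac,d";

        // Unescape before parsing: the escaped CRLF now terminates the record.
        System.out.println(splitRecords(unescapeUnicode(raw)).length); // prints 2

        // Parse first, unescape per field: still a single record.
        System.out.println(parseThenUnescape(raw).size()); // prints 1
    }
}
```

With the second approach the CRLF ends up inside the second field's value rather than splitting the record, which matches how the issue describes ordinary escape processing.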