Muhammad created CRUNCH-564:
-------------------------------
Summary: Add support for using escape character same as open/close
quote character
Key: CRUNCH-564
URL: https://issues.apache.org/jira/browse/CRUNCH-564
Project: Crunch
Issue Type: Improvement
Components: Core
Reporter: Muhammad
Assignee: Josh Wills
Priority: Trivial
As a user, I would like to use CSVInputFormat to handle CSV files that follow
RFC 4180: http://www.ietf.org/rfc/rfc4180.txt.
Many developers use the Apache Commons Lang StringEscapeUtils.escapeCsv()
method to escape their CSVs. The method follows RFC 4180, so a double quote
inside a quoted field is written as two double quotes.
https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
CSVLineReader currently throws an exception when the escape character is the
same as the quote character. We can enhance the code to support CSVs that use
one character for both.
https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/io/text/csv/CSVLineReader.java#L152
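To illustrate the behavior being requested, here is a minimal sketch of
splitting a single record when the escape character equals the quote
character, as in RFC 4180. The class and method names are hypothetical and
are not part of Crunch; it handles one already-assembled record and does not
attempt to reproduce how CSVLineReader finds record boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Crunch: splits one CSV record whose
// escape character is the same as its quote character (RFC 4180 style).
final class Rfc4180RecordSplitter {

    static List<String> split(String record, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == quote) {
                    // A quote inside a quoted field is either an escaped
                    // quote (when doubled) or the end of the quoted field.
                    boolean doubled = i + 1 < record.length()
                            && record.charAt(i + 1) == quote;
                    if (doubled) {
                        current.append(quote);
                        i++; // skip the second quote of the pair
                    } else {
                        inQuotes = false;
                    }
                } else {
                    // Delimiters and newlines are literal inside quotes.
                    current.append(c);
                }
            } else if (c == quote) {
                inQuotes = true;
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field");
        }
        fields.add(current.toString());
        return fields;
    }
}
```

The key point is the one-character lookahead: a quote seen inside a quoted
field is ambiguous only until the next character is inspected, so sharing the
escape and quote characters does not by itself make the format unparseable.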
I would appreciate a comment if someone has knowingly rejected this idea
because of a technical limitation or a problem with allowing the escape and
quote characters to be the same. By the way, Apache HAWQ seems to get around
this issue somehow and reads such CSVs without trouble.