Allow setting of end-of-record delimiter for TextInputFormat
------------------------------------------------------------
Key: MAPREDUCE-2254
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Ahmed Radwan
It would be useful to allow setting the end-of-record delimiter for
TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as
the only possible record delimiters. This is a problem if users have embedded
newlines in their data fields (which is fairly common). It is also a problem
for other tools that use TextInputFormat (see, for example,
https://issues.apache.org/jira/browse/PIG-836 and
https://issues.cloudera.org/browse/SQOOP-136).
I have written a patch to address this issue. It lets users specify any
custom end-of-record delimiter through a newly added configuration property.
For backward compatibility, if this property is not set, exactly the same
delimiters as before are used (i.e., '\n', '\r' or '\r\n').
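The following self-contained Java sketch illustrates the intended behavior. It is not the patch's code and does not use Hadoop. The class `DelimitedRecordSplitter` and its `split` method are hypothetical, the sketch works on an in-memory string rather than on input splits, and passing `null` stands in for the configuration property being absent.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only: splits an in-memory string into records,
 * either on a caller-supplied delimiter or, when none is configured, on the
 * default '\n', '\r' or "\r\n" terminators.
 */
public class DelimitedRecordSplitter {

    public static List<String> split(String data, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        if (delimiter == null || delimiter.isEmpty()) {
            // Backward-compatible default: any of \n, \r or \r\n ends a record.
            int i = 0;
            while (i < data.length()) {
                char c = data.charAt(i);
                if (c == '\n' || c == '\r') {
                    records.add(data.substring(start, i));
                    if (c == '\r' && i + 1 < data.length() && data.charAt(i + 1) == '\n') {
                        i++; // treat "\r\n" as a single terminator
                    }
                    start = i + 1;
                }
                i++;
            }
        } else {
            // Custom delimiter: only the exact delimiter string ends a record,
            // so newlines inside fields stay part of the record.
            int idx;
            while ((idx = data.indexOf(delimiter, start)) != -1) {
                records.add(data.substring(start, idx));
                start = idx + delimiter.length();
            }
        }
        if (start < data.length()) {
            records.add(data.substring(start)); // last record has no terminator
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "1,hello\nworld|2,bye|";
        System.out.println(split(input, null).size()); // default delimiters
        System.out.println(split(input, "|").size());  // custom delimiter "|"
    }
}
```

With the default delimiters, the input in `main` is broken at the newline embedded in the first record; with `"|"` as the delimiter, it yields the two intended records. A real record reader would also have to handle a delimiter that straddles a buffer or split boundary, which this sketch ignores.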