[ 
https://issues.apache.org/jira/browse/HADOOP-7096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated HADOOP-7096:
---------------------------------

    Status: Patch Available  (was: Open)

> Allow setting of end-of-record delimiter for TextInputFormat
> ------------------------------------------------------------
>
>                 Key: HADOOP-7096
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7096
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ahmed Radwan
>         Attachments: 2.patch
>
>
> The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required 
> minor changes to the LineReader class to allow extensions (see attached 
> 2.patch). Description copied below:
> It would be useful to allow setting the end-of-record delimiter for 
> TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as 
> the only possible record delimiters. This is a problem if users have embedded 
> newlines in their data fields (which is pretty common). This is also a 
> problem for other tools using TextInputFormat (see, for example, 
> https://issues.apache.org/jira/browse/PIG-836 and 
> https://issues.cloudera.org/browse/SQOOP-136).
> I have written a patch to address this issue. The patch allows users to 
> specify any custom end-of-record delimiter using a newly added configuration 
> property. For backward compatibility, if this new configuration property is 
> absent, the previous delimiters are used unchanged (i.e., '\n', '\r' or 
> '\r\n').
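
For illustration only, the behavior described above could be sketched in plain Java roughly as follows. This is not the code in 2.patch and does not use the Hadoop LineReader API; the class name DelimitedRecords and the method split are hypothetical, and the delimiter argument merely stands in for the value of the new configuration property (null meaning the property is absent).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not the Hadoop LineReader implementation.
public class DelimitedRecords {

    /**
     * Splits text into records. A null delimiter models an absent
     * configuration property and falls back to '\n', '\r' or '\r\n'.
     */
    public static List<String> split(String text, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < text.length()) {
            int delimLen = matchLength(text, i, delimiter);
            if (delimLen > 0) {
                // End of record: emit it and skip past the delimiter.
                records.add(text.substring(start, i));
                i += delimLen;
                start = i;
            } else {
                i++;
            }
        }
        // Emit a trailing record that has no terminating delimiter.
        if (start < text.length()) {
            records.add(text.substring(start));
        }
        return records;
    }

    /** Returns the length of the delimiter starting at pos, or 0 if none. */
    private static int matchLength(String text, int pos, String delimiter) {
        if (delimiter != null) {
            // Custom delimiter: newlines inside a record are left untouched.
            return text.startsWith(delimiter, pos) ? delimiter.length() : 0;
        }
        char c = text.charAt(pos);
        if (c == '\n') {
            return 1;
        }
        if (c == '\r') {
            boolean crlf = pos + 1 < text.length() && text.charAt(pos + 1) == '\n';
            return crlf ? 2 : 1;
        }
        return 0;
    }
}
```

With a custom delimiter such as "|||", a field containing '\n' stays inside a single record, while passing null keeps the default line-ending behavior.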

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
