[jira] Commented: (HADOOP-788) Streaming should use a subclass of TextInputFormat for reading text inputs.

Hadoop QA (JIRA) Wed, 31 Jan 2007 02:34:27 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468970 ]


Hadoop QA commented on HADOOP-788:
----------------------------------

+1, because the patch at
http://issues.apache.org/jira/secure/attachment/12349008/Hadoop-788.patch
applied and tested successfully against trunk revision r501616.

> Streaming should use a subclass of TextInputFormat for reading text inputs.
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-788
>                 URL: https://issues.apache.org/jira/browse/HADOOP-788
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Owen O'Malley
>         Assigned To: Sanjay Dahiya
>         Attachments: Hadoop-788.patch
>
>
> Currently, streaming uses a lot of custom code for processing text inputs.
> I propose:
>  1. Move class LineRecordReader out of TextInputFormat.
>  2. Make class StreamLineRecordReader extend LineRecordReader.
>  3. Have StreamLineRecordReader use LineRecordReader.next to read the lines
> and split them on tab to generate a Text/Text key/value pair.
> This will remove a lot of code from streaming and give it automatic support
> for the compression codecs that the "base" part of Hadoop enjoys. In
> particular, if the native zlib code is used, it will remove the 2 GB limit on
> compressed files.
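
The three proposed steps can be sketched in plain Java without any Hadoop
dependency. This is only an illustration, not the attached patch:
SimpleLineReader and TabSplitLineReader are hypothetical stand-ins for
LineRecordReader and StreamLineRecordReader, String replaces Text, and
splitting on the first tab (with an empty value when no tab is present) is
an assumption, since the description only says "splits them on tab".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.AbstractMap;
import java.util.Map;

class TabSplitDemo {

    // Hypothetical stand-in for LineRecordReader: returns one line per call.
    static class SimpleLineReader {
        private final BufferedReader in;

        SimpleLineReader(BufferedReader in) {
            this.in = in;
        }

        // Returns the next line, or null at end of input.
        String next() throws IOException {
            return in.readLine();
        }
    }

    // Hypothetical stand-in for StreamLineRecordReader: reuses the inherited
    // next() for line reading and only adds the key/value split.
    static class TabSplitLineReader extends SimpleLineReader {

        TabSplitLineReader(BufferedReader in) {
            super(in);
        }

        // Returns the next key/value pair, or null at end of input.
        Map.Entry<String, String> nextPair() throws IOException {
            String line = next();
            return line == null ? null : splitOnTab(line);
        }

        // Assumption: split on the FIRST tab; a line without a tab becomes
        // a key with an empty value.
        static Map.Entry<String, String> splitOnTab(String line) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                return new AbstractMap.SimpleEntry<>(line, "");
            }
            return new AbstractMap.SimpleEntry<>(
                    line.substring(0, tab), line.substring(tab + 1));
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in =
                new BufferedReader(new StringReader("a\t1\nb-only\n"));
        TabSplitLineReader reader = new TabSplitLineReader(in);
        Map.Entry<String, String> pair;
        while ((pair = reader.nextPair()) != null) {
            System.out.println(pair.getKey() + " => " + pair.getValue());
        }
    }
}
```

Because the subclass never touches the underlying stream itself, any
decompression handled by the base reader would apply to it without extra
code, which is the benefit the description points to.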

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
