[ https://issues.apache.org/jira/browse/HADOOP-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468970 ]
Hadoop QA commented on HADOOP-788:
----------------------------------
+1, because
http://issues.apache.org/jira/secure/attachment/12349008/Hadoop-788.patch
applied and successfully tested against trunk revision r501616.
> Streaming should use a subclass of TextInputFormat for reading text inputs.
> ---------------------------------------------------------------------------
>
> Key: HADOOP-788
> URL: https://issues.apache.org/jira/browse/HADOOP-788
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/streaming
> Reporter: Owen O'Malley
> Assigned To: Sanjay Dahiya
> Attachments: Hadoop-788.patch
>
>
> Currently streaming uses a lot of custom code for processing text inputs.
> I propose:
> 1. Move class LineRecordReader out of TextInputFormat.
> 2. Make class StreamLineRecordReader extend LineRecordReader.
> 3. Have StreamLineRecordReader use LineRecordReader.next to read the lines
> and split them on tab to generate a Text/Text key/value pair.
> This will remove a lot of code from streaming and give it automatic support
> for the compression codecs that the "base" part of Hadoop enjoys. In
> particular, if the native zlib code is used, it will remove the 2 GB limit on
> compressed files.
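
Steps 2 and 3 of the proposal above might look roughly like the following
sketch. It is not the attached patch and does not use the real Hadoop
classes: SketchLineReader and SketchStreamLineReader are hypothetical
stand-ins for LineRecordReader and StreamLineRecordReader, plain Strings
stand in for Text, and the handling of a line with no tab (whole line as
key, empty value) is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical stand-in for LineRecordReader: hands back one line per call.
class SketchLineReader {
    private final BufferedReader in;

    SketchLineReader(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next line without its terminator, or null at end of input. */
    String nextLine() throws IOException {
        return in.readLine();
    }
}

// Hypothetical stand-in for StreamLineRecordReader: reuses the parent's line
// reading and only adds the tab split.
class SketchStreamLineReader extends SketchLineReader {
    SketchStreamLineReader(BufferedReader in) {
        super(in);
    }

    /** Splits on the first tab. Assumption: with no tab, the value is empty. */
    static String[] splitOnTab(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    /** Returns a {key, value} pair, or null at end of input. */
    String[] nextKeyValue() throws IOException {
        String line = nextLine();
        return line == null ? null : splitOnTab(line);
    }
}

class StreamSplitDemo {
    public static void main(String[] args) throws IOException {
        BufferedReader in =
            new BufferedReader(new StringReader("k1\tv1\nno-tab-line\n"));
        SketchStreamLineReader reader = new SketchStreamLineReader(in);
        String[] kv;
        while ((kv = reader.nextKeyValue()) != null) {
            System.out.println("key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```

Because the subclass never opens or decodes the input itself, anything the
parent reader learns to do (such as reading through a compression codec)
would be inherited without changes to the splitting code, which is the
benefit the proposal describes.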