[ https://issues.apache.org/jira/browse/HADOOP-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468970 ]
Hadoop QA commented on HADOOP-788:
----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12349008/Hadoop-788.patch applied and successfully tested against trunk revision r501616.

> Streaming should use a subclass of TextInputFormat for reading text inputs.
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-788
>                 URL: https://issues.apache.org/jira/browse/HADOOP-788
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Owen O'Malley
>         Assigned To: Sanjay Dahiya
>         Attachments: Hadoop-788.patch
>
>
> Currently streaming uses a lot of custom code for processing text inputs.
> I propose:
>   1. Move class LineRecordReader out of TextInputFormat.
>   2. Make class StreamLineRecordReader extend LineRecordReader.
>   3. StreamLineRecordReader uses LineRecordReader.next to read the lines and
>      splits them on tab to generate a Text/Text key/value pair.
> This will remove a lot of code from streaming and give it automatic support
> for the compression codecs that the "base" part of Hadoop enjoys. In
> particular, if the native zlib code is used, it will remove the 2gb limit on
> compressed files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
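The layering proposed in steps 2 and 3 can be illustrated without Hadoop on the classpath. The classes below are hypothetical stand-ins, not Hadoop's API: `SimpleLineReader` plays the role of `LineRecordReader` (one line per call), and `TabKeyValueReader` plays the role of `StreamLineRecordReader`, adding only the tab split. The split rule used here (key is everything before the first tab, value is the rest, and a line with no tab becomes a key with an empty value) is an assumption for illustration, not taken from the attached patch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical stand-in for LineRecordReader: yields one line of text per call.
class SimpleLineReader {
    private final BufferedReader in;

    SimpleLineReader(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next line without its terminator, or null at end of input. */
    String nextLine() throws IOException {
        return in.readLine();
    }
}

// Hypothetical analogue of StreamLineRecordReader: reuses the base class for
// all line reading and only adds the tab split into a key/value pair.
class TabKeyValueReader extends SimpleLineReader {
    TabKeyValueReader(BufferedReader in) {
        super(in);
    }

    /** Returns {key, value}, or null at end of input. */
    String[] nextPair() throws IOException {
        String line = nextLine();
        if (line == null) {
            return null;
        }
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // Assumed convention: no tab means the whole line is the key.
            return new String[] { line, "" };
        }
        // Split on the FIRST tab only; later tabs stay in the value.
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }
}

public class Main {
    public static void main(String[] args) throws IOException {
        TabKeyValueReader reader = new TabKeyValueReader(
            new BufferedReader(new StringReader("k1\tv1\nk2\tv2\tx\nnotab\n")));
        String[] kv;
        while ((kv = reader.nextPair()) != null) {
            System.out.println("key=" + kv[0] + " value=" + kv[1]);
        }
    }
}
```

The point of the design is that the subclass owns no I/O logic of its own, so any improvement to the base line reader (for example, reading through a decompressing stream) is inherited without changes to the streaming code.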