[ http://issues.apache.org/jira/browse/HADOOP-439?page=comments#action_12428258 ]

Sameer Paranjpye commented on HADOOP-439:
-----------------------------------------
This ought to be resolvable by replacing UTF8 by the new Text class. Streaming should use Text instead of UTF8 to represent strings.

> Streaming does not work for text data if the records don't fit in a short UTF8 [2^16/3 characters]
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-439
> URL: http://issues.apache.org/jira/browse/HADOOP-439
> Project: Hadoop
> Issue Type: Bug
> Affects Versions: 0.5.0
> Reporter: Dick King
> Priority: Critical
> Fix For: 0.6.0
>
>
> The streaming code internally reads the input data into a UTF8. This causes
> truncated data to be shipped to the mapper when the input exceeds about 21000
> characters, with no notice to the user except possibly in individual tasks'
> machines' logs, which people would not normally read for apparently
> successful jobs.
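
For readers wondering where the "2^16/3 characters" figure in the issue title comes from, here is a rough, language-neutral sketch of the arithmetic. It assumes a string format with an unsigned 16-bit byte-length prefix and a worst case of 3 bytes per character, and it is not the actual Hadoop UTF8 code. The names write_short_prefixed, read_short_prefixed, MAX_BYTES and MAX_CHARS are hypothetical and exist only for illustration.

```python
import struct

MAX_BYTES = 0xFFFF          # largest length an unsigned 16-bit prefix can hold
MAX_CHARS = MAX_BYTES // 3  # 21845: safe cap if every character may need 3 bytes


def write_short_prefixed(text: str) -> bytes:
    """Hypothetical writer: 2-byte big-endian length prefix, then UTF-8 bytes.

    Assumption: characters are in the Basic Multilingual Plane, so each one
    encodes to at most 3 bytes and the capped payload always fits the prefix.
    """
    if len(text) > MAX_CHARS:
        # Silent truncation: the caller gets no error, only a shorter record.
        text = text[:MAX_CHARS]
    payload = text.encode("utf-8")
    return struct.pack(">H", len(payload)) + payload


def read_short_prefixed(buf: bytes) -> str:
    """Hypothetical reader for the format produced by write_short_prefixed."""
    (length,) = struct.unpack(">H", buf[:2])
    return buf[2:2 + length].decode("utf-8")


if __name__ == "__main__":
    record = "x" * 30000  # a streaming record longer than the cap
    round_tripped = read_short_prefixed(write_short_prefixed(record))
    print(len(record), "characters in,", len(round_tripped), "characters out")
```

Under these assumptions a 30000-character record comes back with 21845 characters, which is consistent with the "about 21000 characters" threshold reported above. A format with a wider length field would not hit this particular limit.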