[ http://issues.apache.org/jira/browse/HADOOP-439?page=all ]
Sameer Paranjpye updated HADOOP-439:
------------------------------------

    Component/s: contrib/streaming

> Streaming does not work for text data if the records don't fit in a short UTF8 [2^16/3 characters]
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-439
>                 URL: http://issues.apache.org/jira/browse/HADOOP-439
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>    Affects Versions: 0.5.0
>            Reporter: Dick King
>         Assigned To: Hairong Kuang
>            Priority: Critical
>
> The streaming code internally reads the input data into a UTF8. This causes
> truncated data to be shipped to the mapper when the input exceeds about 21000
> characters, with no notice to the user except possibly in individual tasks'
> machines' logs, which people would not normally read for apparently
> successful jobs.
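
The 2^16/3 figure in the summary follows from a 2-byte length prefix: it can describe at most 65535 encoded bytes, and a single Java char can take up to 3 bytes in UTF-8, so only 65535 / 3 = 21845 characters are guaranteed to fit. The sketch below illustrates that arithmetic with the JDK's own length-prefixed format, DataOutputStream.writeUTF, which is an analogous format and not Hadoop's UTF8 class. Note one difference: the JDK call rejects an oversized string with an exception, whereas the issue reports that streaming ships truncated data. The class name ShortUtf8Limit and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration class; not part of Hadoop.
public class ShortUtf8Limit {
    // A 2-byte unsigned length prefix can describe at most 65535 encoded bytes.
    static final int MAX_ENCODED_BYTES = 0xFFFF;

    // Worst case is 3 encoded bytes per Java char, so this many chars always fit.
    static int maxSafeChars() {
        return MAX_ENCODED_BYTES / 3;
    }

    // True if the JDK's 2-byte-length-prefixed writer accepts the string.
    // This uses DataOutputStream.writeUTF as a stand-in for a "short UTF8" record.
    static boolean fitsInShortUtf(String s) {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            // Encoded length exceeded 65535 bytes.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] buf = new char[n];
        Arrays.fill(buf, c);
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(maxSafeChars());                           // 21845
        System.out.println(fitsInShortUtf(repeat('a', 65535)));       // true: 1 byte per char
        System.out.println(fitsInShortUtf(repeat('a', 65536)));       // false
        System.out.println(fitsInShortUtf(repeat('\u20AC', 21845)));  // true: 3 bytes per char
        System.out.println(fitsInShortUtf(repeat('\u20AC', 21846)));  // false
    }
}
```

The ASCII case shows why the limit is data-dependent: pure ASCII records fit up to 65535 characters, while records made of 3-byte characters overflow just past 21845, which is consistent with the "about 21000 characters" threshold in the description.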