Streaming does not work for text data if the records don't fit in a short UTF8
[2^16/3 characters]
--------------------------------------------------------------------------------------------------
Key: HADOOP-439
URL: http://issues.apache.org/jira/browse/HADOOP-439
Project: Hadoop
Issue Type: Bug
Reporter: Dick King
Priority: Critical
The streaming code internally reads the input data into a UTF8. As a result,
truncated data is shipped to the mapper when the input exceeds about 21000
characters, with no notice to the user except possibly in the logs on the
individual task machines, which people would not normally read for apparently
successful jobs.
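
The arithmetic behind the "2^16/3 characters" figure can be sketched as
follows. This is only an illustration, not the streaming code itself: it uses
the JDK's java.io.DataOutputStream.writeUTF as a stand-in for a string format
with a 16-bit length prefix, and the class and helper names (ShortUtf8Limit,
maxGuaranteedChars, fits, repeat) are hypothetical. Note that writeUTF throws
on oversized input, whereas the behavior reported here is silent truncation.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class; not part of Hadoop.
class ShortUtf8Limit {
    // Largest byte count an unsigned 16-bit length prefix can express.
    static final int MAX_ENCODED_BYTES = (1 << 16) - 1; // 65535

    // A single Java char needs at most 3 bytes in this encoding, so this
    // many chars are guaranteed to fit whatever their content.
    static int maxGuaranteedChars() {
        return MAX_ENCODED_BYTES / 3;
    }

    // True if the record fits in one length-prefixed string as written by
    // DataOutputStream.writeUTF (assumed stand-in for a "short UTF8").
    static boolean fits(String record) {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(record);
            return true;
        } catch (UTFDataFormatException tooLong) {
            return false; // encoded form exceeds 65535 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] buf = new char[n];
        Arrays.fill(buf, c);
        return new String(buf);
    }

    public static void main(String[] args) {
        int limit = maxGuaranteedChars();
        System.out.println("guaranteed chars: " + limit);
        // U+20AC (euro sign) takes 3 bytes, the worst case per char.
        System.out.println("at limit fits:   " + fits(repeat('\u20AC', limit)));
        System.out.println("one more fits:   " + fits(repeat('\u20AC', limit + 1)));
    }
}
```

For pure ASCII records the same prefix allows up to 65535 characters, so the
point at which a record stops fitting depends on its content; 2^16/3 is the
worst-case bound.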