Streaming does not work for text data if the records don't fit in a short UTF8 
[2^16/3 characters]
--------------------------------------------------------------------------------------------------

                 Key: HADOOP-439
                 URL: http://issues.apache.org/jira/browse/HADOOP-439
             Project: Hadoop
          Issue Type: Bug
            Reporter: Dick King
            Priority: Critical


The streaming code internally reads the input data into a UTF8 object.  As a 
result, truncated data is shipped to the mapper when an input record exceeds 
about 21000 characters.  The user gets no notice of this, except possibly in 
the logs on the individual task machines, which people would not normally read 
for apparently successful jobs.
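
The "2^16/3" figure in the summary can be checked with a short sketch. This is 
not the streaming code or Hadoop's UTF8 class. It assumes only what the summary 
implies: the encoded length is stored in a 16-bit field, and a character may 
need up to 3 bytes. The class and method names are hypothetical. For 
comparison it also uses java.io.DataOutputStream.writeUTF, which has the same 
65535-byte ceiling but throws instead of truncating silently.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not taken from the Hadoop source tree.
public class Utf8LimitSketch {
    // Largest byte count that an unsigned 16-bit length field can describe.
    static final int MAX_ENCODED_BYTES = 0xFFFF; // 65535

    // Assumption: worst case of 3 encoded bytes per character.
    static int maxCharsWorstCase() {
        return MAX_ENCODED_BYTES / 3;
    }

    // True if the record's UTF-8 encoding fits under the 16-bit length limit.
    static boolean fitsInShortPrefix(String record) {
        return record.getBytes(StandardCharsets.UTF_8).length <= MAX_ENCODED_BYTES;
    }

    // writeUTF rejects strings whose encoded form exceeds 65535 bytes.
    static boolean writeUtfSucceeds(String record) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(record);
            return true;
        } catch (UTFDataFormatException e) {
            return false; // encoded length exceeded the 16-bit limit
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        int limit = maxCharsWorstCase();
        String atLimit = repeat('\u20AC', limit);       // euro sign: 3 bytes each
        String overLimit = repeat('\u20AC', limit + 1);
        System.out.println("worst-case character limit: " + limit);
        System.out.println("at limit fits:   " + fitsInShortPrefix(atLimit));
        System.out.println("over limit fits: " + fitsInShortPrefix(overLimit));
        System.out.println("writeUTF over limit succeeds: " + writeUtfSucceeds(overLimit));
    }
}
```

Records made up of single-byte characters can run to 65535 characters before 
hitting the same ceiling, so the character count at which a record stops 
fitting depends on its content.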

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
