[jira] Commented: (HADOOP-439) Streaming does not work for text data if the records don't fit in a short UTF8 [2^16/3 characters]

Sameer Paranjpye (JIRA) Tue, 15 Aug 2006 16:04:11 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-439?page=comments#action_12428258 ]
Sameer Paranjpye commented on HADOOP-439:
-----------------------------------------


This should be resolvable by replacing UTF8 with the new Text class: Streaming 
should use Text instead of UTF8 to represent strings.

> Streaming does not work for text data if the records don't fit in a short 
> UTF8 [2^16/3 characters]
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-439
>                 URL: http://issues.apache.org/jira/browse/HADOOP-439
>             Project: Hadoop
>          Issue Type: Bug
>    Affects Versions: 0.5.0
>            Reporter: Dick King
>            Priority: Critical
>             Fix For: 0.6.0
>
>
> The streaming code internally reads the input data into a UTF8. This causes 
> truncated data to be shipped to the mapper when an input record exceeds about 
> 21000 characters (2^16/3), with no notice to the user except possibly in the 
> logs on individual task machines, which people would not normally read for 
> apparently successful jobs.
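
The 2^16/3 figure in the issue title can be checked with a small sketch. It does
not use Hadoop's UTF8 class. As a stand-in it uses java.io.DataOutputStream.writeUTF,
which also stores the encoded length in an unsigned 16-bit prefix and spends up to
3 bytes per character. One difference to keep in mind: writeUTF rejects an oversized
string with an exception, whereas the issue reports silent truncation. The class
name ShortUtf8Limit and its helper methods are hypothetical, for illustration only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ShortUtf8Limit {
    // Largest byte count an unsigned 16-bit length prefix can describe.
    static final int MAX_ENCODED_BYTES = (1 << 16) - 1; // 65535

    // Worst case: every character takes 3 bytes in the encoded form.
    static int maxCharsWorstCase() {
        return MAX_ENCODED_BYTES / 3;
    }

    // True if the record can be written behind a 2-byte length prefix.
    // Uses java.io's writeUTF as a stand-in, not Hadoop's UTF8 class.
    static boolean fitsInShortPrefix(String record) {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(record);
            return true;
        } catch (UTFDataFormatException e) {
            // Encoded form is longer than 65535 bytes.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of n copies of c.
    static String repeat(char c, int n) {
        char[] buf = new char[n];
        Arrays.fill(buf, c);
        return new String(buf);
    }

    public static void main(String[] args) {
        char threeByteChar = '\u20AC'; // euro sign, 3 bytes when encoded
        int limit = maxCharsWorstCase();
        System.out.println("worst-case character limit: " + limit);
        System.out.println("at limit fits:   " + fitsInShortPrefix(repeat(threeByteChar, limit)));
        System.out.println("over limit fits: " + fitsInShortPrefix(repeat(threeByteChar, limit + 1)));
    }
}
```

For pure ASCII input the same prefix allows up to 65535 characters, so the
"about 21000" threshold is the worst case rather than a fixed cutoff.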

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
