[ 
https://issues.apache.org/jira/browse/HADOOP-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652863#action_12652863
 ] 

Ruyue Ma commented on HADOOP-4620:
----------------------------------

Hi, Ravi

You can reproduce the problem as follows.

Copy a zero-length file to HDFS, then run a streaming job on that file. The 
mapper is given zero-length input. If the mapper still produces output, and 
that output is larger than 4KB, the mapper task will hang.
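
For illustration only, a mapper of the kind described above could look like the 
sketch below. It is not the script from the original report (that one was a Perl 
script); the function name emit, the record format and the 8192-byte total are 
all assumptions chosen to exceed the 4KB figure.

```python
#!/usr/bin/env python3
"""Hypothetical streaming mapper that writes output even when its input is empty."""
import sys


def emit(out, total_bytes=8192, line=b"key\tvalue\n"):
    """Write whole records to `out` until at least total_bytes are written.

    Returns the number of bytes written.
    """
    written = 0
    while written < total_bytes:
        out.write(line)
        written += len(line)
    return written


if __name__ == "__main__":
    # stdin is deliberately ignored: with a zero-length input file there is
    # nothing to read, yet the mapper still produces more than 4KB of output.
    emit(sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

Such a script would be passed to the streaming job as the mapper command, with 
the zero-length file as the job input.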

The reason is: if the mapper's input data is zero-length, streaming does not 
start the map output thread, but it still opens the mapper's stdout pipe, whose 
default size is 4KB. If the mapper generates no more than 4KB of output, it 
ends normally. If it generates more than 4KB, nothing reads from the pipe, so 
the write blocks once the pipe is full and the mapper hangs.
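
The blocking behaviour can be shown outside Hadoop with a small sketch. This is 
not the streaming code itself (PipeMapRed is Java); it only demonstrates the 
same pipe mechanics. The names CHILD, run_without_reader and run_with_reader are 
made up for this example, and the actual pipe capacity depends on the operating 
system, so the example writes far more than 4KB to be safe.

```python
import subprocess
import sys
import threading

# Hypothetical child process: ignores stdin and writes N bytes to stdout.
CHILD = "import sys; sys.stdout.write('x' * int(sys.argv[1]))"


def _start(n):
    """Start the child with its stdout connected to a pipe."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD, str(n)], stdout=subprocess.PIPE
    )


def run_without_reader(n, timeout=2.0):
    """Wait for the child without reading its stdout.

    Returns True if the child exited within the timeout, False if it was
    still blocked (it is then killed).
    """
    proc = _start(n)
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return False
    finally:
        proc.stdout.close()


def run_with_reader(n, timeout=30.0):
    """Drain the child's stdout on a separate thread while waiting for it.

    Returns (exit code, number of bytes read).
    """
    proc = _start(n)
    chunks = []

    def drain():
        # Read until EOF so the child can never fill the pipe and block.
        for chunk in iter(lambda: proc.stdout.read(4096), b""):
            chunks.append(chunk)

    reader = threading.Thread(target=drain, daemon=True)
    reader.start()
    proc.wait(timeout=timeout)
    reader.join()
    proc.stdout.close()
    return proc.returncode, sum(len(c) for c in chunks)
```

With a small output the child finishes either way; with an output larger than 
the pipe capacity, only the version with a reader thread lets it finish. That is 
the role the stdout output thread plays for the mapper.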

If you think this is a bug, I can submit a patch to fix it. 

As far as I know, the latest revision resolves the problem for the stderr 
output thread, but not for the stdout output thread.

Thanks 

> Streaming mapper never completes if the mapper does not write to stdout
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-4620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4620
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2
>            Reporter: Runping Qi
>            Assignee: Ravi Gummadi
>
> A mapper of a streaming job has empty input data and thus it produces no 
> output.
> The task never completes.
> The following are the last two lines from the task log:
> 2008-11-07 21:59:48,254 INFO org.apache.hadoop.streaming.PipeMapRed: 
> PipeMapRed exec [/usr/bin/perl, xxx]
> 2008-11-07 21:59:48,330 INFO org.apache.hadoop.streaming.PipeMapRed: 
> mapRedFinished
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
