[ https://issues.apache.org/jira/browse/HADOOP-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652863#action_12652863 ]
Ruyue Ma commented on HADOOP-4620:
----------------------------------
Hi Ravi,
You can reproduce the problem as follows.
Copy a zero-length file to HDFS, then run a streaming job on that file. The
mapper receives zero-length input; if it still produces output, and that
output is larger than 4KB, the mapper task will hang.
The reason is: when the mapper's input is empty, streaming does not start the
map output thread, but it still opens the mapper's stdout pipe, whose default
size is 4KB. If the mapper writes no more than 4KB, it ends normally. If it
writes more than 4KB, nothing drains the pipe, so the write blocks once the
pipe is full and the mapper hangs.
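The hang can be reproduced outside Hadoop. Below is a minimal sketch in Python, not the streaming code itself: the child process stands in for a mapper that ignores its input, and `CHILD`, `run_child`, and the byte counts are made up for illustration. The real pipe capacity depends on the OS, so the large case writes 1 MB, which should exceed it on common platforms.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for a streaming mapper: ignores stdin and
# writes the requested number of bytes to stdout.
CHILD = "import sys; sys.stdout.write('x' * int(sys.argv[1])); sys.stdout.flush()"


def run_child(n_bytes, drain, timeout=3.0):
    """Return True if the child exits within `timeout` seconds."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(n_bytes)],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
    )
    reader = None
    if drain:
        # Analogue of the output thread: keep reading so the pipe never fills.
        reader = threading.Thread(target=proc.stdout.read, daemon=True)
        reader.start()
    try:
        proc.wait(timeout=timeout)
        finished = True
    except subprocess.TimeoutExpired:
        # The child is blocked writing to a full pipe that nobody reads.
        finished = False
        proc.kill()
        proc.wait()
    if reader is not None:
        reader.join()
    proc.stdout.close()
    return finished


if __name__ == "__main__":
    print("small output, no reader:", run_child(16, drain=False))
    print("large output, no reader:", run_child(1 << 20, drain=False))
    print("large output, with reader:", run_child(1 << 20, drain=True))
```

The small write fits in the pipe buffer, so the child exits even though nobody reads it. The large write only completes when a reader drains the pipe, which is the role the stdout output thread plays in streaming.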
If you think this is a bug, I can submit a patch to correct it.
As far as I know, the latest revision resolves the problem for the stderr
output thread, but not for the stdout output thread.
Thanks
> Streaming mapper never completes if the mapper does not write to stdout
> -----------------------------------------------------------------------
>
> Key: HADOOP-4620
> URL: https://issues.apache.org/jira/browse/HADOOP-4620
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.2
> Reporter: Runping Qi
> Assignee: Ravi Gummadi
>
> A mapper of a streaming job has empty input data and thus it produces no
> output.
> The task never completes.
> The following are the last two lines from the task log:
> 2008-11-07 21:59:48,254 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/perl, xxx]
> 2008-11-07 21:59:48,330 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
>