[
https://issues.apache.org/jira/browse/MAPREDUCE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847610#action_12847610
]
Rocio Delgado commented on MAPREDUCE-1264:
------------------------------------------
I have experienced the same issue under version 0.18.3.
> Error Recovery failed, task will continue but run forever as new data only comes in very very slowly
> ----------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1264
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1264
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.20.1
> Reporter: Thibaut
>
> Hi,
> Sometimes some of my jobs will not finish and will run forever (it usually
> happens in the reducers, on a random basis). I have to manually fail the
> stuck task so that it is restarted and can finish.
> The error log on the node is full of entries like:
> java.io.IOException: Error Recovery for block blk_-8036012205502614140_21582139 failed because recovery from primary datanode 192.168.0.3:50011 failed 6 times. Pipeline was 192.168.0.3:50011. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2582)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2076)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2239)
> (the same exception and stack trace repeat over and over)
> The error entries all refer to the same data block.
> Unfortunately, the reduce function still seems to be called in the reducer
> with valid data (although very, very slowly), so the task is never killed
> and restarted and takes forever to run!
> If I kill the task, the job will finish without any problems.
> I experienced the same problem under version 0.20.0 as well.
> Thanks,
> Thibaut
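For anyone hitting this: the workaround described above (manually failing the stuck reduce attempt so the framework reschedules it) can be done from the command line with the `hadoop job` subcommands. A sketch; the job and attempt IDs below are placeholders, not values from this report:

```shell
# Find the stuck attempt in the JobTracker web UI (or the job's task list),
# then fail it so the framework schedules a fresh attempt on another node.
# The attempt ID below is a placeholder -- substitute the real one.
hadoop job -fail-task attempt_200912081234_0001_r_000003_0

# Alternatively, -kill-task also stops the attempt; unlike -fail-task,
# a killed attempt does not count against the task's maximum attempt limit.
hadoop job -kill-task attempt_200912081234_0001_r_000003_0
```

These commands require a running cluster and a live job, so they are shown here only as a usage sketch.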