[ https://issues.apache.org/jira/browse/MAPREDUCE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved MAPREDUCE-1264.
-----------------------------------------
    Resolution: Incomplete

Closing this as stale.

> Error Recovery failed, task will continue but run forever as new data only comes in very, very slowly
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1264
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1264
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Thibaut
>
> Hi,
> Sometimes some of my jobs will not finish and will run forever (it normally happens in the reducers, on a random basis). I have to manually fail the task so that it is restarted and can finish.
> The error log on the node is full of entries like:
> java.io.IOException: Error Recovery for block blk_-8036012205502614140_21582139 failed because recovery from primary datanode 192.168.0.3:50011 failed 6 times. Pipeline was 192.168.0.3:50011. Aborting...
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2582)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2076)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2239)
> The error entries all refer to the same data block.
> Unfortunately, the reduce function still seems to be called in the reducer with valid data (although very, very slowly), so the task is never killed and restarted and takes forever to run!
> If I kill the task, the job finishes without any problems.
> I experienced the same problem under version 0.20.0 as well.
> Thanks,
> Thibaut

--
This message was sent by Atlassian JIRA
(v6.2#6252)
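The manual workaround the reporter describes (failing the stuck task so the JobTracker reschedules it) can be scripted with the `hadoop job` CLI of that era rather than done through the web UI. A minimal sketch, assuming a Hadoop 0.20-style cluster; the job and attempt IDs below are made-up placeholders:

```shell
# List running reduce attempts for the job (IDs here are placeholders).
hadoop job -list-attempt-ids job_200912080101_0001 reduce running

# Fail the stuck attempt so it is rescheduled on another node.
# -fail-task counts against the attempt-failure limit; use -kill-task
# instead if you do not want the failure counted.
hadoop job -fail-task attempt_200912080101_0001_r_000003_0
```

Since the report notes the job completes normally once the hung attempt is killed, failing the attempt (rather than the whole job) is the least disruptive recovery.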