[
https://issues.apache.org/jira/browse/MAPREDUCE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847610#action_12847610
]
Rocio Delgado commented on MAPREDUCE-1264:
------------------------------------------
I have experienced the same issue under version 0.18.3.
> Error Recovery failed, task will continue but run forever as new data only comes in very very slowly
> ----------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-1264
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1264
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 0.20.1
> Reporter: Thibaut
>
> Hi,
> Sometimes some of my jobs will not finish and will run forever (it usually
> happens in the reducers, on a random basis). I have to manually fail the
> stuck task so that it is restarted and can finish.
> The error log on the node is full of entries like:
> java.io.IOException: Error Recovery for block blk_-8036012205502614140_21582139 failed because recovery from primary datanode 192.168.0.3:50011 failed 6 times. Pipeline was 192.168.0.3:50011. Aborting...
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2582)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2076)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2239)
> (the same exception and stack trace repeat over and over)
> The error entries all refer to the same data block.
> Unfortunately, the reduce function still seems to be called in the reducer
> with valid data (although very, very slowly), so the task is never killed
> and restarted and takes forever to run!
> If I kill the task, the job will finish without any problems.
> I experienced the same problem under version 0.20.0 as well.
> Thanks,
> Thibaut
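For anyone hitting this: the workaround described above (manually failing the stuck reduce attempt so the framework reschedules it) can be done from the command line with the `hadoop job` subcommands. A sketch; the job and attempt IDs below are placeholders, not values from this report:

```shell
# Find the stuck attempt in the JobTracker web UI (or the job's task list),
# then fail it so the framework schedules a fresh attempt on another node.
# The attempt ID below is a placeholder -- substitute the real one.
hadoop job -fail-task attempt_200912081234_0001_r_000003_0

# Alternatively, -kill-task also stops the attempt; unlike -fail-task,
# a killed attempt does not count against the task's maximum attempt limit.
hadoop job -kill-task attempt_200912081234_0001_r_000003_0
```

These commands require a running cluster and a live job, so they are shown here only as a usage sketch.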