[ https://issues.apache.org/jira/browse/HADOOP-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629259#action_12629259 ]
Devaraj Das commented on HADOOP-4115:
-------------------------------------
Ideally, the task should have exited when this exception was thrown. By any
chance, did you get a jstack dump of the task while it was hung after the
exception? (kill -3 <pid> would also produce one.) At this point, one suspect
is that some non-daemon thread is still running and preventing the process from
exiting. The other suspect is that the task JVM is stuck for some reason in the
finally clause of TaskTracker.Child.main(). Do you know whether the TaskTracker
was reachable and whether it logged the task failure?
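
For the first suspect, here is a minimal, self-contained sketch (not Hadoop source; the class and thread names are invented for illustration) of how a leftover non-daemon thread keeps the JVM alive even after main() dies with an exception, which is exactly what a hung-but-failed task would look like:

{code:java}
// Illustrative sketch only -- not Hadoop code. A non-daemon thread that is
// still running when main() throws keeps the process alive indefinitely.
public class NonDaemonHang {
    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(() -> {
            try {
                // Stands in for a copier/merger thread blocked on I/O or a queue.
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException ignored) {
            }
        }, "leftover-worker");
        // worker.setDaemon(true); // without this, the JVM cannot exit on its own
        worker.start();

        // Stands in for the "No space left on device" failure in the task.
        throw new java.io.IOException("No space left on device");
    }
}
{code}

If that is what happened here, a jstack dump (or kill -3 <pid>) taken while the task was hung should show the offending non-daemon thread and where it is parked.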
> Reducer gets stuck in shuffle when local disk out of space
> ----------------------------------------------------------
>
> Key: HADOOP-4115
> URL: https://issues.apache.org/jira/browse/HADOOP-4115
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.2
> Reporter: Marco Nicosia
> Priority: Critical
>
> 2008-08-29 23:53:12,357 WARN org.apache.hadoop.mapred.ReduceTask: task_200808291851_0001_r_000245_0 Merging of the local FS files threw an exception: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:339)
> at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
> at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
> at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
> at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
> at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.io.SequenceFile$UncompressedBytes.writeUncompressedBytes(SequenceFile.java:617)
> at org.apache.hadoop.io.SequenceFile$Writer.appendRaw(SequenceFile.java:1038)
> at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2626)
> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:1564)
> Caused by: java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:260)
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:197)
> ... 16 more
> 2008-08-29 23:53:14,013 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: task_200808291851_0001_r_000245_0The reduce copier failed
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)