[ https://issues.apache.org/jira/browse/HADOOP-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629293#action_12629293 ]
Chris Douglas commented on HADOOP-4115:
---------------------------------------
Is there space available on other drives on that TT that aren't being used, or
are all of the configured drives completely out of space? Does the reduce
eventually fail and get rescheduled, or does it hang? If it hangs, is the task
ever rescheduled/speculated, or does that state persist until the job is
killed? If it fails, is it being rescheduled on the same node, ultimately (and
incorrectly) failing the job, or does the job eventually succeed?
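For the first question, a standalone check like the following, run on the TT
host, would settle it. This is just a sketch, not anything from the Hadoop
tree: the directory list and the 64 MB floor are placeholder assumptions for
the node's actual mapred.local.dir entries, not values read from the config.

{code:java}
// Minimal sketch: report usable space on each configured local directory.
// Paths and threshold below are illustrative assumptions.
import java.io.File;

public class LocalDirSpaceCheck {
  public static void main(String[] args) {
    // Hypothetical mapred.local.dir entries; substitute the TT's real ones.
    String[] localDirs = { "/grid/0/mapred/local", "/grid/1/mapred/local" };
    long floorBytes = 64L * 1024 * 1024; // arbitrary low-space floor

    for (String dir : localDirs) {
      long usable = new File(dir).getUsableSpace(); // 0 if the path is missing
      System.out.printf("%s: %d bytes usable%s%n",
          dir, usable, usable < floorBytes ? " (LOW)" : "");
    }
  }
}
{code}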
Quick aside: it would help a lot if the issue description presented an
abstract of the observed behavior; stack traces and other verbose diagnostics
are more readable (especially over email) in a comment.
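On the hang question: if the shuffle blocks waiting for the local-FS merge to
finish, an FSError that kills the merger thread before it reports back would
leave the reduce waiting forever. I haven't re-checked the 0.17 code path, so
treat the sketch below as a guess at the shape of the bug rather than a
reading of ReduceTask; the MergeMonitor class and its method names are made up
for illustration.

{code:java}
// Hypothetical monitor illustrating the suspected hang mode: the reduce
// blocks in awaitMerge() until the merger thread calls mergeFinished().
// If the merger dies without reaching that call (e.g. an FSError escapes
// before any finally block), done stays false and the waiter never wakes.
class MergeMonitor {
  private boolean done;
  private Throwable failure;

  // The merger thread must call this from a finally block so that both
  // success and failure unblock the waiting reduce.
  synchronized void mergeFinished(Throwable t) {
    done = true;
    failure = t; // null on success
    notifyAll();
  }

  // Called by the reduce; rethrows the merger's failure so the task fails
  // cleanly and can be rescheduled instead of hanging.
  synchronized void awaitMerge() throws Throwable {
    while (!done) {
      wait(); // blocks forever if mergeFinished() is never called
    }
    if (failure != null) {
      throw failure;
    }
  }
}
{code}

If that is roughly what's happening, the fix direction would be to record the
FSError and notify before the merger thread exits, so the task fails and gets
rescheduled rather than stalling.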
> Reducer gets stuck in shuffle when local disk out of space
> ----------------------------------------------------------
>
> Key: HADOOP-4115
> URL: https://issues.apache.org/jira/browse/HADOOP-4115
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.17.2
> Reporter: Marco Nicosia
> Priority: Critical
>
> 2008-08-29 23:53:12,357 WARN org.apache.hadoop.mapred.ReduceTask: task_200808291851_0001_r_000245_0 Merging of the local FS files threw an exception: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:339)
>         at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
>         at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>         at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>         at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>         at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.io.SequenceFile$UncompressedBytes.writeUncompressedBytes(SequenceFile.java:617)
>         at org.apache.hadoop.io.SequenceFile$Writer.appendRaw(SequenceFile.java:1038)
>         at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2626)
>         at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:1564)
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:260)
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:197)
>         ... 16 more
> 2008-08-29 23:53:14,013 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: task_200808291851_0001_r_000245_0The reduce copier failed
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)