[
https://issues.apache.org/jira/browse/HADOOP-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582951#action_12582951
]
Amar Kamat commented on HADOOP-2893:
------------------------------------
The first stack trace indicates that the checksum error occurs while serving
the map output, while the second occurs in the reduce phase. But in both cases
the file resides on the same disk (#4)!
This is also true for the log messages from Koji (disk #3). I think it's a
disk-related problem. I haven't seen this error on trunk for a long time. Can
someone verify this? Can someone attach logs so that it can be analyzed further?
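To illustrate why a failing disk surfaces as a ChecksumException here: Hadoop verifies each data chunk against a stored checksum on read (FSInputChecker.verifySum in the traces below), so even a single flipped bit in the on-disk map output fails verification. A minimal sketch of that idea using CRC32 (this is not Hadoop's actual code; the class and method names are made up for illustration):

```java
import java.util.zip.CRC32;

// Hypothetical sketch of per-chunk checksum verification, in the spirit of
// FSInputChecker.verifySum. Not Hadoop's real implementation.
public class ChunkChecksum {
    // Compute the CRC32 of a chunk, as would be stored alongside the data.
    public static long checksum(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        return crc.getValue();
    }

    // True if the chunk matches its stored checksum; a corrupt byte
    // (e.g. flipped by a bad disk) makes this false.
    public static boolean verifyChunk(byte[] chunk, long storedSum) {
        return checksum(chunk) == storedSum;
    }

    public static void main(String[] args) {
        byte[] good = "map output chunk".getBytes();
        long sum = checksum(good);           // checksum written at map time
        byte[] corrupt = good.clone();
        corrupt[3] ^= 0x01;                  // simulate a single-bit disk error
        System.out.println(verifyChunk(good, sum));    // true
        System.out.println(verifyChunk(corrupt, sum)); // false
    }
}
```

If the same physical disk keeps producing mismatches while re-runs on other nodes succeed, the data was likely corrupted at rest, which is consistent with the single-disk pattern above.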
> checksum exceptions on trunk
> ----------------------------
>
> Key: HADOOP-2893
> URL: https://issues.apache.org/jira/browse/HADOOP-2893
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs
> Affects Versions: 0.15.3, 0.17.0
> Reporter: lohit vijayarenu
>
> While running jobs like Sort/WordCount on trunk I see a few task failures
> with a ChecksumException.
> Re-running the tasks on different nodes succeeds.
> Here is the stack
> {noformat}
> Map output lost, rescheduling: getMapOutput(task_200802251721_0004_m_000237_0,29) failed :
> org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/4/mapred-tt/mapred-local/task_200802251721_0004_m_000237_0/file.out at 2085376
> at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:276)
> at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
> at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
> at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
> at java.io.DataInputStream.read(DataInputStream.java:132)
> at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2299)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
> at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
> at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
> at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
> at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
> at org.mortbay.http.HttpServer.service(HttpServer.java:954)
> at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
> at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
> at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
> at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
> at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
> at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> {noformat}
> another stack
> {noformat}
> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/4/mapred-tt/mapred-local/task_200802251721_0004_r_000110_0/map_367.out at 21884416
> at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:276)
> at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
> at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
> at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
> at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
> at java.io.DataInputStream.readFully(DataInputStream.java:178)
> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
> at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1930)
> at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2958)
> at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2716)
> at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:209)
> at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:177)
> ... 5 more
> {noformat}
> Both failures involve local files.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.