[ https://issues.apache.org/jira/browse/HADOOP-2893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12582951#action_12582951 ]

Amar Kamat commented on HADOOP-2893:
------------------------------------

The first stack trace shows the checksum error occurring while serving the map 
output, while the second occurs in the reduce phase. In both cases, however, 
the file resides on the same disk (#4)! The same holds for the log messages 
from Koji (disk #3). I think it's a disk-related problem; I haven't seen this 
error on trunk for a long time. Can someone verify this? Can someone attach 
logs so that this can be analyzed further?
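The error offsets in the traces below come from per-chunk CRC verification on the local map-output files. As a rough sketch (not Hadoop's actual FSInputChecker code, and the 512-byte chunk size is an assumption based on the usual io.bytes.per.checksum default), the check works like this: the file is read in fixed-size chunks, each chunk's CRC32 is recomputed and compared against the stored checksum, and the first mismatch is reported with its byte offset, which is why a single bad disk sector shows up as "Checksum error: <file> at <offset>".

```java
import java.util.Arrays;
import java.util.zip.CRC32;

public class ChunkChecksumSketch {
    // Illustrative chunk size; Hadoop verifies a CRC per io.bytes.per.checksum
    // bytes (512 by default). Names here are hypothetical, not Hadoop APIs.
    static final int BYTES_PER_CHECKSUM = 512;

    static long checksumOf(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    // Returns the byte offset of the first corrupt chunk, or -1 if all match,
    // analogous to the offset in "Checksum error: <file> at <offset>".
    static long verify(byte[] data, long[] storedSums) {
        int chunk = 0;
        for (int off = 0; off < data.length; off += BYTES_PER_CHECKSUM, chunk++) {
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            if (checksumOf(data, off, len) != storedSums[chunk]) {
                return off;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        Arrays.fill(data, (byte) 'x');
        long[] sums = { checksumOf(data, 0, 512), checksumOf(data, 512, 512) };
        System.out.println(verify(data, sums));   // clean read: -1
        data[700] ^= 1;   // flip one bit, as a failing disk might
        System.out.println(verify(data, sums));   // second chunk corrupt: offset 512
    }
}
```

Since the checksum is written when the map output is created and verified on every read, a mismatch at a stable offset on one disk (as in both traces below) points at the hardware rather than at the serving or merge code paths.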

> checksum exceptions on trunk
> ----------------------------
>
>                 Key: HADOOP-2893
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2893
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.15.3, 0.17.0
>            Reporter: lohit vijayarenu
>
> While running jobs like Sort/WordCount on trunk, I see a few task failures 
> with ChecksumException. Re-running the tasks on different nodes succeeds.
> Here is the stack
> {noformat}
> Map output lost, rescheduling: getMapOutput(task_200802251721_0004_m_000237_0,29) failed :
> org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/4/mapred-tt/mapred-local/task_200802251721_0004_m_000237_0/file.out at 2085376
>   at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:276)
>   at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
>   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
>   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
>   at java.io.DataInputStream.read(DataInputStream.java:132)
>   at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2299)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
>   at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
>   at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
>   at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
>   at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
>   at org.mortbay.http.HttpServer.service(HttpServer.java:954)
>   at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
>   at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
>   at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
>   at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
>   at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
>   at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
> {noformat}
> another stack
> {noformat}
> Caused by: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/4/mapred-tt/mapred-local/task_200802251721_0004_r_000110_0/map_367.out at 21884416
>   at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:276)
>   at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
>   at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
>   at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
>   at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:157)
>   at java.io.DataInputStream.readFully(DataInputStream.java:178)
>   at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:56)
>   at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
>   at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1930)
>   at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2958)
>   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2716)
>   at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:209)
>   at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:177)
>   ... 5 more
> {noformat}
> Both occur with local files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.