[ https://issues.apache.org/jira/browse/HADOOP-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531171 ]
Raghu Angadi commented on HADOOP-1955:
--------------------------------------
Yes, this is an issue.
Koji, as a crude workaround, could you try reading the file? If the read
succeeds, you could manually remove the corrupt source block.
> Corrupted block replication retries for ever
> --------------------------------------------
>
> Key: HADOOP-1955
> URL: https://issues.apache.org/jira/browse/HADOOP-1955
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.14.1
> Reporter: Koji Noguchi
> Assignee: Raghu Angadi
> Priority: Blocker
> Fix For: 0.14.2
>
>
> When replicating a corrupted block, the receiving side rejects the block due
> to a checksum error. The namenode keeps retrying (always with the same source
> datanode), and fsck reports those blocks as under-replicated.
> [Namenode log]
> {noformat}
> 2007-09-27 02:00:05,273 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.heartbeatCheck: lost heartbeat from 99.2.99.111
> ...
> 2007-09-27 02:01:02,618 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 99.9.99.11:9999 to replicate
> blk_-5925066143536023890 to datanode(s) 99.9.99.37:9999
> 2007-09-27 02:10:03,843 WARN org.apache.hadoop.fs.FSNamesystem:
> PendingReplicationMonitor timed out block blk_-5925066143536023890
> 2007-09-27 02:10:08,248 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 99.9.99.11:9999 to replicate
> blk_-5925066143536023890 to datanode(s) 99.9.99.35:9999
> 2007-09-27 02:20:03,848 WARN org.apache.hadoop.fs.FSNamesystem:
> PendingReplicationMonitor timed out block blk_-5925066143536023890
> 2007-09-27 02:20:08,646 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 99.9.99.11:9999 to replicate
> blk_-5925066143536023890 to datanode(s) 99.9.99.19:9999
> (repeats)
> {noformat}
> [Datanode(sender) 99.9.99.11 log]
> {noformat}
> 2007-09-27 02:01:04,493 INFO org.apache.hadoop.dfs.DataNode: Starting thread
> to transfer block blk_-5925066143536023890 to
> [Lorg.apache.hadoop.dfs.DatanodeInfo;@e58187
> 2007-09-27 02:01:05,153 WARN org.apache.hadoop.dfs.DataNode: Failed to
> transfer blk_-5925066143536023890 to 74.6.128.37:50010 got
> java.net.SocketException: Connection reset
> at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.dfs.DataNode.sendBlock(DataNode.java:1231)
> at org.apache.hadoop.dfs.DataNode$DataTransfer.run(DataNode.java:1280)
> at java.lang.Thread.run(Thread.java:619)
> (repeats)
> {noformat}
> [Datanode(one of the receiver) 99.9.99.37 log]
> {noformat}
> 2007-09-27 02:01:05,150 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver:
> java.io.IOException: Unexpected checksum mismatch while writing
> blk_-5925066143536023890 from /74.6.128.33:57605
> at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:902)
> at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
> at java.lang.Thread.run(Thread.java:619)
> {noformat}
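The receiver-side rejection and the namenode retry loop described above can be modeled in a few lines. The following is a minimal, hypothetical Java sketch, not Hadoop code. It assumes one CRC32 value per 512-byte chunk. The class, method, and datanode names (`CorruptReplicationSketch`, `replicate`, `dn-A`, ...) are invented for illustration. The sketch shows why rotating only the target cannot help: the corrupt bytes always come from the same source, so every receiver recomputes the same mismatching checksum. The `maxAttempts` cap exists only so the sketch terminates. Per the report, the namenode in 0.14.1 had no such cap.

```java
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical model of HADOOP-1955; names are illustrative, not Hadoop's API. */
public class CorruptReplicationSketch {

    // Assumption for this sketch: one CRC32 per 512-byte chunk.
    static final int BYTES_PER_CHECKSUM = 512;

    /** Compute one CRC32 value per chunk of the block data. */
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Receiver: accept only if every recomputed chunk checksum matches the one sent. */
    static boolean receiverAccepts(byte[] data, long[] sentSums) {
        long[] actual = checksums(data);
        if (actual.length != sentSums.length) return false;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != sentSums[i]) return false; // "Unexpected checksum mismatch"
        }
        return true;
    }

    /** Namenode: keep the same source, rotate through targets, stop after maxAttempts. */
    static boolean replicate(byte[] sourceData, long[] storedSums,
                             List<String> targets, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String target = targets.get((attempt - 1) % targets.size());
            if (receiverAccepts(sourceData, storedSums)) {
                System.out.println("attempt " + attempt + ": " + target + " accepted the block");
                return true;
            }
            System.out.println("attempt " + attempt + ": " + target
                    + " rejected the block (checksum mismatch)");
        }
        return false; // without a cap, this loop would never end
    }

    public static void main(String[] args) {
        byte[] original = new byte[4096];
        for (int i = 0; i < original.length; i++) original[i] = (byte) i;
        long[] stored = checksums(original);   // checksums recorded when the block was written

        byte[] onDisk = original.clone();
        onDisk[1000] ^= 0x01;                  // simulate silent corruption on the source datanode

        List<String> targets = List.of("dn-A", "dn-B", "dn-C"); // hypothetical targets
        System.out.println("corrupt source replicated: " + replicate(onDisk, stored, targets, 5));
        System.out.println("healthy source replicated: " + replicate(original, stored, targets, 5));
    }
}
```

With the corrupt copy, all five attempts are rejected no matter which target is chosen. With the healthy copy, the first attempt succeeds. This is why reading the file (which can fall back to a good replica) and then removing the corrupt source block works as a manual workaround.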