[ https://issues.apache.org/jira/browse/HDFS-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196553#comment-13196553 ]
Ravi Prakash commented on HDFS-2848:
------------------------------------

The problem seems to be in BlockSender.java:258-285. Suppose the block was originally 100 bytes and corruption appended to it grew it to 110 bytes.

{noformat}
// end is either last byte on disk or the length for which we have a
// checksum
long end = chunkChecksum != null ? chunkChecksum.getDataLength()
    : replica.getBytesOnDisk();
if (startOffset < 0 || startOffset > end
    || (length + startOffset) > end) {
  String msg = " Offset " + startOffset + " and length " + length
      + " don't match block " + block + " ( blockLen " + end + " )";
  LOG.warn(datanode.getDNRegistrationForBP(block.getBlockPoolId()) +
      ":sendBlock() : " + msg);
  throw new IOException(msg);
}

// Ensure read offset is position at the beginning of chunk
offset = startOffset - (startOffset % chunkSize);
if (length >= 0) {
  // Ensure endOffset points to end of chunk.
  long tmpLen = startOffset + length;
  if (tmpLen % chunkSize != 0) {
    tmpLen += (chunkSize - tmpLen % chunkSize);
  }
  if (tmpLen < end) {
    // will use on-disk checksum here since the end is a stable chunk
    end = tmpLen;
  } else if (chunkChecksum != null) {
    // last chunk is changing. flag that we need to use in-memory checksum
    this.lastChunkChecksum = chunkChecksum;
  }
}
endOffset = end;
{noformat}

Then "end" here will be 110: when chunkChecksum is null, end falls back to replica.getBytesOnDisk(), and that count includes the appended bytes. The calculation of endOffset seems to miss the mark.
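To make the arithmetic concrete, here is a minimal stand-alone replay of the quoted end/endOffset logic. It rests on assumptions not stated in the comment: a chunk size of 512 bytes, a read of the block's recorded 100 bytes starting at offset 0, and chunkChecksum == null, which the hypothetical method models as a negative chunkChecksumLen. The class and method names are made up for illustration; only the arithmetic follows the excerpt.

```java
/**
 * Hypothetical replay of the end/endOffset arithmetic quoted from BlockSender.
 * Not the HDFS implementation; names and values are illustrative assumptions.
 */
public class EndOffsetReplay {

    /**
     * Mirrors the quoted logic. A negative chunkChecksumLen stands in for
     * chunkChecksum == null; otherwise it plays chunkChecksum.getDataLength().
     */
    static long computeEndOffset(long startOffset, long length, long chunkSize,
                                 long bytesOnDisk, long chunkChecksumLen) {
        boolean haveChunkChecksum = chunkChecksumLen >= 0;
        // end is either the last byte on disk or the length covered by the in-memory checksum
        long end = haveChunkChecksum ? chunkChecksumLen : bytesOnDisk;
        if (startOffset < 0 || startOffset > end || (length + startOffset) > end) {
            // The original throws IOException; an unchecked exception keeps the sketch short
            throw new IllegalArgumentException("Offset " + startOffset + " and length "
                + length + " don't match blockLen " + end);
        }
        if (length >= 0) {
            // Round the requested end up to the next chunk boundary
            long tmpLen = startOffset + length;
            if (tmpLen % chunkSize != 0) {
                tmpLen += (chunkSize - tmpLen % chunkSize);
            }
            // Only clamp when the rounded end falls strictly before 'end'
            if (tmpLen < end) {
                end = tmpLen;
            }
            // (the lastChunkChecksum branch only matters when chunkChecksum != null)
        }
        return end;
    }

    public static void main(String[] args) {
        long chunkSize = 512;    // assumed bytes-per-checksum
        long recordedLen = 100;  // block length before the corruption
        long bytesOnDisk = 110;  // 10 bytes appended by the corruption
        long endOffset = computeEndOffset(0, recordedLen, chunkSize, bytesOnDisk, -1);
        System.out.println("endOffset = " + endOffset); // prints "endOffset = 110"
    }
}
```

With bytesOnDisk = 110 the method returns 110: rounding the requested 100 bytes up to the chunk boundary gives tmpLen = 512, which is not less than end, so end is never clamped and the appended bytes stay inside the range. With bytesOnDisk = 100 it returns 100, and with a 2048-byte replica it returns 512.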
Either the endOffset calculation is at fault, or BlockSender:sendPacket() should be verifying the checksum all the way up to endOffset, which it currently does not.

> hdfs corruption appended to blocks is not detected by fs commands or fsck
> -------------------------------------------------------------------------
>
>                 Key: HDFS-2848
>                 URL: https://issues.apache.org/jira/browse/HDFS-2848
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Ravi Prakash
>            Assignee: Ravi Prakash
>
> Courtesy Pat White [~patwhitey2007]
> {quote}
> It appears that there is a regression in corrupt block detection by both fsck
> and fs cmds like 'cat'. Testcases for pre-block and block-overwrite corruption
> of all replicas are correctly reporting errors; however, post-block corruption
> is not: fsck on the filesystem reports it's Healthy and 'cat' returns without
> error. Looking at the DN blocks themselves, they clearly contain the injected
> corruption pattern.
> {quote}
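The alternative raised in the comment, verifying checksums all the way to endOffset, can be illustrated with a small stand-alone sketch. It is not the DataNode code: it uses java.util.zip.CRC32 as a stand-in for the block's checksum type, assumes 512-byte chunks, and the class and methods (ChunkChecksumCheck, checksums, verify) are hypothetical. It stores per-chunk checksums for a clean 100-byte block, appends 10 corrupt bytes, then re-verifies over two ranges.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/**
 * Hypothetical sketch of per-chunk checksum verification up to an end offset.
 * CRC32 and the 512-byte chunk size are assumptions, not HDFS internals.
 */
public class ChunkChecksumCheck {

    /** Computes one CRC32 value per chunkSize-byte chunk of data[0, len). */
    static long[] checksums(byte[] data, int len, int chunkSize) {
        int chunks = (len + chunkSize - 1) / chunkSize;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, len);
            CRC32 crc = new CRC32();
            crc.update(data, from, to - from);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns true if checksums recomputed over data[0, endOffset) match the stored ones. */
    static boolean verify(byte[] data, int endOffset, int chunkSize, long[] stored) {
        return Arrays.equals(checksums(data, endOffset, chunkSize), stored);
    }

    public static void main(String[] args) {
        int chunkSize = 512;  // assumed bytes-per-checksum
        byte[] original = new byte[100];
        Arrays.fill(original, (byte) 'a');
        // Checksums as they would have been written for the clean 100-byte block
        long[] stored = checksums(original, original.length, chunkSize);

        // Simulate corruption appended after the block's 100 bytes
        byte[] corrupted = Arrays.copyOf(original, 110);
        Arrays.fill(corrupted, 100, 110, (byte) 'X');

        System.out.println("verify up to 100: " + verify(corrupted, 100, chunkSize, stored));
        System.out.println("verify up to 110: " + verify(corrupted, 110, chunkSize, stored));
    }
}
```

Verifying only up to the recorded 100 bytes passes, because the appended tail is never read. Verifying up to the 110-byte endOffset recomputes the last chunk's checksum over the corrupt tail, and it no longer matches the stored value, so the corruption is reported.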