[ https://issues.apache.org/jira/browse/HDFS-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399736#comment-15399736 ]
Rushabh S Shah commented on HDFS-8224:
--------------------------------------

The exception in this jira occurs in the BlockSender constructor:
{noformat}
blockSender = new BlockSender(b, 0, b.getNumBytes(), false, false, true,
    DataNode.this, null, cachingStrategy);
{noformat}
The exception mentioned in HDFS-10627 occurs at:
{noformat}
// send data & checksum
blockSender.sendBlock(out, unbufOut, null);
{noformat}
For this jira, I was thinking as follows:
{code:title=DataChecksum.java|borderStyle=solid}
public static DataChecksum newDataChecksum(DataInputStream in)
    throws IOException {
  int type = in.readByte();
  int bpc = in.readInt();
  DataChecksum summer = newDataChecksum(Type.valueOf(type), bpc);
  if (summer == null) {
    throw new IOException("Could not create DataChecksum of type " + type
        + " with bytesPerChecksum " + bpc);
  }
  return summer;
}
{code}
We could throw a _TypeZeroException_ (which of course extends IOException) instead of a plain IOException when {{summer == null}}, since {{summer}} will be null only if {{bytesPerChecksum <= 0}}:
{code:title=DataChecksum.java|borderStyle=solid}
public static DataChecksum newDataChecksum(Type type, int bytesPerChecksum) {
  if (bytesPerChecksum <= 0) {
    return null;
  }
  switch (type) {
  case NULL:
    return new DataChecksum(type, new ChecksumNull(), bytesPerChecksum);
  case CRC32:
    return new DataChecksum(type, newCrc32(), bytesPerChecksum);
  case CRC32C:
    return new DataChecksum(type, new PureJavaCrc32C(), bytesPerChecksum);
  default:
    return null;
  }
}
{code}
In the DataTransfer#run method, we could either wrap the BlockSender constructor in its own try block and check whether the thrown exception is an instance of _TypeZeroException_, or do that check in the existing catch block. If it is a _TypeZeroException_, we add the block to the scanning queue and keep the remaining logic as it is.

[~jojochuang]: Any thoughts?
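A minimal, self-contained sketch of the control flow being proposed. _TypeZeroException_ is the hypothetical name used in this comment; the DataNode plumbing is reduced to plain Java stand-ins (not the real Hadoop classes) so the exception-discrimination idea is visible on its own:
{code:title=TypeZeroDemo.java (sketch)|borderStyle=solid}
import java.io.IOException;

public class TypeZeroDemo {
  // Hypothetical exception proposed above; it extends IOException so
  // existing callers that catch IOException keep working unchanged.
  static class TypeZeroException extends IOException {
    TypeZeroException(String msg) { super(msg); }
  }

  // Simplified stand-in for DataChecksum.newDataChecksum: throws the
  // narrower type when the metadata header is corrupt (bytesPerChecksum <= 0).
  static void newDataChecksum(int type, int bpc) throws IOException {
    if (bpc <= 0) {
      throw new TypeZeroException("Could not create DataChecksum of type "
          + type + " with bytesPerChecksum " + bpc);
    }
    // normal checksum construction elided
  }

  // Simplified stand-in for the catch block in DataTransfer#run:
  // TypeZeroException means the block itself is corrupt, so it goes to
  // the scanning queue; any other IOException still triggers the disk check.
  static String transfer(int type, int bpc) {
    try {
      newDataChecksum(type, bpc);
      return "transferred";
    } catch (TypeZeroException tze) {
      return "queue block for scanning";   // corrupt block, not a disk fault
    } catch (IOException ioe) {
      return "checkDiskErrorAsync";        // possible disk problem
    }
  }

  public static void main(String[] args) {
    System.out.println(transfer(0, 0));    // corrupt metadata header
    System.out.println(transfer(1, 512));  // healthy case
  }
}
{code}
Note the catch order matters: the narrower _TypeZeroException_ must be caught before the general IOException, or the compiler routes everything through the broad handler.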
> Any IOException in DataTransfer#run() will run diskError thread even if it is
> not disk error
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8224
>                 URL: https://issues.apache.org/jira/browse/HDFS-8224
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>             Fix For: 2.8.0
>
>
> This happened in our 2.6 cluster.
> One of the blocks and its metadata file were corrupted.
> The disk was healthy in this case; only the block was corrupt.
> Namenode tried to copy that block to another datanode but failed with the
> following stack trace:
> 2015-04-20 01:04:04,421 [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@11319bc4] WARN datanode.DataNode: DatanodeRegistration(a.b.c.d, datanodeUuid=e8c5135c-9b9f-4d05-a59d-e5525518aca7, infoPort=1006, infoSecurePort=0, ipcPort=8020, storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571):Failed to transfer BP-xxx-1351096255769:blk_2697560713_1107108863999 to a1.b1.c1.d1:1004 got
> java.io.IOException: Could not create DataChecksum of type 0 with bytesPerChecksum 0
>         at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:125)
>         at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:175)
>         at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:140)
>         at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readDataChecksum(BlockMetadataHeader.java:102)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:287)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1989)
>         at java.lang.Thread.run(Thread.java:722)
> The following catch block in the DataTransfer#run method treats every
> IOException as a disk fault and runs the disk error check:
> {noformat}
> catch (IOException ie) {
>   LOG.warn(bpReg + ":Failed to transfer " + b + " to " + targets[0]
>       + " got ", ie);
>   // check if there are any disk problem
>   checkDiskErrorAsync();
> }
> {noformat}
> This block was never scanned by BlockPoolSliceScanner; otherwise it would
> have been reported as a corrupt block.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)