[ https://issues.apache.org/jira/browse/HDFS-8224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15399736#comment-15399736 ]
Rushabh S Shah commented on HDFS-8224:
--------------------------------------

The exception in this jira occurs in the BlockSender constructor:
{noformat}
blockSender = new BlockSender(b, 0, b.getNumBytes(), false, false, true,
    DataNode.this, null, cachingStrategy);
{noformat}
The exception mentioned in HDFS-10627 occurs at:
{noformat}
// send data & checksum
blockSender.sendBlock(out, unbufOut, null);
{noformat}
For this jira, I was thinking as follows:
{code:title=DataChecksum.java|borderStyle=solid}
public static DataChecksum newDataChecksum(DataInputStream in)
    throws IOException {
  int type = in.readByte();
  int bpc = in.readInt();
  DataChecksum summer = newDataChecksum(Type.valueOf(type), bpc);
  if (summer == null) {
    throw new IOException("Could not create DataChecksum of type " + type
        + " with bytesPerChecksum " + bpc);
  }
  return summer;
}
{code}
We could throw a _TypeZeroException_ (which of course extends IOException) instead of a plain IOException when {{summer == null}}, since {{summer}} will be null only if {{bytesPerChecksum <= 0}}:
{code:title=DataChecksum.java|borderStyle=solid}
public static DataChecksum newDataChecksum(Type type, int bytesPerChecksum) {
  if (bytesPerChecksum <= 0) {
    return null;
  }
  switch (type) {
  case NULL:
    return new DataChecksum(type, new ChecksumNull(), bytesPerChecksum);
  case CRC32:
    return new DataChecksum(type, newCrc32(), bytesPerChecksum);
  case CRC32C:
    return new DataChecksum(type, new PureJavaCrc32C(), bytesPerChecksum);
  default:
    return null;
  }
}
{code}
In the DataTransfer#run method, we could either wrap the BlockSender constructor in its own try block and check whether the thrown exception is an instance of _TypeZeroException_, or do that check in the existing catch block. If it is a _TypeZeroException_, we add the block to the scanning queue and keep the remaining logic as it is.

[~jojochuang]: Any thoughts?
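A minimal, self-contained sketch of the control flow being proposed. _TypeZeroException_ is the hypothetical name used in this comment; the DataNode plumbing is reduced to plain Java stand-ins (not the real Hadoop classes) so the exception-discrimination idea is visible on its own:
{code:title=TypeZeroDemo.java (sketch)|borderStyle=solid}
import java.io.IOException;

public class TypeZeroDemo {
  // Hypothetical exception proposed above; it extends IOException so
  // existing callers that catch IOException keep working unchanged.
  static class TypeZeroException extends IOException {
    TypeZeroException(String msg) { super(msg); }
  }

  // Simplified stand-in for DataChecksum.newDataChecksum: throws the
  // narrower type when the metadata header is corrupt (bytesPerChecksum <= 0).
  static void newDataChecksum(int type, int bpc) throws IOException {
    if (bpc <= 0) {
      throw new TypeZeroException("Could not create DataChecksum of type "
          + type + " with bytesPerChecksum " + bpc);
    }
    // normal checksum construction elided
  }

  // Simplified stand-in for the catch block in DataTransfer#run:
  // TypeZeroException means the block itself is corrupt, so it goes to
  // the scanning queue; any other IOException still triggers the disk check.
  static String transfer(int type, int bpc) {
    try {
      newDataChecksum(type, bpc);
      return "transferred";
    } catch (TypeZeroException tze) {
      return "queue block for scanning";   // corrupt block, not a disk fault
    } catch (IOException ioe) {
      return "checkDiskErrorAsync";        // possible disk problem
    }
  }

  public static void main(String[] args) {
    System.out.println(transfer(0, 0));    // corrupt metadata header
    System.out.println(transfer(1, 512));  // healthy case
  }
}
{code}
Note the catch order matters: the narrower _TypeZeroException_ must be caught before the general IOException, or the compiler routes everything through the broad handler.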
> Any IOException in DataTransfer#run() will run diskError thread even if it is
> not disk error
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8224
>                 URL: https://issues.apache.org/jira/browse/HDFS-8224
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>             Fix For: 2.8.0
>
>
> This happened in our 2.6 cluster.
> One of the blocks and its metadata file were corrupted.
> The disk was healthy in this case; only the block was corrupt.
> Namenode tried to copy that block to another datanode but failed with the
> following stack trace:
> 2015-04-20 01:04:04,421 [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@11319bc4] WARN datanode.DataNode: DatanodeRegistration(a.b.c.d, datanodeUuid=e8c5135c-9b9f-4d05-a59d-e5525518aca7, infoPort=1006, infoSecurePort=0, ipcPort=8020, storageInfo=lv=-56;cid=CID-e7f736ac-158e-446e-9091-7e66f3cddf3c;nsid=358250775;c=1428471998571):Failed to transfer BP-xxx-1351096255769:blk_2697560713_1107108863999 to a1.b1.c1.d1:1004 got
> java.io.IOException: Could not create DataChecksum of type 0 with bytesPerChecksum 0
>         at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:125)
>         at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:175)
>         at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readHeader(BlockMetadataHeader.java:140)
>         at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.readDataChecksum(BlockMetadataHeader.java:102)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:287)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:1989)
>         at java.lang.Thread.run(Thread.java:722)
> The following catch block in the DataTransfer#run method treats every
> IOException as a disk fault and runs the disk error check:
> {noformat}
> catch (IOException ie) {
>   LOG.warn(bpReg + ":Failed to transfer " + b + " to " + targets[0]
>       + " got ", ie);
>   // check if there are any disk problem
>   checkDiskErrorAsync();
> }
> {noformat}
> This block was never scanned by BlockPoolSliceScanner; otherwise it would
> have been reported as a corrupt block.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)