[jira] [Comment Edited] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-03-19 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200464#comment-15200464
 ] 

Kihwal Lee edited comment on HDFS-10178 at 3/17/16 10:08 PM:
-

Datanodes log something like this:

{noformat}
java.io.IOException: Invalid checksum length: received length is 504 but expected length is 0
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:586)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:895)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:804)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:251)
    at java.lang.Thread.run(Thread.java:745)
{noformat}
This causes a permanent write failure.

The problem is in {{BlockSender}}. When transferring a block, {{BlockSender}} gets the checksum type from the on-disk meta file.
{code:java}
  if (metaIn.getLength() > BlockMetadataHeader.getHeaderSize()) {
    ...
    csum = BlockMetadataHeader.readDataChecksum(checksumIn, block);
    ...
  }
  ...
  if (csum == null) {
    csum = DataChecksum.newDataChecksum(DataChecksum.Type.NULL, 512);
  }
{code}

Since the code falls back to the {{NULL}} checksum type when the on-disk meta file contains only the header portion, the checksum type is set incorrectly during a block transfer. When a data packet arrives with a checksum, the datanode checks whether it has received the correct amount of checksum data.
{code:java}
  final int checksumLen = diskChecksum.getChecksumSize(len);
  final int checksumReceivedLen = checksumBuf.capacity();

  if (checksumReceivedLen > 0 && checksumReceivedLen != checksumLen) {
    throw new IOException("Invalid checksum length: received length is "
        + checksumReceivedLen + " but expected length is " + checksumLen);
  }
{code}

The {{getChecksumSize()}} method of the {{NULL}} checksum type returns 0, so this check fails.
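
To make the mismatch concrete, here is a minimal sketch, not part of any patch; the 126-chunk packet size is assumed only so the numbers line up with the 504-byte value in the stack trace above, and the client is assumed to be writing 4-byte CRC32C checksums over 512-byte chunks. The receiver, having fallen back to the {{NULL}} type, expects 0 bytes of checksum data per packet while the client still sends 4 bytes per chunk.
{code:java}
import org.apache.hadoop.util.DataChecksum;

public class ChecksumLenMismatch {
  public static void main(String[] args) {
    final int bytesPerChunk = 512;
    final int packetDataLen = 126 * bytesPerChunk; // assumed packet size

    // What the datanode expects after falling back to the NULL checksum type.
    DataChecksum nullSum =
        DataChecksum.newDataChecksum(DataChecksum.Type.NULL, bytesPerChunk);
    // What the writing client actually computes and ships with the packet.
    DataChecksum crcSum =
        DataChecksum.newDataChecksum(DataChecksum.Type.CRC32C, bytesPerChunk);

    System.out.println(nullSum.getChecksumSize(packetDataLen)); // 0   (expected)
    System.out.println(crcSum.getChecksumSize(packetDataLen));  // 504 (received)
  }
}
{code}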


> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the packet header is sent 
> out, but the data portion of the packet does not reach one or more datanodes 
> in time, the pipeline recovery will be done against the 0-byte partial block. 
> If additional datanodes are added, the block is transferred to the new nodes. 
> After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent.

[jira] [Comment Edited] (HDFS-10178) Permanent write failures can happen if pipeline recoveries occur for the first packet

2016-03-31 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220639#comment-15220639
 ] 

Arpit Agarwal edited comment on HDFS-10178 at 3/31/16 8:43 PM:
---

Hi Kihwal, I think we can check {{replica.isOnTransientStorage()}} instead of 
passing the new flag. Something like this should work in {{BlockSender}}.
{code}
if (!replica.isOnTransientStorage() &&
    metaIn.getLength() >= BlockMetadataHeader.getHeaderSize()) {
{code}
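
For context, a rough sketch of how that condition would slot into the fallback logic quoted earlier; this paraphrases the suggestion and is not the committed patch.
{code:java}
// Sketch of the suggested change: read the checksum type from the meta file
// header even when only the header is present, but skip replicas on transient
// storage, for which the NULL checksum fallback is the intended behavior.
DataChecksum csum = null;
if (!replica.isOnTransientStorage() &&
    metaIn.getLength() >= BlockMetadataHeader.getHeaderSize()) {
  // The header alone carries the real checksum type, so a transferred
  // 0-byte block no longer degrades to the NULL type during transfer.
  csum = BlockMetadataHeader.readDataChecksum(checksumIn, block);
}
if (csum == null) {
  csum = DataChecksum.newDataChecksum(DataChecksum.Type.NULL, 512);
}
{code}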


> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the packet header is sent 
> out, but the data portion of the packet does not reach one or more datanodes 
> in time, the pipeline recovery will be done against the 0-byte partial block. 
>  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)