[ https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HDFS-10178:
------------------------------
    Attachment: HDFS-10178.patch

The following is from {{BlockSender}}, added by HDFS-6934.
{code:java}
// The meta file will contain only the header if the NULL checksum
// type was used, or if the replica was written to transient storage.
// Checksum verification is not performed for replicas on transient
// storage.  The header is important for determining the checksum
// type later when lazy persistence copies the block to non-transient
// storage and computes the checksum.
if (metaIn.getLength() > BlockMetadataHeader.getHeaderSize()) {
{code}
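For context on why this length check misfires: a replica's meta file is the header plus one checksum per data chunk, so a 0-byte replica's meta file is exactly header-sized even when a real checksum type such as CRC32C was used for the write. A minimal standalone sketch of that arithmetic (the 7-byte header size, 512-byte chunk size and 4-byte CRC width below are the usual defaults, assumed here rather than taken from the patch):

{code:java}
// Standalone sketch (not HDFS source): a meta file is the header plus one
// checksum per chunk of block data, so a 0-byte replica's meta file has the
// same length as a header-only (NULL checksum / transient storage) one.
public class MetaLengthSketch {
  static final int HEADER_SIZE = 7;          // 2-byte version + 1-byte type + 4-byte bytesPerChecksum
  static final int BYTES_PER_CHECKSUM = 512; // common dfs.bytes-per-checksum default
  static final int CHECKSUM_SIZE = 4;        // CRC32/CRC32C checksum width

  static long metaFileLength(long blockDataLength) {
    long chunks = (blockDataLength + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
    return HEADER_SIZE + chunks * CHECKSUM_SIZE;
  }

  public static void main(String[] args) {
    long metaLen = metaFileLength(0); // the 0-byte partial block after pipeline recovery
    System.out.println("meta length: " + metaLen);                      // 7
    System.out.println("metaLen > header: " + (metaLen > HEADER_SIZE)); // false
    // So the check above skips reading the checksum header and falls back to a
    // NULL checksum, even though a real checksum type was configured for the write.
  }
}
{code}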
The code in {{BlockSender}} makes a wrong assumption. If I simply change 
{{>}} to {{>=}}, my test passes, but some of the lazy persist test cases fail. 
So I added another argument to the constructor.

[~cnauroth], can you take a look at my patch?  I am not familiar with the lazy 
persist feature. There might be a better way.
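Roughly, the idea is to tell {{BlockSender}} explicitly that the replica is on transient storage instead of inferring it from the meta-file length. The following is only a hypothetical sketch of that decision, not the attached patch; the names and signature are made up:

{code:java}
// Hypothetical illustration only -- not the attached patch and not the real
// BlockSender constructor; names and signature are made up.
class ChecksumDecisionSketch {
  /** Decide whether to read the checksum header from the meta file. */
  static boolean shouldReadChecksumHeader(long metaFileLength, int headerSize,
                                          boolean replicaOnTransientStorage) {
    if (replicaOnTransientStorage) {
      // Transient-storage replicas carry no checksums; skip verification.
      return false;
    }
    // For ordinary replicas a header-only meta file can still describe a
    // 0-byte block written with a real checksum type, hence ">=" not ">".
    return metaFileLength >= headerSize;
  }
}
{code}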

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-10178
>                 URL: https://issues.apache.org/jira/browse/HDFS-10178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-10178.patch
>
>
> We have observed that writes fail permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the packet header is sent 
> out, but the data portion of the packet does not reach one or more datanodes 
> in time, the pipeline recovery will be done against the 0-byte partial block. 
>  
> If additional datanodes are added, the block is transferred to the new nodes. 
> After the transfer, each node will have a meta file containing only the header 
> and a 0-length block file. The pipeline recovery seems to work correctly 
> up to this point, but the write fails when the actual data packet is resent. 
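To make "a meta file containing only the header" concrete: the header is just the version plus the checksum descriptor, commonly 7 bytes in total. An illustrative sketch of such a file's contents (field widths and values are the usual defaults, assumed rather than quoted from HDFS source):

{code:java}
// Illustrative sketch of a header-only meta file: version, checksum type and
// bytesPerChecksum, with no checksum entries because the block file is empty.
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class HeaderOnlyMetaSketch {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeShort(1);  // metadata version
      out.writeByte(2);   // checksum type id (a CRC variant, not NULL)
      out.writeInt(512);  // bytesPerChecksum
    }
    System.out.println("header-only meta file size: " + bytes.size() + " bytes"); // 7
  }
}
{code}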


