[ 
https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HDFS-7348:
-------------------------
    Attachment: HDFS-7348.002.patch

Thanks Zhe for the good comments.

I updated the patch according to our discussion and addressed the comments. Main 
changes to the patch:
*1.* The buffer size is configurable now, and the default size is 256KB, the 
same as the default cell size.
*2.* Add encode and decode logic for recovery. If all the missed blocks are 
parity blocks, then we only need to do encode; there is a possible improvement 
here, for which I filed HADOOP-11908.  If one of the missed blocks is a data 
block, we need to do decode. Currently I found decode only works for data 
blocks, and we also need to prepare full inputs as Zhe said. So the decode logic 
in the patch is a workaround and only works when at most parityBlkNum data 
blocks are missed. We can update it after HADOOP-11847.
*3.* Enhanced the test cases; they succeed in my local env.
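The encode-vs-decode split in item *2* can be sketched roughly as follows. This is a simplified, self-contained illustration that uses a single XOR parity block as a stand-in for the real Reed-Solomon coder, so the class and method names ({{StripeRecoverySketch}}, {{recover}}, etc.) are hypothetical and not the actual patch code:

```java
import java.util.Arrays;

/**
 * Simplified single-parity model of the recovery decision: if only
 * parity blocks are missing we re-encode from the data blocks; if a
 * data block is missing we decode it from the survivors.
 * Hypothetical sketch, not the HDFS-7348 code.
 */
public class StripeRecoverySketch {

  /** XOR all given blocks together; for single parity this serves
   *  as both the encode and the decode primitive. */
  static byte[] xorAll(byte[][] blocks) {
    byte[] out = new byte[blocks[0].length];
    for (byte[] b : blocks) {
      for (int i = 0; i < out.length; i++) {
        out[i] ^= b[i];
      }
    }
    return out;
  }

  /**
   * Recover the block at missingIndex in a stripe laid out as
   * dataNum data blocks followed by one parity block.
   */
  static byte[] recover(byte[][] stripe, int dataNum, int missingIndex) {
    if (missingIndex >= dataNum) {
      // Only a parity block is missing: re-encode from the data blocks.
      return xorAll(Arrays.copyOfRange(stripe, 0, dataNum));
    }
    // A data block is missing: decode it from all surviving blocks.
    byte[][] survivors = new byte[stripe.length - 1][];
    int j = 0;
    for (int i = 0; i < stripe.length; i++) {
      if (i != missingIndex) {
        survivors[j++] = stripe[i];
      }
    }
    return xorAll(survivors);
  }
}
```

With Reed-Solomon and parityBlkNum parity blocks the same decision applies, except decode must select parityBlkNum surviving blocks and, per the workaround above, currently only handles missing data blocks.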

Zhe, the following is a reply to some of your comments; I address your other 
comments in the patch:
{quote}
Why do we need targetInputStreams?
{quote}
My original design was to do a packet ack check. We can do that in phase 2, so I 
removed it from the current patch.

{quote}
The test failed on my local machine, reporting NPE when closing file
{quote}
I found it's a bug in the existing code and filed HDFS-8313 for it. The 
exception occurs only intermittently.

{quote}
cluster#stopDataNode might be an easier way to kill a DN?
{quote}
{{stopDataNode}} can only shut down the DN, and the NN needs to wait a long time 
before marking the datanode as dead.  So, as I said in the test comment, we need 
to clear its update time and trigger the NN to check heartbeats; the NN will 
then mark the datanode as dead immediately and can schedule striped block 
recovery.
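A minimal model of why clearing the update time works (the names here, e.g. {{HeartbeatModel}} and {{EXPIRY_MS}}, are hypothetical, not the real NameNode classes): the NN considers a DN dead once its last update is older than the expiry interval, so zeroing the update time makes the very next heartbeat check mark it dead instead of waiting out the interval.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the NN liveness check: a DN is marked dead once its
 * last update is older than the expiry interval. Hypothetical
 * sketch, not the actual NameNode code.
 */
public class HeartbeatModel {
  // Roughly the default dead-node interval; the real value is derived
  // from heartbeat/recheck config, this constant is an assumption.
  static final long EXPIRY_MS = 10L * 60 * 1000;

  final Map<String, Long> lastUpdate = new HashMap<>();

  /** Record a heartbeat from the given DN at time 'now' (ms). */
  void heartbeat(String dn, long now) {
    lastUpdate.put(dn, now);
  }

  /** Equivalent of "clear its update time" in the test. */
  void clearUpdateTime(String dn) {
    lastUpdate.put(dn, 0L);
  }

  /** One pass of the heartbeat monitor; true if dn is now dead. */
  boolean checkDead(String dn, long now) {
    return now - lastUpdate.get(dn) > EXPIRY_MS;
  }
}
```

With a fresh heartbeat the check returns false even after the DN process is gone; after {{clearUpdateTime}} the next check reports the node dead immediately, which is what lets the test proceed to striped block recovery without waiting.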

{quote}
Should WRITE_PACKET_SIZE be linked to BlockSender#MIN_BUFFER_WITH_TRANSFERTO
{quote}
{{BlockSender#MIN_BUFFER_WITH_TRANSFERTO}} is for the transfer of contiguous 
block replication, which is a little different (it transfers the file directly). 
I don't want to couple this value to that one; I think it's fine to define the 
value directly.

{quote}
Follow on: we should consider consolidating the init thread pool logic for 
hedged read, client striped read, and DN striped read.
{quote}
Yes, we can do it in a follow-on.

> Erasure Coding: striped block recovery
> --------------------------------------
>
>                 Key: HDFS-7348
>                 URL: https://issues.apache.org/jira/browse/HDFS-7348
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Yi Liu
>         Attachments: ECWorker.java, HDFS-7348.001.patch, HDFS-7348.002.patch
>
>
> This JIRA is to recover one or more missed striped blocks in a striped block 
> group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
