[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery

2015-05-05, Yi Liu (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-
Attachment: HDFS-7348.003.patch

Updated the patch to address Zhe's comments.
I also removed the hardcoded (dataBlkNum, parityBlkNum, cellSize), since they 
are now available in the ECSchema from BlockECRecoveryCommand.
The tests pass in my local environment. The current tests cover recovering 
one or more parity blocks and recovering 3 data blocks; we can update 
{{recoverTargets}} and the related tests in a follow-on after the decode 
issue (HADOOP-11847) is fixed.
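A minimal sketch of what dropping the hardcoded values looks like. {{SchemaInfo}} and {{RecoveryParams}} are hypothetical stand-ins for the real ECSchema and BlockECRecoveryCommand types, whose actual accessors may differ: the worker reads dataBlkNum, parityBlkNum and cellSize from the schema it is handed, so one code path serves any schema.

```java
/**
 * Hypothetical stand-in for the EC schema carried by the recovery command.
 * Field names are illustrative, not the real ECSchema API.
 */
final class SchemaInfo {
    final int numDataUnits;
    final int numParityUnits;
    final int cellSize;

    SchemaInfo(int numDataUnits, int numParityUnits, int cellSize) {
        if (numDataUnits <= 0 || numParityUnits <= 0 || cellSize <= 0) {
            throw new IllegalArgumentException("schema values must be positive");
        }
        this.numDataUnits = numDataUnits;
        this.numParityUnits = numParityUnits;
        this.cellSize = cellSize;
    }
}

/** Recovery parameters taken from the schema instead of hardcoded constants. */
final class RecoveryParams {
    final int dataBlkNum;
    final int parityBlkNum;
    final int cellSize;

    RecoveryParams(SchemaInfo schema) {
        this.dataBlkNum = schema.numDataUnits;
        this.parityBlkNum = schema.numParityUnits;
        this.cellSize = schema.cellSize;
    }

    /** Number of internal blocks in one striped block group. */
    int groupSize() {
        return dataBlkNum + parityBlkNum;
    }

    /** Bytes of user data in one full stripe (one cell per data block). */
    long stripeDataBytes() {
        return (long) dataBlkNum * cellSize;
    }
}
```

For example, {{new RecoveryParams(new SchemaInfo(6, 3, 256 * 1024))}} describes a 9-block group carrying 1.5 MB of user data per full stripe.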

The comments are addressed in the patch. Some replies:
{quote}
I guess somewhere in the code (not necessarily in this path) we should assert 
all internal blocks indeed share the same checksum?
{quote}
You are right, we should assert that; we can do it in an {{else}} branch.
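A sketch of the suggested assertion, assuming a hypothetical {{ChecksumInfo}} (checksum type plus bytes per checksum) obtained from each source block: the first value is recorded, and the {{else}} branch checks every later internal block against it.

```java
/** Hypothetical checksum description of one internal block. */
final class ChecksumInfo {
    final String type;
    final int bytesPerChecksum;

    ChecksumInfo(String type, int bytesPerChecksum) {
        this.type = type;
        this.bytesPerChecksum = bytesPerChecksum;
    }

    boolean sameAs(ChecksumInfo other) {
        return type.equals(other.type) && bytesPerChecksum == other.bytesPerChecksum;
    }

    @Override
    public String toString() {
        return type + "/" + bytesPerChecksum;
    }
}

/** Records the first source block's checksum and asserts the others match. */
final class ChecksumTracker {
    private ChecksumInfo checksum; // null until the first source block is seen

    void onSourceChecksum(ChecksumInfo c) {
        if (checksum == null) {
            checksum = c;
        } else if (!checksum.sameAs(c)) {
            // The "else" branch: all internal blocks must share one checksum.
            throw new IllegalStateException(
                "internal blocks use different checksums: " + checksum + " vs " + c);
        }
    }

    ChecksumInfo get() {
        return checksum;
    }
}
```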

{quote}
In the main run loop, success and success4Target look a little confusing ...
{quote}
Right, I renamed {{success4Target}} to {{targetsStatus}} as you suggested. I 
kept the others and did not use a BitSet, since we need array access for them, 
which is more efficient and more convenient in the code.
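A small illustration of the trade-off, using hypothetical helpers and {{StringBuilder}} as a stand-in for the per-target output streams: with parallel arrays, the status flag and the stream for target {{i}} share one index, while {{java.util.BitSet}} mainly helps with aggregate questions such as {{cardinality()}}.

```java
import java.util.BitSet;

/** Hypothetical helpers contrasting per-target bookkeeping styles. */
final class TargetStatus {
    /**
     * Parallel arrays indexed by target i: the status check and the output
     * lookup use the same index, so a plain boolean[] fits naturally.
     */
    static int writeToActiveTargets(boolean[] targetsStatus,
                                    StringBuilder[] targetOutputs, String packet) {
        int written = 0;
        for (int i = 0; i < targetsStatus.length; i++) {
            if (!targetsStatus[i]) {
                continue; // skip targets that have already failed
            }
            targetOutputs[i].append(packet);
            written++;
        }
        return written;
    }

    /** The same "how many targets are still healthy" question, answered with a BitSet. */
    static int activeCount(BitSet targetsStatus) {
        return targetsStatus.cardinality();
    }
}
```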

Besides, I added {{targetInputStreams}}. They are used in the SASL negotiation 
when connecting to other DNs, and we'd better close them too, even though we 
have already closed their sockets.
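A sketch of the cleanup, assuming a hypothetical {{StreamCleanup.closeAll}} helper: every non-null stream is closed and a failure on one does not stop the rest, so closing a stream whose socket is already gone is cheap and still releases anything a wrapping stream holds on its own.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical cleanup helper; closes everything it is given. */
final class StreamCleanup {
    /**
     * Closes each non-null resource, continuing past failures.
     * Returns how many close() calls threw.
     */
    static int closeAll(Closeable... resources) {
        int failures = 0;
        for (Closeable c : resources) {
            if (c == null) {
                continue; // slot never opened, e.g. connection failed early
            }
            try {
                c.close();
            } catch (IOException e) {
                failures++; // keep going so later resources are still released
            }
        }
        return failures;
    }
}
```

In a {{finally}} block this might be called on the target output streams, then on {{targetInputStreams}}, then on the sockets; {{java.net.Socket}} implements {{Closeable}}, so the same helper covers all three.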

> Erasure Coding: striped block recovery
> --
>
> Key: HDFS-7348
> URL: https://issues.apache.org/jira/browse/HDFS-7348
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Kai Zheng
>Assignee: Yi Liu
> Attachments: ECWorker.java, HDFS-7348.001.patch, HDFS-7348.002.patch, 
> HDFS-7348.003.patch
>
>
> This JIRA is to recover one or more missed striped blocks in the striped block 
> group.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery

2015-05-03, Yi Liu (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-
Attachment: HDFS-7348.002.patch

Thanks, Zhe, for the good comments.

I updated the patch according to our discussion and addressed the comments. Main 
changes to the patch:
*1.* The buffer size is now configurable, and the default size is 256KB, the same as 
the default cell size.
*2.* Added encode and decode logic for recovery. If all missed blocks are parity 
blocks, we need to encode; there is room for improvement there, and I filed 
HADOOP-11908. If any missed block is a data block, we need to decode. 
Currently I found that decode only works for data blocks, and we also need to 
prepare full inputs, as Zhe said. So the decode logic in the patch is a workaround 
and only works when the number of missed data blocks equals parityBlkNum. We can 
update it after HADOOP-11847.
*3.* Enhanced the test cases. They pass in my local environment.
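A sketch of the mode selection described in *2.*, assuming internal block indices {{0..dataBlkNum-1}} are data blocks and the rest are parity. {{RecoveryPlanner}} and {{RecoveryMode}} are hypothetical names and the actual coder calls are omitted; a separate check captures the interim workaround's limitation.

```java
/** Which coder operation recovers the missing internal blocks. */
enum RecoveryMode { ENCODE, DECODE }

final class RecoveryPlanner {
    /**
     * Indices 0..dataBlkNum-1 are data blocks;
     * dataBlkNum..dataBlkNum+parityBlkNum-1 are parity blocks.
     */
    static RecoveryMode choose(int[] missed, int dataBlkNum, int parityBlkNum) {
        if (missed.length == 0) {
            throw new IllegalArgumentException("nothing to recover");
        }
        if (missed.length > parityBlkNum) {
            // More erasures than parity blocks cannot be recovered.
            throw new IllegalArgumentException("too many missing blocks: " + missed.length);
        }
        boolean anyData = false;
        for (int idx : missed) {
            if (idx < 0 || idx >= dataBlkNum + parityBlkNum) {
                throw new IllegalArgumentException("invalid block index: " + idx);
            }
            if (idx < dataBlkNum) {
                anyData = true;
            }
        }
        // Only parity missing: re-encode from the intact data blocks.
        // Any data block missing: decode.
        return anyData ? RecoveryMode.DECODE : RecoveryMode.ENCODE;
    }

    /** The interim decode workaround handles only parityBlkNum missed data blocks. */
    static boolean decodeWorkaroundApplies(int[] missed, int dataBlkNum, int parityBlkNum) {
        int missedData = 0;
        for (int idx : missed) {
            if (idx < dataBlkNum) {
                missedData++;
            }
        }
        return missedData == parityBlkNum;
    }
}
```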

Zhe, the following are replies to some of your comments; I addressed your other 
comments in the patch:
{quote}
Why do we need targetInputStreams?
{quote}
My original design was to do a packet ack check. We can do it in phase 2, so I 
removed it from the current patch.

{quote}
The test failed on my local machine, reporting NPE when closing file
{quote}
I found it's a bug in the existing code and filed HDFS-8313 for it. The exception 
occurs only intermittently.

{quote}
cluster#stopDataNode might be an easier way to kill a DN?
{quote}
{{stopDataNode}} only shuts down the DN, and the NN needs to wait a long time 
before marking the datanode as dead. So, as I said in the test comment, we need to 
clear its update time and trigger the NN to check heartbeats; the NN then marks the 
datanode as dead immediately and can schedule striped block recovery.
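A toy model of the timing issue, not the NameNode's actual classes: a node is declared dead only once its last heartbeat is older than an expiry interval (on the order of ten minutes with stock settings), so a merely stopped DN lingers, whereas zeroing its last-update time and running the check right away marks it dead at once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of heartbeat-based liveness; names are hypothetical. */
final class HeartbeatModel {
    private final long expiryMillis;
    private final Map<String, Long> lastUpdate = new LinkedHashMap<>();
    private final List<String> dead = new ArrayList<>();

    HeartbeatModel(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    void heartbeat(String dn, long nowMillis) {
        lastUpdate.put(dn, nowMillis);
    }

    /** What the test does: zero the node's last-update time so it looks long expired. */
    void clearUpdateTime(String dn) {
        lastUpdate.put(dn, 0L);
    }

    /** Heartbeat check: nodes silent for longer than the expiry become dead. */
    List<String> checkHeartbeat(long nowMillis) {
        List<String> newlyDead = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastUpdate.entrySet()) {
            if (!dead.contains(e.getKey()) && nowMillis - e.getValue() > expiryMillis) {
                newlyDead.add(e.getKey());
            }
        }
        dead.addAll(newlyDead); // recovery could now be scheduled for these nodes
        return newlyDead;
    }
}
```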

{quote}
Should WRITE_PACKET_SIZE be linked to BlockSender#MIN_BUFFER_WITH_TRANSFERTO
{quote}
{{BlockSender#MIN_BUFFER_WITH_TRANSFERTO}} is for transferring contiguous blocks 
during replication, which is a little different (it transfers the file directly). I 
don't want to tie WRITE_PACKET_SIZE to it; I think it's fine to define the value directly.

{quote}
Follow on: we should consider consolidating the init thread pool logic for 
hedged read, client striped read, and DN striped read.
{quote}
Yes, we can do it in a follow-on.
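A sketch of what a consolidated helper could look like, assuming a hypothetical {{ReadPools.newReadPool}} shared by hedged reads, client striped reads and DN striped reads. The settings (named daemon threads, a {{SynchronousQueue}}, {{CallerRunsPolicy}}) are illustrative choices, not those of the existing code.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical shared factory for read thread pools. */
final class ReadPools {
    static ThreadPoolExecutor newReadPool(String namePrefix, int maxThreads) {
        if (maxThreads <= 0) {
            throw new IllegalArgumentException("maxThreads must be positive");
        }
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, namePrefix + "-" + counter.incrementAndGet());
            t.setDaemon(true); // do not keep the JVM alive
            return t;
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, maxThreads, 60, TimeUnit.SECONDS,
            new SynchronousQueue<>(), // hand tasks straight to a thread
            factory,
            new ThreadPoolExecutor.CallerRunsPolicy()); // saturated: run in caller
        pool.allowCoreThreadTimeOut(true); // let idle threads exit
        return pool;
    }
}
```

A {{SynchronousQueue}} holds no tasks, so the pool grows toward {{maxThreads}} before the rejection policy applies; {{allowCoreThreadTimeOut(true)}} lets even the single core thread exit when idle.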

> Erasure Coding: striped block recovery
> --
>
> Key: HDFS-7348
> URL: https://issues.apache.org/jira/browse/HDFS-7348
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Kai Zheng
>Assignee: Yi Liu
> Attachments: ECWorker.java, HDFS-7348.001.patch, HDFS-7348.002.patch
>
>
> This JIRA is to recover one or more missed striped blocks in the striped block 
> group.





[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery

2015-04-28, Yi Liu (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-
Description: This JIRA is to recover one or more missed striped blocks in 
the striped block group.  (was: This assumes the facilities like block reader 
and writer are ready, implements and performs erasure decoding/recovery work in 
*stripping* case utilizing erasure codec and coder provided by the codec 
framework.)

> Erasure Coding: striped block recovery
> --
>
> Key: HDFS-7348
> URL: https://issues.apache.org/jira/browse/HDFS-7348
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Kai Zheng
>Assignee: Yi Liu
> Attachments: ECWorker.java, HDFS-7348.001.patch
>
>
> This JIRA is to recover one or more missed striped blocks in the striped block 
> group.





[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery

2015-04-28, Yi Liu (JIRA)

[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-
Summary: Erasure Coding: striped block recovery  (was: Erasure Coding: 
perform stripping erasure decoding/recovery work given block reader and writer)

> Erasure Coding: striped block recovery
> --
>
> Key: HDFS-7348
> URL: https://issues.apache.org/jira/browse/HDFS-7348
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Kai Zheng
>Assignee: Yi Liu
> Attachments: ECWorker.java, HDFS-7348.001.patch
>
>
> This assumes the facilities like block reader and writer are ready, 
> implements and performs erasure decoding/recovery work in *stripping* case 
> utilizing erasure codec and coder provided by the codec framework.


