[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-------------------------
    Attachment: HDFS-7348.003.patch

Updated the patch for Zhe's comments. I also removed the hardcoded (dataBlkNum, parityBlkNum, cellSize) values, since they are now available in the ECSchema from BlockECRecoveryCommand.

The tests pass in my local environment. The current tests cover recovering one or more parity blocks and recovering 3 data blocks; we can update {{recoverTargets}} and the related tests in a follow-on after the decode issue (HADOOP-11847) is fixed.

The comments are addressed in the patch; some replies:

{quote}
I guess somewhere in the code (not necessarily in this path) we should assert all internal blocks indeed share the same checksum?
{quote}
You are right, we can assert that; we can do it in an {{else}} branch.

{quote}
In the main run loop, success and success4Target look a little confusing ...
{quote}
Right, I renamed {{success4Target}} to {{targetsStatus}} as you suggested. I kept the others and did not use BitSet, since we need array access for them, which is more efficient and convenient in the code.

Besides, I added {{targetInputStreams}}. They are used in the SASL negotiation when connecting to other DNs, and we had better close them too, even though we have already closed their sockets.

> Erasure Coding: striped block recovery
> --------------------------------------
>
>                 Key: HDFS-7348
>                 URL: https://issues.apache.org/jira/browse/HDFS-7348
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Yi Liu
>         Attachments: ECWorker.java, HDFS-7348.001.patch, HDFS-7348.002.patch, HDFS-7348.003.patch
>
>
> This JIRA is to recover one or more missed striped block in the striped block group.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
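
The reply above about asserting that all internal blocks share the same checksum "in an else" could look roughly like the following. This is only a sketch and is not taken from the patch: the class {{ChecksumConsistency}}, the method {{observe}}, and the choice to model a checksum as a type name plus bytes-per-checksum are assumptions made for illustration.

```java
/**
 * Hypothetical helper, not from the HDFS-7348 patch: remembers the checksum
 * settings of the first internal block and rejects any later block that differs.
 */
final class ChecksumConsistency {
    private String type;          // e.g. "CRC32C"; null until the first block is seen
    private int bytesPerChecksum; // number of data bytes covered by one checksum

    /** Call once for each internal block that is read during recovery. */
    void observe(String blockType, int blockBytesPerChecksum) {
        if (blockType == null) {
            throw new IllegalArgumentException("checksum type must not be null");
        }
        if (type == null) {
            // First block seen: remember its checksum settings.
            type = blockType;
            bytesPerChecksum = blockBytesPerChecksum;
        } else if (!type.equals(blockType)
                || bytesPerChecksum != blockBytesPerChecksum) {
            // The "else" case: every later block must match the first one.
            throw new IllegalStateException(
                "internal blocks disagree on checksum: expected " + type + "/"
                    + bytesPerChecksum + " but got " + blockType + "/"
                    + blockBytesPerChecksum);
        }
    }
}
```

Throwing an exception rather than using a Java {{assert}} keeps the check active even when assertions are disabled; which of the two the real code should use is left open here.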
[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-------------------------
    Attachment: HDFS-7348.002.patch

Thanks Zhe for the good comments. I updated the patch according to our discussion and addressed the comments.

Main changes to the patch:
*1.* The buffer size is now configurable; the default is 256KB, the same as the default cell size.
*2.* Added encode and decode logic for recovery. If all missed blocks are parity blocks, we need to encode; there is a possible improvement here, for which I filed HADOOP-11908. If one of the missed blocks is a data block, we need to decode. Currently I found that decode only works for data blocks, and we also need to prepare full inputs, as Zhe said. So the decode logic in the patch is a workaround and only works when parityBlkNum data blocks are missed. We can update it after HADOOP-11847.
*3.* Enhanced the test cases; they succeed in my local environment.

Zhe, the following are replies to some of your comments; I addressed your other comments in the patch:

{quote}
Why do we need targetInputStreams?
{quote}
My original design was to use them for the packet ack check. We can do that in phase 2, so I removed them from the current patch.

{quote}
The test failed on my local machine, reporting NPE when closing file
{quote}
I found it is a bug in the existing code and filed HDFS-8313 for it. The exception occurs only occasionally.

{quote}
cluster#stopDataNode might be an easier way to kill a DN?
{quote}
{{stopDataNode}} can only shut down the DN, and the NN needs to wait a long time before marking the datanode as dead. So, as I said in the test comment, we need to clear its update time and trigger the NN to check heartbeats; the NN will then mark the datanode as dead immediately and can schedule striped block recovery.

{quote}
Should WRITE_PACKET_SIZE be linked to BlockSender#MIN_BUFFER_WITH_TRANSFERTO
{quote}
{{BlockSender#MIN_BUFFER_WITH_TRANSFERTO}} is for the transfer of contiguous block replication, which is a little different (it transfers the file directly). I don't want to tie it to that; I think it is fine to define the value directly.

{quote}
Follow on: we should consider consolidating the init thread pool logic for hedged read, client striped read, and DN striped read.
{quote}
Yes, we can do it in a follow-on.

> Erasure Coding: striped block recovery
> --------------------------------------
>
>                 Key: HDFS-7348
>                 URL: https://issues.apache.org/jira/browse/HDFS-7348
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Yi Liu
>         Attachments: ECWorker.java, HDFS-7348.001.patch, HDFS-7348.002.patch
>
>
> This JIRA is to recover one or more missed striped block in the striped block group.
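
The encode/decode choice described in change 2 of the message above can be sketched as a small decision function. This is an illustration under stated assumptions, not code from the patch: the names {{StripedRecoveryPlan}}, {{Action}} and {{choose}} are hypothetical, internal block indices are assumed to be distinct with data blocks numbered before parity blocks, and the "workaround" is read here as "decode is attempted only when exactly parityBlkNum data blocks (and no parity blocks) are missing".

```java
/**
 * Hypothetical sketch of the encode-vs-decode decision; names and the exact
 * reading of the decode workaround are assumptions, not the patch's code.
 */
final class StripedRecoveryPlan {
    enum Action { ENCODE, DECODE, UNSUPPORTED }

    private StripedRecoveryPlan() {}

    /**
     * @param dataBlkNum   number of data blocks in the block group
     * @param parityBlkNum number of parity blocks in the block group
     * @param missing      distinct indices of the missing internal blocks;
     *                     indices below dataBlkNum are assumed to be data blocks
     */
    static Action choose(int dataBlkNum, int parityBlkNum, int[] missing) {
        int missingData = 0;
        for (int idx : missing) {
            if (idx < 0 || idx >= dataBlkNum + parityBlkNum) {
                throw new IllegalArgumentException("bad block index: " + idx);
            }
            if (idx < dataBlkNum) {
                missingData++;
            }
        }
        if (missing.length == 0 || missing.length > parityBlkNum) {
            // Nothing to recover, or more blocks lost than the parity can cover.
            return Action.UNSUPPORTED;
        }
        if (missingData == 0) {
            // Only parity blocks are missing: re-encode them from the data blocks.
            return Action.ENCODE;
        }
        if (missingData == missing.length && missingData == parityBlkNum) {
            // Workaround case: exactly parityBlkNum data blocks are missing.
            return Action.DECODE;
        }
        // Any other mix waits for the decoder fix tracked in HADOOP-11847.
        return Action.UNSUPPORTED;
    }
}
```

For example, with 6 data blocks and 3 parity blocks, {{choose(6, 3, new int[] {6, 8})}} selects {{ENCODE}}, while {{choose(6, 3, new int[] {0, 1, 2})}} selects {{DECODE}}.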
[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-------------------------
    Description: This JIRA is to recover one or more missed striped block in the striped block group.  (was: This assumes the facilities like block reader and writer are ready, implements and performs erasure decoding/recovery work in *stripping* case utilizing erasure codec and coder provided by the codec framework.)

> Erasure Coding: striped block recovery
> --------------------------------------
>
>                 Key: HDFS-7348
>                 URL: https://issues.apache.org/jira/browse/HDFS-7348
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Yi Liu
>         Attachments: ECWorker.java, HDFS-7348.001.patch
>
>
> This JIRA is to recover one or more missed striped block in the striped block group.
[jira] [Updated] (HDFS-7348) Erasure Coding: striped block recovery
[ https://issues.apache.org/jira/browse/HDFS-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yi Liu updated HDFS-7348:
-------------------------
    Summary: Erasure Coding: striped block recovery  (was: Erasure Coding: perform stripping erasure decoding/recovery work given block reader and writer)

> Erasure Coding: striped block recovery
> --------------------------------------
>
>                 Key: HDFS-7348
>                 URL: https://issues.apache.org/jira/browse/HDFS-7348
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Yi Liu
>         Attachments: ECWorker.java, HDFS-7348.001.patch
>
>
> This assumes the facilities like block reader and writer are ready, implements and performs erasure decoding/recovery work in *stripping* case utilizing erasure codec and coder provided by the codec framework.