[ https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379785#comment-14379785 ]
Kai Zheng commented on HDFS-7344:
---------------------------------

Hello [~szetszwo],

Thanks for taking care of this. Let me address your comments together. Please let me know if this works. Thanks.

bq. For 1 missing block, we may not need to recover it at all since (6,3)\-Reed-Solomon can tolerate 3 missing blocks. Also recovery is more efficient for 2- or 3- missing blocks.

Good thoughts. I remember we had a related discussion with [~zhz]. The idea is to give recovery tasks different priorities according to how urgently the erased blocks need to be recovered. As you said, 2 or 3 erased blocks are more urgent than 1, so they would get higher priority when the NN schedules. Note that 1 erased block still needs to be recovered when possible because, judging from existing customer runs, in most cases only one block is erased and needs recovery. Recovering 1 erased block can also be efficient, because in that case a simple XOR calculation can be used and no RS overhead is incurred.

bq. Since you are working on the client, how about letting someone else working on the datanode changes?

Good suggestion. I discussed this with [~libo-intel]: I will help until he can come back to this after finishing the client side. As on the client side, where [~libo-intel] collaborates with [~jingzhao] and [~zhz] and already has the hard part done, I believe we need the same good community collaboration here.

How about this: I will update the design doc early next week, discussing with [~umamaheswararao], [~vinayrpet] and others, and incorporating the discussions here by [~zhz] and you. The doc is subject to your review and further discussion here. Meanwhile, I will also update and refine Bo's code based on the latest design and the branch in another week, so that we have concrete, doable thoughts for breaking the whole thing down into smaller tasks. Then people other than Bo and me can help in parallel, as you suggested. Hope this works.
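To illustrate the single-erasure case mentioned above, here is a minimal Java sketch of XOR-based recovery. It assumes the block group carries one parity block that is the plain XOR of all six data blocks; whether the RS codec discussed here exposes such a parity is not settled in this thread. The class and method names (`XorRecoverySketch`, `xorAll`) are hypothetical and are not part of HDFS or of Bo's code.

```java
import java.util.Arrays;

/** Hypothetical helper, not HDFS code: recover one erased block with XOR only. */
public class XorRecoverySketch {

    /** XORs equal-length blocks byte by byte and returns the result. */
    public static byte[] xorAll(byte[]... blocks) {
        if (blocks.length == 0) {
            throw new IllegalArgumentException("need at least one block");
        }
        byte[] out = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            if (block.length != out.length) {
                throw new IllegalArgumentException("blocks must have equal length");
            }
            for (int i = 0; i < out.length; i++) {
                out[i] ^= block[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Six tiny "data blocks", standing in for the data part of a (6,3) group.
        byte[][] data = {
            {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}, {13, 14, 15}, {16, 17, 18}
        };
        // Assumed XOR parity over all data blocks.
        byte[] parity = xorAll(data);

        // Pretend data[2] is erased: XOR the five survivors with the parity.
        byte[] recovered = xorAll(data[0], data[1], data[3], data[4], data[5], parity);
        System.out.println(Arrays.equals(recovered, data[2])); // prints "true"
    }
}
```

The recovery needs no Galois-field multiplication, which is the reason a single erasure can be cheaper than the general RS decode needed for 2 or 3 erasures.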
> Erasure Coding worker and support in DataNode
> ---------------------------------------------
>
>                 Key: HDFS-7344
>                 URL: https://issues.apache.org/jira/browse/HDFS-7344
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Kai Zheng
>            Assignee: Li Bo
>         Attachments: HDFS ECWorker Design.pdf, hdfs-ec-datanode.0108.zip,
> hdfs-ec-datanode.0108.zip
>
>
> According to HDFS-7285 and the design, this handles DataNode side extension
> and related support for Erasure Coding, and implements ECWorker. It mainly
> covers the following aspects, and separate tasks may be opened to handle each
> of them.
> * Process encoding work, calculating parity blocks as specified in block
> groups and codec schema;
> * Process decoding work, recovering data blocks according to block groups and
> codec schema;
> * Handle client requests for passive recovery blocks data and serving data on
> demand while reconstructing;
> * Write parity blocks according to storage policy.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)