[ https://issues.apache.org/jira/browse/HADOOP-15558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588900#comment-16588900 ]

Sammi Chen edited comment on HADOOP-15558 at 8/22/18 2:11 PM:
--------------------------------------------------------------

Hi [~shreya2205], several comments after going through the slides and code.
1. The encoding and decoding of Clay Codes involve PFT, PRT and RS computation. 
So basically the idea is to reduce network traffic and disk reads during the 
data repair phase at the cost of extra computation. In the single-data-node 
failure case, Clay Codes can save about 2/3 of the network bandwidth compared 
with RS; in the worst case, Clay Codes behave the same as RS in terms of 
network bandwidth. Given that most failures in a storage cluster are 
single-node failures, the cluster can no doubt benefit from Clay Codes. I 
assume all the benchmark data in the slides were collected in the 
single-data-node failure case; correct me if that's wrong.
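
As a sanity check on the 2/3 figure, here is a minimal sketch of the standard 
MSR repair-bandwidth arithmetic (each of d helpers sends a 1/(d-k+1) fraction 
of a block), using the (14, 10) parameters from the FAST'18 paper. The class 
and values are illustrative only, not code from the patch.

{code:java}
// Illustrative arithmetic only, not code from the patch.
public class RepairBandwidthSketch {
  public static void main(String[] args) {
    int k = 10, m = 4;                 // data and parity units
    int n = k + m, d = n - 1;          // total units; helpers used for repair
    double alpha = 1.0;                // one block, normalized

    // RS repair of one lost block reads k whole blocks.
    double rsRepair = k * alpha;
    // MSR repair: each of d helpers sends alpha / (d - k + 1).
    double msrRepair = d * alpha / (d - k + 1);

    System.out.printf("RS: %.2f, Clay/MSR: %.2f, saving: %.0f%%%n",
        rsRepair, msrRepair, 100 * (1 - msrRepair / rsRepair));
    // Prints a ~68% saving, i.e. roughly the 2/3 figure above.
  }
}
{code}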

2. On P22 of the slides, it says "total encoding time remains the same while 
Clay Codec has 70% higher encode computation time". I'm confused; could you 
explain it further?

3. On P21 of the slides, Fragmented Read, it says there is no impact on SSD 
once the sub-chunk size reaches 4KB. Do you have any data for HDD? In 
Hadoop/HDFS deployments, HDDs are still the majority.

4. P23, what does the "Degraded I/O" scenario in the slides mean?

5. From the slides, we can see that to configure a Clay Codec, k, m, d and 
sub-chunk size all matter. Yet in the implementation, only k and m are 
configurable. What about d and sub-chunk size? (A purely hypothetical sketch 
of exposing them follows.)
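
To make the question concrete, here is a purely illustrative sketch of what 
carrying d and the sub-chunk size alongside k and m could look like. 
ClayCodeOptions is a hypothetical name invented here, not a class in the 
patch; the check k <= d <= n - 1 is the standard MSR helper-count range.

{code:java}
// Hypothetical sketch only; ClayCodeOptions is NOT part of the patch.
// It just illustrates exposing d and sub-chunk size next to k and m.
public class ClayCodeOptions {
  private final int numDataUnits;    // k
  private final int numParityUnits;  // m
  private final int numHelpers;      // d; d = n - 1 minimizes repair bandwidth
  private final int subChunkSize;    // bytes per sub-chunk, e.g. 4096

  public ClayCodeOptions(int k, int m, int d, int subChunkSize) {
    int n = k + m;
    if (d < k || d > n - 1) {
      throw new IllegalArgumentException("d must satisfy k <= d <= n - 1");
    }
    this.numDataUnits = k;
    this.numParityUnits = m;
    this.numHelpers = d;
    this.subChunkSize = subChunkSize;
  }
}
{code}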

6. I googled a lot but found very few links about the PFT and PRT matrices. 
Do you have any documents for them?

7. For the implementation part, is cloning the input blocks a must in 
prepareEncodingStep? (One possible copy-free alternative is sketched below.) 
Also, could you add more comments, such as which part is the PFT computation 
and which is the PRT computation? I will go through the code again later. 
Also, ClayCodeUtil is better placed in its own file.
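
On the cloning question: if the clone exists only so the callers' buffer 
positions and limits are not disturbed, a shallow view may be enough. The 
sketch below uses only the standard ByteBuffer.duplicate() API; whether it 
actually applies depends on why prepareEncodingStep clones (if the data is 
mutated in place, a deep copy is still needed).

{code:java}
import java.nio.ByteBuffer;

// Sketch of a copy-free alternative to cloning input blocks.
public class BufferViewSketch {
  static ByteBuffer[] shallowViews(ByteBuffer[] inputs) {
    ByteBuffer[] views = new ByteBuffer[inputs.length];
    for (int i = 0; i < inputs.length; i++) {
      // duplicate() shares the underlying bytes but gives the view its own
      // position, limit and mark; no data is copied.
      views[i] = (inputs[i] == null) ? null : inputs[i].duplicate();
    }
    return views;
  }
}
{code}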

8. Code style. Here is a list of Hadoop code styles to follow:
* a. Import * is not recommended.
* b. A line cannot exceed 80 characters.
* c. A tab is 4 spaces.
* d. New-line indent is 2 spaces.
* e. Cross-line indent is 4 spaces.
* f. Remove unnecessary empty lines.
* g. 1 space between operator and value.

For example:
{code:java}
  if (rsRawDecoder==null) {     =>  if (rsRawDecoder == null) {
  new ErasureCoderOptions(2,2);   => new ErasureCoderOptions(2, 2);
  if(erasedIndexes.length==1){ =>    if (erasedIndexes.length == 1) {
{code}

        



> Implementation of Clay Codes plugin (Coupled Layer MSR codes) 
> --------------------------------------------------------------
>
>                 Key: HADOOP-15558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15558
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Chaitanya Mukka
>            Assignee: Chaitanya Mukka
>            Priority: Major
>         Attachments: ClayCodeCodecDesign-20180630.pdf, 
> HADOOP-15558.001.patch, HADOOP-15558.002.patch
>
>
> [Clay Codes|https://www.usenix.org/conference/fast18/presentation/vajha] are 
> new erasure codes developed as a research project at Codes and Signal Design 
> Lab, IISc Bangalore. A particular Clay code, with storage overhead 1.25x, has 
> been shown to reduce repair network traffic, disk read and repair times by 
> factors of 2.9, 3.4 and 3 respectively compared to the RS codes with the same 
> parameters. 
> This Jira aims to introduce Clay Codes to HDFS-EC as one of the pluggable 
> erasure codecs.


