[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611792#action_12611792 ]
rangadi edited comment on HADOOP-3328 at 7/8/08 1:56 PM:
---------------------------------------------------------

CPU measurements for writing a 4 GB file (filled with zeros) to a 3-datanode cluster with a replication factor of 3 show a 20% combined CPU improvement on the datanodes. Since the last datanode's work in the write pipeline does not change, this amounts to roughly a 30% CPU reduction on the intermediate datanodes. The results are the average of 3 runs. All three datanodes run on the same physical node, and the input for the 4 GB file is read from /dev/zero.

|| CPU || User || Kernel || Total || % improvement ||
| Trunk | 17777 | 16971 | 34749 | 0% |
| Trunk + patch | 10462 | 17314 | 27776 | 20% |

20% is a little less than the original estimate above, but within the expected range.

> DFS write pipeline : only the last datanode needs to verify checksum
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3328
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3328
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-3328.patch, HADOOP-3328.patch
>
> Currently all the datanodes in the DFS write pipeline verify the checksum. Since the current protocol includes acks from the datanodes, an ack from the last node could also serve as verification that the checksum is OK. In that sense, only the last datanode needs to verify the checksum. Based on [this comment|http://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553] from HADOOP-1702, CPU consumption might go down by another 25-30% (4/14) after HADOOP-1702.
> Also, this would make it easier to use transferTo() and transferFrom() on intermediate datanodes, since they no longer need to look at the data.
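To illustrate the proposed change, here is a minimal sketch of the packet-handling decision. This is illustrative only, not the actual DataNode code from the patch: the class and names (PacketHandler, mirrorOut, handlePacket) are hypothetical, and it assumes a simplified one-CRC32-per-packet layout, whereas real DFS checksums cover 512-byte chunks.

{code:java}
// Illustrative sketch only -- not the actual org.apache.hadoop.dfs.DataNode
// internals. Assumes one CRC32 per packet for brevity; real DFS checksums
// cover 512-byte chunks.
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

class PacketHandler {
  // Stream to the downstream datanode; null on the last node in the pipeline.
  private final DataOutputStream mirrorOut;

  PacketHandler(DataOutputStream mirrorOut) {
    this.mirrorOut = mirrorOut;
  }

  void handlePacket(byte[] data, int storedCrc) throws IOException {
    if (mirrorOut == null) {
      // Last datanode: the only one that verifies. Its ack travelling back
      // up the pipeline then implies "checksum OK" for all upstream nodes.
      CRC32 crc = new CRC32();
      crc.update(data, 0, data.length);
      if ((int) crc.getValue() != storedCrc) {
        throw new IOException("Checksum error in received packet");
      }
    } else {
      // Intermediate datanode: forward the packet untouched. Since it no
      // longer inspects the bytes, it could eventually use
      // FileChannel.transferTo()/transferFrom() for zero-copy forwarding.
      mirrorOut.writeInt(storedCrc);
      mirrorOut.write(data, 0, data.length);
    }
    // ... append data and checksum to the local block file as before ...
  }
}
{code}

The key point the sketch shows is that the last node's ack doubles as the checksum verification for the whole pipeline, which is what makes the CRC work on the intermediate datanodes redundant.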