[ https://issues.apache.org/jira/browse/HADOOP-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12611792#action_12611792 ]
rangadi edited comment on HADOOP-3328 at 7/8/08 1:56 PM:
---------------------------------------------------------

CPU measurements for writing a 4 GB file (filled with zeros) to a 3-datanode cluster with a replication factor of 3 show a 20% combined CPU improvement on the datanodes. Since the last datanode's work in the write pipeline does not change, this amounts to roughly a 30% CPU reduction on the intermediate datanodes. The results are the average of 3 runs. All three datanodes run on the same physical node, and the input for the 4 GB file is read from /dev/zero.

|| CPU || User || Kernel || Total || % improvement ||
| Trunk | 17777 | 16971 | 34749 | 0% |
| Trunk + patch | 10462 | 17314 | 27776 | 20% |

20% is a little less than the original estimate above, but within the expected range.

> DFS write pipeline : only the last datanode needs to verify checksum
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3328
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3328
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-3328.patch, HADOOP-3328.patch
>
> Currently all the datanodes in the DFS write pipeline verify the checksum. Since the current protocol includes acks from the datanodes, an ack from the last node could also serve as verification that the checksum is OK. In that sense, only the last datanode needs to verify the checksum. Based on [this comment|http://issues.apache.org/jira/browse/HADOOP-1702?focusedCommentId=12575553#action_12575553] from HADOOP-1702, CPU consumption might go down by another 25-30% (4/14) after HADOOP-1702.
> Also, this would make it easier to use transferTo() and transferFrom() on intermediate datanodes, since they no longer need to look at the data.
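To illustrate the proposed change, here is a minimal sketch of the packet-handling decision. This is illustrative only, not the actual DataNode code from the patch: the class and names (PacketHandler, mirrorOut, handlePacket) are hypothetical, and it assumes a simplified one-CRC32-per-packet layout, whereas real DFS checksums cover 512-byte chunks.

{code:java}
// Illustrative sketch only -- not the actual org.apache.hadoop.dfs.DataNode
// internals. Assumes one CRC32 per packet for brevity; real DFS checksums
// cover 512-byte chunks.
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

class PacketHandler {
  // Stream to the downstream datanode; null on the last node in the pipeline.
  private final DataOutputStream mirrorOut;

  PacketHandler(DataOutputStream mirrorOut) {
    this.mirrorOut = mirrorOut;
  }

  void handlePacket(byte[] data, int storedCrc) throws IOException {
    if (mirrorOut == null) {
      // Last datanode: the only one that verifies. Its ack travelling back
      // up the pipeline then implies "checksum OK" for all upstream nodes.
      CRC32 crc = new CRC32();
      crc.update(data, 0, data.length);
      if ((int) crc.getValue() != storedCrc) {
        throw new IOException("Checksum error in received packet");
      }
    } else {
      // Intermediate datanode: forward the packet untouched. Since it no
      // longer inspects the bytes, it could eventually use
      // FileChannel.transferTo()/transferFrom() for zero-copy forwarding.
      mirrorOut.writeInt(storedCrc);
      mirrorOut.write(data, 0, data.length);
    }
    // ... append data and checksum to the local block file as before ...
  }
}
{code}

The key point the sketch shows is that the last node's ack doubles as the checksum verification for the whole pipeline, which is what makes the CRC work on the intermediate datanodes redundant.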