[ https://issues.apache.org/jira/browse/HADOOP-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539628 ]
Raghu Angadi commented on HADOOP-2144:
--------------------------------------

Runping,

What was the overall cpu usage reported in the summary at the top of the top output (system, user, idle, iowait)? What % of cpu was taken by all 4 clients together?

If it is easy for you to reproduce, I would like to take a quick look.

> Data node process consumes 180% cpu
> ------------------------------------
>
>                 Key: HADOOP-2144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2144
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Runping Qi
>
> I did a test on DFS read throughput and found that the data node
> process consumes up to 180% cpu when it is under heavy load. Here are the details:
>
> The cluster has 380+ machines, each with 3GB mem and 4 cpus and 4 disks.
> I copied a 10GB file to dfs from one machine with a data node running there.
> Based on the dfs block placement policy, that machine has one replica for each block of the file.
> Then I ran 4 of the following commands in parallel:
>
>     hadoop dfs -cat thefile > /dev/null &
>
> Since all the blocks have a local replica, all the read requests went to the local data node.
> I observed that:
>     The data node process's cpu usage was around 180% for most of the time.
>     The clients' cpu usage was moderate (as it should be).
>     All four disks were working concurrently with comparable read throughput.
>     The total read throughput was maxed at 90MB/Sec, about 60% of the expected total
>     aggregated max read throughput of 4 disks (160MB/Sec). Thus disks were not a bottleneck
>     in this case.
>
> The data node's cpu usage seems unreasonably high.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.