[ https://issues.apache.org/jira/browse/HADOOP-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539752 ]
eric baldeschwieler commented on HADOOP-2144:
---------------------------------------------

We should focus this test. Let's get, say, 32 distinct 128 MB files and make sure they are approximately evenly distributed. Then let's cat them all. This should saturate the disks and/or CPU quite effectively. Yes?

Let's post comparable numbers for a straight cat from files and do a back-of-the-envelope calculation of what ratio we think a well-implemented system might achieve. OK?

This is very interesting.

> Data node process consumes 180% cpu
> ------------------------------------
>
>                 Key: HADOOP-2144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2144
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Runping Qi
>
> I did a test on DFS read throughput and found that the data node
> process consumes up to 180% cpu when it is under heavy load. Here are the
> details:
> The cluster has 380+ machines, each with 3GB mem and 4 cpus and 4 disks.
> I copied a 10GB file to dfs from one machine with a data node running there.
> Based on the dfs block placement policy, that machine has one replica for
> each block of the file.
> Then I ran 4 of the following commands in parallel:
> hadoop dfs -cat thefile > /dev/null &
> Since all the blocks have a local replica, all the read requests went to the
> local data node.
> I observed that:
> The data node process's cpu usage was around 180% most of the time.
> The clients' cpu usage was moderate (as it should be).
> All four disks were working concurrently with comparable read
> throughput.
> The total read throughput maxed out at 90MB/sec, about 60% of the
> expected total aggregated max read throughput of 4 disks (160MB/sec).
> Thus disks were not a bottleneck in this case.
> The data node's cpu usage seems unreasonably high.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
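
As a starting point for the back-of-the-envelope numbers requested above, here is a rough sketch in Python that only rearranges the figures quoted in the issue description (4 disks, 4 cpus, 90MB/sec observed, 160MB/sec expected, 180% cpu). It assumes that "180%" is a top-style figure in which 100% means one fully busy cpu; the issue does not say so explicitly. The function and key names are made up for illustration, and nothing here is a new measurement.

```python
# Figures quoted in the HADOOP-2144 description.
NUM_DISKS = 4
NUM_CPUS = 4
OBSERVED_MB_PER_SEC = 90.0
EXPECTED_MB_PER_SEC = 160.0    # reporter's aggregate estimate for 4 disks
DATANODE_CPU_PERCENT = 180.0   # ASSUMPTION: top-style, 100 = one full cpu


def summarize():
    """Derive a few ratios from the reported numbers (illustrative only)."""
    cpus_busy = DATANODE_CPU_PERCENT / 100.0
    return {
        # Implied per-disk peak behind the 160MB/sec estimate.
        "expected_mb_per_sec_per_disk": EXPECTED_MB_PER_SEC / NUM_DISKS,
        # Fraction of the expected disk bandwidth actually delivered.
        "disk_utilization": OBSERVED_MB_PER_SEC / EXPECTED_MB_PER_SEC,
        # Share of the whole machine's cpu taken by the data node.
        "machine_cpu_share": cpus_busy / NUM_CPUS,
        # Data served per cpu-second spent in the data node.
        "mb_per_cpu_second": OBSERVED_MB_PER_SEC / cpus_busy,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4g}")
```

Under that assumption the data node delivers roughly 50 MB per cpu-second, which is the kind of figure a plain cat from files on the same machine could be compared against.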