[ https://issues.apache.org/jira/browse/HADOOP-2144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539694 ]

Doug Cutting commented on HADOOP-2144:
--------------------------------------

> The total read throughput was maxed at 90MB/Sec, about 60% of the expected
> total aggregated max read throughput of 4 disks (160MB/Sec). Thus disks
> were not a bottleneck in this case.

That's not clear.  The blocks are all local, and the four processes are all 
accessing the same sequence of blocks.  If the four processes are synchronized, 
each reading the same block at the same time, then they should share the buffer 
cache, streaming through the same data four times in parallel at 4x40MB/s.  But 
if they get out of sync then they'll end up competing for drives.  One process 
could be reading the 23rd block from one drive and another process could be 
reading the 27th block from that same drive.  Since the drive can only read 
40MB/s, those two processes would, in aggregate, only be able to read 40MB/s, 
since they'd be competing for that drive.  I'd expect them to sync up, but they 
might not.
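The arithmetic above can be sketched as a toy model (illustrative only; the drive speed, drive count, and reader count are the ones from this report):

```python
DRIVE_MBPS = 40   # per-drive sequential read throughput (from the report)
NUM_DRIVES = 4
NUM_READERS = 4

# Case 1: the four readers are synchronized on the same block. Only one
# drive is busy at a time, but the block lands in the buffer cache, so
# all four readers stream it in parallel: aggregate = readers * drive speed.
synced_aggregate = NUM_READERS * DRIVE_MBPS   # 160 MB/s

# Case 2: two readers drift out of sync and want different blocks that
# happen to live on the same drive. The drive still delivers only
# 40 MB/s total, so that pair splits it: aggregate for the pair = 40 MB/s.
out_of_sync_pair_aggregate = DRIVE_MBPS       # 40 MB/s

print(synced_aggregate, out_of_sync_pair_aggregate)
```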

With 128MB blocks, it should be possible to see this with 'iostat -x' or 'sar', 
reporting once per second.  If they're sync'd, then you'd expect to see one 
drive at 100% busy and the others at 0% busy, with the busy drive switching 
every three seconds.  If they're out of sync, then you'd expect the drives to 
mostly be 100% busy, but some to occasionally be idle.
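The 'iostat -x' check above could be automated with a rough classifier over per-second per-drive busy percentages (a sketch only; the sample rows and the 90%/10% thresholds are made up for illustration):

```python
def classify(samples, busy=90.0, idle=10.0):
    """Given per-second rows of per-drive %util (as reported by iostat -x),
    guess whether the readers are synced on one drive at a time."""
    synced_rows = 0
    for row in samples:
        hot = sum(1 for u in row if u >= busy)
        cold = sum(1 for u in row if u <= idle)
        # Synced pattern: exactly one hot drive, all the others near idle.
        if hot == 1 and cold == len(row) - 1:
            synced_rows += 1
    return "synced" if synced_rows > len(samples) / 2 else "out of sync"

# Synced: one drive ~100% busy at a time, switching every few seconds.
synced_samples = [[99, 2, 1, 0], [98, 1, 3, 2], [1, 97, 2, 0]]
# Out of sync: most drives busy at once, with occasional idle ones.
mixed_samples = [[95, 91, 88, 4], [97, 93, 96, 90], [92, 2, 94, 95]]

print(classify(synced_samples), classify(mixed_samples))
```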

> Data node process consumes 180% cpu 
> ------------------------------------
>
>                 Key: HADOOP-2144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2144
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Runping Qi
>
> I did a test on DFS read throughput and found that the data node 
> process consumes up to 180% cpu when it is under heavy load. Here are the 
> details:
> The cluster has 380+ machines, each with 3GB mem and 4 cpus and 4 disks.
> I copied a 10GB file to dfs from one machine with a data node running there.
> Based on the dfs block placement policy, that machine has one replica for 
> each block of the file.
> Then I ran the following command 4 times in parallel:
> hadoop dfs -cat thefile > /dev/null &
> Since all the blocks have a local replica, all the read requests went to the 
> local data node.
> I observed that:
>     The data node process's cpu usage was around 180% most of the time.
>     The clients' cpu usage was moderate (as it should be).
>     All the four disks were working concurrently with comparable read 
> throughput.
>     The total read throughput was maxed at 90MB/Sec, about 60% of the
>     expected total aggregated max read throughput of 4 disks (160MB/Sec).
>     Thus disks were not a bottleneck in this case.
> The data node's cpu usage seems unreasonably high.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.