[ 
https://issues.apache.org/jira/browse/HADOOP-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469260
 ] 

Milind Bhandarkar commented on HADOOP-885:
------------------------------------------

Dhruba,

WallClock works only when the timing thread gets a chance to execute. In a 
heavily multithreaded app like the namenode, that thread may not be scheduled 
at all for a few seconds (especially if all threads are doing CPU-intensive 
work). So there is no way to guarantee that the clock maintains its stated 
resolution (one could play with thread priorities to achieve that).
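
To make the concern concrete, here is a minimal sketch of a cached clock of 
this general kind. It is not the attached WallClock.java; the class name 
CoarseClock and its methods (now, staleness, stop) are hypothetical and 
chosen only for illustration. The cached value is only as fresh as the last 
time the updater thread actually ran, and raising its priority is a hint to 
the scheduler, not a guarantee.

```java
/** Hypothetical cached clock for illustration; not the attached WallClock.java. */
final class CoarseClock {
    // Last value read from the system clock by the updater thread.
    private volatile long cachedMillis = System.currentTimeMillis();
    private volatile boolean running = true;
    private final Thread updater;

    CoarseClock(final long resolutionMillis) {
        updater = new Thread(() -> {
            while (running) {
                // The only place the real clock is read on behalf of callers of now().
                cachedMillis = System.currentTimeMillis();
                try {
                    // The thread sleeps at least this long; under load it may
                    // wake much later, so the requested resolution is a lower
                    // bound on the refresh interval, not a promise.
                    Thread.sleep(resolutionMillis);
                } catch (InterruptedException e) {
                    return; // stop() interrupts the thread to end the loop
                }
            }
        }, "coarse-clock");
        updater.setDaemon(true);
        // Only a scheduling hint; the JVM and OS are free to ignore it.
        updater.setPriority(Thread.MAX_PRIORITY);
        updater.start();
    }

    /** Returns the cached time without touching the system clock. */
    long now() {
        return cachedMillis;
    }

    /** How far the cached value currently lags behind the real clock, in ms. */
    long staleness() {
        return System.currentTimeMillis() - cachedMillis;
    }

    void stop() {
        running = false;
        updater.interrupt();
    }

    public static void main(String[] args) throws InterruptedException {
        CoarseClock clock = new CoarseClock(1); // request 1 ms resolution
        Thread.sleep(50);
        System.out.println("cached=" + clock.now()
                + " staleness(ms)=" + clock.staleness());
        clock.stop();
    }
}
```

Logging staleness() from the request-handling threads on a loaded namenode 
would show how far the cached value actually drifts from the requested 
resolution.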

Also, your benchmark numbers are misleading because in 15 ms, WallClock gets 
only one call to currentTimeMillis (assuming a 10 ms interval between 
context switches). So even when I change the resolution from 1 second to 
1 ms, I get the same timing, but without the requested 1 ms resolution.

> Reduce CPU usage on namenode: gettimeofday
> ------------------------------------------
>
>                 Key: HADOOP-885
>                 URL: https://issues.apache.org/jira/browse/HADOOP-885
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.10.1
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: WallClock.java
>
>
> On a 900-node idle cluster, the namenode spends about 20% of CPU. Most of 
> this CPU is spent processing pure heartbeats. No jobs are running on this 
> cluster and all nodes are alive and acting well.
> Of the total namenode CPU usage, about 12% is in usermode and about 70% is in 
> kernel mode! The question that naturally arises is: why is heartbeat 
> processing taking so much time in kernel mode?
> An strace of the namenode reveals that a 20-second period has about 52000 
> syscalls with the following breakdown:
> gettimeofday : 18000 calls
> accept       :  2655 calls
> close        :  2655 calls
> shutdown     :  2655 calls
> fcntl        :  7965 calls
> read         :  7965 calls
> futex        :  5295 calls
> poll         :  4894 calls
> A code inspection reveals that the code is doing multiple (about 5) calls to 
> System.currentTimeMillis() in processing a single request in the RPC.java and 
> Server.java classes. This might mean that there is a possibility of 
> optimization.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
