[
https://issues.apache.org/jira/browse/HADOOP-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12487360
]
Raghu Angadi commented on HADOOP-1221:
--------------------------------------
We were looking at the namenode code around the above trace. This is what
it is doing:
max = 100; // in this case
for( it = invalidateSet.iterator(); max > 0; max-- ) {
  it.next();   // advance first; remove() deletes the element just returned
  it.remove();
}
invalidateSet is not actually a Set but an ArrayList. So if it has 500 blocks,
each of the 100 removals from the front shifts the remaining blocks (about 450
on average) down by one slot in the array. This could be one of the things
inflating CPU usage. We could use a LinkedList for this, and also not call it
a 'Set', since that name implies to readers that this container is a Set.
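To make the cost concrete, here is a minimal sketch of the same drain pattern,
not the actual FSNamesystem code. The class and method names (InvalidateDrain,
drainHead, arrayListShifts) are made up for illustration. It removes up to
'max' elements from the head through the iterator, and separately counts how
many element moves an ArrayList would need for that.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hadoop.
class InvalidateDrain {

    /** Removes up to max elements from the head of pending and returns them. */
    public static <T> List<T> drainHead(List<T> pending, int max) {
        List<T> drained = new ArrayList<>();
        Iterator<T> it = pending.iterator();
        while (max-- > 0 && it.hasNext()) {
            drained.add(it.next());
            it.remove(); // ArrayList: shifts the whole tail; LinkedList: unlinks one node
        }
        return drained;
    }

    /** Element moves an ArrayList performs when its first max elements are removed one by one. */
    public static long arrayListShifts(int size, int max) {
        long shifts = 0;
        for (int i = 0; i < max && size > 0; i++) {
            size--;
            shifts += size; // every remaining element slides down one slot
        }
        return shifts;
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < 500; i++) {
            array.add(i);
            linked.add(i);
        }
        drainHead(array, 100);
        drainHead(linked, 100);
        // Both containers end up with the same contents; only the cost differs.
        System.out.println(array.size() + " " + linked.size()); // prints "400 400"
        System.out.println(arrayListShifts(500, 100));          // prints 44950
    }
}
```

For 500 blocks and max = 100 that is 499 + 498 + ... + 400 = 44950 element
moves with the ArrayList, versus 100 node unlinks with a LinkedList.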
If each it.remove() resulted in a big memmove(), do you think we should have
seen more Java stuff above remove() in the stack trace?
Next we should also capture a pstack of the JVM so that we can see what it is
doing inside the JVM.
Note that changing the container to a LinkedList might only reduce the CPU
usage; it won't fix the bug, if there is one.
> high cpu usage in ReplicationMonitor thread
> --------------------------------------------
>
> Key: HADOOP-1221
> URL: https://issues.apache.org/jira/browse/HADOOP-1221
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Reporter: Koji Noguchi
>
> We had a namenode stuck in CPU 99% and it was showing a slow response time.
> (dfs.namenode.handler.count was still set to 10.)
> ReplicationMonitor thread was using the most CPU time.
> Jstack showed,
> "[EMAIL PROTECTED]" daemon prio=10 tid=0x0000002d90690800 nid=0x4855 runnable
> [0x0000000041941000..0x0000000041941b30]
> java.lang.Thread.State: RUNNABLE
> at java.util.AbstractList$Itr.remove(AbstractList.java:360)
> at org.apache.hadoop.dfs.FSNamesystem.blocksToInvalidate(FSNamesystem.java:2475)
> - locked <0x0000002a9f522038> (a org.apache.hadoop.dfs.FSNamesystem)
> at org.apache.hadoop.dfs.FSNamesystem.computeDatanodeWork(FSNamesystem.java:1775)
> at org.apache.hadoop.dfs.FSNamesystem$ReplicationMonitor.run(FSNamesystem.java:1713)
> at java.lang.Thread.run(Thread.java:619)