[ 
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076272#comment-14076272
 ] 

Daryn Sharp commented on HDFS-6709:
-----------------------------------

Yes, we definitely generate a lot of garbage per call.  Due to GC concerns, 
I've got work in progress to reduce the garbage generated, which is why I'm 
concerned about even more garbage per call (inodes are repeatedly looked up far 
more often than you think; I'm working on single resolution).  We've already 
tuned the young generation to be in line with what other companies running 
large-scale services use.

bq. Maybe you think I've chosen an easy example. Hmm... the operation that I 
can think of that touches the most inodes is recursive delete.

Yes, a delete of a large tree is a good example, but in practice that's a rare 
operation.  However, getContentSummary is run often for monitoring.  It may 
take many seconds on just a subtree of some clusters, and it may visit millions 
or tens of millions of inodes.

Many block-level operations fetch the block collection, which is really the 
inode, sometimes to verify the block isn't abandoned or to access other 
related blocks.  Decommissioning has always been a problem in general.  It will 
repeatedly crawl hundreds of thousands of blocks, each requiring a BC/inode 
lookup.  The replication monitor is also likely to indirectly require the 
BC/inode when it runs every 3s.  Refer to {{BlocksMap.getBlockCollection}} to 
see how many other places it's called.

Even the unobtainable best case of eliminating the 1.5s-per-6h CMS pause at the 
expense of increasing the frequency and/or duration of the ParNew pauses is a 
huge loss.  I suppose the proof is in a simulation.  Perhaps a rudimentary test 
is to instantiate a garbage inode and blockinfo every time one is looked up, 
yet still return the "real" one, so we can see how well ParNew handles the 
onslaught of garbage.
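A minimal sketch of that rudimentary test might look like the following.  The 
class and field names here are invented for illustration, not actual HDFS 
classes: every lookup allocates a short-lived throwaway copy but still returns 
the cached "real" object, so only the young generation sees the extra 
allocation pressure.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: FakeInode and GarbageLookupSim are hypothetical
// stand-ins, not real HDFS classes.
public class GarbageLookupSim {
    static class FakeInode {
        final long id;
        final byte[] payload = new byte[64]; // stand-in for inode fields
        FakeInode(long id) { this.id = id; }
    }

    private final Map<Long, FakeInode> store = new HashMap<>();

    FakeInode lookup(long id) {
        FakeInode real = store.computeIfAbsent(id, FakeInode::new);
        // Deliberately allocate a garbage copy on every lookup, then
        // discard it -- only the cached instance escapes the method.
        FakeInode garbage = new FakeInode(id);
        return real;
    }

    public static void main(String[] args) {
        GarbageLookupSim sim = new GarbageLookupSim();
        FakeInode first = sim.lookup(42);
        for (int i = 0; i < 1_000_000; i++) {
            sim.lookup(42); // each call creates short-lived garbage
        }
        // The cached instance stays stable across lookups.
        System.out.println(first == sim.lookup(42));
    }
}
```

Running this under a small young generation (and watching GC logs) would show 
how ParNew copes with that allocation rate while the live set stays constant.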

> Implement off-heap data structures for NameNode and other HDFS memory 
> optimization
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-6709
>                 URL: https://issues.apache.org/jira/browse/HDFS-6709
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-6709.001.patch
>
>
> We should investigate implementing off-heap data structures for NameNode and 
> other HDFS memory optimization.  These data structures could reduce latency 
> by avoiding the long GC times that occur with large Java heaps.  We could 
> also avoid per-object memory overheads and control memory layout a little bit 
> better.  This also would allow us to use the JVM's "compressed oops" 
> optimization even with really large namespaces, if we could get the Java heap 
> below 32 GB for those cases.  This would provide another performance and 
> memory efficiency boost.
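For reference, the off-heap approach described above can be sketched with 
{{java.nio.ByteBuffer}}: fixed-width records packed into a direct buffer live 
outside the Java heap, so the collector never scans them.  The record layout 
and names below are invented for illustration, not the actual patch.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of off-heap fixed-width records; the field layout
// (id + size) is invented for illustration.
public class OffHeapRecords {
    private static final int RECORD_BYTES = 16; // id (8 bytes) + size (8 bytes)
    private final ByteBuffer buf;

    OffHeapRecords(int capacity) {
        // allocateDirect places the backing memory outside the Java heap,
        // so these records add no GC scanning cost.
        buf = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
    }

    void put(int slot, long id, long size) {
        int off = slot * RECORD_BYTES;
        buf.putLong(off, id);
        buf.putLong(off + 8, size);
    }

    long getId(int slot)   { return buf.getLong(slot * RECORD_BYTES); }
    long getSize(int slot) { return buf.getLong(slot * RECORD_BYTES + 8); }

    public static void main(String[] args) {
        OffHeapRecords recs = new OffHeapRecords(1024);
        recs.put(0, 6709L, 4096L);
        System.out.println(recs.getId(0) + " " + recs.getSize(0));
    }
}
```

The trade-off is manual layout and serialization in exchange for eliminating 
per-object headers and GC pressure for the stored records.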



--
This message was sent by Atlassian JIRA
(v6.2#6252)
