[ https://issues.apache.org/jira/browse/HBASE-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-2902:
-------------------------------

    Attachment: Fragger.java

From the Azul guys: Fragger, which is in the public domain. It is a tool that
induces fragmentation in a heap to trigger a full heap compaction.

> Improve our default shipping GC config. and doc -- along the way do a bit of 
> GC myth-busting
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2902
>                 URL: https://issues.apache.org/jira/browse/HBASE-2902
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: stack
>         Attachments: Fragger.java
>
>
> This issue is about improving the near-term story, working with what we have
> now: the slowly evolving (?) 1.6.x JVMs and CMS. (Longer term, another HBase
> issue tracks the G1 story, and Todd is making some traction over on the
> HotSpot GC list.)
> At the moment we ship with CMS and i-CMS enabled by default. At a minimum,
> i-CMS does not suit most of the hardware HBase is deployed on: i-CMS is meant
> for machines with two or fewer processors. It also seems we do not use
> multiple threads for YG collections, i.e. -XX:+UseParNewGC ("Use parallel
> threads in the new generation"). The sources disagree:
> http://www.md.pp.ru/~eu/jdk6options.html#UseParNewGC says it is off in JDK 6,
> http://blogs.sun.com/jonthecollector/category/Java says it is on by default
> when CMS is used, and
> http://www.austinjug.org/presentations/JDK6PerfUpdate_Dec2009.pdf says to
> enable it. When I enable it I see "[Rescan (parallel) ...", so it seems to be
> off otherwise. We need to review the source code.
> We should make the above changes, or at least document them.
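Rather than trusting the conflicting write-ups, a running JVM can report what it actually has set. A minimal sketch, assuming a HotSpot JVM and Java 7 or later (for `ManagementFactory.getPlatformMXBean`); `GcFlagCheck` and `describe` are hypothetical names. Flags the JVM does not know (CMS and ParNew flags were dropped from later JDKs) are reported as absent instead of failing.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class GcFlagCheck {
    // Returns "<value> (<origin>)" for a HotSpot flag, or "absent" if this JVM
    // has no flag of that name.
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = hotspot.getVMOption(flagName);
            // The origin says whether the value is a default, ergonomic, or set on the command line.
            return option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException unknownFlag) {
            return "absent";
        }
    }

    public static void main(String[] args) {
        String[] flags = {"UseConcMarkSweepGC", "CMSIncrementalMode", "UseParNewGC", "ParallelGCThreads"};
        for (String flag : flags) {
            System.out.println(flag + " = " + describe(flag));
        }
    }
}
```

Run it with the same options as the region server (e.g. the CMS/i-CMS options from hbase-env.sh) to see whether UseParNewGC ends up on.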
> We should consider enabling GC logging by default. It is apparently low cost
> (citation below). We would just need to do something about log management.
> Not sure you can roll the logs (investigate), but in any case we should roll
> them on startup at least, so we don't lose GC logs across restarts.
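GC logging itself is enabled with standard HotSpot options such as `-verbose:gc -XX:+PrintGCDetails -Xloggc:<file>`. Rolling on startup just means moving the previous run's file aside before the JVM starts writing to the same path. A minimal sketch, assuming Java 7+ NIO and that a restart would otherwise overwrite the file, as the issue implies; `GcLogRoller` and the default path `logs/gc-regionserver.log` are hypothetical. In practice this would more likely live in the start script.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.SimpleDateFormat;
import java.util.Date;

public class GcLogRoller {
    // Moves an existing GC log aside as <name>.<timestamp> so the JVM about to start
    // can write a fresh file at the same path without losing the previous run's log.
    // Returns the new path, or null if there was no previous log.
    static Path rollOnStartup(Path gcLog) throws IOException {
        if (!Files.exists(gcLog)) {
            return null;
        }
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss-SSS").format(new Date());
        Path rolled = gcLog.resolveSibling(gcLog.getFileName() + "." + stamp);
        return Files.move(gcLog, rolled);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real -Xloggc target as the first argument.
        Path gcLog = Paths.get(args.length > 0 ? args[0] : "logs/gc-regionserver.log");
        Path rolled = rollOnStartup(gcLog);
        System.out.println(rolled == null ? "no previous GC log" : "rolled to " + rolled);
    }
}
```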
> We should play with the CMS initiating occupancy ratios; maybe starting CMS
> earlier will put off the fragmented heap that brings on the killer
> stop-the-world collection.
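The knobs here are `-XX:CMSInitiatingOccupancyFraction=<percent>` and `-XX:+UseCMSInitiatingOccupancyOnly`. To pick a value it helps to watch how full the tenured pool actually gets. A minimal sketch; `OldGenOccupancy` is a hypothetical name, the pool-name matching is a heuristic (the pool is "CMS Old Gen" under CMS), and the candidate fraction of 60 is an arbitrary example, not a recommendation. Occupancy is computed as used over committed, which approximates what CMS checks.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class OldGenOccupancy {
    // Percentage of capacity in use; -1 if capacity is unknown or zero.
    static double occupancyPercent(long used, long capacity) {
        if (capacity <= 0) {
            return -1;
        }
        return 100.0 * used / capacity;
    }

    // Heuristic lookup of the tenured pool: "CMS Old Gen", "PS Old Gen",
    // "G1 Old Gen" or "Tenured Gen" depending on the collector.
    static MemoryPoolMXBean findOldGen() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Old Gen") || name.contains("Tenured")) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical candidate for -XX:CMSInitiatingOccupancyFraction.
        int candidateFraction = args.length > 0 ? Integer.parseInt(args[0]) : 60;
        MemoryPoolMXBean oldGen = findOldGen();
        if (oldGen == null) {
            System.out.println("no old-generation pool found");
            return;
        }
        MemoryUsage usage = oldGen.getUsage();
        double pct = occupancyPercent(usage.getUsed(), usage.getCommitted());
        System.out.printf("%s: %.1f%% occupied (candidate initiating fraction %d%%)%n",
                oldGen.getName(), pct, candidateFraction);
    }
}
```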
> I read somewhere recently that invoking System.gc() will run a CMS GC if CMS
> is enabled. We should investigate. If it instead ran the serial collector, we
> could at least document that users can run a defragmenting stop-the-world
> serial collection at 'off' times, or at least arrange for the stop-the-world
> pause to happen when expected instead of at some random time.
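One way to investigate is to call System.gc() and see which collector's counters move. A minimal sketch; `ExplicitGcProbe` is a hypothetical name. Under CMS the collector beans are typically named "ParNew" and "ConcurrentMarkSweep"; HotSpot's `-XX:+ExplicitGCInvokesConcurrent` is also worth toggling during the test. The wall-clock time printed is how long the calling thread was blocked.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExplicitGcProbe {
    // Calls System.gc() and returns, per collector, {added collections, added ms}.
    static Map<String, long[]> probe() {
        List<GarbageCollectorMXBean> gcs = ManagementFactory.getGarbageCollectorMXBeans();
        Map<String, long[]> before = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : gcs) {
            before.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        long start = System.nanoTime();
        System.gc();
        long blockedMillis = (System.nanoTime() - start) / 1_000_000;
        Map<String, long[]> delta = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : gcs) {
            long[] b = before.get(gc.getName());
            delta.put(gc.getName(), new long[] {
                    gc.getCollectionCount() - b[0], gc.getCollectionTime() - b[1]});
        }
        System.out.println("System.gc() blocked the caller for " + blockedMillis + " ms");
        return delta;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : probe().entrySet()) {
            System.out.println(e.getKey() + ": +" + e.getValue()[0] + " collections, +"
                    + e.getValue()[1] + " ms");
        }
    }
}
```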
> While we're here, let's do a bit of myth-busting. Here are a few postulates:
> + Keep the young generation small, or at least cap its size, or else it grows
> to occupy a large part of the heap.
> The above is a Ryanism. Doing this, along with a massive heap size, has put
> off, at least at SU, the fragmentation that others run into.
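Why the cap matters with a big heap: with `-XX:NewRatio=N` the old:young ratio is N:1, so the young generation is roughly heap / (N + 1) and grows with the heap unless `-XX:MaxNewSize` (or `-Xmn`) caps it. A minimal sketch of that arithmetic; `YoungGenSizing` is a hypothetical name, the 8 GB heap and 256 MB cap are example values, and the real JVM also applies alignment and ergonomic adjustments this ignores.

```java
public class YoungGenSizing {
    // Approximate young-generation size: heap / (NewRatio + 1), capped by MaxNewSize.
    static long youngGenBytes(long heapBytes, int newRatio, long maxNewSizeBytes) {
        long byRatio = heapBytes / (newRatio + 1);
        return Math.min(byRatio, maxNewSizeBytes);
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        long heap = 8L << 30; // hypothetical 8 GB heap
        System.out.println("NewRatio=2, uncapped: "
                + youngGenBytes(heap, 2, Long.MAX_VALUE) / mib + " MB");
        System.out.println("NewRatio=2, MaxNewSize=256m: "
                + youngGenBytes(heap, 2, 256 * mib) / mib + " MB");
    }
}
```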
> Interestingly, this document
> (http://mediacast.sun.com/users/Ludovic/media/GCTuningPresentationFISL10.pdf)
> would seem to recommend nearly the opposite: it suggests that when using CMS,
> you should do all you can to keep objects in the YG and avoid having them age
> up into the tenured heap. This would seem to imply using a larger YG.
> Chatting with Ryan, the reason to keep the YG small is so we don't get long
> pauses during YG collections. According to the above citation, it is not big
> YGs that cause long YG pauses but the copying of data. (Not sure whether that
> means copying within the YG or copying up to tenured; chatting with Ryan, we
> thought there'd be no difference, but we should investigate.)
> I took a look at a running upload, admittedly with a small heap. What I saw
> was that with our defaults, objects in the YG rarely survived more than one
> GC; i.e. nearly everything in the YG was being promoted. This may have been a
> symptom of my small (default) heap, but we should look into this and try to
> ensure objects are promoted because they are old, not because there is not
> enough space in the YG.
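A quick way to reason about "promoted because there is no room": with `-XX:SurvivorRatio=N`, eden:survivor is N:1 and there are two survivor spaces, so each is young / (N + 2). HotSpot sizes the tenuring threshold so survivor occupancy stays under `-XX:TargetSurvivorRatio` percent (50 by default); if more than that survives a young GC, the threshold drops, which would look like the "age > 1 is rare" symptom. `-XX:+PrintTenuringDistribution` shows the actual ages. A minimal sketch of the arithmetic; `SurvivorSizing` and the example sizes are hypothetical.

```java
public class SurvivorSizing {
    // Each of the two survivor spaces is young / (SurvivorRatio + 2).
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // True if the bytes surviving a young GC exceed the TargetSurvivorRatio share
    // of one survivor space, i.e. the tenuring threshold would be pushed down.
    static boolean likelyPrematurePromotion(long survivingBytes, long youngBytes,
                                            int survivorRatio, int targetSurvivorRatio) {
        long target = survivorBytes(youngBytes, survivorRatio) * targetSurvivorRatio / 100;
        return survivingBytes > target;
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        long young = 64 * mib;     // hypothetical young generation size
        int survivorRatio = 8;     // hypothetical -XX:SurvivorRatio
        long surviving = 6 * mib;  // hypothetical bytes live after a young GC
        System.out.println("each survivor space: "
                + survivorBytes(young, survivorRatio) / mib + " MB");
        System.out.println("premature promotion likely: "
                + likelyPrematurePromotion(surviving, young, survivorRatio, 50));
    }
}
```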
> + We should write a slab allocator or allocate memory outside of the JVM heap
> Thinking on this: a slab allocator, while a lot of work, could help us with
> the block cache, but what if the memstore is what fragments the heap? In that
> case, a slab allocator is only part of the fix. It should be easy to see which
> one fragments the heap, since we can turn off the cache easily enough (though
> it seems the cache is accessed anyway even when disabled; we need to make sure
> it is not doing allocations in that case).
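To make the slab idea concrete: one large direct (off-heap) buffer carved into fixed-size chunks, so freed chunks are always reusable and the JVM heap never sees the individual blocks. A minimal sketch only; `OffHeapSlab`, its methods, and the 64 KB chunk size are hypothetical, and there is no double-free or concurrency protection.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

public class OffHeapSlab {
    private final ByteBuffer slab;   // one large direct (off-heap) allocation
    private final int chunkSize;     // uniform size, so any freed chunk fits any request
    private final ArrayDeque<Integer> freeOffsets = new ArrayDeque<>();

    OffHeapSlab(int chunkSize, int chunkCount) {
        this.chunkSize = chunkSize;
        this.slab = ByteBuffer.allocateDirect(chunkSize * chunkCount);
        for (int i = 0; i < chunkCount; i++) {
            freeOffsets.push(i * chunkSize);
        }
    }

    // Offset of a free chunk, or -1 if the slab is exhausted.
    int allocate() {
        Integer offset = freeOffsets.poll();
        return offset == null ? -1 : offset;
    }

    // A buffer view restricted to the chunk starting at offset.
    ByteBuffer chunk(int offset) {
        ByteBuffer view = slab.duplicate();
        view.position(offset);
        view.limit(offset + chunkSize);
        return view.slice();
    }

    // Returns the chunk to the free list.
    void free(int offset) {
        freeOffsets.push(offset);
    }

    public static void main(String[] args) {
        OffHeapSlab slab = new OffHeapSlab(64 * 1024, 16); // hypothetical: 16 chunks of 64 KB
        int off = slab.allocate();
        slab.chunk(off).put("some block bytes".getBytes(StandardCharsets.UTF_8));
        slab.free(off);
        System.out.println("allocated and freed chunk at offset " + off);
    }
}
```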
> Other things while on this topic: we need to come up with a load that brings
> on the CMS fault that comes from a fragmented heap. (CMS is non-compacting,
> but apparently it will coalesce adjacent free blocks into bigger ones, so
> there is some anti-fragmentation behavior going on.) Apparently lots of large,
> irregularly sized items is the ticket.
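A load along those lines: allocate byte arrays of irregular sizes, keep a bounded random set alive long enough to be tenured, and replace survivors at random so the old generation is left with holes of mismatched sizes. This is a minimal sketch of the idea, not the attached Fragger; `FragmentingLoad` and all default parameters are hypothetical and need scaling to the heap under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FragmentingLoad {
    // Returns the number of arrays still retained at the end.
    static int run(int iterations, int retainSlots, int minBytes, int maxBytes, long seed) {
        Random rnd = new Random(seed);
        List<byte[]> retained = new ArrayList<>(retainSlots);
        for (int i = 0; i < iterations; i++) {
            byte[] chunk = new byte[minBytes + rnd.nextInt(maxBytes - minBytes + 1)];
            if (retained.size() < retainSlots) {
                retained.add(chunk);
            } else if (rnd.nextInt(4) == 0) {
                // Drop a random long-lived array, leaving an odd-sized hole in tenured.
                retained.set(rnd.nextInt(retainSlots), chunk);
            }
            // Otherwise the chunk dies young and a young collection reclaims it.
        }
        return retained.size();
    }

    public static void main(String[] args) {
        // Hypothetical defaults: about 1000 live arrays of 1 KB to 256 KB.
        int kept = run(200_000, 1_000, 1024, 256 * 1024, System.nanoTime());
        System.out.println("retained " + kept + " arrays");
    }
}
```

Running it with CMS and GC logging on should show whether the tenured heap eventually hits the stop-the-world fallback.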

-- 
This message is automatically generated by JIRA.
