[
https://issues.apache.org/jira/browse/SOLR-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389924#comment-14389924
]
Shawn Heisey commented on SOLR-7319:
------------------------------------
Devolving into a general discussion about garbage collection tuning:
[~jim.ferenczi], I've had really good luck with these GC tuning options,
although I have now moved on to G1GC:
https://wiki.apache.org/solr/ShawnHeisey#CMS_.28ConcurrentMarkSweep.29_Collector
I tried really hard to make these options completely generic and not dependent
on the number of CPUs, the size of the heap, the amount of system memory, or
anything else that's site specific, but users with particularly small or large
setups might need to adjust them.
Here are the GC tuning options I ended up with when I updated and compiled
branch_5x and started the server with bin/solr:
{noformat}
-XX:NewRatio=3
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-XX:MaxTenuringThreshold=8
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:ConcGCThreads=4
-XX:ParallelGCThreads=4
-XX:+CMSScavengeBeforeRemark
-XX:PretenureSizeThreshold=64m
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
{noformat}
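If someone wants to experiment with a different set, the 5.x start scripts read a GC_TUNE variable from solr.in.sh, so an override could look something like this (a sketch only; verify the variable name and handling against your copy of the scripts):

```shell
# solr.in.sh sketch: override the shipped GC flags. GC_TUNE is the
# variable the 5.x bin/solr script appends to the JVM arguments
# (hedged: check your version of the start scripts).
GC_TUNE="-XX:NewRatio=3 \
  -XX:SurvivorRatio=4 \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseParNewGC \
  -XX:+ParallelRefProcEnabled"
```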
These are largely the same as what I came up with for my system, although each
set has a few options that the other doesn't.
I know from experience and my discussions on the hotspot-gc-use mailing list
that ParallelRefProcEnabled is *critical* for good GC performance with Solr.
Solr apparently creates a LOT of references, so processing them in parallel is
a real help. PretenureSizeThreshold is probably very important, to make sure
that objects will not automatically end up in the old generation unless they're
REALLY big - similar to the G1HeapRegionSize option for G1 that can control
which objects are classified as humongous allocations. The other options are a
concerted effort to avoid full GCs.
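One way to see the reference-processing cost for yourself is to turn on GC logging with reference details. This is a sketch using the JDK 7/8 HotSpot flag names; SOLR_OPTS and the log path are assumptions about the local setup:

```shell
# solr.in.sh sketch: log GC activity including per-reference-type
# processing times, so the effect of -XX:+ParallelRefProcEnabled
# shows up in the GC log (JDK 7/8 flag names; path is illustrative).
SOLR_OPTS="$SOLR_OPTS -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintReferenceGC \
  -Xloggc:/var/solr/logs/gc.log"
```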
I don't like the fact that the number of GC threads is hard-coded. For someone
who's got 8 or more CPU cores (which I do), these are probably good options,
but if you've got a low end system with one or two cores, it's too many threads.
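The start script could size the thread counts from the detected core count instead of hard-coding them. A minimal sketch (assumes Linux/coreutils nproc; the cap of 4 just mirrors the current hard-coded value):

```shell
# Sketch: use the detected core count for the GC thread flags,
# but never more than the 4 that the script currently hard-codes.
gc_threads() {
  # $1 = core count; prints min($1, 4)
  local n=$1
  [ "$n" -gt 4 ] && n=4
  echo "$n"
}

CORES=$(nproc)
THREADS=$(gc_threads "$CORES")
echo "-XX:ParallelGCThreads=${THREADS} -XX:ConcGCThreads=${THREADS}"
```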
I have to wonder whether the 512MB default heap size is a problem. It would be
for me, but for a small-scale proof of concept, it is probably plenty. Would
it be feasible to detect the total amount of system memory and set the max
heap to a percentage of it?
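Detecting RAM and computing a percentage seems doable in the shell part of the scripts. A sketch for Linux (reads /proc/meminfo; the 25% figure is only an example, not a recommendation):

```shell
# Sketch: set max heap to a percentage of total system RAM.
# Assumes Linux /proc/meminfo; the percentage is illustrative only.
heap_from_ram_pct() {
  # $1 = total RAM in kB, $2 = percentage; prints megabytes
  echo $(( $1 * $2 / 100 / 1024 ))
}

TOTAL_KB=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
HEAP_MB=$(heap_from_ram_pct "$TOTAL_KB" 25)
echo "-Xmx${HEAP_MB}m"
```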
> Workaround the "Four Month Bug" causing GC pause problems
> ---------------------------------------------------------
>
> Key: SOLR-7319
> URL: https://issues.apache.org/jira/browse/SOLR-7319
> Project: Solr
> Issue Type: Bug
> Components: scripts and tools
> Affects Versions: 5.0
> Reporter: Shawn Heisey
> Assignee: Shawn Heisey
> Attachments: SOLR-7319.patch, SOLR-7319.patch, SOLR-7319.patch
>
>
> A twitter engineer found a bug in the JVM that contributes to GC pause
> problems:
> http://www.evanjones.ca/jvm-mmap-pause.html
> Problem summary (in case the blog post disappears): The JVM calculates
> statistics on things like garbage collection and writes them to a file in the
> temp directory using MMAP. If there is a lot of other MMAP write activity,
> which is precisely how Lucene accomplishes indexing and merging, it can
> result in a GC pause because the mmap write to the temp file is delayed.
> We should implement the workaround in the solr start scripts (disable
> creation of the mmap statistics tempfile) and document the impact in
> CHANGES.txt.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)