RE: Solr and Garbage Collection

2009-10-06 Thread Fuad Efendi
Master-Slave replica: new caches will be warmed and prepopulated _before_ the
new IndexReader is made available for _new_ requests and _before_ the old one
is discarded. This means that the theoretical sizing for FieldCache (which is
defined by the number of docs in an index and the cardinality of a field)
should be doubled... Of course, we need to play with GC options too for
performance tuning.
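
To make the doubling concrete, here is a back-of-the-envelope sketch in
Python. The formula (one 4-byte ordinal per document plus one string per
unique term) and the 40-byte average term size are assumptions for
illustration only, not measured Lucene numbers, and the function names
field_cache_bytes and peak_bytes_during_warmup are hypothetical.

```python
def field_cache_bytes(num_docs, unique_terms, avg_term_bytes=40, bytes_per_doc=4):
    """Rough per-field estimate: one ordinal per document plus one string per unique term.

    Both per-entry sizes are assumed values; measure your own index to refine them.
    """
    return num_docs * bytes_per_doc + unique_terms * avg_term_bytes


def peak_bytes_during_warmup(steady_bytes, overlap_factor=2):
    """While the new reader warms, old and new caches are alive together."""
    return steady_bytes * overlap_factor


# Hypothetical index: 10 million docs, 500 thousand unique values in the field.
steady = field_cache_bytes(num_docs=10_000_000, unique_terms=500_000)
peak = peak_bytes_during_warmup(steady)
print(f"steady: {steady / 2**20:.0f} MiB, peak during warm-up: {peak / 2**20:.0f} MiB")
```

The point is only that the heap has to hold the peak figure, not the steady
one, whenever a new IndexReader is being warmed.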


RE: Solr and Garbage Collection

2009-10-06 Thread Fuad Efendi
> I read pretty much all posts on this thread (before and after this one).
> Looks like the main suggestion from you and others is to keep max heap size
> (-Xmx) as small as possible (as long as you don't see OOM exception).

I suggested the absolute opposite; please note also that "as small as possible"
does not have any meaning in the multiuser environment of Tomcat. It depends on
query types (10 documents per request? Or maybe 1?) AND it depends
on average server load (one concurrent request? Or maybe 200 threads
trying to deal with 2000 concurrent requests?) AND it depends on whether it
is a Master (used for updates - parses tons of docs in a single file?) - and
it depends on unpredictable memory fragmentation - it all depends on the use
case too(!!!), in addition to schema / index size.

Please note also, such stuff depends on the JVM vendor too: what if it
precompiles everything into CPU native code (including memory dealloc after
each call)? Some do!


...but 'core' constantly disagrees with me :)

Re: Solr and Garbage Collection

2009-10-03 Thread Mark Miller
>>>>> From:
>>>>> To:
>>>>> Subject: RE: Solr and Garbage Collection
>>>>> Date: Fri, 25 Sep 2009 09:51:29 -0700
>>>>> 30ms is not better or worse than 1s until you look at the service
>>>>> requirements. For many applications, it is worth dedicating 10% of your
>>>>> processing time to GC if that makes the worst-case pause short.
>>>>>
>>>>> On the other hand, my experience with the IBM JVM was that the maximum
>>>>> query rate was 2-3X better with the concurrent generational GC compared
>>>>> to any of their other GC algorithms, so we got the best throughput along
>>>>> with the shortest pauses.
>>>>>
>>>>> Solr garbage generation (for queries) seems to have two major components:
>>>>> per-request garbage and cache evictions. With a generational collector,
>>>>> these two are handled by separate parts of the collector. Per-request
>>>>> garbage should completely fit in the short-term heap (nursery), so that
>>>>> it can be collected rapidly and returned to use for further requests. If
>>>>> the nursery is too small, the per-request allocations will be made in
>>>>> tenured space and sit there until the next major GC. Cache evictions are
>>>>> almost always in long-term storage (tenured space) because an LRU
>>>>> algorithm guarantees that the garbage will be old.
>>>>>
>>>>> Check the growth rate of tenured space (under constant load, of course)
>>>>> while increasing the size of the nursery. That rate should drop when the
>>>>> nursery gets big enough, then not drop much further as it is increased
>>>>> more. After that, reduce the size of tenured space until major GCs start
>>>>> happening "too often" (a judgment call). A bigger tenured space means
>>>>> longer major GCs and thus longer pauses, so you don't want it oversized
>>>>> by too much.
>>>>>
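
The nursery-sizing procedure quoted above can be sketched as a small Python
helper. This is only an illustration: the function name pick_nursery_size,
the 10% "not much further" threshold and the sample measurements are all
assumptions, and real numbers have to come from your own load tests.

```python
def pick_nursery_size(samples, min_improvement=0.10):
    """Pick the nursery size after which tenured growth stops dropping much.

    samples: list of (nursery_mb, tenured_growth_mb_per_min), sorted by nursery size,
             each measured under the same constant load.
    min_improvement: assumed threshold; a relative drop smaller than this counts
                     as "not dropping much further".
    """
    for (size, rate), (_next_size, next_rate) in zip(samples, samples[1:]):
        if rate == 0 or (rate - next_rate) / rate < min_improvement:
            return size
    # The rate kept improving across every step, so use the largest size tested.
    return samples[-1][0]


# Hypothetical measurements, not taken from this thread.
measurements = [(128, 40.0), (256, 22.0), (512, 9.0), (768, 8.5), (1024, 8.4)]
print("nursery size to try (MB):", pick_nursery_size(measurements))
```

Once the nursery is fixed this way, the tenured space can be shrunk step by
step as described, which remains a judgment call rather than a formula.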
>>>>> Also check the hit rates of your caches. If the hit rate is low, say 20%
>>>>> or less, make that cache much bigger or set it to zero. Either one will
>>>>> reduce the number of cache evictions. If you have an HTTP cache in front
>>>>> of Solr, zero may be the right choice, since the HTTP cache is
>>>>> cherry-picking the easily cacheable requests.
>>>>>
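
The hit-rate rule of thumb above can be applied mechanically to the lookups
and hits counters of each cache. The sketch below is a hedged illustration:
review_caches is a hypothetical helper, the counter values are made up, and
the 20% cut-off is simply the figure suggested in the quoted message.

```python
def review_caches(stats, low_hit_ratio=0.20):
    """Return one line of advice per cache, based on its hit ratio.

    stats: {cache_name: {"lookups": int, "hits": int}}
    """
    advice = {}
    for name, counters in stats.items():
        lookups = counters["lookups"]
        ratio = counters["hits"] / lookups if lookups else 0.0
        if ratio <= low_hit_ratio:
            advice[name] = f"hit ratio {ratio:.0%}: make it much bigger or set its size to 0"
        else:
            advice[name] = f"hit ratio {ratio:.0%}: leave as is"
    return advice


# Made-up counters for two caches; read the real ones from your own stats.
example_stats = {
    "filterCache": {"lookups": 10_000, "hits": 8_500},
    "queryResultCache": {"lookups": 10_000, "hits": 1_200},
}
for cache_name, line in review_caches(example_stats).items():
    print(cache_name, "->", line)
```

Either choice for a low-hit-rate cache reduces evictions, which is what
matters for the growth of tenured space.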
>>>>> Note that a commit nearly doubles the memory required, because you have
>>>>> two live Searcher objects with all their caches. Make sure you have
>>>>> headroom for a commit.
>>>>>
>>>>> If you want to test the tenured space usage, you must test with real
>>>>> world queries. Those are the only way to get accurate cache eviction
>>>>> rates.
>>>>>
>>>>> wunder
>> --
>> - Mark

- Mark

Re: Solr and Garbage Collection

2009-10-03 Thread Bill Au
SUN has recently clarified the issue regarding "unsupported unless you pay"
for the G1 garbage collector. Here is the updated release of Java 6 update

G1 will be part of Java 7, fully supported without pay.  The version
included in Java 6 update 14 is a beta release.  Since it is beta, SUN does
not recommend using it unless you have a support contract because as with
any beta software there will be bugs.  Non paying customers may very well
have to wait for the official version in Java 7 for bug fixes.

Here is more info on the G1 garbage collector:


On Sat, Oct 3, 2009 at 1:28 PM, Mark Miller  wrote:

> Another option of course, if you're using a recent version of Java 6:
> try out the beta-ish, unsupported unless you pay, G1 garbage collector.
> I've only recently started playing with it, but it's supposed to be much
> better than CMS. It's supposedly got much better throughput, it's much
> better at dealing with fragmentation issues (CMS is actually pretty bad
> with fragmentation, come to find out), and overall it's just supposed to
> be a very nice leap ahead in GC. Haven't had a chance to play with it
> much myself, but it's supposed to be fantastic. A whole new approach to
> generational collection for Sun, and much closer to the "real time" GCs
> available from some other vendors.

Re: Solr and Garbage Collection

2009-10-03 Thread Mark Miller
Another option of course, if you're using a recent version of Java 6:

try out the beta-ish, unsupported unless you pay, G1 garbage collector.
I've only recently started playing with it, but it's supposed to be much
better than CMS. It's supposedly got much better throughput, it's much
better at dealing with fragmentation issues (CMS is actually pretty bad
with fragmentation, come to find out), and overall it's just supposed to
be a very nice leap ahead in GC. Haven't had a chance to play with it
much myself, but it's supposed to be fantastic. A whole new approach to
generational collection for Sun, and much closer to the "real time" GCs
available from some other vendors.


Re: Solr and Garbage Collection

2009-10-02 Thread Mark Miller
siping liu wrote:
> Hi,
> I read pretty much all posts on this thread (before and after this one). 
> Looks like the main suggestion from you and others is to keep max heap size 
> (-Xmx) as small as possible (as long as you don't see OOM exception). This 
> brings more questions than answers (for me at least. I'm new to Solr).
> First, our environment and problem encountered: Solr 1.4 (nightly build,
> downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on
> Solaris (multi-CPU/cores). The cache setting is from the default
> solrconfig.xml (looks very small). At first we used minimum JAVA_OPTS and
> quickly ran into a problem similar to the one the original poster reported --
> long pauses (seconds to minutes) under load test. jconsole showed that it
> pauses on GC. So more JAVA_OPTS got added: "-XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m
> -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200", the thinking being that with
> multiple CPUs/cores we can get GC over with as quickly as possible. With the
> new setup, it works fine until Tomcat reaches the heap size, then it blocks
> and takes minutes on "full GC" to get more space from the "tenured
> generation". We tried different Xmx values (from very small to large), with
> no difference in the long GC time. We never ran into OOM.
MaxGCPauseMillis doesn't work with UseConcMarkSweepGC - it's for use with
the Parallel collector. That also doesn't look like a good SurvivorRatio.
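
As a rough illustration of the point above, a few lines of Python can flag
this kind of mismatch in a JAVA_OPTS string before it reaches a server. This
is a sketch, not a complete validator: check_java_opts is a hypothetical
helper, and its two rules simply encode advice given in this reply (the
second one reflects the -XX:CMSInitiatingOccupancyFraction suggestion made
further down).

```python
import shlex


def check_java_opts(java_opts):
    """Return a list of warnings for a JAVA_OPTS string (illustrative checks only)."""
    flags = shlex.split(java_opts)
    uses_cms = "-XX:+UseConcMarkSweepGC" in flags
    has_pause_goal = any(f.startswith("-XX:MaxGCPauseMillis=") for f in flags)
    has_initiating = any(
        f.startswith("-XX:CMSInitiatingOccupancyFraction=") for f in flags
    )

    warnings = []
    if uses_cms and has_pause_goal:
        warnings.append(
            "MaxGCPauseMillis is set together with CMS; "
            "per this thread it is meant for the Parallel collector"
        )
    if uses_cms and not has_initiating:
        warnings.append(
            "CMS is enabled without CMSInitiatingOccupancyFraction; "
            "consider starting concurrent collection earlier"
        )
    return warnings


# The options quoted in the question above.
posted = (
    "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=8 "
    "-XX:SurvivorRatio=2 -XX:NewSize=128m -XX:MaxNewSize=512m "
    "-XX:MaxGCPauseMillis=200"
)
for warning in check_java_opts(posted):
    print("WARN:", warning)
```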
> Questions:
> * In general various cachings are good for performance, we have more RAM to 
> use and want to use more caching to boost performance, isn't your suggestion 
> (of lowering heap limit) going against that?
Leaving RAM for the FileSystem cache is also very important. But you
should also have enough RAM for your Solr caches of course.
> * Looks like Solr caching made its way into the tenured generation on heap,
> that's good. But why do they get GC'ed eventually? I did a quick check of the
> Solr code (Solr 1.3, not 1.4), and see a single instance of using
> WeakReference. Is that what is causing all this? This seems to suggest a
> design flaw in Solr's memory management strategy (or just my ignorance about
> Solr?). I mean, wouldn't this be the "right" way of doing it -- you allow the
> user to specify the cache size in solrconfig.xml, then the user can set up
> the heap limit in JAVA_OPTS accordingly, and there is no need to use
> WeakReference (BTW, why not SoftReference)?
Do you see concurrent mode failure when looking at your GC logs? E.g.:

174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.618
secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
4.0975124 secs] 228336K->162118K(241520K)

That means you are still getting major collections with CMS, and you
don't want that. You might try kicking GC off earlier with something
like: -XX:CMSInitiatingOccupancyFraction=50
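
For anyone who wants to scan a whole GC log for this pattern, here is a
minimal Python sketch. It assumes the log uses the same layout as the sample
above; the regular expression and the name find_concurrent_mode_failures are
illustrative, and other JVM versions or logging flags may print the entry
differently.

```python
import re

# Matches the "[CMS (concurrent mode failure): <before>K-><after>K(<capacity>K), <t> secs]" part.
CMF_PATTERN = re.compile(
    r"\[CMS \(concurrent mode failure\): "
    r"(\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]"
)


def find_concurrent_mode_failures(log_text):
    """Return one dict per concurrent mode failure found in the log text."""
    # Collapse line wraps (such as those introduced by mail clients) into single spaces.
    text = " ".join(log_text.split())
    failures = []
    for before_k, after_k, capacity_k, secs in CMF_PATTERN.findall(text):
        failures.append({
            "tenured_before_kb": int(before_k),
            "tenured_after_kb": int(after_k),
            "tenured_capacity_kb": int(capacity_k),
            "occupancy_before": int(before_k) / int(capacity_k),
            "pause_secs": float(secs),
        })
    return failures


# The sample entry from this message, wrapped as it appears above.
sample = """174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.618
secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
4.0975124 secs] 228336K->162118K(241520K)"""

for failure in find_concurrent_mode_failures(sample):
    print(
        f"tenured was {failure['occupancy_before']:.0%} full, "
        f"pause {failure['pause_secs']:.1f}s"
    )
```

The occupancy at the moment of failure gives a hint of how much earlier the
concurrent cycle needs to start.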
> * Right now I have a single Tomcat hosting Solr and other applications. I 
> guess now it's better to have Solr on its own Tomcat, given that it's tricky 
> to adjust the java options.
> thanks.

RE: Solr and Garbage Collection

2009-10-02 Thread siping liu


I read pretty much all posts on this thread (before and after this one). Looks 
like the main suggestion from you and others is to keep max heap size (-Xmx) as 
small as possible (as long as you don't see OOM exception). This brings more 
questions than answers (for me at least. I'm new to Solr).


First, our environment and problem encountered: Solr 1.4 (nightly build,
downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on
Solaris (multi-CPU/cores). The cache setting is from the default solrconfig.xml
(looks very small). At first we used minimum JAVA_OPTS and quickly ran into a
problem similar to the one the original poster reported -- long pauses (seconds
to minutes) under load test. jconsole showed that it pauses on GC. So more
JAVA_OPTS got added: "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m
-XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200", the thinking being that with
multiple CPUs/cores we can get GC over with as quickly as possible. With the new
setup, it works fine until Tomcat reaches the heap size, then it blocks and
takes minutes on "full GC" to get more space from the "tenured generation". We
tried different Xmx values (from very small to large), with no difference in
the long GC time. We never ran into OOM.



* In general, caching is good for performance. We have more RAM available
and want to use more caching to boost performance; isn't your suggestion (of
lowering the heap limit) going against that?

* Looks like Solr caching made its way into the tenured generation on heap,
that's good. But why do they get GC'ed eventually? I did a quick check of the
Solr code (Solr 1.3, not 1.4), and see a single instance of using
WeakReference. Is that what is causing all this? This seems to suggest a design
flaw in Solr's memory management strategy (or just my ignorance about Solr?). I
mean, wouldn't this be the "right" way of doing it -- you allow the user to
specify the cache size in solrconfig.xml, then the user can set up the heap
limit in JAVA_OPTS accordingly, and there is no need to use WeakReference (BTW,
why not SoftReference)?

* Right now I have a single Tomcat hosting Solr and other applications. I guess 
now it's better to have Solr on its own Tomcat, given that it's tricky to 
adjust the java options.



> From:
> To:
> Subject: RE: Solr and Garbage Collection
> Date: Fri, 25 Sep 2009 09:51:29 -0700
> 30ms is not better or worse than 1s until you look at the service
> requirements. For many applications, it is worth dedicating 10% of your
> processing time to GC if that makes the worst-case pause short.
> On the other hand, my experience with the IBM JVM was that the maximum query
> rate was 2-3X better with the concurrent generational GC compared to any of
> their other GC algorithms, so we got the best throughput along with the
> shortest pauses.
> Solr garbage generation (for queries) seems to have two major components:
> per-request garbage and cache evictions. With a generational collector,
> these two are handled by separate parts of the collector. Per-request
> garbage should completely fit in the short-term heap (nursery), so that it
> can be collected rapidly and returned to use for further requests. If the
> nursery is too small, the per-request allocations will be made in tenured
> space and sit there until the next major GC. Cache evictions are almost
> always in long-term storage (tenured space) because an LRU algorithm
> guarantees that the garbage will be old.
> Check the growth rate of tenured space (under constant load, of course)
> while increasing the size of the nursery. That rate should drop when the
> nursery gets big enough, then not drop much further as it is increased more.
> After that, reduce the size of tenured space until major GCs start happening
> "too often" (a judgment call). A bigger tenured space means longer major GCs
> and thus longer pauses, so you don't want it oversized by too much.
> Also check the hit rates of your caches. If the hit rate is low, say 20% or
> less, make that cache much bigger or set it to zero. Either one will reduce
> the number of cache evictions. If you have an HTTP cache in front of Solr,
> zero may be the right choice, since the HTTP cache is cherry-picking the
> easily cacheable requests.
> Note that a commit nearly doubles the memory required, because you have two
> live Searcher objects with all their caches. Make sure you have headroom for
> a commit.
> If you want to test the tenured space usage, you must test with real world
> queries. Those are the only way to get accurate cache eviction rates.
> wunder
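
The two checks in the quoted advice (watch how fast tenured space grows as the nursery is enlarged, and look for caches with a low hit rate) can be sketched in a few lines of Python. This is only an illustration: the function names, the sample readings, and the 20% default threshold are assumptions made for the example, and the occupancy samples would have to come from your own GC log or jconsole readings under constant load.

```python
def tenured_growth_rate(samples):
    """Average growth of tenured occupancy in KB per second.

    samples: list of (seconds_since_start, tenured_used_kb) readings taken
    between major collections, under constant load.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, kb0), (t1, kb1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (kb1 - kb0) / (t1 - t0)


def cache_advice(hits, lookups, low_hit_rate=0.20):
    """Rule of thumb from the thread: a cache at or below roughly 20% hit
    rate should be made much bigger or set to zero."""
    if lookups == 0:
        return "no traffic yet"
    if hits / lookups <= low_hit_rate:
        return "grow or disable"
    return "keep"


# Hypothetical readings for two nursery sizes (not measurements from this thread).
small_nursery = [(0, 100_000), (60, 160_000)]
large_nursery = [(0, 100_000), (60, 112_000)]

print(tenured_growth_rate(small_nursery))    # 1000.0 KB/s
print(tenured_growth_rate(large_nursery))    # 200.0 KB/s
print(cache_advice(hits=150, lookups=1000))  # grow or disable
```

Repeating the measurement for each nursery size and stopping when the rate no longer drops much follows the procedure described above; real-world queries are still needed to get realistic eviction rates.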

RE: Solr and Garbage Collection

2009-09-29 Thread Fuad Efendi

> Actually the CPU usage of the solr servers is almost insignificant (it was
> like that before).

>>The time spent on collecting memory dropped from 11% to 3.81%

I even think that 3.81% out of 5% is nothing (I suspect that Solr uses about
5% CPU, mostly loading large field values into memory) :)))
(It would be nice to run a multithreaded load/stress test instead of waiting...)

Most expensive query: faceting on all fields with a generic query like *:*

Re: Solr and Garbage Collection

2009-09-28 Thread Bill Au
One way to track expensive queries is to look at the query time, QTime, in the
Solr log.

There are a couple of tools for analyzing gc logs:

They will give you the frequency and duration of minor and major collections.

On a multi-processor/core system with CPU cycles to spare, using the
concurrent collector will reduce (and may even eliminate) major collections.
The trade-off is that CPU utilization on the system will go up.  When I tried
it with one of my Java apps, the system utilization went up so much under
heavy load that it reduced the overall throughput of my app.  Your mileage may
vary.  You will have to measure it for your app to see for yourself.
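
As a rough illustration of the QTime idea, the Python sketch below pulls the slowest requests out of a set of log lines. The log line layout (params={...} followed later by QTime=N) and the sample lines themselves are assumptions made for the example; adjust the pattern to whatever your Solr log actually contains.

```python
import heapq
import re

# Assumed log line shape: "... path=/select params={...} ... QTime=123"
QTIME_RE = re.compile(r"params=\{(?P<params>.*)\}.*\bQTime=(?P<qtime>\d+)")


def top_queries(lines, n=3):
    """Return the n slowest (qtime_ms, params) pairs found in the log lines."""
    found = []
    for line in lines:
        m = QTIME_RE.search(line)
        if m:
            found.append((int(m.group("qtime")), m.group("params")))
    return heapq.nlargest(n, found)


# Made-up sample lines, only to exercise the parser.
sample_log = [
    "INFO: [] webapp=/solr path=/select params={q=*:*&facet=true} hits=8000000 status=0 QTime=4210",
    "INFO: [] webapp=/solr path=/select params={q=title:solr} hits=42 status=0 QTime=12",
    "INFO: [] webapp=/solr path=/select params={q=body:gc&sort=date+desc} hits=900 status=0 QTime=380",
]

for qtime, params in top_queries(sample_log, n=2):
    print(qtime, params)
```

Running the same script against the logs of two servers with different GC settings gives a like-for-like list of the N most expensive queries on each.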


On Mon, Sep 28, 2009 at 4:49 PM, Jonathan Ariel  wrote:

> How do you track major collections? Even better, how do you log your GC
> behavior with details? Right now I just log total time spent on
> collections,
> but I don't really know on which collections.Regard application performance
> with the ConcMarkSweepGC, I think I didn't experience any impact for now.
> Actually the CPU usage of the solr servers is almost insignificant (it was
> like that before).
> BTW, do you know a good way to track the N most expensive solr queries? I
> would like to measure that on 2 different solr servers with different GC.
> On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller 
> wrote:
> > Do you have your GC logs? Are you still seeing major collections?
> >
> > Where is the time spent?
> >
> > Hard to say without some of that info.
> >
> > The goal of the low pause collector is to finish collecting before the
> > tenured space is filled - if it doesn't, a standard major collection
> > occurs.
> >
> > The collector will use recent stats it records to try and pick a good
> > time to start - as a fail safe though, it will trigger no matter what at
> > a certain percentage. With Java 1.5, it was 68% full that it triggered.
> > With 1.6, its 92%.
> >
> > If your still getting major collections, you might want to see if
> > lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not,
> > you might be near optimal settings.
> >
> > There is likely not anything else you should mess with - unless using
> > the extra thread to collect while your app is running affects your apps
> > performance - in that case you might want to look into turning on the
> > incremental mode. But you havn't mentioned that, so I doubt it.
> >
> >
> >
> > --
> > - Mark
> >
> >
> >
> >
> >
> > Jonathan Ariel wrote:
> > > Ok... good news! Upgrading to the newest version of JVM 6 (update 6)
> > seems
> > > to solve this ugly bug. With the upgraded JVM I could run the solr
> > servers
> > > for more than 12 hours on the production environment with the GC
> > mentioned
> > > in the previous e-mails. The results are really amazing. The time spent
> > on
> > > collecting memory dropped from 11% to 3.81%Do you think there is more
> to
> > > tune there?
> > >
> > > Thanks!
> > >
> > > Jonathan
> > >
> > > On Sun, Sep 27, 2009 at 8:39 PM, Bill Au  wrote:
> > >
> > >
> > >> You are running a very old version of Java 6 (update 6).  The latest
> is
> > >> update 16.  You should definitely upgrade.  There is a bug in Java 6
> > >> starting with update 4 that may result in a corrupted Lucene/Solr
> index:
> > >>
> > >>
> > >>
> > >> The JVM crash occurred in the gc thread.  So it looks like a bug in
> the
> > JVM
> > >> itself.  Upgrading to the latest release might help.  Switching to a
> > >> different garbage collector should help.
> > >>
> > >> Bill
> > >>
> > >> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller 
> > >> wrote:
> > >>
> > >>
> > >>> Jonathan Ariel wrote:
> > >>>
> > >>>> Ok. After the server ran for more than 12 hours, the time spent on GC
> > >>>> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the
> > >>>> thread dump, maybe you can help identify what happened?
> > >>>>
> > >>> Well thats a tough ;) My guess is its a bug :)
> > >>>
> > >>> Your two survivor spaces are filled, so it was likely about to move
> > >>> objects into the tenured space, which still has plenty of room for
> them
> > >>> (barring horrible fragmentation). Any issues with that type of thing
> > >>> should generate an OOM anyway though. You can find people that have
> run
> > >>> into similar issues in the past, but a lot of times unreproducible.
> > >>> Usually, their bugs are closed and they are told to try a newer JVM.
> > >>>
> > >>> Your JVM appears to be quite a few versions back. There have been
> many
> > >>> garbage collection bugs fixed in the 7 or so updates since your
> > version,
> > >>> a good handful of them related to CMS.
> > >>>
> > >>> If you can, my best suggestion at the moment is to upgrade to th

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
Another good option.

Here is a comparison of the commands I replied with and this one:

Very similar.

Otis Gospodnetic wrote:
> Jonathan,
> Here is the JVM argument for logging GC activity:
> -Xloggc:<file>    log GC status to a file with time stamps
> Otis
> --
> - Original Message 
>> From: Jonathan Ariel 
>> To:
>> Sent: Monday, September 28, 2009 4:49:03 PM
>> Subject: Re: Solr and Garbage Collection
>> How do you track major collections? Even better, how do you log your GC
>> behavior with details? Right now I just log total time spent on collections,
>> but I don't really know on which collections.Regard application performance
>> with the ConcMarkSweepGC, I think I didn't experience any impact for now.
>> Actually the CPU usage of the solr servers is almost insignificant (it was
>> like that before).
>> BTW, do you know a good way to track the N most expensive solr queries? I
>> would like to measure that on 2 different solr servers with different GC.
>> On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller wrote:
>>> Do you have your GC logs? Are you still seeing major collections?
>>> Where is the time spent?
>>> Hard to say without some of that info.
>>> The goal of the low pause collector is to finish collecting before the
>>> tenured space is filled - if it doesn't, a standard major collection
>>> occurs.
>>> The collector will use recent stats it records to try and pick a good
>>> time to start - as a fail safe though, it will trigger no matter what at
>>> a certain percentage. With Java 1.5, it was 68% full that it triggered.
>>> With 1.6, its 92%.
>>> If your still getting major collections, you might want to see if
>>> lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not,
>>> you might be near optimal settings.
>>> There is likely not anything else you should mess with - unless using
>>> the extra thread to collect while your app is running affects your apps
>>> performance - in that case you might want to look into turning on the
>>> incremental mode. But you havn't mentioned that, so I doubt it.
>>> --
>>> - Mark
>>> Jonathan Ariel wrote:
>>>> Ok... good news! Upgrading to the newest version of JVM 6 (update 6)
>>> seems
>>>> to solve this ugly bug. With the upgraded JVM I could run the solr
>>> servers
>>>> for more than 12 hours on the production environment with the GC
>>> mentioned
>>>> in the previous e-mails. The results are really amazing. The time spent
>>> on
>>>> collecting memory dropped from 11% to 3.81%Do you think there is more to
>>>> tune there?
>>>> Thanks!
>>>> Jonathan
>>>> On Sun, Sep 27, 2009 at 8:39 PM, Bill Au wrote:
>>>>> You are running a very old version of Java 6 (update 6).  The latest is
>>>>> update 16.  You should definitely upgrade.  There is a bug in Java 6
>>>>> starting with update 4 that may result in a corrupted Lucene/Solr index:
>>>>> The JVM crash occurred in the gc thread.  So it looks like a bug in the
>>> JVM
>>>>> itself.  Upgrading to the latest release might help.  Switching to a
>>>>> different garbage collector should help.
>>>>> Bill
>>>>> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller 
>>>>> wrote:
>>>>>> Jonathan Ariel wrote:

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller


[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]

Additional details with: -XX:+PrintGCDetails

[GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K), 0.0459067 secs]

And timestamps with: -XX:+PrintGCTimeStamps

111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.505 secs]111.042: [Tenured: 18154K->2311K(24576K), 0.1290354 secs] 26282K->2311K(32704K), 0.1293306 secs]
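
To get the frequency and total pause time of minor versus major collections out of a log like this, something along these lines may be enough. It is a sketch in Python that only understands the plain "[GC ...]" and "[Full GC ...]" lines from the first sample above; the more detailed PrintGCDetails format would need a different pattern, and the function name is just illustrative.

```python
import re

# Matches the plain lines quoted above, e.g.
# "[GC 325407K->83000K(776768K), 0.2300771 secs]"
GC_LINE = re.compile(
    r"\[(?P<kind>Full GC|GC) (?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<total>\d+)K\), (?P<secs>[\d.]+) secs\]"
)


def summarize(log_text):
    """Count minor and major collections and total their pause times."""
    stats = {"minor": {"count": 0, "secs": 0.0},
             "major": {"count": 0, "secs": 0.0}}
    for m in GC_LINE.finditer(log_text):
        bucket = "major" if m.group("kind") == "Full GC" else "minor"
        stats[bucket]["count"] += 1
        stats[bucket]["secs"] += float(m.group("secs"))
    return stats


sample = """\
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
"""

stats = summarize(sample)
print("minor:", stats["minor"]["count"], "collections")
print("major:", stats["major"]["count"], "collections")
```

Dividing the summed pause seconds by the wall-clock duration of the log gives the "percentage of time spent in GC" figure discussed elsewhere in this thread.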

Jonathan Ariel wrote:
> How do you track major collections? Even better, how do you log your GC
> behavior with details? Right now I just log total time spent on collections,
> but I don't really know on which collections.Regard application performance
> with the ConcMarkSweepGC, I think I didn't experience any impact for now.
> Actually the CPU usage of the solr servers is almost insignificant (it was
> like that before).
> BTW, do you know a good way to track the N most expensive solr queries? I
> would like to measure that on 2 different solr servers with different GC.
> On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller  wrote:
>> Do you have your GC logs? Are you still seeing major collections?
>> Where is the time spent?
>> Hard to say without some of that info.
>> The goal of the low pause collector is to finish collecting before the
>> tenured space is filled - if it doesn't, a standard major collection
>> occurs.
>> The collector will use recent stats it records to try and pick a good
>> time to start - as a fail safe though, it will trigger no matter what at
>> a certain percentage. With Java 1.5, it was 68% full that it triggered.
>> With 1.6, its 92%.
>> If your still getting major collections, you might want to see if
>> lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not,
>> you might be near optimal settings.
>> There is likely not anything else you should mess with - unless using
>> the extra thread to collect while your app is running affects your apps
>> performance - in that case you might want to look into turning on the
>> incremental mode. But you havn't mentioned that, so I doubt it.
>> --
>> - Mark
>> Jonathan Ariel wrote:
>>> Ok... good news! Upgrading to the newest version of JVM 6 (update 6)
>> seems
>>> to solve this ugly bug. With the upgraded JVM I could run the solr
>> servers
>>> for more than 12 hours on the production environment with the GC
>> mentioned
>>> in the previous e-mails. The results are really amazing. The time spent
>> on
>>> collecting memory dropped from 11% to 3.81%Do you think there is more to
>>> tune there?
>>> Thanks!
>>> Jonathan
>>> On Sun, Sep 27, 2009 at 8:39 PM, Bill Au  wrote:
>>>> You are running a very old version of Java 6 (update 6).  The latest is
>>>> update 16.  You should definitely upgrade.  There is a bug in Java 6
>>>> starting with update 4 that may result in a corrupted Lucene/Solr index:
>>>>
>>>> The JVM crash occurred in the gc thread.  So it looks like a bug in the JVM
>>>> itself.  Upgrading to the latest release might help.  Switching to a
>>>> different garbage collector should help.
>>>>
>>>> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller wrote:
>>>>> Jonathan Ariel wrote:
>>>>>> Ok. After the server ran for more than 12 hours, the time spent on GC
>>>>>> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the
>>>>>> thread dump, maybe you can help identify what happened?
>>>>> Well thats a tough ;) My guess is its a bug :)
>>>>> Your two survivor spaces are filled, so it was likely about to move
>>>>> objects into the tenured space, which still has plenty of room for them
>>>>> (barring horrible fragmentation). Any issues with that type of thing
>>>>> should generate an OOM anyway though. You can find people that have run
>>>>> into similar issues in the past, but a lot of times unreproducible.
>>>>> Usually, their bugs are closed and they are told to try a newer JVM.
>>>>> Your JVM appears to be quite a few versions back. There have been many
>>>>> garbage collection bugs fixed in the 7 or so updates since your version,
>>>>> a good handful of them related to CMS.
>>>>> If you can, my best suggestion at the moment is to upgrade to the latest
>>>>> and see how that fairs.
>>>>> If not, you might see if going back to the throughput collector and
>>>>> turning on the parallel tenured space collector might meet your needs

Re: Solr and Garbage Collection

2009-09-28 Thread Otis Gospodnetic

Here is the JVM argument for logging GC activity:

-Xloggc:<file>    log GC status to a file with time stamps
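
For anyone scripting their Tomcat startup, the GC logging flags mentioned in this thread can be assembled as below. This is a small Python sketch; the helper name and the log path are placeholders invented for the example, and the flags are the HotSpot options already named in these messages.

```python
def gc_logging_opts(log_path, details=True, timestamps=True):
    """Build the HotSpot GC-logging flags mentioned in this thread."""
    opts = ["-Xloggc:" + log_path]
    if details:
        opts.append("-XX:+PrintGCDetails")
    if timestamps:
        opts.append("-XX:+PrintGCTimeStamps")
    return " ".join(opts)


# The log path is a placeholder; pick a location Tomcat can write to.
print('JAVA_OPTS="$JAVA_OPTS ' + gc_logging_opts("/var/log/solr/gc.log") + '"')
```

The printed line can be pasted into whatever script sets JAVA_OPTS for Tomcat.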


- Original Message 
> From: Jonathan Ariel 
> To:
> Sent: Monday, September 28, 2009 4:49:03 PM
> Subject: Re: Solr and Garbage Collection
> How do you track major collections? Even better, how do you log your GC
> behavior with details? Right now I just log total time spent on collections,
> but I don't really know on which collections.Regard application performance
> with the ConcMarkSweepGC, I think I didn't experience any impact for now.
> Actually the CPU usage of the solr servers is almost insignificant (it was
> like that before).
> BTW, do you know a good way to track the N most expensive solr queries? I
> would like to measure that on 2 different solr servers with different GC.
> On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller wrote:
> > Do you have your GC logs? Are you still seeing major collections?
> >
> > Where is the time spent?
> >
> > Hard to say without some of that info.
> >
> > The goal of the low pause collector is to finish collecting before the
> > tenured space is filled - if it doesn't, a standard major collection
> > occurs.
> >
> > The collector will use recent stats it records to try and pick a good
> > time to start - as a fail safe though, it will trigger no matter what at
> > a certain percentage. With Java 1.5, it was 68% full that it triggered.
> > With 1.6, its 92%.
> >
> > If your still getting major collections, you might want to see if
> > lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not,
> > you might be near optimal settings.
> >
> > There is likely not anything else you should mess with - unless using
> > the extra thread to collect while your app is running affects your apps
> > performance - in that case you might want to look into turning on the
> > incremental mode. But you havn't mentioned that, so I doubt it.
> >
> >
> >
> > --
> > - Mark
> >
> >
> >
> >
> >
> > Jonathan Ariel wrote:
> > > Ok... good news! Upgrading to the newest version of JVM 6 (update 6)
> > seems
> > > to solve this ugly bug. With the upgraded JVM I could run the solr
> > servers
> > > for more than 12 hours on the production environment with the GC
> > mentioned
> > > in the previous e-mails. The results are really amazing. The time spent
> > on
> > > collecting memory dropped from 11% to 3.81%Do you think there is more to
> > > tune there?
> > >
> > > Thanks!
> > >
> > > Jonathan
> > >
> > > On Sun, Sep 27, 2009 at 8:39 PM, Bill Au wrote:
> > >
> > >
> > >> You are running a very old version of Java 6 (update 6).  The latest is
> > >> update 16.  You should definitely upgrade.  There is a bug in Java 6
> > >> starting with update 4 that may result in a corrupted Lucene/Solr index:
> > >>
> > >>
> > >>
> > >> The JVM crash occurred in the gc thread.  So it looks like a bug in the
> > JVM
> > >> itself.  Upgrading to the latest release might help.  Switching to a
> > >> different garbage collector should help.
> > >>
> > >> Bill
> > >>
> > >> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller 
> > >> wrote:
> > >>
> > >>
> > >>> Jonathan Ariel wrote:
> > >>>
> > >>>> Ok. After the server ran for more than 12 hours, the time spent on GC
> > >>>> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the
> > >>>>
> > >>> thread
> > >>>
> > >>>> dump, maybe you can help identify what happened?
> > >>>>
> > >>>>
> > >>> Well thats a tough ;) My guess is its a bug :)
> > >>>
> > >>> Your two survivor spaces are filled, so it was likely about to move
> > >>> objects into the tenured space, which still has plenty of room for them
> > >>> (barring horrible fragmentation). Any issues with that type of thing
> > >>> should generate an OOM anyway though. You can find people that have 

Re: Solr and Garbage Collection

2009-09-28 Thread Jonathan Ariel
How do you track major collections? Even better, how do you log your GC
behavior with details? Right now I just log total time spent on collections,
but I don't really know on which collections. Regarding application performance
with the ConcMarkSweepGC, I don't think I have experienced any impact so far.
Actually the CPU usage of the Solr servers is almost insignificant (it was
like that before).
BTW, do you know a good way to track the N most expensive Solr queries? I
would like to measure that on 2 different Solr servers with different GC.

On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller  wrote:

> Do you have your GC logs? Are you still seeing major collections?
> Where is the time spent?
> Hard to say without some of that info.
> The goal of the low pause collector is to finish collecting before the
> tenured space is filled - if it doesn't, a standard major collection
> occurs.
> The collector will use recent stats it records to try and pick a good
> time to start - as a fail safe though, it will trigger no matter what at
> a certain percentage. With Java 1.5, it was 68% full that it triggered.
> With 1.6, its 92%.
> If your still getting major collections, you might want to see if
> lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not,
> you might be near optimal settings.
> There is likely not anything else you should mess with - unless using
> the extra thread to collect while your app is running affects your apps
> performance - in that case you might want to look into turning on the
> incremental mode. But you havn't mentioned that, so I doubt it.
> --
> - Mark
> Jonathan Ariel wrote:
> > Ok... good news! Upgrading to the newest version of JVM 6 (update 6)
> seems
> > to solve this ugly bug. With the upgraded JVM I could run the solr
> servers
> > for more than 12 hours on the production environment with the GC
> mentioned
> > in the previous e-mails. The results are really amazing. The time spent
> on
> > collecting memory dropped from 11% to 3.81%Do you think there is more to
> > tune there?
> >
> > Thanks!
> >
> > Jonathan
> >
> > On Sun, Sep 27, 2009 at 8:39 PM, Bill Au  wrote:
> >
> >
> >> You are running a very old version of Java 6 (update 6).  The latest is
> >> update 16.  You should definitely upgrade.  There is a bug in Java 6
> >> starting with update 4 that may result in a corrupted Lucene/Solr index:
> >>
> >>
> >>
> >> The JVM crash occurred in the gc thread.  So it looks like a bug in the JVM
> >> itself.  Upgrading to the latest release might help.  Switching to a
> >> different garbage collector should help.
> >>
> >> Bill
> >>
> >> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller 
> >> wrote:
> >>
> >>
> >>> Jonathan Ariel wrote:
> >>>
> >>>> Ok. After the server ran for more than 12 hours, the time spent on GC
> >>>> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the
> >>>> thread dump, maybe you can help identify what happened?
> >>> Well thats a tough ;) My guess is its a bug :)
> >>>
> >>> Your two survivor spaces are filled, so it was likely about to move
> >>> objects into the tenured space, which still has plenty of room for them
> >>> (barring horrible fragmentation). Any issues with that type of thing
> >>> should generate an OOM anyway though. You can find people that have run
> >>> into similar issues in the past, but a lot of times unreproducible.
> >>> Usually, their bugs are closed and they are told to try a newer JVM.
> >>>
> >>> Your JVM appears to be quite a few versions back. There have been many
> >>> garbage collection bugs fixed in the 7 or so updates since your
> version,
> >>> a good handful of them related to CMS.
> >>>
> >>> If you can, my best suggestion at the moment is to upgrade to the
> latest
> >>> and see how that fairs.
> >>>
> >>> If not, you might see if going back to the throughput collector and
> >>> turning on the parallel tenured space collector might meet your needs
> >>> instead. You can work with other params to get that going better if you
> >>> have to as well.
> >>>
> >>> Also, adjusting other settings with the low pause collector might
> >>> trigger something to side step the bug. Not a great option there though
> >>>
> >> ;)
> >>
> >>> How many unique fields are you sorting/faceting on? It must be a lot if
> >>> you need 10 gig for 8 million documents. Its kind of rough to have to
> >>> work at such a close limit to your total heap available as a min mem
> >>> requirement.
> >>>
> >>> --
> >>> - Mark
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> #
> >>>> # An unexpected error has been detected by Java Runtime Environment:
> >>>> #
> >>>> #  SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
> >>>> #
> >>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode
> >>>

Re: Solr and Garbage Collection

2009-09-28 Thread Mark Miller
Do you have your GC logs? Are you still seeing major collections?

Where is the time spent?

Hard to say without some of that info.

The goal of the low pause collector is to finish collecting before the
tenured space is filled - if it doesn't, a standard major collection occurs.

The collector will use recent stats it records to try and pick a good
time to start - as a fail safe though, it will trigger no matter what at
a certain percentage. With Java 1.5, it triggered when tenured space was
68% full. With 1.6, it's 92%.

If you're still getting major collections, you might want to see if
lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not,
you might be near optimal settings.

There is likely not anything else you should mess with - unless using
the extra thread to collect while your app is running affects your app's
performance - in that case you might want to look into turning on the
incremental mode. But you haven't mentioned that, so I doubt it.
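
To make the trade-off concrete, here is a back-of-the-envelope model in Python of why lowering the initiating occupancy can help. It is a simplification, not how the collector actually decides: the function name, the promotion rate, the tenured size, and the concurrent cycle duration are all hypothetical inputs you would estimate from your own GC logs.

```python
def cms_headroom(tenured_kb, initiating_percent, promotion_kb_per_sec,
                 concurrent_cycle_secs):
    """Rough model: does a concurrent cycle that starts at the initiating
    occupancy finish before promotions fill the rest of tenured space?"""
    free_at_start_kb = tenured_kb * (100 - initiating_percent) / 100
    secs_until_full = free_at_start_kb / promotion_kb_per_sec
    return {
        "free_at_start_kb": free_at_start_kb,
        "secs_until_full": secs_until_full,
        "finishes_in_time": concurrent_cycle_secs < secs_until_full,
    }


# Hypothetical numbers: 1,000,000 KB tenured space, 10,000 KB/s promoted,
# and a concurrent cycle that takes about 10 seconds.
at_92 = cms_headroom(1_000_000, 92, 10_000, 10)
at_68 = cms_headroom(1_000_000, 68, 10_000, 10)

print(at_92["secs_until_full"], at_92["finishes_in_time"])  # 8.0 False
print(at_68["secs_until_full"], at_68["finishes_in_time"])  # 32.0 True
```

In this toy case, starting at 92% leaves too little room and a standard major collection would follow, while starting at 68% gives the concurrent cycle time to finish. The price is that collections start earlier and therefore run more often.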

- Mark

Jonathan Ariel wrote:
> Ok... good news! Upgrading to the newest version of JVM 6 (update 6) seems
> to solve this ugly bug. With the upgraded JVM I could run the solr servers
> for more than 12 hours on the production environment with the GC mentioned
> in the previous e-mails. The results are really amazing. The time spent on
> collecting memory dropped from 11% to 3.81%Do you think there is more to
> tune there?
> Thanks!
> Jonathan
> On Sun, Sep 27, 2009 at 8:39 PM, Bill Au  wrote:
>> You are running a very old version of Java 6 (update 6).  The latest is
>> update 16.  You should definitely upgrade.  There is a bug in Java 6
>> starting with update 4 that may result in a corrupted Lucene/Solr index:
>> The JVM crash occurred in the gc thread.  So it looks like a bug in the JVM
>> itself.  Upgrading to the latest release might help.  Switching to a
>> different garbage collector should help.
>> Bill
>> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller 
>> wrote:
>>> Jonathan Ariel wrote:
>>>> Ok. After the server ran for more than 12 hours, the time spent on GC
>>>> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the
>>>> thread dump, maybe you can help identify what happened?

>>> Well thats a tough ;) My guess is its a bug :)
>>> Your two survivor spaces are filled, so it was likely about to move
>>> objects into the tenured space, which still has plenty of room for them
>>> (barring horrible fragmentation). Any issues with that type of thing
>>> should generate an OOM anyway though. You can find people that have run
>>> into similar issues in the past, but a lot of times unreproducible.
>>> Usually, their bugs are closed and they are told to try a newer JVM.
>>> Your JVM appears to be quite a few versions back. There have been many
>>> garbage collection bugs fixed in the 7 or so updates since your version,
>>> a good handful of them related to CMS.
>>> If you can, my best suggestion at the moment is to upgrade to the latest
>>> and see how that fairs.
>>> If not, you might see if going back to the throughput collector and
>>> turning on the parallel tenured space collector might meet your needs
>>> instead. You can work with other params to get that going better if you
>>> have to as well.
>>> Also, adjusting other settings with the low pause collector might
>>> trigger something to side step the bug. Not a great option there though
>> ;)
>>> How many unique fields are you sorting/faceting on? It must be a lot if
>>> you need 10 gig for 8 million documents. Its kind of rough to have to
>>> work at such a close limit to your total heap available as a min mem
>>> requirement.
>>> --
>>> - Mark
>>>> # An unexpected error has been detected by Java Runtime Environment:
>>>> #  SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode
>>>> # Problematic frame:
>>>> # V  []
>>>> # If you would like to submit a bug report, please visit:
>>>>
>>>> ---  T H R E A D  ---
>>>>
>>>> Current thread (0x5be47400):  VMThread [stack:
>>>> 0x41bad000,0x41cae000] [id=32249]
>>>>
>>>> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (),
>>>>
>>>> RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006,
>>>> RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70,
>>>> R8 =0x2aadab201538, R9 =0x0005, R10=0x000

Re: Solr and Garbage Collection

2009-09-28 Thread Jonathan Ariel
Ok... good news! Upgrading to the newest version of JVM 6 (update 6) seems
to solve this ugly bug. With the upgraded JVM I could run the Solr servers
for more than 12 hours on the production environment with the GC mentioned
in the previous e-mails. The results are really amazing. The time spent on
collecting memory dropped from 11% to 3.81%. Do you think there is more to
tune there?



On Sun, Sep 27, 2009 at 8:39 PM, Bill Au  wrote:

> You are running a very old version of Java 6 (update 6).  The latest is
> update 16.  You should definitely upgrade.  There is a bug in Java 6
> starting with update 4 that may result in a corrupted Lucene/Solr index:
> The JVM crash occurred in the gc thread.  So it looks like a bug in the JVM
> itself.  Upgrading to the latest release might help.  Switching to a
> different garbage collector should help.
> Bill
> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller 
> wrote:
> > Jonathan Ariel wrote:
> > > Ok. After the server ran for more than 12 hours, the time spent on GC
> > > decreased from 11% to 3,4%, but 5 hours later it crashed. This is the
> > thread
> > > dump, maybe you can help identify what happened?
> > >
> > Well thats a tough ;) My guess is its a bug :)
> >
> > Your two survivor spaces are filled, so it was likely about to move
> > objects into the tenured space, which still has plenty of room for them
> > (barring horrible fragmentation). Any issues with that type of thing
> > should generate an OOM anyway though. You can find people that have run
> > into similar issues in the past, but a lot of times unreproducible.
> > Usually, their bugs are closed and they are told to try a newer JVM.
> >
> > Your JVM appears to be quite a few versions back. There have been many
> > garbage collection bugs fixed in the 7 or so updates since your version,
> > a good handful of them related to CMS.
> >
> > If you can, my best suggestion at the moment is to upgrade to the latest
> > and see how that fairs.
> >
> > If not, you might see if going back to the throughput collector and
> > turning on the parallel tenured space collector might meet your needs
> > instead. You can work with other params to get that going better if you
> > have to as well.
> >
> > Also, adjusting other settings with the low pause collector might
> > trigger something to side step the bug. Not a great option there though
> ;)
> >
> > How many unique fields are you sorting/faceting on? It must be a lot if
> > you need 10 gig for 8 million documents. Its kind of rough to have to
> > work at such a close limit to your total heap available as a min mem
> > requirement.
> >
> > --
> > - Mark
> >
> >
> >
> >
> > > #
> > > # An unexpected error has been detected by Java Runtime Environment:
> > > #
> > > #  SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
> > > #
> > > # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode
> > > linux-amd64)
> > > # Problematic frame:
> > > # V  []
> > > #
> > > # If you would like to submit a bug report, please visit:
> > > #
> > > #
> > >
> > > ---  T H R E A D  ---
> > >
> > > Current thread (0x5be47400):  VMThread [stack:
> > > 0x41bad000,0x41cae000] [id=32249]
> > >
> > > siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (),
> > > si_addr=0x
> > >
> > > Registers:
> > > RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006,
> > > RDX=0x005c49870037c996
> > > RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70,
> > > RDI=0x0037c985003a095e
> > > R8 =0x2aadab201538, R9 =0x0005, R10=0x0001,
> > > R11=0x0010
> > > R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40,
> > > R15=0x2aadab2015ac
> > > RIP=0x2b4e0f69ea2a, EFL=0x00010206,
> > CSGSFS=0x0033,
> > > ERR=0x
> > >   TRAPNO=0x000d
> > >
> > > Top of Stack: (sp=0x41cac550)
> > > 0x41cac550:   41cac580 2b4e0f903c5b
> > > 0x41cac560:   41cac590 0003
> > > 0x41cac570:   2aac9289cf50 2aadab2015a8
> > > 0x41cac580:   41cac5c0 2b4e0f72e388
> > > 0x41cac590:   41cac5c0 2aac9289cf40
> > > 0x41cac5a0:   0005 2b4e0fc86330
> > > 0x41cac5b0:    2b4e0fd8c740
> > > 0x41cac5c0:   41cac5f0 2b4e0f903b7f
> > > 0x41cac5d0:   41cac610 0003
> > > 0x41cac5e0:   2aaccb1750f8 2aaccea41570
> > > 0x41cac5f0:   41cac610 2b4e0f931548
> > > 0x41cac600:   2b4e0fc861d8 2aadd4052ab0
> > > 0x41cac610:   41cac

Re: Solr and Garbage Collection

2009-09-27 Thread Bill Au
You are running a very old version of Java 6 (update 6).  The latest is
update 16.  You should definitely upgrade.  There is a bug in Java 6
starting with update 4 that may result in a corrupted Lucene/Solr index:

The JVM crash occurred in the gc thread.  So it looks like a bug in the JVM
itself.  Upgrading to the latest release might help.  Switching to a
different garbage collector should help.
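
Before and after an upgrade or a collector switch it can help to confirm what the running JVM actually reports. What follows is only a minimal sketch, not something anyone in this thread posted: the class name GcInfo is made up for illustration, and it uses nothing but the standard java.lang.management API. It prints the Java version, the VM build string (the same kind of value as the "10.0-b22" in the crash header) and the name of each active collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Solr or Tomcat.
final class GcInfo {
    /** Builds a short report: JVM version plus the name of every active collector. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("java.version = ").append(System.getProperty("java.version")).append('\n');
        sb.append("java.vm.version = ").append(System.getProperty("java.vm.version")).append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append("collector: ").append(gc.getName())
              .append(" (collections so far: ").append(gc.getCollectionCount()).append(")\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

To see the values for the Solr process itself, the same calls would have to run inside the servlet container's JVM (for example from a JSP or over JMX), not in a separate process.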


On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller  wrote:

> Jonathan Ariel wrote:
> > Ok. After the server ran for more than 12 hours, the time spent on GC
> > decreased from 11% to 3,4%, but 5 hours later it crashed. This is the
> thread
> > dump, maybe you can help identify what happened?
> >
> Well, that's a tough one ;) My guess is it's a bug :)
> Your two survivor spaces are filled, so it was likely about to move
> objects into the tenured space, which still has plenty of room for them
> (barring horrible fragmentation). Any issues with that type of thing
> should generate an OOM anyway though. You can find people that have run
> into similar issues in the past, but a lot of times unreproducible.
> Usually, their bugs are closed and they are told to try a newer JVM.
> Your JVM appears to be quite a few versions back. There have been many
> garbage collection bugs fixed in the 7 or so updates since your version,
> a good handful of them related to CMS.
> If you can, my best suggestion at the moment is to upgrade to the latest
> and see how that fares.
> If not, you might see if going back to the throughput collector and
> turning on the parallel tenured space collector might meet your needs
> instead. You can work with other params to get that going better if you
> have to as well.
> Also, adjusting other settings with the low pause collector might
> trigger something to sidestep the bug. Not a great option there though ;)
> How many unique fields are you sorting/faceting on? It must be a lot if
> you need 10 gig for 8 million documents. It's kind of rough to have to
> work at such a close limit to your total heap available as a min mem
> requirement.
> --
> - Mark
> > #
> > # An unexpected error has been detected by Java Runtime Environment:
> > #
> > #  SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
> > #
> > # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode
> > linux-amd64)
> > # Problematic frame:
> > # V  []
> > #
> > # If you would like to submit a bug report, please visit:
> > #
> > #
> >
> > ---  T H R E A D  ---
> >
> > Current thread (0x5be47400):  VMThread [stack:
> > 0x41bad000,0x41cae000] [id=32249]
> >
> > siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (),
> > si_addr=0x
> >
> > Registers:
> > RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006,
> > RDX=0x005c49870037c996
> > RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70,
> > RDI=0x0037c985003a095e
> > R8 =0x2aadab201538, R9 =0x0005, R10=0x0001,
> > R11=0x0010
> > R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40,
> > R15=0x2aadab2015ac
> > RIP=0x2b4e0f69ea2a, EFL=0x00010206,
> CSGSFS=0x0033,
> > ERR=0x
> >   TRAPNO=0x000d
> >
> > Top of Stack: (sp=0x41cac550)
> > 0x41cac550:   41cac580 2b4e0f903c5b
> > 0x41cac560:   41cac590 0003
> > 0x41cac570:   2aac9289cf50 2aadab2015a8
> > 0x41cac580:   41cac5c0 2b4e0f72e388
> > 0x41cac590:   41cac5c0 2aac9289cf40
> > 0x41cac5a0:   0005 2b4e0fc86330
> > 0x41cac5b0:    2b4e0fd8c740
> > 0x41cac5c0:   41cac5f0 2b4e0f903b7f
> > 0x41cac5d0:   41cac610 0003
> > 0x41cac5e0:   2aaccb1750f8 2aaccea41570
> > 0x41cac5f0:   41cac610 2b4e0f931548
> > 0x41cac600:   2b4e0fc861d8 2aadd4052ab0
> > 0x41cac610:   41cac640 2b4e0f903d1a
> > 0x41cac620:   41cac650 0003
> > 0x41cac630:   5bc7d6d0 2b4e0fd8c740
> > 0x41cac640:   41cac650 2b4e0f90411c
> > 0x41cac650:   41cac680 2b4e0fa1d16e
> > 0x41cac660:    5bc7d6d0
> > 0x41cac670:   0002 2b4e0fd8c740
> > 0x41cac680:   41cac6c0 2b4e0fa74640
> > 0x41cac690:   41cac6b0 5bc7d6d0
> > 0x41cac6a0:   0002 2b4e0fd8c740
> > 0x41cac6b0:   0001 2b4e0fd8c740
> > 0x41cac6c0:   41cac700 0

Re: Solr and Garbage Collection

2009-09-27 Thread Jonathan Ariel
Right... when I increased it to 12GB all the OOM errors just disappeared. And all the
tests are being run on the live environment and for several hours, so it is
real enough :) As soon as I update the JVM and test the GC again I will let you
know. If you think I can run another test meanwhile just let me know.

On Sun, Sep 27, 2009 at 5:05 PM, Mark Miller  wrote:

> Jonathan Ariel wrote:
> > Well.. it is strange that when I use the default GC I don't get any
> errors.
> >
> Not so strange - it's different code. The bug is likely in the low pause
> collector and not the serial collector.
> > If I'm so close to run out of memory I should see those OOM exceptions as
> > well with the standard GC.
> Those? You're not seeing any that you mentioned unless you lower your heap?
> > BTW I'm faceting on around 13 fields and my total
> > number of unique values is around 3.
> > One of the fields with the biggest amount of unique values has almost
> 16000
> > unique values.
> >
> >
> > On Sun, Sep 27, 2009 at 4:32 PM, Fuad Efendi  wrote:
> >
> >
> >> Mark,
> >>
> >>
> >> Nothing against orange-hat :)
> >>
> >> Nothing against GC tuning; but if SOLR needs application-specific
> settings
> >> it should be well-documented.
> >>
> >> GC-tuning: for instance, we need it for 'realtime' Online Trading
> >> applications. However, even Online Banking doesn't need; primary reason
> -
> >> GC
> >> must happen 'outside of current transaction', GC 'must be predictable',
> and
> >> (for instance) Oracle/BEA JRockit has specific 'realtime' version for
> >> that... Does SOLR need that?
> >>
> >>
> >> Having load-stress simulator (multithreaded!!!) will definitely help to
> >> predict any possible bottleneck... it's even better to write it from
> >> scratch
> >> (depends on schema!), by sending random requests to SOLR in-parallel...
> >> instead of waiting when FieldCache tries to add new FieldImpl to cache
> >> (unpredictable!)
> >>
> >>
> >> Tomcat is multithreaded; what if end-users need to load 1000s large
> >> documents (in parallel! 1000s concurrent users), can you predict memory
> >> requirements and GC options without application-specific knowledge? What
> >> about new SOLR-Caches warming up?
> >>
> >>
> >> -Fuad
> >>
> >>
> >>
> >>> -Original Message-
> >>> From: Mark Miller []
> >>> Sent: September-27-09 2:46 PM
> >>> To:
> >>> Subject: Re: Solr and Garbage Collection
> >>>
> >>> If he needed double the RAM, he'd likely know by now :) The JVM likes
> to
> >>> throw OOM exceptions when you need more RAM. Until it does - thats an
> >>> odd path to focus on. There has been no indication he has ever seen an
> >>> OOM with his over 10 GB heap.  It sounds like he has run Solr in his
> >>> environment for quite a long time - after running for that long, until
> >>> he gets an OOM, its about as good as chasing ghost to worry about it.
> >>>
> >>> I like to think of GC tuning as orange-hat. Mostly because I like the
> >>> color orange.
> >>>
> >>> Fuad Efendi wrote:
> >>>
> >>>>>> Ok. After the server ran for more than 12 hours, the time spent on
> GC
> >>>>>> decreased from 11% to 3,4%, but 5 hours later it crashed.
> >>>>>>
> >>>>>>
> >>>> All this 'black-hat' GC tuning and 'fast' object moving (especially
> >>>>
> >> objects
> >>
> >>>> accessing by some thread during GC-defragmentation)
> >>>>
> >>>> - try to use multithreaded load-stress tools (at least 100 requests
> >>>> in-parallel) and see that you need at least double memory if 12Gb is
> >>>> threshold for your FieldCache (largest objects)
> >>>>
> >>>>
> >>>> Also, don't trust this counters:
> >>>>
> >>>>
> >>>>> So I logged the Garbage Collection activity to check if it's because
> >>>>>
> >> of
> >>
> >>>>> that. It seems like 11% of the time the application runs, it is
> >>>>>
> >> stopped
> >>
> >>>>> because of GC.
> >>>>>
> >>>>>
> >>>> Stopped? Of course, locking/unlocking in order to move objects
> >>>>
> >> currently
> >>
> >>>> accessesd in multiuser-multithreaded Tomcat... you can easily create
> >>>>
> >> crash
> >>
> >>>> scenario proving that latest-greatest JVMs are buggy too.
> >>>>
> >>>>
> >>>>
> >>>> Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in
> >>>>
> >> order
> >> to
> >>
> >>>> avoid OOM, you need to double it (in order to warm new cash instances
> >>>>
> >> on
> >>
> >>>> index replica / update).
> >>>>
> >>>>
> >>> --
> >>> - Mark
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> --
> - Mark

Re: Solr and Garbage Collection

2009-09-27 Thread Mark Miller
Jonathan Ariel wrote:
> Well.. it is strange that when I use the default GC I don't get any errors.
Not so strange - it's different code. The bug is likely in the low pause
collector and not the serial collector.
> If I'm so close to run out of memory I should see those OOM exceptions as
> well with the standard GC.
Those? You're not seeing any that you mentioned unless you lower your heap?
> BTW I'm faceting on around 13 fields and my total
> number of unique values is around 3.
> One of the fields with the biggest amount of unique values has almost 16000
> unique values.
> On Sun, Sep 27, 2009 at 4:32 PM, Fuad Efendi  wrote:
>> Mark,
>> Nothing against orange-hat :)
>> Nothing against GC tuning; but if SOLR needs application-specific settings
>> it should be well-documented.
>> GC-tuning: for instance, we need it for 'realtime' Online Trading
>> applications. However, even Online Banking doesn't need; primary reason -
>> GC
>> must happen 'outside of current transaction', GC 'must be predictable', and
>> (for instance) Oracle/BEA JRockit has specific 'realtime' version for
>> that... Does SOLR need that?
>> Having load-stress simulator (multithreaded!!!) will definitely help to
>> predict any possible bottleneck... it's even better to write it from
>> scratch
>> (depends on schema!), by sending random requests to SOLR in-parallel...
>> instead of waiting when FieldCache tries to add new FieldImpl to cache
>> (unpredictable!)
>> Tomcat is multithreaded; what if end-users need to load 1000s large
>> documents (in parallel! 1000s concurrent users), can you predict memory
>> requirements and GC options without application-specific knowledge? What
>> about new SOLR-Caches warming up?
>> -Fuad
>>> -Original Message-
>>> From: Mark Miller []
>>> Sent: September-27-09 2:46 PM
>>> To:
>>> Subject: Re: Solr and Garbage Collection
>>> If he needed double the RAM, he'd likely know by now :) The JVM likes to
>>> throw OOM exceptions when you need more RAM. Until it does - thats an
>>> odd path to focus on. There has been no indication he has ever seen an
>>> OOM with his over 10 GB heap.  It sounds like he has run Solr in his
>>> environment for quite a long time - after running for that long, until
>>> he gets an OOM, its about as good as chasing ghost to worry about it.
>>> I like to think of GC tuning as orange-hat. Mostly because I like the
>>> color orange.
>>> Fuad Efendi wrote:
>>>>>> Ok. After the server ran for more than 12 hours, the time spent on GC
>>>>>> decreased from 11% to 3,4%, but 5 hours later it crashed.
>>>> All this 'black-hat' GC tuning and 'fast' object moving (especially
>> objects
>>>> accessing by some thread during GC-defragmentation)
>>>> - try to use multithreaded load-stress tools (at least 100 requests
>>>> in-parallel) and see that you need at least double memory if 12Gb is
>>>> threshold for your FieldCache (largest objects)
>>>> Also, don't trust this counters:
>>>>> So I logged the Garbage Collection activity to check if it's because
>> of
>>>>> that. It seems like 11% of the time the application runs, it is
>> stopped
>>>>> because of GC.
>>>> Stopped? Of course, locking/unlocking in order to move objects
>> currently
>>>> accessesd in multiuser-multithreaded Tomcat... you can easily create
>> crash
>>>> scenario proving that latest-greatest JVMs are buggy too.
>>>> Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in
>> order
>> to
>>>> avoid OOM, you need to double it (in order to warm new cash instances
>> on
>>>> index replica / update).
>>> --
>>> - Mark

- Mark

Re: Solr and Garbage Collection

2009-09-27 Thread Mark Miller
Fuad Efendi wrote:
> Mark,
> Nothing against orange-hat :)
> Nothing against GC tuning; but if SOLR needs application-specific settings
> it should be well-documented.
> GC-tuning: for instance, we need it for 'realtime' Online Trading
> applications. However, even Online Banking doesn't need; primary reason - GC
> must happen 'outside of current transaction', GC 'must be predictable', and
> (for instance) Oracle/BEA JRockit has specific 'realtime' version for
> that... Does SOLR need that?
I'm not sure that Solr needs anything specific - but with a heap near 10
GB, you really do need some sort of parallel or concurrent collection of
the tenured space - unless you can live with the long pauses. I don't
think that's Solr-specific though.

> Having load-stress simulator (multithreaded!!!) will definitely help to
> predict any possible bottleneck... it's even better to write it from scratch
> (depends on schema!), by sending random requests to SOLR in-parallel...
> instead of waiting when FieldCache tries to add new FieldImpl to cache
> (unpredictable!)
Yup - no argument from me here. Perhaps he does need more RAM and will
find that out. Testing for that is a good idea. But by the sound of it,
I just don't think we can guess that yet. I'm not against him testing to
see though - it's just somewhat of a solution looking for a problem at the
moment. It sounds like he is running this thing for hours and hours in a
semi-real environment (else why all the GC). He hasn't mentioned any
need for more RAM yet.

Again though, I'm not saying he shouldn't make sure he has enough RAM
under any scenario. Everyone should. It just doesn't seem to be an issue
he's indicated he's having.
> Tomcat is multithreaded; what if end-users need to load 1000s large
> documents (in parallel! 1000s concurrent users), can you predict memory
> requirements and GC options without application-specific knowledge? What
> about new SOLR-Caches warming up?
> -Fuad
>> -Original Message-
>> From: Mark Miller []
>> Sent: September-27-09 2:46 PM
>> To:
>> Subject: Re: Solr and Garbage Collection
>> If he needed double the RAM, he'd likely know by now :) The JVM likes to
>> throw OOM exceptions when you need more RAM. Until it does - thats an
>> odd path to focus on. There has been no indication he has ever seen an
>> OOM with his over 10 GB heap.  It sounds like he has run Solr in his
>> environment for quite a long time - after running for that long, until
>> he gets an OOM, its about as good as chasing ghost to worry about it.
>> I like to think of GC tuning as orange-hat. Mostly because I like the
>> color orange.
>> Fuad Efendi wrote:
>>>>> Ok. After the server ran for more than 12 hours, the time spent on GC
>>>>> decreased from 11% to 3,4%, but 5 hours later it crashed.
>>> All this 'black-hat' GC tuning and 'fast' object moving (especially
> objects
>>> accessing by some thread during GC-defragmentation)
>>> - try to use multithreaded load-stress tools (at least 100 requests
>>> in-parallel) and see that you need at least double memory if 12Gb is
>>> threshold for your FieldCache (largest objects)
>>> Also, don't trust this counters:
>>>> So I logged the Garbage Collection activity to check if it's because of
>>>> that. It seems like 11% of the time the application runs, it is stopped
>>>> because of GC.
>>> Stopped? Of course, locking/unlocking in order to move objects currently
>>> accessesd in multiuser-multithreaded Tomcat... you can easily create
> crash
>>> scenario proving that latest-greatest JVMs are buggy too.
>>> Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in order
> to
>>> avoid OOM, you need to double it (in order to warm new cash instances on
>>> index replica / update).
>> --
>> - Mark

- Mark

Re: Solr and Garbage Collection

2009-09-27 Thread Jonathan Ariel
Well.. it is strange that when I use the default GC I don't get any errors.
If I'm so close to running out of memory I should see those OOM exceptions as
well with the standard GC. BTW I'm faceting on around 13 fields and my total
number of unique values is around 3.
One of the fields with the largest number of unique values has almost 16000
unique values.
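
For a rough feel of what those numbers could mean for the FieldCache, here is a back-of-the-envelope sketch. Everything in it beyond the figures quoted in this thread (8 million documents, 13 facet fields, about 16000 unique values in the largest field) is an assumption: it pretends each field is single-valued and cached as one int per document plus one string per unique term, it guesses 64 bytes per cached term, and it treats all 13 fields as being as large as the largest one. The class and method names are made up. The doubling models the cache-warming overlap discussed elsewhere in the thread.

```java
// Hypothetical estimate only; real FieldCache usage depends on the schema and Solr version.
final class FieldCacheEstimate {
    /** Rough bytes for one cached field: one 4-byte int per document plus the unique terms. */
    static long bytesPerField(long maxDoc, long uniqueTerms, long avgTermBytes) {
        return maxDoc * 4L + uniqueTerms * avgTermBytes;
    }

    public static void main(String[] args) {
        long docs = 8_000_000L;      // from the thread
        long uniqueTerms = 16_000L;  // largest field, from the thread
        long avgTermBytes = 64L;     // assumption
        int fields = 13;             // from the thread

        long steady = fields * bytesPerField(docs, uniqueTerms, avgTermBytes);
        long whileWarming = 2 * steady; // old and new searcher caches alive at once
        System.out.println("steady state:  " + steady / (1024 * 1024) + " MB");
        System.out.println("while warming: " + whileWarming / (1024 * 1024) + " MB");
    }
}
```

Whatever the exact result, an estimate like this only covers the FieldCache; filter caches, document caches and request-time allocations come on top and are not modelled here.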

On Sun, Sep 27, 2009 at 4:32 PM, Fuad Efendi  wrote:

> Mark,
> Nothing against orange-hat :)
> Nothing against GC tuning; but if SOLR needs application-specific settings
> it should be well-documented.
> GC-tuning: for instance, we need it for 'realtime' Online Trading
> applications. However, even Online Banking doesn't need; primary reason -
> GC
> must happen 'outside of current transaction', GC 'must be predictable', and
> (for instance) Oracle/BEA JRockit has specific 'realtime' version for
> that... Does SOLR need that?
> Having load-stress simulator (multithreaded!!!) will definitely help to
> predict any possible bottleneck... it's even better to write it from
> scratch
> (depends on schema!), by sending random requests to SOLR in-parallel...
> instead of waiting when FieldCache tries to add new FieldImpl to cache
> (unpredictable!)
> Tomcat is multithreaded; what if end-users need to load 1000s large
> documents (in parallel! 1000s concurrent users), can you predict memory
> requirements and GC options without application-specific knowledge? What
> about new SOLR-Caches warming up?
> -Fuad
> > -----Original Message-----
> > From: Mark Miller []
> > Sent: September-27-09 2:46 PM
> > To:
> > Subject: Re: Solr and Garbage Collection
> >
> > If he needed double the RAM, he'd likely know by now :) The JVM likes to
> > throw OOM exceptions when you need more RAM. Until it does - thats an
> > odd path to focus on. There has been no indication he has ever seen an
> > OOM with his over 10 GB heap.  It sounds like he has run Solr in his
> > environment for quite a long time - after running for that long, until
> > he gets an OOM, its about as good as chasing ghost to worry about it.
> >
> > I like to think of GC tuning as orange-hat. Mostly because I like the
> > color orange.
> >
> > Fuad Efendi wrote:
> > >>> Ok. After the server ran for more than 12 hours, the time spent on GC
> > >>> decreased from 11% to 3,4%, but 5 hours later it crashed.
> > >>>
> > >
> > > All this 'black-hat' GC tuning and 'fast' object moving (especially
> objects
> > > accessing by some thread during GC-defragmentation)
> > >
> > > - try to use multithreaded load-stress tools (at least 100 requests
> > > in-parallel) and see that you need at least double memory if 12Gb is
> > > threshold for your FieldCache (largest objects)
> > >
> > >
> > > Also, don't trust this counters:
> > >
> > >> So I logged the Garbage Collection activity to check if it's because
> of
> > >> that. It seems like 11% of the time the application runs, it is
> stopped
> > >> because of GC.
> > >>
> > >
> > >
> > > Stopped? Of course, locking/unlocking in order to move objects
> currently
> > > accessesd in multiuser-multithreaded Tomcat... you can easily create
> crash
> > > scenario proving that latest-greatest JVMs are buggy too.
> > >
> > >
> > >
> > > Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in
> order
> to
> > > avoid OOM, you need to double it (in order to warm new cash instances
> on
> > > index replica / update).
> > >
> >
> >
> > --
> > - Mark
> >
> >
> >
> >

RE: Solr and Garbage Collection

2009-09-27 Thread Fuad Efendi

Nothing against orange-hat :)

Nothing against GC tuning; but if SOLR needs application-specific settings
it should be well-documented.

GC-tuning: for instance, we need it for 'realtime' Online Trading
applications. However, even Online Banking doesn't need it; the primary reason
for such tuning is that GC must happen 'outside of the current transaction' and
GC 'must be predictable', and (for instance) Oracle/BEA JRockit has a specific
'realtime' version for that... Does SOLR need that?

Having a load-stress simulator (multithreaded!!!) will definitely help to
predict any possible bottleneck... it's even better to write it from scratch
(depends on schema!), sending random requests to SOLR in parallel...
instead of waiting until FieldCache tries to add a new FieldImpl to the cache
(unpredictable!)

Tomcat is multithreaded; what if end-users need to load 1000s of large
documents (in parallel! 1000s of concurrent users), can you predict memory
requirements and GC options without application-specific knowledge? What
about new SOLR-Caches warming up?
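
A multithreaded load-stress simulator of the kind described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a finished tool: the class name SolrStress, the base URL http://localhost:8983/solr/select, the query terms and the thread and request counts are all made up, and it needs Java 11 or later for java.net.http. The request sender is passed in as a function so the worker logic can be exercised without a running server.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

// Hypothetical stress driver; names and URL are assumptions.
final class SolrStress {
    /** Sends `total` randomly chosen queries from `threads` workers; returns how many got status 200. */
    static int run(int threads, int total, List<String> queries, ToIntFunction<String> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger ok = new AtomicInteger();
        for (int i = 0; i < total; i++) {
            pool.execute(() -> {
                String q = queries.get(ThreadLocalRandom.current().nextInt(queries.size()));
                try {
                    if (sender.applyAsInt(q) == 200) {
                        ok.incrementAndGet();
                    }
                } catch (RuntimeException e) {
                    // A failed request simply does not count as a success.
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ok.get();
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/select?rows=10&q="; // assumed URL
        HttpClient client = HttpClient.newHttpClient();
        ToIntFunction<String> http = q -> {
            try {
                URI uri = URI.create(base + URLEncoder.encode(q, StandardCharsets.UTF_8));
                HttpRequest req = HttpRequest.newBuilder(uri).GET().build();
                return client.send(req, HttpResponse.BodyHandlers.discarding()).statusCode();
            } catch (Exception e) {
                return -1; // connection error, timeout, interruption
            }
        };
        List<String> queries = List.of("ipod", "memory", "*:*"); // assumed query terms
        int ok = run(100, 2000, queries, http);
        System.out.println(ok + " of 2000 requests returned HTTP 200");
    }
}
```

For a realistic test the query list would have to come from the actual schema and traffic (facet and sort parameters included), since those are what populate the FieldCache.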


> -Original Message-
> From: Mark Miller []
> Sent: September-27-09 2:46 PM
> To:
> Subject: Re: Solr and Garbage Collection
> If he needed double the RAM, he'd likely know by now :) The JVM likes to
> throw OOM exceptions when you need more RAM. Until it does - thats an
> odd path to focus on. There has been no indication he has ever seen an
> OOM with his over 10 GB heap.  It sounds like he has run Solr in his
> environment for quite a long time - after running for that long, until
> he gets an OOM, its about as good as chasing ghost to worry about it.
> I like to think of GC tuning as orange-hat. Mostly because I like the
> color orange.
> Fuad Efendi wrote:
> >>> Ok. After the server ran for more than 12 hours, the time spent on GC
> >>> decreased from 11% to 3,4%, but 5 hours later it crashed.
> >>>
> >
> > All this 'black-hat' GC tuning and 'fast' object moving (especially
> > accessing by some thread during GC-defragmentation)
> >
> > - try to use multithreaded load-stress tools (at least 100 requests
> > in-parallel) and see that you need at least double memory if 12Gb is
> > threshold for your FieldCache (largest objects)
> >
> >
> > Also, don't trust this counters:
> >
> >> So I logged the Garbage Collection activity to check if it's because of
> >> that. It seems like 11% of the time the application runs, it is stopped
> >> because of GC.
> >>
> >
> >
> > Stopped? Of course, locking/unlocking in order to move objects currently
> > accessesd in multiuser-multithreaded Tomcat... you can easily create
> > scenario proving that latest-greatest JVMs are buggy too.
> >
> >
> >
> > Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in order
> > avoid OOM, you need to double it (in order to warm new cash instances on
> > index replica / update).
> >
> --
> - Mark

Re: Solr and Garbage Collection

2009-09-27 Thread Mark Miller
If he needed double the RAM, he'd likely know by now :) The JVM likes to
throw OOM exceptions when you need more RAM. Until it does - that's an
odd path to focus on. There has been no indication he has ever seen an
OOM with his over 10 GB heap.  It sounds like he has run Solr in his
environment for quite a long time - after running for that long, until
he gets an OOM, it's about as good as chasing ghosts to worry about it.

I like to think of GC tuning as orange-hat. Mostly because I like the
color orange.

Fuad Efendi wrote:
>>> Ok. After the server ran for more than 12 hours, the time spent on GC
>>> decreased from 11% to 3,4%, but 5 hours later it crashed.
> All this 'black-hat' GC tuning and 'fast' object moving (especially objects
> accessing by some thread during GC-defragmentation)
> - try to use multithreaded load-stress tools (at least 100 requests
> in-parallel) and see that you need at least double memory if 12Gb is
> threshold for your FieldCache (largest objects)
> Also, don't trust this counters:
>> So I logged the Garbage Collection activity to check if it's because of
>> that. It seems like 11% of the time the application runs, it is stopped
>> because of GC.
> Stopped? Of course, locking/unlocking in order to move objects currently
> accessesd in multiuser-multithreaded Tomcat... you can easily create crash
> scenario proving that latest-greatest JVMs are buggy too.
> Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in order to
> avoid OOM, you need to double it (in order to warm new cash instances on
> index replica / update).

- Mark

RE: Solr and Garbage Collection

2009-09-27 Thread Fuad Efendi
>> Ok. After the server ran for more than 12 hours, the time spent on GC
>> decreased from 11% to 3,4%, but 5 hours later it crashed.

All this 'black-hat' GC tuning and 'fast' object moving (especially of objects
being accessed by some thread during GC defragmentation)

- try to use multithreaded load-stress tools (at least 100 requests
in parallel) and see that you need at least double the memory if 12Gb is the
threshold for your FieldCache (largest objects)

Also, don't trust these counters:
>So I logged the Garbage Collection activity to check if it's because of
>that. It seems like 11% of the time the application runs, it is stopped
>because of GC.

Stopped? Of course, locking/unlocking in order to move objects currently
accessed in multiuser-multithreaded Tomcat... you can easily create a crash
scenario proving that the latest-greatest JVMs are buggy too.

Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in order to
avoid OOM, you need to double it (in order to warm new cache instances on
index replica / update).
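
For reference, a figure like "11% of the time" can also be derived from counters the JVM exposes about itself. The sketch below is one possible way to do that and is not how the number quoted above was obtained (that came from GC logging). The class name GcShare is made up; it relies only on java.lang.management. Note that the collection time reported for a concurrent collector is not necessarily the same as stop-the-world time, which is one reason to treat such counters with care.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for a rough "share of uptime spent in GC" figure.
final class GcShare {
    /** Percentage that gcMillis represents out of uptimeMillis (0 if uptime is not positive). */
    static double percent(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : 100.0 * gcMillis / uptimeMillis;
    }

    /** Sum of accumulated collection time over all collectors; undefined values (-1) are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC share of uptime: %.1f%%%n", percent(totalGcMillis(), uptime));
    }
}
```

Sampling the two values periodically and comparing the deltas gives the share per interval rather than since startup, which is usually the more useful number under load.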

Re: Solr and Garbage Collection

2009-09-26 Thread Jonathan Ariel
Yes, it seems like a bug. I will update my JVM, try again and let you
know the results :)

On 9/26/09, Mark Miller  wrote:
> Jonathan Ariel wrote:
>> Ok. After the server ran for more than 12 hours, the time spent on GC
>> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the
>> thread
>> dump, maybe you can help identify what happened?
> Well, that's a tough one ;) My guess is it's a bug :)
> Your two survivor spaces are filled, so it was likely about to move
> objects into the tenured space, which still has plenty of room for them
> (barring horrible fragmentation). Any issues with that type of thing
> should generate an OOM anyway though. You can find people that have run
> into similar issues in the past, but a lot of times unreproducible.
> Usually, their bugs are closed and they are told to try a newer JVM.
> Your JVM appears to be quite a few versions back. There have been many
> garbage collection bugs fixed in the 7 or so updates since your version,
> a good handful of them related to CMS.
> If you can, my best suggestion at the moment is to upgrade to the latest
> and see how that fares.
> If not, you might see if going back to the throughput collector and
> turning on the parallel tenured space collector might meet your needs
> instead. You can work with other params to get that going better if you
> have to as well.
> Also, adjusting other settings with the low pause collector might
> trigger something to sidestep the bug. Not a great option there though ;)
> How many unique fields are you sorting/faceting on? It must be a lot if
> you need 10 gig for 8 million documents. It's kind of rough to have to
> work at such a close limit to your total heap available as a min mem
> requirement.
> --
> - Mark
>> #
>> # An unexpected error has been detected by Java Runtime Environment:
>> #
>> #  SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
>> #
>> # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode
>> linux-amd64)
>> # Problematic frame:
>> # V  []
>> #
>> # If you would like to submit a bug report, please visit:
>> #
>> #
>> ---  T H R E A D  ---
>> Current thread (0x5be47400):  VMThread [stack:
>> 0x41bad000,0x41cae000] [id=32249]
>> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (),
>> si_addr=0x
>> Registers:
>> RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006,
>> RDX=0x005c49870037c996
>> RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70,
>> RDI=0x0037c985003a095e
>> R8 =0x2aadab201538, R9 =0x0005, R10=0x0001,
>> R11=0x0010
>> R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40,
>> R15=0x2aadab2015ac
>> RIP=0x2b4e0f69ea2a, EFL=0x00010206, CSGSFS=0x0033,
>> ERR=0x
>>   TRAPNO=0x000d
>> Top of Stack: (sp=0x41cac550)
>> 0x41cac550:   41cac580 2b4e0f903c5b
>> 0x41cac560:   41cac590 0003
>> 0x41cac570:   2aac9289cf50 2aadab2015a8
>> 0x41cac580:   41cac5c0 2b4e0f72e388
>> 0x41cac590:   41cac5c0 2aac9289cf40
>> 0x41cac5a0:   0005 2b4e0fc86330
>> 0x41cac5b0:    2b4e0fd8c740
>> 0x41cac5c0:   41cac5f0 2b4e0f903b7f
>> 0x41cac5d0:   41cac610 0003
>> 0x41cac5e0:   2aaccb1750f8 2aaccea41570
>> 0x41cac5f0:   41cac610 2b4e0f931548
>> 0x41cac600:   2b4e0fc861d8 2aadd4052ab0
>> 0x41cac610:   41cac640 2b4e0f903d1a
>> 0x41cac620:   41cac650 0003
>> 0x41cac630:   5bc7d6d0 2b4e0fd8c740
>> 0x41cac640:   41cac650 2b4e0f90411c
>> 0x41cac650:   41cac680 2b4e0fa1d16e
>> 0x41cac660:    5bc7d6d0
>> 0x41cac670:   0002 2b4e0fd8c740
>> 0x41cac680:   41cac6c0 2b4e0fa74640
>> 0x41cac690:   41cac6b0 5bc7d6d0
>> 0x41cac6a0:   0002 2b4e0fd8c740
>> 0x41cac6b0:   0001 2b4e0fd8c740
>> 0x41cac6c0:   41cac700 2b4e0f9a52da
>> 0x41cac6d0:   bfc0 
>> 0x41cac6e0:   2b4e0fd8c740 5bc7d6d0
>> 0x41cac6f0:   2b4e0fd8c740 0001
>> 0x41cac700:   41cac750 2b4e0f6feb80
>> 0x41cac710:   449dae1d9ae42358 3ff0cccd
>> 0x41cac720:   2aad289aa680 0001
>> 0x41cac730:    41cac780
>> 0x41cac740:   0001 5bc7d6d0
>> Instructions: (pc=0

Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Sorry Walter. Half the time I type faster than I think. I was mixing
concurrent with parallel.
I do agree with you on the concurrent part for batch processing (and
likely other things).
It would likely be far better to use as many CPUs as you can (as many
as make sense) collecting in parallel while the world is stopped, rather
than paying to do it concurrently. My fault on the confusion.

Parallel, super important for large heaps.
Concurrent, super important for systems that always need low response
times.

Hence the Parallel collector being named the throughput collector :)

Sorry for the confusion - wouldn't be the first time ;)

I'll stick to my generational argument though - as I said, if most of
your objects are long lived (*extremely* rare from what I know), a
non-generational collector makes sense, but in almost all cases
generational collection is super helpful. Which is why Sun doesn't even
offer non-generational anymore.
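
To check which of these families a running JVM is actually using, the collector names it reports can be mapped back to serial, parallel or concurrent. The following is only a sketch: the class name CollectorFamily is made up, and the bean names in the switch are the ones HotSpot JVMs report for these collectors; other vendors (JRockit, IBM) use different names and would fall into "unknown" here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; the name table covers HotSpot only.
final class CollectorFamily {
    /** Maps a HotSpot collector bean name to the family discussed in this thread. */
    static String classify(String beanName) {
        switch (beanName) {
            case "Copy":
            case "MarkSweepCompact":
                return "serial";
            case "PS Scavenge":
            case "PS MarkSweep":
                return "parallel (throughput)";
            case "ParNew":
            case "ConcurrentMarkSweep":
                return "concurrent (low pause, CMS)";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + " -> " + classify(gc.getName()));
        }
    }
}
```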

- Mark

Mark Miller wrote:
> Walter Underwood wrote:
>> For batch-oriented computing, like Hadoop, the most efficient GC is probably
>> a non-concurrent, non-generational GC. 
> Okay - for batch we somewhat agree I guess - if you can stand any length
> of pausing, non concurrent can be nice, because you don't pay for thread
> sync communication. Only with a small heap size though (less than 100MB
> is what I've seen). You would pause the batch job while GC takes place.
> If you have 8 processors, and you are pausing all of them to collect a
> large heap using only 1 processor, that doesn't make much sense to me.
> The thread communication pain will be far outweighed by using more
> processors to do the collection faster, and not "stop the world" for
> your batch job so long. Stopping your application dead in its tracks,
> and then only using one of the available processors to collect a large
> heap, while the rest sit idle, doesn't make much sense.
> I also don't agree it ever really makes sense not to do generational
> collection. What is your argument here? Generational collection is
> **way** more efficient for short lived objects, which tend to be up to
> 98% of the objects in most applications. The only way I see that making
> sense is if you have almost no short lived objects (which occurs in
> what, .0001% of apps if at all?). The Sun JVM doesn't even offer a non
> generational approach anymore. It's just standard GC practice.
>> I doubt that there are many
>> batch-oriented applications of Solr, though.
>> The rest of the advice is intended to be general and it sounds like we agree
>> about sizing. If the nursery is not big enough, the tenured space will be
>> used for allocations that have a short lifetime and that will increase the
>> length and/or frequency of major collections.
> Yes - I wasn't arguing with every point - I was picking and choosing :)
> After the heap size, the size of the young generation is the most
> important factor.
>> Cache evictions are the interesting part, because they cause a constant rate
>> of tenured space garbage. In many servers, you can get a big enough
>> nursery that major collections are very rare. That won't happen in Solr
>> because of cache evictions.
>> The IBM JVM is excellent. Their concurrent generational GC policy is
>> "gencon".
> Yeah, I actually know very little about the IBM JVM, so I wasn't really
> commenting. But from the info I gleaned here and on a couple quick web
> searches, I'm not too impressed by its GC.
>> wunder
>> -Original Message-
>> From: Mark Miller [] 
>> Sent: Friday, September 25, 2009 10:31 AM
>> To:
>> Subject: Re: Solr and Garbage Collection
>> My bad - later, it looks as if you're giving general advice, and that's
>> what I took issue with.
>> Any Collector that is not doing generational collection is essentially
>> from the dark ages and shouldn't be used.
>> Any Collector that doesn't have concurrent options, unless possibly you're
>> running a tiny app (under 100MB of RAM), or only have a single CPU, is
>> also dark ages, and not fit for a server environment.
>> I haven't kept up with IBM's JVM, but it sounds like they are well behind
>> Sun in GC then.
>> - Mark
>> Walter Underwood wrote:
>>> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
>>> pause" collector is only in the Sun JVM.
>>> I just found this excellent articl

Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Also, in case the info might help track something down:

It's pretty darn odd that both your survivor spaces are full. I've never
seen that in one of these dumps. Always one is empty. When one is
filled, its contents are moved to the other. Then back. And forth. For a
certain number of times until they are moved into the tenured space. Both
being filled like that really seems like a bug to me - I've looked over tons
of the dumps in the past (random ones online), and I have never seen one
where neither survivor space was empty.
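
To make the back-and-forth concrete, here is a toy model of a copying young generation. It is only a sketch: the class name, the fixed threshold of 3 and the "every object stays alive" rule are assumptions for illustration, not how HotSpot is implemented (the real JVM adapts the threshold, bounded by -XX:MaxTenuringThreshold). The point it shows is that the "to" space is empty again after every minor collection.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a copying young generation; it tracks only the ages of live objects. */
class SurvivorToy {
    // Hypothetical fixed threshold, chosen for illustration only.
    static final int TENURING_THRESHOLD = 3;

    final List<Integer> eden = new ArrayList<Integer>();
    List<Integer> from = new ArrayList<Integer>();
    List<Integer> to = new ArrayList<Integer>();
    final List<Integer> tenured = new ArrayList<Integer>();

    /** Allocates n new objects in eden; the toy assumes they all stay reachable. */
    void allocate(int n) {
        for (int i = 0; i < n; i++) {
            eden.add(0);
        }
    }

    /** Copies live objects from eden and 'from' into 'to', promoting the old ones. */
    void minorCollection() {
        List<Integer> live = new ArrayList<Integer>(eden);
        live.addAll(from);
        for (int age : live) {
            int newAge = age + 1;
            if (newAge > TENURING_THRESHOLD) {
                tenured.add(newAge);   // survived enough copies: promote
            } else {
                to.add(newAge);        // copy into the empty survivor space
            }
        }
        eden.clear();
        from.clear();
        // Swap roles: the space just filled becomes 'from', and 'to' is empty again.
        List<Integer> tmp = from;
        from = to;
        to = tmp;
    }

    public static void main(String[] args) {
        SurvivorToy heap = new SurvivorToy();
        for (int gc = 1; gc <= 5; gc++) {
            heap.allocate(2);
            heap.minorCollection();
            System.out.println("after GC " + gc + ": from=" + heap.from.size()
                    + " to=" + heap.to.size() + " tenured=" + heap.tenured.size());
        }
    }
}
```

In this model a state with both survivor spaces occupied can never be observed between collections, which is why a dump showing both full looks suspicious.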

Mark Miller wrote:
> Jonathan Ariel wrote:
>> Ok. After the server ran for more than 12 hours, the time spent on GC
>> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the thread
>> dump, maybe you can help identify what happened?
> Well, that's a tough one ;) My guess is it's a bug :)
> Your two survivor spaces are filled, so it was likely about to move
> objects into the tenured space, which still has plenty of room for them
> (barring horrible fragmentation). Any issues with that type of thing
> should generate an OOM anyway though. You can find people that have run
> into similar issues in the past, but a lot of times unreproducible.
> Usually, their bugs are closed and they are told to try a newer JVM.
> Your JVM appears to be quite a few versions back. There have been many
> garbage collection bugs fixed in the 7 or so updates since your version,
> a good handful of them related to CMS.
> If you can, my best suggestion at the moment is to upgrade to the latest
> and see how that fares.
> If not, you might see if going back to the throughput collector and
> turning on the parallel tenured space collector might meet your needs
> instead. You can work with other params to get that going better if you
> have to as well.
> Also, adjusting other settings with the low pause collector might
> trigger something to side step the bug. Not a great option there though ;)
> How many unique fields are you sorting/faceting on? It must be a lot if
> you need 10 gig for 8 million documents. It's kind of rough to have to
> work at such a close limit to your total heap available as a min mem
> requirement.

- Mark

Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Jonathan Ariel wrote:
> Ok. After the server ran for more than 12 hours, the time spent on GC
> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the thread
> dump, maybe you can help identify what happened?
Well, that's a tough one ;) My guess is it's a bug :)

Your two survivor spaces are filled, so it was likely about to move
objects into the tenured space, which still has plenty of room for them
(barring horrible fragmentation). Any issues with that type of thing
should generate an OOM anyway though. You can find people that have run
into similar issues in the past, but a lot of times unreproducible.
Usually, their bugs are closed and they are told to try a newer JVM.

Your JVM appears to be quite a few versions back. There have been many
garbage collection bugs fixed in the 7 or so updates since your version,
a good handful of them related to CMS.

If you can, my best suggestion at the moment is to upgrade to the latest
and see how that fares.

If not, you might see if going back to the throughput collector and
turning on the parallel tenured space collector might meet your needs
instead. You can work with other params to get that going better if you
have to as well.

Also, adjusting other settings with the low pause collector might
trigger something to side step the bug. Not a great option there though ;)

How many unique fields are you sorting/faceting on? It must be a lot if
you need 10 gig for 8 million documents. It's kind of rough to have to
work at such a close limit to your total heap available as a min mem
requirement.

- Mark

> #
> # An unexpected error has been detected by Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
> #
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode
> linux-amd64)
> # Problematic frame:
> # V  []
> #
> # If you would like to submit a bug report, please visit:
> #
> #
> ---  T H R E A D  ---
> Current thread (0x5be47400):  VMThread [stack:
> 0x41bad000,0x41cae000] [id=32249]
> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (),
> si_addr=0x
> Registers:
> RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006,
> RDX=0x005c49870037c996
> RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70,
> RDI=0x0037c985003a095e
> R8 =0x2aadab201538, R9 =0x0005, R10=0x0001,
> R11=0x0010
> R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40,
> R15=0x2aadab2015ac
> RIP=0x2b4e0f69ea2a, EFL=0x00010206, CSGSFS=0x0033,
> ERR=0x
>   TRAPNO=0x000d
> Top of Stack: (sp=0x41cac550)
> 0x41cac550:   41cac580 2b4e0f903c5b
> 0x41cac560:   41cac590 0003
> 0x41cac570:   2aac9289cf50 2aadab2015a8
> 0x41cac580:   41cac5c0 2b4e0f72e388
> 0x41cac590:   41cac5c0 2aac9289cf40
> 0x41cac5a0:   0005 2b4e0fc86330
> 0x41cac5b0:    2b4e0fd8c740
> 0x41cac5c0:   41cac5f0 2b4e0f903b7f
> 0x41cac5d0:   41cac610 0003
> 0x41cac5e0:   2aaccb1750f8 2aaccea41570
> 0x41cac5f0:   41cac610 2b4e0f931548
> 0x41cac600:   2b4e0fc861d8 2aadd4052ab0
> 0x41cac610:   41cac640 2b4e0f903d1a
> 0x41cac620:   41cac650 0003
> 0x41cac630:   5bc7d6d0 2b4e0fd8c740
> 0x41cac640:   41cac650 2b4e0f90411c
> 0x41cac650:   41cac680 2b4e0fa1d16e
> 0x41cac660:    5bc7d6d0
> 0x41cac670:   0002 2b4e0fd8c740
> 0x41cac680:   41cac6c0 2b4e0fa74640
> 0x41cac690:   41cac6b0 5bc7d6d0
> 0x41cac6a0:   0002 2b4e0fd8c740
> 0x41cac6b0:   0001 2b4e0fd8c740
> 0x41cac6c0:   41cac700 2b4e0f9a52da
> 0x41cac6d0:   bfc0 
> 0x41cac6e0:   2b4e0fd8c740 5bc7d6d0
> 0x41cac6f0:   2b4e0fd8c740 0001
> 0x41cac700:   41cac750 2b4e0f6feb80
> 0x41cac710:   449dae1d9ae42358 3ff0cccd
> 0x41cac720:   2aad289aa680 0001
> 0x41cac730:    41cac780
> 0x41cac740:   0001 5bc7d6d0
> Instructions: (pc=0x2b4e0f69ea2a)
> 0x2b4e0f69ea1a:   89 e5 48 83 f9 05 74 38 48 8b 56 08 48 83 c2 10
> 0x2b4e0f69ea2a:   48 8b b2 a0 00 00 00 ba 01 00 00 00 83 e6 07 48
> Stack: [0x41bad000,0x41cae000],  sp=0x41cac550,
>  free space=1021k

Re: Solr and Garbage Collection

2009-09-26 Thread Jonathan Ariel
Ok. After the server ran for more than 12 hours, the time spent on GC
decreased from 11% to 3,4%, but 5 hours later it crashed. This is the thread
dump, maybe you can help identify what happened?
# An unexpected error has been detected by Java Runtime Environment:
#  SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928
# Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode
# Problematic frame:
# V  []
# If you would like to submit a bug report, please visit:

---  T H R E A D  ---

Current thread (0x5be47400):  VMThread [stack:
0x41bad000,0x41cae000] [id=32249]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (),

RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006,
RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70,
R8 =0x2aadab201538, R9 =0x0005, R10=0x0001,
R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40,
RIP=0x2b4e0f69ea2a, EFL=0x00010206, CSGSFS=0x0033,

Top of Stack: (sp=0x41cac550)
0x41cac550:   41cac580 2b4e0f903c5b
0x41cac560:   41cac590 0003
0x41cac570:   2aac9289cf50 2aadab2015a8
0x41cac580:   41cac5c0 2b4e0f72e388
0x41cac590:   41cac5c0 2aac9289cf40
0x41cac5a0:   0005 2b4e0fc86330
0x41cac5b0:    2b4e0fd8c740
0x41cac5c0:   41cac5f0 2b4e0f903b7f
0x41cac5d0:   41cac610 0003
0x41cac5e0:   2aaccb1750f8 2aaccea41570
0x41cac5f0:   41cac610 2b4e0f931548
0x41cac600:   2b4e0fc861d8 2aadd4052ab0
0x41cac610:   41cac640 2b4e0f903d1a
0x41cac620:   41cac650 0003
0x41cac630:   5bc7d6d0 2b4e0fd8c740
0x41cac640:   41cac650 2b4e0f90411c
0x41cac650:   41cac680 2b4e0fa1d16e
0x41cac660:    5bc7d6d0
0x41cac670:   0002 2b4e0fd8c740
0x41cac680:   41cac6c0 2b4e0fa74640
0x41cac690:   41cac6b0 5bc7d6d0
0x41cac6a0:   0002 2b4e0fd8c740
0x41cac6b0:   0001 2b4e0fd8c740
0x41cac6c0:   41cac700 2b4e0f9a52da
0x41cac6d0:   bfc0 
0x41cac6e0:   2b4e0fd8c740 5bc7d6d0
0x41cac6f0:   2b4e0fd8c740 0001
0x41cac700:   41cac750 2b4e0f6feb80
0x41cac710:   449dae1d9ae42358 3ff0cccd
0x41cac720:   2aad289aa680 0001
0x41cac730:    41cac780
0x41cac740:   0001 5bc7d6d0

Instructions: (pc=0x2b4e0f69ea2a)
0x2b4e0f69ea1a:   89 e5 48 83 f9 05 74 38 48 8b 56 08 48 83 c2 10
0x2b4e0f69ea2a:   48 8b b2 a0 00 00 00 ba 01 00 00 00 83 e6 07 48

Stack: [0x41bad000,0x41cae000],  sp=0x41cac550,
 free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []
V  []

VM_Operation (0x4076bd20): GenCollectForAllocation, mode: safepoint,
requested by thread 0x5c42d800

---  P R O C E S S  ---

Java Threads: ( => current thread )
  0x5c466400 JavaThread "btpool0-502" [_thread_blocked, id=4508,
  0x5c2a2400 JavaThread "btpool0-501" [_thread_blocked, id=4507,
  0x5c0fec00 JavaThread "btpool0-500" [_thread_blocked, id=4506,
  0x5c2ce400 JavaThread "btpool0-498" [_thread_blocked, id=4504,
  0x5be69000 JavaThread "btpool0-497" [_thread_blocked, id=4503,
  0x5c30e000 JavaThread "btpool0-496" [_thread_blocked, id=4251,

Re: Solr and Garbage Collection

2009-09-26 Thread Mark Miller
Jonathan Ariel wrote:
> I have around 8M documents.
That's actually not so bad - I take it you are faceting/sorting on quite
a few unique fields?

> I set up my server to use a different collector and it seems like it
> decreased from 11% to 4%, of course I need to wait a bit more because it is
> just a 1 hour old log. But it seems like it is much better now.
> I will tell you on Monday the results :)
Are you still seeing major collections then? (eg the tenured space hits
its limit) You might be able to get even better.
> On Fri, Sep 25, 2009 at 6:07 PM, Mark Miller  wrote:
>> Thats a good point too - if you can reduce your need for such a large
>> heap, by all means, do so.
>> However, considering you already need at least 10GB or you get OOM, you
>> have a long way to go with that approach. Good luck :)
>> How many docs do you have ? I'm guessing its mostly FieldCache type
>> stuff, and thats the type of thing you can't really side step, unless
>> you give up the functionality thats using it.
>> Grant Ingersoll wrote:
>>> On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote:
>>>> Hi to all!
>>>> Lately my solr servers seem to stop responding once in a while. I'm
>>>> using solr 1.3.
>>>> Of course I'm having more traffic on the servers.
>>>> So I logged the Garbage Collection activity to check if it's because of
>>>> that. It seems like 11% of the time the application runs, it is stopped
>>>> because of GC. And some times the GC takes up to 10 seconds!
>>>> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
>>>> servers. My index is around 10GB and I'm giving to the instances 10GB of
>>>> RAM.
>>>> How can I check which is the GC that it is being used? If I'm right JVM
>>>> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do
>>>> you have any recommendation on this?
>>> As I said in Eteve's thread on JVM settings, some extra time spent on
>>> application design/debugging will save a whole lot of headache in
>>> Garbage Collection and trying to tune the gazillion different options
>>> available.  Ask yourself:  What is on the heap and does it need to be
>>> there?  For instance, do you, if you have them, really need sortable
>>> ints?   If your servers seem to come to a stop, I'm going to bet you
>>> have major collections going on.  Major collections in a production
>>> system are very bad.  They tend to happen right after commits in
>>> poorly tuned systems, but can also happen in other places if you let
>>> things build up due to really large heaps and/or things like really
>>> large cache settings.  I would pull up jConsole and have a look at
>>> what is happening when the pauses occur.  Is it a major collection?
>>> If so, then hook up a heap analyzer or a profiler and see what is on
>>> the heap around those times.  Then have a look at your schema/config,
>>> etc. and see if there are things that are memory intensive (sorting,
>>> faceting, excessively large filter caches).
>>> --
>>> Grant Ingersoll
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>>> using Solr/Lucene:
>> --
>> - Mark

- Mark

RE: Solr and Garbage Collection

2009-09-25 Thread Fuad Efendi
Sorry for OFF-topic:
Create dummy "Hello, World!" JSP, use Tomcat, execute load-stress
simulator(s) from separate machine(s), and measure... don't forget to
allocate necessary thread pools in Tomcat (if you have to)...
Although such JSP doesn't use any memory, you will see how easy one can go
with 5000 TPS (or 'virtually' 5 concurrent users) on modern quad-cores
by simply allocating more memory (...GB) and more Tomcat threads. There is
threshold too... repeat it with HTTPD Workers (and threads), same result,
although it doesn't use any GC. More memory - more threads - more "keep
alives" per TCP...

However, 'theoretically' you need only 64Mb for "Hello World" :)))

Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
I have around 8M documents.
I set up my server to use a different collector and it seems like it
decreased from 11% to 4%, of course I need to wait a bit more because it is
just a 1 hour old log. But it seems like it is much better now.
I will tell you on Monday the results :)

On Fri, Sep 25, 2009 at 6:07 PM, Mark Miller  wrote:

> Thats a good point too - if you can reduce your need for such a large
> heap, by all means, do so.
> However, considering you already need at least 10GB or you get OOM, you
> have a long way to go with that approach. Good luck :)
> How many docs do you have ? I'm guessing its mostly FieldCache type
> stuff, and thats the type of thing you can't really side step, unless
> you give up the functionality thats using it.
> Grant Ingersoll wrote:
> >
> > On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote:
> >
> >> Hi to all!
> >> Lately my solr servers seem to stop responding once in a while. I'm
> >> using
> >> solr 1.3.
> >> Of course I'm having more traffic on the servers.
> >> So I logged the Garbage Collection activity to check if it's because of
> >> that. It seems like 11% of the time the application runs, it is stopped
> >> because of GC. And some times the GC takes up to 10 seconds!
> >> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
> >> servers. My index is around 10GB and I'm giving to the instances 10GB of
> >> RAM.
> >>
> >> How can I check which is the GC that it is being used? If I'm right JVM
> >> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do
> >> you have
> >> any recommendation on this?
> >
> >
> > As I said in Eteve's thread on JVM settings, some extra time spent on
> > application design/debugging will save a whole lot of headache in
> > Garbage Collection and trying to tune the gazillion different options
> > available.  Ask yourself:  What is on the heap and does it need to be
> > there?  For instance, do you, if you have them, really need sortable
> > ints?   If your servers seem to come to a stop, I'm going to bet you
> > have major collections going on.  Major collections in a production
> > system are very bad.  They tend to happen right after commits in
> > poorly tuned systems, but can also happen in other places if you let
> > things build up due to really large heaps and/or things like really
> > large cache settings.  I would pull up jConsole and have a look at
> > what is happening when the pauses occur.  Is it a major collection?
> > If so, then hook up a heap analyzer or a profiler and see what is on
> > the heap around those times.  Then have a look at your schema/config,
> > etc. and see if there are things that are memory intensive (sorting,
> > faceting, excessively large filter caches).
> >
> > --
> > Grant Ingersoll
> >
> >
> > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> > using Solr/Lucene:
> >
> >
> --
> - Mark

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
One more point and I'll stop - I've hit my email quota for the day ;)

While it's a pain to have to juggle GC params and tune - when you require
a heap that's more than a gig or two, I personally believe it's essential
to do so for good performance. The default settings (ergonomics with the
throughput collector) just don't cut it. Sad fact of life :) Luckily, you
don't generally have to do that much to get things nice - the number of
options is not that staggering, and you don't usually need to get into
most of them. Choosing the right collector, and tweaking a setting or
two can often be enough.

The most important thing to do with a large heap and the throughput
collector is to turn on parallel tenured collection. I've said it before,
but it really is key. At least if you have more than a processor or two -
which, for your sake, I hope you do :)
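
The message does not name the option. On Sun JVMs of this era it is most likely -XX:+UseParallelOldGC, used together with -XX:+UseParallelGC; treat that mapping as an assumption and check it against the documentation for your JVM version. A minimal sketch for confirming what a running JVM was actually started with (the class name is made up for illustration):

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Reports whether the throughput-collector flags were passed on the command line. */
class GcFlagCheck {
    /** True if the boolean -XX flag was explicitly enabled in the given JVM arguments. */
    static boolean enabled(List<String> jvmArgs, String flag) {
        return jvmArgs.contains("-XX:+" + flag);
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with, e.g. -Xmx10g -XX:+UseParallelOldGC
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("UseParallelGC set:    " + enabled(jvmArgs, "UseParallelGC"));
        System.out.println("UseParallelOldGC set: " + enabled(jvmArgs, "UseParallelOldGC"));
    }
}
```

This only detects flags given explicitly; it says nothing about defaults the JVM picked on its own.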

- Mark

Mark Miller wrote:
> Thats a good point too - if you can reduce your need for such a large
> heap, by all means, do so.
> However, considering you already need at least 10GB or you get OOM, you
> have a long way to go with that approach. Good luck :)
> How many docs do you have ? I'm guessing its mostly FieldCache type
> stuff, and thats the type of thing you can't really side step, unless
> you give up the functionality thats using it.
> Grant Ingersoll wrote:
>> On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote:
>>> Hi to all!
>>> Lately my solr servers seem to stop responding once in a while. I'm
>>> using
>>> solr 1.3.
>>> Of course I'm having more traffic on the servers.
>>> So I logged the Garbage Collection activity to check if it's because of
>>> that. It seems like 11% of the time the application runs, it is stopped
>>> because of GC. And some times the GC takes up to 10 seconds!
>>> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
>>> servers. My index is around 10GB and I'm giving to the instances 10GB of
>>> RAM.
>>> How can I check which is the GC that it is being used? If I'm right JVM
>>> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do
>>> you have
>>> any recommendation on this?
>> As I said in Eteve's thread on JVM settings, some extra time spent on
>> application design/debugging will save a whole lot of headache in
>> Garbage Collection and trying to tune the gazillion different options
>> available.  Ask yourself:  What is on the heap and does it need to be
>> there?  For instance, do you, if you have them, really need sortable
>> ints?   If your servers seem to come to a stop, I'm going to bet you
>> have major collections going on.  Major collections in a production
>> system are very bad.  They tend to happen right after commits in
>> poorly tuned systems, but can also happen in other places if you let
>> things build up due to really large heaps and/or things like really
>> large cache settings.  I would pull up jConsole and have a look at
>> what is happening when the pauses occur.  Is it a major collection? 
>> If so, then hook up a heap analyzer or a profiler and see what is on
>> the heap around those times.  Then have a look at your schema/config,
>> etc. and see if there are things that are memory intensive (sorting,
>> faceting, excessively large filter caches).
>> --
>> Grant Ingersoll
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
>> using Solr/Lucene:

- Mark

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
That's a good point too - if you can reduce your need for such a large
heap, by all means, do so.

However, considering you already need at least 10GB or you get OOM, you
have a long way to go with that approach. Good luck :)

How many docs do you have? I'm guessing it's mostly FieldCache type
stuff, and that's the type of thing you can't really sidestep, unless
you give up the functionality that's using it.
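
For a feel of the numbers, here is a back-of-envelope sketch. It assumes a sort/facet field costs roughly one 4-byte int per document, plus the unique values themselves for string fields. The per-string overhead, the term count and the average length below are made-up inputs, not measurements from anyone's index.

```java
/** Rough FieldCache sizing arithmetic; every input is an assumption to replace with your own. */
class FieldCacheEstimate {
    /** Bytes for one int per document (an int sort field, or the ordinals of a string field). */
    static long perDocIntArrayBytes(long numDocs) {
        return numDocs * 4L;
    }

    /** Very rough bytes for the unique values of a string field (2 bytes per char plus overhead). */
    static long uniqueStringBytes(long uniqueTerms, int avgChars, int perStringOverhead) {
        return uniqueTerms * (2L * avgChars + perStringOverhead);
    }

    public static void main(String[] args) {
        long docs = 8000000L;                  // document count mentioned in this thread
        long ordinals = perDocIntArrayBytes(docs);
        // Hypothetical field: 1M unique values, 20 chars each, 40 bytes of object overhead.
        long values = uniqueStringBytes(1000000L, 20, 40);
        System.out.println("per-document array: " + ordinals + " bytes");
        System.out.println("unique values:      " + values + " bytes");
        System.out.println("one string field:   " + (ordinals + values) + " bytes");
    }
}
```

Under these assumptions a single field is on the order of a hundred megabytes, so reaching 10GB would take a lot of fields or very high cardinality.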

Grant Ingersoll wrote:
> On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote:
>> Hi to all!
>> Lately my solr servers seem to stop responding once in a while. I'm
>> using
>> solr 1.3.
>> Of course I'm having more traffic on the servers.
>> So I logged the Garbage Collection activity to check if it's because of
>> that. It seems like 11% of the time the application runs, it is stopped
>> because of GC. And some times the GC takes up to 10 seconds!
>> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
>> servers. My index is around 10GB and I'm giving to the instances 10GB of
>> RAM.
>> How can I check which is the GC that it is being used? If I'm right JVM
>> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do
>> you have
>> any recommendation on this?
> As I said in Eteve's thread on JVM settings, some extra time spent on
> application design/debugging will save a whole lot of headache in
> Garbage Collection and trying to tune the gazillion different options
> available.  Ask yourself:  What is on the heap and does it need to be
> there?  For instance, do you, if you have them, really need sortable
> ints?   If your servers seem to come to a stop, I'm going to bet you
> have major collections going on.  Major collections in a production
> system are very bad.  They tend to happen right after commits in
> poorly tuned systems, but can also happen in other places if you let
> things build up due to really large heaps and/or things like really
> large cache settings.  I would pull up jConsole and have a look at
> what is happening when the pauses occur.  Is it a major collection? 
> If so, then hook up a heap analyzer or a profiler and see what is on
> the heap around those times.  Then have a look at your schema/config,
> etc. and see if there are things that are memory intensive (sorting,
> faceting, excessively large filter caches).
> --
> Grant Ingersoll
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:

- Mark

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
Mark Miller wrote:
> Jonathan Ariel wrote:
>> How can I check which is the GC that it is being used? If I'm right JVM
>> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have
>> any recommendation on this?
> Just to straighten out this one too - Ergonomics doesn't use throughput
> - throughput is the collector that allows Ergonomics ;)
> And throughput is the default as long as your machine is detected as
> server class.
> But throughput is not great with large tenured spaces out of the box. It
> only parallelizes the new space collection. You have to turn on an
> option to get parallel tenured collection as well - which is essential
> to scale to large heap sizes.
hmm - I'm not being totally accurate there - ergonomics is what detects
server and so makes throughput the default collector for a server
machine. But much of the GC ergonomics support only works with the
throughput collector. Kind of chicken and egg :)

- Mark

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
Jonathan Ariel wrote:
> How can I check which is the GC that it is being used? If I'm right JVM
> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have
> any recommendation on this?
Just to straighten out this one too - Ergonomics doesn't use throughput
- throughput is the collector that allows Ergonomics ;)

And throughput is the default as long as your machine is detected as
server class.

But throughput is not great with large tenured spaces out of the box. It
only parallelizes the new space collection. You have to turn on an
option to get parallel tenured collection as well - which is essential
to scale to large heap sizes.
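
As for checking which collector is actually in use: one option, sketched here with the standard java.lang.management API, is to ask the running JVM for its collector beans. The class name is made up. On HotSpot the throughput collector typically reports names like "PS Scavenge" and "PS MarkSweep", and the low pause collector "ParNew" and "ConcurrentMarkSweep", but names vary by vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Prints the collectors the running JVM reports, with their counts and accumulated time. */
class WhichGc {
    /** One summary line per collector. */
    static String describe(GarbageCollectorMXBean gc) {
        return gc.getName() + ": collections=" + gc.getCollectionCount()
                + ", totalMillis=" + gc.getCollectionTime();
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(gc));
        }
    }
}
```

This has to run inside the JVM you care about (for Solr, for example, from a small JSP or servlet); jConsole shows the same beans remotely.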

- Mark

Re: Solr and Garbage Collection

2009-09-25 Thread Grant Ingersoll

On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote:

> Hi to all!
> Lately my solr servers seem to stop responding once in a while. I'm
> using solr 1.3.
> Of course I'm having more traffic on the servers.
> So I logged the Garbage Collection activity to check if it's because of
> that. It seems like 11% of the time the application runs, it is stopped
> because of GC. And some times the GC takes up to 10 seconds!
> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
> servers. My index is around 10GB and I'm giving to the instances 10GB of
> RAM.
> How can I check which is the GC that it is being used? If I'm right JVM
> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do
> you have any recommendation on this?

As I said in Eteve's thread on JVM settings, some extra time spent on  
application design/debugging will save a whole lot of headache in  
Garbage Collection and trying to tune the gazillion different options  
available.  Ask yourself:  What is on the heap and does it need to be  
there?  For instance, do you, if you have them, really need sortable  
ints?   If your servers seem to come to a stop, I'm going to bet you  
have major collections going on.  Major collections in a production  
system are very bad.  They tend to happen right after commits in  
poorly tuned systems, but can also happen in other places if you let  
things build up due to really large heaps and/or things like really  
large cache settings.  I would pull up jConsole and have a look at  
what is happening when the pauses occur.  Is it a major collection?   
If so, then hook up a heap analyzer or a profiler and see what is on  
the heap around those times.  Then have a look at your schema/config,  
etc. and see if there are things that are memory intensive (sorting,  
faceting, excessively large filter caches).

Grant Ingersoll

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
Ok. I'll first change the GC and see if the time spent decreases. Then
I'll try increasing the heap as Fuad recommends.

On 9/25/09, Mark Miller  wrote:
> When we talk about Collectors, we are not just talking about
> "collecting" - whatever that means. There isn't really a "collecting"
> phase - the whole algorithm is garbage collecting - hence calling the
> different implementations "collectors".
> Usually, fragmentation is dealt with using a mark-compact collector (or
> IBM has used a mark-sweep-compact collector).
> Copying collectors are not only super efficient at collecting young
> spaces, but they are also great for fragmentation - when you copy
> everything to the new space, you can remove any fragmentation. At the
> cost of double the space requirements though.
> So mark-compact is a compromise. First you mark whats reachable, then
> everything thats marked is copied/compacted to the bottom of the heap.
> Its all part of a "collection" though.
> Jonathan Ariel wrote:
>> Maybe what's missing here is how I got the 11%. I just ran solr with
>> the following JVM params: -XX:+PrintGCApplicationConcurrentTime
>> -XX:+PrintGCApplicationStoppedTime. With that I can measure the amount of
>> time the application runs between collection pauses and the length of the
>> collection pauses, respectively.
>> I think that in this case the 11% is just for memory collection and not
>> defragmentation... but I'm not 100% sure.
>> On Fri, Sep 25, 2009 at 5:05 PM, Fuad Efendi  wrote:
>>> But again, GC is not just "Garbage Collection" as many in this thread
>>> think... it is also "memory defragmentation", which is much more costly than
>>> "collection" just because it needs to move _live_objects_ somewhere (and
>>> wait/lock till such objects get unlocked to be moved...) - obviously more
>>> memory helps...
>>> 11% is extremely high.
>>> -Fuad
>>>> -Original Message-
>>>> From: Jonathan Ariel []
>>>> Sent: September-25-09 3:36 PM
>>>> Subject: Re: FW: Solr and Garbage Collection
>>>> I'm not planning on lowering the heap. I just want to lower the time
>>>> "wasted" on GC, which is 11% right now. So what I'll try is changing the
>>>> GC to -XX:+UseConcMarkSweepGC
>>>> On Fri, Sep 25, 2009 at 4:17 PM, Fuad Efendi  wrote:

> Mark,
> what if piece of code needs 10 contiguous Kb to load a document field? How
> locked memory pieces are optimized/moved (putting on hold almost whole
> application)?
> Lowering heap is _bad_ idea; we will have extremely frequent GC (optimize
> of live objects!!!) even if RAM is (theoretically) enough.
> -Fuad
>> Fuad, you didn't read the thread right.
>> He is not having a problem with OOM. He got the OOM because he lowered
>> the heap to try and help GC.
>> He normally runs with a heap that can handle his FC.
>> Please re-read the thread. You are confusing the thread.
>> - Mark
>>> GC will frequently happen even if RAM is more than enough: in case if it
>>> is heavily sparse... so that have even more RAM!
>>> -Fuad
> --
> - Mark
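
For anyone wanting to reproduce a figure like the 11%, here is a rough sketch that sums the two kinds of lines those flags produce. It assumes the usual HotSpot wording ("Application time: N seconds" and "Total time for which application threads were stopped: N seconds") and a '.' decimal separator. The class name and the sample lines are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Computes the share of wall time spent stopped, from GC log lines. */
class GcPauseShare {
    private static final Pattern RUNNING =
            Pattern.compile("Application time: ([0-9.]+) seconds");
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

    /** Returns stopped / (stopped + running) as a percentage, or 0 if there is no data. */
    static double stoppedPercent(List<String> logLines) {
        double running = 0;
        double stopped = 0;
        for (String line : logLines) {
            Matcher r = RUNNING.matcher(line);
            if (r.find()) {
                running += Double.parseDouble(r.group(1));
            }
            Matcher s = STOPPED.matcher(line);
            if (s.find()) {
                stopped += Double.parseDouble(s.group(1));
            }
        }
        double total = running + stopped;
        return total == 0 ? 0 : 100.0 * stopped / total;
    }

    public static void main(String[] args) {
        // Made-up sample; in practice read the lines from the real GC log file.
        List<String> sample = Arrays.asList(
                "Application time: 8.9000000 seconds",
                "Total time for which application threads were stopped: 1.1000000 seconds");
        System.out.println("stopped: " + stoppedPercent(sample) + "%");
    }
}
```

Note that the stopped time covers every stop-the-world pause the JVM reports, so it may include pauses that are not strictly garbage collection.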

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
This all applies to having more than one processor though - if you have
one processor, then non concurrent can also make sense.

But especially with the young space, you want concurrency - with up to
98% of objects being short lived, and multiple threads generally
creating new objects, it's a huge boon to collect the young space
Mark Miller wrote:
> Walter Underwood wrote:
>> For batch-oriented computing, like Hadoop, the most efficient GC is probably
>> a non-concurrent, non-generational GC. 
> Okay - for batch we somewhat agree I guess - if you can stand any length
> of pausing, non concurrent can be nice, because you don't pay for thread
> sync communication. Only with a small heap size though (less than 100MB
> is what I've seen). You would pause the batch job while GC takes place.
> If you have 8 processors, and you are pausing all of them to collect a
> large heap using only 1 processor, that doesn't make much sense to me.
> The thread communication pain will be far outweighed by using more
> processors to do the collection faster, and not "stop the world" for
> your batch job so long. Stopping your application dead in its tracks,
> and then only using one of the available processors to collect a large
> heap, while the rest sit idle, doesn't make much sense.
> I also don't agree it ever really makes sense not to do generational
> collection. What is your argument here? Generational collection is
> **way** more efficient for short lived objects, which tend to be up to
> 98% of the objects in most applications. The only way I see that making
> sense is if you have almost no short lived objects (which occurs in
> what, .0001% of apps if at all?). The Sun JVM doesn't even offer a non
> generational approach anymore. It's just standard GC practice.
>> I doubt that there are many
>> batch-oriented applications of Solr, though.
>> The rest of the advice is intended to be general and it sounds like we agree
>> about sizing. If the nursery is not big enough, the tenured space will be
>> used for allocations that have a short lifetime and that will increase the
>> length and/or frequency of major collections.
> Yes - I wasn't arguing with every point - I was picking and choosing :)
> After the heap size, the size of the young generation is the most
> important factor.
>> Cache evictions are the interesting part, because they cause a constant rate
>> of tenured space garbage. In most servers, you can get a big enough
>> nursery that major collections are very rare. That won't happen in Solr
>> because of cache evictions.
>> The IBM JVM is excellent. Their concurrent generational GC policy is
>> "gencon".
> Yeah, I actually know very little about the IBM JVM, so I wasn't really
> commenting. But from the info I gleaned here and on a couple quick web
> searches, I'm not too impressed by its GC.
>> wunder
>> -Original Message-
>> From: Mark Miller [] 
>> Sent: Friday, September 25, 2009 10:31 AM
>> To:
>> Subject: Re: Solr and Garbage Collection
>> My bad - later, it looks as if you're giving general advice, and that's
>> what I took issue with.
>> Any Collector that is not doing generational collection is essentially
>> from the dark ages and shouldn't be used.
>> Any Collector that doesn't have concurrent options, unless possibly you're
>> running a tiny app (under 100MB of RAM), or only have a single CPU, is
>> also dark ages, and not fit for a server environment.
>> I haven't kept up with IBM's JVM, but it sounds like they are well behind
>> Sun in GC then.
>> - Mark
>> Walter Underwood wrote:
>>> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
>>> pause" collector is only in the Sun JVM.
>>> I just found this excellent article about the various IBM GC options for a
>>> Lucene application with a 100GB heap:
>>> _h.html
>>> wunder
>>> -Original Message-
>>> From: Mark Miller [] 
>>> Sent: Friday, September 25, 2009 10:03 AM
>>> To:
>>> Subject: Re: Solr and Garbage

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
Walter Underwood wrote:
> For batch-oriented computing, like Hadoop, the most efficient GC is probably
> a non-concurrent, non-generational GC. 
Okay - for batch we somewhat agree I guess - if you can stand any length
of pausing, non concurrent can be nice, because you don't pay for thread
sync communication. Only with a small heap size though (less than 100MB
is what I've seen). You would pause the batch job while GC takes place.
If you have 8 processors, and you are pausing all of them to collect a
large heap using only 1 processor, that doesn't make much sense to me.
The thread communication pain will be far outweighed by using more
processors to do the collection faster, and not "stop the world" for
your batch job so long. Stopping your application dead in its tracks,
and then only using one of the available processors to collect a large
heap, while the rest sit idle, doesn't make much sense.

I also don't agree it ever really makes sense not to do generational
collection. What is your argument here? Generational collection is
**way** more efficient for short lived objects, which tend to be up to
98% of the objects in most applications. The only way I see that making
sense is if you have almost no short lived objects (which occurs in
what, .0001% of apps if at all?). The Sun JVM doesn't even offer a non
generational approach anymore. It's just standard GC practice.
> I doubt that there are many
> batch-oriented applications of Solr, though.
> The rest of the advice is intended to be general and it sounds like we agree
> about sizing. If the nursery is not big enough, the tenured space will be
> used for allocations that have a short lifetime and that will increase the
> length and/or frequency of major collections.
Yes - I wasn't arguing with every point - I was picking and choosing :)
After the heap size, the size of the young generation is the most
important factor.
> Cache evictions are the interesting part, because they cause a constant rate
> of tenured space garbage. In most servers, you can get a big enough
> nursery that major collections are very rare. That won't happen in Solr
> because of cache evictions.
> The IBM JVM is excellent. Their concurrent generational GC policy is
> "gencon".
Yeah, I actually know very little about the IBM JVM, so I wasn't really
commenting. But from the info I gleaned here and on a couple quick web
searches, I'm not too impressed by its GC.
> wunder
> -Original Message-
> From: Mark Miller [] 
> Sent: Friday, September 25, 2009 10:31 AM
> To:
> Subject: Re: Solr and Garbage Collection
> My bad - later, it looks as if you're giving general advice, and that's
> what I took issue with.
> Any Collector that is not doing generational collection is essentially
> from the dark ages and shouldn't be used.
> Any Collector that doesn't have concurrent options, unless possibly you're
> running a tiny app (under 100MB of RAM), or only have a single CPU, is
> also dark ages, and not fit for a server environment.
> I haven't kept up with IBM's JVM, but it sounds like they are well behind
> Sun in GC then.
> - Mark
> Walter Underwood wrote:
>> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
>> pause" collector is only in the Sun JVM.
>> I just found this excellent article about the various IBM GC options for a
>> Lucene application with a 100GB heap:
>> _h.html
>> wunder
>> -Original Message-
>> From: Mark Miller [] 
>> Sent: Friday, September 25, 2009 10:03 AM
>> To:
>> Subject: Re: Solr and Garbage Collection
>> Walter Underwood wrote:
>>> 30ms is not better or worse than 1s until you look at the service
>>> requirements. For many applications, it is worth dedicating 10% of your
>>> processing time to GC if that makes the worst-case pause short.
>>> On the other hand, my experience with the IBM JVM was that the maximum query
>>> rate was 2-3X better with the concurrent generational GC compared to any of
>>> their other GC algorithms, so we got the best throughput along with the
>>> shortest pauses.
>> With which collector? Since 

RE: Solr and Garbage Collection

2009-09-25 Thread Walter Underwood
For batch-oriented computing, like Hadoop, the most efficient GC is probably
a non-concurrent, non-generational GC. I doubt that there are many
batch-oriented applications of Solr, though.

The rest of the advice is intended to be general and it sounds like we agree
about sizing. If the nursery is not big enough, the tenured space will be
used for allocations that have a short lifetime and that will increase the
length and/or frequency of major collections.

Cache evictions are the interesting part, because they cause a constant rate
of tenured space garbage. In most servers, you can get a big enough
nursery that major collections are very rare. That won't happen in Solr
because of cache evictions.

The IBM JVM is excellent. Their concurrent generational GC policy is
"gencon".

wunder

-Original Message-
From: Mark Miller [] 
Sent: Friday, September 25, 2009 10:31 AM
Subject: Re: Solr and Garbage Collection

My bad - later, it looks as if you're giving general advice, and that's
what I took issue with.

Any Collector that is not doing generational collection is essentially
from the dark ages and shouldn't be used.

Any Collector that doesn't have concurrent options, unless possibly you're
running a tiny app (under 100MB of RAM), or only have a single CPU, is
also dark ages, and not fit for a server environment.

I haven't kept up with IBM's JVM, but it sounds like they are well behind
Sun in GC then.

- Mark

Walter Underwood wrote:
> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
> pause" collector is only in the Sun JVM.
> I just found this excellent article about the various IBM GC options for a
> Lucene application with a 100GB heap:
> _h.html
> wunder
> -Original Message-
> From: Mark Miller [] 
> Sent: Friday, September 25, 2009 10:03 AM
> To:
> Subject: Re: Solr and Garbage Collection
> Walter Underwood wrote:
>> 30ms is not better or worse than 1s until you look at the service
>> requirements. For many applications, it is worth dedicating 10% of your
>> processing time to GC if that makes the worst-case pause short.
>> On the other hand, my experience with the IBM JVM was that the maximum query
>> rate was 2-3X better with the concurrent generational GC compared to any of
>> their other GC algorithms, so we got the best throughput along with the
>> shortest pauses.
> With which collector? Since the very early JVMs, all GC is generational.
> Most of the collectors (other than the Serial Collector) also work
> concurrently.
> By default, they are concurrent on different generations, but you can
> add concurrency to the "other" generation with each now too.
>> Solr garbage generation (for queries) seems to have two major components:
>> per-request garbage and cache evictions. With a generational collector,
>> these two are handled by separate parts of the collector.
> Different parts of the collector? It's a different collector depending on
> the generation.
> The young generation is collected with a copy collector. This is because
> almost all the objects in the young generation are likely dead, and a copy
> collector only needs to visit live objects. So it's very efficient. The
> tenured generation uses something more along the lines of mark and sweep
> or mark and compact.
>>  Per-request
>> garbage should completely fit in the short-term heap (nursery), so that it
>> can be collected rapidly and returned to use for further requests. If the
>> nursery is too small, the per-request allocations will be made in tenured
>> space and sit there until the next major GC. Cache evictions are almost
>> always in long-term storage (tenured space) because an LRU algorithm
>> guarantees that the garbage will be old.
>> Check the growth rate of tenured space (under constant load, of course)
>> while increasing the size of the nursery. That rate should drop when the
>> nursery gets big enough, then not drop much further as it is increased more.
>> After that, reduce the size of tenured space until major GCs start happening
>> "too often" (a judgment call). A bigger tenured space means longer major GCs
>> and thus longer pauses, so you don't want it oversized by too much.
> With the concurrent low pause coll

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
My bad - later, it looks as if you're giving general advice, and that's
what I took issue with.

Any Collector that is not doing generational collection is essentially
from the dark ages and shouldn't be used.

Any Collector that doesn't have concurrent options, unless possibly you're
running a tiny app (under 100MB of RAM), or only have a single CPU, is
also dark ages, and not fit for a server environment.

I haven't kept up with IBM's JVM, but it sounds like they are well behind
Sun in GC then.

- Mark

Walter Underwood wrote:
> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
> pause" collector is only in the Sun JVM.
> I just found this excellent article about the various IBM GC options for a
> Lucene application with a 100GB heap:
> _h.html
> wunder
> -Original Message-
> From: Mark Miller [] 
> Sent: Friday, September 25, 2009 10:03 AM
> To:
> Subject: Re: Solr and Garbage Collection
> Walter Underwood wrote:
>> 30ms is not better or worse than 1s until you look at the service
>> requirements. For many applications, it is worth dedicating 10% of your
>> processing time to GC if that makes the worst-case pause short.
>> On the other hand, my experience with the IBM JVM was that the maximum query
>> rate was 2-3X better with the concurrent generational GC compared to any of
>> their other GC algorithms, so we got the best throughput along with the
>> shortest pauses.
> With which collector? Since the very early JVMs, all GC is generational.
> Most of the collectors (other than the Serial Collector) also work
> concurrently.
> By default, they are concurrent on different generations, but you can
> add concurrency to the "other" generation with each now too.
>> Solr garbage generation (for queries) seems to have two major components:
>> per-request garbage and cache evictions. With a generational collector,
>> these two are handled by separate parts of the collector.
> Different parts of the collector? It's a different collector depending on
> the generation.
> The young generation is collected with a copy collector. This is because
> almost all the objects in the young generation are likely dead, and a copy
> collector only needs to visit live objects. So it's very efficient. The
> tenured generation uses something more along the lines of mark and sweep
> or mark and compact.
>>  Per-request
>> garbage should completely fit in the short-term heap (nursery), so that it
>> can be collected rapidly and returned to use for further requests. If the
>> nursery is too small, the per-request allocations will be made in tenured
>> space and sit there until the next major GC. Cache evictions are almost
>> always in long-term storage (tenured space) because an LRU algorithm
>> guarantees that the garbage will be old.
>> Check the growth rate of tenured space (under constant load, of course)
>> while increasing the size of the nursery. That rate should drop when the
>> nursery gets big enough, then not drop much further as it is increased more.
>> After that, reduce the size of tenured space until major GCs start happening
>> "too often" (a judgment call). A bigger tenured space means longer major GCs
>> and thus longer pauses, so you don't want it oversized by too much.
> With the concurrent low pause collector, the goal is to avoid "major"
> collections, by collecting *before* the tenured space is filled. If you are
> getting "major" collections, you need to tune your settings - the whole
> point of that collector is to avoid "major" collections, and do almost all
> of the work while your application is not paused. There are still 2 brief
> pauses during the collection, but they should not be significant at all.
>> Also check the hit rates of your caches. If the hit rate is low, say 20% or
>> less, make that cache much bigger or set it to zero. Either one will reduce
>> the number of cache evictions. If you have an HTTP cache in front of Solr,
>> zero may be the right choice, since the HTTP cache is cherry-picking the
>> easily cacheable requests.
>> Note that a commit nearly doubles the memory required, because you have

Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
Ok. I will try with the "concurrent low pause" collector and let you know
the results.
On Fri, Sep 25, 2009 at 2:23 PM, Walter Underwood wrote:

> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
> pause" collector is only in the Sun JVM.
> I just found this excellent article about the various IBM GC options for a
> Lucene application with a 100GB heap:
> _h.html
> wunder
> -Original Message-
> From: Mark Miller []
> Sent: Friday, September 25, 2009 10:03 AM
> To:
> Subject: Re: Solr and Garbage Collection
> Walter Underwood wrote:
> > 30ms is not better or worse than 1s until you look at the service
> > requirements. For many applications, it is worth dedicating 10% of your
> > processing time to GC if that makes the worst-case pause short.
> >
> > On the other hand, my experience with the IBM JVM was that the maximum query
> > rate was 2-3X better with the concurrent generational GC compared to any of
> > their other GC algorithms, so we got the best throughput along with the
> > shortest pauses.
> >
> With which collector? Since the very early JVMs, all GC is generational.
> Most of the collectors (other than the Serial Collector) also work
> concurrently.
> By default, they are concurrent on different generations, but you can
> add concurrency to the "other" generation with each now too.
> > Solr garbage generation (for queries) seems to have two major components:
> > per-request garbage and cache evictions. With a generational collector,
> > these two are handled by separate parts of the collector.
> Different parts of the collector? It's a different collector depending on
> the generation.
> The young generation is collected with a copy collector. This is because
> almost all the objects in the young generation are likely dead, and a copy
> collector only needs to visit live objects. So it's very efficient. The
> tenured generation uses something more along the lines of mark and sweep
> or mark and compact.
> >  Per-request
> > garbage should completely fit in the short-term heap (nursery), so that it
> > can be collected rapidly and returned to use for further requests. If the
> > nursery is too small, the per-request allocations will be made in tenured
> > space and sit there until the next major GC. Cache evictions are almost
> > always in long-term storage (tenured space) because an LRU algorithm
> > guarantees that the garbage will be old.
> >
> > Check the growth rate of tenured space (under constant load, of course)
> > while increasing the size of the nursery. That rate should drop when the
> > nursery gets big enough, then not drop much further as it is increased more.
> >
> > After that, reduce the size of tenured space until major GCs start happening
> > "too often" (a judgment call). A bigger tenured space means longer major GCs
> > and thus longer pauses, so you don't want it oversized by too much.
> >
> With the concurrent low pause collector, the goal is to avoid "major"
> collections, by collecting *before* the tenured space is filled. If you are
> getting "major" collections, you need to tune your settings - the whole
> point of that collector is to avoid "major" collections, and do almost all
> of the work while your application is not paused. There are still 2 brief
> pauses during the collection, but they should not be significant at all.
> > Also check the hit rates of your caches. If the hit rate is low, say 20% or
> > less, make that cache much bigger or set it to zero. Either one will reduce
> > the number of cache evictions. If you have an HTTP cache in front of Solr,
> > zero may be the right choice, since the HTTP cache is cherry-picking the
> > easily cacheable requests.
> >
> > Note that a commit nearly doubles the memory required, because you have two
> > live Searcher objects with all their caches. Make sure you have headroom for
> > a commit.
> >
> > If you want to test the tenured space usage, you must test with real world
> > queries. Those are the only way to get accurate cache eviction rates.
> >
> > wunder
> >
> > -Original Message-
> > From: Jonathan Ariel []
> > Sent: Friday, September 25, 2009 9:34 AM
> > To:
> > Subject: Re: Solr and Garbage Collection

RE: Solr and Garbage Collection

2009-09-25 Thread Walter Underwood
As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low
pause" collector is only in the Sun JVM.

I just found this excellent article about the various IBM GC options for a
Lucene application with a 100GB heap:


-Original Message-
From: Mark Miller [] 
Sent: Friday, September 25, 2009 10:03 AM
Subject: Re: Solr and Garbage Collection

Walter Underwood wrote:
> 30ms is not better or worse than 1s until you look at the service
> requirements. For many applications, it is worth dedicating 10% of your
> processing time to GC if that makes the worst-case pause short.
> On the other hand, my experience with the IBM JVM was that the maximum query
> rate was 2-3X better with the concurrent generational GC compared to any of
> their other GC algorithms, so we got the best throughput along with the
> shortest pauses.
With which collector? Since the very early JVMs, all GC is generational.
Most of the collectors (other than the Serial Collector) also work
concurrently.
By default, they are concurrent on different generations, but you can
add concurrency to the "other" generation with each now too.
> Solr garbage generation (for queries) seems to have two major components:
> per-request garbage and cache evictions. With a generational collector,
> these two are handled by separate parts of the collector.
Different parts of the collector? It's a different collector depending on
the generation.
The young generation is collected with a copy collector. This is because
almost all the objects in the young generation are likely dead, and a copy
collector only needs to visit live objects. So it's very efficient. The
tenured generation uses something more along the lines of mark and sweep
or mark and compact.
>  Per-request
> garbage should completely fit in the short-term heap (nursery), so that it
> can be collected rapidly and returned to use for further requests. If the
> nursery is too small, the per-request allocations will be made in tenured
> space and sit there until the next major GC. Cache evictions are almost
> always in long-term storage (tenured space) because an LRU algorithm
> guarantees that the garbage will be old.
> Check the growth rate of tenured space (under constant load, of course)
> while increasing the size of the nursery. That rate should drop when the
> nursery gets big enough, then not drop much further as it is increased more.
> After that, reduce the size of tenured space until major GCs start happening
> "too often" (a judgment call). A bigger tenured space means longer major GCs
> and thus longer pauses, so you don't want it oversized by too much.
With the concurrent low pause collector, the goal is to avoid "major"
collections, by collecting *before* the tenured space is filled. If you are
getting "major" collections, you need to tune your settings - the whole
point of that collector is to avoid "major" collections, and do almost all
of the work while your application is not paused. There are still 2 brief
pauses during the collection, but they should not be significant at all.
> Also check the hit rates of your caches. If the hit rate is low, say 20% or
> less, make that cache much bigger or set it to zero. Either one will reduce
> the number of cache evictions. If you have an HTTP cache in front of Solr,
> zero may be the right choice, since the HTTP cache is cherry-picking the
> easily cacheable requests.
> Note that a commit nearly doubles the memory required, because you have two
> live Searcher objects with all their caches. Make sure you have headroom for
> a commit.
> If you want to test the tenured space usage, you must test with real world
> queries. Those are the only way to get accurate cache eviction rates.
> wunder
> -Original Message-
> From: Jonathan Ariel [] 
> Sent: Friday, September 25, 2009 9:34 AM
> To:
> Subject: Re: Solr and Garbage Collection
> BTW why making them equal will lower the frequency of GC?
> On 9/25/09, Fuad Efendi  wrote:
>>> Bigger heaps lead to bigger GC pauses in general.
>> Opposite viewpoint:
>> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
>> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
>> Use -server option.
>> -server option of JVM is 'native CPU code', I remember WebLogic 7 console
>> with SUN JVM 1.3 not showing any GC (just horizontal line).
>> -Fuad

- Mark

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
Walter Underwood wrote:
> 30ms is not better or worse than 1s until you look at the service
> requirements. For many applications, it is worth dedicating 10% of your
> processing time to GC if that makes the worst-case pause short.
> On the other hand, my experience with the IBM JVM was that the maximum query
> rate was 2-3X better with the concurrent generational GC compared to any of
> their other GC algorithms, so we got the best throughput along with the
> shortest pauses.
With which collector? Since the very early JVMs, all GC is generational.
Most of the collectors (other than the Serial Collector) also work
concurrently.
By default, they are concurrent on different generations, but you can
add concurrency to the "other" generation with each now too.
> Solr garbage generation (for queries) seems to have two major components:
> per-request garbage and cache evictions. With a generational collector,
> these two are handled by separate parts of the collector.
Different parts of the collector? It's a different collector depending on
the generation.
The young generation is collected with a copy collector. This is because
almost all the objects in the young generation are likely dead, and a copy
collector only needs to visit live objects. So it's very efficient. The
tenured generation uses something more along the lines of mark and sweep
or mark and compact.
>  Per-request
> garbage should completely fit in the short-term heap (nursery), so that it
> can be collected rapidly and returned to use for further requests. If the
> nursery is too small, the per-request allocations will be made in tenured
> space and sit there until the next major GC. Cache evictions are almost
> always in long-term storage (tenured space) because an LRU algorithm
> guarantees that the garbage will be old.
> Check the growth rate of tenured space (under constant load, of course)
> while increasing the size of the nursery. That rate should drop when the
> nursery gets big enough, then not drop much further as it is increased more.
> After that, reduce the size of tenured space until major GCs start happening
> "too often" (a judgment call). A bigger tenured space means longer major GCs
> and thus longer pauses, so you don't want it oversized by too much.
With the concurrent low pause collector, the goal is to avoid "major"
collections, by collecting *before* the tenured space is filled. If you are
getting "major" collections, you need to tune your settings - the whole
point of that collector is to avoid "major" collections, and do almost all
of the work while your application is not paused. There are still 2 brief
pauses during the collection, but they should not be significant at all.
> Also check the hit rates of your caches. If the hit rate is low, say 20% or
> less, make that cache much bigger or set it to zero. Either one will reduce
> the number of cache evictions. If you have an HTTP cache in front of Solr,
> zero may be the right choice, since the HTTP cache is cherry-picking the
> easily cacheable requests.
> Note that a commit nearly doubles the memory required, because you have two
> live Searcher objects with all their caches. Make sure you have headroom for
> a commit.
> If you want to test the tenured space usage, you must test with real world
> queries. Those are the only way to get accurate cache eviction rates.
> wunder
> -Original Message-
> From: Jonathan Ariel [] 
> Sent: Friday, September 25, 2009 9:34 AM
> To:
> Subject: Re: Solr and Garbage Collection
> BTW why making them equal will lower the frequency of GC?
> On 9/25/09, Fuad Efendi  wrote:
>>> Bigger heaps lead to bigger GC pauses in general.
>> Opposite viewpoint:
>> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
>> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
>> Use -server option.
>> -server option of JVM is 'native CPU code', I remember WebLogic 7 console
>> with SUN JVM 1.3 not showing any GC (just horizontal line).
>> -Fuad

- Mark

RE: Solr and Garbage Collection

2009-09-25 Thread Walter Underwood
30ms is not better or worse than 1s until you look at the service
requirements. For many applications, it is worth dedicating 10% of your
processing time to GC if that makes the worst-case pause short.
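
A rough way to put a number on the "percent of processing time in GC" figure
discussed in this thread is to read the JVM's own collector counters. The
sketch below is only an illustration: the class name GcOverhead is made up,
and the accumulated collection time reported by the standard
java.lang.management beans is an approximation that may include concurrent
phases, so it is not strictly pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Solr: reports the share of JVM uptime spent in GC.
final class GcOverhead {

    // Percentage of uptime spent collecting; 0.0 if uptime is not positive.
    static double percent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    // Sums the accumulated collection time over all collectors the JVM exposes.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC overhead: %.2f%%%n", percent(totalGcMillis(), uptime));
    }
}
```

For example, 110 ms of collection time over 1000 ms of uptime comes out as 11%.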

On the other hand, my experience with the IBM JVM was that the maximum query
rate was 2-3X better with the concurrent generational GC compared to any of
their other GC algorithms, so we got the best throughput along with the
shortest pauses.

Solr garbage generation (for queries) seems to have two major components:
per-request garbage and cache evictions. With a generational collector,
these two are handled by separate parts of the collector. Per-request
garbage should completely fit in the short-term heap (nursery), so that it
can be collected rapidly and returned to use for further requests. If the
nursery is too small, the per-request allocations will be made in tenured
space and sit there until the next major GC. Cache evictions are almost
always in long-term storage (tenured space) because an LRU algorithm
guarantees that the garbage will be old.

Check the growth rate of tenured space (under constant load, of course)
while increasing the size of the nursery. That rate should drop when the
nursery gets big enough, then not drop much further as it is increased more.
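
The growth-rate check described above could be sketched in Java along these
lines. This is a minimal illustration, not a tested tuning tool: the class
name TenuredGrowth is hypothetical, the 10-second interval is arbitrary, and
matching the pool by "Old" or "Tenured" in its name is an assumption, since
pool names differ between JVMs and collectors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Hypothetical helper: samples the old-generation pool twice and reports its growth rate.
final class TenuredGrowth {

    // Growth in bytes per second between two samples taken elapsedMillis apart.
    static double bytesPerSecond(long usedBefore, long usedAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (usedAfter - usedBefore) * 1000.0 / elapsedMillis;
    }

    // Assumption: the tenured pool's name contains "Old" or "Tenured".
    static MemoryPoolMXBean findTenuredPool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (pool.getType() == MemoryType.HEAP
                    && (name.contains("Old") || name.contains("Tenured"))) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryPoolMXBean tenured = findTenuredPool();
        if (tenured == null) {
            System.out.println("No pool matching \"Old\" or \"Tenured\" found");
            return;
        }
        long intervalMillis = 10_000; // sampling interval, arbitrary choice
        long before = tenured.getUsage().getUsed();
        Thread.sleep(intervalMillis);
        long after = tenured.getUsage().getUsed();
        System.out.printf("%s grew at %.0f bytes/s%n",
                tenured.getName(), bytesPerSecond(before, after, intervalMillis));
    }
}
```

A negative rate simply means a collection of that pool happened between the
two samples; repeat the measurement under steady load for each nursery size.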

After that, reduce the size of tenured space until major GCs start happening
"too often" (a judgment call). A bigger tenured space means longer major GCs
and thus longer pauses, so you don't want it oversized by too much.

Also check the hit rates of your caches. If the hit rate is low, say 20% or
less, make that cache much bigger or set it to zero. Either one will reduce
the number of cache evictions. If you have an HTTP cache in front of Solr,
zero may be the right choice, since the HTTP cache is cherry-picking the
easily cacheable requests.
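
The hit-rate rule of thumb above is simple arithmetic, sketched here in Java.
The class and method names are hypothetical; the hit and lookup counters would
have to come from wherever you read your cache statistics, and the 20%
threshold is just the figure suggested in this message.

```java
// Hypothetical helper: applies the "20% or less" rule of thumb to cache counters.
final class CacheHitRate {

    // Threshold taken from the message above; treat it as a rule of thumb.
    static final double LOW_HIT_RATE = 0.20;

    // Fraction of lookups served from the cache; 0.0 when there were no lookups.
    static double hitRate(long hits, long lookups) {
        if (lookups <= 0) {
            return 0.0;
        }
        return (double) hits / lookups;
    }

    // True when the cache is a candidate for being made much bigger or set to zero.
    static boolean needsAttention(long hits, long lookups) {
        return lookups > 0 && hitRate(hits, lookups) <= LOW_HIT_RATE;
    }

    public static void main(String[] args) {
        long lookups = 10_000; // example numbers, not measurements
        long hits = 1_500;
        System.out.printf("hit rate %.2f, needs attention: %b%n",
                hitRate(hits, lookups), needsAttention(hits, lookups));
    }
}
```

A cache with no lookups yet is deliberately not flagged, since there is
nothing to judge it by.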

Note that a commit nearly doubles the memory required, because you have two
live Searcher objects with all their caches. Make sure you have headroom for
a commit.

If you want to test the tenured space usage, you must test with real world
queries. Those are the only way to get accurate cache eviction rates.


-Original Message-
From: Jonathan Ariel [] 
Sent: Friday, September 25, 2009 9:34 AM
Subject: Re: Solr and Garbage Collection

BTW why making them equal will lower the frequency of GC?

On 9/25/09, Fuad Efendi  wrote:
>> Bigger heaps lead to bigger GC pauses in general.
> Opposite viewpoint:
> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
> Use -server option.
> -server option of JVM is 'native CPU code', I remember WebLogic 7 console
> with SUN JVM 1.3 not showing any GC (just horizontal line).
> -Fuad

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
>-server option of JVM is 'native CPU code', I remember WebLogic 7 console
>with SUN JVM 1.3 not showing any GC (just horizontal line).

Not sure what that is all about either. -server and -client are just two
different versions of hotspot.
The -server version is optimized for long running applications - it
starts slower, and over time, it learns
about your app and makes good throughput optimizations.

The -client hotspot version starts up quicker and concentrates
more on response time than throughput.
Better for desktop apps. -server is better for long lived server apps.

Mark Miller wrote:
> It won't really - it will just keep the JVM from wasting time resizing
> the heap on you. Since you know you need so much RAM anyway, no reason
> not to just pin it at what you need.
> Not going to help you much with GC though.
> Jonathan Ariel wrote:
>> BTW why making them equal will lower the frequency of GC?
>> On 9/25/09, Fuad Efendi  wrote:
>>>> Bigger heaps lead to bigger GC pauses in general.
>>> Opposite viewpoint:
>>> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
>>> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
>>> Use -server option.
>>> -server option of JVM is 'native CPU code', I remember WebLogic 7 console
>>> with SUN JVM 1.3 not showing any GC (just horizontal line).
>>> -Fuad

- Mark

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
It won't really - it will just keep the JVM from wasting time resizing
the heap on you. Since you know you need so much RAM anyway, no reason
not to just pin it at what you need.
Not going to help you much with GC though.

Jonathan Ariel wrote:
> BTW why making them equal will lower the frequency of GC?
> On 9/25/09, Fuad Efendi  wrote:
>>> Bigger heaps lead to bigger GC pauses in general.
>> Opposite viewpoint:
>> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
>> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
>> Use -server option.
>> -server option of JVM is 'native CPU code', I remember WebLogic 7 console
>> with SUN JVM 1.3 not showing any GC (just horizontal line).
>> -Fuad

- Mark

Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
BTW, why would making them equal lower the frequency of GC?

On 9/25/09, Fuad Efendi  wrote:
>> Bigger heaps lead to bigger GC pauses in general.
> Opposite viewpoint:
> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second.
> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)
> Use -server option.
> -server option of JVM is 'native CPU code', I remember WebLogic 7 console
> with SUN JVM 1.3 not showing any GC (just horizontal line).
> -Fuad

Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
I can't really understand how increasing the heap will decrease the
11% dedicated to GC

On 9/25/09, Fuad Efendi  wrote:
>> You are saying that I should give more memory than 12GB?
> Yes. Look at this:
>> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> It can't find few (!!!) contiguous bytes for .createValue(...)
> It can't add (Field Value, Document ID) pair to an array.
> GC tuning won't help in this specific case...
> May be SOLR/Lucene core developers may WARM FieldCache at IndexReader
> opening time, in the future... to have early OOM...
> Avoiding faceting (and sorting) on such field will only postpone OOM to
> unpredictable date/time...
> -Fuad

RE: Solr and Garbage Collection

2009-09-25 Thread cbennett
I would look at the JVM. Have you tried switching to the concurrent low
pause collector?


-----Original Message-----
From: Jonathan Ariel [] 
Sent: Friday, September 25, 2009 12:07 PM
Subject: Re: Solr and Garbage Collection

You are saying that I should give more memory than 12GB?
When I was with 10GB I had the exceptions that I sent. Switching to 12GB
made them disappear.
So I think I don't have problems with FieldCache any more. What it seems
like a problem is 11% on the application time dedicated to GC. Specially
when those servers are under really heavy load.
I think that's why I sometimes get queries that in one moment are being
executed in a few ms and a moment after 20 seconds!

It seems like I should tune my jvm, don't you think so?

On Fri, Sep 25, 2009 at 1:01 PM, Fuad Efendi  wrote:

> Give it even more memory.
> Lucene FieldCache is used to store non-tokenized single-value non-boolean
> (DocumentId -> FieldValue) pairs, and it is used (in-full!) for instance
> for
> sorting query results.
> So that if you have 100,000,000 documents with specific heavily distributed
> field values (cardinality is high! Size is 100bytes!) you need
> 10,000,000,000 bytes for just this instance of FieldCache.
> GC does not play any role. FieldCache won't be GC-collected.
> -Fuad
> > -----Original Message-----
> > From: Jonathan Ariel []
> > Sent: September-25-09 11:37 AM
> > To:;
> > Subject: Re: Solr and Garbage Collection
> >
> > Right, now I'm giving it 12GB of heap memory.
> > If I give it less (10GB) it throws the following exception:
> >
> > Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> > at $Cache.get(
> > at org.apache.solr.request.SimpleFacets.getFacetCounts(
> > at org.apache.solr.core.SolrCore.execute(
> > at org.mortbay.jetty.servlet.ServletHandler.handle(
> > at org.mortbay.jetty.servlet.SessionHandler.handle(
> > at org.mortbay.jetty.handler.ContextHandler.handle(
> > at org.mortbay.jetty.webapp.WebAppContext.handle(
> > at org.mortbay.jetty.handler.HandlerWrapper.handle(
> > at org.mortbay.jetty.Server.handle(
> > at org.mortbay.jetty.HttpConnection.handleRequest(

Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
Yes - more RAM is not a solution to your problem.

Jonathan Ariel wrote:
> You are saying that I should give more memory than 12GB?
> When I was with 10GB I had the exceptions that I sent. Switching to 12GB
> made them disappear.
> So I think I don't have problems with FieldCache any more. What it seems
> like a problem is 11% on the application time dedicated to GC. Specially
> when those servers are under really heavy load.
> I think that's why I sometimes get queries that in one moment are being
> executed in a few ms and a moment after 20 seconds!
> It seems like I should tune my jvm, don't you think so?
> On Fri, Sep 25, 2009 at 1:01 PM, Fuad Efendi  wrote:
>> Give it even more memory.
>> Lucene FieldCache is used to store non-tokenized single-value non-boolean
>> (DocumentId -> FieldValue) pairs, and it is used (in-full!) for instance
>> for
>> sorting query results.
>> So that if you have 100,000,000 documents with specific heavily distributed
>> field values (cardinality is high! Size is 100bytes!) you need
>> 10,000,000,000 bytes for just this instance of FieldCache.
>> GC does not play any role. FieldCache won't be GC-collected.
>> -Fuad
>>> -----Original Message-----
>>> From: Jonathan Ariel []
>>> Sent: September-25-09 11:37 AM
>>> To:;
>>> Subject: Re: Solr and Garbage Collection
>>> Right, now I'm giving it 12GB of heap memory.
>>> If I give it less (10GB) it throws the following exception:
>>> Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
>>> SEVERE: java.lang.OutOfMemoryError: Java heap space
>>> at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(
>>> at org.apache.solr.request.SimpleFacets.getTermCounts(
>>> at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(
>>> at org.apache.solr.request.SimpleFacets.getFacetCounts(
>>> at org.apache.solr.handler.component.FacetComponent.process(
>>> at org.apache.solr.handler.component.SearchHandler.handleRequestBody(
>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>> at org.apache.solr.core.SolrCore.execute(
>>> at org.apache.solr.servlet.SolrDispatchFilter.execute(
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>>> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
>>> at org.mortbay.jetty.servlet.ServletHandler.handle(
>>> at org.mortbay.jetty.servlet.SessionHandler.handle(
>>> at org.mortbay.jetty.handler.ContextHandler.handle(
>>> at org.mortbay.jetty.webapp.WebAppContext.handle(

RE: Solr and Garbage Collection

2009-09-25 Thread Fuad Efendi
> You are saying that I should give more memory than 12GB?

Yes. Look at this:

> > SEVERE: java.lang.OutOfMemoryError: Java heap space

It can't find a few (!!!) contiguous bytes for .createValue(...)

It can't add a (Field Value, Document ID) pair to an array.

GC tuning won't help in this specific case...

Maybe the SOLR/Lucene core developers could WARM the FieldCache at IndexReader
opening time, in the future... to get an early OOM...

Avoiding faceting (and sorting) on such a field will only postpone the OOM to
an unpredictable date/time...


Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
Are you saying that I should give it more than 12GB of memory?
When I was at 10GB I had the exceptions that I sent. Switching to 12GB
made them disappear.
So I think I don't have problems with FieldCache any more. What does seem
like a problem is the 11% of application time dedicated to GC, especially
when those servers are under really heavy load.
I think that's why I sometimes get queries that are executed in a few ms
one moment and take 20 seconds a moment later!

It seems like I should tune my JVM, don't you think so?

On Fri, Sep 25, 2009 at 1:01 PM, Fuad Efendi  wrote:

> Give it even more memory.
> Lucene FieldCache is used to store non-tokenized single-value non-boolean
> (DocumentId -> FieldValue) pairs, and it is used (in-full!) for instance
> for
> sorting query results.
> So that if you have 100,000,000 documents with specific heavily distributed
> field values (cardinality is high! Size is 100bytes!) you need
> 10,000,000,000 bytes for just this instance of FieldCache.
> GC does not play any role. FieldCache won't be GC-collected.
> -Fuad
> > -----Original Message-----
> > From: Jonathan Ariel []
> > Sent: September-25-09 11:37 AM
> > To:;
> > Subject: Re: Solr and Garbage Collection
> >
> > Right, now I'm giving it 12GB of heap memory.
> > If I give it less (10GB) it throws the following exception:
> >
> > Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
> > SEVERE: java.lang.OutOfMemoryError: Java heap space
> > at $Cache.get(
> > at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(
> > at org.apache.solr.request.SimpleFacets.getTermCounts(
> > at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(
> > at org.apache.solr.request.SimpleFacets.getFacetCounts(
> > at org.apache.solr.handler.component.FacetComponent.process(
> > at org.apache.solr.handler.component.SearchHandler.handleRequestBody(
> > at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> > at org.apache.solr.core.SolrCore.execute(
> > at org.apache.solr.servlet.SolrDispatchFilter.execute(
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
> > at org.mortbay.jetty.servlet.ServletHandler.handle(
> > at org.mortbay.jetty.servlet.SessionHandler.handle(
> > at org.mortbay.jetty.handler.ContextHandler.handle(
> > at org.mortbay.jetty.webapp.WebAppContext.handle(
> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(
> > at org.mortbay.jetty.handler.HandlerCollection.handle(
> > at org.mortbay.jetty.handler.HandlerWrapper.handle(
> > at org.mortbay.jetty.Server.handle(
> > at org.mortbay.jetty.HttpConnection.handleRequest(
> > at org.mortbay.jetty.HttpConnection$RequestHandler.content(
> > at org.mortbay.jetty.HttpParser.parseNext(
> > at org.mortbay.jetty.HttpParser.parseAvailable(
> > at org.mortbay.jetty.Htt

RE: Solr and Garbage Collection

2009-09-25 Thread Fuad Efendi
Give it even more memory.

Lucene FieldCache is used to store non-tokenized single-value non-boolean
(DocumentId -> FieldValue) pairs, and it is used (in-full!) for instance for
sorting query results.

So if you have 100,000,000 documents with specific heavily distributed
field values (cardinality is high! Size is 100 bytes!) you need
10,000,000,000 bytes for just this instance of FieldCache.

GC does not play any role. FieldCache won't be GC-collected.
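
The arithmetic behind that figure, as a sketch. It follows the rule of thumb in this message (one value of roughly fixed size per document) and is not an accounting of Lucene's actual in-memory layout; the function name is hypothetical.

```python
def fieldcache_estimate_bytes(num_docs, avg_value_bytes):
    """Rough upper-bound estimate: one cached field value per document."""
    return num_docs * avg_value_bytes


# Figures from the message above: 100,000,000 docs, 100 bytes per value.
estimate = fieldcache_estimate_bytes(100_000_000, 100)
print(f"{estimate:,} bytes (~{estimate / 1e9:.0f} GB)")
```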


> -----Original Message-----
> From: Jonathan Ariel []
> Sent: September-25-09 11:37 AM
> To:;
> Subject: Re: Solr and Garbage Collection
> Right, now I'm giving it 12GB of heap memory.
> If I give it less (10GB) it throws the following exception:
> Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: Java heap space
> at org.apache.solr.request.SimpleFacets.getTermCounts(
> at org.apache.solr.request.SimpleFacets.getFacetCounts(
> at org.apache.solr.core.SolrCore.execute(
> at org.mortbay.jetty.servlet.ServletHandler.handle(
> at org.mortbay.jetty.servlet.SessionHandler.handle(
> at org.mortbay.jetty.handler.ContextHandler.handle(
> at org.mortbay.jetty.webapp.WebAppContext.handle(
> at org.mortbay.jetty.handler.HandlerWrapper.handle(
> at org.mortbay.jetty.Server.handle(
> at org.mortbay.jetty.HttpConnection.handleRequest(
> at org.mortbay.jetty.HttpParser.parseNext(
> On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley
> wrote:
> > On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel 
> > wrote:
> > > Hi to all!
> > > Lately my solr servers seem to stop responding once in a while. I'm using
> > > solr 1.3.
> > > Of course I'm having more traffic on the servers.
> > > So I logged the Garbage Collection activity to check if it's because of
> > > that. It seems like 11% of the time the application runs, it is stopped
> > > because of GC. And some times the GC takes up to 10 seconds!
> > > Is it normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
> > > servers. My index is around 10GB and I'm giving to the instances 10GB of
> > > RAM.
> >
> > Bigger heaps lead to bigger GC pauses in general.
> > Do you mean that you are giving the JVM a 10GB heap?  Were you getting
> > OOM exceptions with a smaller heap?
> >
> > -Yonik
> >
> >

RE: Solr and Garbage Collection

2009-09-25 Thread Fuad Efendi
> Bigger heaps lead to bigger GC pauses in general.

Opposite viewpoint:
1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second. 
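
To put numbers on that comparison, here is a sketch that computes the fraction of wall-clock time spent paused in each scenario. The function name is hypothetical. It only measures throughput overhead and says nothing about the worst-case latency a single request sees during a long pause.

```python
def gc_overhead(pause_seconds, interval_seconds):
    """Fraction of wall-clock time spent in GC pauses, given one pause
    of pause_seconds every interval_seconds."""
    return pause_seconds / interval_seconds


hourly = gc_overhead(1.0, 3600.0)      # 1 s pause once an hour
per_second = gc_overhead(0.030, 1.0)   # 30 ms pause once per second
print(f"once an hour:    {hourly:.4%}")
print(f"once per second: {per_second:.4%}")
```

On these figures the hourly 1-second pause costs under 0.03% of wall-clock time, while the 30 ms pause every second costs 3%.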

To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!)

Use -server option.

-server option of JVM is 'native CPU code', I remember WebLogic 7 console
with SUN JVM 1.3 not showing any GC (just horizontal line). 


Re: Solr and Garbage Collection

2009-09-25 Thread Mark Miller
I've got the start of a Garbage Collection article here:

I plan to tie it more into Lucene/Solr and add some more about the
theory/methods in the final version.

With so much RAM, I take it you probably have a handful of processors as well?

You might start by trying the Concurrent Low Pause Collector if you have
not. You might also pair it with the parallel new generation collector.
If you still get long pauses, you might try lowering
-XX:CMSInitiatingOccupancyFraction, to kick off major collections earlier.

It can still be difficult with really large fieldcaches, because all of
a sudden, everything is released at once when the Reader goes away - but
there should be some combo of settings that at least help alleviate the
issue, especially by dedicating another processor to the task that can
work somewhat in parallel without stopping your application threads for
so long.
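
For reference, a sketch of the HotSpot options this suggestion maps to on Sun JVMs of that era: the concurrent collector, the parallel new-generation collector, and the occupancy threshold. The helper name is hypothetical and the value 70 is only a placeholder to tune, not a recommendation from the thread.

```python
def cms_gc_args(initiating_occupancy_percent):
    """Build the GC-related JVM arguments for CMS paired with ParNew.

    initiating_occupancy_percent: tenured-space occupancy (1-100) at which
    a concurrent collection is started; lower values start it earlier.
    """
    if not 0 < initiating_occupancy_percent <= 100:
        raise ValueError("occupancy must be in the range 1-100")
    return [
        "-XX:+UseConcMarkSweepGC",   # concurrent low pause collector
        "-XX:+UseParNewGC",          # parallel new generation collector
        f"-XX:CMSInitiatingOccupancyFraction={initiating_occupancy_percent}",
    ]


# Placeholder value; adjust while watching the GC logs.
print(" ".join(cms_gc_args(70)))
```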

If you have some success tuning, report back with your results if you could.

- Mark

Jonathan Ariel wrote:
> Right, now I'm giving it 12GB of heap memory.
> If I give it less (10GB) it throws the following exception:
> Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: Java heap space
> at
> at
> at
> at
> org.apache.solr.request.SimpleFacets.getFieldCacheCounts(
> at
> org.apache.solr.request.SimpleFacets.getTermCounts(
> at
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(
> at
> org.apache.solr.request.SimpleFacets.getFacetCounts(
> at
> org.apache.solr.handler.component.FacetComponent.process(
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(
> at org.apache.solr.core.SolrCore.execute(
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
> at
> org.mortbay.jetty.servlet.ServletHandler.handle(
> at
> at
> org.mortbay.jetty.servlet.SessionHandler.handle(
> at
> org.mortbay.jetty.handler.ContextHandler.handle(
> at
> org.mortbay.jetty.webapp.WebAppContext.handle(
> at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(
> at
> org.mortbay.jetty.handler.HandlerCollection.handle(
> at
> org.mortbay.jetty.handler.HandlerWrapper.handle(
> at org.mortbay.jetty.Server.handle(
> at
> org.mortbay.jetty.HttpConnection.handleRequest(
> at
> org.mortbay.jetty.HttpConnection$RequestHandler.content(
> at org.mortbay.jetty.HttpParser.parseNext(
> at org.mortbay.jetty.HttpParser.parseAvailable(
> at org.mortbay.jetty.HttpConnection.handle(
> at
> at
> org.mortbay.thread.BoundedThreadPool$
> On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley
> wrote:
>> On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel 
>> wrote:
>>> Hi to all!
>>> Lately my solr servers seem to stop responding once in a while. I'm using
>>> solr 1.3.
>>> Of course I'm having more traffic on the servers.
>>> So I logged the Garbage Collection activity to check if it's because of
>>> that. It seems like 11% of the time the application runs, it is stopped
>>> because of GC. And some times the GC takes up to 10 seconds!
>>> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
>>> servers. My index is around 10GB and I'm giving to the instances 10GB of
>>> RAM.
>> Bigger heaps lead to bigger GC pauses in general.
>> Do you mean that you are giving the JVM a 10GB heap?  Were you getting
>> OOM exceptions with a smaller heap?
>> -Yonik

RE: Solr and Garbage Collection

2009-09-25 Thread cbennett

Have you looked at tuning the garbage collection?

Take a look at the following articles

Changing to the concurrent or throughput collector should help with the long
pauses.


-----Original Message-----
From: Jonathan Ariel [] 
Sent: Friday, September 25, 2009 11:37 AM
Subject: Re: Solr and Garbage Collection

Right, now I'm giving it 12GB of heap memory.
If I give it less (10GB) it throws the following exception:

Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.core.SolrCore.execute(
at org.mortbay.jetty.Server.handle(
at org.mortbay.jetty.HttpParser.parseNext(
at org.mortbay.jetty.HttpParser.parseAvailable(
at org.mortbay.jetty.HttpConnection.handle(

On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley

> On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel 
> wrote:
> > Hi to all!
> > Lately my solr servers seem to stop responding once in a while. I'm using
> > solr 1.3.
> > Of course I'm having more traffic on the servers.
> > So I logged the Garbage Collection activity to check if it's because of
> > that. It seems like 11% of the time the application runs, it is stopped
> > because of GC. And some times the GC takes up to 10 seconds!
> > Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
> > servers. My index is around 10GB and I'm giving to the instances 10GB of
> > RAM.
> Bigger heaps lead to bigger GC pauses in general.
> Do you mean that you are giving the JVM a 10GB heap?  Were you getting
> OOM exceptions with a smaller heap?
> -Yonik

Re: Solr and Garbage Collection

2009-09-25 Thread Jonathan Ariel
Right, now I'm giving it 12GB of heap memory.
If I give it less (10GB) it throws the following exception:

Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.core.SolrCore.execute(
at org.mortbay.jetty.Server.handle(
at org.mortbay.jetty.HttpParser.parseNext(
at org.mortbay.jetty.HttpParser.parseAvailable(
at org.mortbay.jetty.HttpConnection.handle(

On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley

> On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel 
> wrote:
> > Hi to all!
> > Lately my solr servers seem to stop responding once in a while. I'm using
> > solr 1.3.
> > Of course I'm having more traffic on the servers.
> > So I logged the Garbage Collection activity to check if it's because of
> > that. It seems like 11% of the time the application runs, it is stopped
> > because of GC. And some times the GC takes up to 10 seconds!
> > Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
> > servers. My index is around 10GB and I'm giving to the instances 10GB of
> > RAM.
> Bigger heaps lead to bigger GC pauses in general.
> Do you mean that you are giving the JVM a 10GB heap?  Were you getting
> OOM exceptions with a smaller heap?
> -Yonik

Re: Solr and Garbage Collection

2009-09-25 Thread Yonik Seeley
On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel  wrote:
> Hi to all!
> Lately my solr servers seem to stop responding once in a while. I'm using
> solr 1.3.
> Of course I'm having more traffic on the servers.
> So I logged the Garbage Collection activity to check if it's because of
> that. It seems like 11% of the time the application runs, it is stopped
> because of GC. And some times the GC takes up to 10 seconds!
> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon
> servers. My index is around 10GB and I'm giving to the instances 10GB of
> RAM.

Bigger heaps lead to bigger GC pauses in general.
Do you mean that you are giving the JVM a 10GB heap?  Were you getting
OOM exceptions with a smaller heap?
