RE: Solr and Garbage Collection
Master-slave replication: new caches will be warmed and prepopulated _before_ the new IndexReader is made available for new requests, and _before_ the old one is discarded. This means the theoretical sizing for the FieldCache (which is determined by the number of documents in the index and the cardinality of a field) should be doubled. Of course, we need to play with GC options too, for performance tuning (mostly).

> I read pretty much all posts on this thread (before and after this one).
> Looks like the main suggestion from you and others is to keep max heap size
> (-Xmx) as small as possible (as long as you don't see OOM exception).

I suggested the absolute opposite; please note also that "as small as possible" has no meaning in a multiuser environment such as Tomcat. It depends on query types (10 documents per request, or maybe 1?), AND it depends on average server load (one concurrent request, or maybe 200 threads trying to deal with 2000 concurrent requests?), AND it depends on whether it is a master (used for updates, parsing tons of documents in a single file?), AND it depends on unpredictable memory fragmentation. It all depends on the use case too, in addition to schema / index size.

Please note also that such stuff depends on the JVM vendor too: what if it precompiles everything into CPU-native code (including memory deallocation after each call)? Some do!

-Fuad
http://www.linkedin.com/in/liferay

> ...but 'core' constantly disagrees with me :)
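A rough way to see why the warming overlap matters for sizing is a back-of-envelope FieldCache estimate. All numbers below are illustrative assumptions, not figures from this thread:

```shell
# Hypothetical sizing sketch: a single-valued numeric sort field costs
# roughly one slot per document in the Lucene FieldCache.
NUM_DOCS=10000000          # assumed index size: 10M documents
BYTES_PER_DOC=8            # assumed: one long per doc for the sorted field
one_reader=$((NUM_DOCS * BYTES_PER_DOC))
# While the new searcher warms, the old and new IndexReaders are both live,
# so the peak is roughly double the steady-state figure:
peak=$((one_reader * 2))
echo "steady state: $((one_reader / 1048576)) MB, during warmup: $((peak / 1048576)) MB"
```

String fields add the unique term values on top of this, so treat it as a lower bound.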
Re: Solr and Garbage Collection
> Hi,
>
> I read pretty much all posts on this thread (before and after this one).
> Looks like the main suggestion from you and others is to keep max heap size
> (-Xmx) as small as possible (as long as you don't see OOM exception). This
> brings more questions than answers (for me at least; I'm new to Solr).
>
> First, our environment and the problem encountered: Solr 1.4 (nightly
> build, downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on
> Solaris (multi-CPU/cores). The cache settings are from the default
> solrconfig.xml (they look very small). At first we used minimal JAVA_OPTS
> and quickly ran into a problem similar to the one the original poster
> reported: long pauses (seconds to minutes) under load test. jconsole showed
> that it pauses on GC. So more JAVA_OPTS got added: "-XX:+UseConcMarkSweepGC
> -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2
> -XX:NewSize=128m -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200", the
> thinking being that with multiple CPUs/cores we can get GC over with as
> quickly as possible. With the new setup, it works fine until Tomcat reaches
> the heap size; then it blocks and takes minutes on "full GC" to get more
> space from the tenured generation. We tried different Xmx values (from very
> small to large) with no difference in the long GC time. We never ran into
> OOM.

MaxGCPauseMillis doesn't work with UseConcMarkSweepGC; it's for use with the
Parallel collector. That also doesn't look like a good SurvivorRatio.
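To make the correction concrete, here is a sketch of two internally consistent flag sets. The heap and generation sizes are placeholders, not tuned recommendations; the point is that MaxGCPauseMillis is a pause-time goal for the throughput (Parallel) collector and is dropped from the CMS line:

```shell
# CMS line: no MaxGCPauseMillis (it is ignored there); sizes illustrative.
CMS_OPTS="-Xmx1g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:NewSize=256m -XX:MaxNewSize=256m -XX:SurvivorRatio=8"

# Parallel collector line: this is where a pause-time goal belongs.
PARALLEL_OPTS="-Xmx1g -XX:+UseParallelGC -XX:MaxGCPauseMillis=200"

echo "CMS:      $CMS_OPTS"
echo "Parallel: $PARALLEL_OPTS"
```

You would put one of these (not both) into JAVA_OPTS for Tomcat.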
Re: Solr and Garbage Collection
> Questions:
>
> * In general various cachings are good for performance; we have more RAM
> to use and want to use more caching to boost performance. Isn't your
> suggestion (of lowering the heap limit) going against that?

Leaving RAM for the filesystem cache is also very important. But you should
also have enough RAM for your Solr caches, of course.

> * Looks like Solr caching made its way into the tenured generation on the
> heap; that's good. But why does it get GC'ed eventually? I did a quick
> check of Solr code (Solr 1.3, not 1.4), and see a single instance of using
> WeakReference. Is that what is causing all this? This seems to suggest a
> design flaw in Solr's memory management strategy (or just my ignorance
> about Solr?). I mean, wouldn't this be the "right" way of doing it: you
> allow the user to specify the cache size in solrconfig.xml, then the user
> can set up the heap limit in JAVA_OPTS accordingly, and there is no need
> to use WeakReference (BTW, why not SoftReference)?

Do you see concurrent mode failure when looking at your gc logs? i.e.:

174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.618
secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
4.0975124 secs] 228336K->162118K(241520K)
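A sketch of how you might hunt for those lines: turn on GC logging (standard Java 6 HotSpot flags) and grep the log. The log path is an assumption, and the sample line written below is just a stand-in so the scan has something to run against:

```shell
# Flags you would add to JAVA_OPTS to get a parseable GC log:
GC_LOG=gc.log                      # assumed path
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$GC_LOG"

# Stand-in log content (the line quoted in the thread), written here only
# so the grep below demonstrates the scan:
echo '174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.618 secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K), 4.0975124 secs] 228336K->162118K(241520K)' > "$GC_LOG"

failures=$(grep -c 'concurrent mode failure' "$GC_LOG")
echo "concurrent mode failures seen: $failures"
```

Any nonzero count under load means CMS is losing the race and falling back to stop-the-world collections.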
Re: Solr and Garbage Collection
Do you see concurrent mode failure when looking at your gc logs? i.e.:

174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.618
secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K),
4.0975124 secs] 228336K->162118K(241520K)

That means you are still getting major collections with CMS, and you don't
want that. You might try kicking GC off earlier with something like:
-XX:CMSInitiatingOccupancyFraction=50

> * Right now I have a single Tomcat hosting Solr and other applications. I
> guess now it's better to have Solr on its own Tomcat, given that it's
> tricky to adjust the Java options.
>
> thanks.
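A sketch of that suggestion as a flag set. The 50 is the value suggested above, not a universal recommendation, and UseCMSInitiatingOccupancyOnly (a real HotSpot flag, added here as an assumption about intent) keeps the JVM from replacing the threshold with its own runtime estimate:

```shell
# Start the CMS concurrent cycle when the old generation is 50% full, so it
# can finish before the generation fills and forces a stop-the-world GC.
CMS_EARLY_OPTS="-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly"
echo "$CMS_EARLY_OPTS"
```

Starting earlier trades some CPU (more concurrent GC work) for fewer concurrent mode failures.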
Re: Solr and Garbage Collection
> From: wun...@wunderwood.org
> To: solr-user@lucene.apache.org
> Subject: RE: Solr and Garbage Collection
> Date: Fri, 25 Sep 2009 09:51:29 -0700
>
> 30ms is not better or worse than 1s until you look at the service
> requirements. For many applications, it is worth dedicating 10% of your
> processing time to GC if that makes the worst-case pause short.
>
> On the other hand, my experience with the IBM JVM was that the maximum
> query rate was 2-3X better with the concurrent generational GC compared to
> any of their other GC algorithms, so we got the best throughput along with
> the shortest pauses.
>
> Solr garbage generation (for queries) seems to have two major components:
> per-request garbage and cache evictions. With a generational collector,
> these two are handled by separate parts of the collector. Per-request
> garbage should completely fit in the short-term heap (nursery), so that it
> can be collected rapidly and returned to use for further requests. If the
> nursery is too small, the per-request allocations will be made in tenured
> space and sit there until the next major GC. Cache evictions are almost
> always in long-term storage (tenured space) because an LRU algorithm
> guarantees that the garbage will be old.
>
> Check the growth rate of tenured space (under constant load, of course)
> while increasing the size of the nursery. That rate should drop when the
> nursery gets big enough, then not drop much further as it is increased
> more. After that, reduce the size of tenured space until major GCs start
> happening "too often" (a judgment call). A bigger tenured space means
> longer major GCs and thus longer pauses, so you don't want it oversized by
> too much.
>
> Also check the hit rates of your caches. If the hit rate is low, say 20%
> or less, make that cache much bigger or set it to zero. Either one will
> reduce the number of cache evictions. If you have an HTTP cache in front
> of Solr, zero may be the right choice, since the HTTP cache is
> cherry-picking the easily cacheable requests.
>
> Note that a commit nearly doubles the memory required, because you have
> two live Searcher objects with all their caches. Make sure you have
> headroom for a commit.
>
> If you want to test the tenured space usage, you must test with real-world
> queries. Those are the only way to get accurate cache eviction rates.
>
> wunder

--
- Mark

http://www.lucidimagination.com
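The commit-headroom point lends itself to quick arithmetic. A sketch with made-up numbers (measure your own steady-state figures before trusting anything like this):

```shell
# Assumed steady-state figures (placeholders):
CACHE_MB=300          # Solr caches for one live Searcher
PER_REQUEST_MB=200    # nursery sized to hold per-request garbage
# During a commit/warmup window two Searchers (and their caches) are live,
# so the cache term roughly doubles:
heap_floor=$((CACHE_MB * 2 + PER_REQUEST_MB))
echo "rough heap floor during a commit: ${heap_floor} MB"
```

If -Xmx sits below this kind of floor, every commit will push the old generation toward a major collection.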
Re: Solr and Garbage Collection
Sun has recently clarified the issue regarding "unsupported unless you pay"
for the G1 garbage collector. Here is the updated release of Java 6 update
14:

http://java.sun.com/javase/6/webnotes/6u14.html

G1 will be part of Java 7, fully supported without pay. The version included
in Java 6 update 14 is a beta release. Since it is beta, Sun does not
recommend using it unless you have a support contract, because as with any
beta software there will be bugs. Non-paying customers may very well have to
wait for the official version in Java 7 for bug fixes.

Here is more info on the G1 garbage collector:

http://java.sun.com/javase/technologies/hotspot/gc/g1_intro.jsp

Bill

On Sat, Oct 3, 2009 at 1:28 PM, Mark Miller wrote:
> Another option of course, if you're using a recent version of Java 6:
> try out the beta-ish, unsupported-unless-you-pay G1 garbage collector.
> I've only recently started playing with it, but it's supposed to be much
> better than CMS. It supposedly has much better throughput, it's much
> better at dealing with fragmentation issues (CMS is actually pretty bad
> with fragmentation, come to find out), and overall it's just supposed to
> be a very nice leap ahead in GC. I haven't had a chance to play with it
> much myself, but it's supposed to be fantastic. A whole new approach to
> generational collection for Sun, and much closer to the "real time" GCs
> available from some other vendors.
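For anyone wanting to try it on 6u14: the beta G1 has to be explicitly unlocked before the JVM will accept it, per the 6u14 release notes. A minimal sketch of the flags involved:

```shell
# G1 in Java 6u14 is experimental; UnlockExperimentalVMOptions must come
# before UseG1GC or the JVM rejects the flag. Append to JAVA_OPTS to use.
G1_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC"
echo "$G1_OPTS"
```

On Java 7 and later, G1 is a supported collector and needs no unlock flag.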
Re: Solr and Garbage Collection
Another option of course, if you're using a recent version of Java 6: try
out the beta-ish, unsupported-unless-you-pay G1 garbage collector. I've only
recently started playing with it, but it's supposed to be much better than
CMS. It supposedly has much better throughput, it's much better at dealing
with fragmentation issues (CMS is actually pretty bad with fragmentation,
come to find out), and overall it's just supposed to be a very nice leap
ahead in GC. I haven't had a chance to play with it much myself, but it's
supposed to be fantastic. A whole new approach to generational collection
for Sun, and much closer to the "real time" GCs available from some other
vendors.
Re: Solr and Garbage Collection
siping liu wrote: > Hi, > > I read pretty much all posts on this thread (before and after this one). > Looks like the main suggestion from you and others is to keep max heap size > (-Xmx) as small as possible (as long as you don't see OOM exception). This > brings more questions than answers (for me at least. I'm new to Solr). > > > > First, our environment and problem encountered: Solr1.4 (nightly build, > downloaded about 2 months ago), Sun JDK1.6, Tomcat 5.5, running on > Solaris(multi-cpu/cores). The cache setting is from the default > solrconfig.xml (looks very small). At first we used minimum JAVA_OPTS and > quickly run into the problem similar to the one orignal poster reported -- > long pause (seconds to minutes) under load test. jconsole showed that it > pauses on GC. So more JAVA_OPTS get added: "-XX:+UseConcMarkSweepGC > -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m > -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200", the thinking is with > mutile-cpu/cores we can get over with GC as quickly as possibe. With the new > setup, it works fine until Tomcat reaches heap size, then it blocks and takes > minutes on "full GC" to get more space from "tenure generation". We tried > different Xmx (from very small to large), no difference in long GC time. We > never run into OOM. > MaxGCPauseMillis doesnt work with UseConcMarkSweepGC - its for use with the Parallel collector. That also doesnt look like a good survivorratio. > > > Questions: > > * In general various cachings are good for performance, we have more RAM to > use and want to use more caching to boost performance, isn't your suggestion > (of lowering heap limit) going against that? > Leaving RAM for the FileSystem cache is also very important. But you should also have enough RAM for your Solr caches of course. > * Looks like Solr caching made its way into tenure-generation on heap, that's > good. But why they get GC'ed eventually?? 
I did a quick check of Solr code > (Solr 1.3, not 1.4), and see a single instance of using WeakReference. Is > that what is causing all this? This seems to suggest a design flaw in Solr's > memory management strategy (or just my ignorance about Solr?). I mean, > wouldn't this be the "right" way of doing it -- you allow user to specify the > cache size in solrconfig.xml, then user can set up heap limit in JAVA_OPTS > accordingly, and no need to use WeakReference (BTW, why not SoftReference)?? > Do you see concurrent mode failure when looking at your gc logs? ie: 174.445: [GC 174.446: [ParNew: 66408K->66408K(66416K), 0.618 secs]174.446: [CMS (concurrent mode failure): 161928K->162118K(175104K), 4.0975124 secs] 228336K->162118K(241520K) That means you have still getting major collections with CMS, and you don't want that. You might try kicking GC off earlier with something like: -XX:CMSInitiatingOccupancyFraction=50 > * Right now I have a single Tomcat hosting Solr and other applications. I > guess now it's better to have Solr on its own Tomcat, given that it's tricky > to adjust the java options. > > > > thanks. > > > > >> From: wun...@wunderwood.org >> To: solr-user@lucene.apache.org >> Subject: RE: Solr and Garbage Collection >> Date: Fri, 25 Sep 2009 09:51:29 -0700 >> >> 30ms is not better or worse than 1s until you look at the service >> requirements. For many applications, it is worth dedicating 10% of your >> processing time to GC if that makes the worst-case pause short. >> >> On the other hand, my experience with the IBM JVM was that the maximum query >> rate was 2-3X better with the concurrent generational GC compared to any of >> their other GC algorithms, so we got the best throughput along with the >> shortest pauses. >> >> Solr garbage generation (for queries) seems to have two major components: >> per-request garbage and cache evictions. With a generational collector, >> these two are handled by separate parts of the collector. 
Per-request >> garbage should completely fit in the short-term heap (nursery), so that it >> can be collected rapidly and returned to use for further requests. If the >> nursery is too small, the per-request allocations will be made in tenured >> space and sit there until the next major GC. Cache evictions are almost >> always in long-term storage (tenured space) because an LRU algorithm >> guarantees that the garbage will be old. >> >> Check the growth rate of tenured space (under constant load, of course) >> while increasing the size of the nursery. That rate should drop when the >> nursery gets big enough, then not drop much further as it is increased more.
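The reply above suggests checking the GC log for "concurrent mode failure" events, which is easy to automate by scanning the log. A minimal sketch in Python, assuming the timestamped CMS line format quoted in that reply; the function name and regex are illustrative, not part of any JVM or Solr tooling:

```python
import re

# Matches timestamped CMS lines such as:
# 174.445: [GC ... [CMS (concurrent mode failure): 161928K->162118K(175104K), 4.0975124 secs] ...
CMF_PATTERN = re.compile(r'^(\d+\.\d+):.*concurrent mode failure.*?([\d.]+) secs')

def find_cmf_events(gc_log_lines):
    """Return a (timestamp_secs, pause_secs) pair for each concurrent mode failure."""
    events = []
    for line in gc_log_lines:
        m = CMF_PATTERN.search(line)
        if m:
            events.append((float(m.group(1)), float(m.group(2))))
    return events
```

Any line this reports means CMS lost the race and fell back to a stop-the-world major collection, which is exactly when kicking GC off earlier with -XX:CMSInitiatingOccupancyFraction is worth trying.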
RE: Solr and Garbage Collection
Hi, I read pretty much all posts on this thread (before and after this one). Looks like the main suggestion from you and others is to keep max heap size (-Xmx) as small as possible (as long as you don't see OOM exception). This brings more questions than answers (for me at least. I'm new to Solr). First, our environment and problem encountered: Solr1.4 (nightly build, downloaded about 2 months ago), Sun JDK1.6, Tomcat 5.5, running on Solaris (multi-CPU/core). The cache settings are from the default solrconfig.xml (they look very small). At first we used minimal JAVA_OPTS and quickly ran into a problem similar to the one the original poster reported -- long pauses (seconds to minutes) under load test. jconsole showed that it pauses on GC. So more JAVA_OPTS got added: "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m -XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200", the thinking being that with multiple CPUs/cores we can get through GC as quickly as possible. With the new setup, it works fine until Tomcat reaches the heap size, then it blocks and takes minutes on "full GC" to get more space from the "tenured generation". We tried different Xmx (from very small to large), no difference in long GC time. We never ran into OOM. Questions: * In general, various caches are good for performance; we have more RAM to use and want to use more caching to boost performance, so isn't your suggestion (of lowering the heap limit) going against that? * Looks like Solr caching made its way into the tenured generation on the heap, that's good. But why does it get GC'ed eventually?? I did a quick check of the Solr code (Solr 1.3, not 1.4), and see a single instance of using WeakReference. Is that what is causing all this? This seems to suggest a design flaw in Solr's memory management strategy (or just my ignorance about Solr?).
I mean, wouldn't this be the "right" way of doing it -- you allow the user to specify the cache size in solrconfig.xml, then the user can set the heap limit in JAVA_OPTS accordingly, and there is no need to use WeakReference (BTW, why not SoftReference)?? * Right now I have a single Tomcat hosting Solr and other applications. I guess now it's better to have Solr on its own Tomcat, given that it's tricky to adjust the Java options. thanks. > From: wun...@wunderwood.org > To: solr-user@lucene.apache.org > Subject: RE: Solr and Garbage Collection > Date: Fri, 25 Sep 2009 09:51:29 -0700 > > 30ms is not better or worse than 1s until you look at the service > requirements. For many applications, it is worth dedicating 10% of your > processing time to GC if that makes the worst-case pause short. > > On the other hand, my experience with the IBM JVM was that the maximum query > rate was 2-3X better with the concurrent generational GC compared to any of > their other GC algorithms, so we got the best throughput along with the > shortest pauses. > > Solr garbage generation (for queries) seems to have two major components: > per-request garbage and cache evictions. With a generational collector, > these two are handled by separate parts of the collector. Per-request > garbage should completely fit in the short-term heap (nursery), so that it > can be collected rapidly and returned to use for further requests. If the > nursery is too small, the per-request allocations will be made in tenured > space and sit there until the next major GC. Cache evictions are almost > always in long-term storage (tenured space) because an LRU algorithm > guarantees that the garbage will be old. > > Check the growth rate of tenured space (under constant load, of course) > while increasing the size of the nursery. That rate should drop when the > nursery gets big enough, then not drop much further as it is increased more. 
> > After that, reduce the size of tenured space until major GCs start happening > "too often" (a judgment call). A bigger tenured space means longer major GCs > and thus longer pauses, so you don't want it oversized by too much. > > Also check the hit rates of your caches. If the hit rate is low, say 20% or > less, make that cache much bigger or set it to zero. Either one will reduce > the number of cache evictions. If you have an HTTP cache in front of Solr, > zero may be the right choice, since the HTTP cache is cherry-picking the > easily cacheable requests. > > Note that a commit nearly doubles the memory required, because you have two > live Searcher objects with all their caches. Make sure you have headroom for > a commit. > > If you want to test the tenured space usage, you must test with real world > queries. Those are the only way to get accurate cache eviction rates. > > wunder
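wunder's tuning loop (watch the growth rate of tenured space under constant load while increasing the nursery) needs only a slope estimate over (time, tenured-occupancy) samples. A minimal sketch of that bookkeeping, assuming you collect the samples yourself, e.g. from the GC log or jstat; nothing here is a Solr API:

```python
def tenured_growth_rate(samples):
    """Least-squares slope of tenured occupancy (KB) over time (secs).

    samples: list of (timestamp_secs, tenured_kb) pairs taken under
    constant load. The returned KB/sec rate should drop as the nursery
    gets big enough, then level off as it is increased further.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_k = sum(k for _, k in samples) / n
    num = sum((t - mean_t) * (k - mean_k) for t, k in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den
```

Comparing this rate across nursery sizes gives the "stops dropping" point wunder describes, without eyeballing the raw log.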
RE: Solr and Garbage Collection
> Actually the CPU usage of the solr servers is almost insignificant (it was > like that before). >> The time spent on collecting memory dropped from 11% to 3.81% I even think that 3.81% of 5% is nothing (suspecting that SOLR uses about 5% CPU, mostly loading large field values into memory) :))) (it would be nice to run a multithreaded load-stress test instead of just waiting...) Most expensive query: faceting on all fields with a generic query like *:*
Re: Solr and Garbage Collection
One way to track expensive queries is to look at the query time, QTime, in the Solr log. There are a couple of tools for analyzing GC logs: http://www.tagtraum.com/gcviewer.html https://h20392.www2.hp.com/portal/swdepot/displayProductInfo.do?productNumber=HPJMETER They will give you the frequency and duration of minor and major collections. On a multi-processor/core system with CPU cycles to spare, using the concurrent collector will reduce (may even eliminate) major collections. The trade-off is that CPU utilization on the system will go up. When I tried it with one of my Java apps, the system utilization went up so much under heavy load that it reduced the overall throughput of my app. Your mileage may vary. You will have to measure it for your app to see for yourself. Bill On Mon, Sep 28, 2009 at 4:49 PM, Jonathan Ariel wrote: > How do you track major collections? Even better, how do you log your GC > behavior with details? Right now I just log total time spent on > collections, > but I don't really know on which collections. Regarding application performance > with the ConcMarkSweepGC, I think I didn't experience any impact for now. > Actually the CPU usage of the solr servers is almost insignificant (it was > like that before). > BTW, do you know a good way to track the N most expensive solr queries? I > would like to measure that on 2 different solr servers with different GC. > > On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller > wrote: > > > Do you have your GC logs? Are you still seeing major collections? > > > > Where is the time spent? > > > > Hard to say without some of that info. > > > > The goal of the low pause collector is to finish collecting before the > > tenured space is filled - if it doesn't, a standard major collection > > occurs. > > > > The collector will use recent stats it records to try and pick a good > > time to start - as a fail safe though, it will trigger no matter what at > > a certain percentage. With Java 1.5, it was 68% full that it triggered. 
> > With 1.6, it's 92%. > > > > If you're still getting major collections, you might want to see if > > lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not, > > you might be near optimal settings. > > > > There is likely not anything else you should mess with - unless using > > the extra thread to collect while your app is running affects your app's > > performance - in that case you might want to look into turning on the > > incremental mode. But you haven't mentioned that, so I doubt it. > > > > > > > > -- > > - Mark > > > > http://www.lucidimagination.com > > > > > > > > Jonathan Ariel wrote: > > > Ok... good news! Upgrading to the newest version of JVM 6 (update 6) > > seems > > > to solve this ugly bug. With the upgraded JVM I could run the solr > > servers > > > for more than 12 hours on the production environment with the GC > > mentioned > > > in the previous e-mails. The results are really amazing. The time spent > > on > > > collecting memory dropped from 11% to 3.81%. Do you think there is more > to > > > tune there? > > > > > > Thanks! > > > > > > Jonathan > > > > > > On Sun, Sep 27, 2009 at 8:39 PM, Bill Au wrote: > > > > > > > > >> You are running a very old version of Java 6 (update 6). The latest > is > > >> update 16. You should definitely upgrade. There is a bug in Java 6 > > >> starting with update 4 that may result in a corrupted Lucene/Solr > index: > > >> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 > > >> https://issues.apache.org/jira/browse/LUCENE-1282 > > >> > > >> The JVM crash occurred in the gc thread. So it looks like a bug in > the > > JVM > > >> itself. Upgrading to the latest release might help. Switching to a > > >> different garbage collector should help. > > >> > > >> Bill > > >> > > >> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller > > >> wrote: > > >> > > >> > > >>> Jonathan Ariel wrote: > > >>> > > Ok. 
After the server ran for more than 12 hours, the time spent on > GC > decreased from 11% to 3.4%, but 5 hours later it crashed. This is > the > > >>> thread > > >>> > > dump, maybe you can help identify what happened? > > > > > > >>> Well, that's a tough one ;) My guess is it's a bug :) > > >>> > > >>> Your two survivor spaces are filled, so it was likely about to move > > >>> objects into the tenured space, which still has plenty of room for > them > > >>> (barring horrible fragmentation). Any issues with that type of thing > > >>> should generate an OOM anyway though. You can find people that have > run > > >>> into similar issues in the past, but a lot of times unreproducible. > > >>> Usually, their bugs are closed and they are told to try a newer JVM. > > >>> > > >>> Your JVM appears to be quite a few versions back. There have been > many > > >>> garbage collection bugs fixed in the 7 or so updates since your > > version, > > >>> a good handful of them related to CMS. > > >>> > > >>> If you can, my best suggestion at the moment is to upgrade to the
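For Jonathan's question about tracking the N most expensive queries, Bill's QTime pointer can be scripted. A minimal sketch, assuming the standard QTime=<millis> token Solr prints in its request log (the exact surrounding line format varies by Solr version and logging setup):

```python
import re

# Solr request-log lines carry a "QTime=<millis>" token per request.
QTIME = re.compile(r'QTime=(\d+)')

def top_expensive_queries(log_lines, n=10):
    """Return the n slowest requests as (qtime_ms, log_line), slowest first."""
    scored = []
    for line in log_lines:
        m = QTIME.search(line)
        if m:
            scored.append((int(m.group(1)), line.strip()))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

Running this over the logs of two servers (one per GC setting) gives a like-for-like comparison of the worst requests under each collector.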
Re: Solr and Garbage Collection
Another good option. Here is a comparison of the commands I replied with and this one: http://docs.hp.com/en/5992-5899/ch06s02.html Very similar. Otis Gospodnetic wrote: > Jonathan, > > Here is the JVM argument for logging GC activity: > > -Xloggc:<file> (log GC status to a file with time stamps) > > Otis > -- > Sematext is hiring -- http://sematext.com/about/jobs.html?mls > Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR > > > > - Original Message > >> From: Jonathan Ariel >> To: solr-user@lucene.apache.org >> Sent: Monday, September 28, 2009 4:49:03 PM >> Subject: Re: Solr and Garbage Collection >> >> How do you track major collections? Even better, how do you log your GC >> behavior with details? Right now I just log total time spent on collections, >> but I don't really know on which collections. Regarding application performance >> with the ConcMarkSweepGC, I think I didn't experience any impact for now. >> Actually the CPU usage of the solr servers is almost insignificant (it was >> like that before). >> BTW, do you know a good way to track the N most expensive solr queries? I >> would like to measure that on 2 different solr servers with different GC. >> >> On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller wrote: >> >> >>> Do you have your GC logs? Are you still seeing major collections? >>> >>> Where is the time spent? >>> >>> Hard to say without some of that info. >>> >>> The goal of the low pause collector is to finish collecting before the >>> tenured space is filled - if it doesn't, a standard major collection >>> occurs. >>> >>> The collector will use recent stats it records to try and pick a good >>> time to start - as a fail safe though, it will trigger no matter what at >>> a certain percentage. With Java 1.5, it was 68% full that it triggered. >>> With 1.6, it's 92%. >>> >>> If you're still getting major collections, you might want to see if >>> lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not, >>> you might be near optimal settings. 
>>> >>> There is likely not anything else you should mess with - unless using >>> the extra thread to collect while your app is running affects your app's >>> performance - in that case you might want to look into turning on the >>> incremental mode. But you haven't mentioned that, so I doubt it. >>> >>> >>> >>> -- >>> - Mark >>> >>> http://www.lucidimagination.com >>> >>> >>> >>> Jonathan Ariel wrote: >>> >>>> Ok... good news! Upgrading to the newest version of JVM 6 (update 6) >>>> >>> seems >>> >>>> to solve this ugly bug. With the upgraded JVM I could run the solr >>>> >>> servers >>> >>>> for more than 12 hours on the production environment with the GC >>>> >>> mentioned >>> >>>> in the previous e-mails. The results are really amazing. The time spent >>>> >>> on >>> >>>> collecting memory dropped from 11% to 3.81%. Do you think there is more to >>>> tune there? >>>> >>>> Thanks! >>>> >>>> Jonathan >>>> >>>> On Sun, Sep 27, 2009 at 8:39 PM, Bill Au wrote: >>>> >>>> >>>>> You are running a very old version of Java 6 (update 6). The latest is >>>>> update 16. You should definitely upgrade. There is a bug in Java 6 >>>>> starting with update 4 that may result in a corrupted Lucene/Solr index: >>>>> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 >>>>> https://issues.apache.org/jira/browse/LUCENE-1282 >>>>> >>>>> The JVM crash occurred in the gc thread. So it looks like a bug in the >>>>> >>> JVM >>> >>>>> itself. Upgrading to the latest release might help. Switching to a >>>>> different garbage collector should help. >>>>> >>>>> Bill >>>>> >>>>> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller >>>>> wrote: >>>>> >>>>> >>>>> >>>>>> Jonathan Ariel wrote: >>>>>>
Re: Solr and Garbage Collection
-verbose:gc gives output like: [GC 325407K->83000K(776768K), 0.2300771 secs] [GC 325816K->83372K(776768K), 0.2454258 secs] [Full GC 267628K->83769K(776768K), 1.8479984 secs] Additional details with -XX:+PrintGCDetails: [GC [DefNew: 64575K->959K(64576K), 0.0457646 secs] 196016K->133633K(261184K), 0.0459067 secs] And timestamps with -XX:+PrintGCTimeStamps: 111.042: [GC 111.042: [DefNew: 8128K->8128K(8128K), 0.505 secs]111.042: [Tenured: 18154K->2311K(24576K), 0.1290354 secs] 26282K->2311K(32704K), 0.1293306 secs] Jonathan Ariel wrote: > How do you track major collections? Even better, how do you log your GC > behavior with details? Right now I just log total time spent on collections, > but I don't really know on which collections. Regarding application performance > with the ConcMarkSweepGC, I think I didn't experience any impact for now. > Actually the CPU usage of the solr servers is almost insignificant (it was > like that before). > BTW, do you know a good way to track the N most expensive solr queries? I > would like to measure that on 2 different solr servers with different GC. > > On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller wrote: > > >> Do you have your GC logs? Are you still seeing major collections? >> >> Where is the time spent? >> >> Hard to say without some of that info. >> >> The goal of the low pause collector is to finish collecting before the >> tenured space is filled - if it doesn't, a standard major collection >> occurs. >> >> The collector will use recent stats it records to try and pick a good >> time to start - as a fail safe though, it will trigger no matter what at >> a certain percentage. With Java 1.5, it was 68% full that it triggered. >> With 1.6, it's 92%. >> >> If you're still getting major collections, you might want to see if >> lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not, >> you might be near optimal settings. 
>> >> There is likely not anything else you should mess with - unless using >> the extra thread to collect while your app is running affects your app's >> performance - in that case you might want to look into turning on the >> incremental mode. But you haven't mentioned that, so I doubt it. >> >> >> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> >> >> Jonathan Ariel wrote: >> >>> Ok... good news! Upgrading to the newest version of JVM 6 (update 6) >>> >> seems >> >>> to solve this ugly bug. With the upgraded JVM I could run the solr >>> >> servers >> >>> for more than 12 hours on the production environment with the GC >>> >> mentioned >> >>> in the previous e-mails. The results are really amazing. The time spent >>> >> on >> >>> collecting memory dropped from 11% to 3.81%. Do you think there is more to >>> tune there? >>> >>> Thanks! >>> >>> Jonathan >>> >>> On Sun, Sep 27, 2009 at 8:39 PM, Bill Au wrote: >>> >>> >>> You are running a very old version of Java 6 (update 6). The latest is update 16. You should definitely upgrade. There is a bug in Java 6 starting with update 4 that may result in a corrupted Lucene/Solr index: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 https://issues.apache.org/jira/browse/LUCENE-1282 The JVM crash occurred in the gc thread. So it looks like a bug in the >> JVM >> itself. Upgrading to the latest release might help. Switching to a different garbage collector should help. Bill On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller wrote: > Jonathan Ariel wrote: > > >> Ok. After the server ran for more than 12 hours, the time spent on GC >> decreased from 11% to 3.4%, but 5 hours later it crashed. This is the >> >> > thread > > >> dump, maybe you can help identify what happened? >> >> >> > Well, that's a tough one ;) My guess is it's a bug :) > > Your two survivor spaces are filled, so it was likely about to move > objects into the tenured space, which still has plenty of room for them > (barring horrible fragmentation). 
Any issues with that type of thing > should generate an OOM anyway though. You can find people that have run > into similar issues in the past, but a lot of times unreproducible. > Usually, their bugs are closed and they are told to try a newer JVM. > > Your JVM appears to be quite a few versions back. There have been many > garbage collection bugs fixed in the 7 or so updates since your > >> version, >> > a good handful of them related to CMS. > > If you can, my best suggestion at the moment is to upgrade to the > >> latest >> > and see how that fares. > > If not, you might see if going back to the throughput collector and > turning on the parallel tenured space collector might meet your needs
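The -verbose:gc lines quoted earlier in this message can be summarized mechanically to answer "am I still seeing major collections, and how long are they?". A minimal sketch that handles only the plain -verbose:gc format (the nested -XX:+PrintGCDetails format would need a smarter parser); the names are illustrative:

```python
import re

# Handles plain -verbose:gc lines like:
#   [GC 325407K->83000K(776768K), 0.2300771 secs]
#   [Full GC 267628K->83769K(776768K), 1.8479984 secs]
PAUSE = re.compile(r'\[(Full GC|GC) [^,\[]*, ([\d.]+) secs\]')

def pause_summary(gc_log_lines):
    """Total pause seconds split into minor (GC) and major (Full GC) collections."""
    totals = {'minor_secs': 0.0, 'full_secs': 0.0}
    for line in gc_log_lines:
        for kind, secs in PAUSE.findall(line):
            key = 'full_secs' if kind == 'Full GC' else 'minor_secs'
            totals[key] += float(secs)
    return totals
```

A rising full_secs total under constant load is the signal, per the advice quoted above, to lower -XX:CMSInitiatingOccupancyFraction or revisit heap sizing.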
Re: Solr and Garbage Collection
Jonathan, Here is the JVM argument for logging GC activity: -Xloggc:<file> (log GC status to a file with time stamps) Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message > From: Jonathan Ariel > To: solr-user@lucene.apache.org > Sent: Monday, September 28, 2009 4:49:03 PM > Subject: Re: Solr and Garbage Collection > > How do you track major collections? Even better, how do you log your GC > behavior with details? Right now I just log total time spent on collections, > but I don't really know on which collections. Regarding application performance > with the ConcMarkSweepGC, I think I didn't experience any impact for now. > Actually the CPU usage of the solr servers is almost insignificant (it was > like that before). > BTW, do you know a good way to track the N most expensive solr queries? I > would like to measure that on 2 different solr servers with different GC. > > On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller wrote: > > > Do you have your GC logs? Are you still seeing major collections? > > > > Where is the time spent? > > > > Hard to say without some of that info. > > > > The goal of the low pause collector is to finish collecting before the > > tenured space is filled - if it doesn't, a standard major collection > > occurs. > > > > The collector will use recent stats it records to try and pick a good > > time to start - as a fail safe though, it will trigger no matter what at > > a certain percentage. With Java 1.5, it was 68% full that it triggered. > > With 1.6, it's 92%. > > > > If you're still getting major collections, you might want to see if > > lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not, > > you might be near optimal settings. 
> > > > There is likely not anything else you should mess with - unless using > > the extra thread to collect while your app is running affects your app's > > performance - in that case you might want to look into turning on the > > incremental mode. But you haven't mentioned that, so I doubt it. > > > > > > > > -- > > - Mark > > > > http://www.lucidimagination.com > > > > > > > > Jonathan Ariel wrote: > > > Ok... good news! Upgrading to the newest version of JVM 6 (update 6) > > seems > > > to solve this ugly bug. With the upgraded JVM I could run the solr > > servers > > > for more than 12 hours on the production environment with the GC > > mentioned > > > in the previous e-mails. The results are really amazing. The time spent > > on > > > collecting memory dropped from 11% to 3.81%. Do you think there is more to > > > tune there? > > > > > > Thanks! > > > > > > Jonathan > > > > > > On Sun, Sep 27, 2009 at 8:39 PM, Bill Au wrote: > > > > > > > > >> You are running a very old version of Java 6 (update 6). The latest is > > >> update 16. You should definitely upgrade. There is a bug in Java 6 > > >> starting with update 4 that may result in a corrupted Lucene/Solr index: > > >> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 > > >> https://issues.apache.org/jira/browse/LUCENE-1282 > > >> > > >> The JVM crash occurred in the gc thread. So it looks like a bug in the > > JVM > > >> itself. Upgrading to the latest release might help. Switching to a > > >> different garbage collector should help. > > >> > > >> Bill > > >> > > >> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller > > >> wrote: > > >> > > >> > > >>> Jonathan Ariel wrote: > > >>> > > >>>> Ok. After the server ran for more than 12 hours, the time spent on GC > > >>>> decreased from 11% to 3.4%, but 5 hours later it crashed. This is the > > >>>> > > >>> thread > > >>> > > >>>> dump, maybe you can help identify what happened? 
> > >>>> > > >>>> > > >>> Well, that's a tough one ;) My guess is it's a bug :) > > >>> > > >>> Your two survivor spaces are filled, so it was likely about to move > > >>> objects into the tenured space, which still has plenty of room for them > > >>> (barring horrible fragmentation). Any issues with that type of thing > > >>> should generate an OOM anyway though. You can find people that have
Re: Solr and Garbage Collection
How do you track major collections? Even better, how do you log your GC behavior with details? Right now I just log total time spent on collections, but I don't really know on which collections. Regarding application performance with the ConcMarkSweepGC, I think I didn't experience any impact for now. Actually the CPU usage of the solr servers is almost insignificant (it was like that before). BTW, do you know a good way to track the N most expensive solr queries? I would like to measure that on 2 different solr servers with different GC. On Mon, Sep 28, 2009 at 4:42 PM, Mark Miller wrote: > Do you have your GC logs? Are you still seeing major collections? > > Where is the time spent? > > Hard to say without some of that info. > > The goal of the low pause collector is to finish collecting before the > tenured space is filled - if it doesn't, a standard major collection > occurs. > > The collector will use recent stats it records to try and pick a good > time to start - as a fail safe though, it will trigger no matter what at > a certain percentage. With Java 1.5, it was 68% full that it triggered. > With 1.6, it's 92%. > > If you're still getting major collections, you might want to see if > lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not, > you might be near optimal settings. > > There is likely not anything else you should mess with - unless using > the extra thread to collect while your app is running affects your app's > performance - in that case you might want to look into turning on the > incremental mode. But you haven't mentioned that, so I doubt it. > > > > -- > - Mark > > http://www.lucidimagination.com > > > > Jonathan Ariel wrote: > > Ok... good news! Upgrading to the newest version of JVM 6 (update 6) > seems > > to solve this ugly bug. With the upgraded JVM I could run the solr > servers > > for more than 12 hours on the production environment with the GC > mentioned > > in the previous e-mails. The results are really amazing. 
The time spent > on > > collecting memory dropped from 11% to 3.81%. Do you think there is more to > > tune there? > > > > Thanks! > > > > Jonathan > > > > On Sun, Sep 27, 2009 at 8:39 PM, Bill Au wrote: > > > > > >> You are running a very old version of Java 6 (update 6). The latest is > >> update 16. You should definitely upgrade. There is a bug in Java 6 > >> starting with update 4 that may result in a corrupted Lucene/Solr index: > >> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 > >> https://issues.apache.org/jira/browse/LUCENE-1282 > >> > >> The JVM crash occurred in the gc thread. So it looks like a bug in the > JVM > >> itself. Upgrading to the latest release might help. Switching to a > >> different garbage collector should help. > >> > >> Bill > >> > >> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller > >> wrote: > >> > >> > >>> Jonathan Ariel wrote: > >>> > Ok. After the server ran for more than 12 hours, the time spent on GC > decreased from 11% to 3.4%, but 5 hours later it crashed. This is the > > >>> thread > >>> > dump, maybe you can help identify what happened? > > > >>> Well, that's a tough one ;) My guess is it's a bug :) > >>> > >>> Your two survivor spaces are filled, so it was likely about to move > >>> objects into the tenured space, which still has plenty of room for them > >>> (barring horrible fragmentation). Any issues with that type of thing > >>> should generate an OOM anyway though. You can find people that have run > >>> into similar issues in the past, but a lot of times unreproducible. > >>> Usually, their bugs are closed and they are told to try a newer JVM. > >>> > >>> Your JVM appears to be quite a few versions back. There have been many > >>> garbage collection bugs fixed in the 7 or so updates since your > version, > >>> a good handful of them related to CMS. > >>> > >>> If you can, my best suggestion at the moment is to upgrade to the > latest > >>> and see how that fares. 
> >>> > >>> If not, you might see if going back to the throughput collector and > >>> turning on the parallel tenured space collector might meet your needs > >>> instead. You can work with other params to get that going better if you > >>> have to as well. > >>> > >>> Also, adjusting other settings with the low pause collector might > >>> trigger something to side step the bug. Not a great option there though > >>> > >> ;) > >> > >>> How many unique fields are you sorting/faceting on? It must be a lot if > >>> you need 10 gig for 8 million documents. It's kind of rough to have to > >>> work at such a close limit to your total heap available as a min mem > >>> requirement. > >>> > >>> -- > >>> - Mark > >>> > >>> http://www.lucidimagination.com > >>> > >>> > >>> > # > # An unexpected error has been detected by Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928 > # > # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode > >>>
Re: Solr and Garbage Collection
Do you have your GC logs? Are you still seeing major collections? Where is the time spent? Hard to say without some of that info. The goal of the low pause collector is to finish collecting before the tenured space is filled - if it doesn't, a standard major collection occurs. The collector will use recent stats it records to try and pick a good time to start - as a fail safe though, it will trigger no matter what at a certain percentage. With Java 1.5, it was 68% full that it triggered. With 1.6, it's 92%. If you're still getting major collections, you might want to see if lowering that helps (-XX:CMSInitiatingOccupancyFraction=). If not, you might be near optimal settings. There is likely not anything else you should mess with - unless using the extra thread to collect while your app is running affects your app's performance - in that case you might want to look into turning on the incremental mode. But you haven't mentioned that, so I doubt it. -- - Mark http://www.lucidimagination.com Jonathan Ariel wrote: > Ok... good news! Upgrading to the newest version of JVM 6 (update 6) seems > to solve this ugly bug. With the upgraded JVM I could run the solr servers > for more than 12 hours on the production environment with the GC mentioned > in the previous e-mails. The results are really amazing. The time spent on > collecting memory dropped from 11% to 3.81%. Do you think there is more to > tune there? > > Thanks! > > Jonathan > > On Sun, Sep 27, 2009 at 8:39 PM, Bill Au wrote: > > >> You are running a very old version of Java 6 (update 6). The latest is >> update 16. You should definitely upgrade. There is a bug in Java 6 >> starting with update 4 that may result in a corrupted Lucene/Solr index: >> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 >> https://issues.apache.org/jira/browse/LUCENE-1282 >> >> The JVM crash occurred in the gc thread. So it looks like a bug in the JVM >> itself. Upgrading to the latest release might help. 
Switching to a >> different garbage collector should help. >> >> Bill >> >> On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller >> wrote: >> >> >>> Jonathan Ariel wrote: >>> Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3.4%, but 5 hours later it crashed. This is the >>> thread >>> dump, maybe you can help identify what happened? >>> Well, that's a tough one ;) My guess is it's a bug :) >>> >>> Your two survivor spaces are filled, so it was likely about to move >>> objects into the tenured space, which still has plenty of room for them >>> (barring horrible fragmentation). Any issues with that type of thing >>> should generate an OOM anyway though. You can find people that have run >>> into similar issues in the past, but a lot of times unreproducible. >>> Usually, their bugs are closed and they are told to try a newer JVM. >>> >>> Your JVM appears to be quite a few versions back. There have been many >>> garbage collection bugs fixed in the 7 or so updates since your version, >>> a good handful of them related to CMS. >>> >>> If you can, my best suggestion at the moment is to upgrade to the latest >>> and see how that fares. >>> >>> If not, you might see if going back to the throughput collector and >>> turning on the parallel tenured space collector might meet your needs >>> instead. You can work with other params to get that going better if you >>> have to as well. >>> >>> Also, adjusting other settings with the low pause collector might >>> trigger something to side step the bug. Not a great option there though >>> >> ;) >> >>> How many unique fields are you sorting/faceting on? It must be a lot if >>> you need 10 gig for 8 million documents. It's kind of rough to have to >>> work at such a close limit to your total heap available as a min mem >>> requirement. 
>>> >>> -- >>> - Mark >>> >>> http://www.lucidimagination.com >>> >>> >>> # # An unexpected error has been detected by Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928 # # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode linux-amd64) # Problematic frame: # V [libjvm.so+0x265a2a] # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # --- T H R E A D --- Current thread (0x5be47400): VMThread [stack: 0x41bad000,0x41cae000] [id=32249] siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), si_addr=0x Registers: RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006, RDX=0x005c49870037c996 RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70, RDI=0x0037c985003a095e R8 =0x2aadab201538, R9 =0x0005, R10=0x000
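Mark's suggestion above - lowering the CMS trigger point - amounts to a flag set along these lines. The fraction value here is illustrative only, not a recommendation; verify the exact flags and defaults against your JVM version's documentation:

```shell
# Start CMS earlier than the Java 6 default of 92% tenured occupancy,
# and pin the threshold so adaptive heuristics don't move it.
# The value 70 is an example only - tune it against your own GC logs.
GC_OPTS="-XX:+UseConcMarkSweepGC \
-XX:CMSInitiatingOccupancyFraction=70 \
-XX:+UseCMSInitiatingOccupancyOnly"
echo "$GC_OPTS"
```

In practice these would be appended to the servlet container's JVM options (e.g. Tomcat's JAVA_OPTS) rather than echoed as shown.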
Re: Solr and Garbage Collection
Ok... good news! Upgrading to the newest version of JVM 6 (update 6) seems to solve this ugly bug. With the upgraded JVM I could run the solr servers for more than 12 hours on the production environment with the GC mentioned in the previous e-mails. The results are really amazing. The time spent on collecting memory dropped from 11% to 3.81%Do you think there is more to tune there? Thanks! Jonathan On Sun, Sep 27, 2009 at 8:39 PM, Bill Au wrote: > You are running a very old version of Java 6 (update 6). The latest is > update 16. You should definitely upgrade. There is a bug in Java 6 > starting with update 4 that may result in a corrupted Lucene/Solr index: > http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 > https://issues.apache.org/jira/browse/LUCENE-1282 > > The JVM crash occurred in the gc thread. So it looks like a bug in the JVM > itself. Upgrading to the latest release might help. Switching to a > different garbage collector should help. > > Bill > > On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller > wrote: > > > Jonathan Ariel wrote: > > > Ok. After the server ran for more than 12 hours, the time spent on GC > > > decreased from 11% to 3,4%, but 5 hours later it crashed. This is the > > thread > > > dump, maybe you can help identify what happened? > > > > > Well thats a tough ;) My guess is its a bug :) > > > > Your two survivor spaces are filled, so it was likely about to move > > objects into the tenured space, which still has plenty of room for them > > (barring horrible fragmentation). Any issues with that type of thing > > should generate an OOM anyway though. You can find people that have run > > into similar issues in the past, but a lot of times unreproducible. > > Usually, their bugs are closed and they are told to try a newer JVM. > > > > Your JVM appears to be quite a few versions back. There have been many > > garbage collection bugs fixed in the 7 or so updates since your version, > > a good handful of them related to CMS. 
> > > > If you can, my best suggestion at the moment is to upgrade to the latest > > and see how that fairs. > > > > If not, you might see if going back to the throughput collector and > > turning on the parallel tenured space collector might meet your needs > > instead. You can work with other params to get that going better if you > > have to as well. > > > > Also, adjusting other settings with the low pause collector might > > trigger something to side step the bug. Not a great option there though > ;) > > > > How many unique fields are you sorting/faceting on? It must be a lot if > > you need 10 gig for 8 million documents. Its kind of rough to have to > > work at such a close limit to your total heap available as a min mem > > requirement. > > > > -- > > - Mark > > > > http://www.lucidimagination.com > > > > > > > # > > > # An unexpected error has been detected by Java Runtime Environment: > > > # > > > # SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928 > > > # > > > # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode > > > linux-amd64) > > > # Problematic frame: > > > # V [libjvm.so+0x265a2a] > > > # > > > # If you would like to submit a bug report, please visit: > > > # http://java.sun.com/webapps/bugreport/crash.jsp > > > # > > > > > > --- T H R E A D --- > > > > > > Current thread (0x5be47400): VMThread [stack: > > > 0x41bad000,0x41cae000] [id=32249] > > > > > > siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), > > > si_addr=0x > > > > > > Registers: > > > RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006, > > > RDX=0x005c49870037c996 > > > RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70, > > > RDI=0x0037c985003a095e > > > R8 =0x2aadab201538, R9 =0x0005, R10=0x0001, > > > R11=0x0010 > > > R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40, > > > R15=0x2aadab2015ac > > > RIP=0x2b4e0f69ea2a, EFL=0x00010206, > > CSGSFS=0x0033, > > > ERR=0x > > > TRAPNO=0x000d > > > > > > Top of Stack: (sp=0x41cac550) > > > 
0x41cac550: 41cac580 2b4e0f903c5b > > > 0x41cac560: 41cac590 0003 > > > 0x41cac570: 2aac9289cf50 2aadab2015a8 > > > 0x41cac580: 41cac5c0 2b4e0f72e388 > > > 0x41cac590: 41cac5c0 2aac9289cf40 > > > 0x41cac5a0: 0005 2b4e0fc86330 > > > 0x41cac5b0: 2b4e0fd8c740 > > > 0x41cac5c0: 41cac5f0 2b4e0f903b7f > > > 0x41cac5d0: 41cac610 0003 > > > 0x41cac5e0: 2aaccb1750f8 2aaccea41570 > > > 0x41cac5f0: 41cac610 2b4e0f931548 > > > 0x41cac600: 2b4e0fc861d8 2aadd4052ab0 > > > 0x41cac610: 41cac
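The fallback Mark mentions - going back to the throughput collector and turning on parallel collection of the tenured space - corresponds roughly to the flags below. This is a sketch; on Java 6 HotSpot these were separate options, so check the flag names for your JVM version:

```shell
# Throughput (parallel) collector for the young generation, plus
# parallel old-generation collection - UseParallelOldGC is not
# implied by UseParallelGC on Java 6.
GC_OPTS="-XX:+UseParallelGC -XX:+UseParallelOldGC"
echo "$GC_OPTS"
```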
Re: Solr and Garbage Collection
You are running a very old version of Java 6 (update 6). The latest is update 16. You should definitely upgrade. There is a bug in Java 6 starting with update 4 that may result in a corrupted Lucene/Solr index: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6707044 https://issues.apache.org/jira/browse/LUCENE-1282 The JVM crash occurred in the gc thread. So it looks like a bug in the JVM itself. Upgrading to the latest release might help. Switching to a different garbage collector should help. Bill On Sat, Sep 26, 2009 at 4:31 PM, Mark Miller wrote: > Jonathan Ariel wrote: > > Ok. After the server ran for more than 12 hours, the time spent on GC > > decreased from 11% to 3,4%, but 5 hours later it crashed. This is the > thread > > dump, maybe you can help identify what happened? > > > Well thats a tough ;) My guess is its a bug :) > > Your two survivor spaces are filled, so it was likely about to move > objects into the tenured space, which still has plenty of room for them > (barring horrible fragmentation). Any issues with that type of thing > should generate an OOM anyway though. You can find people that have run > into similar issues in the past, but a lot of times unreproducible. > Usually, their bugs are closed and they are told to try a newer JVM. > > Your JVM appears to be quite a few versions back. There have been many > garbage collection bugs fixed in the 7 or so updates since your version, > a good handful of them related to CMS. > > If you can, my best suggestion at the moment is to upgrade to the latest > and see how that fairs. > > If not, you might see if going back to the throughput collector and > turning on the parallel tenured space collector might meet your needs > instead. You can work with other params to get that going better if you > have to as well. > > Also, adjusting other settings with the low pause collector might > trigger something to side step the bug. 
Not a great option there though ;) > > How many unique fields are you sorting/faceting on? It must be a lot if > you need 10 gig for 8 million documents. Its kind of rough to have to > work at such a close limit to your total heap available as a min mem > requirement. > > -- > - Mark > > http://www.lucidimagination.com > > > > # > > # An unexpected error has been detected by Java Runtime Environment: > > # > > # SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928 > > # > > # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode > > linux-amd64) > > # Problematic frame: > > # V [libjvm.so+0x265a2a] > > # > > # If you would like to submit a bug report, please visit: > > # http://java.sun.com/webapps/bugreport/crash.jsp > > # > > > > --- T H R E A D --- > > > > Current thread (0x5be47400): VMThread [stack: > > 0x41bad000,0x41cae000] [id=32249] > > > > siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), > > si_addr=0x > > > > Registers: > > RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006, > > RDX=0x005c49870037c996 > > RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70, > > RDI=0x0037c985003a095e > > R8 =0x2aadab201538, R9 =0x0005, R10=0x0001, > > R11=0x0010 > > R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40, > > R15=0x2aadab2015ac > > RIP=0x2b4e0f69ea2a, EFL=0x00010206, > CSGSFS=0x0033, > > ERR=0x > > TRAPNO=0x000d > > > > Top of Stack: (sp=0x41cac550) > > 0x41cac550: 41cac580 2b4e0f903c5b > > 0x41cac560: 41cac590 0003 > > 0x41cac570: 2aac9289cf50 2aadab2015a8 > > 0x41cac580: 41cac5c0 2b4e0f72e388 > > 0x41cac590: 41cac5c0 2aac9289cf40 > > 0x41cac5a0: 0005 2b4e0fc86330 > > 0x41cac5b0: 2b4e0fd8c740 > > 0x41cac5c0: 41cac5f0 2b4e0f903b7f > > 0x41cac5d0: 41cac610 0003 > > 0x41cac5e0: 2aaccb1750f8 2aaccea41570 > > 0x41cac5f0: 41cac610 2b4e0f931548 > > 0x41cac600: 2b4e0fc861d8 2aadd4052ab0 > > 0x41cac610: 41cac640 2b4e0f903d1a > > 0x41cac620: 41cac650 0003 > > 0x41cac630: 5bc7d6d0 2b4e0fd8c740 > > 0x41cac640: 41cac650 2b4e0f90411c > > 
0x41cac650: 41cac680 2b4e0fa1d16e > > 0x41cac660: 5bc7d6d0 > > 0x41cac670: 0002 2b4e0fd8c740 > > 0x41cac680: 41cac6c0 2b4e0fa74640 > > 0x41cac690: 41cac6b0 5bc7d6d0 > > 0x41cac6a0: 0002 2b4e0fd8c740 > > 0x41cac6b0: 0001 2b4e0fd8c740 > > 0x41cac6c0: 41cac700 0
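Since the advice above hinges on which Java 6 update is installed, one quick check is to pull the update number out of the `java -version` banner. The banner string below is a hypothetical sample; in practice you would capture `java -version 2>&1 | head -1`:

```shell
# Hypothetical banner; replace with: banner=$(java -version 2>&1 | head -1)
banner='java version "1.6.0_06"'
# Strip everything up to the underscore and drop the trailing quote
update=$(echo "$banner" | sed 's/.*_\([0-9]*\)".*/\1/')
echo "update $update"
```

Compare the result against the fix level noted in the Sun bug report and LUCENE-1282 before deciding whether to upgrade.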
Re: Solr and Garbage Collection
Right... when I increased it to 12GB all OOM just disappear. And all the tests are being run on the live environment and for several hours, so it is real enough :)As soon as I update JVM and test again the GC I will let you know. If you think I can run another test meanwhile just let me know. On Sun, Sep 27, 2009 at 5:05 PM, Mark Miller wrote: > Jonathan Ariel wrote: > > Well.. it is strange that when I use the default GC I don't get any > errors. > > > Not so strange - it's different code. The bug is Likely in the low pause > collector and not the serial collector. > > If I'm so close to run out of memory I should see those OOM exceptions as > > well with the standard GC. > Those? Your not seeing any that you mentioned unless you lower your heap? > > BTW I'm faceting on around 13 fields and my total > > number of unique values is around 3. > > One of the fields with the biggest amount of unique values has almost > 16000 > > unique values. > > > > > > On Sun, Sep 27, 2009 at 4:32 PM, Fuad Efendi wrote: > > > > > >> Mark, > >> > >> > >> Nothing against orange-hat :) > >> > >> Nothing against GC tuning; but if SOLR needs application-specific > settings > >> it should be well-documented. > >> > >> GC-tuning: for instance, we need it for 'realtime' Online Trading > >> applications. However, even Online Banking doesn't need; primary reason > - > >> GC > >> must happen 'outside of current transaction', GC 'must be predictable', > and > >> (for instance) Oracle/BEA JRockit has specific 'realtime' version for > >> that... Does SOLR need that? > >> > >> > >> Having load-stress simulator (multithreaded!!!) will definitely help to > >> predict any possible bottleneck... it's even better to write it from > >> scratch > >> (depends on schema!), by sending random requests to SOLR in-parallel... > >> instead of waiting when FieldCache tries to add new FieldImpl to cache > >> (unpredictable!) 
> >> > >> > >> Tomcat is multithreaded; what if end-users need to load 1000s large > >> documents (in parallel! 1000s concurrent users), can you predict memory > >> requirements and GC options without application-specific knowledge? What > >> about new SOLR-Caches warming up? > >> > >> > >> -Fuad > >> > >> > >> > >>> -Original Message- > >>> From: Mark Miller [mailto:markrmil...@gmail.com] > >>> Sent: September-27-09 2:46 PM > >>> To: solr-user@lucene.apache.org > >>> Subject: Re: Solr and Garbage Collection > >>> > >>> If he needed double the RAM, he'd likely know by now :) The JVM likes > to > >>> throw OOM exceptions when you need more RAM. Until it does - thats an > >>> odd path to focus on. There has been no indication he has ever seen an > >>> OOM with his over 10 GB heap. It sounds like he has run Solr in his > >>> environment for quite a long time - after running for that long, until > >>> he gets an OOM, its about as good as chasing ghost to worry about it. > >>> > >>> I like to think of GC tuning as orange-hat. Mostly because I like the > >>> color orange. > >>> > >>> Fuad Efendi wrote: > >>> > >>>>>> Ok. After the server ran for more than 12 hours, the time spent on > GC > >>>>>> decreased from 11% to 3,4%, but 5 hours later it crashed. > >>>>>> > >>>>>> > >>>> All this 'black-hat' GC tuning and 'fast' object moving (especially > >>>> > >> objects > >> > >>>> accessing by some thread during GC-defragmentation) > >>>> > >>>> - try to use multithreaded load-stress tools (at least 100 requests > >>>> in-parallel) and see that you need at least double memory if 12Gb is > >>>> threshold for your FieldCache (largest objects) > >>>> > >>>> > >>>> Also, don't trust this counters: > >>>> > >>>> > >>>>> So I logged the Garbage Collection activity to check if it's because > >>>>> > >> of > >> > >>>>> that. It seems like 11% of the time the application runs, it is > >>>>> > >> stopped > >> > >>>>> because of GC. > >>>>> > >>>>> > >>>> Stopped? 
Of course, locking/unlocking in order to move objects > >>>> > >> currently > >> > >>>> accessesd in multiuser-multithreaded Tomcat... you can easily create > >>>> > >> crash > >> > >>>> scenario proving that latest-greatest JVMs are buggy too. > >>>> > >>>> > >>>> > >>>> Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in > >>>> > >> order > >> to > >> > >>>> avoid OOM, you need to double it (in order to warm new cash instances > >>>> > >> on > >> > >>>> index replica / update). > >>>> > >>>> > >>>> http://www.linkedin.com/in/liferay > >>>> > >>>> > >>>> > >>>> > >>> -- > >>> - Mark > >>> > >>> http://www.lucidimagination.com > >>> > >>> > >>> > >> > >> > >> > > > > > > > -- > - Mark > > http://www.lucidimagination.com > > > >
Re: Solr and Garbage Collection
Jonathan Ariel wrote: > Well.. it is strange that when I use the default GC I don't get any errors. > Not so strange - it's different code. The bug is Likely in the low pause collector and not the serial collector. > If I'm so close to run out of memory I should see those OOM exceptions as > well with the standard GC. Those? Your not seeing any that you mentioned unless you lower your heap? > BTW I'm faceting on around 13 fields and my total > number of unique values is around 3. > One of the fields with the biggest amount of unique values has almost 16000 > unique values. > > > On Sun, Sep 27, 2009 at 4:32 PM, Fuad Efendi wrote: > > >> Mark, >> >> >> Nothing against orange-hat :) >> >> Nothing against GC tuning; but if SOLR needs application-specific settings >> it should be well-documented. >> >> GC-tuning: for instance, we need it for 'realtime' Online Trading >> applications. However, even Online Banking doesn't need; primary reason - >> GC >> must happen 'outside of current transaction', GC 'must be predictable', and >> (for instance) Oracle/BEA JRockit has specific 'realtime' version for >> that... Does SOLR need that? >> >> >> Having load-stress simulator (multithreaded!!!) will definitely help to >> predict any possible bottleneck... it's even better to write it from >> scratch >> (depends on schema!), by sending random requests to SOLR in-parallel... >> instead of waiting when FieldCache tries to add new FieldImpl to cache >> (unpredictable!) >> >> >> Tomcat is multithreaded; what if end-users need to load 1000s large >> documents (in parallel! 1000s concurrent users), can you predict memory >> requirements and GC options without application-specific knowledge? What >> about new SOLR-Caches warming up? 
>> >> >> -Fuad >> >> >> >>> -Original Message- >>> From: Mark Miller [mailto:markrmil...@gmail.com] >>> Sent: September-27-09 2:46 PM >>> To: solr-user@lucene.apache.org >>> Subject: Re: Solr and Garbage Collection >>> >>> If he needed double the RAM, he'd likely know by now :) The JVM likes to >>> throw OOM exceptions when you need more RAM. Until it does - thats an >>> odd path to focus on. There has been no indication he has ever seen an >>> OOM with his over 10 GB heap. It sounds like he has run Solr in his >>> environment for quite a long time - after running for that long, until >>> he gets an OOM, its about as good as chasing ghost to worry about it. >>> >>> I like to think of GC tuning as orange-hat. Mostly because I like the >>> color orange. >>> >>> Fuad Efendi wrote: >>> >>>>>> Ok. After the server ran for more than 12 hours, the time spent on GC >>>>>> decreased from 11% to 3,4%, but 5 hours later it crashed. >>>>>> >>>>>> >>>> All this 'black-hat' GC tuning and 'fast' object moving (especially >>>> >> objects >> >>>> accessing by some thread during GC-defragmentation) >>>> >>>> - try to use multithreaded load-stress tools (at least 100 requests >>>> in-parallel) and see that you need at least double memory if 12Gb is >>>> threshold for your FieldCache (largest objects) >>>> >>>> >>>> Also, don't trust this counters: >>>> >>>> >>>>> So I logged the Garbage Collection activity to check if it's because >>>>> >> of >> >>>>> that. It seems like 11% of the time the application runs, it is >>>>> >> stopped >> >>>>> because of GC. >>>>> >>>>> >>>> Stopped? Of course, locking/unlocking in order to move objects >>>> >> currently >> >>>> accessesd in multiuser-multithreaded Tomcat... you can easily create >>>> >> crash >> >>>> scenario proving that latest-greatest JVMs are buggy too. 
>>>> >>>> >>>> >>>> Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in >>>> >> order >> to >> >>>> avoid OOM, you need to double it (in order to warm new cash instances >>>> >> on >> >>>> index replica / update). >>>> >>>> >>>> http://www.linkedin.com/in/liferay >>>> >>>> >>>> >>>> >>> -- >>> - Mark >>> >>> http://www.lucidimagination.com >>> >>> >>> >> >> >> > > -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
Fuad Efendi wrote: > Mark, > > > Nothing against orange-hat :) > > Nothing against GC tuning; but if SOLR needs application-specific settings > it should be well-documented. > > GC-tuning: for instance, we need it for 'realtime' Online Trading > applications. However, even Online Banking doesn't need; primary reason - GC > must happen 'outside of current transaction', GC 'must be predictable', and > (for instance) Oracle/BEA JRockit has specific 'realtime' version for > that... Does SOLR need that? > I'm not sure that Solr needs anything specific - but with a heap near 10 GB, you really do need some sort of parrallel or concurrent collection of the tenured space - unless you can live with the long pauses. I don't think thats Solr specific though. > > Having load-stress simulator (multithreaded!!!) will definitely help to > predict any possible bottleneck... it's even better to write it from scratch > (depends on schema!), by sending random requests to SOLR in-parallel... > instead of waiting when FieldCache tries to add new FieldImpl to cache > (unpredictable!) > Yup - no argument from me here. Perhaps he does need more RAM and will find that out. Testing for that is a good idea. But by the sound of it, I just don't think we can guess that yet. I'm not against him testing to see though - its just semi a solution looking for a problem at the moment. It sounds like he is running this thing for hours and hours in a semi real environment (else why all the GC). He hasn't mentioned any need for more RAM yet. Again though, I'm not saying he shouldn't make sure he has enough RAM under any scenario. Everyone should. It just doesn't seem to be an issue hes indicated hes having. > > Tomcat is multithreaded; what if end-users need to load 1000s large > documents (in parallel! 1000s concurrent users), can you predict memory > requirements and GC options without application-specific knowledge? What > about new SOLR-Caches warming up? 
> > > -Fuad > > > >> -Original Message- >> From: Mark Miller [mailto:markrmil...@gmail.com] >> Sent: September-27-09 2:46 PM >> To: solr-user@lucene.apache.org >> Subject: Re: Solr and Garbage Collection >> >> If he needed double the RAM, he'd likely know by now :) The JVM likes to >> throw OOM exceptions when you need more RAM. Until it does - thats an >> odd path to focus on. There has been no indication he has ever seen an >> OOM with his over 10 GB heap. It sounds like he has run Solr in his >> environment for quite a long time - after running for that long, until >> he gets an OOM, its about as good as chasing ghost to worry about it. >> >> I like to think of GC tuning as orange-hat. Mostly because I like the >> color orange. >> >> Fuad Efendi wrote: >> >>>>> Ok. After the server ran for more than 12 hours, the time spent on GC >>>>> decreased from 11% to 3,4%, but 5 hours later it crashed. >>>>> >>>>> >>> All this 'black-hat' GC tuning and 'fast' object moving (especially >>> > objects > >>> accessing by some thread during GC-defragmentation) >>> >>> - try to use multithreaded load-stress tools (at least 100 requests >>> in-parallel) and see that you need at least double memory if 12Gb is >>> threshold for your FieldCache (largest objects) >>> >>> >>> Also, don't trust this counters: >>> >>> >>>> So I logged the Garbage Collection activity to check if it's because of >>>> that. It seems like 11% of the time the application runs, it is stopped >>>> because of GC. >>>> >>>> >>> Stopped? Of course, locking/unlocking in order to move objects currently >>> accessesd in multiuser-multithreaded Tomcat... you can easily create >>> > crash > >>> scenario proving that latest-greatest JVMs are buggy too. >>> >>> >>> >>> Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in order >>> > to > >>> avoid OOM, you need to double it (in order to warm new cash instances on >>> index replica / update). 
>>> >>> >>> http://www.linkedin.com/in/liferay >>> >>> >>> >>> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> >> > > > > -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
Well.. it is strange that when I use the default GC I don't get any errors. If I'm so close to run out of memory I should see those OOM exceptions as well with the standard GC.BTW I'm faceting on around 13 fields and my total number of unique values is around 3. One of the fields with the biggest amount of unique values has almost 16000 unique values. On Sun, Sep 27, 2009 at 4:32 PM, Fuad Efendi wrote: > Mark, > > > Nothing against orange-hat :) > > Nothing against GC tuning; but if SOLR needs application-specific settings > it should be well-documented. > > GC-tuning: for instance, we need it for 'realtime' Online Trading > applications. However, even Online Banking doesn't need; primary reason - > GC > must happen 'outside of current transaction', GC 'must be predictable', and > (for instance) Oracle/BEA JRockit has specific 'realtime' version for > that... Does SOLR need that? > > > Having load-stress simulator (multithreaded!!!) will definitely help to > predict any possible bottleneck... it's even better to write it from > scratch > (depends on schema!), by sending random requests to SOLR in-parallel... > instead of waiting when FieldCache tries to add new FieldImpl to cache > (unpredictable!) > > > Tomcat is multithreaded; what if end-users need to load 1000s large > documents (in parallel! 1000s concurrent users), can you predict memory > requirements and GC options without application-specific knowledge? What > about new SOLR-Caches warming up? > > > -Fuad > > > > -----Original Message----- > > From: Mark Miller [mailto:markrmil...@gmail.com] > > Sent: September-27-09 2:46 PM > > To: solr-user@lucene.apache.org > > Subject: Re: Solr and Garbage Collection > > > > If he needed double the RAM, he'd likely know by now :) The JVM likes to > > throw OOM exceptions when you need more RAM. Until it does - thats an > > odd path to focus on. There has been no indication he has ever seen an > > OOM with his over 10 GB heap. 
It sounds like he has run Solr in his > > environment for quite a long time - after running for that long, until > > he gets an OOM, its about as good as chasing ghost to worry about it. > > > > I like to think of GC tuning as orange-hat. Mostly because I like the > > color orange. > > > > Fuad Efendi wrote: > > >>> Ok. After the server ran for more than 12 hours, the time spent on GC > > >>> decreased from 11% to 3,4%, but 5 hours later it crashed. > > >>> > > > > > > All this 'black-hat' GC tuning and 'fast' object moving (especially > objects > > > accessing by some thread during GC-defragmentation) > > > > > > - try to use multithreaded load-stress tools (at least 100 requests > > > in-parallel) and see that you need at least double memory if 12Gb is > > > threshold for your FieldCache (largest objects) > > > > > > > > > Also, don't trust this counters: > > > > > >> So I logged the Garbage Collection activity to check if it's because > of > > >> that. It seems like 11% of the time the application runs, it is > stopped > > >> because of GC. > > >> > > > > > > > > > Stopped? Of course, locking/unlocking in order to move objects > currently > > > accessesd in multiuser-multithreaded Tomcat... you can easily create > crash > > > scenario proving that latest-greatest JVMs are buggy too. > > > > > > > > > > > > Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in > order > to > > > avoid OOM, you need to double it (in order to warm new cash instances > on > > > index replica / update). > > > > > > > > > http://www.linkedin.com/in/liferay > > > > > > > > > > > > > > > -- > > - Mark > > > > http://www.lucidimagination.com > > > > > > > >
RE: Solr and Garbage Collection
Mark, Nothing against orange-hat :) Nothing against GC tuning; but if SOLR needs application-specific settings it should be well-documented. GC-tuning: for instance, we need it for 'realtime' Online Trading applications. However, even Online Banking doesn't need; primary reason - GC must happen 'outside of current transaction', GC 'must be predictable', and (for instance) Oracle/BEA JRockit has specific 'realtime' version for that... Does SOLR need that? Having load-stress simulator (multithreaded!!!) will definitely help to predict any possible bottleneck... it's even better to write it from scratch (depends on schema!), by sending random requests to SOLR in-parallel... instead of waiting when FieldCache tries to add new FieldImpl to cache (unpredictable!) Tomcat is multithreaded; what if end-users need to load 1000s large documents (in parallel! 1000s concurrent users), can you predict memory requirements and GC options without application-specific knowledge? What about new SOLR-Caches warming up? -Fuad > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: September-27-09 2:46 PM > To: solr-user@lucene.apache.org > Subject: Re: Solr and Garbage Collection > > If he needed double the RAM, he'd likely know by now :) The JVM likes to > throw OOM exceptions when you need more RAM. Until it does - thats an > odd path to focus on. There has been no indication he has ever seen an > OOM with his over 10 GB heap. It sounds like he has run Solr in his > environment for quite a long time - after running for that long, until > he gets an OOM, its about as good as chasing ghost to worry about it. > > I like to think of GC tuning as orange-hat. Mostly because I like the > color orange. > > Fuad Efendi wrote: > >>> Ok. After the server ran for more than 12 hours, the time spent on GC > >>> decreased from 11% to 3,4%, but 5 hours later it crashed. 
> >>> > > > > All this 'black-hat' GC tuning and 'fast' object moving (especially objects > > accessing by some thread during GC-defragmentation) > > > > - try to use multithreaded load-stress tools (at least 100 requests > > in-parallel) and see that you need at least double memory if 12Gb is > > threshold for your FieldCache (largest objects) > > > > > > Also, don't trust this counters: > > > >> So I logged the Garbage Collection activity to check if it's because of > >> that. It seems like 11% of the time the application runs, it is stopped > >> because of GC. > >> > > > > > > Stopped? Of course, locking/unlocking in order to move objects currently > > accessesd in multiuser-multithreaded Tomcat... you can easily create crash > > scenario proving that latest-greatest JVMs are buggy too. > > > > > > > > Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in order to > > avoid OOM, you need to double it (in order to warm new cash instances on > > index replica / update). > > > > > > http://www.linkedin.com/in/liferay > > > > > > > > > -- > - Mark > > http://www.lucidimagination.com > >
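The "11% of the time ... stopped because of GC" figure quoted in this exchange is just total pause time over wall-clock time. From a HotSpot GC log it can be approximated as below; the pause values here are made up for illustration, and in practice they would be extracted from the log (e.g. the `real=` fields of each collection line):

```shell
# Made-up pause durations in seconds, as might be grepped from gc.log
# (e.g. grep -o 'real=[0-9.]*' gc.log); wall is the observed interval.
pauses='0.12 0.34 0.05'
wall=10
echo "$pauses" | tr ' ' '\n' | awk -v w="$wall" \
  '{s+=$1} END {printf "%.1f%% of time in GC\n", 100*s/w}'
```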
Re: Solr and Garbage Collection
If he needed double the RAM, he'd likely know by now :) The JVM likes to throw OOM exceptions when you need more RAM. Until it does - that's an odd path to focus on. There has been no indication he has ever seen an OOM with his over-10 GB heap. It sounds like he has run Solr in his environment for quite a long time - after running for that long, until he gets an OOM, worrying about it is about as good as chasing ghosts. I like to think of GC tuning as orange-hat. Mostly because I like the color orange. Fuad Efendi wrote: >>> Ok. After the server ran for more than 12 hours, the time spent on GC >>> decreased from 11% to 3,4%, but 5 hours later it crashed. >>> > > All this 'black-hat' GC tuning and 'fast' object moving (especially objects > being accessed by some thread during GC defragmentation) > > - try to use multithreaded load-stress tools (at least 100 requests > in-parallel) and see that you need at least double memory if 12Gb is > the threshold for your FieldCache (largest objects) > > > Also, don't trust these counters: > >> So I logged the Garbage Collection activity to check if it's because of >> that. It seems like 11% of the time the application runs, it is stopped >> because of GC. >> > > > Stopped? Of course - locking/unlocking in order to move objects currently > accessed in a multiuser, multithreaded Tomcat... you can easily create crash > scenarios proving that even the latest-greatest JVMs are buggy. > > > > Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in order to > avoid OOM, you need to double it (in order to warm new cache instances on > index replica / update). > > > http://www.linkedin.com/in/liferay > > > -- - Mark http://www.lucidimagination.com
RE: Solr and Garbage Collection
>> Ok. After the server ran for more than 12 hours, the time spent on GC >> decreased from 11% to 3,4%, but 5 hours later it crashed. All this 'black-hat' GC tuning and 'fast' object moving (especially objects being accessed by some thread during GC defragmentation) - try to use multithreaded load-stress tools (at least 100 requests in-parallel) and see that you need at least double memory if 12Gb is the threshold for your FieldCache (largest objects). Also, don't trust these counters: >So I logged the Garbage Collection activity to check if it's because of >that. It seems like 11% of the time the application runs, it is stopped >because of GC. Stopped? Of course - locking/unlocking in order to move objects currently accessed in a multiuser, multithreaded Tomcat... you can easily create crash scenarios proving that even the latest-greatest JVMs are buggy. Don't forget: Tomcat is multithreaded, and if 'core' needs 10Gb in order to avoid OOM, you need to double it (in order to warm new cache instances on index replica / update). http://www.linkedin.com/in/liferay
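Fuad's doubling rule - old and new caches coexisting while a new searcher warms - can be put into a back-of-envelope estimate. All numbers below are illustrative, and the "one 4-byte ord per document per sorted/faceted String field" model is a simplification that ignores the term data itself:

```shell
docs=8000000        # documents in the index (example figure)
fields=13           # fields sorted/faceted on (example figure)
per_doc_bytes=4     # one int ord per doc per field; ignores term storage
single=$((docs * fields * per_doc_bytes))
# Double it: during warming, old and new FieldCache entries coexist
peak=$((2 * single))
echo "$((peak / 1024 / 1024)) MB peak (rough)"
```

Even as a crude lower bound, this shows why a heap sized for one searcher's caches can run tight during replication/commit, which is Fuad's point above.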
Re: Solr and Garbage Collection
Yes, it seems like a bug. I will update my JVM, try again and let you know the results :) On 9/26/09, Mark Miller wrote: > Jonathan Ariel wrote: >> Ok. After the server ran for more than 12 hours, the time spent on GC >> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the >> thread >> dump, maybe you can help identify what happened? >> > Well thats a tough ;) My guess is its a bug :) > > Your two survivor spaces are filled, so it was likely about to move > objects into the tenured space, which still has plenty of room for them > (barring horrible fragmentation). Any issues with that type of thing > should generate an OOM anyway though. You can find people that have run > into similar issues in the past, but a lot of times unreproducible. > Usually, their bugs are closed and they are told to try a newer JVM. > > Your JVM appears to be quite a few versions back. There have been many > garbage collection bugs fixed in the 7 or so updates since your version, > a good handful of them related to CMS. > > If you can, my best suggestion at the moment is to upgrade to the latest > and see how that fairs. > > If not, you might see if going back to the throughput collector and > turning on the parallel tenured space collector might meet your needs > instead. You can work with other params to get that going better if you > have to as well. > > Also, adjusting other settings with the low pause collector might > trigger something to side step the bug. Not a great option there though ;) > > How many unique fields are you sorting/faceting on? It must be a lot if > you need 10 gig for 8 million documents. Its kind of rough to have to > work at such a close limit to your total heap available as a min mem > requirement. 
> > -- > - Mark > > http://www.lucidimagination.com > > >> # >> # An unexpected error has been detected by Java Runtime Environment: >> # >> # SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928 >> # >> # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode >> linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x265a2a] >> # >> # If you would like to submit a bug report, please visit: >> # http://java.sun.com/webapps/bugreport/crash.jsp >> # >> >> --- T H R E A D --- >> >> Current thread (0x5be47400): VMThread [stack: >> 0x41bad000,0x41cae000] [id=32249] >> >> siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), >> si_addr=0x >> >> Registers: >> RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006, >> RDX=0x005c49870037c996 >> RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70, >> RDI=0x0037c985003a095e >> R8 =0x2aadab201538, R9 =0x0005, R10=0x0001, >> R11=0x0010 >> R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40, >> R15=0x2aadab2015ac >> RIP=0x2b4e0f69ea2a, EFL=0x00010206, CSGSFS=0x0033, >> ERR=0x >> TRAPNO=0x000d >> >> Top of Stack: (sp=0x41cac550) >> 0x41cac550: 41cac580 2b4e0f903c5b >> 0x41cac560: 41cac590 0003 >> 0x41cac570: 2aac9289cf50 2aadab2015a8 >> 0x41cac580: 41cac5c0 2b4e0f72e388 >> 0x41cac590: 41cac5c0 2aac9289cf40 >> 0x41cac5a0: 0005 2b4e0fc86330 >> 0x41cac5b0: 2b4e0fd8c740 >> 0x41cac5c0: 41cac5f0 2b4e0f903b7f >> 0x41cac5d0: 41cac610 0003 >> 0x41cac5e0: 2aaccb1750f8 2aaccea41570 >> 0x41cac5f0: 41cac610 2b4e0f931548 >> 0x41cac600: 2b4e0fc861d8 2aadd4052ab0 >> 0x41cac610: 41cac640 2b4e0f903d1a >> 0x41cac620: 41cac650 0003 >> 0x41cac630: 5bc7d6d0 2b4e0fd8c740 >> 0x41cac640: 41cac650 2b4e0f90411c >> 0x41cac650: 41cac680 2b4e0fa1d16e >> 0x41cac660: 5bc7d6d0 >> 0x41cac670: 0002 2b4e0fd8c740 >> 0x41cac680: 41cac6c0 2b4e0fa74640 >> 0x41cac690: 41cac6b0 5bc7d6d0 >> 0x41cac6a0: 0002 2b4e0fd8c740 >> 0x41cac6b0: 0001 2b4e0fd8c740 >> 0x41cac6c0: 41cac700 2b4e0f9a52da >> 0x41cac6d0: bfc0 >> 0x41cac6e0: 2b4e0fd8c740 5bc7d6d0 >> 
0x41cac6f0: 2b4e0fd8c740 0001 >> 0x41cac700: 41cac750 2b4e0f6feb80 >> 0x41cac710: 449dae1d9ae42358 3ff0cccd >> 0x41cac720: 2aad289aa680 0001 >> 0x41cac730: 41cac780 >> 0x41cac740: 0001 5bc7d6d0 >> >> Instructions: (pc=0
Re: Solr and Garbage Collection
Sorry Walter. Half the time I type faster than I think. I was mixing concurrent with parallel. I do agree with you on the concurrent part for batch processing (and likely other things). It would likely be far better to use as many CPUs as you can (as many as make sense) collecting in parallel while the world is stopped, rather than paying to do it concurrently. My fault on the confusion. Parallel, super important for large heaps. Concurrent, super important for systems that always need low response times. Hence the Parallel collector being named the throughput collector :) Sorry for the confusion - wouldn't be the first time ;) I'll stick to my generational argument though - as I said, if most of your objects are long lived (*extremely* rare from what I know), skipping generational collection makes sense, but in almost all cases it's super helpful. Which is why Sun doesn't even offer non-generational anymore. - Mark Mark Miller wrote: > Walter Underwood wrote: > >> For batch-oriented computing, like Hadoop, the most efficient GC is probably >> a non-concurrent, non-generational GC. >> > Okay - for batch we somewhat agree I guess - if you can stand any length > of pausing, non concurrent can be nice, because you don't pay for thread > sync communication. Only with a small heap size though (less than 100MB > is what I've seen). You would pause the batch job while GC takes place. > If you have 8 processors, and you are pausing all of them to collect a > large heap using only 1 processor, that doesn't make much sense to me. > The thread communication pain will be far outweighed by using more > processors to do the collection faster, and not "stop the world" for > your batch job so long. Stopping your application dead in its tracks, > and then only using one of the available processors to collect a large > heap, while the rest sit idle, doesn't make much sense. > > I also don't agree it ever really makes sense not to do generational > collection. What is your argument here? 
Generational collection is > **way** more efficient for short lived objects, which tend to be up to > 98% of the objects in most applications. The only way I see that making > sense is if you have almost no short lived objects (which occurs in > what, .0001% of apps if at all?). The Sun JVM doesn't even offer a non > generational approach anymore. It's just standard GC practice. > >> I doubt that there are many >> batch-oriented applications of Solr, though. >> >> The rest of the advice is intended to be general and it sounds like we agree >> about sizing. If the nursery is not big enough, the tenured space will be >> used for allocations that have a short lifetime and that will increase the >> length and/or frequency of major collections. >> >> > Yes - I wasn't arguing with every point - I was picking and choosing :) > After the heap size, the size of the young generation is the most > important factor. > >> Cache evictions are the interesting part, because they cause a constant rate >> of tenured space garbage. In many servers, you can get a big enough >> nursery that major collections are very rare. That won't happen in Solr >> because of cache evictions. >> >> The IBM JVM is excellent. Their concurrent generational GC policy is >> "gencon". >> >> > Yeah, I actually know very little about the IBM JVM, so I wasn't really > commenting. But from the info I gleaned here and on a couple quick web > searches, I'm not too impressed by its GC. > >> wunder >> >> -Original Message- >> From: Mark Miller [mailto:markrmil...@gmail.com] >> Sent: Friday, September 25, 2009 10:31 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Solr and Garbage Collection >> >> My bad - later, it looks as if you're giving general advice, and that's >> what I took issue with. >> >> Any Collector that is not doing generational collection is essentially >> from the dark ages and shouldn't be used. 
>> >> Any Collector that doesn't have concurrent options, unless possibly you're >> running a tiny app (under 100MB of RAM), or only have a single CPU, is >> also dark ages, and not fit for a server environment. >> >> I haven't kept up with IBM's JVM, but it sounds like they are well behind >> Sun in GC then. >> >> - Mark >> >> Walter Underwood wrote: >> >> >>> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low >>> pause" collector is only in the Sun JVM. >>> >>> I just found this excellent articl
Re: Solr and Garbage Collection
Also, in case the info might help track something down: It's pretty darn odd that both your survivor spaces are full. I've never seen that in one of these dumps. Always one is empty. When one is filled, its contents are moved to the other. Then back. And forth. For a certain number of times, until the objects are moved into the tenured space. Both being filled like that really seems like a bug to me - I've looked over tons of dumps in the past (random ones online), and I have never seen one of the survivor spaces not empty. Mark Miller wrote: > Jonathan Ariel wrote: > >> Ok. After the server ran for more than 12 hours, the time spent on GC >> decreased from 11% to 3,4%, but 5 hours later it crashed. This is the thread >> dump, maybe you can help identify what happened? >> >> > Well, that's a tough one ;) My guess is it's a bug :) > > Your two survivor spaces are filled, so it was likely about to move > objects into the tenured space, which still has plenty of room for them > (barring horrible fragmentation). Any issues with that type of thing > should generate an OOM anyway though. You can find people that have run > into similar issues in the past, but a lot of times unreproducible. > Usually, their bugs are closed and they are told to try a newer JVM. > > Your JVM appears to be quite a few versions back. There have been many > garbage collection bugs fixed in the 7 or so updates since your version, > a good handful of them related to CMS. > > If you can, my best suggestion at the moment is to upgrade to the latest > and see how that fares. > > If not, you might see if going back to the throughput collector and > turning on the parallel tenured space collector might meet your needs > instead. You can work with other params to get that going better if you > have to as well. > > Also, adjusting other settings with the low pause collector might > trigger something to side step the bug. Not a great option there though ;) > > How many unique fields are you sorting/faceting on? 
It must be a lot if > you need 10 gig for 8 million documents. Its kind of rough to have to > work at such a close limit to your total heap available as a min mem > requirement. > > -- - Mark http://www.lucidimagination.com
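For context on the survivor-space mechanics Mark describes, the relevant HotSpot knobs look like this (standard Sun JVM flags; the heap sizes and the Jetty `start.jar` launch are illustrative, not recommendations):

```shell
# Objects ping-pong between the two survivor spaces on each minor GC;
# after MaxTenuringThreshold copies they are promoted to the tenured space.
java -Xmx10g -Xmn2g \
     -XX:SurvivorRatio=8 \
     -XX:MaxTenuringThreshold=15 \
     -XX:+PrintTenuringDistribution \
     -jar start.jar
```

`-XX:+PrintTenuringDistribution` reports survivor-space occupancy at every minor collection, which would make an abnormal "both survivors full" state like the one in this dump visible early.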
Re: Solr and Garbage Collection
Jonathan Ariel wrote: > Ok. After the server ran for more than 12 hours, the time spent on GC > decreased from 11% to 3,4%, but 5 hours later it crashed. This is the thread > dump, maybe you can help identify what happened? > Well, that's a tough one ;) My guess is it's a bug :) Your two survivor spaces are filled, so it was likely about to move objects into the tenured space, which still has plenty of room for them (barring horrible fragmentation). Any issues with that type of thing should generate an OOM anyway though. You can find people that have run into similar issues in the past, but a lot of times unreproducible. Usually, their bugs are closed and they are told to try a newer JVM. Your JVM appears to be quite a few versions back. There have been many garbage collection bugs fixed in the 7 or so updates since your version, a good handful of them related to CMS. If you can, my best suggestion at the moment is to upgrade to the latest and see how that fares. If not, you might see if going back to the throughput collector and turning on the parallel tenured space collector might meet your needs instead. You can work with other params to get that going better if you have to as well. Also, adjusting other settings with the low pause collector might trigger something to side step the bug. Not a great option there though ;) How many unique fields are you sorting/faceting on? It must be a lot if you need 10 gig for 8 million documents. It's kind of rough to have to work at such a close limit to your total heap available as a min mem requirement. 
-- - Mark http://www.lucidimagination.com > # > # An unexpected error has been detected by Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928 > # > # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode > linux-amd64) > # Problematic frame: > # V [libjvm.so+0x265a2a] > # > # If you would like to submit a bug report, please visit: > # http://java.sun.com/webapps/bugreport/crash.jsp > # > > --- T H R E A D --- > > Current thread (0x5be47400): VMThread [stack: > 0x41bad000,0x41cae000] [id=32249] > > siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), > si_addr=0x > > Registers: > RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006, > RDX=0x005c49870037c996 > RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70, > RDI=0x0037c985003a095e > R8 =0x2aadab201538, R9 =0x0005, R10=0x0001, > R11=0x0010 > R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40, > R15=0x2aadab2015ac > RIP=0x2b4e0f69ea2a, EFL=0x00010206, CSGSFS=0x0033, > ERR=0x > TRAPNO=0x000d > > Top of Stack: (sp=0x41cac550) > 0x41cac550: 41cac580 2b4e0f903c5b > 0x41cac560: 41cac590 0003 > 0x41cac570: 2aac9289cf50 2aadab2015a8 > 0x41cac580: 41cac5c0 2b4e0f72e388 > 0x41cac590: 41cac5c0 2aac9289cf40 > 0x41cac5a0: 0005 2b4e0fc86330 > 0x41cac5b0: 2b4e0fd8c740 > 0x41cac5c0: 41cac5f0 2b4e0f903b7f > 0x41cac5d0: 41cac610 0003 > 0x41cac5e0: 2aaccb1750f8 2aaccea41570 > 0x41cac5f0: 41cac610 2b4e0f931548 > 0x41cac600: 2b4e0fc861d8 2aadd4052ab0 > 0x41cac610: 41cac640 2b4e0f903d1a > 0x41cac620: 41cac650 0003 > 0x41cac630: 5bc7d6d0 2b4e0fd8c740 > 0x41cac640: 41cac650 2b4e0f90411c > 0x41cac650: 41cac680 2b4e0fa1d16e > 0x41cac660: 5bc7d6d0 > 0x41cac670: 0002 2b4e0fd8c740 > 0x41cac680: 41cac6c0 2b4e0fa74640 > 0x41cac690: 41cac6b0 5bc7d6d0 > 0x41cac6a0: 0002 2b4e0fd8c740 > 0x41cac6b0: 0001 2b4e0fd8c740 > 0x41cac6c0: 41cac700 2b4e0f9a52da > 0x41cac6d0: bfc0 > 0x41cac6e0: 2b4e0fd8c740 5bc7d6d0 > 0x41cac6f0: 2b4e0fd8c740 0001 > 0x41cac700: 41cac750 2b4e0f6feb80 > 
0x41cac710: 449dae1d9ae42358 3ff0cccd > 0x41cac720: 2aad289aa680 0001 > 0x41cac730: 41cac780 > 0x41cac740: 0001 5bc7d6d0 > > Instructions: (pc=0x2b4e0f69ea2a) > 0x2b4e0f69ea1a: 89 e5 48 83 f9 05 74 38 48 8b 56 08 48 83 c2 10 > 0x2b4e0f69ea2a: 48 8b b2 a0 00 00 00 ba 01 00 00 00 83 e6 07 48 > > Stack: [0x41bad000,0x41cae000], sp=0x41cac550, > free space=1021k
Re: Solr and Garbage Collection
Ok. After the server ran for more than 12 hours, the time spent on GC decreased from 11% to 3,4%, but 5 hours later it crashed. This is the thread dump, maybe you can help identify what happened? # # An unexpected error has been detected by Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x2b4e0f69ea2a, pid=32224, tid=1103812928 # # Java VM: Java HotSpot(TM) 64-Bit Server VM (10.0-b22 mixed mode linux-amd64) # Problematic frame: # V [libjvm.so+0x265a2a] # # If you would like to submit a bug report, please visit: # http://java.sun.com/webapps/bugreport/crash.jsp # --- T H R E A D --- Current thread (0x5be47400): VMThread [stack: 0x41bad000,0x41cae000] [id=32249] siginfo:si_signo=SIGSEGV: si_errno=0, si_code=128 (), si_addr=0x Registers: RAX=0x2aac929b4c70, RBX=0x0037c985003a095e, RCX=0x0006, RDX=0x005c49870037c996 RSP=0x41cac550, RBP=0x41cac550, RSI=0x2aac929b4c70, RDI=0x0037c985003a095e R8 =0x2aadab201538, R9 =0x0005, R10=0x0001, R11=0x0010 R12=0x2aac929b4c70, R13=0x2aac9289cf58, R14=0x2aac9289cf40, R15=0x2aadab2015ac RIP=0x2b4e0f69ea2a, EFL=0x00010206, CSGSFS=0x0033, ERR=0x TRAPNO=0x000d Top of Stack: (sp=0x41cac550) 0x41cac550: 41cac580 2b4e0f903c5b 0x41cac560: 41cac590 0003 0x41cac570: 2aac9289cf50 2aadab2015a8 0x41cac580: 41cac5c0 2b4e0f72e388 0x41cac590: 41cac5c0 2aac9289cf40 0x41cac5a0: 0005 2b4e0fc86330 0x41cac5b0: 2b4e0fd8c740 0x41cac5c0: 41cac5f0 2b4e0f903b7f 0x41cac5d0: 41cac610 0003 0x41cac5e0: 2aaccb1750f8 2aaccea41570 0x41cac5f0: 41cac610 2b4e0f931548 0x41cac600: 2b4e0fc861d8 2aadd4052ab0 0x41cac610: 41cac640 2b4e0f903d1a 0x41cac620: 41cac650 0003 0x41cac630: 5bc7d6d0 2b4e0fd8c740 0x41cac640: 41cac650 2b4e0f90411c 0x41cac650: 41cac680 2b4e0fa1d16e 0x41cac660: 5bc7d6d0 0x41cac670: 0002 2b4e0fd8c740 0x41cac680: 41cac6c0 2b4e0fa74640 0x41cac690: 41cac6b0 5bc7d6d0 0x41cac6a0: 0002 2b4e0fd8c740 0x41cac6b0: 0001 2b4e0fd8c740 0x41cac6c0: 41cac700 2b4e0f9a52da 0x41cac6d0: bfc0 0x41cac6e0: 2b4e0fd8c740 5bc7d6d0 0x41cac6f0: 2b4e0fd8c740 0001 0x41cac700: 
41cac750 2b4e0f6feb80 0x41cac710: 449dae1d9ae42358 3ff0cccd 0x41cac720: 2aad289aa680 0001 0x41cac730: 41cac780 0x41cac740: 0001 5bc7d6d0 Instructions: (pc=0x2b4e0f69ea2a) 0x2b4e0f69ea1a: 89 e5 48 83 f9 05 74 38 48 8b 56 08 48 83 c2 10 0x2b4e0f69ea2a: 48 8b b2 a0 00 00 00 ba 01 00 00 00 83 e6 07 48 Stack: [0x41bad000,0x41cae000], sp=0x41cac550, free space=1021k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x265a2a] V [libjvm.so+0x4cac5b] V [libjvm.so+0x2f5388] V [libjvm.so+0x4cab7f] V [libjvm.so+0x4f8548] V [libjvm.so+0x4cad1a] V [libjvm.so+0x4cb11c] V [libjvm.so+0x5e416e] V [libjvm.so+0x63b640] V [libjvm.so+0x56c2da] V [libjvm.so+0x2c5b80] V [libjvm.so+0x2c8866] V [libjvm.so+0x2c7f10] V [libjvm.so+0x2551ba] V [libjvm.so+0x254a6a] V [libjvm.so+0x254778] V [libjvm.so+0x2c579c] V [libjvm.so+0x23502a] V [libjvm.so+0x2c5b0e] V [libjvm.so+0x661a5e] V [libjvm.so+0x66e48a] V [libjvm.so+0x66da32] V [libjvm.so+0x66dcb4] V [libjvm.so+0x66d7ae] V [libjvm.so+0x50628a] VM_Operation (0x4076bd20): GenCollectForAllocation, mode: safepoint, requested by thread 0x5c42d800 --- P R O C E S S --- Java Threads: ( => current thread ) 0x5c466400 JavaThread "btpool0-502" [_thread_blocked, id=4508, stack(0x46332000,0x46433000)] 0x5c2a2400 JavaThread "btpool0-501" [_thread_blocked, id=4507, stack(0x428f8000,0x429f9000)] 0x5c0fec00 JavaThread "btpool0-500" [_thread_blocked, id=4506, stack(0x43e0d000,0x43f0e000)] 0x5c2ce400 JavaThread "btpool0-498" [_thread_blocked, id=4504, stack(0x42dfd000,0x42efe000)] 0x5be69000 JavaThread "btpool0-497" [_thread_blocked, id=4503, stack(0x45f2e000,0x4602f000)] 0x5c30e000 JavaThread "btpool0-496" [_thread_blocked, id=4251, stack(0x0
Re: Solr and Garbage Collection
Jonathan Ariel wrote: > I have around 8M documents. > That's actually not so bad - I take it you are faceting/sorting on quite a few unique fields? > I set up my server to use a different collector and it seems like it > decreased from 11% to 4%, of course I need to wait a bit more because it is > just a 1 hour old log. But it seems like it is much better now. > I will tell you on Monday the results :) > Are you still seeing major collections then? (e.g. the tenured space hits its limit) You might be able to get even better. > On Fri, Sep 25, 2009 at 6:07 PM, Mark Miller wrote: > > >> Thats a good point too - if you can reduce your need for such a large >> heap, by all means, do so. >> >> However, considering you already need at least 10GB or you get OOM, you >> have a long way to go with that approach. Good luck :) >> >> How many docs do you have ? I'm guessing its mostly FieldCache type >> stuff, and thats the type of thing you can't really side step, unless >> you give up the functionality thats using it. >> >> Grant Ingersoll wrote: >> >>> On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote: >>> >>> Hi to all! Lately my solr servers seem to stop responding once in a while. I'm using solr 1.3. Of course I'm having more traffic on the servers. So I logged the Garbage Collection activity to check if it's because of that. It seems like 11% of the time the application runs, it is stopped because of GC. And some times the GC takes up to 10 seconds! Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon servers. My index is around 10GB and I'm giving to the instances 10GB of RAM. How can I check which is the GC that it is being used? If I'm right JVM Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have any recommendation on this? 
>>> As I said in Eteve's thread on JVM settings, some extra time spent on >>> application design/debugging will save a whole lot of headache in >>> Garbage Collection and trying to tune the gazillion different options >>> available. Ask yourself: What is on the heap and does it need to be >>> there? For instance, do you, if you have them, really need sortable >>> ints? If your servers seem to come to a stop, I'm going to bet you >>> have major collections going on. Major collections in a production >>> system are very bad. They tend to happen right after commits in >>> poorly tuned systems, but can also happen in other places if you let >>> things build up due to really large heaps and/or things like really >>> large cache settings. I would pull up jConsole and have a look at >>> what is happening when the pauses occur. Is it a major collection? >>> If so, then hook up a heap analyzer or a profiler and see what is on >>> the heap around those times. Then have a look at your schema/config, >>> etc. and see if there are things that are memory intensive (sorting, >>> faceting, excessively large filter caches). >>> >>> -- >>> Grant Ingersoll >>> http://www.lucidimagination.com/ >>> >>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) >>> using Solr/Lucene: >>> http://www.lucidimagination.com/search >>> >>> >> -- >> - Mark >> >> http://www.lucidimagination.com >> >> >> >> >> > > -- - Mark http://www.lucidimagination.com
RE: Solr and Garbage Collection
Sorry for OFF-topic: Create a dummy "Hello, World!" JSP, use Tomcat, execute load-stress simulator(s) from separate machine(s), and measure... don't forget to allocate the necessary thread pools in Tomcat (if you have to)... Although such a JSP doesn't use any memory, you will see how easily one can go with 5000 TPS (or 'virtually' 5 concurrent users) on modern quad-cores by simply allocating more memory (...GB) and more Tomcat threads. There is a threshold too... repeat it with HTTPD Workers (and threads), same result, although it doesn't use any GC. More memory - more threads - more "keep alives" per TCP... However, 'theoretically' you need only 64Mb for "Hello World" :)))
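A load-stress run like the one Fuad describes could be sketched with ApacheBench (assumed tool from the httpd distribution; the host, path, and counts are illustrative):

```shell
# 10,000 requests total, 100 in parallel, against the dummy JSP
ab -n 10000 -c 100 http://tomcat-host:8080/hello.jsp
```

Driving the server at this kind of concurrency, rather than with single-threaded requests, is what exposes the thread-pool and memory-doubling effects discussed in this thread.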
Re: Solr and Garbage Collection
I have around 8M documents. I set up my server to use a different collector and it seems like it decreased from 11% to 4%, of course I need to wait a bit more because it is just a 1 hour old log. But it seems like it is much better now. I will tell you on Monday the results :) On Fri, Sep 25, 2009 at 6:07 PM, Mark Miller wrote: > Thats a good point too - if you can reduce your need for such a large > heap, by all means, do so. > > However, considering you already need at least 10GB or you get OOM, you > have a long way to go with that approach. Good luck :) > > How many docs do you have ? I'm guessing its mostly FieldCache type > stuff, and thats the type of thing you can't really side step, unless > you give up the functionality thats using it. > > Grant Ingersoll wrote: > > > > On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote: > > > >> Hi to all! > >> Lately my solr servers seem to stop responding once in a while. I'm > >> using > >> solr 1.3. > >> Of course I'm having more traffic on the servers. > >> So I logged the Garbage Collection activity to check if it's because of > >> that. It seems like 11% of the time the application runs, it is stopped > >> because of GC. And some times the GC takes up to 10 seconds! > >> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon > >> servers. My index is around 10GB and I'm giving to the instances 10GB of > >> RAM. > >> > >> How can I check which is the GC that it is being used? If I'm right JVM > >> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do > >> you have > >> any recommendation on this? > > > > > > As I said in Eteve's thread on JVM settings, some extra time spent on > > application design/debugging will save a whole lot of headache in > > Garbage Collection and trying to tune the gazillion different options > > available. Ask yourself: What is on the heap and does it need to be > > there? For instance, do you, if you have them, really need sortable > > ints? 
If your servers seem to come to a stop, I'm going to bet you > > have major collections going on. Major collections in a production > > system are very bad. They tend to happen right after commits in > > poorly tuned systems, but can also happen in other places if you let > > things build up due to really large heaps and/or things like really > > large cache settings. I would pull up jConsole and have a look at > > what is happening when the pauses occur. Is it a major collection? > > If so, then hook up a heap analyzer or a profiler and see what is on > > the heap around those times. Then have a look at your schema/config, > > etc. and see if there are things that are memory intensive (sorting, > > faceting, excessively large filter caches). > > > > -- > > Grant Ingersoll > > http://www.lucidimagination.com/ > > > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > > using Solr/Lucene: > > http://www.lucidimagination.com/search > > > > > -- > - Mark > > http://www.lucidimagination.com > > > >
Re: Solr and Garbage Collection
One more point and I'll stop - I've hit my email quota for the day ;) While it's a pain to have to juggle GC params and tune - when you require a heap that's more than a gig or two, I personally believe it's essential to do so for good performance. The (default settings / ergonomics with throughput) just don't cut it. Sad fact of life :) Luckily, you don't generally have to do that much to get things nice - the number of options is not that staggering, and you don't usually need to get into most of them. Choosing the right collector, and tweaking a setting or two can often be enough. The most important thing to do with a large heap and the throughput collector is to turn on parallel tenured collection. I've said it before, but it really is key. At least if you have more than a processor or two - which, for your sake, I hope you do :) - Mark Mark Miller wrote: > Thats a good point too - if you can reduce your need for such a large > heap, by all means, do so. > > However, considering you already need at least 10GB or you get OOM, you > have a long way to go with that approach. Good luck :) > > How many docs do you have ? I'm guessing its mostly FieldCache type > stuff, and thats the type of thing you can't really side step, unless > you give up the functionality thats using it. > > Grant Ingersoll wrote: > >> On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote: >> >> >>> Hi to all! >>> Lately my solr servers seem to stop responding once in a while. I'm >>> using >>> solr 1.3. >>> Of course I'm having more traffic on the servers. >>> So I logged the Garbage Collection activity to check if it's because of >>> that. It seems like 11% of the time the application runs, it is stopped >>> because of GC. And some times the GC takes up to 10 seconds! >>> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon >>> servers. My index is around 10GB and I'm giving to the instances 10GB of >>> RAM. >>> >>> How can I check which is the GC that it is being used? 
If I'm right JVM >>> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do >>> you have >>> any recommendation on this? >>> >> As I said in Eteve's thread on JVM settings, some extra time spent on >> application design/debugging will save a whole lot of headache in >> Garbage Collection and trying to tune the gazillion different options >> available. Ask yourself: What is on the heap and does it need to be >> there? For instance, do you, if you have them, really need sortable >> ints? If your servers seem to come to a stop, I'm going to bet you >> have major collections going on. Major collections in a production >> system are very bad. They tend to happen right after commits in >> poorly tuned systems, but can also happen in other places if you let >> things build up due to really large heaps and/or things like really >> large cache settings. I would pull up jConsole and have a look at >> what is happening when the pauses occur. Is it a major collection? >> If so, then hook up a heap analyzer or a profiler and see what is on >> the heap around those times. Then have a look at your schema/config, >> etc. and see if there are things that are memory intensive (sorting, >> faceting, excessively large filter caches). >> >> -- >> Grant Ingersoll >> http://www.lucidimagination.com/ >> >> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) >> using Solr/Lucene: >> http://www.lucidimagination.com/search >> >> > > > -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
That's a good point too - if you can reduce your need for such a large heap, by all means, do so. However, considering you already need at least 10GB or you get OOM, you have a long way to go with that approach. Good luck :) How many docs do you have? I'm guessing it's mostly FieldCache type stuff, and that's the type of thing you can't really side step, unless you give up the functionality that's using it. Grant Ingersoll wrote: > > On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote: > >> Hi to all! >> Lately my solr servers seem to stop responding once in a while. I'm >> using >> solr 1.3. >> Of course I'm having more traffic on the servers. >> So I logged the Garbage Collection activity to check if it's because of >> that. It seems like 11% of the time the application runs, it is stopped >> because of GC. And some times the GC takes up to 10 seconds! >> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon >> servers. My index is around 10GB and I'm giving to the instances 10GB of >> RAM. >> >> How can I check which is the GC that it is being used? If I'm right JVM >> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do >> you have >> any recommendation on this? > > > As I said in Eteve's thread on JVM settings, some extra time spent on > application design/debugging will save a whole lot of headache in > Garbage Collection and trying to tune the gazillion different options > available. Ask yourself: What is on the heap and does it need to be > there? For instance, do you, if you have them, really need sortable > ints? If your servers seem to come to a stop, I'm going to bet you > have major collections going on. Major collections in a production > system are very bad. They tend to happen right after commits in > poorly tuned systems, but can also happen in other places if you let > things build up due to really large heaps and/or things like really > large cache settings. 
I would pull up jConsole and have a look at > what is happening when the pauses occur. Is it a major collection? > If so, then hook up a heap analyzer or a profiler and see what is on > the heap around those times. Then have a look at your schema/config, > etc. and see if there are things that are memory intensive (sorting, > faceting, excessively large filter caches). > > -- > Grant Ingersoll > http://www.lucidimagination.com/ > > Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) > using Solr/Lucene: > http://www.lucidimagination.com/search > -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
Mark Miller wrote: > Jonathan Ariel wrote: > >> How can I check which is the GC that it is being used? If I'm right JVM >> Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have >> any recommendation on this? >> >> >> > Just to straighten out this one too - Ergonomics doesn't use throughput > - throughput is the collector that allows Ergonomics ;) > > And throughput is the default as long as your machine is detected as > server class. > > But throughput is not great with large tenured spaces out of the box. It > only parallelizes the new space collection. You have to turn on an > option to get parallel tenured collection as well - which is essential > to scale to large heap sizes. > > hmm - I'm not being totally accurate there - ergonomics is what detects server and so makes throughput the default collector for a server machine. But much of the GC ergonomics support only works with the throughput collector. Kind of chicken and egg :) -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
Jonathan Ariel wrote: > How can I check which is the GC that it is being used? If I'm right JVM > Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have > any recommendation on this? > > Just to straighten out this one too - Ergonomics doesn't use throughput - throughput is the collector that allows Ergonomics ;) And throughput is the default as long as your machine is detected as server class. But throughput is not great with large tenured spaces out of the box. It only parallelizes the new space collection. You have to turn on an option to get parallel tenured collection as well - which is essential to scale to large heap sizes. -- - Mark http://www.lucidimagination.com
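To check which collector ergonomics actually picked, and to enable the parallel tenured collection Mark describes, the flags look like this on a Sun JVM of that era (a sketch; the heap size and the Jetty `start.jar` launch are illustrative):

```shell
# Print the flags ergonomics chose for this machine
# (on server-class hardware, look for -XX:+UseParallelGC):
java -XX:+PrintCommandLineFlags -version

# Throughput collector with the tenured space also collected in parallel:
java -Xmx10g -XX:+UseParallelGC -XX:+UseParallelOldGC -jar start.jar
```

Without `-XX:+UseParallelOldGC`, only young-generation collections run on multiple threads; full collections of a 10GB tenured space would still be single-threaded.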
Re: Solr and Garbage Collection
On Sep 25, 2009, at 9:30 AM, Jonathan Ariel wrote: Hi to all! Lately my solr servers seem to stop responding once in a while. I'm using solr 1.3. Of course I'm having more traffic on the servers. So I logged the Garbage Collection activity to check if it's because of that. It seems like 11% of the time the application runs, it is stopped because of GC. And sometimes the GC takes up to 10 seconds! Is this normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon servers. My index is around 10GB and I'm giving the instances 10GB of RAM. How can I check which GC is being used? If I'm right, JVM Ergonomics should use the Throughput GC, but I'm not 100% sure. Do you have any recommendation on this? As I said in Eteve's thread on JVM settings, some extra time spent on application design/debugging will save a whole lot of headache in Garbage Collection and trying to tune the gazillion different options available. Ask yourself: What is on the heap and does it need to be there? For instance, do you, if you have them, really need sortable ints? If your servers seem to come to a stop, I'm going to bet you have major collections going on. Major collections in a production system are very bad. They tend to happen right after commits in poorly tuned systems, but can also happen in other places if you let things build up due to really large heaps and/or things like really large cache settings. I would pull up jConsole and have a look at what is happening when the pauses occur. Is it a major collection? If so, then hook up a heap analyzer or a profiler and see what is on the heap around those times. Then have a look at your schema/config, etc. and see if there are things that are memory intensive (sorting, faceting, excessively large filter caches). -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
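Besides jConsole, the JDK's command-line tools can answer Grant's "what is on the heap" question (standard JDK tools; `<pid>` stands for the Solr JVM's process id):

```shell
# Generation occupancy (incl. both survivor spaces) and cumulative GC time,
# sampled every 5 seconds:
jstat -gcutil <pid> 5000

# Classes using the most heap; note that :live forces a full GC first:
jmap -histo:live <pid> | head -20
```

If the `jstat` output shows the old-generation column climbing toward 100% between pauses, the stalls Jonathan sees are major collections, matching Grant's guess.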
Re: Solr and Garbage Collection
Ok. I'll first change the GC and see if the time spent decreased. Than I'll try increasing the heap as Fuad recommends. On 9/25/09, Mark Miller wrote: > When we talk about Collectors, we are not just talking about > "collecting" - whatever that means. There isn't really a "collecting" > phase - the whole algorithm is garbage collecting - hence calling the > different implementations "collectors". > > Usually, fragmentation is dealt with using a mark-compact collector (or > IBM has used a mark-sweep-compact collector). > Copying collectors are not only super efficient at collecting young > spaces, but they are also great for fragmentation - when you copy > everything to the new space, you can remove any fragmentation. At the > cost of double the space requirements though. > > So mark-compact is a compromise. First you mark whats reachable, then > everything thats marked is copied/compacted to the bottom of the heap. > Its all part of a "collection" though. > > Jonathan Ariel wrote: >> Maybe what's missing here is how did I get the 11%.I just ran solr with >> the >> following JVM params: -XX:+PrintGCApplicationConcurrentTime >> -XX:+PrintGCApplicationStoppedTime with that I can measure the amount of >> time the application run between collection pauses and the length of the >> collection pauses, respectively. >> I think that in this case the 11% is just for memory collection and not >> defragmentation... but I'm not 100% sure. >> >> On Fri, Sep 25, 2009 at 5:05 PM, Fuad Efendi wrote: >> >> >>> But again, GC is not just "Garbage Collection" as many in this thread >>> think... it is also "memory defragmentation" which is much costly than >>> "collection" just because it needs move somewhere _live_objects_ (and >>> wait/lock till such objects get unlocked to be moved...) - obviously more >>> memory helps... >>> >>> 11% is extremely high. 
>>> -Fuad >>> http://www.linkedin.com/in/liferay >>> >>> -----Original Message----- >>> From: Jonathan Ariel [mailto:ionat...@gmail.com] >>> Sent: September-25-09 3:36 PM >>> To: solr-user@lucene.apache.org >>> Subject: Re: FW: Solr and Garbage Collection >>> >>> I'm not planning on lowering the heap. I just want to lower the time "wasted" on GC, which is 11% right now. So what I'll try is changing the GC to -XX:+UseConcMarkSweepGC >>> >>> On Fri, Sep 25, 2009 at 4:17 PM, Fuad Efendi wrote: > Mark, > > what if a piece of code needs 10 contiguous Kb to load a document field? How are locked memory pieces optimized/moved (putting almost the whole application on hold)? Lowering the heap is a _bad_ idea; we will have extremely frequent GC (optimize of live objects!!!) even if RAM is (theoretically) enough. > > -Fuad > >> Fuad, you didn't read the thread right. >> >> He is not having a problem with OOM. He got the OOM because he lowered the heap to try and help GC. >> >> He normally runs with a heap that can handle his FC. >> >> Please re-read the thread. You are confusing the thread. >> >> - Mark >> >>> GC will frequently happen even if RAM is more than enough: in case it is heavily sparse... so have even more RAM! >>> -Fuad > -- > - Mark > http://www.lucidimagination.com
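[Editor's note] Jonathan's 11% figure can be reproduced directly from the -XX:+PrintGCApplicationConcurrentTime / -XX:+PrintGCApplicationStoppedTime output he describes. A rough sketch - gc.log is a placeholder path, and the two line formats matched below are the Sun JDK 6 ones:

```shell
# Sum "Application time" (running) and "Total time for which application
# threads were stopped" (paused) lines, then report the pause share of wall time.
awk '
  /^Application time:/ { run += $3 }
  /^Total time for which application threads were stopped:/ { stop += $9 }
  END { printf "stopped %.1f%% of wall time\n", 100 * stop / (run + stop) }
' gc.log
```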
Re: Solr and Garbage Collection
This all applies to having more than one processor though - if you have one processor, then a non-concurrent collector can also make sense. But especially with the young space, you want concurrency - with up to 98% of objects being short lived, and multiple threads generally creating new objects, it's a huge boon to collect the young space concurrently. Mark Miller wrote: > Walter Underwood wrote: > >> For batch-oriented computing, like Hadoop, the most efficient GC is probably >> a non-concurrent, non-generational GC. >> > Okay - for batch we somewhat agree I guess - if you can stand any length > of pausing, non-concurrent can be nice, because you don't pay for thread > sync communication. Only with a small heap size though (less than 100MB > is what I've seen). You would pause the batch job while GC takes place. > If you have 8 processors, and you are pausing all of them to collect a > large heap using only 1 processor, that doesn't make much sense to me. > The thread communication pain will be far outweighed by using more > processors to do the collection faster, and not "stop the world" for > your batch job so long. Stopping your application dead in its tracks, > and then only using one of the available processors to collect a large > heap, while the rest sit idle, doesn't make much sense. > > I also don't agree it ever really makes sense not to do generational > collection. What is your argument here? Generational collection is > **way** more efficient for short-lived objects, which tend to be up to > 98% of the objects in most applications. The only way I see that making > sense is if you have almost no short-lived objects (which occurs in > what, .0001% of apps if at all?). The Sun JVM doesn't even offer a > non-generational approach anymore. It's just standard GC practice. > >> I doubt that there are many >> batch-oriented applications of Solr, though. >> >> The rest of the advice is intended to be general and it sounds like we agree >> about sizing.
If the nursery is not big enough, the tenured space will be >> used for allocations that have a short lifetime and that will increase the >> length and/or frequency of major collections. >> >> > Yes - I wasn't arguing with every point - I was picking and choosing :) > After the heap size, the size of the young generation is the most > important factor. > >> Cache evictions are the interesting part, because they cause a constant rate >> of tenured space garbage. In most many servers, you can get a big enough >> nursery that major collections are very rare. That won't happen in Solr >> because of cache evictions. >> >> The IBM JVM is excellent. Their concurrent generational GC policy is >> "gencon". >> >> > Yeah, I actually know very little about the IBM JVM, so I wasn't really > commenting. But from the info I gleaned here and on a couple quick web > searches, I'm not too impressed by it's GC. > >> wunder >> >> -Original Message- >> From: Mark Miller [mailto:markrmil...@gmail.com] >> Sent: Friday, September 25, 2009 10:31 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Solr and Garbage Collection >> >> My bad - later, it looks as if your giving general advice, and thats >> what I took issue with. >> >> Any Collector that is not doing generational collection is essentially >> from the dark ages and shouldn't be used. >> >> Any Collector that doesn't have concurrent options, unless possibly your >> running a tiny app (under 100MB of RAM), or only have a single CPU, is >> also dark ages, and not fit for a server environement. >> >> I havn't kept up with IBM's JVM, but it sounds like they are well behind >> Sun in GC then. >> >> - Mark >> >> Walter Underwood wrote: >> >> >>> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low >>> pause" collector is only in the Sun JVM. 
>>> >>> I just found this excellent article about the various IBM GC options for a >>> Lucene application with a 100GB heap: >>> >>> >>> >>> >> http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large >> >> >>> _h.html >>> >>> wunder >>> >>> -Original Message- >>> From: Mark Miller [mailto:markrmil...@gmail.com] >>> Sent: Friday, September 25, 2009 10:03 AM >>> To: solr-user@lucene.apache.org >>> Subject: Re: Solr and Garbage
Re: Solr and Garbage Collection
Walter Underwood wrote: > For batch-oriented computing, like Hadoop, the most efficient GC is probably > a non-concurrent, non-generational GC. Okay - for batch we somewhat agree I guess - if you can stand any length of pausing, non concurrent can be nice, because you don't pay for thread sync communication. Only with a small heap size though (less than 100MB is what I've seen). You would pause the batch job while GC takes place. If you have 8 processors, and you are pausing all of them to collect a large heap using only 1 processor, that doesn't make much sense to me. The thread communication pain will be far outweighed by using more processors to do the collection faster, and not "stop the world" for your batch job so long. Stopping your application dead in its tracks, and then only using one of the available processors to collect a large heap, while the rest sit idle, doesn't make much sense. I also don't agree it ever really makes sense not to do generational collection. What is your argument here? Generational collection is **way** more efficient for short lived objects, which tend to be up to 98% of the objects in most applications. The only way I see that making sense is if you have almost no short lived objects (which occurs in what, .0001% of apps if at all?). The Sun JVM doesn't even offer a non generational approach anymore. It's just standard GC practice. > I doubt that there are many > batch-oriented applications of Solr, though. > > The rest of the advice is intended to be general and it sounds like we agree > about sizing. If the nursery is not big enough, the tenured space will be > used for allocations that have a short lifetime and that will increase the > length and/or frequency of major collections. > Yes - I wasn't arguing with every point - I was picking and choosing :) After the heap size, the size of the young generation is the most important factor. 
> Cache evictions are the interesting part, because they cause a constant rate > of tenured space garbage. In most many servers, you can get a big enough > nursery that major collections are very rare. That won't happen in Solr > because of cache evictions. > > The IBM JVM is excellent. Their concurrent generational GC policy is > "gencon". > Yeah, I actually know very little about the IBM JVM, so I wasn't really commenting. But from the info I gleaned here and on a couple quick web searches, I'm not too impressed by it's GC. > wunder > > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Friday, September 25, 2009 10:31 AM > To: solr-user@lucene.apache.org > Subject: Re: Solr and Garbage Collection > > My bad - later, it looks as if your giving general advice, and thats > what I took issue with. > > Any Collector that is not doing generational collection is essentially > from the dark ages and shouldn't be used. > > Any Collector that doesn't have concurrent options, unless possibly your > running a tiny app (under 100MB of RAM), or only have a single CPU, is > also dark ages, and not fit for a server environement. > > I havn't kept up with IBM's JVM, but it sounds like they are well behind > Sun in GC then. > > - Mark > > Walter Underwood wrote: > >> As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low >> pause" collector is only in the Sun JVM. >> >> I just found this excellent article about the various IBM GC options for a >> Lucene application with a 100GB heap: >> >> >> > http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large > >> _h.html >> >> wunder >> >> -Original Message- >> From: Mark Miller [mailto:markrmil...@gmail.com] >> Sent: Friday, September 25, 2009 10:03 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Solr and Garbage Collection >> >> Walter Underwood wrote: >> >> >>> 30ms is not better or worse than 1s until you look at the service >>> requirements. 
For many applications, it is worth dedicating 10% of your >>> processing time to GC if that makes the worst-case pause short. >>> >>> On the other hand, my experience with the IBM JVM was that the maximum >>> >>> >> query >> >> >>> rate was 2-3X better with the concurrent generational GC compared to any >>> >>> >> of >> >> >>> their other GC algorithms, so we got the best throughput along with the >>> shortest pauses. >>> >>> >>> >> With which collector? Since
RE: Solr and Garbage Collection
For batch-oriented computing, like Hadoop, the most efficient GC is probably a non-concurrent, non-generational GC. I doubt that there are many batch-oriented applications of Solr, though. The rest of the advice is intended to be general and it sounds like we agree about sizing. If the nursery is not big enough, the tenured space will be used for allocations that have a short lifetime and that will increase the length and/or frequency of major collections. Cache evictions are the interesting part, because they cause a constant rate of tenured space garbage. In most many servers, you can get a big enough nursery that major collections are very rare. That won't happen in Solr because of cache evictions. The IBM JVM is excellent. Their concurrent generational GC policy is "gencon". wunder -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, September 25, 2009 10:31 AM To: solr-user@lucene.apache.org Subject: Re: Solr and Garbage Collection My bad - later, it looks as if your giving general advice, and thats what I took issue with. Any Collector that is not doing generational collection is essentially from the dark ages and shouldn't be used. Any Collector that doesn't have concurrent options, unless possibly your running a tiny app (under 100MB of RAM), or only have a single CPU, is also dark ages, and not fit for a server environement. I havn't kept up with IBM's JVM, but it sounds like they are well behind Sun in GC then. - Mark Walter Underwood wrote: > As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low > pause" collector is only in the Sun JVM. 
> > I just found this excellent article about the various IBM GC options for a > Lucene application with a 100GB heap: > > http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large > _h.html > > wunder > > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Friday, September 25, 2009 10:03 AM > To: solr-user@lucene.apache.org > Subject: Re: Solr and Garbage Collection > > Walter Underwood wrote: > >> 30ms is not better or worse than 1s until you look at the service >> requirements. For many applications, it is worth dedicating 10% of your >> processing time to GC if that makes the worst-case pause short. >> >> On the other hand, my experience with the IBM JVM was that the maximum >> > query > >> rate was 2-3X better with the concurrent generational GC compared to any >> > of > >> their other GC algorithms, so we got the best throughput along with the >> shortest pauses. >> >> > With which collector? Since the very early JVM's, all GC is generational. > Most of the collectors (other than the Serial Collector) also work > concurrently. > By default, they are concurrent on different generations, but you can > add concurrency > to the "other" generation with each now too. > >> Solr garbage generation (for queries) seems to have two major components: >> per-request garbage and cache evictions. With a generational collector, >> these two are handled by separate parts of the collector. >> > Different parts of the collector? Its a different collector depending on > the generation. > The young generation is collected with a copy collector. This is because > almost all the objects > in the young generation are likely dead, and a copy collector only needs > to visit live objects. So > its very efficient. The tenured generation uses something more along the > lines of mark and sweep or mark > and compact. 
> >> Per-request >> garbage should completely fit in the short-term heap (nursery), so that it >> can be collected rapidly and returned to use for further requests. If the >> nursery is too small, the per-request allocations will be made in tenured >> space and sit there until the next major GC. Cache evictions are almost >> always in long-term storage (tenured space) because an LRU algorithm >> guarantees that the garbage will be old. >> >> Check the growth rate of tenured space (under constant load, of course) >> while increasing the size of the nursery. That rate should drop when the >> nursery gets big enough, then not drop much further as it is increased >> > more. > >> After that, reduce the size of tenured space until major GCs start >> > happening > >> "too often" (a judgment call). A bigger tenured space means longer major >> > GCs > >> and thus longer pauses, so you don't want it oversized by too much. >> >> > With the concurrent low pause coll
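[Editor's note] Walter's tuning loop - grow the nursery while watching how fast tenured space fills - maps onto the HotSpot -Xmn flag plus jstat sampling. A sketch under assumed names (heap sizes, jar name, and the use of $! for the pid are illustrative only):

```shell
# Fix the total heap, then vary the nursery size (-Xmn) between runs while
# watching old-generation growth under constant load.
java -Xms4g -Xmx4g -Xmn512m -verbose:gc -XX:+PrintGCDetails -jar start.jar &

# Sample generation occupancy (%) once per second; the O column shows
# tenured-space utilization - its growth rate is what Walter says to watch.
jstat -gcutil $! 1s
```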
Re: Solr and Garbage Collection
My bad - later, it looks as if you're giving general advice, and that's what I took issue with. Any Collector that is not doing generational collection is essentially from the dark ages and shouldn't be used. Any Collector that doesn't have concurrent options, unless possibly you're running a tiny app (under 100MB of RAM), or have only a single CPU, is also dark ages, and not fit for a server environment. I haven't kept up with IBM's JVM, but it sounds like they are well behind Sun in GC then. - Mark Walter Underwood wrote: > As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low > pause" collector is only in the Sun JVM. > > I just found this excellent article about the various IBM GC options for a > Lucene application with a 100GB heap: > > http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large > _h.html > > wunder > > -----Original Message----- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Friday, September 25, 2009 10:03 AM > To: solr-user@lucene.apache.org > Subject: Re: Solr and Garbage Collection > > Walter Underwood wrote: > >> 30ms is not better or worse than 1s until you look at the service >> requirements. For many applications, it is worth dedicating 10% of your >> processing time to GC if that makes the worst-case pause short. >> >> On the other hand, my experience with the IBM JVM was that the maximum >> query rate was 2-3X better with the concurrent generational GC compared to >> any of their other GC algorithms, so we got the best throughput along with >> the shortest pauses. >> > With which collector? Since the very early JVM's, all GC is generational. > Most of the collectors (other than the Serial Collector) also work > concurrently. > By default, they are concurrent on different generations, but you can > add concurrency to the "other" generation with each now too. > >> Solr garbage generation (for queries) seems to have two major components: >> per-request garbage and cache evictions.
With a generational collector, >> these two are handled by separate parts of the collector. >> > Different parts of the collector? Its a different collector depending on > the generation. > The young generation is collected with a copy collector. This is because > almost all the objects > in the young generation are likely dead, and a copy collector only needs > to visit live objects. So > its very efficient. The tenured generation uses something more along the > lines of mark and sweep or mark > and compact. > >> Per-request >> garbage should completely fit in the short-term heap (nursery), so that it >> can be collected rapidly and returned to use for further requests. If the >> nursery is too small, the per-request allocations will be made in tenured >> space and sit there until the next major GC. Cache evictions are almost >> always in long-term storage (tenured space) because an LRU algorithm >> guarantees that the garbage will be old. >> >> Check the growth rate of tenured space (under constant load, of course) >> while increasing the size of the nursery. That rate should drop when the >> nursery gets big enough, then not drop much further as it is increased >> > more. > >> After that, reduce the size of tenured space until major GCs start >> > happening > >> "too often" (a judgment call). A bigger tenured space means longer major >> > GCs > >> and thus longer pauses, so you don't want it oversized by too much. >> >> > With the concurrent low pause collector, the goal is to avoid "major" > collections, > by collecting *before* the tenured space is filled. If you you are > getting "major" collections, > you need to tune your settings - the whole point of that collector is to > avoid "major" > collections, and do almost all of the work while your application is not > paused. There are > still 2 brief pauses during the collection, but they should not be > significant at all. > >> Also check the hit rates of your caches. 
If the hit rate is low, say 20% >> > or > >> less, make that cache much bigger or set it to zero. Either one will >> > reduce > >> the number of cache evictions. If you have an HTTP cache in front of Solr, >> zero may be the right choice, since the HTTP cache is cherry-picking the >> easily cacheable requests. >> >> Note that a commit nearly doubles the memory required, because you have >>
Re: Solr and Garbage Collection
Ok. I will try with the "concurrent low pause" collector and let you know the results. On Fri, Sep 25, 2009 at 2:23 PM, Walter Underwood wrote: > As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low > pause" collector is only in the Sun JVM. > > I just found this excellent article about the various IBM GC options for a > Lucene application with a 100GB heap: > > > http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large > _h.html > > wunder > > -Original Message- > From: Mark Miller [mailto:markrmil...@gmail.com] > Sent: Friday, September 25, 2009 10:03 AM > To: solr-user@lucene.apache.org > Subject: Re: Solr and Garbage Collection > > Walter Underwood wrote: > > 30ms is not better or worse than 1s until you look at the service > > requirements. For many applications, it is worth dedicating 10% of your > > processing time to GC if that makes the worst-case pause short. > > > > On the other hand, my experience with the IBM JVM was that the maximum > query > > rate was 2-3X better with the concurrent generational GC compared to any > of > > their other GC algorithms, so we got the best throughput along with the > > shortest pauses. > > > With which collector? Since the very early JVM's, all GC is generational. > Most of the collectors (other than the Serial Collector) also work > concurrently. > By default, they are concurrent on different generations, but you can > add concurrency > to the "other" generation with each now too. > > Solr garbage generation (for queries) seems to have two major components: > > per-request garbage and cache evictions. With a generational collector, > > these two are handled by separate parts of the collector. > Different parts of the collector? Its a different collector depending on > the generation. > The young generation is collected with a copy collector. This is because > almost all the objects > in the young generation are likely dead, and a copy collector only needs > to visit live objects. 
So > its very efficient. The tenured generation uses something more along the > lines of mark and sweep or mark > and compact. > > Per-request > > garbage should completely fit in the short-term heap (nursery), so that > it > > can be collected rapidly and returned to use for further requests. If the > > nursery is too small, the per-request allocations will be made in tenured > > space and sit there until the next major GC. Cache evictions are almost > > always in long-term storage (tenured space) because an LRU algorithm > > guarantees that the garbage will be old. > > > > Check the growth rate of tenured space (under constant load, of course) > > while increasing the size of the nursery. That rate should drop when the > > nursery gets big enough, then not drop much further as it is increased > more. > > > > After that, reduce the size of tenured space until major GCs start > happening > > "too often" (a judgment call). A bigger tenured space means longer major > GCs > > and thus longer pauses, so you don't want it oversized by too much. > > > With the concurrent low pause collector, the goal is to avoid "major" > collections, > by collecting *before* the tenured space is filled. If you you are > getting "major" collections, > you need to tune your settings - the whole point of that collector is to > avoid "major" > collections, and do almost all of the work while your application is not > paused. There are > still 2 brief pauses during the collection, but they should not be > significant at all. > > Also check the hit rates of your caches. If the hit rate is low, say 20% > or > > less, make that cache much bigger or set it to zero. Either one will > reduce > > the number of cache evictions. If you have an HTTP cache in front of > Solr, > > zero may be the right choice, since the HTTP cache is cherry-picking the > > easily cacheable requests. 
> > > > Note that a commit nearly doubles the memory required, because you have > two > > live Searcher objects with all their caches. Make sure you have headroom > for > > a commit. > > > > If you want to test the tenured space usage, you must test with real > world > > queries. Those are the only way to get accurate cache eviction rates. > > > > wunder > > > > -Original Message- > > From: Jonathan Ariel [mailto:ionat...@gmail.com] > > Sent: Friday, September 25, 2009 9:34 AM > > To: solr-user@lucene.apache.org > > Subject: Re: Solr and Garbage Collection &
RE: Solr and Garbage Collection
As I said, I was using the IBM JVM, not the Sun JVM. The "concurrent low pause" collector is only in the Sun JVM. I just found this excellent article about the various IBM GC options for a Lucene application with a 100GB heap: http://www.nearinfinity.com/blogs/aaron_mccurry/tuning_the_ibm_jvm_for_large _h.html wunder -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, September 25, 2009 10:03 AM To: solr-user@lucene.apache.org Subject: Re: Solr and Garbage Collection Walter Underwood wrote: > 30ms is not better or worse than 1s until you look at the service > requirements. For many applications, it is worth dedicating 10% of your > processing time to GC if that makes the worst-case pause short. > > On the other hand, my experience with the IBM JVM was that the maximum query > rate was 2-3X better with the concurrent generational GC compared to any of > their other GC algorithms, so we got the best throughput along with the > shortest pauses. > With which collector? Since the very early JVM's, all GC is generational. Most of the collectors (other than the Serial Collector) also work concurrently. By default, they are concurrent on different generations, but you can add concurrency to the "other" generation with each now too. > Solr garbage generation (for queries) seems to have two major components: > per-request garbage and cache evictions. With a generational collector, > these two are handled by separate parts of the collector. Different parts of the collector? Its a different collector depending on the generation. The young generation is collected with a copy collector. This is because almost all the objects in the young generation are likely dead, and a copy collector only needs to visit live objects. So its very efficient. The tenured generation uses something more along the lines of mark and sweep or mark and compact. 
> Per-request > garbage should completely fit in the short-term heap (nursery), so that it > can be collected rapidly and returned to use for further requests. If the > nursery is too small, the per-request allocations will be made in tenured > space and sit there until the next major GC. Cache evictions are almost > always in long-term storage (tenured space) because an LRU algorithm > guarantees that the garbage will be old. > > Check the growth rate of tenured space (under constant load, of course) > while increasing the size of the nursery. That rate should drop when the > nursery gets big enough, then not drop much further as it is increased more. > > After that, reduce the size of tenured space until major GCs start happening > "too often" (a judgment call). A bigger tenured space means longer major GCs > and thus longer pauses, so you don't want it oversized by too much. > With the concurrent low pause collector, the goal is to avoid "major" collections, by collecting *before* the tenured space is filled. If you you are getting "major" collections, you need to tune your settings - the whole point of that collector is to avoid "major" collections, and do almost all of the work while your application is not paused. There are still 2 brief pauses during the collection, but they should not be significant at all. > Also check the hit rates of your caches. If the hit rate is low, say 20% or > less, make that cache much bigger or set it to zero. Either one will reduce > the number of cache evictions. If you have an HTTP cache in front of Solr, > zero may be the right choice, since the HTTP cache is cherry-picking the > easily cacheable requests. > > Note that a commit nearly doubles the memory required, because you have two > live Searcher objects with all their caches. Make sure you have headroom for > a commit. > > If you want to test the tenured space usage, you must test with real world > queries. Those are the only way to get accurate cache eviction rates. 
> > wunder > > -Original Message- > From: Jonathan Ariel [mailto:ionat...@gmail.com] > Sent: Friday, September 25, 2009 9:34 AM > To: solr-user@lucene.apache.org > Subject: Re: Solr and Garbage Collection > > BTW why making them equal will lower the frequency of GC? > > On 9/25/09, Fuad Efendi wrote: > >>> Bigger heaps lead to bigger GC pauses in general. >>> >> Opposite viewpoint: >> 1sec GC happening once an hour is MUCH BETTER than 30ms GC >> > once-per-second. > >> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!) >> >> Use -server option. >> >> -server option of JVM is 'native CPU code', I remember WebLogic 7 console >> with SUN JVM 1.3 not showing any GC (just horizontal line). >> >> -Fuad >> http://www.linkedin.com/in/liferay >> >> >> >> >> > > > -- - Mark http://www.lucidimagination.com
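[Editor's note] The Sun "concurrent low pause" collector discussed in this exchange is enabled with -XX:+UseConcMarkSweepGC. Combining it with Fuad's equal -Xms/-Xmx suggestion might look like the following - all sizes and the occupancy threshold are illustrative, not values anyone in the thread recommended:

```shell
# Equal -Xms/-Xmx avoids heap resizing; CMS collects the tenured generation
# concurrently while ParNew copies the young generation in parallel.
# CMSInitiatingOccupancyFraction starts CMS cycles early enough to avoid
# stop-the-world major collections, per Mark's "collect *before* the
# tenured space is filled" advice.
java -server -Xms4096m -Xmx4096m \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
     -jar start.jar
```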
Re: Solr and Garbage Collection
Walter Underwood wrote: > 30ms is not better or worse than 1s until you look at the service > requirements. For many applications, it is worth dedicating 10% of your > processing time to GC if that makes the worst-case pause short. > > On the other hand, my experience with the IBM JVM was that the maximum query > rate was 2-3X better with the concurrent generational GC compared to any of > their other GC algorithms, so we got the best throughput along with the > shortest pauses. > With which collector? Since the very early JVM's, all GC is generational. Most of the collectors (other than the Serial Collector) also work concurrently. By default, they are concurrent on different generations, but you can add concurrency to the "other" generation with each now too. > Solr garbage generation (for queries) seems to have two major components: > per-request garbage and cache evictions. With a generational collector, > these two are handled by separate parts of the collector. Different parts of the collector? Its a different collector depending on the generation. The young generation is collected with a copy collector. This is because almost all the objects in the young generation are likely dead, and a copy collector only needs to visit live objects. So its very efficient. The tenured generation uses something more along the lines of mark and sweep or mark and compact. > Per-request > garbage should completely fit in the short-term heap (nursery), so that it > can be collected rapidly and returned to use for further requests. If the > nursery is too small, the per-request allocations will be made in tenured > space and sit there until the next major GC. Cache evictions are almost > always in long-term storage (tenured space) because an LRU algorithm > guarantees that the garbage will be old. > > Check the growth rate of tenured space (under constant load, of course) > while increasing the size of the nursery. 
That rate should drop when the > nursery gets big enough, then not drop much further as it is increased more. > > After that, reduce the size of tenured space until major GCs start happening > "too often" (a judgment call). A bigger tenured space means longer major GCs > and thus longer pauses, so you don't want it oversized by too much. > With the concurrent low pause collector, the goal is to avoid "major" collections, by collecting *before* the tenured space is filled. If you are getting "major" collections, you need to tune your settings - the whole point of that collector is to avoid "major" collections, and do almost all of the work while your application is not paused. There are still 2 brief pauses during the collection, but they should not be significant at all. > Also check the hit rates of your caches. If the hit rate is low, say 20% or > less, make that cache much bigger or set it to zero. Either one will reduce > the number of cache evictions. If you have an HTTP cache in front of Solr, > zero may be the right choice, since the HTTP cache is cherry-picking the > easily cacheable requests. > > Note that a commit nearly doubles the memory required, because you have two > live Searcher objects with all their caches. Make sure you have headroom for > a commit. > > If you want to test the tenured space usage, you must test with real world > queries. Those are the only way to get accurate cache eviction rates. > > wunder > ... -- - Mark http://www.lucidimagination.com
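The generational behavior Mark describes can be observed directly from GC logs. A minimal sketch of the logging flags, assuming a Sun JVM 1.5/1.6 and the stock Jetty start.jar that ships with Solr 1.3 (heap size and log path are placeholders, not values from the thread):

```shell
# Log every collection, broken out by generation, so young (copy) collections
# and tenured collections show up separately. These are standard Sun HotSpot
# options of this era; adjust heap sizes to your own box.
java -server -Xms4096m -Xmx4096m \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:gc.log \
  -jar start.jar
```

With this in place you can apply Walter's methodology below: watch how fast tenured space grows under load while varying the nursery size.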
RE: Solr and Garbage Collection
30ms is not better or worse than 1s until you look at the service requirements. For many applications, it is worth dedicating 10% of your processing time to GC if that makes the worst-case pause short. On the other hand, my experience with the IBM JVM was that the maximum query rate was 2-3X better with the concurrent generational GC compared to any of their other GC algorithms, so we got the best throughput along with the shortest pauses. Solr garbage generation (for queries) seems to have two major components: per-request garbage and cache evictions. With a generational collector, these two are handled by separate parts of the collector. Per-request garbage should completely fit in the short-term heap (nursery), so that it can be collected rapidly and returned to use for further requests. If the nursery is too small, the per-request allocations will be made in tenured space and sit there until the next major GC. Cache evictions are almost always in long-term storage (tenured space) because an LRU algorithm guarantees that the garbage will be old. Check the growth rate of tenured space (under constant load, of course) while increasing the size of the nursery. That rate should drop when the nursery gets big enough, then not drop much further as it is increased more. After that, reduce the size of tenured space until major GCs start happening "too often" (a judgment call). A bigger tenured space means longer major GCs and thus longer pauses, so you don't want it oversized by too much. Also check the hit rates of your caches. If the hit rate is low, say 20% or less, make that cache much bigger or set it to zero. Either one will reduce the number of cache evictions. If you have an HTTP cache in front of Solr, zero may be the right choice, since the HTTP cache is cherry-picking the easily cacheable requests. Note that a commit nearly doubles the memory required, because you have two live Searcher objects with all their caches. Make sure you have headroom for a commit. 
If you want to test the tenured space usage, you must test with real world queries. Those are the only way to get accurate cache eviction rates. wunder -Original Message- From: Jonathan Ariel [mailto:ionat...@gmail.com] Sent: Friday, September 25, 2009 9:34 AM To: solr-user@lucene.apache.org Subject: Re: Solr and Garbage Collection BTW why making them equal will lower the frequency of GC? On 9/25/09, Fuad Efendi wrote: >> Bigger heaps lead to bigger GC pauses in general. > > Opposite viewpoint: > 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second. > > To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!) > > Use -server option. > > -server option of JVM is 'native CPU code', I remember WebLogic 7 console > with SUN JVM 1.3 not showing any GC (just horizontal line). > > -Fuad > http://www.linkedin.com/in/liferay > > > >
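Walter's "10% of processing time" figure can be computed from a -verbose:gc log. A rough sketch, assuming the classic HotSpot log format (e.g. "[GC 32768K->4096K(122880K), 0.0300000 secs]"); real formats vary by JVM vendor, version, and flags:

```python
import re

# Matches classic HotSpot -verbose:gc lines such as
#   [GC 32768K->4096K(122880K), 0.0300000 secs]
#   [Full GC 103424K->8192K(122880K), 1.2000000 secs]
PAUSE = re.compile(r"\[(?:Full )?GC .*?, ([0-9.]+) secs\]")

def gc_pause_seconds(log_lines):
    """Total stop-the-world pause time reported in the log lines."""
    total = 0.0
    for line in log_lines:
        m = PAUSE.search(line)
        if m:
            total += float(m.group(1))
    return total

def percent_in_gc(log_lines, wall_clock_seconds):
    """Fraction of wall-clock time spent paused, as a percentage."""
    return 100.0 * gc_pause_seconds(log_lines) / wall_clock_seconds
```

Feeding an hour of log output into percent_in_gc yields the kind of "11% of the time" number discussed elsewhere in this thread.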
Re: Solr and Garbage Collection
>-server option of JVM is 'native CPU code', I remember WebLogic 7 console >with SUN JVM 1.3 not showing any GC (just horizontal line). Not sure what that is all about either. -server and -client are just two different versions of hotspot. The -server version is optimized for long-running applications - it starts slower, and over time, it learns about your app and makes good throughput optimizations. The -client hotspot version starts up faster and concentrates more on response time than throughput. Better for desktop apps. -server is better for long-lived server apps. Generally. Mark Miller wrote: > It won't really - it will just keep the JVM from wasting time resizing > the heap on you. Since you know you need so much RAM anyway, no reason > not to just pin it at what you need. > Not going to help you much with GC though. > ... -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
It won't really - it will just keep the JVM from wasting time resizing the heap on you. Since you know you need so much RAM anyway, no reason not to just pin it at what you need. Not going to help you much with GC though. Jonathan Ariel wrote: > BTW why making them equal will lower the frequency of GC? > > On 9/25/09, Fuad Efendi wrote: > >>> Bigger heaps lead to bigger GC pauses in general. >>> >> Opposite viewpoint: >> 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second. >> >> To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!) >> >> Use -server option. >> >> -server option of JVM is 'native CPU code', I remember WebLogic 7 console >> with SUN JVM 1.3 not showing any GC (just horizontal line). >> >> -Fuad >> http://www.linkedin.com/in/liferay >> >> >> >> >> -- - Mark http://www.lucidimagination.com
Re: Solr and Garbage Collection
BTW why making them equal will lower the frequency of GC? On 9/25/09, Fuad Efendi wrote: >> Bigger heaps lead to bigger GC pauses in general. > > Opposite viewpoint: > 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second. > > To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!) > > Use -server option. > > -server option of JVM is 'native CPU code', I remember WebLogic 7 console > with SUN JVM 1.3 not showing any GC (just horizontal line). > > -Fuad > http://www.linkedin.com/in/liferay > > > >
Re: Solr and Garbage Collection
I can't really understand how increasing the heap will decrease the 11% dedicated to GC. On 9/25/09, Fuad Efendi wrote: >> You are saying that I should give more memory than 12GB? > > Yes. Look at this: > >> > SEVERE: java.lang.OutOfMemoryError: Java heap space >> > org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361) > > It can't find few (!!!) contiguous bytes for .createValue(...) > > It can't add (Field Value, Document ID) pair to an array. > > GC tuning won't help in this specific case... > > May be SOLR/Lucene core developers may WARM FieldCache at IndexReader > opening time, in the future... to have early OOM... > > Avoiding faceting (and sorting) on such field will only postpone OOM to > unpredictable date/time... > > -Fuad > http://www.linkedin.com/in/liferay
RE: Solr and Garbage Collection
I would look at the JVM. Have you tried switching to the concurrent low pause collector ? Colin. -Original Message- From: Jonathan Ariel [mailto:ionat...@gmail.com] Sent: Friday, September 25, 2009 12:07 PM To: solr-user@lucene.apache.org Subject: Re: Solr and Garbage Collection You are saying that I should give more memory than 12GB? When I was with 10GB I had the exceptions that I sent. Switching to 12GB made them disappear. So I think I don't have problems with FieldCache any more. What it seems like a problem is 11% on the application time dedicated to GC. Specially when those servers are under really heavy load. I think that's why I sometimes get queries that in one moment are being executed in a few ms and a moment after 20 seconds! It seems like I should tune my jvm, don't you think so? On Fri, Sep 25, 2009 at 1:01 PM, Fuad Efendi wrote: > Give it even more memory. > > Lucene FieldCache is used to store non-tokenized single-value non-boolean > (DocumentId -> FieldValue) pairs, and it is used (in-full!) for instance > for > sorting query results. > > So that if you have 100,000,000 documents with specific heavily distributed > field values (cardinality is high! Size is 100bytes!) you need > 10,000,000,000 bytes for just this instance of FieldCache. > > GC does not play any role. FieldCache won't be GC-collected. > > > -Fuad > http://www.linkedin.com/in/liferay > > > > > -Original Message- > > From: Jonathan Ariel [mailto:ionat...@gmail.com] > > Sent: September-25-09 11:37 AM > > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > > Subject: Re: Solr and Garbage Collection > > > > Right, now I'm giving it 12GB of heap memory. 
> > If I give it less (10GB) it throws the following exception: > > > > Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log > > SEVERE: java.lang.OutOfMemoryError: Java heap space > > at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361) > > ...
Re: Solr and Garbage Collection
Yes - more RAM is not a solution to your problem. Jonathan Ariel wrote: > You are saying that I should give more memory than 12GB? > When I was with 10GB I had the exceptions that I sent. Switching to 12GB > made them disappear. > So I think I don't have problems with FieldCache any more. What it seems > like a problem is 11% on the application time dedicated to GC. Specially > when those servers are under really heavy load. > I think that's why I sometimes get queries that in one moment are being > executed in a few ms and a moment after 20 seconds! > > It seems like I should tune my jvm, don't you think so? > > On Fri, Sep 25, 2009 at 1:01 PM, Fuad Efendi wrote: > > >> Give it even more memory. >> >> Lucene FieldCache is used to store non-tokenized single-value non-boolean >> (DocumentId -> FieldValue) pairs, and it is used (in-full!) for instance >> for >> sorting query results. >> >> So that if you have 100,000,000 documents with specific heavily distributed >> field values (cardinality is high! Size is 100bytes!) you need >> 10,000,000,000 bytes for just this instance of FieldCache. >> >> GC does not play any role. FieldCache won't be GC-collected. >> >> >> -Fuad >> http://www.linkedin.com/in/liferay >> >> >> >> >>> -----Original Message- >>> From: Jonathan Ariel [mailto:ionat...@gmail.com] >>> Sent: September-25-09 11:37 AM >>> To: solr-user@lucene.apache.org; yo...@lucidimagination.com >>> Subject: Re: Solr and Garbage Collection >>> >>> Right, now I'm giving it 12GB of heap memory. 
>>> If I give it less (10GB) it throws the following exception: >>> >>> Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log >>> SEVERE: java.lang.OutOfMemoryError: Java heap space >>> at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361) >>> ...
RE: Solr and Garbage Collection
> You are saying that I should give more memory than 12GB? Yes. Look at this: > > SEVERE: java.lang.OutOfMemoryError: Java heap space > org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361) It can't find a few (!!!) contiguous bytes for .createValue(...) It can't add a (Field Value, Document ID) pair to an array. GC tuning won't help in this specific case... Maybe SOLR/Lucene core developers may WARM FieldCache at IndexReader opening time, in the future... to have an early OOM... Avoiding faceting (and sorting) on such a field will only postpone OOM to an unpredictable date/time... -Fuad http://www.linkedin.com/in/liferay
Re: Solr and Garbage Collection
You are saying that I should give more memory than 12GB? When I was with 10GB I had the exceptions that I sent. Switching to 12GB made them disappear. So I think I don't have problems with FieldCache any more. What does seem like a problem is the 11% of application time dedicated to GC, especially when those servers are under really heavy load. I think that's why I sometimes get queries that in one moment are executed in a few ms and a moment later take 20 seconds! It seems like I should tune my JVM, don't you think so? On Fri, Sep 25, 2009 at 1:01 PM, Fuad Efendi wrote: > Give it even more memory. > > Lucene FieldCache is used to store non-tokenized single-value non-boolean > (DocumentId -> FieldValue) pairs, and it is used (in-full!) for instance > for > sorting query results. > > So that if you have 100,000,000 documents with specific heavily distributed > field values (cardinality is high! Size is 100bytes!) you need > 10,000,000,000 bytes for just this instance of FieldCache. > > GC does not play any role. FieldCache won't be GC-collected. > > > -Fuad > http://www.linkedin.com/in/liferay > > > > -Original Message- > > From: Jonathan Ariel [mailto:ionat...@gmail.com] > > Sent: September-25-09 11:37 AM > > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > > Subject: Re: Solr and Garbage Collection > > > > Right, now I'm giving it 12GB of heap memory. 
> > If I give it less (10GB) it throws the following exception: > > > > Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log > > SEVERE: java.lang.OutOfMemoryError: Java heap space > > at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361) > > ...
RE: Solr and Garbage Collection
Give it even more memory. Lucene FieldCache is used to store non-tokenized single-value non-boolean (DocumentId -> FieldValue) pairs, and it is used (in-full!) for instance for sorting query results. So if you have 100,000,000 documents with specific heavily distributed field values (cardinality is high! Size is 100 bytes!) you need 10,000,000,000 bytes for just this instance of FieldCache. GC does not play any role. FieldCache won't be GC-collected. -Fuad http://www.linkedin.com/in/liferay > -Original Message- > From: Jonathan Ariel [mailto:ionat...@gmail.com] > Sent: September-25-09 11:37 AM > To: solr-user@lucene.apache.org; yo...@lucidimagination.com > Subject: Re: Solr and Garbage Collection > > Right, now I'm giving it 12GB of heap memory. > If I give it less (10GB) it throws the following exception: > > Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log > SEVERE: java.lang.OutOfMemoryError: Java heap space > at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361) > ... > > On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley > wrote: > > On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel > > wrote: > > > Hi to all! > > > Lately my solr servers seem to stop responding once in a while. I'm using > > > solr 1.3. > > > Of course I'm having more traffic on the servers. 
> > > So I logged the Garbage Collection activity to check if it's because of > > > that. It seems like 11% of the time the application runs, it is stopped > > > because of GC. And sometimes the GC takes up to 10 seconds! > > > Is it normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon > > > servers. My index is around 10GB and I'm giving the instances 10GB of > > > RAM. > > > > Bigger heaps lead to bigger GC pauses in general. > > Do you mean that you are giving the JVM a 10GB heap? Were you getting > > OOM exceptions with a smaller heap? > > > > -Yonik > > http://www.lucidimagination.com > >
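Fuad's sizing claim is easy to sanity-check with back-of-envelope arithmetic. A sketch (the helper below is mine for illustration, not a Solr/Lucene API), which also folds in Walter's earlier point that a commit briefly keeps two Searchers and their caches alive:

```python
def fieldcache_bytes(num_docs, avg_value_bytes, live_searchers=1):
    """Rough lower bound for one string FieldCache entry: roughly one value
    slot per document. Ignores JVM object overhead and the ord arrays, so
    real usage is higher."""
    return num_docs * avg_value_bytes * live_searchers

# Fuad's example: 100,000,000 docs x 100-byte values per searcher.
print(fieldcache_bytes(100_000_000, 100))                    # 10000000000 (~10 GB)
# During warming after a commit, old + new searchers coexist.
print(fieldcache_bytes(100_000_000, 100, live_searchers=2))  # 20000000000 (~20 GB)
```

This is why "give it more memory" and "tune GC" are somewhat independent: the cache either fits or it doesn't, regardless of collector.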
RE: Solr and Garbage Collection
> Bigger heaps lead to bigger GC pauses in general. Opposite viewpoint: 1sec GC happening once an hour is MUCH BETTER than 30ms GC once-per-second. To lower frequency of GC: -Xms4096m -Xmx4096m (make it equal!) Use -server option. -server option of JVM is 'native CPU code', I remember WebLogic 7 console with SUN JVM 1.3 not showing any GC (just horizontal line). -Fuad http://www.linkedin.com/in/liferay
Re: Solr and Garbage Collection
I've got the start of a Garbage Collection article here: http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/ I plan to tie it more into Lucene/Solr and add some more about the theory/methods in the final version. With so much RAM, I take it you probably have a handful of processors as well? You might start by trying the Concurrent Low Pause Collector if you have not. You might also pair it with the parallel new generation collector. If you still get long pauses, you might try lowering -XX:CMSInitiatingOccupancyFraction, to kick off major collections earlier. It can still be difficult with really large FieldCaches, because all of a sudden, everything is released at once when the Reader goes away - but there should be some combo of settings that at least help alleviate the issue, especially by dedicating another processor to the task that can work somewhat in parallel without stopping your application threads for so long. If you have some success tuning, report back with your results if you could. -- - Mark http://www.lucidimagination.com Jonathan Ariel wrote: > Right, now I'm giving it 12GB of heap memory. 
> If I give it less (10GB) it throws the following exception: > > Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log > SEVERE: java.lang.OutOfMemoryError: Java heap space > at > org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361) > at > org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72) > at > org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352) > at > org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:267) > at > org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:185) > at > org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:207) > at > org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:104) > at > org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70) > at > org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) > at > org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) > at > 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) > at org.mortbay.jetty.Server.handle(Server.java:285) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) > at > org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) > at > org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) > at > org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) > > On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley > wrote: > > >> On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel >> wrote: >> >>> Hi to all! >>> Lately my solr servers seem to stop responding once in a while. I'm using >>> solr 1.3. >>> Of course I'm having more traffic on the servers. >>> So I logged the Garbage Collection activity to check if it's because of >>> that. It seems like 11% of the time the application runs, it is stopped >>> because of GC. And some times the GC takes up to 10 seconds! >>> Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon >>> servers. My index is around 10GB and I'm giving to the instances 10GB of >>> RAM. >>> >> Bigger heaps lead to bigger GC pauses in general. >> Do you mean that you are giving the JVM a 10GB heap? Were you getting >> OOM exceptions with a smaller heap? >> >> -Yonik >> http://www.lucidimagination.com >> >> > >
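Mark's suggestions map to concrete flags. A hedged starting point, assuming a Sun JVM 5/6 and the stock Jetty start.jar (the 65% initiating fraction is a guess to tune against your own GC logs, not a value from the thread):

```shell
# CMS on tenured, parallel copy collector on young, and an earlier trigger so
# the concurrent cycle finishes before the heap fills. Heap size matches the
# 12GB discussed in the thread; adjust for your box.
java -server -Xms12288m -Xmx12288m \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=65 -XX:+UseCMSInitiatingOccupancyOnly \
  -jar start.jar
```

UseCMSInitiatingOccupancyOnly keeps the JVM from second-guessing the fraction you set, which makes the effect of tuning it easier to observe.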
RE: Solr and Garbage Collection
Hi, Have you looked at tuning the garbage collection? Take a look at the following articles: http://www.lucidimagination.com/blog/2009/09/19/java-garbage-collection-boot-camp-draft/ http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html Changing to the concurrent or throughput collector should help with the long pauses. Colin. -Original Message- From: Jonathan Ariel [mailto:ionat...@gmail.com] Sent: Friday, September 25, 2009 11:37 AM To: solr-user@lucene.apache.org; yo...@lucidimagination.com Subject: Re: Solr and Garbage Collection Right, now I'm giving it 12GB of heap memory. If I give it less (10GB) it throws the following exception: Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361) ... On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley wrote: > On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel > wrote: > > Hi to all! > > Lately my solr servers seem to stop responding once in a while. I'm using > > solr 1.3. > > Of course I'm having more traffic on the servers. > > So I logged the Garbage Collection activity to check if it's because of > > that. It seems like 11% of the time the application runs, it is stopped > > because of GC. 
And some times the GC takes up to 10 seconds! > > Is is normal? My instances run on a 16GB RAM, Dual Quad Core Intel Xeon > > servers. My index is around 10GB and I'm giving to the instances 10GB of > > RAM. > > Bigger heaps lead to bigger GC pauses in general. > Do you mean that you are giving the JVM a 10GB heap? Were you getting > OOM exceptions with a smaller heap? > > -Yonik > http://www.lucidimagination.com >
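[Editor's note: Colin's suggestion amounts to replacing the default collector on the `java` command line. A minimal sketch of the relevant HotSpot flags (flag names as of Sun JDK 5/6; the `-jar start.jar` Jetty launcher and heap size are placeholders for your own setup, so check your JVM's documentation before copying):

# Low-pause concurrent collector, plus GC logging to verify the effect:
java -Xmx10g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -verbose:gc -XX:+PrintGCDetails -Xloggc:gc.log \
     -jar start.jar

# Or the throughput (parallel) collector, which favors total throughput
# over individual pause times:
java -Xmx10g -XX:+UseParallelGC -jar start.jar

The concurrent collector is usually the better fit for a latency-sensitive search server, at the cost of some CPU overhead and heap headroom.]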
Re: Solr and Garbage Collection
Right, now I'm giving it 12GB of heap memory. If I give it less (10GB) it
throws the following exception:

Sep 5, 2009 7:18:32 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:361)
        at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
        at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
        at org.apache.solr.request.SimpleFacets.getFieldCacheCounts(SimpleFacets.java:267)
        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:185)
        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:207)
        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:104)
        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:70)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

On Fri, Sep 25, 2009 at 10:55 AM, Yonik Seeley wrote:

> On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel wrote:
> > Hi to all!
> > Lately my solr servers seem to stop responding once in a while. I'm using
> > solr 1.3.
> > Of course I'm having more traffic on the servers.
> > So I logged the Garbage Collection activity to check if it's because of
> > that. It seems like 11% of the time the application runs, it is stopped
> > because of GC. And sometimes the GC takes up to 10 seconds!
> > Is this normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon
> > servers. My index is around 10GB and I'm giving the instances 10GB of
> > RAM.
>
> Bigger heaps lead to bigger GC pauses in general.
> Do you mean that you are giving the JVM a 10GB heap? Were you getting
> OOM exceptions with a smaller heap?
>
> -Yonik
> http://www.lucidimagination.com
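[Editor's note: the OOM above is raised inside FieldCacheImpl.getStringIndex, whose footprint is driven by document count and field cardinality, as noted at the top of the thread. A rough back-of-the-envelope sketch of that sizing; the 40-byte per-String overhead and the average term length are assumptions for illustration, not measured values:

```python
def fieldcache_stringindex_bytes(max_doc, unique_terms, avg_term_len=16):
    """Very rough estimate of one field's StringIndex footprint."""
    # Lucene's StringIndex keeps one int ordinal per document...
    order = 4 * max_doc
    # ...plus one String per unique term (assumed ~40 bytes of object
    # overhead + 2 bytes per char; real numbers depend on the JVM).
    lookup = unique_terms * (40 + 2 * avg_term_len)
    return order + lookup

# 50M docs faceted on a field with 1M unique terms:
est = fieldcache_stringindex_bytes(50_000_000, 1_000_000)
print(est)  # 272000000 bytes, i.e. roughly 0.25 GB for this one field
```

Multiply by the number of faceted/sorted fields, and (per the note above about warming the new searcher before discarding the old one) budget for two copies during commits.]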
Re: Solr and Garbage Collection
On Fri, Sep 25, 2009 at 9:30 AM, Jonathan Ariel wrote:
> Hi to all!
> Lately my solr servers seem to stop responding once in a while. I'm using
> solr 1.3.
> Of course I'm having more traffic on the servers.
> So I logged the Garbage Collection activity to check if it's because of
> that. It seems like 11% of the time the application runs, it is stopped
> because of GC. And sometimes the GC takes up to 10 seconds!
> Is this normal? My instances run on 16GB RAM, Dual Quad Core Intel Xeon
> servers. My index is around 10GB and I'm giving the instances 10GB of
> RAM.

Bigger heaps lead to bigger GC pauses in general.
Do you mean that you are giving the JVM a 10GB heap? Were you getting
OOM exceptions with a smaller heap?

-Yonik
http://www.lucidimagination.com
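[Editor's note: the "11% of the time" figure quoted above can be recomputed from a GC log. A sketch assuming lines in the style produced by -XX:+PrintGCApplicationStoppedTime; the exact log format varies across JVM versions, so treat the regex as illustrative:

```python
import re

def gc_stopped_fraction(log_lines, wall_clock_secs):
    """Fraction of wall-clock time the application was stopped for GC."""
    # Matches e.g. "Total time for which application threads were
    # stopped: 0.5000000 seconds" (format assumed, not guaranteed).
    pause_re = re.compile(r"stopped: ([0-9.]+) seconds")
    total_paused = sum(float(m.group(1))
                       for line in log_lines
                       for m in pause_re.finditer(line))
    return total_paused / wall_clock_secs

log = [
    "Total time for which application threads were stopped: 0.5000000 seconds",
    "Total time for which application threads were stopped: 10.0000000 seconds",
]
print(gc_stopped_fraction(log, 100.0))  # 0.105 -> ~11% of wall time in GC
```

Tracking this fraction before and after a collector change is a simple way to confirm whether the tuning actually helped.]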