Re: Out of memory on some faceting queries
On Wed, Apr 3, 2013 at 8:47 PM, Shawn Heisey wrote:
> On 4/2/2013 3:09 AM, Dotan Cohen wrote:
>> I notice that this only occurs on queries that run facets. I start
>> Solr with the following command:
>> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
>> /opt/solr-4.1.0/example/start.jar &
>
> It looks like you've followed some advice that I gave previously on how
> to tune Java. I have since learned that this advice is bad: it results
> in long GC pauses, even with heaps that aren't huge.

I see, thanks.

> As others have pointed out, you don't have a max heap setting, which
> means that you're using whatever Java chooses for its default, which
> might not be enough. If you can get Solr to run successfully for a
> while with queries and updates happening, the heap should eventually
> max out and the admin UI will show you what Java is choosing by
> default.
>
> Here is what I would now recommend as a starting point for your Solr
> startup command. You may need to increase the heap beyond 4GB, but be
> careful that you still have enough free memory left to do effective
> caching of your index.
>
> sudo nohup java -Xms4096M -Xmx4096M -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3
> -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
> -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
> /opt/solr-4.1.0/example/start.jar &

Thank you, I will experiment with that.

> If you are running a really old build of Java (the latest versions on
> Oracle's website are 1.6 build 43 and 1.7 build 17), you might want to
> leave AggressiveOpts out. Some people would argue that you should
> never use that option.

Great, thanks for the warning. This is what we're running; I'll see
about updating it through my distro's package manager:

$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
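A quick way to see what maximum heap an untuned JVM picks by default is to
dump its final flag values. This is only a sketch; the exact output format
varies between JVM builds:

$ java -XX:+PrintFlagsFinal -version 2>/dev/null | grep -i MaxHeapSize

The value reported there should match the memory ceiling that the Solr admin
UI eventually shows for an instance started without -Xmx.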
Re: Out of memory on some faceting queries
On 4/2/2013 3:09 AM, Dotan Cohen wrote:
> I notice that this only occurs on queries that run facets. I start
> Solr with the following command:
> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
> /opt/solr-4.1.0/example/start.jar &

It looks like you've followed some advice that I gave previously on how
to tune Java. I have since learned that this advice is bad: it results
in long GC pauses, even with heaps that aren't huge.

As others have pointed out, you don't have a max heap setting, which
means that you're using whatever Java chooses for its default, which
might not be enough. If you can get Solr to run successfully for a
while with queries and updates happening, the heap should eventually
max out and the admin UI will show you what Java is choosing by
default.

Here is what I would now recommend as a starting point for your Solr
startup command. You may need to increase the heap beyond 4GB, but be
careful that you still have enough free memory left to do effective
caching of your index.

sudo nohup java -Xms4096M -Xmx4096M -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3
-XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
-Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
/opt/solr-4.1.0/example/start.jar &

If you are running a really old build of Java (the latest versions on
Oracle's website are 1.6 build 43 and 1.7 build 17), you might want to
leave AggressiveOpts out. Some people would argue that you should never
use that option.

Thanks,
Shawn
Re: Out of memory on some faceting queries
On Wed, Apr 3, 2013 at 10:11 AM, Toke Eskildsen wrote:
>> However, once per day I would like to facet on the text field,
>> which is a free-text field usually around 1 KiB (about 100 words), in
>> order to determine what the top keywords / topics are. That query
>> would take up to 200 seconds to run, [...]
>
> If that query is somehow part of your warming, then I am surprised that
> search has worked at all with your commit frequency. That would however
> explain your OOM if you have multiple warmups running at the same time.

No, the 'heavy facet' is not part of the warming. I run it at most once
per day, at the end of the day. Solr is not shut down daily.

> It sounds like TermsComponent would be a better fit for getting top
> topics: https://wiki.apache.org/solr/TermsComponent

I had once looked at TermsComponent, but I think that I eliminated it as
a possibility because I actually need the top keywords related to a
specific keyword. For instance, I need to know which words are most
commonly used with the word "coffee".

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
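One way to get "words most commonly used with coffee" is ordinary field
faceting restricted to documents that match the keyword. A minimal sketch,
reusing the core URL and the text field from earlier in the thread; the
facet.limit and facet.mincount values are only illustrative:

http://127.0.0.1:8983/solr/core/select?q=text:coffee&rows=0&facet=true&facet.field=text&facet.limit=50&facet.mincount=5

This is exactly the memory-hungry "facet on a tokenized text field" case
discussed in this thread, so it is best run rarely and kept out of warming.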
Re: Out of memory on some faceting queries
On Tue, Apr 2, 2013 at 6:26 PM, Andre Bois-Crettez wrote:
> warmupTime is available on the admin page for each type of cache (in
> milliseconds):
> http://solr-box:8983/solr/#/core1/plugins/cache
>
> Or if you are only interested in the total:
> http://solr-box:8983/solr/core1/admin/mbeans?stats=true&key=searcher

Thanks.

>> Batches of 20-50 results are added to Solr a few times a minute, and a
>> commit is done after each batch since I'm calling Solr as such:
>> http://127.0.0.1:8983/solr/core/update/json?commit=true
>> Should I remove commit=true and run a cron job to commit once per minute?
>
> Even better, it sounds like a job for CommitWithin:
> http://wiki.apache.org/solr/CommitWithin

I'll look into that. Thank you!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
Re: Out of memory on some faceting queries
On Tue, 2013-04-02 at 17:08 +0200, Dotan Cohen wrote:
> Most of the time I facet on one field that has about twenty unique
> values.

They are likely to be disk cached, so warming those for 9M documents
should only take a few seconds.

> However, once per day I would like to facet on the text field,
> which is a free-text field usually around 1 KiB (about 100 words), in
> order to determine what the top keywords / topics are. That query
> would take up to 200 seconds to run, [...]

If that query is somehow part of your warming, then I am surprised that
search has worked at all with your commit frequency. That would however
explain your OOM if you have multiple warmups running at the same time.

It sounds like TermsComponent would be a better fit for getting top
topics: https://wiki.apache.org/solr/TermsComponent
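A hedged sketch of a TermsComponent request for the overall top terms,
assuming the /terms handler from the stock example solrconfig.xml is enabled.
Note that TermsComponent reads the raw term dictionary, so it cannot be
restricted to documents matching a query:

http://127.0.0.1:8983/solr/core/terms?terms.fl=text&terms.limit=25&terms.sort=count

That limitation is why it fits "top topics overall" but not the per-keyword
co-occurrence need Dotan raises in his reply.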
Re: Out of memory on some faceting queries
On 04/02/2013 05:04 PM, Dotan Cohen wrote:
> How might I time the warming? I've been googling warming since your
> earlier message but there does not seem to be any really good
> documentation on the subject. If there is anything that you feel I
> should be reading I would appreciate a link or a keyword to search on.
> I've read the Solr wiki on caching and performance, but other than
> that I don't see the issue addressed.

warmupTime is available on the admin page for each type of cache (in
milliseconds):
http://solr-box:8983/solr/#/core1/plugins/cache

Or if you are only interested in the total:
http://solr-box:8983/solr/core1/admin/mbeans?stats=true&key=searcher

> Batches of 20-50 results are added to Solr a few times a minute, and a
> commit is done after each batch since I'm calling Solr as such:
> http://127.0.0.1:8983/solr/core/update/json?commit=true
> Should I remove commit=true and run a cron job to commit once per minute?

Even better, it sounds like a job for CommitWithin:
http://wiki.apache.org/solr/CommitWithin

André
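A sketch of what the CommitWithin suggestion could look like for the update
URL quoted above; the 60000 ms window is only an example value:

http://127.0.0.1:8983/solr/core/update/json?commitWithin=60000

With commitWithin, Solr commits on its own within the given number of
milliseconds, so the explicit commit=true on every batch is no longer needed
and a separate cron job is not required either.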
Re: Out of memory on some faceting queries
> How often do you commit and how many unique values do your facet
> fields have?

Most of the time I facet on one field that has about twenty unique
values. However, once per day I would like to facet on the text field,
which is a free-text field usually around 1 KiB (about 100 words), in
order to determine what the top keywords / topics are. That query would
take up to 200 seconds to run, but it does not have to return the
results in real-time (the output goes to another process, not to a
waiting user).

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
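The daily "top keywords" job described here is presumably a plain facet over
the text field, along these lines (a sketch only; the actual query is not
shown in the thread):

http://127.0.0.1:8983/solr/core/select?q=*:*&rows=0&facet=true&facet.field=text&facet.limit=100

Faceting over a tokenized free-text field forces Solr to un-invert the whole
field in memory (the DocTermOrds/UnInvertedField frames in the stack trace
further down), which is why this query is so much heavier than the
twenty-value facet.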
Re: Out of memory on some faceting queries
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen wrote:
> Memory does not help you if you commit too frequently. If you commit
> each X seconds and warming takes X+Y seconds, then you will run out of
> memory at some point.

How might I time the warming? I've been googling warming since your
earlier message but there does not seem to be any really good
documentation on the subject. If there is anything that you feel I
should be reading I would appreciate a link or a keyword to search on.
I've read the Solr wiki on caching and performance, but other than that
I don't see the issue addressed.

>> I have increased maxWarmingSearchers to 4, let's see how this goes.
>
> If you still get the error with 4 concurrent searchers, you will have to
> either speed up warmup time or commit less frequently. You should be
> able to reduce facet startup time by switching to segment based faceting
> (at the cost of worse search-time performance) or maybe by using
> DocValues. Some of the current threads on the solr-user list are about
> these topics.
>
> How often do you commit and how many unique values do your facet
> fields have?

Batches of 20-50 results are added to Solr a few times a minute, and a
commit is done after each batch since I'm calling Solr as such:
http://127.0.0.1:8983/solr/core/update/json?commit=true
Should I remove commit=true and run a cron job to commit once per
minute?

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
Re: Out of memory on some faceting queries
On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:
[Toke: maxWarmingSearchers limit exceeded?]
> Thank you Toke, this is exactly on my "list of things to learn about
> Solr". We do get the error mentioned and we cannot reduce the amount
> of commits. Also, I do believe that we have the necessary server
> resources (16 GiB RAM).

Memory does not help you if you commit too frequently. If you commit
each X seconds and warming takes X+Y seconds, then you will run out of
memory at some point.

> I have increased maxWarmingSearchers to 4, let's see how this goes.

If you still get the error with 4 concurrent searchers, you will have to
either speed up warmup time or commit less frequently. You should be
able to reduce facet startup time by switching to segment based faceting
(at the cost of worse search-time performance) or maybe by using
DocValues. Some of the current threads on the solr-user list are about
these topics.

How often do you commit and how many unique values do your facet
fields have?

Regards,
Toke Eskildsen
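The "segment based faceting" mentioned here can be tried per request with the
facet.method parameter. A sketch with a placeholder field name; fcs only
applies to single-valued fields, so it could help the twenty-value facet
field rather than the tokenized text field:

http://127.0.0.1:8983/solr/core/select?q=*:*&rows=0&facet=true&facet.field=my_single_valued_field&facet.method=fcs

Per-segment faceting only has to rebuild facet data for segments that changed
since the last commit, which is the trade-off described above: cheaper warmup
under frequent commits, somewhat slower queries.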
Re: Out of memory on some faceting queries
On Tue, Apr 2, 2013 at 2:41 PM, Toke Eskildsen wrote:
> 9M documents in a heavily updated index with faceting. Maybe you are
> committing faster than the faceting can be prepared?
> https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

Thank you Toke, this is exactly on my "list of things to learn about
Solr". We do get the error mentioned and we cannot reduce the amount of
commits. Also, I do believe that we have the necessary server resources
(16 GiB RAM).

I have increased maxWarmingSearchers to 4, let's see how this goes.
Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
Re: Out of memory on some faceting queries
On Tue, 2013-04-02 at 12:16 +0200, Dotan Cohen wrote:
> 8971763 documents, growing at a rate of about 500 per minute. We
> actually expect that to be ~5 per minute once we get out of
> testing.

9M documents in a heavily updated index with faceting. Maybe you are
committing faster than the faceting can be prepared?
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

Regards,
Toke Eskildsen
Re: Out of memory on some faceting queries
On Tue, Apr 2, 2013 at 12:59 PM, Toke Eskildsen wrote:
> How many documents does your index have, how many fields do you facet
> on and approximately how many unique values do your facet fields have?

8971763 documents, growing at a rate of about 500 per minute. We
actually expect that to be ~5 per minute once we get out of testing.
Most documents are less than a KiB in the 'text' field, and they have a
few other fields which store short strings, dates, or ints. You can
think of these documents like tweets: short general-purpose text
messages.

>> I notice that this only occurs on queries that run facets. I start
>> Solr with the following command:
>> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
>> /opt/solr-4.1.0/example/start.jar &
>
> You are not specifying any maximum heap size (-Xmx), which you should do
> in order to avoid unpleasant surprises. Facets and sorting are often
> memory hungry, but your system seems to have 13GB free RAM, so the easy
> solution attempt would be to increase the heap until Solr serves the
> facets without OOM.

Thanks, I will start with "-Xmx8g" and test.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com
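For reference, the existing startup line with that 8 GB cap added would look
something like this (a sketch of the stated plan, not a tested configuration):

sudo nohup java -Xmx8g -XX:NewRatio=1 -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled \
  -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar \
  /opt/solr-4.1.0/example/start.jar &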
Re: Out of memory on some faceting queries
On Tue, 2013-04-02 at 11:09 +0200, Dotan Cohen wrote:
> On some queries I get out of memory errors:
>
> {"error":{"msg":"java.lang.OutOfMemoryError: Java heap [...]
> org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:273)\n\tat
> org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)\n\tat [...]

Yep, your OOM is due to faceting. How many documents does your index
have, how many fields do you facet on and approximately how many unique
values do your facet fields have?

> I notice that this only occurs on queries that run facets. I start
> Solr with the following command:
> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
> /opt/solr-4.1.0/example/start.jar &

You are not specifying any maximum heap size (-Xmx), which you should do
in order to avoid unpleasant surprises. Facets and sorting are often
memory hungry, but your system seems to have 13GB free RAM, so the easy
solution attempt would be to increase the heap until Solr serves the
facets without OOM.

- Toke Eskildsen, State and University Library, Denmark