Re: Out of memory on some faceting queries

2013-04-08 Thread Dotan Cohen
On Wed, Apr 3, 2013 at 8:47 PM, Shawn Heisey wrote:
> On 4/2/2013 3:09 AM, Dotan Cohen wrote:
>> I notice that this only occurs on queries that run facets. I start
>> Solr with the following command:
>> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
>> /opt/solr-4.1.0/example/start.jar &
>
> It looks like you've followed some advice that I gave previously on how
> to tune Java.  I have since learned that this advice is bad: it results
> in long GC pauses, even with heaps that aren't huge.
>

I see, thanks.

> As others have pointed out, you don't have a max heap setting, which
> would mean that you're using whatever Java chooses for its default,
> which might not be enough.  If you can get Solr to successfully run for
> a while with queries and updates happening, the heap should eventually
> max out and the admin UI will show you what Java is choosing by default.
>
> Here is what I would now recommend for a beginning point on your Solr
> startup command.  You may need to increase the heap beyond 4GB, but be
> careful that you still have enough free memory to be able to do
> effective caching of your index.
>
> sudo nohup java -Xms4096M -Xmx4096M -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3
> -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
> -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
> /opt/solr-4.1.0/example/start.jar &
>

Thank you, I will experiment with that.

> If you are running a really old build of java (latest versions on
> Oracle's website are 1.6 build 43 and 1.7 build 17), you might want to
> leave AggressiveOpts out.  Some people would argue that you should never
> use that option.
>

Great, thanks for the warning. This is what we're running, I'll see
about updating it through my distro's package manager:
$ java -version
java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 1.12.3) (6b27-1.12.3-0ubuntu1~12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-03 Thread Shawn Heisey
On 4/2/2013 3:09 AM, Dotan Cohen wrote:
> I notice that this only occurs on queries that run facets. I start
> Solr with the following command:
> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
> /opt/solr-4.1.0/example/start.jar &

It looks like you've followed some advice that I gave previously on how
to tune Java.  I have since learned that this advice is bad: it results
in long GC pauses, even with heaps that aren't huge.

As others have pointed out, you don't have a max heap setting, which
would mean that you're using whatever Java chooses for its default,
which might not be enough.  If you can get Solr to successfully run for
a while with queries and updates happening, the heap should eventually
max out and the admin UI will show you what Java is choosing by default.

Here is what I would now recommend for a beginning point on your Solr
startup command.  You may need to increase the heap beyond 4GB, but be
careful that you still have enough free memory to be able to do
effective caching of your index.

sudo nohup java -Xms4096M -Xmx4096M -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3
-XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts
-Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
/opt/solr-4.1.0/example/start.jar &

If you are running a really old build of java (latest versions on
Oracle's website are 1.6 build 43 and 1.7 build 17), you might want to
leave AggressiveOpts out.  Some people would argue that you should never
use that option.
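
As a rough illustration of what the heap settings in the command above imply, here is a minimal sketch of the arithmetic. It assumes the usual HotSpot reading of -XX:NewRatio (old generation is NewRatio times the young generation) and treats -XX:CMSInitiatingOccupancyFraction as a percentage of the old generation; the JVM may round sizes and may treat the fraction only as a hint, so the numbers are approximate. The function name heap_layout is made up for this example.

```python
def heap_layout(heap_mb, new_ratio, cms_fraction_pct):
    """Approximate generation sizes for a fixed-size heap.

    Assumes old:young = new_ratio:1 and that CMS starts a cycle once the
    old generation is cms_fraction_pct percent full.
    """
    young_mb = heap_mb / (new_ratio + 1)
    old_mb = heap_mb - young_mb
    cms_trigger_mb = old_mb * cms_fraction_pct / 100
    return young_mb, old_mb, cms_trigger_mb


# Values taken from the recommended command: -Xmx4096M, NewRatio=3, fraction 75.
young, old, trigger = heap_layout(4096, 3, 75)
print(f"young={young:.0f}M old={old:.0f}M CMS trigger near {trigger:.0f}M")
```

With the earlier NewRatio=1 setting, the same function would give half of the heap to the young generation, which is the main difference between the two command lines as far as generation sizing goes.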

Thanks,
Shawn



Re: Out of memory on some faceting queries

2013-04-03 Thread Dotan Cohen
On Wed, Apr 3, 2013 at 10:11 AM, Toke Eskildsen wrote:
>> However, once per day I would like to facet on the text field,
>> which is a free-text field usually around 1 KiB (about 100 words), in
>> order to determine what the top keywords / topics are. That query
>> would take up to 200 seconds to run, [...]
>
> If that query is somehow part of your warming, then I am surprised that
> search has worked at all with your commit frequency. That would however
> explain your OOM if you have multiple warmups running at the same time.
>

No, the 'heavy facet' is not part of the warming. I run it at most
once per day, at the end of the day. Solr is not shut down daily.

> It sounds like TermsComponent would be a better fit for getting top
> topics: https://wiki.apache.org/solr/TermsComponent
>

I had once looked at TermsComponent, but I think that I eliminated it
as a possibility because I actually need the top keywords related to a
specific keyword. For instance, I need to know which words are most
commonly used with the word "coffee".
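
One way to express "top terms among documents that mention a keyword" is a facet on the text field restricted by a query on that keyword. The sketch below only builds such a request URL and does not send it. The core name and the 'text' field come from this thread; the /select handler path, the limit of 50, and the helper name related_terms_url are assumptions for illustration, and the keyword is expected to be a single plain word with no query-syntax characters.

```python
import urllib.parse


def related_terms_url(keyword, field="text", limit=50,
                      base_url="http://127.0.0.1:8983/solr/core/select"):
    """Build a facet request for the most frequent terms in matching docs."""
    params = {
        "q": f"{field}:{keyword}",  # restrict to documents containing the keyword
        "rows": 0,                  # only facet counts are wanted, no documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.mincount": 1,
        "wt": "json",
    }
    return f"{base_url}?{urllib.parse.urlencode(params)}"


print(related_terms_url("coffee"))
```

This is still a facet on a high-cardinality field, so it does not by itself avoid the memory cost discussed in this thread; it only shows the shape of the request.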


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-03 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 6:26 PM, Andre Bois-Crettez wrote:
> warmupTime is available on the admin page for each type of cache (in
> milliseconds) :
> http://solr-box:8983/solr/#/core1/plugins/cache
>
> Or if you are only interested in the total :
> http://solr-box:8983/solr/core1/admin/mbeans?stats=true&key=searcher
>

Thanks.


>> Batches of 20-50 results are added to solr a few times a minute, and a
>> commit is done after each batch since I'm calling Solr as such:
>> http://127.0.0.1:8983/solr/core/update/json?commit=true Should I
>> remove commit=true and run a cron job to commit once per minute?
>
>
> Even better, it sounds like a job for CommitWithin :
> http://wiki.apache.org/solr/CommitWithin
>


I'll look into that. Thank you!


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-03 Thread Toke Eskildsen
On Tue, 2013-04-02 at 17:08 +0200, Dotan Cohen wrote:
> Most of the time I facet on one field that has about twenty unique
> values.

They are likely to be disk cached so warming those for 9M documents
should only take a few seconds.

> However, once per day I would like to facet on the text field,
> which is a free-text field usually around 1 KiB (about 100 words), in
> order to determine what the top keywords / topics are. That query
> would take up to 200 seconds to run, [...]

If that query is somehow part of your warming, then I am surprised that
search has worked at all with your commit frequency. That would however
explain your OOM if you have multiple warmups running at the same time.

It sounds like TermsComponent would be a better fit for getting top
topics: https://wiki.apache.org/solr/TermsComponent



Re: Out of memory on some faceting queries

2013-04-02 Thread Andre Bois-Crettez

On 04/02/2013 05:04 PM, Dotan Cohen wrote:

> How might I time the warming? I've been googling warming since your
> earlier message but there does not seem to be any really good
> documentation on the subject. If there is anything that you feel I
> should be reading I would appreciate a link or a keyword to search on.
> I've read the Solr wiki on caching and performance, but other than
> that I don't see the issue addressed.


warmupTime is available on the admin page for each type of cache (in
milliseconds) :
http://solr-box:8983/solr/#/core1/plugins/cache

Or if you are only interested in the total :
http://solr-box:8983/solr/core1/admin/mbeans?stats=true&key=searcher
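
If the mbeans output is fetched as JSON (for instance by adding wt=json to the URL above), the warmupTime values could be pulled out with a small script. The sketch below deliberately does not assume the exact layout of the response: it walks whatever structure it is given and collects every numeric value stored under a "warmupTime" key. The sample payload is hypothetical and only stands in for a real response.

```python
import json


def collect_warmup_times(node, found=None):
    """Recursively gather numeric values stored under 'warmupTime' keys."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "warmupTime" and isinstance(value, (int, float)):
                found.append(value)
            else:
                collect_warmup_times(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_warmup_times(item, found)
    return found


# Hypothetical payload, not a real Solr response.
sample = json.loads(
    '{"searcher": {"stats": {"warmupTime": 1200}},'
    ' "caches": [{"stats": {"warmupTime": 300}}]}'
)
times = collect_warmup_times(sample)
print(times, "total ms:", sum(times))
```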


> Batches of 20-50 results are added to solr a few times a minute, and a
> commit is done after each batch since I'm calling Solr as such:
> http://127.0.0.1:8983/solr/core/update/json?commit=true Should I
> remove commit=true and run a cron job to commit once per minute?


Even better, it sounds like a job for CommitWithin :
http://wiki.apache.org/solr/CommitWithin
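
A minimal sketch of what the change could look like on the client side: the update URL from this thread with commit=true replaced by a commitWithin parameter in milliseconds. The request is only built here, not sent. The 60000 ms value, the document fields, and the helper name build_update_request are assumptions for illustration, and the sketch assumes the JSON update handler accepts an array of documents as the body.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(docs, commit_within_ms,
                         base_url="http://127.0.0.1:8983/solr/core/update/json"):
    """Build a POST that asks Solr to commit within the given time window."""
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical batch; a real batch would hold the 20-50 documents.
req = build_update_request([{"id": "1", "text": "hello"}], 60000)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send the batch.
```

The point of the design is that many small batches arriving inside the same window should share one commit, instead of each batch forcing its own.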


André



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
> How often do you commit and how many unique values do your facet
> fields have?
>

Most of the time I facet on one field that has about twenty unique
values. However, once per day I would like to facet on the text field,
which is a free-text field usually around 1 KiB (about 100 words), in
order to determine what the top keywords / topics are. That query
would take up to 200 seconds to run, but it does not have to return
the results in real-time (the output goes to another process, not to a
waiting user).

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen wrote:
> Memory does not help you if you commit too frequently. If you commit
> each X seconds and warming takes X+Y seconds, then you will run out of
> memory at some point.
>

How might I time the warming? I've been googling warming since your
earlier message but there does not seem to be any really good
documentation on the subject. If there is anything that you feel I
should be reading I would appreciate a link or a keyword to search on.
I've read the Solr wiki on caching and performance, but other than
that I don't see the issue addressed.


>> I have increased maxWarmingSearchers to 4, let's see how this goes.
>
> If you still get the error with 4 concurrent searchers, you will have to
> either speed up warmup time or commit less frequently. You should be
> able to reduce facet startup time by switching to segment based faceting
> (at the cost of worse search-time performance) or maybe by using
> DocValues. Some of the current threads on the solr-user list are about
> these topics.
>
> How often do you commit and how many unique values do your facet
> fields have?
>

Batches of 20-50 results are added to solr a few times a minute, and a
commit is done after each batch since I'm calling Solr as such:
http://127.0.0.1:8983/solr/core/update/json?commit=true

Should I remove commit=true and run a cron job to commit once per minute?

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen wrote:
> On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:
>
> [Toke: maxWarmingSearchers limit exceeded?]
>
>> Thank you Toke, this is exactly on my "list of things to learn about
>> Solr". We do get the error mentioned and we cannot reduce the amount
>> of commits. Also, I do believe that we have the necessary server
>> resources (16 GiB RAM).
>
> Memory does not help you if you commit too frequently. If you commit
> each X seconds and warming takes X+Y seconds, then you will run out of
> memory at some point.
>
>> I have increased maxWarmingSearchers to 4, let's see how this goes.
>
> If you still get the error with 4 concurrent searchers, you will have to
> either speed up warmup time or commit less frequently. You should be
> able to reduce facet startup time by switching to segment based faceting
> (at the cost of worse search-time performance) or maybe by using
> DocValues. Some of the current threads on the solr-user list are about
> these topics.
>
> How often do you commit and how many unique values do your facet
> fields have?
>
> Regards,
> Toke Eskildsen
>



-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote:

[Toke: maxWarmingSearchers limit exceeded?]

> Thank you Toke, this is exactly on my "list of things to learn about
> Solr". We do get the error mentioned and we cannot reduce the amount
> of commits. Also, I do believe that we have the necessary server
> resources (16 GiB RAM).

Memory does not help you if you commit too frequently. If you commit
each X seconds and warming takes X+Y seconds, then you will run out of
memory at some point.
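
The pile-up can be illustrated with a toy simulation. It assumes that all warmups running at the same time share one fixed unit of machine capacity, so each one slows down as more are added; that contention model, the function name, and the numbers are assumptions for illustration, not measurements from Solr. With a commit every 20 seconds and 30 seconds of warming work per commit, the number of unfinished warmups keeps growing; with a commit every 60 seconds it stays at one.

```python
def peak_warming(commit_interval_s, warm_work_s, duration_s):
    """Return the peak number of unfinished warmups over duration_s seconds.

    One commit arrives every commit_interval_s seconds and needs warm_work_s
    seconds of work; all in-flight warmups split one second of work per tick.
    """
    remaining = []  # work left for each in-flight warmup
    peak = 0
    for tick in range(duration_s):
        if tick % commit_interval_s == 0:
            remaining.append(float(warm_work_s))  # a new commit starts warming
        if remaining:
            share = 1.0 / len(remaining)
            remaining = [r - share for r in remaining if r - share > 1e-9]
        peak = max(peak, len(remaining))
    return peak


for interval in (60, 20):
    print(interval, "s between commits ->", peak_warming(interval, 30, 600))
```

Each unfinished warmup holds its own searcher and caches, which is why a growing count eventually shows up as a memory problem rather than just slowness.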

> I have increased maxWarmingSearchers to 4, let's see how this goes.

If you still get the error with 4 concurrent searchers, you will have to
either speed up warmup time or commit less frequently. You should be
able to reduce facet startup time by switching to segment based faceting
(at the cost of worse search-time performance) or maybe by using
DocValues. Some of the current threads on the solr-user list are about
these topics.

How often do you commit and how many unique values do your facet
fields have?

Regards,
Toke Eskildsen



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 2:41 PM, Toke Eskildsen wrote:
> 9M documents in a heavily updated index with faceting. Maybe you are
> committing faster than the faceting can be prepared?
> https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
>

Thank you Toke, this is exactly on my "list of things to learn about
Solr". We do get the error mentioned and we cannot reduce the amount
of commits. Also, I do believe that we have the necessary server
resources (16 GiB RAM).

I have increased maxWarmingSearchers to 4, let's see how this goes.

Thank you.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 12:16 +0200, Dotan Cohen wrote:
> 8971763 documents, growing at a rate of about 500 per minute. We
> actually expect that to be ~5 per minute once we get out of
> testing.

9M documents in a heavily updated index with faceting. Maybe you are
committing faster than the faceting can be prepared?
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

Regards,
Toke Eskildsen



Re: Out of memory on some faceting queries

2013-04-02 Thread Dotan Cohen
On Tue, Apr 2, 2013 at 12:59 PM, Toke Eskildsen wrote:
> How many documents does your index have, how many fields do you facet on
> and approximately how many unique values do your facet fields have?
>

8971763 documents, growing at a rate of about 500 per minute. We
actually expect that to be ~5 per minute once we get out of
testing. Most documents are less than a KiB in the 'text' field, and
they have a few other fields which store short strings, dates, or
ints. You can think of these documents like tweets: short general
purpose text messages.

>> I notice that this only occurs on queries that run facets. I start
>> Solr with the following command:
>> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
>> /opt/solr-4.1.0/example/start.jar &
>
> You are not specifying any maximum heap size (-Xmx), which you should do
> in order to avoid unpleasant surprises. Facets and sorting are often
> memory hungry, but your system seems to have 13GB free RAM so the easy
> solution attempt would be to increase the heap until Solr serves the
> facets without OOM.
>

Thanks, I will start with "-Xmx8g" and test.

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: Out of memory on some faceting queries

2013-04-02 Thread Toke Eskildsen
On Tue, 2013-04-02 at 11:09 +0200, Dotan Cohen wrote:
> On some queries I get out of memory errors:
> 
> {"error":{"msg":"java.lang.OutOfMemoryError: Java heap
[...]
> org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:273)\n\tat
> org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)\n\tat
[...]

Yep, your OOM is due to faceting.

How many documents does your index have, how many fields do you facet on
and approximately how many unique values do your facet fields have?

> I notice that this only occurs on queries that run facets. I start
> Solr with the following command:
> sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
> -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar
> /opt/solr-4.1.0/example/start.jar &

You are not specifying any maximum heap size (-Xmx), which you should do
in order to avoid unpleasant surprises. Facets and sorting are often
memory hungry, but your system seems to have 13GB free RAM so the easy
solution attempt would be to increase the heap until Solr serves the
facets without OOM.

- Toke Eskildsen, State and University Library, Denmark