Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread Tri Cao
If you just want to see which classes are occupying the most memory in a
live JVM, you can do:

jmap -permstat

I don't think you can dump the contents of PERM space.

Hope this helps,
Tri

On Mar 03, 2014, at 11:41 AM, KNitin wrote:

> Is there a way to dump the contents of permgen and look at which classes
> are occupying the most memory in that?
>
> - Nitin
>
> On Mon, Mar 3, 2014 at 11:19 AM, KNitin wrote:
>
>> Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud
>> (containing custom parsing, analyzers). But I haven't specifically enabled
>> any string interning. Does solr intern all strings in a collection by
>> default?
>>
>> I agree with doc and Filter Query Cache. Query Result cache hits are
>> practically 0 for the large collection since our queries are tail by nature
>>
>> Thanks
>> Nitin
>>
>> On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov wrote:
>>
>>> On 3/3/2014 1:54 AM, KNitin wrote:
>>>
>>>> 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
>>>
>>> As others have pointed out, this is really unusual for Solr. We often
>>> see high permgen in our app servers due to dynamic class loading that the
>>> framework performs; maybe you are somehow loading lots of new Solr plugins,
>>> or otherwise creating lots of classes? Of course if you have a plugin or
>>> something that does a lot of string interning, that could also be an
>>> explanation.
>>>
>>> -Mike

Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
Is there a way to dump the contents of permgen and look at which classes
are occupying the most memory in that?

- Nitin


On Mon, Mar 3, 2014 at 11:19 AM, KNitin  wrote:

> Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud
> (containing custom parsing, analyzers). But I haven't specifically enabled
> any string interning. Does solr intern all strings in a collection by
> default?
>
> I agree with doc and Filter Query Cache. Query Result cache hits are
> practically 0 for the large collection since our queries are tail by nature
>
>
> Thanks
> Nitin
>
>
> On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov <
> msoko...@safaribooksonline.com> wrote:
>
>> On 3/3/2014 1:54 AM, KNitin wrote:
>>
>>> 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
>>>
>> As others have pointed out, this is really unusual for Solr.  We often
>> see high permgen in our app servers due to dynamic class loading that the
>> framework performs; maybe you are somehow loading lots of new Solr plugins,
>> or otherwise creating lots of classes?  Of course if you have a plugin or
>> something that does a lot of string interning, that could also be an
>> explanation.
>>
>> -Mike
>>
>
>


Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
Regarding PermGen: Yes, we have a bunch of custom jars loaded in SolrCloud
(containing custom parsing and analyzers). But I haven't specifically enabled
any string interning. Does Solr intern all strings in a collection by
default?

I agree about the doc and filter query caches. Query result cache hits are
practically 0 for the large collection since our queries are long-tail by
nature.


Thanks
Nitin


On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

> On 3/3/2014 1:54 AM, KNitin wrote:
>
>> 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
>>
> As others have pointed out, this is really unusual for Solr.  We often see
> high permgen in our app servers due to dynamic class loading that the
> framework performs; maybe you are somehow loading lots of new Solr plugins,
> or otherwise creating lots of classes?  Of course if you have a plugin or
> something that does a lot of string interning, that could also be an
> explanation.
>
> -Mike
>


Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread Michael Sokolov

On 3/3/2014 1:54 AM, KNitin wrote:

> 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)

As others have pointed out, this is really unusual for Solr.  We often 
see high permgen in our app servers due to dynamic class loading that 
the framework performs; maybe you are somehow loading lots of new Solr 
plugins, or otherwise creating lots of classes?  Of course if you have a 
plugin or something that does a lot of string interning, that could also 
be an explanation.
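
One way to check the class-loading explanation is to watch the JVM's class-loading counters over time: a "loaded now" figure that keeps climbing points to classes being created or loaded continuously. The following is a minimal sketch, not something from this thread. `ClassLoadReport` is a hypothetical helper name, and the code reports on the JVM it runs in, so it would have to run inside the Solr JVM (or be adapted to a remote JMX connection) to say anything about Solr.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper (not part of Solr): prints class-loading counters
// for the JVM in which it is executed.
class ClassLoadReport {

    // Formats the three counters exposed by the standard ClassLoadingMXBean.
    static String describe(ClassLoadingMXBean bean) {
        return "loaded now=" + bean.getLoadedClassCount()
            + ", loaded total=" + bean.getTotalLoadedClassCount()
            + ", unloaded=" + bean.getUnloadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println(describe(ManagementFactory.getClassLoadingMXBean()));
    }
}
```

Sampling this periodically and comparing the numbers is enough to tell steady state from unbounded growth; it does not say which classes are responsible.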


-Mike


Re: Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread Walter Underwood
New gen should be big enough to handle all allocations that have a lifetime of 
a single request, considering that you'll have multiple concurrent requests. If 
new gen routinely overflows, short-lived objects end up promoted to the old gen.

Yes, you need to go to CMS.

I have usually seen the hit rates on query results and doc caches to be fairly 
similar, with doc cache somewhat higher.

Cache hit rates depend on the number of queries between updates. If you update 
once per day and get a million queries or so, your hit rates can get pretty 
good.

70-80% seems typical for doc cache on an infrequently updated index. We stay 
around 75% on our busiest 4m doc index. 

The query result cache is the most important, because it saves the most work. 
Ours stays around 20%, but I should spend some time improving that.

Your perm gen size is very big. I think we run with 128 MB.

wunder

On Mar 2, 2014, at 10:54 PM, KNitin  wrote:

> Thanks, Walter
> 
> Hit rate on the document caches is close to 70-80%, and the filter caches
> are at a 100% hit rate (since most of our queries filter on the same fields
> but have a different q parameter). The query result cache is not of great
> importance to me since the hit rate there is almost negligible.
> 
> Does it mean I need to increase the size of my filter and document caches
> for large indices?
> 
> The breakdown of my 25 GB heap usage is as follows:
> 
> 1. 19 GB - Old Gen (100% pool utilization)
> 2. 3 GB - New Gen (50% pool utilization)
> 3. 2.8 GB - Perm Gen (I am guessing this is because of interned strings)
> 4. Survivor space is on the order of 300-400 MB and is almost always 100%
> full. (Is this a major issue?)
> 
> We are also currently using the Parallel GC collector but planning to move
> to CMS for shorter stop-the-world GC times. If I increase the filter cache
> and document cache entry sizes, they would also go to the old gen, right?
> 
> A very naive question: How is increasing the young gen going to help if we
> know that Solr is already pushing major caches and other objects to the old
> gen because of their nature? My young gen pool utilization is still well
> under 50%.
> 
> 
> Thanks
> Nitin
> 
> 
> On Sun, Mar 2, 2014 at 9:31 PM, Walter Underwood wrote:
> 
>> An LRU cache will always fill up the old generation. Old objects are
>> ejected, and those are usually in the old generation.
>> 
>> Increasing the heap size will not eliminate this. It will make major, stop
>> the world collections longer.
>> 
>> Increase the new generation size until the rate of old gen increase slows
>> down. Then choose a total heap size to control the frequency (and duration)
>> of major collections.
>> 
>> We run with the new generation at about 25% of the heap, so 8GB total and
>> a 2GB newgen.
>> 
>> A 512 entry cache is very small for query results or docs. We run with 10K
>> or more entries for those. The filter cache size depends on your usage. We
>> have only a handful of different filter queries, so a tiny cache is fine.
>> 
>> What is your hit rate on the caches?
>> 
>> wunder
>> 
>> On Mar 2, 2014, at 7:42 PM, KNitin  wrote:
>> 
>>> Hi
>>> 
>>> I have a very large index for a few collections, and when they are being
>>> queried, I see the old gen space close to 100% usage all the time. The
>>> system becomes extremely slow due to GC activity right after that, and it
>>> gets into this cycle very often.
>>> 
>>> I have given Solr close to 30 GB of heap on a 65 GB RAM machine, and the
>>> rest is left as free RAM. I have a lot of hits in the filter, query result,
>>> and document caches, and the size of all the caches is around 512 entries
>>> per collection. Are all the caches used by Solr on or off heap?
>>> 
>>> 
>>> Given this scenario where GC is the primary bottleneck, what are good
>>> recommended memory settings for Solr? Should I increase the heap memory
>>> (that will only postpone the problem until the heap becomes full again
>>> after a while)? Will memory maps help at all in this scenario?
>>> 
>>> 
>>> Kindly advise on the best practices
>>> Thanks
>>> Nitin
>> 
>> 
>> 

--
Walter Underwood
wun...@wunderwood.org





Re: Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread Bernd Fehling
Actually, I haven't ever seen a PermGen with 2.8 GB.
So you must have a very special use case with SOLR.

For my little index with 60 million docs and 170GB index size I gave
PermGen 82 MB and it is only using 50.6 MB for a single VM.

Permanent Generation (PermGen) is completely separate from the heap.

Permanent Generation (non-heap):
The pool containing all the reflective data of the virtual machine itself,
such as class and method objects. With Java VMs that use class data sharing,
this generation is divided into read-only and read-write areas.
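
The heap/non-heap split described above can be observed from inside a JVM through the standard `java.lang.management` API. The sketch below is only an illustration, and `MemoryPoolReport` is a hypothetical name. The pool names it prints depend on the JVM version and the collector in use; on Java 8 and later there is no PermGen pool at all, since class metadata moved to Metaspace.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: lists every memory pool of the current JVM together
// with its type (heap or non-heap) and its current usage.
class MemoryPoolReport {

    static String formatPool(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
        String label = pool.getName() + " (" + pool.getType() + ")";
        if (usage == null) {
            return label + ": n/a";
        }
        long mb = 1024L * 1024L;
        // getMax() returns -1 when no maximum is defined for the pool.
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / mb) + " MB";
        return label + ": used " + (usage.getUsed() / mb) + " MB, max " + max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(formatPool(pool));
        }
    }
}
```

On a JVM of the era discussed here, the permanent generation would be expected to show up among the non-heap pools, alongside the code cache.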

Regards
Bernd


On 03.03.2014 07:54, KNitin wrote:
> Thanks, Walter
> 
> Hit rate on the document caches is close to 70-80%, and the filter caches
> are at a 100% hit rate (since most of our queries filter on the same fields
> but have a different q parameter). The query result cache is not of great
> importance to me since the hit rate there is almost negligible.
> 
> Does it mean I need to increase the size of my filter and document caches
> for large indices?
> 
> The breakdown of my 25 GB heap usage is as follows:
> 
> 1. 19 GB - Old Gen (100% pool utilization)
> 2. 3 GB - New Gen (50% pool utilization)
> 3. 2.8 GB - Perm Gen (I am guessing this is because of interned strings)
> 4. Survivor space is on the order of 300-400 MB and is almost always 100%
> full. (Is this a major issue?)
> 
> We are also currently using the Parallel GC collector but planning to move
> to CMS for shorter stop-the-world GC times. If I increase the filter cache
> and document cache entry sizes, they would also go to the old gen, right?
> 
> A very naive question: How is increasing the young gen going to help if we
> know that Solr is already pushing major caches and other objects to the old
> gen because of their nature? My young gen pool utilization is still well
> under 50%.
> 
> 
> Thanks
> Nitin
> 
> 
> On Sun, Mar 2, 2014 at 9:31 PM, Walter Underwood wrote:
> 
>> An LRU cache will always fill up the old generation. Old objects are
>> ejected, and those are usually in the old generation.
>>
>> Increasing the heap size will not eliminate this. It will make major, stop
>> the world collections longer.
>>
>> Increase the new generation size until the rate of old gen increase slows
>> down. Then choose a total heap size to control the frequency (and duration)
>> of major collections.
>>
>> We run with the new generation at about 25% of the heap, so 8GB total and
>> a 2GB newgen.
>>
>> A 512 entry cache is very small for query results or docs. We run with 10K
>> or more entries for those. The filter cache size depends on your usage. We
>> have only a handful of different filter queries, so a tiny cache is fine.
>>
>> What is your hit rate on the caches?
>>
>> wunder
>>
>> On Mar 2, 2014, at 7:42 PM, KNitin  wrote:
>>
>>> Hi
>>>
>>> I have a very large index for a few collections, and when they are being
>>> queried, I see the old gen space close to 100% usage all the time. The
>>> system becomes extremely slow due to GC activity right after that, and it
>>> gets into this cycle very often.
>>>
>>> I have given Solr close to 30 GB of heap on a 65 GB RAM machine, and the
>>> rest is left as free RAM. I have a lot of hits in the filter, query result,
>>> and document caches, and the size of all the caches is around 512 entries
>>> per collection. Are all the caches used by Solr on or off heap?
>>>
>>>
>>> Given this scenario where GC is the primary bottleneck, what are good
>>> recommended memory settings for Solr? Should I increase the heap memory
>>> (that will only postpone the problem until the heap becomes full again
>>> after a while)? Will memory maps help at all in this scenario?
>>>
>>>
>>> Kindly advise on the best practices
>>> Thanks
>>> Nitin
>>
>>
>>
> 

-- 
*
Bernd Fehling                    Bielefeld University Library
Dipl.-Inform. (FH)                LibTec - Library Technology
Universitätsstr. 25                  and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060       bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread KNitin
Thanks, Walter

Hit rate on the document caches is close to 70-80%, and the filter caches
are at a 100% hit rate (since most of our queries filter on the same fields
but have a different q parameter). The query result cache is not of great
importance to me since the hit rate there is almost negligible.

Does it mean I need to increase the size of my filter and document caches
for large indices?

The breakdown of my 25 GB heap usage is as follows:

1. 19 GB - Old Gen (100% pool utilization)
2. 3 GB - New Gen (50% pool utilization)
3. 2.8 GB - Perm Gen (I am guessing this is because of interned strings)
4. Survivor space is on the order of 300-400 MB and is almost always 100%
full. (Is this a major issue?)

We are also currently using the Parallel GC collector but planning to move
to CMS for shorter stop-the-world GC times. If I increase the filter cache
and document cache entry sizes, they would also go to the old gen, right?

A very naive question: How is increasing the young gen going to help if we
know that Solr is already pushing major caches and other objects to the old
gen because of their nature? My young gen pool utilization is still well
under 50%.


Thanks
Nitin


On Sun, Mar 2, 2014 at 9:31 PM, Walter Underwood wrote:

> An LRU cache will always fill up the old generation. Old objects are
> ejected, and those are usually in the old generation.
>
> Increasing the heap size will not eliminate this. It will make major, stop
> the world collections longer.
>
> Increase the new generation size until the rate of old gen increase slows
> down. Then choose a total heap size to control the frequency (and duration)
> of major collections.
>
> We run with the new generation at about 25% of the heap, so 8GB total and
> a 2GB newgen.
>
> A 512 entry cache is very small for query results or docs. We run with 10K
> or more entries for those. The filter cache size depends on your usage. We
> have only a handful of different filter queries, so a tiny cache is fine.
>
> What is your hit rate on the caches?
>
> wunder
>
> On Mar 2, 2014, at 7:42 PM, KNitin  wrote:
>
> > Hi
> >
> > I have a very large index for a few collections, and when they are being
> > queried, I see the old gen space close to 100% usage all the time. The
> > system becomes extremely slow due to GC activity right after that, and it
> > gets into this cycle very often.
> >
> > I have given Solr close to 30 GB of heap on a 65 GB RAM machine, and the
> > rest is left as free RAM. I have a lot of hits in the filter, query result,
> > and document caches, and the size of all the caches is around 512 entries
> > per collection. Are all the caches used by Solr on or off heap?
> >
> >
> > Given this scenario where GC is the primary bottleneck, what are good
> > recommended memory settings for Solr? Should I increase the heap memory
> > (that will only postpone the problem until the heap becomes full again
> > after a while)? Will memory maps help at all in this scenario?
> >
> >
> > Kindly advise on the best practices
> > Thanks
> > Nitin
>
>
>


Re: Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread Walter Underwood
An LRU cache will always fill up the old generation. Old objects are ejected, 
and those are usually in the old generation.
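
To see why an LRU cache behaves this way, it helps to look at a tiny one. The sketch below is a generic illustration built on `java.util.LinkedHashMap` and is not presented as Solr's own cache implementation; `LruSketch` is a hypothetical name. Entries stay referenced until they are evicted, which in a busy cache is typically long after they have been promoted out of the new generation, so every eviction creates garbage in the old generation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU cache sketch (an assumption for illustration, not Solr code).
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruSketch(int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the
    // least recently used entry, which then becomes garbage.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruSketch<String, String> cache = new LruSketch<>(2);
        cache.put("q1", "r1");
        cache.put("q2", "r2");
        cache.get("q1");       // touch q1, so q2 is now the least recently used
        cache.put("q3", "r3"); // exceeds the limit and evicts q2
        System.out.println(cache.keySet()); // prints [q1, q3]
    }
}
```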

Increasing the heap size will not eliminate this. It will make major, 
stop-the-world collections longer.

Increase the new generation size until the rate of old gen increase slows down. 
Then choose a total heap size to control the frequency (and duration) of major 
collections.
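
Judging whether the old gen fill rate has slowed requires measuring it. One rough way, sketched here under several assumptions, is to sample the old-generation pool's used bytes twice and divide by the elapsed time. `OldGenGrowth` is a hypothetical name, the pool-name substrings are guesses that match common HotSpot collectors ("PS Old Gen", "CMS Old Gen", "Tenured Gen") but are not guaranteed, and the code measures the JVM it runs in, so it would need to run inside Solr or be adapted to remote JMX. GC logs or jstat are other ways to get the same numbers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: estimates how fast the old generation is filling.
class OldGenGrowth {

    // Pool names are collector-specific; these substrings are assumptions.
    static MemoryPoolMXBean findOldGen() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Old Gen") || name.contains("Tenured")) {
                return pool;
            }
        }
        return null;
    }

    // Growth in bytes per second between two samples; negative if a
    // collection freed memory in between.
    static double bytesPerSecond(long usedBefore, long usedAfter, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (usedAfter - usedBefore) * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryPoolMXBean old = findOldGen();
        MemoryUsage first = (old == null) ? null : old.getUsage();
        if (first == null) {
            System.out.println("no old-gen pool found");
            return;
        }
        long start = System.nanoTime();
        Thread.sleep(1000);
        MemoryUsage second = old.getUsage();
        long elapsedMillis = Math.max(1, (System.nanoTime() - start) / 1_000_000);
        System.out.println(old.getName() + " growth: "
            + bytesPerSecond(first.getUsed(), second.getUsed(), elapsedMillis) + " bytes/s");
    }
}
```

A one-second window is far too short for real tuning; sampling over minutes under production-like query load, before and after changing the new generation size, gives a more meaningful comparison.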

We run with the new generation at about 25% of the heap, so 8GB total and a 2GB 
newgen.

A 512 entry cache is very small for query results or docs. We run with 10K or 
more entries for those. The filter cache size depends on your usage. We have 
only a handful of different filter queries, so a tiny cache is fine.

What is your hit rate on the caches?

wunder

On Mar 2, 2014, at 7:42 PM, KNitin  wrote:

> Hi
> 
> I have a very large index for a few collections, and when they are being
> queried, I see the old gen space close to 100% usage all the time. The
> system becomes extremely slow due to GC activity right after that, and it
> gets into this cycle very often.
> 
> I have given Solr close to 30 GB of heap on a 65 GB RAM machine, and the
> rest is left as free RAM. I have a lot of hits in the filter, query result,
> and document caches, and the size of all the caches is around 512 entries
> per collection. Are all the caches used by Solr on or off heap?
> 
> 
> Given this scenario where GC is the primary bottleneck, what are good
> recommended memory settings for Solr? Should I increase the heap memory
> (that will only postpone the problem until the heap becomes full again
> after a while)? Will memory maps help at all in this scenario?
> 
> 
> Kindly advise on the best practices
> Thanks
> Nitin