Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread Michael Sokolov

On 3/3/2014 1:54 AM, KNitin wrote:

3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
As others have pointed out, this is really unusual for Solr.  We often 
see high permgen in our app servers due to dynamic class loading that 
the framework performs; maybe you are somehow loading lots of new Solr 
plugins, or otherwise creating lots of classes?  Of course if you have a 
plugin or something that does a lot of string interning, that could also 
be an explanation.


-Mike


Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
Regarding PermGen: yes, we have a bunch of custom jars loaded in SolrCloud
(containing custom parsers and analyzers), but I haven't specifically enabled
any string interning. Does Solr intern all strings in a collection by
default?

I agree regarding the doc and filter caches. Query result cache hits are
practically 0 for the large collection since our queries are long-tail by
nature.


Thanks
Nitin


On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov 
msoko...@safaribooksonline.com wrote:

 On 3/3/2014 1:54 AM, KNitin wrote:

 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)

 As others have pointed out, this is really unusual for Solr.  We often see
 high permgen in our app servers due to dynamic class loading that the
 framework performs; maybe you are somehow loading lots of new Solr plugins,
 or otherwise creating lots of classes?  Of course if you have a plugin or
 something that does a lot of string interning, that could also be an
 explanation.

 -Mike



Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread KNitin
Is there a way to dump the contents of permgen and look at which classes
are occupying the most memory in that?

- Nitin


On Mon, Mar 3, 2014 at 11:19 AM, KNitin nitin.t...@gmail.com wrote:

 Regarding PermGen: Yes we have a bunch of custom jars loaded in solrcloud
 (containing custom parsing, analyzers). But I haven't specifically enabled
 any string interning. Does solr intern all strings in a collection by
 default?

 I agree with doc and Filter Query Cache. Query Result cache hits are
 practically 0 for the large collection since our queries are tail by nature


 Thanks
 Nitin


 On Mon, Mar 3, 2014 at 5:01 AM, Michael Sokolov 
 msoko...@safaribooksonline.com wrote:

 On 3/3/2014 1:54 AM, KNitin wrote:

 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)

 As others have pointed out, this is really unusual for Solr.  We often
 see high permgen in our app servers due to dynamic class loading that the
 framework performs; maybe you are somehow loading lots of new Solr plugins,
 or otherwise creating lots of classes?  Of course if you have a plugin or
 something that does a lot of string interning, that could also be an
 explanation.

 -Mike





Re: Solr Heap, MMaps and Garbage Collection

2014-03-03 Thread Tri Cao
If you just want to see which classes are occupying the most memory in a live
JVM, you can do:

jmap -permstat pid

I don't think you can dump the contents of PERM space.

Hope this helps,
Tri

On Mar 03, 2014, at 11:41 AM, KNitin nitin.t...@gmail.com wrote:

 Is there a way to dump the contents of permgen and look at which classes
 are occupying the most memory in that?

 - Nitin
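As a sketch of the commands Tri mentions (the pid 12345 is a placeholder for the Solr JVM's process id; `jmap` ships with the JDK, and `-permstat` applies to the pre-Java-8 HotSpot JVMs this 2014 thread is about):

```shell
# Summarize class-loader memory held in the permanent generation
# (pre-Java-8 HotSpot only; 12345 is a placeholder pid):
jmap -permstat 12345

# A full binary heap dump, which includes classes and interned strings,
# can be browsed offline with jhat or Eclipse MAT:
jmap -dump:format=b,file=/tmp/solr-heap.hprof 12345
```

Both commands require attaching to a live JVM, so they are shown here as a command sketch rather than something runnable standalone.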

Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread KNitin
Hi

I have a very large index for a few collections, and when they are being
queried I see the old gen space close to 100% usage all the time. The
system becomes extremely slow due to GC activity right after that, and it
gets into this cycle very often.

I have given Solr close to 30 GB of heap on a 65 GB RAM machine, and the rest
is left to the OS. I have a lot of hits in the filter, query result, and
document caches, and the size of all the caches is around 512 entries per
collection. Are all the caches used by Solr on-heap or off-heap?


Given this scenario, where GC is the primary bottleneck, what are good
recommended memory settings for Solr? Should I increase the heap (that would
only postpone the problem until the heap becomes full again)? Will memory
maps help at all in this scenario?


Kindly advise on the best practices
Thanks
Nitin


Re: Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread Walter Underwood
An LRU cache will always fill up the old generation. Old objects are ejected,
and those are usually in the old generation.

Increasing the heap size will not eliminate this. It will only make major,
stop-the-world collections longer.

Increase the new generation size until the rate of old gen increase slows down. 
Then choose a total heap size to control the frequency (and duration) of major 
collections.

We run with the new generation at about 25% of the heap, so 8GB total and a 2GB 
newgen.
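As a sketch, that sizing might be expressed like this on a 2014-era (pre-Java-8) HotSpot JVM; the flag values mirror the numbers above but are illustrative, not a recommendation:

```shell
# 8 GB heap with a 2 GB new generation (roughly 25%), plus GC logging
# so the rate of promotion into the old gen can be observed over time:
java -Xms8g -Xmx8g -Xmn2g \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log \
     -jar start.jar
```

This is a config fragment: watch `gc.log` while growing `-Xmn` until old-gen growth slows, then pick the total heap size.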

A 512-entry cache is very small for query results or docs. We run with 10K or 
more entries for those. The filter cache size depends on your usage. We have 
only a handful of different filter queries, so a tiny cache is fine.

What is your hit rate on the caches?
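One way to answer that on Solr 4.x is the admin mbeans stats endpoint, which reports hits, lookups, and hitratio per cache; the URL and core name below are placeholders for your setup:

```shell
# Cache stats (hits, lookups, hitratio, evictions) from a running Solr 4.x:
#   curl 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=CACHE&wt=json'
# The hit rate is simply hits / lookups; e.g. 750 hits out of 1000 lookups:
awk 'BEGIN { printf "%.2f\n", 750 / 1000 }'
```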

wunder

On Mar 2, 2014, at 7:42 PM, KNitin nitin.t...@gmail.com wrote:

 Hi
 
 I have very large index for a few collections and when they are being
 queried, i see the Old gen space close to 100% Usage all the time. The
 system becomes extremely slow due to GC activity right after that and it
 gets into this cycle very often
 
 I have given solr close to 30G of heap in a 65 GB ram machine and rest is
 given to RAm. I have a lot of hits in filter,query result and document
 caches and the size of all the caches is around 512 entries per
 collection.Are all the caches used by solr on or off heap ?
 
 
 Given this scenario where GC is the primary bottleneck what is a good
 recommended memory settings for solr? Should i increase the heap memory
 (that will only postpone the problem before the heap becomes full again
 after a while) ? Will memory maps help at all in this scenario?
 
 
 Kindly advise on the best practices
 Thanks
 Nitin




Re: Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread KNitin
Thanks, Walter

The hit rate on the document caches is close to 70-80%, and the filter caches
are at a 100% hit rate (since most of our queries filter on the same fields but
have a different q parameter). The query result cache is not of great
importance to me since the hit rate there is almost negligible.

Does this mean I need to increase the size of my filter and document caches
for large indices?

My 25 GB heap usage splits up as follows:

1. 19 GB - Old Gen (100% pool utilization)
2. 3 GB - New Gen (50% pool utilization)
3. 2.8 GB - Perm Gen (I am guessing this is because of interned strings)
4. Survivor space is on the order of 300-400 MB and is almost always 100%
full. (Is this a major issue?)

We are also currently using the Parallel GC collector but are planning to move
to CMS for shorter stop-the-world GC pauses. If I increase the filter cache and
document cache entry sizes, they would also go to the old gen, right?
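For reference, the move to CMS on a pre-Java-8 HotSpot JVM might look like the following; the threshold value is illustrative, not a tuned recommendation:

```shell
# CMS for the old gen with ParNew for the young gen; start concurrent
# collections early enough to avoid a stop-the-world fallback
# (the 75% occupancy threshold is an illustrative starting point):
java -Xmx30g -Xmn3g \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
     -jar start.jar
```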

A very naive question: how is increasing the young gen going to help if we
know that Solr is already pushing major caches and other objects to the old gen
because of their nature? My young gen pool utilization is still well under
50%.


Thanks
Nitin


On Sun, Mar 2, 2014 at 9:31 PM, Walter Underwood wun...@wunderwood.orgwrote:

 An LRU cache will always fill up the old generation. Old objects are
 ejected, and those are usually in the old generation.

 Increasing the heap size will not eliminate this. It will make major, stop
 the world collections longer.

 Increase the new generation size until the rate of old gen increase slows
 down. Then choose a total heap size to control the frequency (and duration)
 of major collections.

 We run with the new generation at about 25% of the heap, so 8GB total and
 a 2GB newgen.

 A 512 entry cache is very small for query results or docs. We run with 10K
 or more entries for those. The filter cache size depends on your usage. We
 have only a handful of different filter queries, so a tiny cache is fine.

 What is your hit rate on the caches?

 wunder

 On Mar 2, 2014, at 7:42 PM, KNitin nitin.t...@gmail.com wrote:

  Hi
 
  I have very large index for a few collections and when they are being
  queried, i see the Old gen space close to 100% Usage all the time. The
  system becomes extremely slow due to GC activity right after that and it
  gets into this cycle very often
 
  I have given solr close to 30G of heap in a 65 GB ram machine and rest is
  given to RAm. I have a lot of hits in filter,query result and document
  caches and the size of all the caches is around 512 entries per
  collection.Are all the caches used by solr on or off heap ?
 
 
  Given this scenario where GC is the primary bottleneck what is a good
  recommended memory settings for solr? Should i increase the heap memory
  (that will only postpone the problem before the heap becomes full again
  after a while) ? Will memory maps help at all in this scenario?
 
 
  Kindly advise on the best practices
  Thanks
  Nitin





Re: Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread Bernd Fehling
Actually, I haven't ever seen a PermGen with 2.8 GB.
So you must have a very special use case with Solr.

For my little index with 60 million docs and 170GB index size I gave
PermGen 82 MB and it is only using 50.6 MB for a single VM.

Permanent Generation (PermGen) is completely separate from the heap.

Permanent Generation (non-heap):
The pool containing all the reflective data of the virtual machine itself,
such as class and method objects. With Java VMs that use class data sharing,
this generation is divided into read-only and read-write areas.
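Capping PermGen the way Bernd describes is a one-flag change; the 82 MB value mirrors his setting, and this applies only to pre-Java-8 JVMs (Java 8 replaced PermGen with Metaspace):

```shell
# Fix the permanent generation at 82 MB (pre-Java-8 HotSpot):
java -XX:PermSize=82m -XX:MaxPermSize=82m -jar start.jar
```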

Regards
Bernd


Am 03.03.2014 07:54, schrieb KNitin:
 Thanks, Walter
 
 Hit rate on the document caches is close to 70-80% and the filter caches
 are a 100% hit (since most of our queries filter on the same fields but
 have a different q parameter). Query result cache is not of great
 importance to me since the hit rate their is almost negligible.
 
 Does it mean i need to increase the size of my filter and document cache
 for large indices?
 
 The split up of my 25Gb heap usage is split as follows
 
 1. 19 GB - Old Gen (100% pool utilization)
 2.  3 Gb - New Gen (50% pool utilization)
 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
 4. Survivor space is in the order of 300-400 MB and is almost always 100%
 full.(Is this a major issue?)
 
 We are also currently using Parallel GC collector but planning to move to
 CMS for lesser stop-the-world gc times. If i increase the filter cache and
 document cache entry sizes, they would also go to the Old gen right?
 
 A very naive question: How does increasing young gen going to help if we
 know that solr is already pushing major caches and other objects to old gen
 because of their nature? My young gen pool utilization is still well under
 50%
 
 
 Thanks
 Nitin
 
 
 On Sun, Mar 2, 2014 at 9:31 PM, Walter Underwood wun...@wunderwood.orgwrote:
 
 An LRU cache will always fill up the old generation. Old objects are
 ejected, and those are usually in the old generation.

 Increasing the heap size will not eliminate this. It will make major, stop
 the world collections longer.

 Increase the new generation size until the rate of old gen increase slows
 down. Then choose a total heap size to control the frequency (and duration)
 of major collections.

 We run with the new generation at about 25% of the heap, so 8GB total and
 a 2GB newgen.

 A 512 entry cache is very small for query results or docs. We run with 10K
 or more entries for those. The filter cache size depends on your usage. We
 have only a handful of different filter queries, so a tiny cache is fine.

 What is your hit rate on the caches?

 wunder

 On Mar 2, 2014, at 7:42 PM, KNitin nitin.t...@gmail.com wrote:

 Hi

 I have very large index for a few collections and when they are being
 queried, i see the Old gen space close to 100% Usage all the time. The
 system becomes extremely slow due to GC activity right after that and it
 gets into this cycle very often

 I have given solr close to 30G of heap in a 65 GB ram machine and rest is
 given to RAm. I have a lot of hits in filter,query result and document
 caches and the size of all the caches is around 512 entries per
 collection.Are all the caches used by solr on or off heap ?


 Given this scenario where GC is the primary bottleneck what is a good
 recommended memory settings for solr? Should i increase the heap memory
 (that will only postpone the problem before the heap becomes full again
 after a while) ? Will memory maps help at all in this scenario?


 Kindly advise on the best practices
 Thanks
 Nitin



 

-- 
*
Bernd Fehling                  Bielefeld University Library
Dipl.-Inform. (FH)             LibTec - Library Technology
Universitätsstr. 25            and Knowledge Management
33615 Bielefeld
Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de

BASE - Bielefeld Academic Search Engine - www.base-search.net
*


Re: Solr Heap, MMaps and Garbage Collection

2014-03-02 Thread Walter Underwood
New gen should be big enough to handle all allocations that have a lifetime of 
a single request, considering that you'll have multiple concurrent requests. If 
new gen routinely overflows, short-lived objects can end up in the old gen.

Yes, you need to go to CMS.

I have usually seen the hit rates on the query result and doc caches be fairly 
similar, with the doc cache somewhat higher.

Cache hit rates depend on the number of queries between updates. If you update 
once per day and get a million queries or so, your hit rates can get pretty 
good.

70-80% seems typical for doc cache on an infrequently updated index. We stay 
around 75% on our busiest 4m doc index. 

The query result cache is the most important, because it saves the most work. 
Ours stays around 20%, but I should spend some time improving that.

The perm gen size is very big. I think we run with 128 MB.

wunder

On Mar 2, 2014, at 10:54 PM, KNitin nitin.t...@gmail.com wrote:

 Thanks, Walter
 
 Hit rate on the document caches is close to 70-80% and the filter caches
 are a 100% hit (since most of our queries filter on the same fields but
 have a different q parameter). Query result cache is not of great
 importance to me since the hit rate their is almost negligible.
 
 Does it mean i need to increase the size of my filter and document cache
 for large indices?
 
 The split up of my 25Gb heap usage is split as follows
 
 1. 19 GB - Old Gen (100% pool utilization)
 2.  3 Gb - New Gen (50% pool utilization)
 3. 2.8 Gb - Perm Gen (I am guessing this is because of interned strings)
 4. Survivor space is in the order of 300-400 MB and is almost always 100%
 full.(Is this a major issue?)
 
 We are also currently using Parallel GC collector but planning to move to
 CMS for lesser stop-the-world gc times. If i increase the filter cache and
 document cache entry sizes, they would also go to the Old gen right?
 
 A very naive question: How does increasing young gen going to help if we
 know that solr is already pushing major caches and other objects to old gen
 because of their nature? My young gen pool utilization is still well under
 50%
 
 
 Thanks
 Nitin
 
 
 On Sun, Mar 2, 2014 at 9:31 PM, Walter Underwood wun...@wunderwood.orgwrote:
 
 An LRU cache will always fill up the old generation. Old objects are
 ejected, and those are usually in the old generation.
 
 Increasing the heap size will not eliminate this. It will make major, stop
 the world collections longer.
 
 Increase the new generation size until the rate of old gen increase slows
 down. Then choose a total heap size to control the frequency (and duration)
 of major collections.
 
 We run with the new generation at about 25% of the heap, so 8GB total and
 a 2GB newgen.
 
 A 512 entry cache is very small for query results or docs. We run with 10K
 or more entries for those. The filter cache size depends on your usage. We
 have only a handful of different filter queries, so a tiny cache is fine.
 
 What is your hit rate on the caches?
 
 wunder
 
 On Mar 2, 2014, at 7:42 PM, KNitin nitin.t...@gmail.com wrote:
 
 Hi
 
 I have very large index for a few collections and when they are being
 queried, i see the Old gen space close to 100% Usage all the time. The
 system becomes extremely slow due to GC activity right after that and it
 gets into this cycle very often
 
 I have given solr close to 30G of heap in a 65 GB ram machine and rest is
 given to RAm. I have a lot of hits in filter,query result and document
 caches and the size of all the caches is around 512 entries per
 collection.Are all the caches used by solr on or off heap ?
 
 
 Given this scenario where GC is the primary bottleneck what is a good
 recommended memory settings for solr? Should i increase the heap memory
 (that will only postpone the problem before the heap becomes full again
 after a while) ? Will memory maps help at all in this scenario?
 
 
 Kindly advise on the best practices
 Thanks
 Nitin
 
 
 

--
Walter Underwood
wun...@wunderwood.org