Re: Solr memory requirements?

2009-05-17 Thread Peter Wolanin
I think that if you have in your index any documents with norms, you
will still use norms for those fields even if the schema is changed
later.  Did you wipe and re-index after all your schema changes?

-Peter

On Fri, May 15, 2009 at 9:14 PM, vivek sar vivex...@gmail.com wrote:
 Some more info,

  Profiling the heap dump shows
 org.apache.lucene.index.ReadOnlySegmentReader as the biggest object
 - taking up almost 80% of total memory (6G) - see the attached screen
 shot for a smaller dump. There is some norms object - not sure where
 they are coming from as I've omitNorms=true for all indexed fields.

 I also noticed that if I run a query - let's say a generic query that
 hits 100 million records - and then follow up with a specific query
 that hits only 1 record, the second query causes an increase in
 heap.

 Looks like there are a few bytes being loaded into memory for each
 document - I've checked the schema: all indexed fields have omitNorms=true,
 all caches are commented out - still looking to see what else might
 put things in memory that don't get collected by GC.

 I also saw, https://issues.apache.org/jira/browse/SOLR- for Solr
 1.4 (which I'm using). Not sure if that can cause any problem. I do
 use range queries for dates - would that have any effect?

 Any other ideas?

 Thanks,
 -vivek

 On Thu, May 14, 2009 at 8:38 PM, vivek sar vivex...@gmail.com wrote:
 Thanks Mark.

 I checked all the items you mentioned,

 1) I've omitNorms=true for all my indexed fields (stored-only fields I
 guess don't matter)
 2) I've tried commenting out all caches in the solrconfig.xml, but
 that doesn't help much
 3) I've tried commenting out the first and new searcher listeners
 settings in the solrconfig.xml - the only way that helps is that at
 startup time the memory usage doesn't spike up - that's only because
 there is no auto-warmer query to run. But, I noticed commenting out
 searchers slows down any other queries to Solr.
 4) I don't have any sort or facet in my queries
 5) I'm not sure how to change the Lucene term interval from Solr -
 is there a way to do that?

 I've been playing around with this memory thing the whole day and have
 found that it's the search that's hogging the memory. Any time there
 is a search on all the records (800 million) the heap consumption
 jumps by 5G. This makes me think there has to be some configuration in
 Solr that's causing some terms per document to be loaded in memory.

 I've posted my settings several times on this forum, but no one has
 been able to pinpoint what configuration might be causing this. If
 someone is interested I can attach the solrconfig and schema files as
 well. Here are the settings again under Query tag,

 <query>
   <maxBooleanClauses>1024</maxBooleanClauses>
   <enableLazyFieldLoading>true</enableLazyFieldLoading>
   <queryResultWindowSize>50</queryResultWindowSize>
   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
   <HashDocSet maxSize="3000" loadFactor="0.75"/>
   <useColdSearcher>false</useColdSearcher>
   <maxWarmingSearchers>2</maxWarmingSearchers>
 </query>

 and schema,

 <field name="id" type="long" indexed="true" stored="true"
required="true" omitNorms="true" compressed="false"/>

  <field name="atmps" type="integer" indexed="false" stored="true"
compressed="false"/>
  <field name="bcid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="cmpcd" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="ctry" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="dlt" type="date" indexed="false" stored="true"
default="NOW/HOUR" compressed="false"/>
  <field name="dmn" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="eaddr" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="emsg" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="erc" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="evt" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="from" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="lfid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="lsid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="prsid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="rc" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="rmcd" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="rmscd" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="scd" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="sip" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="ts" type="date" indexed="true" stored="false"
default="NOW/HOUR" omitNorms="true"/>

  <!-- catchall field, containing all other searchable text fields (implemented
       via copyField further on in this schema) -->
  <field name="all" type="text_ws" indexed="true" stored="false"
omitNorms="true" multiValued="true"/>

Re: Solr memory requirements?

2009-05-17 Thread jlist9
I've never paid attention to the post/commit ratio. I usually do a commit
after maybe 100 posts. Is there a guideline about this? Thanks.

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume 
 during indexing.  There is no need to commit every 50K docs unless you want 
 to trigger snapshot creation.
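(For illustration only: a minimal sketch of the batching logic discussed above. The BatchIndexer class and the 50K batch size are hypothetical; in real code the flush() body would make the SolrJ add/commit calls, and ramBufferSizeMB governs indexing-time memory regardless of commit cadence.)

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: buffer documents and commit only every `batchSize`
// adds, rather than committing after every post.
public class BatchIndexer {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int commits = 0;

    public BatchIndexer(int batchSize) { this.batchSize = batchSize; }

    public void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        // Real SolrJ calls would go here, e.g. server.add(docs); server.commit();
        buffer.clear();
        commits++;
    }

    public int getCommits() { return commits; }

    public static void main(String[] args) {
        BatchIndexer indexer = new BatchIndexer(50_000);
        for (int i = 0; i < 120_000; i++) indexer.add("doc-" + i);
        indexer.flush(); // final partial batch
        System.out.println(indexer.getCommits() + " commits"); // 3 commits
    }
}
```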


Re: Solr memory requirements?

2009-05-15 Thread vivek sar
Some more info,

  Profiling the heap dump shows
org.apache.lucene.index.ReadOnlySegmentReader as the biggest object
- taking up almost 80% of total memory (6G) - see the attached screen
shot for a smaller dump. There is some norms object - not sure where
they are coming from as I've omitNorms=true for all indexed fields.

I also noticed that if I run a query - let's say a generic query that
hits 100 million records - and then follow up with a specific query
that hits only 1 record, the second query causes an increase in
heap.

Looks like there are a few bytes being loaded into memory for each
document - I've checked the schema: all indexed fields have omitNorms=true,
all caches are commented out - still looking to see what else might
put things in memory that don't get collected by GC.

I also saw, https://issues.apache.org/jira/browse/SOLR- for Solr
1.4 (which I'm using). Not sure if that can cause any problem. I do
use range queries for dates - would that have any effect?

Any other ideas?

Thanks,
-vivek

On Thu, May 14, 2009 at 8:38 PM, vivek sar vivex...@gmail.com wrote:
 Thanks Mark.

 I checked all the items you mentioned,

 1) I've omitNorms=true for all my indexed fields (stored-only fields I
 guess don't matter)
 2) I've tried commenting out all caches in the solrconfig.xml, but
 that doesn't help much
 3) I've tried commenting out the first and new searcher listeners
 settings in the solrconfig.xml - the only way that helps is that at
 startup time the memory usage doesn't spike up - that's only because
 there is no auto-warmer query to run. But, I noticed commenting out
 searchers slows down any other queries to Solr.
 4) I don't have any sort or facet in my queries
 5) I'm not sure how to change the Lucene term interval from Solr -
 is there a way to do that?

 I've been playing around with this memory thing the whole day and have
 found that it's the search that's hogging the memory. Any time there
 is a search on all the records (800 million) the heap consumption
 jumps by 5G. This makes me think there has to be some configuration in
 Solr that's causing some terms per document to be loaded in memory.

 I've posted my settings several times on this forum, but no one has
 been able to pinpoint what configuration might be causing this. If
 someone is interested I can attach the solrconfig and schema files as
 well. Here are the settings again under Query tag,

 <query>
   <maxBooleanClauses>1024</maxBooleanClauses>
   <enableLazyFieldLoading>true</enableLazyFieldLoading>
   <queryResultWindowSize>50</queryResultWindowSize>
   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
   <HashDocSet maxSize="3000" loadFactor="0.75"/>
   <useColdSearcher>false</useColdSearcher>
   <maxWarmingSearchers>2</maxWarmingSearchers>
 </query>

 and schema,

 <field name="id" type="long" indexed="true" stored="true"
required="true" omitNorms="true" compressed="false"/>

  <field name="atmps" type="integer" indexed="false" stored="true"
compressed="false"/>
  <field name="bcid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="cmpcd" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="ctry" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="dlt" type="date" indexed="false" stored="true"
default="NOW/HOUR" compressed="false"/>
  <field name="dmn" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="eaddr" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="emsg" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="erc" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="evt" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="from" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="lfid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="lsid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="prsid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="rc" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="rmcd" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="rmscd" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="scd" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="sip" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="ts" type="date" indexed="true" stored="false"
default="NOW/HOUR" omitNorms="true"/>

  <!-- catchall field, containing all other searchable text fields (implemented
       via copyField further on in this schema) -->
  <field name="all" type="text_ws" indexed="true" stored="false"
omitNorms="true" multiValued="true"/>

 Any help is greatly appreciated.

 Thanks,
 -vivek

 On Thu, May 14, 2009 at 6:22 PM, Mark Miller markrmil...@gmail.com wrote:
 800 million docs is on the high side for modern hardware.

 If even one field has norms on, your 

Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Otis,

 We are not running a master-slave configuration. We get very few
searches (admin only) in a day, so we didn't see the need for
replication/snapshots. This problem is with one Solr instance managing
4 cores (each core 200 million records). Both indexing and searching
are performed by the same Solr instance.

What are .tii files used for? I see this file under only one core.
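(For context, hedged: my understanding is that the .tii file is Lucene's term index - a sampling of every termIndexInterval-th entry from the .tis term dictionary that is loaded into RAM to speed up term lookups, which is why TermInfo objects show up in heap dumps. Roughly, with an assumed term count:)

```java
// Rough sketch: the .tii term index holds one entry per termIndexInterval
// terms from the .tis dictionary, and those entries live in RAM.
public class TiiEstimate {
    static long tiiEntries(long totalTerms, int termIndexInterval) {
        return totalTerms / termIndexInterval;
    }
    public static void main(String[] args) {
        // e.g. a hypothetical 420M distinct terms at the default interval of 128
        System.out.println(tiiEntries(420_000_000L, 128) + " entries in RAM");
    }
}
```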

Still looking for what gets loaded into the heap by Solr (during load time,
indexing, and searching) and stays there. I see most of these are
tenured objects and not getting released by GC - will post profile
records tomorrow.

Thanks,
-vivek





On Wed, May 13, 2009 at 6:34 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 There is constant mixing of indexing concepts and searching concepts in this 
 thread.  Are you having problems on the master (indexing) or on the slave 
 (searching)?


 That .tii is only 20K and you said this is a large index?  That doesn't smell 
 right...

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 5:12:00 PM
 Subject: Re: Solr memory requirements?

 Otis,

 In that case, I'm not sure why Solr is taking up so much memory as
 soon as we start it up. I checked for .tii files and there is only one,

 -rw-r--r--  1 search  staff  20306 May 11 21:47 
 ./20090510_1/data/index/_3au.tii

 I have all the cache disabled - so that shouldn't be a problem too. My
 ramBuffer size is only 64MB.

 I read the note on sorting,
 http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
 something related to FieldCache. I don't see this as a parameter defined
 in either solrconfig.xml or schema.xml. Could this be something that
 can load things in memory at startup? How can we disable it?

 I'm trying to find out if there is a way to tell how much memory Solr
 would consume and a way to cap it.

 Thanks,
 -vivek




 On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
 wrote:
 
  Hi,
 
  Sorting is triggered by the sort parameter in the URL, not a 
  characteristic of
 a field. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 4:42:16 PM
  Subject: Re: Solr memory requirements?
 
  Thanks Otis.
 
  Our use case doesn't require any sorting or faceting. I'm wondering if
  I've configured anything wrong.
 
  I got total of 25 fields (15 are indexed and stored, other 10 are just
  stored). All my fields are basic data type - which I thought are not
  sorted. My id field is unique key.
 
  Is there any field here that might be getting sorted?
 
 
   <field name="id" type="long" indexed="true" stored="true"
   required="true" omitNorms="true" compressed="false"/>

   <field name="atmps" type="integer" indexed="false" stored="true"
   compressed="false"/>
   <field name="bcid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="cmpcd" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="ctry" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="dlt" type="date" indexed="false" stored="true"
   default="NOW/HOUR" compressed="false"/>
   <field name="dmn" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="eaddr" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="emsg" type="string" indexed="false" stored="true"
   compressed="false"/>
   <field name="erc" type="string" indexed="false" stored="true"
   compressed="false"/>
   <field name="evt" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="from" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="lfid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="lsid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="prsid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="rc" type="string" indexed="false" stored="true"
   compressed="false"/>
   <field name="rmcd" type="string" indexed="false" stored="true"
   compressed="false"/>
   <field name="rmscd" type="string" indexed="false" stored="true"
   compressed="false"/>
   <field name="scd" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
   <field name="sip" type="string" indexed="false" stored="true"
   compressed="false"/>
   <field name="ts" type="date" indexed="true" stored="false"
   default="NOW/HOUR" omitNorms="true"/>

   <field name="all" type="text_ws" indexed="true" stored="false"
   omitNorms="true" multiValued="true"/>
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
  wrote:
  
   Hi,
   Some answers:
    1) .tii files in the Lucene index get loaded into memory.  When you sort, all distinct
   values for the field(s) used for sorting get loaded, too.  Similarly for facet
   fields.  Solr caches, as well.
    2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume
   during indexing.  There is no need to commit every 50K docs unless you want
   to trigger snapshot creation.
   3) see 1) above
  
    1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's going
   to fly. :)
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
   - Original Message 
   From: vivek sar
   To: solr-user@lucene.apache.org
   Sent: Wednesday, May 13, 2009 3:04:46 PM
   Subject: Solr memory requirements?
  
   Hi,
  
     I'm pretty sure this has been asked before, but I couldn't find a
   complete answer in the forum archive. Here are my questions,
  
    1) When Solr starts up, what does it load into memory? Let's say
    I've 4 cores, each core 50G in size. When Solr comes up, how much
    of that would be loaded into memory?

    2) How much memory is required during index time? If I'm committing
    50K records at a time (1 record = 1KB) using solrj, how much memory do
    I need to give to Solr?
  
   3) Is there a minimum memory requirement by Solr to maintain a certain
   size index? Is there any benchmark on this?
  
   Here are some of my

Re: Solr memory requirements?

2009-05-14 Thread vivek sar
I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need for any
analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all
the caches under query in solrconfig.xml? I also don't have any
auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson erickerick...@gmail.com wrote:
 Warning: I'm way out of my competency range when I comment
 on SOLR, but I've seen the statement that string fields are NOT
 tokenized while text fields are, and I notice that almost all of your fields
 are string type.

 Would someone more knowledgeable than me care to comment on whether
 this is at all relevant? Offered in the spirit that sometimes there are
 things so basic that only an amateur can see them <G>

 Best
 Erick

 On Wed, May 13, 2009 at 4:42 PM, vivek sar vivex...@gmail.com wrote:

 Thanks Otis.

 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.

 I got total of 25 fields (15 are indexed and stored, other 10 are just
 stored). All my fields are basic data type - which I thought are not
 sorted. My id field is unique key.

 Is there any field here that might be getting sorted?

  <field name="id" type="long" indexed="true" stored="true"
 required="true" omitNorms="true" compressed="false"/>

   <field name="atmps" type="integer" indexed="false" stored="true"
 compressed="false"/>
   <field name="bcid" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="cmpcd" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="ctry" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="dlt" type="date" indexed="false" stored="true"
 default="NOW/HOUR" compressed="false"/>
   <field name="dmn" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="eaddr" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="emsg" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="erc" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="evt" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="from" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="lfid" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="lsid" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="prsid" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="rc" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="rmcd" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="rmscd" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="scd" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="sip" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="ts" type="date" indexed="true" stored="false"
 default="NOW/HOUR" omitNorms="true"/>


   <!-- catchall field, containing all other searchable text fields
 (implemented
        via copyField further on in this schema) -->
   <field name="all" type="text_ws" indexed="true" stored="false"
 omitNorms="true" multiValued="true"/>

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 
  Hi,
  Some answers:
   1) .tii files in the Lucene index get loaded into memory.  When you sort, all distinct
  values for the field(s) used for sorting get loaded, too.  Similarly for facet fields.
  Solr caches, as well.
  2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
 consume during indexing.  There is no need to commit every 50K docs unless
 you want to trigger snapshot creation.
  3) see 1) above
 
  1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
 going to fly. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar vivex...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 3:04:46 PM
  Subject: Solr memory requirements?
 
  Hi,
 
    I'm pretty sure this has been asked before, but I couldn't find a
  complete answer in the forum archive. Here are my questions,
 
   1) When Solr starts up, what does it load into memory? Let's say
   I've 4 cores, each core 50G in size. When Solr comes up, how much
   of that would be loaded into memory?
  
   2) How much memory is required during index time? If I'm committing
   50K records at a time (1 record = 1KB) using solrj, how much memory do
   I need to give to Solr?
 
  3) Is there a minimum memory requirement by Solr to maintain a certain
  size index? Is there any benchmark on this?
 
   Here are some of my configurations from solrconfig.xml,
  
   1) ramBufferSizeMB = 64
  2) All the caches (under query tag) are commented out
  3) Few others,
        a)  true    ==
  would this require 

Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage.
During indexing the memory usage goes up and down, which I think is
normal. The memory remains around the min heap size (4 G) for
indexing, but as soon as I run a search the tenured heap usage jumps
up to 6G and remains there. Subsequent searches increase the heap
usage even more until it reaches the max (8G) - after which everything
(indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through
all the cores (4 of them - 800 million records), finds 400 million
matches and returns 100 rows.

Does the Solr searcher hold references to objects in memory? I
couldn't find any setting that would tell me it does, but every
search causing the heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller
instance with 2 G memory, this is before running search - after
running search I wasn't able to run jmap),

 num #instances #bytes  class name
--
   1:   3890855  222608992  [C
   2:   3891673  155666920  java.lang.String
   3:   3284341  131373640  org.apache.lucene.index.TermInfo
   4:   3334198  106694336  org.apache.lucene.index.Term
   5:   271   26286496  [J
   6:16   26273936  [Lorg.apache.lucene.index.Term;
   7:16   26273936  [Lorg.apache.lucene.index.TermInfo;
   8:320512   15384576
org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9: 10335   11554136  [I

I'm not sure what the first one ([C) is - I couldn't profile it to know
what all the Strings are being allocated by - any ideas?
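(Aside, as a small illustration: the odd class names in jmap histograms are JVM type descriptors - `[C` is char[], the backing array of java.lang.String, which is why its instance count tracks the String count so closely in the histogram above.)

```java
// JVM array-class descriptors as they appear in jmap -histo output:
// "[C" = char[], "[I" = int[], "[J" = long[], "[L<Type>;" = <Type>[]
public class HistoNames {
    public static void main(String[] args) {
        System.out.println(new char[0].getClass().getName());   // [C
        System.out.println(new int[0].getClass().getName());    // [I
        System.out.println(new long[0].getClass().getName());   // [J
        System.out.println(new String[0].getClass().getName()); // [Ljava.lang.String;
    }
}
```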

Any ideas on what the Searcher might be holding on to, and how we can
change that behavior?

Thanks,
-vivek


On Thu, May 14, 2009 at 11:33 AM, vivek sar vivex...@gmail.com wrote:
 I don't know if field type has any impact on the memory usage - does it?

 Our use cases require complete matches, thus there is no need for any
 analysis in most cases - does it matter in terms of memory usage?

 Also, is there any default caching used by Solr if I comment out all
 the caches under query in solrconfig.xml? I also don't have any
 auto-warming queries.

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 4:24 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 Warning: I'm way out of my competency range when I comment
 on SOLR, but I've seen the statement that string fields are NOT
 tokenized while text fields are, and I notice that almost all of your fields
 are string type.

 Would someone more knowledgeable than me care to comment on whether
 this is at all relevant? Offered in the spirit that sometimes there are
 things so basic that only an amateur can see them <G>

 Best
 Erick

 On Wed, May 13, 2009 at 4:42 PM, vivek sar vivex...@gmail.com wrote:

 Thanks Otis.

 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.

 I got total of 25 fields (15 are indexed and stored, other 10 are just
 stored). All my fields are basic data type - which I thought are not
 sorted. My id field is unique key.

 Is there any field here that might be getting sorted?

  <field name="id" type="long" indexed="true" stored="true"
 required="true" omitNorms="true" compressed="false"/>

   <field name="atmps" type="integer" indexed="false" stored="true"
 compressed="false"/>
   <field name="bcid" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="cmpcd" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="ctry" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="dlt" type="date" indexed="false" stored="true"
 default="NOW/HOUR" compressed="false"/>
   <field name="dmn" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="eaddr" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="emsg" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="erc" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="evt" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="from" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="lfid" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="lsid" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="prsid" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="rc" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="rmcd" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="rmscd" type="string" indexed="false" stored="true"
 compressed="false"/>
   <field name="scd" type="string" indexed="true" stored="true"
 omitNorms="true" compressed="false"/>
   <field name="sip" type="string" indexed="false" stored="true"
 compressed="false"/>
   field 

Re: Solr memory requirements?

2009-05-14 Thread Mark Miller

800 million docs is on the high side for modern hardware.

If even one field has norms on, you're talking almost 800 MB right there. 
And then if another Searcher is brought up while the old one is serving 
(which happens when you update)? Doubled.
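(A back-of-the-envelope sketch of that estimate, assuming Lucene's one norm byte per document for each field with norms enabled:)

```java
// Lucene stores one norm byte per document for each field with norms enabled,
// so the per-searcher cost is roughly docCount * fieldsWithNorms bytes.
public class NormsEstimate {
    static long normsBytes(long docCount, int fieldsWithNorms) {
        return docCount * fieldsWithNorms;
    }
    public static void main(String[] args) {
        long bytes = normsBytes(800_000_000L, 1);
        System.out.printf("%.0f MB per searcher%n", bytes / (1024.0 * 1024.0)); // ~763 MB
        // With a warming searcher open alongside the serving one: double it.
    }
}
```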


Your best bet is to distribute across a couple machines.

To minimize, you would want to turn off or down caching, don't facet, 
don't sort, turn off all norms, and possibly get at the Lucene term interval 
and raise it. Drop the on-deck searchers setting. Even then, 800 
million...time to distribute I'd think.


vivek sar wrote:

Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage.
During indexing the memory usage goes up and down, which I think is
normal. The memory remains around the min heap size (4 G) for
indexing, but as soon as I run a search the tenured heap usage jumps
up to 6G and remains there. Subsequent searches increase the heap
usage even more until it reaches the max (8G) - after which everything
(indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through
all the cores (4 of them - 800 million records), finds 400 million
matches and returns 100 rows.

Does the Solr searcher hold references to objects in memory? I
couldn't find any setting that would tell me it does, but every
search causing the heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller
instance with 2 G memory, this is before running search - after
running search I wasn't able to run jmap),

 num #instances #bytes  class name
--
   1:   3890855  222608992  [C
   2:   3891673  155666920  java.lang.String
   3:   3284341  131373640  org.apache.lucene.index.TermInfo
   4:   3334198  106694336  org.apache.lucene.index.Term
   5:   271   26286496  [J
   6:16   26273936  [Lorg.apache.lucene.index.Term;
   7:16   26273936  [Lorg.apache.lucene.index.TermInfo;
   8:320512   15384576
org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9: 10335   11554136  [I

I'm not sure what the first one ([C) is - I couldn't profile it to know
what all the Strings are being allocated by - any ideas?

Any ideas on what the Searcher might be holding on to, and how we can
change that behavior?

Thanks,
-vivek


On Thu, May 14, 2009 at 11:33 AM, vivek sar vivex...@gmail.com wrote:
  

I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need for any
analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all
the caches under query in solrconfig.xml? I also don't have any
auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson erickerick...@gmail.com wrote:


Warning: I'm way out of my competency range when I comment
on SOLR, but I've seen the statement that string fields are NOT
tokenized while text fields are, and I notice that almost all of your fields
are string type.

Would someone more knowledgeable than me care to comment on whether
this is at all relevant? Offered in the spirit that sometimes there are
things so basic that only an amateur can see them <G>

Best
Erick

On Wed, May 13, 2009 at 4:42 PM, vivek sar vivex...@gmail.com wrote:

  

Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering if
I've configured anything wrong.

I got total of 25 fields (15 are indexed and stored, other 10 are just
stored). All my fields are basic data type - which I thought are not
sorted. My id field is unique key.

Is there any field here that might be getting sorted?

 <field name="id" type="long" indexed="true" stored="true"
required="true" omitNorms="true" compressed="false"/>

  <field name="atmps" type="integer" indexed="false" stored="true"
compressed="false"/>
  <field name="bcid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="cmpcd" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="ctry" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="dlt" type="date" indexed="false" stored="true"
default="NOW/HOUR" compressed="false"/>
  <field name="dmn" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="eaddr" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="emsg" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="erc" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="evt" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="from" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="lfid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  field name=lsid type=string indexed=true 

Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Thanks Mark.

I checked all the items you mentioned,

1) I've omitNorms=true for all my indexed fields (stored-only fields I
guess don't matter)
2) I've tried commenting out all caches in the solrconfig.xml, but
that doesn't help much
3) I've tried commenting out the first and new searcher listeners
settings in the solrconfig.xml - the only way that helps is that at
startup time the memory usage doesn't spike up - that's only because
there is no auto-warmer query to run. But, I noticed commenting out
searchers slows down any other queries to Solr.
4) I don't have any sort or facet in my queries
5) I'm not sure how to change the Lucene term interval from Solr -
is there a way to do that?
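(Re item 5, a hedged pointer: Lucene's term index interval defaults to 128, and I believe Solr exposes it in solrconfig.xml's indexDefaults section, though I haven't verified this on 1.4. Raising it loads proportionally fewer .tii entries (those TermInfo objects) into memory, at the cost of slower term lookups. Something like:)

```xml
<!-- Assumption: Solr 1.4 solrconfig.xml, indexDefaults section.
     Default is 128; e.g. 1024 keeps roughly 1/8 as many terms in RAM. -->
<indexDefaults>
  <termIndexInterval>1024</termIndexInterval>
</indexDefaults>
```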

I've been playing around with this memory thing the whole day and have
found that it's the search that's hogging the memory. Any time there
is a search on all the records (800 million) the heap consumption
jumps by 5G. This makes me think there has to be some configuration in
Solr that's causing some terms per document to be loaded in memory.

I've posted my settings several times on this forum, but no one has
been able to pinpoint what configuration might be causing this. If
someone is interested I can attach the solrconfig and schema files as
well. Here are the settings again under Query tag,

<query>
  <maxBooleanClauses>1024</maxBooleanClauses>
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <queryResultWindowSize>50</queryResultWindowSize>
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
  <HashDocSet maxSize="3000" loadFactor="0.75"/>
  <useColdSearcher>false</useColdSearcher>
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>

and schema,

 <field name="id" type="long" indexed="true" stored="true"
required="true" omitNorms="true" compressed="false"/>

  <field name="atmps" type="integer" indexed="false" stored="true"
compressed="false"/>
  <field name="bcid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="cmpcd" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="ctry" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="dlt" type="date" indexed="false" stored="true"
default="NOW/HOUR" compressed="false"/>
  <field name="dmn" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="eaddr" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="emsg" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="erc" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="evt" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="from" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="lfid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="lsid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="prsid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="rc" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="rmcd" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="rmscd" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="scd" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
  <field name="sip" type="string" indexed="false" stored="true"
compressed="false"/>
  <field name="ts" type="date" indexed="true" stored="false"
default="NOW/HOUR" omitNorms="true"/>

  <!-- catchall field, containing all other searchable text fields (implemented
   via copyField further on in this schema) -->
  <field name="all" type="text_ws" indexed="true" stored="false"
omitNorms="true" multiValued="true"/>

Any help is greatly appreciated.

Thanks,
-vivek

On Thu, May 14, 2009 at 6:22 PM, Mark Miller markrmil...@gmail.com wrote:
 800 million docs is on the high side for modern hardware.

 If even one field has norms on, you're talking almost 800 MB right there. And
 then what if another Searcher is brought up while the old one is serving (which
 happens when you update)? Doubled.

 Your best bet is to distribute across a couple machines.

 To minimize memory, you would want to turn caching off or down, not facet, not
 sort, turn off all norms, and possibly get at the Lucene term index interval and
 raise it. Drop the on-deck searchers setting. Even then, 800 million... time to
 distribute, I'd think.
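As a sanity check of the 800 MB figure above: Lucene keeps one norm byte per document for each field with norms enabled, so at the 800 million documents discussed in this thread the arithmetic is simple (the field count below is an assumption for illustration):

```java
// Rough norms-memory estimate for a Lucene/Solr index.
public class NormsMemory {
    public static void main(String[] args) {
        long docs = 800000000L;   // document count from this thread
        int normedFields = 1;     // assumed: one field without omitNorms="true"
        // Lucene stores one norm byte per document per normed field.
        long bytes = docs * normedFields;
        System.out.println("norms per searcher: ~" + bytes / (1024 * 1024) + " MB");
        // While a new searcher warms, the old one still serves: two copies coexist.
        System.out.println("during warming: ~" + 2 * bytes / (1024 * 1024) + " MB");
    }
}
```

So a single normed field roughly matches the "almost 800 MB" estimate, and warming doubles it, exactly as described.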

 vivek sar wrote:

 Some update on this issue,

 1) I attached jconsole to my app and monitored the memory usage.
 During indexing the memory usage goes up and down, which I think is
 normal. The memory remains around the min heap size (4 G) during
 indexing, but as soon as I run a search the tenured heap usage jumps
 up to 6G and remains there. Subsequent searches increase the heap
 usage even more until it reaches the max (8G) - after which everything
 (indexing and searching) becomes slow.

 The search query is a very generic one in this case which goes through
 all the cores (4 of them - 800 million records), finds 400 million
 matches, and returns 100 rows.

 Does the Solr 

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Hi,
Some answers:
1) .tii files in the Lucene index.  When you sort, all distinct values for the 
field(s) used for sorting.  Similarly for facet fields.  Solr caches.
2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume 
during indexing.  There is no need to commit every 50K docs unless you want to 
trigger snapshot creation.
3) see 1) above

1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's going 
to fly. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 3:04:46 PM
 Subject: Solr memory requirements?
 
 Hi,
 
   I'm pretty sure this has been asked before, but I couldn't find a
 complete answer in the forum archive. Here are my questions,
 
 1) When Solr starts up, what does it load up in memory? Let's say
 I've 4 cores with each core 50G in size. When Solr comes up, how much
 of it would be loaded in memory?
 
 2) How much memory is required during index time? If I'm committing
 50K records at a time (1 record = 1KB) using solrj, how much memory do
 I need to give to Solr?
 
 3) Is there a minimum memory requirement by Solr to maintain a certain
 size index? Is there any benchmark on this?
 
 Here are some of my configuration from solrconfig.xml,
 
 1) <ramBufferSizeMB>64</ramBufferSizeMB>
 2) All the caches (under query tag) are commented out
 3) Few others,
   a) <enableLazyFieldLoading>true</enableLazyFieldLoading> ==
 would this require memory?
   b) <queryResultWindowSize>50</queryResultWindowSize>
   c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
   d) <HashDocSet maxSize="3000" loadFactor="0.75"/>
   e) <useColdSearcher>false</useColdSearcher>
   f) <maxWarmingSearchers>2</maxWarmingSearchers>
 
 The problem we are having is the following:
 
 I've given Solr RAM of 6G. As the total index size (all cores
 combined) starts growing, the Solr memory consumption goes up. With 800
 million documents, I see Solr already taking up all the memory at
 startup. After that, commits and searches all become slow. We
 will be having a distributed setup with multiple Solr instances (around
 8) on four boxes, but our requirement is to have each Solr instance
 maintain at least around 1.5 billion documents.
 
 We are trying to see if we can somehow reduce the Solr memory
 footprint. If someone can provide a pointer on what parameters affect
 memory and what effects they have, we can then decide whether we want a
 given parameter or not. I'm not sure if there is any minimum Solr
 requirement for it to be able to maintain large indexes. I've used Lucene
 before and that didn't require anything by default - it used up memory
 only during index and search times - not otherwise.
 
 Any help is very much appreciated.
 
 Thanks,
 -vivek



Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering if
I've configured anything wrong.

I got a total of 25 fields (15 are indexed and stored, the other 10 are just
stored). All my fields are basic data types - which I thought are not
sorted. My id field is the unique key.

Is there any field here that might be getting sorted?

 <field name="id" type="long" indexed="true" stored="true"
required="true" omitNorms="true" compressed="false"/>

   <field name="atmps" type="integer" indexed="false" stored="true"
compressed="false"/>
   <field name="bcid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="cmpcd" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="ctry" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="dlt" type="date" indexed="false" stored="true"
default="NOW/HOUR" compressed="false"/>
   <field name="dmn" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="eaddr" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="emsg" type="string" indexed="false" stored="true"
compressed="false"/>
   <field name="erc" type="string" indexed="false" stored="true"
compressed="false"/>
   <field name="evt" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="from" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="lfid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="lsid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="prsid" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="rc" type="string" indexed="false" stored="true"
compressed="false"/>
   <field name="rmcd" type="string" indexed="false" stored="true"
compressed="false"/>
   <field name="rmscd" type="string" indexed="false" stored="true"
compressed="false"/>
   <field name="scd" type="string" indexed="true" stored="true"
omitNorms="true" compressed="false"/>
   <field name="sip" type="string" indexed="false" stored="true"
compressed="false"/>
   <field name="ts" type="date" indexed="true" stored="false"
default="NOW/HOUR" omitNorms="true"/>


   <!-- catchall field, containing all other searchable text fields (implemented
via copyField further on in this schema) -->
   <field name="all" type="text_ws" indexed="true" stored="false"
omitNorms="true" multiValued="true"/>

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Hi,
 Some answers:
 1) .tii files in the Lucene index.  When you sort, all distinct values for 
 the field(s) used for sorting.  Similarly for facet fields.  Solr caches.
 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume 
 during indexing.  There is no need to commit every 50K docs unless you want 
 to trigger snapshot creation.
 3) see 1) above

 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
 going to fly. :)

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 3:04:46 PM
 Subject: Solr memory requirements?

 Hi,

   I'm pretty sure this has been asked before, but I couldn't find a
 complete answer in the forum archive. Here are my questions,

 1) When solr starts up what does it loads up in the memory? Let's say
 I've 4 cores with each core 50G in size. When Solr comes up how much
 of it would be loaded in memory?

 2) How much memory is required during index time? If I'm committing
 50K records at a time (1 record = 1KB) using solrj, how much memory do
 I need to give to Solr.

 3) Is there a minimum memory requirement by Solr to maintain a certain
 size index? Is there any benchmark on this?

 Here are some of my configuration from solrconfig.xml,

 1) <ramBufferSizeMB>64</ramBufferSizeMB>
 2) All the caches (under query tag) are commented out
 3) Few others,
       a) <enableLazyFieldLoading>true</enableLazyFieldLoading>    ==
 would this require memory?
       b) <queryResultWindowSize>50</queryResultWindowSize>
       c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
       d) <HashDocSet maxSize="3000" loadFactor="0.75"/>
       e) <useColdSearcher>false</useColdSearcher>
       f) <maxWarmingSearchers>2</maxWarmingSearchers>

 The problem we are having is following,

 I've given Solr RAM of 6G. As the total index size (all cores
 combined) start growing the Solr memory consumption  goes up. With 800
 million documents, I see Solr already taking up all the memory at
 startup. After that the commits, searches everything become slow. We
 will be having distributed setup with multiple Solr instances (around
 8) on four boxes, but our requirement is to have each Solr instance at
 least maintain around 1.5 billion documents.

 We are trying to see if we can somehow reduce the Solr memory
 footprint. If someone can provide a pointer on what parameters affect
 memory and what effects it has we can then decide whether we want that
 parameter or not. I'm not sure if there is any minimum Solr
 requirement for it to be able maintain large indexes. I've used Lucene
 before and that didn't require anything by default - it used up memory
 only during index and search times - not otherwise.

 Any 

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Hi,

Sorting is triggered by the sort parameter in the URL, not a characteristic of 
a field. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 4:42:16 PM
 Subject: Re: Solr memory requirements?
 
 Thanks Otis.
 
 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.
 
 I got total of 25 fields (15 are indexed and stored, other 10 are just
 stored). All my fields are basic data type - which I thought are not
 sorted. My id field is unique key.
 
 Is there any field here that might be getting sorted?
 
 
 [quoted schema field definitions snipped - the field names were lost in quoting]
 
 Thanks,
 -vivek
 
 On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:
 
  Hi,
  Some answers:
  1) .tii files in the Lucene index.  When you sort, all distinct values for 
  the 
 field(s) used for sorting.  Similarly for facet fields.  Solr caches.
  2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
  consume 
 during indexing.  There is no need to commit every 50K docs unless you want 
 to 
 trigger snapshot creation.
  3) see 1) above
 
  1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
  going 
 to fly. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar 
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 3:04:46 PM
  Subject: Solr memory requirements?
 
  Hi,
 
I'm pretty sure this has been asked before, but I couldn't find a
  complete answer in the forum archive. Here are my questions,
 
  1) When solr starts up what does it loads up in the memory? Let's say
  I've 4 cores with each core 50G in size. When Solr comes up how much
  of it would be loaded in memory?
 
  2) How much memory is required during index time? If I'm committing
  50K records at a time (1 record = 1KB) using solrj, how much memory do
  I need to give to Solr.
 
  3) Is there a minimum memory requirement by Solr to maintain a certain
  size index? Is there any benchmark on this?
 
  Here are some of my configuration from solrconfig.xml,
 
  1) <ramBufferSizeMB>64</ramBufferSizeMB>
  2) All the caches (under query tag) are commented out
  3) Few others,
    a) <enableLazyFieldLoading>true</enableLazyFieldLoading> ==
  would this require memory?
    b) <queryResultWindowSize>50</queryResultWindowSize>
    c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    d) <HashDocSet maxSize="3000" loadFactor="0.75"/>
    e) <useColdSearcher>false</useColdSearcher>
    f) <maxWarmingSearchers>2</maxWarmingSearchers>
 
  The problem we are having is following,
 
  I've given Solr RAM of 6G. As the total index size (all cores
  combined) start growing the Solr memory consumption  goes up. With 800
  million documents, I see Solr already taking up all the memory at
  startup. After that the commits, searches everything become slow. We
  will be having distributed setup with multiple Solr instances (around
  8) on four boxes, but our requirement is to have each Solr instance at
  least maintain around 1.5 billion documents.
 
  We are trying to see if we can somehow reduce the Solr memory
  footprint. If someone can provide a pointer on what parameters affect
  memory and what effects it has we can then decide whether we want that
  parameter or not. I'm not sure if there is any minimum Solr
  requirement for it to be able maintain large indexes. I've used Lucene
  before and that didn't require anything by default - it used up memory
  only during index and search times - not otherwise.
 
  Any help is very much appreciated.
 
  Thanks,
  -vivek
 
 



Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Otis,

In that case, I'm not sure why Solr is taking up so much memory as
soon as we start it up. I checked for .tii file and there is only one,

-rw-r--r--  1 search  staff  20306 May 11 21:47 ./20090510_1/data/index/_3au.tii

I have all the cache disabled - so that shouldn't be a problem too. My
ramBuffer size is only 64MB.

I read the note on sorting,
http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
something related to FieldCache. I don't see this defined as a parameter
in either solrconfig.xml or schema.xml. Could this be something that
loads things into memory at startup? How can we disable it?

I'm trying to find out if there is a way to tell how much memory Solr
would consume, and a way to cap it.

Thanks,
-vivek




On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Hi,

 Sorting is triggered by the sort parameter in the URL, not a characteristic 
 of a field. :)

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 4:42:16 PM
 Subject: Re: Solr memory requirements?

 Thanks Otis.

 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.

 I got total of 25 fields (15 are indexed and stored, other 10 are just
 stored). All my fields are basic data type - which I thought are not
 sorted. My id field is unique key.

 Is there any field here that might be getting sorted?


 [quoted schema field definitions snipped - the field names were lost in quoting]

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:
 
  Hi,
  Some answers:
  1) .tii files in the Lucene index.  When you sort, all distinct values for 
  the
 field(s) used for sorting.  Similarly for facet fields.  Solr caches.
  2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
  consume
 during indexing.  There is no need to commit every 50K docs unless you want 
 to
 trigger snapshot creation.
  3) see 1) above
 
  1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
  going
 to fly. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 3:04:46 PM
  Subject: Solr memory requirements?
 
  Hi,
 
    I'm pretty sure this has been asked before, but I couldn't find a
  complete answer in the forum archive. Here are my questions,
 
  1) When solr starts up what does it loads up in the memory? Let's say
  I've 4 cores with each core 50G in size. When Solr comes up how much
  of it would be loaded in memory?
 
  2) How much memory is required during index time? If I'm committing
  50K records at a time (1 record = 1KB) using solrj, how much memory do
  I need to give to Solr.
 
  3) Is there a minimum memory requirement by Solr to maintain a certain
  size index? Is there any benchmark on this?
 
  Here are some of my configuration from solrconfig.xml,
 
  1) <ramBufferSizeMB>64</ramBufferSizeMB>
  2) All the caches (under query tag) are commented out
  3) Few others,
        a) <enableLazyFieldLoading>true</enableLazyFieldLoading>    ==
  would this require memory?
        b) <queryResultWindowSize>50</queryResultWindowSize>
        c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
        d) <HashDocSet maxSize="3000" loadFactor="0.75"/>
        e) <useColdSearcher>false</useColdSearcher>
        f) <maxWarmingSearchers>2</maxWarmingSearchers>
 
  The problem we are having is following,
 
  I've given Solr RAM of 6G. As the total index size (all cores
  combined) start growing the Solr memory consumption  goes up. With 800
  million documents, I see Solr already taking up all the memory at
  startup. After that the commits, searches everything become slow. We
  will be having distributed setup with multiple Solr instances (around
  8) on four boxes, but our requirement is to have each Solr instance at
  least maintain around 1.5 billion documents.
 
  We are trying to see if we can somehow reduce the Solr memory
  footprint. If someone can provide a pointer on what parameters affect
  memory and what effects it has we can then decide whether we want that
  parameter or not. I'm not sure if there is any minimum Solr
  requirement for it to be able maintain large indexes. I've used Lucene
  before and that didn't require anything by default - it used up memory
  only during index and search times - not otherwise.
 
  Any help is very much appreciated.
 
  Thanks,
  -vivek
 
 




Re: Solr memory requirements?

2009-05-13 Thread Grant Ingersoll
Have you done any profiling to see where the hotspots are?  I realize  
that may be difficult on an index of that size, but maybe you can  
approximate on a smaller version.  Also, do you have warming queries?


You might also look into setting the termIndexInterval at the Lucene  
level.  This is not currently exposed in Solr (AFAIK), but likely  
could be added fairly easily as part of the index parameters.  http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/IndexWriter.html#setTermIndexInterval(int)
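For a feel of what raising the interval buys: Lucene holds every termIndexInterval-th term (default 128) in the in-memory .tii term index, so the resident term count shrinks linearly with the interval, at the cost of scanning up to that many terms per lookup in the .tis file. A rough sketch, with a hypothetical unique-term count:

```java
// How the in-memory .tii term index scales with termIndexInterval.
public class TermIndexFootprint {
    public static void main(String[] args) {
        long uniqueTerms = 200000000L;   // hypothetical unique terms in the index
        int[] intervals = {128, 1024};   // Lucene default vs. a raised setting
        for (int interval : intervals) {
            // Every interval-th term is held in RAM; lookups then scan
            // at most 'interval' terms on disk to find the exact one.
            long residentTerms = uniqueTerms / interval;
            System.out.println("termIndexInterval=" + interval
                    + " -> ~" + residentTerms + " terms held in RAM");
        }
    }
}
```

At the Lucene level the setting is IndexWriter.setTermIndexInterval(int), per the javadoc linked above; as noted, there was no Solr configuration hook for it at the time.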


-Grant

On May 13, 2009, at 5:12 PM, vivek sar wrote:


Otis,

In that case, I'm not sure why Solr is taking up so much memory as
soon as we start it up. I checked for .tii file and there is only one,

-rw-r--r--  1 search  staff  20306 May 11 21:47 ./20090510_1/data/index/_3au.tii


I have all the cache disabled - so that shouldn't be a problem too. My
ramBuffer size is only 64MB.

I read note on sorting,
http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
something related to FieldCache. I don't see this as parameter defined
in either solrconfig.xml or schema.xml. Could this be something that
can load things in memory at startup? How can we disable it?

I'm trying to find out if there is a way to tell how much memory Solr
would consume and way to cap it.

Thanks,
-vivek




On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:


Hi,

Sorting is triggered by the sort parameter in the URL, not a  
characteristic of a field. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: vivek sar vivex...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 4:42:16 PM
Subject: Re: Solr memory requirements?

Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm  
wondering if

I've configured anything wrong.

I got total of 25 fields (15 are indexed and stored, other 10 are  
just

stored). All my fields are basic data type - which I thought are not
sorted. My id field is unique key.

Is there any field here that might be getting sorted?


[quoted schema field definitions snipped - the field names were lost in quoting]

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
wrote:


Hi,
Some answers:
1) .tii files in the Lucene index.  When you sort, all distinct  
values for the
field(s) used for sorting.  Similarly for facet fields.  Solr  
caches.
2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr  
will consume
during indexing.  There is no need to commit every 50K docs unless  
you want to

trigger snapshot creation.

3) see 1) above

1.5 billion docs per instance where each doc is cca 1KB?  I doubt  
that's going

to fly. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: vivek sar
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 3:04:46 PM
Subject: Solr memory requirements?

Hi,

  I'm pretty sure this has been asked before, but I couldn't  
find a

complete answer in the forum archive. Here are my questions,

1) When solr starts up what does it loads up in the memory?  
Let's say
I've 4 cores with each core 50G in size. When Solr comes up how  
much

of it would be loaded in memory?

2) How much memory is required during index time? If I'm  
committing
50K records at a time (1 record = 1KB) using solrj, how much  
memory do

I need to give to Solr.

3) Is there a minimum memory requirement by Solr to maintain a  
certain

size index? Is there any benchmark on this?

Here are some of my configuration from solrconfig.xml,

1) <ramBufferSizeMB>64</ramBufferSizeMB>
2) All the caches (under query tag) are commented out
3) Few others,
  a) <enableLazyFieldLoading>true</enableLazyFieldLoading> ==
would this require memory?
  b) <queryResultWindowSize>50</queryResultWindowSize>
  c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
  d) <HashDocSet maxSize="3000" loadFactor="0.75"/>
  e) <useColdSearcher>false</useColdSearcher>
  f) <maxWarmingSearchers>2</maxWarmingSearchers>

The problem we are having is following,

I've given Solr RAM of 6G. As the total index size (all cores
combined) start growing the Solr memory consumption  goes up.  
With 800

million documents, I see Solr already taking up all the memory at
startup. After that the commits, searches everything become  
slow. We
will be having distributed setup with multiple Solr instances  
(around
8) on four boxes, but our requirement is to have each Solr  
instance at

least maintain around 1.5 billion documents.

We are trying to see if we can somehow reduce the Solr memory
footprint. If someone can

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Just an update on the memory issue - might be useful for others. I
read the following,

 http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)

and looks like the first and new searcher listeners would populate the
FieldCache. Commenting out these two listener entries seems to do the
trick - at least the heap size is not growing as soon as Solr starts
up.
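For reference, the two entries being commented out are the QuerySenderListener blocks in solrconfig.xml, along these lines (the warming queries shown are illustrative, not from the poster's config; a warming query that sorts is exactly what would populate the FieldCache):

```xml
<!-- Warming listeners in solrconfig.xml. Queries here are illustrative. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- sorting on a field (e.g. ts) during warming fills the FieldCache -->
    <lst><str name="q">*:*</str><str name="sort">ts desc</str></lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str></lst>
  </arr>
</listener>
```

The trade-off of removing them is the one noted earlier in the thread: the first real queries against a fresh searcher pay the warming cost themselves.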

I ran some searches and they all came out fine. Index rate is also
pretty good. Would there be any impact of disabling these listeners?

Thanks,
-vivek

On Wed, May 13, 2009 at 2:12 PM, vivek sar vivex...@gmail.com wrote:
 Otis,

 In that case, I'm not sure why Solr is taking up so much memory as
 soon as we start it up. I checked for .tii file and there is only one,

 -rw-r--r--  1 search  staff  20306 May 11 21:47 
 ./20090510_1/data/index/_3au.tii

 I have all the cache disabled - so that shouldn't be a problem too. My
 ramBuffer size is only 64MB.

 I read note on sorting,
 http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
 something related to FieldCache. I don't see this as parameter defined
 in either solrconfig.xml or schema.xml. Could this be something that
 can load things in memory at startup? How can we disable it?

 I'm trying to find out if there is a way to tell how much memory Solr
 would consume and way to cap it.

 Thanks,
 -vivek




 On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:

 Hi,

 Sorting is triggered by the sort parameter in the URL, not a characteristic 
 of a field. :)

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 4:42:16 PM
 Subject: Re: Solr memory requirements?

 Thanks Otis.

 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.

 I got total of 25 fields (15 are indexed and stored, other 10 are just
 stored). All my fields are basic data type - which I thought are not
 sorted. My id field is unique key.

 Is there any field here that might be getting sorted?


 [quoted schema field definitions snipped - the field names were lost in quoting]

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:
 
  Hi,
  Some answers:
  1) .tii files in the Lucene index.  When you sort, all distinct values 
  for the
 field(s) used for sorting.  Similarly for facet fields.  Solr caches.
  2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
  consume
 during indexing.  There is no need to commit every 50K docs unless you want 
 to
 trigger snapshot creation.
  3) see 1) above
 
  1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
  going
 to fly. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 3:04:46 PM
  Subject: Solr memory requirements?
 
  Hi,
 
    I'm pretty sure this has been asked before, but I couldn't find a
  complete answer in the forum archive. Here are my questions,
 
  1) When solr starts up what does it loads up in the memory? Let's say
  I've 4 cores with each core 50G in size. When Solr comes up how much
  of it would be loaded in memory?
 
  2) How much memory is required during index time? If I'm committing
  50K records at a time (1 record = 1KB) using solrj, how much memory do
  I need to give to Solr.
 
  3) Is there a minimum memory requirement by Solr to maintain a certain
  size index? Is there any benchmark on this?
 
  Here are some of my configuration from solrconfig.xml,
 
  1) <ramBufferSizeMB>64</ramBufferSizeMB>
  2) All the caches (under query tag) are commented out
  3) Few others,
        a) <enableLazyFieldLoading>true</enableLazyFieldLoading>    ==
  would this require memory?
        b) <queryResultWindowSize>50</queryResultWindowSize>
        c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
        d) <HashDocSet maxSize="3000" loadFactor="0.75"/>
        e) <useColdSearcher>false</useColdSearcher>
        f) <maxWarmingSearchers>2</maxWarmingSearchers>
 
  The problem we are having is following,
 
  I've given Solr RAM of 6G. As the total index size (all cores
  combined) start growing the Solr memory consumption  goes up. With 800
  million documents, I see Solr already taking up all the memory at
  startup. After that the commits, searches everything become slow. We
  will be having distributed setup with multiple Solr instances (around
  8) on four boxes, but our requirement is to have each Solr instance at
  least

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Disabling the first/new searchers did help with the initial load time, but
after 10-15 min the heap memory starts climbing again and reaches the
max within 20 min. Now GC is running all the time, which is
slowing down the commit and search cycles.

It's still puzzling - what does Solr hold in memory and not release?

I haven't been able to profile as the dump is too big. Would setting
termIndexInterval help? I'm not sure how that can be set using Solr.

Some other query properties under solrconfig,

<query>
   <maxBooleanClauses>1024</maxBooleanClauses>
   <enableLazyFieldLoading>true</enableLazyFieldLoading>
   <queryResultWindowSize>50</queryResultWindowSize>
   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
   <HashDocSet maxSize="3000" loadFactor="0.75"/>
   <useColdSearcher>false</useColdSearcher>
   <maxWarmingSearchers>2</maxWarmingSearchers>
 </query>

Currently I've got 800 million documents and have specified an 8G heap size.

Any other suggestion on what can I do to control the Solr memory consumption?

Thanks,
-vivek

On Wed, May 13, 2009 at 2:53 PM, vivek sar vivex...@gmail.com wrote:
 Just an update on the memory issue - might be useful for others. I
 read the following,

  http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)

 and looks like the first and new searcher listeners would populate the
 FieldCache. Commenting out these two listener entries seems to do the
 trick - at least the heap size is not growing as soon as Solr starts
 up.

 I ran some searches and they all came out fine. Index rate is also
 pretty good. Would there be any impact of disabling these listeners?

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 2:12 PM, vivek sar vivex...@gmail.com wrote:
 Otis,

 In that case, I'm not sure why Solr is taking up so much memory as
 soon as we start it up. I checked for .tii files and there is only one,

 -rw-r--r--  1 search  staff  20306 May 11 21:47
 ./20090510_1/data/index/_3au.tii

 I have all the caches disabled - so that shouldn't be a problem either. My
 ramBuffer size is only 64MB.

 I read a note on sorting,
 http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
 something related to FieldCache. I don't see this as a parameter defined
 in either solrconfig.xml or schema.xml. Could this be something that
 loads things into memory at startup? How can we disable it?

 I'm trying to find out if there is a way to tell how much memory Solr
 would consume, and a way to cap it.

 Thanks,
 -vivek




 On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:

 Hi,

 Sorting is triggered by the sort parameter in the URL, not a characteristic 
 of a field. :)

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 4:42:16 PM
 Subject: Re: Solr memory requirements?

 Thanks Otis.

 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.

 I have a total of 25 fields (15 are indexed and stored, the other 10 are
 just stored). All my fields are basic data types - which I thought are
 not sorted. My id field is the unique key.

 Is there any field here that might be getting sorted?


 <field name="id" type="long" indexed="true" stored="true"
   required="true" omitNorms="true" compressed="false"/>
 <field name="atmps" type="integer" indexed="false" stored="true"
   compressed="false"/>
 <field name="bcid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="cmpcd" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="ctry" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="dlt" type="date" indexed="false" stored="true"
   default="NOW/HOUR" compressed="false"/>
 <field name="dmn" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="eaddr" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="emsg" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="erc" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="evt" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="from" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="lfid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="lsid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="prsid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="rc" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="rmcd" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="rmscd" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="scd" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="sip" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="ts" type="date" indexed="true" stored="false"
   default="NOW/HOUR" omitNorms="true"/>

 <!-- catchall field, containing all other searchable text fields
   (implemented via copyField further on in this schema) -->
 <field name="all" type="text_ws" indexed="true" stored="false"
   omitNorms="true" multiValued="true"/>

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:
 
  Hi,
  Some answers:
  1) .tii files in the Lucene index.  When you sort, all distinct values 
  for the
 field(s) used for sorting.  Similarly for facet fields.  Solr caches.
  2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
  consume
 during indexing.  There is no need to commit every 50K docs unless you 
 want to
 trigger snapshot creation.
  3) see 1) above
 
  1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
  going
 to fly. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 3:04:46 PM
  Subject: Solr memory requirements?
 
  Hi,
 
    I'm pretty sure this has been asked before, but I couldn't find a
  complete answer in the forum archive. Here are my questions,
 
  1) When solr starts up what does it loads up in the memory? Let's say
  I've 4 cores with each core 50G

Re: Solr memory requirements?

2009-05-13 Thread Jack Godwin
Have you checked the maxBufferedDocs?  I had to drop mine down to 1000 with
3 million docs.
Jack

On Wed, May 13, 2009 at 6:53 PM, vivek sar vivex...@gmail.com wrote:

 Disabling first/new searchers did help for the initial load time, but
 after 10-15 min the heap memory start climbing up again and reached
 max within 20 min. Now the GC is coming up all the time, which is
 slowing down the commit and search cycles.

 This is still puzzling what does Solr holds in the memory and doesn't
 release?

 I haven't been able to profile as the dump is too big. Would setting
 termIndexInterval help - not sure how can that be set using Solr.

 Some other query properties under solrconfig,

 <query>
   <maxBooleanClauses>1024</maxBooleanClauses>
   <enableLazyFieldLoading>true</enableLazyFieldLoading>
   <queryResultWindowSize>50</queryResultWindowSize>
   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
   <HashDocSet maxSize="3000" loadFactor="0.75"/>
   <useColdSearcher>false</useColdSearcher>
   <maxWarmingSearchers>2</maxWarmingSearchers>
 </query>

 Currently, I got 800 million documents and have specified 8G heap size.

 Any other suggestion on what can I do to control the Solr memory
 consumption?

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 2:53 PM, vivek sar vivex...@gmail.com wrote:
  Just an update on the memory issue - might be useful for others. I
  read the following,
 
   http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
 
  and looks like the first and new searcher listeners would populate the
  FieldCache. Commenting out these two listener entries seems to do the
  trick - at least the heap size is not growing as soon as Solr starts
  up.
 
  I ran some searches and they all came out fine. Index rate is also
  pretty good. Would there be any impact of disabling these listeners?
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 2:12 PM, vivek sar vivex...@gmail.com wrote:
  Otis,
 
  In that case, I'm not sure why Solr is taking up so much memory as
  soon as we start it up. I checked for .tii file and there is only one,
 
  -rw-r--r--  1 search  staff  20306 May 11 21:47
 ./20090510_1/data/index/_3au.tii
 
  I have all the cache disabled - so that shouldn't be a problem too. My
  ramBuffer size is only 64MB.
 
  I read note on sorting,
  http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
  something related to FieldCache. I don't see this as parameter defined
  in either solrconfig.xml or schema.xml. Could this be something that
  can load things in memory at startup? How can we disable it?
 
  I'm trying to find out if there is a way to tell how much memory Solr
  would consume and way to cap it.
 
  Thanks,
  -vivek
 
 
 
 
  On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
  otis_gospodne...@yahoo.com wrote:
 
  Hi,
 
  Sorting is triggered by the sort parameter in the URL, not a
 characteristic of a field. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar vivex...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 4:42:16 PM
  Subject: Re: Solr memory requirements?
 
  Thanks Otis.
 
  Our use case doesn't require any sorting or faceting. I'm wondering if
  I've configured anything wrong.
 
  I got total of 25 fields (15 are indexed and stored, other 10 are just
  stored). All my fields are basic data type - which I thought are not
  sorted. My id field is unique key.
 
  Is there any field here that might be getting sorted?
 
 
  required=true omitNorms=true compressed=false/
 
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  default=NOW/HOUR  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  default=NOW/HOUR omitNorms=true/
 
 
 
 
  omitNorms=true multiValued=true/
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
  wrote:
  
   Hi,
   Some answers:
   1) .tii files in the Lucene index.  When you sort, all distinct
 values for the
  field(s) used for sorting.  Similarly for facet fields.  Solr caches.
   2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
 consume
  during indexing.  There is no need to commit every 50K docs unless you
 want to
  trigger snapshot creation.
   3) see 1) above
  
   1.5 billion docs per instance where each doc is cca 1KB?  I doubt
 that's going
  to fly. :)
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
   - Original Message 
   From: vivek sar
   To: solr-user@lucene.apache.org

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
I think maxBufferedDocs has been deprecated in Solr 1.4 - it's
recommended to use ramBufferSizeMB instead. My ramBufferSizeMB is 64,
so I don't think this is the problem.

There has to be something else that Solr is holding in memory. Anyone else?
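[For anyone hitting the same deprecation: both buffer triggers live in the indexDefaults (or mainIndex) section of solrconfig.xml. A sketch, with the older document-count trigger shown commented out for contrast:]

```xml
<indexDefaults>
  <!-- flush buffered documents to a new segment once they occupy ~64 MB -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
  <!-- older per-document-count trigger, superseded by ramBufferSizeMB -->
  <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
</indexDefaults>
```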

Thanks,
-vivek

On Wed, May 13, 2009 at 4:01 PM, Jack Godwin god...@gmail.com wrote:
 Have you checked the maxBufferedDocs?  I had to drop mine down to 1000 with
 3 million docs.
 Jack

 On Wed, May 13, 2009 at 6:53 PM, vivek sar vivex...@gmail.com wrote:

 Disabling first/new searchers did help for the initial load time, but
 after 10-15 min the heap memory start climbing up again and reached
 max within 20 min. Now the GC is coming up all the time, which is
 slowing down the commit and search cycles.

 This is still puzzling what does Solr holds in the memory and doesn't
 release?

 I haven't been able to profile as the dump is too big. Would setting
 termIndexInterval help - not sure how can that be set using Solr.

 Some other query properties under solrconfig,

 <query>
   <maxBooleanClauses>1024</maxBooleanClauses>
   <enableLazyFieldLoading>true</enableLazyFieldLoading>
   <queryResultWindowSize>50</queryResultWindowSize>
   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
   <HashDocSet maxSize="3000" loadFactor="0.75"/>
   <useColdSearcher>false</useColdSearcher>
   <maxWarmingSearchers>2</maxWarmingSearchers>
 </query>

 Currently, I got 800 million documents and have specified 8G heap size.

 Any other suggestion on what can I do to control the Solr memory
 consumption?

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 2:53 PM, vivek sar vivex...@gmail.com wrote:
  Just an update on the memory issue - might be useful for others. I
  read the following,
 
   http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
 
  and looks like the first and new searcher listeners would populate the
  FieldCache. Commenting out these two listener entries seems to do the
  trick - at least the heap size is not growing as soon as Solr starts
  up.
 
  I ran some searches and they all came out fine. Index rate is also
  pretty good. Would there be any impact of disabling these listeners?
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 2:12 PM, vivek sar vivex...@gmail.com wrote:
  Otis,
 
  In that case, I'm not sure why Solr is taking up so much memory as
  soon as we start it up. I checked for .tii file and there is only one,
 
  -rw-r--r--  1 search  staff  20306 May 11 21:47
 ./20090510_1/data/index/_3au.tii
 
  I have all the cache disabled - so that shouldn't be a problem too. My
  ramBuffer size is only 64MB.
 
  I read note on sorting,
  http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
  something related to FieldCache. I don't see this as parameter defined
  in either solrconfig.xml or schema.xml. Could this be something that
  can load things in memory at startup? How can we disable it?
 
  I'm trying to find out if there is a way to tell how much memory Solr
  would consume and way to cap it.
 
  Thanks,
  -vivek
 
 
 
 
  On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
  otis_gospodne...@yahoo.com wrote:
 
  Hi,
 
  Sorting is triggered by the sort parameter in the URL, not a
 characteristic of a field. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar vivex...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 4:42:16 PM
  Subject: Re: Solr memory requirements?
 
  Thanks Otis.
 
  Our use case doesn't require any sorting or faceting. I'm wondering if
  I've configured anything wrong.
 
  I got total of 25 fields (15 are indexed and stored, other 10 are just
  stored). All my fields are basic data type - which I thought are not
  sorted. My id field is unique key.
 
  Is there any field here that might be getting sorted?
 
 
  required=true omitNorms=true compressed=false/
 
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  default=NOW/HOUR  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  default=NOW/HOUR omitNorms=true/
 
 
 
 
  omitNorms=true multiValued=true/
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
  wrote:
  
   Hi,
   Some answers:
   1) .tii files in the Lucene index.  When you sort, all distinct
 values for the
  field(s) used for sorting.  Similarly for facet fields.  Solr caches.
   2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
 consume
  during indexing.  There is no need to commit every 50K docs unless

Re: Solr memory requirements?

2009-05-13 Thread Erick Erickson
Warning: I'm way out of my competency range when I comment
on SOLR, but I've seen the statement that string fields are NOT
tokenized while text fields are, and I notice that almost all of your
fields are string type.

Would someone more knowledgeable than me care to comment on whether
this is at all relevant? Offered in the spirit that sometimes there are
things so basic that only an amateur can see them. <G>

Best
Erick
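[For readers following along, the distinction Erick raises lives in the field type definitions in schema.xml. A minimal sketch; the type names match the ones used in this schema, and the analyzer details follow the stock example config:]

```xml
<!-- "string": the whole value is indexed as one untokenized term -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"
           omitNorms="true"/>

<!-- "text_ws": tokenized on whitespace, one indexed term per token -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```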

On Wed, May 13, 2009 at 4:42 PM, vivek sar vivex...@gmail.com wrote:

 Thanks Otis.

 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.

 I got total of 25 fields (15 are indexed and stored, other 10 are just
 stored). All my fields are basic data type - which I thought are not
 sorted. My id field is unique key.

 Is there any field here that might be getting sorted?

 <field name="id" type="long" indexed="true" stored="true"
   required="true" omitNorms="true" compressed="false"/>
 <field name="atmps" type="integer" indexed="false" stored="true"
   compressed="false"/>
 <field name="bcid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="cmpcd" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="ctry" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="dlt" type="date" indexed="false" stored="true"
   default="NOW/HOUR" compressed="false"/>
 <field name="dmn" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="eaddr" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="emsg" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="erc" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="evt" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="from" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="lfid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="lsid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="prsid" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="rc" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="rmcd" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="rmscd" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="scd" type="string" indexed="true" stored="true"
   omitNorms="true" compressed="false"/>
 <field name="sip" type="string" indexed="false" stored="true"
   compressed="false"/>
 <field name="ts" type="date" indexed="true" stored="false"
   default="NOW/HOUR" omitNorms="true"/>

 <!-- catchall field, containing all other searchable text fields
   (implemented via copyField further on in this schema) -->
 <field name="all" type="text_ws" indexed="true" stored="false"
   omitNorms="true" multiValued="true"/>

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 
  Hi,
  Some answers:
  1) .tii files in the Lucene index.  When you sort, all distinct values
 for the field(s) used for sorting.  Similarly for facet fields.  Solr
 caches.
  2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
 consume during indexing.  There is no need to commit every 50K docs unless
 you want to trigger snapshot creation.
  3) see 1) above
 
  1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
 going to fly. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar vivex...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 3:04:46 PM
  Subject: Solr memory requirements?
 
  Hi,
 
I'm pretty sure this has been asked before, but I couldn't find a
  complete answer in the forum archive. Here are my questions,
 
  1) When solr starts up what does it loads up in the memory? Let's say
  I've 4 cores with each core 50G in size. When Solr comes up how much
  of it would be loaded in memory?
 
  2) How much memory is required during index time? If I'm committing
  50K records at a time (1 record = 1KB) using solrj, how much memory do
  I need to give to Solr.
 
  3) Is there a minimum memory requirement by Solr to maintain a certain
  size index? Is there any benchmark on this?
 
  Here are some of my configuration from solrconfig.xml,
 
  1) <ramBufferSizeMB>64</ramBufferSizeMB>
  2) All the caches (under the query tag) are commented out
  3) Few others,
    a) <enableLazyFieldLoading>true</enableLazyFieldLoading> ==
  would this require memory?
    b) <queryResultWindowSize>50</queryResultWindowSize>
    c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
    d) <HashDocSet maxSize="3000" loadFactor="0.75"/>
    e) <useColdSearcher>false</useColdSearcher>
    f) <maxWarmingSearchers>2</maxWarmingSearchers>
 
  The problem we are having is following,
 
  I've given Solr RAM of 6G. As the total index size (all cores
  combined) start growing the Solr memory consumption  goes up. With 800
  million documents, I see Solr already taking up all the memory at
  startup. After that the commits, searches everything become slow. We
  will be having distributed setup with multiple Solr instances (around

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Even a simple command like this will help:

  jmap -histo:live <java pid here> | head -30
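[A slightly fuller sketch of the same diagnostic; locating the Solr JVM via start.jar assumes the stock Jetty example launcher, and jps ships with the JDK:]

```shell
# Locate the Solr JVM's pid (start.jar matches the Jetty example launcher).
PID=$(jps -l | awk '/start\.jar/ {print $1}')

# Top 30 lines of the live-object class histogram.
# Note: ":live" forces a full GC before the histogram is taken.
jmap -histo:live "$PID" | head -30
```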

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 6:53:29 PM
 Subject: Re: Solr memory requirements?
 
 Disabling first/new searchers did help for the initial load time, but
 after 10-15 min the heap memory start climbing up again and reached
 max within 20 min. Now the GC is coming up all the time, which is
 slowing down the commit and search cycles.
 
 This is still puzzling what does Solr holds in the memory and doesn't release?
 
 I haven't been able to profile as the dump is too big. Would setting
 termIndexInterval help - not sure how can that be set using Solr.
 
 Some other query properties under solrconfig,
 
 <query>
   <maxBooleanClauses>1024</maxBooleanClauses>
   <enableLazyFieldLoading>true</enableLazyFieldLoading>
   <queryResultWindowSize>50</queryResultWindowSize>
   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
   <HashDocSet maxSize="3000" loadFactor="0.75"/>
   <useColdSearcher>false</useColdSearcher>
   <maxWarmingSearchers>2</maxWarmingSearchers>
 </query>
 
 Currently, I got 800 million documents and have specified 8G heap size.
 
 Any other suggestion on what can I do to control the Solr memory consumption?
 
 Thanks,
 -vivek
 
 On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote:
  Just an update on the memory issue - might be useful for others. I
  read the following,
 
   http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
 
  and looks like the first and new searcher listeners would populate the
  FieldCache. Commenting out these two listener entries seems to do the
  trick - at least the heap size is not growing as soon as Solr starts
  up.
 
  I ran some searches and they all came out fine. Index rate is also
  pretty good. Would there be any impact of disabling these listeners?
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
  Otis,
 
  In that case, I'm not sure why Solr is taking up so much memory as
  soon as we start it up. I checked for .tii file and there is only one,
 
  -rw-r--r--  1 search  staff  20306 May 11 21:47 
 ./20090510_1/data/index/_3au.tii
 
  I have all the cache disabled - so that shouldn't be a problem too. My
  ramBuffer size is only 64MB.
 
  I read note on sorting,
  http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
  something related to FieldCache. I don't see this as parameter defined
  in either solrconfig.xml or schema.xml. Could this be something that
  can load things in memory at startup? How can we disable it?
 
  I'm trying to find out if there is a way to tell how much memory Solr
  would consume and way to cap it.
 
  Thanks,
  -vivek
 
 
 
 
  On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
  wrote:
 
  Hi,
 
  Sorting is triggered by the sort parameter in the URL, not a 
  characteristic 
 of a field. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar 
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 4:42:16 PM
  Subject: Re: Solr memory requirements?
 
  Thanks Otis.
 
  Our use case doesn't require any sorting or faceting. I'm wondering if
  I've configured anything wrong.
 
  I got total of 25 fields (15 are indexed and stored, other 10 are just
  stored). All my fields are basic data type - which I thought are not
  sorted. My id field is unique key.
 
  Is there any field here that might be getting sorted?
 
 
  required=true omitNorms=true compressed=false/
 
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  default=NOW/HOUR  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  default=NOW/HOUR omitNorms=true/
 
 
 
 
  omitNorms=true multiValued=true/
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
  wrote:
  
   Hi,
   Some answers:
   1) .tii files in the Lucene index.  When you sort, all distinct values 
 for the
  field(s) used for sorting.  Similarly for facet fields.  Solr caches.
   2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
 consume
  during indexing.  There is no need to commit every 50K docs unless you 
  want 
 to
  trigger snapshot creation.
   3) see 1) above
  
   1.5 billion docs per instance where each doc is cca 1KB?  I doubt 
   that's 
 going
  to fly. :)
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
   - Original Message 
   From: vivek sar
   To: solr-user@lucene.apache.org
   Sent: Wednesday, May 13, 2009 3:04:46 PM
   Subject: Solr memory requirements?
  
   Hi,
  
 I'm pretty sure this has been asked before, but I couldn't find a
   complete answer

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Yeah, I'm not sure why this would help.  There should be nothing in FieldCaches 
unless you sort or use facets.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 5:53:45 PM
 Subject: Re: Solr memory requirements?
 
 Just an update on the memory issue - might be useful for others. I
 read the following,
 
 http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
 
 and looks like the first and new searcher listeners would populate the
 FieldCache. Commenting out these two listener entries seems to do the
 trick - at least the heap size is not growing as soon as Solr starts
 up.
 
 I ran some searches and they all came out fine. Index rate is also
 pretty good. Would there be any impact of disabling these listeners?
 
 Thanks,
 -vivek
 
 On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
  Otis,
 
  In that case, I'm not sure why Solr is taking up so much memory as
  soon as we start it up. I checked for .tii file and there is only one,
 
  -rw-r--r--  1 search  staff  20306 May 11 21:47 
 ./20090510_1/data/index/_3au.tii
 
  I have all the cache disabled - so that shouldn't be a problem too. My
  ramBuffer size is only 64MB.
 
  I read note on sorting,
  http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
  something related to FieldCache. I don't see this as parameter defined
  in either solrconfig.xml or schema.xml. Could this be something that
  can load things in memory at startup? How can we disable it?
 
  I'm trying to find out if there is a way to tell how much memory Solr
  would consume and way to cap it.
 
  Thanks,
  -vivek
 
 
 
 
  On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
  wrote:
 
  Hi,
 
  Sorting is triggered by the sort parameter in the URL, not a 
  characteristic 
 of a field. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar 
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 4:42:16 PM
  Subject: Re: Solr memory requirements?
 
  Thanks Otis.
 
  Our use case doesn't require any sorting or faceting. I'm wondering if
  I've configured anything wrong.
 
  I got total of 25 fields (15 are indexed and stored, other 10 are just
  stored). All my fields are basic data type - which I thought are not
  sorted. My id field is unique key.
 
  Is there any field here that might be getting sorted?
 
 
  required=true omitNorms=true compressed=false/
 
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  default=NOW/HOUR  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  default=NOW/HOUR omitNorms=true/
 
 
 
 
  omitNorms=true multiValued=true/
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
  wrote:
  
   Hi,
   Some answers:
   1) .tii files in the Lucene index.  When you sort, all distinct values 
   for 
 the
  field(s) used for sorting.  Similarly for facet fields.  Solr caches.
   2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
 consume
  during indexing.  There is no need to commit every 50K docs unless you 
  want 
 to
  trigger snapshot creation.
   3) see 1) above
  
   1.5 billion docs per instance where each doc is cca 1KB?  I doubt 
   that's 
 going
  to fly. :)
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
   - Original Message 
   From: vivek sar
   To: solr-user@lucene.apache.org
   Sent: Wednesday, May 13, 2009 3:04:46 PM
   Subject: Solr memory requirements?
  
   Hi,
  
 I'm pretty sure this has been asked before, but I couldn't find a
   complete answer in the forum archive. Here are my questions,
  
   1) When solr starts up what does it loads up in the memory? Let's say
   I've 4 cores with each core 50G in size. When Solr comes up how much
   of it would be loaded in memory?
  
   2) How much memory is required during index time? If I'm committing
   50K records at a time (1 record = 1KB) using solrj, how much memory do
   I need to give to Solr.
  
   3) Is there a minimum memory requirement by Solr to maintain a certain
   size index? Is there any benchmark on this?
  
   Here are some of my configuration from solrconfig.xml,
  
   1) <ramBufferSizeMB>64</ramBufferSizeMB>
   2) All the caches (under the query tag) are commented out
   3) Few others,
     a) <enableLazyFieldLoading>true</enableLazyFieldLoading> ==
   would this require memory?
     b) <queryResultWindowSize>50</queryResultWindowSize>
     c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
     d) <HashDocSet maxSize="3000" loadFactor="0.75"/>
     e) <useColdSearcher>false</useColdSearcher>

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

There is constant mixing of indexing concepts and searching concepts in this 
thread.  Are you having problems on the master (indexing) or on the slave 
(searching)?


That .tii is only 20K and you said this is a large index?  That doesn't smell 
right...

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 5:12:00 PM
 Subject: Re: Solr memory requirements?
 
 Otis,
 
 In that case, I'm not sure why Solr is taking up so much memory as
 soon as we start it up. I checked for .tii file and there is only one,
 
 -rw-r--r--  1 search  staff  20306 May 11 21:47 
 ./20090510_1/data/index/_3au.tii
 
 I have all the cache disabled - so that shouldn't be a problem too. My
 ramBuffer size is only 64MB.
 
 I read note on sorting,
 http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
 something related to FieldCache. I don't see this as parameter defined
 in either solrconfig.xml or schema.xml. Could this be something that
 can load things in memory at startup? How can we disable it?
 
 I'm trying to find out if there is a way to tell how much memory Solr
 would consume and way to cap it.
 
 Thanks,
 -vivek
 
 
 
 
 On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
 wrote:
 
  Hi,
 
  Sorting is triggered by the sort parameter in the URL, not a characteristic 
  of 
 a field. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar 
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 4:42:16 PM
  Subject: Re: Solr memory requirements?
 
  Thanks Otis.
 
  Our use case doesn't require any sorting or faceting. I'm wondering if
  I've configured anything wrong.
 
  I got total of 25 fields (15 are indexed and stored, other 10 are just
  stored). All my fields are basic data type - which I thought are not
  sorted. My id field is unique key.
 
  Is there any field here that might be getting sorted?
 
 
  required=true omitNorms=true compressed=false/
 
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  default=NOW/HOUR  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  compressed=false/
 
  omitNorms=true compressed=false/
 
  compressed=false/
 
  default=NOW/HOUR omitNorms=true/
 
 
 
 
  omitNorms=true multiValued=true/
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
  wrote:
  
   Hi,
   Some answers:
   1) .tii files in the Lucene index.  When you sort, all distinct values 
   for 
 the
  field(s) used for sorting.  Similarly for facet fields.  Solr caches.
   2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
 consume
  during indexing.  There is no need to commit every 50K docs unless you 
  want 
 to
  trigger snapshot creation.
   3) see 1) above
  
   1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's 
 going
  to fly. :)
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
   - Original Message 
   From: vivek sar
   To: solr-user@lucene.apache.org
   Sent: Wednesday, May 13, 2009 3:04:46 PM
   Subject: Solr memory requirements?
  
   Hi,
  
 I'm pretty sure this has been asked before, but I couldn't find a
   complete answer in the forum archive. Here are my questions,
  
    1) When Solr starts up, what does it load into memory? Let's say
    I've got 4 cores, each 50G in size. When Solr comes up, how much
    of that would be loaded into memory?
  
    2) How much memory is required at index time? If I'm committing
    50K records at a time (1 record = 1KB) using SolrJ, how much memory
    do I need to give Solr?
  
    3) Is there a minimum memory requirement for Solr to maintain an
    index of a certain size? Is there any benchmark on this?
  
    Here is some of my configuration from solrconfig.xml:
  
   1) 64
   2) All the caches (under query tag) are commented out
   3) Few others,
  a)  true == would this require memory?
 b)  50
 c) 200
 d)
 e) false
 f)  2
  
   The problem we are having is following,
  
    I've given Solr 6G of RAM. As the total index size (all cores
    combined) starts growing, Solr's memory consumption goes up. With 800
    million documents, I see Solr already taking up all the memory at
    startup. After that, commits and searches all become slow. We will
    be having a distributed setup with multiple Solr instances (around
    8) on four boxes, but our requirement

Re: Solr memory requirements?

2009-05-13 Thread Grant Ingersoll


On May 13, 2009, at 6:53 PM, vivek sar wrote:


Disabling the first/new searchers did help with the initial load time, but
after 10-15 min the heap memory starts climbing again and reaches the
max within 20 min. Now the GC is running all the time, which is
slowing down the commit and search cycles.

It is still puzzling what Solr holds in memory and doesn't release.


I haven't been able to profile it, as the dump is too big. Would setting
termIndexInterval help? I'm not sure how that can be set using Solr.


It would have to be set in the same place that ramBufferSizeMB
gets set, in the config, but this would require some coding (albeit
pretty straightforward) to set it on the IndexWriter.  I don't think
it would help with profiling.
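As a sketch of why a larger termIndexInterval saves heap: the .tii term index loads every Nth term of the dictionary into memory, so raising the interval shrinks the in-memory term index linearly, at the cost of slower term lookups. This assumes Lucene's default interval of 128 and counts index entries rather than exact bytes, since entry size varies with term length:

```python
def tii_entries(num_terms, term_index_interval=128):
    # The .tii file keeps every term_index_interval-th term of the
    # term dictionary resident in heap; larger intervals mean fewer
    # in-memory entries but more on-disk scanning per lookup.
    return num_terms // term_index_interval

# e.g. an index with 500M unique terms:
print(tii_entries(500_000_000))        # 3_906_250 entries at the default 128
print(tii_entries(500_000_000, 1024))  # 488_281 entries at interval 1024
```

With multi-byte terms and per-entry pointer overhead, millions of entries can translate into a substantial slice of heap, which matches the large ReadOnlySegmentReader footprint seen in the heap dump.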


Do you have warming queries? (Sorry if I missed your answer)

Also, I know you have set the heap to 8 GB.  Is there a size at which
it levels out?  I presume you are getting Out Of Memory errors, right?
Or are you just concerned about the current memory size?