Re: Solr memory requirements?

2009-05-17 Thread jlist9
I've never paid attention to the post/commit ratio. I usually do a commit
after maybe 100 posts. Is there a guideline about this? Thanks.

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:

> 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume 
> during indexing.  There is no need to commit every 50K docs unless you want 
> to trigger snapshot creation.
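
As a rough illustration of the batching side of this - many adds, one commit,
with ramBufferSizeMB governing flushes on the server - a minimal SolrJ
1.4-era sketch (the URL and field names are placeholders, not from this thread):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchPoster {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; point this at a real core.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        for (int i = 0; i < 50000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", String.valueOf(i));   // hypothetical fields
            doc.addField("message", "record " + i);
            server.add(doc);  // buffered; flushing is driven by ramBufferSizeMB
        }
        server.commit();      // one commit per batch, not one per post
    }
}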


Re: Solr memory requirements?

2009-05-17 Thread Peter Wolanin
I think that if you have any documents with norms in your index, you
will still use norms for those fields even if the schema is changed
later.  Did you wipe and re-index after all your schema changes?

-Peter
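
A quick way to test Peter's theory is to ask the index itself whether norms
survive; a minimal Lucene 2.9-era sketch (the index path and field name are
assumptions, not from this thread):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class NormsCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder path to one core's index directory.
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("data/index")));
        // true => some segment still carries norms for this field,
        // even if the schema now says omitNorms="true"
        System.out.println("norms on 'message': " + reader.hasNorms("message"));
        reader.close();
    }
}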

On Fri, May 15, 2009 at 9:14 PM, vivek sar  wrote:
> Some more info,
>
>  Profiling the heap dump shows
> "org.apache.lucene.index.ReadOnlySegmentReader" as the biggest object
> - taking up almost 80% of total memory (6G) - see the attached screen
> shot for a smaller dump. There are some norms objects - not sure where
> they are coming from, as I've omitNorms=true for all indexed fields.
>
> I also noticed that if I run a query - let's say a generic query that
> hits 100 million records - and then follow up with a specific query
> which hits only 1 record, the second query causes a further increase
> in heap.
>
> Looks like there are a few bytes being loaded into memory for each
> document - I've checked the schema, all indexed fields have omitNorms=true,
> all caches are commented out - still looking to see what else might
> put things in memory which don't get collected by GC.
>
> I also saw, https://issues.apache.org/jira/browse/SOLR- for Solr
> 1.4 (which I'm using). Not sure if that can cause any problem. I do
> use range queries for dates - would that have any effect?
>
> Any other ideas?
>
> Thanks,
> -vivek
>
> On Thu, May 14, 2009 at 8:38 PM, vivek sar  wrote:
>> Thanks Mark.
>>
>> I checked all the items you mentioned,
>>
>> 1) I've omitNorms=true for all my indexed fields (stored-only fields, I
>> guess, don't matter)
>> 2) I've tried commenting out all caches in the solrconfig.xml, but
>> that doesn't help much
>> 3) I've tried commenting out the first and new searcher listeners
>> settings in the solrconfig.xml - the only way that helps is that at
>> startup time the memory usage doesn't spike up - that's only because
>> there is no auto-warmer query to run. But, I noticed commenting out
>> searchers slows down any other queries to Solr.
>> 4) I don't have any sort or facet in my queries
>> 5) I'm not sure how to change the "Lucene term interval" from Solr -
>> is there a way to do that?
>>
>> I've been playing around with this memory thing the whole day and have
>> found that it's the search that's hogging the memory. Any time there
>> is a search on all the records (800 million) the heap consumption
>> jumps by 5G. This makes me think there has to be some configuration in
>> Solr that's causing some terms per document to be loaded in memory.
>>
>> I've posted my settings several times on this forum, but no one has
>> been able to pinpoint what configuration might be causing this. If
>> someone is interested I can attach the solrconfig and schema files as
>> well. Here are the settings again, under the query tag:
>>
>> <query>
>>  <maxBooleanClauses>1024</maxBooleanClauses>
>>  <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>  <queryResultWindowSize>50</queryResultWindowSize>
>>  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>>
>>  <useColdSearcher>false</useColdSearcher>
>>  <maxWarmingSearchers>2</maxWarmingSearchers>
>> </query>
>>
>> and schema,
>>
>>  > required="true" omitNorms="true" compressed="false"/>
>>
>>  > compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > default="NOW/HOUR"  compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > compressed="false"/>
>>  > compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > compressed="false"/>
>>  > compressed="false"/>
>>  > compressed="false"/>
>>  > omitNorms="true" compressed="false"/>
>>  > compressed="false"/>
>>  > default="NOW/HOUR" omitNorms="true"/>
>>
>>  
>>  > omitNorms="true" multiValued="true"/>
>>
>> Any help is greatly appreciated.
>>
>> Thanks,
>> -vivek
>>
>> On Thu, May 14, 2009 at 6:22 PM, Mark Miller  wrote:
>>> 800 million docs is on the high side for modern hardware.
>>>
>>> If even one field has norms on, you're talking almost 800 MB right there. And
>>> then if another Searcher is brought up while the old one is serving (which
>>> happens when you update)? Doubled.
>>>
>>> Your best bet is to distribute across a couple machines.
>>>
>>> To minimize memory you would want to turn caching off or down, not facet, not
>>> sort, turn off all norms, and possibly get at the Lucene term index interval
>>> and raise it. Drop the on-deck searchers setting. Even then, 800 million...
>>> time to distribute, I'd think.
>>>
>>> vivek sar wrote:

 Some update on this issue,

 1) I attached jconsole to my app and monitored the memory usage.
 During indexing the memory usage goes up and down, which I think is
 normal. The memory remains around the min heap size (4 G) for
 indexing, but as soon as I run a search the tenured heap usage jumps
 up to 6G and remains there. Subsequent searches increase the heap
 usage even more until it reaches the max (8G) - after which everything
 (indexing and searching) becomes slow.

 Th

Re: Solr memory requirements?

2009-05-15 Thread vivek sar
Some more info,

  Profiling the heap dump shows
"org.apache.lucene.index.ReadOnlySegmentReader" as the biggest object
- taking up almost 80% of total memory (6G) - see the attached screen
shot for a smaller dump. There are some norms objects - not sure where
they are coming from, as I've omitNorms=true for all indexed fields.

I also noticed that if I run a query - let's say a generic query that
hits 100 million records - and then follow up with a specific query
which hits only 1 record, the second query causes a further increase
in heap.

Looks like there are a few bytes being loaded into memory for each
document - I've checked the schema, all indexed fields have omitNorms=true,
all caches are commented out - still looking to see what else might
put things in memory which don't get collected by GC.

I also saw, https://issues.apache.org/jira/browse/SOLR- for Solr
1.4 (which I'm using). Not sure if that can cause any problem. I do
use range queries for dates - would that have any effect?
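
On the date-range point: in Lucene of this vintage a range query expands over
every distinct term between the endpoints, so ranges on a full-precision date
field touch a lot of terms on a large index. The general shape, with a
hypothetical timestamp field:

  timestamp:[2009-05-01T00:00:00Z TO 2009-05-15T00:00:00Z]

The trie-based field types added in Solr 1.4 (e.g. tdate) exist largely to cut
down the number of terms such a range has to visit; whether that interacts
with the heap growth seen here is left open in this thread.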

Any other ideas?

Thanks,
-vivek

On Thu, May 14, 2009 at 8:38 PM, vivek sar  wrote:
> Thanks Mark.
>
> I checked all the items you mentioned,
>
> 1) I've omitNorms=true for all my indexed fields (stored-only fields, I
> guess, don't matter)
> 2) I've tried commenting out all caches in the solrconfig.xml, but
> that doesn't help much
> 3) I've tried commenting out the first and new searcher listeners
> settings in the solrconfig.xml - the only way that helps is that at
> startup time the memory usage doesn't spike up - that's only because
> there is no auto-warmer query to run. But, I noticed commenting out
> searchers slows down any other queries to Solr.
> 4) I don't have any sort or facet in my queries
> 5) I'm not sure how to change the "Lucene term interval" from Solr -
> is there a way to do that?
>
> I've been playing around with this memory thing the whole day and have
> found that it's the search that's hogging the memory. Any time there
> is a search on all the records (800 million) the heap consumption
> jumps by 5G. This makes me think there has to be some configuration in
> Solr that's causing some terms per document to be loaded in memory.
>
> I've posted my settings several times on this forum, but no one has
> been able to pinpoint what configuration might be causing this. If
> someone is interested I can attach the solrconfig and schema files as
> well. Here are the settings again, under the query tag:
>
> <query>
>  <maxBooleanClauses>1024</maxBooleanClauses>
>  <enableLazyFieldLoading>true</enableLazyFieldLoading>
>  <queryResultWindowSize>50</queryResultWindowSize>
>  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>
>  <useColdSearcher>false</useColdSearcher>
>  <maxWarmingSearchers>2</maxWarmingSearchers>
> </query>
>
> and schema,
>
>   required="true" omitNorms="true" compressed="false"/>
>
>   compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   default="NOW/HOUR"  compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   compressed="false"/>
>   compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   compressed="false"/>
>   compressed="false"/>
>   compressed="false"/>
>   omitNorms="true" compressed="false"/>
>   compressed="false"/>
>   default="NOW/HOUR" omitNorms="true"/>
>
>  
>   omitNorms="true" multiValued="true"/>
>
> Any help is greatly appreciated.
>
> Thanks,
> -vivek
>
> On Thu, May 14, 2009 at 6:22 PM, Mark Miller  wrote:
>> 800 million docs is on the high side for modern hardware.
>>
>> If even one field has norms on, you're talking almost 800 MB right there. And
>> then if another Searcher is brought up while the old one is serving (which
>> happens when you update)? Doubled.
>>
>> Your best bet is to distribute across a couple machines.
>>
>> To minimize memory you would want to turn caching off or down, not facet, not
>> sort, turn off all norms, and possibly get at the Lucene term index interval
>> and raise it. Drop the on-deck searchers setting. Even then, 800 million...
>> time to distribute, I'd think.
>>
>> vivek sar wrote:
>>>
>>> Some update on this issue,
>>>
>>> 1) I attached jconsole to my app and monitored the memory usage.
>>> During indexing the memory usage goes up and down, which I think is
>>> normal. The memory remains around the min heap size (4 G) for
>>> indexing, but as soon as I run a search the tenured heap usage jumps
>>> up to 6G and remains there. Subsequent searches increase the heap
>>> usage even more until it reaches the max (8G) - after which everything
>>> (indexing and searching) becomes slow.
>>>
>>> The search query is a very generic one in this case which goes through
>>> all the cores (4 of them - 800 million records), finds 400 million
>>> matches and returns 100 rows.
>>>
>>> Does the Solr searcher hold references to objects in memory? I
>>> couldn't find any settings that would tell me it does, but every
>>> search causing the heap to go up is definitely suspicious.
>>>
>>> 2) I ran the jmap histo to get the top obje

Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Thanks Mark.

I checked all the items you mentioned,

1) I've omitNorms=true for all my indexed fields (stored-only fields, I
guess, don't matter)
2) I've tried commenting out all caches in the solrconfig.xml, but
that doesn't help much
3) I've tried commenting out the first and new searcher listeners
settings in the solrconfig.xml - the only way that helps is that at
startup time the memory usage doesn't spike up - that's only because
there is no auto-warmer query to run. But, I noticed commenting out
searchers slows down any other queries to Solr.
4) I don't have any sort or facet in my queries
5) I'm not sure how to change the "Lucene term interval" from Solr -
is there a way to do that?

I've been playing around with this memory thing the whole day and have
found that it's the search that's hogging the memory. Any time there
is a search on all the records (800 million) the heap consumption
jumps by 5G. This makes me think there has to be some configuration in
Solr that's causing some terms per document to be loaded in memory.

I've posted my settings several times on this forum, but no one has
been able to pinpoint what configuration might be causing this. If
someone is interested I can attach the solrconfig and schema files as
well. Here are the settings again, under the query tag:


<query>
  <maxBooleanClauses>1024</maxBooleanClauses>
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <queryResultWindowSize>50</queryResultWindowSize>
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

  <useColdSearcher>false</useColdSearcher>
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>

and schema,

 

  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  

  
  

Any help is greatly appreciated.

Thanks,
-vivek

On Thu, May 14, 2009 at 6:22 PM, Mark Miller  wrote:
> 800 million docs is on the high side for modern hardware.
>
> If even one field has norms on, you're talking almost 800 MB right there. And
> then if another Searcher is brought up while the old one is serving (which
> happens when you update)? Doubled.
>
> Your best bet is to distribute across a couple machines.
>
> To minimize memory you would want to turn caching off or down, not facet, not
> sort, turn off all norms, and possibly get at the Lucene term index interval
> and raise it. Drop the on-deck searchers setting. Even then, 800 million...
> time to distribute, I'd think.
>
> vivek sar wrote:
>>
>> Some update on this issue,
>>
>> 1) I attached jconsole to my app and monitored the memory usage.
>> During indexing the memory usage goes up and down, which I think is
>> normal. The memory remains around the min heap size (4 G) for
>> indexing, but as soon as I run a search the tenured heap usage jumps
>> up to 6G and remains there. Subsequent searches increase the heap
>> usage even more until it reaches the max (8G) - after which everything
>> (indexing and searching) becomes slow.
>>
>> The search query is a very generic one in this case which goes through
>> all the cores (4 of them - 800 million records), finds 400 million
>> matches and returns 100 rows.
>>
>> Does the Solr searcher hold references to objects in memory? I
>> couldn't find any settings that would tell me it does, but every
>> search causing the heap to go up is definitely suspicious.
>>
>> 2) I ran the jmap histo to get the top objects (this is on a smaller
>> instance with 2 G memory, this is before running search - after
>> running search I wasn't able to run jmap),
>>
>>  num     #instances         #bytes  class name
>> --
>>   1:       3890855      222608992  [C
>>   2:       3891673      155666920  java.lang.String
>>   3:       3284341      131373640  org.apache.lucene.index.TermInfo
>>   4:       3334198      106694336  org.apache.lucene.index.Term
>>   5:           271       26286496  [J
>>   6:            16       26273936  [Lorg.apache.lucene.index.Term;
>>   7:            16       26273936  [Lorg.apache.lucene.index.TermInfo;
>>   8:        320512       15384576
>> org.apache.lucene.index.FreqProxTermsWriter$PostingList
>>   9:         10335       11554136  [I
>>
>> I'm not sure what the first one ([C) is. I couldn't profile it to find
>> out what is allocating all the Strings - any ideas?
>>
>> Any ideas on what Searcher might be holding on and how can we change
>> that behavior?
>>
>> Thanks,
>> -vivek
>>
>>
>> On Thu, May 14, 2009 at 11:33 AM, vivek sar  wrote:
>>
>>>
>>> I don't know if field type has any impact on the memory usage - does it?
>>>
>>> Our use cases require complete matches, thus there is no need for any
>>> analysis in most cases - does it matter in terms of memory usage?
>>>
>>> Also, is there any default caching used by Solr if I comment out all
>>> the caches under query in solrconfig.xml? I also don't have any
>>> auto-warming queries.
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson 
>>> wrote:
>>>

 Warning: I'm way out of my competency range when I comment
 on SOLR, but I've seen the statement that string fields are NOT
 tokenized while text fields are, and I notice that almost all of your
 fields
 are string type.

 Would someone more knowledgeable than me care to comment on whether
 t

Re: Solr memory requirements?

2009-05-14 Thread Mark Miller

800 million docs is on the high side for modern hardware.

If even one field has norms on, you're talking almost 800 MB right there.
And then if another Searcher is brought up while the old one is serving
(which happens when you update)? Doubled.


Your best bet is to distribute across a couple machines.

To minimize memory you would want to turn caching off or down, not facet,
not sort, turn off all norms, and possibly get at the Lucene term index
interval and raise it. Drop the on-deck searchers setting. Even then, 800
million... time to distribute, I'd think.
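
The 800 MB figure is just the norms arithmetic - norms cost one byte per
document per normed field:

  800,000,000 docs x 1 byte  =  ~763 MB per field with norms
  x 2 searchers open during warm-up  =  ~1.5 GB

(assuming the 800 million documents discussed in this thread).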


vivek sar wrote:

Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage.
During indexing the memory usage goes up and down, which I think is
normal. The memory remains around the min heap size (4 G) for
indexing, but as soon as I run a search the tenured heap usage jumps
up to 6G and remains there. Subsequent searches increase the heap
usage even more until it reaches the max (8G) - after which everything
(indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through
all the cores (4 of them - 800 million records), finds 400 million
matches and returns 100 rows.

Does the Solr searcher hold references to objects in memory? I
couldn't find any settings that would tell me it does, but every
search causing the heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller
instance with 2 G memory, this is before running search - after
running search I wasn't able to run jmap),

 num     #instances         #bytes  class name
----------------------------------------------
   1:       3890855      222608992  [C
   2:       3891673      155666920  java.lang.String
   3:       3284341      131373640  org.apache.lucene.index.TermInfo
   4:       3334198      106694336  org.apache.lucene.index.Term
   5:           271       26286496  [J
   6:            16       26273936  [Lorg.apache.lucene.index.Term;
   7:            16       26273936  [Lorg.apache.lucene.index.TermInfo;
   8:        320512       15384576  org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9:         10335       11554136  [I

I'm not sure what the first one ([C) is. I couldn't profile it to find
out what is allocating all the Strings - any ideas?

Any ideas on what Searcher might be holding on and how can we change
that behavior?

Thanks,
-vivek


On Thu, May 14, 2009 at 11:33 AM, vivek sar  wrote:
  

I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need for any
analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all
the caches under query in solrconfig.xml? I also don't have any
auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson  wrote:


Warning: I'm way out of my competency range when I comment
on SOLR, but I've seen the statement that string fields are NOT
tokenized while text fields are, and I notice that almost all of your fields
are string type.

Would someone more knowledgeable than me care to comment on whether
this is at all relevant? Offered in the spirit that sometimes there are
things
so basic that only an amateur can see them 

Best
Erick

On Wed, May 13, 2009 at 4:42 PM, vivek sar  wrote:

  

Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering if
I've configured anything wrong.

I've got a total of 25 fields (15 are indexed and stored, the other 10
are just stored). All my fields are basic data types - which I thought
are not sorted. My id field is the unique key.

Is there any field here that might be getting sorted?

 

  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  


  
  

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:


Hi,
Some answers:
1) .tii files in the Lucene index.  When you sort, all distinct values
for the field(s) used for sorting.  Similarly for facet fields.  Solr
caches.
2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
consume during indexing.  There is no need to commit every 50K docs unless
you want to trigger snapshot creation.
3) see 1) above

1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
going to fly. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
  

From: vivek sar 
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 3:04:46 PM
Subject: Solr memory requirements?

Hi,

  I'm pretty sure this has been asked before, but I couldn't find a
complete answer in the forum archive. Here are my questions,

1) When Solr starts up, what does it load up in memory? Let's say
I've got 4 cores, each 50G in size. When Solr comes up, how much
of it would be loaded in memory?

2

Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage.
During indexing the memory usage goes up and down, which I think is
normal. The memory remains around the min heap size (4 G) for
indexing, but as soon as I run a search the tenured heap usage jumps
up to 6G and remains there. Subsequent searches increase the heap
usage even more until it reaches the max (8G) - after which everything
(indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through
all the cores (4 of them - 800 million records), finds 400 million
matches and returns 100 rows.

Does the Solr searcher hold references to objects in memory? I
couldn't find any settings that would tell me it does, but every
search causing the heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller
instance with 2 G memory, this is before running search - after
running search I wasn't able to run jmap),

 num     #instances         #bytes  class name
----------------------------------------------
   1:       3890855      222608992  [C
   2:       3891673      155666920  java.lang.String
   3:       3284341      131373640  org.apache.lucene.index.TermInfo
   4:       3334198      106694336  org.apache.lucene.index.Term
   5:           271       26286496  [J
   6:            16       26273936  [Lorg.apache.lucene.index.Term;
   7:            16       26273936  [Lorg.apache.lucene.index.TermInfo;
   8:        320512       15384576  org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9:         10335       11554136  [I

I'm not sure what the first one ([C) is. I couldn't profile it to find
out what is allocating all the Strings - any ideas?
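
On the [C question: the bracketed names in a jmap histogram are the JVM's
internal array type descriptors, not application classes. A tiny JDK-only
sketch to confirm:

public class ArrayNames {
    public static void main(String[] args) {
        // jmap prints JVM-internal descriptors:
        // [C = char[], [I = int[], [J = long[], [Lpkg.Cls; = pkg.Cls[]
        System.out.println(new char[0].getClass().getName());  // [C
        System.out.println(new long[0].getClass().getName());  // [J
        System.out.println(new int[0].getClass().getName());   // [I
    }
}

The [C entries are almost certainly the backing char arrays of the
java.lang.String instances on the next line - note the nearly identical
instance counts (3,890,855 vs 3,891,673).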

Any ideas on what Searcher might be holding on and how can we change
that behavior?

Thanks,
-vivek


On Thu, May 14, 2009 at 11:33 AM, vivek sar  wrote:
> I don't know if field type has any impact on the memory usage - does it?
>
> Our use cases require complete matches, thus there is no need for any
> analysis in most cases - does it matter in terms of memory usage?
>
> Also, is there any default caching used by Solr if I comment out all
> the caches under query in solrconfig.xml? I also don't have any
> auto-warming queries.
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson  
> wrote:
>> Warning: I'm way out of my competency range when I comment
>> on SOLR, but I've seen the statement that string fields are NOT
>> tokenized while text fields are, and I notice that almost all of your fields
>> are string type.
>>
>> Would someone more knowledgeable than me care to comment on whether
>> this is at all relevant? Offered in the spirit that sometimes there are
>> things
>> so basic that only an amateur can see them 
>>
>> Best
>> Erick
>>
>> On Wed, May 13, 2009 at 4:42 PM, vivek sar  wrote:
>>
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if
>>> I've configured anything wrong.
>>>
>>> I've got a total of 25 fields (15 are indexed and stored, the other 10
>>> are just stored). All my fields are basic data types - which I thought
>>> are not sorted. My id field is the unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>>  >> required="true" omitNorms="true" compressed="false"/>
>>>
>>>   >> compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> default="NOW/HOUR"  compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> omitNorms="true" compressed="false"/>
>>>   >> compressed="false"/>
>>>   >> default="NOW/HOUR" omitNorms="true"/>
>>>
>>>
>>>   
>>>   >> omitNorms="true" multiValued="true"/>
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>>>  wrote:
>>> >
>>> > Hi,
>>> > Some answers:
>>> > 1) .tii files in the Lucene index.  When you sort, all distinct values
>>> for the field(s) used for sorting.  Similarly for facet fields.  Solr
>>> caches.
>>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
>>> consume during indexing.  There is no need to commit every 50K docs unless
>>> you want to trigger snapshot creation.
>>> > 3) see 1) above
>>> >
>>> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
>>> going to fly. :)
>>> >
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> >
>>> >
>>> > - Original Messag

Re: Solr memory requirements?

2009-05-14 Thread vivek sar
I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need for any
analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all
the caches under query in solrconfig.xml? I also don't have any
auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson  wrote:
> Warning: I'm way out of my competency range when I comment
> on SOLR, but I've seen the statement that string fields are NOT
> tokenized while text fields are, and I notice that almost all of your fields
> are string type.
>
> Would someone more knowledgeable than me care to comment on whether
> this is at all relevant? Offered in the spirit that sometimes there are
> things
> so basic that only an amateur can see them 
>
> Best
> Erick
>
> On Wed, May 13, 2009 at 4:42 PM, vivek sar  wrote:
>
>> Thanks Otis.
>>
>> Our use case doesn't require any sorting or faceting. I'm wondering if
>> I've configured anything wrong.
>>
>> I've got a total of 25 fields (15 are indexed and stored, the other 10
>> are just stored). All my fields are basic data types - which I thought
>> are not sorted. My id field is the unique key.
>>
>> Is there any field here that might be getting sorted?
>>
>>  > required="true" omitNorms="true" compressed="false"/>
>>
>>   > compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > default="NOW/HOUR"  compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > compressed="false"/>
>>   > compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > compressed="false"/>
>>   > compressed="false"/>
>>   > compressed="false"/>
>>   > omitNorms="true" compressed="false"/>
>>   > compressed="false"/>
>>   > default="NOW/HOUR" omitNorms="true"/>
>>
>>
>>   
>>   > omitNorms="true" multiValued="true"/>
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>>  wrote:
>> >
>> > Hi,
>> > Some answers:
>> > 1) .tii files in the Lucene index.  When you sort, all distinct values
>> for the field(s) used for sorting.  Similarly for facet fields.  Solr
>> caches.
>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
>> consume during indexing.  There is no need to commit every 50K docs unless
>> you want to trigger snapshot creation.
>> > 3) see 1) above
>> >
>> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
>> going to fly. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > - Original Message 
>> >> From: vivek sar 
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> Subject: Solr memory requirements?
>> >>
>> >> Hi,
>> >>
>> >>   I'm pretty sure this has been asked before, but I couldn't find a
>> >> complete answer in the forum archive. Here are my questions,
>> >>
>> >> 1) When Solr starts up, what does it load up in memory? Let's say
>> >> I've got 4 cores, each 50G in size. When Solr comes up, how much
>> >> of it would be loaded in memory?
>> >>
>> >> 2) How much memory is required during index time? If I'm committing
>> >> 50K records at a time (1 record = 1KB) using solrj, how much memory
>> >> do I need to give to Solr?
>> >>
>> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
>> >> size index? Is there any benchmark on this?
>> >>
>> >> Here are some of my configuration from solrconfig.xml,
>> >>
>> >> 1) <ramBufferSizeMB>64</ramBufferSizeMB>
>> >> 2) All the caches (under the query tag) are commented out
>> >> 3) Few others,
>> >>       a) <enableLazyFieldLoading>true</enableLazyFieldLoading>    ==>
>> >> would this require memory?
>> >>       b) <queryResultWindowSize>50</queryResultWindowSize>
>> >>       c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>> >>       d)
>> >>       e) <useColdSearcher>false</useColdSearcher>
>> >>       f) <maxWarmingSearchers>2</maxWarmingSearchers>
>> >>
>> >> The problem we are having is the following,
>> >>
>> >> I've given Solr 6G of RAM. As the total index size (all cores
>> >> combined) starts growing, the Solr memory consumption goes up. With 800
>> >> million documents, I see Solr already taking up all the memory at
>> >> startup. After that, commits, searches - everything becomes slow. We
>> >> will be having a distributed setup with multiple Solr instances (around
>> >> 8) on four boxes, but our requirement is to have each Solr instance
>> >> maintain at least around 1.5 billion documents.
>> >>
>> >> We are trying to see if we can somehow reduce the Solr memory
>> >> footprint. If someone can provide a pointer on what parameters affect
>> >> memory and what effect each has, we can then decide whether we want that
>> >> parameter or not. I'm not sure if there is any minimum Solr
>> >> requirement for it to be able mainta

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Otis,

 We are not running a master-slave configuration. We get very few
searches (admin only) in a day, so we didn't see the need for
replication/snapshots. This problem is with one Solr instance managing
4 cores (each core 200 million records). Both indexing and searching
are performed by the same Solr instance.

What are .tii files used for? I see this file under only one core.

Still looking for what gets loaded into the heap by Solr (during load time,
indexing, and searching) and stays there. I see most of these are
tenured objects not getting released by GC - I will post profiling
results tomorrow.

Thanks,
-vivek





On Wed, May 13, 2009 at 6:34 PM, Otis Gospodnetic
 wrote:
>
> There is constant mixing of indexing concepts and searching concepts in this 
> thread.  Are you having problems on the master (indexing) or on the slave 
> (searching)?
>
>
> That .tii is only 20K and you said this is a large index?  That doesn't smell 
> right...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: vivek sar 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 5:12:00 PM
>> Subject: Re: Solr memory requirements?
>>
>> Otis,
>>
>> In that case, I'm not sure why Solr is taking up so much memory as
>> soon as we start it up. I checked for .tii file and there is only one,
>>
>> -rw-r--r--  1 search  staff  20306 May 11 21:47 
>> ./20090510_1/data/index/_3au.tii
>>
>> I have all the caches disabled - so that shouldn't be a problem either. My
>> ramBuffer size is only 64MB.
>>
>> I read the note on sorting,
>> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
>> something related to FieldCache. I don't see this as a parameter defined
>> in either solrconfig.xml or schema.xml. Could this be something that
>> loads things in memory at startup? How can we disable it?
>>
>> I'm trying to find out if there is a way to tell how much memory Solr
>> would consume and way to cap it.
>>
>> Thanks,
>> -vivek
>>
>>
>>
>>
>> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
>> wrote:
>> >
>> > Hi,
>> >
>> > Sorting is triggered by the sort parameter in the URL, not a 
>> > characteristic of
>> a field. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > - Original Message 
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> >> Subject: Re: Solr memory requirements?
>> >>
>> >> Thanks Otis.
>> >>
>> >> Our use case doesn't require any sorting or faceting. I'm wondering if
>> >> I've configured anything wrong.
>> >>
>> >> I've got a total of 25 fields (15 are indexed and stored, the other 10
>> >> are just stored). All my fields are basic data types - which I thought
>> >> are not sorted. My id field is the unique key.
>> >>
>> >> Is there any field here that might be getting sorted?
>> >>
>> >>
>> >> required="true" omitNorms="true" compressed="false"/>
>> >>
>> >>
>> >> compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> default="NOW/HOUR"  compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> compressed="false"/>
>> >>
>> >> compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> omitNorms="true" compressed="false"/>
>> >>
>> >> compressed="false&qu

Re: Solr memory requirements?

2009-05-13 Thread Grant Ingersoll


On May 13, 2009, at 6:53 PM, vivek sar wrote:


Disabling first/new searchers did help for the initial load time, but
after 10-15 min the heap memory starts climbing up again and reaches
max within 20 min. Now the GC is running all the time, which is
slowing down the commit and search cycles.

It is still puzzling what Solr holds in memory and
doesn't release.


I haven't been able to profile as the dump is too big. Would setting
termIndexInterval help - not sure how that can be set using Solr.


It would have to be set in the same place that the ramBufferSizeMB  
gets set, in the config, but this would require some coding (albeit  
pretty straightforward) to set it on the IndexWriter.  I don't think  
it would help in profiling.
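
For readers wondering what that straightforward coding would look like, a
rough sketch against the Lucene 2.9-era IndexWriter API (the path and
analyzer are placeholders; wiring the value through from solrconfig.xml is
the part Solr lacks here):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class TermIntervalDemo {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("data/index")),   // placeholder path
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setRAMBufferSizeMB(64);        // the knob Solr already exposes
        writer.setTermIndexInterval(1024);    // default 128; larger = smaller
                                              // in-RAM term index, slower lookups
        writer.close();
    }
}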


Do you have warming queries? (Sorry if I missed your answer)

Also, I know you have set the heap to 8 gbs.  Is there a size you can  
get to that it levels out at?  I presume you are getting Out Of  
Memory, right?  Or, are you just concerned about the current mem. size?


Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

There is constant mixing of indexing concepts and searching concepts in this 
thread.  Are you having problems on the master (indexing) or on the slave 
(searching)?


That .tii is only 20K and you said this is a large index?  That doesn't smell 
right...
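
As a rough sanity check on sizes: the .tii holds every termIndexInterval-th
term (default 128), and it is what gets loaded into RAM, so

  terms held in RAM  ~=  terms in index / 128

The jmap histogram elsewhere in this thread shows ~3.28 million TermInfo
instances, which at the default interval would imply a term dictionary in the
hundreds of millions of terms; a 20 KB .tii would imply a tiny one. The two
really don't fit the same index.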

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 5:12:00 PM
> Subject: Re: Solr memory requirements?
> 
> Otis,
> 
> In that case, I'm not sure why Solr is taking up so much memory as
> soon as we start it up. I checked for .tii file and there is only one,
> 
> -rw-r--r--  1 search  staff  20306 May 11 21:47 
> ./20090510_1/data/index/_3au.tii
> 
> I have all the caches disabled - so that shouldn't be a problem either. My
> ramBuffer size is only 64MB.
> 
> I read the note on sorting,
> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
> something related to FieldCache. I don't see this as a parameter defined
> in either solrconfig.xml or schema.xml. Could this be something that
> loads things in memory at startup? How can we disable it?
> 
> I'm trying to find out if there is a way to tell how much memory Solr
> would consume and way to cap it.
> 
> Thanks,
> -vivek
> 
> 
> 
> 
> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
> wrote:
> >
> > Hi,
> >
> > Sorting is triggered by the sort parameter in the URL, not a characteristic 
> > of 
> a field. :)
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: vivek sar 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, May 13, 2009 4:42:16 PM
> >> Subject: Re: Solr memory requirements?
> >>
> >> Thanks Otis.
> >>
> >> Our use case doesn't require any sorting or faceting. I'm wondering if
> >> I've configured anything wrong.
> >>
> >> I've got a total of 25 fields (15 are indexed and stored, the other 10
> >> are just stored). All my fields are basic data types - which I thought
> >> are not sorted. My id field is the unique key.
> >>
> >> Is there any field here that might be getting sorted?
> >>
> >>
> >> required="true" omitNorms="true" compressed="false"/>
> >>
> >>
> >> compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> default="NOW/HOUR"  compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> omitNorms="true" compressed="false"/>
> >>
> >> compressed="false"/>
> >>
> >> default="NOW/HOUR" omitNorms="true"/>
> >>
> >>
> >>
> >>
> >> omitNorms="true" multiValued="true"/>
> >>
> >> Thanks,
> >> -vivek
> >>
> >> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
> >> wrote:
> >> >
> >> > Hi,
> >> > Some answers:
> >> > 1) .tii files in the Lucene index.  When you sort, all distinct values 
> >> > for 
> the
> >> field(s) used for sorting.  Similarly for facet fields.  Solr caches.
> >> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
> consume
> >> during indexing.  There is no need to commit every 50K d

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Yeah, I'm not sure why this would help.  There should be nothing in FieldCaches 
unless you sort or use facets.
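
For context, the FieldCache fills lazily the first time something needs
per-document values - a sort being the classic trigger - and it holds one
entry per document until the reader closes. A minimal Lucene 2.9-era sketch
of what a string sort pulls in (path and field name are placeholders):

import java.io.File;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.FSDirectory;

public class FieldCacheDemo {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("data/index")));
        // Equivalent of what sorting on a string field loads:
        // an array with one slot per document in the index.
        String[] values = FieldCache.DEFAULT.getStrings(reader, "timestamp");
        System.out.println("cached " + values.length + " values");
        reader.close();
    }
}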

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 5:53:45 PM
> Subject: Re: Solr memory requirements?
> 
> Just an update on the memory issue - might be useful for others. I
> read the following,
> 
> http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
> 
> and looks like the first and new searcher listeners would populate the
> FieldCache. Commenting out these two listener entries seems to do the
> trick - at least the heap size is not growing as soon as Solr starts
> up.
> 
> I ran some searches and they all came out fine. Index rate is also
> pretty good. Would there be any impact of disabling these listeners?
> 
> Thanks,
> -vivek
> 
> On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
> > Otis,
> >
> > In that case, I'm not sure why Solr is taking up so much memory as
> > soon as we start it up. I checked for .tii file and there is only one,
> >
> > -rw-r--r--  1 search  staff  20306 May 11 21:47 
> ./20090510_1/data/index/_3au.tii
> >
> > I have all the caches disabled - so that shouldn't be a problem either. My
> > ramBuffer size is only 64MB.
> >
> > I read the note on sorting,
> > http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
> > something related to FieldCache. I don't see this as a parameter defined
> > in either solrconfig.xml or schema.xml. Could this be something that
> > loads things in memory at startup? How can we disable it?
> >
> > I'm trying to find out if there is a way to tell how much memory Solr
> > would consume and way to cap it.
> >
> > Thanks,
> > -vivek
> >
> >
> >
> >
> > On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
> > wrote:
> >>
> >> Hi,
> >>
> >> Sorting is triggered by the sort parameter in the URL, not a 
> >> characteristic 
> of a field. :)
> >>
> >> Otis
> >> --
> >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>
> >>
> >>
> >> - Original Message 
> >>> From: vivek sar 
> >>> To: solr-user@lucene.apache.org
> >>> Sent: Wednesday, May 13, 2009 4:42:16 PM
> >>> Subject: Re: Solr memory requirements?
> >>>
> >>> Thanks Otis.
> >>>
> >>> Our use case doesn't require any sorting or faceting. I'm wondering if
> >>> I've configured anything wrong.
> >>>
> >>> I've got a total of 25 fields (15 are indexed and stored, the other 10
> >>> are just stored). All my fields are basic data types - which I thought
> >>> are not sorted. My id field is the unique key.
> >>>
> >>> Is there any field here that might be getting sorted?
> >>>
> >>>
> >>> required="true" omitNorms="true" compressed="false"/>
> >>>
> >>>
> >>> compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> default="NOW/HOUR"  compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> compressed="false"/>
> >>>
> >>> omitNorms="true" compressed="false"/>

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Even a simple command like this will help:

  jmap -histo:live <pid> | head -30

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 6:53:29 PM
> Subject: Re: Solr memory requirements?
> 
> Disabling first/new searchers did help for the initial load time, but
> after 10-15 min the heap memory starts climbing up again and reaches
> max within 20 min. Now the GC is running all the time, which is
> slowing down the commit and search cycles.
> 
> It is still puzzling what Solr holds in memory and doesn't release.
> 
> I haven't been able to profile as the dump is too big. Would setting
> termIndexInterval help - not sure how that can be set using Solr.
> 
> Some other query properties under solrconfig,
> 
> <query>
>   <maxBooleanClauses>1024</maxBooleanClauses>
>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>   <queryResultWindowSize>50</queryResultWindowSize>
>   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>
>   <useColdSearcher>false</useColdSearcher>
>   <maxWarmingSearchers>2</maxWarmingSearchers>
> </query>
> 
> Currently, I got 800 million documents and have specified 8G heap size.
> 
> Any other suggestion on what can I do to control the Solr memory consumption?
> 
> Thanks,
> -vivek
> 
> On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote:
> > Just an update on the memory issue - might be useful for others. I
> > read the following,
> >
> >  http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
> >
> > and looks like the first and new searcher listeners would populate the
> > FieldCache. Commenting out these two listener entries seems to do the
> > trick - at least the heap size is not growing as soon as Solr starts
> > up.
> >
> > I ran some searches and they all came out fine. Index rate is also
> > pretty good. Would there be any impact of disabling these listeners?
> >
> > Thanks,
> > -vivek
> >
> > On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote:
> >> Otis,
> >>
> >> In that case, I'm not sure why Solr is taking up so much memory as
> >> soon as we start it up. I checked for .tii file and there is only one,
> >>
> >> -rw-r--r--  1 search  staff  20306 May 11 21:47 
> ./20090510_1/data/index/_3au.tii
> >>
> >> I have all the caches disabled - so that shouldn't be a problem either. My
> >> ramBuffer size is only 64MB.
> >>
> >> I read the note on sorting,
> >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
> >> something related to FieldCache. I don't see this as a parameter defined
> >> in either solrconfig.xml or schema.xml. Could this be something that
> >> loads things in memory at startup? How can we disable it?
> >>
> >> I'm trying to find out if there is a way to tell how much memory Solr
> >> would consume and way to cap it.
> >>
> >> Thanks,
> >> -vivek
> >>
> >>
> >>
> >>
> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Sorting is triggered by the sort parameter in the URL, not a 
> >>> characteristic 
> of a field. :)
> >>>
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>>
> >>>
> >>> - Original Message 
> >>>> From: vivek sar 
> >>>> To: solr-user@lucene.apache.org
> >>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
> >>>> Subject: Re: Solr memory requirements?
> >>>>
> >>>> Thanks Otis.
> >>>>
> >>>> Our use case doesn't require any sorting or faceting. I'm wondering if
> >>>> I've configured anything wrong.
> >>>>
> >>>> I've got a total of 25 fields (15 are indexed and stored, the other 10
> >>>> are just stored). All my fields are basic data types - which I thought
> >>>> are not sorted. My id field is the unique key.
> >>>>
> >>>> Is there any field here that might be getting sorted?
> >>>>
> >>>>
> >>>> required="true" omitNorms="true" compressed="false"/>
> >>>>
> >>>>
> >>>> compressed="false"/>
> >>>>
> >>>> omitNorms="true" compressed="false"/>
> >>>>
> >>>> omitNorms="true" compressed="false"/>
> >>>>
> >>>> omitNorms="true" compressed="

Re: Solr memory requirements?

2009-05-13 Thread Erick Erickson
Warning: I'm way out of my competency range when I comment
on SOLR, but I've seen the statement that string fields are NOT
tokenized while text fields are, and I notice that almost all of your fields
are string type.

Would someone more knowledgeable than me care to comment on whether
this is at all relevant? Offered in the spirit that sometimes there are
things
so basic that only an amateur can see them 

Best
Erick

On Wed, May 13, 2009 at 4:42 PM, vivek sar  wrote:

> Thanks Otis.
>
> Our use case doesn't require any sorting or faceting. I'm wondering if
> I've configured anything wrong.
>
> I've got a total of 25 fields (15 are indexed and stored, the other 10
> are just stored). All my fields are basic data types - which I thought
> are not sorted. My id field is the unique key.
>
> Is there any field here that might be getting sorted?
>
>   required="true" omitNorms="true" compressed="false"/>
>
>compressed="false"/>
>omitNorms="true" compressed="false"/>
>omitNorms="true" compressed="false"/>
>omitNorms="true" compressed="false"/>
>default="NOW/HOUR"  compressed="false"/>
>omitNorms="true" compressed="false"/>
>omitNorms="true" compressed="false"/>
>compressed="false"/>
>compressed="false"/>
>omitNorms="true" compressed="false"/>
>omitNorms="true" compressed="false"/>
>omitNorms="true" compressed="false"/>
>omitNorms="true" compressed="false"/>
>omitNorms="true" compressed="false"/>
>compressed="false"/>
>compressed="false"/>
>compressed="false"/>
>omitNorms="true" compressed="false"/>
>compressed="false"/>
>default="NOW/HOUR" omitNorms="true"/>
>
>
>   
>omitNorms="true" multiValued="true"/>
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>  wrote:
> >
> > Hi,
> > Some answers:
> > 1) .tii files in the Lucene index.  When you sort, all distinct values
> for the field(s) used for sorting.  Similarly for facet fields.  Solr
> caches.
> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
> consume during indexing.  There is no need to commit every 50K docs unless
> you want to trigger snapshot creation.
> > 3) see 1) above
> >
> > 1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
> going to fly. :)
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: vivek sar 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
> >> Subject: Solr memory requirements?
> >>
> >> Hi,
> >>
> >>   I'm pretty sure this has been asked before, but I couldn't find a
> >> complete answer in the forum archive. Here are my questions,
> >>
> >> 1) When Solr starts up, what does it load up in memory? Let's say
> >> I've got 4 cores, each 50G in size. When Solr comes up, how much
> >> of it would be loaded in memory?
> >>
> >> 2) How much memory is required during index time? If I'm committing
> >> 50K records at a time (1 record = 1KB) using solrj, how much memory
> >> do I need to give to Solr?
> >>
> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
> >> size index? Is there any benchmark on this?
> >>
> >> Here are some of my configuration from solrconfig.xml,
> >>
> >> 1) <ramBufferSizeMB>64</ramBufferSizeMB>
> >> 2) All the caches (under the query tag) are commented out
> >> 3) Few others,
> >>   a) <enableLazyFieldLoading>true</enableLazyFieldLoading> ==>
> >> would this require memory?
> >>   b) <queryResultWindowSize>50</queryResultWindowSize>
> >>   c) <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
> >>   d)
> >>   e) <useColdSearcher>false</useColdSearcher>
> >>   f) <maxWarmingSearchers>2</maxWarmingSearchers>
> >>
> >> The problem we are having is the following,
> >>
> >> I've given Solr 6G of RAM. As the total index size (all cores
> >> combined) starts growing, the Solr memory consumption goes up. With 800
> >> million documents, I see Solr already taking up all the memory at
> >> startup. After that, commits, searches - everything becomes slow. We
> >> will be having a distributed setup with multiple Solr instances (around
> >> 8) on four boxes, but our requirement is to have each Solr instance
> >> maintain at least around 1.5 billion documents.
> >>
> >> We are trying to see if we can somehow reduce the Solr memory
> >> footprint. If someone can provide a pointer on what parameters affect
> >> memory and what effect each has, we can then decide whether we want that
> >> parameter or not. I'm not sure if there is any minimum Solr
> >> requirement for it to be able to maintain large indexes. I've used Lucene
> >> before and that didn't require anything by default - it used up memory
> >> only during index and search times - not otherwise.
> >>
> >> Any help is very much appreciated.
> >>
> >> Thanks,
> >> -vivek
> >
> >
>


Re: Solr memory requirements?

2009-05-13 Thread vivek sar
I think maxBufferedDocs has been deprecated in Solr 1.4 - it's
recommended to use ramBufferSizeMB instead. My ramBufferSizeMB=64.
This shouldn't be a problem I think.
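
At the Lucene level both knobs sit on IndexWriter, which is why only one of
them should be active; a rough 2.9-era sketch contrasting them (the path and
analyzer are placeholders, not from this thread):

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class BufferKnobs {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("data/index")),   // placeholder path
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);
        // Old knob: flush after a fixed doc count, blind to document size.
        // writer.setMaxBufferedDocs(1000);
        // Newer knob: flush by RAM actually consumed.
        writer.setRAMBufferSizeMB(64);
        writer.close();
    }
}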

There has to be something else that Solr is holding on to in memory. Anyone else?

Thanks,
-vivek

On Wed, May 13, 2009 at 4:01 PM, Jack Godwin  wrote:
> Have you checked the maxBufferedDocs?  I had to drop mine down to 1000 with
> 3 million docs.
> Jack
>
> On Wed, May 13, 2009 at 6:53 PM, vivek sar  wrote:
>
>> Disabling first/new searchers did help for the initial load time, but
>> after 10-15 min the heap memory starts climbing up again and reaches
>> max within 20 min. Now the GC is running all the time, which is
>> slowing down the commit and search cycles.
>>
>> It is still puzzling what Solr holds in memory and doesn't
>> release.
>>
>> I haven't been able to profile as the dump is too big. Would setting
>> termIndexInterval help - not sure how that can be set using Solr.
>>
>> Some other query properties under solrconfig,
>>
>> <query>
>>   <maxBooleanClauses>1024</maxBooleanClauses>
>>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>   <queryResultWindowSize>50</queryResultWindowSize>
>>   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>>
>>   <useColdSearcher>false</useColdSearcher>
>>   <maxWarmingSearchers>2</maxWarmingSearchers>
>> </query>
>>
>> Currently, I got 800 million documents and have specified 8G heap size.
>>
>> Any other suggestion on what can I do to control the Solr memory
>> consumption?
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 2:53 PM, vivek sar  wrote:
>> > Just an update on the memory issue - might be useful for others. I
>> > read the following,
>> >
>> >  http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
>> >
>> > and looks like the first and new searcher listeners would populate the
>> > FieldCache. Commenting out these two listener entries seems to do the
>> > trick - at least the heap size is not growing as soon as Solr starts
>> > up.
>> >
>> > I ran some searches and they all came out fine. Index rate is also
>> > pretty good. Would there be any impact of disabling these listeners?
>> >
>> > Thanks,
>> > -vivek
>> >
>> > On Wed, May 13, 2009 at 2:12 PM, vivek sar  wrote:
>> >> Otis,
>> >>
>> >> In that case, I'm not sure why Solr is taking up so much memory as
>> >> soon as we start it up. I checked for .tii file and there is only one,
>> >>
>> >> -rw-r--r--  1 search  staff  20306 May 11 21:47
>> ./20090510_1/data/index/_3au.tii
>> >>
>> >> I have all the cache disabled - so that shouldn't be a problem too. My
>> >> ramBuffer size is only 64MB.
>> >>
>> >> I read note on sorting,
>> >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
>> >> something related to FieldCache. I don't see this as parameter defined
>> >> in either solrconfig.xml or schema.xml. Could this be something that
>> >> can load things in memory at startup? How can we disable it?
>> >>
>> >> I'm trying to find out if there is a way to tell how much memory Solr
>> >> would consume and way to cap it.
>> >>
>> >> Thanks,
>> >> -vivek
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
>> >>  wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Sorting is triggered by the sort parameter in the URL, not a
>> characteristic of a field. :)
>> >>>
>> >>> Otis
>> >>> --
>> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >>>
>> >>>
>> >>>
>> >>> - Original Message 
>> >>>> From: vivek sar 
>> >>>> To: solr-user@lucene.apache.org
>> >>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> >>>> Subject: Re: Solr memory requirements?
>> >>>>
>> >>>> Thanks Otis.
>> >>>>
>> >>>> Our use case doesn't require any sorting or faceting. I'm wondering if
>> >>>> I've configured anything wrong.
>> >>>>
>> >>>> I've got a total of 25 fields (15 are indexed and stored, the other 10
>> >>>> are just stored). All my fields are basic data types - which I thought
>> >>>> are not sorted. My id field is the unique key.
>> >>>>
>> >>>> Is there any fie

Re: Solr memory requirements?

2009-05-13 Thread Jack Godwin
Have you checked the maxBufferedDocs?  I had to drop mine down to 1000 with
3 million docs.
Jack

On Wed, May 13, 2009 at 6:53 PM, vivek sar  wrote:

> Disabling first/new searchers did help for the initial load time, but
> after 10-15 min the heap memory starts climbing up again and reaches
> max within 20 min. Now the GC is running all the time, which is
> slowing down the commit and search cycles.
>
> It is still puzzling what Solr holds in memory and doesn't
> release.
>
> I haven't been able to profile as the dump is too big. Would setting
> termIndexInterval help - not sure how that can be set using Solr.
>
> Some other query properties under solrconfig,
>
> <query>
>   <maxBooleanClauses>1024</maxBooleanClauses>
>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>   <queryResultWindowSize>50</queryResultWindowSize>
>   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>
>   <useColdSearcher>false</useColdSearcher>
>   <maxWarmingSearchers>2</maxWarmingSearchers>
> </query>
>
> Currently, I got 800 million documents and have specified 8G heap size.
>
> Any other suggestion on what can I do to control the Solr memory
> consumption?
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 2:53 PM, vivek sar  wrote:
> > Just an update on the memory issue - might be useful for others. I
> > read the following,
> >
> >  http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
> >
> > and looks like the first and new searcher listeners would populate the
> > FieldCache. Commenting out these two listener entries seems to do the
> > trick - at least the heap size is not growing as soon as Solr starts
> > up.
> >
> > I ran some searches and they all came out fine. Index rate is also
> > pretty good. Would there be any impact of disabling these listeners?
> >
> > Thanks,
> > -vivek
> >
> > On Wed, May 13, 2009 at 2:12 PM, vivek sar  wrote:
> >> Otis,
> >>
> >> In that case, I'm not sure why Solr is taking up so much memory as
> >> soon as we start it up. I checked for .tii file and there is only one,
> >>
> >> -rw-r--r--  1 search  staff  20306 May 11 21:47
> ./20090510_1/data/index/_3au.tii
> >>
> >> I have all the cache disabled - so that shouldn't be a problem too. My
> >> ramBuffer size is only 64MB.
> >>
> >> I read the note on sorting,
> >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
> >> something related to FieldCache. I don't see this as a parameter defined
> >> in either solrconfig.xml or schema.xml. Could this be something that
> >> can load things into memory at startup? How can we disable it?
> >>
> >> I'm trying to find out if there is a way to tell how much memory Solr
> >> would consume and a way to cap it.
> >>
> >> Thanks,
> >> -vivek
> >>
> >>
> >>
> >>
> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> Sorting is triggered by the sort parameter in the URL, not a
> characteristic of a field. :)
> >>>
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>>
> >>>
> >>> - Original Message 
> >>>> From: vivek sar 
> >>>> To: solr-user@lucene.apache.org
> >>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
> >>>> Subject: Re: Solr memory requirements?
> >>>>
> >>>> Thanks Otis.
> >>>>
> >>>> Our use case doesn't require any sorting or faceting. I'm wondering if
> >>>> I've configured anything wrong.
> >>>>
> >>>> I have a total of 25 fields (15 are indexed and stored, the other 10 are
> >>>> just stored). All my fields are basic data types, which I thought are not
> >>>> sorted. My id field is the unique key.
> >>>>
> >>>> Is there any field here that might be getting sorted?
> >>>>
> >>>> [schema field definitions garbled in archiving; only attributes like
> >>>> omitNorms="true" and compressed="false" survive - message truncated here]

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Disabling the first/new searchers did help with the initial load time, but
after 10-15 min the heap memory starts climbing again and reaches the
max within 20 min. Now the GC is running all the time, which is
slowing down the commit and search cycles.

It is still puzzling what Solr holds in memory and doesn't release.

I haven't been able to profile it as the dump is too big. Would setting
termIndexInterval help? I'm not sure how that can be set through Solr.

Some other query properties under solrconfig:

[query-section settings from solrconfig.xml; the element names were lost
in archiving - the values were: 1024, true, 50, 200, false, 2]

Currently I have 800 million documents and have specified an 8G heap size.
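
(For scale: in Lucene, every indexed field that still carries norms costs
one byte per document, so at 800 million documents each such field alone
accounts for roughly 760MB of heap. A few fields that slipped past
omitNorms="true" would explain several GB.)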

Any other suggestions on what I can do to control Solr's memory consumption?

Thanks,
-vivek

On Wed, May 13, 2009 at 2:53 PM, vivek sar  wrote:
> Just an update on the memory issue - might be useful for others. I
> read the following,
>
>  http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)
>
> and it looks like the first and new searcher listeners would populate the
> FieldCache. Commenting out these two listener entries seems to do the
> trick - at least the heap size is not growing as soon as Solr starts
> up.
>
> I ran some searches and they all came out fine. The index rate is also
> pretty good. Would there be any impact of disabling these listeners?
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 2:12 PM, vivek sar  wrote:
>> Otis,
>>
>> In that case, I'm not sure why Solr is taking up so much memory as
>> soon as we start it up. I checked for .tii files and there is only one,
>>
>> -rw-r--r--  1 search  staff  20306 May 11 21:47 
>> ./20090510_1/data/index/_3au.tii
>>
>> I have all the caches disabled - so that shouldn't be a problem either. My
>> ramBuffer size is only 64MB.
>>
>> I read the note on sorting,
>> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
>> something related to FieldCache. I don't see this as a parameter defined
>> in either solrconfig.xml or schema.xml. Could this be something that
>> can load things into memory at startup? How can we disable it?
>>
>> I'm trying to find out if there is a way to tell how much memory Solr
>> would consume and a way to cap it.
>>
>> Thanks,
>> -vivek
>>
>>
>>
>>
>> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
>>  wrote:
>>>
>>> Hi,
>>>
>>> Sorting is triggered by the sort parameter in the URL, not a characteristic 
>>> of a field. :)
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>>
>>>
>>> - Original Message 
>>>> From: vivek sar 
>>>> To: solr-user@lucene.apache.org
>>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>>>> Subject: Re: Solr memory requirements?
>>>>
>>>> Thanks Otis.
>>>>
>>>> Our use case doesn't require any sorting or faceting. I'm wondering if
>>>> I've configured anything wrong.
>>>>
>>>> I have a total of 25 fields (15 are indexed and stored, the other 10 are
>>>> just stored). All my fields are basic data types, which I thought are not
>>>> sorted. My id field is the unique key.
>>>>
>>>> Is there any field here that might be getting sorted?
>>>>
>>>> [schema field definitions garbled in archiving; only attributes like
>>>> omitNorms="true" and compressed="false" survive - message truncated here]

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Just an update on the memory issue - might be useful for others. I
read the following,

 http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching)

and it looks like the first and new searcher listeners would populate the
FieldCache. Commenting out these two listener entries seems to do the
trick - at least the heap size is not growing as soon as Solr starts
up.

I ran some searches and they all came out fine. The index rate is also
pretty good. Would there be any impact of disabling these listeners?
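
(For anyone following along: those listeners fire warming queries against
each new searcher. With them disabled you can warm manually by sending a
representative query yourself after startup or a commit. A SolrJ sketch -
the URL and query are made up:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class Warmer {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
            // Any representative query pulls index structures into memory,
            // so the first real user query doesn't pay that cost.
            server.query(new SolrQuery("*:*"));
        }
    }

The trade-off: less heap pressure at startup, but the first queries after
each commit run cold.)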

Thanks,
-vivek

On Wed, May 13, 2009 at 2:12 PM, vivek sar  wrote:
> Otis,
>
> In that case, I'm not sure why Solr is taking up so much memory as
> soon as we start it up. I checked for .tii files and there is only one,
>
> -rw-r--r--  1 search  staff  20306 May 11 21:47 
> ./20090510_1/data/index/_3au.tii
>
> I have all the caches disabled - so that shouldn't be a problem either. My
> ramBuffer size is only 64MB.
>
> I read the note on sorting,
> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
> something related to FieldCache. I don't see this as a parameter defined
> in either solrconfig.xml or schema.xml. Could this be something that
> can load things into memory at startup? How can we disable it?
>
> I'm trying to find out if there is a way to tell how much memory Solr
> would consume and a way to cap it.
>
> Thanks,
> -vivek
>
>
>
>
> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
>  wrote:
>>
>> Hi,
>>
>> Sorting is triggered by the sort parameter in the URL, not a characteristic 
>> of a field. :)
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message 
>>> From: vivek sar 
>>> To: solr-user@lucene.apache.org
>>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>>> Subject: Re: Solr memory requirements?
>>>
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if
>>> I've configured anything wrong.
>>>
>>> I have a total of 25 fields (15 are indexed and stored, the other 10 are
>>> just stored). All my fields are basic data types, which I thought are not
>>> sorted. My id field is the unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>> [schema field definitions garbled in archiving; only attributes like
>>> omitNorms="true" and compressed="false" survive]
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>>> wrote:
>>> >
>>> > Hi,
>>> > Some answers:
>>> > 1) The .tii files in the Lucene index.  When you sort, all distinct
>>> > values for the field(s) used for sorting are loaded as well.
>>> > Similarly for facet fields.  Plus the Solr caches.
>>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will 
>>> > consume
>>> during indexing.  There is no need to commit every 50K docs unless you want 
>>> to
>>> trigger

Re: Solr memory requirements?

2009-05-13 Thread Grant Ingersoll
Have you done any profiling to see where the hotspots are?  I realize  
that may be difficult on an index of that size, but maybe you can  
approximate on a smaller version.  Also, do you have warming queries?


You might also look into setting the termIndexInterval at the Lucene  
level.  This is not currently exposed in Solr (AFAIK), but likely  
could be added fairly easily as part of the index parameters.  http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/IndexWriter.html#setTermIndexInterval(int)
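
A sketch of what that looks like at the Lucene level (assuming direct
access to an open IndexWriter; the interval value is just an illustration):

    import org.apache.lucene.index.IndexWriter;

    public class TermIntervalExample {
        public static void demo(IndexWriter writer) {
            // Default is 128. A larger interval writes a smaller .tii
            // term index, so readers hold fewer term-dictionary entries
            // in memory, at the cost of slower term lookups.
            writer.setTermIndexInterval(256);
        }
    }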


-Grant

On May 13, 2009, at 5:12 PM, vivek sar wrote:


Otis,

In that case, I'm not sure why Solr is taking up so much memory as
soon as we start it up. I checked for .tii files and there is only one,

-rw-r--r--  1 search  staff  20306 May 11 21:47 ./20090510_1/data/index/_3au.tii


I have all the caches disabled - so that shouldn't be a problem either. My
ramBuffer size is only 64MB.

I read the note on sorting,
http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
something related to FieldCache. I don't see this as a parameter defined
in either solrconfig.xml or schema.xml. Could this be something that
can load things into memory at startup? How can we disable it?

I'm trying to find out if there is a way to tell how much memory Solr
would consume and a way to cap it.

Thanks,
-vivek




On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
 wrote:


Hi,

Sorting is triggered by the sort parameter in the URL, not a  
characteristic of a field. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: vivek sar 
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 4:42:16 PM
Subject: Re: Solr memory requirements?

Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering if
I've configured anything wrong.

I have a total of 25 fields (15 are indexed and stored, the other 10 are
just stored). All my fields are basic data types, which I thought are not
sorted. My id field is the unique key.

Is there any field here that might be getting sorted?


[schema field definitions garbled in archiving; only attributes like
omitNorms="true" and compressed="false" survive]

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
wrote:


Hi,
Some answers:
1) The .tii files in the Lucene index.  When you sort, all distinct
values for the field(s) used for sorting are loaded as well.  Similarly
for facet fields.  Plus the Solr caches.
2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr
will consume during indexing.  There is no need to commit every 50K
docs unless you want to trigger snapshot creation.

3) see 1) above

1.5 billion docs per instance where each doc is circa 1KB?  I doubt
that's going to fly. :)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: vivek sar
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 3:04:46 PM
Subject: Solr memory requirements?

Hi,

  I'm pretty sure this has been asked before, but I couldn't find a
complete answer in the forum archive. Here are my questions,

1) When Solr starts up, what does it load into memory? Let's say
I've 4 cores, each 50G in size. When Solr comes up, how much
of it would be loaded into memory?

2) How much memory is required during index time? If I'm committing
50K records at a time (1 record = 1KB) using solrj, how much memory
do I need to give Solr?

3) Is there a minimum memory requirement for Solr to maintain a
certain index size? Is there any benchmark on this?

Here is some of my configuration from solrconfig.xml,

1) ramBufferSizeMB: 64
2) All the caches (under the query tag) are commented out
3) A few others (element names lost in archiving):
   a) true  ==> would this require memory?
   b) 50
   c) 200
   d) (empty)
   e) false
   f) 2

The problem we are hav

Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Otis,

In that case, I'm not sure why Solr is taking up so much memory as
soon as we start it up. I checked for .tii files and there is only one,

-rw-r--r--  1 search  staff  20306 May 11 21:47 ./20090510_1/data/index/_3au.tii

I have all the caches disabled - so that shouldn't be a problem either. My
ramBuffer size is only 64MB.

I read the note on sorting,
http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and saw
something related to FieldCache. I don't see this as a parameter defined
in either solrconfig.xml or schema.xml. Could this be something that
can load things into memory at startup? How can we disable it?
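
(To make the FieldCache cost concrete: the first time a field is used for
sorting, Lucene fills a per-field array with one entry per document in the
index. A sketch using the Lucene 2.4 FieldCache API directly - the field
name is made up and "reader" is an already-open IndexReader:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public class CacheCost {
        public static void demo(IndexReader reader) throws Exception {
            // One array slot per document: at 800M docs, even the
            // references alone are several GB, before the values.
            String[] values = FieldCache.DEFAULT.getStrings(reader, "someField");
        }
    }

The array stays cached for the life of the IndexReader, which would explain
a heap that climbs and never comes back down.)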

I'm trying to find out if there is a way to tell how much memory Solr
would consume and a way to cap it.

Thanks,
-vivek




On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
 wrote:
>
> Hi,
>
> Sorting is triggered by the sort parameter in the URL, not a characteristic 
> of a field. :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: vivek sar 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> Subject: Re: Solr memory requirements?
>>
>> Thanks Otis.
>>
>> Our use case doesn't require any sorting or faceting. I'm wondering if
>> I've configured anything wrong.
>>
>> I have a total of 25 fields (15 are indexed and stored, the other 10 are
>> just stored). All my fields are basic data types, which I thought are not
>> sorted. My id field is the unique key.
>>
>> Is there any field here that might be getting sorted?
>>
>>
>> [schema field definitions garbled in archiving; only attributes like
>> omitNorms="true" and compressed="false" survive]
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
>> wrote:
>> >
>> > Hi,
>> > Some answers:
>> > 1) The .tii files in the Lucene index.  When you sort, all distinct
>> > values for the field(s) used for sorting are loaded as well.
>> > Similarly for facet fields.  Plus the Solr caches.
>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
>> > consume during indexing.  There is no need to commit every 50K docs
>> > unless you want to trigger snapshot creation.
>> > 3) see 1) above
>> >
>> > 1.5 billion docs per instance where each doc is circa 1KB?  I doubt
>> > that's going to fly. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > - Original Message 
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> Subject: Solr memory requirements?
>> >>
>> >> Hi,
>> >>
>> >>   I'm pretty sure this has been asked before, but I couldn't find a
>> >> complete answer in the forum archive. Here are my questions,
>> >>
>> >> 1) When Solr starts up, what does it load into memory? Let's say
>> >> I've 4 cores, each 50G in size. When Solr comes up, how much
>> >> of it would be loaded in me

Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Hi,

Sorting is triggered by the sort parameter in the URL, not a characteristic of 
a field. :)
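
For example, a request like
http://localhost:8983/solr/select?q=name:foo&sort=someField+desc
is what triggers it (the host, path and field names here are made up).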

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 4:42:16 PM
> Subject: Re: Solr memory requirements?
> 
> Thanks Otis.
> 
> Our use case doesn't require any sorting or faceting. I'm wondering if
> I've configured anything wrong.
> 
> I have a total of 25 fields (15 are indexed and stored, the other 10 are
> just stored). All my fields are basic data types, which I thought are not
> sorted. My id field is the unique key.
> 
> Is there any field here that might be getting sorted?
> 
> 
> [schema field definitions garbled in archiving; only attributes like
> omitNorms="true" and compressed="false" survive]
> 
> Thanks,
> -vivek
> 
> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
> wrote:
> >
> > Hi,
> > Some answers:
> > 1) The .tii files in the Lucene index.  When you sort, all distinct
> > values for the field(s) used for sorting are loaded as well.
> > Similarly for facet fields.  Plus the Solr caches.
> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
> > consume during indexing.  There is no need to commit every 50K docs
> > unless you want to trigger snapshot creation.
> > 3) see 1) above
> >
> > 1.5 billion docs per instance where each doc is circa 1KB?  I doubt
> > that's going to fly. :)
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: vivek sar 
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
> >> Subject: Solr memory requirements?
> >>
> >> Hi,
> >>
> >>   I'm pretty sure this has been asked before, but I couldn't find a
> >> complete answer in the forum archive. Here are my questions,
> >>
> >> 1) When Solr starts up, what does it load into memory? Let's say
> >> I've 4 cores, each 50G in size. When Solr comes up, how much
> >> of it would be loaded into memory?
> >>
> >> 2) How much memory is required during index time? If I'm committing
> >> 50K records at a time (1 record = 1KB) using solrj, how much memory
> >> do I need to give Solr?
> >>
> >> 3) Is there a minimum memory requirement for Solr to maintain a certain
> >> index size? Is there any benchmark on this?
> >>
> >> Here is some of my configuration from solrconfig.xml,
> >>
> >> 1) ramBufferSizeMB: 64
> >> 2) All the caches (under the query tag) are commented out
> >> 3) A few others (element names lost in archiving):
> >>    a) true  ==> would this require memory?
> >>    b) 50
> >>    c) 200
> >>    d) (empty)
> >>    e) false
> >>    f) 2
> >>
> >> The problem we are having is the following:
> >>
> >> I've given Solr 6G of RAM. As the total index size (all cores
> >> combined) grows, the Solr memory consumption goes up. With 800
> >> million documents, I see Solr already taking up all the memory at
> >> startup. After that, commits, searches - everything becomes slow. We
> >> will have a distributed setup with multiple Solr instances (around
> >> 8) on four boxes, but our requirement is to have each Solr instance
> >> maintain at least around 1.5 billion documents.
> >>
> >> We are trying to see if we can somehow reduce the Solr memory
> >> footprint. If someone can provide a pointer on which parameters affect
> >> memory and what effect each has, we can then decide whether we want a
> >> given parameter or not. I'm not sure if there is any minimum Solr
> >> requirement for it to be able to maintain large indexes. I've used Lucene
> >> before and that didn't require anything by default - it used up memory
> >> only during index and search times, not otherwise.
> >>
> >> Any help is very much appreciated.
> >>
> >> Thanks,
> >> -vivek
> >
> >



Re: Solr memory requirements?

2009-05-13 Thread vivek sar
Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering if
I've configured anything wrong.

I have a total of 25 fields (15 are indexed and stored, the other 10 are
just stored). All my fields are basic data types, which I thought are not
sorted. My id field is the unique key.

Is there any field here that might be getting sorted?

[schema field definitions entirely lost in archiving]

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 wrote:
>
> Hi,
> Some answers:
> 1) The .tii files in the Lucene index.  When you sort, all distinct values
> for the field(s) used for sorting are loaded as well.  Similarly for facet
> fields.  Plus the Solr caches.
> 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume 
> during indexing.  There is no need to commit every 50K docs unless you want 
> to trigger snapshot creation.
> 3) see 1) above
>
> 1.5 billion docs per instance where each doc is circa 1KB?  I doubt that's 
> going to fly. :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: vivek sar 
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> Subject: Solr memory requirements?
>>
>> Hi,
>>
>>   I'm pretty sure this has been asked before, but I couldn't find a
>> complete answer in the forum archive. Here are my questions,
>>
>> 1) When Solr starts up, what does it load into memory? Let's say
>> I've 4 cores, each 50G in size. When Solr comes up, how much
>> of it would be loaded into memory?
>>
>> 2) How much memory is required during index time? If I'm committing
>> 50K records at a time (1 record = 1KB) using solrj, how much memory
>> do I need to give Solr?
>>
>> 3) Is there a minimum memory requirement for Solr to maintain a certain
>> index size? Is there any benchmark on this?
>>
>> Here is some of my configuration from solrconfig.xml,
>>
>> 1) ramBufferSizeMB: 64
>> 2) All the caches (under the query tag) are commented out
>> 3) A few others (element names lost in archiving):
>>       a) true  ==> would this require memory?
>>       b) 50
>>       c) 200
>>       d) (empty)
>>       e) false
>>       f) 2
>>
>> The problem we are having is the following:
>>
>> I've given Solr 6G of RAM. As the total index size (all cores
>> combined) grows, the Solr memory consumption goes up. With 800
>> million documents, I see Solr already taking up all the memory at
>> startup. After that, commits, searches - everything becomes slow. We
>> will have a distributed setup with multiple Solr instances (around
>> 8) on four boxes, but our requirement is to have each Solr instance
>> maintain at least around 1.5 billion documents.
>>
>> We are trying to see if we can somehow reduce the Solr memory
>> footprint. If someone can provide a pointer on which parameters affect
>> memory and what effect each has, we can then decide whether we want a
>> given parameter or not. I'm not sure if there is any minimum Solr
>> requirement for it to be able to maintain large indexes. I've used Lucene
>> before and that didn't require anything by default - it used up memory
>> only during index and search times, not otherwise.
>>
>> Any help is very much appreciated.
>>
>> Thanks,
>> -vivek
>
>


Re: Solr memory requirements?

2009-05-13 Thread Otis Gospodnetic

Hi,
Some answers:
1) The .tii files in the Lucene index.  When you sort, all distinct values for the
field(s) used for sorting are loaded as well.  Similarly for facet fields.  Plus the Solr caches.
2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume
during indexing.  There is no need to commit every 50K docs unless you want to
trigger snapshot creation (see the sketch below).
3) see 1) above
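
A minimal SolrJ sketch of that indexing pattern (the URL, field and batch
size are made up; the point is to let ramBufferSizeMB do the buffering and
commit rarely):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexer {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
            for (int i = 0; i < 50000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", i);
                // Indexed into Lucene's RAM buffer on the server, flushed
                // per ramBufferSizeMB; not visible until the commit below.
                server.add(doc);
            }
            server.commit();  // one commit at the end, not per batch
        }
    }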

1.5 billion docs per instance where each doc is circa 1KB?  I doubt that's going 
to fly. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: vivek sar 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 3:04:46 PM
> Subject: Solr memory requirements?
> 
> Hi,
> 
>   I'm pretty sure this has been asked before, but I couldn't find a
> complete answer in the forum archive. Here are my questions,
> 
> 1) When Solr starts up, what does it load into memory? Let's say
> I've 4 cores, each 50G in size. When Solr comes up, how much
> of it would be loaded into memory?
> 
> 2) How much memory is required during index time? If I'm committing
> 50K records at a time (1 record = 1KB) using solrj, how much memory
> do I need to give Solr?
> 
> 3) Is there a minimum memory requirement for Solr to maintain a certain
> index size? Is there any benchmark on this?
> 
> Here is some of my configuration from solrconfig.xml,
> 
> 1) ramBufferSizeMB: 64
> 2) All the caches (under the query tag) are commented out
> 3) A few others (element names lost in archiving):
>   a) true  ==> would this require memory?
>   b) 50
>   c) 200
>   d) (empty)
>   e) false
>   f) 2
> 
> The problem we are having is the following:
> 
> I've given Solr 6G of RAM. As the total index size (all cores
> combined) grows, the Solr memory consumption goes up. With 800
> million documents, I see Solr already taking up all the memory at
> startup. After that, commits, searches - everything becomes slow. We
> will have a distributed setup with multiple Solr instances (around
> 8) on four boxes, but our requirement is to have each Solr instance
> maintain at least around 1.5 billion documents.
> 
> We are trying to see if we can somehow reduce the Solr memory
> footprint. If someone can provide a pointer on which parameters affect
> memory and what effect each has, we can then decide whether we want a
> given parameter or not. I'm not sure if there is any minimum Solr
> requirement for it to be able to maintain large indexes. I've used Lucene
> before and that didn't require anything by default - it used up memory
> only during index and search times, not otherwise.
> 
> Any help is very much appreciated.
> 
> Thanks,
> -vivek