Re: Solr memory requirements?
I've never paid attention to the post/commit ratio. I usually do a commit after maybe 100 posts. Is there a guideline about this? Thanks.

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote:
> 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
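For reference, the two settings discussed here live in solrconfig.xml. A minimal sketch follows - the element names are standard Solr config, but the values are illustrative, not recommendations from this thread:

```xml
<!-- solrconfig.xml (fragment); illustrative values only -->
<indexDefaults>
  <!-- Roughly bounds the RAM Lucene buffers before flushing a segment -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexDefaults>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Let Solr commit on its own instead of committing every N posts -->
  <autoCommit>
    <maxDocs>50000</maxDocs> <!-- commit after this many buffered docs -->
    <maxTime>60000</maxTime> <!-- ...or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```

With autoCommit configured, the client can post continuously and let the server decide when to make documents visible.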
Re: Solr memory requirements?
I think that if you have in your index any documents with norms, you will still use norms for those fields even if the schema is changed later. Did you wipe and re-index after all your schema changes?

-Peter

On Fri, May 15, 2009 at 9:14 PM, vivek sar wrote:
> Some more info,
>
> Profiling the heap dump shows "org.apache.lucene.index.ReadOnlySegmentReader" as the biggest object - taking up almost 80% of total memory (6G) - see the attached screen shot for a smaller dump. There is some norms object - not sure where they are coming from as I've omitNorms=true for all indexed records.
>
> I also noticed that if I run a query - let's say a generic query that hits 100 million records and then follow up with a specific query - which hits only 1 record, the second query causes the increase in heap.
>
> Looks like there are a few bytes being loaded into memory for each document - I've checked the schema, all indexes have omitNorms=true, all caches are commented out - still looking to see what else might put things in memory which don't get collected by GC.
>
> I also saw https://issues.apache.org/jira/browse/SOLR- for Solr 1.4 (which I'm using). Not sure if that can cause any problem. I do use range queries for dates - would that have any effect?
>
> Any other ideas?
>
> Thanks,
> -vivek
>
> On Thu, May 14, 2009 at 8:38 PM, vivek sar wrote:
>> Thanks Mark.
>>
>> I checked all the items you mentioned,
>>
>> 1) I've omitNorms=true for all my indexed fields (stored-only fields I guess don't matter)
>> 2) I've tried commenting out all caches in the solrconfig.xml, but that doesn't help much
>> 3) I've tried commenting out the first and new searcher listeners settings in the solrconfig.xml - the only way that helps is that at startup time the memory usage doesn't spike up - that's only because there is no auto-warmer query to run. But, I noticed commenting out searchers slows down any other queries to Solr.
>> 4) I don't have any sort or facet in my queries
>> 5) I'm not sure how to change the "Lucene term interval" from Solr - is there a way to do that?
>>
>> I've been playing around with this memory thing the whole day and have found that it's the search that's hogging the memory. Any time there is a search on all the records (800 million) the heap consumption jumps by 5G. This makes me think there has to be some configuration in Solr that's causing some terms per document to be loaded in memory.
>>
>> I've posted my settings several times on this forum, but no one has been able to pinpoint what configuration might be causing this. If someone is interested I can attach the solrconfig and schema files as well. Here are the settings again under the Query tag,
>>
>> 1024 true 50 200 false 2
>>
>> and schema,
>>
>> [the schema listing did not survive the archive - about twenty <field .../> definitions, most with omitNorms="true" and compressed="false"]
>>
>> Any help is greatly appreciated.
>>
>> Thanks,
>> -vivek
>>
>> On Thu, May 14, 2009 at 6:22 PM, Mark Miller wrote:
>>> 800 million docs is on the high side for modern hardware.
>>>
>>> If even one field has norms on, you're talking almost 800 MB right there.
And >>> then if another Searcher is brought up well the old one is serving (which >>> happens when you update)? Doubled. >>> >>> Your best bet is to distribute across a couple machines. >>> >>> To minimize you would want to turn off or down caching, don't facet, don't >>> sort, turn off all norms, possibly get at the Lucene term interval and raise >>> it. Drop on deck searchers setting. Even then, 800 million...time to >>> distribute I'd think. >>> >>> vivek sar wrote: Some update on this issue, 1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4 G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increases the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching becomes slow). Th
Re: Solr memory requirements?
Some more info,

Profiling the heap dump shows "org.apache.lucene.index.ReadOnlySegmentReader" as the biggest object - taking up almost 80% of total memory (6G) - see the attached screen shot for a smaller dump. There is some norms object - not sure where they are coming from as I've omitNorms=true for all indexed records.

I also noticed that if I run a query - let's say a generic query that hits 100 million records and then follow up with a specific query - which hits only 1 record, the second query causes the increase in heap.

Looks like there are a few bytes being loaded into memory for each document - I've checked the schema, all indexes have omitNorms=true, all caches are commented out - still looking to see what else might put things in memory which don't get collected by GC.

I also saw https://issues.apache.org/jira/browse/SOLR- for Solr 1.4 (which I'm using). Not sure if that can cause any problem. I do use range queries for dates - would that have any effect?

Any other ideas?

Thanks,
-vivek

On Thu, May 14, 2009 at 8:38 PM, vivek sar wrote:
> Thanks Mark.
>
> I checked all the items you mentioned,
>
> 1) I've omitNorms=true for all my indexed fields (stored-only fields I guess don't matter)
> 2) I've tried commenting out all caches in the solrconfig.xml, but that doesn't help much
> 3) I've tried commenting out the first and new searcher listeners settings in the solrconfig.xml - the only way that helps is that at startup time the memory usage doesn't spike up - that's only because there is no auto-warmer query to run. But, I noticed commenting out searchers slows down any other queries to Solr.
> 4) I don't have any sort or facet in my queries
> 5) I'm not sure how to change the "Lucene term interval" from Solr - is there a way to do that?
>
> I've been playing around with this memory thing the whole day and have found that it's the search that's hogging the memory.
Any time there
> is a search on all the records (800 million) the heap consumption jumps by 5G. This makes me think there has to be some configuration in Solr that's causing some terms per document to be loaded in memory.
>
> I've posted my settings several times on this forum, but no one has been able to pinpoint what configuration might be causing this. If someone is interested I can attach the solrconfig and schema files as well. Here are the settings again under the Query tag,
>
> 1024 true 50 200 false 2
>
> and schema,
>
> [the schema listing did not survive the archive - about twenty <field .../> definitions, most with omitNorms="true" and compressed="false"]
>
> Any help is greatly appreciated.
>
> Thanks,
> -vivek
>
> On Thu, May 14, 2009 at 6:22 PM, Mark Miller wrote:
>> 800 million docs is on the high side for modern hardware.
>>
>> If even one field has norms on, you're talking almost 800 MB right there. And then if another Searcher is brought up while the old one is serving (which happens when you update)? Doubled.
>>
>> Your best bet is to distribute across a couple machines.
>>
>> To minimize you would want to turn off or turn down caching, don't facet, don't sort, turn off all norms, possibly get at the Lucene term interval and raise it. Drop the on-deck searchers setting.
Even then, 800 million... time to
>> distribute I'd think.
>>
>> vivek sar wrote:
>>> Some update on this issue,
>>>
>>> 1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4 G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increase the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching) becomes slow.
>>>
>>> The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400 million matches and returns 100 rows.
>>>
>>> Does the Solr searcher hold on to references to objects in memory? I couldn't find any settings that would tell me it does, but every search causing heap to go up is definitely suspicious.
>>>
>>> 2) I ran the jmap histo to get the top obje
Re: Solr memory requirements?
Thanks Mark.

I checked all the items you mentioned,

1) I've omitNorms=true for all my indexed fields (stored-only fields I guess don't matter)
2) I've tried commenting out all caches in the solrconfig.xml, but that doesn't help much
3) I've tried commenting out the first and new searcher listeners settings in the solrconfig.xml - the only way that helps is that at startup time the memory usage doesn't spike up - that's only because there is no auto-warmer query to run. But, I noticed commenting out searchers slows down any other queries to Solr.
4) I don't have any sort or facet in my queries
5) I'm not sure how to change the "Lucene term interval" from Solr - is there a way to do that?

I've been playing around with this memory thing the whole day and have found that it's the search that's hogging the memory. Any time there is a search on all the records (800 million) the heap consumption jumps by 5G. This makes me think there has to be some configuration in Solr that's causing some terms per document to be loaded in memory.

I've posted my settings several times on this forum, but no one has been able to pinpoint what configuration might be causing this. If someone is interested I can attach the solrconfig and schema files as well. Here are the settings again under the Query tag,

1024 true 50 200 false 2

and schema,

Any help is greatly appreciated.

Thanks,
-vivek

On Thu, May 14, 2009 at 6:22 PM, Mark Miller wrote:
> 800 million docs is on the high side for modern hardware.
>
> If even one field has norms on, you're talking almost 800 MB right there. And then if another Searcher is brought up while the old one is serving (which happens when you update)? Doubled.
>
> Your best bet is to distribute across a couple machines.
>
> To minimize you would want to turn off or turn down caching, don't facet, don't sort, turn off all norms, possibly get at the Lucene term interval and raise it. Drop the on-deck searchers setting.
Even then, 800 million... time to
> distribute I'd think.
>
> vivek sar wrote:
>> Some update on this issue,
>>
>> 1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4 G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increase the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching) becomes slow.
>>
>> The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400 million matches and returns 100 rows.
>>
>> Does the Solr searcher hold on to references to objects in memory? I couldn't find any settings that would tell me it does, but every search causing heap to go up is definitely suspicious.
>>
>> 2) I ran the jmap histo to get the top objects (this is on a smaller instance with 2 G memory; this is before running search - after running search I wasn't able to run jmap),
>>
>>  num   #instances   #bytes      class name
>>  ----------------------------------------
>>  1:    3890855      222608992   [C
>>  2:    3891673      155666920   java.lang.String
>>  3:    3284341      131373640   org.apache.lucene.index.TermInfo
>>  4:    3334198      106694336   org.apache.lucene.index.Term
>>  5:    271          26286496    [J
>>  6:    16           26273936    [Lorg.apache.lucene.index.Term;
>>  7:    16           26273936    [Lorg.apache.lucene.index.TermInfo;
>>  8:    320512       15384576    org.apache.lucene.index.FreqProxTermsWriter$PostingList
>>  9:    10335        11554136    [I
>>
>> I'm not sure what the first one ([C) is? I couldn't profile it to know what all the Strings are being allocated by - any ideas?
>>
>> Any ideas on what the Searcher might be holding on to and how we can change that behavior?
>>
>> Thanks,
>> -vivek
>>
>> On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote:
>>
>>> I don't know if field type has any impact on the memory usage - does it?
>>>
>>> Our use cases require complete matches, thus there is no need of any analysis in most cases - does it matter in terms of memory usage?
>>>
>>> Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries.
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote:

Warning: I'm way out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type.

Would someone more knowledgeable than me care to comment on whether t
Re: Solr memory requirements?
800 million docs is on the high side for modern hardware.

If even one field has norms on, you're talking almost 800 MB right there. And then if another Searcher is brought up while the old one is serving (which happens when you update)? Doubled.

Your best bet is to distribute across a couple machines.

To minimize you would want to turn off or turn down caching, don't facet, don't sort, turn off all norms, possibly get at the Lucene term interval and raise it. Drop the on-deck searchers setting. Even then, 800 million... time to distribute I'd think.

vivek sar wrote:

Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4 G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increase the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400 million matches and returns 100 rows.

Does the Solr searcher hold on to references to objects in memory? I couldn't find any settings that would tell me it does, but every search causing heap to go up is definitely suspicious.
2) I ran the jmap histo to get the top objects (this is on a smaller instance with 2 G memory; this is before running search - after running search I wasn't able to run jmap),

 num   #instances   #bytes      class name
 ----------------------------------------
 1:    3890855      222608992   [C
 2:    3891673      155666920   java.lang.String
 3:    3284341      131373640   org.apache.lucene.index.TermInfo
 4:    3334198      106694336   org.apache.lucene.index.Term
 5:    271          26286496    [J
 6:    16           26273936    [Lorg.apache.lucene.index.Term;
 7:    16           26273936    [Lorg.apache.lucene.index.TermInfo;
 8:    320512       15384576    org.apache.lucene.index.FreqProxTermsWriter$PostingList
 9:    10335        11554136    [I

I'm not sure what the first one ([C) is? I couldn't profile it to know what all the Strings are being allocated by - any ideas?

Any ideas on what the Searcher might be holding on to and how we can change that behavior?

Thanks,
-vivek

On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote:

I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need of any analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote:

Warning: I'm way out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type.

Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them

Best
Erick

On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote:

Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong. I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored).
All my fields are basic data types - which I thought are not sorted. My id field is the unique key. Is there any field here that might be getting sorted?

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote:

Hi,
Some answers:
1) The .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting get loaded into memory. Similarly for facet fields. Plus the Solr caches.
2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
3) see 1) above

1.5 billion docs per instance where each doc is cca 1KB? I doubt that's going to fly. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: vivek sar
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 3:04:46 PM
Subject: Solr memory requirements?

Hi,

I'm pretty sure this has been asked before, but I couldn't find a complete answer in the forum archive. Here are my questions,

1) When Solr starts up, what does it load into memory? Let's say I've 4 cores with each core 50G in size. When Solr comes up how much of it would be loaded in memory?
2
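Mark Miller's "almost 800 MB" norms figure earlier in the thread is simple arithmetic: Lucene keeps one norm byte per document for each field with norms enabled, resident for every open reader. A quick sanity check (the class and method names below are mine, for illustration):

```java
public class NormsEstimate {
    // Lucene stores one norm byte per document per normed field,
    // and an open IndexReader keeps all of them on the heap.
    static long normsBytes(long numDocs, int fieldsWithNorms) {
        return (long) numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long docs = 800_000_000L; // index size discussed in this thread
        // One normed field over 800M docs is ~800 MB of heap -- and it
        // doubles while a warming searcher overlaps the serving one.
        System.out.println(normsBytes(docs, 1)); // 800000000
    }
}
```

This is why a single field missing omitNorms="true" - or documents indexed before the schema change, as Peter points out at the top of the thread - can account for most of the unexplained heap.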
Re: Solr memory requirements?
Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage. During indexing the memory usage goes up and down, which I think is normal. The memory remains around the min heap size (4 G) for indexing, but as soon as I run a search the tenured heap usage jumps up to 6G and remains there. Subsequent searches increase the heap usage even more until it reaches the max (8G) - after which everything (indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through all the cores (4 of them - 800 million records), finds 400 million matches and returns 100 rows.

Does the Solr searcher hold on to references to objects in memory? I couldn't find any settings that would tell me it does, but every search causing heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller instance with 2 G memory; this is before running search - after running search I wasn't able to run jmap),

 num   #instances   #bytes      class name
 ----------------------------------------
 1:    3890855      222608992   [C
 2:    3891673      155666920   java.lang.String
 3:    3284341      131373640   org.apache.lucene.index.TermInfo
 4:    3334198      106694336   org.apache.lucene.index.Term
 5:    271          26286496    [J
 6:    16           26273936    [Lorg.apache.lucene.index.Term;
 7:    16           26273936    [Lorg.apache.lucene.index.TermInfo;
 8:    320512       15384576    org.apache.lucene.index.FreqProxTermsWriter$PostingList
 9:    10335        11554136    [I

I'm not sure what the first one ([C) is? I couldn't profile it to know what all the Strings are being allocated by - any ideas?

Any ideas on what the Searcher might be holding on to and how we can change that behavior?

Thanks,
-vivek

On Thu, May 14, 2009 at 11:33 AM, vivek sar wrote:
> I don't know if field type has any impact on the memory usage - does it?
>
> Our use cases require complete matches, thus there is no need of any analysis in most cases - does it matter in terms of memory usage?
>
> Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries.
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote:
>> Warning: I'm way out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type.
>>
>> Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them
>>
>> Best
>> Erick
>>
>> On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote:
>>
>>> Thanks Otis.
>>>
>>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>>
>>> I got total of 25 fields (15 are indexed and stored, other 10 are just stored). All my fields are basic data type - which I thought are not sorted. My id field is unique key.
>>>
>>> Is there any field here that might be getting sorted?
>>>
>>> [the schema listing did not survive the archive - about twenty <field .../> definitions, most with omitNorms="true" and compressed="false"]
>>>
>>> Thanks,
>>> -vivek
>>>
>>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote:
>>> >
>>> > Hi,
>>> > Some answers:
>>> > 1) .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting. Similarly for facet fields. Solr caches.
>>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
>>> > 3) see 1) above
>>> >
>>> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's going to fly. :)
>>> >
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> > ----- Original Messag
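On the "[C" question from the jmap histo: that is the JVM's internal descriptor for a char[] array - the backing arrays of the java.lang.String term texts, which in turn sit behind the Term/TermInfo entries of Lucene's in-memory term index. A small decoder for the descriptors jmap prints (the class and method names are mine, for illustration):

```java
public class HistoNames {
    // jmap -histo reports array types as JVM descriptors:
    // "[C" = char[], "[J" = long[], "[I" = int[], "[B" = byte[],
    // "[Lpkg.Cls;" = pkg.Cls[] (an array of object references).
    static String decode(String sig) {
        if (!sig.startsWith("[")) return sig; // ordinary class name
        String inner = sig.substring(1);
        switch (inner.charAt(0)) {
            case 'C': return "char[]";
            case 'J': return "long[]";
            case 'I': return "int[]";
            case 'B': return "byte[]";
            case 'L': return inner.substring(1, inner.length() - 1) + "[]";
            default:  return sig; // nested arrays etc. left as-is
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("[C"));                              // char[]
        System.out.println(decode("[Lorg.apache.lucene.index.Term;")); // org.apache.lucene.index.Term[]
    }
}
```

So the top two histo entries (char[] and String) are largely the same data as entries 3 and 4: the term dictionary index held on the heap by the open reader.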
Re: Solr memory requirements?
I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need of any analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all the caches under query in solrconfig.xml? I also don't have any auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson wrote:
> Warning: I'm way out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type.
>
> Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them
>
> Best
> Erick
>
> On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote:
>
>> Thanks Otis.
>>
>> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>>
>> I got total of 25 fields (15 are indexed and stored, other 10 are just stored). All my fields are basic data type - which I thought are not sorted. My id field is unique key.
>>
>> Is there any field here that might be getting sorted?
>>
>> [the schema listing did not survive the archive - about twenty <field .../> definitions, most with omitNorms="true" and compressed="false"]
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote:
>> >
>> > Hi,
>> > Some answers:
>> > 1) .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting. Similarly for facet fields. Solr caches.
>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
>> > 3) see 1) above
>> >
>> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's going to fly. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > ----- Original Message ----
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> >> Subject: Solr memory requirements?
>> >>
>> >> Hi,
>> >>
>> >> I'm pretty sure this has been asked before, but I couldn't find a complete answer in the forum archive. Here are my questions,
>> >>
>> >> 1) When Solr starts up, what does it load into memory?
Let's say
>> >> I've 4 cores with each core 50G in size. When Solr comes up how much of it would be loaded in memory?
>> >>
>> >> 2) How much memory is required during index time? If I'm committing 50K records at a time (1 record = 1KB) using solrj, how much memory do I need to give to Solr?
>> >>
>> >> 3) Is there a minimum memory requirement by Solr to maintain a certain size index? Is there any benchmark on this?
>> >>
>> >> Here are some of my configurations from solrconfig.xml,
>> >>
>> >> 1) 64
>> >> 2) All the caches (under the query tag) are commented out
>> >> 3) Few others,
>> >>   a) true ==> would this require memory?
>> >>   b) 50
>> >>   c) 200
>> >>   d)
>> >>   e) false
>> >>   f) 2
>> >>
>> >> The problem we are having is the following,
>> >>
>> >> I've given Solr RAM of 6G. As the total index size (all cores combined) starts growing, the Solr memory consumption goes up. With 800 million documents, I see Solr already taking up all the memory at startup. After that the commits, searches, everything become slow. We will be having a distributed setup with multiple Solr instances (around 8) on four boxes, but our requirement is to have each Solr instance at least maintain around 1.5 billion documents.
>> >>
>> >> We are trying to see if we can somehow reduce the Solr memory footprint. If someone can provide a pointer on what parameters affect memory and what effects they have, we can then decide whether we want that parameter or not. I'm not sure if there is any minimum Solr requirement for it to be able mainta
Re: Solr memory requirements?
Otis,

We are not running a master-slave configuration. We get very few searches (admin only) in a day, so we didn't see the need for replication/snapshots. This problem is with one Solr instance managing 4 cores (each core 200 million records). Both indexing and searching are performed by the same Solr instance.

What are .tii files used for? I see this file under only one core.

Still looking for what gets loaded into the heap by Solr (during load time, indexing, and searching) and stays there. I see most of these are tenured objects not getting released by GC - will post profile records tomorrow.

Thanks,
-vivek

On Wed, May 13, 2009 at 6:34 PM, Otis Gospodnetic wrote:
>
> There is constant mixing of indexing concepts and searching concepts in this thread. Are you having problems on the master (indexing) or on the slave (searching)?
>
> That .tii is only 20K and you said this is a large index? That doesn't smell right...
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 5:12:00 PM
>> Subject: Re: Solr memory requirements?
>>
>> Otis,
>>
>> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for the .tii file and there is only one,
>>
>> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>>
>> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>>
>> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this as a parameter defined in either solrconfig.xml or schema.xml. Could this be something that can load things in memory at startup? How can we disable it?
>>
>> I'm trying to find out if there is a way to tell how much memory Solr would consume and a way to cap it.
>>
>> Thanks,
>> -vivek
>>
>> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>> >
>> > Hi,
>> >
>> > Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>> >
>> > Otis
>> > --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > ----- Original Message ----
>> >> From: vivek sar
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Wednesday, May 13, 2009 4:42:16 PM
>> >> Subject: Re: Solr memory requirements?
>> >>
>> >> Thanks Otis.
>> >>
>> >> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>> >>
>> >> I got total of 25 fields (15 are indexed and stored, other 10 are just stored). All my fields are basic data type - which I thought are not sorted. My id field is unique key.
>> >>
>> >> Is there any field here that might be getting sorted?
>> >>
>> >> [the schema listing did not survive the archive - about twenty <field .../> definitions, most with omitNorms="true" and compressed="false"]
Re: Solr memory requirements?
On May 13, 2009, at 6:53 PM, vivek sar wrote: > Disabling first/new searchers did help for the initial load time, but > after 10-15 min the heap memory starts climbing up again and reaches > max within 20 min. Now the GC is running all the time, which is > slowing down the commit and search cycles. > > It's still puzzling what Solr holds in memory and doesn't release. > > I haven't been able to profile as the dump is too big. Would setting > termIndexInterval help - not sure how that can be set using Solr. It would have to be set in the same place that ramBufferSizeMB gets set, in the config, but this would require some coding (albeit pretty straightforward) to set it on the IndexWriter. I don't think it would help in profiling. Do you have warming queries? (Sorry if I missed your answer.) Also, I know you have set the heap to 8 GB. Is there a size at which it levels out? I presume you are getting OutOfMemory errors, right? Or are you just concerned about the current memory size?
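To make the suggestion above concrete, here is a hypothetical sketch of where such a knob would sit in solrconfig.xml. Note that Solr 1.4 does not expose a termIndexInterval element out of the box (that is exactly the coding being referred to), so the element name below is an assumption; the underlying setting is Lucene's IndexWriter.setTermIndexInterval(int), whose default is 128.

```xml
<!-- Hypothetical sketch only: Solr 1.4 ships no <termIndexInterval> element;
     it would have to be wired through to IndexWriter.setTermIndexInterval(int). -->
<indexDefaults>
  <ramBufferSizeMB>64</ramBufferSizeMB>
  <!-- Lucene's default interval is 128; a larger value shrinks the in-heap
       term index (.tii) at the cost of slightly slower term lookups. -->
  <termIndexInterval>256</termIndexInterval>
</indexDefaults>
```

Raising the interval only pays off on indexes with very large term dictionaries, which seems to be the case in this thread.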
Re: Solr memory requirements?
There is constant mixing of indexing concepts and searching concepts in this thread. Are you having problems on the master (indexing) or on the slave (searching)? That .tii is only 20K and you said this is a large index? That doesn't smell right... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: vivek sar > To: solr-user@lucene.apache.org > Sent: Wednesday, May 13, 2009 5:12:00 PM > Subject: Re: Solr memory requirements? > > Otis, > > In that case, I'm not sure why Solr is taking up so much memory as > soon as we start it up. I checked for .tii file and there is only one, > > -rw-r--r-- 1 search staff 20306 May 11 21:47 > ./20090510_1/data/index/_3au.tii > > I have all the cache disabled - so that shouldn't be a problem too. My > ramBuffer size is only 64MB. > > I read note on sorting, > http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see > something related to FieldCache. I don't see this as parameter defined > in either solrconfig.xml or schema.xml. Could this be something that > can load things in memory at startup? How can we disable it? > > I'm trying to find out if there is a way to tell how much memory Solr > would consume and way to cap it. > > Thanks, > -vivek > > > > > On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic > wrote: > > > > Hi, > > > > Sorting is triggered by the sort parameter in the URL, not a characteristic > > of > a field. :) > > > > Otis > > -- > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > > > - Original Message > >> From: vivek sar > >> To: solr-user@lucene.apache.org > >> Sent: Wednesday, May 13, 2009 4:42:16 PM > >> Subject: Re: Solr memory requirements? > >> > >> Thanks Otis. > >> > >> Our use case doesn't require any sorting or faceting. I'm wondering if > >> I've configured anything wrong. > >> > >> I got total of 25 fields (15 are indexed and stored, other 10 are just > >> stored). 
All my fields are basic data type - which I thought are not > >> sorted. My id field is unique key. > >> > >> Is there any field here that might be getting sorted? > >> > >> > >> required="true" omitNorms="true" compressed="false"/> > >> > >> > >> compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> default="NOW/HOUR" compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> compressed="false"/> > >> > >> compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> compressed="false"/> > >> > >> compressed="false"/> > >> > >> compressed="false"/> > >> > >> omitNorms="true" compressed="false"/> > >> > >> compressed="false"/> > >> > >> default="NOW/HOUR" omitNorms="true"/> > >> > >> > >> > >> > >> omitNorms="true" multiValued="true"/> > >> > >> Thanks, > >> -vivek > >> > >> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic > >> wrote: > >> > > >> > Hi, > >> > Some answers: > >> > 1) .tii files in the Lucene index. When you sort, all distinct values > >> > for > the > >> field(s) used for sorting. Similarly for facet fields. Solr caches. > >> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will > consume > >> during indexing. There is no need to commit every 50K d
Re: Solr memory requirements?
Yeah, I'm not sure why this would help. There should be nothing in FieldCaches unless you sort or use facets. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: vivek sar > To: solr-user@lucene.apache.org > Sent: Wednesday, May 13, 2009 5:53:45 PM > Subject: Re: Solr memory requirements? > > Just an update on the memory issue - might be useful for others. I > read the following, > > http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching) > > and looks like the first and new searcher listeners would populate the > FieldCache. Commenting out these two listener entries seems to do the > trick - at least the heap size is not growing as soon as Solr starts > up. > > I ran some searches and they all came out fine. Index rate is also > pretty good. Would there be any impact of disabling these listeners? > > Thanks, > -vivek > > On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote: > > Otis, > > > > In that case, I'm not sure why Solr is taking up so much memory as > > soon as we start it up. I checked for .tii file and there is only one, > > > > -rw-r--r-- 1 search staff 20306 May 11 21:47 > ./20090510_1/data/index/_3au.tii > > > > I have all the cache disabled - so that shouldn't be a problem too. My > > ramBuffer size is only 64MB. > > > > I read note on sorting, > > http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see > > something related to FieldCache. I don't see this as parameter defined > > in either solrconfig.xml or schema.xml. Could this be something that > > can load things in memory at startup? How can we disable it? > > > > I'm trying to find out if there is a way to tell how much memory Solr > > would consume and way to cap it. > > > > Thanks, > > -vivek > > > > > > > > > > On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic > > wrote: > >> > >> Hi, > >> > >> Sorting is triggered by the sort parameter in the URL, not a > >> characteristic > of a field. 
:) > >> > >> Otis > >> -- > >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > >> > >> > >> > >> - Original Message > >>> From: vivek sar > >>> To: solr-user@lucene.apache.org > >>> Sent: Wednesday, May 13, 2009 4:42:16 PM > >>> Subject: Re: Solr memory requirements? > >>> > >>> Thanks Otis. > >>> > >>> Our use case doesn't require any sorting or faceting. I'm wondering if > >>> I've configured anything wrong. > >>> > >>> I got total of 25 fields (15 are indexed and stored, other 10 are just > >>> stored). All my fields are basic data type - which I thought are not > >>> sorted. My id field is unique key. > >>> > >>> Is there any field here that might be getting sorted? > >>> > >>> > >>> required="true" omitNorms="true" compressed="false"/> > >>> > >>> > >>> compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> default="NOW/HOUR" compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> compressed="false"/> > >>> > >>> compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/> > >>> > >>> compressed="false"/> > >>> > >>> compressed="false"/> > >>> > >>> compressed="false"/> > >>> > >>> omitNorms="true" compressed="false"/>
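For anyone following along, these are the warming-listener entries being commented out. The shape below matches the example solrconfig.xml shipped with Solr 1.x; the queries shown are placeholders. If a warming query sorts or facets, running it at startup is exactly what populates the FieldCache.

```xml
<!-- Warming listeners, as in the stock example solrconfig.xml (queries are
     placeholders). Removing them skips warm-up, so the FieldCache is not
     populated at startup - at the cost of a slow first real query. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">fast_warm</str><str name="start">0</str><str name="rows">10</str></lst>
  </arr>
</listener>
```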
Re: Solr memory requirements?
Even a simple command like this will help: jmap -histo:live <pid> | head -30 Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: vivek sar > To: solr-user@lucene.apache.org > Sent: Wednesday, May 13, 2009 6:53:29 PM > Subject: Re: Solr memory requirements? > > Disabling first/new searchers did help for the initial load time, but > after 10-15 min the heap memory start climbing up again and reached > max within 20 min. Now the GC is coming up all the time, which is > slowing down the commit and search cycles. > > This is still puzzling what does Solr holds in the memory and doesn't release? > > I haven't been able to profile as the dump is too big. Would setting > termIndexInterval help - not sure how can that be set using Solr. > > Some other query properties under solrconfig, > > > 1024 > true > 50 > 200 > > false > 2 > > > Currently, I got 800 million documents and have specified 8G heap size. > > Any other suggestion on what can I do to control the Solr memory consumption? > > Thanks, > -vivek > > On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote: > > Just an update on the memory issue - might be useful for others. I > > read the following, > > > > http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching) > > > > and looks like the first and new searcher listeners would populate the > > FieldCache. Commenting out these two listener entries seems to do the > > trick - at least the heap size is not growing as soon as Solr starts > > up. > > > > I ran some searches and they all came out fine. Index rate is also > > pretty good. Would there be any impact of disabling these listeners? > > > > Thanks, > > -vivek > > > > On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote: > >> Otis, > >> > >> In that case, I'm not sure why Solr is taking up so much memory as > >> soon as we start it up. 
I checked for .tii file and there is only one, > >> > >> -rw-r--r-- 1 search staff 20306 May 11 21:47 > ./20090510_1/data/index/_3au.tii > >> > >> I have all the cache disabled - so that shouldn't be a problem too. My > >> ramBuffer size is only 64MB. > >> > >> I read note on sorting, > >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see > >> something related to FieldCache. I don't see this as parameter defined > >> in either solrconfig.xml or schema.xml. Could this be something that > >> can load things in memory at startup? How can we disable it? > >> > >> I'm trying to find out if there is a way to tell how much memory Solr > >> would consume and way to cap it. > >> > >> Thanks, > >> -vivek > >> > >> > >> > >> > >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic > >> wrote: > >>> > >>> Hi, > >>> > >>> Sorting is triggered by the sort parameter in the URL, not a > >>> characteristic > of a field. :) > >>> > >>> Otis > >>> -- > >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > >>> > >>> > >>> > >>> - Original Message > >>>> From: vivek sar > >>>> To: solr-user@lucene.apache.org > >>>> Sent: Wednesday, May 13, 2009 4:42:16 PM > >>>> Subject: Re: Solr memory requirements? > >>>> > >>>> Thanks Otis. > >>>> > >>>> Our use case doesn't require any sorting or faceting. I'm wondering if > >>>> I've configured anything wrong. > >>>> > >>>> I got total of 25 fields (15 are indexed and stored, other 10 are just > >>>> stored). All my fields are basic data type - which I thought are not > >>>> sorted. My id field is unique key. > >>>> > >>>> Is there any field here that might be getting sorted? > >>>> > >>>> > >>>> required="true" omitNorms="true" compressed="false"/> > >>>> > >>>> > >>>> compressed="false"/> > >>>> > >>>> omitNorms="true" compressed="false"/> > >>>> > >>>> omitNorms="true" compressed="false"/> > >>>> > >>>> omitNorms="true" compressed="
Re: Solr memory requirements?
Warning: I'm way out of my competency range when I comment on SOLR, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type. Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them. Best, Erick On Wed, May 13, 2009 at 4:42 PM, vivek sar wrote: > Thanks Otis. > > Our use case doesn't require any sorting or faceting. I'm wondering if > I've configured anything wrong. > > I got total of 25 fields (15 are indexed and stored, other 10 are just > stored). All my fields are basic data type - which I thought are not > sorted. My id field is unique key. > > Is there any field here that might be getting sorted? > > required="true" omitNorms="true" compressed="false"/> > >compressed="false"/> >omitNorms="true" compressed="false"/> >omitNorms="true" compressed="false"/> >omitNorms="true" compressed="false"/> >default="NOW/HOUR" compressed="false"/> >omitNorms="true" compressed="false"/> >omitNorms="true" compressed="false"/> >compressed="false"/> >compressed="false"/> >omitNorms="true" compressed="false"/> >omitNorms="true" compressed="false"/> >omitNorms="true" compressed="false"/> >omitNorms="true" compressed="false"/> >omitNorms="true" compressed="false"/> >compressed="false"/> >compressed="false"/> >compressed="false"/> >omitNorms="true" compressed="false"/> >compressed="false"/> >default="NOW/HOUR" omitNorms="true"/> > > > >omitNorms="true" multiValued="true"/> > > Thanks, > -vivek > > On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic > wrote: > > > > Hi, > > Some answers: > > 1) .tii files in the Lucene index. When you sort, all distinct values > for the field(s) used for sorting. Similarly for facet fields. Solr > caches. > > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will > consume during indexing. 
There is no need to commit every 50K docs unless > you want to trigger snapshot creation. > > 3) see 1) above > > > > 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's > going to fly. :) > > > > Otis > > -- > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > > > > > - Original Message > >> From: vivek sar > >> To: solr-user@lucene.apache.org > >> Sent: Wednesday, May 13, 2009 3:04:46 PM > >> Subject: Solr memory requirements? > >> > >> Hi, > >> > >> I'm pretty sure this has been asked before, but I couldn't find a > >> complete answer in the forum archive. Here are my questions, > >> > >> 1) When solr starts up what does it loads up in the memory? Let's say > >> I've 4 cores with each core 50G in size. When Solr comes up how much > >> of it would be loaded in memory? > >> > >> 2) How much memory is required during index time? If I'm committing > >> 50K records at a time (1 record = 1KB) using solrj, how much memory do > >> I need to give to Solr. > >> > >> 3) Is there a minimum memory requirement by Solr to maintain a certain > >> size index? Is there any benchmark on this? > >> > >> Here are some of my configuration from solrconfig.xml, > >> > >> 1) 64 > >> 2) All the caches (under query tag) are commented out > >> 3) Few others, > >> a) true==> > >> would this require memory? > >> b) 50 > >> c) 200 > >> d) > >> e) false > >> f) 2 > >> > >> The problem we are having is following, > >> > >> I've given Solr RAM of 6G. As the total index size (all cores > >> combined) start growing the Solr memory consumption goes up. With 800 > >> million documents, I see Solr already taking up all the memory at > >> startup. After that the commits, searches everything become slow. We > >> will be having distributed setup with multiple Solr instances (around > >> 8) on four boxes, but our requirement is to have each Solr instance at > >> least maintain around 1.5 billion documents. 
> >> > >> We are trying to see if we can somehow reduce the Solr memory > >> footprint. If someone can provide a pointer on what parameters affect > >> memory and what effects it has we can then decide whether we want that > >> parameter or not. I'm not sure if there is any minimum Solr > >> requirement for it to be able maintain large indexes. I've used Lucene > >> before and that didn't require anything by default - it used up memory > >> only during index and search times - not otherwise. > >> > >> Any help is very much appreciated. > >> > >> Thanks, > >> -vivek > > > > >
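As a back-of-envelope sketch (not from the thread), here is why heap grows with document count once per-document structures get involved: the FieldCache holds one entry per document for each field used in sorting or faceting, and Lucene norms cost one byte per document per field that has them. The entry sizes below are rough assumptions, but at 800 million documents even small per-document costs reach gigabytes.

```python
def fieldcache_bytes(num_docs: int, bytes_per_entry: int) -> int:
    """Rough FieldCache footprint: one entry per document for a field used
    in sorting or faceting (entry size is an assumption, e.g. 8 bytes for
    a long/date field)."""
    return num_docs * bytes_per_entry

def norms_bytes(num_docs: int, fields_with_norms: int) -> int:
    """Lucene norms: one byte per document per field that has norms."""
    return num_docs * fields_with_norms

docs = 800_000_000
print(f"sort on one 8-byte field: {fieldcache_bytes(docs, 8) / 2**30:.1f} GiB")
print(f"norms on one field:       {norms_bytes(docs, 1) / 2**30:.1f} GiB")
```

This is consistent with the observation elsewhere in the thread that a 6G heap fills as soon as per-document data is touched, and with why omitNorms=true and avoiding sorts matter at this scale.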
Re: Solr memory requirements?
I think maxBufferedDocs has been deprecated in Solr 1.4 - it's recommended to use ramBufferSizeMB instead. My ramBufferSizeMB=64. This shouldn't be a problem I think. There has to be something else that Solr is holding up in memory. Anyone else? Thanks, -vivek On Wed, May 13, 2009 at 4:01 PM, Jack Godwin wrote: > Have you checked the maxBufferedDocs? I had to drop mine down to 1000 with > 3 million docs. > Jack > > On Wed, May 13, 2009 at 6:53 PM, vivek sar wrote: > >> Disabling first/new searchers did help for the initial load time, but >> after 10-15 min the heap memory start climbing up again and reached >> max within 20 min. Now the GC is coming up all the time, which is >> slowing down the commit and search cycles. >> >> This is still puzzling what does Solr holds in the memory and doesn't >> release? >> >> I haven't been able to profile as the dump is too big. Would setting >> termIndexInterval help - not sure how can that be set using Solr. >> >> Some other query properties under solrconfig, >> >> >> 1024 >> true >> 50 >> 200 >> >> false >> 2 >> >> >> Currently, I got 800 million documents and have specified 8G heap size. >> >> Any other suggestion on what can I do to control the Solr memory >> consumption? >> >> Thanks, >> -vivek >> >> On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote: >> > Just an update on the memory issue - might be useful for others. I >> > read the following, >> > >> > http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching) >> > >> > and looks like the first and new searcher listeners would populate the >> > FieldCache. Commenting out these two listener entries seems to do the >> > trick - at least the heap size is not growing as soon as Solr starts >> > up. >> > >> > I ran some searches and they all came out fine. Index rate is also >> > pretty good. Would there be any impact of disabling these listeners? 
>> > >> > Thanks, >> > -vivek >> > >> > On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote: >> >> Otis, >> >> >> >> In that case, I'm not sure why Solr is taking up so much memory as >> >> soon as we start it up. I checked for .tii file and there is only one, >> >> >> >> -rw-r--r-- 1 search staff 20306 May 11 21:47 >> ./20090510_1/data/index/_3au.tii >> >> >> >> I have all the cache disabled - so that shouldn't be a problem too. My >> >> ramBuffer size is only 64MB. >> >> >> >> I read note on sorting, >> >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see >> >> something related to FieldCache. I don't see this as parameter defined >> >> in either solrconfig.xml or schema.xml. Could this be something that >> >> can load things in memory at startup? How can we disable it? >> >> >> >> I'm trying to find out if there is a way to tell how much memory Solr >> >> would consume and way to cap it. >> >> >> >> Thanks, >> >> -vivek >> >> >> >> >> >> >> >> >> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic >> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> Sorting is triggered by the sort parameter in the URL, not a >> characteristic of a field. :) >> >>> >> >>> Otis >> >>> -- >> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >>> >> >>> >> >>> >> >>> - Original Message >> >>>> From: vivek sar >> >>>> To: solr-user@lucene.apache.org >> >>>> Sent: Wednesday, May 13, 2009 4:42:16 PM >> >>>> Subject: Re: Solr memory requirements? >> >>>> >> >>>> Thanks Otis. >> >>>> >> >>>> Our use case doesn't require any sorting or faceting. I'm wondering if >> >>>> I've configured anything wrong. >> >>>> >> >>>> I got total of 25 fields (15 are indexed and stored, other 10 are just >> >>>> stored). All my fields are basic data type - which I thought are not >> >>>> sorted. My id field is unique key. >> >>>> >> >>>> Is there any fie
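A sketch of the two flush triggers being compared above; the values are the ones mentioned in the thread, and the element names match the <indexDefaults> section of a Solr 1.4 solrconfig.xml:

```xml
<indexDefaults>
  <!-- Preferred in Solr 1.4: flush the in-memory indexing buffer by size. -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
  <!-- Older doc-count trigger (what Jack tuned down to 1000); deprecated
       in favor of ramBufferSizeMB. If both are set, whichever limit is
       hit first causes the flush. -->
  <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
</indexDefaults>
```

Note that either trigger only bounds indexing-time memory; it does not explain a heap that keeps growing during searches.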
Re: Solr memory requirements?
Have you checked the maxBufferedDocs? I had to drop mine down to 1000 with 3 million docs. Jack On Wed, May 13, 2009 at 6:53 PM, vivek sar wrote: > Disabling first/new searchers did help for the initial load time, but > after 10-15 min the heap memory start climbing up again and reached > max within 20 min. Now the GC is coming up all the time, which is > slowing down the commit and search cycles. > > This is still puzzling what does Solr holds in the memory and doesn't > release? > > I haven't been able to profile as the dump is too big. Would setting > termIndexInterval help - not sure how can that be set using Solr. > > Some other query properties under solrconfig, > > > 1024 > true > 50 > 200 > > false > 2 > > > Currently, I got 800 million documents and have specified 8G heap size. > > Any other suggestion on what can I do to control the Solr memory > consumption? > > Thanks, > -vivek > > On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote: > > Just an update on the memory issue - might be useful for others. I > > read the following, > > > > http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching) > > > > and looks like the first and new searcher listeners would populate the > > FieldCache. Commenting out these two listener entries seems to do the > > trick - at least the heap size is not growing as soon as Solr starts > > up. > > > > I ran some searches and they all came out fine. Index rate is also > > pretty good. Would there be any impact of disabling these listeners? > > > > Thanks, > > -vivek > > > > On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote: > >> Otis, > >> > >> In that case, I'm not sure why Solr is taking up so much memory as > >> soon as we start it up. I checked for .tii file and there is only one, > >> > >> -rw-r--r-- 1 search staff 20306 May 11 21:47 > ./20090510_1/data/index/_3au.tii > >> > >> I have all the cache disabled - so that shouldn't be a problem too. My > >> ramBuffer size is only 64MB. 
> >> > >> I read note on sorting, > >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see > >> something related to FieldCache. I don't see this as parameter defined > >> in either solrconfig.xml or schema.xml. Could this be something that > >> can load things in memory at startup? How can we disable it? > >> > >> I'm trying to find out if there is a way to tell how much memory Solr > >> would consume and way to cap it. > >> > >> Thanks, > >> -vivek > >> > >> > >> > >> > >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic > >> wrote: > >>> > >>> Hi, > >>> > >>> Sorting is triggered by the sort parameter in the URL, not a > characteristic of a field. :) > >>> > >>> Otis > >>> -- > >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > >>> > >>> > >>> > >>> - Original Message > >>>> From: vivek sar > >>>> To: solr-user@lucene.apache.org > >>>> Sent: Wednesday, May 13, 2009 4:42:16 PM > >>>> Subject: Re: Solr memory requirements? > >>>> > >>>> Thanks Otis. > >>>> > >>>> Our use case doesn't require any sorting or faceting. I'm wondering if > >>>> I've configured anything wrong. > >>>> > >>>> I got total of 25 fields (15 are indexed and stored, other 10 are just > >>>> stored). All my fields are basic data type - which I thought are not > >>>> sorted. My id field is unique key. > >>>> > >>>> Is there any field here that might be getting sorted? > >>>> > >>>> > >>>> required="true" omitNorms="true" compressed="false"/> > >>>> > >>>> > >>>> compressed="false"/> > >>>> > >>>> omitNorms="true" compressed="false"/> > >>>> > >>>> omitNorms="true" compressed="false"/> > >>>> > >>>> omitNorms="true" compressed="false"/> > >>>> > >>>> default="NOW/HOUR" compressed="false"/> > >>>> > >>>> omitNorms="true&qu
Re: Solr memory requirements?
Disabling first/new searchers did help for the initial load time, but after 10-15 min the heap memory starts climbing up again and reaches max within 20 min. Now the GC is running all the time, which is slowing down the commit and search cycles. It's still puzzling what Solr holds in memory and doesn't release. I haven't been able to profile as the dump is too big. Would setting termIndexInterval help - not sure how that can be set using Solr. Some other query properties under solrconfig, 1024 true 50 200 false 2 Currently, I have 800 million documents and have specified an 8G heap size. Any other suggestions on what I can do to control the Solr memory consumption? Thanks, -vivek On Wed, May 13, 2009 at 2:53 PM, vivek sar wrote: > Just an update on the memory issue - might be useful for others. I > read the following, > > http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching) > > and looks like the first and new searcher listeners would populate the > FieldCache. Commenting out these two listener entries seems to do the > trick - at least the heap size is not growing as soon as Solr starts > up. > > I ran some searches and they all came out fine. Index rate is also > pretty good. Would there be any impact of disabling these listeners? > > Thanks, > -vivek > > On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote: >> Otis, >> >> In that case, I'm not sure why Solr is taking up so much memory as >> soon as we start it up. I checked for .tii file and there is only one, >> >> -rw-r--r-- 1 search staff 20306 May 11 21:47 >> ./20090510_1/data/index/_3au.tii >> >> I have all the cache disabled - so that shouldn't be a problem too. My >> ramBuffer size is only 64MB. >> >> I read note on sorting, >> http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see >> something related to FieldCache. I don't see this as parameter defined >> in either solrconfig.xml or schema.xml. Could this be something that >> can load things in memory at startup? 
How can we disable it? >> >> I'm trying to find out if there is a way to tell how much memory Solr >> would consume and way to cap it. >> >> Thanks, >> -vivek >> >> >> >> >> On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic >> wrote: >>> >>> Hi, >>> >>> Sorting is triggered by the sort parameter in the URL, not a characteristic >>> of a field. :) >>> >>> Otis >>> -- >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> >>> >>> >>> - Original Message >>>> From: vivek sar >>>> To: solr-user@lucene.apache.org >>>> Sent: Wednesday, May 13, 2009 4:42:16 PM >>>> Subject: Re: Solr memory requirements? >>>> >>>> Thanks Otis. >>>> >>>> Our use case doesn't require any sorting or faceting. I'm wondering if >>>> I've configured anything wrong. >>>> >>>> I got total of 25 fields (15 are indexed and stored, other 10 are just >>>> stored). All my fields are basic data type - which I thought are not >>>> sorted. My id field is unique key. >>>> >>>> Is there any field here that might be getting sorted? >>>> >>>> >>>> required="true" omitNorms="true" compressed="false"/> >>>> >>>> >>>> compressed="false"/> >>>> >>>> omitNorms="true" compressed="false"/> >>>> >>>> omitNorms="true" compressed="false"/> >>>> >>>> omitNorms="true" compressed="false"/> >>>> >>>> default="NOW/HOUR" compressed="false"/> >>>> >>>> omitNorms="true" compressed="false"/> >>>> >>>> omitNorms="true" compressed="false"/> >>>> >>>> compressed="false"/> >>>> >>>> compressed="false"/> >>>> >>>> omitNorms="true" compressed="false"/> >>>> >>>> omitNorms="true" compressed="false"/> >>>> >>>> omitNorms="true" compressed="false"/> >>>> >>>> omitNorms="true" compressed="false"/> >>>> >>>> omitNorms="true" co
Re: Solr memory requirements?
Just an update on the memory issue - might be useful for others. I read the following, http://wiki.apache.org/solr/SolrCaching?highlight=(SolrCaching) and looks like the first and new searcher listeners would populate the FieldCache. Commenting out these two listener entries seems to do the trick - at least the heap size is not growing as soon as Solr starts up. I ran some searches and they all came out fine. Index rate is also pretty good. Would there be any impact of disabling these listeners? Thanks, -vivek On Wed, May 13, 2009 at 2:12 PM, vivek sar wrote: > Otis, > > In that case, I'm not sure why Solr is taking up so much memory as > soon as we start it up. I checked for .tii file and there is only one, > > -rw-r--r-- 1 search staff 20306 May 11 21:47 > ./20090510_1/data/index/_3au.tii > > I have all the cache disabled - so that shouldn't be a problem too. My > ramBuffer size is only 64MB. > > I read note on sorting, > http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see > something related to FieldCache. I don't see this as parameter defined > in either solrconfig.xml or schema.xml. Could this be something that > can load things in memory at startup? How can we disable it? > > I'm trying to find out if there is a way to tell how much memory Solr > would consume and way to cap it. > > Thanks, > -vivek > > > > > On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic > wrote: >> >> Hi, >> >> Sorting is triggered by the sort parameter in the URL, not a characteristic >> of a field. :) >> >> Otis >> -- >> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >> >> >> >> - Original Message >>> From: vivek sar >>> To: solr-user@lucene.apache.org >>> Sent: Wednesday, May 13, 2009 4:42:16 PM >>> Subject: Re: Solr memory requirements? >>> >>> Thanks Otis. >>> >>> Our use case doesn't require any sorting or faceting. I'm wondering if >>> I've configured anything wrong. 
>>> >>> I got total of 25 fields (15 are indexed and stored, other 10 are just >>> stored). All my fields are basic data type - which I thought are not >>> sorted. My id field is unique key. >>> >>> Is there any field here that might be getting sorted? >>> >>> >>> required="true" omitNorms="true" compressed="false"/> >>> >>> >>> compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> default="NOW/HOUR" compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> compressed="false"/> >>> >>> compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> compressed="false"/> >>> >>> compressed="false"/> >>> >>> compressed="false"/> >>> >>> omitNorms="true" compressed="false"/> >>> >>> compressed="false"/> >>> >>> default="NOW/HOUR" omitNorms="true"/> >>> >>> >>> >>> >>> omitNorms="true" multiValued="true"/> >>> >>> Thanks, >>> -vivek >>> >>> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic >>> wrote: >>> > >>> > Hi, >>> > Some answers: >>> > 1) .tii files in the Lucene index. When you sort, all distinct values >>> > for the >>> field(s) used for sorting. Similarly for facet fields. Solr caches. >>> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will >>> > consume >>> during indexing. There is no need to commit every 50K docs unless you want >>> to >>> trigger
Re: Solr memory requirements?
Have you done any profiling to see where the hotspots are? I realize that may be difficult on an index of that size, but maybe you can approximate on a smaller version. Also, do you have warming queries?

You might also look into setting the termIndexInterval at the Lucene level. This is not currently exposed in Solr (AFAIK), but likely could be added fairly easily as part of the index parameters.
http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/IndexWriter.html#setTermIndexInterval(int)

-Grant

On May 13, 2009, at 5:12 PM, vivek sar wrote:

> Otis,
>
> In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for the .tii file and there is only one,
>
> -rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii
>
> I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.
>
> I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this parameter defined in either solrconfig.xml or schema.xml. Could this be something that loads things into memory at startup? How can we disable it?
>
> I'm trying to find out if there is a way to tell how much memory Solr would consume and a way to cap it.
>
> Thanks,
> -vivek
>
> [rest of quoted thread snipped; the full messages appear later in this digest]
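The termIndexInterval knob Grant mentions controls how much of the .tii term dictionary is held on the heap: only every Nth term is loaded into RAM, so raising the interval shrinks the resident term index at a small cost in term-lookup speed. A back-of-the-envelope sketch - the term count and the ~30-byte per-entry cost below are illustrative assumptions, not measurements from this index:

```java
// Rough heap estimate for the in-memory term index (.tii).
// Only every termIndexInterval-th term is resident, so the entry
// count is uniqueTerms / interval. The 30-byte per-entry cost is an
// assumed average (term bytes + file pointers), not a Lucene constant.
public class TermIndexEstimate {

    static long heapBytes(long uniqueTerms, int termIndexInterval) {
        long entriesInRam = uniqueTerms / termIndexInterval;
        return entriesInRam * 30L; // assumed ~30 bytes per resident entry
    }

    public static void main(String[] args) {
        long uniqueTerms = 500_000_000L; // hypothetical index-wide term count
        System.out.println(heapBytes(uniqueTerms, 128));  // Lucene's default interval
        System.out.println(heapBytes(uniqueTerms, 1024)); // 8x fewer resident entries
    }
}
```

Under these assumed numbers, moving the interval from 128 to 1024 drops the resident term index from roughly 112 MB to about 14 MB per half-billion terms; whether the extra seek cost matters depends on the query mix.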
Re: Solr memory requirements?
Otis,

In that case, I'm not sure why Solr is taking up so much memory as soon as we start it up. I checked for the .tii file and there is only one,

-rw-r--r-- 1 search staff 20306 May 11 21:47 ./20090510_1/data/index/_3au.tii

I have all the caches disabled - so that shouldn't be a problem either. My ramBuffer size is only 64MB.

I read the note on sorting, http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see something related to FieldCache. I don't see this parameter defined in either solrconfig.xml or schema.xml. Could this be something that loads things into memory at startup? How can we disable it?

I'm trying to find out if there is a way to tell how much memory Solr would consume and a way to cap it.

Thanks,
-vivek

On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic wrote:
>
> Hi,
>
> Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> [rest of quoted thread snipped; the full messages appear later in this digest]
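The profiling result elsewhere in this thread (ReadOnlySegmentReader holding most of the heap via norms) has a simple cost model: pre-4.x Lucene keeps one byte per document for each indexed field that has norms, and as Peter notes, norms already written into segments survive a later omitNorms="true" schema change until the index is rebuilt. A sketch with assumed counts - the 800M document figure comes from the thread, the field count is hypothetical:

```java
// Norms heap cost: one byte per document for each indexed field that
// still carries norms. Fields indexed before omitNorms="true" was
// added keep their norms until the index is wiped and rebuilt.
public class NormsEstimate {

    static long normsBytes(long maxDoc, int fieldsWithNorms) {
        return maxDoc * fieldsWithNorms; // 1 byte per doc per normed field
    }

    public static void main(String[] args) {
        // 800M docs and, hypothetically, 8 fields with leftover norms:
        System.out.println(normsBytes(800_000_000L, 8)); // 6,400,000,000 bytes
    }
}
```

At this scale a handful of leftover normed fields is enough to fill the entire 6G heap on its own, which is consistent with what the heap dump shows.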
Re: Solr memory requirements?
Hi,

Sorting is triggered by the sort parameter in the URL, not a characteristic of a field. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: vivek sar
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 4:42:16 PM
> Subject: Re: Solr memory requirements?
>
> Thanks Otis.
>
> Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.
>
> I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types, which I thought are not sorted. My id field is the unique key.
>
> Is there any field here that might be getting sorted?
>
> [field definitions mangled in the archive; rest of quoted thread snipped]
>
> Thanks,
> -vivek
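Otis's point about the sort parameter translates directly into heap: the first query that sorts on a field populates a Lucene FieldCache entry with one slot per document in the index, regardless of how many documents the query matches, and the entry lives as long as the index reader. A sketch under assumed sizes - 4 bytes per document is right for an int value (or ord) array; sorting on a string field additionally holds the distinct term values:

```java
// FieldCache cost of sorting on one field: an in-heap array with one
// slot per document in the index. 4 bytes/doc models an int value or
// ord array; the doc count is taken from the thread.
public class FieldCacheEstimate {

    static long sortCacheBytes(long maxDoc, int bytesPerDoc) {
        return maxDoc * bytesPerDoc;
    }

    public static void main(String[] args) {
        // 800M docs, one int sort field:
        System.out.println(sortCacheBytes(800_000_000L, 4)); // 3,200,000,000 bytes
    }
}
```

This is also why the cache is per sorted field: two different sort fields mean two such arrays, built on first use and not evicted until the reader closes.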
Re: Solr memory requirements?
Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering if I've configured anything wrong.

I got a total of 25 fields (15 are indexed and stored, the other 10 are just stored). All my fields are basic data types, which I thought are not sorted. My id field is the unique key.

Is there any field here that might be getting sorted?

Thanks,
-vivek

On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic wrote:
>
> Hi,
> Some answers:
> 1) .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting. Similarly for facet fields. Solr caches.
> 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
> 3) see 1) above
>
> 1.5 billion docs per instance where each doc is cca 1KB? I doubt that's going to fly. :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: vivek sar
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, May 13, 2009 3:04:46 PM
>> Subject: Solr memory requirements?
>>
>> Hi,
>>
>> I'm pretty sure this has been asked before, but I couldn't find a complete answer in the forum archives. Here are my questions,
>>
>> 1) When Solr starts up, what does it load into memory? Let's say I've 4 cores, each 50G in size. When Solr comes up, how much of it would be loaded into memory?
>>
>> 2) How much memory is required during index time? If I'm committing 50K records at a time (1 record = 1KB) using solrj, how much memory do I need to give to Solr?
>>
>> 3) Is there a minimum memory requirement for Solr to maintain a certain size index? Is there any benchmark on this?
>>
>> Here are some of my configurations from solrconfig.xml,
>>
>> 1) 64
>> 2) All the caches (under the query tag) are commented out
>> 3) Few others,
>>   a) true ==> would this require memory?
>>   b) 50
>>   c) 200
>>   d)
>>   e) false
>>   f) 2
>>
>> The problem we are having is the following,
>>
>> I've given Solr RAM of 6G. As the total index size (all cores combined) starts growing, the Solr memory consumption goes up. With 800 million documents, I see Solr already taking up all the memory at startup. After that, commits, searches - everything becomes slow. We will be having a distributed setup with multiple Solr instances (around 8) on four boxes, but our requirement is to have each Solr instance maintain at least around 1.5 billion documents.
>>
>> We are trying to see if we can somehow reduce the Solr memory footprint. If someone can provide a pointer on what parameters affect memory and what effects they have, we can then decide whether we want that parameter or not. I'm not sure if there is any minimum Solr requirement for it to be able to maintain large indexes. I've used Lucene before and that didn't require anything by default - it used up memory only during index and search times - not otherwise.
>>
>> Any help is very much appreciated.
>>
>> Thanks,
>> -vivek
Re: Solr memory requirements?
Hi,

Some answers:
1) .tii files in the Lucene index. When you sort, all distinct values for the field(s) used for sorting. Similarly for facet fields. Solr caches.
2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will consume during indexing. There is no need to commit every 50K docs unless you want to trigger snapshot creation.
3) see 1) above

1.5 billion docs per instance where each doc is cca 1KB? I doubt that's going to fly. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: vivek sar
> To: solr-user@lucene.apache.org
> Sent: Wednesday, May 13, 2009 3:04:46 PM
> Subject: Solr memory requirements?
>
> Hi,
>
> I'm pretty sure this has been asked before, but I couldn't find a complete answer in the forum archives. Here are my questions,
>
> 1) When Solr starts up, what does it load into memory? Let's say I've 4 cores, each 50G in size. When Solr comes up, how much of it would be loaded into memory?
>
> 2) How much memory is required during index time? If I'm committing 50K records at a time (1 record = 1KB) using solrj, how much memory do I need to give to Solr?
>
> 3) Is there a minimum memory requirement for Solr to maintain a certain size index? Is there any benchmark on this?
>
> Here are some of my configurations from solrconfig.xml,
>
> 1) 64
> 2) All the caches (under the query tag) are commented out
> 3) Few others,
>   a) true ==> would this require memory?
>   b) 50
>   c) 200
>   d)
>   e) false
>   f) 2
>
> The problem we are having is the following,
>
> I've given Solr RAM of 6G. As the total index size (all cores combined) starts growing, the Solr memory consumption goes up. With 800 million documents, I see Solr already taking up all the memory at startup. After that, commits, searches - everything becomes slow. We will be having a distributed setup with multiple Solr instances (around 8) on four boxes, but our requirement is to have each Solr instance maintain at least around 1.5 billion documents.
>
> We are trying to see if we can somehow reduce the Solr memory footprint. If someone can provide a pointer on what parameters affect memory and what effects they have, we can then decide whether we want that parameter or not. I'm not sure if there is any minimum Solr requirement for it to be able to maintain large indexes. I've used Lucene before and that didn't require anything by default - it used up memory only during index and search times - not otherwise.
>
> Any help is very much appreciated.
>
> Thanks,
> -vivek
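For reference, the two index settings that survive the archive intact map onto a solrconfig.xml fragment roughly like this (Solr 1.x layout; the other element names in the question were stripped by the archive and are not guessed at here):

```xml
<!-- Sketch of the settings named explicitly in the thread (Solr 1.x) -->
<indexDefaults>
  <!-- ~64 MB buffered per core before Lucene flushes a segment -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexDefaults>

<query>
  <!-- filterCache / queryResultCache / documentCache all commented out,
       as described in the original question -->
</query>
```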