RE: Out of memory on sorting
To save memory:

1. Allocate as much memory as you can to the JVM (especially if you are on a 64-bit OS).
2. Set omitNorms="true" on your date and id fields (in fact, on every field where index-time boosting and length normalization aren't required). This requires a full reindex.
3. Are you sorting over all the documents in the index? Try to limit the result set with filter queries.
4. Avoid match-all-docs queries such as q=*:* (if you are using them).
5. If you can do without sorting on the ID field, sort on a field with fewer unique terms.

Hope this helps.
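Points 3 and 5 above could look roughly like the request built below: a filter query restricts the date range, and the sort runs on a compact numeric field instead of the string ID. This is only a sketch. The host, the core name `tweets`, and the field names `created_at` and `sort_id` are assumptions for illustration, not taken from the thread.

```python
from urllib.parse import urlencode


def build_select_url(base_url, text_query, start_date, end_date, sort_field, rows=10):
    """Build a Solr /select URL that filters by date range and sorts on one field."""
    params = [
        ("q", text_query),
        # fq narrows the candidate set before sorting (hypothetical field name).
        ("fq", f"created_at:[{start_date} TO {end_date}]"),
        ("sort", f"{sort_field} asc"),
        ("rows", str(rows)),
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/tweets",  # assumed host and core name
    "text:solr",
    "2011-05-01T00:00:00Z",
    "2011-05-19T00:00:00Z",
    "sort_id",  # hypothetical numeric sort field
)
print(url)
```

The q, fq, sort and rows parameters are standard Solr request parameters; everything else here is a placeholder to adapt to your own schema.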
Re: Out of memory on sorting
See below:

On Thu, May 19, 2011 at 9:06 AM, Rohit wrote:
> Hi Erick,
>
> My OOM problem starts when I query the core with 13217121 documents. My
> schema and other details are given below,

Hmmm, how many cores are you running and what are they doing? They all use the same memory pool, so you may be getting some carry-over. One strategy would be just to move this core to a dedicated machine.

> 1> how is your sort field defined? String? Integer? If it's a string and you
> could change it to a numeric type, you'd use a lot less memory.
>
> We primarily use two different sort criteria one is a date field and the
> other is string (id). I cannot change the "id" field as this is also the
> uniquekey for my schema.

OK, but can you use a separate field just for sorting? Populate it with a [...] and sort on that rather than on ID. This is only helpful if you can make a compact representation, e.g. an integer.

> 2> How many distinct terms? I'm guessing one/document actually, this is
> somewhat of an anti-pattern in Solr for all it's sometimes necessary.
>
> Since one of the field is a timestamp instance and the other a unique key
> all are distinct. (These are tweets happening for keyword)

Not one, but two fields where all values are distinct. I don't think the timestamp is much of a problem, though, assuming you're storing it as one of the numeric types (I'd especially make sure it was one of the Trie types, specifically "tdate", if you're going to do range queries). There are tricks for dealing with this, but your "id" field will get you a bigger bang for the buck, so concentrate on that first.

> 3> How much memory are you allocating for the JVM?
>
> I am starting solr with the following command java -Xms1024M -Xmx2048M
> -jar start.jar

Well, you can bump this higher if you're on a 64-bit OS. The other possibility is to shard your index. But really, with 13M documents this should fit on one machine. What does your statistics page tell you, especially about cache usage?
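To get a feel for why sorting on a string ID costs so much more than sorting on a numeric field, here is a back-of-the-envelope estimate. It assumes the sort cache holds one integer ordinal per document plus, for strings, one Java String per unique term. The 18-character ID length, the 40-byte per-String overhead and the 8-byte reference size are illustrative guesses, not measured values.

```python
def string_sort_cache_bytes(num_docs, unique_terms, avg_chars,
                            string_overhead=40, ref_bytes=8):
    """Rough size of a string sort cache (all constants are assumptions)."""
    # One 4-byte ordinal per document.
    ordinals = 4 * num_docs
    # Per unique term: one reference plus a String object
    # (fixed overhead + 2 bytes per UTF-16 character).
    terms = unique_terms * (ref_bytes + string_overhead + 2 * avg_chars)
    return ordinals + terms


def numeric_sort_cache_bytes(num_docs, bytes_per_value):
    """Rough size of a numeric sort cache: one primitive value per document."""
    return num_docs * bytes_per_value


NUM_DOCS = 13_217_121  # index size reported in the thread

as_string = string_sort_cache_bytes(NUM_DOCS, NUM_DOCS, avg_chars=18)
as_long = numeric_sort_cache_bytes(NUM_DOCS, 8)

print(f"string id: {as_string / 2**20:.0f} MiB")
print(f"long id:   {as_long / 2**20:.0f} MiB")
```

Under these assumptions a single string sort cache needs more than 1 GB, against roughly 100 MB for a long, which is a large share of a 2048M heap shared by several cores.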
RE: Out of memory on sorting
Hi Erick,

My OOM problem starts when I query the core with 13217121 documents. My schema and other details are given below.

> 1> how is your sort field defined? String? Integer? If it's a string and you
> could change it to a numeric type, you'd use a lot less memory.

We primarily use two different sort criteria: one is a date field and the other is a string (id). I cannot change the "id" field, as it is also the uniqueKey for my schema.

> 2> How many distinct terms? I'm guessing one/document actually, this is
> somewhat of an anti-pattern in Solr for all it's sometimes necessary.

Since one of the fields is a timestamp and the other a unique key, all values are distinct. (These are tweets matching a keyword.)

> 3> How much memory are you allocating for the JVM?

I am starting Solr with the following command:

java -Xms1024M -Xmx2048M -jar start.jar

All our test cases for moving to Solr have passed, so this is proving to be a big setback. Help would be greatly appreciated.

Regards,
Rohit
Re: Out of memory on sorting
The warming queries warm up the caches used in sorting, so just including the &sort=... parameter will warm the sort caches; the terms searched are not important. The same is true with facets...

However, I don't understand how that relates to your OOM problems. I'd expect the OOM to start happening on startup, since you'd be doing the operation that runs you out of memory on startup...

So, we need more details:
1> How is your sort field defined? String? Integer? If it's a string and you could change it to a numeric type, you'd use a lot less memory.
2> How many distinct terms? I'm guessing one per document. This is actually somewhat of an anti-pattern in Solr, although it's sometimes necessary.
3> How much memory are you allocating for the JVM?
4> What other fields are you sorting on, and how many unique values are in each? Solr Admin can help you here.

Best
Erick
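Since only the sort parameter matters for warming, one way to make the warming queries generic is to pick any fixed q and emit one query per sort you care about. A minimal sketch, assuming the sort fields are called `created_at` and `id` (hypothetical names); the q value is arbitrary and is borrowed from the example earlier in the thread:

```python
from urllib.parse import urlencode


def warming_queries(sort_fields, q="solr rocks", rows=10):
    """Return one URL-encoded query string per sort field.

    The q value does not matter for warming; the sort clause is what
    causes the sort cache for that field to be populated.
    """
    return [
        urlencode({"q": q, "start": 0, "rows": rows, "sort": f"{field} asc"})
        for field in sort_fields
    ]


queries = warming_queries(["created_at", "id"])  # hypothetical field names
for qs in queries:
    print(qs)
```

Each resulting parameter set would correspond to one entry in the queries list of a firstSearcher or newSearcher listener.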
RE: Out of memory on sorting
Thanks for pointing me in the right direction. Now I see the configuration for firstSearcher or newSearcher: the query needs to be configured in advance. In my case the q is ever-changing; users can search for anything, and the possible queries are unlimited.

How can I make this generic?

-Rohit
Re: Out of memory on sorting
Explicit Warming of Sort Fields

If you do a lot of field-based sorting, it is advantageous to add explicit warming queries to the "newSearcher" and "firstSearcher" event listeners in your solrconfig which sort on those fields, so the FieldCache is populated prior to any queries being executed by your users.

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">solr rocks</str>
          <str name="start">0</str>
          <str name="rows">10</str>
          <str name="sort">empID asc</str>
        </lst>
      </arr>
    </listener>
Out of memory on sorting
Hi,

We are moving to a multi-core Solr installation, with each core holding millions of documents; documents will also be added to the index on an hourly basis. Everything seems to run fine and I am getting the expected results and performance, except where sorting is concerned.

I have an index of 13217121 documents. When I fetch documents between two dates and then sort them by ID, Solr runs out of memory. This is with just me using the system; we might also have simultaneous users. How can I improve this performance?

Rohit