RE: Out of memory on Solr sorting
Hi all, I seem to have found the solution to this problem. Apparently, allocating enough virtual memory on the server only solves half of the problem. Even after allocating 4 GB of virtual memory to the JBoss server, I still got the out-of-memory error on sorting. I hadn't noticed, however, that the LRU cache in my config was still at its default, which was 512 MB of max memory. Once I increased that to around 2 GB, sorting worked perfectly. Even though I am satisfied that I have found the solution, I am still not satisfied knowing that sorting consumes so much memory. In no other product have I seen sorting on 10 fields take up a gigabyte and a half of virtual memory. Perhaps there could be a better implementation of this, but something doesn't seem right to me. Thanks for all your support. It has truly been overwhelming. Sundar

From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: RE: Out of memory on Solr sorting Date: Tue, 29 Jul 2008 10:43:05 -0700

A sneaky source of OutOfMemory errors is the permanent generation. If you add -XX:PermSize=64m -XX:MaxPermSize=96m, you will increase the size of the permanent generation. We found this helped. Also note that when you undeploy a war file, the old deployment holds permanent-generation storage that is not reclaimed, so each undeploy/redeploy cycle eats into the permanent generation pool. [...]
RE: Out of memory on Solr sorting
Hi Sundar, if increasing the LRU cache helps you, you are probably using a 'tokenized' field for sorting (could you confirm, please?). You should use a 'non-tokenized, single-valued, non-boolean' field for better sorting performance. Fuad Efendi == http://www.tokenizer.org

Quoting sundar shankar [EMAIL PROTECTED]: Hi all, I seem to have found the solution to this problem. Apparently, allocating enough virtual memory on the server only solves half of the problem. [...]
RE: Out of memory on Solr sorting
The field is of type text_ws. Is this not recommended? Should I use text instead?

Date: Tue, 5 Aug 2008 10:58:35 -0700 From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: RE: Out of memory on Solr sorting

Hi Sundar, if increasing the LRU cache helps you, you are probably using a 'tokenized' field for sorting (could you confirm, please?). You should use a 'non-tokenized, single-valued, non-boolean' field for better sorting performance. Fuad Efendi == http://www.tokenizer.org [...]
RE: Out of memory on Solr sorting
My understanding of Lucene sorting is that it sorts by 'tokens' and not by 'full fields', so for sorting you need a 'full-string' (non-tokenized) field, and for searching you need another, tokenized one. For instance, use 'string' for sorting and 'text_ws' for search, and use 'copyField' to populate one from the other (copyField costs some memory). Sorting on a tokenized field: with 100,000 documents whose 'Book Title' fields consist of 10 tokens each on average, you get a total of roughly 1,000,000 (probably unique) tokens in a hashtable; with a non-tokenized field you get only 100,000 entries, and Lucene's internal FieldCache is used instead of a Solr LRU cache. Also, with tokenized fields the 'sorting' is not natural (alphabetical) order. Fuad Efendi == http://www.linkedin.com/in/liferay

Quoting sundar shankar [EMAIL PROTECTED]: The field is of type text_ws. Is this not recommended? Should I use text instead? [...]
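As an illustration of the copyField arrangement described above, a minimal schema.xml sketch; the field names title and title_sort are hypothetical, not from the thread:

    <!-- tokenized field used for searching -->
    <field name="title" type="text_ws" indexed="true" stored="true"/>
    <!-- non-tokenized, single-valued field used only for sorting -->
    <field name="title_sort" type="string" indexed="true" stored="false"/>
    <!-- populate the sort field automatically at index time -->
    <copyField source="title" dest="title_sort"/>

Queries would then search on title but sort on title_sort (e.g. sort=title_sort asc).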
RE: Out of memory on Solr sorting
The best choice for a sorting field is something like this:

<!-- This is an example of using the KeywordTokenizer along with various
     TokenFilterFactories to produce a sortable field that does not include
     some properties of the source text -->
<fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">

- case-insensitive, etc... I might be partially wrong about the Solr LRU cache, but it is used somehow in your specific case... 'filterCache' is probably used for 'tokenized' sorting: it stores (token, DocList) pairs. Fuad Efendi == http://www.tokenizer.org

Quoting Fuad Efendi [EMAIL PROTECTED]: My understanding of Lucene sorting is that it sorts by 'tokens' and not by 'full fields', so for sorting you need a 'full-string' (non-tokenized) field, and for searching you need another, tokenized one. [...]
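For completeness, the alphaOnlySort type ships in the Solr example schema and is defined roughly as follows (reproduced from memory of the 1.3 example schema, so verify against your own copy):

    <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <!-- KeywordTokenizer keeps the entire field value as a single token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <!-- lower-case everything so sorting is case-insensitive -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- strip leading and trailing whitespace -->
        <filter class="solr.TrimFilterFactory"/>
        <!-- drop non-alphabetic characters from the sort key -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
      </analyzer>
    </fieldType>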
Re: Out of memory on Solr sorting
On Tue, Aug 5, 2008 at 1:59 PM, Fuad Efendi [EMAIL PROTECTED] wrote: If increasing LRU cache helps you: - you are probably using 'tokenized' field for sorting (could you confirm please?)... Sorting does not utilize any Solr caches. -Yonik
Re: Out of memory on Solr sorting
I know, and this is strange... I was guessing the filterCache is used implicitly to get a DocSet per token; as Sundar wrote, increasing the LRUCache helped him (he is sorting on a 'text_ws' field). -Fuad

If increasing LRU cache helps you: you are probably using a 'tokenized' field for sorting (could you confirm, please?)... Sorting does not utilize any Solr caches. -Yonik
RE: Out of memory on Solr sorting
Yes, this is what I did. I got an out-of-memory error while executing a query with a sort param.

1. Stopped the JBoss server.
2. Edited the caches in solrconfig.xml:

<filterCache class="solr.LRUCache" size="2048" initialSize="512" autowarmCount="256"/>

<!-- queryResultCache caches results of searches - ordered lists of document ids (DocList)
     based on a query, a sort, and the range of documents requested. -->
<queryResultCache class="solr.LRUCache" size="2048" initialSize="512" autowarmCount="256"/>

<!-- documentCache caches Lucene Document objects (the stored fields for each document).
     Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
<documentCache class="solr.LRUCache" size="2048" initialSize="512" autowarmCount="0"/>

In these 3 params, I changed size from 512 to 2048.
3. Restarted the server.
4. Ran the query again. It worked just fine after that.

I am currently reindexing, replacing text_ws with string and keeping the default size of 512 for all 3 caches, to see if the problem goes away. -Sundar

Date: Tue, 5 Aug 2008 14:05:05 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Out of memory on Solr sorting

I know, and this is strange... I was guessing the filterCache is used implicitly to get a DocSet per token; as Sundar wrote, increasing the LRUCache helped him (he is sorting on a 'text_ws' field). -Fuad [...]
RE: Out of memory on Solr sorting
Sundar, it is very strange that increasing the size/initialSize of the LRUCache helps with the OutOfMemoryError... 2048 is the number of entries in the cache, _not_ 2 GB of memory... Making size == initialSize for the HashMap-based LRUCache would help with performance in any case, and maybe with OOMs (probably because there is no need to resize the HashMap...).

In these 3 params, I changed size from 512 to 2048. 3. Restarted the server [...]
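A sketch of what this suggests in solrconfig.xml - size equal to initialSize, so the backing HashMap is allocated at full capacity up front. The counts are entry counts, as noted above, and purely illustrative:

    <filterCache class="solr.LRUCache" size="2048" initialSize="2048" autowarmCount="256"/>
    <queryResultCache class="solr.LRUCache" size="2048" initialSize="2048" autowarmCount="256"/>
    <documentCache class="solr.LRUCache" size="2048" initialSize="2048" autowarmCount="0"/>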
RE: Out of memory on Solr sorting
Oh wow, I didn't know that was the case. I am completely baffled now. Back to square one, I guess. :)

Date: Tue, 5 Aug 2008 14:31:28 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: RE: Out of memory on Solr sorting

Sundar, it is very strange that increasing the size/initialSize of the LRUCache helps with the OutOfMemoryError... 2048 is the number of entries in the cache, _not_ 2 GB of memory... [...]
RE: Out of memory on Solr sorting
A sneaky source of OutOfMemory errors is the permanent generation. If you add this:

-XX:PermSize=64m -XX:MaxPermSize=96m

you will increase the size of the permanent generation. We found this helped. Also note that when you undeploy a war file, the old deployment holds permanent-generation storage that is not reclaimed, so each undeploy/redeploy cycle eats into the permanent generation pool.

-Original Message- From: david w [mailto:[EMAIL PROTECTED]] Sent: Tuesday, July 29, 2008 7:20 AM To: solr-user@lucene.apache.org Subject: Re: Out of memory on Solr sorting

Hi, Daniel, I have the same problem as Sundar. Would it be possible to tell me what profiling tool you are using? Thanks a lot. /David

On Tue, Jul 29, 2008 at 8:19 PM, Daniel Alheiros [EMAIL PROTECTED] wrote:

Hi Sundar. Well, it would be good if you could do some profiling on your Solr app. I did it during the indexing process so I could figure out what was going on with the OutOfMemoryErrors I was getting. But you definitely won't need as much memory as your whole index size: I have 3.5 million documents (approx. 10 GB) running on this 2 GB heap VM. Cheers, Daniel

-Original Message- From: sundar shankar [mailto:[EMAIL PROTECTED]] Sent: 23 July 2008 23:45 To: solr-user@lucene.apache.org Subject: RE: Out of memory on Solr sorting

Hi Daniel, I am afraid that didn't solve my problem. [...]
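Combining the permanent-generation flags from this message with the heap and GC settings Daniel posts later in the thread, a JBoss startup line might look roughly like this - a sketch using the sizes mentioned in the thread, not tuned recommendations:

    JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m \
        -XX:PermSize=64m -XX:MaxPermSize=96m \
        -XX:MinHeapFreeRatio=50 -XX:NewSize=1024m -XX:NewRatio=2 \
        -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360"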
RE: Out of memory on Solr sorting
Hi, I haven't read the whole thread, so I will take my chances here. I've been fighting recently to keep my Solr instances stable because they were frequently crashing with OutOfMemoryErrors. I'm using Solr 1.2, and when it happens there is a bug that leaves the index locked unless you restart Solr, so in my scenario it was extremely damaging. After some profiling, I realized that my major problem was caused by the way the JVM heap was being used: I hadn't configured it with any advanced settings (I had just made it bigger, -Xmx and -Xms 1.5 GB). It's running on Sun JVM 1.5 (the most recent 1.5 available) and deployed on JBoss 4.2 on RHEL. My findings were that too many objects were being allocated in the old generation area of the heap, which makes them harder to dispose of, and the default behaviour was letting the heap fill up too far before kicking off a GC; according to the JVM specs, the default is that if a full GC fails to free a certain percentage of the heap within a short period, an OutOfMemoryError should be thrown. I've changed my JVM startup params and it has been working extremely stably since then:

-Xmx2048m -Xms2048m -XX:MinHeapFreeRatio=50 -XX:NewSize=1024m -XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360

I hope it helps. Regards, Daniel Alheiros

-Original Message- From: Fuad Efendi [mailto:[EMAIL PROTECTED]] Sent: 22 July 2008 23:23 To: solr-user@lucene.apache.org Subject: RE: Out of memory on Solr sorting

Yes, it is a cache: it stores an array of document IDs sorted by the sort field, together with the sort fields themselves; query results can intersect with it and be reordered accordingly. But the memory requirements should be well documented. Internally it uses WeakHashMap, which is not good(!!!) - a lot of underground warming up of caches which Solr is not aware of... Could be. I think the Lucene/Solr developers should join this discussion:

/**
 * Expert: The default cache implementation, storing all values in memory.
 * A WeakHashMap is used for storage.
 * ..

// inherit javadocs
public StringIndex getStringIndex(IndexReader reader, String field) throws IOException {
  return (StringIndex) stringsIndexCache.get(reader, field);
}

Cache stringsIndexCache = new Cache() {
  protected Object createValue(IndexReader reader, Object fieldKey) throws IOException {
    String field = ((String) fieldKey).intern();
    final int[] retArray = new int[reader.maxDoc()];
    String[] mterms = new String[reader.maxDoc()+1];
    TermDocs termDocs = reader.termDocs();
    TermEnum termEnum = reader.terms(new Term(field, ""));

Quoting Fuad Efendi [EMAIL PROTECTED]: I am hoping [new StringIndex(retArray, mterms)] is called only once per sort field and cached somewhere in Lucene; theoretically you need to multiply the number of documents by the size of the field (supposing the field contains unique text); you need not tokenize this field; you need not store a TermVector. For 2,000,000 documents with a simple untokenized text field such as the title of a book (256 bytes), you probably need 512,000,000 bytes per Searcher, and as Mark mentioned you should limit the number of searchers in Solr. So Xmx512M is definitely not enough even for simple cases...

Quoting sundar shankar [EMAIL PROTECTED]: I haven't seen the source code before, but I don't know why the sorting isn't done after the fetch. Wouldn't that make it faster, at least in the case of field-level sorting? I could be wrong, and the implementation might well be better; but I don't know why all of the fields have to be loaded.

Date: Tue, 22 Jul 2008 14:26:26 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Out of memory on Solr sorting

Ok, after some analysis of FieldCacheImpl: it is assumed that the (sorted) Enumeration of terms is smaller than the total number of documents (which is why Solr uses a specific field type, solr.StrField with omitNorms="true", for sorted searches). It creates an int[reader.maxDoc()] array, walks the (sorted) Enumeration of terms (untokenized solr.StrField), and populates the array with document IDs. It also creates an array of Strings: String[] mterms = new String[reader.maxDoc()+1]. Why do we need that? For 1G documents with an average term/StrField size of 100 bytes (which could be unique text!!!) it will create a huge cache of around 100 GB which is not really needed... StringIndex value = new StringIndex(retArray, mterms); If I understand correctly, StringIndex _must_ be a file in a filesystem for such a case... We create a StringIndex and retrieve the top 10 documents: huge overhead.

Quoting Fuad Efendi [EMAIL PROTECTED]: Ok, what is confusing me is the implicit guess that FieldCache contains the field and Lucene uses in-memory sort
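For readers following along in the code, a minimal Lucene 2.x-era Java sketch that triggers the same FieldCache population a sorted search does; the index path and field name are hypothetical, and error handling is omitted:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.store.FSDirectory;

    public class FieldCacheDemo {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.getDirectory("/path/to/index"));
            // The first sort on a field allocates this in one piece: one int per
            // document in the index, plus one String per unique term.
            FieldCache.StringIndex idx = FieldCache.DEFAULT.getStringIndex(reader, "title_sort");
            System.out.println("docs=" + idx.order.length + ", terms array=" + idx.lookup.length);
            reader.close();
        }
    }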
RE: Out of memory on Solr sorting
Hi Daniel, I am afraid that didn't solve my problem. I was guessing my problem was that I have too much data and too little memory allocated for it. I happened to read a couple of posts which mentioned that I need a VM heap close to the size of my data folder. I have about 540 MB now and a little more than a million and a half docs; ideally, in that case, 512 MB should be enough for me. In fact, I am able to perform all other operations now - commit, optimize, select, update, nightly cron jobs to index data again, etc. - with no hassles. Even my load tests perform very well. Just the sort doesn't seem to work. I have allocated 2 GB of memory now: still the same results. I used the GC params you gave me too; no change whatsoever. I am not sure what's going on. Is there something I can do to find out how much is needed in actuality, as my production server might need to be configured accordingly? I don't store any documents; we basically fetch standard column data from an Oracle database and store it in Solr fields. Before I had EdgeNGram configured, on Solr 1.2, my data size was less than half of what it is right now - if I remember right, it was of the order of 100 MB. The max size of a field right now might not cross 100 chars, either. Puzzled even more now. -Sundar

P.S: My configuration:
Solr 1.3
Red Hat
540 MB of data (1855013 docs)
2 GB of memory installed and allocated like this:
JAVA_OPTS="$JAVA_OPTS -Xms2048m -Xmx2048m -XX:MinHeapFreeRatio=50 -XX:NewSize=1024m -XX:NewRatio=2 -Dsun.rmi.dgc.client.gcInterval=360 -Dsun.rmi.dgc.server.gcInterval=360"
JBoss 4.05

Subject: RE: Out of memory on Solr sorting Date: Wed, 23 Jul 2008 10:49:06 +0100 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org

Hi, I haven't read the whole thread, so I will take my chances here. I've been fighting recently to keep my Solr instances stable because they were frequently crashing with OutOfMemoryErrors. [...]
RE: Out of memory on Solr sorting
From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Out of memory on Solr sorting Date: Tue, 22 Jul 2008 19:11:02 +0000

Hi, sorry again, fellows. I am not sure what's happening; the day with Solr is bad for me, I guess. EZMLM didn't let me send any mails this morning: it asked me to confirm my subscription, and when I did, it said I was already a member. Now my mails are all coming out badly. Sorry for troubling y'all this much. I hope this mail comes out right.

Hi, we are developing a product in an agile manner, and the current implementation has data of just about 800 MB in dev. The memory allocated to Solr on dev (dual-core Linux box) is 128-512.

My config
=========

<!-- autocommit pending docs if certain criteria are met
<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>1000</maxTime>
</autoCommit>
-->

<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<enableLazyFieldLoading>true</enableLazyFieldLoading>

My Field
========

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>

Problem
=======

I execute a query that returns 24 rows of results, and I pick 10 out of it. I have no problem when I execute this. But when I sort it by a string field fetched in this result, I get an OOM. I am able to execute several other queries with no problem; just having a "sort asc" clause added to the query throws an OOM. Why is that? What should I ideally have done? My config on QA is pretty similar to the dev box and probably has more data than dev; it didn't throw any OOM during the integration tests. The autocomplete field is a new field we added recently. Another point is that the indexing is done with a field of type string:

<field name="XXX" type="string" indexed="true" stored="true" termVectors="true"/>

and the autocomplete field is a copyField. The sorting is done on the string field. Please do let me know what mistake I am making.

Regards, Sundar

P.S: The stack trace of the exception is:

Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
    at org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
    at com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
    ... 105 more
Caused by: org.apache.solr.common.SolrException: Java heap space java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
    at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
    at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
    at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at
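For concreteness, the failing request presumably had this shape - host, port, and parameter values are illustrative, not taken from the thread; only the sort parameter is the relevant part:

    http://localhost:8983/solr/select?q=XXX:abc&rows=10&sort=XXX+asc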
RE: Out of memory on Solr sorting
org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403) - this piece of code does not request an Array[100M] (as I have seen with Lucene); it asks for only a few bytes/KB for a field... Probably 128-512 is not enough; it is also advisable to use equal sizes, -Xms1024M -Xmx1024M (it minimizes GC frequency and ensures that 1024M is available at startup). OOM also happens with fragmented memory, when the application requests a big contiguous fragment and GC is unable to compact; it looks like your application requests a little and the memory is not available...

Quoting sundar shankar [EMAIL PROTECTED]: I execute a query that returns 24 rows of results, and I pick 10 out of it. I have no problem when I execute this. But when I sort it by a string field fetched in this result, I get an OOM. [...]
RE: Out of memory on Solr sorting
Thanks, Fuad. But why does just sorting produce an OOM? I executed the query without the sort clause and it executed perfectly. In fact, I even tried removing maxrows=10 and executing: it came out fine. Queries with bigger results seem to come out fine too. Why just the sort, and that too of just 10 rows? -Sundar

Date: Tue, 22 Jul 2008 12:24:35 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: RE: Out of memory on Solr sorting

org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403) - this piece of code does not request an Array[100M] (as I have seen with Lucene); it asks for only a few bytes/KB for a field... Probably 128-512 is not enough. [...]
Re: Out of memory on Solr sorting
Because to sort efficiently, Solr loads the term to sort on for each doc in the index into an array. For ints, longs, etc., it's just an array the size of the number of docs in your index (deleted or not, I believe). For a String it's an array holding each unique string, plus an array of ints indexing into the String array. So if you do a sort and search for something that only gets 1 doc as a hit, you're still loading up that field cache for every single doc in your index on the first search. With Solr, this happens in the background as it warms up the searcher. The end of the story is: you most likely need more RAM to accommodate the sort. Have you upped your -Xmx setting? I think you can roughly say a 2-million-doc index would need 40-50 MB (rough, and it depends, but to give an idea) per field you're sorting on. - Mark

sundar shankar wrote: Thanks, Fuad. But why does just sorting produce an OOM? I executed the query without the sort clause and it executed perfectly. [...]
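Mark's 40-50 MB figure can be sanity-checked with rough arithmetic; the per-string cost below is an assumption (object header plus characters), not a measured value:

    int[] order:     2,000,000 docs x 4 bytes                =  8 MB
    String[] lookup: up to 2,000,000 unique terms
                     x ~16-24 bytes each                     = 32-48 MB
    total per sorted field                                   ~ 40-56 MB

which is in line with his estimate, and explains why the cost scales with index size rather than with the number of rows returned.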
Re: Out of memory on Solr sorting
I've even seen exceptions (posted here) when sort-type queries caused Lucene to allocate 100 MB arrays. Here is what happened to me:

SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979
    at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)

It does not happen since I increased the heap from 4096M to 8192M (JRockit R27; a more intelligent stack trace, isn't it?). Thanks, Mark; I didn't know that it happens only once (on warming up a searcher).

Quoting Mark Miller [EMAIL PROTECTED]: Because to sort efficiently, Solr loads the term to sort on for each doc in the index into an array. [...]
Re: Out of memory on Solr sorting
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979

I just noticed: this is exactly the number of documents in the index, 25191979 (http://www.tokenizer.org/ - you can sort by clicking the headers Id, [Country, Site, Price] in a table; experimental). If the array is allocated ONLY on new-searcher warm-up, I am _extremely_ happy... I had constant OOMs during the past month (Sun Java 5).

Quoting Fuad Efendi [EMAIL PROTECTED]: I've even seen exceptions (posted here) when sort-type queries caused Lucene to allocate 100 MB arrays. [...]
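The numbers in that exception line up exactly with the int[maxDoc] array described earlier in the thread:

    25,191,979 elements x 4 bytes (int)  = 100,767,916 bytes
    + ~20 bytes of array/object header   = 100,767,936 bytes reported

So this is the FieldCache order array being allocated in one piece, which also explains Fuad's earlier point about fragmentation: the allocation can fail even when total free heap looks sufficient, because it needs one contiguous block.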
RE: Out of memory on Solr sorting
Thanks for the explanation, Mark. The reason I had it at 512 max was because earlier the data file was just about 30 megs, and it increased to this much because of the usage of EdgeNGramFilterFactory on 2 fields. That's great to know that it just happens on the first search. But this exception has been occurring for me for the whole of today; should I fiddle around with the warmer settings too? I have also instructed an increase in heap to 1024. Will keep you posted on the turnarounds.

Thanks
-Sundar

Date: Tue, 22 Jul 2008 15:46:04 -0400 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Out of memory on Solr sorting [...]
RE: Out of memory on Solr sorting
Sorry, not 30, but 300 :)

From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: RE: Out of memory on Solr sorting Date: Tue, 22 Jul 2008 20:19:49 + [...]
Re: Out of memory on Solr sorting
Fuad Efendi wrote:
SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 100767936, Num elements: 25191979
I just noticed this is the exact number of documents in the index: 25191979 [...]

It is only on warmup - I believe it's lazily loaded, so the first time a search is done (Solr does a search as part of warm-up, I believe) the field cache is loaded. The underlying IndexReader is the key to the field cache, so until you get a new IndexReader (SolrSearcher in Solr world?) the field cache will stay good. Keep in mind that as a searcher is warming, the other searcher is still serving, so that will up the RAM requirements... and since I think you can have more than one searcher on deck... you get the idea. As far as the number I gave, that's from memory, made months and months ago, so go with what you see.

Quoting Fuad Efendi [EMAIL PROTECTED]: [...]
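Mark's description maps onto the Lucene 2.x-era FieldCache API roughly as follows. A minimal sketch, assuming a local index path and a sort field named "title" (both hypothetical); getStringIndex itself is the real method quoted later in this thread:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public class FieldCachePeek {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/path/to/index");
            // First access on this reader builds the whole StringIndex:
            // one int per maxDoc() plus every unique term of the field.
            FieldCache.StringIndex idx =
                FieldCache.DEFAULT.getStringIndex(reader, "title");
            // Later accesses with the SAME reader are cache hits; a new
            // (reopened) reader starts from scratch -- the warm-up cost.
            System.out.println("sort value of doc 0: " + idx.lookup[idx.order[0]]);
            reader.close();
        }
    }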
Re: Out of memory on Solr sorting
Mark,
Question: how much memory do I need for 25,000,000 docs if I sort by a string field of 256 bytes? 6.4 GB?

Quoting Mark Miller [EMAIL PROTECTED]: [...]
Re: Out of memory on Solr sorting
Thank you very much Mark, that explains a lot to me. I am guessing: for 1,000,000 documents with a [string] field of average size 1024 bytes I need 1 GB for a single IndexSearcher instance; the field-level cache is used internally by Lucene (can Lucene manage the size of it?); we can't have 1G of such documents without having 1 TB of RAM...

Quoting Mark Miller [EMAIL PROTECTED]: [...]
Re: Out of memory on Solr sorting
Hmmm... I think it's 32 bits per integer, with an index entry for each doc, so **25 000 000 x 32 bits = 95.3674316 megabytes**. Then you have the string array that contains each unique term from your index... you can guess that based on the number of terms in your index and an average-length guess. There is some other overhead beyond the sort cache as well, but that's the bulk of what it will add. I think my memory may be bad with my original estimate :)

Fuad Efendi wrote:
Thank you very much Mark, that explains a lot to me. [...]
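Mark's arithmetic lines up almost exactly with the JRockit error at the top of this thread; the extra bytes are presumably the array object header:

    25,191,979 docs x 4 bytes per int = 100,767,916 bytes
    100,767,936 (the reported "Object size") - 100,767,916 = 20 bytes of header
    100,767,936 bytes / 1,048,576 = ~96 MB for the ord array alone

The String[] of unique terms comes on top of that.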
RE: Out of memory on Solr sorting
Hi Mark,
I am still getting an OOM even after increasing the heap to 1024. The docset I have is numDocs: 1138976, maxDoc: 1180554. Not sure how much more I would need; is there any other way out of this? I noticed another interesting behavior. I have a Solr setup on a personal box where I try out a lot of different configurations and stuff before I even roll the changes out to dev. This server has been running with similarly indexed data for a lot longer than the dev box, and it seems to fetch the results out properly. This box is a Windows 2-core machine with just about a gig of memory, and the whole 1024 megs have been allocated to the heap. The dev is a Linux box with over 2 gigs of memory and 1024 allocated to the heap now. :S

-Sundar

Date: Tue, 22 Jul 2008 13:17:40 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Out of memory on Solr sorting [...]
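A rough sanity check for an index of this size, assuming a string sort field where most of the ~1.18M values are unique and average about 20 characters (both assumptions, not measured):

    1,180,554 x 4 bytes = ~4.5 MB for the ord array
    1,180,554 x ~88 bytes (array slot + String overhead + chars) = ~100 MB for the term array

So one sorted string field should fit comfortably in a 1024 MB heap; if it still OOMs, the rest of the heap (document/filter/queryResult caches, autowarming, a second on-deck searcher) is probably already close to full.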
RE: Out of memory on Solr sorting
Thanks for your help, Mark. Lemme explore a little more and see if someone else can help me out too. :)

Date: Tue, 22 Jul 2008 16:53:47 -0400 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Out of memory on Solr sorting

Someone else is going to have to take over, Sundar - I am new to Solr myself. I will say this though: 25 million docs is pushing the limits of a single machine, especially with only 2 gigs of RAM, especially with any sort fields. You are at the edge, I believe. But perhaps you can get by. Have you checked out all the Solr stats on the admin page? Maybe you are trying to load too many searchers at a time. I think there is a setting to limit the number of searchers that can be on deck...

sundar shankar wrote:
Hi Mark, I am still getting an OOM even after increasing the heap to 1024. [...]
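The setting Mark is thinking of is, I believe, the <maxWarmingSearchers> element in solrconfig.xml (for example <maxWarmingSearchers>2</maxWarmingSearchers>): when the limit would be exceeded, the commit fails fast instead of stacking one more warming searcher, and one more field cache, into the heap.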
Re: Out of memory on Solr sorting
OK, what is confusing me is the implicit assumption that FieldCache contains the field values and that Lucene uses an in-memory sort instead of using the file-system index... Array size: 100 MB (25M x 4 bytes), and it is just pointers (4-byte integers) to documents in the index.

org.apache.lucene.search.FieldCacheImpl$10.createValue
...
357: protected Object createValue(IndexReader reader, Object fieldKey)
358:     throws IOException {
359:   String field = ((String) fieldKey).intern();
360:   final int[] retArray = new int[reader.maxDoc()]; // OutOfMemoryError!!!
...
408:   StringIndex value = new StringIndex (retArray, mterms);
409:   return value;
410: }
...

It's very confusing; I don't know these internals...

<field name="XXX" type="string" indexed="true" stored="true" termVectors="true"/>
The sorting is done based on the string field.

I think Sundar should not use [termVectors="true"]...

Quoting Mark Miller [EMAIL PROTECTED]:
Hmmm... I think it's 32 bits per integer, with an index entry for each doc [...]
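For what it's worth, this is why the int[] alone is useful: once every document has an ordinal into the sorted term array, per-hit comparisons during sorting are just int compares. A simplified, hypothetical rendering (not Lucene's actual comparator code):

    public class OrdCompare {
        // ord[doc] = position of doc's term in the sorted unique-term array.
        // Comparing two hits by their string field reduces to comparing ints;
        // the strings themselves are only needed to build the arrays once.
        static int compareByField(int docA, int docB, int[] ord) {
            return ord[docA] - ord[docB]; // safe: ords are small non-negative ints
        }

        public static void main(String[] args) {
            int[] ord = {2, 0, 1}; // toy index: doc0="c", doc1="a", doc2="b"
            System.out.println(compareByField(1, 0, ord)); // negative: doc1 sorts first
        }
    }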
RE: Out of memory on Solr sorting
I haven't seen the source code before, but I don't know why the sorting isn't done after the fetch. Wouldn't that be faster, at least in the case of field-level sorting? I could be wrong, and the implementation is probably better than I think; I just don't know why all of the field values have to be loaded.

Date: Tue, 22 Jul 2008 14:26:26 -0700 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org Subject: Re: Out of memory on Solr sorting

OK, after some analysis of FieldCacheImpl:

- It is assumed that the (sorted) enumeration of terms is smaller than the total number of documents (which is why SOLR uses a specific field type for sorted searches: solr.StrField with omitNorms=true). It creates an int[reader.maxDoc()] array, walks the (sorted) enumeration of terms (untokenized solr.StrField), and populates the array with document IDs.

- It also creates an array of Strings: String[] mterms = new String[reader.maxDoc()+1]; Why do we need that? For 1G documents with an average term/StrField size of 100 bytes (which could be unique text!!!) it will create a huge 100 GB cache which is not really needed...

StringIndex value = new StringIndex (retArray, mterms);

If I understand correctly... StringIndex _must_ be a file in a filesystem for such a case... We create a StringIndex and retrieve the top 10 documents: huge overhead.

Quoting Fuad Efendi [EMAIL PROTECTED]:
OK, what is confusing me is the implicit assumption that FieldCache contains the field values [...]
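One way to see why "sort after fetch" can't work: the engine has to rank every matching document to know which 10 come first, so it needs a sort value for any doc that might match; hence the whole-index cache. A toy illustration (hypothetical, not Solr code) of selecting the top N hits by ordinal:

    import java.util.Comparator;
    import java.util.PriorityQueue;

    public class TopNByOrd {
        // Returns the n hits with the smallest ord values, in ascending order.
        static int[] top(final int[] ord, int[] hits, int n) {
            // Max-heap on ord: the head is the worst of the current best n.
            PriorityQueue<Integer> pq = new PriorityQueue<Integer>(n,
                new Comparator<Integer>() {
                    public int compare(Integer a, Integer b) {
                        return ord[b] - ord[a];
                    }
                });
            for (int doc : hits) {            // every hit is examined...
                pq.offer(doc);
                if (pq.size() > n) pq.poll(); // ...but only n are kept
            }
            int[] out = new int[pq.size()];
            for (int i = out.length - 1; i >= 0; i--) out[i] = pq.poll();
            return out;
        }

        public static void main(String[] args) {
            int[] ord = {5, 3, 9, 1, 7};  // per-doc ordinals
            int[] hits = {0, 1, 2, 3, 4}; // all docs match
            for (int doc : top(ord, hits, 2)) System.out.print(doc + " "); // prints: 3 1
        }
    }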
RE: Out of memory on Solr sorting
I am hoping [new StringIndex (retArray, mterms)] is called only once per sort field and cached somewhere inside Lucene; theoretically you need to multiply the number of documents by the size of the field (supposing the field contains unique text); you need not tokenize this field; you need not store a TermVector. For 2 000 000 documents with a simple untokenized text field such as the title of a book (256 bytes), you probably need 512 000 000 bytes per Searcher, and as Mark mentioned you should limit the number of searchers in SOLR. So Xmx512M is definitely not enough even for simple cases...

Quoting sundar shankar [EMAIL PROTECTED]:
I haven't seen the source code before, but I don't know why the sorting isn't done after the fetch. [...]
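Fuad's estimate, written out (and assuming, as he does, that every value is unique and held uncompressed in the cache):

    2,000,000 docs x 256 bytes = 512,000,000 bytes = ~488 MB per searcher
    + 2,000,000 x 4 bytes = ~8 MB for the ord array

With two searchers alive during warm-up, that alone approaches 1 GB, before any of Solr's document/filter/queryResult caches.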
RE: Out of memory on Solr sorting
Yes, it is a cache: it stores an array of document IDs ordered by the sort field, together with the sorted field values; query results can be intersected with it and reordered accordingly. But the memory requirements should be well documented. Internally it uses a WeakHashMap, which is not good(!!!): a lot of under-the-hood cache warm-ups which SOLR is not aware of... Could be. I think the Lucene-SOLR developers should join this discussion:

/**
 * Expert: The default cache implementation, storing all values in memory.
 * A WeakHashMap is used for storage.
 * ...
 */
// inherit javadocs
public StringIndex getStringIndex(IndexReader reader, String field) throws IOException {
    return (StringIndex) stringsIndexCache.get(reader, field);
}

Cache stringsIndexCache = new Cache() {
    protected Object createValue(IndexReader reader, Object fieldKey) throws IOException {
        String field = ((String) fieldKey).intern();
        final int[] retArray = new int[reader.maxDoc()];
        String[] mterms = new String[reader.maxDoc()+1];
        TermDocs termDocs = reader.termDocs();
        TermEnum termEnum = reader.terms(new Term(field, ""));
        ...

Quoting Fuad Efendi [EMAIL PROTECTED]:
I am hoping [new StringIndex (retArray, mterms)] is called only once per sort field [...]
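To make the WeakHashMap concern concrete, here is a sketch of the caching pattern under discussion (names and structure are illustrative, not Lucene's actual internals): entries are keyed weakly by the reader, so nothing bounds their total size, and nothing signals when one is dropped and must be rebuilt at full cost.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.WeakHashMap;

    public class PerReaderCache<K, V> {
        // Outer keys (readers) are weakly referenced: once a reader is
        // unreachable, its whole inner map silently becomes collectable.
        private final Map<Object, Map<K, V>> byReader = new WeakHashMap<Object, Map<K, V>>();

        public synchronized V get(Object reader, K key, ValueFactory<K, V> factory) {
            Map<K, V> inner = byReader.get(reader);
            if (inner == null) {
                inner = new HashMap<K, V>();
                byReader.put(reader, inner);
            }
            V value = inner.get(key);
            if (value == null) {
                value = factory.create(key); // the expensive uninverting step
                inner.put(key, value);
            }
            return value;
        }

        public interface ValueFactory<K, V> { V create(K key); }
    }

A bounded cache with explicit eviction would at least make the memory cost visible and measurable, which seems to be the point being made here.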