Re: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
On 1/27/2013 10:28 PM, Rahul Bishnoi wrote:
> Thanks for your reply. After following your suggestions we were able to
> index 30k documents. I have some queries:
> 1) What is stored in RAM while only indexing is going on? How can we
> calculate the RAM/heap requirements for our documents?
> 2) The document cache, filter cache, etc. are populated while querying.
> Correct me if I am wrong. Are there any caches that are populated while
> indexing?

If anyone catches me making statements that are not true, please feel free to correct me.

The caches are indeed only used during querying. If you are not making queries at all, they aren't much of a factor.

I can't give you any definitive answers about RAM usage or how to calculate RAM/heap requirements, but I can make some general statements without looking at the code, based on what I've learned so far about Solr and about Java in general.

Initially you would have an exact copy of the input text for each field, which ultimately gets used for the stored data (for those fields that are stored). Each one is probably just a plain String, though I don't know for sure, as I haven't read the code. If a field is not being stored or copied, it would be possible to discard that data as soon as it is no longer required for indexing; I have no idea whether the Solr/Lucene code actually does so.

If you are storing term vectors, additional memory is needed for those. I don't know whether that involves lots of objects or one object with index information. In my experience, term vectors can be bigger than the stored data for the same field.

Tokenization and filtering are where I imagine most of the memory gets used. If you're using a filter like EdgeNGram, that's a LOT of tokens. Even if you're just tokenizing words, it can add up. There is also space required for the inverted index, norms, and other data/metadata.
If each token is a separate Java object (which I do not know), there would be a fair amount of memory overhead involved. A String object in Java has something like 40 bytes of overhead beyond the space required for the data, and strings in Java are internally represented in UTF-16, so each character actually takes two bytes. http://www.javamex.com/tutorials/memory/string_memory_usage.shtml

The finished documents stack up in the ramBufferSizeMB space until that buffer fills or a hard commit is issued, at which point they are flushed to disk as a Lucene segment. One thing I'm not sure about is whether an additional RAM buffer is allocated for further indexing while the flush is happening, or whether the flush completes and the same buffer is re-used for subsequent documents.

Merging index segments is another way memory gets used; I don't know how much memory that process needs.

On Solr 4 with the default directory factory, part of a flushed segment may remain in RAM until enough additional segment data is created. The amount of memory used by this feature should be pretty small unless you have a lot of cores in a single JVM. That extra memory can be eliminated by using MMapDirectoryFactory instead of NRTCachingDirectoryFactory, at the expense of fast Near-RealTime index updates.

Thanks,
Shawn
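The string-overhead point above can be turned into a back-of-the-envelope estimate. The ~40-byte constant and two-bytes-per-char figure follow the javamex article linked above for a 64-bit JVM; real numbers vary by JVM version and flags, so this is an illustration of the arithmetic, not a measurement. The class and method names are made up for the sketch:

```java
// Rough per-String heap estimate, per the javamex article linked above:
// ~40 bytes of header/field overhead per String (64-bit JVM) plus 2 bytes
// per char (UTF-16). Actual sizes depend on JVM version and options.
public class StringMemoryEstimate {
    static final int OVERHEAD = 40;

    static int approxBytes(int chars) {
        return OVERHEAD + 2 * chars;
    }

    public static void main(String[] args) {
        // One 1000-word field held as a single String (~7 chars per word incl. space):
        System.out.println("whole field: ~" + approxBytes(7000) + " bytes");        // ~14040
        // The same text if each of the 1000 six-char tokens is its own String:
        System.out.println("as tokens:   ~" + 1000 * approxBytes(6) + " bytes");    // ~52000
    }
}
```

The gap between the two numbers is Shawn's point: if every token becomes a separate object, the per-object overhead can dwarf the text itself.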
Re: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
Hi Shawn,

Thanks for your reply. After following your suggestions we were able to index 30k documents. I have some queries:

1) What is stored in RAM while only indexing is going on? How can we calculate the RAM/heap requirements for our documents?
2) The document cache, filter cache, etc. are populated while querying. Correct me if I am wrong. Are there any caches that are populated while indexing?

Thanks,
Rahul

On Sat, Jan 26, 2013 at 11:46 PM, Shawn Heisey wrote:
> On 1/26/2013 12:55 AM, Rahul Bishnoi wrote:
>> Thanks for quick reply and addressing each point queried.
>>
>> Additional asked information is mentioned below:
>>
>> OS = Ubuntu 12.04 (64 bit)
>> Sun Java 7 (64 bit)
>> Total RAM = 8GB
>>
>> SolrConfig.xml is available at http://pastebin.com/SEFxkw2R
>
> Rahul,
>
> The MaxPermGenSize could be a contributing factor. The documents where
> you have 1000 words are somewhat large, though your overall index size is
> pretty small. I would try removing the MaxPermGenSize option and see what
> happens. You can also try reducing the ramBufferSizeMB in solrconfig.xml.
> The default in previous versions of Solr was 32, which is big enough for
> most things, unless you are indexing HUGE documents like entire books.
>
> It looks like you have the cache sizes at values close to default. I
> wouldn't decrease the documentCache any - in fact an increase might be a
> good thing there. As for the others, you could probably reduce them. The
> filterCache size I would start at 64 or 128. Watch your cache hit ratios
> to see whether the changes make things remarkably worse.
>
> If that doesn't help, try increasing the -Xmx option - first 3072m, then
> 4096m. You could go as high as 6GB and not run into any OS cache problems
> with your small index size, though you might run into long GC pauses.
>
> Indexing, especially of big documents, is fairly memory intensive. Some
> queries can be memory intensive as well, especially those using facets or
> a lot of clauses.
>
> Under normal operation, I could probably get away with a 3GB heap size,
> but I have it at 8GB because otherwise a full reindex (full-import from
> mysql) runs into OOM errors.
>
> Thanks,
> Shawn
Re: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
On 1/26/2013 12:55 AM, Rahul Bishnoi wrote:
> Thanks for quick reply and addressing each point queried.
>
> Additional asked information is mentioned below:
>
> OS = Ubuntu 12.04 (64 bit)
> Sun Java 7 (64 bit)
> Total RAM = 8GB
>
> SolrConfig.xml is available at http://pastebin.com/SEFxkw2R

Rahul,

The MaxPermGenSize could be a contributing factor. The documents where you have 1000 words are somewhat large, though your overall index size is pretty small. I would try removing the MaxPermGenSize option and see what happens. You can also try reducing the ramBufferSizeMB in solrconfig.xml. The default in previous versions of Solr was 32, which is big enough for most things, unless you are indexing HUGE documents like entire books.

It looks like you have the cache sizes at values close to default. I wouldn't decrease the documentCache any - in fact an increase might be a good thing there. As for the others, you could probably reduce them. The filterCache size I would start at 64 or 128. Watch your cache hit ratios to see whether the changes make things remarkably worse.

If that doesn't help, try increasing the -Xmx option - first 3072m, then 4096m. You could go as high as 6GB and not run into any OS cache problems with your small index size, though you might run into long GC pauses.

Indexing, especially of big documents, is fairly memory intensive. Some queries can be memory intensive as well, especially those using facets or a lot of clauses.

Under normal operation, I could probably get away with a 3GB heap size, but I have it at 8GB because otherwise a full reindex (full-import from mysql) runs into OOM errors.

Thanks,
Shawn
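The ramBufferSizeMB and cache suggestions above map onto solrconfig.xml roughly as follows. This is a hedged sketch, not the poster's actual configuration (which is only available at the pastebin link); the element and class names follow standard Solr 4.x solrconfig.xml conventions, and the size values are just the starting points suggested in the reply:

```xml
<!-- Sketch of the settings discussed above, using standard Solr 4.x
     solrconfig.xml elements. Values are illustrative starting points. -->
<indexConfig>
  <!-- 32 MB was the older default; enough unless documents are huge -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexConfig>

<query>
  <!-- start small and watch the cache hit ratios before tuning further -->
  <filterCache class="solr.FastLRUCache" size="128" initialSize="128" autowarmCount="0"/>
  <!-- don't shrink documentCache; an increase may even help -->
  <documentCache class="solr.LRUCache" size="1024" initialSize="1024"/>
</query>
```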
Re: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
Thanks for the quick reply and for addressing each point queried. The additional information you asked for is below:

OS = Ubuntu 12.04 (64 bit)
Sun Java 7 (64 bit)
Total RAM = 8GB

SolrConfig.xml is available at http://pastebin.com/SEFxkw2R
Re: SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
On 1/25/2013 4:49 AM, Harish Verma wrote:
> we are testing solr 4.1 running inside tomcat 7 and java 7 with the
> following options:
>
> JAVA_OPTS="-Xms256m -Xmx2048m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+ParallelRefProcEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ubuntu/OOM_HeapDump"
>
> our source code looks like the following:
>
> /* START */
> int noOfSolrDocumentsInBatch = 0;
> for (int i = 0; i < 5000; i++) {
>     SolrInputDocument solrInputDocument = getNextSolrInputDocument();
>     server.add(solrInputDocument);
>     noOfSolrDocumentsInBatch += 1;
>     if (noOfSolrDocumentsInBatch == 10) {
>         server.commit();
>         noOfSolrDocumentsInBatch = 0;
>     }
> }
> /* END */
>
> the method "getNextSolrInputDocument()" generates a solr document with
> 100 fields (on average). around 50 of the fields are of "text_general"
> type. some of the "text_general" fields consist of approx 1000 words,
> the rest consist of a few words. out of the total fields there are
> around 35-40 multivalued fields (not of type "text_general"). we are
> indexing all the fields but storing only 8 fields; of these 8 fields
> two are string type, five are long and one is boolean. so our index
> size is only 394 MB, but the RAM occupied at the time of OOM is around
> 2.5 GB. why is the memory usage so high even though the index size is
> small? what is being stored in memory? our understanding is that after
> every commit documents are flushed to disk, so nothing should remain in
> RAM after a commit.
>
> we are using the following settings:
> server.commit() with waitForSearcher=true and waitForFlush=true
> solrConfig.xml has the following properties set:
> directoryFactory = solr.MMapDirectoryFactory
> maxWarmingSearchers = 1
> text_general data type is being used as supplied in the schema.xml with
> the solr setup
> maxIndexingThreads = 8 (default)
> autoCommit: maxTime = 15000, openSearcher = false
>
> we get a Java heap Out Of Memory Error after committing around 3990
> solr documents. some snapshots of the memory dump from the profiler are
> uploaded at the following links:
>
> http://s9.postimage.org/w7589t9e7/memorydump1.png
> http://s7.postimage.org/p3abs6nuj/memorydump2.png
>
> can somebody please suggest what we should do to minimize/optimize the
> memory consumption in our case, with reasons? also, what should the
> optimal values (and reasons) be for the following solrConfig.xml
> parameters?
> useColdSearcher - true/false?
> maxWarmingSearchers - number?
> spellcheck - on/off?
> omitNorms = true/false?
> omitTermFreqAndPositions?
> mergeFactor? we are using the default value of 10
> java garbage collection tuning parameters?

Additional information is needed. What OS platform? Is the OS 64-bit? Is Java 64-bit? How much total RAM? We'll also need your solrconfig.xml file, in particular the query and indexConfig sections. Use your favorite paste site (pastie.org or pastebin.com, for example) to link the solrconfig.xml file.

General thoughts without the above information: You are reserving half of your Java heap for the permanent generation. I have a Solr installation where Java has a max heap of 8GB, about 5GB of which is currently committed - actually allocated at the OS level. My perm gen space is 65908KB. This server handles a total index size of nearly 70GB. I doubt you need 1GB for your perm gen.

A 2GB heap is fairly small in the Solr world. If you are using a 32-bit Java, that's the biggest heap you can create, so 64-bit on both Java and the OS is the way to go. You can reduce memory requirements a small amount by using Jetty instead of Tomcat, but the difference is probably not big enough to really matter.

For the questions you asked at the end, most of them are personal preference, but maxWarmingSearchers should normally be kept low. I think I have a value of 2 in my config.

Here are the GC tuning parameters that I am currently testing. I have been having problems with long GC pauses that I am trying to fix:

-Xms1024M -Xmx8192M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:NewRatio=3 -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled

You should only use CMSIncrementalMode if you have just one or two processor cores. My reading has suggested that it is not beneficial when you have more. So far my GC parameters seem to be working really well, but I need to do a full reindex, which should force usage of the entire 8GB heap and push garbage collection to its limits.

I have a question of my own for someone familiar with the code: does Solr extensively use weak references? If so, ParallelRefProcEnabled might be a win.

Thanks,
Shawn
SOLR 4.1 Out Of Memory error After commit of a few thousand Solr Docs
we are testing solr 4.1 running inside tomcat 7 and java 7 with the following options:

JAVA_OPTS="-Xms256m -Xmx2048m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+ParallelRefProcEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ubuntu/OOM_HeapDump"

our source code looks like the following:

/* START */
int noOfSolrDocumentsInBatch = 0;
for (int i = 0; i < 5000; i++) {
    SolrInputDocument solrInputDocument = getNextSolrInputDocument();
    server.add(solrInputDocument);
    noOfSolrDocumentsInBatch += 1;
    if (noOfSolrDocumentsInBatch == 10) {
        server.commit();
        noOfSolrDocumentsInBatch = 0;
    }
}
/* END */

the method "getNextSolrInputDocument()" generates a solr document with 100 fields (on average). around 50 of the fields are of "text_general" type. some of the "text_general" fields consist of approx 1000 words, the rest consist of a few words. out of the total fields there are around 35-40 multivalued fields (not of type "text_general"). we are indexing all the fields but storing only 8 fields; of these 8 fields two are string type, five are long and one is boolean. so our index size is only 394 MB, but the RAM occupied at the time of OOM is around 2.5 GB.

why is the memory usage so high even though the index size is small? what is being stored in memory? our understanding is that after every commit documents are flushed to disk, so nothing should remain in RAM after a commit.

we are using the following settings:

server.commit() with waitForSearcher=true and waitForFlush=true
solrConfig.xml has the following properties set:
directoryFactory = solr.MMapDirectoryFactory
maxWarmingSearchers = 1
text_general data type is being used as supplied in the schema.xml with the solr setup
maxIndexingThreads = 8 (default)
autoCommit: maxTime = 15000, openSearcher = false

we get a Java heap Out Of Memory Error after committing around 3990 solr documents. some snapshots of the memory dump from the profiler are uploaded at the following links:

http://s9.postimage.org/w7589t9e7/memorydump1.png
http://s7.postimage.org/p3abs6nuj/memorydump2.png

can somebody please suggest what we should do to minimize/optimize the memory consumption in our case, with reasons? also, what should the optimal values (and reasons) be for the following solrConfig.xml parameters?

useColdSearcher - true/false?
maxWarmingSearchers - number?
spellcheck - on/off?
omitNorms = true/false?
omitTermFreqAndPositions?
mergeFactor? we are using the default value of 10
java garbage collection tuning parameters?

Regards
Harish Verma
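For scale: the posted loop commits every 10 documents, so indexing 5000 documents issues 500 blocking commits. A standalone sketch of that arithmetic follows; the Solr calls are replaced by counters so it runs without a server, and the class and method names (`CommitCounter`, `indexAll`) are illustrative, not SolrJ API:

```java
import java.util.ArrayList;
import java.util.List;

// Counts the add/commit operations the posted loop performs for a given
// commit interval. The counters stand in for server.add(...) and
// server.commit(); everything else mirrors the loop's shape.
public class CommitCounter {
    int addCalls = 0;
    int commitCalls = 0;

    void indexAll(int totalDocs, int commitEvery) {
        List<Integer> batch = new ArrayList<>();
        for (int i = 0; i < totalDocs; i++) {
            batch.add(i);        // stands in for server.add(getNextSolrInputDocument())
            addCalls++;
            if (batch.size() == commitEvery) {
                commitCalls++;   // stands in for server.commit()
                batch.clear();
            }
        }
    }

    public static void main(String[] args) {
        CommitCounter asPosted = new CommitCounter();
        asPosted.indexAll(5000, 10);   // the posted loop: commit every 10 docs
        System.out.println("commits as posted: " + asPosted.commitCalls);  // 500

        CommitCounter once = new CommitCounter();
        once.indexAll(5000, 5000);     // a single commit at the end
        System.out.println("with one commit:   " + once.commitCalls);      // 1
    }
}
```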