Re: 170G index, 1.5 billion documents, out of memory on query
Hi, the full stack trace is below.

SEVERE: Unable to create core: collection1
org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:794)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:607)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1003)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033)
    at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
    at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:769)
    ... 13 more
Caused by: java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:748)
    at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:285)
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:257)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:225)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.<init>(Lucene41PostingsReader.java:72)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsFormat.fieldsProducer(Lucene41PostingsFormat.java:430)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:194)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:233)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:107)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:57)
    at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
    at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
    at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:123)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1399)
    ... 15 more
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
    ... 31 more

26-Feb-2013 12:20:46 org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: collection1
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1654)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1039)
    at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
    at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:794)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:607)
    at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1003)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033)
    ... 10 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
    at org.apache.solr.core.SolrCore.get
Re: 170G index, 1.5 billion documents, out of memory on query
Do you have the stack trace for the OOM during startup when using MMapDirectory? That would be interesting to see.

Cheers,
Tim

On Mon, Feb 25, 2013 at 1:15 PM, zqzuk wrote:
> Hi Michael,
>
> Yes, I have double-checked and I'm pretty sure it's 64-bit Java. Thanks.
Re: 170G index, 1.5 billion documents, out of memory on query
Hello Zqzuk,

It's true that this index is probably too big for a single shard, but make sure you heed Shawn's advice and use a 64-bit JVM in any case!

Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence Isn’t a Game

On Mon, Feb 25, 2013 at 2:45 PM, zqzuk wrote:
> Thanks again for your kind input!
>
> I followed Tim's advice and tried to use MMapDirectory. Now I get an
> OutOfMemoryError on Solr startup (tried giving only 8G, then 4G, to the JVM).
>
> I guess this truly indicates that there isn't sufficient memory for such a
> huge index.
>
> On another thread I posted a few days ago, about splitting an index for
> SolrCloud, the answer I got was that it is not possible to split an existing
> index into a SolrCloud configuration.
>
> I will try re-indexing with SolrCloud instead...
>
> thanks
Re: 170G index, 1.5 billion documents, out of memory on query
Thanks again for your kind input!

I followed Tim's advice and tried to use MMapDirectory. Now I get an OutOfMemoryError on Solr startup (tried giving only 8G, then 4G, to the JVM).

I guess this truly indicates that there isn't sufficient memory for such a huge index.

On another thread I posted a few days ago, about splitting an index for SolrCloud, the answer I got was that it is not possible to split an existing index into a SolrCloud configuration.

I will try re-indexing with SolrCloud instead...

thanks
Re: 170G index, 1.5 billion documents, out of memory on query
On 2/25/2013 11:05 AM, zqzuk wrote:
> I have deliberately allocated 32G to the JVM, with the command "java
> -Xmx32000m -jar start.jar" etc. I am using our server, which I think has a
> total of 48G. However it still crashes with that error when I specify any
> keywords in my query. The only query that works, as I said, is "q=*:*".
>
> I also realised that the best configuration would be a SolrCloud setup. It's
> a shame that I cannot split this index for that purpose but rather have to
> re-index everything.

Yes, you would have to reindex in order to use SolrCloud with multiple shards. There is an API available somewhere that can split Solr indexes, which you might be able to use to create the shards from your current index, but I don't know how well it works or where to find it.

> But I very much would like to know exactly what has happened with that
> error:
>
> "java.lang.OutOfMemoryError: OutOfMemoryError likely caused by the Sun VM
> Bug described in https://issues.apache.org/jira/browse/LUCENE-1566;
> try calling FSDirectory.setReadChunkSize with a value smaller than the
> current chunk size (2147483647)"
>
> Especially, what does the last line tell me?

The error message indicates that the problem MIGHT be a bug in the Sun/Oracle JVM. There is a workaround that's possible at the Lucene level - setting the chunk size for the directory implementation. If you were using Lucene directly, this would make perfect sense to you, because you would already be writing Java code. Because you're using Solr, those details are hidden from you.

This is *only* a problem if your Java VM is 32-bit. If it's 64-bit, this bug does not happen. If you are using a 32-bit Java, you'll want to make sure you're on a 64-bit OS and upgrade to a 64-bit Java. The output of "java -version" will explicitly say 64-Bit if that's what it is:

[root@idxa2 idxbuild]# java -version
java version "1.6.0_38"
Java(TM) SE Runtime Environment (build 1.6.0_38-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)

I would not attempt to run with an index this big on a 32-bit JVM.

Thanks,
Shawn
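For anyone using Lucene directly rather than Solr, a minimal sketch of the workaround the error message names might look like the following. The index path and the 100 MB chunk size are illustrative assumptions, and as noted above this only matters on a 32-bit JVM.

    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.store.FSDirectory;

    // Sketch of the Lucene-level workaround named in the LUCENE-1566 message.
    // Path and chunk size are illustrative; stock Solr exposes no config hook for this.
    public class ReadChunkSizeSketch {
        public static void main(String[] args) throws IOException {
            FSDirectory dir = FSDirectory.open(new File("/path/to/index"));
            // Read index files in 100 MB chunks instead of the 2147483647-byte
            // default that the error message complains about.
            dir.setReadChunkSize(100 * 1024 * 1024);
            // ... open an IndexReader/IndexSearcher against "dir" as usual ...
            dir.close();
        }
    }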
Re: 170G index, 1.5 billion documents, out of memory on query
The other issue you need to worry about is long full GC pauses with -Xmx32000m. Maybe try reducing your JVM heap considerably (e.g. -Xmx8g) and switching to MMapDirectory - see:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

In solrconfig.xml, this would be:

<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>

On Mon, Feb 25, 2013 at 11:05 AM, zqzuk wrote:
> Hi, thanks for your advice!
>
> I have deliberately allocated 32G to the JVM, with the command "java
> -Xmx32000m -jar start.jar" etc. I am using our server, which I think has a
> total of 48G. However it still crashes with that error when I specify any
> keywords in my query. The only query that works, as I said, is "q=*:*".
>
> I also realised that the best configuration would be a SolrCloud setup. It's
> a shame that I cannot split this index for that purpose but rather have to
> re-index everything.
>
> But I very much would like to know exactly what has happened with that
> error:
>
> "java.lang.OutOfMemoryError: OutOfMemoryError likely caused by the Sun VM
> Bug described in https://issues.apache.org/jira/browse/LUCENE-1566;
> try calling FSDirectory.setReadChunkSize with a value smaller than the
> current chunk size (2147483647)"
>
> Especially, what does the last line tell me?
>
> Many thanks again!
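A hedged illustration of the combined change Tim suggests (the heap size and the stock example directory are assumptions, not tuned recommendations for this index): once solrconfig.xml points at solr.MMapDirectoryFactory, the heap can shrink so that the remaining RAM is left to the OS page cache that MMapDirectory relies on.

    # Illustrative only: smaller heap, rest of RAM left for the OS to cache index files.
    cd example
    java -Xmx8g -jar start.jar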
Re: 170G index, 1.5 billion documents, out of memory on query
Hi, thanks for your advice!

I have deliberately allocated 32G to the JVM, with the command "java -Xmx32000m -jar start.jar" etc. I am using our server, which I think has a total of 48G. However it still crashes with that error when I specify any keywords in my query. The only query that works, as I said, is "q=*:*".

I also realised that the best configuration would be a SolrCloud setup. It's a shame that I cannot split this index for that purpose but rather have to re-index everything.

But I very much would like to know exactly what has happened with that error:

"java.lang.OutOfMemoryError: OutOfMemoryError likely caused by the Sun VM Bug described in https://issues.apache.org/jira/browse/LUCENE-1566; try calling FSDirectory.setReadChunkSize with a value smaller than the current chunk size (2147483647)"

Especially, what does the last line tell me?

Many thanks again!
Re: 170G index, 1.5 billion documents, out of memory on query
On 2/25/2013 4:06 AM, zqzuk wrote:
> Hi, I am really frustrated by this problem. I have built an index of 1.5
> billion data records, with a size of about 170GB. It's been optimised and
> has 12 separate files in the index directory, looking like below:
>
> _2.fdt --- 58G
> _2.fdx --- 80M
> _2.fnm --- 900 bytes
> _2.si --- 380 bytes
> _2_Lucene41_0.doc --- 46G
> _2_Lucene41_0.pos --- 22G
> _2_Lucene41_0.tim --- 37G
> _2_Lucene41_0.tip --- 766MB
> _2_nrm.cfe --- 139 bytes
> _2_nrm.cfs --- 5.7G
> segments.gen --- 20 bytes
> segments_1 --- 68 bytes
>
> It sits on a single server with a memory of 32G allocated to it, using the
> default Solr settings that are provided with the Solr example in the
> distribution. I started the server OK with 32G memory, but any query other
> than "q=*:*" fails with the following out of memory exception:

When you say 32GB, are you saying that you have allocated 32GB to the Java heap, or that the machine has 32GB of total system RAM? 32GB of total RAM would not be enough for this index.

If you are saying that 32GB is the Java heap, I would expect that to be OK - as long as the machine has at least 96GB of total system RAM (256GB would be ideal), and your Solr caches are not enormous.

Is this a 64-bit operating system with a 64-bit Java? If it's not, then the largest Java heap you can allocate would be 2GB, which would definitely not be enough.

The other reply that just came in from Timothy Potter is also correct. Put more memory on each machine and use more machines for better results.

Thanks,
Shawn
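A quick, hedged way to check the numbers Shawn is talking about on the server itself (the index path is an assumption; the point is to compare the index size against the RAM left for the OS page cache once the Java heap is taken out):

    # Illustrative commands; substitute your actual index directory.
    du -sh /path/to/solr/collection1/data/index   # how much index the OS would need to cache
    free -g                                       # total vs. free RAM on the box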
Re: 170G index, 1.5 billion documents, out of memory on query
My sense tells me that you're heading down the wrong path in trying to fit such a large index on one server. Even if you resolve this current issue, you're not likely to be happy with query performance, as one thread searching a 1.5B-doc index is going to be slower than 10 threads searching ten 150M-doc indexes concurrently. Solr is designed to scale out. There is overhead to support distributed search, but in my experience the benefit outweighs the cost 10x. My index is about the same size on disk but only about 500M docs. I also think you'll get much better support from the Solr community using SolrCloud.

So I recommend breaking your index up into multiple shards and distributing across more nodes - start with: http://wiki.apache.org/solr/SolrCloud

Cheers,
Tim

On Mon, Feb 25, 2013 at 7:02 AM, Artem OXSEED wrote:
> Hello,
>
> adding my 5 cents here as well: it seems that we experienced a similar
> problem that was supposed to be fixed, or not appear at all, on 64-bit
> systems. Our current solution is a custom build of Solr with
> DEFAULT_READ_CHUNK_SIZE set to 10MB in the FSDirectory class. This fix was
> not done by me, however, and dates back to the old days of Solr 1.4.1, so
> I'm not sure if it's still valid considering the vast changes in Lucene/Solr
> code and JVM improvements since then - so I'd very much like to hear
> suggestions from experienced users.
>
> --
> Warm regards,
> Artem Karpenko
>
> On 25.02.2013 14:33, zqzuk wrote:
>> Just to add... I noticed this line in the stack trace particularly:
>>
>> *try calling FSDirectory.setReadChunkSize with a value smaller than the
>> current chunk size (2147483647)*
>>
>> Had a look at the javadoc and solrconfig.xml. I cannot see a way to call
>> this method, or to change it, from Solr. If that would be a possible fix,
>> how can I do it in Solr?
>>
>> Thanks
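For orientation, a sketch of the kind of startup the SolrCloud wiki walks through for the stock 4.x example; the shard count, config name, and ports here are illustrative, not a sizing recommendation for a 1.5B-document index.

    # First node: embedded ZooKeeper, bootstrapping the existing collection1 config.
    cd example
    java -DzkRun -DnumShards=2 -Dbootstrap_confdir=./solr/collection1/conf \
         -Dcollection.configName=myconf -jar start.jar

    # A second node joins via the first node's ZooKeeper (port 9983 when using -DzkRun).
    cd example2
    java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar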
Re: 170G index, 1.5 billion documents, out of memory on query
Hello,

adding my 5 cents here as well: it seems that we experienced a similar problem that was supposed to be fixed, or not appear at all, on 64-bit systems. Our current solution is a custom build of Solr with DEFAULT_READ_CHUNK_SIZE set to 10MB in the FSDirectory class. This fix was not done by me, however, and dates back to the old days of Solr 1.4.1, so I'm not sure if it's still valid considering the vast changes in Lucene/Solr code and JVM improvements since then - so I'd very much like to hear suggestions from experienced users.

--
Warm regards,
Artem Karpenko

On 25.02.2013 14:33, zqzuk wrote:
> Just to add... I noticed this line in the stack trace particularly:
>
> *try calling FSDirectory.setReadChunkSize with a value smaller than the
> current chunk size (2147483647)*
>
> Had a look at the javadoc and solrconfig.xml. I cannot see a way to call
> this method, or to change it, from Solr. If that would be a possible fix,
> how can I do it in Solr?
>
> Thanks