Re: configuring solr3.6 for a large intensive index only run
1) In solrconfig.xml, find ramBufferSizeMB and change it to: <ramBufferSizeMB>1024</ramBufferSizeMB>

2) Also, try decreasing the mergeFactor to see if it gives you fewer segments. In my experiment, it does.

--
View this message in context: http://lucene.472066.n3.nabble.com/configuring-solr3-6-for-a-large-intensive-index-only-run-tp3985733p3995659.html
Sent from the Solr - User mailing list archive at Nabble.com.
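For reference, a minimal sketch of where the two settings above might live in solrconfig.xml. It assumes the <indexConfig> section that Solr 3.6 introduced (older configs put the same elements under <indexDefaults> or <mainIndex>). The mergeFactor value of 5 is only a placeholder for "lower than the default of 10", not a number recommended in this thread.

```xml
<!-- Fragment of solrconfig.xml; all other settings omitted. -->
<indexConfig>
  <!-- Buffer up to roughly 1 GB of added documents in RAM before flushing a new segment. -->
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <!-- Hypothetical value: lower than the default of 10, so merges happen sooner and fewer segments accumulate. -->
  <mergeFactor>5</mergeFactor>
</indexConfig>
```

A larger RAM buffer produces fewer, larger flushed segments, but the JVM heap has to be sized to hold it.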
Re: configuring solr3.6 for a large intensive index only run
Scott,

In addition to what Lance said, make sure your ramBufferSizeMB in solrconfig.xml is high. Try 512 or 1024 (the value is in MB).

The Solr/Lucene index segment merging visualization is one of my favourite reports in SPM for Solr. It's kind of amazing how much index size fluctuates!

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

From: Scott Preddy scott.m.pre...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wednesday, May 23, 2012 2:19 PM
Subject: configuring solr3.6 for a large intensive index only run

I am trying to do a very large insertion (about 68 million documents) into a Solr instance.

Our schema is pretty simple. About 40 fields using these types:

<types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
</types>

We are running SolrJ clients from a Hadoop cluster, and are struggling with the merge process as time progresses. As the number of documents grows, merging will eventually hog everything.

What we would really like to do is turn merging off and just do an index run with a sparse solrconfig, and then start things back up with our runtime config, which would kick off merging when it starts.

Is there a way to do this?

I came close to finding an answer in this post, but did not find out how to actually turn off merging.

Post by Mike McCandless: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
Re: configuring solr3.6 for a large intensive index only run
On 5/23/2012 12:27 PM, Lance Norskog wrote:
> If you want to suppress merging, set the 'mergeFactor' very high.
> Perhaps 100. Note that Lucene opens many files (50? 100? 200?) for
> each segment. You would have to set the 'ulimit' for file descriptors
> to 'unlimited' or 'millions'.

My installation (Solr 3.5.0) creates 11 files per segment, and there is often a 12th file for deletes. I have termvectors turned on for some of my fields. If you aren't using termvectors at all, the last three files in my list are not created:

_26n_2.del
_26n.fdt
_26n.fdx
_26n.fnm
_26n.frq
_26n.nrm
_26n.prx
_26n.tii
_26n.tis
_26n.tvd
_26n.tvf
_26n.tvx

I have yet to try 3.6, but I would imagine that it isn't much different from 3.5.

I use a fairly high mergeFactor of 35, and I am considering raising it even higher so that during normal operation there will never be a merge that's not under my control. When I do a full index rebuild, so much data is added that it will still do automatic merges.

Thanks,
Shawn
Re: configuring solr3.6 for a large intensive index only run
If you want to suppress merging, set the 'mergeFactor' very high. Perhaps 100. Note that Lucene opens many files (50? 100? 200?) for each segment. You would have to set the 'ulimit' for file descriptors to 'unlimited' or 'millions'.

Later, you can call optimize with a 'maxSegments' value. Optimize will stop at maxSegments instead of merging down to one. Lucene these days does not need to have one segment, so merging down to 20 or 50 is fine.

On Wed, May 23, 2012 at 11:19 AM, Scott Preddy scott.m.pre...@gmail.com wrote:
> I am trying to do a very large insertion (about 68 million documents) into a
> Solr instance.
>
> Our schema is pretty simple. About 40 fields using these types:
>
> <types>
>   <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
>   <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="index">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
>   </fieldType>
>   <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
> </types>
>
> We are running SolrJ clients from a Hadoop cluster, and are struggling with
> the merge process as time progresses. As the number of documents grows,
> merging will eventually hog everything.
>
> What we would really like to do is turn merging off and just do an index
> run with a sparse solrconfig, and then start things back up with our
> runtime config, which would kick off merging when it starts.
>
> Is there a way to do this?
>
> I came close to finding an answer in this post, but did not find out how to
> actually turn off merging.
>
> Post by Mike McCandless:
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

--
Lance Norskog
goks...@gmail.com
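The two steps Lance describes could look roughly like the sketch below. Both fragments are illustrations under assumptions, not tested settings: the first assumes the <indexConfig> section of solrconfig.xml in Solr 3.6 (older configs use <indexDefaults> or <mainIndex>), and the second assumes the standard XML update handler, typically mapped to /update. The value 20 is taken from the "20 or 50" range mentioned above.

```xml
<!-- Step 1: bulk-load config. Fragment of solrconfig.xml with a very high
     mergeFactor, so that merges are rare while documents are being added.
     Expect many segments and many open files; raise the file-descriptor
     ulimit accordingly. -->
<indexConfig>
  <mergeFactor>100</mergeFactor>
</indexConfig>
```

```xml
<!-- Step 2: after the load, POST this update message to the update handler.
     It merges the index down to at most 20 segments instead of one. -->
<optimize maxSegments="20"/>
```

Keeping the merge work in a single explicit optimize call after the load is what lets the indexing run itself avoid most merge I/O.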