Re: configuring solr3.6 for a large intensive index only run

2012-07-17 Thread nanshi
1) In solrconfig.xml, find ramBufferSizeMB and raise it:
 <ramBufferSizeMB>1024</ramBufferSizeMB>

2) Also, try decreasing the mergeFactor to see if it gives you fewer
segments. In my experiments, it does.
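In Solr 3.x both of those settings go in the <indexDefaults> (or <mainIndex>) section of solrconfig.xml. A minimal sketch; the values here are illustrative starting points, not recommendations:

```xml
<!-- solrconfig.xml (Solr 3.x) -->
<indexDefaults>
  <!-- buffer up to 1 GB of documents in RAM before flushing a segment -->
  <ramBufferSizeMB>1024</ramBufferSizeMB>
  <!-- lower mergeFactor = more eager merging = fewer live segments
       (the Solr 3.x default is 10) -->
  <mergeFactor>4</mergeFactor>
</indexDefaults>
```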




Re: configuring solr3.6 for a large intensive index only run

2012-05-24 Thread Otis Gospodnetic
Scott,

In addition to what Lance said, make sure your ramBufferSizeMB in 
solrconfig.xml is high. Try 512 MB or 1024 MB.  The Solr/Lucene index 
segment merging visualization is one of my favourite reports in SPM for 
Solr.  It's kind of amazing how much the index size fluctuates!

Otis 

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 




 From: Scott Preddy scott.m.pre...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Wednesday, May 23, 2012 2:19 PM
Subject: configuring solr3.6 for a large intensive index only run
 
I am trying to do a very large insertion (about 68 million documents) into a
Solr instance.

Our schema is pretty simple. About 40 fields using these types:

   <types>
      <fieldType name="string" class="solr.StrField" sortMissingLast="true"
                 omitNorms="true"/>
      <fieldType name="text_general" class="solr.TextField"
                 positionIncrementGap="100">
         <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
         </analyzer>
         <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
         </analyzer>
      </fieldType>
      <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
                 omitNorms="true" positionIncrementGap="0"/>
   </types>

We are running SolrJ clients from a Hadoop cluster, and are struggling with
the merge process as time progresses.
As the number of documents grows, merging eventually hogs everything.

What we would really like to do is turn merging off, do an index-only run
with a sparse solrconfig, and then start things back up with our runtime
config, which would kick off merging when it starts.

Is there a way to do this?

I came close to finding an answer in this post, but did not find out how to
actually turn off merging.

Post by Mike McCandless:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html




Re: configuring solr3.6 for a large intensive index only run

2012-05-24 Thread Shawn Heisey

On 5/23/2012 12:27 PM, Lance Norskog wrote:

If you want to suppress merging, set the 'mergeFactor' very high.
Perhaps 100. Note that Lucene opens many files (50? 100? 200?) for
each segment. You would have to set the 'ulimit' for file descriptors
to 'unlimited' or 'millions'.


My installation (Solr 3.5.0) creates 11 files per segment, and there is 
often a 12th file for deletes.  I have termvectors turned on for some of 
my fields.  If you aren't using termvectors at all, the last three files 
in my list are not created:


_26n_2.del  _26n.fdt  _26n.fdx  _26n.fnm  _26n.frq  _26n.nrm  _26n.prx  
_26n.tii  _26n.tis  _26n.tvd  _26n.tvf  _26n.tvx


I have yet to try 3.6, but I would imagine it isn't much different 
from 3.5.  I use a fairly high mergeFactor of 35, and I am considering 
raising it even higher so that during normal operation there will never 
be a merge that's not under my control.  When I do a full index rebuild, 
there is so much data added that it will still do automatic merges.
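One quick way to watch the segment count during a rebuild is to count stored-fields index files: each live segment writes exactly one .fdx file when the non-compound file format is in use. A sketch (the data directory path is hypothetical):

```shell
# Count segments in a Lucene/Solr 3.x index directory (non-compound format):
# every live segment has exactly one .fdx (stored-fields index) file.
ls /var/solr/data/index/*.fdx 2>/dev/null | wc -l
```

If the index uses the compound file format, count .cfs files instead.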


Thanks,
Shawn



Re: configuring solr3.6 for a large intensive index only run

2012-05-23 Thread Lance Norskog
If you want to suppress merging, set the 'mergeFactor' very high.
Perhaps 100. Note that Lucene opens many files (50? 100? 200?) for
each segment. You would have to set the 'ulimit' for file descriptors
to 'unlimited' or 'millions'.
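Concretely, the descriptor limit can be checked, and raised, in the shell that launches Solr. The target number below is illustrative; what you actually need is roughly segments times files-per-segment, with generous headroom:

```shell
# Show the current soft limit on open file descriptors
ulimit -n
# Show the hard limit (the ceiling a non-root user can raise to)
ulimit -Hn
# In the Solr startup script, raise the soft limit before launching, e.g.:
#   ulimit -n 1000000
# Raising the hard limit itself requires root (or limits.conf entries).
```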

Later, you can call optimize with a 'maxSegments' value. Optimize will
stop at maxSegments instead of merging down to one. Lucene these days
does not need to have one segment, so merging down to 20 or 50 is
fine.
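A sketch of that partial optimize as a Solr update message; the URL and segment target are illustrative:

```xml
<!-- POST this to http://localhost:8983/solr/update -->
<optimize maxSegments="20" waitFlush="true" waitSearcher="true"/>
```

From SolrJ, the equivalent call is SolrServer.optimize(waitFlush, waitSearcher, maxSegments).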

On Wed, May 23, 2012 at 11:19 AM, Scott Preddy scott.m.pre...@gmail.com wrote:
 [...]



-- 
Lance Norskog
goks...@gmail.com