Re: Optimizing in SolrCloud

2012-03-29 Thread Walter Underwood
Don't. "Optimize" is a poorly chosen name for a full merge. It doesn't make
that much difference, and there is almost never a need to do it on a periodic
basis.

The full merge will mean a longer time between the commit and the time that the 
data is first searchable. Do the commit, then search.
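
A minimal SolrJ sketch of the difference (an illustration added here, not from
the original mail; assumes SolrJ 3.6 or later, and the URL, core name, and
field below are placeholders):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitVsOptimize {
        public static void main(String[] args) throws Exception {
            // Placeholder core URL; point this at your own core/collection.
            SolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            server.add(doc);

            // A commit is all that is needed to make the document searchable.
            server.commit();

            // An "optimize" forces a full merge down to one segment. It is
            // expensive and rarely worth running on a schedule.
            // server.optimize();
        }
    }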

wunder

On Mar 29, 2012, at 4:04 PM, Jamie Johnson wrote:

 What is the best way to periodically optimize a Solr index?  I've seen
 a few places where this is done from a cron job, but I wanted to know
 if there are any other techniques that are used in practice for doing
 this.  My use case is that we generally load a large corpus of data up
 front and then information trickles in after that, but we want this
 information to be available for search within a reasonable amount of
 time (say 10 minutes).  I believe that the cron job would probably
 suffice, but if there are any other thoughts/suggestions I'd be
 interested to hear them.
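
For the "searchable within ten minutes" requirement, commitWithin is a common
alternative to a cron-driven commit; a hedged SolrJ sketch (not from the
thread; the URL and id are placeholders):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class TrickleIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "late-arrival-42");

            // Ask Solr to commit this add within 10 minutes (600,000 ms);
            // Solr batches it with anything else indexed in that window,
            // so no external cron job is needed for visibility.
            server.add(doc, 600000);
        }
    }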


Re: Optimizing in SolrCloud

2012-03-29 Thread Jamie Johnson
Thanks. Does it matter that we are also updating documents at
various times?  Do the deleted documents get removed when doing a
merge, or does that only happen on an optimize?

On Thu, Mar 29, 2012 at 7:08 PM, Walter Underwood wun...@wunderwood.org wrote:
 Don't. "Optimize" is a poorly chosen name for a full merge. It doesn't make
 that much difference, and there is almost never a need to do it on a periodic
 basis.

 The full merge will mean a longer time between the commit and the time that 
 the data is first searchable. Do the commit, then search.

 wunder



Re: Optimizing in SolrCloud

2012-03-29 Thread Yonik Seeley
On Thu, Mar 29, 2012 at 7:15 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks. Does it matter that we are also updating documents at
 various times?  Do the deleted documents get removed when doing a
 merge, or does that only happen on an optimize?

Yes, any merge removes documents that have been marked as deleted
(from the segments involved in the merge).

Optimize can still make sense, but more often in scenarios where
documents are updated infrequently.
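
If reclaiming deleted-document space is the real goal, a partial merge is a
gentler option than a full optimize; a hedged SolrJ sketch (an assumption of
this edit, not something from the thread; the URL is a placeholder):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class PartialMerge {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

            // Merge down to at most 5 segments instead of 1. This reclaims
            // most deleted-document space without rewriting the whole index.
            // Arguments: waitFlush, waitSearcher, maxSegments.
            server.optimize(true, true, 5);
        }
    }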

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference,
Boston, May 7-10


Re: Optimizing in SolrCloud

2012-03-29 Thread Walter Underwood
The documents are removed from search results when the delete is committed.

The space for those documents is reclaimed at the next merge that involves
the segments where they were stored.
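
That sequence in a hedged SolrJ sketch (an illustration added here; the URL
and id are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class DeleteVisibility {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/collection1");

            server.deleteById("doc-1");
            server.commit(); // the document stops matching queries here...

            // ...even though its disk space is only reclaimed when the
            // segment holding it is eventually merged.
            SolrQuery q = new SolrQuery("id:doc-1");
            long hits = server.query(q).getResults().getNumFound();
            System.out.println(hits); // prints 0
        }
    }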

wunder

On Mar 29, 2012, at 4:15 PM, Jamie Johnson wrote:

 Thanks. Does it matter that we are also updating documents at
 various times?  Do the deleted documents get removed when doing a
 merge, or does that only happen on an optimize?
 