It's not only that the newly merged segments are quickly searchable
(you could do that with warming outside of IW).

More importantly, it lets you continue to add/delete docs, flush the
segment, open a new NRT reader, and search those changes, without
waiting for the warming to complete.  You could do many such updates
while a large merged segment is being warmed in the background.

It decouples merging (which results in no change to the search
results) from the add/deletes (which do result in changes to the
search results), so that the warming due to a large merge won't hold
up the stream of updates.
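
To make the decoupling concrete, here is a toy sketch in plain Java.
It is not IndexWriter code: the class NrtWarmingSketch, the Warmer
interface, and the segment names are all made up for illustration,
and segments are just strings.  It only models the ordering argued
for above (warm the merged segment first, publish it afterwards,
and let flushes become visible in the meantime).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Toy model only; none of these names are Lucene classes. */
class NrtWarmingSketch {
    /** Hypothetical stand-in for a merged-segment warmer callback. */
    interface Warmer { void warm(String segment) throws InterruptedException; }

    private final List<String> live = new ArrayList<>(); // segments a new reader will see
    private final Warmer warmer;
    private int flushCount = 0;

    NrtWarmingSketch(Warmer warmer) { this.warmer = warmer; }

    /** Flush buffered changes as a new small segment; published immediately. */
    synchronized String flush() {
        String name = "flush" + flushCount++;
        live.add(name);
        return name;
    }

    /** "NRT reader": a point-in-time copy of the published segment list. */
    synchronized List<String> getReader() { return new ArrayList<>(live); }

    /** Merge on a background thread: warm first, then commit the swap. */
    Thread mergeInBackground(List<String> merged, String mergedName) {
        Thread t = new Thread(() -> {
            try {
                warmer.warm(mergedName);   // slow part, runs without holding the writer lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;                    // merge abandoned; old segments stay live
            }
            synchronized (this) {          // commit: replace the merged-away segments
                live.removeAll(merged);
                live.add(0, mergedName);
            }
        });
        t.start();
        return t;
    }

    /** Returns two reader snapshots: one taken during warming, one after the commit. */
    static List<List<String>> demo() {
        CountDownLatch warmStarted = new CountDownLatch(1);
        CountDownLatch releaseWarm = new CountDownLatch(1);
        NrtWarmingSketch w = new NrtWarmingSketch(seg -> {
            warmStarted.countDown();
            releaseWarm.await();           // simulate a long warm-up
        });
        try {
            List<String> old = Arrays.asList(w.flush(), w.flush());
            Thread merge = w.mergeInBackground(old, "merged0");
            warmStarted.await();                  // warming is now in progress
            w.flush();                            // an update arrives mid-warm
            List<String> during = w.getReader();  // old segments plus the new flush
            releaseWarm.countDown();              // let the warm-up finish
            merge.join();
            List<String> after = w.getReader();   // merged segment plus the new flush
            return Arrays.asList(during, after);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In demo(), the reader opened while the warm-up is still blocked
already sees the third flush, alongside the two old segments that
are about to be merged away.  Only after the warm-up returns does a
new reader see the merged segment in their place.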

I think for any serious NRT app, it's a must.  (Either that or avoid
ever doing large merges entirely).

Mike

On Tue, Sep 22, 2009 at 11:44 AM, Jason Rutherglen
<jason.rutherg...@gmail.com> wrote:
> Adding segment warming to IW is the only way to ensure newly
> merged segments are quickly searchable without the impact
> brought up by John W regarding queries on new segments being
> slow when they load field caches.
>
> On Tue, Sep 22, 2009 at 8:37 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> On Tue, Sep 22, 2009 at 11:08 AM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>>
>>> I'm still not sure I see the reason for complicating the IndexWriter
>>> with warming... can't this be done just as efficiently (if not more
>>> efficiently) in user/application space?
>>
>> It will be less efficient when you warm outside of IndexWriter, ie,
>> you will necessarily delay the app's net turnaround time on being able
>> to search newly added/deleted docs.
>>
>> The whole point of putting optional warming into IndexWriter was so
>> the segment could be warmed *before* the merge commits the change to
>> the writer's SegmentInfos.  Any newly opened near-real-time readers
>> continue to search the old (merged away) segments, until the warming
>> completes.
>>
>> This way the warming of merged segments is independent of making any
>> newly flushed segments searchable (as long as you use CMS, or any
>> merge scheduler that uses separate threads for merging).  New segments
>> can be flushed and then become searchable (with getReader()) even
>> while the warming is happening.
>>
>> So... if your merge policy allows large merges, setting a warmer in
>> the IndexWriter is crucial for minimizing turnaround time.  But, even
>> once you do that, merging is still IO & CPU intensive, plus IO caches
>> are unnecessarily flushed (since we can't easily madvise/posix_fadvise
>> from java), and we have no IO scheduler control to have merging run at
>> very low priority, etc., so while the merge & warming are taking
>> place, search performance will be impacted.
>>
>> Mike
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
