This is mostly what we do, but we serialize the documents to a log file first, 
so if the server crashes before the background merge of the RAM segments into 
the disk segments completes, we can replay the operations on server restart. 
Since the serialization is a sequential write to an already open file, it is 
very fast.
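A minimal sketch of that write-ahead-log pattern in plain Java (the class and 
method names here are hypothetical, not Lucene APIs): each operation is appended 
to an open log file before being applied to the RAM segments, so after a crash 
the log can simply be replayed in order.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

public class WriteAheadLog {
    private final Path path;
    private final BufferedWriter out;

    public WriteAheadLog(Path path) throws IOException {
        this.path = path;
        // Appending to an already-open file is a cheap sequential write.
        this.out = Files.newBufferedWriter(path, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Record an operation (e.g. "ADD <doc>" or "DEL <id>") before it is
    // applied to the in-memory segments.
    public void append(String op) throws IOException {
        out.write(op);
        out.newLine();
        // A production log would also fsync, e.g. via FileChannel.force().
        out.flush();
    }

    // On server restart, replay every logged operation in order; once the
    // RAM segments are safely merged to disk, the log can be discarded.
    public List<String> replay() throws IOException {
        if (!Files.exists(path)) return Collections.emptyList();
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }
}
```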

I realize that many users do not wrap Lucene in a server process, so it doesn't 
seem that writing only to the RAM segments will work for them. How would other 
processes/servers see those documents? It doesn't seem it would be real-time 
for them.

Maybe restrict real-time search to "server" Lucene installations? If you are 
concerned about performance in the first place, that seems a reasonable 
requirement anyway.

On this note, to allow greater advancement of Lucene, maybe Lucene should move 
to a design approach similar to that of many databases: an embedded version, 
designed for a single process with multiple threads, and a server version that 
wraps the embedded version to allow multiple clients. That seems a far simpler 
architecture. I know I have brought this up in the past, but maybe it's time to 
revisit? It was at the core of Unix design (no file locks needed), and it works 
well for many databases (e.g., Derby).
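The embedded/server layering above can be sketched as follows (all names here 
are hypothetical, purely for illustration): the embedded engine assumes a 
single process, and the server layer is a thin wrapper that funnels all client 
requests through that one process, so no cross-process file locking is needed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the embedded engine: single-process, multi-threaded.
class EmbeddedIndex {
    private final Map<String, String> docs = new ConcurrentHashMap<>();
    void add(String id, String doc) { docs.put(id, doc); }
    String get(String id) { return docs.get(id); }
}

// The "server" version wraps the embedded one. A real server would accept
// these commands over a socket from multiple clients; the key point is that
// every client goes through this single process, never the files directly.
class IndexServer {
    private final EmbeddedIndex index = new EmbeddedIndex();

    String handle(String command) {
        String[] parts = command.split(" ", 3);
        switch (parts[0]) {
            case "ADD": index.add(parts[1], parts[2]); return "OK";
            case "GET": return index.get(parts[1]);
            default:    return "ERR unknown command";
        }
    }
}
```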


-----Original Message-----
>From: Doug Cutting <cutt...@apache.org>
>Sent: Dec 26, 2008 12:20 PM
>To: java-dev@lucene.apache.org
>Subject: Re: Realtime Search
>
>Michael McCandless wrote:
>> So then I think we should start with approach #2 (build real-time on
>> top of the Lucene core) and iterate from there.  Newly added docs go
>> into a tiny segments, which IndexReader.reopen pulls in.  Replaced or
>> deleted docs record the delete against the right SegmentReader (and
>> LUCENE-1314 lets reopen carry those pending deletes forward, in RAM).
>> 
>> I would take the simple approach first: use ordinary SegmentReader on
>> a RAMDirectory for the tiny segments.  If that proves too slow, swap
>> in Memory/InstantiatedIndex for the tiny segments.  If that proves too
>> slow, build a reader impl that reads from DocumentsWriter RAM buffer.
>
>+1 This sounds like a good approach to me.  I don't see any fundamental 
>reasons why we need different representations, and fewer implementations 
>of IndexWriter and IndexReader is generally better, unless they get way 
>too hairy.  Mostly it seems that real-time can be done with our existing 
>toolbox of datastructures, but with some slightly different control 
>structures.  Once we have the control structure in place then we should 
>look at optimizing data structures as needed.
>
>Doug
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>For additional commands, e-mail: java-dev-h...@lucene.apache.org
>