Re: GData, updateable IndexSearcher

Doug Cutting Wed, 26 Apr 2006 13:44:34 -0700

jason rutherglen wrote:

I was thinking you implied that you knew of someone who had customized their own, but it was a closed source solution. And if so then you would know how that project fared.

I don't recall the details, but I know folks have discussed this previously, and probably even posted patches, but I don't think any of the patches was ready to commit.

Wouldn't there also need to be a hack on the IndexWriter to keep track of new 
segments?

I think the 'public static IndexReader.reopen(IndexReader old)' method I proposed can easily compare the current list of segments for the directory of old to those that old already has open, and determine which can be reused and which new segments must be opened. Deletions would be a little tricky to track. If a segment has had deletions, then a new SegmentReader could be cloned from the old, sharing everything but the deletions, which could be re-read from disk. This would invalidate cached filters for segments that had deletions.
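
The segment comparison described above might be sketched roughly as follows. This is not Lucene code: SegReader, ReopenSketch, the 'core' field, and the deletions generation number (delGen) are hypothetical stand-ins, and the sketch assumes each segment can be identified by name and that a changed deletions file shows up as a changed generation number.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a per-segment reader; not Lucene's SegmentReader. */
final class SegReader {
    final String name;  // segment name
    final Object core;  // stands in for the shared, unchanging segment data
    final long delGen;  // assumed: generation of the deletions this reader loaded

    SegReader(String name, Object core, long delGen) {
        this.name = name;
        this.core = core;
        this.delGen = delGen;
    }

    /** Clone that shares the core data but carries a newer deletions generation. */
    SegReader withDeletions(long newDelGen) {
        return new SegReader(name, core, newDelGen);
    }
}

final class ReopenSketch {
    /**
     * old:     readers that are already open
     * current: segment name -> deletions generation currently in the directory
     */
    static List<SegReader> reopen(List<SegReader> old, Map<String, Long> current) {
        Map<String, SegReader> byName = new LinkedHashMap<>();
        for (SegReader r : old) {
            byName.put(r.name, r);
        }
        List<SegReader> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            SegReader prev = byName.get(e.getKey());
            if (prev == null) {
                // New segment: open it from scratch.
                result.add(new SegReader(e.getKey(), new Object(), e.getValue()));
            } else if (prev.delGen == e.getValue()) {
                // Unchanged segment: reuse the open reader as-is.
                result.add(prev);
            } else {
                // Same segment with new deletions: share everything but the deletions.
                result.add(prev.withDeletions(e.getValue()));
            }
        }
        return result;
    }
}
```

Segments that no longer appear in the directory are simply left out of the result; deciding when their old readers can be closed is a separate problem that this sketch does not address.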

You could even try to figure out what documents have been deleted, then update filters incrementally. That would be fastest, but more complicated.
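
As an illustration of the incremental variant, here is a minimal sketch using java.util.BitSet. It assumes a cached filter is a bit set with one bit per document, that the old and new sets of deleted documents for a segment are both available as bit sets, and that deleted documents should be cleared from the filter. FilterUpdateSketch and its method names are hypothetical, not part of Lucene.

```java
import java.util.BitSet;

final class FilterUpdateSketch {
    /** Documents that are deleted now but were not deleted before. */
    static BitSet newlyDeleted(BitSet oldDeleted, BitSet newDeleted) {
        BitSet diff = (BitSet) newDeleted.clone();
        diff.andNot(oldDeleted);
        return diff;
    }

    /** Copy of the cached filter with the newly deleted documents cleared. */
    static BitSet updateFilter(BitSet cachedFilter, BitSet oldDeleted, BitSet newDeleted) {
        BitSet updated = (BitSet) cachedFilter.clone();
        updated.andNot(newlyDeleted(oldDeleted, newDeleted));
        return updated;
    }
}
```

The cached filter is copied rather than modified in place, so searchers still using the old reader keep seeing the old bits.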


Doug

----- Original Message ----
From: Doug Cutting <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, April 26, 2006 11:27:44 AM
Subject: Re: GData, updateable IndexSearcher

jason rutherglen wrote:

Interesting, does this mean there is a plan for incrementally updateable IndexSearchers to become part of Lucene?

In general, there is no plan for Lucene. If someone implements a generally useful, efficient feature in a back-compatible, easy-to-use manner, and submits it as a patch, then it becomes a part of Lucene. That's the way Lucene changes. Since we don't pay anyone, we can't make plans and assign tasks. So if you're particularly interested in this feature, you might search the archives to find past efforts, or simply try to implement it yourself.

I think a good approach would be to create a new IndexSearcher instance based on an existing one, that shares IndexReaders. Similarly, one should be able to create a new IndexReader based on an existing one. This would be a MultiReader that shares many of the same SegmentReaders.

Things get a little tricky after this.

Lucene caches filters based on the IndexReader. So filters would need to be re-created. Ideally these could be incrementally re-created, but that might be difficult. What might be simpler would be to use a MultiSearcher constructed with an IndexSearcher per SegmentReader, avoiding the use of MultiReader. Then the caches would still work. This would require making a few things public that are not at present. Perhaps adding a 'MultiReader.getSubReaders()' method, combined with a 'static IndexReader.reopen(IndexReader)' method. The latter would return a new MultiReader that shared SegmentReaders with the old version. Then one could use getSubReaders() on the new multi reader to extract the current set to use when constructing a MultiSearcher.
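
To show why searching per SegmentReader keeps the caches working, here is a small sketch of a filter cache keyed by reader identity. It does not use Lucene classes: PerSegmentCacheSketch is a hypothetical name, the reader type R stands in for a SegmentReader, and the 'compute' function stands in for whatever builds a filter's bits for one segment.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-segment filter cache; R stands in for a segment reader. */
final class PerSegmentCacheSketch<R> {
    private final Map<R, BitSet> cache = new IdentityHashMap<>();
    private final Function<R, BitSet> compute;
    int computations = 0; // counts cache misses, for illustration only

    PerSegmentCacheSketch(Function<R, BitSet> compute) {
        this.compute = compute;
    }

    /** Returns the cached bits for this reader, computing them on first use. */
    BitSet bitsFor(R segmentReader) {
        BitSet bits = cache.get(segmentReader);
        if (bits == null) {
            bits = compute.apply(segmentReader);
            cache.put(segmentReader, bits);
            computations++;
        }
        return bits;
    }

    /** One entry per sub-reader, e.g. for the readers a reopen would return. */
    List<BitSet> bitsForAll(List<R> subReaders) {
        List<BitSet> all = new ArrayList<>();
        for (R r : subReaders) {
            all.add(bitsFor(r));
        }
        return all;
    }
}
```

If a reopened reader shares two of its three sub-readers with the old one, only the third causes a new computation. A cache keyed on a single top-level reader would instead miss entirely after every reopen.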

Another tricky bit is figuring out when to close readers.

Does this make sense? This discussion should probably move to the lucene-dev list.

Are there any negatives to updateable IndexSearchers?

Not if implemented well!

Doug

