Re: GData, updateable IndexSearcher

Chuck Williams Wed, 26 Apr 2006 19:03:01 -0700

If I'm following this correctly, it omits a related issue: the need to
periodically close and reopen the IndexWriter in order to flush its
internal RAMDirectory, and likewise the IndexReader used for deletes.
Is there any good way to avoid these as well?


My app has an IndexManager class that is somewhat like the built-in
IndexModifier, except that it contains all index operations: 
IndexSearcher, IndexReader for search, IndexReader for delete,
IndexWriter and IndexUpdater (my own class that covers many more use
cases than the contributed patch of the same name).  IndexManager tracks
whether or not the index has changed and spawns a RefreshThread that
periodically commits all updates (closing and reopening whichever of
the IndexWriter, delete IndexReader, and IndexUpdater are open) and
then reopens the searcher and search reader.
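
A minimal sketch of that refresh cycle, assuming a hypothetical `Reopenable` interface in place of the real IndexWriter, IndexReader, and IndexUpdater objects (none of these names come from Lucene): every update sets a change flag, and a scheduled task commits the write side before reopening the search side, so searches only ever see committed state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for anything the manager holds open
// (writer, delete reader, updater, search reader, searcher).
interface Reopenable {
    /** Close the current instance and open a fresh one on the same index. */
    void closeAndReopen();
}

// Hypothetical IndexManager-style refresh loop; not a Lucene API.
class RefreshingManager {
    private final List<Reopenable> writeSide = new ArrayList<>();  // writer, delete reader, updater
    private final List<Reopenable> searchSide = new ArrayList<>(); // search reader, searcher
    private final AtomicBoolean changed = new AtomicBoolean(false);
    private ScheduledExecutorService refresher;

    void addWriteSide(Reopenable r) { writeSide.add(r); }
    void addSearchSide(Reopenable r) { searchSide.add(r); }

    /** Call after every add, delete, or update. */
    void markChanged() { changed.set(true); }

    /**
     * One refresh pass. If anything changed, commit the write side first
     * (closing flushes buffered documents and deletions to disk), then
     * reopen the search side. Returns true if a refresh happened.
     */
    synchronized boolean refreshIfChanged() {
        if (!changed.getAndSet(false)) {
            return false;
        }
        try {
            for (Reopenable w : writeSide) w.closeAndReopen();
            for (Reopenable s : searchSide) s.closeAndReopen();
        } catch (RuntimeException e) {
            changed.set(true); // keep the pending changes for the next pass
            throw e;
        }
        return true;
    }

    /** Plays the role of the RefreshThread: run refreshIfChanged periodically. */
    void start(long periodMillis) {
        refresher = Executors.newSingleThreadScheduledExecutor();
        refresher.scheduleWithFixedDelay(this::refreshIfChanged,
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        if (refresher != null) refresher.shutdown();
    }
}
```

Changes that arrive while a pass is running set the flag again, so they are picked up by the next pass rather than lost.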

The notion of incrementally updating the searcher is great.  Is there
any way to also avoid closing the writer and delete reader?

Chuck


Doug Cutting wrote on 04/26/2006 10:44 AM:
> jason rutherglen wrote:
>> I was thinking you implied that you knew of someone who had
>> customized their own, closed-source solution.  If so, then you
>> would know how that project fared.
>
> I don't recall the details, but I know folks have discussed this
> previously, and probably even posted patches, but I don't think any of
> the patches was ready to commit.
>
>> Wouldn't there also need to be a hack on the IndexWriter to keep
>> track of new segments?
>
> I think the 'public static IndexReader.reopen(IndexReader old)' method
> I proposed can easily compare the segments currently in old's
> directory with those that old already has open, and determine which
> can be reused and which new segments must be opened.  Deletions
> would be a little tricky to track.  If a segment has had deletions,
> then a new SegmentReader could be cloned from the old, sharing
> everything but the deletions, which could be re-read from disk.  This
> would invalidate cached filters for segments that had deletions.
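
The segment-diff idea can be sketched as follows. This is not Lucene code: `SegInfo`, `SegReader`, and `Reopener` are hypothetical, and a segment's deletion state is assumed to be identified by a generation number that changes whenever its deletions are rewritten.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: a segment is identified by name; its deletions by a generation.
final class SegInfo {
    final String name;
    final long delGen;
    SegInfo(String name, long delGen) { this.name = name; this.delGen = delGen; }
}

// Hypothetical per-segment reader. The expensive "core" (term dictionary,
// stored fields, ...) can be shared; deletions are cheap to re-read.
final class SegReader {
    final String name;
    final Object core;  // stands in for the shared, immutable segment data
    final long delGen;
    SegReader(String name, Object core, long delGen) {
        this.name = name; this.core = core; this.delGen = delGen;
    }
    static SegReader open(SegInfo si) {
        return new SegReader(si.name, new Object(), si.delGen);
    }
    /** Clone sharing everything but the deletions, which are re-read. */
    SegReader withDeletions(long newDelGen) {
        return new SegReader(name, core, newDelGen);
    }
}

final class Reopener {
    /**
     * Build readers for the current segments, reusing what 'old' has open.
     * Retired readers go into 'toClose'; a retired reader whose core was
     * shared with a clone must release only its deletions, not the core.
     */
    static List<SegReader> reopen(List<SegReader> old, List<SegInfo> current,
                                  List<SegReader> toClose) {
        Map<String, SegReader> byName = new LinkedHashMap<>();
        for (SegReader r : old) byName.put(r.name, r);

        List<SegReader> result = new ArrayList<>();
        for (SegInfo si : current) {
            SegReader prev = byName.remove(si.name);
            if (prev == null) {
                result.add(SegReader.open(si));            // new segment
            } else if (prev.delGen == si.delGen) {
                result.add(prev);                          // unchanged: reuse as is
            } else {
                result.add(prev.withDeletions(si.delGen)); // new deletions: share core
                toClose.add(prev);
            }
        }
        toClose.addAll(byName.values()); // segments merged away since 'old' was opened
        return result;
    }
}
```

Only the cloned and newly opened readers need their cached filters rebuilt; reused readers keep theirs.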
>
> You could even try to figure out what documents have been deleted,
> then update filters incrementally.  That would be fastest, but more
> complicated.
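
Finding which documents were deleted is a set difference between the old and new deletion bits. A minimal sketch, assuming each cached filter for a segment is a `java.util.BitSet` of matching document numbers and that the segment only gained deletions, so its existing document numbers are still valid (`FilterPatcher` is a hypothetical name):

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical helper: cached filters are BitSets over one segment's
// document numbers, and the segment has only gained deletions since
// those filters were built.
final class FilterPatcher {
    /** Documents deleted in 'now' that were not already deleted in 'before'. */
    static BitSet newlyDeleted(BitSet before, BitSet now) {
        BitSet diff = (BitSet) now.clone();
        diff.andNot(before);
        return diff;
    }

    /** Clear the newly deleted documents from every cached filter, in place. */
    static void patch(List<BitSet> cachedFilters, BitSet before, BitSet now) {
        BitSet gone = newlyDeleted(before, now);
        for (BitSet filter : cachedFilters) {
            filter.andNot(gone);
        }
    }
}
```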
>
> Doug
>
>> ----- Original Message ----
>> From: Doug Cutting <[EMAIL PROTECTED]>
>> To: solr-dev@lucene.apache.org
>> Sent: Wednesday, April 26, 2006 11:27:44 AM
>> Subject: Re: GData, updateable IndexSearcher
>>
>> jason rutherglen wrote:
>>
>>> Interesting, does this mean there is a plan for incrementally
>>> updateable IndexSearchers to become part of Lucene?
>>
>>
>> In general, there is no plan for Lucene.  If someone implements a
>> generally useful, efficient feature in a back-compatible, easy-to-use
>> manner and submits it as a patch, then it becomes a part of
>> Lucene. That's the way Lucene changes.  Since we don't pay anyone, we
>> can't make plans and assign tasks.  So if you're particularly
>> interested in this feature, you might search the archives to find
>> past efforts, or simply try to implement it yourself.
>>
>> I think a good approach would be to create a new IndexSearcher
>> instance based on an existing one, that shares IndexReaders. 
>> Similarly, one should be able to create a new IndexReader based on an
>> existing one. This would be a MultiReader that shares many of the
>> same SegmentReaders.
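
Sharing sub-readers is cheap because a composite reader needs little state of its own. A minimal sketch, with a hypothetical `SubReader` interface standing in for SegmentReader: the composite keeps a prefix-sum array of sub-reader sizes and maps a global document number to a sub-reader by binary search, so a new composite over mostly the same sub-readers only recomputes that array.

```java
import java.util.List;

// Hypothetical stand-in for a per-segment reader.
interface SubReader {
    int maxDoc();
}

// Hypothetical composite reader over shared sub-readers.
final class CompositeReader {
    final List<SubReader> subs;
    final int[] starts; // starts[i] = first global doc number of subs.get(i)

    CompositeReader(List<SubReader> subs) {
        this.subs = subs;
        this.starts = new int[subs.size() + 1];
        for (int i = 0; i < subs.size(); i++) {
            starts[i + 1] = starts[i] + subs.get(i).maxDoc();
        }
    }

    int maxDoc() { return starts[subs.size()]; }

    /** Index of the sub-reader holding global doc 'n': the largest i with starts[i] <= n. */
    int subIndex(int n) {
        if (n < 0 || n >= maxDoc()) throw new IndexOutOfBoundsException("doc " + n);
        int lo = 0, hi = subs.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= n) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    /** Document number of global doc 'n' within its sub-reader. */
    int localDoc(int n) { return n - starts[subIndex(n)]; }
}
```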
>>
>> Things get a little tricky after this.
>>
>> Lucene caches filters based on the IndexReader.  So filters would
>> need to be re-created.  Ideally these could be incrementally
>> re-created, but that might be difficult.  What might be simpler would
>> be to use a MultiSearcher constructed with an IndexSearcher per
>> SegmentReader, avoiding the use of MultiReader.  Then the caches
>> would still work. This would require making a few things public that
>> are not at present, perhaps by adding a 'MultiReader.getSubReaders()'
>> method, combined with a 'static IndexReader.reopen(IndexReader)'
>> method.  The latter would return a new MultiReader that shared
>> SegmentReaders with the old version.  Then one could use
>> getSubReaders() on the new multi reader to extract the current set to
>> use when constructing a MultiSearcher.
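
Why the MultiSearcher variant keeps caches working can be shown with a cache keyed by reader identity. A minimal sketch (the class is hypothetical, not Lucene's): a reopen that reuses a segment reader also reuses its cached bits, while a freshly built composite reader is a new key every time and always misses.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical per-reader filter cache. 'R' is any reader type whose
// equals/hashCode are identity-based; entries disappear once their
// reader is no longer referenced anywhere else.
final class PerReaderFilterCache<R> {
    private final Map<R, BitSet> cache = new WeakHashMap<>();
    private final Function<R, BitSet> compute;
    int misses = 0; // number of times bits were (re)computed

    PerReaderFilterCache(Function<R, BitSet> compute) { this.compute = compute; }

    synchronized BitSet bits(R reader) {
        BitSet b = cache.get(reader);
        if (b == null) {
            b = compute.apply(reader);
            cache.put(reader, b);
            misses++;
        }
        return b;
    }

    /** Bits for each sub-reader, e.g. one per sub-searcher of a MultiSearcher. */
    List<BitSet> bitsFor(List<R> subReaders) {
        List<BitSet> out = new ArrayList<>();
        for (R r : subReaders) out.add(bits(r));
        return out;
    }
}
```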
>>
>> Another tricky bit is figuring out when to close readers.
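
One common way to decide when readers may be closed is reference counting. A minimal sketch, not an existing Lucene API: searchers take a reference while using a reader, the manager holds one until a reopen replaces the reader, and the real close runs when the count reaches zero (`onClose` is a hypothetical stand-in for it).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted wrapper around a shared reader.
final class RefCounted {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator's reference
    private final Runnable onClose;
    private volatile boolean closed = false;

    RefCounted(Runnable onClose) { this.onClose = onClose; }

    /** Take a reference; fails if the reader has already been closed. */
    void incRef() {
        int n;
        do {
            n = refs.get();
            if (n <= 0) throw new IllegalStateException("already closed");
        } while (!refs.compareAndSet(n, n + 1));
    }

    /** Release a reference; the last release performs the real close. */
    void decRef() {
        int n = refs.decrementAndGet();
        if (n == 0) {
            closed = true;
            onClose.run();
        } else if (n < 0) {
            throw new IllegalStateException("decRef without matching incRef");
        }
    }

    boolean isClosed() { return closed; }
}
```

With per-segment sharing, a segment reader reused by several reopened composites is then closed only after every composite that shares it has released it.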
>>
>> Does this make sense?  This discussion should probably move to the
>> lucene-dev list.
>>
>>
>>> Are there any negatives to updateable IndexSearchers?  
>>
>>
>> Not if implemented well!
>>
>> Doug
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

