Re: [jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Mike Klaas


On 13-Jul-07, at 1:53 PM, Yonik Seeley (JIRA) wrote:




> ... ParallelReader, where some fields are in one sub-index ...
> the processor would ask the updateHandler for the existing
> document - the updateHandler deals with
> getting it to/from the right place.
>
> The big reason you would use ParallelReader is to avoid touching
> the less-modified/bigger fields in one index when changing some of
> the other fields in the other index.


I've pondered this a few times: it could be a huge win for  
highlighting apps, which can be stored-field-heavy.


However, I wonder if there is something that I am missing: PR
requires perfect synchronization of Lucene doc ids, no?  If you update
fields for a doc in one index, don't you also need to (re-)store the
fields in all other indices, to keep the doc ids in sync?


-mike


Re: [jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-13 Thread Yonik Seeley

On 7/13/07, Mike Klaas wrote:

> > ... ParallelReader, where some fields are in one sub-index ...
> > the processor would ask the updateHandler for the existing
> > document - the updateHandler deals with
> > getting it to/from the right place.
> >
> > The big reason you would use ParallelReader is to avoid touching
> > the less-modified/bigger fields in one index when changing some of
> > the other fields in the other index.

> I've pondered this a few times: it could be a huge win for
> highlighting apps, which can be stored-field-heavy.
>
> However, I wonder if there is something that I am missing: PR
> requires perfect synchronization of Lucene doc ids, no?  If you update
> fields for a doc in one index, don't you also need to (re-)store the
> fields in all other indices, to keep the doc ids in sync?


Well, it would be tricky... one PR use case would be to entirely
re-index one field (in its own separate index), thus maintaining
synchronization with the main index. As Doug said,
ParallelReader was not really designed to support incremental updates of
fields, but rather to accelerate batch updates. For incremental
updates you're probably better served by updating a single index.
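
To make the synchronization issue concrete, here is a minimal sketch in plain
Java. It is a toy model, not Lucene code: each "index" is just a list whose
position stands in for the docid, and the names (ParallelSketch, view,
deleteAndReAdd, rebuild) are made up for illustration. It assumes an
incremental update behaves like delete-plus-append followed by compaction.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Lucene): list position plays the role of the docid. */
class ParallelSketch {
    /** Join two parallel indexes by position, the way a parallel view lines up docids. */
    static List<String> view(List<String> main, List<String> side) {
        if (main.size() != side.size()) {
            throw new IllegalStateException("indexes out of sync");
        }
        List<String> out = new ArrayList<>();
        for (int docId = 0; docId < main.size(); docId++) {
            out.add(main.get(docId) + "|" + side.get(docId));
        }
        return out;
    }

    /** Assumed incremental update: drop the old entry, append the new one at the end. */
    static List<String> deleteAndReAdd(List<String> index, int docId, String newValue) {
        List<String> copy = new ArrayList<>(index);
        copy.remove(docId);   // remove(int) removes by position
        copy.add(newValue);   // the new version lands at a new, highest position
        return copy;
    }

    /** Batch-style update: regenerate the whole side index in main-index order. */
    static List<String> rebuild(List<String> index, int docId, String newValue) {
        List<String> copy = new ArrayList<>(index);
        copy.set(docId, newValue);
        return copy;
    }
}
```

With main = [A-body, B-body, C-body] and side = [A-tag, B-tag, C-tag],
deleteAndReAdd on doc 0 of the side index pairs A-body with B-tag, while
rebuild keeps every pair aligned. That is the batch case described above.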

That's probably not too useful for a general-purpose platform like Solr.

Another way to support a more incremental model is perhaps to split up
the smaller volatile index into many segments so that updating a
single doc involves rewriting just that segment.
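
A rough sketch of the cost argument, in plain Java. It assumes, purely for
illustration, fixed-size segments laid out contiguously by docid (real Lucene
segments are not fixed-size); SegmentSketch, segmentOf and rewriteRange are
hypothetical names.

```java
/** Toy model: the volatile index is cut into fixed-size, contiguous segments. */
class SegmentSketch {
    /** Index of the segment that holds docId. */
    static int segmentOf(int docId, int segmentSize) {
        return docId / segmentSize;
    }

    /** Docid range [from, to) that has to be rewritten to update docId in place. */
    static int[] rewriteRange(int docId, int segmentSize, int maxDoc) {
        int from = segmentOf(docId, segmentSize) * segmentSize;
        int to = Math.min(from + segmentSize, maxDoc); // last segment may be short
        return new int[] {from, to};
    }
}
```

Under these assumptions, updating doc 1234 with 100-doc segments rewrites
docids 1200 through 1299 rather than the whole index, and the docids of every
other segment are untouched.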

There might also be possibilities in different types of IndexReader
implementations: one could map docids to maintain synchronization.
This brings up a slightly different problem: Lucene scorers expect
to advance in docid order.
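
The ordering problem can be sketched in plain Java as follows. This is a toy
model with made-up names (DocIdMapSketch, remap, remapSorted), not an
IndexReader implementation: sideToMain[i] is assumed to give the main-index
docid for side-index docid i.

```java
import java.util.Arrays;

/** Toy model of a docid-mapping layer between a side index and the main index. */
class DocIdMapSketch {
    /** Translate a postings list from side-index docids to main-index docids. */
    static int[] remap(int[] sidePostings, int[] sideToMain) {
        int[] out = new int[sidePostings.length];
        for (int i = 0; i < sidePostings.length; i++) {
            out[i] = sideToMain[sidePostings[i]];
        }
        return out;
    }

    /** True if ids are strictly increasing, which is what in-order consumers need. */
    static boolean isAscending(int[] ids) {
        for (int i = 1; i < ids.length; i++) {
            if (ids[i] <= ids[i - 1]) return false;
        }
        return true;
    }

    /** Remap, then sort, so the result can be consumed in docid order again. */
    static int[] remapSorted(int[] sidePostings, int[] sideToMain) {
        int[] out = remap(sidePostings, sideToMain);
        Arrays.sort(out);
        return out;
    }
}
```

With sideToMain = {2, 0, 1}, the side postings {0, 1, 2} remap to {2, 0, 1},
which is no longer ascending. The extra sort in remapSorted is the price of
the mapping in this sketch; a real implementation would have to pay it (or
avoid it) somewhere in the reader.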

-Yonik