Almost parallel indexes

Tim Sturge Thu, 27 Sep 2007 16:14:33 -0700

Hi,

I have an index which contains two very distinct types of fields:


- Some fields are large (documents with many terms) and change fairly slowly.
- Some fields are small (mostly titles, names, anchor text) and change fairly 
rapidly.

Right now I keep the large fields around in raw form, and when the small fields 
change I retokenize the large and the small fields together. The problem is 
that this retokenization takes up most of my CPU time, making the indexing 
process too slow (this index needs to track changes in almost real time; I'm 
using one of the reopen() patches from LUCENE-743 in JIRA to achieve this).

I can't really use ParallelReader to keep the two indexes in step; it requires 
me to add documents to both indexes, which means I have to retokenize the large 
fields anyway. What I really want is a "join" on an external id, and as far as 
I can tell, Lucene doesn't support that.
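
To make the "join on an external id" idea concrete, here is a minimal sketch in plain Python. It does not use any Lucene API: ToyIndex, search_both and the "doc-1" style ids are hypothetical names invented for illustration. It assumes each index maps terms to external ids, so that results from the slow-changing and fast-changing indexes can be intersected without the two indexes sharing document numbers.

```python
from collections import defaultdict


class ToyIndex:
    """Tiny in-memory inverted index keyed by external id (not a Lucene API)."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of external ids
        self.terms_by_id = {}              # external id -> terms currently indexed

    def put(self, ext_id, text):
        """Add or replace the document stored under ext_id."""
        self.delete(ext_id)
        terms = set(text.lower().split())
        self.terms_by_id[ext_id] = terms
        for term in terms:
            self.postings[term].add(ext_id)

    def delete(self, ext_id):
        """Remove ext_id from every posting list it appears in."""
        for term in self.terms_by_id.pop(ext_id, set()):
            self.postings[term].discard(ext_id)

    def search(self, term):
        """Return a copy of the external ids whose text contains term."""
        return set(self.postings.get(term.lower(), set()))


def search_both(large_index, small_index, large_term, small_term):
    """Join the two result sets on the shared external id."""
    return large_index.search(large_term) & small_index.search(small_term)


large = ToyIndex()
small = ToyIndex()
large.put("doc-1", "a long body about search engines")
large.put("doc-2", "a long body about databases")
small.put("doc-1", "Old Title")
small.put("doc-2", "Other Title")

# Only the small index is touched when a title changes;
# the large field is never retokenized.
small.put("doc-1", "New Title")

hits = search_both(large, small, "search", "new")
```

The trade-off in this sketch is that the join happens at query time, so a query spanning both kinds of field pays for a set intersection instead of the indexer paying for retokenization.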

Alternatively, what I'd like is a way to either store a pre-tokenized version 
of the large fields, or to be able to add fields to a document that come from 
an existing document in the index. 
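
The pre-tokenized variant can be sketched in the same way. The following is a minimal illustration in plain Python, not Lucene code: tokenize stands in for an expensive analyzer, and TokenCache and build_document are hypothetical helpers. It assumes the raw text of each large field is still available at update time, so a digest of it can show whether the cached tokens are still valid.

```python
import hashlib


def tokenize(text):
    """Stand-in for an expensive analyzer."""
    return text.lower().split()


class TokenCache:
    """Keeps the token list for each large field, keyed by external id."""

    def __init__(self):
        self._cache = {}          # ext_id -> (digest of raw text, tokens)
        self.tokenize_calls = 0   # counts how often the large field was analyzed

    def tokens_for(self, ext_id, raw_text):
        digest = hashlib.sha1(raw_text.encode("utf-8")).hexdigest()
        cached = self._cache.get(ext_id)
        if cached is not None and cached[0] == digest:
            return cached[1]      # unchanged large field: reuse the tokens
        self.tokenize_calls += 1
        tokens = tokenize(raw_text)
        self._cache[ext_id] = (digest, tokens)
        return tokens


def build_document(cache, ext_id, large_text, small_text):
    """Re-analyze only the small field; reuse cached tokens for the large one."""
    return {
        "id": ext_id,
        "body": cache.tokens_for(ext_id, large_text),
        "title": tokenize(small_text),
    }


cache = TokenCache()
body = "A very long body that rarely changes"
first = build_document(cache, "doc-1", body, "Old Title")
second = build_document(cache, "doc-1", body, "New Title")
```

Here the large field is analyzed once even though the document is rebuilt twice. Whether and how such cached tokens can be handed back to Lucene depends on the version in use and is not shown here.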

I suspect there is more to this question than meets the eye, but I'd be 
interested in any strategies that people have used in the past.

Thanks,

Tim
