Hi, I have an index which contains two very distinct types of fields:
- Some fields are large (many term documents) and change fairly slowly. - Some fields are small (mostly titles, names, anchor text) and change fairly rapidly. Right now I keep around the large fields in raw form and when the small fields change, I retokenize the large and the small fields together. The problem is that this retokenization is sucking up most of my CPU time, making the indexing process too slow (this index needs to track changes in almost real time; I'm using one of the reopen() patches from LUCENE-743 in JIRA to achieve this). I can't really use ParallelReader to keep the indexes the same; it requires me to add documents to both indexes which means I have to retokenize the large fields anyway. I would want to do a "join" on an external id, and as far as I can tell, Lucene doesn't support that. Alternatively, what I'd like is a way to either store a pre-tokenized version of the large fields, or to be able to add fields to a document that come from an existing document in the index. I suspect there is more to this question than meets the eye, but I'd be interested in any strategies that people have used in the past. Thanks, Tim --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
