Hello, I have a similar problem, for which ParallelReader looks like a good solution -- except for the problem of creating a set of indices with matching document numbers.
I want to augment the documents in an existing index with information that can be extracted from that same index. (Basically, I am indexing a mailing list archive and want to add keyword fields containing the message ids of followup messages to each document. That way, I could quickly link from the original message to its followups. Unfortunately, I don't know the ids of all followup messages until after I have indexed the whole archive.)

I tried to implement a FilterIndexReader that would add the required information, but couldn't get it to work. (I guess there is more to extending FilterIndexReader than just overriding the document() method and tacking a few more keyword fields onto the document before returning it.) When I add my FilterIndexReader to a new IndexWriter with the addIndexes() method, it seems to work, but when I try to optimize the new index, I get the following error:

    merging segments _0 (1900 docs)
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 100203040
        at java.util.ArrayList.get(ArrayList.java:326)
        at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
        at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:66)
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
        at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)
        at org.sebastiankirsch.thesis.util.MailFilterIndexReader.main(MailFilterIndexReader.java:210)

If I don't optimize the index, I don't get an error, but Luke cannot read the new index properly. I guess this has something to do with me messing with the documents without properly adjusting the index terms etc.

At the moment, I index the whole archive twice and use the information from the first index to add the missing fields to the second index.
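The first pass really just has to build a map from each message id to the ids of its followups, which can then be fed into the second pass (or a postprocessing step). A minimal, Lucene-free sketch of that bookkeeping, assuming each message exposes its Message-Id and the In-Reply-To header of its parent (all names here are hypothetical, not from any Lucene API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FollowupMap {
    /**
     * Builds a map from a message id to the ids of its followups.
     * Each entry of pairs is {messageId, inReplyTo}, where inReplyTo
     * is null for a thread starter.
     */
    public static Map<String, List<String>> build(String[][] pairs) {
        Map<String, List<String>> followups = new HashMap<>();
        for (String[] p : pairs) {
            String messageId = p[0];
            String inReplyTo = p[1];
            if (inReplyTo != null) {
                // Record this message as a followup of the message it answers.
                followups.computeIfAbsent(inReplyTo, k -> new ArrayList<>())
                         .add(messageId);
            }
        }
        return followups;
    }

    public static void main(String[] args) {
        String[][] archive = {
            {"<a@list>", null},        // thread starter
            {"<b@list>", "<a@list>"},  // followup to a
            {"<c@list>", "<a@list>"},  // another followup to a
        };
        // Followups of <a@list> are <b@list> and <c@list>.
        System.out.println(build(archive));
    }
}
```

The missing keyword fields for each document would then be looked up in this map during the second pass instead of re-deriving them from the archive.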
However, it would save me a lot of work (and processing power, of course) if I could just postprocess the index from the first pass without re-indexing the messages. Furthermore, it would open up the possibility of applying even more postprocessing passes. (I'm probably going to need that soon.) I presume that a ParallelReader could be merged into a single index using addIndexes()? So if the problem of keeping the doc numbers in sync can be solved ...

Alternatively, I would welcome hints on how to implement a FilterIndexReader properly.

Thanks very much for your time,

Sebastian

On Mon, May 30, 2005 at 11:32:13AM -0400, Robichaud, Jean-Philippe wrote:
> What about:
> http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/index/ParallelReader.java?rev=169859&view=markup

-- 
Sebastian Kirsch <[EMAIL PROTECTED]> [http://www.sebastian-kirsch.org/]