: > undesired words as a sort of stoplist. But surely there's a better way : > to do it (the inverted index structure seems like this should be : > natural). Any pointers would be most helpful.
I've never given this much thought, but i know that merging indexes can be done with IndexReaders, and I know i've seen a FilterIndexReader class which lets you proxy to another IndexReader but "filter out" some data by overriding methods and using FilterTermDocs, FilterTermEnum and FilterTermPositions ... would it be possibe to subclass FilterIndexReader with an instance that checked the docFreq of each term and ignored it if it was less then N ... then just merge your old index into a new (epty) index using this FilteredIndexReader... ? -Hoss --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]