Thanks for your reply. Yes, it's likely that many terms occur in few documents.
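John's suggestion in the quoted message below was to use a plain int array instead of a BitSet for sparse per-term doc sets. A minimal plain-Java sketch of that idea follows. It assumes each term's matching documents are available as a sorted int array of doc ids. The class and method names are hypothetical, and the code does not use Lucene's API. It compares the rough memory cost with a BitSet, then intersects two sparse lists by walking them in step, so the work depends on the list lengths rather than on the index size.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: per-term doc sets stored as sorted int arrays
 * instead of BitSets. Plain Java, not Lucene's API.
 */
public class SparseDocSets {

    /** Counts doc ids present in both sorted arrays by walking them in step. */
    static int intersectCount(int[] a, int[] b) {
        int i = 0, j = 0, count = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                count++;
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return count;
    }

    /** Returns the shared doc ids as a new sorted int array. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        int[] termDocs = {3, 17, 42, 999_999};       // hypothetical sparse posting list
        int[] queryDocs = {1, 3, 42, 500, 999_999};  // hypothetical user-query result
        // A java.util.BitSet sized for maxDoc needs about maxDoc / 8 bytes however few
        // bits are set; the int array needs 4 bytes per doc id it actually holds.
        System.out.println("BitSet bytes ~ " + maxDoc / 8 + ", int[] bytes = " + 4 * termDocs.length);
        // prints "BitSet bytes ~ 125000, int[] bytes = 16"
        System.out.println(Arrays.toString(intersect(termDocs, queryDocs))); // [3, 42, 999999]
        System.out.println(intersectCount(termDocs, queryDocs));             // 3
    }
}
```

With close to 1M mostly sparse terms, a fixed 125,000 bytes per term (about 125 GB in total) is why a BitSet per term is a poor fit here, while the int-array form grows only with the number of postings.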
If I understand you right, I should do the following:
- Write a HitCollector that simply increments a counter
- Get the filter for the user query once: new CachingWrapperFilter(new QueryWrapperFilter(userQuery));
- Create a TermQuery for each term
- Perform the search and read the counter of the HitCollector

I did that, but it didn't get any faster. Any ideas why?

Regards,
Chris

2009/10/12 John Wang <john.w...@gmail.com>

> Given you have 1M docs and about 1M terms, do you see very few docs per
> term?
> If your DocSet per term is very sparse, BitSet is probably not a good
> representation. A simple int array may be better for memory, and faster
> for iterating.
>
> -John
>
> On Mon, Oct 12, 2009 at 8:45 AM, Paul Elschot <paul.elsc...@xs4all.nl>
> wrote:
>
> > On Monday 12 October 2009 14:53:45 Christoph Boosz wrote:
> > > Hi,
> > >
> > > I have a question related to faceted search. My index contains more
> > > than 1 million documents, and nearly 1 million terms. My aim is to get
> > > a DocIdSet for each term occurring in the result of a query. I use the
> > > approach described at
> > > http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html
> > > in which a BitSet is built out of a QueryFilter for each term and
> > > intersected with the BitSet representing the user query.
> > > However, performance could be better. I guess it’s because the term
> > > filter considers each document in the index, even if it’s not in the
> > > result. My attempt to use a ChainedFilter, where the first filter
> > > (cached) is for the user query, and the second one for the term (done
> > > for all terms), didn’t speed things up, though.
> > > Am I missing something? Is there a better way to get the DocIdSets for
> > > a huge number of terms in a limited set of documents?
> >
> > Assuming you only need the number of documents within the original query
> > that contain each term, one thing that can be saved is the allocation of
> > the resulting BitSet for each term. To do this, use the cached BitSet (or
> > the OpenBitSet in current Lucene) for the original Query as a filter for a
> > TermQuery per term, and then count the matching documents by using a
> > counting HitCollector on the IndexSearcher.
> >
> > Regards,
> > Paul Elschot
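Paul's counting approach, which Chris's step list follows, can be modeled in plain Java as below. This is a hypothetical sketch, not Lucene code. The cached user-query filter is a java.util.BitSet, each term's postings are a sorted int array, and the made-up CountingCollector class stands in for a counting HitCollector. Only a counter is kept per term, so no per-term BitSet is allocated.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Plain-Java model of "TermQuery + cached filter + counting HitCollector".
 * All names are hypothetical; this is not Lucene's API.
 */
public class FilteredTermCounts {

    /** Stand-in for a HitCollector that only increments a counter per hit. */
    static final class CountingCollector {
        int count;

        void collect(int doc) {
            count++;
        }
    }

    /**
     * Walks one term's sorted posting list and collects only the docs that
     * are set in the cached user-query filter. No per-term BitSet is built.
     */
    static int countInFilter(int[] postings, BitSet queryFilter) {
        CountingCollector collector = new CountingCollector();
        for (int doc : postings) {
            if (queryFilter.get(doc)) {
                collector.collect(doc);
            }
        }
        return collector.count;
    }

    /** Counts per term; the filter is built once and reused for every term. */
    static Map<String, Integer> facetCounts(Map<String, int[]> postingsByTerm, BitSet queryFilter) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : postingsByTerm.entrySet()) {
            int c = countInFilter(e.getValue(), queryFilter);
            if (c > 0) {
                counts.put(e.getKey(), c); // keep only terms that occur in the result
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical user-query result, built once (the "cached" filter).
        BitSet queryFilter = new BitSet(1_000_000);
        queryFilter.set(3);
        queryFilter.set(42);
        queryFilter.set(500);

        // Hypothetical posting lists for three terms.
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("apple", new int[] {3, 17, 42});
        postings.put("pear", new int[] {8, 9});
        postings.put("plum", new int[] {500, 999_999});

        System.out.println(facetCounts(postings, queryFilter)); // {apple=2, plum=1}
    }
}
```

The loop still reads every posting of every term. With roughly 1M terms, walking the posting lists may dominate the cost, so saving the per-term allocation alone may not change the running time much. That is one possible reason the counting version was not faster.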