On Sunday 06 February 2005 20:00, Chris Hostetter wrote:
> : > care about their content. I only want to know a particular numeric
> : > field from document (id of document's category).
> : > I also need to know how many docs in category were found, so I can't
> : > index
> :
> : You should explore the use of IndexReader. Index your documents with
> : category id field, and use the methods on IndexReader to find all
> : unique categories (TermEnum).
>
> to expand on erik's suggestion: once you know the complete list of
> categories you iterate over then and execute your search once per
> category, filtering each time on the category Id (to determine the number
> of results from that category).

Nah, I did something a little trickier that promises to be faster (I have 12K categories now, and there will be more). I index each doc's category id as a zero-padded keyword. Then I search for documents, sorting them by category id, and iterate over Hits using this scheme:

1. I keep a cache that holds the ids of the documents in the current category.
2. Each time I see a doc id that is not in the current category, I read that document and reload the cache with its category's data.

So if I found docs in N categories (N is usually not big), I need to read exactly N docs from disk; the rest of the iteration through Hits is just checking the cache (because I sort by category).

It's a pity Lucene doesn't have IndexSearcher.search( Query, Sort, HitCollector ), but if I understood Hits properly, it gives me an O( log2 ( doc_num ) ) performance impact per result set, which is perfectly acceptable.
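
Roughly, the caching scheme described above looks like the sketch below. It is plain Java with no Lucene calls, so it only illustrates the idea: the list of doc ids stands in for Hits sorted by category id, the docCategory map stands in for reading a document's stored category field from disk, and the categoryDocs map stands in for the per-category data loaded into the cache. The names CategoryCount, padCategoryId, countByCategory and documentReads are made up for illustration, and the 10-digit padding width is an assumption.

```java
import java.util.*;

class CategoryCount {
    // Simulated "read document from disk" operations during the last run.
    static int documentReads = 0;

    // Zero-pad the id so that string order of the keyword matches numeric
    // order (assumes non-negative ids; the width of 10 is an assumption).
    static String padCategoryId(int id) {
        return String.format(Locale.ROOT, "%010d", id);
    }

    // hitDocIds must already be ordered by category id, as a sorted search
    // would return them. Returns category keyword -> number of hits.
    static Map<String, Integer> countByCategory(
            List<Integer> hitDocIds,
            Map<Integer, String> docCategory,
            Map<String, Set<Integer>> categoryDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Set<Integer> cache = Collections.emptySet(); // doc ids of current category
        String current = null;
        documentReads = 0;
        for (int docId : hitDocIds) {
            if (!cache.contains(docId)) {
                // Cache miss: a new category starts here, so "read" the
                // document once and reload the cache with its category data.
                current = docCategory.get(docId);
                documentReads++;
                cache = categoryDocs.getOrDefault(current, Collections.emptySet());
            }
            counts.merge(current, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String a = padCategoryId(7);
        String b = padCategoryId(12);
        Map<String, Set<Integer>> categoryDocs = new HashMap<>();
        categoryDocs.put(a, new HashSet<>(Arrays.asList(1, 2, 3)));
        categoryDocs.put(b, new HashSet<>(Arrays.asList(4, 5)));
        Map<Integer, String> docCategory = new HashMap<>();
        for (int id : Arrays.asList(1, 2, 3)) docCategory.put(id, a);
        for (int id : Arrays.asList(4, 5)) docCategory.put(id, b);

        // Hits for some query, already sorted by category keyword.
        List<Integer> hits = Arrays.asList(2, 3, 1, 5);
        System.out.println(countByCategory(hits, docCategory, categoryDocs));
        System.out.println("document reads: " + documentReads);
    }
}
```

The point of the sort is that all hits from one category arrive together, so the cache is reloaded once per category found and not once per hit.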