Hi Karsten,

My index contains about 100M documents, and I'm trying to count results on around 300 facets. At the moment I'm keeping a set of cached facet bitsets and then comparing the query result against those bitsets. Performance is pretty lousy: it takes more than 2s to calculate the cardinality of the main query against those 300 facets.
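
For reference, the bitset comparison described above can be sketched roughly as follows. This is only an illustration under assumptions: `java.util.BitSet` stands in for whatever bitset class is actually in use, and the names `BitSetFacetCount`, `count`, `source-a` and `source-b` are hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one cached BitSet per facet, intersected with the query result.
class BitSetFacetCount {

    /** Returns, for each facet, how many query hits also belong to that facet. */
    static Map<String, Integer> count(BitSet queryHits, Map<String, BitSet> facetBits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> facet : facetBits.entrySet()) {
            // Work on a copy so the query result itself is left unmodified.
            BitSet overlap = (BitSet) queryHits.clone();
            overlap.and(facet.getValue());
            counts.put(facet.getKey(), overlap.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet();
        hits.set(1); hits.set(3); hits.set(5);

        BitSet sourceA = new BitSet();
        sourceA.set(1); sourceA.set(2); sourceA.set(3);
        BitSet sourceB = new BitSet();
        sourceB.set(5); sourceB.set(6);

        Map<String, BitSet> facets = new LinkedHashMap<>();
        facets.put("source-a", sourceA);
        facets.put("source-b", sourceB);

        System.out.println(count(hits, facets)); // prints {source-a=2, source-b=1}
    }
}
```

In this form every facet needs a pass over a full-length bitset. With 100M documents that is 100,000,000 / 64 = 1,562,500 64-bit words per facet, or roughly 469M word operations for 300 facets, no matter how few documents the query matched.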
I have two possible datasets to use for the facets. One is an integer and the other is a short string (about 10 characters). The taxonomy solution seems interesting, but it might be overkill since there is really no hierarchical relationship between these facets.

I could count the facets manually by implementing a HitCollector, but the javadocs warn (pretty strenuously) about reading the content of a document inside a HitCollector. Is this something I should be worried about, or is it an inevitable part of the solution?

Thanks!

-Dave

-----Original Message-----
From: Karsten F. [mailto:karsten-luc...@fiz-technik.de]
Sent: Saturday, April 18, 2009 10:58 AM
To: java-user@lucene.apache.org
Subject: Re: Faceting, Sort and DocIDSet

Hi Dave,

Searching and sorting in Lucene are two separate functions (if you do not want to sort by relevance). You will not lose performance if you first search with a BitSet as HitCollector and then sort the result by DateField. But it is easier to extend TopFieldDocCollector/TopFieldCollector into a Collector with facet counts.

Sujit Pal's implementation of facet counting is a good idea if you have a small number of facets and a lot of documents for each facet.

I know of half a dozen implementations of facet browsing. To choose the best one you have to know:
- How many different values does the facet have? Which kind of value (Integer, small String, huge String)?
- Is there more than one value of the facet per document? How many on average?

Possibly http://www.nabble.com/Taxonomy-in-Lucene-td20929487.html is also interesting for you.

Best regards
Karsten

David Seltzer wrote:
>
> I have a set of indexes, each index contains a month's worth of
> Articles. I need to be able to search the index (sorting by date) and
> then apply access-filters based on the Article Source. I'm also trying
> to get result counts for each Article Source.
> So my questions:
> 1) How do I use a HitCollector and sort by a field?
> 2) Is using BitSets the wrong way to quickly generate facet counts?
> I've read about DocIDSets, but I'm not sure how to use them in the same way.
> (I'm basing my faceting technique on Sujit Pal's article
> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html)
>
> Thanks!
>
> -Dave
>

--
View this message in context: http://www.nabble.com/Faceting%2C-Sort-and-DocIDSet-tp23099854p23113784.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
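
On the question of counting facets manually inside a collector: one way to avoid reading stored documents per hit is to keep a per-document array of facet ids in memory and bump a counter for each hit. The sketch below is an assumption-laden illustration, not a Lucene API: `facetOfDoc` is a hypothetical array loaded once up front (the kind of per-document int array a field cache could supply for the integer field), each document is assumed to carry exactly one facet value, and `OrdinalFacetCount` and `count` are made-up names.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch: single pass over the hits, one array lookup per hit.
class OrdinalFacetCount {

    /**
     * @param queryHits  doc ids matched by the query
     * @param facetOfDoc facetOfDoc[docId] is the facet id (0..numFacets-1) of that document
     * @param numFacets  number of distinct facet ids
     * @return counts[f] = number of hits whose facet id is f
     */
    static int[] count(BitSet queryHits, int[] facetOfDoc, int numFacets) {
        int[] counts = new int[numFacets];
        // Visit only the set bits, i.e. only the documents that actually matched.
        for (int doc = queryHits.nextSetBit(0); doc >= 0; doc = queryHits.nextSetBit(doc + 1)) {
            counts[facetOfDoc[doc]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] facetOfDoc = {1, 0, 1, 2, 0}; // facet id for docs 0..4
        BitSet hits = new BitSet();
        hits.set(0); hits.set(2); hits.set(3);
        System.out.println(Arrays.toString(count(hits, facetOfDoc, 3))); // prints [0, 2, 1]
    }
}
```

The work here grows with the number of hits rather than with facets times index size. The same increment could be done directly in a collector's per-hit callback instead of looping over a bitset afterwards. The trade-off is memory: an `int[]` over 100M documents takes about 400 MB, and documents with several facet values would need a different layout.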