David,

One suggestion for your large index: is it possible to index these documents ordered by date (and to ingest new docs in date order)?
This way index order = date order, and you can do this sort very quickly by using Sort.INDEXORDER.

With huge indexes I try to see if there's a way I can have the index sorted in some meaningful way, so I can use this trick for the most common sort case.

Hope this helps,
Robert

On Mon, Apr 20, 2009 at 10:12 AM, David Seltzer <dselt...@tveyes.com> wrote:

> Hi Karsten,
>
> My index contains about 100M documents, and I'm trying to count results
> on around 300 facets. At the moment I'm keeping a set of cached facet
> bitsets and then comparing the query result against those bitsets.
> Performance is pretty lousy. It takes more than 2s to calculate the
> cardinality of the main query against those 300 facets.
>
> I have two possible datasets to use for the facets. One is an integer
> and the other is a short string (about 10 characters).
>
> The taxonomy solution seems interesting but it might be overkill since
> there is really no hierarchical relationship between these facets.
>
> I could count the facets manually by implementing a HitCollector, but
> the javadocs warn (pretty strenuously) about reading the content of a
> document inside a HitCollector. Is this something I should be worried
> about, or is it an inevitable part of the solution?
>
> Thanks!
>
> -Dave
>
> -----Original Message-----
> From: Karsten F. [mailto:karsten-luc...@fiz-technik.de]
> Sent: Saturday, April 18, 2009 10:58 AM
> To: java-user@lucene.apache.org
> Subject: Re: Faceting, Sort and DocIDSet
>
>
> Hi Dave,
>
> Searching and sorting in Lucene are two separate functions (if you do
> not want to sort by relevance).
> You will not lose performance if you first search with a BitSet as
> HitCollector and then sort the result by DateField.
> But it is easier to extend TopFieldDocCollector/TopFieldCollector to a
> Collector with facet count.
>
> Sujit Pal's implementation of facet count is a good idea if you have a
> small number of facets and a lot of documents for each facet.
>
> I know half a dozen implementations of facet browsing.
> To choose the best you have to know:
> - How many different values does the facet have? Which kind of value
>   (Integer, small String, huge String)?
> - Is there more than one value of the facet per document? How many on
>   average?
>
> Possibly
> http://www.nabble.com/Taxonomy-in-Lucene-td20929487.html
> is also interesting for you.
>
> Best regards
> Karsten
>
>
> David Seltzer wrote:
> >
> > I have a set of indexes, each index contains a month's worth of
> > Articles. I need to be able to search the index (sorting by date) and
> > then apply access-filters based on the Article Source. I'm also trying
> > to get result counts for each Article Source.
> > So my questions:
> > 1) How do I use a HitCollector and sort by a field?
> > 2) Is using BitSets the wrong way to quickly generate facet counts?
> > I've read about DocIDSets, but I'm not sure how to use them in the
> > same way.
> > (I'm basing my faceting technique on Sujit Pal's article
> > http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html)
> >
> > Thanks!
> >
> > -Dave
> >
>
> --
> View this message in context:
> http://www.nabble.com/Faceting%2C-Sort-and-DocIDSet-tp23099854p23113784.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>

--
Robert Muir
rcm...@gmail.com
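
P.S. To make the index-order suggestion at the top of this message concrete, here is a minimal sketch in plain Java. It is not Lucene API: it only assumes that documents were added in date order, so a larger doc ID means a later date, and that the hits of a query are already held in a java.util.BitSet keyed by doc ID (as in the cached-bitset approach Dave describes). The class and method names (IndexOrderSketch, oldestFirst, newestFirst) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Sketch only: assumes doc IDs were assigned in date order at index time,
// so walking the hit bitset in doc ID order is the same as sorting by date.
final class IndexOrderSketch {

    // Returns up to n matching doc IDs, oldest first (ascending doc ID).
    static List<Integer> oldestFirst(BitSet hits, int n) {
        List<Integer> top = new ArrayList<>();
        for (int doc = hits.nextSetBit(0);
             doc >= 0 && top.size() < n;
             doc = hits.nextSetBit(doc + 1)) {
            top.add(doc);
        }
        return top;
    }

    // Returns up to n matching doc IDs, newest first (descending doc ID).
    static List<Integer> newestFirst(BitSet hits, int n) {
        List<Integer> top = new ArrayList<>();
        // length() - 1 is the highest set bit, or -1 when the set is empty.
        for (int doc = hits.length() - 1;
             doc >= 0 && top.size() < n;
             doc = hits.previousSetBit(doc - 1)) {
            top.add(doc);
        }
        return top;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet();
        hits.set(2);
        hits.set(5);
        hits.set(9);
        hits.set(14);
        System.out.println(newestFirst(hits, 3)); // prints [14, 9, 5]
        System.out.println(oldestFirst(hits, 2)); // prints [2, 5]
    }
}
```

The point of the sketch is that no per-document date lookup and no comparison sort is needed: the cost depends on walking the bitset until n hits are found, not on the total number of hits. It only holds as long as every new document really is ingested in date order.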