Hi Karsten,

My index contains about 100M documents, and I'm trying to count results
on around 300 facets. At the moment I'm keeping a set of cached facet
bitsets and then comparing the query result against those bitsets.
Performance is pretty lousy. It takes more than 2s to calculate the
cardinality of the main query against those 300 facets.
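
For reference, the current approach looks roughly like the sketch below.
It uses plain java.util.BitSet rather than any Lucene class, and the
names (FacetBitsetCount, countFacet, countAll) are made up for
illustration. Each facet costs a clone plus a full AND over a bitset
sized to the whole index. If the bitsets are dense, 100M documents is
about 12.5 MB per bitset, so 300 facets means touching several gigabytes
per query, which may explain the 2s.

```java
import java.util.Arrays;
import java.util.BitSet;

// Illustrative only: one cached BitSet per facet, intersected with the query hits.
class FacetBitsetCount {

    // Number of docs present in both sets. Works on a copy so that
    // neither the query result nor the cached facet set is modified.
    static int countFacet(BitSet queryHits, BitSet facetDocs) {
        BitSet tmp = (BitSet) queryHits.clone();
        tmp.and(facetDocs);
        return tmp.cardinality();
    }

    // One intersection per facet: cost grows with facets.length * index size.
    static int[] countAll(BitSet queryHits, BitSet[] facets) {
        int[] counts = new int[facets.length];
        for (int i = 0; i < facets.length; i++) {
            counts[i] = countFacet(queryHits, facets[i]);
        }
        return counts;
    }

    public static void main(String[] args) {
        BitSet hits = new BitSet();
        hits.set(1); hits.set(3); hits.set(5); hits.set(7);

        BitSet facetA = new BitSet();
        facetA.set(1); facetA.set(2); facetA.set(3);

        BitSet facetB = new BitSet();
        facetB.set(7); facetB.set(8);

        System.out.println(Arrays.toString(countAll(hits, new BitSet[] {facetA, facetB})));
    }
}
```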

I have two possible datasets to use for the facets. One is an integer
and the other is a short string (about 10 characters).

The taxonomy solution seems interesting but it might be overkill since
there is really no hierarchical relationship between these facets.

I could count the facets manually by implementing a HitCollector, but
the Javadocs warn (pretty strenuously) against reading the content of a
document inside a HitCollector. Is this something I should be worried
about, or is it an inevitable part of the solution?
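
To make the question concrete, this is the kind of manual counting I
have in mind, as a sketch only. It assumes each document has exactly one
facet value, that the integer values are dense ordinals in
0..numFacets-1, and that they were loaded into an int array once up
front (for example via something like Lucene's FieldCache), so nothing
reads a stored document while collecting. FacetCounter and its collect
method are hypothetical names that mimic a hit collector's per-document
callback; they are not Lucene API.

```java
// Illustrative only: single-pass facet counting over the hits of one query.
class FacetCounter {

    // facetOfDoc[docId] = facet ordinal of that document (assumed preloaded).
    private final int[] facetOfDoc;
    private final int[] counts;

    FacetCounter(int[] facetOfDoc, int numFacets) {
        this.facetOfDoc = facetOfDoc;
        this.counts = new int[numFacets];
    }

    // Called once per matching document; only an array lookup and an increment.
    void collect(int docId) {
        counts[facetOfDoc[docId]]++;
    }

    // counts[f] = number of collected documents whose facet ordinal is f.
    int[] counts() {
        return counts;
    }
}
```

The work here is proportional to the number of hits rather than to
300 passes over the whole index, at the price of keeping one int per
document in memory (roughly 400 MB for 100M documents).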

Thanks!

-Dave

-----Original Message-----
From: Karsten F. [mailto:karsten-luc...@fiz-technik.de] 
Sent: Saturday, April 18, 2009 10:58 AM
To: java-user@lucene.apache.org
Subject: Re: Faceting, Sort and DocIDSet


Hi Dave,

Searching and sorting in Lucene are two separate functions (unless you
want to sort by relevance).
You will not lose performance if you first search with a BitSet as
HitCollector and then sort the result by DateField.
But it is easier to extend TopFieldDocCollector/TopFieldCollector into a
Collector that also counts facets.

Sujit Pal's implementation of facet counting is a good idea if you have a
small number of facets and a lot of documents for each facet.

I know half a dozen implementations of facet browsing.
To choose the best one you have to know:
 - How many different values does the facet have? Which kind of value
   (Integer, small String, huge String)?
 - Is there more than one value of the facet per document? How many on
   average?
 
Possibly
http://www.nabble.com/Taxonomy-in-Lucene-td20929487.html
is also interesting for you.

Best regards
  Karsten


David Seltzer wrote:
> 
> I have a set of indexes, each index contains a month's worth of
> Articles. I need to be able to search the index (sorting by date) and
> then apply access-filters based on the Article Source. I'm also trying
> to get result counts for each Article Source.
> So my questions: 
> 1) How do I use a HitCollector and sort by a field? 
> 2) Is using BitSets the wrong way to quickly generate facet counts? I've
> read about DocIDSets, but I'm not sure how to use them in the same way.
> (I'm basing my faceting technique on Sujit Pal's article
> http://sujitpal.blogspot.com/2007/04/lucene-search-within-search-with.html)
> 
> Thanks!
> 
> -Dave
> 

-- 
View this message in context:
http://www.nabble.com/Faceting%2C-Sort-and-DocIDSet-tp23099854p23113784.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

