RE: Faceting, Sort and DocIDSet

Karsten F. Mon, 20 Apr 2009 13:00:30 -0700

Hi David,

correct: you should avoid reading the content of a document inside a
HitCollector.
Normally that means caching everything you need in main memory. A very simple
and fast case is a facet with only 255 possible values and exactly one value
per document. In this case you only need a byte[IndexReader.maxDoc()] array in
the cache and an int[256] array for collecting the results
(we have 5 GByte to run Lucene with a couple of facets).
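
The counting scheme described above could be sketched roughly as follows. This is only an illustration, not code from our system: the class name ByteFacetCounter and its methods are made up, and the byte[] cache is assumed to have been filled once per index (length IndexReader.maxDoc()) before any search runs. The collect(int doc) method is what you would call from your HitCollector for every hit.

```java
import java.util.Arrays;

// Hypothetical helper: counts facet values for a facet with exactly one
// value per document, where each value fits in one byte.
class ByteFacetCounter {
    // valueByDoc[doc] holds the facet value of document number 'doc'.
    // Assumption: filled once up front, with length IndexReader.maxDoc().
    private final byte[] valueByDoc;

    // One counter per possible byte value.
    private final int[] counts = new int[256];

    ByteFacetCounter(byte[] valueByDoc) {
        this.valueByDoc = valueByDoc;
    }

    // Call this for every hit; it only touches the in-memory arrays
    // and never loads the document itself.
    void collect(int doc) {
        // Mask with 0xFF because Java bytes are signed.
        counts[valueByDoc[doc] & 0xFF]++;
    }

    // Number of hits that carry the given facet value (0..255).
    int count(int value) {
        return counts[value];
    }

    // Clear the counters before the next search.
    void reset() {
        Arrays.fill(counts, 0);
    }
}
```

The memory cost is one byte per document for each facet, so a 100M document index would need about 100 MByte per facet of this kind.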

About "facet": for me a facet corresponds to a field in Lucene. So 300
facets are quite a lot.
Or did you mean two facets with 150 values each?

To find a good solution for your 100M documents, I have three questions:
 - How many hits per search?
 - More than one value of the facet per document? If so, how many on average?

INDEXORDER means document number.
MultiSearcher also works fine:
if you have one index per year, and within each of these indices the
index order follows the date, then the MultiSearcher will also have the
correct INDEXORDER.
Take a look at the variable "int[] starts" in MultiSearcher.
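
To illustrate the idea behind "starts", here is a small sketch of how per-index document numbers can be mapped to one global numbering by adding an offset per sub-index. It is not the MultiSearcher source: the class DocIdOffsets and its method names are invented for this example, and a simple linear scan is used for the reverse lookup.

```java
// Hypothetical illustration of offset-based document numbering
// across several sub-indices.
class DocIdOffsets {
    // starts[i] is the global document number of the first document
    // in sub-index i; the last entry holds the total document count.
    private final int[] starts;

    // maxDocs[i] is the maxDoc() value of sub-index i, in the order
    // the sub-indices are combined.
    DocIdOffsets(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        int sum = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i] = sum;
            sum += maxDocs[i];
        }
        starts[maxDocs.length] = sum;
    }

    // Global document number of a document inside one sub-index.
    int globalDoc(int subIndex, int localDoc) {
        return starts[subIndex] + localDoc;
    }

    // Sub-index that contains the given global document number.
    int subIndex(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc out of range: " + globalDoc);
        }
        int i = 0;
        while (globalDoc >= starts[i + 1]) {
            i++;
        }
        return i;
    }
}
```

With sub-indices of 100, 50 and 70 documents, the first document of the second index gets global number 100. Every document of an earlier sub-index therefore sorts before every document of a later one, so the results are not interleaved as long as the yearly indices are combined in date order.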

David Seltzer wrote:
> 
> 
> Is INDEXORDER based on the DocumentID within each individual index? If so
> then the results could be interleaved. Anyone know how this behaves?
> 
> 

