Hi Yonik,

Thanks for the thorough reply.

A few more quick questions...
   
  "the number of facets to check per request
average somewhere between 100 and 200 (the total number of unique
facets is much larger though). "
   
You mean 100-200 different categories to facet on?
   
I ran the test on a 600,000-doc index. The cool thing about my solution is that the total doc count is not very relevant. I will be checking this with much larger indexes, probably 10x the size of my initial testing, and algorithmically I don't expect much of a performance drop-off, because response time is affected by the result set size rather than the index size (since I cache all faceted values on startup).
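For what it's worth, the startup-cache idea can be sketched roughly like this (all names are illustrative, not my actual code): build one array at startup mapping each doc id to the ordinal of its facet value, then walk only the result docs per query, so the cost is O(|results|) rather than O(maxDoc).

```java
// Sketch of counting facets in O(|results|) using values cached at startup.
// All names here are illustrative, not the actual implementation.
public class FacetCountSketch {
    // Built once at startup: facetOrdOfDoc[docId] = ordinal of that doc's facet value.
    static int[] countFacets(int[] facetOrdOfDoc, int[] resultDocs, int numValues) {
        int[] counts = new int[numValues];
        for (int doc : resultDocs) {
            counts[facetOrdOfDoc[doc]]++;   // one array lookup per hit
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] ordOfDoc = {0, 1, 1, 2, 0, 1};   // 6 docs, 3 facet values
        int[] results  = {1, 2, 5};            // hits for some query
        int[] counts   = countFacets(ordOfDoc, results, 3);
        System.out.println(java.util.Arrays.toString(counts)); // [0, 3, 0]
    }
}
```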
   
As for the 500 ms, this is basically what I do in that time:
   
1. In each search instance: initially send a query and return the top 100 docs, and start a separate thread to collect full facet values (I do this by re-sending the same query with maxDoc as the number of results to return... can I save this re-querying somehow?)
   
2. Then merge all instances' docs using a custom parallel multi-searcher.
   
3. For the top 100 docs, I calculate which doc came from which instance.
   
4. I send the doc ids back to each instance and have each instance create facets on its docs from the top 100.
   
5. Each instance returns this info; I then go back to the instances and pass them the top 20 terms of each facet for the actual facet counts...
   
I do this so that the facet counts I display come from good docs. I am trying to avoid a situation where I receive 5,000 results and 4,500 of them, with awful rankings, share the same facet values, so the facets displayed in the UI would reflect badly ranked docs.
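The merge in steps 4-5 above can be sketched roughly like this (a simplified, single-process stand-in; the class and method names are all hypothetical, and the real per-instance calls would be remote):

```java
import java.util.*;

// Simplified single-process sketch of steps 4-5: each instance returns a
// facet-value -> count map for its share of the top docs; we merge the maps
// and keep the top terms. All class and method names here are hypothetical.
public class FacetMergeSketch {

    // Merge the per-instance count maps (step 4's responses).
    static Map<String, Integer> mergeCounts(List<Map<String, Integer>> perInstance) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> counts : perInstance) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return merged;
    }

    // Pick the top-k facet terms by merged count (step 5's "top 20 terms").
    static List<String> topTerms(Map<String, Integer> merged, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(merged.entrySet());
        entries.sort((a, b) -> b.getValue() - a.getValue());
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Integer> merged = mergeCounts(Arrays.asList(
                Map.of("music", 2, "books", 1),     // instance 1's counts
                Map.of("music", 2, "video", 3)));   // instance 2's counts
        System.out.println(topTerms(merged, 2)); // prints [music, video]
    }
}
```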
   
Confusing, I know!
   
However, I will look into your implementation; it sounds solid. I am currently on Lucene 1.4.3 (which classes should I look at in Solr?)
   
Comments welcome.
   
Thanks in advance!
  

Yonik Seeley <[EMAIL PROTECTED]> wrote:
  On 6/10/06, zzzzz shalev wrote:
> 1. could you let me know what kind of response time you were getting with 
> solr (as well as the size of data and result sizes)

I can tell you a little bit about ours... on one CNET faceted browsing
implementation using Solr, the number of facets to check per request
averages somewhere between 100 and 200 (the total number of unique
facets is much larger, though). The median request time is 3 ms (and I
don't think the majority of that time is spent calculating set
intersections).

We actually don't have the LRUCaches set large enough to achieve a
100% hit rate, but performance is still fine.

> 2. i took a really really quick look at DocSetHitCollector and saw the dreaded
>
> if (bits==null) bits = new BitSet(maxDoc);

Yes, DocSets can be memory intensive. A BitSet is only used when the
number of results gets larger than a threshold... below that, a
HashDocSet is used that is O(n) rather than O(maxDoc). So the memory
footprint also depends on the cardinality of the sets.
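The small-set case can be illustrated like this (illustrative code only, not Solr's actual HashDocSet): probing the large set once per member of the small id list makes the intersection cost O(n) in the small set's size, independent of maxDoc.

```java
import java.util.BitSet;

// Illustration of the small-set case: intersection count is O(n) in the
// small set's size, independent of maxDoc. Not Solr's actual HashDocSet.
public class SmallSetSketch {
    static int intersectionCount(int[] smallDocIds, BitSet largeSet) {
        int count = 0;
        for (int doc : smallDocIds) {
            if (largeSet.get(doc)) count++;   // one probe per small-set member
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet large = new BitSet(1_000_000);  // maxDoc-sized set
        large.set(3); large.set(10); large.set(999_999);
        int[] small = {3, 10, 42};
        System.out.println(intersectionCount(small, large)); // 2
    }
}
```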

> since i rewrote some lucene code to support 64-bit search instances i have 
> indexes that may reach quite a few GB's ,

GBs of index size, or actually billions of documents? It's the number
of documents that matters in this case.

> allocating bitset's (arrays of long's is quite expensive memory wise and i am 
> still a little
> skeptical about performance with large result sets)

I just checked in a replacement for BitSet that takes intersection
counts much faster.
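The usual way to make intersection counts fast over packed 64-bit words is a word-wise AND plus popcount, with no intermediate result set materialized; a generic sketch of that trick (not necessarily the new Solr class):

```java
// Generic sketch of counting intersection bits over packed 64-bit words,
// avoiding the cost of building an intermediate result set; not necessarily
// what Solr's new class does internally.
public class BitIntersectSketch {
    static long intersectionCount(long[] a, long[] b) {
        long count = 0;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            count += Long.bitCount(a[i] & b[i]);  // popcount of shared bits
        }
        return count;
    }

    public static void main(String[] args) {
        long[] a = {0b1011L, -1L};   // bits 0,1,3 set, plus one full word
        long[] b = {0b0011L, 0L};    // bits 0,1 set
        System.out.println(intersectionCount(a, b)); // 2
    }
}
```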

> i did some testing of my facet impl and after an overnight webload session 
> received about a 500 milli response time average for full faceting (with 
> result sets from a few thousand to over 100,000)

How many documents was that with, and how many facets per document?

I certainly am interested in more memory efficient faceted browsing,
and have been meaning to try some alternatives. So far, we've had
good results using cached DocSets though.


-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server



