Tim (and we should move this to java-dev if it gains traction),

Perhaps you can come up with a mechanism to perform scoring in two passes 
instead of one:
- first pass is cheap and fast
- second pass is more expensive and slower

Currently there is no choice: Lucene effectively does only the second, 
expensive pass on every hit.  But perhaps you can come up with a generic way 
to add the cheap first pass?
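
To make the idea concrete, here is a rough sketch in plain Java. It is not Lucene code: TwoPassScoring, ScoredDoc, search and the two scorer functions are all hypothetical names, and the matching doc ids are simply passed in as an array. Pass 1 keeps only the best `candidates` hits by a cheap score, and pass 2 runs the expensive score on those survivors alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntToDoubleFunction;

/** Hypothetical two-pass ranking sketch; none of these names are Lucene APIs. */
final class TwoPassScoring {

    /** A document id paired with the score it received in one pass. */
    static final class ScoredDoc {
        final int docId;
        final double score;

        ScoredDoc(int docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Orders the "worst" hit first: lowest score, and on ties the higher doc id.
    private static final Comparator<ScoredDoc> WORST_FIRST =
            Comparator.comparingDouble((ScoredDoc d) -> d.score)
                    .thenComparing(Comparator.comparingInt((ScoredDoc d) -> d.docId).reversed());

    /** Returns the k best docs under the given scorer, best first. */
    private static List<ScoredDoc> topK(int[] docs, IntToDoubleFunction scorer, int k) {
        PriorityQueue<ScoredDoc> heap = new PriorityQueue<>(WORST_FIRST);
        for (int doc : docs) {
            heap.add(new ScoredDoc(doc, scorer.applyAsDouble(doc)));
            if (heap.size() > k) {
                heap.poll(); // evict the current worst so the heap never exceeds k entries
            }
        }
        List<ScoredDoc> best = new ArrayList<>(heap);
        best.sort(WORST_FIRST.reversed());
        return best;
    }

    /**
     * Pass 1 ranks every match with the cheap scorer and keeps `candidates` of them;
     * pass 2 applies the expensive scorer to those survivors only and returns the top `topN`.
     */
    static List<ScoredDoc> search(int[] matchingDocs,
                                  IntToDoubleFunction cheapScore,
                                  IntToDoubleFunction expensiveScore,
                                  int candidates,
                                  int topN) {
        List<ScoredDoc> firstPass = topK(matchingDocs, cheapScore, candidates);
        int[] survivors = firstPass.stream().mapToInt(d -> d.docId).toArray();
        return topK(survivors, expensiveScore, topN);
    }
}
```

With a million matches and candidates set to, say, 10000, the expensive scorer would run 10000 times instead of a million. Whether that helps depends entirely on finding a cheap score that is a reasonable proxy for the real one, which is an assumption of this sketch.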

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Tim Sturge <[EMAIL PROTECTED]>
> To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> Sent: Thursday, December 4, 2008 3:27:30 PM
> Subject: Slow queries with lots of hits
> 
> Hi all,
> 
> I have an interesting problem with my query traffic. Most of the queries run
> in a fairly short amount of time (< 100ms) but a few take over 1000ms. These
> queries are predominantly those with a huge number of hits (>1 million hits
> in a >100 million document index). The time taken (as far as I can tell) is
> for lucene to sit there while it scores and sorts all these results.
> 
> However it turns out these queries really don't have top results. That is,
> of the million documents, there are easily 10000 which are decent results
> (basically those above some threshold score). Frankly, just returning some
> consistent (so paging and reload work) but
> otherwise arbitrary ranking of these 10000 results would be more than good
> enough.
> 
> It seems to me that a solution would be to impose some sort of pseudo-random
> filter (e.g. consider only every n-th document assuming they are uniformly
> distributed). I'm wondering if anyone else has experience with this sort of
> issue and what solutions they have found to work well in practice.
> 
> Thanks,
> 
> Tim
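
The every-n-th-document filter described in the quoted message could look roughly like the following. This is a minimal sketch under stated assumptions: EveryNthFilter, sample and page are hypothetical names, not Lucene classes, and the matching doc ids are assumed to arrive as a sorted int array. Because the filter depends only on the doc id, the same query returns the same subset every time, so paging and reload stay consistent.

```java
import java.util.Arrays;

/** Hypothetical deterministic sampling sketch; not part of Lucene. */
final class EveryNthFilter {

    /** Keeps only the doc ids divisible by n; the same input always yields the same output. */
    static int[] sample(int[] matchingDocs, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return Arrays.stream(matchingDocs).filter(doc -> doc % n == 0).toArray();
    }

    /** Returns one page of the sampled ids, in their existing (arbitrary but stable) order. */
    static int[] page(int[] sampled, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize < 0) {
            throw new IllegalArgumentException("pageIndex and pageSize must be non-negative");
        }
        int from = (int) Math.min((long) pageIndex * pageSize, sampled.length);
        int to = Math.min(from + pageSize, sampled.length);
        return Arrays.copyOfRange(sampled, from, to);
    }
}
```

One caveat: Lucene's internal document numbers can change when segments are merged, so the subset is only stable while the index is unchanged. Sampling on a stable application-level id would avoid that, at the cost of having to read that id for each hit.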


