Tim (and we should move this to java-dev if it gains traction),

Perhaps you can come up with a mechanism to perform scoring in two passes instead of one:
- first pass is cheap and fast
- second pass is more expensive and slower
Currently, there is no choice - Lucene does 2). But perhaps you can come up with a generic way to do 1)?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Tim Sturge <[EMAIL PROTECTED]>
> To: "java-user@lucene.apache.org" <java-user@lucene.apache.org>
> Sent: Thursday, December 4, 2008 3:27:30 PM
> Subject: Slow queries with lots of hits
>
> Hi all,
>
> I have an interesting problem with my query traffic. Most of the queries run
> in a fairly short amount of time (< 100ms) but a few take over 1000ms. These
> queries are predominantly those with a huge number of hits (>1 million hits
> in a >100 million document index). The time taken (as far as I can tell) is
> for Lucene to sit there while it scores and sorts all these results.
>
> However it turns out these queries really don't have top results. That is,
> of the million documents, there are easily 10000 which are decent results
> (basically those above some threshold score). Frankly, just returning some
> consistent (so paging and reload work) but otherwise arbitrary ranking of
> these 10000 results would be more than good enough.
>
> It seems to me that a solution would be to impose some sort of pseudo-random
> filter (e.g. consider only every n-th document assuming they are uniformly
> distributed). I'm wondering if anyone else has experience with this sort of
> issue and what solutions they have found to work well in practice.
>
> Thanks,
>
> Tim
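
To make the two ideas in this thread concrete (Tim's "every n-th document" filter and the cheap-then-expensive scoring passes), here is a minimal sketch in plain Java. It does not use Lucene's API. The class TwoPassScoring, the Hit type, the search method and both scoring functions are hypothetical names made up for illustration, and it assumes the matching doc IDs are already available as an int array.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntToDoubleFunction;

// Hypothetical illustration only: none of these names come from Lucene.
public class TwoPassScoring {

    /** A matching document ID paired with a score. */
    static final class Hit {
        final int docId;
        final double score;
        Hit(int docId, double score) { this.docId = docId; this.score = score; }
    }

    // Higher score first; ties broken by lower doc ID so the order is stable
    // across reloads and paging.
    static final Comparator<Hit> BEST_FIRST =
        Comparator.comparingDouble((Hit h) -> h.score).reversed()
                  .thenComparingInt(h -> h.docId);

    /**
     * Pass 1: keep only doc IDs divisible by n (a deterministic sample),
     * score them with the cheap function, and retain the best `candidates`.
     * Pass 2: re-score only those candidates with the expensive function
     * and return the best `topK`, best first.
     */
    static List<Hit> search(int[] matchingDocIds, int n, int candidates, int topK,
                            IntToDoubleFunction cheapScore,
                            IntToDoubleFunction expensiveScore) {
        // Heap ordered worst-first, so poll() evicts the weakest candidate.
        PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
        for (int docId : matchingDocIds) {
            if (docId % n != 0) {
                continue; // skipped by the every-n-th-document filter
            }
            heap.add(new Hit(docId, cheapScore.applyAsDouble(docId)));
            if (heap.size() > candidates) {
                heap.poll();
            }
        }

        // The expensive scorer runs on at most `candidates` documents.
        List<Hit> rescored = new ArrayList<>();
        for (Hit h : heap) {
            rescored.add(new Hit(h.docId, expensiveScore.applyAsDouble(h.docId)));
        }
        rescored.sort(BEST_FIRST);
        return new ArrayList<>(rescored.subList(0, Math.min(topK, rescored.size())));
    }

    public static void main(String[] args) {
        int[] matches = new int[1000];
        for (int i = 0; i < matches.length; i++) {
            matches[i] = i;
        }
        // Stand-in scoring functions; real ones would depend on the index.
        List<Hit> top = search(matches, 10, 20, 5, d -> d % 7, d -> Math.sqrt(d));
        for (Hit h : top) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

Because the sample depends only on the doc ID, the same query returns the same ranking on every reload, which is the consistency Tim asks for. The trade-off is that any document outside the sample, or cut by the cheap pass, can never appear in the results, so this only suits queries where many hits are roughly equally good.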