No, you've got that right. But there's something I think you might be able to try. Fair warning, I'm remembering things I've read on this list and my memory isn't what it used to be <G>....
I *think* that if you reduce your result set by, say, a filter, you might drastically reduce what gets sorted. I'm thinking of something like this:

    BooleanQuery bq = new BooleanQuery();
    bq.add(<filter for the last N days, wrapped in a ConstantScoreQuery>, MUST);
    bq.add(<all the rest of your stuff>);

(See the P.S. below for a fleshed-out sketch.) RangeFilter might work for you here. Even if this works, you'll still have to deal with making the range big enough to do what you want. Perhaps an iterative approach: the first time you run the query, if you don't get your 25 (or whatever) results, increase the range and try again.

Again, I'm not entirely sure when the filter gets applied, before or after the sort. Nor am I sure how to tell. I'd sure like you to do the work and tell me how <G>.... I *am* sure that this has been discussed on this mailing list, so a search there might settle this.... C'mon Chris, Erik and Yonik, can't you recognize a plea for help when you read it <G>?

Although here's yet another thing that flitted through my mind. Is date order really the same as doc ID order? And would you be able to sort on doc ID instead? And would it matter <G>? If you're adding your documents as they come in, this might work. Doc IDs change, but I *believe* that if doc A is added after doc B, the doc ID for A will always be greater than the doc ID for B, although neither is guaranteed to stay the same between index optimizations. Again, not sure if this helps at all.....

Good luck!
Erick
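P.S. In case it helps, here's roughly the shape of what I have in mind. This is a minimal, untested sketch: it assumes your "date" field is indexed as a string with DateTools at day resolution, and the class, field, and parameter names are just placeholders for whatever you actually use.

    import java.util.Calendar;
    import java.util.Date;
    import org.apache.lucene.document.DateTools;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.ConstantScoreQuery;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.RangeFilter;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopFieldDocs;

    public class RecentOnlySearch {
        // Returns the top 'limit' hits sorted by date descending, restricted
        // to the last 'days' days. Assumes the "date" field was indexed with
        // DateTools at DAY resolution -- adjust to however you index dates.
        public static TopFieldDocs searchRecent(IndexSearcher searcher,
                                                Query userQuery,
                                                int days, int limit)
                throws Exception {
            Calendar cal = Calendar.getInstance();
            cal.add(Calendar.DAY_OF_MONTH, -days);
            String from = DateTools.dateToString(cal.getTime(), DateTools.Resolution.DAY);
            String to = DateTools.dateToString(new Date(), DateTools.Resolution.DAY);

            // "Filter for the last N days wrapped in a ConstantScoreQuery, MUST"
            Filter lastNDays = new RangeFilter("date", from, to, true, true);

            BooleanQuery bq = new BooleanQuery();
            bq.add(new ConstantScoreQuery(lastNDays), BooleanClause.Occur.MUST);
            bq.add(userQuery, BooleanClause.Occur.MUST);  // all the rest of your stuff

            // Sort by date, descending, keeping only the top 'limit' hits.
            Sort byDateDesc = new Sort(new SortField("date", SortField.STRING, true));
            return searcher.search(bq, null, limit, byDateDesc);
        }
    }

The iterative part would then be something like: try a 7-day window; if top.totalHits comes back under 25, retry with 30 days, then 365, and so on until you have enough hits.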
On 10/18/06, Paul Waite <[EMAIL PROTECTED]> wrote:

Many thanks to Erik and Ollie for responding - a lot of ideas and I'll have my work cut out grokking them properly and thinking about what to do. I'll respond further as that develops.

One quick thing though - Erik wrote:

> So, I wonder if your out of memory issue is really related to the number
> of requests you're servicing. But only you will be able to figure that
> out <G>. These problems are...er...unpleasant to track down...

Indeed!

> I guess I wonder a bit about what large result sets is all about. That
> is, do your users really care about results 100-10,000 or do they just
> want to page through them on demand?

No, they don't want that. They just want a small number. What happens is they enter some silly query, like searching for all stories with a single common non-stop-word in them, with the usual sort criterion of by date (i.e. a field) descending and a limit of, say, 25.

So Lucene then presumably has to haul out a massive result set, sort it, and return the top 25 (out of 500,000 or whatever). Isn't that how it goes? Or am I missing something horribly obvious?

Cheers,
Paul.
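For concreteness, the top-25-by-date search described above presumably boils down to something like the following. This is an untested sketch; "body" and "date" are guesses at the field names, and the query term is purely illustrative.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopFieldDocs;

    public class Top25ByDate {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/index");

            // The "silly" query: one common, non-stop word.
            Query q = new TermQuery(new Term("body", "report"));

            // Sort by date descending, keep only the top 25.
            Sort byDateDesc = new Sort(new SortField("date", SortField.STRING, true));
            TopFieldDocs top = searcher.search(q, null, 25, byDateDesc);
            System.out.println(top.totalHits + " matches; kept " + top.scoreDocs.length);

            // Only 25 hits are held in the collector's priority queue, but every
            // matching document is still visited, and sorting on a field loads
            // that field's value for every document in the index into the
            // FieldCache, which is typically where much of the memory goes.
            searcher.close();
        }
    }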