Scott Smith wrote:
1. Simply use the built-in Lucene sort functionality, cache the hit
list and then page through the list. Adv: looks pretty
straightforward, I write less code. Dis: for searches that return a large
number of hits (having a search return several hundred to a few thousand
hits is not uncommon), Lucene is sorting a lot of entries that don't
really need to be sorted (because the user will never look at them) and
sorting tends to be expensive.
2. The other solution uses a priority heap to collect the top N (or
next N) entries. I still have to walk the entire hit list, but keeping
entries in a priority heap means I can determine the N entries I need
with a few comparisons and minimal sorting. I don't have to sort a
bunch of entries whose order I don't care about. Additionally, I don't
have to have all of the entries in memory at one time. The big
disadvantage with this is that I have to write more code. However, it
may be worth it if the performance difference is large enough.

Lucene's built-in sorting code already performs the optimization you describe as (2). So don't bother re-inventing it!
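
For anyone curious what the technique in (2) looks like, here is a minimal sketch in plain Java. It is not Lucene's actual code; the class name TopN and the method topN are hypothetical names chosen for illustration, and it uses only java.util.PriorityQueue. The heap never holds more than n entries, so each hit costs at most a few comparisons and only the final n entries are fully sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene: selects the n best items
// from a stream of hits using a bounded min-heap.
public class TopN {

    /** Returns the n largest items according to cmp, best first. */
    public static <T> List<T> topN(Iterable<T> hits, int n, Comparator<T> cmp) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap ordered by cmp: the head is the weakest of the current top n.
        PriorityQueue<T> heap = new PriorityQueue<>(n, cmp);
        for (T hit : hits) {
            if (heap.size() < n) {
                heap.offer(hit);
            } else if (cmp.compare(hit, heap.peek()) > 0) {
                // The new hit beats the weakest kept entry, so replace it.
                heap.poll();
                heap.offer(hit);
            }
        }
        // Only the n survivors are sorted, not the whole hit list.
        List<T> result = new ArrayList<>(heap);
        result.sort(cmp.reversed());
        return result;
    }
}
```

To get the "next N" for a later page, one simple option under the same assumptions is to collect the top (offset + N) entries and skip the first offset of them.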


Doug
