On 18 Jul 2008, at 09:49, Eric Bowman wrote:

One thing I have trouble understanding is how scoring works in this case. Does Lucene really "just work", or are there special things we have to do to make sure that the scores are coherent so we can actually decide which was the best match? What kind of constraints are there when breaking up the index into parts to make sure scoring remains coherent?


AFAIK, splitting up the index would affect the score, since the tf/idf statistics would then only represent part of the index. Two identical documents in two different indices would end up with different scores because the index metadata (document count, document frequencies) differs. I have no idea how large the impact could be, nor whether there are good and bad ways to split an index.
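To make the effect concrete, here is a minimal sketch of how one term's score depends on per-index statistics. It assumes the idf shape used by Lucene's classic DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)), and a simplified score of sqrt(tf) * idf^2 that ignores norms, boosts and coord; the shard statistics are made up for illustration.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    # Same shape as the classic DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def term_score(freq: int, num_docs: int, doc_freq: int) -> float:
    # Simplified per-term contribution: sqrt(tf) * idf^2 (no norms, boosts or coord).
    w = idf(num_docs, doc_freq)
    return math.sqrt(freq) * w * w


# Hypothetical statistics for the same term in two shards of equal size.
shard_a = {"num_docs": 1000, "doc_freq": 9}    # the term is rare in this shard
shard_b = {"num_docs": 1000, "doc_freq": 199}  # the term is common in this shard

# The same document text (term frequency 3) scores differently depending on its shard.
score_a = term_score(freq=3, **shard_a)
score_b = term_score(freq=3, **shard_b)
print(f"shard A: {score_a:.3f}  shard B: {score_b:.3f}")
```

Because idf is squared in this simplified score, a skew in document frequency between shards is amplified, so how documents are distributed across shards can matter.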

One solution I can think of is to share the complete index across all nodes but restrict the results from each node to a subset of the index using a filter. This should produce correct scores, but will probably be somewhat slower than splitting the index.
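A minimal sketch of the filter idea, assuming a toy in-memory "index" (a dict of doc id to term frequencies, a hypothetical stand-in for a real Lucene index) replicated on every node, and a hypothetical partitioning rule `doc_id % n_nodes == node_id` as the filter. Every node computes statistics over the complete index, so merging the per-node results gives the same ranking as a single search.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical complete index, replicated on every node: doc id -> term frequencies.
INDEX: Dict[int, Dict[str, int]] = {
    0: {"lucene": 2},
    1: {"java": 1},
    2: {"lucene": 1, "java": 3},
    3: {"lucene": 2},
    4: {"search": 1},
}


def idf(num_docs: int, doc_freq: int) -> float:
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def search_node(term: str, node_id: int, n_nodes: int) -> List[Tuple[int, float]]:
    # Statistics always come from the complete index, so idf is identical on every node.
    num_docs = len(INDEX)
    doc_freq = sum(1 for tfs in INDEX.values() if term in tfs)
    w = idf(num_docs, doc_freq)
    hits = []
    for doc_id, tfs in INDEX.items():
        # The "filter": this node only returns its own slice of doc ids.
        if doc_id % n_nodes != node_id:
            continue
        if term in tfs:
            hits.append((doc_id, math.sqrt(tfs[term]) * w * w))
    return hits


def merged_search(term: str, n_nodes: int) -> List[Tuple[int, float]]:
    hits = [h for node in range(n_nodes) for h in search_node(term, node, n_nodes)]
    # Highest score first; ties broken by doc id so the order is deterministic.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


print(merged_search("lucene", n_nodes=2))
```

The cost is that every node still holds and reads the whole index, which is where the expected slowdown comes from.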

Perhaps it would be possible to split the index for searching but use an alternative source for the scoring statistics.
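One way to read "an alternative source for scoring" is sketched below, assuming toy in-memory shards (hypothetical dicts, not a Lucene API): each shard reports its document count and document frequency, the counts are summed into global statistics, and every shard scores with those. As far as I know, Lucene's MultiSearcher does something similar in spirit by aggregating document frequencies across its sub-searchers before building the query weight.

```python
import math
from typing import Dict, List, Tuple

Shard = Dict[int, Dict[str, int]]  # doc id -> term frequencies

# Hypothetical split: docs 0 and 3 contain "lucene" with the same frequency.
SHARD_A: Shard = {0: {"lucene": 2}, 1: {"java": 1}, 2: {"lucene": 1, "java": 3}}
SHARD_B: Shard = {3: {"lucene": 2}, 4: {"search": 1}, 5: {"search": 2}, 6: {"java": 1}}


def idf(num_docs: int, doc_freq: int) -> float:
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def local_stats(shard: Shard, term: str) -> Tuple[int, int]:
    # (document count, document frequency) as seen by this shard alone.
    return len(shard), sum(1 for tfs in shard.values() if term in tfs)


def global_stats(shards: List[Shard], term: str) -> Tuple[int, int]:
    # Sum the per-shard counts to get statistics for the whole (virtual) index.
    stats = [local_stats(s, term) for s in shards]
    return sum(n for n, _ in stats), sum(df for _, df in stats)


def score_shard(shard: Shard, term: str, num_docs: int, doc_freq: int) -> Dict[int, float]:
    w = idf(num_docs, doc_freq)
    return {d: math.sqrt(tfs[term]) * w * w for d, tfs in shard.items() if term in tfs}


def distributed_search(shards: List[Shard], term: str) -> List[Tuple[int, float]]:
    num_docs, doc_freq = global_stats(shards, term)
    hits: Dict[int, float] = {}
    for shard in shards:
        hits.update(score_shard(shard, term, num_docs, doc_freq))
    return sorted(hits.items(), key=lambda h: (-h[1], h[0]))


# Local statistics: docs 0 and 3 get different scores. Global statistics: identical.
print(score_shard(SHARD_A, "lucene", *local_stats(SHARD_A, "lucene"))[0],
      score_shard(SHARD_B, "lucene", *local_stats(SHARD_B, "lucene"))[3])
print(distributed_search([SHARD_A, SHARD_B], "lucene"))
```

The extra cost is a round trip to collect statistics before scoring, instead of replicating the whole index on each node.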


          karl
