Greetings,

Could someone describe how the results from multiple indices are merged when
using a MultiSearcher? My naive intuition is that the scores for documents
found in each index could be wildly different, so what criteria are used to
merge the scored docs?

I believe they are blindly merged.

This means that the IDFs for terms across the indices must be roughly equal; otherwise the results will be skewed.
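
To make the skew concrete, here is a minimal sketch. It assumes an IDF of the form 1 + ln(numDocs / (docFreq + 1)), modeled on Lucene's classic default similarity; the class name and the document counts are hypothetical, chosen only for illustration.

```java
class IdfSkew {
    // Assumed IDF formula, modeled on Lucene's classic default similarity:
    // 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Hypothetical numbers: two indices of equal size, where the same
        // term is rare in index A and common in index B.
        int numDocs = 100_000;
        double idfA = idf(10, numDocs);
        double idfB = idf(10_000, numDocs);

        // The same term gets a much larger weight in A than in B, so hits
        // from A would tend to outrank hits from B if the per-index scores
        // are merged without adjustment.
        System.out.printf("idf in A = %.2f, idf in B = %.2f%n", idfA, idfB);
    }
}
```

If the term's frequency were about the same in both indices, the two values would be close and the merged ranking would be roughly fair.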

The simple approach that most people take to this issue is to generate a larger set of smaller indices from the total data set, then randomly select which of those small indices get merged into each of the N final indices. The randomization helps avoid the IDF skew problem, since each final index ends up with a similar mix of documents.
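
A rough sketch of that randomized assignment is below. The class, method and part names are hypothetical, and the merge itself (for example with IndexWriter.addIndexes) is left out; this only shows how the small indices might be shuffled and dealt into N groups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShardShuffle {
    // Shuffle the small indices, then deal them round-robin into n groups.
    // Each group would then be merged into one final index.
    static List<List<String>> assign(List<String> smallIndices, int n, long seed) {
        List<String> shuffled = new ArrayList<>(smallIndices);
        Collections.shuffle(shuffled, new Random(seed));

        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            groups.get(i % n).add(shuffled.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical names for twelve small indices, dealt into 3 groups.
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            parts.add("part-" + i);
        }
        System.out.println(assign(parts, 3, 42L));
    }
}
```

The fixed seed is only there to make a run repeatable; any source of randomness would do.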

There's a Jira issue on the Nutch side (see NUTCH-92) about this same problem.

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
