Greetings,
Could someone describe how the results from multiple indices are merged when
using a MultiSearcher? My naive intuition is that the scores for documents
found in each index could be wildly different, so what criteria are used to
merge the scored docs?
I believe they are blindly merged, which means that the IDFs for terms
across the indices must be relatively equal; otherwise the results will
be skewed.
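To make the skew concrete, here is a rough sketch in Python (not Lucene code). It assumes the classic Lucene-style formula idf = 1 + ln(numDocs / (docFreq + 1)); the index sizes and document frequencies are made-up numbers, and the names idf, index_a and index_b are purely illustrative.

```python
import math

def idf(doc_freq, num_docs):
    # Assumed classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# Hypothetical statistics for the same term in two equally sized indices.
index_a = {"num_docs": 10_000, "doc_freq": 9}    # term is rare in index A
index_b = {"num_docs": 10_000, "doc_freq": 999}  # term is common in index B

idf_a = idf(index_a["doc_freq"], index_a["num_docs"])
idf_b = idf(index_b["doc_freq"], index_b["num_docs"])

# IDF computed over both indices together, for comparison.
idf_global = idf(
    index_a["doc_freq"] + index_b["doc_freq"],
    index_a["num_docs"] + index_b["num_docs"],
)

print(f"index A: {idf_a:.3f}  index B: {idf_b:.3f}  combined: {idf_global:.3f}")
```

With per-index statistics, an otherwise identical document scores higher for this term in index A than in index B, so a merge that compares raw scores would favor hits from A.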
The simple approach that most people take when dealing with this
issue is to generate a larger set of smaller indices from the total
data set, then randomize the selection of indices that get merged to
form the N final indices. This randomization helps avoid the IDF skew
problem.
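The randomized grouping could look roughly like this in Python. This is only a sketch of the assignment step: the function name assign_to_final_indices, the part-NNN names and the fixed seed are all hypothetical, and the actual merging of each group into one index is left to whatever indexing tool you use.

```python
import random

def assign_to_final_indices(small_indices, n_final, seed=None):
    """Randomly distribute small indices across n_final groups."""
    rng = random.Random(seed)
    shuffled = list(small_indices)
    rng.shuffle(shuffled)
    # Deal the shuffled list round-robin so group sizes differ by at most one.
    return [shuffled[i::n_final] for i in range(n_final)]

# Hypothetical names for 20 small indices built from slices of the data set.
parts = [f"part-{i:03d}" for i in range(20)]
groups = assign_to_final_indices(parts, n_final=4, seed=42)

for number, group in enumerate(groups):
    print(f"final index {number}: {group}")
```

Because each final index ends up with a random sample of the small indices, term frequencies should come out roughly similar across the N final indices, provided the small indices are numerous enough.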
There's a Jira issue on the Nutch side (see NUTCH-92) around this
same problem.
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"