One issue is that if you split the index (in half, for example) and get some results from index A and some from index B, you need to merge the results somewhere. But the scores coming from the two indexes are not directly comparable, because each index scores against its own term statistics (document frequencies, for instance). For example, if document 100 from index A has score 0.85 and document 200 from index B has score 0.90, this doesn't necessarily mean that document 200 should be ranked before document 100.
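To make the merge problem concrete, here is a small sketch in Python. It is not Lucene's actual scoring code: the `idf` and `score` functions are a simplified textbook-style tf-idf, and the shard statistics and term frequencies are made-up numbers chosen for illustration. It only shows that ranking by per-index scores can disagree with ranking by scores computed from combined statistics.

```python
import math

def idf(num_docs, doc_freq):
    # Simplified inverse document frequency (an assumption for this sketch,
    # not necessarily what any particular Lucene version computes).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(term_freq, num_docs, doc_freq):
    # Toy tf-idf score for a single-term query.
    return math.sqrt(term_freq) * idf(num_docs, doc_freq) ** 2

# Hypothetical statistics for the same query term in two index halves.
index_a = {"num_docs": 1000, "doc_freq": 500}  # term is common in A
index_b = {"num_docs": 1000, "doc_freq": 5}    # term is rare in B

# Hypothetical term frequencies of the two candidate documents.
tf_doc_100 = 9  # document 100 lives in index A
tf_doc_200 = 1  # document 200 lives in index B

# Scores as each index would compute them on its own.
local_100 = score(tf_doc_100, **index_a)
local_200 = score(tf_doc_200, **index_b)

# Scores computed from the combined statistics of both halves.
combined = {
    "num_docs": index_a["num_docs"] + index_b["num_docs"],
    "doc_freq": index_a["doc_freq"] + index_b["doc_freq"],
}
global_100 = score(tf_doc_100, **combined)
global_200 = score(tf_doc_200, **combined)

print("local  ranking puts doc 200 first:", local_200 > local_100)
print("global ranking puts doc 100 first:", global_100 > global_200)
```

With these made-up numbers, the per-index scores favor document 200 only because the term happens to be rare in index B. Once both halves are scored against the same statistics, document 100 comes out ahead.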
I think this issue has been discussed on this mailing list before. Has anyone else had to deal with this issue with a distributed index? What does Nutch do?

-chris

On 1/31/06, Chun Wei Ho <[EMAIL PROTECTED]> wrote:
> I am deploying a web application serving searches on a Lucene index,
> and am deciding between distributing search between several machines
> or single searching, and was hoping that someone could tell me from
> their experiences:
>
> + Is there anything particular to watch out for if using distributed
> searching instead of searching one merged Lucene index?
>
> + What should be the size of the index that I am looking at before I
> need to (or should) turn to distributed searching to reduce
> response/search time? I know it would depend a lot on hardware and request
> frequency but I was wondering if anyone could post their hardware
> info and index size as a reference of when/if they had to use
> distributed search due to load issues.
>
> Thanks :)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]