One issue is that if you split the index in half (for example) and
get some results from index A and some from index B, you need to
merge the results somewhere.  But the scores coming from the two
indexes are not directly comparable, since each index scores against
its own term statistics.  For example, if document 100 from index A
scores 0.85 and document 200 from index B scores 0.90, this doesn't
necessarily mean that document 200 should be ranked before document 100.
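
To make the problem concrete, here is a toy sketch in Python. It is not Lucene's scoring formula, just a simplified TF-IDF stand-in, and the index sizes and document frequencies are made-up numbers. It shows two documents with the same term frequency getting very different scores when each index uses only its own statistics, and identical scores when both sides use combined statistics.

```python
import math

def idf(doc_freq, num_docs):
    # Smoothed inverse document frequency (toy formula, an assumption,
    # not Lucene's exact Similarity implementation).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def score(term_freq, doc_freq, num_docs):
    # Toy TF-IDF score for a single-term query.
    return math.sqrt(term_freq) * idf(doc_freq, num_docs)

# Hypothetical per-index statistics for the query term.
index_a = {"num_docs": 1000, "doc_freq": 500}  # term is common in A
index_b = {"num_docs": 1000, "doc_freq": 5}    # term is rare in B

tf = 4  # the term occurs 4 times in both document 100 and document 200

# Each index scores with its own statistics: the scores diverge even
# though the two documents match the query equally well.
local_a = score(tf, index_a["doc_freq"], index_a["num_docs"])
local_b = score(tf, index_b["doc_freq"], index_b["num_docs"])

# Scoring with statistics summed over both indexes puts the two
# documents on the same scale.
global_docs = index_a["num_docs"] + index_b["num_docs"]
global_df = index_a["doc_freq"] + index_b["doc_freq"]
global_a = score(tf, global_df, global_docs)
global_b = score(tf, global_df, global_docs)

print("local: ", round(local_a, 2), round(local_b, 2))
print("global:", round(global_a, 2), round(global_b, 2))
```

So one way to get mergeable scores, under these assumptions, is to collect document frequencies from all the indexes first and score against the combined numbers, at the cost of an extra round trip to each index.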

I think this issue has been discussed on this mailing list before. 
Has anyone else had to deal with this issue with a distributed index? 
What does Nutch do?

-chris

On 1/31/06, Chun Wei Ho <[EMAIL PROTECTED]> wrote:
> I am deploying a web application serving searches on a Lucene index,
> and am deciding between distributing search between several machines
> or single searching, and was hoping that someone could tell me from
> their experiences:
>
> + Is there anything particular to watch out for if using distributed
> searching instead of searching one merged Lucene index?
>
> + How large should the index be before I need to (or should) turn
> to distributed searching to reduce response/search time? I know it
> would depend a lot on hardware and request frequency, but I was
> wondering if anyone could post their hardware info and index size
> as a reference for when/if they had to use distributed search due
> to load issues.
>
> Thanks :)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
