Hi there,

I'm using Nutch 0.7 and Hadoop 0.4 to build a distributed search
environment, and I'm having trouble getting the results in the right order.
My scenario is as follows:

- I have multiple search servers, each holding a different set of Lucene
documents. For a given query, some servers hold newer information and others
hold older information, and the goal of our project is always to retrieve the
newest documents, regardless of which server they come from. Nutch search can
sort by date and return the newest documents. However, in our tests we can get
the following:


search server A return (10 ) documents with sort date from 03/17/2007 to
03/15/2007
search server B return (10) documents with sort date from 01/11/2003 to
01/05/2003

Both lists are sorted by date, so the per-server logic works. But server A
has more new documents than server B. How should we handle a case like this?
Do we need another RPC call to retrieve more new documents from server A? Or
do we need an RPC call that asks each server how new its documents are in
relation to the query?
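For reference, here is a minimal sketch of the merge step in question, written in Java since Nutch is. It assumes each search server returns its own k newest hits already sorted newest-first, and that the front end merges those lists by date (a k-way merge) rather than concatenating them. Under that assumption the k newest documents overall must be among the returned hits (ignoring ties), because any document in the global top k is also in its own server's top k. The `Hit` class and `mergeNewest` method are hypothetical illustrations, not Nutch's DistributedSearch API, and whether Nutch 0.7's client merges this way is not confirmed here.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class MergeByDate {

    /** Hypothetical hit type: a document id plus the date it is sorted on. */
    public static final class Hit {
        public final String docId;
        public final LocalDate date;

        public Hit(String docId, LocalDate date) {
            this.docId = docId;
            this.date = date;
        }

        @Override
        public String toString() {
            return docId + "@" + date;
        }
    }

    /** Read position inside one server's (already sorted) result list. */
    private static final class Cursor {
        final List<Hit> hits;
        int pos;

        Cursor(List<Hit> hits) {
            this.hits = hits;
        }

        Hit head() {
            return hits.get(pos);
        }
    }

    /**
     * Merges per-server result lists, each assumed sorted newest-first,
     * into one list of at most k hits, also newest-first.
     */
    public static List<Hit> mergeNewest(List<List<Hit>> perServer, int k) {
        // Heap ordered so the cursor whose current hit is newest comes out first.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            Comparator.comparing((Cursor c) -> c.head().date).reversed());
        for (List<Hit> hits : perServer) {
            if (!hits.isEmpty()) {
                heap.add(new Cursor(hits));
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < k && !heap.isEmpty()) {
            Cursor c = heap.poll();
            merged.add(c.head());
            c.pos++;
            // Put the cursor back only if that server still has hits left.
            if (c.pos < c.hits.size()) {
                heap.add(c);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Toy data shaped like the example above: every hit on server A
        // is newer than every hit on server B.
        List<Hit> serverA = List.of(
            new Hit("A1", LocalDate.of(2007, 3, 17)),
            new Hit("A2", LocalDate.of(2007, 3, 16)),
            new Hit("A3", LocalDate.of(2007, 3, 15)));
        List<Hit> serverB = List.of(
            new Hit("B1", LocalDate.of(2003, 1, 11)),
            new Hit("B2", LocalDate.of(2003, 1, 5)));
        System.out.println(mergeNewest(List.of(serverA, serverB), 3));
    }
}
```

Running `main` prints the three hits from server A, since each of them is newer than anything server B returned.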

I hope my problem makes sense.

Xavier
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers