> I'm going to be running the archive full-time, so yes. I am not a C
 > programmer by any stretch of the imagination, so I'm not sure what I
 > could do.

 You could do something that would help a *lot* and is not
programming: find documentation (papers, books) on the subject of
parallel indexing/search, find pieces of code or examples, read them,
and summarize the whole thing so that we all have a clear idea of the
state of the art, the tradeoffs, etc.

 > Have 4 search machines. Each machine has 125 of the 500 databases,
 > merged together. Then I have a PHP script on the main web server that
 > opens an http connection to query the machines and takes the top few of
 > each result set.

 This is the easy way, and it's probably the best way to do it at
present. But I agree it's not efficient.
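 To make the scatter/gather scheme concrete, here is a minimal sketch
(not htdig code, and Python rather than PHP for brevity): each backend
stands in for one search machine, queries are fanned out in parallel,
and the per-machine result sets are merged by score to keep the global
top few. The `query_backend` helper is a stand-in for the HTTP request
the PHP script would make.

```python
import concurrent.futures
import heapq

def query_backend(backend, term):
    # Stand-in for an HTTP request to one search machine: here each
    # backend is just a dict mapping term -> list of (score, url) hits.
    return backend.get(term, [])

def federated_search(backends, term, top_k=10):
    # Scatter: query every machine in parallel.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        result_sets = list(pool.map(lambda b: query_backend(b, term),
                                    backends))
    # Gather: merge the per-machine hit lists and keep the global top_k,
    # ranked by score.
    return heapq.nlargest(top_k,
                          (hit for hits in result_sets for hit in hits))
```

 The merge step is where the inefficiency shows up: every machine does
a full local search even when it holds no good hits for the term.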

 > That could work, but isn't very efficient. And splitting the databases
 > evenly across the four machines would be tricky at best.

 The idea of splitting according to MD5(word or URL) modulo the number
of machines requires support at the indexer level, but it's something
simple that would work well, IMHO.
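 The hashing rule itself is tiny; here is a sketch of what the indexer
and the query front-end would both apply (again Python for
illustration, not htdig code — machines are assumed to be numbered 0
to n-1):

```python
import hashlib

def shard_for(key, num_machines):
    # Hash the word (or URL) with MD5 and reduce modulo the machine
    # count. Because indexer and front-end apply the same rule, a given
    # key always lands on the same machine, and MD5's uniformity spreads
    # the keys roughly evenly across machines.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_machines
```

 The appeal is that no central table of "which database lives where"
is needed: the key itself determines the machine.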

      Cheers,

-- 
                Loic Dachary

                ECILA
                100 av. du Gal Leclerc
                93500 Pantin - France
                Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
                e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.
