> I'm going to be running the archive full-time, so yes. I am not a C
> programmer by any stretch of the imagination, so I'm not sure what I
> could do.
You could do something that would help a *lot* and is not programming:
find documentation (papers, books) on the subject of parallel
indexing/search, find pieces of code or examples, read them, and
summarize the whole thing so that we all have a clear idea of the
state of the art, the tradeoffs, etc.
> Have 4 search machines. Each machine has 125 of the 500 databases,
> merged together. Then I have a PHP script on the main web server that
> opens an http connection to query the machines and takes the top few of
> each result set.
This is the easy way and it's probably the best way to do it at present.
But it's not efficient, I agree.
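To make the "query each machine, take the top few of each result set" idea concrete, here is a minimal sketch of the merge step in Python. It is not ht://Dig code: the result sets are hypothetical stand-ins for what the four search machines would return over HTTP (each already sorted best-first by score), and the scores and URLs are invented for illustration.

```python
import heapq

# Hypothetical per-machine result sets, as each of the four search
# machines might return them: (score, url) pairs sorted best-first.
results_per_machine = [
    [(0.95, "http://a.example/1"), (0.60, "http://a.example/2")],
    [(0.90, "http://b.example/1"), (0.40, "http://b.example/2")],
    [(0.80, "http://c.example/1")],
    [(0.70, "http://d.example/1"), (0.65, "http://d.example/2")],
]

def merge_top(result_sets, k):
    """Merge best-first result lists and keep the overall top k URLs."""
    # heapq.merge streams the lists in order without concatenating them;
    # the negated score makes the default ascending merge run best-first.
    merged = heapq.merge(*result_sets, key=lambda r: -r[0])
    return [url for _score, url in list(merged)[:k]]

print(merge_top(results_per_machine, 3))
```

The front-end (the PHP script in the original setup) would do exactly this after collecting the four responses; the merge is cheap because each machine has already done the ranking locally.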
> That could work, but isn't very efficient. And splitting the databases
> evenly across the four machines would be tricky at best.
The idea of splitting according to MD5(word or URL) modulo the number of
machines requires support at the indexer level, but it's simple and
would work well, IMHO.
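As a sketch of that partitioning scheme (again not ht://Dig code, just an illustration of the MD5-modulo idea, with a made-up machine count and word list):

```python
import hashlib

NUM_MACHINES = 4  # hypothetical cluster size from the discussion above

def machine_for(key):
    """Map a word (or URL) to one search machine via MD5 mod N."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % NUM_MACHINES

# Every indexer and every query front-end computes the same mapping,
# so a given word's index entries always live on exactly one machine.
for word in ["search", "parallel", "index"]:
    print(word, "->", machine_for(word))
```

The point is that the mapping is deterministic and needs no central lookup table, which is why it has to be built into the indexer rather than bolted on at query time.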
Cheers,
--
Loic Dachary
ECILA
100 av. du Gal Leclerc
93500 Pantin - France
Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.