On Thu, 22 Jul 1999, Tim Perdue wrote:

> day. Right now, I don't believe there's any way that this could be done
> because of the scale of the archive. Right now, Geocrawler has over 450
> separate ht://dig databases, which isn't as cool of a search as we want.
..
> I understand you have some multi-search scripts or something, but can
> you conceive of a way to spread these searches across a cluster of

The multidig scripts are for assisting with *indexing* multiple databases.
Basically, they just loop through the equivalent of rundig. It would be
great to run the indexing on a Beowulf cluster since you could have each
node doing independent indexing. This is part of the point of the new
database merge feature. But that's not what you asked.

As for splitting a search across a cluster, you have a few options.
1) Put one (or a few) databases on a particular node.
2) Put all the databases on each node (this isn't quite as nutty if you
have an easy way of sharing the database filesystems)
3) Work out a meta-index system for ht://Dig to allow splitting the
databases in heterogeneous ways.

3) is, of course, what everyone wants to do. I've been talking with a few
people about how to allow 'collections' of databases in ht://Dig. 1) or 2)
would work, but they're not as 'fun' or 'clean' as anyone would like. See
more below.
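
To make option 1 a bit more concrete, here is a rough sketch in Python of
the scatter-and-merge idea: send the query to every node in parallel, then
merge the hits by score. This is not ht://Dig code. The NODES table, the
search_node() and search_cluster() names, the example URLs, and the
word-count scoring are all made up for illustration; a real setup would
have each node query its own databases instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the databases held by each node (option 1):
# node name -> {url: document text}. A real node would search its own
# on-disk databases rather than an in-memory dict.
NODES = {
    "node1": {"http://a.example/1": "beowulf cluster indexing",
              "http://a.example/2": "mailing list archive"},
    "node2": {"http://b.example/1": "cluster search cluster merge"},
}

def search_node(name, query):
    """Search one node's documents; score = occurrences of the query word."""
    word = query.lower()
    hits = []
    for url, text in NODES[name].items():
        score = text.lower().split().count(word)
        if score:
            hits.append((score, url, name))
    return hits

def search_cluster(query, limit=10):
    """Send the query to every node in parallel, then merge by score."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        per_node = pool.map(lambda n: search_node(n, query), NODES)
        merged = [hit for hits in per_node for hit in hits]
    # Highest score first; the URL breaks ties so the order is deterministic.
    merged.sort(key=lambda h: (-h[0], h[1]))
    return merged[:limit]

if __name__ == "__main__":
    for score, url, node in search_cluster("cluster"):
        print(score, url, node)
```

The catch with this approach is that the merge only makes sense if scores
from different nodes are comparable, which is part of why 1) feels less
'clean' than a proper meta-index.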

> I'm sure some of the hackers here could figure out something, but I
> wondered if you have some ideas for a starting point, or tell me if I am
> totally nutty for using ht://dig on this.

You're certainly not nutty. This is actually how the big-iron search
engines run. Each seems to have a different strategy, from what I've
heard through the grapevine. Google puts a copy of the word database on
each node, but splits the document database. AltaVista supposedly does #2
(perhaps through some sort of shared filesystem). Metasearch engines do
something like #1: each node is independent of the others and has its
own collection.

Anyway, if there's interest in this, let's take it up in the htdig3-dev
mailing list. It sounds like people are willing to throw some code behind
it, so we'll all benefit. (Plus it sounds just plain cool.)

-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/


