The problem is that the databases are too big for a single computer to
search in any timely fashion. I estimate the database would be 5GB+,
and every single request would have to scan through it and rank the
results.

So *each search* needs to be split up and distributed somehow.

Also, Linux of course has a 2GB file size limit, and merging/sorting a
5GB database would not be a trivial undertaking.
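
Just to sketch what I mean by splitting each search up: something like
the following (Python, untested; the node names and the per-node result
format are invented) fans a query out to a handful of machines, each
holding a slice of the archive, and merges the ranked results back
together.

# Rough scatter/gather sketch -- node names and the per-node
# result format are made up for illustration.
import urllib.request, urllib.parse
from concurrent.futures import ThreadPoolExecutor

NODES = ["search1.example.com", "search2.example.com",
         "search3.example.com"]    # each node holds a slice of the archive

def query_node(host, words):
    # Assume each node exposes a CGI that returns one
    # "score<TAB>url" pair per line for its slice.
    url = "http://%s/cgi-bin/slice-search?%s" % (
        host, urllib.parse.urlencode({"words": words}))
    with urllib.request.urlopen(url, timeout=10) as resp:
        for line in resp.read().decode().splitlines():
            score, doc = line.split("\t", 1)
            yield float(score), doc

def search(words, limit=20):
    # Fan the query out to every node in parallel, then merge the
    # per-slice rankings into one global ranking.
    with ThreadPoolExecutor(len(NODES)) as pool:
        partials = pool.map(lambda h: list(query_node(h, words)), NODES)
    merged = [hit for part in partials for hit in part]
    merged.sort(reverse=True)           # highest score first
    return merged[:limit]

if __name__ == "__main__":
    for score, doc in search("linux kernel"):
        print("%.3f  %s" % (score, doc))

The hard part is making the per-slice scores comparable so the merged
ranking actually means something.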

Thanks,

Tim


Thomas Bjørn Andersen wrote:
> 
> >>>>> "TP" == Tim Perdue <[EMAIL PROTECTED]> writes:
> 
>  TP> So I'm now working for a certain prominent Linux hardware company
>  TP> and they like the Geocrawler archive. They are going to want to
>  TP> run searches against the *entire* archive, quickly, tens of
>  TP> thousands of times per day. Right now, I don't believe there's
>  TP> any way that this could be done because of the scale of the
>  TP> archive. Right now, Geocrawler has over 450 separate ht://dig
>  TP> databases, which isn't as cool of a search as we want.
> 
>  TP> I understand you have some multi-search scripts or something, but
>  TP> can you conceive of a way to spread these searches across a
>  TP> cluster of machines, aka, a Beowulf cluster or something? There
>  TP> is some talk of giving me a Beowulf cluster to run ht://dig on.
> 
> If I am reading your question correctly, couldn't you use your DNS
> server to do load balancing among a group of servers?  I'm assuming
> you could use something like rcp in a cron script to copy the
> database files from a central server whenever they have been updated.
> 
> Best wishes,
> Thomas
> --
> | Thomas Bjorn Andersen, [EMAIL PROTECTED]                 |
> +----------------------------------------------------------+
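
If the individual databases stay under the 2GB limit, the copy step
Thomas describes could be as simple as something like this run from
cron (Python, untested; the hosts, paths, and use of rcp are invented
for illustration):

# Minimal "push updated database files to the mirrors" sketch.
# Hosts, paths, and the use of rcp are illustrative only; rsync
# over ssh would do the same job.
import os, subprocess

MIRRORS = ["search1.example.com", "search2.example.com"]
DB_DIR = "/opt/htdig/db"            # made-up path on the master
STAMP = "/var/tmp/last-db-push"     # remembers when we last pushed

def updated_files():
    last = os.path.getmtime(STAMP) if os.path.exists(STAMP) else 0
    for name in os.listdir(DB_DIR):
        path = os.path.join(DB_DIR, name)
        if os.path.getmtime(path) > last:
            yield path

def push():
    for path in list(updated_files()):
        for host in MIRRORS:
            # e.g. rcp /opt/htdig/db/foo.db search1.example.com:/opt/htdig/db/
            subprocess.check_call(["rcp", path, "%s:%s/" % (host, DB_DIR)])
    open(STAMP, "w").close()        # touch the stamp file

if __name__ == "__main__":          # run this from cron
    push()

That gets the data out to the mirrors, but it still doesn't address
merging and ranking results across the whole archive, which is the
part I'm stuck on.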
