Tim Perdue writes:
>
> I just wanted to refresh the thread about distributed searching for
> large, high-traffic ht://dig databases.
>
I would say that the new index structure is designed to solve exactly
your problem. The one thing that has not yet been discussed is how to
implement something that merges answers from many htdig databases.
Any ideas, Geoff?
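
To make the question concrete, here is a minimal sketch of what such a
merge step could look like. It is not htdig code: the (score, url) tuple
format, the merge_answers name and the example URLs are all assumptions
for illustration. It also assumes that scores from different databases
are directly comparable, which would have to be verified.

```python
import heapq

def merge_answers(result_lists, limit=10):
    """Merge per-database result lists into one ranked list.

    Each input list is assumed to hold (score, url) tuples already
    sorted by descending score (hypothetical format, not htdig's own).
    """
    # heapq.merge lazily interleaves the pre-sorted inputs.
    merged = heapq.merge(*result_lists, key=lambda hit: hit[0], reverse=True)
    seen = set()
    out = []
    for score, url in merged:
        if url in seen:
            # The same document may be indexed in several databases.
            continue
        seen.add(url)
        out.append((score, url))
        if len(out) == limit:
            break
    return out

# Hypothetical answers from two databases.
db_a = [(0.9, "http://example.org/a1"), (0.4, "http://example.org/a2")]
db_b = [(0.7, "http://example.org/b1"), (0.4, "http://example.org/a2"),
        (0.1, "http://example.org/b2")]
print(merge_answers([db_a, db_b], limit=3))
```

Because the inputs are merged lazily, only the first few hits of each
database need to be consumed to fill a result page.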

If you say 'hardware is free', I would suggest the following. First, wait
for 3.2 to be released. Then install FreeBSD on your machines, or Linux
patched for big files. Put your original emails on one 60 GB striped disk
set and the indexes on another 60 GB striped set. Index everything in one
database. Have 1 GB of memory and give htdig a 700 MB shared cache to
minimize disk access.

Tests conducted independently on a similar platform (150 GB of striped
disks) showed that you can expect 50 MB/s on the resulting file system. The
disks are SCSI disks. Given that the PC I/O bus has a maximum of 80 MB/s,
this is really good. Of course, you have to choose the striping parameters
carefully. This was done using software RAID and used ~15-20% of the CPU,
so you had better have a dual-processor machine.
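
As a rough sanity check on these figures, the arithmetic can be written
out as below. The 60 GB index size is the one suggested earlier. The
function names, the sequential-read scenario and the 1 GB = 1000 MB
conversion are assumptions made only for this estimate.

```python
BUS_MAX_MB_S = 80   # quoted maximum of the PC I/O bus
STRIPE_MB_S = 50    # measured throughput on the striped file system
INDEX_GB = 60       # size of the striped set suggested for the indexes

def bus_utilization(throughput_mb_s, bus_max_mb_s):
    """Fraction of the bus maximum reached by the measured throughput."""
    return throughput_mb_s / bus_max_mb_s

def sequential_scan_seconds(size_gb, throughput_mb_s, mb_per_gb=1000):
    """Time to read size_gb once at the given rate (assumes 1 GB = 1000 MB)."""
    return size_gb * mb_per_gb / throughput_mb_s

print(f"{bus_utilization(STRIPE_MB_S, BUS_MAX_MB_S):.1%}")   # 62.5%
print(sequential_scan_seconds(INDEX_GB, STRIPE_MB_S))        # 1200.0
```

So the striped set reaches a little under two thirds of the quoted bus
maximum, and one full sequential pass over 60 GB would take about 20
minutes at that rate.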

Anyway, I'm glad you want to do that because I have similar needs. The volume
of data I want to index is around 50 GB (HTML pages from web sites), and
the resulting index must be able to answer ~100,000 requests per day. The
average request must be answered in less than 1 second. There is a peak
usage of ~20 req/s, but most of the time it's 1 to 3 req/s. I think
it can be done on one machine with 2 processors, 120 GB of disk, and 1 GB
of memory. Since I want to grow beyond this limit, I'll have to find a
way to automatically distribute the indexes and have a query process that
can query and merge answers from various htdig databases on the network.
If you want that to happen too, are you able to devote development time
to this task?
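
For reference, the load figures above work out as follows. This is only
a back-of-the-envelope sketch. The function names are made up, and the
concurrency estimate assumes the worst case where every request uses
its full 1 second budget.

```python
REQUESTS_PER_DAY = 100_000
PEAK_REQ_PER_S = 20
LATENCY_BUDGET_S = 1.0
SECONDS_PER_DAY = 24 * 60 * 60

def average_rate(requests_per_day):
    """Mean request rate if the load were spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY

def in_flight(rate_per_s, latency_s):
    """Little's law: mean requests in progress = arrival rate * latency."""
    return rate_per_s * latency_s

avg = average_rate(REQUESTS_PER_DAY)
print(round(avg, 2))                                 # 1.16 req/s on average
print(round(PEAK_REQ_PER_S / avg, 1))                # peak is ~17.3x the average
print(in_flight(PEAK_REQ_PER_S, LATENCY_BUDGET_S))   # 20.0 concurrent at peak
```

The daily average of roughly 1.2 req/s agrees with the 1 to 3 req/s seen
most of the time, so it is the 20 req/s peak that has to drive the sizing.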

Cheers,
--
Loic Dachary
ECILA
100 av. du Gal Leclerc
93500 Pantin - France
Tel: 33 1 56 96 09 80, Fax: 33 1 56 96 09 61
e-mail: [EMAIL PROTECTED] URL: http://www.senga.org/