I just wanted to refresh the thread about distributed searching for
large, high-traffic ht://dig databases.

Basically, what I have is 6GB of text that amounts to about 14GB of
indexes and ht://dig databases. That has accumulated in just 5 months,
so it's going to get a lot bigger.

I need to be able to issue searches across all of that tens of thousands
of times per day, and to update the indexes incrementally (reindexing
all 2.5 million emails from scratch is impractical).

Right now, I use ht://dig to dig the new messages each week, then
htmerge to fold the new messages in with the old ones. I have 450-500
separate ht://dig databases to keep the size down (one ht://dig database
for each mailing list).
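In script form, the weekly update looks roughly like the following. This
is only a sketch: the per-list config layout under /etc/htdig/lists is
hypothetical, and it assumes htdig's -i/-c flags and htmerge's -m option
(merge another set of databases into the ones named by -c) behave as
they do in ht://dig 3.1.x:

#!/usr/bin/env python
# Weekly incremental update, sketched. Assumes one config per list under
# CONF_DIR (hypothetical layout), plus a matching "<list>.conf.new"
# config pointing at a scratch database directory that sees only the
# week's new messages.
import glob
import subprocess

CONF_DIR = "/etc/htdig/lists"    # hypothetical path, one .conf per list

for conf in sorted(glob.glob(CONF_DIR + "/*.conf")):
    new_conf = conf + ".new"     # scratch config for this week's mail
    # Dig just the new messages into the scratch databases
    # (-i = build from scratch, so old messages aren't re-crawled).
    subprocess.check_call(["htdig", "-i", "-c", new_conf])
    # Fold the scratch databases into the list's main databases.
    subprocess.check_call(["htmerge", "-c", conf, "-m", new_conf])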

I have the feeling that sorting/merging those ~500 databases into one
big database is going to be prohibitively expensive in CPU and I/O,
especially if it has to be done every week when the indexes are updated.
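The alternative I mean by "distributed searching" is to leave the
per-list databases alone and fan each query out across them, merging the
ranked results afterwards. Here's the shape of it in Python, with plenty
of assumptions: the config layout is hypothetical, htsearch's
command-line invocation (CGI parameters passed in argv, "config" naming
a file in the config directory) varies by version, and parse_results()
presumes you've swapped the HTML result templates for a bare
"score<TAB>url" format:

#!/usr/bin/env python
# Scatter/gather across the per-list databases -- a sketch of the
# pattern, not a drop-in script.
import glob
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote

CONF_DIR = "/etc/htdig/lists"   # hypothetical layout, one .conf per list

def parse_results(output):
    # Assumes custom result templates that emit one "score<TAB>url" line
    # per match instead of HTML -- that part is yours to set up.
    hits = []
    for line in output.decode("latin-1", "replace").splitlines():
        fields = line.split("\t")
        if len(fields) == 2 and fields[0].isdigit():
            hits.append((int(fields[0]), fields[1]))
    return hits

def run_htsearch(conf_name, words):
    # htsearch takes its CGI input from argv when run by hand; "config"
    # names a config file in the compiled-in config directory. Treat the
    # exact invocation as an assumption for your version.
    out = subprocess.check_output(
        ["htsearch", "config=%s&words=%s" % (conf_name, quote(words))])
    return parse_results(out)

def search_all(words):
    names = [os.path.basename(c)[:-len(".conf")]
             for c in sorted(glob.glob(CONF_DIR + "/*.conf"))]
    with ThreadPoolExecutor(max_workers=20) as pool:
        hits = []
        for results in pool.map(lambda n: run_htsearch(n, words), names):
            hits.extend(results)
    # Merge by score so ~500 per-list result sets come back as one
    # ranked list.
    return sorted(hits, key=lambda h: h[0], reverse=True)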

So does anyone have some specific suggestions for what I'm trying to do?

Let's assume hardware is free ;-)

Tim
