I have been maintaining a mod-perl/MySQL4 web search engine with about 3M records, 
which gains and drops about 100k records daily.
It runs on a dual-Opteron 242 system with 8GB RAM, 15k SCSI driv, SUSE Professional 
for AMD64.

It recently grew to this size (from about 1M records), and I am encountering some 
problems scaling.

1. The timeout is set to 5 min, because of the number of queries, there are a lot of 
unused http processes that linger with connections, and the only way to seeminly keep 
MySQL connections available is to keep timeouts short.

However, I have updates that take over an hour at a time, and the 5 min timeout will 
kill the update process. If I change the timeout to 2 hours, Apache will eat up all 
the connections. Im a little confused at the behaviour.

Any suggestions appreciated. It seems to me the only answer is to maintain a 
completely seperate MySQL server with a 2 hour timeout on the same box, hotcopy the 
database, do the updates, and hotcopy it back, which I would *really* prefer not to 
do. There has got to be an easier way - any suggestions?

2. The other problem I have is that because I am rotating so many records daily and 
the queries are so complex and the tables/indexes so large, I want to keep the db 
OPTIMIZEd, but running an OPTIMIZE TABLE on this database also takes over an hour and 
creates timeouts on queries. At the same time, I want to get the optimize done as soon 
as possible.

Again, a separate server would solve this problem, but I am really struggling for a 
way to not do that - as much as I love MySQL, I really hate installing, configuring, 
maintaining parallel permissions on 2 servers. Especially since MySQL needs 
maintainence so infrequently, every time I need to configure it it is like learning 
the entire process all over again from scratch, and I always goof on some little 
detail that doesnt raise its ugly head right away.

If there is another option ( or maybe I shouldnt be OPTIMIZING every day?), 
suggestions appreciated.

PS I do not have any problems with the query performance.

Reply via email to