I suppose beefing up the replication server might also work, as that seems to be the real bottleneck. An interesting alternative might be a MySQL cluster (see: http://www.mysql.com/products/database/cluster/faq.html or other sources). We do something similar to this already with a number of our Oracle databases, but have honestly not explored MySQL clustering here yet.
Having the LIMIT hard-coded, rather than available as an option, is a concern. If the cleanup ever falls behind, it would appear that, depending on the load, it would never catch up. Your point was that there are times when you might deal with over 100,000 messages within an hour, and if that is sustained over a few hours, the cleanup will never catch up (unless it is aware that it hit the limit and is started up again). So I agree with your concern about the LIMIT being set implicitly, and would suggest that it could at least be overridden through a command-line option.

--Tobias

Nate wrote:
> At 09:59 AM 10/5/2007, Tobias Kreidl wrote:
>
>> So, if you remove the "LIMIT" restriction, it then just runs longer,
>> with no other adverse effects?
>
> correct
>
>> That aside, I assume you want to run the
>> cleanup application at night because the load is generally a lot lower.
>
> correct, and at night nobody really relies on the secondary database
> server (see below)
>
>> Just to get an idea what you're up against, how many messages are we
>> talking about per day and what is the configuration of your server in
>> terms of CPU(s), memory and disks that results in its working so hard to
>> keep up?
>
> This might be more complicated of an answer than expected, but here it goes.
> Messages/Day = 3.5 million
> The servers which the policyd daemons run on do not have any issues
> with load and aren't any part of the delay. The issues arise from
> one of the database servers. We have a multi-master replication
> setup. The primary server is a 2 GHz Opteron, 3G RAM, multiple RAID1
> arrays to distribute IO load. This db server takes care of the
> cleanup in seconds. The secondary db server receives the replicated
> cleanup command shortly thereafter and is not quite as beefy: 1 GHz
> Athlon, 640M RAM, single RAID array. The issue is completely IO load
> on the secondary server.
> The cleanup requests take roughly 30
> minutes to complete, during which time all other replicated commands
> fall behind.
>
> More RAM and perhaps more disks could solve the IO load for now;
> however, a much simpler and cheaper solution is to change the cleanup
> script to run once per day rather than hourly, at an hour when
> nobody cares if the replication server is behind. Even if it takes
> an hour to run once per day instead of 30 minutes every hour, that is
> much better IMO.
>
> Thanks,
>
> - Nate
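
To make the suggestion about an overridable LIMIT a bit more concrete, here is a rough sketch of the idea, not policyd's actual cleanup code. Everything named in it is an assumption for illustration: the `messages` table, the `expires_at` column, the `--limit` option and its default of 100,000. SQLite stands in for MySQL only so that the example runs without a server; in MySQL a plain `DELETE ... WHERE ... LIMIT n` would play the role of the subquery used here. The loop repeats until a batch comes back short, which is one way for the cleanup to notice that it hit the limit and keep going.

```python
import argparse
import sqlite3


def cleanup(conn, cutoff, batch_limit=None):
    """Delete rows with expires_at < cutoff; return how many were removed.

    batch_limit=None removes everything in one statement (no LIMIT).
    Otherwise rows are removed in batches of at most batch_limit, and the
    loop repeats until a batch comes back short, so a backlog is drained
    within a single run instead of being left for the next one.
    """
    if batch_limit is None:
        cur = conn.execute("DELETE FROM messages WHERE expires_at < ?", (cutoff,))
        conn.commit()
        return cur.rowcount

    total = 0
    while True:
        # Hypothetical schema; SQLite needs the rowid subquery where MySQL
        # would accept DELETE ... LIMIT directly.
        cur = conn.execute(
            "DELETE FROM messages WHERE rowid IN ("
            "SELECT rowid FROM messages WHERE expires_at < ? LIMIT ?)",
            (cutoff, batch_limit),
        )
        conn.commit()
        total += cur.rowcount
        if cur.rowcount < batch_limit:  # short batch: nothing left to delete
            return total


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="expired-row cleanup sketch")
    # Hypothetical option: 0 means "no limit at all".
    parser.add_argument("--limit", type=int, default=100000,
                        help="rows per batch; 0 disables the limit")
    return parser.parse_args(argv)


def demo(argv=None):
    """Build a throwaway in-memory table and run the cleanup against it."""
    args = parse_args(argv)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, expires_at INTEGER)"
    )
    conn.executemany("INSERT INTO messages (expires_at) VALUES (?)",
                     [(i,) for i in range(1000)])
    removed = cleanup(conn, cutoff=750, batch_limit=args.limit or None)
    remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    return removed, remaining
```

Calling `demo(["--limit", "100"])` drains the expired rows in small batches, while `demo(["--limit", "0"])` does it in a single statement. Smaller batches keep each replicated statement short, which might matter for a slow secondary, at the cost of more statements overall.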
