I suppose beefing up the replication server might also work, as that 
seems to be the real bottleneck.
An interesting alternative might be a MySQL cluster (see: 
http://www.mysql.com/products/database/cluster/faq.html
or other sources).  We do something similar to this already with a 
number of our Oracle databases, but have honestly not explored MySQL 
clustering here yet.

The issue of having the LIMIT hard-coded, rather than optional, is a 
concern.
If the cleanup ever falls behind, it would appear that, depending on the 
load, it might never catch up.

Your point was that there are times when you might actually deal with 
over 100,000 messages within an hour, and
if that rate is sustained over a few hours, the cleanup will never catch 
up (unless it is aware that it hit the limit and is started
up again).  So, I agree with your concern about the LIMIT being 
implicitly set, and would suggest that it could at least be 
overridable via a command-line option.

--Tobias

Nate wrote:
> At 09:59 AM 10/5/2007, Tobias Kreidl wrote:
>   
>> So, if you remove the "LIMIT" restriction, it then just runs longer,
>> with no other adverse effects?
>>     
>
> correct
>
>   
>> That aside, I assume you want to run the
>> cleanup application at night because the load is generally a lot lower.
>>     
>
> correct, and at night nobody really relies on the secondary database 
> server (see below)
>
>   
>> Just to get an idea what you're up against, how many messages are we
>> talking about per day and what is the configuration of your server in
>> terms of CPU(s), memory and disks that results in its working so hard to
>> keep up?
>>     
>
> This answer might be more complicated than expected, but here goes.
> Messages/Day = 3.5million
> The servers which the policyd daemons run on do not have any issues 
> with load and aren't any part of the delay.  The issues arise from 
> one of the database servers.  We have a multi-master replication 
> setup.  The primary server is a 2 GHz Opteron with 3 GB of RAM and 
> multiple RAID1 arrays to distribute IO load.  This db server takes 
> care of the cleanup in seconds.  The secondary db server receives 
> the replicated cleanup command shortly thereafter and is not quite 
> as beefy: a 1 GHz Athlon with 640 MB of RAM and a single RAID 
> array.  The issue is completely IO load 
> on the secondary server.  The cleanup requests take roughly 30 
> minutes to complete, during which time all other replicated commands 
> fall behind.
>
> More ram and perhaps more disks could solve the IO load for now; 
> however, a much simpler and cheaper solution is to change the cleanup 
> script to run once per day rather than hourly during an hour when 
> nobody cares if the replication server is behind.  Even if it takes 
> an hour to run once per day instead of 30 minutes every hour, that is 
> much better IMO.
>
> Thanks,
>
> - Nate
>   

_______________________________________________
policyd-users mailing list
policyd-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/policyd-users
