Hi all,

I've been having a problem recently. We have three relay servers (relay1, relay2, and relay3) that are, for the most part, round-robin MX hosts. We also have a Cisco LocalDirector hooked up to them, and some domains use it in DNS.

Anyway, the servers run fine most of the time, with 20-30 messages queued on each. But on random days, on random servers (sometimes it's relay3, sometimes it's relay1), the queues get gigantic: 50,000+ messages.

I keep thinking these are spam attacks of some sort. Since we have IDE hard drives, my theory is that once the server starts writing to the queue, it can't read messages back fast enough, and the system gets bogged down.

Relay3 had a nice queue load today. It only got up to 5,000 messages before we noticed the problem. A reboot ALWAYS fixes this. In other words, if a server has 10,000 messages in the queue and I reboot it, the queue starts draining the second the machine comes back up, usually at about 1,000 messages every 2 minutes (so a queue of 5,000 clears out in about 10 minutes).
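
To put numbers on that drain rate, here is a small Python sketch. It assumes the rate stays constant at roughly 1,000 messages per 2 minutes, which is only my rough observation; the function name and defaults are made up for illustration.

```python
def drain_minutes(queue_size, batch=1000, batch_minutes=2):
    """Estimate minutes to empty a queue at a constant drain rate.

    Assumption: the post-reboot rate of about `batch` messages every
    `batch_minutes` minutes holds for the whole queue.
    """
    return queue_size / batch * batch_minutes


# Queue sizes mentioned in this message.
for size in (5000, 10000, 50000):
    print(size, "messages ->", drain_minutes(size), "minutes")
```

If the rate really is constant, even the 50,000-message backlogs should clear in under two hours after a reboot.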

It's just odd that we have to reboot the box to solve this problem. I have a graph of what is going on, and I can hand out the URL if that would help anyone trying to guess the cause. Maybe my IDE drive theory isn't the best idea in the world.

For the record, I just ran top on relay3 and got the output below. Notice that the CPU is 0% idle (even though it's a 3.06 GHz machine). There are three vscan processes that seem to be using a LOT of CPU time... maybe this is what is occurring: it gets bad and eventually causes the queue to rise? Anyway, any ideas would be appreciated!

-Matt

----snip----
last pid: 31870;  load averages:  4.35,  4.26,  4.47
93 processes:  5 running, 88 sleeping
CPU states: 93.8% user, 0.0% nice, 5.4% system, 0.8% interrupt, 0.0% idle
Mem: 266M Active, 381M Inact, 173M Wired, 976K Cache, 110M Buf, 168M Free
Swap: 2007M Total, 2007M Free

 PID USERNAME PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU COMMAND
17979 vscan    129    0 45336K 41692K RUN     49:13 20.17% 20.17% perl5.8.6
4903 vscan    129    0 45724K 42080K RUN     84:35 19.97% 19.97% perl5.8.6
26248 vscan    129    0 44500K 40884K RUN     12:44 19.68% 19.68% perl5.8.6
31822 vscan     20    0 45352K 42088K lockf    0:01  5.55%  3.61% perl5.8.6
31656 vscan      4    0 46000K 42740K select   0:03  2.50%  2.49% perl5.8.6
31604 vscan     20    0 47076K 43808K lockf    0:03  2.30%  2.29% perl5.8.6
31606 vscan      4    0 47516K 43808K accept   0:04  1.86%  1.86% perl5.8.6
31690 vscan     20    0 46140K 42840K lockf    0:02  1.52%  1.51% perl5.8.6
31670 vscan     20    0 47624K 44088K lockf    0:03  1.47%  1.46% perl5.8.6
31616 vscan     20    0 46516K 43256K lockf    0:03  1.47%  1.46% perl5.8.6
31773 vscan     20    0 45184K 41920K lockf    0:01  1.51%  1.42% perl5.8.6
31601 vscan    105    0 46572K 43292K RUN      0:03  1.07%  1.07% perl5.8.6
31703 vscan     20    0 46008K 42756K lockf    0:02  0.49%  0.49% perl5.8.6
 432 clamav    20    0 13348K 12692K kserel   3:35  0.00%  0.00% clamd
----snip----
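
As a rough cross-check of the listing above, the following Python sketch adds up the WCPU column per user. It is only an illustration: the function name is made up, the sample holds just four rows copied from the output, and it assumes the 11-column process layout shown here.

```python
from collections import defaultdict

# A few process rows copied from the top output above.
TOP_SAMPLE = """\
17979 vscan    129    0 45336K 41692K RUN     49:13 20.17% 20.17% perl5.8.6
 4903 vscan    129    0 45724K 42080K RUN     84:35 19.97% 19.97% perl5.8.6
26248 vscan    129    0 44500K 40884K RUN     12:44 19.68% 19.68% perl5.8.6
  432 clamav    20    0 13348K 12692K kserel   3:35  0.00%  0.00% clamd
"""


def wcpu_by_user(top_text):
    """Sum the WCPU percentage per username from top process rows."""
    totals = defaultdict(float)
    for line in top_text.splitlines():
        fields = line.split()
        # Process rows have 11 columns and start with a numeric PID;
        # header and summary lines are skipped.
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        user, wcpu = fields[1], fields[8]
        totals[user] += float(wcpu.rstrip("%"))
    return dict(totals)


print(wcpu_by_user(TOP_SAMPLE))
```

On the sample rows, the three busy vscan processes alone account for close to 60% WCPU.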





_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/
