Alan wrote:

> Hey all, new to the list here.

> Our setup here at work is an amavis setup to filter mail for many
> domains and emails (typical ISP-like environment). What happens
> occasionally is that incoming mail will spike and then the mail queue
> starts to grow. Amavis processing becomes more sporadic; it starts
> taking way longer to do mail processing, say, 30-80 seconds instead of
> the 300-700ms that it takes on off-peak times.

> I have it set to have 4 child processes, and it seems that all 4 will
> work fine for say, 10-30 seconds, and the mailq will go down, then one
> of these processes goes to 100% cpu usage, the others go to 0%, and the
> mailq starts going up.  This stays like this for the 30-80 seconds and
> then it goes back to "normal" processing with timing back down low and
> all children chugging away happily.  Then after N seconds one process
> locks it all up.  While things are locked up the load on the server is
> up in the 2-3+ level.  Normally it is under 2 (right now it's sitting at
> 1.64 with almost nothing in the mailq).

> There doesn't seem to be a correlation between message size (via the
> postfix logs) and the length of time, or some sort of spam bomb via
> looking for lots of decompose notices in the amavis.log.

> SETUP:
> Servers: Debian stable on HT 2.8G server hardware with 1G of ram
> Server 1:
> Postfix (2.3.3-1)
>   |  does rbl checks and forwards to the scanning server
>   |
>   V
> Server 2:
> Amavisd-new (20030616p10-5)
>   |  Spamassassin (2.0.3-2sarge)
>   |  Clamav (0.88.6-0volatile1)
>   V
> Server 3:
> lmtp connection to a cyrus server for our clients to retrieve.

> Our servers are processing about 250k messages a day, with about 50-60k 
> of those not blocked by RBLs in postfix and being processed by amavis
> (based on the last few days of log files anyway).

> Things seem to have gotten worse lately and we see these spikes of up
> to 5000-12000 messages in the active queue more and more often.  The
> spikes mean that mail delivery is delayed for several hours in some
> cases, leading to none-too-happy clients.  Our only options in these
> cases seem to be to either ride it out or disable SA processing and
> let the spam through.

> I've tweaked out SA as much as I know how, disabling the blacklist
> config files, disabling any network checks (pyzor, razor, etc).  Also
> have gone through the SA and amavis wikis' performance improvement pages.

> I'm hoping that someone can suggest why this pausing happens.

> Log messages:
> Here's a couple of examples of super long amavis log timings.

> SA check: 80539
> SA check: 479182

> I realize that the versions of the software are pretty old, and I'm
> working on getting new versions deployed (a tricky situation when this
> is the mail infrastructure for several thousand users :)

> If anyone can give me a hand I'd really appreciate it.
> TIA
> Alan

I think you meant SpamAssassin (3.0.3-2sarge). It certainly looks
like SA is the problem. Are you using SQL for Bayes and AWL? Moving
them to SQL helps with file locking issues (and is faster).

If you have not migrated Bayes to MySQL and desire to, this may help:
http://www200.pair.com/mecham/spam/debian-spamassassin-sql.html

There was a bug report about problems with message/partial:
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5041
It appears this has not been resolved. If you have samples of the mail
that took a long time to scan, it would be interesting to see if
they contained a message/partial component.
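
A rough way to check saved samples for that, as a sketch only: the
function name is made up here, and the pattern is a simple heuristic
that only sees a Content-Type header written on one line (it will miss
folded headers).

```shell
# Hypothetical helper: succeeds (exit 0) if the file has a
# "Content-Type: message/partial" header line, case-insensitively.
has_message_partial() {
    grep -Eiq '^content-type:[[:space:]]*message/partial' "$1"
}

# Example use (the sample directory is an assumption):
#   for f in /path/to/samples/*; do
#       has_message_partial "$f" && echo "$f"
#   done
```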

If you are able to obtain one of the offending messages, it would be a
good idea to see what happens when you manually feed it to
spamassassin in debug mode.
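
Something along these lines should do, as a sketch: -D turns on debug
output (written to stderr) and -t runs in test mode. The function name
and the SA_BIN override are made up for illustration. It is probably
best run as the same user amavisd runs as, so that SA picks up the same
configuration and Bayes data.

```shell
# Hypothetical wrapper: feed one saved message to spamassassin in debug
# mode, keep the debug output in a log file, and print the elapsed
# wall-clock time in whole seconds.
sa_debug_scan() {
    msg=$1   # saved message file
    log=$2   # where the debug output (stderr) goes
    start=$(date +%s)
    "${SA_BIN:-spamassassin}" -D -t < "$msg" > /dev/null 2> "$log"
    end=$(date +%s)
    echo "$((end - start))"
}

# Example use (file names are assumptions):
#   sa_debug_scan slow-message.eml sa-debug.log
```

The tail of the debug log should show which step the scan was in when
it stalled.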

I would think with 1GB RAM you could run two more
$max_servers (with a matching maxproc for the smtp-amavis transport
in master.cf), which *may* give you 50% more throughput, but it does look
like you are already pushing the CPU pretty hard right now, so this may
not necessarily improve things.
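
If you do change it, the two numbers need to stay in step. A minimal
sketch for reading both values back so they can be compared; the
function names are made up, and it assumes a plain "$max_servers = N;"
assignment in amavisd.conf and a single-line smtp-amavis entry in
master.cf, where maxproc is the seventh column.

```shell
# Hypothetical helper: print the last numeric $max_servers assignment
# found in the given amavisd config file.
amavis_max_servers() {
    sed -n 's/^[[:space:]]*\$max_servers[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: print the maxproc column (7th field) of the named
# service in a Postfix master.cf file.
postfix_maxproc() {
    awk -v svc="$2" '$1 == svc { print $7 }' "$1"
}

# Example use (the config paths are assumptions):
#   amavis_max_servers /etc/amavis/amavisd.conf
#   postfix_maxproc /etc/postfix/master.cf smtp-amavis
```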

Upgrading to SA 3.1.7 (from sarge-backports) is pretty simple (and it
will catch a lot more spam than 3.0.3):
http://www.freespamfilter.org/forum/viewtopic.php?t=327
but at the same time is a little more CPU intensive, so scan times
will actually increase slightly overall (and your load average could
increase also). Not necessarily going to solve the immediate problem.

Do you have a lot of leftover old amavis... temp directories in
/var/lib/amavis?
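
A quick way to count them, as a sketch: the function name is made up,
and the 'amavis*' name pattern and the age cutoff are assumptions you
may need to adjust for your install.

```shell
# Hypothetical helper: count directories directly under $1 whose names
# start with "amavis" and that were last modified more than $2 minutes ago.
count_old_amavis_dirs() {
    dir=$1
    minutes=$2
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'amavis*' \
        -mmin "+$minutes" | wc -l | tr -d ' '
}

# Example use: directories untouched for over an hour
#   count_old_amavis_dirs /var/lib/amavis 60
```

Directories that old are not being worked on by a live child process,
so a large count suggests scans that died or timed out.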

Are there any interesting warnings from amavis? Assuming you use $DO_SYSLOG = 1;
egrep -i "(trouble|can't|timed|error|preserving|failed|abort)" \
    /var/log/mail.log | grep amavis

Gary V

