Hi - Some time ago I started running into problems with amavisd hanging 
on a server that I maintain. In the typical case, everything works fine 
for several hours, after which one child hangs. The system then 
continues to run in this diminished fashion until the second child 
hangs. After that, Postfix starts reporting the following on all 
connection attempts.

  status=deferred (delivery temporarily suspended: connect to 127.0.0.1 [127.0.0.1]: read timeout)

At this point no further mail is processed until the daemons are restarted. 
The current configuration (aside from security updates) worked 
flawlessly for many months before these problems started. I have pretty 
much exhausted my ideas on how to proceed and decided maybe it was time 
to beg for help.

I have increased the amavisd debug level, and it is clear that the 
problem occurs right after the 'CALLING SA check' log entry: the call 
into SpamAssassin never returns. When this happens, the child appears 
to be in a sleep state. An strace shows the following.

 elm:~# strace -p 7002
 Process 7002 attached - interrupt to quit
 recvfrom(11,

A netstat shows the connections are in a CLOSE_WAIT state. The lsof 
output includes the following (I have the full output available if 
needed).

 amavisd-n 7002 amavis    3u  unix 0xf1cea6a0         91865013 socket
 amavisd-n 7002 amavis    4u  IPv4   91865015              TCP localhost:10024 (LISTEN)
 amavisd-n 7002 amavis    5u   REG        8,3    6399  5227052 /var/lib/amavis/amavis-20060728T101157-07002/email.txt
 amavisd-n 7002 amavis    6u  IPv4   92321291              UDP *:41070
 amavisd-n 7002 amavis    7u  IPv4   92321247              TCP localhost:10024->localhost:58311 (CLOSE_WAIT)
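
The CLOSE_WAIT condition seen in the netstat and lsof output can also be 
checked for from a script. The following is only a minimal sketch, 
assuming a Linux /proc/net/tcp layout (state code 08 is CLOSE_WAIT) and 
amavisd listening on port 10024; the function name count_close_wait is 
hypothetical and not part of amavisd or any other package mentioned here.

```python
# Hypothetical helper (not part of amavisd): counts sockets in CLOSE_WAIT
# on a given local TCP port by parsing text in the /proc/net/tcp format.
CLOSE_WAIT = "08"  # Linux kernel TCP state code for CLOSE_WAIT


def count_close_wait(proc_net_tcp_text, port=10024):
    """Return how many sockets bound to local `port` are in CLOSE_WAIT."""
    count = 0
    # The first line of /proc/net/tcp is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # fields[1] is the local address as HEXIP:HEXPORT, fields[3] the state.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == CLOSE_WAIT:
            count += 1
    return count
```

It would be fed the contents of /proc/net/tcp, for example 
count_close_wait(open("/proc/net/tcp").read()). A non-zero count that 
persists across several checks would match the stuck-child state 
described above.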

I have looked at several of the lsof reported email.txt files and 
haven't seen anything interesting there. I have run a manual expiry on 
the Bayes database several times. I have tried disabling AWL. I have 
tried disabling all plugins in init.pre. No help there. I have also 
tried running amavisd via 'amavis debug-sa'. However, when I do this, it 
usually takes at least a day for the first child to hang, leaving a 
mountain of data to sift through. The only time I managed to get both 
children to fail in this mode, the last debug messages were as follows.

 debug: leaving helper-app run mode
 debug: Running tests for priority: 500

I was unable to draw any useful conclusions from this information. I 
didn't see any other obvious problems being reported in the debug-sa 
output.

The server in question is running Debian stable with the standard, 
up-to-date packages.

 amavisd-new    20030616p10-5
 spamassassin   3.0.3-2sarge1
 postfix        2.1.5-9
 razor          2.670-1sarge2
 pyzor          0.4.0+cvs20030
 dcc            1.2.74-2
 kernel-image   2.4.27-10sarge

Any help would be greatly appreciated. I would really like to find a 
better solution than hourly postfix/amavisd restarts via cron.
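
As a less blunt stopgap than a blind hourly restart, the hang could be 
detected from the outside by probing the amavisd port the way Postfix 
does. This is only a sketch under assumptions: it assumes amavisd sends 
an SMTP-style 220 greeting on 127.0.0.1:10024 when a child is free, the 
function name amavisd_responds is hypothetical, and the 30 second 
timeout is an arbitrary placeholder.

```python
import socket


def amavisd_responds(host="127.0.0.1", port=10024, timeout=30.0):
    """Return True if a 220 greeting arrives within `timeout` seconds.

    Assumption: a healthy amavisd child greets new connections with a
    line starting with "220", as an SMTP server would.
    """
    try:
        # The timeout applies to both the connect and the following recv.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            greeting = conn.recv(512)
    except OSError:
        # Covers refused connections as well as read timeouts.
        return False
    return greeting.startswith(b"220")
```

A cron job could call this every few minutes and restart the daemons 
only when it returns False. That would not fix the underlying hang, but 
it would shorten the window in which mail is deferred.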

Thanks.

Jim
