On May 24, 2005, at 19:30, Damian Menscher wrote:

On Tue, 24 May 2005, Doug Hardie wrote:

On May 24, 2005, at 13:21, Stephen Gran wrote:

On Tue, May 24, 2005 at 12:54:47PM -0700, Doug Hardie said:

http://www.lafn.org/clamav/ktrace.html
http://www.lafn.org/clamav/clamd.html


clamav-milter is only one process. It has multiple threads, but those are not visible to the kernel. The problem does not occur immediately after a database reload; it takes 10 or so minutes before it hangs/quits. I suspect that the problem occurs when there are active messages that do not complete before some timeout value. clamav-milter is waiting for everything to go quiet, but on my receive mail server that never happens: there are always 30-40 active sendmail children. I suspect that clamav-milter eventually gives up, and that's when the problem occurs. On my outgoing mail server, which handles considerably less mail, most of the database updates do not cause a problem. On my test server, which handles 3 emails daily, it never causes a problem.


Just to bring you (and anyone else joining us) up to speed, here's a description of how it's supposed to work:

When there's a database update, the milter wants everything to be quiet. So it stops accepting new connections. It then waits for the currently-running children to finish. Once n_children drops to 0, it reloads the database and resumes accepting connections.

At least, that's the theory. In practice, n_children isn't ever hitting 0, so it stays in the !accepting state forever. For example, in the ktrace you posted, n_children dropped from 7 down to 2. The fact that it never reached 0 is the entire problem. Of course, nobody knows *why* it isn't reaching 0. It might be from a hung scanner thread, or from a pthreads race condition, or even a locking issue.

The hope was that getting an strace of each thread of a hung milter would provide information on which of those causes was at fault, and perhaps enable us to actually locate the bug.

I frequently see sendmail children alive for over 30 minutes and sometimes considerably longer. Some connections are very slow at transferring data. I would guess it's just not waiting long enough.


_______________________________________________
http://lurker.clamav.net/list/clamav-users.html
