http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5422
------- Additional Comments From [EMAIL PROTECTED] 2007-04-24 06:52 -------
OK, I have a theory. I think this is what's happening: at some point, the
{kids} hash is not coherent. For example, this pair of
immediately-contiguous lines in the log demonstrates it:
Apr 24 13:56:51 mxin001 spamd[44308]: JMD bug5313
read_one_message_from_child_socket 69792=I at
/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/SpamdForkScaling.pm line
400.
Apr 24 13:56:51 mxin001 spamd[44308]: prefork: child states:
BBKBBBBBBBBBBBBBBBBBBBBBBBBB
After the first line, $self->set_child_state ($pid, PFSTATE_IDLE); is called,
which sets the {kids} entry for pid 69792 to PFSTATE_IDLE unless the pid has no
entry (which only happens for servers which have exited). However, there's no
"I" in the "child states" line!
Later on, a kid notifies the parent that its state is "B" (PFSTATE_BUSY);
the parent notes this, but the notification is "lost" somehow -- hence the
parent attempts to assign a job to the supposedly PFSTATE_IDLE task, causing
the error.
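
To make the expected behaviour concrete, here is a minimal sketch of the
invariant the log violates. It is not the code from SpamdForkScaling.pm: the
single-letter constant values, the plain %kids hash, and the
child_states_line() helper are assumptions made up for illustration. Only the
"set the state unless the pid has no entry" rule comes from the description
above.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real constants; the letters are chosen
# only to match what appears in the "child states" log line.
use constant PFSTATE_IDLE => 'I';
use constant PFSTATE_BUSY => 'B';

# Set the state for a pid, unless the pid has no entry
# (which should only be the case for servers that have exited).
sub set_child_state {
  my ($kids, $pid, $state) = @_;
  return unless exists $kids->{$pid};
  $kids->{$pid} = $state;
}

# Hypothetical helper: render one letter per child, ordered by pid.
sub child_states_line {
  my ($kids) = @_;
  return join '', map { $kids->{$_} } sort { $a <=> $b } keys %$kids;
}

my %kids = (69790 => PFSTATE_BUSY, 69791 => PFSTATE_BUSY, 69792 => PFSTATE_BUSY);
set_child_state(\%kids, 69792, PFSTATE_IDLE);
set_child_state(\%kids, 12345, PFSTATE_IDLE);   # unknown pid: ignored
print child_states_line(\%kids), "\n";          # prints "BBI"
```

If the hash is coherent, marking a known pid idle must make an "I" show up in
the very next states line, which is exactly what the log above fails to show.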
I think the reason it's becoming incoherent is an intermittent race
condition between the mainline code and the SIGCHLD signal handler. The latter
performs write ops on the {kids} hash -- it deletes entries from the hash.
Perhaps when this happens at bad times, it results in other entries getting
"lost" somehow, and therefore causes the incoherence.
I'll upload a new version of SpamdForkScaling.pm (in entirety, not a patch, too
many patches = getting messy ;). This version moves all deletions from the
{kids} hash into the mainline, and adds (yet more) debugging info.
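
For anyone following along, the general shape of "move the deletions into the
mainline" can be sketched as below. This is only an illustration under
assumptions, not the uploaded SpamdForkScaling.pm: the %kids and @reaped
variables, the process_reaped_kids() name, and the small fork demo at the end
are all hypothetical. The idea is that the SIGCHLD handler only records which
pids have exited, and the mainline is the only code that writes to the hash.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';

my %kids;     # pid => state; written only by mainline code
my @reaped;   # pids noted by the signal handler, not yet processed

$SIG{CHLD} = sub {
  # Only record exited pids here; never touch %kids in the handler.
  while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
    push @reaped, $pid;
  }
};

# Called from the main loop: perform the deferred deletions.
sub process_reaped_kids {
  while (defined(my $pid = shift @reaped)) {
    delete $kids{$pid};
  }
}

# Demo (hypothetical): fork three children that exit immediately.
for (1 .. 3) {
  my $pid = fork();
  die "fork failed: $!" unless defined $pid;
  exit 0 if $pid == 0;      # child
  $kids{$pid} = 'I';        # parent records the new kid
}

# Wait until the handler has noted all three exits, then clean up.
select(undef, undef, undef, 0.05) while @reaped < 3;
process_reaped_kids();
print scalar(keys %kids), " kids left\n";
```

The main loop would call process_reaped_kids() at a point where it is not in
the middle of reading or updating child states, so deletions can no longer
land at a "bad time".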