http://bugzilla.spamassassin.org/show_bug.cgi?id=3983
[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |4189


------- Additional Comments From [EMAIL PROTECTED]  2005-03-15 19:15 -------
Yes, bug 4189 is a bug, and the patch in that bug should be applied on top
of these. However, I don't think it's the bug Dallas is reporting; bug 4189
manifests itself as lots of "K" state children, and Dallas' logs don't show
that.

BTW Dallas, I note something very weird in those logs you posted:

---------------------------------------------------------------------------
2005-01-22 09:50:00.782509500 [pid 1287] write(11, "A\n", 2) = 2
2005-01-22 09:50:00.782578500 [pid 23708] <... read resumed> "A\n", 4096) = 2
2005-01-22 09:50:00.782652500 [pid 1287] write(2, "[1287] debug: prefork: ordered 2"..., 47[1287] debug: prefork: ordered 23708 to accept
2005-01-22 09:50:00.782695500 ) = 47
2005-01-22 09:50:00.782747500 [pid 1287] read(11, <unfinished ...>
2005-01-22 09:50:00.782941500 [pid 23708] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
2005-01-22 09:50:00.783028500 [pid 23708] accept(5, {sin_family=AF_INET, sin_port=htons(40829), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 11
2005-01-22 09:55:00.629245500 [pid 23708] fcntl64(11, F_GETFL) = 0x2 (flags O_RDWR)
2005-01-22 09:55:00.629333500 [pid 23708] fstat64(11, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
2005-01-22 09:55:00.629470500 [pid 23708] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40203000
---------------------------------------------------------------------------

Note the 5-minute wait between accept() being called and returning. That's
the hang, in wait_for_child_to_accept() in SpamdForkScaling.pm, where it
calls

    my $state = $self->read_one_line_from_child_socket($sock);

Essentially, the child process is hung in the accept() system call. (This
is not supposed to happen, naturally. ;) The spamd parent will sit there
waiting for the child to report its status, and the child is just hung
waiting for accept() to return...

So the question is, why is accept() hanging? Is it:

  1. because there's no connection there to accept, according to the
     child, and it's just waiting for the next conn request?

  2. because something in the kernel is holding onto a lock, which
     eventually times out after exactly 5 minutes?

The timestamps (09:50:00.783028500 and 09:55:00.629245500) are almost
exactly 5 minutes apart. That doesn't seem to be accidental, and would lead
me to suspect that #1 is very unlikely. This doesn't seem to be a
particularly quiet server, where the socket would see no activity for a
whole 5 minutes... right?

If that's the case, I think #2 seems likely.

Anyway, I'll try adding some timeout code to the preforking code, to take
care of this. But it looks like some kind of kernel lock timeout to me...
accept() should not be sitting on the socket for that long, assuming there
are other connections arriving during those 5 mins.

--j.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
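
For illustration only, the kind of timeout mentioned in the comment above could be sketched roughly as follows. This is not the patch that went into spamd: read_line_with_timeout() is a hypothetical helper (it is not part of SpamdForkScaling.pm), the socketpair merely stands in for the parent/child channel, and the "ready\n" payload is made up. It relies only on IO::Select, Socket and Time::HiRes from core Perl.

```perl
use strict;
use warnings;
use IO::Select;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use Time::HiRes qw(time);

# Hypothetical helper (an assumption, not spamd code): read one
# newline-terminated line from $sock, giving up after $timeout seconds.
# Returns the line, or undef on timeout, EOF or read error.
sub read_line_with_timeout {
    my ($sock, $timeout) = @_;
    my $select   = IO::Select->new($sock);
    my $deadline = time() + $timeout;
    my $line     = '';

    while (1) {
        my $remaining = $deadline - time();
        return if $remaining <= 0;

        # Block until the socket is readable or the time runs out.
        my @ready = $select->can_read($remaining);
        return unless @ready;

        # Read a byte at a time so nothing past the newline is consumed.
        my $got = sysread($sock, my $byte, 1);
        return unless $got;            # EOF or error
        $line .= $byte;
        return $line if $byte eq "\n";
    }
}

# Demo: a socketpair stands in for the parent<->child status channel.
socketpair(my $parent_end, my $child_end, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";

# Nothing has been written yet, so this read gives up after 0.2 seconds.
my $silent = read_line_with_timeout($parent_end, 0.2);
print defined $silent ? "got: $silent" : "timed out\n";

# Once the other end reports something, the same call returns the line.
syswrite($child_end, "ready\n");
my $reply = read_line_with_timeout($parent_end, 0.2);
print defined $reply ? "got: $reply" : "timed out\n";
```

With something along these lines, a parent waiting on a child stuck in accept() would get control back after the timeout instead of blocking for as long as the child does. What to do next (retry, pick another child, kill the stuck one) is a separate design decision that this sketch does not cover.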