http://bugzilla.spamassassin.org/show_bug.cgi?id=3983


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |4189




------- Additional Comments From [EMAIL PROTECTED]  2005-03-15 19:15 -------
Yes, bug 4189 is a real bug, and the patch in that bug should be applied on top
of these.   However, I don't think it's the bug Dallas is reporting; bug 4189
manifests itself as lots of "K" state children, and Dallas' logs don't show
that.

BTW Dallas, I note something very weird in those logs you posted:

---------------------------------------------------------------------------
2005-01-22 09:50:00.782509500 [pid  1287] write(11, "A\n", 2)         = 2
2005-01-22 09:50:00.782578500 [pid 23708] <... read resumed> "A\n", 4096) = 2
2005-01-22 09:50:00.782652500 [pid  1287] write(2, "[1287] debug: prefork:
ordered 2"..., 47[1287] debug: prefork: ordered 23708 to accept
2005-01-22 09:50:00.782695500 ) = 47
2005-01-22 09:50:00.782747500 [pid  1287] read(11,  <unfinished ...>
2005-01-22 09:50:00.782941500 [pid 23708] rt_sigprocmask(SIG_BLOCK, NULL, [], 
8) = 0
2005-01-22 09:50:00.783028500 [pid 23708] accept(5, {sin_family=AF_INET,
sin_port=htons(40829), sin_addr=inet_addr("127.0.0.1")}}, [16]) = 11
2005-01-22 09:55:00.629245500 [pid 23708] fcntl64(11, F_GETFL)        = 0x2
(flags O_RDWR)
2005-01-22 09:55:00.629333500 [pid 23708] fstat64(11, {st_mode=S_IFSOCK|0777,
st_size=0, ...}) = 0
2005-01-22 09:55:00.629470500 [pid 23708] mmap2(NULL, 4096,
PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40203000
---------------------------------------------------------------------------

Note the 5-minute wait between accept() being called and returning. That's
the hang, in wait_for_child_to_accept() in SpamdForkScaling.pm, where it
calls

    my $state = $self->read_one_line_from_child_socket($sock);

Essentially, the child process is hung in the accept() system call. (This is
not supposed to happen, naturally. ;)   The parent spamd will sit there waiting
for the child to report its status, and the child is just hung waiting for
accept() to return...

So the question is, why is accept() hanging?  Is it:

    - 1. because there's no connection there to accept, according to the child,
      and it's just waiting for the next conn request?
    - 2. because something in the kernel is holding onto a lock that eventually
      times out after exactly 5 minutes?

The timestamps (09:50:00.783028500 and 09:55:00.629245500) are almost exactly 5
minutes apart.  That doesn't seem to be accidental, and would lead me to
suspect that #1 is very unlikely.   This doesn't seem to be a particularly
quiet server, where the socket would see no activity for a whole 5 minutes...
right?

If that's the case, I think #2 seems likely.
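
To put a number on "almost exactly", here is a small worked calculation on the
two timestamps quoted above. It is only arithmetic on the log values; the
helper name to_ns is made up for this illustration.

```python
def to_ns(stamp: str) -> int:
    """Convert an HH:MM:SS.nnnnnnnnn log timestamp to nanoseconds since midnight."""
    clock, frac = stamp.split(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    whole_seconds = hours * 3600 + minutes * 60 + seconds
    # Pad the fraction to 9 digits so it can be read as nanoseconds.
    return whole_seconds * 1_000_000_000 + int(frac.ljust(9, "0"))

# Timestamps copied from the strace excerpt: the accept() line and the
# first syscall logged after it (fcntl64).
accept_called = to_ns("09:50:00.783028500")
next_syscall = to_ns("09:55:00.629245500")

gap_ns = next_syscall - accept_called
gap_seconds = gap_ns / 1_000_000_000
# How far short of a full 300 seconds the gap is, in milliseconds.
shortfall_ms = (300 * 1_000_000_000 - gap_ns) / 1_000_000

print(f"gap: {gap_seconds:.6f} s")
print(f"short of 5 minutes by: {shortfall_ms:.3f} ms")
```

So the gap is 299.846217 seconds, about 154 ms short of a full 5 minutes.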

Anyway, I'll try adding some timeout code to the preforking code, to take
care of this.  But it looks like some kind of kernel lock timeout to me...
accept() should not be sitting on the socket for that long, assuming there are
other connections arriving during those 5 mins.
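
A rough sketch of the kind of timeout meant here, written in Python rather than
spamd's Perl so it can be run standalone. This is not the actual patch: the
function name read_line_with_timeout, the 0.2-second timeout, and the "ok"
reply are all made up for illustration. The idea is only that the parent waits
for the child's status line with a deadline instead of blocking forever.

```python
import select
import socket
import time

def read_line_with_timeout(sock, timeout_secs):
    """Return one newline-terminated line from sock, or None if the
    deadline passes or the peer closes the socket first."""
    deadline = time.monotonic() + timeout_secs
    buf = b""
    while not buf.endswith(b"\n"):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None              # peer stayed silent: treat it as hung
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            return None              # select() timed out
        chunk = sock.recv(4096)
        if not chunk:
            return None              # peer closed the socket
        buf += chunk
    return buf.decode()

# A socketpair stands in for the parent/child status socket.
parent_end, child_end = socket.socketpair()

# The "child" says nothing, as if stuck in accept(): the parent gives up.
print(read_line_with_timeout(parent_end, 0.2))

# The "child" reports a (hypothetical) status line: the parent gets it.
child_end.sendall(b"ok\n")
print(repr(read_line_with_timeout(parent_end, 0.2)))

parent_end.close()
child_end.close()
```

With something like this, the parent could notice a silent child and deal with
it rather than sitting in the read for the full 5 minutes.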

--j.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
