Hi all,
Here are the results of strace/truss on a qmail-lspawn session which
caused qmail v1.00 to crash on Linux RH 5.2. Apologies for the long post.
Here's what I did:
0. Emptied the queue.
1. Restarted qmail.
2. Started "strace -f -p 2074" in one window [qmail-send pid]
3. Started "strace -f -p 2076" in another window [qmail-lspawn pid]
4. Ran "echo to:fred|/var/qmail/bin/qmail-inject" in a third window
After that, qmail crashed. But the message had been delivered locally to
fred's Maildir/new directory, _and_ the very same message was still in
the queue!
Here are the results of the qmail-send trace (showing the last few lines
only):
# strace -f -p 2074
[snip]
stat("todo/313551", 0xbffffc98) = -1 ENOENT (No such file or directory)
stat("info/15/313551", {st_mode=0, st_size=0, ...}) = 0
open("info/15/313551", O_RDONLY|O_NONBLOCK) = 9
fstat(9, {st_mode=0, st_size=0, ...}) = 0
read(9, "[EMAIL PROTECTED]\0", 128) = 27
close(9) = 0
stat("bounce/313551", {st_mode=0, st_size=0, ...}) = 0
pipe([9, 10]) = 0
pipe([11, 12]) = 0
fork() = 2092
[pid 2074] close(9 <unfinished ...>
[pid 2092] close(10 <unfinished ...>
[pid 2074] <... close resumed> ) = 0
[pid 2092] <... close resumed> ) = 0
[pid 2074] close(11 <unfinished ...>
[pid 2092] close(12 <unfinished ...>
[pid 2074] <... close resumed> ) = 0
[pid 2092] <... close resumed> ) = 0
[pid 2074] time( <unfinished ...>
[pid 2092] fcntl(9, F_GETFL <unfinished ...>
[pid 2074] <... time resumed> NULL) = 927231668
[pid 2092] <... fcntl resumed> ) = 0 (flags O_RDONLY)
[pid 2074] --- SIGSEGV (Segmentation fault) ---
close(0) = 0
fcntl(9, F_DUPFD, 0) = 0
close(9) = 0
fcntl(11, F_GETFL) = 0 (flags O_RDONLY)
close(1) = 0
fcntl(11, F_DUPFD, 1) = 1
close(11) = 0
execve("/var/qmail/bin/qmail-queue", ["qmail-queue"], [/* 1 var */]) = -1 EPERM (Operation not permitted)
_exit(120) = ?
Exit code 120 appears to be EXECSOFT, which is used in a couple of
places. Is this the key to my problem? Is the preceding line, the execve
of qmail-queue, the process that exits with code 120? What do I do
next?
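To make the question concrete: the calls between fork() and _exit(120)
look to me like the usual pipe/fork/dup/exec pattern, where the child puts
the read ends of two pipes on fds 0 and 1, tries to exec the program, and
exits with a fixed status if the exec fails. Below is a minimal Python
sketch of that pattern as I read it from the trace. It is not qmail's
actual code; spawn_with_pipes, move_fd and EXEC_FAILED are names I made up
for illustration, and the value 120 is simply copied from the _exit(120)
line above.

```python
import os

# Assumption: 120 is copied from the _exit(120) line in the trace.
EXEC_FAILED = 120


def move_fd(fd, target):
    """Make `target` refer to the same open file as `fd`, then drop `fd`.

    The trace does this as close(target) + fcntl(fd, F_DUPFD, target);
    dup2() is the more common shorthand for the same effect.
    """
    if fd != target:
        os.dup2(fd, target)
        os.close(fd)


def spawn_with_pipes(argv):
    """Fork a child with two pipes on fds 0 and 1; return its exit status."""
    first_r, first_w = os.pipe()    # like pipe([9, 10]) in the trace
    second_r, second_w = os.pipe()  # like pipe([11, 12]) in the trace
    pid = os.fork()
    if pid == 0:
        # Child: keep only the read ends, move them onto fds 0 and 1.
        try:
            os.close(first_w)
            os.close(second_w)
            move_fd(first_r, 0)
            move_fd(second_r, 1)
            os.execv(argv[0], argv)  # only returns (raises) on failure
        except OSError:
            pass
        finally:
            os._exit(EXEC_FAILED)    # reached only if the exec failed
    # Parent: this sketch sends nothing, so close all four pipe ends.
    for fd in (first_r, second_r, first_w, second_w):
        os.close(fd)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
```

Calling spawn_with_pipes() with a path that cannot be executed returns
120, which is how I read the last three lines of the trace above.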
Also, here are the results of the qmail-lspawn trace (showing the last few
lines only):
# strace -f -p 2076
[snip]
[pid 2086] munmap(0x4000b000, 4096) = 0
[pid 2086] read(0, "Received: (qmail 2084 invoked by"..., 1024) = 224
[pid 2086] read(0, "", 1024) = 0
[pid 2086] write(3, "Return-Path: <[EMAIL PROTECTED]"..., 305) = 305
[pid 2086] fsync(3) = 0
[pid 2086] close(3) = 0
[pid 2086] link("tmp/927231667.2086.localhost.localdomain", "new/927231667.2086.localhost.localdomain") = 0
[pid 2086] unlink("tmp/927231667.2086.localhost.localdomain") = 0
[pid 2086] _exit(0) = ?
[pid 2085] <... wait4 resumed> [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 2086
[pid 2085] write(1, "did 0+0+0\n", 10) = 10
[pid 2085] _exit(0) = ?
<... select resumed> ) = 1 (in [3])
sigprocmask(SIG_SETMASK, [CHLD], NULL) = 0
read(3, "did 0+0+0\n", 128) = 10
sigprocmask(SIG_SETMASK, [], NULL) = 0
select(4, [0 3], NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) ---
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 2085
close(4) = 0
wait4(-1, 0xbffffcd0, WNOHANG, NULL) = -1 ECHILD (No child processes)
sigreturn() = ? (mask now [])
sigprocmask(SIG_SETMASK, [CHLD], NULL) = 0
sigprocmask(SIG_SETMASK, [], NULL) = 0
select(4, [0 3], NULL, NULL, NULL) = 1 (in [3])
sigprocmask(SIG_SETMASK, [CHLD], NULL) = 0
read(3, "", 128) = 0
write(1, "\0Kdid 0+0+0\n\0", 13) = 13
close(3) = 0
sigprocmask(SIG_SETMASK, [], NULL) = 0
select(1, [0], NULL, NULL, NULL) = 1 (in [0])
sigprocmask(SIG_SETMASK, [CHLD], NULL) = 0
read(0, "", 1024) = 0
_exit(0) = ?
In my opinion, this doesn't reveal much about the problem, but I'm by no
means a truss/strace expert so I might be wrong!
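For reference, this is how I read the last calls of pid 2086: they look
like an ordinary Maildir delivery (write the message under tmp/, fsync it,
link it into new/, then unlink the tmp/ name). A minimal Python sketch of
that sequence follows, assuming the file name is built as
time.pid.hostname the way it appears in the trace. deliver_to_maildir is a
hypothetical name of mine, not a qmail function.

```python
import os
import socket
import time


def deliver_to_maildir(maildir, message):
    """Write `message` (bytes) into maildir/new via maildir/tmp.

    Mirrors the traced calls: write, fsync, close, link, unlink.
    """
    # Assumption: name format time.pid.hostname, as seen in the trace.
    name = "%d.%d.%s" % (int(time.time()), os.getpid(), socket.gethostname())
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)

    # O_EXCL makes the create fail rather than overwrite an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before linking

    os.link(tmp_path, new_path)  # the message becomes visible in new/
    os.unlink(tmp_path)          # drop the temporary name
    return new_path
```

Since link() returned 0 and the child exited with status 0 in the trace,
the delivery side appears to have finished normally, which matches the
message showing up in fred's Maildir/new.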
Please advise me what to do next. Thanks!
Cheers
Fred
Harald Hanche-Olsen wrote:
> + Fred Backman <[EMAIL PROTECTED]>:
>
> | AFAIK qmail-alias is a child execv'd by qmail-lspawn, which is not
> | supposed to exit.
>
> That's right, qmail-lspawn should not exit until qmail-send tells it
> to.
>
> | > I must have to do something with it. Use a debugger to find out.
> |
> | Sorry, I'm not sure I understand what you mean. What is it you think
> | has something to do with the problem?
>
> Learn to use truss and apply it to qmail-lspawn and qmail-send.
> Something like truss -o file -f -p PID might be a good start.
> Then do whatever triggers the fault, and examine the output from
> truss.
>
> - Harald