Hi all,

Here are the results of strace/truss on a qmail-lspawn session which
caused qmail v1.00 to crash on Linux RH 5.2. Apologies for the long post.

Here's what I did:
0. Emptied the queue.
1. Restarted qmail.
2. Started "strace -f -p 2074" in one window [qmail-send pid]
3. Started "strace -f -p 2076" in another window [qmail-lspawn pid]
4. Ran "echo to:fred|/var/qmail/bin/qmail-inject" in a third window

After that, qmail crashed. But the message had been delivered locally to
fred's Maildir/new directory, _and_ the very same message was still in
the queue!

Here are the results of the qmail-send trace (last few lines only):

# strace -f -p 2074
[snip]
stat("todo/313551", 0xbffffc98)         = -1 ENOENT (No such file or directory)
stat("info/15/313551", {st_mode=0, st_size=0, ...}) = 0
open("info/15/313551", O_RDONLY|O_NONBLOCK) = 9
fstat(9, {st_mode=0, st_size=0, ...})   = 0
read(9, "[EMAIL PROTECTED]\0", 128) = 27
close(9)                                = 0
stat("bounce/313551", {st_mode=0, st_size=0, ...}) = 0
pipe([9, 10])                           = 0
pipe([11, 12])                          = 0
fork()                                  = 2092
[pid  2074] close(9 <unfinished ...>
[pid  2092] close(10 <unfinished ...>
[pid  2074] <... close resumed> )       = 0
[pid  2092] <... close resumed> )       = 0
[pid  2074] close(11 <unfinished ...>
[pid  2092] close(12 <unfinished ...>
[pid  2074] <... close resumed> )       = 0
[pid  2092] <... close resumed> )       = 0
[pid  2074] time( <unfinished ...>
[pid  2092] fcntl(9, F_GETFL <unfinished ...>
[pid  2074] <... time resumed> NULL)    = 927231668
[pid  2092] <... fcntl resumed> )       = 0 (flags O_RDONLY)
[pid  2074] --- SIGSEGV (Segmentation fault) ---
close(0)                                = 0
fcntl(9, F_DUPFD, 0)                    = 0
close(9)                                = 0
fcntl(11, F_GETFL)                      = 0 (flags O_RDONLY)
close(1)                                = 0
fcntl(11, F_DUPFD, 1)                   = 1
close(11)                               = 0
execve("/var/qmail/bin/qmail-queue", ["qmail-queue"], [/* 1 var */]) = -1 EPERM (Operation not permitted)
_exit(120)                              = ?

Exit code 120 appears to be EXECSOFT, which is used in a couple of
places in the source. Is this the key to my problem? Is the process
that fails to exec qmail-queue on the preceding line the one that exits
with code 120? What do I do next?
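
For reference, the tail of the qmail-send trace shows a fork, a failed
execve of qmail-queue, and then _exit(120) straight after it. The
sketch below reproduces that fork/exec/exit shape in Python so the
status can be observed from the parent side. It is only an
illustration, not qmail's code: the name EXEC_FAILED, the helper
run_child and the nonexistent path are all made up for the example, and
the exec here fails with ENOENT rather than the EPERM seen in the trace.

```python
import os

# Hypothetical name; 120 is simply the status seen in the trace.
EXEC_FAILED = 120


def run_child(path, argv):
    """Fork, try to exec `path` in the child, and return how the child ended.

    Returns the exit status if the child exited normally, the negated
    signal number if it was killed by a signal, or None otherwise.
    """
    pid = os.fork()
    if pid == 0:
        # Child: on success execv never returns, because the process
        # image is replaced. On failure it raises OSError and we exit
        # with a fixed status so the parent can tell what happened.
        try:
            os.execv(path, argv)
        except OSError:
            pass
        os._exit(EXEC_FAILED)

    # Parent: reap the child and decode the wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return None


if __name__ == "__main__":
    # Assumed-nonexistent path, so the exec is guaranteed to fail.
    print(run_child("/nonexistent/hypothetical-binary", ["hypothetical-binary"]))
```

A child that exits this way is reported to the parent as a normal exit
with status 120, which is different from a process that dies from a
signal such as the SIGSEGV logged for pid 2074.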

Also, here are the results of the qmail-lspawn trace (last few lines
only):

# strace -f -p 2076
[snip]
[pid  2086] munmap(0x4000b000, 4096)    = 0
[pid  2086] read(0, "Received: (qmail 2084 invoked by"..., 1024) = 224
[pid  2086] read(0, "", 1024)           = 0
[pid  2086] write(3, "Return-Path: <[EMAIL PROTECTED]"..., 305) = 305
[pid  2086] fsync(3)                    = 0
[pid  2086] close(3)                    = 0
[pid  2086] link("tmp/927231667.2086.localhost.localdomain", "new/927231667.2086.localhost.localdomain") = 0
[pid  2086] unlink("tmp/927231667.2086.localhost.localdomain") = 0
[pid  2086] _exit(0)                    = ?
[pid  2085] <... wait4 resumed> [WIFEXITED(s) && WEXITSTATUS(s) == 0], 0, NULL) = 2086
[pid  2085] write(1, "did 0+0+0\n", 10) = 10
[pid  2085] _exit(0)                    = ?
<... select resumed> )                  = 1 (in [3])
sigprocmask(SIG_SETMASK, [CHLD], NULL)  = 0
read(3, "did 0+0+0\n", 128)             = 10
sigprocmask(SIG_SETMASK, [], NULL)      = 0
select(4, [0 3], NULL, NULL, NULL)      = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) ---
wait4(-1, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WNOHANG, NULL) = 2085
close(4)                                = 0
wait4(-1, 0xbffffcd0, WNOHANG, NULL)    = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
sigprocmask(SIG_SETMASK, [CHLD], NULL)  = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
select(4, [0 3], NULL, NULL, NULL)      = 1 (in [3])
sigprocmask(SIG_SETMASK, [CHLD], NULL)  = 0
read(3, "", 128)                        = 0
write(1, "\0Kdid 0+0+0\n\0", 13)        = 13
close(3)                                = 0
sigprocmask(SIG_SETMASK, [], NULL)      = 0
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
sigprocmask(SIG_SETMASK, [CHLD], NULL)  = 0
read(0, "", 1024)                       = 0
_exit(0)                                = ?

In my opinion, this doesn't reveal much about the problem, but I'm by no
means a truss/strace expert so I might be wrong!
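
One way to skim a long trace is to pull out only the lines where a
system call failed or a signal was delivered. The following is a rough
sketch of such a filter, assuming the trace was saved unwrapped (for
example with strace's -o option) so that each call sits on one line.
The function name interesting_lines and the two regular expressions are
my own; SAMPLE is a handful of lines copied from the qmail-send trace
above.

```python
import re

# Failed syscall result, e.g. "= -1 ENOENT (No such file or directory)".
FAILED_CALL = re.compile(r"=\s+-1\s+(E[A-Z]+)")
# Signal delivery line, e.g. "--- SIGSEGV (Segmentation fault) ---".
SIGNAL = re.compile(r"---\s+(SIG[A-Z0-9]+)\b")


def interesting_lines(trace_text):
    """Return (kind, name, line) tuples for failed calls and signals."""
    hits = []
    for line in trace_text.splitlines():
        m = SIGNAL.search(line)
        if m:
            hits.append(("signal", m.group(1), line.strip()))
            continue
        m = FAILED_CALL.search(line)
        if m:
            hits.append(("error", m.group(1), line.strip()))
    return hits


# A few lines taken from the qmail-send trace, with the mail wrapping undone.
SAMPLE = """\
stat("todo/313551", 0xbffffc98)         = -1 ENOENT (No such file or directory)
close(9)                                = 0
[pid  2074] --- SIGSEGV (Segmentation fault) ---
execve("/var/qmail/bin/qmail-queue", ["qmail-queue"], [/* 1 var */]) = -1 EPERM (Operation not permitted)
_exit(120)                              = ?
"""

if __name__ == "__main__":
    for kind, name, line in interesting_lines(SAMPLE):
        print(kind, name, "|", line)
```

On the sample this picks out the ENOENT from stat, the SIGSEGV in pid
2074 and the EPERM from execve, and skips the calls that succeeded.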

Please advise me what to do next.  Thanks!

Cheers
Fred



Harald Hanche-Olsen wrote:

> + Fred Backman <[EMAIL PROTECTED]>:
>
> | AFAIK qmail-alias is a child execv'd by qmail-lspawn, which is not
> | supposed to exit.
>
> That's right, qmail-lspawn should not exit until qmail-send tells it
> to.
>
> | > I must have to do something with it. Use a debugger to find out.
> |
> | Sorry, I'm not sure I understand what you mean. What is it you think
> | has something to do with the problem?
>
> Learn to use truss and apply it to qmail-lspawn and qmail-send.
> Something like truss -o file -f -p PID might be a good start.
> Then do whatever triggers the fault, and examine the output from
> truss.
>
> - Harald
