Hi,
I've been running qmail on a number of platforms quite happily for a
while - until now I've had no problems at all. However, I am now
experiencing a problem with qmail-remote hanging.
I'm running qmail on this server for sending mail from websites and
for bulk mail-outs (up to about 40,000 recipients). The server itself
receives very little mail.
It's a dual-cpu Dell running Linux. I have another very similar
installation which has absolutely no problems. Qmail on this server is 100%
standard Qmail 1.03.
The problem I see is qmail-remote failing to terminate when a
connection times out. If left alone, the number of "stuck" processes
slowly climbs; after about a month I had about 25 such processes. The network
connections remain in the "ESTABLISHED" state.
Looking at the process list right now, I have one stuck:
# ps -ef | grep qmail-remote
qmailr 12278 662 0 13:13 ? 00:00:00 qmail-remote xxxxxxxxxx.co.uk xx
qmailr 19876 662 0 16:09 ? 00:00:00 qmail-remote xxxxxxxxxx.com xxxx
root 19912 19489 0 16:10 pts/0 00:00:00 grep qmail-remote
# strace -p 12278
read(3, <unfinished ...>
... all socket read()s in qmail-remote should be protected by a
select and therefore should not block as this one is doing now. After
recompiling with debugging and symbols, I get ...
# gdb qmail-remote 12278
GNU gdb 5.0
Attaching to program: /home/qmail/bin/qmail-remote, Pid 12278
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x40103424 in __libc_read () from /lib/libc.so.6
(gdb) where
#0 0x40103424 in __libc_read () from /lib/libc.so.6
#1 0x3b654f80 in ?? ()
#2 0x8048f05 in saferead (fd=-1, buf=0x8051180 "", len=128)
at qmail-remote.c:113
#3 0x804d193 in oneread (op=0x8048ee8 <saferead>, fd=-1, buf=0x8051180 "",
len=128) at substdi.c:14
#4 0x804d25e in substdio_feed (s=0x804f3d0) at substdi.c:44
#5 0x804d3ab in substdio_get (s=0x804f3d0, buf=0xbffffdc7 "", len=1)
at substdi.c:75
#6 0x8048f70 in get (ch=0xbffffdc7 "") at qmail-remote.c:137
#7 0x8048fda in smtpcode () at qmail-remote.c:150
#8 0x80492cb in smtp () at qmail-remote.c:225
#9 0x8049d31 in main (argc=4, argv=0xbffffe94) at qmail-remote.c:420
#10 0x4004bf31 in __libc_start_main (main=0x804987c <main>, argc=4,
ubp_av=0xbffffe94, init=0x804878c <_init>, fini=0x804dd10 <_fini>,
rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffe8c)
at ../sysdeps/generic/libc-start.c:129
... in smtp() ...
220 {
221 unsigned long code;
222 int flagbother;
223 int i;
224
225 => if (smtpcode() != 220) quit("ZConnected to "," but greeting failed");
226
227 substdio_puts(&smtpto,"HELO ");
228 substdio_put(&smtpto,helohost.s,helohost.len);
229 substdio_puts(&smtpto,"\r\n");
saferead() calls timeoutread(), which calls select() and then read().
The fd=-1 is a red herring; it's not used by saferead in qmail-remote.
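
For anyone not familiar with the pattern, here is a minimal sketch of the
select()-then-read() idea described above. It is not the qmail source: the
name read_with_timeout and its signature are made up for illustration, and
using ETIMEDOUT to signal the timeout is my assumption. Note that in this
sketch the descriptor is left in blocking mode, so select() is the only
thing guarding the read().

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not qmail's timeoutread): wait up to `seconds`
 * for fd to become readable, then read from it.
 * Returns the byte count from read(), or -1 with errno set
 * (ETIMEDOUT if nothing became readable in time). */
ssize_t read_with_timeout(int fd, char *buf, size_t len, int seconds)
{
    fd_set rfds;
    struct timeval tv;

    assert(buf != NULL);

    tv.tv_sec = seconds;
    tv.tv_usec = 0;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* select() returns -1 on error, 0 on timeout, >0 if fd is ready. */
    if (select(fd + 1, &rfds, NULL, NULL, &tv) == -1)
        return -1;

    /* fd is still a blocking descriptor here, so this read() relies
     * entirely on select() having reported it readable. */
    if (FD_ISSET(fd, &rfds))
        return read(fd, buf, len);

    errno = ETIMEDOUT;
    return -1;
}
```

With code shaped like this I would expect a stuck process to be sitting in
select(), or to have returned with a timeout, rather than sitting in read()
as the strace shows.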
Can anyone explain this, or has anyone experienced anything similar?
Thanks,
Richard