Hi,

        I've been running qmail on a number of platforms quite happily for a
while, and until now I've had no problems at all. However, I am now
experiencing a problem with qmail-remote hanging.

        I'm running qmail on this server to send mail from websites and for
bulk mail-outs (up to about 40,000 recipients). The server doesn't receive
much mail itself.

        It's a dual-CPU Dell running Linux. I have another, very similar
installation which has absolutely no problems. The qmail on this server is
100% standard qmail 1.03.

        The problem is that qmail-remote fails to terminate when a connection
times out. If left alone, the number of "stuck" processes slowly climbs;
after about a month I had about 25 of them. Their network connections
remain in the ESTABLISHED state.

        Looking at the process list right now, I have one stuck process
(PID 12278):

# ps -ef | grep qmail-remote
qmailr   12278   662  0 13:13 ?        00:00:00 qmail-remote xxxxxxxxxx.co.uk xx
qmailr   19876   662  0 16:09 ?        00:00:00 qmail-remote xxxxxxxxxx.com xxxx
root     19912 19489  0 16:10 pts/0    00:00:00 grep qmail-remote

# strace -p 12278
read(3,  <unfinished ...>

        ... all socket read()s in qmail-remote should be protected by a
select() and therefore should never block the way this one is doing. After
recompiling with debugging symbols, I get ...

# gdb qmail-remote 12278
GNU gdb 5.0
Attaching to program: /home/qmail/bin/qmail-remote, Pid 12278
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x40103424 in __libc_read () from /lib/libc.so.6
(gdb) where
#0  0x40103424 in __libc_read () from /lib/libc.so.6
#1  0x3b654f80 in ?? ()
#2  0x8048f05 in saferead (fd=-1, buf=0x8051180 "", len=128)
    at qmail-remote.c:113
#3  0x804d193 in oneread (op=0x8048ee8 <saferead>, fd=-1, buf=0x8051180 "", 
    len=128) at substdi.c:14
#4  0x804d25e in substdio_feed (s=0x804f3d0) at substdi.c:44
#5  0x804d3ab in substdio_get (s=0x804f3d0, buf=0xbffffdc7 "", len=1)
    at substdi.c:75
#6  0x8048f70 in get (ch=0xbffffdc7 "") at qmail-remote.c:137
#7  0x8048fda in smtpcode () at qmail-remote.c:150
#8  0x80492cb in smtp () at qmail-remote.c:225
#9  0x8049d31 in main (argc=4, argv=0xbffffe94) at qmail-remote.c:420
#10 0x4004bf31 in __libc_start_main (main=0x804987c <main>, argc=4, 
    ubp_av=0xbffffe94, init=0x804878c <_init>, fini=0x804dd10 <_fini>, 
    rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffe8c)
    at ../sysdeps/generic/libc-start.c:129

        ... in smtp() ...

220     {
221       unsigned long code;
222       int flagbother;
223       int i;
224      
225 =>    if (smtpcode() != 220) quit("ZConnected to "," but greeting failed");
226      
227       substdio_puts(&smtpto,"HELO ");
228       substdio_put(&smtpto,helohost.s,helohost.len);
229       substdio_puts(&smtpto,"\r\n");
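
At this point smtp() is still inside its first smtpcode() call, which means it is waiting for the server's 220 greeting. Per RFC 5321 (section 4.2), every reply line starts with a three-digit code, and every line except the last has a '-' immediately after the code, so a reader cannot finish until it has seen a final line. The helper below is a hypothetical sketch of that rule applied to an already-buffered reply. It is not qmail's smtpcode(), which, as the backtrace shows, pulls one byte at a time through substdio_get().

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: parse a complete SMTP reply held in `s`
   (lines terminated by "\r\n") and return its three-digit code.
   Continuation lines have '-' after the code; the final line has a
   space (or ends right after the code). Returns -1 if the text is
   malformed or the final line has not arrived yet. */
int smtp_reply_code(const char *s)
{
    for (;;) {
        /* Every line must start with three digits (short-circuit
           evaluation keeps us from reading past a terminating NUL). */
        if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]) ||
            !isdigit((unsigned char)s[2]))
            return -1;
        int code = (s[0] - '0') * 100 + (s[1] - '0') * 10 + (s[2] - '0');

        if (s[3] != '-')
            return code;            /* last line of the reply */

        /* Continuation line: skip to the start of the next line. */
        while (*s && *s != '\n')
            s++;
        if (*s == '\0')
            return -1;              /* incomplete: more lines expected */
        s++;
    }
}
```

For example, "220-still greeting\r\n" on its own yields -1, because the reply is not finished until a line like "220 ready" arrives.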

        saferead() calls timeoutread(), which calls select() and then read().
The fd=-1 is a red herring: saferead in qmail-remote doesn't use it.
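
The path described above, with saferead() ignoring its fd argument and reading through a select()-then-read() timeout wrapper, can be sketched as follows. This is a minimal sketch in modern C, not qmail's actual timeoutread.c or qmail-remote.c; smtp_socket, smtp_timeout (1200 seconds, an assumed value), timeout_read and smtp_saferead are hypothetical names. What it illustrates is that the timeout bounds only the select() wait: once read() has been entered on a blocking descriptor, nothing in this code limits how long it can sleep.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the global SMTP socket; set after connect(). */
static int smtp_socket = -1;
/* Hypothetical read timeout in seconds (assumed value). */
static int smtp_timeout = 1200;

/* Wait up to `secs` seconds for fd to become readable, then read().
   Returns bytes read, 0 on EOF, or -1 with errno set (ETIMEDOUT on timeout). */
ssize_t timeout_read(int secs, int fd, char *buf, size_t len)
{
    fd_set rfds;
    struct timeval tv;

    tv.tv_sec = secs;
    tv.tv_usec = 0;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    /* The timeout bounds only this wait. */
    if (select(fd + 1, &rfds, NULL, NULL, &tv) == -1)
        return -1;

    if (FD_ISSET(fd, &rfds))
        /* select() reported "readable". On a blocking descriptor this
           read() has no timeout of its own: if no data is actually
           there, it sleeps until data, EOF or a socket error arrives. */
        return read(fd, buf, len);

    errno = ETIMEDOUT;   /* select() returned 0: nothing arrived in time */
    return -1;
}

/* Read callback handed to a buffered reader that was set up with fd = -1.
   The argument is ignored in favour of the global socket, so an fd=-1
   seen in a backtrace is harmless. */
ssize_t smtp_saferead(int fd, char *buf, size_t len)
{
    (void)fd;
    return timeout_read(smtp_timeout, smtp_socket, buf, len);
}
```

Calling smtp_saferead(-1, buf, sizeof buf) behaves exactly like passing any other value, since the argument is never used.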

        Can anyone explain this, or has anyone experienced anything similar?

        Thanks,

                Richard
