RE: Can anyone explain this?
Charles Cazabon wrote:
> eric [EMAIL PROTECTED] wrote:
> > Anyone else seeing similar stuff happening? I'm running qmail-1.03 without any patches under RedHat 6.1 (kernel 2.2.12-20smp)
>
> Linux kernel version 2.2.12 has known bugs in its stack, and has been identified as a likely culprit for these symptoms before. Upgrade to 2.2.19 and the problem may just disappear.

If this is the problem I have been experiencing (qmail-remote processes lasting a great deal longer than they should) then upgrading to 2.2.19 won't fix this problem. A work-around is to enable socket keep-alives which (at least for me) fixes the problem. I've put the patch on the web at: http://www.duff.org/qmail/

Richard

P.S. Eric: ps -ef on modern RedHat distributions (and others) has a STIME column which will tell you when the process started running.
RE: qmail-queue question
Edward,

I've had problems with qmail-remote hanging - it had nothing to do with the queue lifetime, but with some code in qmail-remote failing, possibly due to an O/S bug. A fix which works for me is to enable socket keep-alives. This will kill the socket if it has died, after about 2-3 hours. I've put a patch on the web at http://www.duff.org/qmail/

Richard

-----Original Message-----
From: Edward McLain [mailto:[EMAIL PROTECTED]]

On a side note, is there any reason that qmail-remote should start up and then just sit there connected to a remote host for like 6 or 7 hours trying to send one email? I get this all the freaking time and I'm just wondering what exactly the freaking thing is doing? (Although this problem only really seems to occur with mindspring.com, yet if I telnet to port 25 of mindspring's mail server and send the same message through telnet to the same user, from the same user as the one qmail's trying to send, it works just fine and I don't get any errors or return codes.)
RE: Fix for qmail-remote process hanging on Linux (and possibly others)
> > try a couple of dozen connections to the same remote host at the same time. (This is an issue in itself!)
>
> Why is this an issue? If the remote host can handle 100 inbound connections, you should be able to open 100 connections to them, inject your messages, and close the connections. Everyone's happy. If the remote host can't handle that many, it shouldn't accept that many. You'll then get connections past X deferred, and qmail will back off.

It's an issue because, while in an ideal world this would be fine, we don't live in an ideal world and not every SMTP server out there will drop connections smoothly. Instead, they hang, or accept connections that they can't handle, leading to reduced throughput.

If I'm sending a few thousand mails, chances are it'd be possible to maintain full throughput without hitting the same host more than once concurrently. Sure, if there's nothing else in the queue, then you may as well use multiple threads per MX, but what do you lose by scheduling other hosts first?

I've also noticed that if qmail tries to deliver (for example) 50 messages to one host concurrently, perhaps 2 will get through. The rest will be retried, but unfortunately they tend to get retried at much the same time. Again, 2 messages get through, and the process repeats. This simply isn't efficient.

I think qmail is great, don't get me wrong. I just think there is room for improvement.

Richard

P.S. People here seem to be a little over-sensitive about this issue!
RE: Fix for qmail-remote process hanging on Linux (and possibly others)
From: Henning Brauer [mailto:[EMAIL PROTECTED]]

> On Mon, Aug 06, 2001 at 11:09:25AM +0100, Richard Underwood wrote:
> > I've also noticed that if qmail tries to deliver (for example) 50 messages to one host concurrently, perhaps 2 will get through. The rest will be retried, but unfortunately they tend to get retried at much the same time. Again, 2 messages get through, and the process repeats. This simply isn't efficient.
>
> This isn't qmail's fault but the fault of the remote host. There is room for improvement - just not on qmail's side. The remote host MUST NOT accept more connections than it can handle. If it does the remote recipients must live with the delays.

Read what I wrote again. It IS qmail's fault.

One role I use qmail for is to accept mail which is then passed on to an exchange server on the same network. Here's an example of what can happen ...

If the exchange server goes down, a large queue builds up. The exchange server accepts something like 20 concurrent connections before refusing to accept any more. This, as you say, is what the server should do.

When the exchange server comes back up, I kick the qmail-send process to get it to deliver the queue. At this point I should be able to go off and do other things. However, qmail tries to send the queue with lots of concurrent connections. The first 20 work, but the rest are dropped. This then blocks any further attempts for a time. After this time, the mails are tried again - once more, lots of concurrent connections, more dropped connections, more delays. In the end, I resort to sitting there watching the queue and kicking the qmail-send process until the queue is small enough to go through without help.

The exchange server is working as it should be - it's dropping connections once its connection limit is reached - but left alone, qmail is being far less than efficient - sendmail with a single thread could deliver the mail faster!

Saying that there's no room for improvement on qmail's side is pure arrogance. Just look at the number of patches available for it for a clue. qmail is great, it works well, but it still could be improved.

People complain about a single message blocking their queue ... Run two copies? That works, but is it the best option? A configurable limit to the threads per message would fix this. Prioritizing messages would be better.

The problem I describe above could be fixed by a configurable per-host thread limit. Can you think of a neater solution?

Blindly defending qmail isn't going to make it better. It doesn't help, except in flame-wars.

Richard
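To make the "configurable per-host thread limit" idea concrete, here is a rough sketch of the bookkeeping it would need. This is not qmail code and not a patch: the names host_limiter, limiter_init, limiter_acquire and limiter_release, and the fixed table sizes, are all invented for illustration. A scheduler would call limiter_acquire() before starting a remote delivery, and hold the message back (trying another host first) whenever it returns 0.

```c
#include <string.h>

#define MAX_HOSTS 64    /* hypothetical table size */
#define HOST_LEN  256   /* hypothetical maximum host name length */

struct host_slot { char name[HOST_LEN]; int active; };

struct host_limiter {
  int per_host_max;                  /* the configurable limit */
  struct host_slot slots[MAX_HOSTS];
};

void limiter_init(struct host_limiter *l, int per_host_max)
{
  memset(l, 0, sizeof *l);
  l->per_host_max = per_host_max;
}

/* Find the slot tracking host; optionally claim a free slot for it.
   A slot counts as free when it has no active deliveries. */
static struct host_slot *find_slot(struct host_limiter *l, const char *host, int create)
{
  int i;
  struct host_slot *free_slot = 0;

  if (strlen(host) >= HOST_LEN) return 0;   /* refuse names that would not fit */
  for (i = 0; i < MAX_HOSTS; i++) {
    if (l->slots[i].active > 0 && strcmp(l->slots[i].name, host) == 0)
      return &l->slots[i];
    if (l->slots[i].active == 0 && !free_slot) free_slot = &l->slots[i];
  }
  if (!create || !free_slot) return 0;
  strcpy(free_slot->name, host);            /* safe: length checked above */
  return free_slot;
}

/* Returns 1 if a delivery to host may start now, 0 if it should wait. */
int limiter_acquire(struct host_limiter *l, const char *host)
{
  struct host_slot *s = find_slot(l, host, 1);
  if (!s) return 0;                         /* table full or name too long */
  if (s->active >= l->per_host_max) return 0;
  s->active++;
  return 1;
}

/* Call when a delivery to host finishes, whatever its outcome. */
void limiter_release(struct host_limiter *l, const char *host)
{
  struct host_slot *s = find_slot(l, host, 0);
  if (s && s->active > 0) s->active--;
}
```

With the limit set to 20 for the exchange server example, the 21st concurrent delivery would simply be held back instead of being opened, refused and deferred.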
RE: Fix for qmail-remote process hanging on Linux (and possibly others)
From: Peter van Dijk [mailto:[EMAIL PROTECTED]]

> On Mon, Aug 06, 2001 at 01:07:36PM +0100, Richard Underwood wrote:
> > When the exchange server comes back up, I kick the qmail-send process to get it to deliver the queue. At this point I should be able to go off and do other things.
>
> Why are you kicking qmail-send? That should never be necessary in a production environment.

Life with qmail, 1.5.8 and E.1. - qmail backs off. When a server comes back up, I want the queued mail there ASAP. If the exchange server has been down for more than half an hour, you're looking at unacceptable delays. (Unacceptable to the company, that is.)

> If this box's only function is relaying to the exchange server, why not set concurrencyremote to 20?

It's not the only function of the server. Even if it was, that's a hack. It's a workable solution, but also a hack. I could install another qmail instance, but then that's worse in my opinion.

> Have you tried not kicking it at all? qmail has a very efficient retry schedule, that doesn't even bog down heavily loaded servers.

I've not, for the reason given above. What I described was just an example of what can happen to illustrate my point. I've seen similar problems without kicking the queue, but nothing so clear or repeatable. I was just giving an example of where qmail doesn't act 'perfectly'.

> Lots of patches satisfy needs easily fixed without patches. Lots of patches satisfy needs that only a few users have.

Oh, I quite agree. Apart from my server where I run virtual domains, I use qmail out of the box. (Actually, one server now has the keep-alives patch happily installed.)

> And if you don't like this behaviour: write a patch (or find one), or stop using qmail. Nobody is forcing you to use qmail.

Perhaps, but using this list to tell people (often quite forcefully) that the behaviour they are experiencing is as it should be, that it's the rest of the world that's broken, and that qmail is perfect already isn't going to encourage anyone to help.

If I had the time, I'd write a patch. I wouldn't do it without discussing it on a mailing list first, though ... Has it been suggested/done before? Does anyone have any suggestions for better algorithms? What features would people want?

But I don't think I will. Even suggesting that there was an issue (I didn't say bug, and I didn't say problem, I said issue) with qmail resulted in some very abrupt replies, telling me that I was wrong, and qmail was perfect. This stifles discussion. The "nobody is forcing you" cliche makes things worse.

I personally think qmail is great, and will always use it - but all this makes me less willing to contribute - if the (apparent) general consensus is that people are happy with qmail as it is, then I'll leave you all in peace. I've got enough servers to split tasks up. A patch would be good, but I may as well tailor it to my needs and not bother sharing it. I wonder how many other people have been put off like that?

I think I've been quite reasonable with the messages I've sent. I've said that I like qmail numerous times. I've said I want to improve it ... and people have told me it needs no improvement. I simply think that this is short-sighted.

I'll leave you all alone now.

Richard
RE: Fix for qmail-remote process hanging on Linux (and possibly others)
From: MarkD [mailto:[EMAIL PROTECTED]]

> > Has this been discussed before?
>
> Yes. Endlessly. Check the archives. You are breaking no new ground here at all.

I'm sure it has - it's an important issue.

> If you want to be truly helpful, you might want to read the archives on this matter and then suggest/do something beyond what has already been discussed ad nauseam.

I would have, if I had been investigating that problem. I was looking at a completely different problem at the time. If you look at the title of this thread, you'll see that it's about qmail-remote processes hanging on Linux.

The argument was started because I described multiple connections to a site as an issue. I could have ignored the replies telling me that it wasn't an issue as flame-bait, but having seen some of the other replies on this list, I thought it'd be more constructive to explain what I had meant. I still believe it's an issue.

Richard
Fix for qmail-remote process hanging on Linux (and possibly others)
Hi,

I asked about qmail-remote processes hanging in read() on this list a few days ago. It appears that this has been reported before, but no conclusion seems to have been reached.

The problem appears to be in timeoutread(), which uses select() to prevent read() from blocking. For whatever reason, during heavy load, this fails and the read() call blocks. The TCP connection stays in the established state and therefore the process never terminates, leading to a reduction in the number of available concurrent remote deliveries.

One suggestion (from MarkD) was to set a large-value alarm signal to terminate the process, which would work (qmail would see the qmail-remote process as crashed and try it again) but I don't particularly like this method. For one thing, it could cut off a large message being sent over a slow connection.

Another solution, which I have been trying over the last few days, is to turn on socket keep-alives. With keep-alives on, once no data has crossed the socket for a fixed period (usually 2 or 3 hours) the kernel probes the connection and closes the socket if the other end no longer answers. The read() call will end as if the remote host dropped the connection and qmail-remote will terminate normally.

It all seems to be working, so if anyone else is having the same problem, you may like to try this fix too. I've included a patch for qmail-remote.c - it's not exactly beautiful code, but it works for me.

Good luck,

Richard

*** qmail-1.03/qmail-remote.c Mon Jun 15 11:53:16 1998
--- qmail-1.03.patched/qmail-remote.c Fri Aug 3 14:34:27 2001
***************
*** 338,344 ****
    int flagallaliases;
    int flagalias;
    char *relayhost;
! 
    sig_pipeignore();
    if (argc < 4) perm_usage();
    if (chdir(auto_qmail) == -1) temp_chdir();
--- 338,345 ----
    int flagallaliases;
    int flagalias;
    char *relayhost;
!   int s_opt;
! 
    sig_pipeignore();
    if (argc < 4) perm_usage();
    if (chdir(auto_qmail) == -1) temp_chdir();
***************
*** 415,420 ****
--- 416,423 ----
      if (smtpfd == -1) temp_oserr();
      if (timeoutconn(smtpfd,ip.ix[i].ip,(unsigned int) port,timeoutconnect) == 0) {
+       s_opt=1;
+       setsockopt(smtpfd,SOL_SOCKET,SO_KEEPALIVE,&s_opt,sizeof(int));
        tcpto_err(ip.ix[i].ip,0);
        partner = ip.ix[i].ip;
        smtp(); /* does not return */
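For anyone who wants to try the keep-alive idea outside of qmail first, here is a minimal standalone sketch. The helper names enable_keepalive() and keepalive_enabled() are made up for illustration and do not exist in qmail; the only real calls are setsockopt() and getsockopt() with SO_KEEPALIVE. Unlike the patch, the sketch checks the return value, and it passes a pointer to the option value, which is what setsockopt() expects.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable TCP keep-alive probes on an already-created socket.
   Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
  int on = 1;
  /* The fourth argument is a pointer to the option value, not the value. */
  return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Report whether keep-alive is enabled: 1 = on, 0 = off, -1 = error. */
int keepalive_enabled(int fd)
{
  int val = 0;
  socklen_t len = sizeof val;
  if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) == -1) return -1;
  return val != 0;
}
```

On Linux with the default sysctl settings (tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9), a dead peer should be detected after roughly 7200 + 9 * 75 = 7875 seconds, a little over two hours, which is in line with the "2 or 3 hours" figure. Other systems may use different defaults.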
RE: Fix for qmail-remote process hanging on Linux (and possibly others)
> I just looked at the server I had problems with -- 15 hung qmail-remotes :( Not good!

I peaked at 26 before I noticed.

> How did you test this patch? Are you saying that you were able to reliably reproduce the problem? I could never do this... If so, how?

I tested the patch by running it on the live server for three days. I was experiencing on average 1-2 processes getting stuck a day and haven't had one stuck since. The problems generally started during the large mailings which happen daily on this server.

I couldn't repeat the problem, but it happened reliably enough for me to believe that it has now been stopped. The patch itself should not affect the running of the program in any way except dropping dead connections.

> There is a lot of mystery in this:
> Most (but not all) reports had connections hung to outblaze.com
> Most (but not all) servers ran Linux.
> It's weird...

It is. I didn't spot a pattern in the remote hosts, but then I didn't try to. I suspect it's something to do with stateful firewalls dropping a session after a period of inactivity, though that doesn't explain why the code is affected by it at all.

My other suspicion is that there's a chance that my one server will try a couple of dozen connections to the same remote host at the same time. (This is an issue in itself!) It could be that a firewall in the path is mistaking the connections for a DoS attempt and responding weirdly, kicking off a bug with select.

I'll let you know if the problems re-appear.

Richard
RE: Problems with qmail-remote hanging
> This problem's been reported before. If your OS says that an fd is readable via select(), then the read() should not block. As you observe though, the read is blocking so your OS is probably not telling the truth when it returns from the select().
>
> The archives have plenty of discussion on this and the simplest solution is to put a large-value alarm() handler in qmail-remote. No one as yet seems to be able to narrow down which OSes do this and under what circumstances.

Mark,

Thanks for the reply. I only seem to experience the problem with large mail-outs. One possibility is that because of the way qmail works, there's a significant chance that we will be making a large number of simultaneous connections to some servers. It's possible that this is causing a connection to be blackholed somewhere ... that doesn't explain why select/read are failing to agree, though. Perhaps select thinks the connection is closed, but read doesn't.

Setting an alarm is a nasty hack in my opinion, but I have to admit that it's something I considered. A slightly neater solution might be to use the SO_KEEPALIVE socket option - if it works (and there isn't a good reason not to use it) that is.

What would be better is finding out why this happens, of course.

Thanks,

Richard

P.S. If anyone is keeping track, Linux 2.2.19, concurrencyremote set to 200
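For comparison, here is a rough sketch of how the large-value alarm() approach behaves, written outside of qmail. The function name run_hung_child() is hypothetical, and a pipe that never receives data stands in for the stuck SMTP socket. The child arms alarm() and then blocks in read(); when the timer fires, the default SIGALRM action kills it and the parent sees a process that died on a signal, which is roughly how qmail would see a crashed qmail-remote. In a real deployment the limit would presumably be hours, not seconds.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that arms alarm(limit) and then blocks in read() on a
   pipe that never receives data. Returns the number of the signal that
   killed the child, 0 if it exited on its own, or -1 on error. */
int run_hung_child(unsigned int limit)
{
  int p[2];
  int status;
  pid_t pid;
  char c;

  if (pipe(p) == -1) return -1;
  pid = fork();
  if (pid == -1) { close(p[0]); close(p[1]); return -1; }

  if (pid == 0) {
    signal(SIGALRM, SIG_DFL);   /* default action: terminate the process */
    alarm(limit);               /* watchdog: fires after limit seconds */
    /* Blocks: the write end is still open, but nobody ever writes. */
    if (read(p[0], &c, 1) == -1) _exit(1);
    _exit(0);
  }

  if (waitpid(pid, &status, 0) == -1) {
    close(p[0]); close(p[1]);
    return -1;
  }
  close(p[0]);
  close(p[1]);
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_hung_child(1) should come back after about a second reporting SIGALRM. The drawback raised in the thread still applies: the timer cannot tell a hung read from a slow but healthy transfer.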
Problems with qmail-remote hanging
Hi,

I've been running qmail on a number of platforms quite happily for a while - until now I've had no problems at all. However, I am now experiencing a problem with qmail-remote hanging.

I'm running qmail on this server for sending mails from websites and bulk mail-outs (up to about 40,000 recipients.) The server doesn't receive mails itself to a great extent. It's a dual-cpu Dell running Linux. I have another very similar installation which has absolutely no problems. Qmail on this server is 100% standard Qmail 1.03.

The problem I see is with qmail-remote failing to terminate when a connection times out. If left alone, the number of stuck processes will slowly climb; after about a month I had about 25 such processes. The network connections remain in the ESTABLISHED state.

Looking at the process list right now, I have one stuck:

# ps -ef | grep qmail-remote
qmailr   12278   662  0 13:13 ?        00:00:00 qmail-remote xx.co.uk xx
qmailr   19876   662  0 16:09 ?        00:00:00 qmail-remote xx.com
root     19912 19489  0 16:10 pts/0    00:00:00 grep qmail-remote

# strace -p 12278
read(3, <unfinished ...>

... all socket read()s in qmail-remote should be protected by a select and therefore should not block as this one is doing now.

After recompiling with debugging and symbols, I get ...

# gdb qmail-remote 12278
GNU gdb 5.0
Attaching to program: /home/qmail/bin/qmail-remote, Pid 12278
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x40103424 in __libc_read () from /lib/libc.so.6
(gdb) where
#0  0x40103424 in __libc_read () from /lib/libc.so.6
#1  0x3b654f80 in ?? ()
#2  0x8048f05 in saferead (fd=-1, buf=0x8051180 , len=128) at qmail-remote.c:113
#3  0x804d193 in oneread (op=0x8048ee8 <saferead>, fd=-1, buf=0x8051180 , len=128) at substdi.c:14
#4  0x804d25e in substdio_feed (s=0x804f3d0) at substdi.c:44
#5  0x804d3ab in substdio_get (s=0x804f3d0, buf=0xbdc7 , len=1) at substdi.c:75
#6  0x8048f70 in get (ch=0xbdc7 ) at qmail-remote.c:137
#7  0x8048fda in smtpcode () at qmail-remote.c:150
#8  0x80492cb in smtp () at qmail-remote.c:225
#9  0x8049d31 in main (argc=4, argv=0xbe94) at qmail-remote.c:420
#10 0x4004bf31 in __libc_start_main (main=0x804987c <main>, argc=4, ubp_av=0xbe94, init=0x804878c <_init>, fini=0x804dd10 <_fini>, rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbe8c) at ../sysdeps/generic/libc-start.c:129

... in smtp() ...

220 {
221   unsigned long code;
222   int flagbother;
223   int i;
224
225 =>if (smtpcode() != 220) quit("ZConnected to "," but greeting failed");
226
227   substdio_puts(&smtpto,"HELO ");
228   substdio_put(&smtpto,helohost.s,helohost.len);
229   substdio_puts(&smtpto,"\r\n");

saferead() calls timeoutread() which calls select() and then read(). fd=-1 is a red herring; it's not used by saferead in qmail-remote.

Can anyone explain this, or has anyone experienced anything similar?

Thanks,

Richard
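The select()-then-read() pattern described above can be sketched on its own as follows. This is an illustration, not qmail's timeoutread(): the name timed_read() is invented, and the O_NONBLOCK step is an assumption added here rather than something stock qmail is known to do. The Linux select(2) manual page warns that a descriptor reported as readable can still block in a later read() in some circumstances and suggests non-blocking descriptors as a safeguard, which may or may not be the cause of the hang seen here.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to t seconds for fd to become readable, then read from it.
   Returns the number of bytes read (0 on EOF), or -1 with errno set;
   errno is ETIMEDOUT if nothing arrived in time. */
ssize_t timed_read(int t, int fd, char *buf, size_t len)
{
  fd_set rfds;
  struct timeval tv;
  int flags;
  int r;

  /* Assumption for this sketch: make the descriptor non-blocking so a
     spurious "readable" report makes read() fail with EAGAIN instead
     of hanging. The caller can then simply retry. */
  flags = fcntl(fd, F_GETFL, 0);
  if (flags == -1) return -1;
  if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return -1;

  tv.tv_sec = t;
  tv.tv_usec = 0;
  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);

  r = select(fd + 1, &rfds, (fd_set *) 0, (fd_set *) 0, &tv);
  if (r == -1) return -1;                       /* select itself failed */
  if (r == 0) { errno = ETIMEDOUT; return -1; } /* nothing readable in time */
  return read(fd, buf, len);
}
```

A quick way to exercise it is with a pipe: with nothing written, timed_read() gives up after t seconds with ETIMEDOUT; after a write to the other end it returns the bytes immediately.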