Re: Fix for qmail-remote process hanging on Linux (and possibly other OSes)
On Fri, Aug 03, 2001 at 03:07:57PM +0100, Richard Underwood wrote:
> Hi,
>
> I asked about qmail-remote processes hanging in read() on this list a
> few days ago. It appears that this has been reported before, but no
> conclusion seemed to have been made. I just looked at the server I had
> problems with -- 15 hung qmail-remotes :(

snip

> Another solution, which I have been trying over the last few days, is to
> turn on socket keepalives. This has the effect of closing the socket if
> no data has been sent over it for a fixed period (usually 2 or 3 hours).
> The read() call will end as if the remote host dropped the connection
> and qmail-remote will terminate normally.
>
> It all seems to be working, so if anyone else is having the same
> problem, you may like to try this fix too. I've included a patch for
> qmail-remote.c - it's not exactly beautiful code, but it works for me.

How did you test this patch? Are you saying that you were able to
reliably reproduce the problem? I could never do this... If so, how?

There is a lot of mystery in this:

  Most (but not all) reports had connections hung to outblaze.com.
  Most (but not all) servers ran Linux.

It's weird...
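The patch itself is not reproduced in this thread, so the following is only a minimal sketch of what "turning on socket keepalives" usually means in C, not the code that was posted. The helper names enable_keepalive and keepalive_enabled are made up for illustration. Note that with SO_KEEPALIVE set, the kernel probes an idle connection and reports an error on the socket only when the peer stops answering those probes; a healthy but quiet connection stays open.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not taken from the patch discussed above.
 * Turns on TCP keepalive probing for an already-created socket.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
  int on = 1;

  assert(fd >= 0);
  return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Hypothetical helper: reports whether keepalive is enabled on fd.
 * Returns 1 if enabled, 0 if not, -1 if getsockopt fails. */
int keepalive_enabled(int fd)
{
  int val = 0;
  socklen_t len = sizeof val;

  if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) == -1) return -1;
  return val != 0;
}
```

In a client such as qmail-remote, the call would presumably go right after the socket is created and before the SMTP conversation starts. How long an idle connection waits before the first probe is a system-wide kernel setting, which matches the "2 or 3 hours" mentioned in the quoted message.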
Re: qmail-remote (cry wolf?)
On Sat, Jun 09, 2001 at 06:32:55AM -0400, Troy Settle wrote:
> Yes, I've had qmail-remote processes sit there for weeks. I think that
> instead of killing them off wholesale, I'll pick one or two processes
> and see just how long they'll hang around. I'll post weekly updates if
> there's any interest.

Here is what I have on one of our mail servers (ps -waux|grep qmail-remote;
real email addresses removed, domain names left in. I kept only the user,
pid, state, date, and program name in the output for readability):

  qmailr  7365 S May19 qmail-remote iname.com         [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 14602 S May19 qmail-remote mail.com          [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 25415 S May19 qmail-remote careful.com       [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 25875 S May19 qmail-remote programmer.net    [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 25902 S May19 qmail-remote mail.com          [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr   852 S May19 qmail-remote mail.com          [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 20283 S May25 qmail-remote ziplip.com        [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 29814 S May18 qmail-remote mail.com          [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 25877 S May19 qmail-remote mail.com          [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 25145 S May19 qmail-remote mail.com          [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 27208 S Jun08 qmail-remote hp.com            [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 27070 S Jun08 qmail-remote mail.com          [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 11525 S Jun08 qmail-remote best-service.com  [EMAIL PROTECTED] [EMAIL PROTECTED]
  qmailr 13766 S Jun08 qmail-remote mad.scientist.com [EMAIL PROTECTED] [EMAIL PROTECTED]

As you can see, processes running since May 19th cannot possibly be
explained by slow delivery -- 20 days is just too much.
The following domains go through outblaze.com mail servers:

  iname.com
  mail.com
  careful.com
  programmer.net
  best-service.com
  mad.scientist.com

The following domains do not go through outblaze:

  ziplip.com
  hp.com

Unfortunately, I cannot explain this situation by blaming everything on
outblaze.

--
Eugene Miretskiy [EMAIL PROTECTED]
InVision.com, INC. (631) 543-1000
www.invision.net / www.longisland.com
Re: qmail-remote (cry wolf?)
One more time: I ran tcpdump and strace on a stuck qmail-remote for over
an hour. strace shows that qmail-remote is stuck on 'read(3', and tcpdump
shows that nothing comes in.

On Fri, Jun 08, 2001 at 03:13:54PM +, Mark wrote:
> > processed those 1500 messages in less than 30 minutes. However, it
> > left behind another handful of stuck qmail-remote processes. Other
> > messages were undeliverable and left in the queue, and still others
> > were sent back to sender with permanent errors.
>
> What do you mean by stuck? Do you mean they *never* go away - even
> after a day or two? As others have pointed out, a slow delivery can
> take a long, long time. That's not necessarily a problem, that's just
> the way it is.
>
> To find out a bit more about what a stuck qmail-remote is doing, you
> may want to ktrace it and show us the output. Find the process id of
> the stuck qmail-remote and then as root go:
>
>   ktrace -p thepid
>
> Leave that running for at least an hour and show us the output. Yes, I
> mean at least an hour.
>
> Regards.

--
Eugene Miretskiy [EMAIL PROTECTED]
InVision.com, INC. (631) 543-1000
www.invision.net / www.longisland.com
Re: qmail-remote (cry wolf?)
On Fri, Jun 08, 2001 at 09:47:16PM +, Mark wrote:
> Then it's an OS bug. qmail-remote only gets to the read() if the OS
> (via select() ) says that the read will not block. Ergo, the OS is
> lying.

If it's an OS bug, has anybody heard of such a severe network-related bug
in RedHat 6.2? What about FreeBSD 4.2 (I believe somebody reported the
problem with FreeBSD as well)? What are the chances of _such_ a bug in
_both_ OSes?

I'd like to mention that I ran qmail on FreeBSD (from 3.x all the way to
the latest) for a couple of years and _never_ observed this behaviour
there.

Is it possible that some external device, such as a switch, router, or
firewall, could be causing this problem?

--
Eugene Miretskiy [EMAIL PROTECTED]
InVision.com, INC. (631) 543-1000
www.invision.net / www.longisland.com
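Whatever the root cause turns out to be, the claim that read() cannot block after select() reports the descriptor readable is not guaranteed on every system; the Linux select(2) manual page, for one, warns that a later read may still block and suggests O_NONBLOCK for sockets that must not hang. A minimal sketch of that defensive approach follows. It is not code from qmail, and the helper names set_nonblocking and read_no_hang are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);

  if (flags == -1) return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: a read that cannot hang.
 * Returns the number of bytes read (0 means end of file), or -1 with
 * errno set to EAGAIN or EWOULDBLOCK when no data is actually waiting. */
ssize_t read_no_hang(int fd, char *buf, size_t len)
{
  assert(buf != 0);
  if (set_nonblocking(fd) == -1) return -1;
  return read(fd, buf, len);
}
```

With the descriptor in this mode, a readiness report that turns out to be wrong shows up as an immediate EAGAIN instead of a process that sleeps in read() for weeks; the caller can then go back to select() and let the normal timeout apply.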
qmail-remote hangs
Hello,

I encountered a very weird problem with qmail-remote hanging indefinitely
(for over 2 weeks). Here are the details:

- The qmail-remote connection stays in the ESTABLISHED state forever.
- strace shows that qmail-remote hangs doing: 'read(3,'
- Most of the hung qmail-remotes are trying to deliver to outblaze.com
  (or to domains using outblaze.com mail servers, such as mail.com).

There were a couple of posts to the list
(http://www.ornl.gov/its/archives/mailing-lists/qmail/2001/05/msg00558.html
and
http://www.ornl.gov/its/archives/mailing-lists/qmail/2001/05/msg01333.html)
outlining a similar problem (also with outblaze). However, I had 2
qmail-remotes get stuck on different domains.

I observed this behaviour on 2 mail servers (OS and setup are the same).
The OS is RedHat 6.2.

I'm at a complete loss and have no idea how qmail-remote can get stuck
trying to read from fd 3. After looking at qmail-remote.c, it's apparent
that qmail-remote calls timeoutread to perform network reads. timeoutread
does a select() syscall before doing the read. So the only way we can get
to the read is when select() returned without timing out, indicating that
there is data available. Then how can 'read(3' get stuck???

Furthermore, when I kill one of the stuck qmail-remotes, qmail tries to
redeliver the message, and either succeeds or times out on the remote
host without a problem.

ANY suggestions on what can be done to troubleshoot this problem are very
much appreciated.

--
Eugene Miretskiy [EMAIL PROTECTED]
InVision.com, INC. (631) 543-1000
www.invision.net / www.longisland.com
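For readers who have not looked at the source, the select()-then-read() pattern described in this message can be sketched roughly as follows. This is an illustration written for this thread, not a copy of qmail's timeoutread; the name read_with_timeout and the use of ETIMEDOUT as the timeout error are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative only: wait up to `seconds` for fd to become readable,
 * then read from it. Returns the byte count from read(), or -1 with
 * errno set (ETIMEDOUT when select() saw nothing in time). */
ssize_t read_with_timeout(int seconds, int fd, char *buf, size_t len)
{
  fd_set rfds;
  struct timeval tv;

  assert(fd >= 0 && fd < FD_SETSIZE);

  tv.tv_sec = seconds;
  tv.tv_usec = 0;
  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);

  if (select(fd + 1, &rfds, 0, 0, &tv) == -1) return -1;

  /* The timeout protects only the select() above. If fd is a blocking
   * descriptor, nothing bounds how long this read() may sleep. */
  if (FD_ISSET(fd, &rfds)) return read(fd, buf, len);

  errno = ETIMEDOUT;
  return -1;
}
```

The comment before the read() marks the only place where a process written this way could sit in 'read(3' indefinitely, which is consistent with the strace output reported here: the hang would have to come from select() saying "readable" and the following read() finding nothing to return.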