Re: Lots and lots of qmail-queue's
Martin Ouwehand writes...

> Has anybody seen this: from time to time, a whole bunch of qmail-queue's
> will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
> that most of them have the same WCHAN, but not all of them). Most of
> them have 1 as PPID, a few still have qmail-smtpd as parent.

Are they all in TIME_WAIT? It's probably one machine.

Look in the logs (or use netstat or lsof) to get the IP address of the
machine.

Use my recordio patch to record the dialog just for that host.

I bet you'll find qmail is dropping the connection with the "bare LF"
message. I've given my point of view on qmail's non-RFC compliance on the
list before; find that message and apply the patch.

> This has a serious impact on the through-put and reliability of our qmail
> server. Right now, killing and restarting "tcpserver [...] qmail-smtpd"
> fixes the problem, but I'd really like to know what is going on to
> altogether avoid this behavior. BTW, this is a Solaris 2.5 machine.
>
> Any idea?
>
> Martin

--
Aaron Nabil
Re: Lots and lots of qmail-queue's
Aaron Nabil writes:

] Has anybody seen this: from time to time, a whole bunch of qmail-queue's
] will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
] that most of them have the same WCHAN, but not all of them). Most of
] them have 1 as PPID, a few still have qmail-smtpd as parent.
]
] Are they all in TIME_WAIT? It's probably one machine.

It sure is. As to the TIME_WAIT, this doesn't seem to be a problem (a
"netstat -an" doesn't show too many sockets, in whatever state).

] Look in the logs (or use netstat or lsof) to get the IP address of the
] machine.
]
] Use my recordio patch to record dialog just for that host.

Thanks, it was very useful.

] I bet you'll find qmail is dropping the connection with the "bare LF"
] message. I've previously given my point of view on qmail's non-RFC
] compliance on the list before, find that message, apply the patch.

You're right about the "bare LF" message. About your patch: if you mean the
one where you just comment out those "if (ch == '\n') straynewline();" lines,
I'm afraid that the exchange following your proposal convinced me that this
isn't the right solution.

BUT, I'm also convinced that the present situation isn't satisfactory. It is
of no comfort to me that the client is the culprit by not following some RFC
if my server is on its knees! So what can we do to avoid this? One way would
be not to _exit() in straynewline(), but taking care of the state we're in is
barely within the limits of my understanding of the source code.
What do you think of this?

==
--- qmail-smtpd.c.orig  Mon Jun 15 12:53:16 1998
+++ qmail-smtpd.c       Wed Aug 18 15:49:57 1999
@@ -47,7 +47,6 @@
 void die_nomem() { out("421 out of memory (#4.3.0)\r\n"); flush(); _exit(1); }
 void die_control() { out("421 unable to read controls (#4.3.0)\r\n"); flush(); _exit(1); }
 void die_ipme() { out("421 unable to figure out my IP addresses (#4.3.0)\r\n"); flush(); _exit(1); }
-void straynewline() { out("451 See http://pobox.com/~djb/docs/smtplf.html.\r\n"); flush(); _exit(1); }
 
 void err_bmf() { out("553 sorry, your envelope sender is in my badmailfrom list (#5.7.1)\r\n"); }
 void err_nogateway() { out("553 sorry, that domain isn't in my list of allowed rcpthosts (#5.7.1)\r\n"); }
@@ -290,6 +289,8 @@
   qmail_put(&qqt,ch,1);
 }
 
+int straynewline;
+
 void blast(hops)
 int *hops;
 {
@@ -322,17 +323,17 @@
     }
     switch(state) {
       case 0:
-        if (ch == '\n') straynewline();
+        if (ch == '\n') { straynewline = 1; return; }
         if (ch == '\r') { state = 4; continue; }
         break;
       case 1: /* \r\n */
-        if (ch == '\n') straynewline();
+        if (ch == '\n') { straynewline = 1; return; }
         if (ch == '.') { state = 2; continue; }
         if (ch == '\r') { state = 4; continue; }
         state = 0;
         break;
       case 2: /* \r\n + . */
-        if (ch == '\n') straynewline();
+        if (ch == '\n') { straynewline = 1; return; }
         if (ch == '\r') { state = 3; continue; }
         state = 0;
         break;
@@ -379,7 +380,9 @@
   out("354 go ahead\r\n");
 
   received(&qqt,"SMTP",local,remoteip,remotehost,remoteinfo,fakehelo);
+  straynewline = 0;
   blast(&hops);
+  if (straynewline) qmail_fail(&qqt);
   hops = (hops >= MAXHOPS);
   if (hops) qmail_fail(&qqt);
   qmail_from(&qqt,mailfrom.s);
@@ -387,6 +390,7 @@
 
   qqx = qmail_close(&qqt);
   if (!*qqx) { acceptmessage(qp); return; }
+  if (straynewline) { out("451 See http://pobox.com/~djb/docs/smtplf.html.\r\n"); return; }
   if (hops) { out("554 too many hops, this message is looping (#5.4.6)\r\n"); return; }
   if (databytes) if (!bytestooverflow) { out("552 sorry, that message size exceeds my databytes limit (#5.3.4)\r\n"); return; }
   if (*qqx == 'D') out("554 "); else out("451 ");
==

I said I would not invoke any RFC in vain, but I can't resist :-). I think
this patch is in line with RFC 821, page 26:

    The receiver should not close the transmission channel until
    it receives and replies to a QUIT command (even if there was
    an error).

Another solution for me would be to understand (and fix) why all these
qmail-queue stay around when their father _exit()'s. For example, shouldn't
straynewline() do some clean-up before _exit()'ing?

Thanks for any advice...

--
   | Martin Ouwehand ~ Swiss Federal Institute of Technology ~ Lausanne
 __|_ Email/PGP: http://slwww.epfl.ch/SIC/SL/info/Martin.html
 __ Educar es vincular la ciencia y la ternura [José Martí]
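[Editorial sketch, not part of the original message.] The patch above turns a bare LF from a fatal error into a flag that the caller checks. To make the state handling easier to follow, here is a minimal standalone sketch of the same five-state scan, run over a buffer instead of the SMTP input stream. The names scan_data and scan_result are hypothetical and do not exist in qmail; the state numbers follow the blast() code quoted in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of scanning one DATA payload (hypothetical type, not in qmail). */
enum scan_result {
  SCAN_OK,         /* reached the terminating CRLF "." CRLF */
  SCAN_BARE_LF,    /* found an LF that was not preceded by CR */
  SCAN_INCOMPLETE  /* buffer ended before the terminator */
};

/*
 * Walk buf[0..len) with the same states blast() uses:
 *   0: inside a line      1: just after \r\n     2: \r\n + "."
 *   3: \r\n + ".\r"       4: just after a \r
 * Instead of exiting on a bare LF, report it to the caller.
 */
enum scan_result scan_data(const char *buf, size_t len)
{
  int state = 1; /* like blast(), start as if a CRLF had just been seen */
  size_t i;

  assert(buf != NULL || len == 0);

  for (i = 0; i < len; i++) {
    char ch = buf[i];
    switch (state) {
      case 0:
        if (ch == '\n') return SCAN_BARE_LF;
        if (ch == '\r') state = 4;
        break;
      case 1:
        if (ch == '\n') return SCAN_BARE_LF;
        if (ch == '.') state = 2;
        else if (ch == '\r') state = 4;
        else state = 0;
        break;
      case 2:
        if (ch == '\n') return SCAN_BARE_LF;
        if (ch == '\r') state = 3;
        else state = 0;
        break;
      case 3:
        if (ch == '\n') return SCAN_OK;
        state = (ch == '\r') ? 4 : 0;
        break;
      case 4:
        if (ch == '\n') state = 1;
        else if (ch != '\r') state = 0;
        break;
    }
  }
  return SCAN_INCOMPLETE;
}
```

Note that the sketch, like the patched blast(), stops at the first bare LF. In a live SMTP session the rest of the message would still be waiting on the connection at that point, so a server that keeps the channel open would presumably also need to decide how to consume or discard it before reading the next command.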
Re: Lots and lots of qmail-queue's
Martin Ouwehand writes...

> Aaron Nabil writes:
>
> ] Has anybody seen this: from time to time, a whole bunch of qmail-queue's
> ] will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
> ] that most of them have the same WCHAN, but not all of them). Most of
> ] them have 1 as PPID, a few still have qmail-smtpd as parent.
> ]
> ] Are they all in TIME_WAIT? It's probably one machine.
>
> It sure is. As to the TIME_WAIT, this doesn't seem to be a problem (a
> "netstat -an" doesn't show too many sockets, in whatever state).

On mine the connections would hang around for 2*MSL, but this doesn't seem to
be a problem in your implementation, or you've tuned your MSL down; sometimes
people do that.

> ] Look in the logs (or use netstat or lsof) to get the IP address of the
> ] machine.
> ]
> ] Use my recordio patch to record dialog just for that host.
>
> Thanks, it was very useful.

You are welcome.

> ] I bet you'll find qmail is dropping the connection with the "bare LF"
> ] message. I've previously given my point of view on qmail's non-RFC
> ] compliance on the list before, find that message, apply the patch.
>
> You're right about the "bare LF" message. About your patch: if you mean the
> one where you just comment out those "if (ch == '\n') straynewline();"
> lines, I'm afraid that the exchange following your proposal convinced me
> that this isn't the right solution.

Yikes! I stopped the dialog on the qmail list because it simply wasn't
productive, not because I didn't have more to say. The author's position on
the subject seems political, not technical, and the qmail list is the wrong
place for a political debate.

RFC-822 allows bare line feeds. RFC-821 prohibits termination before quit.
RFC 1652 requires 8BITMIME to suppress local conversions and pass data
unmodified (although RFCs dealing with the MIME _body_ may disallow bare
LFs, RFC 1652 defines how the MTA must handle an 8BITMIME envelope).

There is no debate on these issues. It's all in black and white, and qmail
violates all three.

That some _draft_ version of a future RFC (that may or may not ever become
an internet "standard") disallows bare LFs is a very flimsy excuse for an MTA
to refuse to interoperate with an MTA that _is_ following RFC-822 _as
published_. I fully intend to make sure that the working group understands
that people are using the _draft_ as an excuse not to interoperate with
RFC-822 MTAs, and that some language needs to be inserted about
interoperability. It's perfectly OK for an 822bis MTA not to generate bare
LFs itself (in fact, it shouldn't), but there are always going to be RFC-822
MTAs out there, and you need to interoperate with them.

I _am_ going to take it up with the 822bis working group, but plowing
through the existing 5000 messages is taking a while, and there has been
some substantial discussion on the subject before. It may be a couple of
weeks.

> BUT, I'm also convinced that the present situation isn't satisfactory. It
> is of no comfort to me that the client is the culprit by not following some
> RFC if my server is on its knees! So what can we do to avoid this? One way
> would be not to _exit() in straynewline() but taking care of the state
> we're in is barely within the limit of my window of understanding of the
> source code. What do you think of this?

Well, I guess I've already made my opinion about bare LFs known, but if
your intention is still to disallow them but fix the "terminate before quit"
problem, it seems like a reasonable approach. I'm not sure using a 451 error
is desirable: unless your back-up MX handler is sendmail, it's just going to
re-queue and fail again. Might as well use 551 and be done with it.

. . .

> Another solution for me would be to understand (and fix) why all these
> qmail-queue stay around when their father _exit()'s. For example, shouldn't
> straynewline() do some clean-up before _exit()'ing?
>
> Thanks for any advice...

I'm just guessing, but there might be some kind of deadlock or suboptimal
signal handling going on with the child. Failing anything else, I'd imagine
the qmail-queue would get a SIGPIPE when the qmail-smtpd dies, but perhaps
it's getting masked, and you are getting zombification. Dunno, would have to
look into it.

--
Aaron Nabil
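[Editorial sketch, not part of the original message.] On the question of what a child sees when the process feeding it goes away: SIGPIPE is raised on a write to a pipe that has no readers, whereas a process that is reading from a pipe gets end-of-file once every copy of the write end has been closed. The sketch below illustrates only that general pipe behavior and makes no claim about what actually happened on the Solaris machine in this thread. The function name is hypothetical; the parent stands in for the writing side and the child for the reading side.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a reader child, send it one byte, then close the write end (which is
 * what _exit() does implicitly to every open descriptor).
 * Returns the child's exit status: 0 if it read exactly one byte and then
 * saw EOF, 1 if it saw anything else, -1 on a setup error.
 */
int reader_status_after_writer_closes(void)
{
  int fds[2];
  int status;
  pid_t pid;

  if (pipe(fds) == -1) return -1;

  pid = fork();
  if (pid == -1) { close(fds[0]); close(fds[1]); return -1; }

  if (pid == 0) {
    /* Child: the reading side. */
    char c;
    ssize_t n;
    int got = 0;

    /* Drop our inherited copy of the write end; while any copy stays
       open, read() blocks instead of returning EOF. */
    close(fds[1]);
    while ((n = read(fds[0], &c, 1)) > 0) got++;
    _exit((n == 0 && got == 1) ? 0 : 1);
  }

  /* Parent: the writing side. */
  close(fds[0]);
  if (write(fds[1], "x", 1) != 1) {
    close(fds[1]);
    waitpid(pid, &status, 0);
    return -1;
  }
  close(fds[1]); /* last write end gone: the reader now sees EOF */

  if (waitpid(pid, &status, 0) == -1) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The comment in the child marks the usual pitfall: a reader that keeps a stray copy of the write end open, or shares it with another surviving process, never sees EOF and simply sits in read().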
Lots and lots of qmail-queue's
Has anybody seen this: from time to time, a whole bunch of qmail-queue's
will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
that most of them have the same WCHAN, but not all of them). Most of
them have 1 as PPID, a few still have qmail-smtpd as parent.

This has a serious impact on the through-put and reliability of our qmail
server. Right now, killing and restarting "tcpserver [...] qmail-smtpd"
fixes the problem, but I'd really like to know what is going on to
altogether avoid this behavior. BTW, this is a Solaris 2.5 machine.

Any idea?

Martin

--
   | Martin Ouwehand ~ Swiss Federal Institute of Technology ~ Lausanne
 __|_ Email/PGP: http://slwww.epfl.ch/SIC/SL/info/Martin.html
 __ Alors que la philosophie enseigne comment l'homme prétend penser,
    la beuverie montre comment il pense vraiment [René Daumal]
Re: Lots and lots of qmail-queue's
Check your tcpserver log. I wouldn't be surprised if there's a broken M$
mailer that is trying to send the same email to you over and over again,
pretty much like a DoS attack. This has happened to me several times, and
IIS with SP4 is what causes it.

There's more information in the Qmail mailing-list archive:
http://www-archive.ornl.gov:8000/

On 17 Aug 1999, Martin Ouwehand wrote:

> Has anybody seen this: from time to time, a whole bunch of qmail-queue's
> will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
> that most of them have the same WCHAN, but not all of them). Most of
> them have 1 as PPID, a few still have qmail-smtpd as parent.
>
> This has a serious impact on the through-put and reliability of our qmail
> server. Right now, killing and restarting "tcpserver [...] qmail-smtpd"
> fixes the problem, but I'd really like to know what is going on to
> altogether avoid this behavior. BTW, this is a Solaris 2.5 machine.
>
> Any idea?
>
> Martin
>
> --
>    | Martin Ouwehand ~ Swiss Federal Institute of Technology ~ Lausanne
>  __|_ Email/PGP: http://slwww.epfl.ch/SIC/SL/info/Martin.html
>  __ Alors que la philosophie enseigne comment l'homme prétend penser,
>     la beuverie montre comment il pense vraiment [René Daumal]

/ daj
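[Editorial sketch, not part of the original message.] Checking the tcpserver log for one host that connects over and over comes down to pulling the remote address out of each connection line. The sketch below assumes connection lines of the form "tcpserver: pid 4242 from 192.0.2.7", possibly preceded by a timestamp added by the logger; that format is an assumption and should be checked against your own log. The function name tcpserver_peer is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the remote address from one log line into ip[0..iplen).
 * Returns 1 on success, 0 if the line is not a connection line or the
 * address does not fit in the buffer.
 * Assumed line format: "... tcpserver: pid <pid> from <address>".
 */
int tcpserver_peer(const char *line, char *ip, size_t iplen)
{
  static const char tag[] = "tcpserver: pid ";
  static const char from[] = " from ";
  const char *p;
  size_t n;

  assert(line != NULL && ip != NULL);

  p = strstr(line, tag);              /* tolerate a leading timestamp */
  if (p == NULL) return 0;
  p = strstr(p + sizeof tag - 1, from);
  if (p == NULL) return 0;
  p += sizeof from - 1;

  n = strcspn(p, " \t\r\n");          /* address ends at whitespace */
  if (n == 0 || n >= iplen) return 0;
  memcpy(ip, p, n);
  ip[n] = '\0';
  return 1;
}
```

Feeding every log line through such a function and tallying the addresses it returns should make a single misbehaving client stand out quickly.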
Re: Lots and lots of qmail-queue's
] Check your tcpserver-log. I wouldn't be surprised if there's a broken M$
] mailer that is trying to send the same email to you over and over again,
] pretty much like a DOS-attack. This has happened to me several times
] and having IIS with SP4 causes it.

Yes, this seems to be the explanation, but what now? I can filter out the
faulty machines that I know about, but what if others come along? Is there
anything I can do?

Also, how do other MTAs cope with this? I'm asking this because we switched
recently to qmail, but this IIS client went unnoticed until now, meaning
that old and big PP (our previous MTA) knew how to handle it.

--
   | Martin Ouwehand ~ Swiss Federal Institute of Technology ~ Lausanne
 __|_ Email/PGP: http://slwww.epfl.ch/SIC/SL/info/Martin.html
 __ La méthode que j'emploie pour te discipliner n'est pas faite
    pour les êtres dégénérés de l'avenir [Marpa à Milarepa]
Re: Lots and lots of qmail-queue's
> ] Check your tcpserver-log. I wouldn't be surprised if there's a broken M$
> ] mailer that is trying to send the same email to you over and over again,
> ] pretty much like a DOS-attack. This has happened to me several times
> ] and having IIS with SP4 causes it.
>
> Yes, this seems to be the explanation, but what now? I can filter out the
> faulty machines that I know about, but what if others come along? Is there
> anything I can do?
>
> Also, how do other MTAs cope with this? I'm asking this because we switched
> recently to qmail, but this IIS client went unnoticed until now, meaning
> that old and big PP (our previous MTA) knew how to handle it.

I try to contact the administrators of the broken systems and convince them
to fix the problem on their side, because that's where the problem really
is. I guess you could change Qmail's behaviour somehow to get rid of the
problem, but I don't think it's the right solution.

/ daj