Re: Lots and lots of qmail-queue's

1999-08-18 Thread Aaron Nabil

Martin Ouwehand writes...
> Has anybody seen this: from time to time, a whole bunch of qmail-queue's
> will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
> that most of them have the same WCHAN, but not all of them). Most of
> them have 1 as PPID, a few still have qmail-smtpd as parent.

Are they all in TIME_WAIT?  It's probably one machine.

Look in the logs (or use netstat or lsof) to get the IP address of the
machine.

Use my recordio patch to record dialog just for that host.

I bet you'll find qmail is dropping the connection with the "bare LF" 
message.  I've previously given my point of view on qmail's non-RFC 
compliance on the list before, find that message, apply the patch.

> This has a serious impact on the through-put and reliability of our qmail
> server. Right now, killing and restarting "tcpserver [...] qmail-smtpd"
> fixes the problem, but I'd really like to know what is going on to altogether
> avoid this behavior. BTW, this is a Solaris 2.5 machine.
>
> Any idea ?
>  Martin


-- 
Aaron Nabil



Re: Lots and lots of qmail-queue's

1999-08-18 Thread Martin Ouwehand

Aaron Nabil [EMAIL PROTECTED] writes:

] Has anybody seen this: from time to time, a whole bunch of qmail-queue's
] will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
] that most of them have the same WCHAN, but not all of them). Most of
] them have 1 as PPID, a few still have qmail-smtpd as parent.
] 
] Are they all in TIME_WAIT?  It's probably one machine.

It sure is. As to the TIME_WAIT, this doesn't seem to be a problem
(a "netstat -an" doesn't show too many sockets, in whatever state).

] Look in the logs (or use netstat or lsof) to get the IP address of the
] machine.
] 
] Use my recordio patch to record dialog just for that host.

Thanks, it was very useful.

] I bet you'll find qmail is dropping the connection with the "bare LF" 
] message.  I've previously given my point of view on qmail's non-RFC 
] compliance on the list before, find that message, apply the patch.

You're right about the "bare LF" message. About your patch: if you mean
the one where you just comment out those "if (ch == '\n') straynewline();"
lines, I'm afraid that the exchange following your proposal convinced me
that this isn't the right solution.

BUT, I'm also convinced that the present situation isn't satisfactory. It is
of no comfort to me that the client is the culprit for not following some RFC
if my server is on its knees! So what can we do to avoid this? One way
would be not to _exit() in straynewline(), but keeping track of the state
we're in is barely within the limits of my understanding of the
source code. What do you think of this?

 ==
--- qmail-smtpd.c.orig  Mon Jun 15 12:53:16 1998
+++ qmail-smtpd.c   Wed Aug 18 15:49:57 1999
@@ -47,7 +47,6 @@
 void die_nomem() { out("421 out of memory (#4.3.0)\r\n"); flush(); _exit(1); }
 void die_control() { out("421 unable to read controls (#4.3.0)\r\n"); flush(); _exit(1); }
 void die_ipme() { out("421 unable to figure out my IP addresses (#4.3.0)\r\n"); flush(); _exit(1); }
-void straynewline() { out("451 See http://pobox.com/~djb/docs/smtplf.html.\r\n"); flush(); _exit(1); }
 
 void err_bmf() { out("553 sorry, your envelope sender is in my badmailfrom list (#5.7.1)\r\n"); }
 void err_nogateway() { out("553 sorry, that domain isn't in my list of allowed rcpthosts (#5.7.1)\r\n"); }
@@ -290,6 +289,8 @@
   qmail_put(&qqt,ch,1);
 }
 
+int straynewline;
+
 void blast(hops)
 int *hops;
 {
@@ -322,17 +323,17 @@
 }
 switch(state) {
   case 0:
-if (ch == '\n') straynewline();
+if (ch == '\n') { straynewline = 1; return; }
 if (ch == '\r') { state = 4; continue; }
 break;
   case 1: /* \r\n */
-if (ch == '\n') straynewline();
+if (ch == '\n') { straynewline = 1; return; }
 if (ch == '.') { state = 2; continue; }
 if (ch == '\r') { state = 4; continue; }
 state = 0;
 break;
   case 2: /* \r\n + . */
-if (ch == '\n') straynewline();
+if (ch == '\n') { straynewline = 1; return; }
 if (ch == '\r') { state = 3; continue; }
 state = 0;
 break;
@@ -379,7 +380,9 @@
   out("354 go ahead\r\n");
  
   received(&qqt,"SMTP",local,remoteip,remotehost,remoteinfo,fakehelo);
+  straynewline = 0;
   blast(&hops);
+  if (straynewline) qmail_fail(&qqt);
   hops = (hops >= MAXHOPS);
   if (hops) qmail_fail(&qqt);
   qmail_from(&qqt,mailfrom.s);
@@ -387,6 +390,7 @@
  
   qqx = qmail_close(&qqt);
   if (!*qqx) { acceptmessage(qp); return; }
+  if (straynewline) { out("451 See http://pobox.com/~djb/docs/smtplf.html.\r\n"); return; }
   if (hops) { out("554 too many hops, this message is looping (#5.4.6)\r\n"); return; }
   if (databytes) if (!bytestooverflow) { out("552 sorry, that message size exceeds my databytes limit (#5.3.4)\r\n"); return; }
   if (*qqx == 'D') out("554 "); else out("451 ");

 ==
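
One caveat, as far as I can tell from the code: once blast() returns early on
a stray newline, the rest of the DATA payload is still sitting in the input,
so the command loop would try to parse message text as SMTP commands. A
minimal sketch of the "keep track of the state" idea follows. It only
remembers that a bare LF was seen and keeps consuming until the real
CRLF "." CRLF terminator. The names scan_data and data_result are
hypothetical and not part of qmail; the state numbers follow the cases in
blast(); treating a bare LF as a line start is an assumption of this sketch.

```c
#include <stddef.h>

/* Outcome of scanning a DATA payload (hypothetical names, not qmail's). */
enum data_result {
    DATA_INCOMPLETE = 0, /* terminator not seen yet, more input needed */
    DATA_OK = 1,         /* terminator seen, no bare LF in the payload */
    DATA_BARE_LF = 2     /* terminator seen, at least one bare LF      */
};

/*
 * Scan buf[0..len) for the end of an SMTP DATA payload.
 * On a bare LF the scan does not stop: it sets a flag and continues,
 * so the caller can reject the message and still be in sync for the
 * next command. *consumed receives the number of bytes belonging to
 * the payload (including the terminator when one was found).
 */
enum data_result scan_data(const char *buf, size_t len, size_t *consumed)
{
    int state = 1;   /* 1 = at the start of a line, as in blast() */
    int stray = 0;   /* set when an LF arrives without a preceding CR */
    size_t i;

    for (i = 0; i < len; i++) {
        char ch = buf[i];
        switch (state) {
        case 0: /* inside a line */
            if (ch == '\n') { stray = 1; state = 1; }
            else if (ch == '\r') state = 4;
            break;
        case 1: /* start of line */
            if (ch == '\n') stray = 1;
            else if (ch == '.') state = 2;
            else if (ch == '\r') state = 4;
            else state = 0;
            break;
        case 2: /* start of line + "." */
            if (ch == '\n') { stray = 1; state = 1; }
            else if (ch == '\r') state = 3;
            else state = 0;
            break;
        case 3: /* start of line + ".\r" */
            if (ch == '\n') {
                *consumed = i + 1;
                return stray ? DATA_BARE_LF : DATA_OK;
            }
            state = (ch == '\r') ? 4 : 0;
            break;
        case 4: /* after "\r" */
            if (ch == '\n') state = 1;
            else if (ch != '\r') state = 0;
            break;
        }
    }
    *consumed = len;
    return DATA_INCOMPLETE;
}
```

With something like this, the server could answer the DATA command with an
error after the terminator and then read the next command (for instance
QUIT) from the bytes that follow, instead of from the middle of the message.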

I said I would not invoke any RFC in vain, but I can't resist :-), I think
this patch is in line with RFC 821, page 26:

 The receiver should not close the transmission channel until
 it receives and replies to a QUIT command (even if there was an
 error).

Another solution for me would be to understand (and fix) why all these
qmail-queue processes stay around when their parent _exit()s. For example,
shouldn't straynewline() do some clean-up before _exit()ing?

Thanks for any advice...


--
  |  Martin Ouwehand ~ Swiss Federal Institute of Technology ~ Lausanne
__|_ Email/PGP: http://slwww.epfl.ch/SIC/SL/info/Martin.html __
Educar es vincular la ciencia y la ternura [José Martí]



Re: Lots and lots of qmail-queue's

1999-08-18 Thread Aaron Nabil

Martin Ouwehand writes...
> Aaron Nabil [EMAIL PROTECTED] writes:
>
> ] Has anybody seen this: from time to time, a whole bunch of qmail-queue's
> ] will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
> ] that most of them have the same WCHAN, but not all of them). Most of
> ] them have 1 as PPID, a few still have qmail-smtpd as parent.
> ] 
> ] Are they all in TIME_WAIT?  It's probably one machine.
>
> It sure is. As to the TIME_WAIT, this doesn't seem to be a problem
> (a "netstat -an" doesn't show too many sockets, in whatever state).

On mine the connections would hang around for 2*MSL, but this doesn't
seem to be a problem in your implementation, or perhaps you've tuned your
MSL down; people sometimes do that.


> ] Look in the logs (or use netstat or lsof) to get the IP address of the
> ] machine.
> ] 
> ] Use my recordio patch to record dialog just for that host.
>
> Thanks, it was very useful.

You are welcome.


> ] I bet you'll find qmail is dropping the connection with the "bare LF" 
> ] message.  I've previously given my point of view on qmail's non-RFC 
> ] compliance on the list before, find that message, apply the patch.
>
> You're right about the "bare LF" message. About your patch: if you mean
> the one where you just comment out those "if (ch == '\n') straynewline();"
> lines, I'm afraid that the exchange following your proposal convinced me
> that this isn't the right solution.

Yikes!  I stopped the dialog on the qmail list because it simply wasn't 
productive, not because I didn't have more to say.  The author's position
on the subject seems political, not technical, and the qmail list is the
wrong place for a political debate.

RFC-822 allows bare line feeds.  RFC-821 prohibits termination before
QUIT.  RFC 1652 requires 8BITMIME to suppress local conversions and
pass data unmodified (although RFCs dealing with the MIME _body_ may
disallow bare LFs, RFC 1652 defines how the MTA must handle an 8BITMIME 
envelope).  There is no debate on these issues.  It's all in black
and white, and qmail violates all three.

That some _draft_ version of a future RFC (that may or may not ever become
an internet "standard") disallows bare LFs is a very flimsy excuse for an
MTA to refuse to interoperate with an MTA that _is_ following RFC-822
_as published_.  I fully intend to make sure that the working group 
understands that people are using the _draft_ as an excuse not to 
interoperate with RFC-822 MTAs, and that some language needs to be
inserted about interoperability.  It's perfectly OK for an 822bis MTA
to not generate bare LFs itself (in fact, it shouldn't), but there 
are always going to be RFC-822 MTAs out there, and you need to 
interoperate with them.

I _am_ going to take it up with the 822bis working group, but plowing 
through the existing 5000 messages is taking a while, and there has been 
some substantial discussion on the subject before.  It may be a couple
weeks.

> BUT, I'm also convinced that the present situation isn't satisfactory. It is
> of no comfort to me that the client is the culprit by not following some RFC
> if my server is on its knees ! So what can we do to avoid this ? One way
> would be not to _exit() in straynewline() but taking care of the state
> we're in is barely within the limit of my window of understanding of the
> source code. What do you think of this ?:

Well, I guess I've already made my opinion about bare LFs known, but
if your intention is still to disallow them but fix the "terminate
before QUIT" problem, it seems like a reasonable approach.  I'm not
sure using a 451 error is desirable; unless your back-up MX handler 
is sendmail, the message is just going to be re-queued and fail again.
Might as well use 551 and be done with it.
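
The 451-versus-551 point comes down to the first digit of the reply: in
RFC 821 a 4yz reply is a transient failure (the sender is expected to retry
later) and a 5yz reply is a permanent one. A small sketch of how a sending
MTA might classify a reply line; classify_reply and the enum names are
hypothetical, not taken from qmail or sendmail.

```c
#include <ctype.h>
#include <stddef.h>

/* Coarse classes of SMTP replies, keyed on the first digit. */
enum reply_class {
    REPLY_INVALID,       /* not a three-digit reply code            */
    REPLY_SUCCESS,       /* 2yz: command completed                  */
    REPLY_INTERMEDIATE,  /* 3yz: more input expected (e.g. 354)     */
    REPLY_TRANSIENT,     /* 4yz: try again later, message re-queued */
    REPLY_PERMANENT      /* 5yz: do not retry, bounce the message   */
};

/* Classify a reply line such as "451 See http://..." by its code. */
enum reply_class classify_reply(const char *line)
{
    if (line == NULL) return REPLY_INVALID;

    /* The checks short-circuit, so a short string is never over-read. */
    if (!isdigit((unsigned char)line[0]) ||
        !isdigit((unsigned char)line[1]) ||
        !isdigit((unsigned char)line[2]))
        return REPLY_INVALID;

    switch (line[0]) {
    case '2': return REPLY_SUCCESS;
    case '3': return REPLY_INTERMEDIATE;
    case '4': return REPLY_TRANSIENT;
    case '5': return REPLY_PERMANENT;
    default:  return REPLY_INVALID;
    }
}
```

Under this reading, a client that gets 451 for a message containing bare LFs
keeps the message queued and retries, while a 5yz reply makes it give up
after the first attempt.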

 . . .

> Another solution for me would be to understand (and fix) why all these
> qmail-queue stay around when their father _exit()'s. For example,
> shouldn't straynewline() do some clean-up before _exit()'ing ?
>
> Thanks for any advice...

I'm just guessing, but there might be some kind of deadlock or
suboptimal signal handling going on with the child.  Failing anything
else, I'd imagine the qmail-queue would get a SIGPIPE when the 
qmail-smtpd dies, but perhaps it's getting masked, and you are getting
zombification.  Dunno, would have to look into it.
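
For reference, on a plain POSIX pipe SIGPIPE goes to a process that writes
when no reader is left; a process that is reading sees end-of-file (read()
returns 0) once every copy of the write end has been closed, which is what
happens to the writer's descriptors at _exit(). The sketch below shows only
that ordinary case and does not explain the stuck processes. The function
name child_bytes_before_eof is hypothetical; the child stands in for the
reader (the qmail-queue role) and the parent for the writer (the
qmail-smtpd role).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a reader child, write len bytes to it through a pipe, close the
 * write end, and return how many bytes the child read before it saw
 * EOF. Returns -1 on any error. len is capped so that the count fits
 * in an exit status.
 */
int child_bytes_before_eof(const char *msg, size_t len)
{
    int fds[2];
    int status;
    pid_t pid;

    if (len > 100) return -1;
    if (pipe(fds) == -1) return -1;

    pid = fork();
    if (pid == -1) { close(fds[0]); close(fds[1]); return -1; }

    if (pid == 0) {
        /* Child: the reader. */
        char buf[64];
        ssize_t n;
        int total = 0;

        /* If the reader kept its own copy of the write end open,
           it would never see EOF and would block in read() forever. */
        close(fds[1]);
        while ((n = read(fds[0], buf, sizeof buf)) > 0)
            total += (int)n;
        _exit(n == 0 ? total : 255);   /* n == 0 is EOF, not a signal */
    }

    /* Parent: the writer. */
    close(fds[0]);
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[1]);
        waitpid(pid, &status, 0);
        return -1;
    }
    close(fds[1]);   /* the kernel does this for every open fd at _exit() */

    if (waitpid(pid, &status, 0) == -1) return -1;   /* reap, no zombie */
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 255) return -1;
    return WEXITSTATUS(status);
}
```

The comment in the child is the part worth checking against a real system: a
reader that never sees EOF usually means some process still holds the write
end of the pipe open.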


-- 
Aaron Nabil



Lots and lots of qmail-queue's

1999-08-17 Thread Martin Ouwehand

Has anybody seen this: from time to time, a whole bunch of qmail-queue's
will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
that most of them have the same WCHAN, but not all of them). Most of
them have 1 as PPID, a few still have qmail-smtpd as parent.

This has a serious impact on the through-put and reliability of our qmail
server. Right now, killing and restarting "tcpserver [...] qmail-smtpd"
fixes the problem, but I'd really like to know what is going on to altogether
avoid this behavior. BTW, this is a Solaris 2.5 machine.

Any idea ?
 Martin


--
  |  Martin Ouwehand ~ Swiss Federal Institute of Technology ~ Lausanne
__|_ Email/PGP: http://slwww.epfl.ch/SIC/SL/info/Martin.html __
Alors que la philosophie enseigne comment l'homme prétend
penser, la beuverie montre comment il pense vraiment  [René Daumal]



Re: Lots and lots of qmail-queue's

1999-08-17 Thread Daniel Jovius - Telenordia AB /Algonet


Check your tcpserver log. I wouldn't be surprised if there's a broken M$
mailer that is trying to send the same email to you over and over again,
pretty much like a DoS attack. This has happened to me several times, 
and IIS with SP4 causes it. There's more information in the
qmail mailing list archive: http://www-archive.ornl.gov:8000/



On 17 Aug 1999, Martin Ouwehand wrote:

> Has anybody seen this: from time to time, a whole bunch of qmail-queue's
> will accumulate (I'd say up to ~400), apparently doing nothing (ps shows
> that most of them have the same WCHAN, but not all of them). Most of
> them have 1 as PPID, a few still have qmail-smtpd as parent.
> 
> This has a serious impact on the through-put and reliability of our qmail
> server. Right now, killing and restarting "tcpserver [...] qmail-smtpd"
> fixes the problem, but I'd really like to know what is going on to altogether
> avoid this behavior. BTW, this is a Solaris 2.5 machine.
> 
> Any idea ?
>  Martin
> 
> 
> --
>   |  Martin Ouwehand ~ Swiss Federal Institute of Technology ~ Lausanne
> __|_ Email/PGP: http://slwww.epfl.ch/SIC/SL/info/Martin.html __
> Alors que la philosophie enseigne comment l'homme prétend
> penser, la beuverie montre comment il pense vraiment  [René Daumal]
> 
 

/ daj



Re: Lots and lots of qmail-queue's

1999-08-17 Thread Martin Ouwehand


] Check your tcpserver-log. I wouldn't be surprised if there's a broken M$
] mailer that is trying to send the same email to you over and over again,
] pretty much like a DOS-attack. This has happened to me several times 
] and having IIS with SP4 causes it.

Yes, this seems to be the explanation, but what now? I can filter out
the faulty machines that I know about, but what if others come along?
Is there anything I can do?

Also, how do other MTAs cope with this? I'm asking this because we switched
recently to qmail, but this IIS client went unnoticed until now, meaning
that old and big PP (our previous MTA) knew how to handle it.



--
  |  Martin Ouwehand ~ Swiss Federal Institute of Technology ~ Lausanne
__|_ Email/PGP: http://slwww.epfl.ch/SIC/SL/info/Martin.html __
La méthode que j'emploie pour te discipliner n'est[Marpa à]
pas faite pour les êtres dégénérés de l'avenir   [Milarepa]



Re: Lots and lots of qmail-queue's

1999-08-17 Thread Daniel Jovius - Telenordia AB /Algonet

 
> ] Check your tcpserver-log. I wouldn't be surprised if there's a broken M$
> ] mailer that is trying to send the same email to you over and over again,
> ] pretty much like a DOS-attack. This has happened to me several times 
> ] and having IIS with SP4 causes it.
> 
> Yes, this seems to be the explanation, but what now ? I can filter out
> the faulty machines that I know about, but what if others come along ?
> Is there anything I can do ?
> 
> Also, how do other MTAs cope with this ? I'm asking this because we switched
> recently to qmail, but this IIS client went unnoticed until now, meaning
> that old and big PP (our previous MTA) knew how to handle it.
 

I try to contact the administrators of the broken systems and convince
them to fix the problem on their side, because that's where the problem
really is. I guess you could change qmail's behaviour somehow to get rid of
the problem, but I don't think that's the right solution.


/ daj