----- Original Message ----- From: "Claudio Jeker" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Thursday, February 05, 2004 2:08 PM Subject: Re: QMQP problem
> On Thu, Feb 05, 2004 at 09:31:48AM +0700, Frank Rabitsch wrote: > > guys, > > > > First of all thanks for all the great work you put into releasing the > > monthly QMail LDAP patch. It saves me a lot of work because this patch > > provides me a one stop solution for my QMail environment. > > > > I have a strange problem with QMQP. I'm running a QMail LDAP cluster in > > three different countries. In two out of the three countries my users > > are complaining of receiving a lot of double messages. > > > > Investigation of the log file learns me that the messages are indeed > > send multiple times. The only difference is the qmail ID (946 and 1041) > > and the time is slightly different on the receiving server. ID's are the > > same on the sending server. > > > > Received: (qmail 946 invoked from network); 4 Feb 2004 03:05:05 -0000 > > Received: from unknown ([10.1.0.7]) by mail.ph.magnusint.com > > (qmail-ldap-1.03) with compressed QMQP; 4 Feb 2004 03:05:05 -0000 > > Delivered-To: CLUSTERHOST mail.id.magnusint.com [EMAIL PROTECTED] > > > > Received: (qmail 1041 invoked from network); 4 Feb 2004 03:07:36 -0000 > > Received: from unknown ([10.1.0.7]) by mail.ph.magnusint.com > > (qmail-ldap-1.03) with compressed QMQP; 4 Feb 2004 03:07:36 -0000 > > Delivered-To: CLUSTERHOST mail.id.magnusint.com [EMAIL PROTECTED] > > > > Next to that I see a number of error messages on the sending server: > > > > @400000004021a06822ddb66c delivery 842: deferral: > > Unable_to_cluster-forward_message:_communication_with_mail_server_failed_(#4 .4.2)./ > > > > @400000004020c29a27a0013c delivery 2761: deferral: > > Unable_to_cluster-forward_message:_connection_to_mail_server_timed_out_(#4.4 .1)./ > > > > At first I thought there was a version conflict between QMail > > installations but over the weekend I installed the servers on exactly > > the same QMail LDAP patch (20040101). I did a clean install with default > > settings provided in the boot scripts. QMQP compression is enabled on > > the servers. > > > > My suspicion is that the communication channel is somehow interrupted > > before the receiving party can confirm correct reception of the email > > which causes the sending party to resend the email. The strange thing is > > that this only happens in two out of three servers. On the 3rd server I > > never receive any double messages. > > > > Is this problem connection related? If yes, is there a setting to change > > so that QMQP is less sensitive to connection errors? SMTP doesn't show > > the above mentioned problems on the same connection so I assume that > > QMQP should have the same behavior. > > > > Anything else that went wrong in my setup? Your feedback is appreciated. > > > > I see two possibilities: > 1. there is a problem with compression. I have a setup that uses > compressed qmqp quite excessively without a problem. > 2. timeouts. qmqp is known to have to radical timeouts. > qmqpc has a 10 sec connect timeout and a 60 sec read timeout > qmqpd a generic 3600 sec timeout. > > The 10 sec timeout seems to be somewhat short but that would definitifly > not cause double deliveries. > > The "connection_to_mail_server_timed_out" error is the 10 sec timer and my > guess is, that all tcpserver slosts where full. > The second error seems to be a read timeout just at the end of the > transaction. Do you have a log of the qmail-qmqpd? Difficult to match the failing messages on the sending side with log entries on the receiving side. Let me give it a try... This is how a normal session should look like according to me: 2004-02-03 20:01:07.186652500 tcpserver: pid 23973 from 10.2.0.1 2004-02-03 20:01:07.189260500 tcpserver: ok 23973 mail.ph.magnusint.com:10.2.0.7:628 :10.2.0.1::59579 2004-02-03 20:01:07.196407500 ok 1075813267 qp 23974 ddc saved 43 percent 2004-02-03 20:01:07.196737500 tcpserver: end 23973 status 0 It starts up the connection then tcpserver reports OK, and the pid is ended in the 4th line. This logentry is from a null client on the firewall on the same subnet so on a fast network connection. On the 3rd line it reports ok but I don't know how to match the number 23974 to the original message pid 23973. If the session is fast its most likely one number higher but on slow sessions the number is not sequential. The following entries are of a failing session: 2004-02-03 18:37:15.950854500 tcpserver: pid 23576 from 10.1.0.7 2004-02-03 18:37:15.952341500 tcpserver: ok 23576 mail.ph.magnusint.com:10.2.0.7:628 :10.1.0.7::32940 ... 2004-02-03 19:37:15.943964500 tcpserver: end 23576 status 28416 As you can see there is one hour delay and the status code is 28416. Don't know what this code means. If I do a flush on the sending side it starts up ten delivries at once. Some of them return with the "communication_with_mail_server_failed" message. On the receiving side I see that some of them end with status 28416 as above. Some of them simply end with status 0 however I don't get an OK messages on the receiving side (e.g. ...ok 1075813267 qp 23974 ddc saved 43 percent) for all of the ending connections with status 0. Let me know what could be the cause of this problem or how I should analyze the log-files. Thanks. regards Frank > > -- > :wq Claudio >
