RE: Can anyone explain this?

2001-08-09 Thread Richard Underwood

Charles Cazabon wrote:
 
 eric [EMAIL PROTECTED] wrote:
  Anyone else seeing similar stuff happening?  I'm running qmail-1.03 without
  any patches under RedHat 6.1 (kernel 2.2.12-20smp)
 
 Linux kernel version 2.2.12 has known bugs in its stack, and has been
 identified as a likely culprit for these symptoms before.  Upgrade to
 2.2.19 and the problem may just disappear.
 
If this is the problem I have been experiencing (qmail-remote
processes lasting a great deal longer than they should) then upgrading to
2.2.19 won't fix this problem.

A work-around is to enable socket keep-alives which (at least for
me) fixes the problem.

I've put the patch on the web at: http://www.duff.org/qmail/

Richard

P.S. Eric: ps -ef on modern RedHat distributions (and others) has a STIME
column which will tell you when the process started running.



RE: qmail-queue question

2001-08-09 Thread Richard Underwood

Edward,

I've had problems with qmail-remote hanging - it had nothing to do
with the queue lifetime, but with some code in qmail-remote failing,
possibly due to an O/S bug.

A fix which works for me is to enable socket keep-alives. This will
kill the socket after about 2-3 hours if the connection has died.

I've put a patch on the web at http://www.duff.org/qmail/ 

Richard

-----Original Message-----
From: Edward McLain [mailto:[EMAIL PROTECTED]]

On a side note, is there any reason that qmail-remote should start up and
then just sit there connected to a remote host for like 6 or 7 hours trying
to send one email?  I get this all the freaking time and I'm just wondering
what exactly the freaking thing is doing? (Although this problem only really
seems to occur with mindspring.com, yet if I telnet to port 25 of
mindspring's mail server and send the same message through telnet to the same
user, from the same user as the one qmail's trying to send, it works just
fine and I don't get any errors or return codes.)
 



RE: Fix for qmail-remote process hanging on Linux (and possibly others)

2001-08-06 Thread Richard Underwood

  try a couple of dozen connections to the same remote host 
 at the same time.
  (This is an issue in itself!)
 
 Why is this an issue?  If the remote host can handle 100 inbound
 connections, you should be able to open 100 connections to them, inject
 your messages, and close the connections.  Everyone's happy.
 
 If the remote host can't handle that many, it shouldn't accept that
 many.  You'll then get connections past X deferred, and qmail will back
 off.
 
It's an issue because, while in an ideal world this would be fine,
we don't live in an ideal world and not every smtp server out there will
drop connections smoothly. Instead, they hang, or accept connections that
they can't handle, leading to a reduced throughput.

If I'm sending a few thousand mails, chances are it'd be possible to
maintain full throughput without hitting the same host more than once
concurrently. Sure, if there's nothing else in the queue, then you may as
well use multiple threads per MX, but what do you lose by scheduling other
hosts first?

I've also noticed that if qmail tries to deliver (for example) 50
messages to one host concurrently, perhaps 2 will get through. The rest will
be retried, but unfortunately they tend to get retried at much the same
time. Again, 2 messages get through, and the process repeats. This simply
isn't efficient.

I think qmail is great, don't get me wrong. I just think there is
room for improvement.

Richard

P.S. People here seem to be a little over-sensitive about this issue!



RE: Fix for qmail-remote process hanging on Linux (and possibly others)

2001-08-06 Thread Richard Underwood

 From: Henning Brauer [mailto:[EMAIL PROTECTED]]
 
 On Mon, Aug 06, 2001 at 11:09:25AM +0100, Richard Underwood wrote:
  I've also noticed that if qmail tries to deliver (for example) 50
  messages to one host concurrently, perhaps 2 will get through. The rest will
  be retried, but unfortunately they tend to get retried at much the same
  time. Again, 2 messages get through, and the process repeats. This simply
  isn't efficient.
 
 This isn't qmail's fault but the fault of the remote host. There is room for
 improvement - just not on qmail's side. The remote host MUST NOT accept more
 connections than it can handle. If it does the remote recipients must live
 with the delays.
 
Read what I wrote again. It IS qmail's fault. One role I use qmail
for is to accept mail which is then passed on to an exchange server on the
same network. Here's an example of what can happen ...

If the exchange server goes down, a large queue builds up. The
exchange server accepts something like 20 concurrent connections before
refusing to accept connections. This, as you say, is what the server should
do.

When the exchange server comes back up, I kick the qmail-send
process to get it to deliver the queue. At this point I should be able to go
off and do other things.

However, qmail tries to send the queue with lots of concurrent
connections. The first 20 work, but the rest are dropped. This then blocks
any further attempts for a time. After this time, the mails are tried again
- once more, lots of concurrent connections, more dropped connections, more
delays.

In the end, I resort to sitting there watching the queue and kicking
the qmail-send process until the queue is small enough to go through without
help.

The exchange server is working as it should be - it's dropping
connections once its connection limit is reached, but left alone, qmail is
being far less than efficient - sendmail with a single thread could deliver
the mail faster!

Saying that there's no room for improvement on qmail's side is pure
arrogance. Just look at the number of patches available for it for a clue.
qmail is great, it works well, but it still could be improved.

People complain about a single message blocking their queue ...  Run
two copies? That works, but is it the best option? A configurable limit to
the threads per message would fix this. Prioritizing messages would be
better.

The problem I describe above could be fixed by a configurable
per-host thread limit. Can you think of a neater solution?
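
To make the per-host limit idea concrete, here is a minimal sketch in C of the bookkeeping it would need. Nothing here is qmail code: the names (host_table, host_acquire, host_release), the fixed table size, and the assumption that host names arrive already lower-cased are all made up for illustration. A scheduler would call host_acquire() before starting a delivery and skip to a message for another host when it returns 0.

```c
#include <assert.h>
#include <string.h>

#define MAX_HOSTS 64    /* illustrative table size, not a qmail constant */
#define HOST_LEN  256   /* maximum host name length including the NUL */

struct host_slot  { char host[HOST_LEN]; int active; };
struct host_table { struct host_slot slot[MAX_HOSTS]; int limit; };

/* Reset the table and set the maximum concurrent deliveries per host. */
void host_table_init(struct host_table *t, int limit)
{
  assert(t != NULL && limit > 0);
  memset(t, 0, sizeof *t);
  t->limit = limit;
}

/* Try to reserve one delivery slot for host.
 * Returns 1 on success, 0 if the host is at its limit, the table is
 * full, or the name is too long to track. */
int host_acquire(struct host_table *t, const char *host)
{
  int i, free_slot = -1;

  assert(t != NULL && host != NULL);
  if (strlen(host) >= HOST_LEN) return 0;

  for (i = 0; i < MAX_HOSTS; i++) {
    if (t->slot[i].active > 0 && strcmp(t->slot[i].host, host) == 0) {
      if (t->slot[i].active >= t->limit) return 0;  /* host is saturated */
      t->slot[i].active++;
      return 1;
    }
    if (t->slot[i].active == 0 && free_slot == -1) free_slot = i;
  }

  if (free_slot == -1) return 0;                    /* table is full */
  strcpy(t->slot[free_slot].host, host);            /* length checked above */
  t->slot[free_slot].active = 1;
  return 1;
}

/* Give back a slot when a delivery to host finishes, whatever its outcome. */
void host_release(struct host_table *t, const char *host)
{
  int i;

  assert(t != NULL && host != NULL);
  for (i = 0; i < MAX_HOSTS; i++)
    if (t->slot[i].active > 0 && strcmp(t->slot[i].host, host) == 0) {
      t->slot[i].active--;
      return;
    }
}
```

With a limit of 20 for the Exchange example, the first 20 deliveries would proceed and the rest would stay queued locally until a slot is released, rather than being refused by the remote end and pushed back onto the retry schedule.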

Blindly defending qmail isn't going to make it better. It doesn't
help, except in flame-wars. 

Richard



RE: Fix for qmail-remote process hanging on Linux (and possibly others)

2001-08-06 Thread Richard Underwood

 From: Peter van Dijk [mailto:[EMAIL PROTECTED]]
 On Mon, Aug 06, 2001 at 01:07:36PM +0100, Richard Underwood wrote:
 
  When the exchange server comes back up, I kick the qmail-send
  process to get it to deliver the queue. At this point I should be able to go
  off and do other things.
 
 Why are you kicking qmail-send? That should never be necessary in a
 production environment.
 
Life with qmail, 1.5.8 and E.1. - qmail backs off. When a server
comes back up, I want the queued mail there ASAP. If the exchange server has
been down for more than half an hour, you're looking at unacceptable delays.
(Unacceptable to the company, that is.)

 If this box' only function is relaying to the exchange server, why not
 set concurrencyremote to 20?
 
It's not the only function of the server. Even if it was, that's a
hack. It's a workable solution, but also a hack. I could install another
qmail instance, but then that's worse in my opinion. 

 Have you tried not kicking it at all? qmail has a very efficient retry
 schedule, that doesn't even bog down heavily loaded servers.
 
I've not, for the reason given above. What I described was just an
example of what can happen to illustrate my point. I've seen similar
problems without kicking the queue, but nothing so clear or repeatable. I
was just giving an example of where qmail doesn't act 'perfectly'.

 Lots of patches satisfy needs easily fixed without patches. Lots of
 patches satisfy needs that only a few users have.
 
Oh, I quite agree. Apart from my server where I run virtual domains,
I use qmail out of the box. (Actually, one server now has the keep-alives
patch happily installed.) 

 And if you don't like this behaviour: write a patch (or find one), or
 stop using qmail. Nobody is forcing you to use qmail.
 
Perhaps, but using this list to tell people (often quite forcefully)
that the behaviour they are experiencing is as it should be, and it's the
rest of the world that's broken, and that qmail is perfect already isn't
going to encourage anyone to help. 

If I had the time, I'd write a patch. I wouldn't do it without
discussing it on a mailing list first, though ... Has it been suggested/done
before? Does anyone have any suggestions for better algorithms? What
features would people want?

But I don't think I will. Even suggesting that there was an issue (I
didn't say bug, and I didn't say problem, I said issue) with qmail resulted
in some very abrupt replies, telling me that I was wrong, and qmail was
perfect.

This stifles discussion. The "nobody is forcing you" cliche makes
things worse. I personally think qmail is great, and will always use it -
but all this makes me less willing to contribute - if the (apparent) general
consensus is that people are happy with qmail as it is, then I'll leave you
all in peace. I've got enough servers to split tasks up, a patch would be
good, but I may as well tailor it to my needs and not bother sharing it.

I wonder how many other people have been put off like that?

I think I've been quite reasonable with the messages I've sent. I've
said that I like qmail numerous times. I've said I want to improve it ...
and people have told me it needs no improvement. I simply think that this is
short-sighted.

I'll leave you all alone now.

Richard



RE: Fix for qmail-remote process hanging on Linux (and possibly others)

2001-08-06 Thread Richard Underwood

 From: MarkD [mailto:[EMAIL PROTECTED]]
 
 Has this been discussed before? Yes. Endlessly. Check the
 archives. You are breaking no new ground here at all.
 
I'm sure it has - it's an important issue.

 If you want to be truly helpful, you might want to read the archives
 on this matter and then suggest/do something beyond what has
 already been discussed ad nauseam.
 
I would have, if I had been investigating that problem. I was
looking at a completely different problem at the time. If you look at the
title of this thread, you'll see that it's about qmail-remote processes
hanging on Linux.

The argument was started because I described multiple connections to
a site as an issue. I could have ignored the replies telling me that it
wasn't an issue as flame-bait, but having seen some of the other replies on
this list, I thought it'd be more constructive to explain what I had meant.

I still believe it's an issue. 

Richard



Fix for qmail-remote process hanging on Linux (and possibly others)

2001-08-03 Thread Richard Underwood

Hi,

I asked about qmail-remote processes hanging in read() on this list
a few days ago. It appears that this has been reported before, but no
conclusion seems to have been reached.

The problem appears to be in timeoutread() which uses select() to
prevent read() from blocking. For whatever reason, during heavy load, this
fails and the read() call blocks. The TCP connection stays in the
established state and therefore the process never terminates, leading to a
reduction in the number of available concurrent remote deliveries.
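
For anyone who wants to see the pattern in question, below is a rough sketch in C of a select-then-read helper with one extra guard: the descriptor is put into non-blocking mode, so that read() returns EAGAIN instead of hanging if select() reported readiness that turns out to be stale. This is not qmail's timeoutread(); the name guarded_timeoutread and its return convention are invented for the example, and I have not tested it inside qmail-remote.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not qmail's timeoutread(): wait up to secs seconds
 * for fd to become readable, then read from it.
 * Returns the number of bytes read, 0 on end of file, or -1 with errno
 * set (ETIMEDOUT if nothing arrived in time). */
ssize_t guarded_timeoutread(int secs, int fd, char *buf, size_t len)
{
  fd_set rfds;
  struct timeval tv;
  int flags;
  int n;
  ssize_t r;

  assert(fd >= 0 && fd < FD_SETSIZE);

  /* Non-blocking mode is the guard: read() can no longer block even if
   * select() claims the descriptor is readable when it is not. */
  flags = fcntl(fd, F_GETFL, 0);
  if (flags == -1) return -1;
  if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return -1;

  for (;;) {
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = secs;   /* note: the full timeout restarts on every retry */
    tv.tv_usec = 0;

    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n == -1) {
      if (errno == EINTR) continue;
      return -1;
    }
    if (n == 0) {
      errno = ETIMEDOUT;
      return -1;
    }

    r = read(fd, buf, len);
    if (r == -1 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
      continue;         /* spurious wakeup: go back and wait again */
    return r;
  }
}
```

The trade-off is that the descriptor stays non-blocking afterwards, so any writes on the same socket would need to cope with EAGAIN as well.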

One suggestion (from MarkD) was to set a large-value alarm signal to
terminate the process, which would work (qmail would see the qmail-remote
process as crashed and try it again) but I don't particularly like this
method. For one thing, it could cut off a large message being sent over a
slow connection.
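
As a rough illustration of the alarm approach, the sketch below arms a whole-process deadline in C. The helper name arm_watchdog is invented, and where it would be called in qmail-remote is left open; it only shows the mechanism. When the alarm fires, the default SIGALRM action terminates the process.

```c
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: arm (or re-arm) a deadline for the whole process.
 * If secs seconds pass without another call, SIGALRM is delivered and its
 * default action kills the process.
 * Returns the number of seconds that were left on any previous alarm. */
unsigned int arm_watchdog(unsigned int secs)
{
  assert(secs > 0);
  signal(SIGALRM, SIG_DFL);   /* make sure the default (terminate) action applies */
  return alarm(secs);         /* replaces any alarm that was already pending */
}
```

Called once at startup, this is the hard cut-off that could truncate a slow but healthy delivery. Calling it again after every successful read or write would turn it into an inactivity timer instead, at the cost of touching more of the code.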

Another solution, which I have been trying over the last few days, is
to turn on socket keep-alives. Once a connection has been idle for a fixed
period (about 2 hours by default on Linux), the kernel probes the remote end
and closes the socket if the probes go unanswered. The read() call will then
end as if the remote host dropped the connection and qmail-remote will
terminate normally.
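
In isolation, turning on keep-alives looks roughly like the C sketch below. The helper name enable_keepalive and its idle_secs parameter are made up for the example. The TCP_KEEPIDLE part is optional and not portable (it exists on Linux 2.4 and later, so not on the 2.2 kernels discussed here); without it the system-wide default idle time applies.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable keep-alive probes on a TCP socket.
 * idle_secs > 0 asks for a shorter idle time before the first probe
 * where the platform supports it; pass 0 to keep the system default.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_secs)
{
  int on = 1;

  assert(fd >= 0);

  /* setsockopt() takes a pointer to the option value, hence &on. */
  if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
    return -1;

#ifdef TCP_KEEPIDLE
  /* Optional per-socket tuning; only compiled in where the option exists. */
  if (idle_secs > 0 &&
      setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof idle_secs) == -1)
    return -1;
#else
  (void) idle_secs;           /* no per-socket tuning on this platform */
#endif

  return 0;
}
```

Checking the return value is worth the extra lines: if the option cannot be set, the delivery can still go ahead, but you would at least know the protection is missing.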

It all seems to be working, so if anyone else is having the same
problem, you may like to try this fix too. I've included a patch for
qmail-remote.c - it's not exactly beautiful code, but it works for me.

Good luck,

Richard

*** qmail-1.03/qmail-remote.c   Mon Jun 15 11:53:16 1998
--- qmail-1.03.patched/qmail-remote.c   Fri Aug  3 14:34:27 2001
***************
*** 338,344 ****
    int flagallaliases;
    int flagalias;
    char *relayhost;
!    
    sig_pipeignore();
    if (argc < 4) perm_usage();
    if (chdir(auto_qmail) == -1) temp_chdir();
--- 338,345 ----
    int flagallaliases;
    int flagalias;
    char *relayhost;
!   int s_opt;
! 
    sig_pipeignore();
    if (argc < 4) perm_usage();
    if (chdir(auto_qmail) == -1) temp_chdir();
***************
*** 415,420 ****
--- 416,423 ----
      if (smtpfd == -1) temp_oserr();
      
      if (timeoutconn(smtpfd,&ip.ix[i].ip,(unsigned int) port,timeoutconnect) == 0) {
+       s_opt=1;
+       setsockopt(smtpfd,SOL_SOCKET,SO_KEEPALIVE,&s_opt,sizeof(int));
        tcpto_err(&ip.ix[i].ip,0);
        partner = ip.ix[i].ip;
        smtp(); /* does not return */



RE: Fix for qmail-remote process hanging on Linux (and possibly others)

2001-08-03 Thread Richard Underwood

 I just looked at the server I had problems with -- 15 hung 
 qmail-remotes :(
 
Not good! I peaked at 26 before I noticed.

 How did you test this patch?
 Are you saying that you were able to reliably reproduce the problem?
 I could never do this... If so, how?
 
I tested the patch by running it on the live server for three days.
I was experiencing on average 1-2 processes getting stuck a day and haven't
had one stuck since. The problems generally started during large mailings,
which happen daily on this server.

I couldn't repeat the problem, but it happened reliably enough for
me to believe that it has now been stopped.

The patch itself should not affect the running of the program in any
way except dropping dead connections.

 There is a lot of mystery in this:  Most (but not all) reports 
 had connections hung to outblaze.com
 Most (but not all) servers ran Linux.
 
 It's weird...
 
It is. I didn't spot a pattern in the remote hosts, but then I
didn't try to. I suspect it's something to do with stateful firewalls
dropping a session after a period of inactivity, but that doesn't explain why
the code is affected by it at all.

My other suspicion is that there's a chance that my one server will
try a couple of dozen connections to the same remote host at the same time.
(This is an issue in itself!) It could be that a firewall in the path is
mistaking the connection as a DOS attempt and responding weirdly, kicking
off a bug with select.

I'll let you know if the problems re-appear.

Richard



RE: Problems with qmail-remote hanging

2001-07-31 Thread Richard Underwood

 This problem's been reported before. If your OS says that an fd is
 readable via select(), then the read() should not block.
 
 As you observe though, the read is blocking so your OS is probably not
 telling the truth when it returns from the select().
 
 The archives have plenty of discussion on this and the simplest
 solution is to put a large-value alarm() handler in qmail-remote. No
 one as yet seems to be able to narrow down which OSes do this and
 under what circumstances.

Mark,

Thanks for the reply. I only seem to experience the problem with
large mail-outs. One possibility is that because of the way qmail works,
there's a significant chance that we will be making a large number of
simultaneous connections to some servers.

It's possible that this is causing a connection to be blackholed
somewhere ... that doesn't explain why select/read are failing to agree,
though. Perhaps select thinks the connection is closed, but read doesn't.

Setting an alarm is a nasty hack in my opinion, but I have to admit
that it's something I considered. A slightly neater solution might be to use
the SO_KEEPALIVE socket option - if it works (and there isn't a good reason
not to use it) that is.

What would be better is finding out why this happens, of course.

Thanks,

Richard

P.S. If anyone is keeping track, Linux 2.2.19, concurrencyremote set to 200



Problems with qmail-remote hanging

2001-07-30 Thread Richard Underwood

Hi,

I've been running qmail on a number of platforms quite happily for a
while - until now I've had no problems at all. However, I am now
experiencing a problem with qmail-remote hanging.

I'm running qmail on this server for sending mails from websites and
bulk mail-outs (up to about 40,000 recipients.) The server doesn't receive
mails itself to a great extent.

It's a dual-cpu Dell running Linux. I have another very similar
installation which has absolutely no problems. Qmail on this server is 100%
standard Qmail 1.03.

The problem I see is with qmail-remote failing to terminate when a
connection times out. If left alone, the number of stuck processes will
slowly climb; after about a month I had about 25 such processes. The network
connections remain in the ESTABLISHED state.

Looking at the process list right now, I have one stuck:

# ps -ef | grep qmail-remote
qmailr   12278   662  0 13:13 ?        00:00:00 qmail-remote xx.co.uk xx
qmailr   19876   662  0 16:09 ?        00:00:00 qmail-remote xx.com
root     19912 19489  0 16:10 pts/0    00:00:00 grep qmail-remote

# strace -p 12278
read(3,  <unfinished ...>

... all socket read()s in qmail-remote should be protected by a
select and therefore should not block as this one is doing now. After
recompiling with debugging and symbols, I get ...

# gdb qmail-remote 12278
GNU gdb 5.0
Attaching to program: /home/qmail/bin/qmail-remote, Pid 12278
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
0x40103424 in __libc_read () from /lib/libc.so.6
(gdb) where
#0  0x40103424 in __libc_read () from /lib/libc.so.6
#1  0x3b654f80 in ?? ()
#2  0x8048f05 in saferead (fd=-1, buf=0x8051180 , len=128)
    at qmail-remote.c:113
#3  0x804d193 in oneread (op=0x8048ee8 <saferead>, fd=-1, buf=0x8051180 ,
    len=128) at substdi.c:14
#4  0x804d25e in substdio_feed (s=0x804f3d0) at substdi.c:44
#5  0x804d3ab in substdio_get (s=0x804f3d0, buf=0xbdc7 , len=1)
    at substdi.c:75
#6  0x8048f70 in get (ch=0xbdc7 ) at qmail-remote.c:137
#7  0x8048fda in smtpcode () at qmail-remote.c:150
#8  0x80492cb in smtp () at qmail-remote.c:225
#9  0x8049d31 in main (argc=4, argv=0xbe94) at qmail-remote.c:420
#10 0x4004bf31 in __libc_start_main (main=0x804987c <main>, argc=4,
    ubp_av=0xbe94, init=0x804878c <_init>, fini=0x804dd10 <_fini>,
    rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbe8c)
    at ../sysdeps/generic/libc-start.c:129

... in smtp() ...

220 {
221   unsigned long code;
222   int flagbother;
223   int i;
224  
225 =>if (smtpcode() != 220) quit("ZConnected to "," but greeting failed");
226  
227   substdio_puts(&smtpto,"HELO ");
228   substdio_put(&smtpto,helohost.s,helohost.len);
229   substdio_puts(&smtpto,"\r\n");

saferead() calls timeoutread() which calls select() and then read().
fd=-1 is a red herring; it's not used by saferead in qmail-remote.

Can anyone explain this, or has anyone experienced anything similar?

Thanks,

Richard