Re: Fix for qmail-remote process hanging on Linux (and possibly other s)

2001-08-03 Thread Yevgeniy Miretskiy

On Fri, Aug 03, 2001 at 03:07:57PM +0100, Richard Underwood wrote:
 Hi,
 
   I asked about qmail-remote processes hanging in read() on this list
 a few days ago. It appears that this has been reported before, but no
 conclusion seems to have been reached.

I just looked at the server I had problems with -- 15 hung qmail-remotes :(

snip
   Another solution, which I have been trying over the last few days, is
 to turn on socket keep-alives. This makes the kernel probe the connection
 once no data has been exchanged for a fixed period (usually 2 or 3 hours),
 and close it if the remote host does not answer. The read() call then fails
 as if the remote host had dropped the connection, and qmail-remote
 terminates normally.
 
   It all seems to be working, so if anyone else is having the same
 problem, you may like to try this fix too. I've included a patch for
 qmail-remote.c - it's not exactly beautiful code, but it works for me.
 
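A minimal sketch of the keep-alive idea in C, assuming it is applied to the socket descriptor once the connection is established. This is not Richard's patch (which is not reproduced in this thread); `enable_keepalive()` is a hypothetical helper name. SO_KEEPALIVE is the standard socket option; the optional TCP_KEEPIDLE override is Linux-specific, and its 600-second value is an arbitrary assumption used to shorten the idle time before probing starts.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keep-alive probes for fd.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;

    /* Ask the kernel to probe the peer after the connection has been idle. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;

#ifdef TCP_KEEPIDLE
    /* Linux-specific and optional: start probing after 600 seconds of
     * idleness instead of the system default (assumed value). */
    int idle = 600;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) == -1)
        return -1;
#endif

    return 0;
}
```

With this in place, a read() blocked on a connection whose peer has silently vanished should eventually return an error instead of sleeping forever, but only after the idle period plus the kernel's probe retries have elapsed.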

How did you test this patch?
Were you able to reproduce the problem reliably? If so, how?
I never could.

There is a lot of mystery in this: most (but not all) reports
involved connections hung to outblaze.com, and most (but not all)
servers ran Linux.

It's weird...




Re: qmail-remote (cry wolf?)

2001-06-09 Thread Yevgeniy Miretskiy

On Sat, Jun 09, 2001 at 06:32:55AM -0400, Troy Settle wrote:
 Yes, I've had qmail-remote processes sit there for weeks.  I think that
 instead of killing them off wholesale, I'll pick one or two processes and
 see just how long they'll hang around. I'll post weekly updates if there's
 any interest.

Here is what I have on one of our mail servers (output of ps -waux | grep qmail-remote;
real email addresses removed, domain names kept. For readability, I kept only the user,
PID, state, start date, and program name):
 
qmailr 7365   S May19 qmail-remote iname.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 14602  S May19 qmail-remote mail.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 25415  S May19 qmail-remote careful.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 25875  S May19 qmail-remote programmer.net [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 25902  S May19 qmail-remote mail.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 852    S May19 qmail-remote mail.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 20283  S May25 qmail-remote ziplip.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 29814  S May18 qmail-remote mail.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 25877  S May19 qmail-remote mail.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 25145  S May19 qmail-remote mail.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 27208  S Jun08 qmail-remote hp.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 27070  S Jun08 qmail-remote mail.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 11525  S Jun08 qmail-remote best-service.com [EMAIL PROTECTED] [EMAIL PROTECTED]
qmailr 13766  S Jun08 qmail-remote mad.scientist.com [EMAIL PROTECTED] [EMAIL PROTECTED]

As you can see, processes running since May 19th cannot possibly be explained by
slow delivery: over 20 days is far too long.
The following domains go through outblaze.com mail servers:
  iname.com
  mail.com
  careful.com
  programmer.net
  best-service.com
  mad.scientist.com

The following domains do not go through outblaze:
  ziplip.com
  hp.com

Unfortunately, I cannot explain this situation by blaming everything on outblaze.



-- 
  Eugene Miretskiy [EMAIL PROTECTED]
  InVision.com, INC.  (631) 543-1000
  www.invision.net  /  www.longisland.com 



Re: qmail-remote (cry wolf?)

2001-06-08 Thread Yevgeniy Miretskiy

One more time,

I ran tcpdump and strace on a stuck qmail-remote for over an hour.
strace shows that qmail-remote is blocked in 'read(3', and tcpdump shows
that nothing comes in.

On Fri, Jun 08, 2001 at 03:13:54PM +, Mark wrote:
  processed those 1500 messages in less than 30 minutes.  However, it left
  behind another handful of stuck qmail-remote processes.  Other messages
  were undeliverable and left in the queue, and still others were sent back to
  sender with permanent errors.
 
 What do you mean by stuck? Do you mean they *never* go away - even
 after a day or two? As others have pointed out, a slow delivery can
 take a long, long time. That's not necessarily a problem, that's just
 the way it is.
 
 To find out a bit more about what a stuck qmail-remote is doing, you
 may want to ktrace it and show us the output. Find the process id of the
 stuck qmail-remote and then as root go: ktrace -p thepid
 
 Leave that running for at least an hour and show us the output. Yes, I
 mean at least an hour.
 
 
 Regards.
 




Re: qmail-remote (cry wolf?)

2001-06-08 Thread Yevgeniy Miretskiy

On Fri, Jun 08, 2001 at 09:47:16PM +, Mark wrote:
 Then it's an OS bug.
 
 qmail-remote only gets to the read() if the OS (via select() ) says
 that the read will not block. Ergo, the OS is lying.

If it's an OS bug, has anybody heard of such a severe network-related
bug in Red Hat 6.2?

What about FreeBSD 4.2? (I believe somebody reported the problem with
FreeBSD as well.)

What are the chances of _such_ a bug in _both_ OSes?
I should mention that I ran qmail on FreeBSD (from 3.x all the way
to the latest release) for a couple of years and _never_ observed this
behaviour on FreeBSD.

Is it possible that some external device, such as a switch, router,
or firewall, could be causing this problem?





qmail-remote hangs

2001-06-07 Thread Yevgeniy Miretskiy

Hello,

I have encountered a very weird problem with qmail-remote hanging
indefinitely (for over 2 weeks).  Here are the details:

qmail-remote's connection stays in the ESTABLISHED state forever.
strace shows that qmail-remote hangs in 'read(3,'.

Most of the hung qmail-remotes are trying to deliver to outblaze.com
(or to domains using outblaze.com mail servers, such as mail.com).
There were a couple of posts to the list
(http://www.ornl.gov/its/archives/mailing-lists/qmail/2001/05/msg00558.html, and
 http://www.ornl.gov/its/archives/mailing-lists/qmail/2001/05/msg01333.html)
describing a similar problem (also with outblaze).  However, I had 2 qmail-remotes
get stuck on other domains.

I observed this behaviour on 2 mail servers (same OS and setup).

The OS is Red Hat 6.2.

I'm at a complete loss and have no idea how qmail-remote can get stuck
reading from fd 3.  Looking at qmail-remote.c, qmail-remote calls
timeoutread() to perform network reads, and timeoutread() calls select()
before calling read().  So the only way to reach read() is for select()
to return without timing out, indicating that the descriptor is readable.
How, then, can 'read(3' get stuck?
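To make the reasoning concrete, here is a minimal sketch of the select()-then-read() pattern described above. It is modeled on the behaviour attributed to timeoutread() in this thread, not copied from the qmail source; `sketch_timeoutread()` is a hypothetical name, and the timeout `t` is assumed to be in seconds.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to t seconds for fd to become readable,
 * then read. Returns read()'s result (byte count, or 0 at end of file),
 * or -1 on error; on timeout returns -1 with errno set to ETIMEDOUT. */
ssize_t sketch_timeoutread(int t, int fd, char *buf, size_t len)
{
    fd_set rfds;
    struct timeval tv;

    tv.tv_sec = t;
    tv.tv_usec = 0;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);

    if (select(fd + 1, &rfds, NULL, NULL, &tv) == -1)
        return -1;              /* select() itself failed; errno is set */

    if (FD_ISSET(fd, &rfds))
        /* select() reported fd readable, so this read() should not block:
         * it returns data, 0 at end of file, or an error. */
        return read(fd, buf, len);

    errno = ETIMEDOUT;          /* fd did not become readable within t seconds */
    return -1;
}
```

Under this pattern a read() that blocks indefinitely should be impossible, which is exactly why a process sitting in 'read(3' for weeks points at the kernel reporting readiness that was not really there.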

Furthermore, when I kill one of the stuck qmail-remotes, qmail tries to redeliver
the message, and either succeeds or times out on the remote host without a problem.

ANY suggestions on what can be done to troubleshoot this problem are very much 
appreciated.
