> From:  Chris Garrigues <[EMAIL PROTECTED]>
> Date:  Fri, 31 Aug 2007 13:09:52 -0500
>
> > From:  Charlie Brady <[EMAIL PROTECTED]>
> > Date:  Fri, 31 Aug 2007 13:49:02 -0400 (EDT)
> >
> > 
> > On Fri, 31 Aug 2007, Chris Garrigues wrote:
> > 
> > >> From:  Chris Garrigues <[EMAIL PROTECTED]>
> > >> Date:  Wed, 29 Aug 2007 09:27:42 -0500
> > ...
> > >> Any idea what's going on here?  It requires a -9 to kill the processes.
> > ...
> > > and then it hangs forever and requires a -9.
> > 
> > Are you quite sure of that? What happens if you use a TERM or QUIT signal? 
> > Have you attached strace (or whatever syscall tracing tool is appropriate 
> > for your platform)?
> 
> I just confirmed it.  If I strace I just see that it's hanging on "read(0, ".
> 
> Since I hadn't diagnosed what was triggering it until just now, I didn't know 
> how to provide a test case.  Now that I do, I'll try to get a real strace.
> 
> > > Am I doing the wrong thing, is this a bug, or is there something odd 
> > > about my
> > > system?
> > 
> > Last time I looked the qpsmtpd timeout alarm only applied while parsing 
> > SMTP or while receiving messages, but not while plugins were executing. I 
> > haven't seen any discussion about possible fixes for that (but I haven't 
> > checked that it hasn't been fixed). That could explain qpsmtpd waiting 
> > forever, but wouldn't explain failure to terminate on TERM and QUIT 
> > signals.
> 
> Note that it's no longer in my plugin at this point.

Okay, I ran strace while telnetting to the SMTP port.  If I type "quit" 
after I get the 550, it does the right thing, but if I just leave the 
connection idle, it never times out.  According to the strace:

write(2, "21349 Plugin tarpit, hook deny r"..., 51) = 51
write(2, "21349 550 No such user as utterl"..., 60) = 60
write(1, "550 No such user as utterlybogus"..., 55) = 55
alarm(120)                              = 0
read(0, 0x8c43798, 4096)                = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
read(0, 

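The read following the alarm(120) does indeed wait for 120 seconds and 
SIGALRM is delivered, but then the signal is blocked and unblocked again, 
and the read is simply restarted with no new alarm set....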
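
For anyone following along, here is a minimal sketch of the alarm-then-read 
pattern the trace shows, written in Python rather than the Perl that qpsmtpd 
uses.  It is not qpsmtpd's code; the names read_with_timeout and ReadTimeout 
are made up for illustration.  The point is that the timeout only works if 
the SIGALRM handler actually aborts the read; if the handler just returns, 
the read gets restarted, which looks like what happened above.

```python
import os
import signal


class ReadTimeout(Exception):
    """Hypothetical exception raised by the SIGALRM handler to abort a read."""


def _on_alarm(signum, frame):
    # Raising here is what turns the alarm into a timeout.  A handler that
    # merely returns lets the interrupted read() be restarted.
    raise ReadTimeout()


def read_with_timeout(fd, seconds, nbytes=4096):
    """Return bytes read from fd, or None if nothing arrives within seconds."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)          # same idea as alarm(120) in the trace
    try:
        return os.read(fd, nbytes)
    except ReadTimeout:
        return None
    finally:
        signal.alarm(0)            # cancel any alarm still pending
        signal.signal(signal.SIGALRM, previous)
```

With an idle pipe standing in for the idle telnet session, 
read_with_timeout(fd, 1) gives up after about a second instead of waiting 
forever.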

...and I've just concluded that the problem must be in a library of my own 
called by my code.  It's code that I wrote years ago...I'll have to figure out 
why I blocked the signals that I did when I did.
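
I don't yet know exactly what my library does, so the following is only a 
sketch of the general failure mode, again in Python, with a made-up function 
name (blocked_alarm_demo).  It assumes a POSIX system.  While SIGALRM is 
blocked in the signal mask, the alarm still expires, but the signal just 
sits pending and the handler never runs until the mask is restored.

```python
import signal
import time


def blocked_alarm_demo(seconds=1):
    """Return (pending, ran_while_blocked, ran_after_unblock) for SIGALRM."""
    fired = []
    previous = signal.signal(signal.SIGALRM, lambda s, f: fired.append(s))
    # Block SIGALRM, as a library might do around some critical section.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGALRM})
    try:
        signal.alarm(seconds)
        time.sleep(seconds + 0.5)  # the alarm expires during this sleep
        pending = signal.SIGALRM in signal.sigpending()
        ran_while_blocked = bool(fired)
    finally:
        # Restoring the old mask lets the pending signal be delivered.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    ran_after_unblock = bool(fired)
    signal.signal(signal.SIGALRM, previous)
    return pending, ran_while_blocked, ran_after_unblock
```

If code like this forgets to restore the mask, every alarm-based timeout set 
afterwards in the same process is silently defeated.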

Thanks for helping me find the problem in my own code.

Chris

-- 
Chris Garrigues                         Trinsic Solutions
President                               710-B West 14th Street
                                        Austin, TX  78701-1798
http://www.trinsics.com/blog
http://www.trinsics.com                 512-322-0180

                 Would you rather proactively pay for
                uptime or reactively pay for downtime?

                          Trinsic Solutions
                Your Trusted Friends in Proactive IT.

