You may want to double check any soft_timeout()s.  If any of them go off,
there will be no further timeouts around and the code is expected to
properly check connection->aborted before doing anything.  A quick glance
doesn't show any such obvious things, but it is possible.  Temporarily
replacing all soft_timeout()s with hard_timeout()s would tell if this is
the problem, although it could break some things; haven't really looked at
the code... 

On Sun, 13 Apr 1997, Brian Moore wrote:

> Nope, it's a http transfer (but a large one).  Not sure what it is, though:
> it seems to be that alarm(0) is getting called [which in my looking at the
> code is a bad thing to do] somewhere.  The client request on this is on the
> other side of a packet filtering router, but at 10mbps, so it shouldn't be
> a client timeout.
> 
> Since it got through the flush-the-buffer stuff in saferead(), I think it's
> not the speed, just the dropped alarm.  See I printed out the value of
> alarms_blocked, which in theory should mean it's not blocked. :)  I've left
> a couple of these children running (there were three transferring files
> via http from www.cdrom.com.  The specific URLs (though I doubt it matters)
> GET http://www.cdrom.com/pub/quake/quakec/weapons/mini20.zip
> and 
> GET http://www.cdrom.com/pub/quake/quakec/weapons/pnc1_02b.zip
> I killed the third demon-child -- those two are still running.
> 
> Since we have about 50 machines on the far side of this router using illegal
> IP's, it may be hard to spot: they do a good amount of web/ftp access and
> it all runs through Apache, so it's a rare occurence.
> 
> When tracking down the missing unblocks, I did insert some code to
> whine... something like:
> 
>     alarm_save = alarm(0);
>     if (!alarm_save )
>         syslog(LOG_DAEMON | LOG_EMERG, "saferead, no alarm! %p", getpid());
>     else
>         alarm(alarm_save);
> 
> When that code was in place in 1.2b7 I did see a bunch of times saferead
> was getting called with no alarm, which shouldn't happen (though I will
> confess I don't know what alarm() returns if say 1/2 a second is remaining
> on Solaris).
> 
> I'll see if I can find who was downloading quake this week and if they did
> anything like abort it or anything.
> 
> On Sun, 13 Apr 1997, Chuck Murcko wrote:
> 
> > Thanks for the report, Brian. It looks like a large file transfer is
> > indeed punching through a soft timeout. I assume these are FTP
> > transfers? I can duplicate your environment, so I should see the problem
> > when I test for it.
> > 
> > Brian Moore wrote:
> > > 
> > > >Number:         374
> > > >Category:       mod_proxy
> > > >Synopsis:       mod_proxy(?) seems to alarm(0) somewhere
> > > >Confidential:   no
> > > >Severity:       serious
> > > >Priority:       medium
> > > >Responsible:    apache (Apache HTTP Project)
> > > >State:          open
> > > >Class:          sw-bug
> > > >Submitter-Id:   apache
> > > >Arrival-Date:   Sun Apr 13 12:00:01 1997
> > > >Originator:     [EMAIL PROTECTED]
> > > >Organization:
> > > apache
> > > >Release:        1.2b8
> > > >Environment:
> > > Solaris 2.5, all recommended patches, gcc 2.7.2
> > > >Description:
> > > Looks like there's one other problem in mod_proxy with alarms being 
> > > turned off
> > > (not blocked via the block_alarms() call, but alarm(0)'d for some 
> > > reason).  I'm
> > > guessing on the module involved, since the three dead children this 
> > > morning
> > > were all doing proxy stuff.
> > > 
> > > The backtrace of a child that's been waiting for 110k seconds:
> > > #0  0xef67792c in _read ()
> > > #1  0x29364 in saferead ()
> > > #2  0x29480 in bread ()
> > > #3  0x488b0 in proxy_send_fb ()
> > > #4  0x47e78 in proxy_http_handler ()
> > > #5  0x432c0 in proxy_handler ()
> > > #6  0x1f040 in invoke_handler ()
> > > #7  0x21dc0 in process_request_internal ()
> > > #8  0x21df4 in process_request ()
> > > #9  0x1bf30 in child_main ()
> > > #10 0x1c0cc in make_child ()
> > > #11 0x1c8c8 in standalone_main ()
> > > #12 0x1cb88 in main ()
> > > (gdb) up
> > > #1  0x29364 in saferead ()
> > > (gdb) print alarms_blocked
> > > $1 = 0
> > > 
> > > So this seems to be something calling alarm(0) somewhere instead of a 
> > > 'logical'
> > > alarms-off via the official mechanism.
> > > 
> > > >How-To-Repeat:
> > > Not sure: virtually all of our proxy users are on a 10Mbps ethernet but 
> > > behind
> > > a firewall.  This usage may or may not be relevant.  The children I found 
> > > dead
> > > this morning were fetching files from cdrom.com via http, so it should be 
> > > normal
> > > the only odd thing is that these were quake files so they were no doubt 
> > > huge.
> > > >Fix:
> > > Will be looking at the code myself this week
> > > >Audit-Trail:
> > > >Unformatted:
> > 
> > -- 
> > chuck
> > Chuck Murcko
> > The Topsail Group, West Chester PA USA
> > [EMAIL PROTECTED]
> > 
> > 
> 

Reply via email to