DO NOT REPLY [Bug 10266] - apache hangs after some hours of running

bugzilla 15 Aug 2002 15:13:08 -0000

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10266>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


------- Additional Comments From [EMAIL PROTECTED]  2002-08-15 15:13 -------
Another clue to the problem:

After I split the system into 5 separate Apache instances, we had no hangs for
2 weeks. Then on Saturday (as I previously reported) one of the servers hung
once. There were no problems the rest of the weekend, and then this last
Monday morning all he** broke loose.

On Monday morning the system hung and then basically went nuts. We would kill
and restart it, and depending on the restart it would hang again within 15
seconds, 15 minutes, or about 1 hour. It kept hanging throughout the day, and
the longest it stayed up was 1 hour. Not only did the Apache instance we had
suspected hang, but so did the other instance we had split off, which had not
hung for 2 weeks.

By Monday night (10:30 PM Hawaii time, so pretty late) it had stopped hanging.

So what happened on Monday? I posted a URL to a slashdot.org column, and we got
a lot of hits because of it. I strongly suspect that the number of hits
contributed to the hang. The INTERESTING thing is that of the two Apache
instances that were hanging, one hosted the domain being slashdotted and the
other didn't. (By the way, the server was able to serve up the pages with no
problem; it was just a lot of hits, with no major load or anything.)

So this leads me to believe that the problem is related to traffic. It may be
related to the restarting of a child after it has served its maximum number of
requests (MaxRequestsPerChild).

I also discovered that two of my earlier reports were wrong:

1) I reported earlier that HUPing the hung server did no good. This is not
true: HUPing it appears to work. It takes up to a minute to free up, and
sometimes it needs a second HUP before it frees up. ONLY OCCASIONALLY did it
still not free up after two HUP attempts, and then we had to kill and restart
it.

2) I reported earlier that my remote alarms that try to detect the hang would
also hang on the open and not recover. While this is true, it was because I
used signal() instead of sigaction(). signal() automatically sets the restart
flag, which tells the socket calls to retry the operation after the signal, so
the probe only *appeared* to be stuck forever. So I am able to detect the hang
(we've since written a program that detects it on the server and automatically
re-HUPs or restarts the server depending on what it sees).
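To illustrate the signal()/sigaction() point in item 2: with glibc, signal() installs a handler with BSD semantics, which include SA_RESTART. An alarm that interrupts a blocked socket call is therefore swallowed and the call is silently restarted, so a probe looks stuck forever. Installing the SIGALRM handler with sigaction() and sa_flags left at 0 lets the blocked call fail with EINTR instead.

The following is a minimal sketch, not the detection program mentioned in the report. The function name probe_http, the HEAD request, and the IPv4-only address handling are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef MSG_NOSIGNAL
#define MSG_NOSIGNAL 0 /* not available here: SIGPIPE must be ignored elsewhere */
#endif

/* Does nothing; its only job is to interrupt a blocked system call. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Hypothetical probe: returns 1 if ip:port accepts a TCP connection AND
 * sends back at least one byte in reply to a HEAD request within
 * `seconds`; returns 0 if refused, failed or timed out. */
int probe_http(const char *ip, unsigned short port, unsigned int seconds)
{
    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* no SA_RESTART: a blocked call returns -1/EINTR */
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return 0;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);

    int ok = 0;
    int fd = -1;
    if (inet_pton(AF_INET, ip, &addr.sin_addr) == 1)
        fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd >= 0) {
        static const char req[] = "HEAD / HTTP/1.0\r\n\r\n";
        char buf[64];
        if (seconds == 0)
            seconds = 1; /* alarm(0) would mean "no deadline at all" */
        alarm(seconds);  /* one deadline covers connect, send and recv */
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0
            && send(fd, req, sizeof req - 1, MSG_NOSIGNAL) == (ssize_t)(sizeof req - 1)
            && recv(fd, buf, sizeof buf, 0) > 0)
            ok = 1;
        alarm(0);        /* cancel any pending alarm */
        close(fd);
    }
    sigaction(SIGALRM, &old_sa, NULL); /* restore the previous handler */
    return ok;
}
```

For example, probe_http("127.0.0.1", 80, 10) returns 0 if the server does not answer within ten seconds. A single alarm() has a small race, because the alarm could fire just before a blocking call starts. That is usually acceptable for a watchdog that simply probes again.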


So all of this leads me to believe it's the rollover (restart) of the
children. Note that if one hangs, all other virtual hosts on the same server
also hang (i.e., no virtual host assigned to the stuck server will respond
until we re-HUP it).
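"Re-HUPing" means sending SIGHUP to the parent httpd process, whose PID Apache writes to the file named by its PidFile directive. The parent then kills its children, rereads its configuration, and starts a new set of children. This is consistent with every virtual host on that instance recovering at once.

Below is a minimal sketch of that step. The helper name hup_httpd is hypothetical, and the pid-file path is site-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: reads the parent httpd PID from `pidfile` (the file
 * named by Apache's PidFile directive) and sends it SIGHUP, which makes the
 * parent restart all of its children. Returns 0 on success, -1 on failure. */
int hup_httpd(const char *pidfile)
{
    FILE *f = fopen(pidfile, "r");
    if (f == NULL)
        return -1;

    long pid = 0;
    int got = fscanf(f, "%ld", &pid);
    fclose(f);

    /* Refuse empty or garbage files, and never signal init or pid 0. */
    if (got != 1 || pid <= 1)
        return -1;

    return kill((pid_t)pid, SIGHUP) == 0 ? 0 : -1;
}
```

With five separate Apache instances, each has its own PidFile. A watchdog would call this once for each instance that fails its probe, for example hup_httpd("/usr/local/apache2/logs/httpd.pid") if the default source-install layout is assumed.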

The only other possibility, I think, is some type of exploit that hangs Apache
in this way, but I think that is unlikely.

One last thing: when we were having the problems on Monday, I tried to roll
Apache back to httpd-2.0.36, but the same problem occurred, so I brought it
back up to httpd-2.0.40. (So this problem appears to be present in all
versions since and including 2.0.36.)
