http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10266

apache hangs after some hours of running

------- Additional Comments From [EMAIL PROTECTED] 2002-08-15 15:13 -------

Another clue to the problem: after I split the system into 5 separate apaches, we had no hangs for 2 weeks. Then, on Saturday (as I previously reported), one of the servers hung once. There were no problems for the rest of the weekend, and then this last Monday morning all he** broke loose.

On Monday morning the system hung, and then basically went nuts. We would kill and restart it, and it would hang again within 15 seconds, 15 minutes, or about 1 hour, depending on the restart. It kept hanging throughout the day; the longest it went without hanging was 1 hour. Not only did the apache we had suspected hang, but so did the other apache we had split off, which had not hung in 2 weeks. By Monday night (10:30 PM Hawaii time, so pretty late) it was back to not hanging.

So what happened on Monday? I posted a URL in a slashdot.org column, and we got a lot of hits because of it. I strongly suspect that the number of hits contributed to the hang. The INTERESTING thing is that of the two apaches that were hanging, one served the domain being slashdotted and the other didn't (by the way, the server was able to serve up the pages with no problem... it was just a lot of hits, not a major load or anything). So... this leads me to believe that the problem is related to traffic. It is possible that it is related to the restarting of a child after it reaches its maximum number of requests (the MaxRequestsPerChild limit).

I also discovered that two of my earlier reports were untrue:

1) I reported earlier that HUPing the hung server did no good. This is not true: HUPing it appears to work. It takes up to a minute to free up, and sometimes requires a second HUP before it frees up. Only OCCASIONALLY, after two HUP attempts, would it fail to free up, so that we had to kill and restart it.

2) I reported earlier that my remote alarms that try to sense the hang would also hang on the open and not recover.
While this is true, it was due to my using signal() instead of sigaction(). signal() automatically sets the restart flag (SA_RESTART on glibc and BSD-style systems), which tells the socket calls to retry the operation after the signal is handled, so the probe only *appeared* to be permanently stuck. So... I am able to sense it after all (we have since written a program that senses it on the server and automatically re-HUPs or restarts the server, depending on what it sees).

So... all of this leads me to believe it's the rollover (restart) of the children. Note that if one virtual host hangs, all the other virtual hosts on the same server also hang (that is, no virtual host assigned to the stuck server will respond until we re-HUP it). The only other possibility, I think, is some kind of exploit that hangs apache in this way... but I think that is a remote possibility.

One last thing... when we were having the problems on Monday, I tried to roll apache back to httpd-2.0.36, but the same problem occurred, so I brought it back to httpd-2.0.40. (So this problem appears to be in all versions SINCE and INCLUDING 2.0.36.)
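The signal() versus sigaction() point in item 2 is what decides whether an alarm-based probe can ever time out. Below is a minimal sketch in C, assuming a POSIX system: the SIGALRM handler is installed with sigaction() and without SA_RESTART, so a read() (or connect()) blocked on a hung server fails with EINTR when the alarm fires instead of being silently restarted. The names `read_with_timeout()` and `next_action()` are hypothetical helpers for illustration, not part of Apache and not the poster's actual monitoring program; `next_action()` simply encodes the "HUP, HUP again, then kill/restart" sequence from item 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: its only job is to make SIGALRM interrupt a blocked call. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Install the SIGALRM handler with sigaction() and WITHOUT SA_RESTART, so
 * a blocked read() or connect() fails with EINTR instead of being
 * restarted transparently (which is what signal() arranges on glibc and
 * BSD-style systems). */
static int install_alarm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* deliberately no SA_RESTART */
    return sigaction(SIGALRM, &sa, NULL);
}

/* Hypothetical probe helper: read from fd, giving up after timeout_sec
 * seconds. Returns the byte count, or -1 with errno == EINTR on timeout. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len,
                                 unsigned timeout_sec)
{
    assert(timeout_sec > 0);
    if (install_alarm_handler() != 0)
        return -1;
    alarm(timeout_sec);
    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;
    alarm(0); /* cancel the alarm if read() finished first */
    errno = saved_errno;
    return n;
}

/* Hypothetical escalation policy mirroring the report: HUP the parent,
 * HUP it a second time, and only then kill and restart it. */
enum action { ACT_NONE, ACT_HUP, ACT_RESTART };

static enum action next_action(int consecutive_hangs)
{
    if (consecutive_hangs <= 0)
        return ACT_NONE;
    if (consecutive_hangs <= 2)
        return ACT_HUP; /* caller would do kill(parent_pid, SIGHUP) */
    return ACT_RESTART;
}
```

In a watchdog, each timed-out probe would increment a hang counter and `next_action()` would pick the response. One caveat with this alarm()-based approach: if SIGALRM arrives before read() actually blocks, the call is not interrupted, so a more robust probe would likely use poll() or a non-blocking connect() with an explicit timeout.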