On Sat, Sep 12, 2009 at 1:47 AM, Julian Elischer <jul...@elischer.org> wrote:
> ok now we need to describe the hang.. if you can predictably get a hang
> every 7 seconds does this mean that it doesn't respond to keyboard for a
> moment every 7 seconds?

It's possible.

> or that it doesn't accept packets every 7 seconds?

It appears that it accepts and responds to at least pings; I was able to run a ping every 0.1 seconds through a bevy of 300-1900ms stalls with:

2323 packets transmitted, 2323 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.120/1.019/5.979/0.288 ms

As best as I could tell, schedgraph also showed that the clock interrupt and the em0 interrupt always got serviced on time. It pretty much seems like it's userspace that's getting put on hold.

> Or is it just the apache process that hangs?

This is where I started from. In the original post (way long ago now), I described how pretty much every process on the system went into the kernel for something and stalled there, and then when the stall ended, they all unblocked at once. I posted some examples via ktrace that I sadly no longer have the source data for.

> Does the watching process that you refer to below also hang?

I don't think I can say for sure. If I have it show every request, I observe visual stalls in the output from time to time even where no stall is reported, which could indicate either that a stall occurred outside the request or that my shoddy Internet connection has 100ms latency and consistent 1% packet loss, which it does.

I did write a short C program that just select()s on stdin for 100ms over and over and aborts if it takes more than 125ms to go through the loop; it never aborts, even through 1s+ stalls, and the loop times it reports are consistently 110ms regardless of what else is going on, which I don't think is unexpected. However, I'm not sure why that differs from the behavior of the "master" Apache processes, which select() for 1 second all day long, but do appear to be affected. Maybe because they are selecting on a network socket instead of a tty? I don't know.

Also, if I disable NTP, the system does not appear to lose time during the stalls, which fits with the consistent clock interrupts I saw.

> would it hang if it tried to access the disk?

By using the md device, I believe I have removed the disk from the equation; neither process is accessing it. Even without doing that, if I leave iostat -w 1 running alongside the test, there's no correlation between the tiny amount of disk activity there is and the observed stalls.

> if the watching process is on the same machine, does it only trigger AFTER
> the request has taken a long time or could it time out with a select DURING
> the delayed response? (another way of asking "how hung is 'hung'?")

It's just a PHP script using libcurl to request the file. I only moved it to the same machine so that it could write the sysctl to stop the KTR traces I was doing.

If you're asking whether the check script could be modified to time out after, say, 1 second, and if so, whether it would return during the hang or after it: I don't know. My guess based on the earlier ktrace output is that it would time out, but not return until the hang ended. I'll see if the curl lib exposes a configurable timeout and try it.