When I came in to work today, top on daedalus showed all three load
averages above 27. vmstat -w 5 showed the number of processes in the
run queue jumping to over 350 fairly frequently, and there's some
evidence of spikes in CPU usage. So I bounced us back to httpd
2_0_28, and the bad behavior has definitely gotten better.
Either we're doing something to make more processes runnable at times,
or httpd-related processes are sometimes staying in the run queue longer
than they used to. I think it's the latter. For one thing, idle CPU
drops to zero periodically, which I've never seen with 2_0_28.
Jeff & I discussed how to troubleshoot this baby. Some thoughts:
* compare trusses and look for abnormalities
* scrutinize the error logs
* calculate and log how much CPU time each request takes, the thought
being that some requests are burning a lot more CPU than they used to.
Any other ideas?
If this isn't a showstopper, it's close.
Greg
-----------------------------------------------------------------------------
(with 2.0.30-dev)
[gregames@daedalus gregames]$ top
last pid:  3248;  load averages: 39.32, 32.01, 27.17  up 4+16:31:26  06:49:45
552 processes: 8 running, 541 sleeping, 3 zombie
CPU states: 18.1% user, 0.0% nice, 23.8% system, 4.2% interrupt, 53.8% idle
[...]
[gregames@daedalus gregames]$ top
last pid:  3517;  load averages: 14.83, 26.29, 25.36  up 4+16:32:21  06:50:40
467 processes: 255 running, 210 sleeping, 2 zombie
CPU states: 0.8% user, 0.0% nice, 11.6% system, 2.5% interrupt, 85.1% idle
[...]
[gregames@daedalus gregames]$ vmstat -w 5
(first column is the size of the run queue; last column is idle CPU)
 procs      memory      page                    disks     faults      cpu
 r  b w     avm    fre  flt  re  pi  po   fr   sr da0 da1   in    sy    cs us sy id
 21 3 0  360208  83460  309   2   1   0  329   64   0   0  675  2543  1480  4  5 90
 65 3 0  355192  84776  564   2   1   0  618    0  49   7 1238  4590  7486  9 19 73
 72 3 0  350932  84324  594   6   2   0  607    0  75  10 1392  5040  8431 10 20 70
 53 4 0  347908  83620  682   4   2   0  660    0  53   8 1280  5072  8647 10 20 69
360 3 0  346000  82164 1336   3   4   0 1162    0  80  11 1415  8164  8891 26 27 47
 19 4 0  346792  80344  606   1   2   0  559    0  14  10 1283  4140  8060 11 19 71
 18 3 0  337024  79208  498   9   2   0  596    0  15  21 1598  4979  6531 11 20 70
 12 4 0  343628  71740  818   4   2   0  546    0  63  13 1449  5099  6503  9 18 73
 26 3 0  352104  65332 1322   5   3   0  930    0  66  14 1340  8979  7786 11 25 64
 25 5 0  387428  46428 5383   4   2   0 4397    0  16   9 1876 27473 15659 29 71  0
 40 6 0  404196  36828 5136   2   2   0 3927    0  46  12 2050 25730 15231 28 72  0
 26 3 0  390944  67624 3763   3   3   0 3513 1365  53  12 2042 21203 15966 22 69  9
388 6 0  354580  82856  572   2   2   0 1325    0  28   6 1305  7314 10551  9 28 62
356 4 0  341624  86328 1247   2   1   0 1369    0  70   8 1225  4977  6641 20 20 60
 12 3 0  338044  86392  384   8   3   0  384    0   1  10 1028  3300  6861 14 15 71
341 3 0  336016  85280   24   9   3   0   48    0   0   8 1010  2983  6120  4 14 83
 18 3 0  327908  86340  660   5   3   0  696    0  11   9 1112  3515  6304 13 16 71
320 3 0  322972  85720  522   2   2   0  512    0   3  14 1083  3043  5146 14 12 74
  8 3 0  323028  83276  255   3   2   0  211    0  12  11 1112  2712  4367  4 11 85
-----------------------------------------------------------------------------
(with 2_0_28)
[gregames@daedalus gregames]$ top
last pid: 43953;  load averages: 0.48, 0.62, 0.81  up 4+21:09:23  11:27:42
258 processes: 1 running, 257 sleeping
CPU states: 2.8% user, 0.0% nice, 5.1% system, 2.4% interrupt, 89.8% idle
[gregames@daedalus gregames]$ vmstat -w 5
 procs      memory      page                    disks     faults      cpu
 r  b w     avm    fre  flt  re  pi  po   fr   sr da0 da1   in    sy    cs us sy id
  8 3 0  347520  56464  324   2   1   0  342   65   0   0  693  2645  1499  4  6 90
 63 3 0  341884  57496  356   6   0   0  403    0   4   8  754  2451  1372  8  5 87
 19 3 0  341808  56672  386   0   1   0  353    0   7  10  886  2829  1273  8  7 86
 35 3 0  351424  53896  219   3   2   0  122    0   0  14 1023  2481  1033  4  8 88
  7 3 0  351504  52840   31   1   1   0    1    0   1   4  916  2326  1180  2  5 93
 68 3 0  431392  39536 4274   1   2   0 3315 1363  61   5 1504 19223  4338 21 46 34
 16 3 0  382092  62120  681   1   3   0 1806    0  10   4 1422  8630  2455  5 22 73
 13 3 0  371612  66116  700   9   6   0  822    0  11   9  988  4534  1637 10 10 80
 45 3 0  360188  67256  449   2   3   0  480    0  20   5 1024  4424  1347  6  9 85
 11 3 0  357000  66488 1809   1   1   0 1664    0   7   8  864 18482  1276  4 17 79
 16 3 0  356388  65428  301   1   1   0  276    0   9   8 1140  3559  1490 10  9 81
 15 3 0  353324  62920  139   1   1   0  138    0  36  11 1069  3355  1545  4  6 89
  9 4 0  351680  60328  665   3   1   0  550    0  25  14 1123  4700  1474 12 10 79
 35 3 0  350108  59352  930   3   1   0  865    0  27   7 1047  7593  1224 10 10 81
 65 3 0  355612  57188  907   1   1   0  762    0  52   6 1018  4586  1515 10 12 78
 57 3 0  353360  56660  445   0   1   0  451    0  33   7  909  3744  1436  7  9 84
 23 3 0  355708  53164  379   2   1   0  256    0  45   6  946  3333  1487  6  7 88
 10 4 0  369888  51636  586   2   1   0  535    0  54   6 1066  4817  1174  7 11 82
 30 3 0  369504  49556  258   4   1   0  212    0  20   8  920  3304  1189  5  7 88
-------- Original Message --------
Subject: upgrade to FreeBSD 4.5-PRERELEASE
Date: Fri, 28 Dec 2001 14:49:56 -0800 (PST)
From: Brian Behlendorf <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
I've upgraded icarus and daedalus to the most recent cut of the
"stable" branch of FreeBSD, which is currently named "4.5-PRERELEASE" as
the 4.5 release is imminent. There have been lots of improvements in
performance and stability with this release, and it's good to keep
current anyway.
I've noticed, btw, an occurrence of load spiking on daedalus in the last
week - where the load jumps up to 30 or so for a few minutes, then back
down. I get a page whenever the 10-minute load average is above 8, and
when I get that page I also get a quick "top" output, but by the time I
get that notification there's no clear process causing that load. So I've
been getting 10-20 pages per day on my phone due to the load, without a
way to tell what's been causing it. The only thing I can think of that
has changed significantly over the last week is a newer httpd being
installed. Has anyone else seen this with recent httpd 2.0 releases?
Brian