Here's a good explanation about what procps is doing:
http://lkml.org/lkml/2002/2/18/187

However, the problem I'm seeing is not due to an overflow.
Just after a boot:

adsdebian:~# ps >/dev/null
Unknown HZ value! (67) Assume 100.
adsdebian:~# cat /proc/uptime /proc/stat
92.52 64.55
cpu 1336 0 1458 3773 2681 4 0 0

If I sum all the cpu values and divide by the uptime, I get 100, every time. Meanwhile, procps warns about unknown HZ values that trend toward 100 as the uptime increases. After enough uptime, the problem disappears.

adsdebian:~# ps >/dev/null
Unknown HZ value! (89) Assume 100.
adsdebian:~# cat /proc/uptime /proc/stat
271.05 242.41
cpu 1367 0 1494 21521 2716 6 1 0
adsdebian:~# ps >/dev/null
Unknown HZ value! (91) Assume 100.
adsdebian:~# cat /proc/uptime /proc/stat
336.21 307.28
cpu 1380 0 1510 27984 2740 6 1 0
adsdebian:/tmp# ps >/dev/null
adsdebian:/tmp# cat /proc/uptime /proc/stat
1195.29 1155.56
cpu 2319 0 1651 109945 5596 11 7 0

Now, looking at the code:

  sscanf(buf, "cpu %Lu %Lu %Lu %Lu", &user_j, &nice_j, &sys_j, &other_j);

Why are only 4 of the numbers extracted? All of them seem to be needed. Especially on slow and disk-bound systems, the current code only succeeds in getting a number between 95 and 105 some time after boot, once the time the system has spent in user+sys+idle mode swamps the iowait+irq+softirq+steal numbers.

The proc(5) man page says:

  /proc/stat
    kernel/system statistics. Varies with architecture. Common entries include:

    cpu 3357 0 4313 1362393
      The amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures), that the system spent in user mode, user mode with low priority (nice), system mode, and the idle task, respectively. The last value should be USER_HZ times the second entry in the uptime pseudo-file.

      In Linux 2.6 this line includes three additional columns:
        iowait - time waiting for I/O to complete (since 2.5.41);
        irq - time servicing interrupts (since 2.6.0-test4);
        softirq - time servicing softirqs (since 2.6.0-test4).

      Since Linux 2.6.11, there is an eighth column, steal - stolen time, which is the time spent in other operating systems when running in a virtualized environment.

Based on this, it seems right to add up all of the values if all are available. (For values of "right" that assume this gross approach is the right way to get the HZ value in the first place.)

That said, on my laptop I have:

2677820.60 1205073.58
cpu 17764500 386487 3214308 117025796 3022994 318693 296809 0 0

Using the first 4 numbers yields 52, while adding them all yields 53, which would be an unknown HZ value with the current code.

-- 
see shy jo