On 28 Sep, Chip Camden wrote: > Quoth Don Lewis on Monday, 27 September 2010: >> CPU time accounting is broken on one of my machines running 8-STABLE. I >> ran a test with a simple program that just loops and consumes CPU time: >> >> % time ./a.out >> 94.544u 0.000s 19:14.10 8.1% 62+2054k 0+0io 0pf+0w >> >> The display in top shows the process with WCPU at 100%, but TIME >> increments very slowly. >> >> Several hours after booting, I got a bunch of "calcru: runtime went >> backwards" messages, but they stopped right away and never appeared >> again. >> >> Aug 23 13:40:07 scratch ntpd[1159]: ntpd 4.2.4p5-a (1) >> Aug 23 13:43:18 scratch ntpd[1160]: kernel time sync status change 2001 >> Aug 23 18:05:57 scratch dbus-daemon: [system] Reloaded configuration >> Aug 23 18:06:16 scratch dbus-daemon: [system] Reloaded configuration >> Aug 23 18:12:40 scratch ntpd[1160]: time reset +18.059948 s >> [snip] >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from >> 6836685136 usec to 5425839798 usec for pid 1526 (csh) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 4747 >> usec to 2403 usec for pid 1519 (csh) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 5265 >> usec to 2594 usec for pid 1494 (hald-addon-mouse-sy) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 7818 >> usec to 3734 usec for pid 1488 (console-kit-daemon) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 977 usec >> to 459 usec for pid 1480 (getty) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 958 usec >> to 450 usec for pid 1479 (getty) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 957 usec >> to 449 usec for pid 1478 (getty) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 952 usec >> to 447 usec for pid 1477 (getty) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 959 usec >> to 450 usec for pid 1476 (getty) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 975 usec >> to 458 usec for pid 1475 (getty) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1026 >> usec to 482 usec for pid 1474 (getty) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1333 >> usec to 626 usec for pid 1473 (getty) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 2469 >> usec to 1160 usec for pid 1440 (inetd) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 719 usec >> to 690 usec for pid 1402 (sshd) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 120486 >> usec to 56770 usec for pid 1360 (cupsd) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 6204 >> usec to 2914 usec for pid 1289 (dbus-daemon) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 179 usec >> to 84 usec for pid 1265 (moused) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 22156 >> usec to 10407 usec for pid 1041 (nfsd) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 1292 >> usec to 607 usec for pid 1032 (mountd) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 8801 >> usec to 4134 usec for pid 664 (devd) >> Aug 23 23:49:06 scratch kernel: calcru: runtime went backwards from 19 usec >> to 9 usec for pid 9 (sctp_iterator) >> >> >> If I reboot and run the test again, the CPU time accounting seems to be >> working correctly. >> % time ./a.out >> 1144.226u 0.000s 19:06.62 99.7% 5+168k 0+0io 0pf+0w >> > <snip> > > I notice that before the calcru messages, ntpd reset the clock by > 18 seconds -- that probably accounts for that.
Interesting observation. Since this happened so early in the log, I thought that this time change was the initial time change after boot, but taking a closer look, the time change occurred about 4 1/2 hours after boot. The calcru messages occured another 5 1/2 hours after that. I also just noticed that this log info was from the August 23rd kernel, before I noticed the CPU time accounting problem, and not the latest occurance. Here's the latest log info: Sep 23 16:33:50 scratch ntpd[1144]: ntpd 4.2.4p5-a (1) Sep 23 16:37:03 scratch ntpd[1145]: kernel time sync status change 2001 Sep 23 17:43:47 scratch ntpd[1145]: time reset +276.133928 s Sep 23 17:43:47 scratch ntpd[1145]: kernel time sync status change 6001 Sep 23 17:47:15 scratch ntpd[1145]: kernel time sync status change 2001 Sep 23 19:02:48 scratch ntpd[1145]: time reset +291.507262 s Sep 23 19:02:48 scratch ntpd[1145]: kernel time sync status change 6001 Sep 23 19:06:37 scratch ntpd[1145]: kernel time sync status change 2001 Sep 24 00:03:36 scratch kernel: calcru: runtime went backwards from 1120690857 u sec to 367348485 usec for pid 1518 (csh) Sep 24 00:03:36 scratch kernel: calcru: runtime went backwards from 5403 usec to 466 usec for pid 1477 (hald-addon-mouse-sy) Sep 24 00:03:36 scratch kernel: calcru: runtime went backwards from 7511 usec to 1502 usec for pid 1472 (hald-runner) Sep 24 00:03:36 scratch kernel: calcru: runtime went backwards from 17323 usec t o 12470 usec for pid 1472 (hald-runner) [snip] The time jumps are even larger. There is still the large interval between the last ntp message and the first calcru message. My time source is another FreeBSD box with a GPS receiver on my LAN. My other client machine isn't seeing these time jumps. The only messages from ntp in its log from this period are these: Sep 23 04:12:23 mousie ntpd[1111]: kernel time sync status change 6001 Sep 23 04:29:29 mousie ntpd[1111]: kernel time sync status change 2001 Sep 24 03:55:24 mousie ntpd[1111]: kernel time sync status change 6001 Sep 24 04:12:28 mousie ntpd[1111]: kernel time sync status change 2001 > I don't know if that has any connection to time(1) running slower -- but > perhaps ntpd is aggressively adjusting your clock? It seems to be pretty stable when the machine is idle: % ntpq -c pe remote refid st t when poll reach delay offset jitter ============================================================================== *gw.catspoiler.o .GPS. 1 u 8 64 377 0.168 -0.081 0.007 Not too much degradation under CPU load: % ntpq -c pe remote refid st t when poll reach delay offset jitter ============================================================================== *gw.catspoiler.o .GPS. 1 u 40 64 377 0.166 -0.156 0.026 I/O (dd if=/dev/ad6 of=/dev/null bs=512) doesn't appear to bother it much, either. % ntpq -c pe remote refid st t when poll reach delay offset jitter ============================================================================== *gw.catspoiler.o .GPS. 1 u 35 64 377 0.169 -0.106 0.009 _______________________________________________ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"