Re: ps time field jumps backward
On Fri, Feb 05, 2010 at 05:45:01PM -0500, Linda Messerschmidt wrote: On Fri, Feb 5, 2010 at 4:28 PM, Dan Nelson dnel...@allantgroup.com wrote: Ideally, top and ps would total up all the per-thread CPU counts when displaying the per-process numbers, but it doesn't seem to. It does seem to total them: $ ps axHo pid,lwp,time,wchan,comm | awk '$1 == 1647' 1647 100401 0:00.63 select mysqld 1647 100466 0:11.08 sigwai mysqld 1647 100521 0:00.00 ucond mysqld $ ps axo pid,lwp,time,wchan,comm | awk '$1 == 1647' 1647 100521 0:11.71 ucond mysqld But you put me on the right track. I ran both side by side for awhile, and found that ps/top only sums up those threads that haven't already exited. I.e., once a thread has exited, it's as if its usage never happened from the perspective of ps and top's total calculation. That seems like undesirable behavior, particularly if it conceals CPU-churning behavior by short-lived threads, but possibly very hard to change. :( I wonder if the system accounting records are more accurate? It should not be very hard to fix the time field, because the rusage struct is correct. This can be seen in the ^T status line and, after the process has terminated, time, times, ps S. In the kernel code, sys/kern/kern_proc.c, it seems that fill_kinfo_proc_only() puts in the correct ki_runtime using p-p_rux.rux_runtime, but fill_kinfo_aggregate() later overwrites this using the sum of all threads' td-td_runtime. Removing the bogus calculation from fill_kinfo_aggregate() fixes ps(1)'s time field, but not the %cpu field, nor anything in top(1). The latter is because top(1) always requests information about threads and does the same wrong calculation as fill_kinfo_aggregate() internally. Fixing the %cpu field needs changes to the scheduler. The schedulers have functions to propagate CPU usage from terminating child processes and threads, but this seems to affect scheduling decisions only and not the %cpu field. Note that the CPU usage is always charged to the most recently created thread in the process, not necessarily the thread that called or will call fork(), pthread_create(), waitpid() or pthread_join(). If the thread charged to could be selected better, it could be useful to add in the cpu time as well. Index: sys/kern/kern_proc.c === --- sys/kern/kern_proc.c(revision 203549) +++ sys/kern/kern_proc.c(working copy) @@ -676,11 +676,9 @@ kp-ki_estcpu = 0; kp-ki_pctcpu = 0; - kp-ki_runtime = 0; FOREACH_THREAD_IN_PROC(p, td) { thread_lock(td); kp-ki_pctcpu += sched_pctcpu(td); - kp-ki_runtime += cputick2usec(td-td_runtime); kp-ki_estcpu += td-td_estcpu; thread_unlock(td); } Test program that starts threads that waste about 1 second of cpu time each. #include pthread.h #include stdio.h #include unistd.h #include err.h void * threadfunc(void *unused) { int error; error = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL); if (error) errc(1, error, pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS)); for (;;) ; } int main(int argc, char *argv[]) { int error; pthread_t td; for (;;) { error = pthread_create(td, NULL, threadfunc, NULL); if (error != 0) errc(1, error, pthread_create); sleep(1); error = pthread_cancel(td); if (error != 0) errc(1, error, pthread_cancel); error = pthread_join(td, NULL); if (error != 0) errc(1, error, pthread_join); } return 0; } -- Jilles Tjoelker ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
ps time field jumps backward
Hi all, For most of 7.2, on up to a 7.3-PRERELEASE built yesterday, I've noticed that the time field reported by ps and top jumps around for some processes. I've particularly noticed it with MySQL. Here are some repeated ps results (ps axo pid,time,wchan,comm) for the same process over a few minutes: 1647 0:08.06 ucond mysqld 1647 0:08.06 ucond mysqld 1647 0:08.06 ucond mysqld 1647 0:08.06 ucond mysqld 1647 0:08.07 ucond mysqld 1647 0:10.24 wdrain mysqld 1647 0:08.08 ucond mysqld 1647 0:09.25 - mysqld 1647 0:08.08 ucond mysqld 1647 0:08.91 - mysqld 1647 0:09.13 wdrain mysqld 1647 0:09.76 wdrain mysqld 1647 0:08.10 ucond mysqld 1647 0:08.11 ucond mysqld 1647 0:08.12 ucond mysqld 1647 0:08.12 ucond mysqld 1647 0:08.12 ucond mysqld 1647 0:08.12 ucond mysqld 1647 0:08.38 wdrain mysqld 1647 0:08.13 ucond mysqld 1647 0:09.47 - mysqld 1647 0:08.13 ucond mysqld 1647 0:08.13 ucond mysqld It's like when it starts working it builds up time, but as soon as it goes back to ucond, it resets back to the earlier value. This particular machine is an 8-core amd64, but I've seen it on quad-core i386 as well. Does anybody have an idea what's going on here? Thanks for any suggestions! ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ps time field jumps backward
In the last episode (Feb 05), Linda Messerschmidt said: For most of 7.2, on up to a 7.3-PRERELEASE built yesterday, I've noticed that the time field reported by ps and top jumps around for some processes. I've particularly noticed it with MySQL. Here are some repeated ps results (ps axo pid,time,wchan,comm) for the same process over a few minutes: 1647 0:08.06 ucond mysqld 1647 0:08.06 ucond mysqld 1647 0:08.06 ucond mysqld 1647 0:08.06 ucond mysqld 1647 0:08.07 ucond mysqld 1647 0:10.24 wdrain mysqld 1647 0:08.08 ucond mysqld 1647 0:09.25 - mysqld It's like when it starts working it builds up time, but as soon as it goes back to ucond, it resets back to the earlier value. By default, ps only lists one thread when listing threaded processes, and if it's anything like top, the CPU column seems to be randomly picked from one thread. Try running ps axHo pid,lwp,time,wchan,comm instead (add H and lwp). That will print the individual threads, plus the thread id so you can track them across runs. You should see one thread with a CPU value of 8.08 (or so), and other values for the rest. Ideally, top and ps would total up all the per-thread CPU counts when displaying the per-process numbers, but it doesn't seem to. -- Dan Nelson dnel...@allantgroup.com ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: ps time field jumps backward
On Fri, Feb 5, 2010 at 4:28 PM, Dan Nelson dnel...@allantgroup.com wrote: Ideally, top and ps would total up all the per-thread CPU counts when displaying the per-process numbers, but it doesn't seem to. It does seem to total them: $ ps axHo pid,lwp,time,wchan,comm | awk '$1 == 1647' 1647 100401 0:00.63 select mysqld 1647 100466 0:11.08 sigwai mysqld 1647 100521 0:00.00 ucond mysqld $ ps axo pid,lwp,time,wchan,comm | awk '$1 == 1647' 1647 100521 0:11.71 ucond mysqld But you put me on the right track. I ran both side by side for awhile, and found that ps/top only sums up those threads that haven't already exited. I.e., once a thread has exited, it's as if its usage never happened from the perspective of ps and top's total calculation. That seems like undesirable behavior, particularly if it conceals CPU-churning behavior by short-lived threads, but possibly very hard to change. :( I wonder if the system accounting records are more accurate? Thanks for pointing me in the right direction! ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org