Hi Jean,

Thank you for your explanation and your helpful advice :) It seems you're right. Your guess reminded me of something I read in a book, Solaris Performance and Tools, in which the author discusses this decaying-average problem.
2007/6/18, Jean-Francois Richard <[EMAIL PROTECTED]>:
> Hi,
>
> I don't know enough about the specifics of OpenSolaris, so the following is based on the guess that it behaves the same as regular Solaris (the fact that your vmstat r queue is at values like 7 and idle CPU is at 0% while prstat shows only about 4% CPU makes this feel like a good guess).
>
> Historically, Solaris used the "pcpu" parameter to display CPU against individual processes in prstat (or as part of ps -ef with the -o option). As pcpu is based on a slowly decaying average (equivalent to exponential weighting over the last minute), it deals poorly with short-lived processes such as the ones you have (the snapshot below shows that the CPU times are all very low, suggesting processes which don't stay up for long). Typically, changing the prstat refresh time doesn't change anything, because the %cpu it displays remains the decaying average over the last minute.
>
> Alternatives could be prstat with the -m option, if it is supported.

Yes, the -m option works partially. It shows the whole system's user and system CPU percentages correctly, but the individual processes all still look idle, and the short-lived processes are not displayed either.

> The TOP tool changed how it calculates per-process CPU% for Solaris (it stopped displaying the pcpu parameter and started to calculate %cpu based on the increases in CPU time) as of version 3.6, so you might want to give that a try. Because TOP calculates the CPU% of processes based on CPU time, going to a shorter refresh rate actually does help to improve accuracy for short processes... but it can't go below 1 second. Many short-lived processes live less than one second.

TOP doesn't work correctly either: like prstat, it misses all the short-lived processes. But the whole system's user and system CPU percentages are correct in TOP.

> I don't know DTrace that well, but I look forward to having good tools to deal with this type of situation.
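To make the two measurement styles described above concrete, here is a minimal Python sketch. It is a toy model, not the Solaris kernel's or top's actual code: the 60-second time constant, the function names, and the assumption that a process uses 100% of one CPU for its whole life are all illustrative assumptions.

```python
import math

TAU = 60.0  # assumed time constant in seconds; illustrative, not the kernel's value


def decayed_pcpu(busy_seconds, tau=TAU):
    """%CPU shown by an exponentially decaying average for a process that
    has been fully busy on one CPU since it started busy_seconds ago."""
    return 100.0 * (1.0 - math.exp(-busy_seconds / tau))


def delta_pcpu(cpu_before, cpu_after, interval):
    """%CPU computed from the increase in accumulated CPU time (seconds)
    between two samples taken interval seconds apart."""
    return 100.0 * (cpu_after - cpu_before) / interval


def alive_at_some_sample(start, end, interval):
    """True if a process living from start to end (seconds) exists at any
    sample instant 0, interval, 2*interval, ... and so can be listed at all."""
    first_sample = math.ceil(start / interval) * interval
    return first_sample < end


if __name__ == "__main__":
    for life in (0.05, 0.5, 5.0, 60.0):
        print(f"lifetime {life:5.2f}s: decayed average {decayed_pcpu(life):5.1f}%")
    print("0.9s of CPU in a 1s interval:", delta_pcpu(10.0, 10.9, 1.0), "%")
    print("0.3s process visible at 1s refresh:", alive_at_some_sample(0.2, 0.5, 1.0))
    print("2.5s process visible at 1s refresh:", alive_at_some_sample(0.5, 3.0, 1.0))
```

In this model a process that runs flat out for half a second still reports well under 1%, and a process that starts and exits between two refreshes is never listed by a sampling tool, whatever formula it uses.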
Yes, DTrace works! I used the DTrace script named shortlived.d, which Matty introduced in his email, and the result is:

[EMAIL PROTECTED]:~/performance/DTraceToolkit-0.96/Bin# ./shortlived.d
Tracing... Hit Ctrl-C to stop.
^C
short lived processes:     15.941 secs
total sample duration:     20.880 secs

Total time by process name,
          mkdir       26 ms
             mv       36 ms
          lint2       37 ms
           date       40 ms
           lint      642 ms
             sh      678 ms
          dmake     1422 ms
          lint1    12690 ms

Total time by PPID,
          26628        5 ms
          26637        5 ms
          26643        5 ms
          26659        5 ms
          26665        5 ms
          26682        5 ms
          26691        5 ms
          26697        5 ms
          26715        5 ms
          26721        5 ms
          26739        5 ms
          26748        5 ms
          26754        6 ms
          26653        7 ms
          26678        7 ms
          26699        7 ms
          26709        7 ms
          26723        7 ms
          26727        7 ms
          26733        7 ms
          24185        8 ms
          26621        8 ms
          26624        8 ms
          26675        8 ms
          26703        8 ms
          26768        8 ms
          26780        8 ms
          26784        8 ms
          26788        8 ms
          26792        8 ms
          26796        8 ms
          26800        8 ms
          26804        8 ms
          26808        8 ms
          26812        8 ms
          26820        8 ms
          26828        8 ms
          26840        8 ms
          26844        8 ms
          26860        8 ms
          26872        8 ms
          26756        9 ms
          26757        9 ms
          26764        9 ms
          26776        9 ms
          26816        9 ms
          26824        9 ms
          26832        9 ms
          26836        9 ms
          26848        9 ms
          26852        9 ms
          26856        9 ms
          26864        9 ms
          26873        9 ms
          26772       10 ms
          26845       10 ms
          26861       10 ms
          26667       11 ms
          26765       11 ms
          26837       11 ms
          26865       11 ms
          26613       12 ms
          26645       12 ms
          26773       12 ms
          26777       12 ms
          26797       12 ms
          26801       12 ms
          26825       12 ms
          26833       12 ms
          26841       12 ms
          26607       13 ms
          26648       13 ms
          26670       13 ms
          26758       13 ms
          26759       13 ms
          26769       13 ms
          26785       13 ms
          26789       13 ms
          26793       13 ms
          26805       13 ms
          26809       13 ms
          26813       13 ms
          26817       13 ms
          26821       13 ms
          26829       13 ms
          26853       13 ms
          26616       14 ms
          26700       14 ms
          26704       14 ms
          26728       14 ms
          26781       14 ms
          26849       14 ms
          26857       14 ms
          26610       15 ms
          26674       15 ms
          26724       15 ms
          26652       16 ms
          26708       23 ms
          26732       23 ms
          26737       24 ms
          26874       33 ms
          26626       39 ms
          26680       39 ms
          26862       46 ms
          26620       48 ms
          26590       51 ms
          26713       60 ms
          26689       61 ms
          26635       67 ms
          26657       67 ms
          26766      144 ms
          26866      147 ms
          26838      152 ms
          26846      155 ms
          26625      163 ms
          26679      163 ms
          26746      177 ms
          26842      180 ms
          26668      205 ms
          26646      206 ms
          26656      211 ms
          26634      214 ms
          26688      250 ms
          26712      252 ms
          26761      255 ms
          26790      263 ms
          26798      263 ms
          26806      264 ms
          26814      265 ms
          26818      265 ms
          26826      266 ms
          26810      267 ms
          26822      267 ms
          26778      270 ms
          26834      270 ms
          26770      271 ms
          26774      271 ms
          26671      273 ms
          26649      275 ms
          26802      275 ms
          26854      292 ms
          26608      351 ms
          26830      360 ms
          26614      366 ms
          26794      374 ms
          26850      391 ms
          26705      410 ms
          26729      412 ms
          26760      430 ms
          26786      442 ms
          26858      459 ms
          26617      510 ms
          26782      546 ms
          26725      549 ms
          26701      556 ms
          26611      694 ms
[EMAIL PROTECTED]:~/performance/DTraceToolkit-0.96/Bin#

YES! DTRACE CAUGHT THESE SHORT-LIVED PROCESSES!
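As a quick sanity check on the figures above, the per-name totals can be added up and compared with the sample duration. The snippet below is only arithmetic on the numbers printed by shortlived.d in this run; the variable and function names are made up for illustration, and reading the ratio as a share of CPU time assumes a single-CPU machine.

```python
# Totals copied from the "Total time by process name" section above (milliseconds).
by_name_ms = {
    "mkdir": 26,
    "mv": 36,
    "lint2": 37,
    "date": 40,
    "lint": 642,
    "sh": 678,
    "dmake": 1422,
    "lint1": 12690,
}

# Summary lines printed by the script for this run (seconds).
short_lived_secs = 15.941
sample_secs = 20.880


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


total_ms = sum(by_name_ms.values())

if __name__ == "__main__":
    print(f"sum of per-name totals: {total_ms} ms")
    print(f"short-lived share of sample: {percent(short_lived_secs, sample_secs):.1f}%")
    print(f"lint1 share of per-name total: {percent(by_name_ms['lint1'], total_ms):.1f}%")
```

On these numbers the short-lived processes account for roughly three quarters of the 20.88-second sample, most of it in lint1.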
> Hope it helps. Others, please correct me if I am saying anything wrong or anything that doesn't apply to OpenSolaris.
>
> JF.
>
> ------------------------------
> *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] *On Behalf Of* [EMAIL PROTECTED]
> *Sent:* Monday, June 18, 2007 8:28 AM
> *To:* opensolaris-discuss@opensolaris.org; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> *Subject:* [perf-discuss] cpu performance counters obtained by vmstat and prstat look conflict
>
> Dear all,
>
> I'm compiling the source code of ON build 65 now, using the "nightly opensolaris.sh" command. prstat reports that the system is nearly idle, but the load average tells me that the system is very busy. -,- Then I checked the vmstat report, and it shows that the system is busy, too. The reports follow. What's the problem?
>
> P.S. My system is a Dell workstation with a P4 1.7 GHz CPU and 512 MB of memory.
>
>    PID USERNAME   SIZE    RSS STATE   PRI NICE      TIME   CPU PROCESS/NLWP
>   6673 root        13M    12M sleep    35    0   0:00:02  2.9% dmake/1
>   9086 root      5672K  3396K run      35    0   0:00:00  0.5% acomp/1
>   6634 root      2080K  1344K sleep    59    0   0:00:00  0.2% vmstat/1
>   9080 root      9640K  6576K run      15    0   0:00:00  0.2% ube/1
>   8383 root      4320K  2764K cpu0     59    0   0:00:00  0.1% prstat/1
>   7720 root      8112K  3928K sleep    59    0   0:00:00  0.0% sshd/1
>   9083 root      1140K   876K sleep    35    0   0:00:00  0.0% sh/1
>   9071 root      1200K   920K run      15    0   0:00:00  0.0% cc/1
>   9069 root      1140K   876K sleep    45    0   0:00:00  0.0% sh/1
>   9085 root      1192K   916K sleep    35    0   0:00:00  0.0% cc/1
>   9070 root       996K   688K sleep    35    0   0:00:00  0.0% cw/1
>   9084 root       996K   668K run      25    0   0:00:00  0.0% cw/1
>   9082 root        13M  1320K sleep    35    0   0:00:00  0.0% dmake/1
>   9068 root        13M  1316K sleep    45    0   0:00:00  0.0% dmake/1
>   7979 root      7836K  2040K sleep    59    0   0:00:00  0.0% sshd/1
>   7984 root      2588K  1820K sleep    59    0   0:00:00  0.0% bash/1
>   7918 root      7836K  2044K sleep    59    0   0:00:01  0.0% sshd/1
>    117 daemon    4008K  1972K sleep    59    0   0:00:01  0.0% kcfd/3
>   9742 root      4012K  2552K sleep    59    0   0:00:11  0.0% nscd/25
>   5309 root        12M  6428K sleep    59    0   0:10:35  0.0% smbd/1
>   7598 root      4600K  3860K sleep    59    0   0:00:02  0.0% dmake/1
>   7597 root       972K   688K sleep    59    0   0:00:00  0.0% time/1
>  27340 root      3152K  2396K sleep    59    0   0:00:00  0.0% dmake/1
>  26430 root      3176K  2436K sleep    59    0   0:00:00  0.0% dmake/1
>  NPROC USERNAME   SIZE    RSS MEMORY      TIME   CPU
>     63 root       289M   138M    27%   0:21:04  *4.3%*
>      2 daemon    6332K  2960K   0.6%   0:00:01  0.0%
>      1 smmsp     6996K  1452K   0.3%   0:00:05  0.0%
> Total: 66 processes, 176 lwps, load averages: *2.97, 3.04, 3.03*
>
> [EMAIL PROTECTED]:~# vmstat 1
>  kthr      memory              page                  disk         faults        cpu
>  r b w    swap   free   re    mf pi  po  fr de sr cd f0 s0 --  in    sy  cs us sy id
>  0 0 0 1238652 245720   15    22 40   0   0  0  0  2  0  0  0 419   153 155  1  1 97
>  1 0 0 1201976 225460 1445  6769  0  67  67  0  0  7  0  0  0 317  5544 115 73 27  0
>  3 0 0 1190896 224480 6842 18811  0  72  72  0  0 10  0  0  0 332 11701 223 35 65  0
>  7 0 0 1196188 231540 4681 13909  0  99  99  0  0 10  0  0  0 340 10901 292 47 53  0
>  4 0 0 1196072 230308 4168 11179  0  67  67  0  0  8  0  0  0 329 11525 171 58 42  0
>  3 0 0 1183820 218020 1415  7525  0  28  28  0  0  6  0  0  0 325  5082 135 74 26  0
>  2 0 0 1189328 222504 4544 12530  0 139 139  0  0 16  0  0  0 348 10556 309 50 50  0
>  8 0 0 1194864 229620 5194 15550  0  36  36  0  0 11  0  0  0 333 10805 204 43 57  0
>  7 0 0 1195228 229536 5172 14077  0  67  67  0  0 13  0  0  0 338 10306 196 50 50  0
>  7 0 0 1187484 226696 4699 14620  0 115 115  0  0 11  0  0  0 336 10514 205 *45 55 0*
> ^C
> [EMAIL PROTECTED]:~#
>
> It seems that prstat doesn't report the correct CPU usage percentage for some processes.
>
> Regards
> TJ

Regards
TJ

_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org