I am running the libMicro benchmark on my system, but sometimes the run
never finishes because one of the pipe workloads does not stop. To
reproduce this, I wrote a simple script called "pipe_run" that runs the
pipe workload in a loop; it reproduces the scenario. Here are the steps:
intel5# pwd
/export/bench/system/libMicroV12
intel5# cat pipe_run
#!/bin/sh
BHOME=`pwd`
while [ true ]; do
$BHOME/libMicro/bin/pipe -E -C 200 -L -S -W -N pipe_sst1 -s 1 -I 1000 \
    -x sock -m st >/dev/null
done
intel5# ./pipe_run
Running: pipe_sst1 for 0.01631 seconds
Running: pipe_sst1 for 0.01556 seconds
Running: pipe_sst1 for 0.01563 seconds
Running: pipe_sst1 for 0.01569 seconds
Running: pipe_sst1 for 0.01565 seconds
Running: pipe_sst1 for 0.01578 seconds
Running: pipe_sst1 for 0.01580 seconds
Running: pipe_sst1 for 0.01534 seconds
Running: pipe_sst1 for 0.01574 seconds
Running: pipe_sst1 for 0.01562 seconds
Running: pipe_sst1
From the run above you can see that the normal time to finish the
pipe_sst1 workload is very short, but the last run never stops. Below is
the output of "prstat -n 5" from another terminal; the pipe workload has
been running for 49 minutes and is still consuming a lot of CPU.
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  5570 root     4648K  700K cpu4     0    0   0:49:36  12% pipe/1
  9097 root     4552K 2832K cpu2    59    0   0:00:00 0.0% prstat/1
   663 noaccess  108M   86M sleep   59    0   0:00:11 0.0% java/19
   366 daemon   2928K 1676K sleep   59    0   0:00:00 0.0% avahi-daemon-br/1
     9 root       14M   13M sleep   59    0   0:00:04 0.0% svc.configd/17
Total: 55 processes, 205 lwps, load averages: 1.00, 1.00, 1.00
Below is the output of "mpstat 2"; it shows that this workload consumes
an entire core of the system and makes system calls at a rate of over
one million syscalls per second.
intel5# mpstat 2
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw  syscl usr sys  wt idl
  0    6   0    1   340  139   56    0    6    2    0   5472   0   1   0  99
  1   22   0    3     5    0  119    0    7    2    0  68327   1   5   0  94
  2   20   0   12     9    7   38    0    4    2    0     50   0   0   0 100
  3   14   0    2     4    0   13    0    2    1    0  13530   0   2   0  98
  4   35   0    1    15    2   26    5    2    2    0 866042  13  58   0  30
  5   23   0    1     7    2   92    1    3    2    0  89669   1   6   0  93
  6   27   0    1     6    4    8    0    1    1    0     26   0   0   0 100
  7   12   0    0     2    0    2    0    0    0    0     22   0   0   0 100
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw  syscl usr sys  wt idl
  0    0   0    0   338  138   70    0   10    0    0      7   0   0   0 100
  1    0   0    0     8    0    0    7    0    0    0 1228927  17  83   0   0
  2    0   0    0     1    0   82    0    4    0    0      3   0   0   0 100
  3    0   0    0     1    0    7    0    3    0    0      3   0   0   0 100
  4    0   0    1     3    2   63    0    3    0    0      6   0   0   0 100
  5    3   0    0     3    2   21    0    6    0    0     41   0   0   0 100
  6    0   0    0     5    4   36    0    1    0    0      0   0   0   0 100
  7    0   0    0     1    0    3    0    0    0    0      0   0   0   0 100
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw  syscl usr sys  wt idl
  0    3   0    0   340  139   67    0    8    0    0     59   0   0   0 100
  1    0   0    0     8    0    0    7    0    0    0 1229706  17  83   0   0
  2    0   0    0     3    2   68    0    2    0    0      2   0   0   0 100
  3    0   0    0     1    0    6    0    3    0    0      3   0   0   0 100
  4    0   0    0     3    2   69    0    4    0    0     16   0   0   0 100
  5    0   0    0     3    2   30    0    4    0    0      6   0   0   0 100
  6    0   0    0     6    4   34    0    0    0    0      2   0   0   0 100
  7    0   0    0     1    0    1    0    0    0    0      0   0   0   0 100
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw  syscl usr sys  wt idl
  0    0   0    0   338  138   74    0    8    0    0      5   0   0   0 100
  1    0   0    0     8    0    0    7    0    0    0 1236779  17  83   0   0
  2    0   0    0     3    2   71    0    3    1    0     36   0   0   0 100
  3    0   0    0     1    0    2    0    1    0    0      1   0   0   0 100
  4    0   0    0     3    2   74    0    4    0    0     50   0   0   0 100
  5    0   0    0     2    2   16    0    5    0    0     20   0   0   0 100
  6    0   0    0     5    4   33    0    0    1    0      0   0   0   0 100
  7    0   0    0     1    0    1    0    0    0    0      0   0   0   0 100
^C
To check which syscalls are being made, I ran the syscallbypid.d script
provided by the DTraceToolkit; the result is below. It shows that most
of the syscalls are read and write, and they come from the pipe workload.
intel5# syscallbypid.d
Tracing... Hit Ctrl-C to end.
^C
PID CMD SYSCALL COUNT
161 devfsadm gtime 1
161 devfsadm lwp_park 1
462 inetd lwp_park 1
576 intrd nanosleep 1
663 java lwp_cond_wait 1
9123 dtrace fstat 1
9123 dtrace lwp_sigmask 1
9123 dtrace mmap 1
9123 dtrace schedctl 1
9123 dtrace setcontext 1
9123 dtrace sigpending 1
576 intrd gtime 2
9123 dtrace lwp_park 2
9123 dtrace write 2
9123 dtrace sysconfig 3
9123 dtrace sigaction 4
9123 dtrace brk 6
663 java pollsys 26
576 intrd ioctl 29
9123 dtrace p_online 64
9123 dtrace ioctl 159
5570 pipe read 454363
5570 pipe write 454364
Has anyone else seen this issue, and do you know the cause?
Chen Zhihui
Intel OpenSolaris Team
_______________________________________________
perf-discuss mailing list
[email protected]