Hello,
I have setup dtrace script which is run every 10 minutes:
0,10,20,30,40,50 * * * * /root/bin/sum.sh
(the dtrace script is below).
It runs for 5 min. and saves result in a file (so 5 out of 10 min is recorded)
and records the following data:
dtrace ioctl
6265410485
uname exece
6460327273
oracle semsys
7842510331
oracle munmap
9508009506
oracle resolvepath
9757628007
tnslsnr fork1
13248716831
emagent fork1
14051647077
oracle stat
15720295909
oracle exece
16459326038
oracle pollsys
19301262569
oracle mmap
20975464662
oracle times
23059663461
oracle pwrite
25491331888
oracle open
30196916271
oracle memcntl
31122380062
oracle pread
40277198389
oracle read
64166456399
oracle write
67988846272
oracle ioctl
180151590889
TOTAL:
770149631798
When I grep for TOTAL in these log files I get:
2012-07-10-13:00: TOTAL:
900883959215
2012-07-10-13:10: TOTAL:
824011512465
2012-07-10-13:20: TOTAL:
839473060028
2012-07-10-13:30: TOTAL:
880047194365
2012-07-10-13:40: TOTAL:
5236046549
2012-07-10-13:50: TOTAL:
949408615917
2012-07-10-14:00: TOTAL:
957321356582
2012-07-10-14:10: TOTAL:
805430788624
2012-07-10-14:20: TOTAL:
678869217982
2012-07-10-14:30: TOTAL:
710839026697
2012-07-10-14:40: TOTAL:
816557964466
2012-07-10-14:50: TOTAL:
880681668064
2012-07-10-15:00: TOTAL:
986025103759
While running 'sar' for the _same_ time period it says:
13:00:04 38 10 0 52
13:10:04 36 10 0 54
13:20:04 36 11 0 53
13:30:03 35 12 0 53
13:40:04 34 13 0 53
13:50:04 35 20 0 45
14:00:04 36 23 0 41
14:10:04 29 9 0 62
14:20:04 30 9 0 61
14:30:03 26 8 0 67
14:40:04 27 8 0 65
14:50:04 32 9 0 59
15:00:04 31 9 0 60
So 'sar' data shows increase in syscall time while dtrace data doesn't.
Can anybody explain me that ? Of course it might be that something is happening
in theses five
minutes period which is not recorded by dtrace. But this is typical OLTP
database server with many Oracles
running on it and such behaviour is rather unrealistic ...
Best regards
Przemek
On Fri, Jul 06, 2012 at 11:44:39PM +0200, [email protected] wrote:
> "Mike Gerdts" <[email protected]> pisze:
> > On Fri, Jul 6, 2012 at 3:22 PM, wrote:
> > > Hello,
> > >
> > > we have been observing some high syscalls (much higher then usually)
> > > during fixed part of week.
> > > DTrace shows that they mostly comes from Oracle:
> > > ...
> > > uname exece
> > > 6788165162
> > > oracle close
> > > 6904279356
> > > dtrace ioctl
> > > 7987232206
> > > oracle munmap
> > > 13857698724
> > > emagent fork1
> > > 14217339236
> > > oracle resolvepath
> > > 14438549232
> > > oracle semsys
> > > 16447265715
> > > tnslsnr fork1
> > > 18617374490
> > > oracle stat
> > > 23129632013
> > > oracle exece
> > > 23859088254
> > > oracle pollsys
> > > 28102955920
> > > oracle pread
> > > 32159134471
> > > oracle mmap
> > > 35380305339
> > > oracle times
> > > 35475706300
> > > oracle open
> > > 42314061704
> > > oracle memcntl
> > > 45026502910
> > > oracle pwrite
> > > 51648592719
> > > oracle read
> > > 72939284043
> > > oracle write
> > > 95484269587
> > > oracle ioctl
> > > 306634639800
> > >
> > > TOTAL:
> > > 1060934672587
> >
> > What was the dtrace script you used and over what period?
>
> #!/usr/sbin/dtrace -s
>
> #pragma D option dynvarsize=64m
>
> syscall:::entry
> {
> self->start = vtimestamp;
> }
> syscall:::return
> /(int)arg0 != -1 && self->start/
> {
> this->time = vtimestamp - self->start;
> @Time[execname,probefunc] = sum(this->time);
> @TimeT["Total:"] = sum(this->time);
> self->start = 0;
> }
> profile:::tick-5m
> {
> trunc (@Time, 20);
> trunc (@TimeT, 20);
> printa(@Time);
> printa("\n TOTAL:%113@d", @TimeT);
> clear(@Time);
> clear(@TimeT);
> exit(0);
> }
> END
> {
> clear (@Time);
> clear (@TimeT);
> }
>
>
> >
> > How many
> > cpus in the system (according to psrinfo | wc -l)?
>
> 96
> This is T2+ based server.
>
> >
> > Those numbers look
> > suspiciously high and make me think something is wrong with the dtrace
> > script.
> >
> > >
> > > Using sar we can observe that system spends much more time in syscalls
> > > (column 'sys') but number of syscalls is not higher then usually (column
> > > 'scall/s'):
> > >
> > > server:/export/home/przemol>sar -f /var/adm/sa/sa04 -s 12:00 -e 15:30
> > >
> > > SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012
> > >
> > > 12:00:03 %usr %sys %wio %idle
> > > 12:10:03 42 8 0 50
> > > 12:20:03 37 7 0 56
> > > 12:30:03 37 8 0 55
> > > 12:40:03 37 10 0 53
> > > 12:50:04 40 15 0 45
> > > 13:00:03 37 10 0 52
> > > 13:10:04 35 10 0 55
> > > 13:20:04 38 8 0 55
> > > 13:30:04 35 7 0 58
> > > 13:40:04 38 9 0 53
> > > 13:50:03 37 21 0 42
> > > 14:00:03 36 14 0 50
> > > 14:10:04 33 10 0 57
> > > 14:20:04 36 21 0 42
> > > 14:30:04 36 32 0 32
> > > 14:40:04 28 15 0 57
> > > 14:50:03 32 12 0 57
> > > 15:00:03 30 12 0 57
> > > 15:10:03 32 8 0 60
> > > 15:20:03 31 6 0 64
> > >
> > > Average 35 12 0 52
> > >
> > > server:/export/home/przemol>sar -c -f /var/adm/sa/sa04 -s 12:00 -e 15:30
> > >
> > > SunOS server 5.10 Generic_147440-19 sun4v 07/04/2012
> > >
> > > 12:00:03 scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s
> > > 12:10:03 107602 6567 4948 27.82 18.74 478579 4290812
> > > 12:20:03 96872 6411 4549 25.64 17.55 6949437 4295743
> > > 12:30:03 99273 6139 4792 25.69 17.76 723126 895552
> > > 12:40:03 95908 5950 4129 25.47 17.61 4020240 3585954
> > > 12:50:04 99792 5685 5319 24.52 16.69 4579288 4915466
> > > 13:00:03 90723 4927 3553 24.85 17.34 4490451 3330185
> > > 13:10:04 94744 6207 3688 25.22 17.40 6691959 3530345
> > > 13:20:04 93971 6105 4111 24.88 17.05 1220270 4051683
> > > 13:30:04 87887 6189 4418 24.45 17.15 3026237 4946326
> > > 13:40:04 92151 6032 4269 24.58 17.12 4982878 3595013
> > > 13:50:03 80146 5360 4101 23.22 16.10 448459 4052298
> > > 14:00:03 72998 4488 2142 23.69 16.69 4678268 3586545
> > > 14:10:04 76433 4031 2385 24.82 17.31 6913550 6157899
> > > 14:20:04 72935 4462 1994 23.49 16.25 3090017 5321098
> > > 14:30:04 72516 4124 1993 23.46 16.37 2942463 6993712
> > > 14:40:04 72057 3048 2344 23.95 16.77 3042674 2668180
> > > 14:50:03 73552 3370 2506 24.93 17.13 6398090 2906262
> > > 15:00:03 70869 3111 2049 25.39 17.55 5858628 3745978
> > > 15:10:03 80258 3643 2467 47.57 28.89 7126361 3621600
> > > 15:20:03 73211 3519 2693 24.66 16.99 4550096 2973429
> > >
> > > Average 85197 4969 3423 25.91 17.72 4110499 3973336
> >
> > It's averaging 85197 system calls per second. At that rate, 1
> > trillion system calls (1,060,934,672,587) would take around 144 CPU
> > days of system time. Seems kinda unlikely.
> >
> > Perhaps the 1 trillion number is nanoseconds spent in the system
> > calls. That would turn into about 29 minutes of system time. That
> > could be, but whether it is bad depends a lot on the system and the
> > duration.
> >
> > >
> > > Can I conclude that system spends much more time processing more or less
> > > the same number
> > > of syscalls and time spent on one (statistically) syscall takes longer
> > > time ?
> > > What is your opinion ?
> > >
> > > Regards
> > > Przemek
> > > _______________________________________________
> > > perf-discuss mailing list
> > > [email protected]
> >
> >
> >
> > --
> > Mike Gerdts
> > http://mgerdts.blogspot.com/
> >
>
>
> _______________________________________________
> perf-discuss mailing list
> [email protected]
--
Regards
Przemyslaw Bak (przemol)
--
http://przemol.blogspot.com/
_______________________________________________
perf-discuss mailing list
[email protected]