Hi Phil,
Comments inline.
> >
> > truss shows writes like this:
> >
> > /1: write(22, "\0 4\001 :919C\v J H90 b".., 96) = 96
> > /1: write(23, " %97 *AF", 4) = 4
> >
> > and reads like this:
> > /1: read(77, "\0 D\003 :919F16 J ,EACF".., 131072) = 131072
> > /1: read(77, "\0 D\003 :919F16 J ,EADC".., 131072) = 131072
> > /1: read(77, "\0 D\003 :919F16 J ,EADC".., 131072) = 131072
> > /1: read(78, 0x033BD918, 5) Err#11 EAGAIN
> > /1: read(78, 0x033BD918, 5) Err#11 EAGAIN
> >
> > I had done a truss before and seen the 128k reads, though I hadn't
> > noticed so many EAGAINs.
> >
>
> The writes are tiny. That's a lot of overhead. But I think we need the
> big picture, so some DTrace quantizations of per-fd sizes might be
> useful to fill out the picture some more.
# dtrace -n 'sysinfo:::readch /pid ==24937/ { @dist[execname] = quantize(arg0);
}'
dtrace: description 'sysinfo:::readch ' matched 4 probes
^C
fwd
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@ 6513
1 | 0
2 | 0
4 |@ 366
8 | 0
16 | 0
32 | 0
64 | 6
128 | 6
256 | 12
512 | 5
1024 | 6
2048 | 13
4096 |@ 246
8192 | 24
16384 |@ 276
32768 | 0
65536 | 0
131072 |@@@@@@@@@@@@@@@@@@@@@@@@ 11372
262144 | 0
# dtrace -n 'syscall::read:entry /pid ==24937/ { @dist[arg0] = quantize(arg2);
}'
dtrace: description 'syscall::read:entry ' matched 1 probe
^C
69
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8
8 | 0
16 | 0
32 | 0
64 |@@@@@@@@@@ 3
128 |@@@ 1
256 | 0
65
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6
8 | 0
16 | 0
32 | 0
64 |@@@@ 1
128 |@@@@ 1
256 |@@@@ 1
512 | 0
66
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14
8 | 0
16 | 0
32 | 0
64 |@@ 1
128 |@@@@@@@@ 4
256 |@@@@ 2
512 | 0
67
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16
8 | 0
16 | 0
32 | 0
64 |@@ 1
128 |@@ 1
256 |@@@ 2
512 | 0
1024 |@@@@@@@ 4
2048 | 0
64
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@ 11
8 | 0
16 | 0
32 | 0
64 |@@@@@ 2
128 |@@ 1
256 |@@ 1
512 | 0
1024 | 0
2048 |@@ 1
4096 | 0
8192 |@@ 1
16384 | 0
71
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3216
8 | 0
43
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3233
8 | 0
54
value ------------- Distribution ------------- count
8192 | 0
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16
32768 | 0
55
value ------------- Distribution ------------- count
65536 | 0
131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4
262144 | 0
68
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@ 119
8 | 0
16 | 0
32 | 0
64 | 0
128 | 0
256 | 0
512 | 0
1024 |@ 4
2048 | 0
4096 |@@@@@@@@@@@@@@@@@@ 107
8192 |@ 4
16384 |@@ 10
32768 | 0
61
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@ 114
8 | 0
16 | 0
32 | 0
64 | 0
128 |@ 4
256 | 0
512 | 0
1024 | 2
2048 |@ 4
4096 |@@@@@@@@@@@@@@@ 90
8192 |@ 6
16384 |@@@@ 22
32768 | 0
62
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@ 110
8 | 0
16 | 0
32 | 0
64 | 2
128 | 0
256 | 0
512 | 3
1024 |@ 4
2048 | 2
4096 |@@@@@@@@ 47
8192 |@@@ 17
16384 |@@@@@@@@@ 57
32768 | 0
73
value ------------- Distribution ------------- count
65536 | 0
131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10604
262144 | 0
Interesting fds from `pfiles 24937`. Note that the external reporting engine is
now indexing the active log file, not an archived one:
43: S_IFSOCK mode:0666 dev:332,0 ino:23157 uid:0 gid:0 size:0
O_RDWR|O_NDELAY
SOCK_STREAM
SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
sockname: AF_INET 1.1.1.42 port: 18184
peername: AF_INET 1.1.1.5 port: 45277
71: S_IFSOCK mode:0666 dev:332,0 ino:63336 uid:0 gid:0 size:0
O_RDWR|O_NDELAY
SOCK_STREAM
SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
sockname: AF_INET 1.1.1.42 port: 18184
peername: AF_INET 1.1.1.5 port: 46653
73: S_IFREG mode:0644 dev:256,65541 ino:148301 uid:0 gid:0 size:1785772291
O_RDONLY
/var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.log
Writes appear to be quite small on the whole. fd 22 here is fw.log, 23 is
fw.logptr, and 43 & 71 are the TCP connections to the log reporting server.
# dtrace -n 'syscall::write:entry /pid ==24937/ { @dist[arg0] = quantize(arg2);
}'
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
13
value ------------- Distribution ------------- count
512 | 0
1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4
2048 | 0
4096 |@@@@@@@@ 1
8192 | 0
24
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 15365
8 | 0
23
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 17146
8 | 0
71
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@ 22
16384 |@@@@@@@@@@@@@@@@@@@@ 22
32768 | 0
43
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@ 22
16384 |@@@@@@@@@@@@@@@@@@@@ 23
32768 | 0
22
value ------------- Distribution ------------- count
2 | 0
4 |@@@@ 1781
8 | 0
16 | 0
32 | 21
64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 15806
128 |@@@ 1319
256 | 0
>
> Those EAGAINs are a concern. It would be good to get a handle on how
> much that happens, but it really points to code that needs fixing.
> Unconstrained polling is a big no-no. I get the impression that this
> code isn't up to the kind of load being put through it. I'd like to
> see multiple threads and/or the use of poll(2) or event ports.
>
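Point taken on the unconstrained polling. To get a handle on how often it's
happening, I'll run something like this (a rough sketch, untested as yet; pid
is the current fwd process) to get the per-fd EAGAIN rate over time:

# dtrace -qn '
    /* remember the fd at read(2) entry so the return probe can key on it */
    syscall::read:entry /pid == 24937/ { self->fd = arg0; self->in = 1; }
    /* count reads that come back EAGAIN, per file descriptor */
    syscall::read:return /self->in && errno == EAGAIN/ { @eagain[self->fd] = count(); }
    syscall::read:return /self->in/ { self->in = 0; self->fd = 0; }
    /* report and reset every 10 seconds */
    tick-10sec { printa(@eagain); trunc(@eagain); }'

I'll post the numbers from a busy period.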
> >> 2) You've said the application is single threaded. This isn't a
> >> good fit for the T-series (which was designed for throughput
> >> workloads at the expense of single thread performance ... e.g. the
> >> M-series has typically 4x the single thread performance of the
> >> T-series).
> >>
> >
> > The server has handled the workload quite well until recently, when
> > we've added a few new firewalls that do quite a large volume of
> > logging. The stronger the case I can build for a faster server, the
> > easier it is to convince the business to buy it.
> >
>
> The application has a finite capacity. It will max out sooner or
> later. Faster CPUs might put off the evil day, but the code really
> needs fixing too. Your T5120 has plenty of capacity, but not with this
> code, and your prstat data confirms ...
>
> >> 3) It would be helpful to see 'prstat -Lm' to get a better overall
> >> feel for what is running where (especially, where fwd is spending
> >> its time).
> >>
> >
> >    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
> >   2153 root      41  59 0.1 0.0 0.0 0.0 0.3 0.2 383  11  7K   0 fwd/1
>
> That's one whole VCPU dedicated to this task. Either we improve the
> code (e.g. save cycles) or you need a faster CPU with respect to
> single thread performance. Of course, if we could also make the
> application multithreaded or multiprocess, the T5120 has plenty of
> CPU to spare!
I agree that while putting in faster CPUs will help, it is only putting things
off. I'll be approaching the vendor to discuss their application's performance,
and building a case such as this will help me get the point across.
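For the record, I'll keep capturing the per-LWP microstate data at intervals so
I can show the vendor (and the business) just how close fwd/1 is running to the
limit of a single strand; something along the lines of:

# prstat -mLc -p 24937 10 >> /var/tmp/fwd_prstat.out &

(-m for microstates, -L per LWP, -c to print new reports below the old ones,
sampling every 10 seconds; the output file is just my choice of location).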
>
> >
> > unix`send_mondo_set+0x314
> > unix`xt_some+0x1a0
> > unix`sfmmu_tlb_range_demap+0x1f0
> > unix`hat_unload_callback+0x850
> > unix`ppmapout+0x94
> > zfs`mappedread+0xa4
> > zfs`zfs_read+0x19c
> > genunix`fop_read+0x20
> > genunix`read+0x274
> > unix`syscall_trap32+0xcc
> > 135480
> >
> > unix`send_mondo_set+0x314
> > unix`xt_some+0x1a0
> > unix`xt_sync+0x158
> > unix`hat_unload_callback+0x898
> > unix`ppmapout+0x94
> > zfs`mappedread+0xa4
> > zfs`zfs_read+0x19c
> > genunix`fop_read+0x20
> > genunix`read+0x274
> > unix`syscall_trap32+0xcc
> > 135480
> >
>
> Ah! This is very telling since zfs_read() only calls mappedread() if
> it thinks someone also has the file mmapped. ZFS and mmap(2) do not
> mix well because ZFS has its own cache (the ARC) apart from the
> Solaris page cache. However, mmap(2) currently uses the Solaris page
> cache for mapped ZFS files, so ZFS has to keep the two caches in sync
> (so everyone sees the file in the same state). The crosscall storm is
> the result.
>
> I worked on a partial fix for mapped read performance, but I doubt
> that will help your application much (because it wasn't aimed at
> read() performance of an elsewhere mapped file).
>
> So the big question is: who has the file mapped, and can you stop
> that? Note: utilities like cat(1) and cp(1) use mmap(2) to read data,
> but dd(1) doesn't (e.g. if you had a script that copied the files
> periodically, you could substitute dd(1) and lose the crosscalls).
>
> Does fwd mmap the file itself? (you can use pmap(1) to find out).
>
It appears that fwd is mmapping the files itself. So, as a temporary measure,
would moving the logs to UFS improve things?
I'll be applying the latest recommended patches, but another poster, sistare,
seemed to say that the fix for the ZFS mmap issue (6699438) has not been
released yet, so I'll have to open a case with Sun to get hold of it.
# pmap 24937
24937: /opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/bin/fwd -n
[snip]
EBB9C000 39544K r--s- dev:256,65541 ino:148301 --- fw.log
EE5DA000 41424K r--s- dev:256,65541 ino:125472 --- 2009-06-29_224738_25.log
[snip]
total 169976K
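While I'm at it, I'll try to put a number on what the mapped-read path costs
the application itself, which should help the case with both the vendor and
Sun. A rough sketch (fd 73 is the fw.log descriptor from the pfiles output
above; pid is the current fwd process):

# dtrace -qn '
    /* time read(2) calls against the fw.log descriptor */
    syscall::read:entry /pid == 24937 && arg0 == 73/ { self->ts = timestamp; }
    syscall::read:return /self->ts/ {
        @["ns per read() of fw.log (fd 73)"] = quantize(timestamp - self->ts);
        self->ts = 0;
    }'

If the per-read latency drops noticeably once the logs are on UFS (or once the
mmap fix is in), that should make the point fairly clearly.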
>
> > libCPLogRepository.so`__1cOCBinaryLogFileNReadLogHeader6kMpnJILogEvent_pnNCLogFileChain__b_+0x280
> > libCPLogRepository.so`__1cOCBinaryLogFileHReadLog6MpnJILogEvent_pnNCLogFileChain__b_+0xc
> > libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
> > libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x3f8
> > libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
> > libCPLogLuFile.so`0xfc493e9c
> > libfw1.so`__1cNlog_read_next6FpnPICplogLuLogFile_pnJILogEvent_bpp3pi6_3_+0xfc
> > libfwdopsec.so`lea_send_line+0x108
> > libopsec.so`lea_send_50_records+0x68
> > libopsec.so`lea_server_event_handler+0xc8
> > libopsec.so`opsec_fwasync_conn_handler+0x2fc
> > libComUtils.so`0xfe393bc4
> > libComUtils.so`T_event_mainloop_iter+0x4c4
> > libComUtils.so`T_event_mainloop+0x118
> > libfw1.so`fwd_cmain+0x131c
> > libfw1.so`fw_cmain+0x910
> > fwd`_start+0x108
> > 226302
> >
> >
> >> Hope this helps,
> >> Phil
> >>
> >
> > I hope so too. And thanks for the fast response!
> >
> > regards
> >
> > Matthew
> >
>
> Nothing else there, except that taking an xcall during a memcpy()
> might indicate a mapped read. However, xcalls can happen in many
> ways. A PID-predicated stack aggregation of the xcalls might shed a
> little more light ...
>
> dtrace -n 'sysinfo:::xcalls /pid == 2153/ { @[stack()] = count(); }'
>
# dtrace -n 'sysinfo:::xcalls /pid == 24937/ { @[stack()] = count(); }'
unix`send_one_mondo+0x1c
unix`xt_one_unchecked+0xc8
genunix`sleepq_wakeone_chan+0x70
genunix`cv_signal+0x58
genunix`taskq_bucket_dispatch+0x74
genunix`taskq_dispatch+0x13c
genunix`qenable_locked+0xd4
genunix`backenable+0x94
genunix`kstrgetmsg+0xa80
sockfs`sotpi_recvmsg+0x2ac
sockfs`socktpi_read+0x44
genunix`fop_read+0x20
genunix`read+0x274
unix`syscall_trap32+0xcc
23
unix`send_mondo_set+0x314
unix`xt_some+0x1a0
unix`sfmmu_tlb_range_demap+0x1f0
unix`hat_unload_callback+0x850
unix`ppmapout+0x94
zfs`zfs_fillpage+0x19c
zfs`zfs_getpage+0x184
genunix`fop_getpage+0x44
genunix`segvn_fault+0xb04
genunix`as_fault+0x4c8
unix`pagefault+0x68
unix`trap+0xd50
unix`utl0+0x4c
157
unix`send_mondo_set+0x314
unix`xt_some+0x1a0
unix`xt_sync+0x158
unix`hat_unload_callback+0x898
unix`ppmapout+0x94
zfs`zfs_fillpage+0x19c
zfs`zfs_getpage+0x184
genunix`fop_getpage+0x44
genunix`segvn_fault+0xb04
genunix`as_fault+0x4c8
unix`pagefault+0x68
unix`trap+0xd50
unix`utl0+0x4c
157
unix`send_mondo_set+0x314
unix`xt_some+0x1a0
unix`sfmmu_tlb_range_demap+0x1f0
unix`hat_unload_callback+0x850
unix`ppmapout+0x94
zfs`mappedread+0xa4
zfs`zfs_read+0x19c
genunix`fop_read+0x20
genunix`read+0x274
unix`syscall_trap32+0xcc
176138
unix`send_mondo_set+0x314
unix`xt_some+0x1a0
unix`xt_sync+0x158
unix`hat_unload_callback+0x898
unix`ppmapout+0x94
zfs`mappedread+0xa4
zfs`zfs_read+0x19c
genunix`fop_read+0x20
genunix`read+0x274
unix`syscall_trap32+0xcc
176138
>
> Phil
regards
Matthew
--
This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
[email protected]