Hi Phil,
Comments inline.
> >
> > truss shows writes like this:
> >
> > /1: write(22, "\0 4\001 :919C\v J H90 b".., 96) = 96
> > /1: write(23, " %97 *AF", 4) = 4
> >
> > and reads like this:
> > /1: read(77, "\0 D\003 :919F16 J ,EACF".., 131072) = 131072
> > /1: read(77, "\0 D\003 :919F16 J ,EADC".., 131072) = 131072
> > /1: read(77, "\0 D\003 :919F16 J ,EADC".., 131072) = 131072
> > /1: read(78, 0x033BD918, 5) Err#11 EAGAIN
> > /1: read(78, 0x033BD918, 5) Err#11 EAGAIN
> >
> > I had done a truss before and seen the 128k reads, though I hadn't
> > noticed so many EAGAINs.
> >
>
> The writes are tiny. That's a lot of overhead. But I think we need the
> big picture, so some DTrace quantizations of per-fd sizes might be
> useful to fill out the picture some more.
# dtrace -n 'sysinfo:::readch /pid ==24937/ { @dist[execname] = quantize(arg0);
}'
dtrace: description 'sysinfo:::readch ' matched 4 probes
^C
fwd
value ------------- Distribution ------------- count
-1 | 0
0 |@@@@@@@@@@@@@@ 6513
1 | 0
2 | 0
4 |@ 366
8 | 0
16 | 0
32 | 0
64 | 6
128 | 6
256 | 12
512 | 5
1024 | 6
2048 | 13
4096 |@ 246
8192 | 24
16384 |@ 276
32768 | 0
65536 | 0
131072 |@@@@@@@@@@@@@@@@@@@@@@@@ 11372
262144 | 0
# dtrace -n 'syscall::read:entry /pid ==24937/ { @dist[arg0] = quantize(arg2);
}'
dtrace: description 'syscall::read:entry ' matched 1 probe
^C
69
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8
8 | 0
16 | 0
32 | 0
64 |@@@@@@@@@@ 3
128 |@@@ 1
256 | 0
65
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6
8 | 0
16 | 0
32 | 0
64 |@@@@ 1
128 |@@@@ 1
256 |@@@@ 1
512 | 0
66
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 14
8 | 0
16 | 0
32 | 0
64 |@@ 1
128 |@@@@@@@@ 4
256 |@@@@ 2
512 | 0
67
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16
8 | 0
16 | 0
32 | 0
64 |@@ 1
128 |@@ 1
256 |@@@ 2
512 | 0
1024 |@@@@@@@ 4
2048 | 0
64
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@ 11
8 | 0
16 | 0
32 | 0
64 |@@@@@ 2
128 |@@ 1
256 |@@ 1
512 | 0
1024 | 0
2048 |@@ 1
4096 | 0
8192 |@@ 1
16384 | 0
71
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3216
8 | 0
43
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3233
8 | 0
54
value ------------- Distribution ------------- count
8192 | 0
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 16
32768 | 0
55
value ------------- Distribution ------------- count
65536 | 0
131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4
262144 | 0
68
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@ 119
8 | 0
16 | 0
32 | 0
64 | 0
128 | 0
256 | 0
512 | 0
1024 |@ 4
2048 | 0
4096 |@@@@@@@@@@@@@@@@@@ 107
8192 |@ 4
16384 |@@ 10
32768 | 0
61
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@ 114
8 | 0
16 | 0
32 | 0
64 | 0
128 |@ 4
256 | 0
512 | 0
1024 | 2
2048 |@ 4
4096 |@@@@@@@@@@@@@@@ 90
8192 |@ 6
16384 |@@@@ 22
32768 | 0
62
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@ 110
8 | 0
16 | 0
32 | 0
64 | 2
128 | 0
256 | 0
512 | 3
1024 |@ 4
2048 | 2
4096 |@@@@@@@@ 47
8192 |@@@ 17
16384 |@@@@@@@@@ 57
32768 | 0
73
value ------------- Distribution ------------- count
65536 | 0
131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10604
262144 | 0
Interesting fds from `pfiles 24937`. Note that the external reporting engine is
now indexing the active log file, not an archived one:
43: S_IFSOCK mode:0666 dev:332,0 ino:23157 uid:0 gid:0 size:0
O_RDWR|O_NDELAY
SOCK_STREAM
SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
sockname: AF_INET 1.1.1.42 port: 18184
peername: AF_INET 1.1.1.5 port: 45277
71: S_IFSOCK mode:0666 dev:332,0 ino:63336 uid:0 gid:0 size:0
O_RDWR|O_NDELAY
SOCK_STREAM
SO_REUSEADDR,SO_OOBINLINE,SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
sockname: AF_INET 1.1.1.42 port: 18184
peername: AF_INET 1.1.1.5 port: 46653
73: S_IFREG mode:0644 dev:256,65541 ino:148301 uid:0 gid:0 size:1785772291
O_RDONLY
/var/opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/log/fw.log
Writes appear to be quite small on the whole. fd 22 here is fw.log, 23 is
fw.logptr, and 43 & 71 are the TCP connections to the log reporting server.
# dtrace -n 'syscall::write:entry /pid ==24937/ { @dist[arg0] = quantize(arg2);
}'
dtrace: description 'syscall::write:entry ' matched 1 probe
^C
13
value ------------- Distribution ------------- count
512 | 0
1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4
2048 | 0
4096 |@@@@@@@@ 1
8192 | 0
24
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 15365
8 | 0
23
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 17146
8 | 0
71
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@ 22
16384 |@@@@@@@@@@@@@@@@@@@@ 22
32768 | 0
43
value ------------- Distribution ------------- count
4096 | 0
8192 |@@@@@@@@@@@@@@@@@@@@ 22
16384 |@@@@@@@@@@@@@@@@@@@@ 23
32768 | 0
22
value ------------- Distribution ------------- count
2 | 0
4 |@@@@ 1781
8 | 0
16 | 0
32 | 21
64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 15806
128 |@@@ 1319
256 | 0
>
> Those EAGAINs are a concern. It would be good to get a handle on how
> much that happens, but it really points to code that needs fixing.
> Unconstrained polling is a big no-no. I get the impression that this
> code isn't up to the kind of load being put through it. I'd like to
> see multiple threads and/or the use of poll(2) or event ports.
>
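Point taken on the unconstrained polling. To get a handle on how often it's
happening, I'll run something like this (a rough sketch, untested as yet; pid
is the current fwd process) to get the per-fd EAGAIN rate over time:

# dtrace -qn '
    /* remember the fd at read(2) entry so the return probe can key on it */
    syscall::read:entry /pid == 24937/ { self->fd = arg0; self->in = 1; }
    /* count reads that come back EAGAIN, per file descriptor */
    syscall::read:return /self->in && errno == EAGAIN/ { @eagain[self->fd] = count(); }
    syscall::read:return /self->in/ { self->in = 0; self->fd = 0; }
    /* report and reset every 10 seconds */
    tick-10sec { printa(@eagain); trunc(@eagain); }'

I'll post the numbers from a busy period.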
> >> 2) You've said the application is single threaded. This isn't a
> >> good fit for the T-series (which was designed for throughput
> >> workloads at the expense of single thread performance ... e.g. the
> >> M-series has typically 4x the single thread performance of the
> >> T-series).
> >>
> >
> > The server has handled the workload quite well until recently, when
> > we've added a few new firewalls that do quite a large volume of
> > logging. The stronger the case I can build for a faster server, the
> > easier it is to convince the business to buy it.
> >
>
> The application has a finite capacity. It will max out sooner or
> later. Faster CPUs might put off the evil day, but the code really
> needs fixing too. Your T5120 has plenty of capacity, but not with this
> code, and your prstat data confirms ...
>
> >> 3) It would be helpful to see 'prstat -Lm' to get a better overall
> >> feel for what is running where (especially, where fwd is spending
> >> its time).
> >>
> >
> >    PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
> >   2153 root      41  59 0.1 0.0 0.0 0.0 0.3 0.2 383  11  7K   0 fwd/1
>
> That's one whole VCPU dedicated to this task. Either we improve the
> code (e.g. save cycles) or you need a faster CPU with respect to
> single thread performance. Of course, if we could also make the
> application multithreaded or multiprocess, the T5120 has plenty of
> CPU to spare!
I agree that while putting in faster CPUs will help, it is only putting things
off. I'll be approaching the vendor to discuss their application's performance,
and building a case such as this will help me get the point across.
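For the record, I'll keep capturing the per-LWP microstate data at intervals so
I can show the vendor (and the business) just how close fwd/1 is running to the
limit of a single strand; something along the lines of:

# prstat -mLc -p 24937 10 >> /var/tmp/fwd_prstat.out &

(-m for microstates, -L per LWP, -c to print new reports below the old ones,
sampling every 10 seconds; the output file is just my choice of location).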
>
> >
> > unix`send_mondo_set+0x314
> > unix`xt_some+0x1a0
> > unix`sfmmu_tlb_range_demap+0x1f0
> > unix`hat_unload_callback+0x850
> > unix`ppmapout+0x94
> > zfs`mappedread+0xa4
> > zfs`zfs_read+0x19c
> > genunix`fop_read+0x20
> > genunix`read+0x274
> > unix`syscall_trap32+0xcc
> > 135480
> >
> > unix`send_mondo_set+0x314
> > unix`xt_some+0x1a0
> > unix`xt_sync+0x158
> > unix`hat_unload_callback+0x898
> > unix`ppmapout+0x94
> > zfs`mappedread+0xa4
> > zfs`zfs_read+0x19c
> > genunix`fop_read+0x20
> > genunix`read+0x274
> > unix`syscall_trap32+0xcc
> > 135480
> >
>
> Ah! This is very telling since zfs_read() only calls mappedread() if
> it thinks someone also has the file mmapped. ZFS and mmap(2) do not
> mix well because ZFS has its own cache (the ARC) apart from the
> Solaris page cache. However, mmap(2) currently uses the Solaris page
> cache for mapped ZFS files, so ZFS has to keep the two caches in sync
> (so everyone sees the file in the same state). The crosscall storm is
> the result.
>
> I worked on a partial fix for mapped read performance, but I doubt
> that will help your application much (because it wasn't aimed at
> read() performance of an elsewhere mapped file).
>
> So the big question is: who has the file mapped, and can you stop
> that? Note: utilities like cat(1) and cp(1) use mmap(2) to read data,
> but dd(1) doesn't (e.g. if you had a script that copied the files
> periodically, you could substitute dd(1) and lose the crosscalls).
>
> Does fwd mmap the file itself? (you can use pmap(1) to find out).
>
It appears that fwd is mmapping the files itself. So, as a temporary measure,
would moving the logs to UFS improve things?
I'll be applying the latest recommended patches, but another poster, sistare,
seemed to say that the fix for the ZFS mmap issue (6699438) has not been
released yet, so I'll have to open a case with Sun to get hold of it.
# pmap 24937
24937: /opt/CPmds-R65/customers/XXX/CPsuite-R65/fw1/bin/fwd -n
[snip]
EBB9C000 39544K r--s- dev:256,65541 ino:148301 --- fw.log
EE5DA000 41424K r--s- dev:256,65541 ino:125472 --- 2009-06-29_224738_25.log
[snip]
total 169976K
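While I'm at it, I'll try to put a number on what the mapped-read path costs
the application itself, which should help the case with both the vendor and
Sun. A rough sketch (fd 73 is the fw.log descriptor from the pfiles output
above; pid is the current fwd process):

# dtrace -qn '
    /* time read(2) calls against the fw.log descriptor */
    syscall::read:entry /pid == 24937 && arg0 == 73/ { self->ts = timestamp; }
    syscall::read:return /self->ts/ {
        @["ns per read() of fw.log (fd 73)"] = quantize(timestamp - self->ts);
        self->ts = 0;
    }'

If the per-read latency drops noticeably once the logs are on UFS (or once the
mmap fix is in), that should make the point fairly clearly.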
>
> > libCPLogRepository.so`__1cOCBinaryLogFileNReadLogHeader6kMpnJILogEvent_pnNCLogFileChain__b_+0x280
> > libCPLogRepository.so`__1cOCBinaryLogFileHReadLog6MpnJILogEvent_pnNCLogFileChain__b_+0xc
> > libCPLogRepository.so`__1cICLogFilePReadLogByOffset6MpnJILogEvent_LpL3_i_+0x118
> > libCPLogLuFile.so`__1cKCLuLogFileLgatherChain6Mn0APtReadNavigation_kLrnKCSrvVector4CpnJILogEvent_nDstdTallocator_interface4n0DJallocator4C4__C4____r4_i_+0x3f8
> > libCPLogLuFile.so`__1cKCLuLogFileJdoReadLog6MpnJILogEvent_n0APtReadNavigation_kLipp2pi_i_+0x80
> > libCPLogLuFile.so`0xfc493e9c
> > libfw1.so`__1cNlog_read_next6FpnPICplogLuLogFile_pnJILogEvent_bpp3pi6_3_+0xfc
> > libfwdopsec.so`lea_send_line+0x108
> > libopsec.so`lea_send_50_records+0x68
> > libopsec.so`lea_server_event_handler+0xc8
> > libopsec.so`opsec_fwasync_conn_handler+0x2fc
> > libComUtils.so`0xfe393bc4
> > libComUtils.so`T_event_mainloop_iter+0x4c4
> > libComUtils.so`T_event_mainloop+0x118
> > libfw1.so`fwd_cmain+0x131c
> > libfw1.so`fw_cmain+0x910
> > fwd`_start+0x108
> > 226302
> >
> >
> >> Hope this helps,
> >> Phil
> >>
> >
> > I hope so too. And thanks for the fast response!
> >
> > regards
> >
> > Matthew
> >
>
> Nothing else there, except that taking an xcall during a memcpy()
> might indicate a mapped read. However, xcalls can happen in many
> ways. A PID-predicated stack aggregation of the xcalls might shed a
> little more light ...
>
> dtrace -n 'sysinfo:::xcalls /pid == 2153/ { @[stack()] = count(); }'
>
# dtrace -n 'sysinfo:::xcalls /pid == 24937/ { @[stack()] = count(); }'
unix`send_one_mondo+0x1c
unix`xt_one_unchecked+0xc8
genunix`sleepq_wakeone_chan+0x70
genunix`cv_signal+0x58
genunix`taskq_bucket_dispatch+0x74
genunix`taskq_dispatch+0x13c
genunix`qenable_locked+0xd4
genunix`backenable+0x94
genunix`kstrgetmsg+0xa80
sockfs`sotpi_recvmsg+0x2ac
sockfs`socktpi_read+0x44
genunix`fop_read+0x20
genunix`read+0x274
unix`syscall_trap32+0xcc
23
unix`send_mondo_set+0x314
unix`xt_some+0x1a0
unix`sfmmu_tlb_range_demap+0x1f0
unix`hat_unload_callback+0x850
unix`ppmapout+0x94
zfs`zfs_fillpage+0x19c
zfs`zfs_getpage+0x184
genunix`fop_getpage+0x44
genunix`segvn_fault+0xb04
genunix`as_fault+0x4c8
unix`pagefault+0x68
unix`trap+0xd50
unix`utl0+0x4c
157
unix`send_mondo_set+0x314
unix`xt_some+0x1a0
unix`xt_sync+0x158
unix`hat_unload_callback+0x898
unix`ppmapout+0x94
zfs`zfs_fillpage+0x19c
zfs`zfs_getpage+0x184
genunix`fop_getpage+0x44
genunix`segvn_fault+0xb04
genunix`as_fault+0x4c8
unix`pagefault+0x68
unix`trap+0xd50
unix`utl0+0x4c
157
unix`send_mondo_set+0x314
unix`xt_some+0x1a0
unix`sfmmu_tlb_range_demap+0x1f0
unix`hat_unload_callback+0x850
unix`ppmapout+0x94
zfs`mappedread+0xa4
zfs`zfs_read+0x19c
genunix`fop_read+0x20
genunix`read+0x274
unix`syscall_trap32+0xcc
176138
unix`send_mondo_set+0x314
unix`xt_some+0x1a0
unix`xt_sync+0x158
unix`hat_unload_callback+0x898
unix`ppmapout+0x94
zfs`mappedread+0xa4
zfs`zfs_read+0x19c
genunix`fop_read+0x20
genunix`read+0x274
unix`syscall_trap32+0xcc
176138
>
> Phil
regards
Matthew
--
This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
[email protected]