Re: [dtrace-discuss] Memory leak scripts
Can you please provide a reference for disassembling malloc on Solaris 10?

I am also pursuing the previous suggestion of a Python provider. This one seems to be against Python 2.5: http://blogs.sun.com/binujp/resource/pydtrace/diffs

Thanks,
Fletcher

On 7/1/08 9:48 PM, Sanjeev Bagewadi [EMAIL PROTECTED] wrote:
> Hello Fletcher,
>
> From the error, it looks like dtrace is not able to recognize it as a probe. DTrace needs a signature for the function to be detected as a probe. This is probably missing in the case of malloc. To double-check, you could disassemble malloc and check whether there is a 'push' instruction at the beginning.
>
> Thanks and regards,
> Sanjeev.
>
> Fletcher Cocquyt wrote:
>> Hola,
>>
>> I am trying to isolate the memory leak I suspect in a mailman installation. I found: http://blogs.sun.com/sanjeevb/date/200506
>>
>> It gives an error:
>>
>>   [EMAIL PROTECTED]:~ 9:21am 65 # ./memleak.d 10312
>>   dtrace: failed to compile script ./memleak.d: line 3: probe description pid10312:libc.so.1:malloc:entry does not match any probes
>>
>> I am on SunOS 5.10 Generic_127112-07 i86pc i386 i86pc
>>
>> Are there some better scripts for isolating memory leaks?
>>
>> thanks
>> Fletch.

___
dtrace-discuss mailing list
dtrace-discuss@opensolaris.org
Re: [dtrace-discuss] Memory leak scripts
Looks OK:

  [EMAIL PROTECTED]:~ 7:22am 60 # !nm
  nm /bin/python | egrep malloc
  [3597]  | 134599012|       0|FUNC |GLOB |0    |UNDEF  |malloc
  [690]   |         0|       0|FILE |LOCL |0    |ABS    |obmalloc.c

On 7/2/08 6:11 AM, rickey c weisner [EMAIL PROTECTED] wrote:
> Fletcher,
>
> First confirm that malloc is in your binary.
>
>   arwen:nm a.out | grep malloc
>   [70]    | 134547228|       0|FUNC |GLOB |0    |UNDEF  |malloc
>
> Then key on any malloc. Something like:
>
>   pid$target::malloc:return,
>   pid$target::memalign:return,
>   pid$target::realloc:return,
>   pid$target::valloc:return
>
> rick

--
Fletcher Cocquyt
Senior Systems Administrator
Information Resources and Technology (IRT)
Stanford University School of Medicine
Email: [EMAIL PROTECTED]
Phone: (650) 724-7485
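To make the nm check above concrete, here is a minimal Python sketch that splits Solaris-style nm lines into fields and lists the functions whose section index is UNDEF, meaning the binary references them but they are defined elsewhere and resolved at run time. The helper names (`parse_nm_line`, `undefined_functions`) are hypothetical, and the field order (index, value, size, type, bind, other, shndx, name) is assumed from the output quoted in this message.

```python
def parse_nm_line(line):
    """Split one '|'-separated nm line into named fields, or return None."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != 8:
        return None  # not a symbol row (blank line, header, and so on)
    _index, value, _size, sym_type, bind, _other, shndx, name = fields
    return {"name": name, "type": sym_type, "bind": bind,
            "shndx": shndx, "value": int(value)}


def undefined_functions(nm_output):
    """Return names of FUNC symbols that the binary imports (shndx == UNDEF)."""
    names = []
    for line in nm_output.splitlines():
        sym = parse_nm_line(line)
        if sym and sym["type"] == "FUNC" and sym["shndx"] == "UNDEF":
            names.append(sym["name"])
    return names


# The two rows quoted in the message above.
SAMPLE = """\
[3597] | 134599012| 0|FUNC |GLOB |0|UNDEF |malloc
[690] | 0| 0|FILE |LOCL |0|ABS|obmalloc.c
"""

if __name__ == "__main__":
    print(undefined_functions(SAMPLE))
```

The obmalloc.c row is a FILE entry, not a function, so it is filtered out; it only records that a source file of that name was compiled into the binary.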
Re: [dtrace-discuss] Memory leak scripts
Sanjeev,

I get this with your new version:

  [EMAIL PROTECTED]:~ 7:26am 64 # ./memleak2.d 7560
  dtrace: failed to compile script ./memleak2.d: line 3: probe description pid7560:libc.so.1:malloc:0 does not match any probes
  [EMAIL PROTECTED]:~ 7:27am 65 # ps -ef | grep 7560
   mailman  7560   718   0 06:24:11 ?   0:05 /bin/python /opt/mailman-2.1.9/bin/qrunner --runner=BounceRunner:0:1 -s

thanks

On 7/2/08 4:29 AM, Sanjeev Bagewadi [EMAIL PROTECTED] wrote:
> Fletcher,
>
> Mark Durney hit a similar problem. While I was working with him, a colleague I was talking to pointed out that we could use function:offset notation when using the pid provider. So I have modified the script to enable the first instruction of malloc. Attached is the script. Please try it out and let me know if it works. If it does, I shall update my blog to reflect it.
>
> NOTE: If more functions fail (for :entry), please replace entry with 0.
>
> Thanks and regards,
> Sanjeev.
>
> Sanjeev Bagewadi wrote:
>> Fletcher,
>>
>> You could attach mdb to the running process and disassemble the routine in question:
>>
>> -- snip --
>> # mdb -p pid
>> malloc::dis
>> libc.so.1`malloc:       pushl  %ebp
>> libc.so.1`malloc+1:     movl   %esp,%ebp
>> libc.so.1`malloc+3:     pushl  %ebx
>> libc.so.1`malloc+4:     pushl  %esi
>> libc.so.1`malloc+5:     pushl  %edi
>> libc.so.1`malloc+6:     call   +0x5     <libc.so.1`malloc+0xb>
>> libc.so.1`malloc+0xb:   popl   %ebx
>> libc.so.1`malloc+0xc:   addl   $0x88fe1,%ebx
>> -- snip --
>>
>> So, in my case notice that the first instruction is pushl.
>>
>> Thanks and regards,
>> Sanjeev.
Re: [dtrace-discuss] Memory leak scripts
  +0x156
  python`Py_Main+0xa6b
  python`main+0x17
  python`_start+0x80

  0  42249  free:entry Ptr=0x86d78f0

On 7/2/08 7:46 AM, rickey c weisner [EMAIL PROTECTED] wrote:
> Fletcher,
>
> This looks suspicious. Perhaps your malloc is not in libc?
>
>   [690]   |         0|       0|FILE |LOCL |0    |ABS    |obmalloc.c
>
> Remove the libc.so.1 from your probe description.
>
> rick
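Rick's suggestion relies on how DTrace probe descriptions are matched: a description has the form provider:module:function:name, each field may use shell-style globbing, and an empty field matches anything. The Python sketch below only imitates that matching rule; it handles the full four-field form only, and the probe tuple used in it (including the module name libexample.so.1) is purely hypothetical, not taken from Fletcher's system.

```python
from fnmatch import fnmatchcase


def parse_probe(description):
    """Split 'provider:module:function:name' into its four fields."""
    parts = description.split(":")
    if len(parts) != 4:
        raise ValueError("expected four colon-separated fields: %r" % description)
    return tuple(parts)


def matches(description, probe):
    """True if every field of the description is empty or globs the probe field."""
    patterns = parse_probe(description)
    return all(pat == "" or fnmatchcase(field, pat)
               for pat, field in zip(patterns, probe))


# Hypothetical probe: malloc living in some module other than libc.so.1.
PROBE = ("pid7560", "libexample.so.1", "malloc", "entry")

if __name__ == "__main__":
    # Naming the wrong module finds nothing; leaving the module empty matches.
    print(matches("pid7560:libc.so.1:malloc:entry", PROBE))
    print(matches("pid7560::malloc:entry", PROBE))
```

This is why dropping libc.so.1 is a cheap experiment: the description then matches malloc in whichever module actually provides it.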
Re: [dtrace-discuss] Memory leak scripts - analysis
Ok, maybe this is significant in explaining why my python (mailman) processes seem to grow abnormally? If the libc malloc is not being called, why not, and is that an important issue?

  [EMAIL PROTECTED]:~ 2:17pm 54 # ldd /bin/python
  libresolv.so.2 =>  /lib/libresolv.so.2
  libsocket.so.1 =>  /lib/libsocket.so.1
  libnsl.so.1 =>     /lib/libnsl.so.1
  librt.so.1 =>      /lib/librt.so.1
  libdl.so.1 =>      /lib/libdl.so.1
  libm.so.2 =>       /lib/libm.so.2
  libc.so.1 =>       /lib/libc.so.1
  libmp.so.2 =>      /lib/libmp.so.2
  libmd.so.1 =>      /lib/libmd.so.1
  libscf.so.1 =>     /lib/libscf.so.1
  libaio.so.1 =>     /lib/libaio.so.1
  libdoor.so.1 =>    /lib/libdoor.so.1
  libuutil.so.1 =>   /lib/libuutil.so.1
  libgen.so.1 =>     /lib/libgen.so.1

This is Python 2.5.2, built with no configure options besides --prefix on Solaris 10. Config.log excerpt:

  configure:14433: checking for --with-pymalloc
  configure:14453: result: yes

thanks
Re: [dtrace-discuss] Memory leak scripts - analysis
                          -   4K rwx--  [ anon ]
  CFBB          4     4     4     -   4K rwx--  [ anon ]
  CFBC          4     4     -     -   4K r--s-  dev:61,0 ino:47763
  CFBC4000    156   156     -     -   4K r-x--  ld.so.1
  CFBFB000      4     4     4     -   4K rwx--  ld.so.1
  CFBFC000      8     8     8     -   4K rwx--  ld.so.1
  --- --- --- ---
  total Kb  28724 28044 23440     -

On 7/2/08 2:58 PM, rickey c weisner [EMAIL PROTECTED] wrote:
> Fletcher,
>
> Whether libc malloc is being called or not only had to do with the naming of your probe. Looking at your configure options:
>
>   --with-pymalloc
>
> This implies to me that python has its own malloc.
>
> I do not recall, but why do you think you have a memory leak? Just because the process grows over time and does not diminish in size does not necessarily mean a memory leak. How do you measure the size of your process, and are you examining the virtual size or the RSS? The virtual size only grows upward, but I would expect it to eventually stabilize. RSS will go up and down over time. I would be more concerned with RSS than virtual size, except for the possibility of exceeding a 4 GB address space for 32-bit applications.
>
> pmap -xs would be interesting.
>
> rick
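Rick's point about watching RSS rather than virtual size can be turned into a simple measurement: capture pmap -x output at intervals and compare the totals. The Python sketch below pulls the three numbers out of the "total Kb" line; it assumes the column order Kbytes, RSS, Anon as in the pmap output quoted in this message, and the function names are hypothetical.

```python
import re

# Matches the summary row, e.g. "total Kb 28724 28044 23440 -".
TOTAL_RE = re.compile(r"total Kb\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_pmap_total(text):
    """Return the summary row of pmap -x output as a dict, or None if absent."""
    match = TOTAL_RE.search(text)
    if match is None:
        return None
    kbytes, rss, anon = (int(g) for g in match.groups())
    return {"kbytes": kbytes, "rss": rss, "anon": anon}


def rss_series(snapshots):
    """Map a list of pmap outputs (oldest first) to their RSS totals in Kb."""
    totals = (parse_pmap_total(s) for s in snapshots)
    return [t["rss"] for t in totals if t is not None]


if __name__ == "__main__":
    # Summary row taken from the message above.
    print(parse_pmap_total("total Kb 28724 28044 23440 -"))
```

An RSS series that keeps climbing across many snapshots is a stronger hint of a leak than a virtual size that grew once and then levelled off.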
[dtrace-discuss] Memory leak scripts
Hola,

I am trying to isolate the memory leak I suspect in a mailman installation. I found: http://blogs.sun.com/sanjeevb/date/200506

It gives an error:

  [EMAIL PROTECTED]:~ 9:21am 65 # ./memleak.d 10312
  dtrace: failed to compile script ./memleak.d: line 3: probe description pid10312:libc.so.1:malloc:entry does not match any probes

I am on SunOS 5.10 Generic_127112-07 i86pc i386 i86pc

Are there some better scripts for isolating memory leaks?

thanks
Fletch.
Re: [dtrace-discuss] Memory leak scripts
Yes:

  [EMAIL PROTECTED]:~ 10:02am 52 # ps -ef | grep 10312
   mailman 10312 22726   0 09:13:19 ?       0:05 /bin/python /opt/mailman-2.1.9/bin/qrunner --runner=VirginRunner:0:1 -s

This is the error for no such process:

  [EMAIL PROTECTED]:~ 10:04am 53 # ./memleak.d 666
  dtrace: failed to compile script ./memleak.d: line 3: failed to grab process 666
  [EMAIL PROTECTED]:~ 10:04am 54 # ps -ef | grep 666
      root 20386 19893   0 10:04:49 pts/1   0:00 grep 666
  [EMAIL PROTECTED]:~ 10:04am 55 #

I'm hoping there is a fresher script than this 3-year-old one, which I found via the top Google hit for: dtrace script for memory leak

The second and third hits are now this thread - gah!

I know memory leaks are a non-trivial problem, but the rate of this one is so egregious as to require twice-daily restarts of mailman. I like the logic behind checking the alloc/free calls and matching them up...

Any tips appreciated.

Thanks,
Fletcher

On 7/1/08 9:41 AM, Michael Schuster [EMAIL PROTECTED] wrote:
> Fletcher Cocquyt wrote:
>>   [EMAIL PROTECTED]:~ 9:21am 65 # ./memleak.d 10312
>>   dtrace: failed to compile script ./memleak.d: line 3: probe description pid10312:libc.so.1:malloc:entry does not match any probes
>
> this begs the question: is there a process with pid 10312?
>
> Michael
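The "match up the alloc and free calls" logic Fletcher mentions can be sketched independently of DTrace. The Python below is not the memleak.d script; it is a minimal illustration that assumes a hypothetical event format, ("malloc", address, size) and ("free", address), and reports whatever was allocated but never freed.

```python
def outstanding_allocations(events):
    """Replay allocation events and return {address: size} still live at the end."""
    live = {}
    for event in events:
        kind = event[0]
        if kind == "malloc":
            _, address, size = event
            live[address] = size          # remember the block until it is freed
        elif kind == "free":
            live.pop(event[1], None)      # ignore frees of unknown addresses
    return live


# Hypothetical trace: two blocks allocated, only one of them freed.
EVENTS = [
    ("malloc", 0x86d78f0, 64),
    ("malloc", 0x86d7940, 128),
    ("free", 0x86d78f0),
]

if __name__ == "__main__":
    for address, size in outstanding_allocations(EVENTS).items():
        print(hex(address), size)
```

Blocks still listed after a long run are only leak candidates: long-lived caches and pools look the same, so the call stacks recorded at allocation time are what narrow it down.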
Re: [dtrace-discuss] tcptop error: failed to resolve SS_TCP_FAST_ACCEPT: Unknown variable name
I wanted to post a closing message for this thread. I resolved the system contention on this Solaris VM, although it was not by way of DTrace. It turns out the VMware settings in the vmx file for this Solaris VM were not optimal:

  memsize = 2048        (old file)
  sched.mem.max = 256   (old file)

(If sched.mem.max is smaller than memsize, the balloon driver can start consuming memory, especially if the guest operating system's application has peaky memory usage. However, this setting can cause the balloon driver to retain its hold on memory continuously, even if the guest operating system requires it again. This causes the guest operating system to start swapping, and it will slow down considerably.)

Now I recognize that the vmware-memctld process consuming so much CPU was a red flag for this. Once the two settings were brought into line (by using VC and checking "Memory resources unlimited"), the VM functioned 100x better (responsiveness, workload throughput, etc.)

Thanks

On 1/21/08 2:16 PM, Brendan Gregg - Sun Microsystems [EMAIL PROTECTED] wrote:
> On Mon, Jan 21, 2008 at 01:55:36PM -0800, Fletcher Cocquyt wrote:
>> Followup - this system has a lot of kernel activity and I/O (top typically shows CPU 50% kernel), but the hotkernel blorked with this (even though load avg was only ~2 and command line is responsive):
>>
>>   [EMAIL PROTECTED]:~ 1:41pm 114 # ./hotkernel
>>   Sampling... Hit Ctrl-C to end.
>>   dtrace: processing aborted: Abort due to systemic unresponsiveness
>
> The system is so busy DTrace has decided to play it safe and abort...
>
> Based on a few hunches, try these:
>
> - interstat 1
>     look for a network driver burning CPU
> - pidpersec.d from the DTraceToolkit (or sar -c 1 100 if DTrace won't behave)
>     look for lots of short lived processes
> - procsystime -coT from the DTraceToolkit
>     look for frequent syscalls burning CPU time
> - dtrace -n 'profile-101 { @[stack(5)] = count(); }'
>     (this has a slower profile rate than hotuser)
>     look for hot kernel stacks
>
> Brendan
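The vmx mismatch described in the closing message is easy to check mechanically. The Python sketch below reads key = value lines and flags the case where sched.mem.max is smaller than memsize; only those two keys come from the message, and the parser and function names are hypothetical and deliberately simple (quoted or unquoted values, no other vmx syntax).

```python
def parse_vmx(text):
    """Parse simple 'key = value' lines into a dict, stripping optional quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def memory_cap_below_memsize(settings):
    """True when sched.mem.max is set lower than memsize (both in MB)."""
    memsize = settings.get("memsize")
    cap = settings.get("sched.mem.max")
    if memsize is None or cap is None:
        return False  # nothing to compare
    return int(cap) < int(memsize)


# Values from the "old file" quoted in the message above.
OLD_VMX = """\
memsize = 2048
sched.mem.max = 256
"""

if __name__ == "__main__":
    print(memory_cap_below_memsize(parse_vmx(OLD_VMX)))
```

A check like this would have pointed at the balloon driver before any tracing was needed.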
Re: [dtrace-discuss] where is a working tcptop for Solaris 10 8/07 s10x_u4wos_12b X86?
This is my version:

  10:46am 61 more /etc/release
  Solaris 10 8/07 s10x_u4wos_12b X86
  Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
  Use is subject to license terms.
  Assembled 16 August 2007

Did you say what version of Solaris you are on?

Thanks
[dtrace-discuss] where is a working tcptop?
Hi,

I tried both:

  http://www.nbl.fi/~nbl97/solaris/dtrace/099html/Net/tcptop_snv.html

and

  http://www.nbl.fi/~nbl97/solaris/dtrace/099html/Net/tcptop.html

They both give this error:

  ./tcptop_nevada
  ./tcptop_nevada[80]: syntax error at line 86 : `' unmatched

Is there a central dtrace repository under SVN revision control?

Thanks

-Original Message-
From: Brendan Gregg - Sun Microsystems [mailto:[EMAIL PROTECTED]
Sent: Monday, January 21, 2008 2:25 PM
To: Fletcher Cocquyt
Cc: [EMAIL PROTECTED]; dtrace-discuss@opensolaris.org
Subject: Re: [dtrace-discuss] tcptop error: failed to resolve SS_TCP_FAST_ACCEPT: Unknown variable name

On Mon, Jan 21, 2008 at 02:17:46PM -0800, Fletcher Cocquyt wrote:
> Replaced SS_TCP_FAST_ACCEPT with SS_DIRECT in tcptop per the thread you cited - now I get a new error:
>
>   [EMAIL PROTECTED]:~ 2:14pm 133 # ./tcptop
>   dtrace: failed to compile script /dev/fd/11: line 163: failed to resolve `tcp_g_q: Unknown symbol name
>
> I got it from here: http://www.brendangregg.com/DTrace/tcptop
> Is that not up to date?

Sorry about that - I've kept the DTraceToolkit bundle up to date, but not individual copies of those scripts in other locations. I'll either update that copy, or link it to the DTraceToolkit bundle when I get a chance.

Stefan Parvu has an up-to-date, HTML-browsable version of the toolkit here:

  http://www.nbl.fi/~nbl97/solaris/dtrace/dtt_testing.html

Click on 0.99.

Brendan

--
Brendan
[CA, USA]
Re: [dtrace-discuss] where is a working tcptop?
Same error with the '-latest':

  [EMAIL PROTECTED]:~ 1:58pm 55 # DTraceToolkit-0.99/Net/tcptop
  dtrace: failed to compile script /dev/fd/11: line 166: failed to resolve `tcp_g_q: Unknown symbol name

I'd like to get this working, as network captures are showing retransmits...

Thanks

-Original Message-
From: Brendan Gregg - Sun Microsystems [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 24, 2008 11:37 AM
To: Fletcher Cocquyt
Cc: [EMAIL PROTECTED]; dtrace-discuss@opensolaris.org
Subject: Re: where is a working tcptop?

G'Day Fletcher,

On Thu, Jan 24, 2008 at 10:40:43AM -0800, Fletcher Cocquyt wrote:
> they both give this error:
>
>   ./tcptop_nevada
>   ./tcptop_nevada[80]: syntax error at line 86 : `' unmatched

Hmm, sounds like an HTML-izing bug. The latest version should always be here:

  http://www.brendangregg.com/DTraceToolkit-latest.tar.gz

Brendan

--
Brendan
[CA, USA]
Re: [dtrace-discuss] where is a working tcptop?
Different error with that one:

  [EMAIL PROTECTED]:~ 2:49pm 53 # DTraceToolkit-0.99/Net/tcptop_snv
  dtrace: failed to compile script /dev/fd/11: line 198: probe description fbt:ip:tcp_xchg:entry does not match any probes

Retransmit rate is low (9/3), but the fact that I'm seeing any warrants further analysis.

Thanks

-Original Message-
From: Brendan Gregg - Sun Microsystems [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 24, 2008 2:48 PM
To: Fletcher Cocquyt
Cc: [EMAIL PROTECTED]; dtrace-discuss@opensolaris.org
Subject: Re: where is a working tcptop?

G'Day Fletcher,

On Thu, Jan 24, 2008 at 02:39:19PM -0800, Fletcher Cocquyt wrote:
> Same error with the '-latest':
>
>   [EMAIL PROTECTED]:~ 1:58pm 55 # DTraceToolkit-0.99/Net/tcptop
>   dtrace: failed to compile script /dev/fd/11: line 166: failed to resolve `tcp_g_q: Unknown symbol name

That's not quite the latest:

  DTraceToolkit-0.99 MANPATH=Man man tcptop
  ...
  OS
      Solaris 10 3/05

  DTraceToolkit-0.99 MANPATH=Man man tcptop_snv
  ...
  OS
      Solaris Nevada / OpenSolaris, circa late 2007

Try tcptop_snv. I put the OS field in the man pages for this rev, not only to point out Solaris version support, but also for MacOS X and other OSes with DTrace.

> I'd like to get this working as network captures are showing retransmits... Thanks

This won't help directly with retransmits. What is your retransmit ratio?

Brendan

--
Brendan
[CA, USA]
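Brendan's question about the retransmit ratio can be answered from TCP counters rather than packet captures. The Python sketch below assumes counters printed as name = value pairs and uses the names tcpRetransSegs and tcpOutDataSegs as they are commonly shown by netstat -s on Solaris; treat both the names and the sample numbers as assumptions, since no counter output appears in the thread.

```python
import re

# Finds "name = 123" pairs, tolerating missing spaces around "=".
COUNTER_RE = re.compile(r"(\w+)\s*=\s*(\d+)")


def parse_counters(text):
    """Return {counter_name: int_value} for every 'name = value' pair found."""
    return {name: int(value) for name, value in COUNTER_RE.findall(text)}


def retransmit_ratio(counters):
    """Retransmitted data segments divided by data segments sent."""
    sent = counters.get("tcpOutDataSegs", 0)
    if sent == 0:
        return 0.0  # avoid dividing by zero on an idle system
    return counters.get("tcpRetransSegs", 0) / sent


# Hypothetical counter values, for illustration only.
SAMPLE = """\
tcpOutDataSegs      =  1000     tcpOutDataBytes     =500000
tcpRetransSegs      =     5     tcpRetransBytes     =  2500
"""

if __name__ == "__main__":
    print("%.2f%%" % (100 * retransmit_ratio(parse_counters(SAMPLE))))
```

Because these counters are cumulative, taking the difference between two readings over a known interval gives the ratio for that interval rather than since boot.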
[dtrace-discuss] tcptop error: failed to resolve SS_TCP_FAST_ACCEPT: Unknown variable name
Hi,

I am trying to debug the bottleneck(s) in a Solaris 10 Mailman/Spamassassin/Sendmail VMware VM and get the following error from tcptop:

  [EMAIL PROTECTED]:~ 1:35pm 103 # ./tcptop
  dtrace: failed to compile script /dev/fd/11: line 40: failed to resolve SS_TCP_FAST_ACCEPT: Unknown variable name

thanks for any insight,
Fletcher.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Colin Burgess
Sent: Friday, January 18, 2008 1:33 PM
To: dtrace-discuss@opensolaris.org
Subject: [dtrace-discuss] LatencyTop

I see Intel has released a new tool. Oh, it requires some patches to the kernel to record latency times. Good thing people don't mind patching their kernels, eh?

So who can write the equivalent latencytop.d the fastest? ;-)

http://www.latencytop.org/

--
[EMAIL PROTECTED]
Re: [dtrace-discuss] tcptop error: failed to resolve SS_TCP_FAST_ACCEPT: Unknown variable name
Followup - this system has a lot of kernel activity and I/O (top typically shows CPU 50% kernel), but the hotkernel blorked with this (even though load avg was only ~2 and command line is responsive):

  [EMAIL PROTECTED]:~ 1:41pm 114 # ./hotkernel
  Sampling... Hit Ctrl-C to end.
  dtrace: processing aborted: Abort due to systemic unresponsiveness

  FUNCTION                                    COUNT   PCNT

I'm working my way down the toolkit list. Any help on pinpointing the bottlenecks with the appropriate first-pass tools is appreciated.

Here is some iotop output. Nothing surprising here: sendmail, spamd and mailman (python) are generating I/O:

  2008 Jan 21 13:49:54,  load: 1.35,  disk_r: 32 KB,  disk_w: 2424 KB

    UID    PID   PPID CMD        DEVICE  MAJ MIN D    BYTES
      0  13413  13412 sendmail   sd0      61   0 W     2048
      0  13411  13406 sendmail   sd0      61   0 W     4096
      0  13409  13370 sendmail   sd0      61   0 W     5120
      0      3      0 fsflush    sd0      61   0 W     8192
      0  13420      1 sendmail   sd0      61   0 W    22528
    555   3809   3140 spamd      sd0      61   0 R    32768
      0  13419    496 sendmail   sd0      61   0 W    41984
      0  13412    496 sendmail   sd0      61   0 W    44032
      0  13370    496 sendmail   sd0      61   0 W    50688
      0  13413      1 sendmail   sd0      61   0 W    51712
      0  13406    496 sendmail   sd0      61   0 W    71680
      0  13414    496 sendmail   sd0      61   0 W    96256
     35  24406  24400 python2.4  sd0      61   0 W   172032
      0      0      0 sched      sd0      61   0 W   318464
    555   3809   3140 spamd      sd0      61   0 W   405504
     35  24409  24400 python2.4  sd0      61   0 W  1006592

Ideally I'd like to know what the fixable (tunable) bottlenecks are on a system that otherwise has plenty of CPU and memory available.

Thanks
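To see which program dominates the disk traffic in an iotop snapshot like the one above, the per-process rows can be summed by command name. The Python sketch below assumes whitespace-separated rows in the order UID PID PPID CMD DEVICE MAJ MIN D BYTES; the function name is hypothetical, and the sample rows are a subset copied from the snapshot.

```python
from collections import defaultdict


def bytes_by_command(rows):
    """Sum the BYTES column per CMD and return (cmd, total) pairs, largest first."""
    totals = defaultdict(int)
    for row in rows:
        fields = row.split()
        if len(fields) != 9 or not fields[0].isdigit():
            continue  # skip the header, blank lines, and the summary line
        cmd, nbytes = fields[3], int(fields[8])
        totals[cmd] += nbytes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# A few rows from the snapshot above (not the full listing).
ROWS = [
    "UID PID PPID CMD DEVICE MAJ MIN D BYTES",
    "0 13413 13412 sendmail sd0 61 0 W 2048",
    "555 3809 3140 spamd sd0 61 0 R 32768",
    "35 24406 24400 python2.4 sd0 61 0 W 172032",
    "555 3809 3140 spamd sd0 61 0 W 405504",
    "35 24409 24400 python2.4 sd0 61 0 W 1006592",
]

if __name__ == "__main__":
    for cmd, total in bytes_by_command(ROWS):
        print("%-12s %10d" % (cmd, total))
```

Reads and writes are lumped together here; splitting on the D column would be a small change if the direction matters.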
Re: [dtrace-discuss] tcptop error: failed to resolve SS_TCP_FAST_ACCEPT: Unknown variable name
Thanks - prustat works great once I point it at Sun's perl (I was using a newer install). I'm going to record some snapshots when the contention is happening...

What if I wanted to quantify the latency (wait times) due to DNS lookups? I suspect I could benefit from a local caching install, but I want a "before" picture so I can show how much better it is using a local DNS cache...

Thanks,
Fletcher

-Original Message-
From: Brendan Gregg - Sun Microsystems [mailto:[EMAIL PROTECTED]
Sent: Monday, January 21, 2008 3:38 PM
To: Fletcher Cocquyt
Cc: dtrace-discuss@opensolaris.org
Subject: Re: [dtrace-discuss] tcptop error: failed to resolve SS_TCP_FAST_ACCEPT: Unknown variable name

On Mon, Jan 21, 2008 at 02:48:47PM -0800, Fletcher Cocquyt wrote:
> Forgive me, where do I find 'interstat'?
> Also, where can I get Sun::Solaris::Kstat for prustat?

It's probably already under /usr/perl5/5.8.4/lib - it's a vendor (Sun) supplied package.

prustat was written as a demo tool - it might be useful, but it will probably fail due to a kernel change. I wrote it when I was a customer to make a point to Sun that this is the sort of tool that customers would like. It turns out that supporting this tool for customers would require stable network providers for DTrace, a project that is still in progress.

I never put prustat into the DTraceToolkit because it wasn't stable enough, despite it providing key resource utilisations by process (which is wonderful, and made possible by DTrace). If anyone hasn't seen it, it looks like this:

  # prustat -ct 20 5
    PID   %CPU   %Mem  %Disk   %Net  COMM
  22301  78.84   3.16   0.00   0.00  setiathome
  22635   4.09   0.20  69.11   0.00  tar
    440   2.76  45.39   0.00   0.00  Xsun
   2618   0.31  14.34   0.00   0.00  mozilla-bin
  22640   3.87   1.49   0.12   0.00  dtrace
    582   2.04   2.16   0.00   0.00  gnome-terminal
    576   0.02   2.80   0.00   0.00  nautilus
   2299   0.33   1.99   0.00   0.00  acroread
  22641   0.00   0.00   1.84   0.00  upsmonitor
    578   0.37   1.46   0.00   0.00  gnome-panel
    574   0.41   1.31   0.00   0.00  metacity
   6504   0.00   1.23   0.00   0.00  nautilus-throbb
    593   0.04   1.05   0.00   0.00  mixer_applet2
    556   0.00   1.05   0.00   0.00  gconfd-2
    549   0.00   0.94   0.00   0.00  gnome-session
   6510   0.00   0.93   0.00   0.00  nautilus-text-v
    591   0.02   0.83   0.00   0.00  galf-server
  21551   0.00   0.56   0.00   0.00  dtterm
   4789   0.10   0.45   0.00   0.00  vncviewer
    553   0.00   0.43   0.00   0.00  gnome-volcheck

The screen updates like William LeFebvre's top.

Let me stress again - prustat was written to demonstrate an idea, but is currently unstable as a tool.

Brendan

--
Brendan
[CA, USA]
Re: [dtrace-discuss] DTraceTools Update
Re: not enough test servers

Can't DTrace testing and development be done on virtual machines? Doesn't DTrace behave the same on a Solaris 10 virtual machine (e.g. VMware's free server)? And yes, as far as I know there is not currently a way to create a SPARC VM, but x86-based OSes are well represented. I'm keen to test out VMware Lab Manager, which purports to be the solution for rapid deployment of whole sets of test systems.

Thanks for continuing the development of DTrace.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brendan Gregg - Sun Microsystems
Sent: Wednesday, September 05, 2007 6:04 PM
To: Gary Gendel
Cc: dtrace-discuss@opensolaris.org
Subject: Re: [dtrace-discuss] DTraceTools Update

G'Day Folks,

> Plans to update the DTraceTools (DTraceToolkit)?

yes. Development has been happening, but I haven't wanted to upload a new version without addressing the tcp* scripts first, somehow. They have exposed an issue with versioning of unstable scripts and supported OSes, which I'll use this thread to discuss at some length for anyone interested.

...

Firstly, most of the DTraceToolkit *is* up to date with the latest snv builds, since most of the DTraceToolkit uses stable DTrace providers (as it should). Some stable providers are not yet available, and until then we are in an awkward place -- people on older (and newer) builds may find some of the fbt based scripts don't work.

I'm currently thinking that it would be practical to only support the following:

    Solaris 10 3/05
    OpenSolaris latest build
    MacOS X Leopard
    [insert OSes here after DTrace is ported]

I've already made changes to the man pages to show which operating systems each script will run on.

This means that the tcp* scripts need updating to support the latest OpenSolaris builds (and updating, and updating, as things keep changing). Of course, life will be somewhat easier when stable networking providers exist, and the tcp* scripts can use their probes (although, I'm expecting tcpsnoop and tcptop to need more than just the network providers to become stable).

Several people have asked about the tcp* scripts on Solaris 10 6/06 and other Solaris builds (builds in between 3/05 and the latest OpenSolaris). I've wanted to have a go at fixing these scripts for these minor releases - but since moving to the US I've found it harder to re-acquire a pile of test servers to support them (SPARC and x86 servers for every Solaris 10 release == a lot of servers, space and electricity). The desire is there, but the servers are not; not to mention that it will probably eat up a lot of my spare time to port these.

Now, if I or someone else do eventually port the tcp* scripts, that then presents a versioning issue in the DTraceToolkit, and I'd prefer not to have fat ugly scripts in a THIS VERSION, THAT VERSION style as Nathan has mentioned. I'm thinking the way ahead would be a Versions directory with entire ported copies of the script, e.g.:

    /opt/DTT# ls -1 Net/tcp* Net/Versions/tcp*
    Net/tcpsnoop
    Net/tcpsnoop.d
    Net/tcpstat.d
    Net/tcptop
    Net/tcpwdist.d
    Net/Versions/tcpsnoop.sol10u2.d
    Net/Versions/tcpsnoop.sol10u3.d
    Net/Versions/tcpsnoop.sol10u3
    Net/Versions/tcptop.sol10u3

Remember, there won't be many scripts in these Versions directories, just those *fbt* based scripts that have broken, so it won't be that common to need to poke around there.

However, what happens if I have a *stable* provider based script, and want to enhance it to use newer DTrace features (like multiple aggregations)? I would end up with two or more versions, one for Solaris 10 3/05 (without the enhancements), one for the latest OpenSolaris (with the enhancements), and possibly another for MacOS X (with whatever they support so far), and maybe another for Linux (when they port DTrace :-).

Would some of these get moved to the Versions directory (forcing people to frequently look in there)? Do I write a wrapper for every script, isaexec style? Do I deal with it in the installer script, symlinking the correct version based on your OS? Do I have ugly ifdef THIS VERSION statements throughout the scripts?...

I don't know what to do yet, but it won't be long before I'll need an answer (I do want to start using some of the new DTrace features, as well as supporting those on Solaris 10 3/05). Ideas?

Until I know of a sensible way to do it, I could add scripts with an x after their name - for extended (and rename them when we think of something better). E.g.:

    hotuser     # Solaris 10 3/05 (uses ustack() + perl)
    hotuserx    # Latest OpenSolaris (uses ufunc() and umod())

hotuser would be the most glaring example, since the code will become trivial if I can use ufunc() and umod() instead. I don't know which version MacOS X would run (need to check if it has ufunc() and umod())...

I should stress that this issue is only for