Another place to start might be with Brendan Gregg's DTrace tools:
http://www.brendangregg.com/dtrace.html
His prustat, hotuser, hotkernel, and shortlived.d scripts might be
helpful in your situation.
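If you pull those scripts down, something like the following during one of
the load spikes is a reasonable first pass (the install path and the pid are
just placeholders for wherever/whatever they are on your boxes):

  cd ~/DTraceToolkit              # or wherever you saved the scripts
  ./prustat                       # top processes by CPU/Mem/Disk/Net
  ./hotkernel                     # sample which kernel functions are on-CPU; Ctrl-C to print
  ./hotuser -p <mailstore-pid>    # same idea for the user-land half of the time
  ./shortlived.d                  # rule out a storm of short-lived processes

For the NFS/dnlc side of your question, counting entries into the nfs kernel
module is a cheap first look (this assumes fbt probes for the nfs module are
available on your build):

  dtrace -n 'fbt:nfs::entry { @[probefunc] = count(); }'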
-j
On Mon, May 21, 2007 at 01:50:27PM -0700, Eric Saxe wrote:
> Jeffrey Collyer wrote:
> >5 identical V440s, Solaris 10, storage on a Netapp via NFS, providing
> >access to a mailstore (so 80% read, 20% write).
> >
> >During the day, randomly a machine will start to climb its load from the
> >baseline of 2-3 up to 50-60. Under heavy loading, I've seen it go up to
> >300. All the time is split almost 50/50 between user and kernel, no idle,
> >nothing in I/O (according to top).
> >
> >I'm suspecting NFS problems, but the Netapp and switch traffic graphs
> >look clean and consistent. Nothing shows network errors, not nfsstat, not
> >the switch ports, not the netapp.
> >
> >And like I mentioned, the problem moves. One day on machine 1, tomorrow
> >on 4, etc. No real pattern.
> >
> >How would I go about trying to discover what the kernel is doing when this
> >is happening? Some of the simple dtrace stuff I've tried has just shown
> >me a lot of lwp_parks (the main app is heavily multithreaded, so that
> >figures).
> >
> >Anyone got any key dtrace probes they look at for NFS or dnlc problems?
> >
> One fairly simple thing to try (to start) would be a "lockstat -I",
> which essentially does some simple kernel profiling.
> In a coarse sense, that should give you an idea as to where (the kernel
> at least) is spending the bulk of its time. You'll want
> to kick that off during one of the load spikes...
>
> Thanks,
> -Eric
_______________________________________________
perf-discuss mailing list
[email protected]