"16-core x86": what is the exact platform? Four quad-core CPUs? Intel or AMD?

-----Original Message-----
From: [email protected]
[mailto:observability-discuss-bounces at opensolaris.org] On Behalf Of Jim
Leonard
Sent: Tuesday, September 22, 2009 12:30 PM
To: observability-discuss at opensolaris.org
Subject: [observability-discuss] How to drill down cause of cross-calls
in the kernel? (output provided)

We have a 16-core x86 system that, at seemingly random intervals, will
completely stop responding for several seconds.  Running "mpstat 1"
showed something horrifying:

CPU minf mjf    xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 1004691  397  170   0    0    0    5   0     0   0 100  0   0
(rest of CPUs omitted)
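Because the mpstat row is easy to misread once its columns lose alignment, here is a small Python sketch (not part of the original report) that pairs each header name with its value for the CPU 0 row above. The function name parse_mpstat_row is hypothetical; the header and row strings are copied verbatim from the output.

```python
# Column headers and the CPU 0 row, copied from the mpstat output above.
HEADER = "CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl"
ROW = "0 0 0 1004691 397 170 0 0 0 5 0 0 0 100 0 0"


def parse_mpstat_row(header: str, row: str) -> dict[str, int]:
    """Pair each mpstat column name with its integer value for one CPU row."""
    names = header.split()
    values = [int(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError(f"expected {len(names)} columns, got {len(values)}")
    return dict(zip(names, values))


stats = parse_mpstat_row(HEADER, ROW)
# Print the fields relevant here: cross-calls and the sys/idle split.
print(f"CPU {stats['CPU']}: xcal={stats['xcal']} sys={stats['sys']}% idl={stats['idl']}%")
```

Read this way, CPU 0 shows 1004691 in the xcal column and 100% sys time with 0% idle.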

That's over a million cross-calls a second.  Seeing them on CPU0 made me
nervous that they were kernel-related, so I wrote a DTrace script that
prints xcalls per second, aggregated by PID, to see whether a specific
process was the culprit.  Here's the output during another random
system outage:

2009 Sep 22 12:51:49, load average: 5.90, 5.35, 5.39   xcalls: 637511

   PID                        XCALLCOUNT
   6164                                15
   6165                                15
   28339                               26
   0                               637455

PID 0 is "sched" (aka the kernel).
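The DTrace script itself is not included in the message. The Python sketch below only models its reporting step, not the D code: it counts cross-calls per PID, prints them sorted by count as in the report above, and computes the fraction attributed to PID 0. The counts are copied from that one-second report, and the helper name report is hypothetical.

```python
from collections import Counter

# Per-PID cross-call counts copied from the one-second report above.
# In the real setup DTrace gathered these; this only models the report.
xcalls_by_pid = Counter({6164: 15, 6165: 15, 28339: 26, 0: 637455})


def report(counts: Counter) -> list[str]:
    """Format per-PID counts smallest-first, matching the report above."""
    lines = [f"{'PID':>6} {'XCALLCOUNT':>33}"]
    for pid, n in sorted(counts.items(), key=lambda kv: (kv[1], kv[0])):
        lines.append(f"{pid:>6} {n:>33}")
    return lines


total = sum(xcalls_by_pid.values())
kernel_share = xcalls_by_pid[0] / total  # fraction of xcalls charged to PID 0
print("\n".join(report(xcalls_by_pid)))
print(f"total={total} pid0_share={kernel_share:.4%}")
```

The per-PID counts sum to 637511, which matches the "xcalls: 637511" figure in the report header. Nearly all of that total is charged to PID 0.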

At this point I'm completely stumped as to what could be causing this.
Any hints or ideas?
--
This message posted from opensolaris.org
_______________________________________________
observability-discuss mailing list
observability-discuss at opensolaris.org
