"16-core x86" - what is the exact platform? 4 quad-cores? Intel or AMD?

-----Original Message-----
From: [email protected] [mailto:observability-discuss-bounces at opensolaris.org] On Behalf Of Jim Leonard
Sent: Tuesday, September 22, 2009 12:30 PM
To: observability-discuss at opensolaris.org
Subject: [observability-discuss] How to drill down cause of cross-calls in the kernel? (output provided)

We have a 16-core x86 system that, at seemingly random intervals, will completely stop responding for several seconds. Running mpstat 1 showed something horrifying:

CPU minf mjf    xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0 1004691  397  170   0    0    0    5   0     0   0 100  0   0
(rest of CPUs omitted)

That's over a million cross-calls a second. Seeing them on CPU0 made me nervous that they were kernel-related, so I wrote a DTrace script to print out xcalls per second aggregated by PID to see if a specific process was the culprit. Here's the output during another random system outage:

2009 Sep 22 12:51:49, load average: 5.90, 5.35, 5.39 xcalls: 637511
   PID XCALLCOUNT
  6164         15
  6165         15
 28339         26
     0     637455

PID 0 is "sched" (aka the kernel). At this point I'm completely stumped as to what could be causing this. Any hints or ideas?
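
For a sense of scale, the per-PID counts quoted above can be turned into shares of the total. This is only an illustrative Python sketch, not something from the original post: the function name xcall_shares and the SAMPLE string are made up for the example, and the numbers are copied from the quoted output.

```python
def xcall_shares(report):
    """Return {pid: fraction of all cross-calls} from a PID/XCALLCOUNT table."""
    counts = {}
    for line in report.splitlines():
        fields = line.split()
        # Keep only rows made of exactly two integers (PID, count); this skips the header.
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            counts[int(fields[0])] = int(fields[1])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pid: n / total for pid, n in counts.items()}


# Per-PID counts copied from the output quoted in the message (hypothetical variable name).
SAMPLE = """\
PID XCALLCOUNT
6164 15
6165 15
28339 26
0 637455
"""

if __name__ == "__main__":
    shares = xcall_shares(SAMPLE)
    # Print the largest contributor first.
    for pid, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"PID {pid:>6}: {share:.4%}")
```

With these figures the three user processes together account for 56 of the 637511 cross-calls, so PID 0 is responsible for essentially all of them, which matches the poster's reading that the source is in the kernel.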
