Bug#813764: linux-source-3.16: "Dazed and confused, but trying to continue" on X10SDV-TLN4F while using perf top

2017-05-04 Thread Daniel Bakken
I can reproduce this bug on Intel Broadwell processors by running "perf top"
alongside another CPU-intensive process such as "stress -c 8". Within 5 minutes,
the kernel reports an unexpected NMI:


[2005170.748842] Uhhuh. NMI received for unknown reason 01 on CPU 70.
[2005170.748882] Do you have a strange power saving mode enabled?
[2005170.748900] Dazed and confused, but trying to continue
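
When trying to reproduce this, it may help to pull the reason code and CPU number out of saved kernel log output so occurrences can be compared across machines. The following is a minimal sketch in Python, not part of the original report. The names `NMI_LINE`, `UnknownNmi` and `parse_unknown_nmi` are hypothetical, and the sketch assumes the reason code is printed as two hexadecimal digits.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as:
# [2005170.748842] Uhhuh. NMI received for unknown reason 01 on CPU 70.
# Assumption: the reason code is a two-digit hexadecimal value.
NMI_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Uhhuh\. NMI received for unknown reason "
    r"(?P<reason>[0-9a-fA-F]{2}) on CPU (?P<cpu>\d+)\."
)


class UnknownNmi(NamedTuple):
    timestamp: float  # seconds since boot, as printed in the log prefix
    reason: int       # reason code, parsed as hexadecimal
    cpu: int          # CPU number that reported the NMI


def parse_unknown_nmi(line: str) -> Optional[UnknownNmi]:
    """Return the parsed fields, or None if the line is not an unknown-NMI report."""
    match = NMI_LINE.search(line)
    if match is None:
        return None
    return UnknownNmi(
        timestamp=float(match["ts"]),
        reason=int(match["reason"], 16),
        cpu=int(match["cpu"]),
    )
```

Feeding it each line of a saved log and keeping the non-None results gives a list of (timestamp, reason, cpu) tuples. Because the pattern is searched rather than anchored, quoted lines prefixed with "> " are matched as well.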


-- Hardware
CPU: Intel Xeon E5-2695 v4
RAM: 512GB DDR4 2400MHz
System: Supermicro B10DRT

-- Operating System
Debian Release: 8.7
Architecture: amd64 (x86_64)
Kernel: Linux 3.16.0-4-amd64
Init: systemd


Bug#813764: linux-source-3.16: "Dazed and confused, but trying to continue" on X10SDV-TLN4F while using perf top

2016-02-05 Thread Ben Hutchings
Control: reassign -1 src:linux 3.16.7-ckt20-1+deb8u3
Control: tag -1 unreproducible moreinfo

On Thu, 2016-02-04 at 23:25 -0500, Rich Ercolani wrote:
> Package: linux-source-3.16
> Version: 3.16.0-4
> Severity: normal
> 
> Dear Maintainer,
> 
> I was going about my business, using perf top to see what I was spending a
> bunch of my time in on this system, when suddenly, I got this written to
> console:
> 
> [2425302.546957] Uhhuh. NMI received for unknown reason 11 on CPU 1.
> [2425302.547625] Do you have a strange power saving mode enabled?
> [2425302.548291] Dazed and confused, but trying to continue
> 
> I've been running this system for several months at this point, and have not
> seen this issue at any point prior.
> 
> I can't easily reproduce it again, so it may have been something other than
> "perf top" that resulted in the behavior, but that's the only unusual thing
> I can think of.
[...]

perf uses the hardware performance monitoring units (PMUs) which do
trigger NMIs periodically.  Obviously the kernel should expect and
recognise these as related to performance monitoring, but there could
be a bug in that code.

I'm tagging this report as needing more information as we won't be able
to make any progress unless you can find a way to reproduce the bug.

Ben.

-- 
Ben Hutchings
It is a miracle that curiosity survives formal education. - Albert Einstein



Bug#813764: linux-source-3.16: "Dazed and confused, but trying to continue" on X10SDV-TLN4F while using perf top

2016-02-04 Thread Rich Ercolani
Package: linux-source-3.16
Version: 3.16.0-4
Severity: normal

Dear Maintainer,

I was going about my business, using perf top to see what I was spending a
bunch of my time in on this system, when suddenly, I got this written to
console:

[2425302.546957] Uhhuh. NMI received for unknown reason 11 on CPU 1.
[2425302.547625] Do you have a strange power saving mode enabled?
[2425302.548291] Dazed and confused, but trying to continue

I've been running this system for several months at this point, and have not
seen this issue at any point prior.

I can't easily reproduce it again, so it may have been something other than
"perf top" that resulted in the behavior, but that's the only unusual thing
I can think of.

Supermicro X10SDV-TLN4F, which is a Xeon D-1540 board.

-- System Information:
Debian Release: 8.2
  APT prefers stable
  APT policy: (1000, 'stable'), (900, 'testing'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)