** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Jammy)
       Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Jeff Lane  (bladernr)

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2049637

Title:
  Some SPR systems throw kernel warnings from uncore_discovery.c

Status in intel:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Jammy:
  In Progress

Bug description:
  [Impact]
  On some Sapphire Rapids (SPR) CPUs we are seeing kernel warnings in kern.log:
  https://certification.canonical.com/hardware/202311-32288/submission/341156/
  Intel(R) Xeon(R) Gold 6442Y

  Oct 31 03:35:55 N8 kernel: [   92.770372] ------------[ cut here ]------------
  Oct 31 03:35:55 N8 kernel: [   92.825738] WARNING: CPU: 48 PID: 1 at 
arch/x86/events/intel/uncore_discovery.c:184 uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [   92.953850] Modules linked in:
  Oct 31 03:35:55 N8 kernel: [   92.990464] CPU: 48 PID: 1 Comm: swapper/0 Not 
tainted 5.15.0-88-generic #98-Ubuntu
  Oct 31 03:35:55 N8 kernel: [   93.082179] Hardware name: ASUSTeK COMPUTER 
INC. ESC N8-E11/Z13PN-D32 Series, BIOS 0402 09/08/2023
  Oct 31 03:35:55 N8 kernel: [   93.189501] RIP: 
0010:uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [   93.206419] Freeing initrd memory: 106936K
  Oct 31 03:35:55 N8 kernel: [   93.253138] Code: c2 01 48 83 c0 04 39 d1 0f 8e 
c6 01 00 00 49 8b 4c 24 38 8b 0c 01 41 89 0c 07 49 8b 74 24 40 8b 34 06 41 89 
34 06 39 f9 75 cf <0f> 0b 4c 89 ff e8 b2 07 33 00 4c 89 f7 e8 aa 07 33 00 5b 41 
5c 41
  Oct 31 03:35:55 N8 kernel: [   93.527071] RSP: 0000:ff5c25ed800efc98 EFLAGS: 
00010246
  Oct 31 03:35:55 N8 kernel: [   93.589669] RAX: 0000000000000008 RBX: 
0000000000000000 RCX: 0000000000000003
  Oct 31 03:35:55 N8 kernel: [   93.675160] RDX: 0000000000000002 RSI: 
0000000000018000 RDI: 0000000000000003
  Oct 31 03:35:55 N8 kernel: [   93.760654] RBP: ff5c25ed800efcc0 R08: 
0000000000000010 R09: ff32ac8a801df260
  Oct 31 03:35:55 N8 kernel: [   93.846130] R10: 0000000000000246 R11: 
00000000ffffffff R12: ff32ac8a8b8412a0
  Oct 31 03:35:55 N8 kernel: [   93.931613] R13: ff5c25ed800efcf8 R14: 
ff32ac8a8aa32cb0 R15: ff32ac8a801df260
  Oct 31 03:35:55 N8 kernel: [   94.017099] FS:  0000000000000000(0000) 
GS:ff32ac99bfa00000(0000) knlGS:0000000000000000
  Oct 31 03:35:55 N8 kernel: [   94.114042] CS:  0010 DS: 0000 ES: 0000 CR0: 
0000000080050033
  Oct 31 03:35:55 N8 kernel: [   94.182871] CR2: 0000000000000000 CR3: 
0000000d07e10001 CR4: 0000000000771ee0
  Oct 31 03:35:55 N8 kernel: [   94.268360] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
  Oct 31 03:35:55 N8 kernel: [   94.353828] DR3: 0000000000000000 DR6: 
00000000fffe07f0 DR7: 0000000000000400
  Oct 31 03:35:55 N8 kernel: [   94.439332] PKRU: 55555554
  Oct 31 03:35:55 N8 kernel: [   94.471788] Call Trace:
  Oct 31 03:35:55 N8 kernel: [   94.501100]  <TASK>
  Oct 31 03:35:55 N8 kernel: [   94.526275]  ? show_trace_log_lvl+0x1d6/0x2ea
  Oct 31 03:35:55 N8 kernel: [   94.578457]  ? show_trace_log_lvl+0x1d6/0x2ea
  Oct 31 03:35:55 N8 kernel: [   94.630686]  ? 
parse_discovery_table.isra.0+0x162/0x1a0
  Oct 31 03:35:55 N8 kernel: [   94.693295]  ? show_regs.part.0+0x23/0x29
  Oct 31 03:35:55 N8 kernel: [   94.741331]  ? show_regs.cold+0x8/0xd
  Oct 31 03:35:55 N8 kernel: [   94.785212]  ? 
uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [   94.841591]  ? __warn+0x8c/0x100
  Oct 31 03:35:55 N8 kernel: [   94.880281]  ? 
uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [   94.936636]  ? report_bug+0xa4/0xd0
  Oct 31 03:35:55 N8 kernel: [   94.978460]  ? handle_bug+0x39/0x90
  Oct 31 03:35:55 N8 kernel: [   95.020246]  ? exc_invalid_op+0x19/0x70
  Oct 31 03:35:55 N8 kernel: [   95.066232]  ? asm_exc_invalid_op+0x1b/0x20
  Oct 31 03:35:55 N8 kernel: [   95.116341]  ? 
uncore_insert_box_info+0x134/0x350
  Oct 31 03:35:55 N8 kernel: [   95.172708]  ? uncore_insert_box_info+0xe3/0x350
  Oct 31 03:35:55 N8 kernel: [   95.228032]  
parse_discovery_table.isra.0+0x162/0x1a0
  Oct 31 03:35:55 N8 cloud-init[1992]: |.+.o  .o   .o o +|
  Oct 31 03:35:55 N8 kernel: [   95.288570]  
intel_uncore_has_discovery_tables+0x19e/0x270
  Oct 31 03:35:55 N8 kernel: [   95.354298]  ? type_pmu_register+0x2f/0x42
  Oct 31 03:35:55 N8 kernel: [   95.403385]  intel_uncore_init+0xe3/0x226
  Oct 31 03:35:55 N8 kernel: [   95.451409]  ? type_pmu_register+0x42/0x42
  Oct 31 03:35:55 N8 kernel: [   95.500506]  do_one_initcall+0x46/0x1e0
  Oct 31 03:35:55 N8 kernel: [   95.546475]  do_initcalls+0x12f/0x159
  Oct 31 03:35:55 N8 kernel: [   95.590372]  kernel_init_freeable+0x162/0x1b5
  Oct 31 03:35:55 N8 kernel: [   95.642556]  ? rest_init+0x100/0x100
  Oct 31 03:35:55 N8 kernel: [   95.685405]  kernel_init+0x1b/0x150
  Oct 31 03:35:55 N8 kernel: [   95.727228]  ? rest_init+0x100/0x100
  Oct 31 03:35:55 N8 kernel: [   95.770054]  ret_from_fork+0x1f/0x30
  Oct 31 03:35:55 N8 kernel: [   95.812906]  </TASK>
  Oct 31 03:35:55 N8 kernel: [   95.839108] ---[ end trace 2d0c57130f45fd62 ]---

  https://certification.canonical.com/hardware/202305-31570/submission/312593/
  Intel(R) Xeon(R) Gold 6426Y
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135184] ------------[ cut here 
]------------
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135185] WARNING: CPU: 0 PID: 1 at 
arch/x86/events/intel/uncore_discovery.c:184 uncore_insert_box_info+0x134/0x350
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135192] Modules linked in:
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135194] CPU: 0 PID: 1 Comm: 
swapper/0 Not tainted 5.15.0-69-generic #76-Ubuntu
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135198] Hardware name: HPE ProLiant 
ML110 Gen11/ProLiant ML110 Gen11, BIOS 1.30 03/01/2023
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135200] RIP: 
0010:uncore_insert_box_info+0x134/0x350
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135202] Code: c2 01 48 83 c0 04 39 
d1 0f 8e c6 01 00 00 49 8b 4c 24 38 8b 0c 01 41 89 0c 07 49 8b 74 24 40 8b 34 
06 41 89 34 06 39 f9 75 cf <0f> 0b 4c 89 ff e8 22 a2 32 00 4c 89 f7 e8 1a a2 32 
00 5b 41 5c 41
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135206] RSP: 0000:ff3b3e198006bc98 
EFLAGS: 00010246
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135209] RAX: 0000000000000008 RBX: 
0000000000000000 RCX: 0000000000000003
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135210] RDX: 0000000000000002 RSI: 
0000000000018000 RDI: 0000000000000003
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135212] RBP: ff3b3e198006bcc0 R08: 
0000000000000010 R09: ff31766844f3c5e0
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135214] R10: ff31766844fa4438 R11: 
0000000000000000 R12: ff31766844f5fa20
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135216] R13: ff3b3e198006bcf8 R14: 
ff31766844f3ca20 R15: ff31766844f3c5e0
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135218] FS:  0000000000000000(0000) 
GS:ff3176e5bf800000(0000) knlGS:0000000000000000
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135220] CS:  0010 DS: 0000 ES: 0000 
CR0: 0000000080050033
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135222] CR2: 0000000000000000 CR3: 
0000004f35e10001 CR4: 0000000000771ef0
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135224] DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135225] DR3: 0000000000000000 DR6: 
00000000fffe07f0 DR7: 0000000000000400
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135227] PKRU: 55555554
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135228] Call Trace:
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135230]  <TASK>
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135232]  
parse_discovery_table.isra.0+0x162/0x1a0
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135235]  
intel_uncore_has_discovery_tables+0x19e/0x270
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135238]  ? 
type_pmu_register+0x21/0x42
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135243]  
intel_uncore_init+0xe3/0x226
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135246]  ? 
type_pmu_register+0x42/0x42
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135249]  do_one_initcall+0x46/0x1e0
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135253]  do_initcalls+0x12f/0x159
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135256]  
kernel_init_freeable+0x162/0x1b5
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135259]  ? rest_init+0x100/0x100
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135263]  kernel_init+0x1b/0x150
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135265]  ? rest_init+0x100/0x100
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135266]  ret_from_fork+0x1f/0x30
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135270]  </TASK>
  Apr 14 17:29:28 ML110Gen11 kernel: [    2.135271] ---[ end trace 
6011f2a9999291c3 ]---

  This doesn't happen on ALL SPR platforms, but it does happen
  periodically, and always seems to be centered around
  arch/x86/events/intel/uncore_discovery.c

  This doesn't seem to cause any stability issues that we've seen, but we
  need to know whether these warnings are innocuous and, better, whether
  this can be fixed so the kernel no longer emits them (each warning sets
  the kernel taint flag).

  [Fixes]
  commit 5d515ee40cb57ea5331998f27df7946a69f14dc3
  Author: Kan Liang <kan.li...@linux.intel.com>
  Date: Thu Jan 12 12:01:05 2023 -0800
  perf/x86/uncore: Don't WARN_ON_ONCE() for a broken discovery table

  Clean cherry-pick from 6.3 (already present in Mantic and later)

  [Test Case]
  On SPR systems, the kernel warning should not appear in kern.log and the
  kernel should not show the taint flag (9) for "Kernel issued warning".
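
  The check above can be sketched in Python. This is a minimal
  illustration, not part of the original report: the helper names
  (is_warn_tainted, has_uncore_warning, check_system) are hypothetical,
  and the paths /proc/sys/kernel/tainted and /var/log/kern.log are
  assumed to be the usual locations on an Ubuntu system.

```python
import re
from pathlib import Path

# Bit 9 of the kernel taint mask is the 'W' flag ("kernel issued warning").
TAINT_WARN_BIT = 9

# Matches the WARNING header from the traces in this report. \s+ after "at"
# tolerates the line wrap seen in the emailed copy of the log.
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at\s+"
    r"arch/x86/events/intel/uncore_discovery\.c:\d+"
)


def is_warn_tainted(taint_value: int) -> bool:
    """Return True if the 'kernel issued warning' taint bit is set."""
    return bool(taint_value & (1 << TAINT_WARN_BIT))


def has_uncore_warning(log_text: str) -> bool:
    """Return True if the log contains a WARNING from uncore_discovery.c."""
    return WARN_RE.search(log_text) is not None


def check_system(tainted_path: str = "/proc/sys/kernel/tainted",
                 log_path: str = "/var/log/kern.log") -> tuple[bool, bool]:
    """Read the taint mask and kernel log; both paths are assumed defaults."""
    taint = int(Path(tainted_path).read_text().strip())
    log_text = Path(log_path).read_text(errors="replace")
    return is_warn_tainted(taint), has_uncore_warning(log_text)
```

  On a fixed kernel, check_system() would be expected to return
  (False, False) after a fresh boot, provided no other code path has
  issued a warning and kern.log holds no entries from earlier boots.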

  [Where problems could occur]
  This is a targeted fix for the issue identified by Intel and should not
  cause problems outside the scope of the uncore discovery code.

To manage notifications about this bug go to:
https://bugs.launchpad.net/intel/+bug/2049637/+subscriptions

