On Wed, Jun 28, 2017 at 1:00 PM, Ghannam, Yazen <yazen.ghan...@amd.com> wrote:
>> -----Original Message-----
>> From: themo...@gmail.com [mailto:themo...@gmail.com] On Behalf Of
>> Jack Miller
>> Sent: Wednesday, June 28, 2017 1:44 PM
>> To: Borislav Petkov <b...@suse.de>
>> Cc: Jack Miller <j...@codezen.org>; linux-kernel@vger.kernel.org;
>> t...@linutronix.de; Ghannam, Yazen <yazen.ghan...@amd.com>;
>> x...@kernel.org
>> Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 !=
>> thread 0
>>
>> On Wed, Jun 28, 2017 at 4:22 AM, Borislav Petkov <b...@suse.de> wrote:
>> > On Tue, Jun 27, 2017 at 07:06:30PM -0500, Jack Miller wrote:
>> >> After a call to firmware SwitchBSP(),
>> >
>> > What is that and who does that?
>>
>> SwitchBSP() is part of the UEFI MPServices Protocol which I believe is an
>> extension but it is supported by all of the firmwares I've tested on.
>>
>> In this case, I'm using a bootloader to SwitchBSP() so that hardware thread 0
>> (and thus core 0) can be offlined on AMD hardware (cpu0_hotplug
>> unsupported). This is currently working by passing 'nomce' to the kernel, but
>> obviously I'd prefer not to disable it.
>>
>
> Which core are you using as the BSP with SwitchBSP()?

Core 4, hardware thread 8 overall. I am testing on a Ryzen 7 machine.

>
>> >
>> >> Linux can be booted with a thread
>> >> that isn't the first in the system. That thread automatically becomes
>> >> CPU 0.
>> >
>> > Btw, you should be seeing other explosions too as a lot of code
>> > assumes CPU 0 is the BSP.
>>
>> Actually, with 'nomce' or this patch applied the system seems to chug along
>> merrily, no further errors in dmesg, no further BUGs. Linux still gets all
>> of the topology correct (i.e. CPU 0's core/thread/siblings are correctly
>> identified) so really, aside from userspace programs doing naive stuff with
>> CPU affinity (like expecting even,odd CPUs to be SMT pairs), I think the
>> overall result here is that most threads are interchangeable... except when
>> probing certain features like these MCA types.
>>
>
> Do you see 23 banks named in the new BSP's /sys/devices/system/machinecheck/
> folder? You should see non-core banks like l3_cache, umc, etc.

With my patch applied, I see entries like l3_cache under hardware thread
0's directory (it's shifted to CPU 1, so machinecheck1). Without my patch,
only machinecheck0 has anything interesting in it (insn_fetch, l2_cache
etc.) because the init failed on CPU 1.

Jack
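
For anyone who wants to repeat the per-CPU comparison above, here is a rough sketch in Python rather than anything from the patch itself. It assumes the bank entries show up as subdirectories of each machinecheckN directory under the sysfs path quoted above; the function name list_mce_dirs and the script layout are made up for illustration.

```python
import os
import re

# Path as quoted in the thread; the subdirectory layout is an assumption.
MCE_ROOT = "/sys/devices/system/machinecheck"


def list_mce_dirs(root=MCE_ROOT):
    """Map each CPU number to the sorted subdirectory names of machinecheckN."""
    result = {}
    for entry in os.listdir(root):
        match = re.fullmatch(r"machinecheck(\d+)", entry)
        if not match:
            continue  # skip anything that is not a per-CPU directory
        cpu_dir = os.path.join(root, entry)
        if not os.path.isdir(cpu_dir):
            continue
        names = []
        for name in os.listdir(cpu_dir):
            path = os.path.join(cpu_dir, name)
            # Keep real directories only; symlinks and attribute files are
            # skipped. Generic sysfs directories may also appear in the list.
            if os.path.isdir(path) and not os.path.islink(path):
                names.append(name)
        result[int(match.group(1))] = sorted(names)
    return result


if __name__ == "__main__" and os.path.isdir(MCE_ROOT):
    for cpu, names in sorted(list_mce_dirs().items()):
        print(f"machinecheck{cpu}: {', '.join(names) or '(none)'}")
```

Comparing the output with and without the patch should make it obvious which CPUs ended up with bank directories such as l3_cache and which did not.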