> -----Original Message-----
> From: themo...@gmail.com [mailto:themo...@gmail.com] On Behalf Of
> Jack Miller
> Sent: Wednesday, June 28, 2017 2:53 PM
> To: Ghannam, Yazen <yazen.ghan...@amd.com>
> Cc: Jack Miller <j...@codezen.org>; Borislav Petkov <b...@suse.de>; linux-
> ker...@vger.kernel.org; t...@linutronix.de; x...@kernel.org
> Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 !=
> thread 0
>
> On Wed, Jun 28, 2017 at 1:00 PM, Ghannam, Yazen
> <yazen.ghan...@amd.com> wrote:
> >> -----Original Message-----
> >> From: themo...@gmail.com [mailto:themo...@gmail.com] On Behalf Of
> >> Jack Miller
> >> Sent: Wednesday, June 28, 2017 1:44 PM
> >> To: Borislav Petkov <b...@suse.de>
> >> Cc: Jack Miller <j...@codezen.org>; linux-kernel@vger.kernel.org;
> >> t...@linutronix.de; Ghannam, Yazen <yazen.ghan...@amd.com>;
> >> x...@kernel.org
> >> Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU
> >> 0 != thread 0
> >>
> >> On Wed, Jun 28, 2017 at 4:22 AM, Borislav Petkov <b...@suse.de> wrote:
> >> > On Tue, Jun 27, 2017 at 07:06:30PM -0500, Jack Miller wrote:
> >> >> After a call to firmware SwitchBSP(),
> >> >
> >> > What is that and who does that?
> >>
> >> SwitchBSP() is part of the UEFI MPServices Protocol which I believe
> >> is an extension but it is supported by all of the firmwares I've tested on.
> >>
> >> In this case, I'm using a bootloader to SwitchBSP() so that hardware
> >> thread 0 (and thus core 0) can be offlined on AMD hardware
> >> (cpu0_hotplug unsupported). This is currently working by passing
> >> 'nomce' to the kernel, but obviously I'd prefer not to disable it.
> >>
> >
> > Which core are you using as the BSP with SwitchBSP()?
>
> Core 4, hardware thread 8 overall. I am testing on a Ryzen 7 machine.
>
> >
> >> >
> >> >> Linux can be booted with a thread
> >> >> that isn't the first in the system. That thread automatically
> >> >> becomes CPU 0.
> >> >
> >> > Btw, you should be seeing other explosions too as a lot of code
> >> > assumes CPU 0 is the BSP.
> >>
> >> Actually, with 'nomce' or this patch applied the system seems to chug
> >> along merrily, no further errors in dmesg, no further BUGs. Linux
> >> still gets all of the topology correct (i.e. CPU 0's
> >> core/thread/siblings are correctly identified) so really, aside from
> >> userspace programs doing naive stuff with CPU affinity (like
> >> expecting even,odd CPUs to be SMT pairs), I think the overall result
> >> here is that most threads are interchangeable... except when probing
> >> certain features like these MCA types.
> >>
> >
> > Do you see 23 banks named in the new BSP's
> > /sys/devices/system/machinecheck/ folder? You should see non-core banks
> > like l3_cache, umc, etc.
>
> With my patch applied, I see entries like l3_cache under hardware thread 0's
> directory (it's shifted to CPU 1, so machinecheck1).
> Without my patch, only machinecheck0 has anything interesting in it
> (insn_fetch, l2_cache etc.) because the init failed on CPU 1.
>

What happens with SMT off?

Thanks,
Yazen