> -----Original Message----- > From: Borislav Petkov <b...@alien8.de> > Sent: Monday, April 8, 2019 12:52 PM > To: Ghannam, Yazen <yazen.ghan...@amd.com> > Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; > tony.l...@intel.com; x...@kernel.org > Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way > > On Mon, Apr 08, 2019 at 02:12:16PM +0000, Ghannam, Yazen wrote: > > From: Yazen Ghannam <yazen.ghan...@amd.com> > > > > Current AMD systems have unique MCA banks per logical CPU even though > > the type of the banks may all align to the same bank number. Each CPU > > will have control of a set of MCA banks in the hardware and these are > > not shared with other CPUs. > > > > For example, bank 0 may be the Load-Store Unit on every logical CPU, but > > each bank 0 is a unique structure in the hardware. In other words, there > > isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs > > share. > > > > This idea extends even to non-core MCA banks. For example, CPU0 and CPU4 > > may see a Unified Memory Controller at bank 15, but each CPU is actually > > seeing a unique hardware structure that is not shared with other CPUs. > > > > Because the MCA banks are all unique hardware structures, it would be > > good to control them in a more granular way. For example, if there is a > > known issue with the Floating Point Unit on CPU5 and a user wishes to > > disable an error type on the Floating Point Unit, then it would be good > > to do this only for CPU5 rather than all CPUs. > > > > Also, future AMD systems may have heterogeneous MCA banks. Meaning the > > bank numbers may not necessarily represent the same types between CPUs. > > For example, bank 20 visible to CPU0 may be a Unified Memory Controller > > and bank 20 visible to CPU4 may be a Coherent Slave. So granular control > > will be even more necessary should the user wish to control specific MCA > > banks. > > > > Split the device attributes from struct mce_bank leaving only the MCA > > bank control fields. > > > > Make struct mce_banks[] per_cpu in order to have more granular control > > over individual MCA banks in the hardware. > > > > Allocate the device attributes statically based on the maximum number of > > MCA banks supported. The sysfs interface will use as many as needed per > > CPU. Currently, this is set to mca_cfg.banks, but will be changed to a > > per_cpu bank count in a future patch. > > > > Allocate the MCA control bits dynamically. Use the maximum number of MCA > > banks supported for now. This will be changed to a per_cpu bank count in > > a future patch. > > > > Redo the sysfs store/show functions to handle the per_cpu mce_banks[]. > > > > Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com> > > --- > > arch/x86/kernel/cpu/mce/core.c | 77 ++++++++++++++++++++++------------ > > 1 file changed, 51 insertions(+), 26 deletions(-) > > > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > > index 8d0d1e8425db..14583c5c6e12 100644 > > --- a/arch/x86/kernel/cpu/mce/core.c > > +++ b/arch/x86/kernel/cpu/mce/core.c > > @@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex); > > > > DEFINE_PER_CPU(unsigned, mce_exception_count); > > > > +struct mce_bank { > > + u64 ctl; /* subevents to enable */ > > + bool init; /* initialise bank? */ > > Keep that vertical alignment as of that of the members if mce_bank_dev > below. > > > +}; > > +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank*, mce_banks); > > Space between mce_bank and *. >
Okay. > > + > > #define ATTR_LEN 16 > > /* One object for each MCE bank, shared by all CPUs */ > > -struct mce_bank { > > - u64 ctl; /* subevents to enable > > */ > > - bool init; /* initialise bank? */ > > +struct mce_bank_dev { > > struct device_attribute attr; /* device attribute */ > > char attrname[ATTR_LEN]; /* attribute name */ > > + u8 bank; /* bank number */ > > }; > > +static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS]; > > What bothers me here is the connection between the mce_bank and the > mce_bank_dev: it is simply not there. > The connection is the bank number. > Why isn't there a > > struct mce_bank_dev *dev; > > in struct mce_bank? > > Because - and correct me if I'm wrong here - but I think if we do > per-CPU banks, then we need to selectively point from each mce_bank to > its corresponding mce_bank_dev descriptor so that you have the proper > names. > The file name is always "bank#", so it seems redundant to make the file descriptor per_cpu. My thinking was to make the file descriptor part common to all CPUs. Then redo the show/store functions to handle the per_cpu control values. > For example, if bank3 on CPU5 is not present/disabled/N/A/whatever, then > you need to not initialize the that sysfs file there and have: > > /sys/devices/system/machinecheck/machinecheck5/ > ├── bank0 > ├── bank1 > ├── bank10 > ├── bank11 > ├── bank12 > ├── bank13 > ├── bank14 > ├── bank15 > ├── bank16 > ├── bank17 > ├── bank18 > ├── bank19 > ├── bank2 > ├── bank20 > ├── bank21 > ├── bank22 > <--- bank 3 is not there because unsupported. > ├── bank4 > ├── bank5 > ├── bank6 > ├── bank7 > ├── bank8 > ├── bank9 > > > Which means that mce_device_create() should learn to be able to create > non-contiguous per-CPU bank sysfs files so that you'll have to iterate > over the per-CPU struct mce_banks array and use only those mce_bank_dev > * pointers which represent present banks on this CPU only. > > Yes, no, am I way off? > I see what you're saying. We already have the case where some banks are not initialized either due to quirks or because they are Read-as-Zero, but we don't try to skip creating their files. With this full set (see patch 5), an unused bank will return a control value of 0. Would that be sufficient to indicate that a bank is not used? I don't have any strong opinion about skipping the file creation or not. The files are created per CPU, so I think an "if (per_cpu(mce_banks, cpu)[i].ctl)" check could be enough to decide to create a file or not. But I do have a couple of thoughts: 1) Will missing banks confuse users? As mentioned, we already have the case of unused/uninitialized banks today, but we don't skip their file creation. a) Will this affect any userspace tools? 2) Is the added complexity for file creation/destruction worth it? As mentioned, the file will return 0 for unused/uninitialized banks. Thanks, Yazen