> -----Original Message-----
> From: Borislav Petkov <b...@alien8.de>
> Sent: Monday, April 8, 2019 12:52 PM
> To: Ghannam, Yazen <yazen.ghan...@amd.com>
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
> 
> On Mon, Apr 08, 2019 at 02:12:16PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <yazen.ghan...@amd.com>
> >
> > Current AMD systems have unique MCA banks per logical CPU even though
> > the type of the banks may all align to the same bank number. Each CPU
> > will have control of a set of MCA banks in the hardware and these are
> > not shared with other CPUs.
> >
> > For example, bank 0 may be the Load-Store Unit on every logical CPU, but
> > each bank 0 is a unique structure in the hardware. In other words, there
> > isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
> > share.
> >
> > This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
> > may see a Unified Memory Controller at bank 15, but each CPU is actually
> > seeing a unique hardware structure that is not shared with other CPUs.
> >
> > Because the MCA banks are all unique hardware structures, it would be
> > good to control them in a more granular way. For example, if there is a
> > known issue with the Floating Point Unit on CPU5 and a user wishes to
> > disable an error type on the Floating Point Unit, then it would be good
> > to do this only for CPU5 rather than all CPUs.
> >
> > Also, future AMD systems may have heterogeneous MCA banks. Meaning the
> > bank numbers may not necessarily represent the same types between CPUs.
> > For example, bank 20 visible to CPU0 may be a Unified Memory Controller
> > and bank 20 visible to CPU4 may be a Coherent Slave. So granular control
> > will be even more necessary should the user wish to control specific MCA
> > banks.
> >
> > Split the device attributes from struct mce_bank leaving only the MCA
> > bank control fields.
> >
> > Make struct mce_banks[] per_cpu in order to have more granular control
> > over individual MCA banks in the hardware.
> >
> > Allocate the device attributes statically based on the maximum number of
> > MCA banks supported. The sysfs interface will use as many as needed per
> > CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
> > per_cpu bank count in a future patch.
> >
> > Allocate the MCA control bits dynamically. Use the maximum number of MCA
> > banks supported for now. This will be changed to a per_cpu bank count in
> > a future patch.
> >
> > Redo the sysfs store/show functions to handle the per_cpu mce_banks[].
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 77 ++++++++++++++++++++++------------
> >  1 file changed, 51 insertions(+), 26 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index 8d0d1e8425db..14583c5c6e12 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
> >
> >  DEFINE_PER_CPU(unsigned, mce_exception_count);
> >
> > +struct mce_bank {
> > +   u64     ctl;    /* subevents to enable */
> > +   bool    init;   /* initialise bank? */
> 
> Keep that vertical alignment as of that of the members if mce_bank_dev
> below.
> 
> > +};
> > +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank*, mce_banks);
> 
> Space between mce_bank and *.
> 

Okay.

> > +
> >  #define ATTR_LEN               16
> >  /* One object for each MCE bank, shared by all CPUs */
> > -struct mce_bank {
> > -   u64                     ctl;                    /* subevents to enable 
> > */
> > -   bool                    init;                   /* initialise bank? */
> > +struct mce_bank_dev {
> >     struct device_attribute attr;                   /* device attribute */
> >     char                    attrname[ATTR_LEN];     /* attribute name */
> > +   u8                      bank;                   /* bank number */
> >  };
> > +static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];
> 
> What bothers me here is the connection between the mce_bank and the
> mce_bank_dev: it is simply not there.
> 

The connection is the bank number.

> Why isn't there a
> 
>       struct mce_bank_dev *dev;
> 
> in struct mce_bank?
> 
> Because - and correct me if I'm wrong here - but I think if we do
> per-CPU banks, then we need to selectively point from each mce_bank to
> its corresponding mce_bank_dev descriptor so that you have the proper
> names.
> 

The file name is always "bank#", so it seems redundant to make the file 
descriptor per_cpu.

My thinking was to make the file descriptor part common to all CPUs. Then redo 
the show/store functions to handle the per_cpu control values.

> For example, if bank3 on CPU5 is not present/disabled/N/A/whatever, then
> you need to not initialize the that sysfs file there and have:
> 
> /sys/devices/system/machinecheck/machinecheck5/
> ├── bank0
> ├── bank1
> ├── bank10
> ├── bank11
> ├── bank12
> ├── bank13
> ├── bank14
> ├── bank15
> ├── bank16
> ├── bank17
> ├── bank18
> ├── bank19
> ├── bank2
> ├── bank20
> ├── bank21
> ├── bank22
>               <--- bank 3 is not there because unsupported.
> ├── bank4
> ├── bank5
> ├── bank6
> ├── bank7
> ├── bank8
> ├── bank9
> 
> 
> Which means that mce_device_create() should learn to be able to create
> non-contiguous per-CPU bank sysfs files so that you'll have to iterate
> over the per-CPU struct mce_banks array and use only those mce_bank_dev
> * pointers which represent present banks on this CPU only.
> 
> Yes, no, am I way off?
> 

I see what you're saying.

We already have the case where some banks are not initialized either due to 
quirks or because they are Read-as-Zero, but we don't try to skip creating 
their files. With this full set (see patch 5), an unused bank will return a 
control value of 0. Would that be sufficient to indicate that a bank is not 
used?

I don't have any strong opinion about skipping the file creation or not. The 
files are created per CPU, so I think an "if (per_cpu(mce_banks, cpu)[i].ctl)" 
check could be enough to decide to create a file or not.

But I do have a couple of thoughts:
1) Will missing banks confuse users? As mentioned, we already have the case of 
unused/uninitialized banks today, but we don't skip their file creation.
        a) Will this affect any userspace tools?
2) Is the added complexity for file creation/destruction worth it? As 
mentioned, the file will return 0 for unused/uninitialized banks.

Thanks,
Yazen

Reply via email to