> -----Original Message-----
> From: Borislav Petkov <b...@alien8.de>
> Sent: Saturday, May 18, 2019 6:26 AM
> To: Ghannam, Yazen <yazen.ghan...@amd.com>
> Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; b...@suse.de; 
> tony.l...@intel.com; x...@kernel.org
> Subject: Re: [PATCH v3 4/6] x86/MCE: Make number of MCA banks per_cpu
> 
> 
> On Tue, Apr 30, 2019 at 08:32:20PM +0000, Ghannam, Yazen wrote:
> > From: Yazen Ghannam <yazen.ghan...@amd.com>
> >
> > The number of MCA banks is provided per logical CPU. Historically, this
> > number has been the same across all CPUs, but this is not an
> > architectural guarantee. Future AMD systems may have MCA bank counts
> > that vary between logical CPUs in a system.
> >
> > This issue was partially addressed in
> >
> > 006c077041dc ("x86/mce: Handle varying MCA bank counts")
> >
> > by allocating structures using the maximum number of MCA banks and by
> > saving the maximum MCA bank count in a system as the global count. This
> > means that some extra structures are allocated. Also, this means that
> > CPUs will spend more time in the #MC and other handlers checking extra
> > MCA banks.
> 
> ...
> 
> > @@ -1480,14 +1482,15 @@ EXPORT_SYMBOL_GPL(mce_notify_irq);
> >
> >  static int __mcheck_cpu_mce_banks_init(void)
> >  {
> > +     u8 n_banks = this_cpu_read(mce_num_banks);
> >       struct mce_bank *mce_banks;
> >       int i;
> >
> > -     mce_banks = kcalloc(MAX_NR_BANKS, sizeof(struct mce_bank), 
> > GFP_KERNEL);
> > +     mce_banks = kcalloc(n_banks, sizeof(struct mce_bank), GFP_KERNEL);
> 
> Something changed in mm land or maybe we were lucky and got away with an
> atomic GFP_KERNEL allocation until now but:
> 
> [    2.447838] smp: Bringing up secondary CPUs ...
> [    2.456895] x86: Booting SMP configuration:
> [    2.457822] .... node  #0, CPUs:        #1

The issue seems to be that the allocation is now happening on CPUs other than 
CPU0.

Patch 2 in this set has the same issue. I didn't see it until I turned on the 
"Lock Debugging" config options.

> [    1.344284] BUG: sleeping function called from invalid context at 
> mm/slab.h:418

This message comes from ___might_sleep() which checks the system_state.

On CPU0, system_state=SYSTEM_BOOTING.

On every other CPU, system_state=SYSTEM_SCHEDULING, and that's the only 
system_state where the message is shown.

Changing GFP_KERNEL to GFP_ATOMIC seems to be a fix. Is this appropriate? Or do 
you think there's something else we could try?

Thanks,
Yazen

Reply via email to