Re: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR interrupt handler
On Thu, Aug 16, 2018 at 07:00:43PM +, Ghannam, Yazen wrote: > So I think we should keep the NULL pointer checks for now to keep this fix > small. I can make a new patch following your suggestion above. > > We can change the code so that we create the data structures during the > earlier init process, but I think this will be a much bigger change. This > could > fall under the idea of decoupling the handling code from sysfs. Ok, then pls add the simpler fixes at the beginning of the series so that they can go to stable and have the more involved changes follow them. Thx. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. --
RE: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR interrupt handler
> -Original Message- > From: linux-edac-ow...@vger.kernel.org > On Behalf Of Ghannam, Yazen > Sent: Thursday, August 9, 2018 1:18 PM > To: Borislav Petkov > Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; > tony.l...@intel.com; x...@kernel.org > Subject: RE: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR > interrupt handler > > > -Original Message- > > From: Borislav Petkov > > Sent: Thursday, August 9, 2018 11:16 AM > > To: Ghannam, Yazen > > Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; > > tony.l...@intel.com; x...@kernel.org > > Subject: Re: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR > > interrupt handler > > > > On Thu, Aug 09, 2018 at 09:08:33AM -0500, Yazen Ghannam wrote: > > > From: Yazen Ghannam > > > > > > If threshold_init_device() fails then per_cpu(threshold_banks) will be > > > deallocated. The thresholding interrupt handler will still be active, so > > > > So fix the code so that *that* doesn't happen instead of adding checks > > to the interrupt handler. > > > > I.e., > > > > if (err) { > > mce_threshold_vector = default_threshold_interrupt; > > return err; > > } > > > > Okay. I'll make that change. > I don't think this is enough. We have a gap between when the interrupt handler is set up during boot in __mcheck_cpu_init_vendor() and when all the data structures are created during threshold_init_device(). So I think we should keep the NULL pointer checks for now to keep this fix small. I can make a new patch following your suggestion above. We can change the code so that we create the data structures during the earlier init process, but I think this will be a much bigger change. This could fall under the idea of decoupling the handling code from sysfs. Thanks, Yazen
RE: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR interrupt handler
> -Original Message- > From: Borislav Petkov > Sent: Thursday, August 9, 2018 11:16 AM > To: Ghannam, Yazen > Cc: linux-e...@vger.kernel.org; linux-kernel@vger.kernel.org; > tony.l...@intel.com; x...@kernel.org > Subject: Re: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR > interrupt handler > > On Thu, Aug 09, 2018 at 09:08:33AM -0500, Yazen Ghannam wrote: > > From: Yazen Ghannam > > > > If threshold_init_device() fails then per_cpu(threshold_banks) will be > > deallocated. The thresholding interrupt handler will still be active, so > > So fix the code so that *that* doesn't happen instead of adding checks > to the interrupt handler. > > I.e., > > if (err) { > mce_threshold_vector = default_threshold_interrupt; > return err; > } > Okay. I'll make that change. Thanks, Yazen
Re: [PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR interrupt handler
On Thu, Aug 09, 2018 at 09:08:33AM -0500, Yazen Ghannam wrote: > From: Yazen Ghannam > > If threshold_init_device() fails then per_cpu(threshold_banks) will be > deallocated. The thresholding interrupt handler will still be active, so So fix the code so that *that* doesn't happen instead of adding checks to the interrupt handler. I.e., if (err) { mce_threshold_vector = default_threshold_interrupt; return err; } -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. --
[PATCH 1/2] x86/MCE/AMD: Check for NULL banks in THR interrupt handler
From: Yazen Ghannam If threshold_init_device() fails then per_cpu(threshold_banks) will be deallocated. The thresholding interrupt handler will still be active, so it's possible to get a NULL pointer dereference if a THR interrupt happens and any of the structures are NULL. Exit the handler if per_cpu(threshold_banks) is NULL and skip NULL banks. MCA error information will still be in the registers. The information will be logged during polling or in another MCA exception or interrupt handler. Fixes: 17ef4af0ec0f ("x86/mce/AMD: Use saved threshold block info in interrupt handler") Cc: # 4.13.x Signed-off-by: Yazen Ghannam --- arch/x86/kernel/cpu/mcheck/mce_amd.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index dd33c357548f..2dbf34250bbf 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -934,13 +934,21 @@ static void log_and_reset_block(struct threshold_block *block) static void amd_threshold_interrupt(void) { struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL; + struct threshold_bank *th_bank = NULL; unsigned int bank, cpu = smp_processor_id(); + if (!per_cpu(threshold_banks, cpu)) + return; + for (bank = 0; bank < mca_cfg.banks; ++bank) { if (!(per_cpu(bank_map, cpu) & (1 << bank))) continue; - first_block = per_cpu(threshold_banks, cpu)[bank]->blocks; + th_bank = per_cpu(threshold_banks, cpu)[bank]; + if (!th_bank) + continue; + + first_block = th_bank->blocks; if (!first_block) continue; -- 2.17.1