Sayali Patil <[email protected]> writes:

> A kernel panic is observed when handling machine check exceptions from
> real mode.
>
>   BUG: Unable to handle kernel data access on read at 0xc00000006be21300
>   Oops: Kernel access of bad area, sig: 11 [#1]
>   NIP [c000000000029e40] arch_irq_work_raise+0x10/0x70
>   LR [c00000000003ffc8] machine_check_queue_event+0xa8/0x150

[14626.841925] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 88222248  XER: 00000005
[14626.841939] CFAR: c00000000003ffc4 DAR: c00000006be21300 DSISR: 40000000 IRQMASK: 0


Let's also add the above MSR state to the commit message, along with the
call stack. It shows that MSR[EE] was 0 when this triggered, and that the
DAR was a 0xc.... address while MSR[IR|DR] = 0, i.e. we were in real mode.
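
For anyone who wants to double check the MSR decode, here is a small
user-space sketch (not kernel code). The bit masks are the 64-bit powerpc
MSR bit values; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR bit masks for 64-bit powerpc. */
#define MSR_SF (1ULL << 63) /* 64-bit mode */
#define MSR_EE (1ULL << 15) /* external interrupts enabled */
#define MSR_ME (1ULL << 12) /* machine check enabled */
#define MSR_IR (1ULL << 5)  /* instruction relocation (translation on) */
#define MSR_DR (1ULL << 4)  /* data relocation (translation on) */
#define MSR_RI (1ULL << 1)  /* recoverable interrupt */
#define MSR_LE (1ULL << 0)  /* little endian */

/* Hypothetical helper: true when both translation bits are clear. */
static bool msr_is_real_mode(uint64_t msr)
{
	return (msr & (MSR_IR | MSR_DR)) == 0;
}

/* Hypothetical helper: true when external interrupts are masked. */
static bool msr_irqs_hard_disabled(uint64_t msr)
{
	return (msr & MSR_EE) == 0;
}

/* Self-check against the MSR value printed in the oops. */
static void check_oops_msr(void)
{
	const uint64_t msr = 0x8000000000001003ULL;

	/* Matches the <SF,ME,RI,LE> decode printed next to the raw value. */
	assert(msr == (MSR_SF | MSR_ME | MSR_RI | MSR_LE));
	assert(msr_is_real_mode(msr));
	assert(msr_irqs_hard_disabled(msr));
}
```

Calling check_oops_msr() from a test program should pass silently, which
agrees with the <SF,ME,RI,LE> decode in the oops: EE, IR and DR are all
clear.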

>   Call Trace:
>   [c0000000179d3c70] [c00000000003ff64] machine_check_queue_event+0x44/0x150
>   [c0000000179d3d30] [c0000000000084e0] machine_check_early_common+0x1f0/0x2c0
>
> The crash occurs because arch_irq_work_raise() calls preempt_disable()
> from machine check exception (MCE) handlers running in real mode. In
> this context, accessing the preempt_count can fault, leading to the panic.
>
> The preempt_disable()/preempt_enable() pair in arch_irq_work_raise()
> was originally added by commit 0fe1ac48bef0 ("powerpc/perf_event: Fix
> oops due to perf_event_do_pending call") to avoid races while raising
> irq work from exception context.
>
> Later, commit 471ba0e686cb ("irq_work: Do not raise an IPI when
> queueing work on the local CPU") added preemption protection in
> irq_work_queue() path, while commit 20b876918c06 ("irq_work: Use per
> cpu atomics instead of regular atomics") added equivalent
> protection in irq_work_queue_on() before reaching arch_irq_work_raise():
>
>   irq_work_queue() / irq_work_queue_on()
>     -> preempt_disable()
>       -> __irq_work_queue_local()
>         -> irq_work_raise()
>           -> arch_irq_work_raise()
>
> As a result, callers other than mce_irq_work_raise() already execute
> with preemption disabled, making the additional
> preempt_disable()/preempt_enable() pair in arch_irq_work_raise()
> redundant.
>
> Remove it to avoid accessing preempt_count from real mode context.
>
> Fixes: cc15ff327569 ("powerpc/mce: Avoid using irq_work_queue() in realmode")

Agree with the Fixes tag. That patch is what moved the MCE path to use
arch_irq_work_raise(). It was fine as long as CONFIG_PREEMPTION was
disabled on powerpc, since macros like preempt_enable|disable() were
mostly no-ops. However, after lazy preemption got enabled, accessing
preempt_count while in real mode can cause the issue you described.
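
To illustrate the difference, here is a rough user-space model (not the
kernel macros themselves) of the two shapes preempt_disable() can take:
a plain compiler barrier when preempt counting is compiled out, and a
read-modify-write of the counter when it is compiled in. All model_*
names are hypothetical, and the counter is a plain global here rather
than the per-task field the kernel uses.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-task preempt counter. */
static int model_preempt_count;

/* Counts how many times the model touched the counter's memory. */
static int model_counter_accesses;

/* Compiler barrier only: emits no load or store (GCC/Clang syntax). */
#define model_barrier() __asm__ __volatile__("" ::: "memory")

/* Shape without preempt counting: nothing but a barrier. */
static void model_preempt_disable_nocount(void)
{
	model_barrier();
}

/* Shape with preempt counting: the counter's memory is accessed. */
static void model_preempt_disable_count(void)
{
	model_preempt_count++;
	model_counter_accesses++; /* bookkeeping for this model only */
	model_barrier();
}
```

Only the second shape touches memory, and that memory access is what
becomes a problem once the caller runs in real mode.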


One more thing we should add to the commit msg:
arch_irq_work_raise() executes in NMI context when called from the MCE
handler. With MSR[EE]=0 we cannot be preempted or scheduled out there,
so it is safe to remove the preempt_disable|enable() calls from here.

And let's change the commit subject to:
    powerpc/time: Remove redundant preempt_disable|enable() calls from arch_irq_work_raise()


BTW, thanks for adding a nice commit msg with the sequence of events.
With the above changes, please feel free to add:

Reviewed-by: Ritesh Harjani (IBM) <[email protected]>


> Suggested-by: Mahesh Salgaonkar <[email protected]>
> Signed-off-by: Sayali Patil <[email protected]>
> ---
>  arch/powerpc/kernel/time.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 4bbeb8644d3d..a99eb43f6ce9 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -471,10 +471,8 @@ void arch_irq_work_raise(void)
>        * which could get tangled up if we're messing with the same state
>        * here.
>        */
> -     preempt_disable();
>       set_irq_work_pending_flag();
>       set_dec(1);
> -     preempt_enable();
>  }
>  
>  static void set_dec_or_work(u64 val)
> -- 
> 2.52.0
