Re: kdump kernel randomly hang with tick_periodic call trace on bare metal system

2023-01-16 Thread Baoquan He
Hi Thomas,

On 01/09/23 at 10:57pm, Thomas Gleixner wrote:
> On Tue, Dec 20 2022 at 13:41, Baoquan He wrote:
> > On one intel bare metal system, I can randomly reproduce the kdump hang
> > as below with tick_periodic call trace. Attach the kernel config for
> > reference.
> 
> This has absolutely nothing to do with x2apic IPI shorthands
> 
> > [0.045980] Spurious LAPIC timer interrupt on cpu 0
> 
> So here the CPU receives a spurious Local APIC timer interrupt, but
> that's a red herring.
> 
> > [1.152690] BUG: kernel NULL pointer dereference, address: 
> > 0088
> > [1.159634] #PF: supervisor read access in kernel mode
> > [1.164757] #PF: error_code(0x) - not-present page
> > [1.169882] PGD 0 P4D 0 
> > [1.172407] Oops:  [#1] PREEMPT SMP PTI
> > [1.176578] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> > 6.0.11-300.fc37.x86_64 #1
> > [1.183870] Hardware name: Dell Inc. PowerEdge R410/0N051F, BIOS 1.11.0 
> > 07/20/2012
> > [1.191420] RIP: 0010:tick_periodic+0x23/0x80
> 
> I'm willing to bet that this is caused by the following line in 
> tick_periodic():
> 
> update_process_times(user_mode(get_irq_regs()));
> 
>user_mode() is invoked with a NULL pointer. user_mode() accesses
>regs->cs. CS is at offset 0x88
> 
> The reason for this is here:
> 
> > [1.280648]  tick_handle_periodic+0x1f/0x70
> > [1.284821]  timer_interrupt+0x14/0x20
> > [1.288561]  __handle_irq_event_percpu+0x46/0x190
> > [1.293253]  handle_irq_event+0x34/0x70
> > [1.297080]  handle_level_irq+0xa8/0x180
> > [1.300993]  resend_irqs+0x5d/0x70
> > [1.304386]  tasklet_action_common.constprop.0+0xab/0xe0
> > [1.309686]  __do_softirq+0xfb/0x319
> > [1.313254]  __irq_exit_rcu+0xd7/0x140
> > [1.316993]  common_interrupt+0xb9/0xd0
> 
> For some reason the timer interrupt is resent in software. I assume it is
> the HPET interrupt because that's what just got initialized.
> 
> > [1.143537] clocksource: hpet: mask: 0x max_cycles: 0x, 
> > max_idle_ns: 133484882848 ns
> 
> and the callchain below just confirms that:
> 
> > [1.393315]  _raw_spin_unlock_irqrestore+0x19/0x40
> > [1.398093]  __setup_irq+0x443/0x6d0
> > [1.401659]  request_threaded_irq+0x109/0x170
> > [1.406005]  hpet_time_init+0x2d/0x4b
> > [1.409661]  x86_late_time_init+0x17/0x34
> > [1.413658]  start_kernel+0x8cf/0x97f
> 
> The software resend code does not go through the regular interrupt entry
> path which explains why get_irq_regs() returns NULL.
> 
> That software resend is bogus especially since the timer interrupt is
> a level interrupt. As dmesg does not say anything about the APIC
> delivery mode I assume this goes through i8259, which fails to set the
> IRQ_LEVEL flag on all interrupt lines.
> 
> The below should fix this.

I got the machine where this random hang often happen, and built the
latest upstream kernel to test. W/o this fix, kdump kernel will hang 
when calling tick_periodic 6 of 10 times. With this patch applied, I
tried about 30 times, kdump kernel all booted up and vmcore dumping 
succeeded.

So this patch fix the issue I met, thanks a lot. And please feel free to
add:

Tested-by: Baoquan He 


> 
> ---
> --- a/arch/x86/kernel/i8259.c
> +++ b/arch/x86/kernel/i8259.c
> @@ -114,6 +114,7 @@ static void make_8259A_irq(unsigned int
>   disable_irq_nosync(irq);
>   io_apic_irqs &= ~(1<   irq_set_chip_and_handler(irq, _chip, handle_level_irq);
> + irq_set_status_flags(irq, IRQ_LEVEL);
>   enable_irq(irq);
>   lapic_assign_legacy_vector(irq, true);
>  }
> --- a/arch/x86/kernel/irqinit.c
> +++ b/arch/x86/kernel/irqinit.c
> @@ -65,8 +65,10 @@ void __init init_ISA_irqs(void)
>  
>   legacy_pic->init(0);
>  
> - for (i = 0; i < nr_legacy_irqs(); i++)
> + for (i = 0; i < nr_legacy_irqs(); i++) {
>   irq_set_chip_and_handler(i, chip, handle_level_irq);
> + irq_set_status_flags(i, IRQ_LEVEL);
> + }
>  }
>  
>  void __init init_IRQ(void)
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: kdump kernel randomly hang with tick_periodic call trace on bare metal system

2023-01-13 Thread Baoquan He
On 01/09/23 at 10:57pm, Thomas Gleixner wrote:
> On Tue, Dec 20 2022 at 13:41, Baoquan He wrote:
> > On one intel bare metal system, I can randomly reproduce the kdump hang
> > as below with tick_periodic call trace. Attach the kernel config for
> > reference.
> 
> This has absolutely nothing to do with x2apic IPI shorthands
> 
> > [0.045980] Spurious LAPIC timer interrupt on cpu 0
> 
> So here the CPU receives a spurious Local APIC timer interrupt, but
> that's a red herring.
> 
> > [1.152690] BUG: kernel NULL pointer dereference, address: 
> > 0088
> > [1.159634] #PF: supervisor read access in kernel mode
> > [1.164757] #PF: error_code(0x) - not-present page
> > [1.169882] PGD 0 P4D 0 
> > [1.172407] Oops:  [#1] PREEMPT SMP PTI
> > [1.176578] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> > 6.0.11-300.fc37.x86_64 #1
> > [1.183870] Hardware name: Dell Inc. PowerEdge R410/0N051F, BIOS 1.11.0 
> > 07/20/2012
> > [1.191420] RIP: 0010:tick_periodic+0x23/0x80
> 
> I'm willing to bet that this is caused by the following line in 
> tick_periodic():
> 
> update_process_times(user_mode(get_irq_regs()));
> 
>user_mode() is invoked with a NULL pointer. user_mode() accesses
>regs->cs. CS is at offset 0x88
> 
> The reason for this is here:
> 
> > [1.280648]  tick_handle_periodic+0x1f/0x70
> > [1.284821]  timer_interrupt+0x14/0x20
> > [1.288561]  __handle_irq_event_percpu+0x46/0x190
> > [1.293253]  handle_irq_event+0x34/0x70
> > [1.297080]  handle_level_irq+0xa8/0x180
> > [1.300993]  resend_irqs+0x5d/0x70
> > [1.304386]  tasklet_action_common.constprop.0+0xab/0xe0
> > [1.309686]  __do_softirq+0xfb/0x319
> > [1.313254]  __irq_exit_rcu+0xd7/0x140
> > [1.316993]  common_interrupt+0xb9/0xd0
> 
> For some reason the timer interrupt is resent in software. I assume it is
> the HPET interrupt because that's what just got initialized.
> 
> > [1.143537] clocksource: hpet: mask: 0x max_cycles: 0x, 
> > max_idle_ns: 133484882848 ns
> 
> and the callchain below just confirms that:
> 
> > [1.393315]  _raw_spin_unlock_irqrestore+0x19/0x40
> > [1.398093]  __setup_irq+0x443/0x6d0
> > [1.401659]  request_threaded_irq+0x109/0x170
> > [1.406005]  hpet_time_init+0x2d/0x4b
> > [1.409661]  x86_late_time_init+0x17/0x34
> > [1.413658]  start_kernel+0x8cf/0x97f
> 
> The software resend code does not go through the regular interrupt entry
> path which explains why get_irq_regs() returns NULL.
> 
> That software resend is bogus especially since the timer interrupt is
> a level interrupt. As dmesg does not say anything about the APIC
> delivery mode I assume this goes through i8259, which fails to set the
> IRQ_LEVEL flag on all interrupt lines.
> 
> The below should fix this.

Sorry for late reply, just notice this mail.

Thanks a lot for checking this and providing a fix, I will reserve the
lab machine and give it a shot, will feedback once finished. 

> ---
> --- a/arch/x86/kernel/i8259.c
> +++ b/arch/x86/kernel/i8259.c
> @@ -114,6 +114,7 @@ static void make_8259A_irq(unsigned int
>   disable_irq_nosync(irq);
>   io_apic_irqs &= ~(1<   irq_set_chip_and_handler(irq, _chip, handle_level_irq);
> + irq_set_status_flags(irq, IRQ_LEVEL);
>   enable_irq(irq);
>   lapic_assign_legacy_vector(irq, true);
>  }
> --- a/arch/x86/kernel/irqinit.c
> +++ b/arch/x86/kernel/irqinit.c
> @@ -65,8 +65,10 @@ void __init init_ISA_irqs(void)
>  
>   legacy_pic->init(0);
>  
> - for (i = 0; i < nr_legacy_irqs(); i++)
> + for (i = 0; i < nr_legacy_irqs(); i++) {
>   irq_set_chip_and_handler(i, chip, handle_level_irq);
> + irq_set_status_flags(i, IRQ_LEVEL);
> + }
>  }
>  
>  void __init init_IRQ(void)
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: kdump kernel randomly hang with tick_periodic call trace on bare metal system

2023-01-09 Thread Thomas Gleixner
On Tue, Dec 20 2022 at 13:41, Baoquan He wrote:
> On one intel bare metal system, I can randomly reproduce the kdump hang
> as below with tick_periodic call trace. Attach the kernel config for
> reference.

This has absolutely nothing to do with x2apic IPI shorthands

> [0.045980] Spurious LAPIC timer interrupt on cpu 0

So here the CPU receives a spurious Local APIC timer interrupt, but
that's a red herring.

> [1.152690] BUG: kernel NULL pointer dereference, address: 0088
> [1.159634] #PF: supervisor read access in kernel mode
> [1.164757] #PF: error_code(0x) - not-present page
> [1.169882] PGD 0 P4D 0 
> [1.172407] Oops:  [#1] PREEMPT SMP PTI
> [1.176578] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 6.0.11-300.fc37.x86_64 #1
> [1.183870] Hardware name: Dell Inc. PowerEdge R410/0N051F, BIOS 1.11.0 
> 07/20/2012
> [1.191420] RIP: 0010:tick_periodic+0x23/0x80

I'm willing to bet that this is caused by the following line in tick_periodic():

update_process_times(user_mode(get_irq_regs()));

   user_mode() is invoked with a NULL pointer. user_mode() accesses
   regs->cs. CS is at offset 0x88

The reason for this is here:

> [1.280648]  tick_handle_periodic+0x1f/0x70
> [1.284821]  timer_interrupt+0x14/0x20
> [1.288561]  __handle_irq_event_percpu+0x46/0x190
> [1.293253]  handle_irq_event+0x34/0x70
> [1.297080]  handle_level_irq+0xa8/0x180
> [1.300993]  resend_irqs+0x5d/0x70
> [1.304386]  tasklet_action_common.constprop.0+0xab/0xe0
> [1.309686]  __do_softirq+0xfb/0x319
> [1.313254]  __irq_exit_rcu+0xd7/0x140
> [1.316993]  common_interrupt+0xb9/0xd0

For some reason the timer interrupt is resent in software. I assume it is
the HPET interrupt because that's what just got initialized.

> [1.143537] clocksource: hpet: mask: 0x max_cycles: 0x, 
> max_idle_ns: 133484882848 ns

and the callchain below just confirms that:

> [1.393315]  _raw_spin_unlock_irqrestore+0x19/0x40
> [1.398093]  __setup_irq+0x443/0x6d0
> [1.401659]  request_threaded_irq+0x109/0x170
> [1.406005]  hpet_time_init+0x2d/0x4b
> [1.409661]  x86_late_time_init+0x17/0x34
> [1.413658]  start_kernel+0x8cf/0x97f

The software resend code does not go through the regular interrupt entry
path which explains why get_irq_regs() returns NULL.

That software resend is bogus especially since the timer interrupt is
a level interrupt. As dmesg does not say anything about the APIC
delivery mode I assume this goes through i8259, which fails to set the
IRQ_LEVEL flag on all interrupt lines.

The below should fix this.

Thanks,

tglx
---
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -114,6 +114,7 @@ static void make_8259A_irq(unsigned int
disable_irq_nosync(irq);
io_apic_irqs &= ~(1

Re: kdump kernel randomly hang with tick_periodic call trace on bare metal system

2022-12-21 Thread Baoquan He
On 12/21/22 at 12:46pm, Guilherme G. Piccoli wrote:
> On 20/12/2022 02:51, Baoquan He wrote:
> > On 12/20/22 at 01:41pm, Baoquan He wrote:
> >> On one intel bare metal system, I can randomly reproduce the kdump hang
> >> as below with tick_periodic call trace. Attach the kernel config for
> >> reference.
> > 
> > Forgot mentioning this random hang is also caused by adding
> > 'nr_cpus=2' into normal kernel's cmdline, then triggering crash will get
> > kdump kernel hang as below kdump log shown.
> > 
> 
> The weird thing is that you seem to be using "nr_cpus=1" instead - this
> is the cmdline from the log:
> 
> "nr_cpus=2 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off
> numa=off udev.children-max=2 panic=10 acpi_no_memhotplug
> transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0
> hugetlb_cma=0 disable_cpu_apicid=16 [...]"
> 
> You seems to pass twice the "nr_cpus" thing, and I guess kernel pick the
> last one?

>From the kdump kernel boot log, yes, the nr_cpus=1 is taken. The
parse_early_param() will parse the kernel parameters one by one, then
the last one will take effect. Here, the problem is not at nr_cpus=2 or
1, the bare metal system has 16 cpus, only 2 cpus is present, it seems
to be the halted 14 cpus get wrong message and behave incorrectly to
cause the issue.

> 
> Also, what is "disable_cpu_apicid=16"? Could this be related?

Not really. Please check disable_cpu_apicid in
Documentation/admin-guide/kdump/kdump.rst, it's bsp's apic id.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: kdump kernel randomly hang with tick_periodic call trace on bare metal system

2022-12-21 Thread Guilherme G. Piccoli
On 20/12/2022 02:51, Baoquan He wrote:
> On 12/20/22 at 01:41pm, Baoquan He wrote:
>> On one intel bare metal system, I can randomly reproduce the kdump hang
>> as below with tick_periodic call trace. Attach the kernel config for
>> reference.
> 
> Forgot mentioning this random hang is also caused by adding
> 'nr_cpus=2' into normal kernel's cmdline, then triggering crash will get
> kdump kernel hang as below kdump log shown.
> 

The weird thing is that you seem to be using "nr_cpus=1" instead - this
is the cmdline from the log:

"nr_cpus=2 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off
numa=off udev.children-max=2 panic=10 acpi_no_memhotplug
transparent_hugepage=never nokaslr hest_disable novmcoredd cma=0
hugetlb_cma=0 disable_cpu_apicid=16 [...]"

You seems to pass twice the "nr_cpus" thing, and I guess kernel pick the
last one?

Also, what is "disable_cpu_apicid=16"? Could this be related?


Thanks for the report!
Cheers,


Guilherme




___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec