RE: nmi_watchdog fix for x86_64 to be more like i386
On Thu, 4 Oct 2007, Pallipadi, Venkatesh wrote: > >-Original Message- > >From: [EMAIL PROTECTED] > >[mailto:[EMAIL PROTECTED] On Behalf Of > >Thomas Gleixner > >Sent: Monday, October 01, 2007 11:19 PM > >To: Andi Kleen > >Cc: Arjan van de Ven; David Bahi; LKML; > >[EMAIL PROTECTED]; Andrew Morton; Ingo Molnar; > >Gregory Haskins > >Subject: Re: nmi_watchdog fix for x86_64 to be more like i386 > > > >> > >> The only workaround for chipsets ignoring IRQ affinity would > >be to keep > >> track on which CPU irq 0 happens and then restart APIC timer > >interrupts > >> on the others (or send IPIs) as needed. But that would be > >fairly ugly. > > > >The clock events code does handle this already. The broadcast > >interrupt > >can come in on any cpu. It's just the nmi watchdog which would > >be affected > >by that. > > > > Probably we can workaround this by keeping track of IRQ0 count at percpu > level and > use local apic timer + this percpu counter in NMI. Or just increment > local > apic timer count in IRQ0 with nohz enabled. No, I tried that. It's ugly. The per cpu accounting is the correct way to go if we want to take care of those systems, which ignore the CPU0 binding of irq0. See patch against the x86 tree below. tglx > commit 093976c7ad206a008bd5de4619f40f6bca4a79c3 Author: Thomas Gleixner <[EMAIL PROTECTED]> Date: Fri Oct 5 22:19:18 2007 +0200 x86: Fix irq0 / local apic timer accounting The clock events merge introduced a change to the nmi watchdog code to handle the not longer increasing local apic timer count in the broadcast mode. This is fine for UP, but on SMP it pampers over a stuck CPU which is not handling the broadcast interrupt due to the unconditional sum up of local apic timer count and irq0 count. To cover all cases we need to keep track on which CPU irq0 is handled. In theory this is CPU#0 due to the explicit disabling of irq balancing for irq0, but there are systems which ignore this on the hardware level. The per cpu irq0 accounting allows us to remove the irq0 to CPU0 binding as well. Add a per cpu counter for irq0 and evaluate this instead of the global irq0 count in the nmi watchdog code. Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]> diff --git a/arch/x86/kernel/nmi_32.c b/arch/x86/kernel/nmi_32.c index c7227e2..95d3fc2 100644 --- a/arch/x86/kernel/nmi_32.c +++ b/arch/x86/kernel/nmi_32.c @@ -353,7 +353,8 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason) * Take the local apic timer and PIT/HPET into account. We don't * know which one is active, when we have highres/dyntick on */ - sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0]; + sum = per_cpu(irq_stat, cpu).apic_timer_irqs + + per_cpu(irq_stat, cpu).irq0_irqs; /* if the none of the timers isn't firing, this cpu isn't doing much */ if (!touched && last_irq_sums[cpu] == sum) { diff --git a/arch/x86/kernel/time_32.c b/arch/x86/kernel/time_32.c index 19a6c67..3571d0a 100644 --- a/arch/x86/kernel/time_32.c +++ b/arch/x86/kernel/time_32.c @@ -157,6 +157,9 @@ EXPORT_SYMBOL(profile_pc); */ irqreturn_t timer_interrupt(int irq, void *dev_id) { + /* Keep nmi watchdog up to date */ + per_cpu(irq_stat, cpu).irq0_irqs++; + #ifdef CONFIG_X86_IO_APIC if (timer_ack) { /* diff --git a/include/asm-x86/hardirq_32.h b/include/asm-x86/hardirq_32.h index ed7cf97..9188635 100644 --- a/include/asm-x86/hardirq_32.h +++ b/include/asm-x86/hardirq_32.h @@ -9,6 +9,7 @@ typedef struct { unsigned long idle_timestamp; unsigned int __nmi_count; /* arch dependent */ unsigned int apic_timer_irqs; /* arch dependent */ + unsigned int irq0_irqs; } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU(irq_cpustat_t, irq_stat); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: nmi_watchdog fix for x86_64 to be more like i386
>-Original Message- >From: [EMAIL PROTECTED] >[mailto:[EMAIL PROTECTED] On Behalf Of >Thomas Gleixner >Sent: Monday, October 01, 2007 11:19 PM >To: Andi Kleen >Cc: Arjan van de Ven; David Bahi; LKML; >[EMAIL PROTECTED]; Andrew Morton; Ingo Molnar; >Gregory Haskins >Subject: Re: nmi_watchdog fix for x86_64 to be more like i386 > >> >> The only workaround for chipsets ignoring IRQ affinity would >be to keep >> track on which CPU irq 0 happens and then restart APIC timer >interrupts >> on the others (or send IPIs) as needed. But that would be >fairly ugly. > >The clock events code does handle this already. The broadcast >interrupt >can come in on any cpu. It's just the nmi watchdog which would >be affected >by that. > Probably we can workaround this by keeping track of IRQ0 count at percpu level and use local apic timer + this percpu counter in NMI. Or just increment local apic timer count in IRQ0 with nohz enabled. Thanks, Venki - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
> Hummm, a directory appears to be protected, or DNE. I get "failed to > change directory". Sorry typo ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Fri, 2007-10-05 at 18:03 +0200, Andi Kleen wrote: > > If it's agreed that this is the fix - can you submit a proper [PATCH] so > > all users of watchdog_use_timer_and_hpet_on_x86_64.patch can be removed, > > and replaced with yours. > > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/watchdog-fix > > -Andi > Hummm, a directory appears to be protected, or DNE. I get "failed to change directory". Thx, -PWM - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
> If it's agreed that this is the fix - can you submit a proper [PATCH] so > all users of watchdog_use_timer_and_hpet_on_x86_64.patch can be removed, > and replaced with yours. ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/watchdog-fix -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 2007-10-01 at 23:41 +0200, Andi Kleen wrote: > > IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the > > next CPU, but the check in NMI watchdog for CPU == 0 would not longer > > work. > > That cannot happen right now because cpu_disable() on both i386/x86-64 > reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both > do that already. I was wrong earlier in being concerned about this. > > > int tick_do_broadcast(cpumask_t mask) > > @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) > > cpu_clear(cpu, mask); > > td = _cpu(tick_cpu_device, cpu); > > td->evtdev->event_handler(td->evtdev); > > + tick_broadcast_account(cpu); > > That would not handle the case with a single CPU running only > irq 0 but not broadcasting I think. > > I believe ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog > is the correct fix > > -Andi Andi, If it's agreed that this is the fix - can you submit a proper [PATCH] so all users of watchdog_use_timer_and_hpet_on_x86_64.patch can be removed, and replaced with yours. Thank you very much signature.asc Description: PGP signature
Re: nmi_watchdog fix for x86_64 to be more like i386
If it's agreed that this is the fix - can you submit a proper [PATCH] so all users of watchdog_use_timer_and_hpet_on_x86_64.patch can be removed, and replaced with yours. ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/watchdog-fix -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 2007-10-01 at 23:41 +0200, Andi Kleen wrote: IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the next CPU, but the check in NMI watchdog for CPU == 0 would not longer work. That cannot happen right now because cpu_disable() on both i386/x86-64 reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both do that already. I was wrong earlier in being concerned about this. int tick_do_broadcast(cpumask_t mask) @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) cpu_clear(cpu, mask); td = per_cpu(tick_cpu_device, cpu); td-evtdev-event_handler(td-evtdev); + tick_broadcast_account(cpu); That would not handle the case with a single CPU running only irq 0 but not broadcasting I think. I believe ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog is the correct fix -Andi Andi, If it's agreed that this is the fix - can you submit a proper [PATCH] so all users of watchdog_use_timer_and_hpet_on_x86_64.patch can be removed, and replaced with yours. Thank you very much signature.asc Description: PGP signature
Re: nmi_watchdog fix for x86_64 to be more like i386
On Fri, 2007-10-05 at 18:03 +0200, Andi Kleen wrote: If it's agreed that this is the fix - can you submit a proper [PATCH] so all users of watchdog_use_timer_and_hpet_on_x86_64.patch can be removed, and replaced with yours. ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/watchdog-fix -Andi Hummm, a directory appears to be protected, or DNE. I get failed to change directory. Thx, -PWM - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
Hummm, a directory appears to be protected, or DNE. I get failed to change directory. Sorry typo ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: nmi_watchdog fix for x86_64 to be more like i386
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Gleixner Sent: Monday, October 01, 2007 11:19 PM To: Andi Kleen Cc: Arjan van de Ven; David Bahi; LKML; [EMAIL PROTECTED]; Andrew Morton; Ingo Molnar; Gregory Haskins Subject: Re: nmi_watchdog fix for x86_64 to be more like i386 The only workaround for chipsets ignoring IRQ affinity would be to keep track on which CPU irq 0 happens and then restart APIC timer interrupts on the others (or send IPIs) as needed. But that would be fairly ugly. The clock events code does handle this already. The broadcast interrupt can come in on any cpu. It's just the nmi watchdog which would be affected by that. Probably we can workaround this by keeping track of IRQ0 count at percpu level and use local apic timer + this percpu counter in NMI. Or just increment local apic timer count in IRQ0 with nohz enabled. Thanks, Venki - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: nmi_watchdog fix for x86_64 to be more like i386
On Thu, 4 Oct 2007, Pallipadi, Venkatesh wrote: -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Thomas Gleixner Sent: Monday, October 01, 2007 11:19 PM To: Andi Kleen Cc: Arjan van de Ven; David Bahi; LKML; [EMAIL PROTECTED]; Andrew Morton; Ingo Molnar; Gregory Haskins Subject: Re: nmi_watchdog fix for x86_64 to be more like i386 The only workaround for chipsets ignoring IRQ affinity would be to keep track on which CPU irq 0 happens and then restart APIC timer interrupts on the others (or send IPIs) as needed. But that would be fairly ugly. The clock events code does handle this already. The broadcast interrupt can come in on any cpu. It's just the nmi watchdog which would be affected by that. Probably we can workaround this by keeping track of IRQ0 count at percpu level and use local apic timer + this percpu counter in NMI. Or just increment local apic timer count in IRQ0 with nohz enabled. No, I tried that. It's ugly. The per cpu accounting is the correct way to go if we want to take care of those systems, which ignore the CPU0 binding of irq0. See patch against the x86 tree below. tglx commit 093976c7ad206a008bd5de4619f40f6bca4a79c3 Author: Thomas Gleixner [EMAIL PROTECTED] Date: Fri Oct 5 22:19:18 2007 +0200 x86: Fix irq0 / local apic timer accounting The clock events merge introduced a change to the nmi watchdog code to handle the not longer increasing local apic timer count in the broadcast mode. This is fine for UP, but on SMP it pampers over a stuck CPU which is not handling the broadcast interrupt due to the unconditional sum up of local apic timer count and irq0 count. To cover all cases we need to keep track on which CPU irq0 is handled. In theory this is CPU#0 due to the explicit disabling of irq balancing for irq0, but there are systems which ignore this on the hardware level. The per cpu irq0 accounting allows us to remove the irq0 to CPU0 binding as well. Add a per cpu counter for irq0 and evaluate this instead of the global irq0 count in the nmi watchdog code. Signed-off-by: Thomas Gleixner [EMAIL PROTECTED] diff --git a/arch/x86/kernel/nmi_32.c b/arch/x86/kernel/nmi_32.c index c7227e2..95d3fc2 100644 --- a/arch/x86/kernel/nmi_32.c +++ b/arch/x86/kernel/nmi_32.c @@ -353,7 +353,8 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason) * Take the local apic timer and PIT/HPET into account. We don't * know which one is active, when we have highres/dyntick on */ - sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0]; + sum = per_cpu(irq_stat, cpu).apic_timer_irqs + + per_cpu(irq_stat, cpu).irq0_irqs; /* if the none of the timers isn't firing, this cpu isn't doing much */ if (!touched last_irq_sums[cpu] == sum) { diff --git a/arch/x86/kernel/time_32.c b/arch/x86/kernel/time_32.c index 19a6c67..3571d0a 100644 --- a/arch/x86/kernel/time_32.c +++ b/arch/x86/kernel/time_32.c @@ -157,6 +157,9 @@ EXPORT_SYMBOL(profile_pc); */ irqreturn_t timer_interrupt(int irq, void *dev_id) { + /* Keep nmi watchdog up to date */ + per_cpu(irq_stat, cpu).irq0_irqs++; + #ifdef CONFIG_X86_IO_APIC if (timer_ack) { /* diff --git a/include/asm-x86/hardirq_32.h b/include/asm-x86/hardirq_32.h index ed7cf97..9188635 100644 --- a/include/asm-x86/hardirq_32.h +++ b/include/asm-x86/hardirq_32.h @@ -9,6 +9,7 @@ typedef struct { unsigned long idle_timestamp; unsigned int __nmi_count; /* arch dependent */ unsigned int apic_timer_irqs; /* arch dependent */ + unsigned int irq0_irqs; } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU(irq_cpustat_t, irq_stat); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 2 Oct 2007, Andi Kleen wrote: > > Agreed. > > > > I just got a x8664-hrt report, where I found the following oddity: > > > > 0: 1197 172881 IO-APIC-edge timer > > > > That's one of those infamous AMD C1E boxen. Strange, all my systems have > > IRQ#0 on CPU#0 and nowhere else. Any idea ? > > Hmm, in lowestpriority mode it would be possible that the APIC changes > the CPU to #1 once; but IRQ 0 is always set to fixed mode. Also even > if that happens you should have them all on 1. > > Maybe the chipset is just ignoring the IO-APIC configuration in this case? > > Is it always the same chipset? Is it seen on i386 too? > > The problem is really that if this happens it's more than the NMI watchdog > that is broken. If you don't run an additional APIC timer interrupt on CPU #0 > it's possible that CPU #0 won't schedule at all. > > The only workaround for chipsets ignoring IRQ affinity would be to keep > track on which CPU irq 0 happens and then restart APIC timer interrupts > on the others (or send IPIs) as needed. But that would be fairly ugly. The clock events code does handle this already. The broadcast interrupt can come in on any cpu. It's just the nmi watchdog which would be affected by that. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 2 Oct 2007, Andi Kleen wrote: Agreed. I just got a x8664-hrt report, where I found the following oddity: 0: 1197 172881 IO-APIC-edge timer That's one of those infamous AMD C1E boxen. Strange, all my systems have IRQ#0 on CPU#0 and nowhere else. Any idea ? Hmm, in lowestpriority mode it would be possible that the APIC changes the CPU to #1 once; but IRQ 0 is always set to fixed mode. Also even if that happens you should have them all on 1. Maybe the chipset is just ignoring the IO-APIC configuration in this case? Is it always the same chipset? Is it seen on i386 too? The problem is really that if this happens it's more than the NMI watchdog that is broken. If you don't run an additional APIC timer interrupt on CPU #0 it's possible that CPU #0 won't schedule at all. The only workaround for chipsets ignoring IRQ affinity would be to keep track on which CPU irq 0 happens and then restart APIC timer interrupts on the others (or send IPIs) as needed. But that would be fairly ugly. The clock events code does handle this already. The broadcast interrupt can come in on any cpu. It's just the nmi watchdog which would be affected by that. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
> Agreed. > > I just got a x8664-hrt report, where I found the following oddity: > > 0: 1197 172881 IO-APIC-edge timer > > That's one of those infamous AMD C1E boxen. Strange, all my systems have > IRQ#0 on CPU#0 and nowhere else. Any idea ? Hmm, in lowestpriority mode it would be possible that the APIC changes the CPU to #1 once; but IRQ 0 is always set to fixed mode. Also even if that happens you should have them all on 1. Maybe the chipset is just ignoring the IO-APIC configuration in this case? Is it always the same chipset? Is it seen on i386 too? The problem is really that if this happens it's more than the NMI watchdog that is broken. If you don't run an additional APIC timer interrupt on CPU #0 it's possible that CPU #0 won't schedule at all. The only workaround for chipsets ignoring IRQ affinity would be to keep track on which CPU irq 0 happens and then restart APIC timer interrupts on the others (or send IPIs) as needed. But that would be fairly ugly. IRQ 0 is unfortunately an generally under-validated area because Windows doesn't use it. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 02 Oct 2007 07:56:24 +0300 Mika Penttilä <[EMAIL PROTECTED]> wrote: > Here I have with stock FC7 (2.6.22.9-91) kernel : > 0: 107835 133459760 IO-APIC-edge timer > fwiw this is entirely done by the hardware; no irq balancer has touched this one (fc7 has the new irqbalancer which leaves irq0 alone) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
Thomas Gleixner wrote: On Tue, 2 Oct 2007, Andi Kleen wrote: OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0 restriction. Not sure whether it's worth the trouble. Some SIS chipsets hang the machine when you migrate irq 0 to another CPU. It's better to keep that Also I wouldn't be surprised if there are some other assumptions about this elsewhere. Ok in theory it could be done only on SIS, but that probably would really not be worth the trouble Agreed. I just got a x8664-hrt report, where I found the following oddity: 0: 1197 172881 IO-APIC-edge timer That's one of those infamous AMD C1E boxen. Strange, all my systems have IRQ#0 on CPU#0 and nowhere else. Any idea ? tglx Here I have with stock FC7 (2.6.22.9-91) kernel : 0: 107835 133459760 IO-APIC-edge timer Processor: vendor_id: AuthenticAMD cpu family: 15 model: 107 model name: AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ stepping: 1 cpu MHz: 2109.721 cache size: 512 KB MB: Asus M2N-E (NF570) --Mika - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 2 Oct 2007 00:47:12 +0200 (CEST) Thomas Gleixner <[EMAIL PROTECTED]> wrote: > On Tue, 2 Oct 2007, Andi Kleen wrote: > > > > > OTOH, the accounting hook would allow us to remove the IRQ#0 -> > > > CPU#0 restriction. Not sure whether it's worth the trouble. > > > > Some SIS chipsets hang the machine when you migrate irq 0 to another > > CPU. It's better to keep that Also I wouldn't be surprised if there > > are some other assumptions about this elsewhere. > > > > Ok in theory it could be done only on SIS, but that probably would > > really not be worth the trouble > > Agreed. > > I just got a x8664-hrt report, where I found the following oddity: > > 0: 1197 172881 IO-APIC-edge timer > > That's one of those infamous AMD C1E boxen. Strange, all my systems > have IRQ#0 on CPU#0 and nowhere else. Any idea ? 2 things; the current irq balancers don't balance the timer interrupt, in fact they'll leave it alone (so the chipset might balance) older ones pin it to cpu 0 or rotate it - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 2 Oct 2007, Andi Kleen wrote: > > > OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0 > > restriction. Not sure whether it's worth the trouble. > > Some SIS chipsets hang the machine when you migrate irq 0 to another > CPU. It's better to keep that Also I wouldn't be surprised if there are some > other assumptions about this elsewhere. > > Ok in theory it could be done only on SIS, but that probably would really > not be worth the trouble Agreed. I just got a x8664-hrt report, where I found the following oddity: 0: 1197 172881 IO-APIC-edge timer That's one of those infamous AMD C1E boxen. Strange, all my systems have IRQ#0 on CPU#0 and nowhere else. Any idea ? tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
> OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0 > restriction. Not sure whether it's worth the trouble. Some SIS chipsets hang the machine when you migrate irq 0 to another CPU. It's better to keep that Also I wouldn't be surprised if there are some other assumptions about this elsewhere. Ok in theory it could be done only on SIS, but that probably would really not be worth the trouble -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Andi Kleen wrote: > > > IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the > > next CPU, but the check in NMI watchdog for CPU == 0 would not longer > > work. > > That cannot happen right now because cpu_disable() on both i386/x86-64 > reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both > do that already. I was wrong earlier in being concerned about this. > > > int tick_do_broadcast(cpumask_t mask) > > @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) > > cpu_clear(cpu, mask); > > td = _cpu(tick_cpu_device, cpu); > > td->evtdev->event_handler(td->evtdev); > > + tick_broadcast_account(cpu); > > That would not handle the case with a single CPU running only > irq 0 but not broadcasting I think. Hmm. The only situation where this can happen is when you add "nolapic_timer" to the command line on a single CPU system. We do not register the lapic "dummy" clock event device then. > I believe > ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog > is the correct fix Yup, I completely missed the fact, that we reject CPU#0 unplugging, so your fix seems indeed to be more correct and simpler. OTOH, the accounting hook would allow us to remove the IRQ#0 -> CPU#0 restriction. Not sure whether it's worth the trouble. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
> IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the > next CPU, but the check in NMI watchdog for CPU == 0 would not longer > work. That cannot happen right now because cpu_disable() on both i386/x86-64 reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both do that already. I was wrong earlier in being concerned about this. > int tick_do_broadcast(cpumask_t mask) > @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) > cpu_clear(cpu, mask); > td = _cpu(tick_cpu_device, cpu); > td->evtdev->event_handler(td->evtdev); > + tick_broadcast_account(cpu); That would not handle the case with a single CPU running only irq 0 but not broadcasting I think. I believe ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog is the correct fix -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Arjan van de Ven wrote: > > > I already did this here by checking for cpu != 0. But it also needs > > > either tracking or forbidding migrations of irq 0. I can take care > > > of the patch. > > > > I was thinking about the same fix. On i386 we already have the irq > > migration / balancing of irq 0 disabled. That's why we setup IRQ0 with > > IRQ_NOBALANCING. > > btw doing this is a problem if the user decides to hot(un)plug cpu 0... > he then can't move the irqs away to do that IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the next CPU, but the check in NMI watchdog for CPU == 0 would not longer work. Fix below. Post .23 material. I work out a separate one for the x8664 clock events series. tglx > [PATCH] i386: Fix nmi watchdog per cpu timer irq accounting The clock events patches changed the interrupt distribution and the local apic timer interrupt accounting for the broadcast case. The per cpu clock events handler of the cpu, which runs the broadcast interrupt, is executed directly in the broadcast irq context. This does not invoke the low level arch code, which does the local apic timer irq accounting. The work around for false positives in the nmi watchdog was to add the irq0 interrupts (broadcast device) to the local apic timer interrupts. This falsifies the results for the CPUs which are not handling the broadcast interrupt, i.e. stuck CPUs might be not detected, as noticed by Andi Kleen. It would be possible to move the clockevents handler invocation of the CPU which runs the broadcast interrupt into the tick device broadcast function, but this would require to handle the per cpu device to this function and perform the direct operation in the clock device specific architecture code. Right now this is only i386 and x86_64, but MIPS is on the way to use the broadcast mode as well. Introduce a weak function tick_broadcast_account(), which allows x86 to adjust the local apic timer interrupt counter in the case when the cpu local timer handler has been invoked. This keeps the cpu local handler decision and invocation in the common code and allows x86 to handle the nmi watchdog accounting correctly. Signed-off-by: Thomas Gleixner <[EMAIL PROTECTED]> diff --git a/arch/i386/kernel/apic.c b/arch/i386/kernel/apic.c index 3d67ae1..180dde8 100644 --- a/arch/i386/kernel/apic.c +++ b/arch/i386/kernel/apic.c @@ -283,6 +283,16 @@ static void lapic_timer_broadcast(cpumask_t mask) } /* + * Called from the broadcasting code to keep the local apic timer irq + * accounting straight for the nmi watchdog. Is called with interrupts + * disabled. + */ +void tick_broadcast_account(int cpu) +{ + per_cpu(irq_stat, cpu).apic_timer_irqs++; +} + +/* * Setup the local APIC timer for this CPU. Copy the initilized values * of the boot CPU and register the clock event in the framework. */ diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c index c7227e2..03cdcaf 100644 --- a/arch/i386/kernel/nmi.c +++ b/arch/i386/kernel/nmi.c @@ -349,11 +349,7 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason) cpu_clear(cpu, backtrace_mask); } - /* -* Take the local apic timer and PIT/HPET into account. We don't -* know which one is active, when we have highres/dyntick on -*/ - sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0]; + sum = per_cpu(irq_stat, cpu).apic_timer_irqs; /* if the none of the timers isn't firing, this cpu isn't doing much */ if (!touched && last_irq_sums[cpu] == sum) { diff --git a/include/linux/tick.h b/include/linux/tick.h index 9a7252e..99b3021 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -73,6 +73,7 @@ static inline void tick_cancel_sched_timer(int cpu) { } # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST extern struct tick_device *tick_get_broadcast_device(void); extern cpumask_t *tick_get_broadcast_mask(void); +extern void tick_broadcast_account(int cpu); # ifdef CONFIG_TICK_ONESHOT extern cpumask_t *tick_get_broadcast_oneshot_mask(void); diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index 0962e05..43d0085 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -123,6 +123,16 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu) } /* + * Weak function for cpu local interrupt accounting. Used by x86 to + * keep the lapic accounting correct for nmi_watchdog. + * + * Must be called with interrupts disabled. + */ +void __attribute__((weak)) tick_broadcast_account(int cpu) +{ +} + +/* * Broadcast the event to the cpus, which are set in the mask */ int tick_do_broadcast(cpumask_t mask) @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) cpu_clear(cpu, mask); td = _cpu(tick_cpu_device, cpu); td->evtdev->event_handler(td->evtdev); +
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, Oct 01, 2007 at 12:56:26PM -0700, Arjan van de Ven wrote: > On Mon, 1 Oct 2007 21:27:39 +0200 (CEST) > > > > I already did this here by checking for cpu != 0. But it also needs > > > either tracking or forbidding migrations of irq 0. I can take care > > > of the patch. > > > > I was thinking about the same fix. On i386 we already have the irq > > migration / balancing of irq 0 disabled. That's why we setup IRQ0 with > > IRQ_NOBALANCING. > > btw doing this is a problem if the user decides to hot(un)plug cpu 0... > he then can't move the irqs away to do that Last I tried it, x86_64 and i386 in fact prohibit hotunplugging CPU 0. Thanx, Paul - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, Oct 01, 2007 at 12:56:26PM -0700, Arjan van de Ven wrote: > On Mon, 1 Oct 2007 21:27:39 +0200 (CEST) > > > > I already did this here by checking for cpu != 0. But it also needs > > > either tracking or forbidding migrations of irq 0. I can take care > > > of the patch. > > > > I was thinking about the same fix. On i386 we already have the irq > > migration / balancing of irq 0 disabled. That's why we setup IRQ0 with > > IRQ_NOBALANCING. > > btw doing this is a problem if the user decides to hot(un)plug cpu 0... > he then can't move the irqs away to do that You can't hot unplug cpu0. Take a look in sysfs, no /sys/devices/system/cpu/cpu0/online file. Dave -- http://www.codemonkey.org.uk - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007 21:27:39 +0200 (CEST) > > I already did this here by checking for cpu != 0. But it also needs > > either tracking or forbidding migrations of irq 0. I can take care > > of the patch. > > I was thinking about the same fix. On i386 we already have the irq > migration / balancing of irq 0 disabled. That's why we setup IRQ0 with > IRQ_NOBALANCING. btw doing this is a problem if the user decides to hot(un)plug cpu 0... he then can't move the irqs away to do that - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Andi Kleen wrote: > On Monday 01 October 2007 20:54:21 Thomas Gleixner wrote: > > On Mon, 1 Oct 2007, Andi Kleen wrote: > > > > > On Wednesday 26 September 2007 20:03:12 David Bahi wrote: > > > > Thanks to tglx and ghaskins for all the help in tracking down a very > > > > early nmi_watchdog crash on certain x86_64 machines. > > > > > > The patch is totally bogus. irq 0 doesn't say anything about whether > > > the current CPU still works or not. You always need some local > > > interrupt. This basically disables the NMI watchdog for the non boot CPUs. > > > > > > It's even wrong on i386 -- i wonder how that broken patch > > > made it in there. I'll remove it there. > > > > Right, it's wrong for the broadcast case, but simply removing it will > > trigger false positives on the CPU which runs the broadcast timer. I > > fix this proper. > > I already did this here by checking for cpu != 0. But it also needs either > tracking > or forbidding migrations of irq 0. I can take care of the patch. I was thinking about the same fix. On i386 we already have the irq migration / balancing of irq 0 disabled. That's why we setup IRQ0 with IRQ_NOBALANCING. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Monday 01 October 2007 20:54:21 Thomas Gleixner wrote: > On Mon, 1 Oct 2007, Andi Kleen wrote: > > > On Wednesday 26 September 2007 20:03:12 David Bahi wrote: > > > Thanks to tglx and ghaskins for all the help in tracking down a very > > > early nmi_watchdog crash on certain x86_64 machines. > > > > The patch is totally bogus. irq 0 doesn't say anything about whether > > the current CPU still works or not. You always need some local > > interrupt. This basically disables the NMI watchdog for the non boot CPUs. > > > > It's even wrong on i386 -- i wonder how that broken patch > > made it in there. I'll remove it there. > > Right, it's wrong for the broadcast case, but simply removing it will > trigger false positives on the CPU which runs the broadcast timer. I > fix this proper. I already did this here by checking for cpu != 0. But it also needs either tracking or forbidding migrations of irq 0. I can take care of the patch. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Andi Kleen wrote: > On Wednesday 26 September 2007 20:03:12 David Bahi wrote: > > Thanks to tglx and ghaskins for all the help in tracking down a very > > early nmi_watchdog crash on certain x86_64 machines. > > The patch is totally bogus. irq 0 doesn't say anything about whether > the current CPU still works or not. You always need some local > interrupt. This basically disables the NMI watchdog for the non boot CPUs. > > It's even wrong on i386 -- i wonder how that broken patch > made it in there. I'll remove it there. Right, it's wrong for the broadcast case, but simply removing it will trigger false positives on the CPU which runs the broadcast timer. I fix this proper. tglx - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Wednesday 26 September 2007 20:03:12 David Bahi wrote: > Thanks to tglx and ghaskins for all the help in tracking down a very > early nmi_watchdog crash on certain x86_64 machines. The patch is totally bogus. irq 0 doesn't say anything about whether the current CPU still works or not. You always need some local interrupt. This basically disables the NMI watchdog for the non boot CPUs. It's even wrong on i386 -- i wonder how that broken patch made it in there. I'll remove it there. -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Wednesday 26 September 2007 20:03:12 David Bahi wrote: Thanks to tglx and ghaskins for all the help in tracking down a very early nmi_watchdog crash on certain x86_64 machines. The patch is totally bogus. irq 0 doesn't say anything about whether the current CPU still works or not. You always need some local interrupt. This basically disables the NMI watchdog for the non boot CPUs. It's even wrong on i386 -- i wonder how that broken patch made it in there. I'll remove it there. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Andi Kleen wrote: On Wednesday 26 September 2007 20:03:12 David Bahi wrote: Thanks to tglx and ghaskins for all the help in tracking down a very early nmi_watchdog crash on certain x86_64 machines. The patch is totally bogus. irq 0 doesn't say anything about whether the current CPU still works or not. You always need some local interrupt. This basically disables the NMI watchdog for the non boot CPUs. It's even wrong on i386 -- i wonder how that broken patch made it in there. I'll remove it there. Right, it's wrong for the broadcast case, but simply removing it will trigger false positives on the CPU which runs the broadcast timer. I fix this proper. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Monday 01 October 2007 20:54:21 Thomas Gleixner wrote: On Mon, 1 Oct 2007, Andi Kleen wrote: On Wednesday 26 September 2007 20:03:12 David Bahi wrote: Thanks to tglx and ghaskins for all the help in tracking down a very early nmi_watchdog crash on certain x86_64 machines. The patch is totally bogus. irq 0 doesn't say anything about whether the current CPU still works or not. You always need some local interrupt. This basically disables the NMI watchdog for the non boot CPUs. It's even wrong on i386 -- i wonder how that broken patch made it in there. I'll remove it there. Right, it's wrong for the broadcast case, but simply removing it will trigger false positives on the CPU which runs the broadcast timer. I fix this proper. I already did this here by checking for cpu != 0. But it also needs either tracking or forbidding migrations of irq 0. I can take care of the patch. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Andi Kleen wrote: On Monday 01 October 2007 20:54:21 Thomas Gleixner wrote: On Mon, 1 Oct 2007, Andi Kleen wrote: On Wednesday 26 September 2007 20:03:12 David Bahi wrote: Thanks to tglx and ghaskins for all the help in tracking down a very early nmi_watchdog crash on certain x86_64 machines. The patch is totally bogus. irq 0 doesn't say anything about whether the current CPU still works or not. You always need some local interrupt. This basically disables the NMI watchdog for the non boot CPUs. It's even wrong on i386 -- i wonder how that broken patch made it in there. I'll remove it there. Right, it's wrong for the broadcast case, but simply removing it will trigger false positives on the CPU which runs the broadcast timer. I fix this proper. I already did this here by checking for cpu != 0. But it also needs either tracking or forbidding migrations of irq 0. I can take care of the patch. I was thinking about the same fix. On i386 we already have the irq migration / balancing of irq 0 disabled. That's why we setup IRQ0 with IRQ_NOBALANCING. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007 21:27:39 +0200 (CEST) I already did this here by checking for cpu != 0. But it also needs either tracking or forbidding migrations of irq 0. I can take care of the patch. I was thinking about the same fix. On i386 we already have the irq migration / balancing of irq 0 disabled. That's why we setup IRQ0 with IRQ_NOBALANCING. btw doing this is a problem if the user decides to hot(un)plug cpu 0... he then can't move the irqs away to do that - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, Oct 01, 2007 at 12:56:26PM -0700, Arjan van de Ven wrote: On Mon, 1 Oct 2007 21:27:39 +0200 (CEST) I already did this here by checking for cpu != 0. But it also needs either tracking or forbidding migrations of irq 0. I can take care of the patch. I was thinking about the same fix. On i386 we already have the irq migration / balancing of irq 0 disabled. That's why we setup IRQ0 with IRQ_NOBALANCING. btw doing this is a problem if the user decides to hot(un)plug cpu 0... he then can't move the irqs away to do that You can't hot unplug cpu0. Take a look in sysfs, no /sys/devices/system/cpu/cpu0/online file. Dave -- http://www.codemonkey.org.uk - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, Oct 01, 2007 at 12:56:26PM -0700, Arjan van de Ven wrote: On Mon, 1 Oct 2007 21:27:39 +0200 (CEST) I already did this here by checking for cpu != 0. But it also needs either tracking or forbidding migrations of irq 0. I can take care of the patch. I was thinking about the same fix. On i386 we already have the irq migration / balancing of irq 0 disabled. That's why we setup IRQ0 with IRQ_NOBALANCING. btw doing this is a problem if the user decides to hot(un)plug cpu 0... he then can't move the irqs away to do that Last I tried it, x86_64 and i386 in fact prohibit hotunplugging CPU 0. Thanx, Paul - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Arjan van de Ven wrote: I already did this here by checking for cpu != 0. But it also needs either tracking or forbidding migrations of irq 0. I can take care of the patch. I was thinking about the same fix. On i386 we already have the irq migration / balancing of irq 0 disabled. That's why we setup IRQ0 with IRQ_NOBALANCING. btw doing this is a problem if the user decides to hot(un)plug cpu 0... he then can't move the irqs away to do that IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the next CPU, but the check in NMI watchdog for CPU == 0 would not longer work. Fix below. Post .23 material. I work out a separate one for the x8664 clock events series. tglx [PATCH] i386: Fix nmi watchdog per cpu timer irq accounting The clock events patches changed the interrupt distribution and the local apic timer interrupt accounting for the broadcast case. The per cpu clock events handler of the cpu, which runs the broadcast interrupt, is executed directly in the broadcast irq context. This does not invoke the low level arch code, which does the local apic timer irq accounting. The work around for false positives in the nmi watchdog was to add the irq0 interrupts (broadcast device) to the local apic timer interrupts. This falsifies the results for the CPUs which are not handling the broadcast interrupt, i.e. stuck CPUs might be not detected, as noticed by Andi Kleen. It would be possible to move the clockevents handler invocation of the CPU which runs the broadcast interrupt into the tick device broadcast function, but this would require to handle the per cpu device to this function and perform the direct operation in the clock device specific architecture code. Right now this is only i386 and x86_64, but MIPS is on the way to use the broadcast mode as well. Introduce a weak function tick_broadcast_account(), which allows x86 to adjust the local apic timer interrupt counter in the case when the cpu local timer handler has been invoked. This keeps the cpu local handler decision and invocation in the common code and allows x86 to handle the nmi watchdog accounting correctly. Signed-off-by: Thomas Gleixner [EMAIL PROTECTED] diff --git a/arch/i386/kernel/apic.c b/arch/i386/kernel/apic.c index 3d67ae1..180dde8 100644 --- a/arch/i386/kernel/apic.c +++ b/arch/i386/kernel/apic.c @@ -283,6 +283,16 @@ static void lapic_timer_broadcast(cpumask_t mask) } /* + * Called from the broadcasting code to keep the local apic timer irq + * accounting straight for the nmi watchdog. Is called with interrupts + * disabled. + */ +void tick_broadcast_account(int cpu) +{ + per_cpu(irq_stat, cpu).apic_timer_irqs++; +} + +/* * Setup the local APIC timer for this CPU. Copy the initilized values * of the boot CPU and register the clock event in the framework. */ diff --git a/arch/i386/kernel/nmi.c b/arch/i386/kernel/nmi.c index c7227e2..03cdcaf 100644 --- a/arch/i386/kernel/nmi.c +++ b/arch/i386/kernel/nmi.c @@ -349,11 +349,7 @@ __kprobes int nmi_watchdog_tick(struct pt_regs * regs, unsigned reason) cpu_clear(cpu, backtrace_mask); } - /* -* Take the local apic timer and PIT/HPET into account. We don't -* know which one is active, when we have highres/dyntick on -*/ - sum = per_cpu(irq_stat, cpu).apic_timer_irqs + kstat_cpu(cpu).irqs[0]; + sum = per_cpu(irq_stat, cpu).apic_timer_irqs; /* if the none of the timers isn't firing, this cpu isn't doing much */ if (!touched last_irq_sums[cpu] == sum) { diff --git a/include/linux/tick.h b/include/linux/tick.h index 9a7252e..99b3021 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -73,6 +73,7 @@ static inline void tick_cancel_sched_timer(int cpu) { } # ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST extern struct tick_device *tick_get_broadcast_device(void); extern cpumask_t *tick_get_broadcast_mask(void); +extern void tick_broadcast_account(int cpu); # ifdef CONFIG_TICK_ONESHOT extern cpumask_t *tick_get_broadcast_oneshot_mask(void); diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c index 0962e05..43d0085 100644 --- a/kernel/time/tick-broadcast.c +++ b/kernel/time/tick-broadcast.c @@ -123,6 +123,16 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu) } /* + * Weak function for cpu local interrupt accounting. Used by x86 to + * keep the lapic accounting correct for nmi_watchdog. + * + * Must be called with interrupts disabled. + */ +void __attribute__((weak)) tick_broadcast_account(int cpu) +{ +} + +/* * Broadcast the event to the cpus, which are set in the mask */ int tick_do_broadcast(cpumask_t mask) @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) cpu_clear(cpu, mask); td = per_cpu(tick_cpu_device, cpu); td-evtdev-event_handler(td-evtdev); +
Re: nmi_watchdog fix for x86_64 to be more like i386
IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the next CPU, but the check in NMI watchdog for CPU == 0 would not longer work. That cannot happen right now because cpu_disable() on both i386/x86-64 reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both do that already. I was wrong earlier in being concerned about this. int tick_do_broadcast(cpumask_t mask) @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) cpu_clear(cpu, mask); td = per_cpu(tick_cpu_device, cpu); td-evtdev-event_handler(td-evtdev); + tick_broadcast_account(cpu); That would not handle the case with a single CPU running only irq 0 but not broadcasting I think. I believe ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog is the correct fix -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Mon, 1 Oct 2007, Andi Kleen wrote: IRQ_NOBALANCING is not preventing cpu unplug. It moves the affinity to the next CPU, but the check in NMI watchdog for CPU == 0 would not longer work. That cannot happen right now because cpu_disable() on both i386/x86-64 reject CPU #0. So just setting IRQ_NOBALANCING is sufficient and both do that already. I was wrong earlier in being concerned about this. int tick_do_broadcast(cpumask_t mask) @@ -137,6 +147,7 @@ int tick_do_broadcast(cpumask_t mask) cpu_clear(cpu, mask); td = per_cpu(tick_cpu_device, cpu); td-evtdev-event_handler(td-evtdev); + tick_broadcast_account(cpu); That would not handle the case with a single CPU running only irq 0 but not broadcasting I think. Hmm. The only situation where this can happen is when you add nolapic_timer to the command line on a single CPU system. We do not register the lapic dummy clock event device then. I believe ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/fix-watchdog is the correct fix Yup, I completely missed the fact, that we reject CPU#0 unplugging, so your fix seems indeed to be more correct and simpler. OTOH, the accounting hook would allow us to remove the IRQ#0 - CPU#0 restriction. Not sure whether it's worth the trouble. tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
OTOH, the accounting hook would allow us to remove the IRQ#0 - CPU#0 restriction. Not sure whether it's worth the trouble. Some SIS chipsets hang the machine when you migrate irq 0 to another CPU. It's better to keep that Also I wouldn't be surprised if there are some other assumptions about this elsewhere. Ok in theory it could be done only on SIS, but that probably would really not be worth the trouble -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 2 Oct 2007, Andi Kleen wrote: OTOH, the accounting hook would allow us to remove the IRQ#0 - CPU#0 restriction. Not sure whether it's worth the trouble. Some SIS chipsets hang the machine when you migrate irq 0 to another CPU. It's better to keep that Also I wouldn't be surprised if there are some other assumptions about this elsewhere. Ok in theory it could be done only on SIS, but that probably would really not be worth the trouble Agreed. I just got a x8664-hrt report, where I found the following oddity: 0: 1197 172881 IO-APIC-edge timer That's one of those infamous AMD C1E boxen. Strange, all my systems have IRQ#0 on CPU#0 and nowhere else. Any idea ? tglx - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 2 Oct 2007 00:47:12 +0200 (CEST) Thomas Gleixner [EMAIL PROTECTED] wrote: On Tue, 2 Oct 2007, Andi Kleen wrote: OTOH, the accounting hook would allow us to remove the IRQ#0 - CPU#0 restriction. Not sure whether it's worth the trouble. Some SIS chipsets hang the machine when you migrate irq 0 to another CPU. It's better to keep that Also I wouldn't be surprised if there are some other assumptions about this elsewhere. Ok in theory it could be done only on SIS, but that probably would really not be worth the trouble Agreed. I just got a x8664-hrt report, where I found the following oddity: 0: 1197 172881 IO-APIC-edge timer That's one of those infamous AMD C1E boxen. Strange, all my systems have IRQ#0 on CPU#0 and nowhere else. Any idea ? 2 things; the current irq balancers don't balance the timer interrupt, in fact they'll leave it alone (so the chipset might balance) older ones pin it to cpu 0 or rotate it - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
Thomas Gleixner wrote: On Tue, 2 Oct 2007, Andi Kleen wrote: OTOH, the accounting hook would allow us to remove the IRQ#0 - CPU#0 restriction. Not sure whether it's worth the trouble. Some SIS chipsets hang the machine when you migrate irq 0 to another CPU. It's better to keep that Also I wouldn't be surprised if there are some other assumptions about this elsewhere. Ok in theory it could be done only on SIS, but that probably would really not be worth the trouble Agreed. I just got a x8664-hrt report, where I found the following oddity: 0: 1197 172881 IO-APIC-edge timer That's one of those infamous AMD C1E boxen. Strange, all my systems have IRQ#0 on CPU#0 and nowhere else. Any idea ? tglx Here I have with stock FC7 (2.6.22.9-91) kernel : 0: 107835 133459760 IO-APIC-edge timer Processor: vendor_id: AuthenticAMD cpu family: 15 model: 107 model name: AMD Athlon(tm) 64 X2 Dual Core Processor 4000+ stepping: 1 cpu MHz: 2109.721 cache size: 512 KB MB: Asus M2N-E (NF570) --Mika - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
On Tue, 02 Oct 2007 07:56:24 +0300 Mika Penttilä [EMAIL PROTECTED] wrote: Here I have with stock FC7 (2.6.22.9-91) kernel : 0: 107835 133459760 IO-APIC-edge timer fwiw this is entirely done by the hardware; no irq balancer has touched this one (fc7 has the new irqbalancer which leaves irq0 alone) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: nmi_watchdog fix for x86_64 to be more like i386
Agreed. I just got a x8664-hrt report, where I found the following oddity: 0: 1197 172881 IO-APIC-edge timer That's one of those infamous AMD C1E boxen. Strange, all my systems have IRQ#0 on CPU#0 and nowhere else. Any idea ? Hmm, in lowestpriority mode it would be possible that the APIC changes the CPU to #1 once; but IRQ 0 is always set to fixed mode. Also even if that happens you should have them all on 1. Maybe the chipset is just ignoring the IO-APIC configuration in this case? Is it always the same chipset? Is it seen on i386 too? The problem is really that if this happens it's more than the NMI watchdog that is broken. If you don't run an additional APIC timer interrupt on CPU #0 it's possible that CPU #0 won't schedule at all. The only workaround for chipsets ignoring IRQ affinity would be to keep track on which CPU irq 0 happens and then restart APIC timer interrupts on the others (or send IPIs) as needed. But that would be fairly ugly. IRQ 0 is unfortunately an generally under-validated area because Windows doesn't use it. -Andi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
nmi_watchdog fix for x86_64 to be more like i386
Thanks to tglx and ghaskins for all the help in tracking down a very early nmi_watchdog crash on certain x86_64 machines. This modifies nmi_watchdog_tick behavior for x86_64 arch to consider both timer and hpet IRQs just as the i386 arch does. Signed-off-by: David Bahi <[EMAIL PROTECTED]> --- arch/x86_64/kernel/nmi.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) Index: linux-2.6.22.8-rt9_1267/arch/x86_64/kernel/nmi.c === --- linux-2.6.22.8-rt9_1267.orig/arch/x86_64/kernel/nmi.c +++ linux-2.6.22.8-rt9_1267/arch/x86_64/kernel/nmi.c @@ -369,8 +369,6 @@ int notrace __kprobes nmi_watchdog_tick( touched = 1; } - sum = read_pda(apic_timer_irqs); - if (__get_cpu_var(nmi_touch)) { __get_cpu_var(nmi_touch) = 0; touched = 1; @@ -386,6 +384,12 @@ int notrace __kprobes nmi_watchdog_tick( cpu_clear(cpu, backtrace_mask); } + /* +* Take the local apic timer and PIT/HPET into account. We don't +* know which one is active, when we have highres/dyntick on +*/ + sum = read_pda(apic_timer_irqs) + kstat_cpu(cpu).irqs[0]; + #ifdef CONFIG_X86_MCE /* Could check oops_in_progress here too, but it's safer not too */ signature.asc Description: PGP signature
nmi_watchdog fix for x86_64 to be more like i386
Thanks to tglx and ghaskins for all the help in tracking down a very early nmi_watchdog crash on certain x86_64 machines. This modifies nmi_watchdog_tick behavior for x86_64 arch to consider both timer and hpet IRQs just as the i386 arch does. Signed-off-by: David Bahi [EMAIL PROTECTED] --- arch/x86_64/kernel/nmi.c |8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) Index: linux-2.6.22.8-rt9_1267/arch/x86_64/kernel/nmi.c === --- linux-2.6.22.8-rt9_1267.orig/arch/x86_64/kernel/nmi.c +++ linux-2.6.22.8-rt9_1267/arch/x86_64/kernel/nmi.c @@ -369,8 +369,6 @@ int notrace __kprobes nmi_watchdog_tick( touched = 1; } - sum = read_pda(apic_timer_irqs); - if (__get_cpu_var(nmi_touch)) { __get_cpu_var(nmi_touch) = 0; touched = 1; @@ -386,6 +384,12 @@ int notrace __kprobes nmi_watchdog_tick( cpu_clear(cpu, backtrace_mask); } + /* +* Take the local apic timer and PIT/HPET into account. We don't +* know which one is active, when we have highres/dyntick on +*/ + sum = read_pda(apic_timer_irqs) + kstat_cpu(cpu).irqs[0]; + #ifdef CONFIG_X86_MCE /* Could check oops_in_progress here too, but it's safer not too */ signature.asc Description: PGP signature