Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Wed, 14 Feb 2001, Roeland Th. Jansen wrote:

> On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:
> > Please test it extensively, as much as you can, before I submit it for
> > inclusion. If you ever get "Aieee!!! Remote IRR still set after unlock!"
> > message, please report it to me immediately -- it means the code failed.
>
> ok, so far so good.
>
> > There is also an additional debugging/statistics counter provided in
> > /proc/cpuinfo that counts interrupts which got delivered with its trigger
> > mode mismatched. Check it out to find if you get any misdelivered
> > interrupts at all.
>
> currently attacking the box with a flood ping. I used a pristine 2.4.1.
> to be sure I didn't leave stuff and applied the patch.

ping -l is a good test also...

Jeff

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Wed, Feb 14, 2001 at 05:30:57PM +, Roeland Th. Jansen wrote:

> other observations -- approx 6000 ints from the ne2k card/sec.
> MIS shows approx 1% that goes wrong with a ping flood.

oops. had to count both CPU0 and CPU1's interrupts. after 23 minutes:

           CPU0       CPU1
 19:    3824114    3823371   IO-APIC-level  eth0
MIS:      29025

makes approx 0.3%..

--
Grobbebol's Home                   |  Don't give in to spammers.   -o)
http://www.xs4all.nl/~bengel       | Use your real e-mail address   /\
Linux 2.2.16 SMP 2x466MHz / 256 MB |        on Usenet.             _\_v
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:

> Please test it extensively, as much as you can, before I submit it for
> inclusion. If you ever get "Aieee!!! Remote IRR still set after unlock!"
> message, please report it to me immediately -- it means the code failed.

ok, so far so good.

> There is also an additional debugging/statistics counter provided in
> /proc/cpuinfo that counts interrupts which got delivered with its trigger
> mode mismatched. Check it out to find if you get any misdelivered
> interrupts at all.

currently attacking the box with a flood ping. I used a pristine 2.4.1.
to be sure I didn't leave stuff and applied the patch.

observations -- system doesn't crash; usually I had to disable focus
processor checking -- else it fails.

other observations -- approx 6000 ints from the ne2k card/sec.
MIS shows approx 1% that goes wrong with a ping flood.

           CPU0       CPU1
  0:      35345      36195   IO-APIC-edge   timer
  1:       1632       1534   IO-APIC-edge   keyboard
  2:          0          0   XT-PIC         cascade
  3:        826        832   IO-APIC-edge   serial
  4:          4          4   IO-APIC-edge   serial
  5:      12213      12201   IO-APIC-edge   soundblaster
  8:          0          1   IO-APIC-edge   rtc
 14:       3079       2906   IO-APIC-edge   ide0
 15:          3          3   IO-APIC-edge   ide1
 18:         69         85   IO-APIC-level  BusLogic BT-930
 19:    1758280    1758266   IO-APIC-level  eth0
NMI:      71480      71480
LOC:      71459      71456
ERR:          3
MIS:      15814

good work !
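For readers following the focus-processor remarks: on the P6-family local APIC, focus processor checking is controlled by bit 9 of the Spurious-Interrupt Vector Register (SVR), and a clear bit means checking is enabled. That is why the patch's "Enable focus processor (bit==0)" branch clears bit 9. The sketch below only manipulates a plain integer; the helper names are hypothetical and nothing touches real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 9 of the local APIC Spurious-Interrupt Vector Register:
 * 0 = focus processor checking enabled, 1 = disabled. */
#define SVR_FOCUS_DISABLED (1u << 9)

/* Hypothetical helper: returns @svr with focus checking turned on/off. */
static uint32_t svr_set_focus_check(uint32_t svr, int enable)
{
	assert(enable == 0 || enable == 1);
	if (enable)
		return svr & ~SVR_FOCUS_DISABLED;	/* same as value &= ~(1<<9) */
	return svr | SVR_FOCUS_DISABLED;		/* set the bit: checking off */
}

static int focus_check_enabled(uint32_t svr)
{
	return !(svr & SVR_FOCUS_DISABLED);
}
```

With an example SVR value of 0x1ff (APIC enabled, vector 0xff), enabling focus checking leaves 0x1ff and disabling it yields 0x3ff.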
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Wed, 14 Feb 2001, Andrew Morton wrote:

> Tell me, please: what tradeoffs are involved in this patch?
> Obviously it works around a pretty fatal problem, but
> what are we giving away?

 The change decreases performance a bit. For well-behaved systems the
loss is fifteen instructions: a local APIC read (uncached but supposedly
cheap), a global memory read (a cache line invalidation and fetch), seven
stack accesses (cached for sure), a taken branch and five ALU operations.
With the version you have, I see gcc is actually doing an extra memory
read, presumably due to the volatile APIC access -- this is now fixed.

 For misdelivered interrupts the overhead is much, much bigger, involving
acquiring a spinlock and multiple (uncached and possibly slow) I/O APIC
accesses.

 We may lower the overhead by undefining APIC_LOCKUP_DEBUG, which we
should do after a bit of testing. I think we might leave
APIC_MISMATCH_DEBUG intact -- its cost is a single locked instruction,
which is negligible IMO.

 Note the original version consisted of only two instructions -- a local
APIC write and "ret", sigh...

  Maciej

--
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--------------------------------------------------------------+
+        e-mail: [EMAIL PROTECTED], PGP key available          +
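A rough user-space sketch of the fast path described above. It assumes (the patch excerpt in this thread does not show end_level_ioapic_irq() itself) that the "local APIC read" checks the Trigger Mode Register (TMR) bit for the interrupt's vector, and that the "single locked instruction" is an atomic increment of a mismatch counter. The TMR layout (eight 32-bit registers at offsets 0x180 to 0x1F0, one bit per vector) is the documented local APIC layout; fake_tmr, fake_apic_read, end_level_sketch and mis_count are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Local APIC Trigger Mode Register: 8 x 32-bit, offsets 0x180..0x1F0. */
#define APIC_TMR 0x180u

/* Hypothetical stand-in for the TMR part of the local APIC page. */
static uint32_t fake_tmr[8];

/* Analogue of irq_mis_count; an atomic add like this is typically a
 * single lock-prefixed instruction on x86. */
static atomic_uint mis_count;

/* Offset of the TMR register holding @vector: (vector / 32) * 0x10. */
static unsigned tmr_offset(unsigned vector)
{
	assert(vector < 256);
	return APIC_TMR + ((vector & ~0x1fu) >> 1);
}

static uint32_t tmr_bit(unsigned vector)
{
	return 1u << (vector & 0x1f);
}

/* Stand-in for apic_read(): maps a TMR offset onto fake_tmr[]. */
static uint32_t fake_apic_read(unsigned offset)
{
	return fake_tmr[(offset - APIC_TMR) / 0x10];
}

/* Returns 0 on the fast path (vector latched as level-triggered) and 1
 * when it arrived as edge, i.e. when the slow path would run. */
static int end_level_sketch(unsigned vector)
{
	uint32_t v = fake_apic_read(tmr_offset(vector));

	if (v & tmr_bit(vector))
		return 0;
	atomic_fetch_add(&mis_count, 1);
	/* The real slow path would take a spinlock and rewrite the
	 * I/O APIC entry; that part is omitted here. */
	return 1;
}
```

For vector 0x31 the check reads the register at offset 0x190 and tests bit 17; only when that bit is clear does the counter move and the expensive path begin.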
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
"Maciej W. Rozycki" wrote: > > Hi, > > After performing various tests I came to the following workaround for > APIC lockups which people observe under IRQ load, mostly for networking > stuff. Works fine on the dual-PII. No "Aieee!!!" messages at all. After sending a few gigs across the ethernet, running irq-whacker: mnm:/usr/src/cptimer> cat /proc/interrupts CPU0 CPU1 0: 77613 61869IO-APIC-edge timer 1:253258IO-APIC-edge keyboard 2: 0 0 XT-PIC cascade 8: 0 1IO-APIC-edge rtc 9: 0 0 XT-PIC acpi 12: 0 0IO-APIC-edge PS/2 Mouse 17:51048553919759 IO-APIC-level eth0 18: 2334 2313 IO-APIC-level ide2 NMI: 139418 139418 LOC: 139403 139402 ERR:221 MIS:5299867 And without irq-whacker: mnm:/home/morton> cat /proc/interrupts CPU0 CPU1 0: 55384 70899IO-APIC-edge timer 1: 2 3IO-APIC-edge keyboard 2: 0 0 XT-PIC cascade 8: 0 1IO-APIC-edge rtc 9: 0 0 XT-PIC acpi 12: 0 0IO-APIC-edge PS/2 Mouse 17:25547052554064 IO-APIC-level eth0 18: 1814 1812 IO-APIC-level ide2 NMI: 126220 126220 LOC: 126202 126201 ERR: 35 MIS: 0 Tell me, please: what tradeoffs are involved in this patch? Obviously it works around a pretty fatal problem, but what are we giving away? Oh: and thanks :) - - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:

> There is also an additional debugging/statistics counter provided in
> /proc/cpuinfo that counts interrupts which got delivered with its trigger
> mode mismatched. Check it out to find if you get any misdelivered
> interrupts at all.

I guess you mean the MIS: counter in /proc/interrupts? This is what it says
on my box after running some 330,000 interrupts (at a rate of approx.
900/second) through the network/usb IRQ:

cat /proc/interrupts
           CPU0       CPU1
  0:      31693      32749   IO-APIC-edge   timer
  1:       1208       1174   IO-APIC-edge   keyboard
  2:          0          0   XT-PIC         cascade
  3:        113         26   IO-APIC-edge   serial
  4:       4689       4567   IO-APIC-edge   serial
 14:       4440       4545   IO-APIC-edge   ide0
 15:       1911       2132   IO-APIC-edge   ide1
 16:      85021      84227   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         26         26   IO-APIC-level  sym53c8xx
 18:          0          0   IO-APIC-level  btaudio, bttv
 19:     165467     166254   IO-APIC-level  eth0, eth1, usb-uhci
NMI:      64376      64376
LOC:      64364      64362
ERR:          0
MIS:        647

So, that's about 650 misdelivered interrupts for 330,000 deliveries (the
other interrupts never gave me any trouble, so I guess the misdelivered
ones are all from IRQ 19), or about 0.2%.

When I load the network and stream some audio over it, the sound becomes
a bit choppy. The MIS: counter only increases when the network (read:
IRQ 19) is loaded; a single audio stream (approx. 220 int/sec) causes no
MISses to occur.

In general, I'd say the stability WITH the patch is good, and timeouts
are within tolerable levels. If I need something better, I'll probably
get myself a better set of network cards...

So, quick conclusion, this seems a reasonable fix...

Cheers//Frank
--
  Frank de Lange
  +31-320-252965
  [EMAIL PROTECTED]
[ "For anything that is not diminished by being given away is not yet
  possessed as it ought to be, so long as it is kept and not given." ]
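The percentages quoted in this thread come from dividing the MIS: counter by the deliveries on the suspect IRQ, summed over all CPU columns (the correction Roeland made to his own 1% figure). A minimal helper for redoing that arithmetic from /proc/interrupts lines might look like the sketch below; it assumes the 2.4-era layout where the per-CPU counts follow the "N:" label, and sum_cpu_counts and mis_percent are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sums the numeric per-CPU columns that follow the "N:" label on a
 * /proc/interrupts line, stopping at the first non-numeric token. */
static unsigned long long sum_cpu_counts(const char *line)
{
	const char *p = strchr(line, ':');
	unsigned long long total = 0;

	if (p == NULL)
		return 0;
	p++;
	for (;;) {
		char *end;
		unsigned long long v = strtoull(p, &end, 10);

		if (end == p)	/* no digits: controller type or end of line */
			break;
		total += v;
		p = end;
	}
	return total;
}

/* MIS count as a percentage of all deliveries on the suspect IRQ. */
static double mis_percent(unsigned long long mis, unsigned long long delivered)
{
	assert(delivered > 0);
	return 100.0 * (double)mis / (double)delivered;
}
```

Fed the IRQ 19 line and the MIS: line from Frank's output, sum_cpu_counts() returns 331721 and 647, which gives roughly 0.2%. Summing only one CPU column would roughly double the figure on these evenly balanced boxes.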
Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups
"Maciej W. Rozycki" wrote: > > Hi, > > After performing various tests I came to the following workaround for > APIC lockups which people observe under IRQ load, mostly for networking > stuff. I believe the test should work in all cases as it basically > implements a manual replacement for EOI messages. In my simulated > environment I was unable to get a lockup with the code in place, even > though I was getting about every other level-triggered IRQ misdelivered. > > Please test it extensively, as much as you can, before I submit it for > inclusion. If you ever get "Aieee!!! Remote IRR still set after unlock!" > message, please report it to me immediately -- it means the code failed. > No messages. > There is also an additional debugging/statistics counter provided in > /proc/cpuinfo that counts interrupts which got delivered with its trigger > mode mismatched. Check it out to find if you get any misdelivered > interrupts at all. > I'm running my default webserver load test, and I get ~40 /second, 92735 total. bw_tcp says 1.13 MB/sec, that's wire speed. tcpdump | grep 'sack ' doesn't show unusually many lost packets. Look promising. -- Manfred - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] 2.4.1, 2.4.2-pre3: APIC lockups
Hi,

 After performing various tests I came to the following workaround for
APIC lockups which people observe under IRQ load, mostly for networking
stuff. I believe the test should work in all cases as it basically
implements a manual replacement for EOI messages. In my simulated
environment I was unable to get a lockup with the code in place, even
though I was getting about every other level-triggered IRQ misdelivered.

 Please test it extensively, as much as you can, before I submit it for
inclusion. If you ever get "Aieee!!! Remote IRR still set after unlock!"
message, please report it to me immediately -- it means the code failed.

 There is also an additional debugging/statistics counter provided in
/proc/cpuinfo that counts interrupts which got delivered with its trigger
mode mismatched. Check it out to find if you get any misdelivered
interrupts at all.

 The patch applies cleanly to 2.4.1 and 2.4.2-pre3. For the -ac series you
need to revert patch-2.4.0-io_apic-2 first -- check the list archives for
the patch.

 Andrew, Manfred: this version has one line updated compared to what you
already have.

 Ingo: while implementing irq_mis_count, I corrected irq_err_count to be
atomic_t as well.

 Good luck,

  Maciej

patch-2.4.1-io_apic-46
diff -up --recursive --new-file linux-2.4.1.macro/arch/i386/kernel/apic.c linux-2.4.1/arch/i386/kernel/apic.c
--- linux-2.4.1.macro/arch/i386/kernel/apic.c	Wed Dec 13 23:54:27 2000
+++ linux-2.4.1/arch/i386/kernel/apic.c	Mon Feb 12 16:11:15 2001
@@ -23,6 +23,7 @@
 #include <linux/mc146818rtc.h>
 #include <linux/kernel_stat.h>
 
+#include <asm/atomic.h>
 #include <asm/smp.h>
 #include <asm/mtrr.h>
 #include <asm/mpspec.h>
@@ -270,7 +271,13 @@ void __init setup_local_APIC (void)
 	 * PCI Ne2000 networking cards and PII/PIII processors, dual
 	 * BX chipset. ]
 	 */
-#if 0
+	/*
+	 * Actually disabling the focus CPU check just makes the hang less
+	 * frequent as it makes the interrupt distribution model be more
+	 * like LRU than MRU (the short-term load is more even across CPUs).
+	 * See also the comment in end_level_ioapic_irq().  --macro
+	 */
+#if 1
 	/* Enable focus processor (bit==0) */
 	value &= ~(1<<9);
 #else
@@ -764,7 +771,7 @@ asmlinkage void smp_error_interrupt(void
 	apic_write(APIC_ESR, 0);
 	v1 = apic_read(APIC_ESR);
 	ack_APIC_irq();
-	irq_err_count++;
+	atomic_inc(&irq_err_count);
 
 	/* Here is what the APIC error bits mean:
 	   0: Send CS error
diff -up --recursive --new-file linux-2.4.1.macro/arch/i386/kernel/i8259.c linux-2.4.1/arch/i386/kernel/i8259.c
--- linux-2.4.1.macro/arch/i386/kernel/i8259.c	Mon Nov 20 18:01:58 2000
+++ linux-2.4.1/arch/i386/kernel/i8259.c	Sun Feb 11 19:54:33 2001
@@ -12,6 +12,7 @@
 #include <linux/init.h>
 #include <linux/kernel_stat.h>
 
+#include <asm/atomic.h>
 #include <asm/system.h>
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -321,7 +322,7 @@ spurious_8259A_irq:
 			printk("spurious 8259A interrupt: IRQ%d.\n", irq);
 			spurious_irq_mask |= irqmask;
 		}
-		irq_err_count++;
+		atomic_inc(&irq_err_count);
 		/*
 		 * Theoretically we do not have to handle this IRQ,
 		 * but in Linux this does not cause problems and is
diff -up --recursive --new-file linux-2.4.1.macro/arch/i386/kernel/io_apic.c linux-2.4.1/arch/i386/kernel/io_apic.c
--- linux-2.4.1.macro/arch/i386/kernel/io_apic.c	Sat Feb  3 12:05:49 2001
+++ linux-2.4.1/arch/i386/kernel/io_apic.c	Tue Feb 13 19:59:55 2001
@@ -33,6 +33,8 @@
 #include <asm/smp.h>
 #include <asm/desc.h>
 
+#define APIC_LOCKUP_DEBUG
+
 static spinlock_t ioapic_lock = SPIN_LOCK_UNLOCKED;
 
 /*
@@ -122,8 +124,14 @@ static void add_pin_to_irq(unsigned int
 	static void name##_IO_APIC_irq (unsigned int irq)		\
 		__DO_ACTION(R, ACTION, FINAL)
 
-DO_ACTION( __mask,    0, |= 0x00010000, io_apic_sync(entry->apic))/* mask = 1 */
-DO_ACTION( __unmask,  0, &= 0xfffeffff, )				/* mask = 0 */
+DO_ACTION( __mask,             0, |= 0x00010000, io_apic_sync(entry->apic) )
+				/* mask = 1 */
+DO_ACTION( __unmask,           0, &= 0xfffeffff, )
+				/* mask = 0 */
+DO_ACTION( __mask_and_edge,    0, = (reg & 0xffff7fff) | 0x00010000, )
+				/* mask = 1, trigger = 0 */
+DO_ACTION( __unmask_and_level, 0, = (reg & 0xfffeffff) | 0x00008000, )
+				/* mask = 0, trigger = 1 */
 
 static void mask_IO_APIC_irq (unsigned int irq)
 {
@@ -847,6 +855,8 @@ void /*__init*/ print_local_APIC(void *
 	v = apic_read(APIC_EOI);
 	printk(KERN_DEBUG "... APIC EOI: %08x\n", v);
+	v = apic_read(APIC_RRR);
+	printk(KERN_DEBUG "... APIC RRR:
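The "manual replacement for EOI messages" can be modeled in user space to show why the mask/edge and unmask/level operations are paired. This is a hedged sketch, not the kernel code: the bit positions (mask = bit 16, trigger mode = bit 15, Remote IRR = bit 14) follow the I/O APIC redirection-entry layout, while rte_store, mask_and_edge, unmask_and_level and fake_eoi are hypothetical names. The rule that switching an entry to edge trigger clears Remote IRR is an assumption built into the model (it is the behavior the workaround relies on), not something this code verifies against hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Low dword of an I/O APIC redirection table entry. */
#define RTE_MASKED     0x00010000u	/* bit 16: interrupt masked */
#define RTE_LEVEL      0x00008000u	/* bit 15: 1 = level, 0 = edge */
#define RTE_REMOTE_IRR 0x00004000u	/* bit 14: level IRQ accepted, EOI pending */

/* Hypothetical model of writing an entry: Remote IRR is read-only, and
 * (modeling assumption) dropping to edge trigger clears it. */
static uint32_t rte_store(uint32_t cur, uint32_t val)
{
	uint32_t r = (val & ~RTE_REMOTE_IRR) | (cur & RTE_REMOTE_IRR);

	if (!(r & RTE_LEVEL))
		r &= ~RTE_REMOTE_IRR;
	return r;
}

/* Bit operations analogous to __mask_and_edge and __unmask_and_level. */
static uint32_t mask_and_edge(uint32_t reg)
{
	return (reg & ~RTE_LEVEL) | RTE_MASKED;
}

static uint32_t unmask_and_level(uint32_t reg)
{
	return (reg & ~RTE_MASKED) | RTE_LEVEL;
}

/* Manual EOI replacement: masked + edge (clears Remote IRR in this model
 * while the pin cannot fire), then back to unmasked + level. The slow
 * path described in the thread does this with a spinlock held. */
static uint32_t fake_eoi(uint32_t rte)
{
	assert(rte & RTE_LEVEL);	/* only meaningful for level entries */
	rte = rte_store(rte, mask_and_edge(rte));
	rte = rte_store(rte, unmask_and_level(rte));
	return rte;
}
```

In this model, rewriting a stuck level entry without the detour through edge mode leaves Remote IRR set, which corresponds to the condition the "Remote IRR still set after unlock!" message reports; after fake_eoi() the entry is level-triggered, unmasked and clear, with its vector untouched.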