Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups

2001-02-14 Thread Jeff Garzik



On Wed, 14 Feb 2001, Roeland Th. Jansen wrote:

> On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:
> >  Please test it extensively, as much as you can, before I submit it for
> > inclusion.  If you ever get "Aieee!!!  Remote IRR still set after unlock!" 
> > message, please report it to me immediately -- it means the code failed. 
> 
> 
> ok, so far so good.
> 
> > There is also an additional debugging/statistics counter provided in
> > /proc/cpuinfo that counts interrupts which got delivered with its trigger
> > mode mismatched.  Check it out to find if you get any misdelivered
> > interrupts at all.
> 
> currently attacking the box with a flood ping. I used a pristine 2.4.1.
> to be sure I didn't leave stuff and applied the patch.

ping -l is a good test also...

Jeff




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups

2001-02-14 Thread Roeland Th. Jansen

On Wed, Feb 14, 2001 at 05:30:57PM +0000, Roeland Th. Jansen wrote:
> other observations -- approx 6000 ints from the ne2k card/sec.
> MIS shows approx 1% that goes wrong with a ping flood.

oops. had to count both CPU0 and CPU1's interrupts. after 23 minutes :

           CPU0       CPU1
 19:    3824114    3823371   IO-APIC-level  eth0
MIS:      29025

makes approx 0.3%..
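
The corrected figure is the MIS: count divided by the eth0 interrupts summed over both CPUs. A minimal C sketch of that arithmetic follows; the helper names `mis_percent` and `report_mis` are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage of misdelivered interrupts, given the MIS: counter and the
 * per-CPU counts of the IRQ line under load (hypothetical helper). */
double mis_percent(unsigned long mis, unsigned long cpu0, unsigned long cpu1)
{
    unsigned long total = cpu0 + cpu1;

    assert(total > 0);  /* an idle line has no meaningful ratio */
    return 100.0 * (double)mis / (double)total;
}

/* Print the ratio for one sample taken from /proc/interrupts. */
void report_mis(unsigned long mis, unsigned long cpu0, unsigned long cpu1)
{
    printf("MIS: %lu of %lu interrupts (%.2f%%)\n",
           mis, cpu0 + cpu1, mis_percent(mis, cpu0, cpu1));
}
```

With the sample above, `report_mis(29025, 3824114, 3823371)` works out to roughly 0.38%, in line with the rough 0.3% quoted; counting CPU0 alone would give about twice that.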

-- 
Grobbebol's Home   |  Don't give in to spammers.   -o)
http://www.xs4all.nl/~bengel   | Use your real e-mail address   /\
Linux 2.2.16 SMP 2x466MHz / 256 MB |on Usenet. _\_v  



Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups

2001-02-14 Thread Roeland Th. Jansen

On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:
>  Please test it extensively, as much as you can, before I submit it for
> inclusion.  If you ever get "Aieee!!!  Remote IRR still set after unlock!" 
> message, please report it to me immediately -- it means the code failed. 


ok, so far so good.

> There is also an additional debugging/statistics counter provided in
> /proc/cpuinfo that counts interrupts which got delivered with its trigger
> mode mismatched.  Check it out to find if you get any misdelivered
> interrupts at all.

currently attacking the box with a flood ping. I used a pristine 2.4.1.
to be sure I didn't leave stuff and applied the patch.

observations -- system doesn't crash; usually I had to use disable focus
processor -- else it fails.

other observations -- approx 6000 ints from the ne2k card/sec.
MIS shows approx 1% that goes wrong with a ping flood.

           CPU0       CPU1
  0:      35345      36195    IO-APIC-edge  timer
  1:       1632       1534    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:        826        832    IO-APIC-edge  serial
  4:          4          4    IO-APIC-edge  serial
  5:      12213      12201    IO-APIC-edge  soundblaster
  8:          0          1    IO-APIC-edge  rtc
 14:       3079       2906    IO-APIC-edge  ide0
 15:          3          3    IO-APIC-edge  ide1
 18:         69         85   IO-APIC-level  BusLogic BT-930
 19:    1758280    1758266   IO-APIC-level  eth0
NMI:      71480      71480
LOC:      71459      71456
ERR:          3
MIS:      15814


good work !




-- 
Grobbebol's Home   |  Don't give in to spammers.   -o)
http://www.xs4all.nl/~bengel   | Use your real e-mail address   /\
Linux 2.2.16 SMP 2x466MHz / 256 MB |on Usenet. _\_v  



Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups

2001-02-14 Thread Maciej W. Rozycki

On Wed, 14 Feb 2001, Andrew Morton wrote:

> Tell me, please: what tradeoffs are involved in this patch?
> Obviously it works around a pretty fatal problem, but
> what are we giving away?

 The change decreases performance a bit.  For well-behaved systems the
loss is fifteen instructions: a local APIC read (uncached but supposedly
cheap), a global memory read (a cache line invalidation and fetch), seven
stack accesses (cached for sure), a taken branch and five ALU.  With the
version you have I see gcc is actually doing an extra memory read due to
the volatile APIC access presumably -- this is now fixed.

 For misdelivered interrupts the overhead is much, much bigger, involving
acquiring a spinlock and multiple (uncached and possibly slow) I/O APIC
accesses.  We may lower the overhead by undefining APIC_LOCKUP_DEBUG,
which we should do after a bit of testing.  I think we might leave
APIC_MISMATCH_DEBUG intact -- its cost is a single locked instruction
which is negligible IMO.
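
One way to picture the two paths described above is a small user-space simulation. This is only a sketch under assumptions, not code from the patch: `struct sim_ioapic` and the `sim_*` functions are made up for illustration, the low word of a redirection entry is modelled as a plain integer (bit 14 Remote IRR, bit 15 trigger mode, bit 16 mask), and the clearing of Remote IRR on the switch to edge mode is the hardware behaviour this sketch assumes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_REMOTE_IRR 0x00004000u  /* bit 14: level IRQ awaiting EOI */
#define ENTRY_TRIGGER    0x00008000u  /* bit 15: 1 = level, 0 = edge */
#define ENTRY_MASK       0x00010000u  /* bit 16: 1 = masked */

/* Simulated state; in the kernel these would be hardware registers. */
struct sim_ioapic {
    uint32_t entry;     /* low word of one redirection entry */
    unsigned long mis;  /* counterpart of the MIS: counter */
};

/* mask = 1, trigger = 0; assumed to drop Remote IRR as a side effect */
void sim_mask_and_edge(struct sim_ioapic *io)
{
    io->entry = (io->entry & ~ENTRY_TRIGGER) | ENTRY_MASK;
    io->entry &= ~ENTRY_REMOTE_IRR;
}

/* mask = 0, trigger = 1 */
void sim_unmask_and_level(struct sim_ioapic *io)
{
    io->entry = (io->entry & ~ENTRY_MASK) | ENTRY_TRIGGER;
}

/* End of a level-triggered interrupt.  delivered_as_level stands in for
 * the local APIC read that tells how the interrupt actually arrived. */
void sim_end_level_irq(struct sim_ioapic *io, int delivered_as_level)
{
    assert(io != NULL);

    if (delivered_as_level) {
        /* Fast path: the ordinary EOI message clears Remote IRR. */
        io->entry &= ~ENTRY_REMOTE_IRR;
        return;
    }

    /* Slow path: no EOI message reaches the I/O APIC, so count the
     * mismatch and clear Remote IRR by hand.  In the kernel this part
     * would run under the I/O APIC spinlock. */
    io->mis++;
    sim_mask_and_edge(io);
    sim_unmask_and_level(io);
}
```

In this model the fast path is one test and a return, while the slow path carries the counter update and two rewrites of the redirection entry, which matches the cost split described above.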

 Note the original version consisted of two instructions only -- a local
APIC write and "ret", sigh...

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+




Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups

2001-02-14 Thread Andrew Morton

"Maciej W. Rozycki" wrote:
> 
> Hi,
> 
>  After performing various tests I came to the following workaround for
> APIC lockups which people observe under IRQ load, mostly for networking
> stuff.

Works fine on the dual-PII.  No "Aieee!!!" messages at all.

After sending a few gigs across the ethernet, running
irq-whacker:

mnm:/usr/src/cptimer> cat /proc/interrupts
           CPU0       CPU1
  0:      77613      61869    IO-APIC-edge  timer
  1:        253        258    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
  9:          0          0          XT-PIC  acpi
 12:          0          0    IO-APIC-edge  PS/2 Mouse
 17:    5104855    3919759   IO-APIC-level  eth0
 18:       2334       2313   IO-APIC-level  ide2
NMI:     139418     139418
LOC:     139403     139402
ERR:        221
MIS:    5299867

And without irq-whacker:

mnm:/home/morton> cat /proc/interrupts
           CPU0       CPU1
  0:      55384      70899    IO-APIC-edge  timer
  1:          2          3    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
  9:          0          0          XT-PIC  acpi
 12:          0          0    IO-APIC-edge  PS/2 Mouse
 17:    2554705    2554064   IO-APIC-level  eth0
 18:       1814       1812   IO-APIC-level  ide2
NMI:     126220     126220
LOC:     126202     126201
ERR:         35
MIS:          0


Tell me, please: what tradeoffs are involved in this patch?
Obviously it works around a pretty fatal problem, but
what are we giving away?

Oh: and thanks :)

-



Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups

2001-02-13 Thread Frank de Lange

On Tue, Feb 13, 2001 at 09:13:10PM +0100, Maciej W. Rozycki wrote:
> There is also an additional debugging/statistics counter provided in
> /proc/cpuinfo that counts interrupts which got delivered with its trigger
> mode mismatched.  Check it out to find if you get any misdelivered
> interrupts at all.

I guess you mean the MIS: counter in /proc/interrupts? This is what it says on
my box after running some 330000 interrupts (at a rate of app. 900/second)
through the network/usb IRQ:

 cat /proc/interrupts 
           CPU0       CPU1
  0:      31693      32749    IO-APIC-edge  timer
  1:       1208       1174    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  3:        113         26    IO-APIC-edge  serial
  4:       4689       4567    IO-APIC-edge  serial
 14:       4440       4545    IO-APIC-edge  ide0
 15:       1911       2132    IO-APIC-edge  ide1
 16:      85021      84227   IO-APIC-level  es1371, mga@PCI:1:0:0
 17:         26         26   IO-APIC-level  sym53c8xx
 18:          0          0   IO-APIC-level  btaudio, bttv
 19:     165467     166254   IO-APIC-level  eth0, eth1, usb-uhci
NMI:      64376      64376
LOC:      64364      64362
ERR:          0
MIS:        647

So, that's about 650 misdelivered interrupts for 330000 deliveries (the other
interrupts never gave me any trouble, so I guess the misdelivered ones are all
from IRQ 19), or about .2%

When I load the network and stream some audio over it, the sound becomes a bit
choppy. The MIS: counter only increases when the network (read: IRQ 19) is
loaded, a single audio stream (app. 220 int/sec) causes no MISses to occur.

In general, I'd say the stability WITH the patch is good, and timeouts are
within tolerable levels. If I need something better, I'll probably get myself
a better set of network cards...

So, quick conclusion, this seems a reasonable fix...

Cheers//Frank

-- 
  W  ___
 ## o o\/ Frank de Lange \
 }#   \|   /  \
  ##---# _/ Hacker for Hire  \
      \  +31-320-252965/
   \[EMAIL PROTECTED]/
-
 [ "Omnis enim res, quae dando non deficit, dum habetur
et non datur, nondum habetur, quomodo habenda est."  ]
-



Re: [patch] 2.4.1, 2.4.2-pre3: APIC lockups

2001-02-13 Thread Manfred Spraul

"Maciej W. Rozycki" wrote:
> 
> Hi,
> 
>  After performing various tests I came to the following workaround for
> APIC lockups which people observe under IRQ load, mostly for networking
> stuff.  I believe the test should work in all cases as it basically
> implements a manual replacement for EOI messages.  In my simulated
> environment I was unable to get a lockup with the code in place, even
> though I was getting about every other level-triggered IRQ misdelivered.
> 
>  Please test it extensively, as much as you can, before I submit it for
> inclusion.  If you ever get "Aieee!!!  Remote IRR still set after unlock!"
> message, please report it to me immediately -- it means the code failed.
>
No messages.

> There is also an additional debugging/statistics counter provided in
> /proc/cpuinfo that counts interrupts which got delivered with its trigger
> mode mismatched.  Check it out to find if you get any misdelivered
> interrupts at all.
> 
I'm running my default webserver load test, and I get ~40 /second, 92735
total.

bw_tcp says 1.13 MB/sec, that's wire speed.

tcpdump | grep 'sack ' doesn't show unusually many lost packets.

Looks promising.

--
Manfred



[patch] 2.4.1, 2.4.2-pre3: APIC lockups

2001-02-13 Thread Maciej W. Rozycki

Hi,

 After performing various tests I came to the following workaround for
APIC lockups which people observe under IRQ load, mostly for networking
stuff.  I believe the test should work in all cases as it basically
implements a manual replacement for EOI messages.  In my simulated
environment I was unable to get a lockup with the code in place, even
though I was getting about every other level-triggered IRQ misdelivered. 

 Please test it extensively, as much as you can, before I submit it for
inclusion.  If you ever get "Aieee!!!  Remote IRR still set after unlock!" 
message, please report it to me immediately -- it means the code failed. 
There is also an additional debugging/statistics counter provided in
/proc/cpuinfo that counts interrupts which got delivered with its trigger
mode mismatched.  Check it out to find if you get any misdelivered
interrupts at all.

 The patch applies to 2.4.1 and 2.4.2-pre3 cleanly.  For -ac series you
need to revert patch-2.4.0-io_apic-2 first -- check list archives for the
patch. 

 Andrew, Manfred: that's a one-line-updated version comparing to what you
already have. 

 Ingo: while implementing irq_mis_count, I corrected irq_err_count to be
atomic_t as well.
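
As a user-space analogy for the irq_err_count change: a plain `++` on a counter shared between CPUs is a separate read, add and write, so concurrent increments can be lost, while an atomic increment is one indivisible step. The sketch below uses C11 `<stdatomic.h>` instead of the kernel's `atomic_t`, and the `sim_*` names are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Bump a shared event counter as one indivisible read-modify-write. */
void sim_count_event(atomic_ulong *counter)
{
    assert(counter != NULL);
    atomic_fetch_add(counter, 1);
}

/* Read the counter, e.g. when producing a statistics line. */
unsigned long sim_read_events(atomic_ulong *counter)
{
    assert(counter != NULL);
    return atomic_load(counter);
}
```

A caller would keep one `atomic_ulong` per statistic and call `sim_count_event()` from every path that records an event, then `sim_read_events()` when reporting.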

 Good luck,

  Maciej

-- 
+  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
+--+
+e-mail: [EMAIL PROTECTED], PGP key available+

patch-2.4.1-io_apic-46
diff -up --recursive --new-file linux-2.4.1.macro/arch/i386/kernel/apic.c linux-2.4.1/arch/i386/kernel/apic.c
--- linux-2.4.1.macro/arch/i386/kernel/apic.c   Wed Dec 13 23:54:27 2000
+++ linux-2.4.1/arch/i386/kernel/apic.c Mon Feb 12 16:11:15 2001
@@ -23,6 +23,7 @@
 #include <linux/mc146818rtc.h>
 #include <linux/kernel_stat.h>
 
+#include <asm/atomic.h>
 #include <asm/smp.h>
 #include <asm/mtrr.h>
 #include <asm/mpspec.h>
@@ -270,7 +271,13 @@ void __init setup_local_APIC (void)
 *   PCI Ne2000 networking cards and PII/PIII processors, dual
 *   BX chipset. ]
 */
-#if 0
+   /*
+* Actually disabling the focus CPU check just makes the hang less
+* frequent as it makes the interrupt distributon model be more
+* like LRU than MRU (the short-term load is more even across CPUs).
+* See also the comment in end_level_ioapic_irq().  --macro
+*/
+#if 1
/* Enable focus processor (bit==0) */
value &= ~(1<<9);
 #else
@@ -764,7 +771,7 @@ asmlinkage void smp_error_interrupt(void
apic_write(APIC_ESR, 0);
v1 = apic_read(APIC_ESR);
ack_APIC_irq();
-   irq_err_count++;
+   atomic_inc(&irq_err_count);
 
/* Here is what the APIC error bits mean:
   0: Send CS error
diff -up --recursive --new-file linux-2.4.1.macro/arch/i386/kernel/i8259.c linux-2.4.1/arch/i386/kernel/i8259.c
--- linux-2.4.1.macro/arch/i386/kernel/i8259.c  Mon Nov 20 18:01:58 2000
+++ linux-2.4.1/arch/i386/kernel/i8259.c        Sun Feb 11 19:54:33 2001
@@ -12,6 +12,7 @@
 #include <linux/init.h>
 #include <linux/kernel_stat.h>
 
+#include <asm/atomic.h>
 #include <asm/system.h>
 #include <asm/io.h>
 #include <asm/irq.h>
@@ -321,7 +322,7 @@ spurious_8259A_irq:
printk("spurious 8259A interrupt: IRQ%d.\n", irq);
spurious_irq_mask |= irqmask;
}
-   irq_err_count++;
+   atomic_inc(&irq_err_count);
/*
 * Theoretically we do not have to handle this IRQ,
 * but in Linux this does not cause problems and is
diff -up --recursive --new-file linux-2.4.1.macro/arch/i386/kernel/io_apic.c linux-2.4.1/arch/i386/kernel/io_apic.c
--- linux-2.4.1.macro/arch/i386/kernel/io_apic.c        Sat Feb  3 12:05:49 2001
+++ linux-2.4.1/arch/i386/kernel/io_apic.c  Tue Feb 13 19:59:55 2001
@@ -33,6 +33,8 @@
 #include <asm/smp.h>
 #include <asm/desc.h>
 
+#define APIC_LOCKUP_DEBUG
+
 static spinlock_t ioapic_lock = SPIN_LOCK_UNLOCKED;
 
 /*
@@ -122,8 +124,14 @@ static void add_pin_to_irq(unsigned int 
static void name##_IO_APIC_irq (unsigned int irq)   \
__DO_ACTION(R, ACTION, FINAL)
 
-DO_ACTION( __mask,    0, |= 0x00010000, io_apic_sync(entry->apic))/* mask = 1 */
-DO_ACTION( __unmask,  0, &= 0xfffeffff, )  /* mask = 0 */
+DO_ACTION( __mask, 0, |= 0x00010000, io_apic_sync(entry->apic) )
+   /* mask = 1 */
+DO_ACTION( __unmask,   0, &= 0xfffeffff, )
+   /* mask = 0 */
+DO_ACTION( __mask_and_edge,    0, = (reg & 0xffff7fff) | 0x00010000, )
+   /* mask = 1, trigger = 0 */
+DO_ACTION( __unmask_and_level, 0, = (reg & 0xfffeffff) | 0x00008000, )
+   /* mask = 0, trigger = 1 */
 
 static void mask_IO_APIC_irq (unsigned int irq)
 {
@@ -847,6 +855,8 @@ void /*__init*/ print_local_APIC(void * 
 
v = apic_read(APIC_EOI);
printk(KERN_DEBUG "... APIC EOI: %08x\n", v);
+   v = apic_read(APIC_RRR);
+   printk(KERN_DEBUG "... APIC RRR: 
