Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-27 Thread Craig Boston
On Thu, Oct 05, 2006 at 10:34:25PM -0400, Kris Kennaway wrote:
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it).  Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.

I'm a bit behind in mailing list traffic (700 unread in -stable,
yikes!).  I can confirm that this works around the problem for me.  It
also seems to prevent the USB controller that shares the irq from
locking up.

Craig
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-13 Thread Scott Long

Mike Tancsa wrote:
> At 10:34 PM 10/5/2006, Kris Kennaway wrote:
> > Based on successful testing on a machine with shared em interrupt, the
> > following patch should work around the problem *in that case*.
> >
> > Note that this patch will not help you if you are not using the em
> > driver, or if you are seeing the problem with non-shared em interrupt
> > (I have investigated one such outlier, which seems to be a problem with
> > a particular model of em hardware and not a generic problem with the
> > driver).
> >
> > Please let Scott and me know whether or not this patch works for you
> > (in addition to the information previously requested, if you have not
> > already sent it).  Unfortunately it is only a workaround, but it
> > points to an underlying problem with fast interrupt handlers on a
> > shared irq that can be studied separately.
>
> I ran into an em0 timeout on a box I just started testing. The patch
> seems to fix the issue.
>
> (before the patch)
> Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting
> Oct 13 21:42:56 am64 kernel: em0: link state changed to DOWN
> Oct 13 21:42:58 am64 kernel: em0: link state changed to UP


Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-13 Thread Mike Tancsa

At 12:31 AM 10/14/2006, Scott Long wrote:
> Mike,
>
> I have a new patch that I hope addresses the actual bug, instead of
> shuffling the timing.  Would you be willing to test it?  I can't
> guarantee that it's safe for production use yet, though.  It seems
> to work, but it might set your dog on fire too.

Yes, for sure, as the box is just for testing mysql right now. I
don't think we will end up even using it in production as the whole MB
runs insanely hot.


---Mike 




Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-13 Thread Mike Tancsa

At 10:34 PM 10/5/2006, Kris Kennaway wrote:
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.
>
> Note that this patch will not help you if you are not using the em
> driver, or if you are seeing the problem with non-shared em interrupt
> (I have investigated one such outlier, which seems to be a problem with
> a particular model of em hardware and not a generic problem with the
> driver).
>
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it).  Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.

I ran into an em0 timeout on a box I just started testing. The patch
seems to fix the issue.

(before the patch)
Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting
Oct 13 21:42:56 am64 kernel: em0: link state changed to DOWN
Oct 13 21:42:58 am64 kernel: em0: link state changed to UP

dmesg with patch

Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-PRERELEASE #2: Fri Oct 13 22:28:38 EDT 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/up
ACPI APIC Table: 
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2992.71-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
  Features=0xbfebfbff
  Features2=0x649d>
  AMD Features=0x2800
  Logical CPUs per core: 2
real memory  = 3481198592 (3319 MB)
avail memory = 3360186368 (3204 MB)
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 24-47 on motherboard
ioapic2  irqs 48-71 on motherboard
kbd1 at kbdmux0
acpi0:  on motherboard
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi0: Power Button (fixed)
acpi0: reservation of 500, 10 (4) failed
acpi0: reservation of 560, 20 (4) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0:  on acpi0
acpi_throttle0:  on cpu0
pcib0:  port 0xcf8-0xcff on acpi0
pci0:  on pcib0
pci0:  at device 2.0 (no driver attached)
pcib1:  irq 16 at device 28.0 on pci0
pci2:  on pcib1
pcib2:  at device 0.0 on pci2
pci4:  on pcib2
pcib3:  at device 0.2 on pci2
pci3:  on pcib3
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0xef80-0xefbf mem 0xfebff000-0xfebf irq 53 at device 2.0 on pci3
twa0: [GIANT-LOCKED]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-4LP, 4 ports, Firmware FE9X 3.01.01.028, BIOS BE9X 3.01.00.024
uhci0:  port 0xcc00-0xcc1f irq 23 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0:  on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1:  port 0xcc80-0xcc9f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1:  on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2:  port 0xcd00-0xcd1f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2:  on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0:  mem 0xfe9ff800-0xfe9ffbff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb3: EHCI version 1.0
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3:  on ehci0
usb3: USB revision 2.0
uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
pcib4:  at device 30.0 on pci0
pci1:  on pcib4
em0:  port 0xdf80-0xdfbf mem 0xfeae-0xfeaf irq 18 at device 3.0 on pci1
em0: Ethernet address: 00:0e:0c:4b:15:eb
isab0:  at device 31.0 on pci0
isa0:  on isab0
atapci0:  port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376 at device 31.1 on pci0
ata0:  on atapci0
ata1:  on atapci0
atapci1:  port 0xcf80-0xcf87,0xcf00-0xcf03,0xce80-0xce87,0xce00-0xce03,0xcd80-0xcd8f mem 0xfe9ffc00-0xfe9f irq 19 at device 31.2 on pci0
ata2:  on atapci1
ata3:  on atapci1
pci0:  at device 31.3 (no driver attached)
acpi_button0:  on acpi0
atkbdc0:  port 0x60,0x64 irq 1 on acpi0
atkbd0:  flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
fdc0:  port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" d

Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-10 Thread Frode Nordahl

On 6. Oct. 2006, at 04.34, Kris Kennaway wrote:
> On Thu, Oct 05, 2006 at 04:05:52PM -0400, Kris Kennaway wrote:
> > On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
> > > All,
> > >
> > > I'm seeing some patterns here with all of the network driver problem
> > > reports, but I need more information to help narrow it down further.
> > > I ask all of you who are having problems to take a minute to fill
> > > out this survey and return it to Kris Kennaway (on cc:) and myself.
> > > Thanks.
> > >
> > > 1. Are you experiencing network hangs and/or "timeout" messages on the
> > > console?  If yes, please provide a _brief_ description of the problem.
> >
> > OK, next question, to all em users:
> >
> > If your em device is using a shared interrupt, and you are NOT
> > experiencing timeout problems when using this device, please let me
> > know:


> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.
>
> Note that this patch will not help you if you are not using the em
> driver, or if you are seeing the problem with non-shared em interrupt
> (I have investigated one such outlier, which seems to be a problem with
> a particular model of em hardware and not a generic problem with the
> driver).
>
> Index: if_em.c
> ===
> RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
> retrieving revision 1.65.2.18
> diff -u -u -r1.65.2.18 if_em.c
> --- if_em.c 25 Aug 2006 12:38:26 -  1.65.2.18
> +++ if_em.c 5 Oct 2006 22:05:45 -
> @@ -2086,7 +2086,7 @@
>          taskqueue_start_threads(&adapter->tq, 1, PI_NET, "%s taskq",
>              device_get_nameunit(adapter->dev));
>          if ((error = bus_setup_intr(dev, adapter->res_interrupt,
> -            INTR_TYPE_NET | INTR_FAST, em_intr_fast, adapter,
> +            INTR_TYPE_NET | INTR_MPSAFE, em_intr_fast, adapter,
>              &adapter->int_handler_tag)) != 0) {
>                  device_printf(dev, "Failed to register fast interrupt "
>                      "handler: %d\n", error);
>
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it).  Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.


I tested this on one of my other systems where em0 and USB share an
interrupt, and the patch helps to remove the watchdog timeout and
makes the system usable.

Without it the system will sometimes not come up successfully at
all, and other times it will drop off the face of the earth as soon
as some network I/O in combination with disk I/O is done.


--
Frode Nordahl





Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-08 Thread Ade Lovett


On Oct 5, 2006, at 19:34, Kris Kennaway wrote:
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.


This solves the em(4) issue for me on a shared interrupt.  Prior to
this, the network hang (no watchdog timeouts) was trivially
reproducible with an NFS-mounted FreeBSD repository to two builder
boxes, and running cvs -q upd on the ports tree at the same time.
(The builder boxes also have em(4) interfaces, which I haven't
patched, but they're running 7.0-CURRENT.)  Everything is i386.


[EMAIL PROTECTED]:/dtbox] 739# vmstat -i
...
irq21: em0 acpi0  965426857
...

-aDe


Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-06 Thread Guy Brand
Kris Kennaway ([EMAIL PROTECTED]) on 05/10/2006 at 22:34 wrote:

> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.
[...]
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it).  Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.

  # mojito uptime
  14:23  up  1:59, 4 users, load averages: 0,07 0,05 0,01
  # mojito uname -v
  FreeBSD 6.2-PRERELEASE #15: Fri Oct  6 12:11:36 CEST 2006
  [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DEBUG 

  Your patch fixes my em/nvidia issue.
  Thanks Kris

-- 
  bug



Patch available for shared em interrupts (Re: em, bge, network problems survey.)

2006-10-05 Thread Kris Kennaway
On Thu, Oct 05, 2006 at 04:05:52PM -0400, Kris Kennaway wrote:
> On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
> > All,
> > 
> > I'm seeing some patterns here with all of the network driver problem 
> > reports, but I need more information to help narrow it down further.
> > I ask all of you who are having problems to take a minute to fill
> > out this survey and return it to Kris Kennaway (on cc:) and myself.
> > Thanks.
> > 
> > 1. Are you experiencing network hangs and/or "timeout" messages on the 
> > console?  If yes, please provide a _brief_ description of the problem.
> 
> OK, next question, to all em users:
> 
> If your em device is using a shared interrupt, and you are NOT
> experiencing timeout problems when using this device, please let me
> know:

Based on successful testing on a machine with shared em interrupt, the
following patch should work around the problem *in that case*.

Note that this patch will not help you if you are not using the em
driver, or if you are seeing the problem with non-shared em interrupt
(I have investigated one such outlier, which seems to be a problem with
a particular model of em hardware and not a generic problem with the
driver).

Index: if_em.c
===
RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
retrieving revision 1.65.2.18
diff -u -u -r1.65.2.18 if_em.c
--- if_em.c 25 Aug 2006 12:38:26 -  1.65.2.18
+++ if_em.c 5 Oct 2006 22:05:45 -
@@ -2086,7 +2086,7 @@
         taskqueue_start_threads(&adapter->tq, 1, PI_NET, "%s taskq",
             device_get_nameunit(adapter->dev));
         if ((error = bus_setup_intr(dev, adapter->res_interrupt,
-            INTR_TYPE_NET | INTR_FAST, em_intr_fast, adapter,
+            INTR_TYPE_NET | INTR_MPSAFE, em_intr_fast, adapter,
             &adapter->int_handler_tag)) != 0) {
                 device_printf(dev, "Failed to register fast interrupt "
                     "handler: %d\n", error);

Please let Scott and me know whether or not this patch works for you
(in addition to the information previously requested, if you have not
already sent it).  Unfortunately it is only a workaround, but it
points to an underlying problem with fast interrupt handlers on a
shared irq that can be studied separately.

Kris


