Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
On Thu, Oct 05, 2006 at 10:34:25PM -0400, Kris Kennaway wrote:
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it). Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.

I'm a bit behind in mailing list traffic (700 unread in -stable, yikes!).

I can confirm that this works around the problem for me. It also seems
to prevent the USB controller that shares the irq from locking up.

Craig
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
Mike Tancsa wrote:
> At 10:34 PM 10/5/2006, Kris Kennaway wrote:
>> Based on successful testing on a machine with shared em interrupt, the
>> following patch should work around the problem *in that case*. Note
>> that this patch will not help you if you are not using the em driver,
>> or if you are seeing the problem with non-shared em interrupt (I have
>> investigated one such outlier, which seems to be a problem with a
>> particular model of em hardware and not a generic problem with the
>> driver).
>>
>> Please let Scott and me know whether or not this patch works for you
>> (in addition to the information previously requested, if you have not
>> already sent it). Unfortunately it is only a workaround, but it
>> points to an underlying problem with fast interrupt handlers on a
>> shared irq that can be studied separately.
>
> I ran into an em0 timeout on a box I just started testing. The patch
> seems to fix the issue.
>
> (before the patch)
> Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting
> Oct 13 21:42:56 am64 kernel: em0: link state changed to DOWN
> Oct 13 21:42:58 am64 kernel: em0: link state changed to UP
>
> dmesg with patch
> Copyright (c) 1992-2006 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
At 12:31 AM 10/14/2006, Scott Long wrote:
> Mike,
>
> I have a new patch that I hope addresses the actual bug, instead of
> shuffling the timing. Would you be willing to test it? I can't
> guarantee that it's safe for production use yet, though. It seems to
> work, but it might set your dog on fire too.

Yes, for sure, as the box is just for testing MySQL right now. I don't
think we will even end up using it in production, as the whole
motherboard runs insanely hot.

---Mike
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
At 10:34 PM 10/5/2006, Kris Kennaway wrote:
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*. Note
> that this patch will not help you if you are not using the em driver,
> or if you are seeing the problem with non-shared em interrupt (I have
> investigated one such outlier, which seems to be a problem with a
> particular model of em hardware and not a generic problem with the
> driver).
>
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it). Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.

I ran into an em0 timeout on a box I just started testing. The patch
seems to fix the issue.

(before the patch)
Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting
Oct 13 21:42:56 am64 kernel: em0: link state changed to DOWN
Oct 13 21:42:58 am64 kernel: em0: link state changed to UP

dmesg with patch
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-PRERELEASE #2: Fri Oct 13 22:28:38 EDT 2006
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/up
ACPI APIC Table:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 3.00GHz (2992.71-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
  Features=0xbfebfbff
  Features2=0x649d>
  AMD Features=0x2800
  Logical CPUs per core: 2
real memory  = 3481198592 (3319 MB)
avail memory = 3360186368 (3204 MB)
ioapic0 irqs 0-23 on motherboard
ioapic1 irqs 24-47 on motherboard
ioapic2 irqs 48-71 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi0: Power Button (fixed)
acpi0: reservation of 500, 10 (4) failed
acpi0: reservation of 560, 20 (4) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: on acpi0
acpi_throttle0: on cpu0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pci0: at device 2.0 (no driver attached)
pcib1: irq 16 at device 28.0 on pci0
pci2: on pcib1
pcib2: at device 0.0 on pci2
pci4: on pcib2
pcib3: at device 0.2 on pci2
pci3: on pcib3
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0xef80-0xefbf mem 0xfebff000-0xfebf irq 53 at device 2.0 on pci3
twa0: [GIANT-LOCKED]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-4LP, 4 ports, Firmware FE9X 3.01.01.028, BIOS BE9X 3.01.00.024
uhci0: port 0xcc00-0xcc1f irq 23 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: port 0xcc80-0xcc9f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: port 0xcd00-0xcd1f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
ehci0: mem 0xfe9ff800-0xfe9ffbff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb3: EHCI version 1.0
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3: on ehci0
usb3: USB revision 2.0
uhub3: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub3: 6 ports with 6 removable, self powered
pcib4: at device 30.0 on pci0
pci1: on pcib4
em0: port 0xdf80-0xdfbf mem 0xfeae-0xfeaf irq 18 at device 3.0 on pci1
em0: Ethernet address: 00:0e:0c:4b:15:eb
isab0: at device 31.0 on pci0
isa0: on isab0
atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376 at device 31.1 on pci0
ata0: on atapci0
ata1: on atapci0
atapci1: port 0xcf80-0xcf87,0xcf00-0xcf03,0xce80-0xce87,0xce00-0xce03,0xcd80-0xcd8f mem 0xfe9ffc00-0xfe9f irq 19 at device 31.2 on pci0
ata2: on atapci1
ata3: on atapci1
pci0: at device 31.3 (no driver attached)
acpi_button0: on acpi0
atkbdc0: port 0x60,0x64 irq 1 on acpi0
atkbd0: flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
fdc0: port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" d
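In Mike's dmesg, em0 attaches at irq 18, which is the same IRQ as uhci2. That is the kind of shared interrupt Kris's patch targets. Spotting this by eye in a long boot log is error-prone, so here is a minimal C sketch that extracts the device name and IRQ from dmesg attach lines and lists the devices that report a given IRQ. It assumes attach lines of the form `name: ... irq N ...`, as in the log above. `dmesg_irq` and `irq_users` are hypothetical helpers written for illustration and are not part of FreeBSD. A child device may echo its parent's IRQ (atkbdc0 and atkbd0 both show irq 1 here), so more than one name is a hint to investigate rather than proof of sharing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEVNAME_MAX 32

/*
 * Hypothetical helper: parse one dmesg attach line such as
 *   "em0: port 0xdf80-0xdfbf mem 0xfeae-0xfeaf irq 18 at device 3.0 on pci1"
 * On success, copy the device name (text before the first ':') into dev
 * and return the number following " irq ". Return -1 if the line has no
 * plain "name:" prefix or mentions no IRQ.
 */
static int dmesg_irq(const char *line, char *dev, size_t devlen)
{
    const char *colon, *p;
    char *end;
    long irq;
    size_t n;

    assert(line != NULL && dev != NULL && devlen > 0);
    colon = strchr(line, ':');
    if (colon == NULL || colon == line)
        return -1;
    n = (size_t)(colon - line);
    if (n >= devlen || memchr(line, ' ', n) != NULL)
        return -1;                  /* prefix is not a device name */
    p = strstr(colon, " irq ");
    if (p == NULL)
        return -1;
    irq = strtol(p + 5, &end, 10);
    if (end == p + 5 || irq < 0)
        return -1;                  /* " irq " not followed by a number */
    memcpy(dev, line, n);
    dev[n] = '\0';
    return (int)irq;
}

/*
 * Hypothetical helper: collect the distinct device names in lines[] that
 * report the given IRQ (storing at most maxnames) and return how many
 * were stored. More than one name hints at a shared interrupt.
 */
static int irq_users(const char *const lines[], int nlines, int irq,
                     char names[][DEVNAME_MAX], int maxnames)
{
    int found = 0;

    if (irq < 0)
        return 0;
    for (int i = 0; i < nlines && found < maxnames; i++) {
        char dev[DEVNAME_MAX];
        int dup = 0;

        if (dmesg_irq(lines[i], dev, sizeof(dev)) != irq)
            continue;
        for (int j = 0; j < found; j++)
            if (strcmp(names[j], dev) == 0)
                dup = 1;
        if (!dup)
            strcpy(names[found++], dev);
    }
    return found;
}
```

Given the uhci2 and em0 attach lines above, asking `irq_users` for IRQ 18 should return both names. Names are deduplicated because a device such as sio0 can mention its IRQ on more than one line.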
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
On 6. okt. 2006, at 04.34, Kris Kennaway wrote:

> On Thu, Oct 05, 2006 at 04:05:52PM -0400, Kris Kennaway wrote:
>> On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
>>> All,
>>>
>>> I'm seeing some patterns here with all of the network driver problem
>>> reports, but I need more information to help narrow it down further.
>>> I ask all of you who are having problems to take a minute to fill
>>> out this survey and return it to Kris Kennaway (on cc:) and myself.
>>> Thanks.
>>>
>>> 1. Are you experiencing network hangs and/or "timeout" messages on the
>>> console? If yes, please provide a _brief_ description of the problem.
>>
>> OK, next question, to all em users:
>>
>> If your em device is using a shared interrupt, and you are NOT
>> experiencing timeout problems when using this device, please let me
>> know:
>
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*. Note
> that this patch will not help you if you are not using the em driver,
> or if you are seeing the problem with non-shared em interrupt (I have
> investigated one such outlier, which seems to be a problem with a
> particular model of em hardware and not a generic problem with the
> driver).
>
> Index: if_em.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
> retrieving revision 1.65.2.18
> diff -u -u -r1.65.2.18 if_em.c
> --- if_em.c	25 Aug 2006 12:38:26 -0000	1.65.2.18
> +++ if_em.c	5 Oct 2006 22:05:45 -0000
> @@ -2086,7 +2086,7 @@
>  	taskqueue_start_threads(&adapter->tq, 1, PI_NET, "%s taskq",
>  	    device_get_nameunit(adapter->dev));
>  	if ((error = bus_setup_intr(dev, adapter->res_interrupt,
> -	    INTR_TYPE_NET | INTR_FAST, em_intr_fast, adapter,
> +	    INTR_TYPE_NET | INTR_MPSAFE, em_intr_fast, adapter,
>  	    &adapter->int_handler_tag)) != 0) {
>  		device_printf(dev, "Failed to register fast interrupt "
>  		    "handler: %d\n", error);
>
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it). Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.

I tested this on one of my other systems where em0 and USB share an
interrupt. The patch removes the watchdog timeout and makes the system
usable. Without it, the system sometimes does not come up successfully
at all. At other times it drops off the face of the earth as soon as
network I/O is combined with disk I/O.

--
Frode Nordahl
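Frode's and Mike's reports both judge the patch by whether the `watchdog timeout` messages stop. Below is a minimal C sketch that counts those resets per interface in syslog lines. It assumes the message text matches Mike's log (`em0: watchdog timeout -- resetting`). `is_watchdog_reset` and `count_watchdog_resets` are hypothetical helpers, and other drivers may word their watchdog messages differently. aDe reports a hang that produced no timeouts at all, so a count of zero does not by itself rule the problem out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if a syslog line records a watchdog
 * reset for interface ifname, e.g.
 *   "Oct 13 21:42:56 am64 kernel: em0: watchdog timeout -- resetting"
 * The leading space and trailing ':' in the pattern keep "em1" from
 * matching "em10".
 */
static int is_watchdog_reset(const char *line, const char *ifname)
{
    char pat[64];
    int n;

    assert(line != NULL && ifname != NULL);
    n = snprintf(pat, sizeof(pat), " %s: watchdog timeout", ifname);
    if (n < 0 || (size_t)n >= sizeof(pat))
        return 0;               /* interface name too long for the buffer */
    return strstr(line, pat) != NULL;
}

/* Hypothetical helper: count watchdog resets for ifname in nlines lines. */
static int count_watchdog_resets(const char *const lines[], int nlines,
                                 const char *ifname)
{
    int count = 0;

    for (int i = 0; i < nlines; i++)
        count += is_watchdog_reset(lines[i], ifname);
    return count;
}
```

Applied to Mike's three "before the patch" lines, this counts one reset for em0. The two link-state messages are ignored.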
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Oct 5, 2006, at 19:34 , Kris Kennaway wrote:

> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.

This solves the em(4) issue for me on a shared interrupt. Prior to
this, the network hang (no watchdog timeouts) was trivially
reproducible with an NFS-mounted FreeBSD repository to two builder
boxes, and running cvs -q upd on the ports tree at the same time.
(the builder boxes also have em(4) interfaces, which I haven't
patched, but they're running 7.0-CURRENT). Everything is i386.

[EMAIL PROTECTED]:/dtbox] 739# vmstat -i
...
irq21: em0 acpi0               965426857 ...
...

- --
aDe

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFFKexJpXS8U0IvffwRArroAKCR69boUDor2t+L9rXsYXpoYsQkEQCeIcYg
pSAbtbu28DAUE+EbOJUmIk8=
=NbgC
-----END PGP SIGNATURE-----
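In aDe's `vmstat -i` excerpt, em0 and acpi0 appear on the same irq21 line, which is how a shared interrupt shows up there. The following small C sketch counts the device names that share one interrupt line. It assumes each per-IRQ line has the form `irqN: dev1 dev2 ... total rate`, as in that excerpt. The column layout may differ between FreeBSD versions, and `vmstat_irq_devices` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if s[0..len) is a non-empty run of decimal digits. */
static int all_digits(const char *s, size_t len)
{
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!isdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/*
 * Hypothetical helper: given one line of `vmstat -i` output such as
 *   "irq21: em0 acpi0               965426857 ..."
 * store the IRQ number in *irq (if irq is not NULL) and return how many
 * device names appear between the "irqN:" label and the first numeric
 * column. Return -1 for lines that do not start with an "irqN:" label.
 */
static int vmstat_irq_devices(const char *line, int *irq)
{
    const char *p;
    char *end;
    long n;
    int ndev = 0;

    assert(line != NULL);
    if (strncmp(line, "irq", 3) != 0 || !isdigit((unsigned char)line[3]))
        return -1;
    n = strtol(line + 3, &end, 10);
    if (*end != ':')
        return -1;
    if (irq != NULL)
        *irq = (int)n;

    p = end + 1;
    for (;;) {
        size_t len;

        while (*p == ' ' || *p == '\t')
            p++;
        len = strcspn(p, " \t\n");
        if (len == 0 || all_digits(p, len))
            break;              /* end of line, or the "total" column */
        ndev++;
        p += len;
    }
    return ndev;
}
```

A result of 2 or more for the line that carries em0 means the device shares its interrupt. That is the case the patch addresses.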
Re: Patch available for shared em interrupts (Re: em, bge, network problems survey.)
Kris Kennaway ([EMAIL PROTECTED]) on 05/10/2006 at 22:34 wrote:
> Based on successful testing on a machine with shared em interrupt, the
> following patch should work around the problem *in that case*.
[...]
> Please let Scott and me know whether or not this patch works for you
> (in addition to the information previously requested, if you have not
> already sent it). Unfortunately it is only a workaround, but it
> points to an underlying problem with fast interrupt handlers on a
> shared irq that can be studied separately.

# mojito uptime
14:23 up 1:59, 4 users, load averages: 0,07 0,05 0,01
# mojito uname -v
FreeBSD 6.2-PRERELEASE #15: Fri Oct 6 12:11:36 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/DEBUG

Your patch fixes my em/nvidia issue.

Thanks, Kris

--
bug
Patch available for shared em interrupts (Re: em, bge, network problems survey.)
On Thu, Oct 05, 2006 at 04:05:52PM -0400, Kris Kennaway wrote:
> On Wed, Oct 04, 2006 at 05:14:27PM -0600, Scott Long wrote:
> > All,
> >
> > I'm seeing some patterns here with all of the network driver problem
> > reports, but I need more information to help narrow it down further.
> > I ask all of you who are having problems to take a minute to fill
> > out this survey and return it to Kris Kennaway (on cc:) and myself.
> > Thanks.
> >
> > 1. Are you experiencing network hangs and/or "timeout" messages on the
> > console? If yes, please provide a _brief_ description of the problem.
>
> OK, next question, to all em users:
>
> If your em device is using a shared interrupt, and you are NOT
> experiencing timeout problems when using this device, please let me
> know:

Based on successful testing on a machine with shared em interrupt, the
following patch should work around the problem *in that case*. Note
that this patch will not help you if you are not using the em driver,
or if you are seeing the problem with non-shared em interrupt (I have
investigated one such outlier, which seems to be a problem with a
particular model of em hardware and not a generic problem with the
driver).

Index: if_em.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/em/if_em.c,v
retrieving revision 1.65.2.18
diff -u -u -r1.65.2.18 if_em.c
--- if_em.c	25 Aug 2006 12:38:26 -0000	1.65.2.18
+++ if_em.c	5 Oct 2006 22:05:45 -0000
@@ -2086,7 +2086,7 @@
 	taskqueue_start_threads(&adapter->tq, 1, PI_NET, "%s taskq",
 	    device_get_nameunit(adapter->dev));
 	if ((error = bus_setup_intr(dev, adapter->res_interrupt,
-	    INTR_TYPE_NET | INTR_FAST, em_intr_fast, adapter,
+	    INTR_TYPE_NET | INTR_MPSAFE, em_intr_fast, adapter,
 	    &adapter->int_handler_tag)) != 0) {
 		device_printf(dev, "Failed to register fast interrupt "
 		    "handler: %d\n", error);

Please let Scott and me know whether or not this patch works for you
(in addition to the information previously requested, if you have not
already sent it). Unfortunately it is only a workaround, but it
points to an underlying problem with fast interrupt handlers on a
shared irq that can be studied separately.

Kris
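The patch keeps the same handler function, `em_intr_fast`, but registers it with `INTR_MPSAFE` instead of `INTR_FAST`. Here is a rough mental model, simplified, and an assumption about FreeBSD 6 rather than a description of its code. A fast handler runs immediately in primary interrupt context. Other handlers on the same line run later in an interrupt thread. The toy C model below does nothing more than sort the handlers on one shared line into those two groups. The `TOY_*` flags, `struct toy_handler`, and `toy_dispatch` were invented for this illustration. They do not use FreeBSD's real flag values, and the model does not reproduce the bug.

```c
#include <assert.h>
#include <stddef.h>

/* Toy flag bits for this model only; NOT FreeBSD's INTR_* values. */
#define TOY_FAST   0x1
#define TOY_MPSAFE 0x2

struct toy_handler {
    const char *name;
    int flags;
};

/*
 * Toy dispatch of one interrupt on a shared line: handlers flagged
 * TOY_FAST are listed in now[] (run at once, "primary context"); every
 * other handler is listed in later[] (deferred to a simulated interrupt
 * thread). *nnow receives the number run at once; the return value is
 * the number deferred. Both output arrays must hold n entries.
 */
static size_t toy_dispatch(const struct toy_handler *h, size_t n,
                           const char **now, size_t *nnow,
                           const char **later)
{
    size_t nlater = 0;

    assert(h != NULL && now != NULL && nnow != NULL && later != NULL);
    *nnow = 0;
    for (size_t i = 0; i < n; i++) {
        if (h[i].flags & TOY_FAST)
            now[(*nnow)++] = h[i].name;
        else
            later[nlater++] = h[i].name;
    }
    return nlater;
}
```

Take em0 with `TOY_FAST` and uhci2 with no flags (it attaches as `[GIANT-LOCKED]` in Mike's dmesg). The model places em0 in `now[]` and uhci2 in `later[]`. Switching em0 to `TOY_MPSAFE`, as the patch does with the real flags, moves both handlers into `later[]`.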