Serious i386 interrupt mask bug in RELENG_4 (was Re: 4.4-RC NFS panic)
Ian Dowse writes: > In message <[EMAIL PROTECTED]>, Warner Losh writes: > > > >I think that might be due to a bug in the shared interrupt code that > >Ian Dowse sent me about earlier today. > > Just to add a few details - there is a bug in the update_masks() > function in i386/isa/intr_machdep.c that can cause some interrupts > to occur at times when they should be masked. The problem only > occurs with certain configurations of shared interrupts and devices, > and this code is only present in RELENG_4. Congratulations! I've applied your patch together with the one posted by Warner Losh and now the PCMCIA card is working again and the find/cat test passed without panic. -- walter pelissero http://www.pelissero.org To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Serious i386 interrupt mask bug in RELENG_4 (was Re: 4.4-RC NFS panic)
In message <[EMAIL PROTECTED]>, Warner Losh writes:
>
>I think that might be due to a bug in the shared interrupt code that
>Ian Dowse sent me about earlier today.

Just to add a few details - there is a bug in the update_masks()
function in i386/isa/intr_machdep.c that can cause some interrupts
to occur at times when they should be masked. The problem only
occurs with certain configurations of shared interrupts and devices,
and this code is only present in RELENG_4.

The update_masks() function is called after an interrupt handler
has been registered or removed. Its main function is to update the
interrupt masks (tty_imask, net_imask etc) if necessary (e.g. if
IRQ11 is registered by a tty-type device, IRQ11 will be added to
tty_imask so that future spltty()'s will mask IRQ11).

A second function of update_masks() is to update the cached copy
of the interrupt mask stored with each handler for a multiplexed
interrupt. This is done via the call to update_mux_masks().

The bug is that update_masks() returns without calling
update_mux_masks() in some cases where it should call it.
Specifically, if a newly-added multiplexed interrupt handler has
the same maskptr as another handler on the same IRQ line, that new
handler doesn't get its cached mask set. For example, if a single
IRQ has a usb device and a modem (tty), the second device to register
its handler will get its idesc->mask set to 0 instead of the value
of tty_imask because update_mux_masks() may never be called to set
it. Of course, if update_masks() is called later for some other
device it may correct the situation.

Interrupt handlers are called with intr_mask[irq] or'd into the
cpl to block further interrupts; for non-multiplexed interrupts
intr_mask[irq] will be set from one of the *_imask masks. However
with multiplexed interrupts, only the IRQ itself (and SWI_CLOCK_MASK)
are blocked, and the multiplex handler intr_mux() needs to raise
the cpl further when necessary. It uses idesc->mask to control this.

When this bug occurs, idesc->mask == 0, so the device interrupt
handler gets called with only the IRQ and SWI_CLOCK_MASK masked,
instead of the full *_imask that it requested. Not good.

On my laptop, this bug causes hangs within minutes of starting to
use a pccard modem, but as should be apparent from the above it
could strike virtually anywhere that multiplexed interrupts are
used. The patch below seems to solve the problem; it just causes
update_masks() to unconditionally update the masks.

Ian

Index: intr_machdep.c
===
RCS file: /home/iedowse/CVS/src/sys/i386/isa/intr_machdep.c,v
retrieving revision 1.29.2.2
diff -u -r1.29.2.2 intr_machdep.c
--- intr_machdep.c	2000/08/16 05:35:34	1.29.2.2
+++ intr_machdep.c	2001/08/23 20:24:17
@@ -651,15 +651,9 @@
 	if (find_idesc(maskptr, irq) == NULL) {
 		/* no reference to this maskptr was found in this irq's chain */
-		if ((*maskptr & mask) == 0)
-			return;
-		/* the irq was included in the classes mask, remove it */
 		*maskptr &= ~mask;
 	} else {
 		/* a reference to this maskptr was found in this irq's chain */
-		if ((*maskptr & mask) != 0)
-			return;
-		/* put the irq into the classes mask */
 		*maskptr |= mask;
 	}
 	/* we need to update all values in the intr_mask[irq] array */
Re: 4.4-RC NFS panic
In message <[EMAIL PROTECTED]> [EMAIL PROTECTED] writes: : I've been having similar problems with my 4.4-RC Vaio F807K whenever I : do a lot of NFS over my wi0 (Buffalo wireless card), every so often my : laptop just completely freezes. I think that might be due to a bug in the shared interrupt code that Ian Dowse sent me about earlier today. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
Warner Losh writes:
 > After talking with Ian Dowse, I think that we've hammered out what may
 > cause this.  Basically, the problem is

I'm afraid your patch didn't fix the problem on my laptop. It certainly changed the behaviour and the system doesn't crash any more, but I'm almost unable to use the net. A ping to my server resolves the IP address, but no ping traffic follows. Even worse, now the pcm driver fails to detect any sound device. 8-|

Regarding the warm boot, I can confirm the same behavior (already pointed out in another mail of mine). My impression is that it's not a PCCARD issue, as it happens even with no card inserted. The system looks frozen, but if I press the "Pause" key, type something, and then press the "Pause" key again, the cursor moves by the amount of typing I did. No echo though.

-- walter pelissero http://www.pelissero.org
Re: 4.4-RC NFS panic
On Tue, Aug 21, 2001 at 12:24:30PM +0200, Andre Albsmeier wrote: > On Tue, 21-Aug-2001 at 03:07:33 -0600, Warner Losh wrote: > > In message <[EMAIL PROTECTED]> Andre Albsmeier writes: > > : As I wrote in my PR (#29845), my problems also happen with > > : the 3C589 which uses the ep driver. So we can sum up to: > > : > > : 1.) Intel Etherexpress PRO/100 PCMCIA (xe driver) crashes > > : 2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes > > : 3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver) works perfectly > > > > Interesting. I'm not sure what to make of this. > > We can now add: > > 4.) D-Link DFE-650 PCMCIA (ed driver)freezes > > :-( > > Warner, I have seen your mails regarding pcic-44rc1.diff.1. > My box has a TI PCI-1225 chip... I will try the patch... I've been having similar problems with my 4.4-RC Vaio F807K whenever I do a lot of NFS over my wi0 (Buffalo wireless card), every so often my laptop just completely freezes. -- Simon Dick [EMAIL PROTECTED] "Why do I get this urge to go bowling everytime I see Tux?" To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
In message <[EMAIL PROTECTED]> Andre Albsmeier writes: : Attached below is the dmesg... It hangs only when warm booting; after : a power toggle everything is OK... ... : pcic0: Event mask 0xf stat 0x3419 : ### : ### Now it hangs until poweroff/poweron ### : ### OK. Looks like maybe an interrupt storm on warm boot. I'll have to check into this. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
On Tue, 21-Aug-2001 at 23:44:40 -0600, Warner Losh wrote: > In message <[EMAIL PROTECTED]> Andre Albsmeier writes: > : I still have the hangs on a warm reboot but this is a different > : story... > > Eh? what kind of hangs and when? Attached below is the dmesg... It hangs only when warm booting; after a power toggle everything is OK... Copyright (c) 1992-2001 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 4.4-RC #23: Wed Aug 22 07:21:34 CEST 2001 [EMAIL PROTECTED]:/src/obj-4/src/src-4/sys/schlappy Calibrating clock(s) ... TSC clock: 30160 Hz, i8254 clock: 1193146 Hz Timecounter "i8254" frequency 1193146 Hz CPU: Pentium II/Pentium II Xeon/Celeron (366.66-MHz 686-class CPU) Origin = "GenuineIntel" Id = 0x66a Stepping = 10 Features=0x183f9ff real memory = 134152192 (131008K bytes) Physical memory chunk(s): 0x1000 - 0x0009efff, 647168 bytes (158 pages) 0x00325000 - 0x07febfff, 130838528 bytes (31943 pages) avail memory = 127590400 (124600K bytes) bios32: Found BIOS32 Service Directory header at 0xc00f6230 bios32: Entry = 0xfd790 (c00fd790) Rev = 0 Len = 1 pcibios: PCI BIOS entry at 0x225 pnpbios: Found PnP BIOS data at 0xc00f6260 pnpbios: Entry = f:a34e Rev = 1.0 pnpbios: Event flag at 4b4 Other BIOS signatures found: ACPI: 000f61f0 Preloaded elf kernel "kernel" at 0xc02ff000. 
Pentium Pro MTRR support enabled pci_open(1):mode 1 addr port (0x0cf8) is 0x8000384c pci_open(1a): mode1res=0x8000 (0x8000) pci_cfgcheck: device 0 [class=06] [hdr=00] is there (id=71908086) Using $PIR table, 7 entries at 0xc00fdf50 apm0: on motherboard apm: found APM BIOS v1.2, connected at v1.2 npx0: on motherboard npx0: INT 16 interface pcib0: on motherboard found-> vendor=0x8086, dev=0x7190, revid=0x03 class=06-00-00, hdrtype=0x00, mfdev=0 subordinatebus=0secondarybus=0 map[10]: type 1, range 32, base f800, size 26 found-> vendor=0x8086, dev=0x7191, revid=0x03 class=06-04-00, hdrtype=0x01, mfdev=0 subordinatebus=1secondarybus=1 found-> vendor=0x8086, dev=0x7110, revid=0x02 class=06-80-00, hdrtype=0x00, mfdev=1 subordinatebus=0secondarybus=0 found-> vendor=0x8086, dev=0x7111, revid=0x01 class=01-01-80, hdrtype=0x00, mfdev=0 subordinatebus=0secondarybus=0 map[20]: type 1, range 32, base fcd0, size 4 found-> vendor=0x8086, dev=0x7112, revid=0x01 class=0c-03-00, hdrtype=0x00, mfdev=0 subordinatebus=0secondarybus=0 intpin=d, irq=9 map[20]: type 1, range 32, base fce0, size 5 found-> vendor=0x8086, dev=0x7113, revid=0x02 class=06-80-00, hdrtype=0x00, mfdev=0 subordinatebus=0secondarybus=0 map[90]: type 1, range 32, base 2180, size 4 found-> vendor=0x104c, dev=0xac1c, revid=0x01 class=06-07-00, hdrtype=0x02, mfdev=1 subordinatebus=0secondarybus=0 intpin=a, irq=10 found-> vendor=0x104c, dev=0xac1c, revid=0x01 class=06-07-00, hdrtype=0x02, mfdev=1 subordinatebus=0secondarybus=0 intpin=b, irq=11 pci0: on pcib0 pcib1: at device 1.0 on pci0 found-> vendor=0x10c8, dev=0x0005, revid=0x12 class=03-00-00, hdrtype=0x00, mfdev=1 subordinatebus=0secondarybus=0 intpin=a, irq=10 map[10]: type 1, range 32, base f600, size 24 map[14]: type 1, range 32, base fe40, size 22 map[18]: type 1, range 32, base feb0, size 20 found-> vendor=0x10c8, dev=0x8005, revid=0x12 class=04-01-00, hdrtype=0x00, mfdev=1 subordinatebus=0secondarybus=0 intpin=b, irq=11 map[10]: type 1, range 32, base f780, 
size 22 map[14]: type 1, range 32, base fea0, size 20 pci1: on pcib1 pci1: (vendor=0x10c8, dev=0x0005) at 0.0 irq 10 chip1: mem 0xfea0-0xfeaf,0xf780-0xf7bf irq 11 at device 0.1 on pci1 isab0: at device 7.0 on pci0 isa0: on isab0 atapci0: port 0xfcd0-0xfcdf at device 7.1 on pci0 ata0: iobase=0x01f0 altiobase=0x03f6 bmaddr=0xfcd0 ata0: mask=03 status0=50 status1=50 ata0: mask=03 ostat0=50 ostat2=50 ata0-slave: ATAPI probe a=14 b=eb ata0-master: ATAPI probe a=00 b=00 ata0: mask=03 status0=50 status1=00 ata0-master: ATA probe a=01 b=a5 ata0: devices=09 ata0: at 0x1f0 irq 14 on atapci0 ata1: iobase=0x0170 altiobase=0x0376 bmaddr=0xfcd8 ata1: mask=00 status0=ff status1=ff ata1: probe allocation failed pci0: (vendor=0x8086, dev=0x7112) at 7.2 irq 9 chip2: port 0x2180-0x218f at device 7.3 on pci0 pcic0: irq 10 at device 10.0 on pci0 pcic0: PCI Memory allocated: 0x4400 pcic0: TI12XX PCI Config Reg: [ring enable][speaker enable][pwr save][CSC serial isa irq] pccard0: on pcic0 pcic1: irq 11 at device 10.1 on pci0 pcic1: PCI Memory allocated: 0x44001000 pcic1: TI12XX PCI Config Reg: [rin
Re: 4.4-RC NFS panic
In message <[EMAIL PROTECTED]> Andre Albsmeier writes: : I still have the hangs on a warm reboot but this is a different : story... Eh? what kind of hangs and when? Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
On Tue, 21-Aug-2001 at 11:45:12 -0600, Warner Losh wrote: > In message <[EMAIL PROTECTED]> David Malone writes: > : Andre Albsmeier, who's seeing various network problems, is using > : the xe driver (also PCMCIA I think), but the problems go away if > : he uses an Etherexpress card on the PCI bus of the same machine. > : > : It seems unlikely to be PCMCIA related ('cos it has nothing to do > : with the networking itself) it may just be triggered in machines > : with slower networking. > > After talking with Ian Dowse, I think that we've hammered out what may > cause this. Basically, the problem is > > code in net doing splnet() > >-> pcic_pci_intr -> netcard_intr -> network code. > > And we've interrupted the critical section, broken all kinds of > invariants. > > Warner > > P.S. I think that with Ian's other interrupt changes, we can do the > following w/o problems. This should fix the network problems, I > think. Runs perfectly for about 10 minutes now under full load. It didn't survive 10 seconds before :-) I still have the hangs on a warm reboot but this is a different story... Thanks a lot for the quick help! -Andre > > Index: pcic_pci.c > === > RCS file: /cache/ncvs/src/sys/pccard/pcic_pci.c,v > retrieving revision 1.54.2.7 > diff -u -r1.54.2.7 pcic_pci.c > --- pcic_pci.c2001/08/21 09:06:25 1.54.2.7 > +++ pcic_pci.c2001/08/21 17:18:06 > @@ -515,15 +515,6 @@ >* in the CD change. >*/ > sp->getb(sp, PCIC_STAT_CHG); > - > - /* > - * If we have a card in the slot with an interrupt handler, then > - * call it. Note: This means that each card can have at most one > - * interrupt handler for it. Since multifunction cards aren't > - * supported, this shouldn't cause a problem in practice. 
> - */ > - if (sc->cd_present && sp->intr != NULL) > - sp->intr(sp->argp); > } > > /* > @@ -784,36 +775,6 @@ > return (0); > } > > -static int > -pcic_pci_setup_intr(device_t dev, device_t child, struct resource *irq, > -int flags, driver_intr_t *intr, void *arg, void **cookiep) > -{ > - struct pcic_softc *sc = (struct pcic_softc *) device_get_softc(dev); > - struct pcic_slot *sp = &sc->slots[0]; > - > - if (sp->intr) { > - device_printf(dev, > -"Interrupt already established, possible multiple attach bug.\n"); > - return (EINVAL); > - } > - sp->intr = intr; > - sp->argp = arg; > - *cookiep = sc; > - return (0); > -} > - > -static int > -pcic_pci_teardown_intr(device_t dev, device_t child, struct resource *irq, > -void *cookie) > -{ > - struct pcic_softc *sc = (struct pcic_softc *) device_get_softc(dev); > - struct pcic_slot *sp = &sc->slots[0]; > - > - sp->intr = NULL; > - sp->argp = NULL; > - return (0); > -} > - > static device_method_t pcic_pci_methods[] = { > /* Device interface */ > DEVMETHOD(device_probe, pcic_pci_probe), > @@ -829,8 +790,8 @@ > DEVMETHOD(bus_release_resource, bus_generic_release_resource), > DEVMETHOD(bus_activate_resource, pcic_activate_resource), > DEVMETHOD(bus_deactivate_resource, pcic_deactivate_resource), > - DEVMETHOD(bus_setup_intr, pcic_pci_setup_intr), > - DEVMETHOD(bus_teardown_intr,pcic_pci_teardown_intr), > + DEVMETHOD(bus_setup_intr, bus_generic_setup_intr), > + DEVMETHOD(bus_teardown_intr,bus_generic_teardown_intr), > > /* Card interface */ > DEVMETHOD(card_set_res_flags, pcic_set_res_flags), -- BSD, from the people who brought you TCP/IP. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
In message <[EMAIL PROTECTED]> David Malone writes:
: Andre Albsmeier, who's seeing various network problems, is using
: the xe driver (also PCMCIA I think), but the problems go away if
: he uses an Etherexpress card on the PCI bus of the same machine.
:
: It seems unlikely to be PCMCIA related ('cos it has nothing to do
: with the networking itself) it may just be triggered in machines
: with slower networking.

After talking with Ian Dowse, I think that we've hammered out what may
cause this.  Basically, the problem is

	code in net doing splnet()
		-> pcic_pci_intr -> netcard_intr -> network code.

And we've interrupted the critical section, broken all kinds of
invariants.

Warner

P.S.  I think that with Ian's other interrupt changes, we can do the
following w/o problems.  This should fix the network problems, I
think.

Index: pcic_pci.c
===
RCS file: /cache/ncvs/src/sys/pccard/pcic_pci.c,v
retrieving revision 1.54.2.7
diff -u -r1.54.2.7 pcic_pci.c
--- pcic_pci.c	2001/08/21 09:06:25	1.54.2.7
+++ pcic_pci.c	2001/08/21 17:18:06
@@ -515,15 +515,6 @@
 	 * in the CD change.
 	 */
 	sp->getb(sp, PCIC_STAT_CHG);
-
-	/*
-	 * If we have a card in the slot with an interrupt handler, then
-	 * call it.  Note: This means that each card can have at most one
-	 * interrupt handler for it.  Since multifunction cards aren't
-	 * supported, this shouldn't cause a problem in practice.
-	 */
-	if (sc->cd_present && sp->intr != NULL)
-		sp->intr(sp->argp);
 }
 
 /*
@@ -784,36 +775,6 @@
 	return (0);
 }
 
-static int
-pcic_pci_setup_intr(device_t dev, device_t child, struct resource *irq,
-    int flags, driver_intr_t *intr, void *arg, void **cookiep)
-{
-	struct pcic_softc *sc = (struct pcic_softc *) device_get_softc(dev);
-	struct pcic_slot *sp = &sc->slots[0];
-
-	if (sp->intr) {
-		device_printf(dev,
-"Interrupt already established, possible multiple attach bug.\n");
-		return (EINVAL);
-	}
-	sp->intr = intr;
-	sp->argp = arg;
-	*cookiep = sc;
-	return (0);
-}
-
-static int
-pcic_pci_teardown_intr(device_t dev, device_t child, struct resource *irq,
-    void *cookie)
-{
-	struct pcic_softc *sc = (struct pcic_softc *) device_get_softc(dev);
-	struct pcic_slot *sp = &sc->slots[0];
-
-	sp->intr = NULL;
-	sp->argp = NULL;
-	return (0);
-}
-
 static device_method_t pcic_pci_methods[] = {
 	/* Device interface */
 	DEVMETHOD(device_probe,		pcic_pci_probe),
@@ -829,8 +790,8 @@
 	DEVMETHOD(bus_release_resource,	bus_generic_release_resource),
 	DEVMETHOD(bus_activate_resource, pcic_activate_resource),
 	DEVMETHOD(bus_deactivate_resource, pcic_deactivate_resource),
-	DEVMETHOD(bus_setup_intr,	pcic_pci_setup_intr),
-	DEVMETHOD(bus_teardown_intr,	pcic_pci_teardown_intr),
+	DEVMETHOD(bus_setup_intr,	bus_generic_setup_intr),
+	DEVMETHOD(bus_teardown_intr,	bus_generic_teardown_intr),
 
 	/* Card interface */
 	DEVMETHOD(card_set_res_flags,	pcic_set_res_flags),
Re: 4.4-RC NFS panic
On Tue, 21-Aug-2001 at 03:07:33 -0600, Warner Losh wrote: > In message <[EMAIL PROTECTED]> Andre Albsmeier writes: > : As I wrote in my PR (#29845), my problems also happen with > : the 3C589 which uses the ep driver. So we can sum up to: > : > : 1.) Intel Etherexpress PRO/100 PCMCIA (xe driver) crashes > : 2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes > : 3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver) works perfectly > > Interesting. I'm not sure what to make of this. We can now add: 4.) D-Link DFE-650 PCMCIA (ed driver)freezes :-( Warner, I have seen your mails regarding pcic-44rc1.diff.1. My box has a TI PCI-1225 chip... I will try the patch... -Andre To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
On Tue, 21-Aug-2001 at 03:07:33 -0600, Warner Losh wrote:
> In message <[EMAIL PROTECTED]> Andre Albsmeier writes:
> : As I wrote in my PR (#29845), my problems also happen with
> : the 3C589 which uses the ep driver. So we can sum up to:
> :
> : 1.) Intel Etherexpress PRO/100 PCMCIA (xe driver) crashes
> : 2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes
> : 3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver) works perfectly
>
> Interesting. I'm not sure what to make of this.

Neither am I. Ian Dowse already sent me a program to inspect the mbufs in the crashdumps. I don't know a lot about mbufs but the output appears really hosed... I will try it again using another PCMCIA card I just got...

-Andre
Re: 4.4-RC NFS panic
In message <[EMAIL PROTECTED]> Andre Albsmeier writes: : As I wrote in my PR (#29845), my problems also happen with : the 3C589 which uses the ep driver. So we can sum up to: : : 1.) Intel Etherexpress PRO/100 PCMCIA (xe driver) crashes : 2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes : 3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver) works perfectly Interesting. I'm not sure what to make of this. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
On Tue, 21-Aug-2001 at 09:35:34 +0100, David Malone wrote: > > I've just done a further test. I've mounted a directory tree from > > Vaio to Vaio using localhost (lo driver) and the test has run > > smoothly. So chances would be good the bug is in the ep driver. > > Unfortunately... > > Andre Albsmeier, who's seeing various network problems, is using > the xe driver (also PCMCIA I think), but the problems go away if > he uses an Etherexpress card on the PCI bus of the same machine. As I wrote in my PR (#29845), my problems also happen with the 3C589 which uses the ep driver. So we can sum up to: 1.) Intel Etherexpress PRO/100 PCMCIA (xe driver) crashes 2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes 3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver) works perfectly -Andre > > It seems unlikely to be PCMCIA related ('cos it has nothing to do > with the networking itself) it may just be triggered in machines > with slower networking. > > David. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
> I've just done a further test. I've mounted a directory tree from > Vaio to Vaio using localhost (lo driver) and the test has run > smoothly. So chances would be good the bug is in the ep driver. > Unfortunately... Andre Albsmeier, who's seeing various network problems, is using the xe driver (also PCMCIA I think), but the problems go away if he uses an Etherexpress card on the PCI bus of the same machine. It seems unlikely to be PCMCIA related ('cos it has nothing to do with the networking itself) it may just be triggered in machines with slower networking. David. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
In message <[EMAIL PROTECTED]> "Walter C. Pelissero" writes: : Mmmm, you might be right. I'm using a 3com 589, therefore I'm using : the ep driver. The ep driver has been a little flakey under heavy load (like NFS) for a while. : Side note. Regarding a different problem I've mentioned in : freebsd-hackers I've been told 4.4-RC has got problems with the PCCARD : code. Whether that can influence the ep driver is beyond my : knowledge. No. It is a works or doesn't kinda bug. If you are getting to the point of mounting with NFS, the network works. Unless you are seeing 1s ping times. Warner To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
David Malone writes:
 > On Mon, Aug 20, 2001 at 07:51:17PM +0100, Walter C. Pelissero wrote:
 > > This enforces my belief that there is something broken in some deeper
 > > layer of the network code (see the remote printing issue).
 >
 > Just out of curiosity, what sort of network card is your Vaio using?
 > Someone else is seeing network related panics that might be related
 > to freeing an mbuf that's in use, and it's possible this might be
 > related.

Mmmm, you might be right.  I'm using a 3com 589, therefore I'm using
the ep driver.

Unfortunately I don't have a different PCMCIA network card at hand,
but I can try to reverse the crash test with my server.  Yes, the
server is running exactly the same version (4.4-RC) but uses the de
driver.

So I did and, guess what, the find/cat test on my server over an NFS
mounted directory from my Vaio ran without a problem.

I've just done a further test.  I've mounted a directory tree from
Vaio to Vaio using localhost (lo driver) and the test has run
smoothly.  So chances would be good the bug is in the ep driver.
Unfortunately...

$ ls -l /sys/dev/ep
total 70
-rw-r--r--  1 root  wheel  23554 Jul 17  2000 if_ep.c
-rw-r--r--  1 root  wheel   6202 Jan 14  2000 if_ep_eisa.c
-rw-r--r--  1 root  wheel  10046 Dec 16  2000 if_ep_isa.c
-rw-r--r--  1 root  wheel   4584 Oct 27  1999 if_ep_mca.c
-rw-r--r--  1 root  wheel   6950 Aug  9  2000 if_ep_pccard.c
-rw-r--r--  1 root  wheel  13935 Jan 12  2000 if_epreg.h
-rw-r--r--  1 root  wheel   2667 May 24  2000 if_epvar.h

none of the modules belonging to the ep driver has been touched for a
long time.

Side note.  Regarding a different problem I've mentioned in
freebsd-hackers I've been told 4.4-RC has got problems with the PCCARD
code.  Whether that can influence the ep driver is beyond my
knowledge.

-- walter pelissero http://www.pelissero.org
Re: 4.4-RC NFS panic
> > "etni" > > "inet" > Your string reversal function is buggy. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
According to David Malone: > Just out of curiosity, what sort of network card is your Vaio using? > Someone else is seeing network related panics that might be related If this is a VAIO with built-in ethernet, then it is an fxp card. -- Ollivier ROBERT -=- FreeBSD: The Power to Serve! -=- [EMAIL PROTECTED] FreeBSD keltia.freenix.fr 5.0-CURRENT #80: Sun Jun 4 22:44:19 CEST 2000 To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
On Mon, Aug 20, 2001 at 07:51:17PM +0100, Walter C. Pelissero wrote: > This enforces my belief that there is something broken in some deeper > layer of the network code (see the remote printing issue). Just out of curiosity, what sort of network card is your Vaio using? Someone else is seeing network related panics that might be related to freeing an mbuf that's in use, and it's possible this might be related. David. To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
John Baldwin writes:
 > > fault virtual address = 0x65746e69
 > "etni"
 >
 > Looks like a string has gotten spammed across a data structure or a
 > weird pointer, etc.

Whatever mess happened, I've got some news for you that should remove
the NFS module from the list of possible causes.

Currently I'm running an old 4.3-STABLE kernel and kldstat shows:

Id Refs Address    Size     Name
 1    6 0xc000     4000     kernel
 2    1 0xc0ae     d000     msdos.ko
 3    1 0xc0b0f000 6000     procfs.ko
 4    1 0xc0b18000 4000     kernfs.ko
 5    1 0xc0b3b000 4d000    nfs.ko    <--- !
 6    1 0xc0bae000 12000    linux.ko

That is, since my /modules is new, I've loaded the brand new 4.4-RC's
NFS module, and it works without a glitch (at least for now).

This reinforces my belief that there is something broken in some
deeper layer of the network code (see the remote printing issue).

The time stamp of the older kernel is

-r-xr-xr-x  1 root  wheel  2408052 Jul 29 15:19 /kernel.good

It's a pretty long period (almost a month), but maybe it is possible
to track down the changes made to the network code since then.

-- walter pelissero http://www.pelissero.org
Re: 4.4-RC NFS panic
> "etni" oops I mean "inte" (as in "integer" > > Looks like a string has gotten spammed across a data structure or a weird > pointer, etc. > > From the previous panic: > > > fault virtual address = 0x33693d55 > "3i=U" > > That one looks more suspicious, but could still be part of a string of some > sort. > > -- > > John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/ > PGP Key: http://www.baldwin.cx/~john/pgpkey.asc > "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ > > To Unsubscribe: send mail to [EMAIL PROTECTED] > with "unsubscribe freebsd-hackers" in the body of the message > To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
Remember it's little endian.

On Mon, 20 Aug 2001, John Baldwin wrote:
> > fault virtual address = 0x65746e69
> "etni"

"inet"

> Looks like a string has gotten spammed across a data structure or a weird
> pointer, etc.
>
> From the previous panic:
>
> > fault virtual address = 0x33693d55
> "3i=U"

"U=i3" (as in "U=i386_xxx;")
Re: 4.4-RC NFS panic
On 20-Aug-01 Walter C. Pelissero wrote: > [ third time I retry to post this message on the mailing list ] > > Peter Pentchev writes: > > On Mon, Aug 20, 2001 at 12:27:24PM +0100, Walter C. Pelissero wrote: > > All those ??'s are the result of kgdb being unable to look inside > > a kernel module. Are you loading NFS as a module? > > Yep. I recompiled a kernel with almost all modules linked in. I > forgot some of them but I guess those don't hurt. > Now kldstat says: > > Id Refs AddressSize Name > 14 0xc010 298698 kernel > 21 0xc0399000 30e0 splash_bmp.ko > 31 0xc039d000 5458 vesa.ko > 41 0xc0b63000 19000usb.ko > > The panic is still easily reproducible and therefore I've got some > more details to show you: > > GNU gdb 4.18 > Copyright 1998 Free Software Foundation, Inc. > GDB is free software, covered by the GNU General Public License, and you are > welcome to change it and/or distribute copies of it under certain conditions. > Type "show copying" to see the conditions. > There is absolutely no warranty for GDB. Type "show warranty" for details. > This GDB was configured as "i386-unknown-freebsd". > Reading symbols from kernel.debug...done. > IdlePTD 4009984 > initial pcb at 311680 > panicstr: page fault > panic messages: > --- > Fatal trap 12: page fault while in kernel mode > fault virtual address = 0x65746e69 "etni" Looks like a string has gotten spammed across a data structure or a weird pointer, etc. >From the previous panic: > fault virtual address = 0x33693d55 "3i=U" That one looks more suspicious, but could still be part of a string of some sort. -- John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/ PGP Key: http://www.baldwin.cx/~john/pgpkey.asc "Power Users Use the Power to Serve!" - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with "unsubscribe freebsd-hackers" in the body of the message
Re: 4.4-RC NFS panic
[ third time I retry to post this message on the mailing list ]

Peter Pentchev writes:
> On Mon, Aug 20, 2001 at 12:27:24PM +0100, Walter C. Pelissero wrote:
> All those ??'s are the result of kgdb being unable to look inside
> a kernel module. Are you loading NFS as a module?

Yep. I recompiled a kernel with almost all modules linked in. I
forgot some of them but I guess those don't hurt.
Now kldstat says:

Id Refs Address    Size     Name
 1    4 0xc010     298698   kernel
 2    1 0xc0399000 30e0     splash_bmp.ko
 3    1 0xc039d000 5458     vesa.ko
 4    1 0xc0b63000 19000    usb.ko

The panic is still easily reproducible and therefore I've got some
more details to show you:

GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd".
Reading symbols from kernel.debug...done.
IdlePTD 4009984
initial pcb at 311680
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x65746e69
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc028782e
stack pointer           = 0x10:0xc780bccc
frame pointer           = 0x10:0xc780bd08
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 179 (nfsiod)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
done
Uptime: 3m35s

dumping to dev #ad/0x30001, offset 272736
dump ata0: resetting devices .. done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112
111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96
95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80
79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64
63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48
47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0 dumpsys () at ../../kern/kern_shutdown.c:472
472 if (dumping++) {
(kgdb) bt
#0 dumpsys () at ../../kern/kern_shutdown.c:472
#1 0xc0159b17 in boot (howto=256) at ../../kern/kern_shutdown.c:312
#2 0xc0159ee4 in poweroff_wait (junk=0xc02cd40c, howto=-1070805201)
   at ../../kern/kern_shutdown.c:580
#3 0xc0289002 in trap_fatal (frame=0xc780bc8c, eva=1702129257)
   at ../../i386/i386/trap.c:956
#4 0xc0288cd5 in trap_pfault (frame=0xc780bc8c, usermode=0, eva=1702129257)
   at ../../i386/i386/trap.c:849
#5 0xc02888bf in trap (frame={tf_fs = 16, tf_es = -1019805680,
   tf_ds = -1062076400, tf_edi = -1003117116, tf_esi = 1702129257,
   tf_ebp = -947864312, tf_isp = -947864392, tf_ebx = 6716,
   tf_edx = -947864124, tf_ecx = 1679, tf_eax = 1589720923,
   tf_trapno = 12, tf_err = 0, tf_eip = -1071089618, tf_cs = 8,
   tf_eflags = 66066, tf_esp = 1397686380, tf_ss = 6716})
   at ../../i386/i386/trap.c:448
#6 0xc028782e in generic_bcopy ()
#7 0xc01f994a in nfs_readrpc (vp=0xc78dc300, uiop=0xc780bdc4, cred=0xc0bc9d80)
   at ../../nfs/nfs_vnops.c:1118
#8 0xc01d3393 in nfs_doio (bp=0xc3373e60, cr=0xc0bc9d80, p=0x0)
   at ../../nfs/nfs_bio.c:1410
#9 0xc01f348e in nfssvc_iod (p=0xc77baf20) at ../../nfs/nfs_syscalls.c:970
#10 0xc01f1ed3 in nfssvc (p=0xc77baf20, uap=0xc780bf80)
   at ../../nfs/nfs_syscalls.c:166
#11 0xc02892ad in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
   tf_edi = -1077936680, tf_esi = 0, tf_ebp = -1077936776,
   tf_isp = -947863596, tf_ebx = 2, tf_edx = 1, tf_ecx = 19,
   tf_eax = 155, tf_trapno = 12, tf_err = 2, tf_eip = 134515664,
   tf_cs = 31, tf_eflags = 643, tf_esp = -1077936852, tf_ss = 47})
   at ../../i386/i386/trap.c:1155
#12 0xc027d635 in Xint0x80_syscall ()
#13 0x8048135 in ?? ()

Side note. I experienced another panic not directly related to NFS.
During a high resolution print of a big image (something around 30MB
postscript file) on a remote host (the NFS server) I got a panic,
which might suggest the problem (if related) is in a deeper level than
NFS. The remote printing panic is not so easy to reproduce so I gave
up on that front.

A nicer remark. The NFS server is up and running with a 4.4-RC (the
same as my Vaio) since Friday without a single problem.

I'm currently using a 4.3-STABLE and I don't get a panic whatsoever,
so I assume the hardware is still all right.

--
walter pelissero
http://www.pelissero.org
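A note on matching the backtrace to the panic message: kgdb prints the trap frame fields as signed decimal integers, so kernel addresses at or above 0x80000000 show up as negative numbers. Reinterpreting each field as an unsigned 32-bit value gives the hex form used in the panic output. The sketch below assumes a 32-bit trap frame as on i386; the helper names tf_to_addr and print_tf_field are made up for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: reinterpret a signed 32-bit trap-frame field,
 * as printed by kgdb, as the unsigned address it represents.  The
 * conversion is defined by C as reduction modulo 2^32.
 */
uint32_t
tf_to_addr(int32_t field)
{
	return (uint32_t)field;
}

/* Print one field in the hex form used by the panic message. */
void
print_tf_field(const char *name, int32_t field)
{
	printf("%s = 0x%08" PRIx32 "\n", name, tf_to_addr(field));
}
```

Applied to frame #5 above, tf_eip = -1071089618 is 0xc028782e (the reported instruction pointer, inside generic_bcopy), tf_ebp = -947864312 is 0xc780bd08 (the reported frame pointer), and tf_esi = 1702129257 is 0x65746e69, the same value as the fault virtual address and eva.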
Re: 4.4-RC NFS panic
On Mon, Aug 20, 2001 at 12:27:24PM +0100, Walter C. Pelissero wrote:
> [ it seems my original article didn't get through ]
>
> I recently upgraded to 4.4-RC.
> Now my Vaio panics when I use NFS volumes (as client).
> The panic is reproducible with a:
>
> find /some/NFS/mount/point -type f -exec cat {} \; >/dev/null
>
> Sometime I got a "page fault", sometime a "lockmgr: locking against myself"
>
> Here is a kgdb session:
[snip]
> #7 0xc016dbfc in m_freem (m=0xc0738a00) at ../../kern/uipc_mbuf.c:618
> #8 0xc0b59652 in ?? ()
> #9 0xc0b66b92 in ?? ()
> #10 0xc0b3fe37 in ?? ()
> #11 0xc0b606de in ?? ()
> #12 0xc0b5f11b in ?? ()
> #13 0xc023b75d in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,

All those ??'s are the result of kgdb being unable to look inside
a kernel module. Are you loading NFS as a module? What other modules
are loaded at the time of the panic? Could you try compiling them
statically into the kernel, see if the panic still happens, but
with more debugging information?

G'luck,
Peter

--
If this sentence didn't exist, somebody would have invented it.