Serious i386 interrupt mask bug in RELENG_4 (was Re: 4.4-RC NFS panic)

2001-08-23 Thread Ian Dowse

In message [EMAIL PROTECTED], Warner Losh writes:

I think that might be due to a bug in the shared interrupt code that
Ian Dowse sent me about earlier today.

Just to add a few details - there is a bug in the update_masks()
function in i386/isa/intr_machdep.c that can cause some interrupts
to occur at times when they should be masked. The problem only
occurs with certain configurations of shared interrupts and devices,
and this code is only present in RELENG_4.

The update_masks() function is called after an interrupt handler
has been registered or removed. Its main function is to update the
interrupt masks (tty_imask, net_imask, etc.) if necessary (e.g. if
IRQ11 is registered by a tty-type device, IRQ11 will be added to
tty_imask so that future spltty()'s will mask IRQ11).

A second function of update_masks() is to update the cached copy
of the interrupt mask stored with each handler for a multiplexed
interrupt. This is done via the call to update_mux_masks().

The bug is that update_masks() returns without calling update_mux_masks()
in some cases where it should call it. Specifically, if a newly-added
multiplexed interrupt handler has the same maskptr as another
handler on the same IRQ line, that new handler doesn't get its
cached mask set. For example, if a single IRQ has a USB device and
a modem (tty), the second device to register its handler will get
its idesc->mask set to 0 instead of the value of tty_imask because
update_mux_masks() may never be called to set it. Of course, if
update_masks() is called later for some other device it may correct
the situation.

Interrupt handlers are called with intr_mask[irq] or'd into the
cpl to block further interrupts; for non-multiplexed interrupts
intr_mask[irq] will be set from one of the *_imask masks. However, with
multiplexed interrupts, only the IRQ itself (and SWI_CLOCK_MASK)
are blocked, and the multiplex handler intr_mux() needs to raise
the cpl further when necessary. It uses idesc->mask to control
this.

When this bug occurs, idesc->mask == 0, so the device interrupt
handler gets called with only the IRQ and SWI_CLOCK_MASK masked,
instead of the full *_imask that it requested. Not good.

On my laptop, this bug causes hangs within minutes of starting to
use a pccard modem, but as should be apparent from the above it
could strike virtually anywhere that multiplexed interrupts are
used. The patch below seems to solve the problem; it just causes
update_masks() to unconditionally update the masks.

Ian


Index: intr_machdep.c
===
RCS file: /home/iedowse/CVS/src/sys/i386/isa/intr_machdep.c,v
retrieving revision 1.29.2.2
diff -u -r1.29.2.2 intr_machdep.c
--- intr_machdep.c  2000/08/16 05:35:34 1.29.2.2
+++ intr_machdep.c  2001/08/23 20:24:17
@@ -651,15 +651,9 @@
 
 	if (find_idesc(maskptr, irq) == NULL) {
 		/* no reference to this maskptr was found in this irq's chain */
-		if ((*maskptr & mask) == 0)
-			return;
-		/* the irq was included in the classes mask, remove it */
 		*maskptr &= ~mask;
 	} else {
 		/* a reference to this maskptr was found in this irq's chain */
-		if ((*maskptr & mask) != 0)
-			return;
-		/* put the irq into the classes mask */
 		*maskptr |= mask;
 	}
 	/* we need to update all values in the intr_mask[irq] array */


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-hackers in the body of the message



Serious i386 interrupt mask bug in RELENG_4 (was Re: 4.4-RC NFS panic)

2001-08-23 Thread Walter C. Pelissero

Ian Dowse writes:
  In message [EMAIL PROTECTED], Warner Losh writes:
  
  I think that might be due to a bug in the shared interrupt code that
  Ian Dowse sent me about earlier today.
  
  Just to add a few details - there is a bug in the update_masks()
  function in i386/isa/intr_machdep.c that can cause some interrupts
  to occur at times when they should be masked. The problem only
  occurs with certain configurations of shared interrupts and devices,
  and this code is only present in RELENG_4.

Congratulations!
I've applied your patch together with the one posted by Warner Losh
and now the PCMCIA card is working again and the find/cat test passed
without panic.

-- 
walter pelissero
http://www.pelissero.org




Re: 4.4-RC NFS panic

2001-08-22 Thread simond

On Tue, Aug 21, 2001 at 12:24:30PM +0200, Andre Albsmeier wrote:
 On Tue, 21-Aug-2001 at 03:07:33 -0600, Warner Losh wrote:
  In message [EMAIL PROTECTED] Andre Albsmeier writes:
  : As I wrote in my PR (#29845), my problems also happen with
  : the 3C589 which uses the ep driver. So we can sum up to:
  : 
  : 1.) Intel Etherexpress PRO/100 PCMCIA (xe driver)  crashes
  : 2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes
  : 3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver)  works perfectly
  
  Interesting.  I'm not sure what to make of this.
 
 We can now add:
 
 4.) D-Link DFE-650 PCMCIA (ed driver)   freezes
 
 :-(
 
 Warner, I have seen your mails regarding pcic-44rc1.diff.1.
 My box has a TI PCI-1225 chip... I will try the patch...

I've been having similar problems with my 4.4-RC Vaio F807K whenever I
do a lot of NFS over my wi0 (Buffalo wireless card), every so often my
laptop just completely freezes.

-- 
Simon Dick  [EMAIL PROTECTED]
Why do I get this urge to go bowling everytime I see Tux?




Re: 4.4-RC NFS panic

2001-08-22 Thread Walter C. Pelissero


Warner Losh writes:
  After talking with Ian Dowse, I think that we've hammered out what may 
  cause this.  Basically, the problem is

I'm afraid your patch didn't fix the problem on my laptop.  It
certainly changed the behaviour and the system doesn't crash any more,
but I'm almost unable to use the net.

A ping to my server gets as far as resolving the IP address, but no
ping packets are exchanged.  Even worse, now the pcm driver fails to
detect any sound device.  8-|

Regarding the warm boot, I can confirm the same behavior (already
pointed out in another mail of mine).  My impression is that it's not a
PCCARD issue, as it happens even with no card inserted.  The system
looks frozen, but if I press the Pause key, type something, and then
press the Pause key again, the cursor has moved by the amount of
typing I did.  No echo, though.

-- 
walter pelissero
http://www.pelissero.org




Re: 4.4-RC NFS panic

2001-08-22 Thread Warner Losh

In message [EMAIL PROTECTED] [EMAIL PROTECTED] writes:
: I've been having similar problems with my 4.4-RC Vaio F807K whenever I
: do a lot of NFS over my wi0 (Buffalo wireless card), every so often my
: laptop just completely freezes.

I think that might be due to a bug in the shared interrupt code that
Ian Dowse sent me about earlier today.

Warner




Re: 4.4-RC NFS panic

2001-08-21 Thread David Malone

 I've just done a further test.  I've mounted a directory tree from
 Vaio to Vaio using localhost (lo driver) and the test has run
 smoothly.  So chances would be good the bug is in the ep driver.
 Unfortunately...

Andre Albsmeier, who's seeing various network problems, is using
the xe driver (also PCMCIA I think), but the problems go away if
he uses an Etherexpress card on the PCI bus of the same machine.

It seems unlikely to be PCMCIA related ('cos it has nothing to do
with the networking itself); it may just be triggered in machines
with slower networking.

David.




Re: 4.4-RC NFS panic

2001-08-21 Thread Andre Albsmeier

On Tue, 21-Aug-2001 at 09:35:34 +0100, David Malone wrote:
  I've just done a further test.  I've mounted a directory tree from
  Vaio to Vaio using localhost (lo driver) and the test has run
  smoothly.  So chances would be good the bug is in the ep driver.
  Unfortunately...
 
 Andre Albsmeier, who's seeing various network problems, is using
 the xe driver (also PCMCIA I think), but the problems go away if
 he uses an Etherexpress card on the PCI bus of the same machine.

As I wrote in my PR (#29845), my problems also happen with
the 3C589 which uses the ep driver. So we can sum up to:

1.) Intel Etherexpress PRO/100 PCMCIA (xe driver)  crashes
2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes
3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver)  works perfectly


-Andre

 
 It seems unlikely to be PCMCIA related ('cos it has nothing to do
 with the networking itself) it may just be triggered in machines
 with slower networking.
 
   David.




Re: 4.4-RC NFS panic

2001-08-21 Thread Warner Losh

In message [EMAIL PROTECTED] Andre Albsmeier writes:
: As I wrote in my PR (#29845), my problems also happen with
: the 3C589 which uses the ep driver. So we can sum up to:
: 
: 1.) Intel Etherexpress PRO/100 PCMCIA (xe driver)  crashes
: 2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes
: 3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver)  works perfectly

Interesting.  I'm not sure what to make of this.

Warner




Re: 4.4-RC NFS panic

2001-08-21 Thread Andre Albsmeier

On Tue, 21-Aug-2001 at 03:07:33 -0600, Warner Losh wrote:
 In message [EMAIL PROTECTED] Andre Albsmeier writes:
 : As I wrote in my PR (#29845), my problems also happen with
 : the 3C589 which uses the ep driver. So we can sum up to:
 : 
 : 1.) Intel Etherexpress PRO/100 PCMCIA (xe driver)  crashes
 : 2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes
 : 3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver)  works perfectly
 
 Interesting.  I'm not sure what to make of this.

So do I. Ian Dowse already sent me a program to inspect the mbufs
in the crashdumps. I don't know a lot about mbufs but the
output appears really hosed...

I will try it again using another PCMCIA card I just got...

-Andre




Re: 4.4-RC NFS panic

2001-08-21 Thread Andre Albsmeier

On Tue, 21-Aug-2001 at 03:07:33 -0600, Warner Losh wrote:
 In message [EMAIL PROTECTED] Andre Albsmeier writes:
 : As I wrote in my PR (#29845), my problems also happen with
 : the 3C589 which uses the ep driver. So we can sum up to:
 : 
 : 1.) Intel Etherexpress PRO/100 PCMCIA (xe driver)  crashes
 : 2.) 3Com 589D EtherLink III PCMCIA (ep driver) crashes
 : 3.) Intel Etherexpress PRO/100+ PCI Card (fxp driver)  works perfectly
 
 Interesting.  I'm not sure what to make of this.

We can now add:

4.) D-Link DFE-650 PCMCIA (ed driver)   freezes

:-(

Warner, I have seen your mails regarding pcic-44rc1.diff.1.
My box has a TI PCI-1225 chip... I will try the patch...

-Andre




Re: 4.4-RC NFS panic

2001-08-21 Thread Andre Albsmeier

On Tue, 21-Aug-2001 at 11:45:12 -0600, Warner Losh wrote:
 In message [EMAIL PROTECTED] David Malone writes:
 : Andre Albsmeier, who's seeing various network problems, is using
 : the xe driver (also PCMCIA I think), but the problems go away if
 : he uses an Etherexpress card on the PCI bus of the same machine.
 : 
 : It seems unlikely to be PCMCIA related ('cos it has nothing to do
 : with the networking itself) it may just be triggered in machines
 : with slower networking.
 
 After talking with Ian Dowse, I think that we've hammered out what may 
 cause this.  Basically, the problem is
 
   code in net doing splnet()
 
   interrupt here -> pcic_pci_intr -> netcard_intr -> network code.
 
 And we've interrupted the critical section, broken all kinds of
 invariants.
 
 Warner
 
 P.S.  I think that with Ian's other interrupt changes, we can do the
 following w/o problems.  This should fix the network problems, I
 think.

Runs perfectly for about 10 minutes now under full load. It didn't
survive 10 seconds before :-)

I still have the hangs on a warm reboot but this is a different
story...

Thanks a lot for the quick help!

-Andre

 
 Index: pcic_pci.c
 ===
 RCS file: /cache/ncvs/src/sys/pccard/pcic_pci.c,v
 retrieving revision 1.54.2.7
 diff -u -r1.54.2.7 pcic_pci.c
 --- pcic_pci.c  2001/08/21 09:06:25 1.54.2.7
 +++ pcic_pci.c  2001/08/21 17:18:06
 @@ -515,15 +515,6 @@
* in the CD change.
*/
   sp->getb(sp, PCIC_STAT_CHG);
 -
 - /*
 -  * If we have a card in the slot with an interrupt handler, then
 -  * call it.  Note: This means that each card can have at most one
 -  * interrupt handler for it.  Since multifunction cards aren't
 -  * supported, this shouldn't cause a problem in practice.
 -  */
 - if (sc->cd_present && sp->intr != NULL)
 - sp->intr(sp->argp);
  }
  
  /*
 @@ -784,36 +775,6 @@
   return (0);
  }
  
 -static int
 -pcic_pci_setup_intr(device_t dev, device_t child, struct resource *irq,
 -int flags, driver_intr_t *intr, void *arg, void **cookiep)
 -{
 - struct pcic_softc *sc = (struct pcic_softc *) device_get_softc(dev);
 - struct pcic_slot *sp = &sc->slots[0];
 - 
 - if (sp->intr) {
 - device_printf(dev,
 -    "Interrupt already established, possible multiple attach bug.\n");
 - return (EINVAL);
 - }
 - sp->intr = intr;
 - sp->argp = arg;
 - *cookiep = sc;
 - return (0);
 -}
 -
 -static int
 -pcic_pci_teardown_intr(device_t dev, device_t child, struct resource *irq,
 -void *cookie)
 -{
 - struct pcic_softc *sc = (struct pcic_softc *) device_get_softc(dev);
 - struct pcic_slot *sp = &sc->slots[0];
 -
 - sp->intr = NULL;
 - sp->argp = NULL;
 - return (0);
 -}
 -
  static device_method_t pcic_pci_methods[] = {
   /* Device interface */
   DEVMETHOD(device_probe, pcic_pci_probe),
 @@ -829,8 +790,8 @@
   DEVMETHOD(bus_release_resource, bus_generic_release_resource),
   DEVMETHOD(bus_activate_resource, pcic_activate_resource),
   DEVMETHOD(bus_deactivate_resource, pcic_deactivate_resource),
 - DEVMETHOD(bus_setup_intr,   pcic_pci_setup_intr),
 - DEVMETHOD(bus_teardown_intr,pcic_pci_teardown_intr),
 + DEVMETHOD(bus_setup_intr,   bus_generic_setup_intr),
 + DEVMETHOD(bus_teardown_intr,bus_generic_teardown_intr),
  
   /* Card interface */
   DEVMETHOD(card_set_res_flags,   pcic_set_res_flags),

-- 
BSD, from the people who brought you TCP/IP.




Re: 4.4-RC NFS panic

2001-08-21 Thread Warner Losh

In message [EMAIL PROTECTED] Andre Albsmeier writes:
: I still have the hangs on a warm reboot but this is a different
: story...

Eh?  what kind of hangs and when?

Warner




Re: 4.4-RC NFS panic

2001-08-21 Thread Andre Albsmeier

On Tue, 21-Aug-2001 at 23:44:40 -0600, Warner Losh wrote:
 In message [EMAIL PROTECTED] Andre Albsmeier writes:
 : I still have the hangs on a warm reboot but this is a different
 : story...
 
 Eh?  what kind of hangs and when?

Attached below is the dmesg... It hangs only when warm booting; after
a power toggle everything is OK...


Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 4.4-RC #23: Wed Aug 22 07:21:34 CEST 2001
[EMAIL PROTECTED]:/src/obj-4/src/src-4/sys/schlappy
Calibrating clock(s) ... TSC clock: 30160 Hz, i8254 clock: 1193146 Hz
Timecounter i8254  frequency 1193146 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (366.66-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x66a  Stepping = 10
  
  Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 134152192 (131008K bytes)
Physical memory chunk(s):
0x1000 - 0x0009efff, 647168 bytes (158 pages)
0x00325000 - 0x07febfff, 130838528 bytes (31943 pages)
avail memory = 127590400 (124600K bytes)
bios32: Found BIOS32 Service Directory header at 0xc00f6230
bios32: Entry = 0xfd790 (c00fd790)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0x225
pnpbios: Found PnP BIOS data at 0xc00f6260
pnpbios: Entry = f:a34e  Rev = 1.0
pnpbios: Event flag at 4b4
Other BIOS signatures found:
ACPI: 000f61f0
Preloaded elf kernel kernel at 0xc02ff000.
Pentium Pro MTRR support enabled
pci_open(1):mode 1 addr port (0x0cf8) is 0x8000384c
pci_open(1a):   mode1res=0x8000 (0x8000)
pci_cfgcheck:   device 0 [class=06] [hdr=00] is there (id=71908086)
Using $PIR table, 7 entries at 0xc00fdf50
apm0: APM BIOS on motherboard
apm: found APM BIOS v1.2, connected at v1.2
npx0: math processor on motherboard
npx0: INT 16 interface
pcib0: Intel 82443BX (440 BX) host to PCI bridge on motherboard
found- vendor=0x8086, dev=0x7190, revid=0x03
class=06-00-00, hdrtype=0x00, mfdev=0
subordinatebus=0secondarybus=0
map[10]: type 1, range 32, base f800, size 26
found- vendor=0x8086, dev=0x7191, revid=0x03
class=06-04-00, hdrtype=0x01, mfdev=0
subordinatebus=1secondarybus=1
found- vendor=0x8086, dev=0x7110, revid=0x02
class=06-80-00, hdrtype=0x00, mfdev=1
subordinatebus=0secondarybus=0
found- vendor=0x8086, dev=0x7111, revid=0x01
class=01-01-80, hdrtype=0x00, mfdev=0
subordinatebus=0secondarybus=0
map[20]: type 1, range 32, base fcd0, size  4
found- vendor=0x8086, dev=0x7112, revid=0x01
class=0c-03-00, hdrtype=0x00, mfdev=0
subordinatebus=0secondarybus=0
intpin=d, irq=9
map[20]: type 1, range 32, base fce0, size  5
found- vendor=0x8086, dev=0x7113, revid=0x02
class=06-80-00, hdrtype=0x00, mfdev=0
subordinatebus=0secondarybus=0
map[90]: type 1, range 32, base 2180, size  4
found- vendor=0x104c, dev=0xac1c, revid=0x01
class=06-07-00, hdrtype=0x02, mfdev=1
subordinatebus=0secondarybus=0
intpin=a, irq=10
found- vendor=0x104c, dev=0xac1c, revid=0x01
class=06-07-00, hdrtype=0x02, mfdev=1
subordinatebus=0secondarybus=0
intpin=b, irq=11
pci0: PCI bus on pcib0
pcib1: Intel 82443BX (440 BX) PCI-PCI (AGP) bridge at device 1.0 on pci0
found- vendor=0x10c8, dev=0x0005, revid=0x12
class=03-00-00, hdrtype=0x00, mfdev=1
subordinatebus=0secondarybus=0
intpin=a, irq=10
map[10]: type 1, range 32, base f600, size 24
map[14]: type 1, range 32, base fe40, size 22
map[18]: type 1, range 32, base feb0, size 20
found- vendor=0x10c8, dev=0x8005, revid=0x12
class=04-01-00, hdrtype=0x00, mfdev=1
subordinatebus=0secondarybus=0
intpin=b, irq=11
map[10]: type 1, range 32, base f780, size 22
map[14]: type 1, range 32, base fea0, size 20
pci1: PCI bus on pcib1
pci1: NeoMagic MagicMedia 256AV SVGA controller (vendor=0x10c8, dev=0x0005) at 0.0 irq 10
chip1: NeoMagic MagicMedia 256AX Audio controller mem 0xfea0-0xfeaf,0xf780-0xf7bf irq 11 at device 0.1 on pci1
isab0: Intel 82371AB PCI to ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: Intel PIIX4 ATA33 controller port 0xfcd0-0xfcdf at device 7.1 on pci0
ata0: iobase=0x01f0 altiobase=0x03f6 bmaddr=0xfcd0
ata0: mask=03 status0=50 status1=50
ata0: mask=03 ostat0=50 ostat2=50
ata0-slave: ATAPI probe a=14 b=eb
ata0-master: ATAPI probe a=00 b=00
ata0: mask=03 status0=50 status1=00
ata0-master: ATA probe a=01 b=a5
ata0: devices=09
ata0: at 0x1f0 irq 14 on atapci0
ata1: iobase=0x0170 altiobase=0x0376 bmaddr=0xfcd8
ata1: mask=00 status0=ff status1=ff
ata1: probe allocation failed
pci0: Intel 82371AB/EB (PIIX4) USB controller (vendor=0x8086, dev=0x7112) at 

Re: 4.4-RC NFS panic

2001-08-21 Thread Warner Losh

In message [EMAIL PROTECTED] Andre Albsmeier writes:
: Attached below is the dmesg... It hangs only when warm booting; after
: a power toggle everything is OK...

...

: pcic0: Event mask 0xf stat 0x3419
: ###
: ###   Now it hangs until poweroff/poweron   ###
: ###

OK.  Looks like maybe an interrupt storm on warm boot.  I'll have to
check into this.

Warner




Re: 4.4-RC NFS panic

2001-08-20 Thread Peter Pentchev

On Mon, Aug 20, 2001 at 12:27:24PM +0100, Walter C. Pelissero wrote:
 [ it seems my original article didn't get through ]
 
 I recently upgraded to 4.4-RC.
 Now my Vaio panics when I use NFS volumes (as client).
 The panic is reproducible with a:
 
 find /some/NFS/mount/point -type f -exec cat {} \; > /dev/null
 
 Sometimes I got a page fault, sometimes a "lockmgr: locking against myself" panic.
 
 Here is a kgdb session:
[snip]
 #7  0xc016dbfc in m_freem (m=0xc0738a00) at ../../kern/uipc_mbuf.c:618
 #8  0xc0b59652 in ?? ()
 #9  0xc0b66b92 in ?? ()
 #10 0xc0b3fe37 in ?? ()
 #11 0xc0b606de in ?? ()
 #12 0xc0b5f11b in ?? ()
 #13 0xc023b75d in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 

All those ??'s are the result of kgdb being unable to look inside
a kernel module.  Are you loading NFS as a module?  What other modules
are loaded at the time of the panic?  Could you try compiling them
statically into the kernel, see if the panic still happens, but with
more debugging information?

G'luck,
Peter

-- 
If this sentence didn't exist, somebody would have invented it.




Re: 4.4-RC NFS panic

2001-08-20 Thread Walter C. Pelissero

[ third time I retry to post this message on the mailing list ]

Peter Pentchev writes:
  On Mon, Aug 20, 2001 at 12:27:24PM +0100, Walter C. Pelissero wrote:
  All those ??'s are the result of kgdb being unable to look inside
  a kernel module.  Are you loading NFS as a module?

Yep.  I recompiled a kernel with almost all modules linked in.  I
forgot some of them but I guess those don't hurt.
Now kldstat says:

Id Refs Address    Size     Name
 1    4 0xc010     298698   kernel
 2    1 0xc0399000 30e0     splash_bmp.ko
 3    1 0xc039d000 5458     vesa.ko
 4    1 0xc0b63000 19000    usb.ko

The panic is still easily reproducible and therefore I've got some
more details to show you:

GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as i386-unknown-freebsd.
Reading symbols from kernel.debug...done.
IdlePTD 4009984
initial pcb at 311680
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x65746e69
fault code  = supervisor read, page not present
instruction pointer = 0x8:0xc028782e
stack pointer   = 0x10:0xc780bccc
frame pointer   = 0x10:0xc780bd08
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 179 (nfsiod)
interrupt mask  = none
trap number = 12
panic: page fault

syncing disks... 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 
done
Uptime: 3m35s

dumping to dev #ad/0x30001, offset 272736
dump ata0: resetting devices .. done
127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 
106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 
80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 
51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 
22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
---
#0  dumpsys () at ../../kern/kern_shutdown.c:472
472 if (dumping++) {
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:472
#1  0xc0159b17 in boot (howto=256) at ../../kern/kern_shutdown.c:312
#2  0xc0159ee4 in poweroff_wait (junk=0xc02cd40c, howto=-1070805201)
at ../../kern/kern_shutdown.c:580
#3  0xc0289002 in trap_fatal (frame=0xc780bc8c, eva=1702129257)
at ../../i386/i386/trap.c:956
#4  0xc0288cd5 in trap_pfault (frame=0xc780bc8c, usermode=0, eva=1702129257)
at ../../i386/i386/trap.c:849
#5  0xc02888bf in trap (frame={tf_fs = 16, tf_es = -1019805680, 
  tf_ds = -1062076400, tf_edi = -1003117116, tf_esi = 1702129257, 
  tf_ebp = -947864312, tf_isp = -947864392, tf_ebx = 6716, 
  tf_edx = -947864124, tf_ecx = 1679, tf_eax = 1589720923, tf_trapno = 12, 
  tf_err = 0, tf_eip = -1071089618, tf_cs = 8, tf_eflags = 66066, 
  tf_esp = 1397686380, tf_ss = 6716}) at ../../i386/i386/trap.c:448
#6  0xc028782e in generic_bcopy ()
#7  0xc01f994a in nfs_readrpc (vp=0xc78dc300, uiop=0xc780bdc4, cred=0xc0bc9d80)
at ../../nfs/nfs_vnops.c:1118
#8  0xc01d3393 in nfs_doio (bp=0xc3373e60, cr=0xc0bc9d80, p=0x0)
at ../../nfs/nfs_bio.c:1410
#9  0xc01f348e in nfssvc_iod (p=0xc77baf20) at ../../nfs/nfs_syscalls.c:970
#10 0xc01f1ed3 in nfssvc (p=0xc77baf20, uap=0xc780bf80)
at ../../nfs/nfs_syscalls.c:166
#11 0xc02892ad in syscall2 (frame={tf_fs = 47, tf_es = 47, tf_ds = 47, 
  tf_edi = -1077936680, tf_esi = 0, tf_ebp = -1077936776, 
  tf_isp = -947863596, tf_ebx = 2, tf_edx = 1, tf_ecx = 19, tf_eax = 155, 
  tf_trapno = 12, tf_err = 2, tf_eip = 134515664, tf_cs = 31, 
  tf_eflags = 643, tf_esp = -1077936852, tf_ss = 47})
at ../../i386/i386/trap.c:1155
#12 0xc027d635 in Xint0x80_syscall ()
#13 0x8048135 in ?? ()


Side note.  I experienced another panic not directly related to NFS.
During a high resolution print of a big image (something around 30MB
postscript file) on a remote host (the NFS server) I got a panic,
which might suggest the problem (if related) is in a deeper level than
NFS.  The remote printing panic is not so easy to reproduce so I gave
up on that front.

A nicer remark.  The NFS server is up and running with a 4.4-RC (the
same as my Vaio) since Friday without a single problem.  I'm currently
using a 4.3-STABLE and I don't get a panic whatsoever, so I assume the
hardware is still all right.

-- 
walter pelissero
http://www.pelissero.org




Re: 4.4-RC NFS panic

2001-08-20 Thread John Baldwin


On 20-Aug-01 Walter C. Pelissero wrote:
 [ third time I retry to post this message on the mailing list ]
 
 Peter Pentchev writes:
   On Mon, Aug 20, 2001 at 12:27:24PM +0100, Walter C. Pelissero wrote:
   All those ??'s are the result of kgdb being unable to look inside
   a kernel module.  Are you loading NFS as a module?
 
 Yep.  I recompiled a kernel with almost all modules linked in.  I
 forgot some of them but I guess those don't hurt.
 Now kldstat says:
 
 Id Refs AddressSize Name
  14 0xc010 298698   kernel
  21 0xc0399000 30e0 splash_bmp.ko
  31 0xc039d000 5458 vesa.ko
  41 0xc0b63000 19000usb.ko
 
 The panic is still easily reproducible and therefore I've got some
 more details to show you:
 
 GNU gdb 4.18
 Copyright 1998 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as i386-unknown-freebsd.
 Reading symbols from kernel.debug...done.
 IdlePTD 4009984
 initial pcb at 311680
 panicstr: page fault
 panic messages:
 ---
 Fatal trap 12: page fault while in kernel mode
 fault virtual address = 0x65746e69
etni

Looks like a string has gotten spammed across a data structure or a weird
pointer, etc.

From the previous panic:

 fault virtual address = 0x33693d55
3i=U

That one looks more suspicious, but could still be part of a string of some
sort. 

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: 4.4-RC NFS panic

2001-08-20 Thread Julian Elischer

Remember it's little endian.



On Mon, 20 Aug 2001, John Baldwin wrote:
  fault virtual address = 0x65746e69
 etni

inet

 
 Looks like a string has gotten spammed across a data structure or a weird
 pointer, etc.
 
 From the previous panic:
 
  fault virtual address = 0x33693d55
 3i=U
 

U=i3
(as in U=i386_xxx;)








Re: 4.4-RC NFS panic

2001-08-20 Thread Julian Elischer

 etni


Oops, I mean inte (as in integer).

 
 Looks like a string has gotten spammed across a data structure or a weird
 pointer, etc.
 
 From the previous panic:
 
  fault virtual address = 0x33693d55
 3i=U
 
 That one looks more suspicious, but could still be part of a string of some
 sort. 
 





Re: 4.4-RC NFS panic

2001-08-20 Thread Walter C. Pelissero

John Baldwin writes:
   fault virtual address = 0x65746e69
  etni
  
  Looks like a string has gotten spammed across a data structure or a
  weird pointer, etc.

Whatever mess happened, I've got some news for you that should remove
the NFS module from the list of possible causes.  Currently I'm
running an old 4.3-STABLE kernel and kldstat shows:

Id Refs Address    Size     Name
 1    6 0xc000     4000     kernel
 2    1 0xc0ae     d000     msdos.ko
 3    1 0xc0b0f000 6000     procfs.ko
 4    1 0xc0b18000 4000     kernfs.ko
 5    1 0xc0b3b000 4d000    nfs.ko      <--- !
 6    1 0xc0bae000 12000    linux.ko

That is, since my /modules is new, I've loaded the brand new 4.4-RC's
NFS module, and it works without a glitch (at least for now).

This reinforces my belief that there is something broken in some deeper
layer of the network code (see the remote printing issue).

The time stamp of the older kernel is

-r-xr-xr-x  1 root  wheel  2408052 Jul 29 15:19 /kernel.good

It's a pretty long period (almost a month), but maybe it is possible to
track down the changes made to the network code since then.

-- 
walter pelissero
http://www.pelissero.org




Re: 4.4-RC NFS panic

2001-08-20 Thread David Malone

On Mon, Aug 20, 2001 at 07:51:17PM +0100, Walter C. Pelissero wrote:
 This enforces my belief that there is something broken in some deeper
 layer of the network code (see the remote printing issue).

Just out of curiosity, what sort of network card is your Vaio using?
Someone else is seeing network related panics that might be related
to freeing an mbuf that's in use, and it's possible this might be
related.

David.




Re: 4.4-RC NFS panic

2001-08-20 Thread Joseph Mallett

  etni
 
 inet
 

Your string reversal function is buggy.




Re: 4.4-RC NFS panic

2001-08-20 Thread Walter C. Pelissero

David Malone writes:
  On Mon, Aug 20, 2001 at 07:51:17PM +0100, Walter C. Pelissero wrote:
   This enforces my belief that there is something broken in some deeper
   layer of the network code (see the remote printing issue).
  
  Just out of curiosity, what sort of network card is your Vaio using?
  Someone else is seeing network related panics that might be related
  to freeing an mbuf that's in use, and it's possible this might be
  related.

Mmmm, you might be right. I'm using a 3com 589, therefore I'm using
the ep driver.

Unfortunately I don't have a different PCMCIA network card at hand,
but I can try to reverse the crash test with my server.  Yes, the
server is running exactly the same version (4.4-RC) but uses the de
driver.

So I did and, guess what, the find/cat test on my server over an NFS
mounted directory from my Vaio ran without a problem.

I've just done a further test.  I've mounted a directory tree from
Vaio to Vaio using localhost (lo driver) and the test has run
smoothly.  So chances would be good the bug is in the ep driver.
Unfortunately...

$ ls -l /sys/dev/ep
total 70
-rw-r--r--  1 root  wheel  23554 Jul 17  2000 if_ep.c
-rw-r--r--  1 root  wheel   6202 Jan 14  2000 if_ep_eisa.c
-rw-r--r--  1 root  wheel  10046 Dec 16  2000 if_ep_isa.c
-rw-r--r--  1 root  wheel   4584 Oct 27  1999 if_ep_mca.c
-rw-r--r--  1 root  wheel   6950 Aug  9  2000 if_ep_pccard.c
-rw-r--r--  1 root  wheel  13935 Jan 12  2000 if_epreg.h
-rw-r--r--  1 root  wheel   2667 May 24  2000 if_epvar.h

None of the source files belonging to the ep driver has been touched
for a long time.

Side note.  Regarding a different problem I've mentioned in
freebsd-hackers I've been told 4.4-RC has got problems with the PCCARD
code.  Whether that can influence the ep driver is beyond my
knowledge.

-- 
walter pelissero
http://www.pelissero.org




Re: 4.4-RC NFS panic

2001-08-20 Thread Warner Losh

In message [EMAIL PROTECTED] Walter C. Pelissero writes:
: Mmmm, you might be right. I'm using a 3com 589, therefore I'm using
: the ep driver.

The ep driver has been a little flakey under heavy load (like NFS) for
a while.

: Side note.  Regarding a different problem I've mentioned in
: freebsd-hackers I've been told 4.4-RC has got problems with the PCCARD
: code.  Whether that can influence the ep driver is beyond my
: knowledge.

No.  It is a "works or doesn't" kinda bug.  If you are getting to the
point of mounting with NFS, the network works.  Unless you are seeing
1s ping times.

Warner
