Interrupt issues with amd64 and an IXSystems 1U server (5.1)

2011-05-12 Thread Brian Buhrow
Hello.  I've got a new system I'm building where I have interrupt
issues with the GENERIC kernel. This is an amd64 system with
NetBSD-5.1 with sources as of about 3 weeks ago.
The INSTALL kernel works fine, but when I boot the GENERIC kernel, the
ethernet doesn't work.  I get a bunch of device timeout messages.  Both the
INSTALL and GENERIC kernels are built from the same 5.1 sources.
Anyone have ideas on why the INSTALL kernel works fine, but the
GENERIC kernel does not?  They look to be the same to me insofar as the
INSTALL config includes the GENERIC config.  I rebuilt the GENERIC kernel
to turn off MTRR support, as the INSTALL kernel does, but that doesn't
help, though it does remove the FIXME message about there being too many
MTRR devices on the system.

-thanks
-Brian


Dmesg looks like:


Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 5.1_STABLE (GENERIC) #1: Wed May 11 12:48:14 PDT 2011

buh...@lothlorien.nfbcal.org:/usr/local/netbsd/obj/sys/arch/amd64/compile/GENERIC
total memory = 12279 MB
avail memory = 11889 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
SMBIOS rev. 2.6 @ 0x99c00 (81 entries)
iXsystems iX1204-563TUB (1234567890)
mainbus0 (root)
WARNING: can't reserve area for I/O APIC.
cpu0 at mainbus0 apid 0: Intel 686-class, 2400MHz, id 0x206c2
cpu1 at mainbus0 apid 2: Intel 686-class, 2400MHz, id 0x206c2
cpu2 at mainbus0 apid 18: Intel 686-class, 2400MHz, id 0x206c2
cpu3 at mainbus0 apid 20: Intel 686-class, 2400MHz, id 0x206c2
cpu4 at mainbus0 apid 32: Intel 686-class, 2400MHz, id 0x206c2
cpu5 at mainbus0 apid 34: Intel 686-class, 2400MHz, id 0x206c2
cpu6 at mainbus0 apid 50: Intel 686-class, 2400MHz, id 0x206c2
cpu7 at mainbus0 apid 52: Intel 686-class, 2400MHz, id 0x206c2
cpu8 at mainbus0 apid 1: Intel 686-class, 2400MHz, id 0x206c2
cpu9 at mainbus0 apid 3: Intel 686-class, 2400MHz, id 0x206c2
cpu10 at mainbus0 apid 19: Intel 686-class, 2400MHz, id 0x206c2
cpu11 at mainbus0 apid 21: Intel 686-class, 2400MHz, id 0x206c2
cpu12 at mainbus0 apid 33: Intel 686-class, 2400MHz, id 0x206c2
cpu13 at mainbus0 apid 35: Intel 686-class, 2400MHz, id 0x206c2
cpu14 at mainbus0 apid 51: Intel 686-class, 2400MHz, id 0x206c2
cpu15 at mainbus0 apid 53: Intel 686-class, 2400MHz, id 0x206c2
ioapic0 at mainbus0 apid 6: pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0 apid 7: pa 0xfec8a000, version 20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20080321
acpi0: X/RSDT: OemId , AslId 
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
ACPI-Fast 24-bit timer
acpi: activated PNP0C0F
acpi: activated PNP0C0F
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43 irq 0
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
midi0 at pcppi1: PC speaker (CPU-intensive output)
sysbeep0 at pcppi1
UAR1 (PNP0501) at acpi0 not configured
UAR2 (PNP0501) at acpi0 not configured
hpet0 at acpi0 (HPET, PNP0103): mem 0xfed00000-0xfed003ff
timecounter: Timecounter "hpet0" frequency 14318179 Hz quality 2000
acpibut0 at acpi0 (PWRB, PNP0C0C-170): ACPI Power Button
attimer1: attached to pcppi1
ipmi0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: Intel product 0x3406 (rev. 0x22)
ppb0 at pci0 dev 1 function 0: Intel product 0x3408 (rev. 0x22)
ppb0: unsupported PCI Express version
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci1 dev 0 function 0: 82576 1000BaseT Ethernet, rev. 1
wm0: interrupting at ioapic1 pin 4
wm0: PCI-Express bus
wm0: 65536 word (16 address bits) SPI EEPROM
wm0: Ethernet address 00:25:90:35:XX:XX
igphy0 at wm0 phy 1: i82566 10/100/1000 media interface, rev. 1
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ukphy0 at wm0 phy 22: Generic IEEE 802.3u media interface
ukphy0: OUI 0x000000, model 0x0000, rev. 0
ukphy0: 10baseT-FDX, 100baseTX-FDX, 100baseT4, auto
wm1 at pci1 dev 0 function 1: 82576 1000BaseT Ethernet, rev. 1
wm1: interrupting at ioapic1 pin 16
wm1: PCI-Express bus
wm1: 65536 word (16 address bits) SPI EEPROM
wm1: Ethernet address 00:25:90:35:XX:XX
igphy1 at wm1 phy 1: i82566 10/100/1000 media interface, rev. 1
igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb1 at pci0 dev 3 function 0: Intel product 0x340a (rev. 0x22)
ppb1: unsupported PCI Express version
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
ppb2 at pci0 dev 5 function 0: Intel product 0x340c (rev. 0x22)
ppb2: unsupported PCI Express version
pci3 at ppb2 

re: Interrupt issues with amd64 and an IXSystems 1U server (5.1)

2011-05-12 Thread matthew green

> Anyone have ideas on why the INSTALL kernel works fine, but the
> GENERIC kernel does not?  They look to be the same to me insofar as the
> INSTALL config includes the GENERIC config.  I rebuilt the GENERIC kernel
> to turn off MTRR support, as the INSTALL kernel does, but that doesn't
> help, though it does remove the FIXME message about there being too many
> MTRR devices on the system.

FWIW, turning it off shouldn't affect anything on modern systems; it
just means the extra MTRRs aren't available for use.


.mrg.


Re: Interrupt issues with amd64 and an IXSystems 1U server (5.1)

2011-05-12 Thread Manuel Bouyer
On Thu, May 12, 2011 at 12:12:54PM -0700, Brian Buhrow wrote:
> Hello.  I've got a new system I'm building where I have interrupt
> issues with the GENERIC kernel. This is an amd64 system with
> NetBSD-5.1 with sources as of about 3 weeks ago.
> The INSTALL kernel works fine, but when I boot the GENERIC kernel, the
> ethernet doesn't work.  I get a bunch of device timeout messages.  Both the
> INSTALL and GENERIC kernels are built from the same 5.1 sources.
> Anyone have ideas on why the INSTALL kernel works fine, but the
> GENERIC kernel does not?  They look to be the same to me insofar as the
> INSTALL config includes the GENERIC config.  I rebuilt the GENERIC kernel
> to turn off MTRR support, as the INSTALL kernel does, but that doesn't
> help, though it does remove the FIXME message about there being too many
> MTRR devices on the system.

Did you try diffing the dmesg output of the working and non-working kernels?

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--
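
A minimal sketch of the comparison suggested above, in Python. It assumes
the two boot logs have been saved as text; the function names and the
"INSTALL" line in the sample are made up for illustration (the thread only
shows the GENERIC dmesg). Only added and removed lines are reported, so
differences in interrupt routing stand out.

```python
import difflib


def normalize(dmesg_text):
    """Return dmesg lines with surrounding whitespace and blank lines dropped."""
    return [line.strip() for line in dmesg_text.splitlines() if line.strip()]


def dmesg_diff(working, broken):
    """Return only the removed ('-') and added ('+') lines between two logs."""
    diff = list(difflib.unified_diff(
        normalize(working), normalize(broken),
        fromfile="INSTALL", tofile="GENERIC", lineterm="", n=0))
    # The first two entries are the '---'/'+++' file headers; skip them,
    # then drop the '@@' hunk markers.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    # Hypothetical INSTALL line; the GENERIC line is taken from the dmesg above.
    install_log = "mainbus0 (root)\nwm0: interrupting at ioapic0 pin 16\n"
    generic_log = "mainbus0 (root)\nwm0: interrupting at ioapic1 pin 4\n"
    for line in dmesg_diff(install_log, generic_log):
        print(line)
```

In practice the two strings would come from files captured on each kernel,
for example with `dmesg > generic.txt` after booting GENERIC.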


Re: Interrupt issues with amd64 and an IXSystems 1U server (5.1)

2011-05-12 Thread Brian Buhrow
Hello.  I did try doing a diff of the dmesg output from the working and
non-working kernels, but the differences weren't obvious to me.  However,
I think the problem is that the network chips are interrupting on ioapic1,
whereas everything else is interrupting on ioapic0.  It looks like even
though ioapic1 interrupts are showing up in vmstat -i output, they're not
actually causing the drivers to get triggered.  It looks like this is a
potentially known issue, but I can't find any particular patches which
address the problem.
Has anyone else ever seen times when interrupts from ioapic1 or higher
don't get routed properly?
-thanks
-Brian
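
The observation about ioapic1 can be checked mechanically. The sketch below
is only an illustration, in Python: it groups devices by the I/O APIC named
in their "interrupting at" lines. The function name and the regular
expression are assumptions based on the dmesg format shown earlier in this
thread.

```python
import re
from collections import defaultdict

# Matches lines such as "wm0: interrupting at ioapic1 pin 4".
INTR_RE = re.compile(r"^(\w+): interrupting at (ioapic\d+) pin (\d+)")


def devices_by_ioapic(dmesg_text):
    """Map each I/O APIC name to a list of (device, pin) tuples."""
    groups = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = INTR_RE.match(line.strip())
        if match:
            device, apic, pin = match.groups()
            groups[apic].append((device, int(pin)))
    return dict(groups)


if __name__ == "__main__":
    # Lines copied from the GENERIC dmesg in the first message.
    sample = """\
acpi0: SCI interrupting at int 9
wm0: interrupting at ioapic1 pin 4
wm1: interrupting at ioapic1 pin 16
"""
    for apic, devices in sorted(devices_by_ioapic(sample).items()):
        print(apic, devices)
```

The acpi0 SCI line is deliberately not matched, since it reports a legacy
interrupt number rather than an I/O APIC pin.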




Re: Interrupt issues with amd64 and an IXSystems 1U server (5.1)

2011-05-12 Thread Paul Goyette
I had a similar issue on a SuperMicro motherboard.  The attached patch
is for -current but should not be too hard to adapt to 5.1.




-
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:   |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com|
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |  | pgoyette at netbsd.org  |
-

Index: src/sys/arch/x86/x86/mpacpi.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/x86/mpacpi.c,v
retrieving revision 1.91
diff -u -p -r1.91 mpacpi.c
--- src/sys/arch/x86/x86/mpacpi.c	5 Apr 2011 13:17:04 -0000	1.91
+++ src/sys/arch/x86/x86/mpacpi.c	12 May 2011 23:04:06 -0000
@@ -625,7 +625,9 @@ mpacpi_derive_bus(ACPI_HANDLE handle, st
 		if (ACPI_FAILURE(rv))
 			goto out;
 
-		if (acpi_match_hid(devinfo, pciroot_hid)) {
+		if (acpi_match_hid(devinfo, pciroot_hid) &&
+		    ((devinfo->Valid & ACPI_VALID_STA) == 0 ||
+		    (devinfo->CurrentStatus & ACPI_STA_OK) == ACPI_STA_OK)) {
 			rv = mpacpi_get_bbn(acpi, parent, &bus);
 			if (ACPI_FAILURE(rv))
 				bus = 0;
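
The condition added by the patch can be read as: accept a device as a PCI
root only if its HID matches and its status is either not reported or
reported as fully OK. The following is a small Python model of that logic,
not kernel code. The _STA bit positions follow the ACPI specification; the
value used for ACPI_VALID_STA and the exact composition of ACPI_STA_OK are
assumptions made for illustration, and accept_pci_root is a hypothetical
name.

```python
# _STA bits as defined by the ACPI specification.
STA_PRESENT = 0x01
STA_ENABLED = 0x02
STA_FUNCTIONING = 0x08

# Assumption: "OK" means present, enabled and functioning.
ACPI_STA_OK = STA_PRESENT | STA_ENABLED | STA_FUNCTIONING
# Stand-in flag meaning "CurrentStatus holds a valid _STA result".
ACPI_VALID_STA = 0x01


def accept_pci_root(hid_matches, valid, current_status):
    """Model of the patched test in mpacpi_derive_bus()."""
    sta_unknown = (valid & ACPI_VALID_STA) == 0
    sta_ok = (current_status & ACPI_STA_OK) == ACPI_STA_OK
    return hid_matches and (sta_unknown or sta_ok)


if __name__ == "__main__":
    # A matching root bridge whose _STA reports "not present" is now skipped.
    print(accept_pci_root(True, ACPI_VALID_STA, 0x00))
    # A matching root bridge without any _STA information is still accepted.
    print(accept_pci_root(True, 0, 0x00))
```

Before the patch, only the HID match was tested, so a disabled or absent
root bridge entry in the ACPI namespace could still be used to derive the
bus number.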