Re: Unable to boot recent -stable with MSI/MSIX enabled

2007-11-15 Thread Jack Vogel
On Nov 15, 2007 7:17 AM, John Baldwin <[EMAIL PROTECTED]> wrote:
> On Saturday 13 October 2007 11:59:40 am Alson van der Meulen wrote:
> > * Jack Vogel <[EMAIL PROTECTED]> [2007-10-13 05:19]:
> > > A suggestion, take the relevant files from my em driver and put
> > > them back into the kernel tree that was working on 10/1, it should
> > > be compatible. Then see if it breaks that kernel. Or if you'd prefer
> > > I can just email the tar ball for the Intel version of 6.6.6, you can
> > > disable the in-kernel em driver, and build the other and use that
> > > with the 10/1 kernel.
> >
>
> > Then I removed sys/dev/em and copied it from the 10/12 sources. I also
> > copied sys/conf/files, sys/conf/kern.pre.mk and sys/modules/em/Makefile
> > from the 10/12 sources; this should be all of the 6.6.6 merge. Compiled
> > with same config, booted with MSI/MSIX enabled. Surprisingly, this
> > kernel behaved differently from the 10/10 and 10/12 kernels. It booted OK
> > without any major errors, only a few watchdog timeouts and link down/ups
> > on em0. It was very slow though. Top showed 60% interrupt.
> >
> > vmstat -i:
> > interrupt                  total    rate
> > irq4: sio0                  3563       8
> > irq6: fdc0                     1       0
> > irq14: ata0                   58       0
> > irq16: fxp0             32076072   79593
> > irq21: atapci1+            24300      60
> > cpu0: timer               793477    1968
> > Total                   32897471   81631
> >
> > There wasn't much traffic on fxp0 (only ssh, ping and ntp). According to
> > dmesg, em0 used the same IRQ as fxp0, except it should be using MSI:
> > em0:  port 
> > 0xdf00-0xdf1f mem 0xfdde-0xfddf,0xfddc-0xfddd irq 16 at 
> > device 0.0 on pci5
> > em0: Reserved 0x2 bytes for rid 0x10 type 3 at 0xfdde
> > em0: attempting to allocate 1 MSI vectors (1 supported)
> > msi: routing MSI IRQ 256 to vector 56
> > em0: using IRQ 256 for MSI
> > em0: bpf attached
> > em0: Ethernet address: 00:15:17:19:59:e4
> > em0: [FAST]
> >
> > It appears that em0 still generates interrupts on irq16, even though it
> > should be using MSI.
>
> This was due to a bug with rman_set_rid() not getting used in 6-stable that
> broke the most recent MSI MFC.  The rman thing was fixed on 10/3, so MSI
> is not expected to work on 6-stable kernels between 8/15 and 10/3.
>
> Are you still having problems with em + MSI?

I got diverted with this S7000 bringup and haven't looked since that
last point in the thread. I am working in parallel on splitting the
E1000 driver and may be able to check this in a bit.

Jack
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Unable to boot recent -stable with MSI/MSIX enabled

2007-11-15 Thread John Baldwin
On Saturday 13 October 2007 11:59:40 am Alson van der Meulen wrote:
> * Jack Vogel <[EMAIL PROTECTED]> [2007-10-13 05:19]:
> > A suggestion, take the relevant files from my em driver and put
> > them back into the kernel tree that was working on 10/1, it should
> > be compatible. Then see if it breaks that kernel. Or if you'd prefer
> > I can just email the tar ball for the Intel version of 6.6.6, you can
> > disable the in-kernel em driver, and build the other and use that
> > with the 10/1 kernel.
> 
> Then I removed sys/dev/em and copied it from the 10/12 sources. I also
> copied sys/conf/files, sys/conf/kern.pre.mk and sys/modules/em/Makefile
> from the 10/12 sources; this should be all of the 6.6.6 merge. Compiled
> with same config, booted with MSI/MSIX enabled. Surprisingly, this
> kernel behaved differently from the 10/10 and 10/12 kernels. It booted OK
> without any major errors, only a few watchdog timeouts and link down/ups
> on em0. It was very slow though. Top showed 60% interrupt.
> 
> vmstat -i:
> interrupt                  total    rate
> irq4: sio0                  3563       8
> irq6: fdc0                     1       0
> irq14: ata0                   58       0
> irq16: fxp0             32076072   79593
> irq21: atapci1+            24300      60
> cpu0: timer               793477    1968
> Total                   32897471   81631
> 
> There wasn't much traffic on fxp0 (only ssh, ping and ntp). According to
> dmesg, em0 used the same IRQ as fxp0, except it should be using MSI:
> em0:  port 
> 0xdf00-0xdf1f mem 0xfdde-0xfddf,0xfddc-0xfddd irq 16 at 
> device 0.0 on pci5
> em0: Reserved 0x2 bytes for rid 0x10 type 3 at 0xfdde
> em0: attempting to allocate 1 MSI vectors (1 supported)
> msi: routing MSI IRQ 256 to vector 56
> em0: using IRQ 256 for MSI
> em0: bpf attached
> em0: Ethernet address: 00:15:17:19:59:e4
> em0: [FAST]
> 
> It appears that em0 still generates interrupts on irq16, even though it
> should be using MSI.

This was due to a bug with rman_set_rid() not getting used in 6-stable that
broke the most recent MSI MFC.  The rman thing was fixed on 10/3, so MSI
is not expected to work on 6-stable kernels between 8/15 and 10/3.

Are you still having problems with em + MSI?

-- 
John Baldwin
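
For anyone checking which source dates fall in the window described above, here is a minimal sketch. It assumes that "8/15" and "10/3" both refer to 2007 and that the fix date itself is outside the broken window; neither is stated explicitly in the message, and the function name is made up for illustration.

```python
from datetime import date

# Assumption: both dates are in 2007 (the message gives only month/day).
BROKEN_FROM = date(2007, 8, 15)
FIXED_ON = date(2007, 10, 3)

def msi_expected_broken(src_date):
    """True if a 6-stable checkout from src_date falls in the window
    where MSI is not expected to work (start inclusive, fix date exclusive)."""
    return BROKEN_FROM <= src_date < FIXED_ON

for d in (date(2007, 8, 1), date(2007, 10, 1), date(2007, 10, 12)):
    print(d.isoformat(), msi_expected_broken(d))
```

This only encodes the date range from the message; it says nothing about whether a particular driver in that tree actually requests MSI.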


Re: Unable to boot recent -stable with MSI/MSIX enabled

2007-10-13 Thread Alson van der Meulen
* Jack Vogel <[EMAIL PROTECTED]> [2007-10-13 05:19]:
> A suggestion, take the relevant files from my em driver and put
> them back into the kernel tree that was working on 10/1, it should
> be compatible. Then see if it breaks that kernel. Or if you'd prefer
> I can just email the tar ball for the Intel version of 6.6.6, you can
> disable the in-kernel em driver, and build the other and use that
> with the 10/1 kernel.

10/12 kernel (MSI enabled) with cable to em unplugged worked. As soon as
I plugged in the cable, I got these messages:
em0: Link is up 1000 Mbps Full Duplex
em0: link state changed to UP
em0: watchdog timeout -- resetting
em0: Link is Down
em0: link state changed to DOWN
em0: Link is up 1000 Mbps Full Duplex
em0: link state changed to UP
I don't think the watchdog timeout is normal, no other errors appeared
on the console.

The system no longer responded on any of the network interfaces, and it
couldn't properly read its disks either: exec format error on almost all
binaries, kill 1 gave a bunch of not found errors, on subsequent calls
the shell tried to execute kill as a shell script, had to use the
hardware reset button.

I did a checkout of src/sys (RELENG_6 of 2007-10-01 14:00 CEST). First
compiled a kernel from these sources without modifications, em version
6.2.9. This worked as expected with hw.pci.enable_msi=1 and
hw.pci.enable_msix=1, probably because this driver doesn't support MSI.

Then I removed sys/dev/em and copied it from the 10/12 sources. I also
copied sys/conf/files, sys/conf/kern.pre.mk and sys/modules/em/Makefile
from the 10/12 sources; this should be all of the 6.6.6 merge. Compiled
with same config, booted with MSI/MSIX enabled. Surprisingly, this
kernel behaved differently from the 10/10 and 10/12 kernels. It booted OK
without any major errors, only a few watchdog timeouts and link down/ups
on em0. It was very slow though. Top showed 60% interrupt.

vmstat -i:
interrupt                  total    rate
irq4: sio0                  3563       8
irq6: fdc0                     1       0
irq14: ata0                   58       0
irq16: fxp0             32076072   79593
irq21: atapci1+            24300      60
cpu0: timer               793477    1968
Total                   32897471   81631

There wasn't much traffic on fxp0 (only ssh, ping and ntp). According to
dmesg, em0 used the same IRQ as fxp0, except it should be using MSI:
em0:  port 0xdf00-0xdf1f 
mem 0xfdde-0xfddf,0xfddc-0xfddd irq 16 at device 0.0 on pci5
em0: Reserved 0x2 bytes for rid 0x10 type 3 at 0xfdde
em0: attempting to allocate 1 MSI vectors (1 supported)
msi: routing MSI IRQ 256 to vector 56
em0: using IRQ 256 for MSI
em0: bpf attached
em0: Ethernet address: 00:15:17:19:59:e4
em0: [FAST]

It appears that em0 still generates interrupts on irq16, even though it
should be using MSI.

vmstat -i with MSI disabled:
interrupt                  total    rate
irq4: sio0                   606       5
irq6: fdc0                     1       0
irq14: ata0                   58       0
irq16: em0 fxp0              971       8
irq21: atapci1+            38127     320
cpu0: timer               231905    1948
Total                     271668    2282

regards,
Alson
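
The comparison of the two vmstat -i listings above can also be done mechanically. What follows is only a rough sketch: it assumes each data row ends in two integers (total and rate), and the 10000 interrupts/s threshold is an arbitrary value picked for illustration, not anything FreeBSD defines.

```python
import re

def parse_vmstat_i(text):
    """Parse `vmstat -i` style output into {source: (total, rate)}."""
    rows = {}
    for line in text.splitlines():
        # A data row is a name followed by two integers; the header has none.
        m = re.match(r"^(.*\S)\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            rows[m.group(1).strip()] = (int(m.group(2)), int(m.group(3)))
    return rows

def noisy_sources(rows, threshold=10000):
    """Sources whose rate exceeds `threshold` (hypothetical cutoff), ignoring Total."""
    return sorted(name for name, (_, rate) in rows.items()
                  if name != "Total" and rate > threshold)

SAMPLE = """\
interrupt                  total    rate
irq4: sio0                  3563       8
irq16: fxp0             32076072   79593
cpu0: timer               793477    1968
Total                   32897471   81631
"""

rows = parse_vmstat_i(SAMPLE)
print(noisy_sources(rows))
```

On the sample rows taken from the MSI-enabled listing, only irq16 crosses the cutoff, which matches the observation that the legacy IRQ is still firing.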


Re: Unable to boot recent -stable with MSI/MSIX enabled

2007-10-12 Thread Jack Vogel
A suggestion, take the relevant files from my em driver and put
them back into the kernel tree that was working on 10/1, it should
be compatible. Then see if it breaks that kernel. Or if you'd prefer
I can just email the tar ball for the Intel version of 6.6.6, you can
disable the in-kernel em driver, and build the other and use that
with the 10/1 kernel.

Let me know what you'd prefer.

Jack


On 10/12/07, Alson van der Meulen <[EMAIL PROTECTED]> wrote:
> * Jack Vogel <[EMAIL PROTECTED]> [2007-10-13 01:30]:
> > Hmmm, so am I correct in understanding that this root is remote, so it's
> > really coming in over the em driver?
>
> No, the root is local: gmirror of two SATA disks on ATA (AHCI)
> controller, this host has no remote filesystems. em is not needed for
> mounting the root fs. I'm not 100% sure if em is to blame, but:
> - The em merge is the only remotely related commit to RELENG_6 that I
>   could find between October 1 and October 10.
> - Disabling MSI/MSIX fixes it, and em is the only MSI user as far as I
>   can see in the dmesg.
>
> It's possible that the use of MSI by em triggers a bug in the PCI/ATA
> driver. It's even possible that the chipset has broken MSI support (see
> previous mail for dmesgs).
>
> Friday morning (local time, CEST), it did boot up with the new kernel
> and mounted its root FS successfully, but when I attempted to log in a
> few hours later, none of the network interfaces (em and fxp) worked. fxp
> is not even on a PCIe link, but a PCI card, so it appears to break
> any PCI/PCIe device. Logging in via the console gave this error:
> getty[1709]: /usr/bin/login: Exec format error
> Probably because it couldn't properly access /usr (which is on ATA
> disks) anymore.
>
> The system appears to have worked initially, but started to fail when my
> workstation, which is directly connected to the em interface, was turned
> on. I also saw a watchdog timeout on the em interface about ten minutes
> after the link went up. After my workstation was turned on this box lost
> all network connections. Unplugging the cable to the em interface might
> prevent the problem from occurring, which would also point at the em driver
> as the trigger. I'll try to verify this.
>
> Below is a list of files in /usr/src/sys changed since the last working
> kernel of 2007-10-01. I don't see any PCI changes relevant to amd64, so
> it appears to be at least triggered by the em driver.
>
> regards,
> Alson
>
> ./alpha/isa/isa.c
> ./alpha/pci/apecs_pci.c
> ./alpha/pci/lca_pci.c
> ./alpha/pci/pcibus.c
> ./amd64/acpica/madt.c
> ./amd64/amd64/local_apic.c
> ./amd64/amd64/mp_machdep.c
> ./amd64/amd64/mptable.c
> ./amd64/amd64/nexus.c
> ./amd64/conf/NOTES
> ./amd64/include/apicvar.h
> ./arm/arm/nexus.c
> ./arm/xscale/i80321/i80321_pci.c
> ./arm/xscale/i80321/obio.c
> ./compat/ia32/ia32_sysvec.c
> ./conf/files
> ./conf/files.amd64
> ./conf/files.i386
> ./conf/kern.pre.mk
> ./dev/em/LICENSE
> ./dev/em/if_em.c
> ./dev/em/if_em.h
> ./dev/em/e1000_80003es2lan.c
> ./dev/em/e1000_80003es2lan.h
> ./dev/em/e1000_82540.c
> ./dev/em/e1000_82541.c
> ./dev/em/e1000_82541.h
> ./dev/em/e1000_82542.c
> ./dev/em/e1000_82543.c
> ./dev/em/e1000_82543.h
> ./dev/em/e1000_82571.c
> ./dev/em/e1000_82571.h
> ./dev/em/e1000_82575.c
> ./dev/em/e1000_82575.h
> ./dev/em/e1000_api.c
> ./dev/em/e1000_api.h
> ./dev/em/e1000_defines.h
> ./dev/em/e1000_hw.h
> ./dev/em/e1000_ich8lan.c
> ./dev/em/e1000_ich8lan.h
> ./dev/em/e1000_mac.c
> ./dev/em/e1000_mac.h
> ./dev/em/e1000_manage.c
> ./dev/em/e1000_manage.h
> ./dev/em/e1000_nvm.c
> ./dev/em/e1000_nvm.h
> ./dev/em/e1000_osdep.h
> ./dev/em/e1000_phy.c
> ./dev/em/e1000_phy.h
> ./dev/em/e1000_regs.h
> ./dev/re/if_re.c
> ./dev/mxge/eth_z8e.h
> ./dev/mxge/ethp_z8e.h
> ./dev/mxge/if_mxge.c
> ./dev/mxge/if_mxge_var.h
> ./dev/mxge/mcp_gen_header.h
> ./dev/mxge/mxge_lro.c
> ./dev/mxge/mxge_mcp.h
> ./dev/mxge/mxge_eth_z8e.c
> ./dev/mxge/mxge_ethp_z8e.c
> ./fs/devfs/devfs_vnops.c
> ./fs/fifofs/fifo_vnops.c
> ./i386/acpica/madt.c
> ./i386/conf/NOTES
> ./i386/i386/local_apic.c
> ./i386/i386/mp_machdep.c
> ./i386/i386/mptable.c
> ./i386/i386/nexus.c
> ./i386/include/apicvar.h
> ./ia64/ia64/nexus.c
> ./kern/uipc_usrreq.c
> ./kern/vfs_vnops.c
> ./modules/acpi/Makefile
> ./modules/em/Makefile
> ./modules/mxge/mxge_eth_z8e/Makefile
> ./modules/mxge/mxge_ethp_z8e/Makefile
> ./net/if_bridge.c
> ./netgraph/ng_l2tp.c
> ./opencrypto/cryptodev.c
> ./powerpc/powermac/grackle.c
> ./powerpc/powermac/hrowpic.c
> ./powerpc/powermac/macio.c
> ./powerpc/powermac/uninorth.c
> ./powerpc/powerpc/openpic.c
> ./powerpc/psim/iobus.c
> ./sparc64/ebus/ebus.c
> ./sparc64/isa/ofw_isa.c
> ./sparc64/pci/apb.c
> ./sparc64/pci/ofw_pci.c
> ./sparc64/pci/ofw_pcib_subr.c
> ./sparc64/pci/ofw_pcibus.c
> ./sparc64/pci/psycho.c
> ./sparc64/sbus/sbus.c
> ./sparc64/sparc64/nexus.c
> ./vm/vnode_pager.c

Re: Unable to boot recent -stable with MSI/MSIX enabled

2007-10-12 Thread Alson van der Meulen
* Jack Vogel <[EMAIL PROTECTED]> [2007-10-13 01:30]:
> Hmmm, so am I correct in understanding that this root is remote, so it's
> really coming in over the em driver?

No, the root is local: gmirror of two SATA disks on ATA (AHCI)
controller, this host has no remote filesystems. em is not needed for
mounting the root fs. I'm not 100% sure if em is to blame, but:
- The em merge is the only remotely related commit to RELENG_6 that I
  could find between October 1 and October 10.
- Disabling MSI/MSIX fixes it, and em is the only MSI user as far as I
  can see in the dmesg.

It's possible that the use of MSI by em triggers a bug in the PCI/ATA
driver. It's even possible that the chipset has broken MSI support (see
previous mail for dmesgs).

Friday morning (local time, CEST), it did boot up with the new kernel
and mounted its root FS successfully, but when I attempted to log in a
few hours later, none of the network interfaces (em and fxp) worked. fxp
is not even on a PCIe link, but a PCI card, so it appears to break
any PCI/PCIe device. Logging in via the console gave this error:
getty[1709]: /usr/bin/login: Exec format error
Probably because it couldn't properly access /usr (which is on ATA
disks) anymore.

The system appears to have worked initially, but started to fail when my
workstation, which is directly connected to the em interface, was turned
on. I also saw a watchdog timeout on the em interface about ten minutes
after the link went up. After my workstation was turned on this box lost
all network connections. Unplugging the cable to the em interface might
prevent the problem from occurring, which would also point at the em driver
as the trigger. I'll try to verify this.

Below is a list of files in /usr/src/sys changed since the last working
kernel of 2007-10-01. I don't see any PCI changes relevant to amd64, so
it appears to be at least triggered by the em driver.

regards,
Alson

./alpha/isa/isa.c
./alpha/pci/apecs_pci.c
./alpha/pci/lca_pci.c
./alpha/pci/pcibus.c
./amd64/acpica/madt.c
./amd64/amd64/local_apic.c
./amd64/amd64/mp_machdep.c
./amd64/amd64/mptable.c
./amd64/amd64/nexus.c
./amd64/conf/NOTES
./amd64/include/apicvar.h
./arm/arm/nexus.c
./arm/xscale/i80321/i80321_pci.c
./arm/xscale/i80321/obio.c
./compat/ia32/ia32_sysvec.c
./conf/files
./conf/files.amd64
./conf/files.i386
./conf/kern.pre.mk
./dev/em/LICENSE
./dev/em/if_em.c
./dev/em/if_em.h
./dev/em/e1000_80003es2lan.c
./dev/em/e1000_80003es2lan.h
./dev/em/e1000_82540.c
./dev/em/e1000_82541.c
./dev/em/e1000_82541.h
./dev/em/e1000_82542.c
./dev/em/e1000_82543.c
./dev/em/e1000_82543.h
./dev/em/e1000_82571.c
./dev/em/e1000_82571.h
./dev/em/e1000_82575.c
./dev/em/e1000_82575.h
./dev/em/e1000_api.c
./dev/em/e1000_api.h
./dev/em/e1000_defines.h
./dev/em/e1000_hw.h
./dev/em/e1000_ich8lan.c
./dev/em/e1000_ich8lan.h
./dev/em/e1000_mac.c
./dev/em/e1000_mac.h
./dev/em/e1000_manage.c
./dev/em/e1000_manage.h
./dev/em/e1000_nvm.c
./dev/em/e1000_nvm.h
./dev/em/e1000_osdep.h
./dev/em/e1000_phy.c
./dev/em/e1000_phy.h
./dev/em/e1000_regs.h
./dev/re/if_re.c
./dev/mxge/eth_z8e.h
./dev/mxge/ethp_z8e.h
./dev/mxge/if_mxge.c
./dev/mxge/if_mxge_var.h
./dev/mxge/mcp_gen_header.h
./dev/mxge/mxge_lro.c
./dev/mxge/mxge_mcp.h
./dev/mxge/mxge_eth_z8e.c
./dev/mxge/mxge_ethp_z8e.c
./fs/devfs/devfs_vnops.c
./fs/fifofs/fifo_vnops.c
./i386/acpica/madt.c
./i386/conf/NOTES
./i386/i386/local_apic.c
./i386/i386/mp_machdep.c
./i386/i386/mptable.c
./i386/i386/nexus.c
./i386/include/apicvar.h
./ia64/ia64/nexus.c
./kern/uipc_usrreq.c
./kern/vfs_vnops.c
./modules/acpi/Makefile
./modules/em/Makefile
./modules/mxge/mxge_eth_z8e/Makefile
./modules/mxge/mxge_ethp_z8e/Makefile
./net/if_bridge.c
./netgraph/ng_l2tp.c
./opencrypto/cryptodev.c
./powerpc/powermac/grackle.c
./powerpc/powermac/hrowpic.c
./powerpc/powermac/macio.c
./powerpc/powermac/uninorth.c
./powerpc/powerpc/openpic.c
./powerpc/psim/iobus.c
./sparc64/ebus/ebus.c
./sparc64/isa/ofw_isa.c
./sparc64/pci/apb.c
./sparc64/pci/ofw_pci.c
./sparc64/pci/ofw_pcib_subr.c
./sparc64/pci/ofw_pcibus.c
./sparc64/pci/psycho.c
./sparc64/sbus/sbus.c
./sparc64/sparc64/nexus.c
./vm/vnode_pager.c
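
To narrow a changed-file list like the one above down to what could matter on an amd64 box, a small filter along these lines might help. It is a sketch only: treating every other architecture directory as irrelevant is an assumption, and the function names are invented for this example.

```python
# Assumption: top-level directories for other architectures cannot affect
# an amd64 kernel build. This may not hold for every shared source file.
OTHER_ARCH = {"alpha", "arm", "i386", "ia64", "powerpc", "sparc64"}

def top_dir(path):
    """Return the first path component, ignoring a leading './'."""
    if path.startswith("./"):
        path = path[2:]
    return path.split("/", 1)[0]

def drop_other_arch(paths, other=OTHER_ARCH):
    """Keep only paths whose top-level directory is not another architecture."""
    return [p for p in paths if top_dir(p) not in other]

changed = [
    "./alpha/pci/pcibus.c",
    "./amd64/amd64/nexus.c",
    "./dev/em/if_em.c",
    "./i386/i386/nexus.c",
    "./kern/vfs_vnops.c",
]
for p in drop_other_arch(changed):
    print(p)
```

The five paths in `changed` are taken from the list above; feeding in the full list works the same way.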


Re: Unable to boot recent -stable with MSI/MSIX enabled

2007-10-12 Thread Jack Vogel
On 10/12/07, Alson van der Meulen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> After upgrading from RELENG_6 of 2007-10-01 to 2007-10-10, one of my
> computers didn't boot anymore. It failed to mount its root device
> because geom_mirror didn't find its volume, and manually specifying one
> of the partitions didn't work either. Same with a kernel from today's
> -stable sources. The kernel is a GENERIC amd64 kernel plus 'device puc'.
>
> It works if I disable MSI/MSIX support by commenting out these lines in
> loader.conf (I added them myself at some point):
> hw.pci.enable_msi=1
> hw.pci.enable_msix=1
>
> The only relevant commit I could find between 2007-10-01 and 2007-10-10
> is the em merge. This box does have an Intel Pro/1000PT (PCIe) NIC. I
> believe that the new MSI support in the em driver upsets the ATA driver.

Hmmm, so am I correct in understanding that this root is remote, so it's
really coming in over the em driver?

I remember someone having problems with remote boot/mounts at some
point, think it was Sam, but we resolved it, and I don't remember what
it was, but it had nothing to do with MSI/X.

This is pretty odd, I've never run into anything like it so far. Anyone else
with ideas or suggestions?

Regards,

Jack
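
For reference, the two loader.conf tunables quoted above can be checked with a few lines of script. This is a minimal sketch under simplifying assumptions: it treats everything after a '#' as a comment (so a '#' inside a quoted value would be mishandled), and it only reports whether a tunable is explicitly set to 1, not what the kernel default is when the line is absent.

```python
MSI_TUNABLES = ("hw.pci.enable_msi", "hw.pci.enable_msix")

def read_tunables(text):
    """Return {name: value} for uncommented name=value lines."""
    tunables = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplified)
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        tunables[name.strip()] = value.strip().strip('"')
    return tunables

def explicitly_enabled(tunables, names=MSI_TUNABLES):
    """Map each tunable name to True only if it is explicitly set to 1."""
    return {name: tunables.get(name) == "1" for name in names}

sample = 'hw.pci.enable_msi=1\n#hw.pci.enable_msix=1\n'
print(explicitly_enabled(read_tunables(sample)))
```

In the sample, the second line is commented out in the same way the original report describes, so only hw.pci.enable_msi is reported as explicitly enabled.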