Re: Strange network problem. Debugging hints needed.
Hi all

I still have this issue with 4.9. Please let me know if I can assist with any additional data. During the last week I hit this problem about 3 times, so I can run tests while the issue is present, but I don't know what I should check.

I can say the problem seems to occur only with vr(4). bge(4) and em(4) work fine with an otherwise identical configuration.

I'd really appreciate some help on this... (This is also PR6546)

Regards
Andri

On 21.01.2011 00:38, Andre Keller wrote:
> Hi there
>
> I have a strange problem with network connectivity on a device of mine. The setup is carp on vlan on vr(4). The problem is that the link runs for 10 minutes, 10 hours or 10 days and suddenly it stops working. Doing an ifconfig vr0 down ; ifconfig vr0 up solves the problem temporarily, but as you can imagine I'd like to have a more permanent solution.
>
> The problem is there are no obvious indications. The port on the switch (C 2960) stays up (and it is not errdisabled), and there are no errors. Configuring the interface to 100 full or autoselect (on both switch and device) does not make a difference. The error counters on the switch as well as netstat -i do not show any errors.
>
> I set up the same configuration (carp on vlan on physical interface) using em(4) and have not run into the problem yet (3 weeks up). So I guess it could have something to do with vr(4).
>
> The problem first appeared after updating 4.8 to the 20101222 snapshot, and is still present with the snapshot from this week. But prior to the 20101222 snapshot there were no carp and vlan interfaces, just an ip on the physical interface. So I don't know if the problem is my configuration or something that has changed in the code...
Re: Strange network problem. Debugging hints needed.
On 2011-01-21, Markus Hennecke <markus-henne...@markus-hennecke.de> wrote:
> There was a well known bug that would cause vr devices to stop receiving and sending traffic if a short cable was used to connect to the switch

you're thinking of sis(4) for the short cable problem.

there was a problem with VT6105M with vr(4), fixed 2009/04/28, where some link-state changes caused the nic to wedge.

there was also a problem with the MCLGETI code (added on 2009/06/18) where high packet rates (or possibly a busy machine) caused the nic to wedge. a fix for this was committed recently (2011/01/13).

if people are still seeing problems with vr(4) wedging on a 4.9-beta kernel they should write up a report with as much detail as possible and submit it as a PR with sendbug(1).
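For anyone putting such a report together, a minimal sketch of a script that captures interface, mbuf and interrupt state while the NIC is wedged might look like the following. The function name collect_diag and the choice of commands are assumptions for illustration, not something prescribed in this thread; adjust the list to whatever the PR needs.

```shell
#!/bin/sh
# Sketch only: collect_diag is a hypothetical helper, not part of OpenBSD.
# It prints the output of a few standard diagnostic commands, each under a
# header line, so the result can be pasted into a sendbug(1) report.

# collect_diag IFACE: dump state for IFACE plus system-wide counters.
collect_diag() {
    iface=$1
    for cmd in "ifconfig $iface" "netstat -in" "netstat -m" "vmstat -i" "dmesg"; do
        echo "=== $cmd ==="
        # $cmd is deliberately unquoted so it splits into command + arguments.
        $cmd 2>&1 || echo "(command failed: $cmd)"
    done
}

# Example (not run here): capture once while wedged, once after recovery.
#   collect_diag vr0 > /tmp/vr0-wedged.txt
#   ifconfig vr0 down; ifconfig vr0 up
#   collect_diag vr0 > /tmp/vr0-recovered.txt
```

Comparing the two captures (for example with diff -u) is one way to spot counters that only move, or stop moving, while the interface is stuck.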
Re: Strange network problem. Debugging hints needed.
On 21.01.2011 10:42, Stuart Henderson wrote:
> On 2011-01-21, Markus Hennecke <markus-henne...@markus-hennecke.de> wrote:
>> There was a well known bug that would cause vr devices to stop receiving and sending traffic if a short cable was used to connect to the switch
>
> you're thinking of sis(4) for the short cable problem.
>
> there was a problem with VT6105M with vr(4) fixed 2009/04/28 where some link-state changes caused the nic to wedge.

Ah, thanks for that information. I meant the latter but did not have all the facts available (I think the first time this got mentioned on the list was in 2007, in the thread "vr driver trouble on Soekris 5501").

> there was also a problem with the MCLGETI code (added on 2009/06/18) where high packet rates (or possibly a busy machine) caused the nic to wedge. a fix for this was committed recently (2011/01/13).

Looks like I am seeing this on my box here from time to time. Thanks for the information.

Kind regards,
Markus
Re: Strange network problem. Debugging hints needed.
On 2011/01/21 11:16, Markus Hennecke wrote:
> On 21.01.2011 10:42, Stuart Henderson wrote:
>> On 2011-01-21, Markus Hennecke <markus-henne...@markus-hennecke.de> wrote:
>>> There was a well known bug that would cause vr devices to stop receiving and sending traffic if a short cable was used to connect to the switch
>>
>> you're thinking of sis(4) for the short cable problem.
>>
>> there was a problem with VT6105M with vr(4) fixed 2009/04/28 where some link-state changes caused the nic to wedge.
>
> Ah, thanks for that information. I meant the latter but did not have all the facts available (I think the first time this got mentioned on the list was in 2007, in the thread "vr driver trouble on Soekris 5501").
>
>> there was also a problem with the MCLGETI code (added on 2009/06/18) where high packet rates (or possibly a busy machine) caused the nic to wedge. a fix for this was committed recently (2011/01/13).
>
> Looks like I am seeing this on my box here from time to time. Thanks for the information.

This diff should apply to -stable.

Index: if_vr.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_vr.c,v
retrieving revision 1.105.2.1
diff -u -p -r1.105.2.1 if_vr.c
--- if_vr.c	2 Oct 2010 03:00:52 -0000	1.105.2.1
+++ if_vr.c	21 Jan 2011 10:28:49 -0000
@@ -1048,15 +1048,11 @@ vr_intr(void *arg)
 	/* Disable interrupts. */
 	CSR_WRITE_2(sc, VR_IMR, 0x0000);
 
-	for (;;) {
-
-		status = CSR_READ_2(sc, VR_ISR);
-		if (status)
-			CSR_WRITE_2(sc, VR_ISR, status);
-
-		if ((status & VR_INTRS) == 0)
-			break;
+	status = CSR_READ_2(sc, VR_ISR);
+	if (status)
+		CSR_WRITE_2(sc, VR_ISR, status);
 
+	if (status & VR_INTRS) {
 		claimed = 1;
 
 		if (status & VR_ISR_RX_OK)
@@ -1092,7 +1088,7 @@ vr_intr(void *arg)
 			    sc->sc_dev.dv_xname);
 			vr_reset(sc);
 			vr_init(sc);
-			break;
+			status = 0;
 		}
 
 		if ((status & VR_ISR_TX_OK) || (status & VR_ISR_TX_ABRT) ||
Strange network problem. Debugging hints needed.
Hi there

I have a strange problem with network connectivity on a device of mine. The setup is carp on vlan on vr(4). The problem is that the link runs for 10 minutes, 10 hours or 10 days and suddenly it stops working. Doing an ifconfig vr0 down ; ifconfig vr0 up solves the problem temporarily, but as you can imagine I'd like to have a more permanent solution.

The problem is there are no obvious indications. The port on the switch (C 2960) stays up (and it is not errdisabled), and there are no errors. Configuring the interface to 100 full or autoselect (on both switch and device) does not make a difference. The error counters on the switch as well as netstat -i do not show any errors.

I set up the same configuration (carp on vlan on physical interface) using em(4) and have not run into the problem yet (3 weeks up). So I guess it could have something to do with vr(4).

The problem first appeared after updating 4.8 to the 20101222 snapshot, and is still present with the snapshot from this week. But prior to the 20101222 snapshot there were no carp and vlan interfaces, just an ip on the physical interface. So I don't know if the problem is my configuration or something that has changed in the code...

dmesg:

OpenBSD 4.9-beta (GENERIC) #628: Tue Jan 18 14:14:07 MST 2011
    t...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Geode(TM) Integrated Processor by AMD PCS (AuthenticAMD 586-class) 499 MHz
cpu0: FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CFLUSH,MMX
real mem = 268009472 (255MB)
avail mem = 253489152 (241MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 11/05/08, BIOS32 rev. 0 @ 0xfd088
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: pcibios_get_intr_routing - function not supported
pcibios0: PCI IRQ Routing information unavailable.
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xe0000/0xa800
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 1 function 0 AMD Geode LX rev 0x33
glxsb0 at pci0 dev 1 function 2 AMD Geode LX Crypto rev 0x00: RNG AES
vr0 at pci0 dev 9 function 0 VIA VT6105M RhineIII rev 0x96: irq 10, address 00:0d:b9:17:c0:60
ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr1 at pci0 dev 10 function 0 VIA VT6105M RhineIII rev 0x96: irq 11, address 00:0d:b9:17:c0:61
ukphy1 at vr1 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr2 at pci0 dev 11 function 0 VIA VT6105M RhineIII rev 0x96: irq 15, address 00:0d:b9:17:c0:62
ukphy2 at vr2 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
glxpcib0 at pci0 dev 15 function 0 AMD CS5536 ISA rev 0x03: rev 3, 32-bit 3579545Hz timer, watchdog, gpio
gpio0 at glxpcib0: 32 pins
pciide0 at pci0 dev 15 function 2 AMD CS5536 IDE rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: CF 4GB
wd0: 1-sector PIO, LBA, 3823MB, 7831152 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 ignored (disabled)
ohci0 at pci0 dev 15 function 4 AMD CS5536 USB rev 0x02: irq 12, version 1.0, legacy support
ehci0 at pci0 dev 15 function 5 AMD CS5536 USB rev 0x02: irq 12
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 AMD EHCI root hub rev 2.00/1.00 addr 1
isa0 at glxpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 AMD OHCI root hub rev 1.00/1.00 addr 1
biomask 73e7 netmask ffe7 ttymask ffff
mtrr: K6-family MTRR support (2 registers)
nvram: invalid checksum
vscsi0 at root
scsibus0 at vscsi0: 256 targets
softraid0 at root
root on wd0a swap on wd0b dump on wd0b
clock: unknown CMOS layout

ifconfig (first two octets / words of ip exchanged):

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33200
        priority: 0
        groups: lo
        inet 127.0.0.1 netmask 0xff000000
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0d:b9:17:c0:60
        priority: 0
        media: Ethernet 100baseTX full-duplex
        status: active
        inet6 fe80::20d:b9ff:fe17:c060%vr0 prefixlen 64 scopeid 0x1
vr1: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0d:b9:17:c0:61
        priority: 0
        media: Ethernet 100baseTX full-duplex
        status: active
        inet6 fe80::20d:b9ff:fe17:c061%vr1 prefixlen 64 scopeid 0x2
vr2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        lladdr 00:0d:b9:17:c0:62
        priority: 0
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet 10.11.255.253 netmask 0xfffffffc broadcast 10.11.255.255
        inet6 fe80::20d:b9ff:fe17:c062%vr2 prefixlen 64
Re: Strange network problem. Debugging hints needed.
On 21.01.2011 00:38, Andre Keller wrote:
> I have a strange problem with network connectivity on a device of mine. The setup is carp on vlan on vr(4). The problem is that the link runs for 10 minutes, 10 hours or 10 days and suddenly it stops working. Doing an ifconfig vr0 down ; ifconfig vr0 up solves the problem temporarily, but as you can imagine I'd like to have a more permanent solution.
>
> The problem is there are no obvious indications. The port on the switch (C 2960) stays up (and it is not errdisabled), and there are no errors. Configuring the interface to 100 full or autoselect (on both switch and device) does not make a difference. The error counters on the switch as well as netstat -i do not show any errors.
>
> I set up the same configuration (carp on vlan on physical interface) using em(4) and have not run into the problem yet (3 weeks up). So I guess it could have something to do with vr(4).
>
> The problem first appeared after updating 4.8 to the 20101222 snapshot, and is still present with the snapshot from this week. But prior to the 20101222 snapshot there were no carp and vlan interfaces, just an ip on the physical interface. So I don't know if the problem is my configuration or something that has changed in the code...

There was a well known bug that would cause vr devices to stop receiving and sending traffic if a short cable was used to connect to the switch or other device (like a DSL modem). I really don't know if this was fixed at some time, but I think I still saw this bug on my router running 4.8-stable (a soekris 5501).

I dealt with it by monitoring the pppoe0 interface and, via a script, doing the ifconfig down / up of the vr interface used by pppoe0 as soon as the pppoe connection is lost.

As I worked around the problem, I can't offer you a solution. Perhaps this information gives a hint to someone who can solve this.

Kind regards
Markus
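A workaround script of the kind described above could be sketched roughly as follows. This is not Markus's actual script: the ping-based health check, the function names (probe, bounce, check_once) and the address 192.0.2.1 are assumptions for illustration, whereas his version watched the pppoe0 interface instead.

```shell
#!/bin/sh
# Sketch of a wedge watchdog. All names and values here are placeholders;
# only the "ifconfig IFACE down ; ifconfig IFACE up" step comes from the thread.

# probe HOST: succeed if HOST answers a single ping within about 2 seconds.
probe() {
    ping -c 1 -w 2 "$1" >/dev/null 2>&1
}

# bounce IFACE: take the interface down and bring it back up.
bounce() {
    ifconfig "$1" down
    ifconfig "$1" up
}

# check_once HOST IFACE: bounce IFACE only if HOST is unreachable.
# Prints what it decided so the result can be logged.
check_once() {
    if probe "$1"; then
        echo "ok $1"
    else
        echo "bounce $2"
        bounce "$2"
    fi
}

# Example loop (not run here), checking a next-hop address every 30 seconds:
#   while :; do check_once 192.0.2.1 vr0 | logger -t vr-watchdog; sleep 30; done
```

This only hides the symptom. A single lost ping will also trigger a bounce, so a real script would probably want to require several consecutive failures before touching the interface.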