Re: Strange network problem. Debugging hints needed.

2011-02-08 Thread Andre Keller

Hi all

I still have this issue with 4.9. Please let me know if I can assist
with any additional data. During the last week I had this problem about
3 times, so I can run tests while the issue is present, but I don't know
what to check.

I can say the problem seems to occur only with vr(4). bge(4) and em(4)
work fine with an otherwise identical configuration.

I'd really appreciate some help on this... (This is also PR6546.)


Regards Andri


On 21.01.2011 00:38, Andre Keller wrote:
> Hi there
>
> I have a strange problem with network connectivity on a device of mine.
>
> The setup is carp on vlan on vr(4).
>
> The problem is that the link runs for 10 minutes, 10 hours or 10 days and
> then suddenly stops working. Running ifconfig vr0 down ; ifconfig vr0 up
> solves the problem temporarily, but as you can imagine I'd like to have a
> more permanent solution.
>
> The problem is there are no obvious indications. The port on the switch
> (C 2960) stays up (and it is not errdisabled), there are no errors.
> Configuring the interface 100 full or autoselect (on both switch and
> device) does not make a difference. The error counters on the switch as
> well as netstat -i do not show any errors.
>
> I set up the same configuration (carp on vlan on physical interface)
> using em(4) and have not run into the problem yet (3 weeks up). So I
> guess it could have something to do with vr(4).
>
> The problem first appeared after updating 4.8 to the 20101222 snapshot,
> and is still present with the snapshot from this week. But prior to the
> 20101222 snapshot there were no carp and vlan interfaces, just an IP on
> the physical interface. So I don't know if the problem is my
> configuration or something that has changed in the code...
>

Re: Strange network problem. Debugging hints needed.

2011-01-21 Thread Stuart Henderson
On 2011-01-21, Markus Hennecke <markus-henne...@markus-hennecke.de> wrote:

> There was a well known bug that would cause vr devices to stop receiving
> and sending traffic if a short cable was used to connect to the switch

you're thinking of sis(4) for the short cable problem.

there was a problem with VT6105M with vr(4) fixed 2009/04/28
where some link-state changes caused the nic to wedge.

there was also a problem with the MCLGETI code (added on 2009/06/18)
where high packet rates (or possibly a busy machine) caused the
nic to wedge. a fix for this was committed recently (2011/01/13).

if people are still seeing problems with vr(4) wedging on a
4.9-beta kernel they should write up a report with as much detail
as possible and submit it as a PR with sendbug(1).
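
For anyone wondering what "as much detail as possible" might include, here is a minimal sketch of a collection script that could be run while the interface is wedged, before bouncing it. The choice of commands is an assumption on my part, not something requested in this thread: `netstat -m` because the MCLGETI problem concerns mbuf clusters, `vmstat -i` to see whether vr interrupts are still being counted. The helper names `run_cmd` and `collect` and the `IFACE` variable are made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: snapshot system state while the NIC is wedged.
# IFACE, run_cmd and collect are illustrative names, not from the thread.

IFACE="${IFACE:-vr0}"

# Print a header, run the command, and keep going if it fails or is missing.
run_cmd() {
	echo "=== $* ==="
	"$@" 2>&1 || echo "(failed or unavailable: $1)"
	echo
}

# Gather the output that is most likely to be asked for in a PR.
collect() {
	run_cmd date
	run_cmd sysctl -n kern.version
	run_cmd ifconfig "$IFACE"
	run_cmd netstat -ni
	run_cmd netstat -m
	run_cmd vmstat -i
	run_cmd dmesg
}

# Only collect when invoked with "collect" as the first argument,
# so the file can also be sourced without side effects.
if [ "${1:-}" = "collect" ]; then
	collect
fi
```

Saved under a name of your choosing, it could be invoked as `sh ./vr-report.sh collect > /tmp/vr0-wedged.txt` and the resulting file attached to the sendbug(1) report. Running it once while the link works and once while it is wedged gives two files to compare.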



Re: Strange network problem. Debugging hints needed.

2011-01-21 Thread Markus Hennecke

On 21.01.2011 10:42, Stuart Henderson wrote:
> On 2011-01-21, Markus Hennecke <markus-henne...@markus-hennecke.de> wrote:
>
>> There was a well known bug that would cause vr devices to stop receiving
>> and sending traffic if a short cable was used to connect to the switch
>
> you're thinking of sis(4) for the short cable problem.
>
> there was a problem with VT6105M with vr(4) fixed 2009/04/28
> where some link-state changes caused the nic to wedge.

Ah, thanks for that information. I meant the latter but did not have all
the facts available (I think the first time this got mentioned on the
list was in 2007 in the thread "vr driver trouble on Soekris 5501").

> there was also a problem with the MCLGETI code (added on 2009/06/18)
> where high packet rates (or possibly a busy machine) caused the
> nic to wedge. a fix for this was committed recently (2011/01/13).

Looks like I am seeing this on my box here from time to time. Thanks for
the information.


Kind regards,
  Markus



Re: Strange network problem. Debugging hints needed.

2011-01-21 Thread Stuart Henderson
On 2011/01/21 11:16, Markus Hennecke wrote:
> On 21.01.2011 10:42, Stuart Henderson wrote:
>> On 2011-01-21, Markus Hennecke <markus-henne...@markus-hennecke.de> wrote:
>>
>>> There was a well known bug that would cause vr devices to stop receiving
>>> and sending traffic if a short cable was used to connect to the switch
>>
>> you're thinking of sis(4) for the short cable problem.
>>
>> there was a problem with VT6105M with vr(4) fixed 2009/04/28
>> where some link-state changes caused the nic to wedge.
>
> Ah, thanks for that information. I meant the latter but did not have
> all the facts available (I think the first time this got mentioned
> on the list was in 2007 in the thread "vr driver trouble on Soekris
> 5501").
>
>> there was also a problem with the MCLGETI code (added on 2009/06/18)
>> where high packet rates (or possibly a busy machine) caused the
>> nic to wedge. a fix for this was committed recently (2011/01/13).
>
> Looks like I am seeing this on my box here from time to time. Thanks
> for the information.

This diff should apply to -stable.

Index: if_vr.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_vr.c,v
retrieving revision 1.105.2.1
diff -u -p -r1.105.2.1 if_vr.c
--- if_vr.c	2 Oct 2010 03:00:52 -0000	1.105.2.1
+++ if_vr.c	21 Jan 2011 10:28:49 -0000
@@ -1048,15 +1048,11 @@ vr_intr(void *arg)
 	/* Disable interrupts. */
 	CSR_WRITE_2(sc, VR_IMR, 0x0000);
 
-	for (;;) {
-
-		status = CSR_READ_2(sc, VR_ISR);
-		if (status)
-			CSR_WRITE_2(sc, VR_ISR, status);
-
-		if ((status & VR_INTRS) == 0)
-			break;
+	status = CSR_READ_2(sc, VR_ISR);
+	if (status)
+		CSR_WRITE_2(sc, VR_ISR, status);
 
+	if (status & VR_INTRS) {
 		claimed = 1;
 
 		if (status & VR_ISR_RX_OK)
@@ -1092,7 +1088,7 @@ vr_intr(void *arg)
 			    sc->sc_dev.dv_xname);
 			vr_reset(sc);
 			vr_init(sc);
-			break;
+			status = 0;
 		}
 
 		if ((status & VR_ISR_TX_OK) || (status & VR_ISR_TX_ABRT) ||



Strange network problem. Debugging hints needed.

2011-01-20 Thread Andre Keller
Hi there

I have a strange problem with network connectivity on a device of mine.

The setup is carp on vlan on vr(4).

The problem is that the link runs for 10 minutes, 10 hours or 10 days and
then suddenly stops working. Running ifconfig vr0 down ; ifconfig vr0 up
solves the problem temporarily, but as you can imagine I'd like to have a
more permanent solution.

The problem is there are no obvious indications. The port on the switch
(C 2960) stays up (and it is not errdisabled), there are no errors.
Configuring the interface 100 full or autoselect (on both switch and
device) does not make a difference. The error counters on the switch as
well as netstat -i do not show any errors.

I set up the same configuration (carp on vlan on physical interface)
using em(4) and have not run into the problem yet (3 weeks up). So I
guess it could have something to do with vr(4).

The problem first appeared after updating 4.8 to the 20101222 snapshot,
and is still present with the snapshot from this week. But prior to the
20101222 snapshot there were no carp and vlan interfaces, just an IP on
the physical interface. So I don't know if the problem is my
configuration or something that has changed in the code...

dmesg:
OpenBSD 4.9-beta (GENERIC) #628: Tue Jan 18 14:14:07 MST 2011
t...@i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Geode(TM) Integrated Processor by AMD PCS (AuthenticAMD
586-class) 499 MHz
cpu0: FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CFLUSH,MMX
real mem  = 268009472 (255MB)
avail mem = 253489152 (241MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 11/05/08, BIOS32 rev. 0 @ 0xfd088
pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
pcibios0: pcibios_get_intr_routing - function not supported
pcibios0: PCI IRQ Routing information unavailable.
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xe0000/0xa800
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 1 function 0 AMD Geode LX rev 0x33
glxsb0 at pci0 dev 1 function 2 AMD Geode LX Crypto rev 0x00: RNG AES
vr0 at pci0 dev 9 function 0 VIA VT6105M RhineIII rev 0x96: irq 10,
address 00:0d:b9:17:c0:60
ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI
0x004063, model 0x0034
vr1 at pci0 dev 10 function 0 VIA VT6105M RhineIII rev 0x96: irq 11,
address 00:0d:b9:17:c0:61
ukphy1 at vr1 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI
0x004063, model 0x0034
vr2 at pci0 dev 11 function 0 VIA VT6105M RhineIII rev 0x96: irq 15,
address 00:0d:b9:17:c0:62
ukphy2 at vr2 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI
0x004063, model 0x0034
glxpcib0 at pci0 dev 15 function 0 AMD CS5536 ISA rev 0x03: rev 3,
32-bit 3579545Hz timer, watchdog, gpio
gpio0 at glxpcib0: 32 pins
pciide0 at pci0 dev 15 function 2 AMD CS5536 IDE rev 0x01: DMA,
channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: CF 4GB
wd0: 1-sector PIO, LBA, 3823MB, 7831152 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 ignored (disabled)
ohci0 at pci0 dev 15 function 4 AMD CS5536 USB rev 0x02: irq 12,
version 1.0, legacy support
ehci0 at pci0 dev 15 function 5 AMD CS5536 USB rev 0x02: irq 12
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 AMD EHCI root hub rev 2.00/1.00 addr 1
isa0 at glxpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 AMD OHCI root hub rev 1.00/1.00 addr 1
biomask 73e7 netmask ffe7 ttymask ffff
mtrr: K6-family MTRR support (2 registers)
nvram: invalid checksum
vscsi0 at root
scsibus0 at vscsi0: 256 targets
softraid0 at root
root on wd0a swap on wd0b dump on wd0b
clock: unknown CMOS layout


ifconfig (first two octets / words of ip exchanged):
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33200
	priority: 0
	groups: lo
	inet 127.0.0.1 netmask 0xff000000
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
vr0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	lladdr 00:0d:b9:17:c0:60
	priority: 0
	media: Ethernet 100baseTX full-duplex
	status: active
	inet6 fe80::20d:b9ff:fe17:c060%vr0 prefixlen 64 scopeid 0x1
vr1: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
	lladdr 00:0d:b9:17:c0:61
	priority: 0
	media: Ethernet 100baseTX full-duplex
	status: active
	inet6 fe80::20d:b9ff:fe17:c061%vr1 prefixlen 64 scopeid 0x2
vr2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
	lladdr 00:0d:b9:17:c0:62
	priority: 0
	media: Ethernet autoselect (100baseTX full-duplex)
	status: active
	inet 10.11.255.253 netmask 0xfffffffc broadcast 10.11.255.255
	inet6 fe80::20d:b9ff:fe17:c062%vr2 prefixlen 64

Re: Strange network problem. Debugging hints needed.

2011-01-20 Thread Markus Hennecke

On 21.01.2011 00:38, Andre Keller wrote:

> I have a strange problem with network connectivity on a device of mine.
>
> The setup is carp on vlan on vr(4).
>
> The problem is that the link runs for 10 minutes, 10 hours or 10 days and
> then suddenly stops working. Running ifconfig vr0 down ; ifconfig vr0 up
> solves the problem temporarily, but as you can imagine I'd like to have a
> more permanent solution.
>
> The problem is there are no obvious indications. The port on the switch
> (C 2960) stays up (and it is not errdisabled), there are no errors.
> Configuring the interface 100 full or autoselect (on both switch and
> device) does not make a difference. The error counters on the switch as
> well as netstat -i do not show any errors.
>
> I set up the same configuration (carp on vlan on physical interface)
> using em(4) and have not run into the problem yet (3 weeks up). So I
> guess it could have something to do with vr(4).
>
> The problem first appeared after updating 4.8 to the 20101222 snapshot,
> and is still present with the snapshot from this week. But prior to the
> 20101222 snapshot there were no carp and vlan interfaces, just an IP on
> the physical interface. So I don't know if the problem is my
> configuration or something that has changed in the code...


There was a well known bug that would cause vr devices to stop receiving
and sending traffic if a short cable was used to connect to the switch
or other device (like a DSL modem). I don't know whether this was ever
fixed, but I think I still saw this bug on my router running 4.8-stable
(a Soekris 5501). I dealt with it by monitoring the pppoe0 interface and
having a script bring the vr interface used by pppoe0 down and up again
as soon as the pppoe connection is lost.
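
A workaround script of this kind might look roughly like the sketch below. It is not the script described above: instead of watching pppoe0 it pings a probe address, which also fits a setup without pppoe. The probe address 192.0.2.1 is a placeholder, and the variables `IFACE`, `PROBE`, `PING_CMD` and `IFCONFIG_CMD` as well as the function names are made up for this example. It only hides the symptom, so it is no substitute for a fixed driver.

```sh
#!/bin/sh
# Hypothetical watchdog: bounce an interface when a probe host stops answering.
# All names and the probe address are assumptions made for this sketch.

IFACE="${IFACE:-vr0}"
PROBE="${PROBE:-192.0.2.1}"              # placeholder, use a host behind IFACE
PING_CMD="${PING_CMD:-ping}"             # overridable for dry runs
IFCONFIG_CMD="${IFCONFIG_CMD:-ifconfig}" # overridable for dry runs

# Succeeds if ping reports success for three echo requests to the probe.
link_ok() {
	"$PING_CMD" -c 3 "$PROBE" >/dev/null 2>&1
}

# The manual workaround from this thread: interface down, then up again.
bounce() {
	"$IFCONFIG_CMD" "$IFACE" down
	sleep 1
	"$IFCONFIG_CMD" "$IFACE" up
}

# One watchdog pass; prints what happened so that cron mail or a log shows it.
check_once() {
	if link_ok; then
		echo "ok"
	else
		bounce && echo "bounced $IFACE"
	fi
}

# Only act when invoked with "run" as the first argument,
# so the file can also be sourced without side effects.
if [ "${1:-}" = "run" ]; then
	check_once
fi
```

Run from root's crontab every few minutes with the argument `run`, it performs one check per invocation. Setting `IFCONFIG_CMD=echo` gives a dry run that prints the ifconfig arguments instead of touching the interface. Logging each bounce with a timestamp would also give a record of how often the wedge occurs, which is useful in a bug report.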


As I worked around the problem I can't offer you a solution. Perhaps 
this information gives a hint to someone who can solve this.


Kind regards
  Markus