Re: 5.4 hanging when used as hostap [obviated by upgrade]

2014-06-23 Thread andy
On Tue, 2014-03-25 at 12:46 +0100, Stefan Sperling wrote: 
> On Mon, Mar 24, 2014 at 06:35:29PM -0700, andy wrote:

[description of ral-related hangs on 5.4]

> The diff below backs out my changes for ral from 5.3-5.4.
> Can you test this? I doubt it will have any effect but if it does
> I'd very much like to know about it.

i did not have a chance to try backing out the ral changes for 5.3-5.4
prior to upgrading to 5.5.  fwiw i have not had any of these hangs since
the upgrade (i did not swap out power supply).

thank you for your suggestions.

andy

-- 
andy and...@diatribes.org



Re: 5.4 hanging when used as hostap

2014-03-26 Thread andy
On Tue, 2014-03-25 at 12:46 +0100, Stefan Sperling wrote: 
> The diff below backs out my changes for ral from 5.3-5.4.
> Can you test this? I doubt it will have any effect but if it does
> I'd very much like to know about it.

thanks for the suggestion!  i may not have a chance till the weekend,
but i'll try applying the diff and see how it goes.

> It is possible that your power supply is having issues.
>
> With my net5501 I was seeing hard lockups until I upgraded to a stronger
> power supply (same voltage, more ampere). The default power supply couldn't
> power the board, a hard disk, and a wireless minipci card (also a ral rt2661
> in my case). You seem to be using a hard disk instead of a CF card, correct?
>
> If you like I can look up the exact specs of the power supply I'm using
> tonight.

yes, my net5501 is running off a hard disk.  i'm using munin to graph
the voltage sensors and nothing jumps out at me (though i don't know
that it would).  if you could provide your specs that'd be great.

it looks like my power supply is an SBU40C-120; the info sticker says
12V 3.34A 40W max.  from what i can tell that looks like the highest
ampere psu that soekris has available online.  i don't have a great feel
for amps/watts/volts, but i believe it should power this setup (board
says 6W; hd, 1.7 W; wifi 350mA =~ 4W (?)).  if the current one does look
like it should be sufficient i could order a replacement or (if there's
a suggested model) try picking up a different one.
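
For reference, the rough budget above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the load figures are the ones quoted in this message, the 350 mA for the wifi card is taken at 12 V (which reproduces the ~4 W estimate; if the card draws from a lower-voltage rail the real figure would be smaller), and the variable names are made up for illustration. It covers steady-state draw only and says nothing about short peaks such as disk spin-up.

```python
# Rough steady-state power budget for the net5501 setup described above.
# All figures come from the message; the 12 V assumption for the wifi
# card is a worst-case guess, not a measured value.

SUPPLY_VOLTS = 12.0   # PSU sticker: 12V
SUPPLY_AMPS = 3.34    # PSU sticker: 3.34A


def watts(volts, amps):
    """Electrical power in watts: P = V * I."""
    return volts * amps


loads_w = {
    "board": 6.0,                    # "board says 6W"
    "hard disk": 1.7,                # "hd, 1.7 W"
    "wifi": watts(12.0, 0.350),      # 350 mA, assumed at 12 V
}

total_w = sum(loads_w.values())
supply_w = watts(SUPPLY_VOLTS, SUPPLY_AMPS)
headroom_w = supply_w - total_w

for name, w in loads_w.items():
    print(f"{name:10s} {w:5.1f} W")
print(f"{'total':10s} {total_w:5.1f} W")
print(f"{'supply':10s} {supply_w:5.1f} W")
print(f"{'headroom':10s} {headroom_w:5.1f} W")
```

On these numbers the nominal load is well under the supply's rating, so a rating shortfall alone would not explain the hangs; an aging or faulty supply is a separate question that this arithmetic cannot answer.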

thanks again...

andy

-- 
andy and...@diatribes.org



Re: 5.4 hanging when used as hostap

2014-03-25 Thread Stefan Sperling
On Mon, Mar 24, 2014 at 06:35:29PM -0700, andy wrote:
> hello -
>
> i've been using a soekris net5501 as a home gateway since early 2008,
> starting w/openbsd 4.2 and upgrading through 5.4.  for most of that time
> it's also been serving as a wireless access point.  the wireless card is
> a SparkLAN WMIR-168AG WLAN 802.11a/b/g Mini PCI Module with the Ralink
> RT2561T chipset (ral driver; dmesg.boot attached).  the system has been
> working reliably for years.
>
> however, the box started to hang within days of upgrading to 5.4.  it
> stays responsive for a variable length of time after reboot, ranging
> from minutes to a week or more (but not much more).  and unfortunately,
> it hangs w/o writing anything to syslog or serial console.  i enabled
> ddb.console in sysctl.conf but found it to be completely unresponsive
> when hung (i successfully tested sending a break in normal operation).

The diff below backs out my changes for ral from 5.3-5.4. 
Can you test this? I doubt it will have any effect but if it does
I'd very much like to know about it.

> i've merged the patches from the 5.4 release errata & patch list and
> rebuilt the os to no effect.  there was some correlation between the
> hangs and increased wireless usage; i tried disabling pf and squid but
> the hangs continued.  eventually i ran `ifconfig ral0 down` and hooked
> the laptops up to a switch.  rock-solid for weeks.  brought ral0 back up
> and within days of usage the box hung again.  i see at least one person
> w/similar symptoms from 2011[1] but nothing more recent.

It is possible that your power supply is having issues.

With my net5501 I was seeing hard lockups until I upgraded to a stronger
power supply (same voltage, more ampere). The default power supply couldn't
power the board, a hard disk, and a wireless minipci card (also a ral rt2661
in my case). You seem to be using a hard disk instead of a CF card, correct?

If you like I can look up the exact specs of the power supply I'm using tonight.


Diff to back out the 'tx interrupt race' fix:

Index: rt2661.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/rt2661.c,v
retrieving revision 1.68
retrieving revision 1.67
diff -u -p -r1.68 -r1.67
--- rt2661.c	23 Aug 2012 10:34:25 -0000	1.68
+++ rt2661.c	17 Jul 2012 14:43:12 -0000	1.67
@@ -34,7 +34,6 @@
 #include <sys/timeout.h>
 #include <sys/conf.h>
 #include <sys/device.h>
-#include <sys/queue.h>
 
 #include <machine/bus.h>
 #include <machine/endian.h>
@@ -58,7 +57,6 @@
 #include <net80211/ieee80211_var.h>
 #include <net80211/ieee80211_amrr.h>
 #include <net80211/ieee80211_radiotap.h>
-#include <net80211/ieee80211_node.h>
 
 #include <dev/ic/rt2661var.h>
 #include <dev/ic/rt2661reg.h>
@@ -90,8 +88,6 @@ void		rt2661_reset_rx_ring(struct rt2661
 void		rt2661_free_rx_ring(struct rt2661_softc *,
 		    struct rt2661_rx_ring *);
 struct ieee80211_node *rt2661_node_alloc(struct ieee80211com *);
-void		rt2661_node_free(struct ieee80211com *,
-		    struct ieee80211_node *);
 int		rt2661_media_change(struct ifnet *);
 void		rt2661_next_scan(void *);
 void		rt2661_iter_func(void *, struct ieee80211_node *);
@@ -119,7 +115,7 @@ uint16_t	rt2661_txtime(int, int, uint32_
 uint8_t		rt2661_plcp_signal(int);
 void		rt2661_setup_tx_desc(struct rt2661_softc *,
 		    struct rt2661_tx_desc *, uint32_t, uint16_t, int, int,
-		    const bus_dma_segment_t *, int, int, u_int8_t);
+		    const bus_dma_segment_t *, int, int);
 int		rt2661_tx_mgt(struct rt2661_softc *, struct mbuf *,
 		    struct ieee80211_node *);
 int		rt2661_tx_data(struct rt2661_softc *, struct mbuf *,
@@ -160,14 +156,6 @@ int		rt2661_prepare_beacon(struct rt2661
 #endif
 void		rt2661_enable_tsf_sync(struct rt2661_softc *);
 int		rt2661_get_rssi(struct rt2661_softc *, uint8_t);
-struct rt2661_amrr_node *rt2661_amrr_node_alloc(struct ieee80211com *,
-		    struct rt2661_node *);
-void		rt2661_amrr_node_free(struct rt2661_softc *,
-		    struct rt2661_amrr_node *);
-void		rt2661_amrr_node_free_all(struct rt2661_softc *);
-void		rt2661_amrr_node_free_unused(struct rt2661_softc *);
-struct rt2661_amrr_node *rt2661_amrr_node_find(struct rt2661_softc *,
-		    u_int8_t);
 
 static const struct {
 	uint32_t	reg;
@@ -207,8 +195,6 @@ rt2661_attach(void *xsc, int id)
 	timeout_set(&sc->amrr_to, rt2661_updatestats, sc);
 	timeout_set(&sc->scan_to, rt2661_next_scan, sc);
 
-	TAILQ_INIT(&sc->amn);
-
 	/* wait for NIC to initialize */
 	for (ntries = 0; ntries < 1000; ntries++) {
 		if ((val = RAL_READ(sc, RT2661_MAC_CSR0)) != 0)
@@ -358,8 +344,6 @@ rt2661_attachhook(void *xsc)
 	if_attach(ifp);

Re: 5.4 hanging when used as hostap

2014-03-25 Thread Bernte
On 25/03/14 11:46, Stefan Sperling wrote:

> It is possible that your power supply is having issues.
>
> With my net5501 I was seeing hard lockups until I upgraded to a stronger
> power supply (same voltage, more ampere). The default power supply couldn't
> power the board, a hard disk, and a wireless minipci card (also a ral rt2661
> in my case). You seem to be using a hard disk instead of a CF card, correct?
>
> If you like I can look up the exact specs of the power supply I'm using
> tonight.

I fully agree, I had exactly the same issues with my Soekris 4801 when I
used a power supply that was too weak. I upgraded mine to a supply that
could do 15W, and have been happy ever since.

Bernd



5.4 hanging when used as hostap

2014-03-24 Thread andy
hello -

i've been using a soekris net5501 as a home gateway since early 2008,
starting w/openbsd 4.2 and upgrading through 5.4.  for most of that time
it's also been serving as a wireless access point.  the wireless card is
a SparkLAN WMIR-168AG WLAN 802.11a/b/g Mini PCI Module with the Ralink
RT2561T chipset (ral driver; dmesg.boot attached).  the system has been
working reliably for years.

however, the box started to hang within days of upgrading to 5.4.  it
stays responsive for a variable length of time after reboot, ranging
from minutes to a week or more (but not much more).  and unfortunately,
it hangs w/o writing anything to syslog or serial console.  i enabled
ddb.console in sysctl.conf but found it to be completely unresponsive
when hung (i successfully tested sending a break in normal operation).

i've merged the patches from the 5.4 release errata & patch list and
rebuilt the os to no effect.  there was some correlation between the
hangs and increased wireless usage; i tried disabling pf and squid but
the hangs continued.  eventually i ran `ifconfig ral0 down` and hooked
the laptops up to a switch.  rock-solid for weeks.  brought ral0 back up
and within days of usage the box hung again.  i see at least one person
w/similar symptoms from 2011[1] but nothing more recent.

if my searches have missed any relevant documentation, threads, bug
reports, etc. please let me know.  otherwise, any suggestions on how to
resolve / troubleshoot / workaround or how to gather add'l information
for a bug report if necessary? 

tia...

andy


/etc/hostname.ral0:
inet 192.168.3.1 255.255.255.0 NONE \
media autoselect mode 11g mediaopt hostap \
nwid diatribes chan 3 \
nwkey [...]

[1] http://marc.info/?l=openbsd-misc&m=130827912412200

-- 
andy and...@diatribes.org
OpenBSD 5.4-stable (GENERIC) #5: Wed Feb 12 17:33:29 PST 2014
a...@tirith.diatribes.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Geode(TM) Integrated Processor by AMD PCS (AuthenticAMD 586-class) 434 MHz
cpu0: FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CFLUSH,MMX,MMXX,3DNOW2,3DNOW
real mem  = 267972608 (255MB)
avail mem = 252141568 (240MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 20/80/26, BIOS32 rev. 0 @ 0xfac40
pcibios0 at bios0: rev 2.0 @ 0xf/0x1
pcibios0: pcibios_get_intr_routing - function not supported
pcibios0: PCI IRQ Routing information unavailable.
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xc8000/0xa800
cpu0 at mainbus0: (uniprocessor)
amdmsr0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
0:20:0: io address conflict 0x6100/0x100
0:20:0: io address conflict 0x6200/0x200
pchb0 at pci0 dev 1 function 0 AMD Geode LX rev 0x30
glxsb0 at pci0 dev 1 function 2 AMD Geode LX Crypto rev 0x00: RNG AES
vr0 at pci0 dev 6 function 0 VIA VT6105M RhineIII rev 0x96: irq 11, address 00:00:24:c9:5e:8c
ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr1 at pci0 dev 7 function 0 VIA VT6105M RhineIII rev 0x96: irq 5, address 00:00:24:c9:5e:8d
ukphy1 at vr1 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr2 at pci0 dev 8 function 0 VIA VT6105M RhineIII rev 0x96: irq 9, address 00:00:24:c9:5e:8e
ukphy2 at vr2 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr3 at pci0 dev 9 function 0 VIA VT6105M RhineIII rev 0x96: irq 12, address 00:00:24:c9:5e:8f
ukphy3 at vr3 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
ral0 at pci0 dev 17 function 0 Ralink RT2561S rev 0x00: irq 15, address 00:12:0e:61:7f:a8
ral0: MAC/BBP RT2561C, RF RT5225
glxpcib0 at pci0 dev 20 function 0 AMD CS5536 ISA rev 0x03: rev 3, 32-bit 3579545Hz timer, watchdog, gpio, i2c
gpio0 at glxpcib0: 32 pins
iic0 at glxpcib0
pciide0 at pci0 dev 20 function 2 AMD CS5536 IDE rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 1: HTE721080G9AT00
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
wd0(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 ignored (disabled)
ohci0 at pci0 dev 21 function 0 AMD CS5536 USB rev 0x02: irq 7, version 1.0, legacy support
ehci0 at pci0 dev 21 function 1 AMD CS5536 USB rev 0x02: irq 7
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 AMD EHCI root hub rev 2.00/1.00 addr 1
isa0 at glxpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
nsclpcsio0 at isa0 port 0x2e/2: NSC PC87366 rev 9: GPIO VLM TMS
gpio1 at nsclpcsio0: 29 pins
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 AMD OHCI root hub rev 1.00/1.00 addr 1
mtrr: K6-family MTRR support (2 registers)