Re: 5.4 hanging when used as hostap [obviated by upgrade]
On Tue, 2014-03-25 at 12:46 +0100, Stefan Sperling wrote:
> On Mon, Mar 24, 2014 at 06:35:29PM -0700, andy wrote:
>> [description of ral-related hangs on 5.4]
>
> The diff below backs out my changes for ral from 5.3-5.4.
> Can you test this? I doubt it will have any effect but if it does I'd very much like to know about it.

i did not have a chance to try backing out the ral changes for 5.3-5.4 prior to upgrading to 5.5. fwiw i have not had any of these hangs since the upgrade (i did not swap out the power supply).

thank you for your suggestions.

andy

-- 
andy and...@diatribes.org
Re: 5.4 hanging when used as hostap
On Tue, 2014-03-25 at 12:46 +0100, Stefan Sperling wrote:
> The diff below backs out my changes for ral from 5.3-5.4.
> Can you test this? I doubt it will have any effect but if it does I'd very much like to know about it.

thanks for the suggestion! i may not have a chance till the weekend, but i'll try applying the diff and see how it goes.

> It is possible that your power supply is having issues.
> With my net5501 I was seeing hard lockups until I upgraded to a stronger power supply (same voltage, more ampere). The default power supply couldn't power the board, a hard disk, and a wireless minipci card (also a ral rt2661 in my case).
>
> You seem to be using a hard disk instead of a CF card, correct?
> If you like I can look up the exact specs of the power supply I'm using tonight.

yes, my net5501 is running off a hard disk. i'm using munin to graph the voltage sensors and nothing jumps out at me (though i don't know that it would). if you could provide your specs that'd be great.

it looks like my power supply is an SBU40C-120; the info sticker says 12V 3.34A 40W max. from what i can tell that looks like the highest ampere psu that soekris has available online. i don't have a great feel for amps/watts/volts, but i believe it should power this setup (board says 6W; hd, 1.7 W; wifi 350mA =~ 4W (?)). if the current one does look like it should be sufficient i could order a replacement or (if there's a suggested model) try picking up a different one.

thanks again...

andy

-- 
andy and...@diatribes.org
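andy's figures can be checked with P = V × I. The sketch below does that arithmetic in C, the language of the driver code elsewhere in this thread. It uses only the numbers quoted in the message. The message does not say which rail the Mini PCI card draws its 350 mA from, so the code takes that voltage as a parameter. Both 12 V and 3.3 V are assumptions. Only steady-state draw is modelled.

```c
/*
 * Rough steady-state power budget for the net5501 setup in the thread.
 * Inputs are the figures quoted by andy. The voltage at which the
 * wireless card draws 350 mA is an ASSUMPTION passed in as rail_v.
 * Transient draw (for example disk spin-up) is NOT modelled.
 */

/* Electrical power in watts: P = V * I */
double watts(double volts, double amps)
{
	return volts * amps;
}

/* Label rating of the SBU40C-120 as quoted: 12 V at 3.34 A */
double psu_rating_w(void)
{
	return watts(12.0, 3.34);
}

/*
 * Quoted load: board 6 W + hard disk 1.7 W + wireless card drawing
 * 350 mA from a rail of rail_v volts (assumed, see above).
 */
double load_w(double rail_v)
{
	const double board_w = 6.0;
	const double disk_w = 1.7;
	const double wifi_a = 0.350;

	return board_w + disk_w + watts(rail_v, wifi_a);
}

/* Ratio of rated supply output to estimated load (> 1 means headroom) */
double headroom(double rail_v)
{
	return psu_rating_w() / load_w(rail_v);
}
```

The label rating works out to about 40.08 W. At 12 V the card's 350 mA is about 4.2 W, which matches the "=~ 4W" estimate, for a total of about 11.9 W. At 3.3 V the card draws about 1.2 W and the total is about 8.9 W. Either way the quoted steady-state load is far below the rating, so this kind of budget cannot rule out a failing supply or short current peaks.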
Re: 5.4 hanging when used as hostap
On Mon, Mar 24, 2014 at 06:35:29PM -0700, andy wrote:
> hello -
>
> i've been using a soekris net5501 as a home gateway since early 2008, starting w/openbsd 4.2 and upgrading through 5.4. for most of that time it's also been serving as a wireless access point. the wireless card is a SparkLAN WMIR-168AG WLAN 802.11a/b/g Mini PCI Module with the Ralink RT2561T chipset (ral driver; dmesg.boot attached). the system has been working reliably for years.
>
> however, the box started to hang within days of upgrading to 5.4. it stays responsive for a variable length of time after reboot, ranging from minutes to a week or more (but not much more). and unfortunately, it hangs w/o writing anything to syslog or serial console. i enabled ddb.console in sysctl.conf but found it to be completely unresponsive when hung (i successfully tested sending a break in normal operation).

The diff below backs out my changes for ral from 5.3-5.4.
Can you test this? I doubt it will have any effect but if it does I'd very much like to know about it.

> i've merged the patches from the 5.4 release errata patch list and rebuilt the os to no effect. there was some correlation between the hangs and increased wireless usage; i tried disabling pf and squid but the hangs continued. eventually i ran `ifconfig ral0 down` and hooked the laptops up to a switch. rock-solid for weeks. brought ral0 back up and within days of usage the box hung again.
>
> i see at least one person w/similar symptoms from 2011[1] but nothing more recent.

It is possible that your power supply is having issues.
With my net5501 I was seeing hard lockups until I upgraded to a stronger power supply (same voltage, more ampere). The default power supply couldn't power the board, a hard disk, and a wireless minipci card (also a ral rt2661 in my case).

You seem to be using a hard disk instead of a CF card, correct?
If you like I can look up the exact specs of the power supply I'm using tonight.
Diff to back out the 'tx interrupt race' fix:

Index: rt2661.c
===================================================================
RCS file: /cvs/src/sys/dev/ic/rt2661.c,v
retrieving revision 1.68
retrieving revision 1.67
diff -u -p -r1.68 -r1.67
--- rt2661.c	23 Aug 2012 10:34:25 -	1.68
+++ rt2661.c	17 Jul 2012 14:43:12 -	1.67
@@ -34,7 +34,6 @@
 #include <sys/timeout.h>
 #include <sys/conf.h>
 #include <sys/device.h>
-#include <sys/queue.h>
 
 #include <machine/bus.h>
 #include <machine/endian.h>
@@ -58,7 +57,6 @@
 #include <net80211/ieee80211_var.h>
 #include <net80211/ieee80211_amrr.h>
 #include <net80211/ieee80211_radiotap.h>
-#include <net80211/ieee80211_node.h>
 
 #include <dev/ic/rt2661var.h>
 #include <dev/ic/rt2661reg.h>
@@ -90,8 +88,6 @@ void		rt2661_reset_rx_ring(struct rt2661
 void		rt2661_free_rx_ring(struct rt2661_softc *,
 		    struct rt2661_rx_ring *);
 struct		ieee80211_node *rt2661_node_alloc(struct ieee80211com *);
-void		rt2661_node_free(struct ieee80211com *,
-		    struct ieee80211_node *);
 int		rt2661_media_change(struct ifnet *);
 void		rt2661_next_scan(void *);
 void		rt2661_iter_func(void *, struct ieee80211_node *);
@@ -119,7 +115,7 @@ uint16_t	rt2661_txtime(int, int, uint32_
 uint8_t		rt2661_plcp_signal(int);
 void		rt2661_setup_tx_desc(struct rt2661_softc *,
 		    struct rt2661_tx_desc *, uint32_t, uint16_t, int, int,
-		    const bus_dma_segment_t *, int, int, u_int8_t);
+		    const bus_dma_segment_t *, int, int);
 int		rt2661_tx_mgt(struct rt2661_softc *, struct mbuf *,
 		    struct ieee80211_node *);
 int		rt2661_tx_data(struct rt2661_softc *, struct mbuf *,
@@ -160,14 +156,6 @@ int		rt2661_prepare_beacon(struct rt2661
 #endif
 void		rt2661_enable_tsf_sync(struct rt2661_softc *);
 int		rt2661_get_rssi(struct rt2661_softc *, uint8_t);
-struct rt2661_amrr_node *rt2661_amrr_node_alloc(struct ieee80211com *,
-		    struct rt2661_node *);
-void		rt2661_amrr_node_free(struct rt2661_softc *,
-		    struct rt2661_amrr_node *);
-void		rt2661_amrr_node_free_all(struct rt2661_softc *);
-void		rt2661_amrr_node_free_unused(struct rt2661_softc *);
-struct rt2661_amrr_node *rt2661_amrr_node_find(struct rt2661_softc *,
-		    u_int8_t);
 
 static const struct {
 	uint32_t	reg;
@@ -207,8 +195,6 @@ rt2661_attach(void *xsc, int id)
 	timeout_set(&sc->amrr_to, rt2661_updatestats, sc);
 	timeout_set(&sc->scan_to, rt2661_next_scan, sc);
 
-	TAILQ_INIT(&sc->amn);
-
 	/* wait for NIC to initialize */
 	for (ntries = 0; ntries < 1000; ntries++) {
 		if ((val = RAL_READ(sc, RT2661_MAC_CSR0)) != 0)
@@ -358,8 +344,6 @@ rt2661_attachhook(void *xsc)
 	if_attach(ifp);
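The diff removes the rt2661_amrr_node_alloc/_find/_free/_free_unused/_free_all prototypes and the TAILQ_INIT(&sc->amn) call. Together these suggest that the backed-out change kept a per-station list in a <sys/queue.h> tail queue. The sketch below is a hypothetical user-space illustration of that general pattern only. The struct, its fields, and all function names are invented, and it is not the rt2661.c code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/*
 * HYPOTHETICAL illustration of a per-station list kept in a tail
 * queue, loosely shaped after the rt2661_amrr_node_* prototypes
 * removed by the diff. Names and fields are invented.
 */
struct amrr_entry {
	uint8_t			id;	/* hypothetical per-station key */
	int			in_use;	/* cleared when the station goes away */
	TAILQ_ENTRY(amrr_entry)	entries;
};

TAILQ_HEAD(amrr_list, amrr_entry);

/* Allocate an entry for station `id` and append it to the list. */
struct amrr_entry *
amrr_entry_alloc(struct amrr_list *head, uint8_t id)
{
	struct amrr_entry *e;

	if ((e = calloc(1, sizeof(*e))) == NULL)
		return NULL;
	e->id = id;
	e->in_use = 1;
	TAILQ_INSERT_TAIL(head, e, entries);
	return e;
}

/* Linear search by key; returns NULL if no entry matches. */
struct amrr_entry *
amrr_entry_find(struct amrr_list *head, uint8_t id)
{
	struct amrr_entry *e;

	TAILQ_FOREACH(e, head, entries) {
		if (e->id == id)
			return e;
	}
	return NULL;
}

/* Unlink one entry and release its memory. */
void
amrr_entry_free(struct amrr_list *head, struct amrr_entry *e)
{
	TAILQ_REMOVE(head, e, entries);
	free(e);
}

/*
 * Free entries no longer in use. The next pointer is saved before the
 * current entry may be freed, so the walk never touches freed memory.
 */
void
amrr_entry_free_unused(struct amrr_list *head)
{
	struct amrr_entry *e, *next;

	for (e = TAILQ_FIRST(head); e != NULL; e = next) {
		next = TAILQ_NEXT(e, entries);
		if (!e->in_use)
			amrr_entry_free(head, e);
	}
}

/* Empty the whole list. */
void
amrr_entry_free_all(struct amrr_list *head)
{
	struct amrr_entry *e;

	while ((e = TAILQ_FIRST(head)) != NULL)
		amrr_entry_free(head, e);
}
```

A tail queue gives constant-time insertion and removal when a pointer to the element is already known, at the cost of a linear search in the find routine. The thread does not show how the real driver keyed or used its entries.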
Re: 5.4 hanging when used as hostap
On 25/03/14 11:46, Stefan Sperling wrote:
> It is possible that your power supply is having issues.
> With my net5501 I was seeing hard lockups until I upgraded to a stronger power supply (same voltage, more ampere). The default power supply couldn't power the board, a hard disk, and a wireless minipci card (also a ral rt2661 in my case).
>
> You seem to be using a hard disk instead of a CF card, correct?
> If you like I can look up the exact specs of the power supply I'm using tonight.

I fully agree. I had exactly the same issues with my Soekris 4801 when I used a power supply that was too weak. I upgraded mine to a supply that could do 15W, and have been happy ever since.

Bernd
5.4 hanging when used as hostap
hello -

i've been using a soekris net5501 as a home gateway since early 2008, starting w/openbsd 4.2 and upgrading through 5.4. for most of that time it's also been serving as a wireless access point. the wireless card is a SparkLAN WMIR-168AG WLAN 802.11a/b/g Mini PCI Module with the Ralink RT2561T chipset (ral driver; dmesg.boot attached). the system has been working reliably for years.

however, the box started to hang within days of upgrading to 5.4. it stays responsive for a variable length of time after reboot, ranging from minutes to a week or more (but not much more). and unfortunately, it hangs w/o writing anything to syslog or serial console. i enabled ddb.console in sysctl.conf but found it to be completely unresponsive when hung (i successfully tested sending a break in normal operation).

i've merged the patches from the 5.4 release errata patch list and rebuilt the os to no effect. there was some correlation between the hangs and increased wireless usage; i tried disabling pf and squid but the hangs continued. eventually i ran `ifconfig ral0 down` and hooked the laptops up to a switch. rock-solid for weeks. brought ral0 back up and within days of usage the box hung again.

i see at least one person w/similar symptoms from 2011[1] but nothing more recent.

if my searches have missed any relevant documentation, threads, bug reports, etc. please let me know. otherwise, any suggestions on how to resolve / troubleshoot / workaround or how to gather add'l information for a bug report if necessary?

tia...

andy

/etc/hostname.ral0:
inet 192.168.3.1 255.255.255.0 NONE \
	media autoselect mode 11g mediaopt hostap \
	nwid diatribes chan 3 \
	nwkey [...]
[1] http://marc.info/?l=openbsd-misc&m=130827912412200

-- 
andy and...@diatribes.org

OpenBSD 5.4-stable (GENERIC) #5: Wed Feb 12 17:33:29 PST 2014
    a...@tirith.diatribes.org:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Geode(TM) Integrated Processor by AMD PCS (AuthenticAMD 586-class) 434 MHz
cpu0: FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CFLUSH,MMX,MMXX,3DNOW2,3DNOW
real mem = 267972608 (255MB)
avail mem = 252141568 (240MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 20/80/26, BIOS32 rev. 0 @ 0xfac40
pcibios0 at bios0: rev 2.0 @ 0xf/0x1
pcibios0: pcibios_get_intr_routing - function not supported
pcibios0: PCI IRQ Routing information unavailable.
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xc8000/0xa800
cpu0 at mainbus0: (uniprocessor)
amdmsr0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
0:20:0: io address conflict 0x6100/0x100
0:20:0: io address conflict 0x6200/0x200
pchb0 at pci0 dev 1 function 0 AMD Geode LX rev 0x30
glxsb0 at pci0 dev 1 function 2 AMD Geode LX Crypto rev 0x00: RNG AES
vr0 at pci0 dev 6 function 0 VIA VT6105M RhineIII rev 0x96: irq 11, address 00:00:24:c9:5e:8c
ukphy0 at vr0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr1 at pci0 dev 7 function 0 VIA VT6105M RhineIII rev 0x96: irq 5, address 00:00:24:c9:5e:8d
ukphy1 at vr1 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr2 at pci0 dev 8 function 0 VIA VT6105M RhineIII rev 0x96: irq 9, address 00:00:24:c9:5e:8e
ukphy2 at vr2 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
vr3 at pci0 dev 9 function 0 VIA VT6105M RhineIII rev 0x96: irq 12, address 00:00:24:c9:5e:8f
ukphy3 at vr3 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI 0x004063, model 0x0034
ral0 at pci0 dev 17 function 0 Ralink RT2561S rev 0x00: irq 15, address 00:12:0e:61:7f:a8
ral0: MAC/BBP RT2561C, RF RT5225
glxpcib0 at pci0 dev 20 function 0 AMD CS5536 ISA rev 0x03: rev 3, 32-bit 3579545Hz timer, watchdog, gpio, i2c
gpio0 at glxpcib0: 32 pins
iic0 at glxpcib0
pciide0 at pci0 dev 20 function 2 AMD CS5536 IDE rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 1: HTE721080G9AT00
wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
wd0(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 ignored (disabled)
ohci0 at pci0 dev 21 function 0 AMD CS5536 USB rev 0x02: irq 7, version 1.0, legacy support
ehci0 at pci0 dev 21 function 1 AMD CS5536 USB rev 0x02: irq 7
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 AMD EHCI root hub rev 2.00/1.00 addr 1
isa0 at glxpcib0
isadma0 at isa0
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
nsclpcsio0 at isa0 port 0x2e/2: NSC PC87366 rev 9: GPIO VLM TMS
gpio1 at nsclpcsio0: 29 pins
npx0 at isa0 port 0xf0/16: reported by CPUID; using exception 16
usb1 at ohci0: USB revision 1.0
uhub1 at usb1 AMD OHCI root hub rev 1.00/1.00 addr 1
mtrr: K6-family MTRR support (2 registers)