Re: ral rt2661 tx interrupt race fix

2012-09-30 Thread Tom Murphy
Stefan,

  Your patch works well on my system:

  ral0 at pci0 dev 14 function 0 "Ralink RT2661" rev 0x00: irq 10, address 
  00:14:85:d5:39:bb
  ral0: MAC/BBP RT2661D, RF RT2529 (MIMO XR)

  Only problem is downloading from the net is extremely slow. Benchmarks
  have it at 512 KB/sec as opposed to 10 megabits/s.

  This is (internet) -> OpenBSD -> ral0 -> wifi client

  But it doesn't crash or bring up OACTIVE flag anymore which is
  fantastic.. however, it's a little to slow to use with any regularity.

  Uploads are fine (wifi -> ral(4) -> OpenBSD -> out to the net). 

  Tom



Re: ral rt2661 tx interrupt race fix

2012-10-04 Thread Stefan Sperling
On Sun, Sep 30, 2012 at 10:32:23PM +0100, Tom Murphy wrote:
> Stefan,
> 
>   Your patch works well on my system:
> 
>   ral0 at pci0 dev 14 function 0 "Ralink RT2661" rev 0x00: irq 10, address 
>   00:14:85:d5:39:bb
>   ral0: MAC/BBP RT2661D, RF RT2529 (MIMO XR)
> 
>   Only problem is downloading from the net is extremely slow. Benchmarks
>   have it at 512 KB/sec as opposed to 10 megabits/s.
> 
>   This is (internet) -> OpenBSD -> ral0 -> wifi client
> 
>   But it doesn't crash or bring up OACTIVE flag anymore which is
>   fantastic.. however, it's a little to slow to use with any regularity.
> 
>   Uploads are fine (wifi -> ral(4) -> OpenBSD -> out to the net). 

I've already replied to Tom in private requesting some additional
testing but I would still be interested in other reports about
transmission speed issues with ral.

I've also noticed that my ral-based AP can be ridiculously slow.
I believe this is a separate bug which is independent of the OACTIVE
lock-up problem.

Is anyone else out there seeing this, too?

IIRC dragonfly have some ral performance fixes in their git history
which haven't been ported back to OpenBSD yet.



Re: ral rt2661 tx interrupt race fix

2012-10-04 Thread Kenneth R Westerback
On Thu, Oct 04, 2012 at 07:21:50PM +0200, Stefan Sperling wrote:
> On Sun, Sep 30, 2012 at 10:32:23PM +0100, Tom Murphy wrote:
> > Stefan,
> > 
> >   Your patch works well on my system:
> > 
> >   ral0 at pci0 dev 14 function 0 "Ralink RT2661" rev 0x00: irq 10, address 
> >   00:14:85:d5:39:bb
> >   ral0: MAC/BBP RT2661D, RF RT2529 (MIMO XR)
> > 
> >   Only problem is downloading from the net is extremely slow. Benchmarks
> >   have it at 512 KB/sec as opposed to 10 megabits/s.
> > 
> >   This is (internet) -> OpenBSD -> ral0 -> wifi client
> > 
> >   But it doesn't crash or bring up OACTIVE flag anymore which is
> >   fantastic.. however, it's a little to slow to use with any regularity.
> > 
> >   Uploads are fine (wifi -> ral(4) -> OpenBSD -> out to the net). 
> 
> I've already replied to Tom in private requesting some additional
> testing but I would still be interested in other reports about
> transmission speed issues with ral.
> 
> I've also noticed that my ral-based AP can be ridiculously slow.
> I believe this is a separate bug which is independent of the OACTIVE
> lock-up problem.
> 
> Is anyone else out there seeing this, too?
> 
> IIRC dragonfly have some ral performance fixes in their git history
> which haven't been ported back to OpenBSD yet.
> 

I gave up on my ral ap a couple of years ago due to ridiculous slowness,
but the athn replacement was just as slow. Got a new motherboard
going into the firewall shortly and may be motivated to recheck the
wireless performance of -current then.

 Ken



Re: ral rt2661 tx interrupt race fix

2012-08-03 Thread Stefan Sperling
I haven't received any test reports so far, apart from my own
testing and edd@'s testing. It's been working splendid for me
so far but I would like to get more testing if possible.

I've Bcc'd some people who were involved in testing earlier ral diffs
or complained about ral not working for them. It would be great if
some of you could try this diff on access points and clients. Cheers!


On Wed, Jul 18, 2012 at 08:21:48PM +0200, Stefan Sperling wrote:
> ral(4) cards using the rt2661 driver code (RT2561, RT2561S, RT2661
> variants) suffer a race in TX interrupt handling which can cause
> TX processing to get stuck.
> 
> This problem was previously discussed here:
> http://marc.info/?l=openbsd-misc&m=125895269930106&w=2
> The patch proposed there was rejected by Damien for reasons unknown to me:
> http://marc.info/?l=openbsd-tech&m=126451543507085&w=2
> 
> Sepherosa Ziehau from dragonfly fixed this problem in a different way:
> http://gitweb.dragonflybsd.org/dragonfly.git/commit/9dd87f8a6f730e54dfa4f21f3d9fae6a615b1908
> 
> My approach for a fix is based on Sephe's approach.
> 
> With 2661, TX interrupt processing is split between two interrupt handlers.
> One that runs when the chip has offloaded the frame to the ASIC
> (rt2661_tx_dma_intr) and one that runs when the ASIC reports back
> about transmission success or failure (rt2661_tx_intr).
> 
> Currently, we free the mbuf for the transmitted frame during the
> first interrupt (rt2661_tx_dma_intr), but do not release the STA
> node reference yet and do not make space in the TX queue.
> 
> When the second interrupt handler (rt2661_tx_intr) runs, we obtain the
> STA node corresponding to the transmitted frame, and update the STA node's
> associated AMRR node, which contains counters used by the rate adaptation
> code ("Adaptive Multi Rate Retry" algorithm).
> Then space is made in the TX queue.
> 
> This approach assumes that rt2661_tx_intr and rt2661_tx_dma_intr will always
> be triggered in the right amount and order. This assumption doesn't seem
> to be safe. I've seen the driver end up in a situation where some entries
> in the TX queue are not removed at all.
> 
> To fix this, we can keep track of which AMRR node needs to be updated when
> the second interrupt handler gets to run. By not tying the AMRR node to the
> STA node directly, we can release the STA node reference in the first
> interrupt handler and make room in the TX queue there.
> 
> The hardware provides an 8-bit field for driver-specific data in the
> TX descriptor, which can be written before TX and read back when TX
> completes. This field is currently used to store the ID of the queue
> a frame was transmitted from, and read back in rt2661_tx_intr to obtain,
> via the queue, the STA node corresponding to the transmitted frame.
> 
> Instead of storing a queue ID, we can store an identifier for the AMRR
> node that needs to be updated when the TX has completed (in case the
> STA node has already been freed when the second interrupt occurs,
> we free the corresponding AMRR node instead of updating it).
> 
> With the patch below I've haven't seen my ral AP lock up in 2 days,
> whereas before I could trigger the problem simply by taking the laptop
> to my kitchen, where signal to the AP is poor.
> 
> Can anyone confirm that this patch provides similar benefits elsewhere?
> If your ral(4) device reports itself as "Ralink RT2561", "Ralink RT2561S",
> or "Ralink RT2661", please test and report back. Thanks!
> 
> Index: rt2661.c
> ===
> RCS file: /cvs/src/sys/dev/ic/rt2661.c,v
> retrieving revision 1.67
> diff -u -p -r1.67 rt2661.c
> --- rt2661.c  17 Jul 2012 14:43:12 -  1.67
> +++ rt2661.c  18 Jul 2012 13:38:39 -
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -57,6 +58,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -88,6 +90,8 @@ voidrt2661_reset_rx_ring(struct rt2661
>  void rt2661_free_rx_ring(struct rt2661_softc *,
>   struct rt2661_rx_ring *);
>  struct   ieee80211_node *rt2661_node_alloc(struct ieee80211com 
> *);
> +void rt2661_node_free(struct ieee80211com *,
> + struct ieee80211_node *);
>  int  rt2661_media_change(struct ifnet *);
>  void rt2661_next_scan(void *);
>  void rt2661_iter_func(void *, struct ieee80211_node *);
> @@ -115,7 +119,7 @@ uint16_t  rt2661_txtime(int, int, uint32_
>  uint8_t  rt2661_plcp_signal(int);
>  void rt2661_setup_tx_desc(struct rt2661_softc *,
>   struct rt2661_tx_desc *, uint32_t, uint16_t, int, int,
> - const bus_dma_segment_t *, int, int);
> + const bus_dma_segment_t *, int, int, u_int8_t);
>  int  rt2661_tx_mgt(struct rt2661_softc *, struct mbuf *,
>   struct ieee80211_node *);
>  int   

Re: ral rt2661 tx interrupt race fix

2012-08-03 Thread Steven Chamberlain
Hi Stefan,

On 03/08/12 12:31, Stefan Sperling wrote:
> On Wed, Jul 18, 2012 at 08:21:48PM +0200, Stefan Sperling wrote:
>> With the patch below I've haven't seen my ral AP lock up in 2 days,
>> whereas before I could trigger the problem simply by taking the laptop
>> to my kitchen, where signal to the AP is poor.

I don't think I've ever used any ral(4) hardware, but I see very similar
symptoms with rum(4) RT2573.  I guess that cannot be the same issue though.

Your recent patch for net80211 node leaks seems to have fixed a
different problem I was seeing, so thank you for that!

Regards,
-- 
Steven Chamberlain
ste...@pyro.eu.org



Re: ral rt2661 tx interrupt race fix

2012-08-14 Thread Tobias Ulmer
On Fri, Aug 03, 2012 at 01:31:18PM +0200, Stefan Sperling wrote:
> I haven't received any test reports so far, apart from my own
> testing and edd@'s testing. It's been working splendid for me
> so far but I would like to get more testing if possible.
> 
> I've Bcc'd some people who were involved in testing earlier ral diffs
> or complained about ral not working for them. It would be great if
> some of you could try this diff on access points and clients. Cheers!

Finally got around to dig out the card and put it in a Blade 1500. With
-current, I get around 10Mb using tcpbench. With your diff,
freelist corruption messages pop up rather quickly, speed is the same. I
guess the panic at the end is just an effect of the modify after free.

Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2012 OpenBSD. All rights reserved.  http://www.OpenBSD.org

: Tue Aug 14 19:57:29 CEST 2012
tobi...@blade.tmux.org:/usr/src/sys/arch/sparc64/compile/GENERIC
real mem = 2147483648 (2048MB)
avail mem = 2100854784 (2003MB)
mainbus0 at root: Sun Blade 1500 (Silver)
cpu0 at mainbus0: SUNW,UltraSPARC-IIIi (rev 3.4) @ 1503 MHz
cpu0: physical 32K instruction (32 b/l), 64K data (32 b/l), 1024K external (64 
b/l)
"memory-controller" at mainbus0 not configured
schizo0 at mainbus0: "Tomatillo", version 4, ign 780, bus A 0 to 1
schizo0: dvma map c000-dfff
pci0 at schizo0
ebus0 at pci0 dev 7 function 0 "Acer Labs M1533 ISA" rev 0x00
"flashprom" at ebus0 addr 0-f not configured
rtc0 at ebus0 addr 70-71: m5823
pcfiic0 at ebus0 addr 320-321 ivec 0x2e
iic0 at pcfiic0
admtt0 at iic0 addr 0x2e
spdmem0 at iic0 addr 0x50: 512MB DDR SDRAM registered ECC PC2300CL2.5
spdmem1 at iic0 addr 0x51: 512MB DDR SDRAM registered ECC PC2300CL2.5
spdmem2 at iic0 addr 0x52: 512MB DDR SDRAM registered ECC PC2300CL2.5
spdmem3 at iic0 addr 0x53: 512MB DDR SDRAM registered ECC PC2300CL2.5
"ics951601" at iic0 addr 0x69 not configured
power0 at ebus0 addr 800-82f ivec 0x20
com0 at ebus0 addr 3f8-3ff ivec 0x2c: ns16550a, 16 byte fifo
com0: console
com1 at ebus0 addr 2e8-2ef ivec 0x2c: ns16550a, 16 byte fifo
"dma" at ebus0 addr 0- not configured
alipm0 at pci0 dev 6 function 0 "Acer Labs M7101 Power" rev 0x00: 223KHz clock
iic1 at alipm0
"scm001" at alipm0 addr 0x20 skipped due to alipm0 bugs
autri0 at pci0 dev 8 function 0 "Acer Labs M5451 Audio" rev 0x02: ivec 0x7a4
ac97: codec id 0x41445348 (Analog Devices AD1881A)
ac97: codec features headphone, Analog Devices Phat Stereo
audio0 at autri0
midi0 at autri0: <4DWAVE MIDI UART>
ohci0 at pci0 dev 10 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7a7, 
version 1.0, legacy support
ohci1 at pci0 dev 11 function 0 "Acer Labs M5237 USB" rev 0x03: ivec 0x7a6, 
version 1.0, legacy support
pciide0 at pci0 dev 13 function 0 "Acer Labs M5229 UDMA IDE" rev 0xc4: DMA, 
channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x798 for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: 
wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
wd1 at pciide0 channel 0 drive 1: 
wd1: 16-sector PIO, LBA48, 238474MB, 488395055 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2
pciide0: channel 1 disabled (no drives)
ral0 at pci0 dev 2 function 0 "Ralink RT2561" rev 0x00: ivec 0x790, address 
00:80:5a:38:c4:0b
ral0: MAC/BBP RT2661B, RF RT2527
ppb0 at pci0 dev 3 function 0 "TI PCI2250 PCI-PCI" rev 0x02
pci1 at ppb0 bus 1
ohci2 at pci1 dev 8 function 0 "NEC USB" rev 0x43: ivec 0x78c, version 1.0
ohci3 at pci1 dev 8 function 1 "NEC USB" rev 0x43: ivec 0x78d, version 1.0
ehci0 at pci1 dev 8 function 2 "NEC USB" rev 0x04: ivec 0x78e
usb0 at ehci0: USB revision 2.0
uhub0 at usb0 "NEC EHCI root hub" rev 2.00/1.00 addr 1
"TI TSB43AB23 FireWire" rev 0x00 at pci1 dev 11 function 0 not configured
usb1 at ohci2: USB revision 1.0
uhub1 at usb1 "NEC OHCI root hub" rev 1.00/1.00 addr 1
usb2 at ohci3: USB revision 1.0
uhub2 at usb2 "NEC OHCI root hub" rev 1.00/1.00 addr 1
usb3 at ohci0: USB revision 1.0
uhub3 at usb3 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
usb4 at ohci1: USB revision 1.0
uhub4 at usb4 "Acer Labs OHCI root hub" rev 1.00/1.00 addr 1
"ppm" at mainbus0 not configured
schizo1 at mainbus0: "Tomatillo", version 4, ign 7c0, bus B 0 to 0
schizo1: dvma map c000-dfff
pci2 at schizo1
bge0 at pci2 dev 2 function 0 "Broadcom BCM5703" rev 0x00, BCM5702/5703 A2 
(0x1002): ivec 0x7dc, address 00:14:4f:23:72:a2
brgphy0 at bge0 phy 1: BCM5703 10/100/1000baseT PHY, rev. 2
ifb0 at pci2 dev 3 function 0 "3D Labs Wildcat 5110" rev 0x01
ifb0: XVR-1200 (SUNW,375-3101), 1152x900
wsdisplay0 at ifb0 mux 1
wsdisplay0: screen 0 added (std, sun emulation)
"3D Labs Wildcat 5110" rev 0x01 at pci2 dev 3 function 1 not configured
"i2c" at mainbus0 not configured
vscsi0 at root
scsibus0 at vscsi0: 256 targets
softraid0 at root
scsibus1 at softraid0: 256

Re: ral rt2661 tx interrupt race fix

2012-08-15 Thread Stefan Sperling
On Tue, Aug 14, 2012 at 08:31:31PM +0200, Tobias Ulmer wrote:
> Finally got around to dig out the card and put it in a Blade 1500. With
> -current, I get around 10Mb using tcpbench. With your diff,
> freelist corruption messages pop up rather quickly, speed is the same. I
> guess the panic at the end is just an effect of the modify after free.

Thanks for catching this, Tobias!

I could reproduce this problem on a Blade 100 with a Ralink RT2561S card.
This updated diff should fix the problem. It works well for me in both
hostap and STA modes.

Changes relative to the last diff:

  - Zero the amn->rn->amn pointer before freeing an amrr node.
This prevents the use after free.

  - Fix a NULL rn->amn dereference.

  - Avoid builiding up a large list of unused amrr nodes.
This would happen when running lots of scans, for example.
Free unused amrr nodes if the amrr node list is full, and also
periodically clean the list from the amrr timeout (once per second).

  - Wrap amrr node list manipulation in splnet() for safety.

Index: rt2661.c
===
RCS file: /cvs/src/sys/dev/ic/rt2661.c,v
retrieving revision 1.67
diff -u -p -r1.67 rt2661.c
--- rt2661.c17 Jul 2012 14:43:12 -  1.67
+++ rt2661.c15 Aug 2012 20:04:34 -
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -57,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -88,6 +90,8 @@ void  rt2661_reset_rx_ring(struct rt2661
 void   rt2661_free_rx_ring(struct rt2661_softc *,
struct rt2661_rx_ring *);
 struct ieee80211_node *rt2661_node_alloc(struct ieee80211com *);
+void   rt2661_node_free(struct ieee80211com *,
+   struct ieee80211_node *);
 intrt2661_media_change(struct ifnet *);
 void   rt2661_next_scan(void *);
 void   rt2661_iter_func(void *, struct ieee80211_node *);
@@ -115,7 +119,7 @@ uint16_trt2661_txtime(int, int, uint32_
 uint8_trt2661_plcp_signal(int);
 void   rt2661_setup_tx_desc(struct rt2661_softc *,
struct rt2661_tx_desc *, uint32_t, uint16_t, int, int,
-   const bus_dma_segment_t *, int, int);
+   const bus_dma_segment_t *, int, int, u_int8_t);
 intrt2661_tx_mgt(struct rt2661_softc *, struct mbuf *,
struct ieee80211_node *);
 intrt2661_tx_data(struct rt2661_softc *, struct mbuf *,
@@ -156,6 +160,14 @@ intrt2661_prepare_beacon(struct rt2661
 #endif
 void   rt2661_enable_tsf_sync(struct rt2661_softc *);
 intrt2661_get_rssi(struct rt2661_softc *, uint8_t);
+struct rt2661_amrr_node *rt2661_amrr_node_alloc(struct ieee80211com *,
+   struct rt2661_node *);
+void   rt2661_amrr_node_free(struct rt2661_softc *,
+   struct rt2661_amrr_node *);
+void   rt2661_amrr_node_free_all(struct rt2661_softc *);
+void   rt2661_amrr_node_free_unused(struct rt2661_softc *);
+struct rt2661_amrr_node *rt2661_amrr_node_find(struct 
rt2661_softc *,
+   u_int8_t);
 
 static const struct {
uint32_treg;
@@ -195,6 +207,8 @@ rt2661_attach(void *xsc, int id)
timeout_set(&sc->amrr_to, rt2661_updatestats, sc);
timeout_set(&sc->scan_to, rt2661_next_scan, sc);
 
+   TAILQ_INIT(&sc->amn);
+
/* wait for NIC to initialize */
for (ntries = 0; ntries < 1000; ntries++) {
if ((val = RAL_READ(sc, RT2661_MAC_CSR0)) != 0)
@@ -344,6 +358,8 @@ rt2661_attachhook(void *xsc)
if_attach(ifp);
ieee80211_ifattach(ifp);
ic->ic_node_alloc = rt2661_node_alloc;
+   sc->sc_node_free = ic->ic_node_free;
+   ic->ic_node_free = rt2661_node_free;
ic->ic_newassoc = rt2661_newassoc;
ic->ic_updateslot = rt2661_updateslot;
 
@@ -377,6 +393,7 @@ rt2661_detach(void *xsc)
timeout_del(&sc->amrr_to);
 
ieee80211_ifdetach(ifp);/* free all nodes */
+   rt2661_amrr_node_free_all(sc);
if_detach(ifp);
 
for (ac = 0; ac < 4; ac++)
@@ -705,11 +722,117 @@ rt2661_free_rx_ring(struct rt2661_softc 
}
 }
 
+struct rt2661_amrr_node *
+rt2661_amrr_node_alloc(struct ieee80211com *ic, struct rt2661_node *rn)
+{
+   struct rt2661_softc *sc = ic->ic_softc;
+   struct rt2661_amrr_node *amn;
+   int s;
+
+   if (sc->amn_count >= RT2661_AMRR_NODES_MAX)
+   rt2661_amrr_node_free_unused(sc);
+   if (sc->amn_count >= RT2661_AMRR_NODES_MAX)
+   return NULL;
+
+   amn = malloc(sizeof (struct rt2661_amrr_node), M_DEVBUF,
+   M_NOWAIT | M_ZERO);
+
+   if (amn) {
+   s = splnet();
+   amn->id = sc->amn_count++;
+   amn->rn = rn;
+   TAILQ_INSERT_TAIL(&sc->amn, amn, entry);

Re: ral rt2661 tx interrupt race fix

2012-08-16 Thread Tobias Ulmer
On Wed, Aug 15, 2012 at 10:17:09PM +0200, Stefan Sperling wrote:
> On Tue, Aug 14, 2012 at 08:31:31PM +0200, Tobias Ulmer wrote:
> > Finally got around to dig out the card and put it in a Blade 1500. With
> > -current, I get around 10Mb using tcpbench. With your diff,
> > freelist corruption messages pop up rather quickly, speed is the same. I
> > guess the panic at the end is just an effect of the modify after free.
> 
> Thanks for catching this, Tobias!
> 
> I could reproduce this problem on a Blade 100 with a Ralink RT2561S card.
> This updated diff should fix the problem. It works well for me in both
> hostap and STA modes.

Survived 9h of tcpbench testing here, no problems (except for a low
data rate today, but there was no difference to the stock kernel)