Re: [ath9k-devel] [PATCH] ath9k: Restart xmit logic in xmit watchdog.
On 2011-01-09 9:39 PM, Ben Greear wrote: On 01/09/2011 10:19 AM, Felix Fietkau wrote: On 2011-01-09 12:46 AM, gree...@candelatech.com wrote: diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c index d9a4144..1b3a62c 100644 --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -1988,19 +1988,30 @@ static void ath_tx_rc_status(struct ath_buf *bf, struct ath_tx_status *ts, tx_info-status.rates[tx_rateindex].count = ts-ts_longretry + 1; } -static void ath_wake_mac80211_queue(struct ath_softc *sc, int qnum) +/* Has no locking. */ +static void __ath_wake_mac80211_queue(struct ath_softc *sc, struct ath_txq *txq) { - struct ath_txq *txq; - - txq = sc-tx.txq_map[qnum]; - spin_lock_bh(txq-axq_lock); if (txq-stopped txq-pending_frames ATH_MAX_QDEPTH) { - if (ath_mac80211_start_queue(sc, qnum)) + if (ath_mac80211_start_queue(sc, txq-axq_qnum)) txq-stopped = 0; } +} This part is quite broken, I think you got confused with various types of queue numbers. txq-axq_qnum refers to the atheros hw queue index, whereas the qnum argument to this function refers to the mac80211 queue index (which is also the correct index for sc-tx.txq_map - not to be confused with the sc-tx.txq array). Yeah, I am confused on all of this. Looks like I should add a member to the txq struct to record it's mac80211 index and use that instead? How about just passing the proper qnum? You can get it from the skb queue mapping anyway. In the upstream code, is this correct? It seems to me that it should always be waking 'txq' since it just completed a packet. Why the check against txq_map? if (txq == sc-tx.txq_map[qnum]) ath_wake_mac80211_queue(sc, qnum); Things like CAB (or maybe UAPSD at some point), where a frame might go out through a queue other than the 4 WMM data queues. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH] ath9k: Implement rx copy-break.
On 2011-01-08 8:33 AM, gree...@candelatech.com wrote: From: Ben Greeargree...@candelatech.com This saves us constantly allocating large, multi-page skbs. It should fix the order-1 allocation errors reported, and in a 60-vif scenario, this significantly decreases CPU utilization, and latency, and increases bandwidth. Signed-off-by: Ben Greeargree...@candelatech.com --- :100644 100644 b2497b8... ea2f67c... M drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/ath/ath9k/recv.c | 92 ++--- 1 files changed, 61 insertions(+), 31 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/recv.c b/drivers/net/wireless/ath/ath9k/recv.c index b2497b8..ea2f67c 100644 --- a/drivers/net/wireless/ath/ath9k/recv.c +++ b/drivers/net/wireless/ath/ath9k/recv.c @@ -1702,42 +1704,70 @@ int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp) unlikely(tsf_lower - rs.rs_tstamp 0x1000)) rxs-mactime += 0x1ULL; - /* Ensure we always have an skb to requeue once we are done - * processing the current buffer's skb */ - requeue_skb = ath_rxbuf_alloc(common, common-rx_bufsize, GFP_ATOMIC); - - /* If there is no memory we ignore the current RX'd frame, - * tell hardware it can give us a new frame using the old - * skb and put it at the tail of the sc-rx.rxbuf list for - * processing. */ - if (!requeue_skb) - goto requeue; - - /* Unmap the frame */ - dma_unmap_single(sc-dev, bf-bf_buf_addr, - common-rx_bufsize, - dma_type); + len = rs.rs_datalen + ah-caps.rx_status_len; + if (use_copybreak) { + skb = netdev_alloc_skb(NULL, len); + if (!skb) { + skb = bf-bf_mpdu; + use_copybreak = false; + goto non_copybreak; + } + } else { I think this should be dependent on packet size, maybe even based on the architecture. Especially on embedded hardware, copying large frames is probably quite a bit more expensive than allocating large buffers. Cache sizes are small, memory access takes several cycles, especially during concurrent DMA. Once I'm back home, I could try a few packet size threshold to find a sweet spot for the typical MIPS hardware that I'm playing with. I expect a visible performance regression from this patch when applied as-is. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH] ath9k: Implement rx copy-break.
On 2011-01-08 5:36 PM, Ben Greear wrote: On 01/08/2011 04:20 PM, Felix Fietkau wrote: I think this should be dependent on packet size, maybe even based on the architecture. Especially on embedded hardware, copying large frames is probably quite a bit more expensive than allocating large buffers. Cache sizes are small, memory access takes several cycles, especially during concurrent DMA. Once I'm back home, I could try a few packet size threshold to find a sweet spot for the typical MIPS hardware that I'm playing with. I expect a visible performance regression from this patch when applied as-is. I see a serious performance improvement with this patch. My current test is sending 1024 byte UDP payloads to/from each of 60 stations at 128kbps. Please do try it out on your system and see how it performs there. I'm guessing that any time you have more than 1 VIF this will be a good improvement since mac80211 does skb_copy (and you would typically be copying a much smaller packet with this patch). If we do see performance differences on different platforms, this could perhaps be something we could tune at run-time. What kind of system are you testing on? If it's a PC, then the performance characteristics will be completely different compared to embedded hardware. I've had to remove a few copybreak-like implementations from various ethernet drivers on similar hardware, because even taking the hit of unaligned load/store exceptions (which are already *very* expensive on MIPS) was less than copying the full packet data, even with packet sizes less than what you're using. I don't have suitable test hardware with me right now, but I'll do some tests in a week or so. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH v2 3/3] ath9k: Keep track of stations for debugfs.
On 2011-01-07 1:12 PM, Luis R. Rodriguez wrote: On Thu, Jan 06, 2011 at 08:49:12PM -0800, gree...@candelatech.com wrote: From: Ben Greeargree...@candelatech.com The stations hold the ath_node, which holds the tid and other xmit logic structures. In order to debug stuck xmit logic, we need a way to print out the tid state for the stations. Signed-off-by: Ben Greeargree...@candelatech.com --- v1 - v2: Use linked list instead of array. Protect with spinlock. Again, see my comments about the # STAs limit. I think this can go in as a cfg80211 driver limitation which can be exposed. If you want to go over the supported number (known to work, safe, call it what you want) then a kconfig option can be used. I disagree with putting in an essentially arbitrary restriction based on what has been tested, unless we can actually point at a *specific* limitation that makes the limit necessary. If ath9k gets unstable with too many STA interfaces, then we should identify the underlying cause or fix it. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH 1/4] initvals: update AR9003 initvals based on latest from Atheros
On 2010-12-13 6:48 PM, Luis R. Rodriguez wrote: On Sun, Dec 12, 2010 at 05:30:31AM -0800, Felix Fietkau wrote: Signed-off-by: Felix Fietkau n...@openwrt.org Hey Felix, thanks a lot. These didn't apply though, do you have this as your top patch from upstream? commit f1379b3189509355e29eed3b4bb110ea9829a555 Author: Luis R. Rodriguez lrodrig...@atheros.com Date: Fri Nov 19 09:32:23 2010 -0800 initvals: forgot to remove one file dependency on 2p0 removal Signed-off-by: Luis R. Rodriguez lrodrig...@atheros.com Yes, I did have that. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] Script to crash ath9k with DMA errors.
On 2010-12-06 9:28 PM, Ben Greear wrote: On 12/06/2010 11:53 AM, Luis R. Rodriguez wrote: On Mon, Dec 06, 2010 at 11:53:13AM -0800, Luis Rodriguez wrote: On Mon, Dec 06, 2010 at 11:47:47AM -0800, Ben Greear wrote: On 12/06/2010 11:36 AM, Luis R. Rodriguez wrote: Can you clarify the status of this issue. It remains unclear to me from your above description how things are going. As I read it some things look OK now but you still get a warning. Ok, since you asked :) I worked on this over the weekend and this morning. I had all sorts of issues until I realized that I had one STA with non-configured SSID. It sometimes connected to one /a AP and the other STAs attempted to connect to another /n (on entirely different band) AP. I basically got zero stations associated for any length of time due to constant channel switching. No crashes, but lots of warnings about DMA failing to stop. Now..I've fixed this configuration issue (and adding steps to help prevent this mis-configuration again). With 16 properly configured non-encrypted stations, running with wpa-supplicant with netlink driver sharing scan results, the interfaces quickly associate. However, I do continue to see DMA warnings such as these (I had picked up my portable phone, and it knocked all the interfaces offline ..here they are coming back up after I hung up the phone). Please note that I ported Felix's 2.6.37 patch he posted this morning to wireless-testing and have applied it. I'm highly tempted to just make that a WARN_ON_ONCE so at least my logs aren't spammed so heavily with the recv.c:531 DMA warning. You can send this change upstream as well. Also, feel free to limit the number of STAs you can have up physically by setting this to a number you bless yourself. I have a feeling there is no hard limit..but if I do find one, I'll cook up a patch. Probably not many of us ever going to push anywhere near what I'm trying, and folks like me can limit in user-space if wanted... I'll do up the warn-on-once patch shortly. By the way, would you consider this channel-change suppression patch, or something similar? drivers/net/wireless/ath/ath9k/main.c index f026a03..6c1c43b 100644 @@ -1605,6 +1605,16 @@ static int ath9k_config(struct ieee80211_hw *hw, u32 changed) else sc-sc_flags = ~SC_OP_OFFCHANNEL; + /* If channels HT are the same, then don't actually do anything. + */ + if ((sc-sc_ah-curchan == sc-sc_ah-channels[pos]) + (aphy-chan_is_ht == conf_is_ht(conf))) { + ath_print(common, ATH_DBG_CONFIG, + Skip Set channel: %d MHz, already there.\n, + curchan-center_freq); + goto skip_chan_change; + } + I think this needs to check the offchannel flag as well, at least in one direction. Skipping on-channel - off-channel is fine, but the other way around might break calibration - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] Script to crash ath9k with DMA errors.
On 2010-12-03 9:14 AM, Ben Greear wrote: On 12/01/2010 03:22 PM, Ben Greear wrote: On 11/29/2010 04:44 PM, Luis R. Rodriguez wrote: On Mon, Nov 29, 2010 at 04:28:51PM -0800, Ben Greear wrote: BUG: unable to handle kernel NULL pointer dereference at 0040 IP: [f933470a] ath_tx_start+0x461/0x5ef [ath9k] *pde = Oops: [#1] SMP DEBUG_PAGEALLOC last sysfs file: /sys/devices/pci:00/:00:1e.0/:08:01.0/irq Modules linked in: aes_i586 aes_generic fuse nfs lockd fscache nfs_acl auth_rpcgss sunrpc ipv6 uinput arc4 ecb ath9k mac80211 ath9k_common ath9k_hw mi] Pid: 38, comm: kworker/u:1 Tainted: GW 2.6.37-rc3-wl+ #53 PDSBM/PDSBM EIP: 0060:[f933470a] EFLAGS: 00010246 CPU: 1 EIP is at ath_tx_start+0x461/0x5ef [ath9k] Please use gdb drivers/net/wireless/ath/ath9k/ l *(ath_tx_start+0x461) Luis I managed to hit that ath_tx_start crash again, and this time there were no obvious DMA or irq errors immediately preceding it. So, it might be a real bug after all. I'll add some extra checks to see if tid-ac is NULL. I've made some small progress on this general issue. First, I added all sorts of debugging to try to figure out ath_tx_start crash. As best as I can tell, 'tid' is not NULL, but also is not a valid pointer, and probably something close to 0x0. I've added yet more debugging, but haven't hit the problem again. I also tried stopping DMA in a loop up to 5 times if it failed to stop previously in the loop. This did not appear to help at all. I also managed to make both the ath_tx_start crash and the DMA errors very hard to reproduce (I dare not say fixed, yet). It appears that this small patch (and possibly, the fact that I set debugging to 0x600 instead of 0x400) makes the problems go away. This makes me wonder if a root cause is something to do with repeatedly resetting the hardware too fast, as setting channels rapidly would tend to do that, and channels are set on association by supplicant, it appears. Please try this patch while leaving the unnecessary resets in place. I found that when ath_drain_all_txq finds tx dma not stopped, it will issue a reset at a point in time where it is both useless (since it's right before a reset anyway) and dangerous (since the rx dma engine isn't even disabled yet), so IMHO the right thing to do is to drop this extra reset. --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -1194,18 +1194,8 @@ void ath_drain_all_txq(struct ath_softc } } - if (npend) { - int r; - - ath_print(common, ATH_DBG_FATAL, - Failed to stop TX DMA. Resetting hardware!\n); - - r = ath9k_hw_reset(ah, sc-sc_ah-curchan, ah-caldata, false); - if (r) - ath_print(common, ATH_DBG_FATAL, - Unable to reset hardware; reset status %d\n, - r); - } + if (npend) + ath_print(common, ATH_DBG_FATAL, Failed to stop TX DMA!\n); for (i = 0; i ATH9K_NUM_TX_QUEUES; i++) { if (ATH_TXQ_SETUP(sc, i)) ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH wireless-next] ath: Rename ath_print to ath_debug
On 2010-12-01 3:27 PM, Joe Perches wrote: On Tue, 2010-11-30 at 23:56 -0800, Luis R. Rodriguez wrote: On Tue, Nov 30, 2010 at 12:19 PM, Joe Perches j...@perches.com wrote: Poor function naming is just that. It reduces readability and the uses are counter expectation. The name is perfect, we use it to print anything, even non-debugging stuff. 'fraid not. ath/debug.h #ifdef CONFIG_ATH_DEBUG void ath_print(struct ath_common *common, int dbg_mask, const char *fmt, ...) __attribute__ ((format (printf, 3, 4))); #else static inline void __attribute__ ((format (printf, 3, 4))) ath_print(struct ath_common *common, int dbg_mask, const char *fmt, ...) { } #endif /* CONFIG_ATH_DEBUG */ Now we're getting closer to something worth fixing. IMHO we should change the code so that ath_print(common, ATH_DBG_FATAL, ...) prints something even with CONFIG_ATH_DEBUG unset. To get this done, some renaming would make sense here. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH] fix endianity on ath9k_htc
On 2010-11-29 10:58 AM, Pavel Machek wrote: It seems struct eep_header lacks proper #ifdef BIG_ENDIAN_BITFIELD markup. eep_4k_header has proper markup, but two fields were swapped. Signed-off-by: Pavel Machek p...@sysgo.com diff --git a/drivers/net/wireless/ath/ath9k/eeprom.h b/drivers/net/wireless/ath/ath9k/eeprom.h index 3c99830..022589d 100644 --- a/drivers/net/wireless/ath/ath9k/eeprom.h +++ b/drivers/net/wireless/ath/ath9k/eeprom.h @@ -378,10 +374,14 @@ struct modal_eep_header { u8 xatten2Margin[AR5416_MAX_CHAINS]; u8 ob_ch1; u8 db_ch1; - u8 useAnt1:1, - force_xpaon:1, - local_bias:1, +#ifndef __BIG_ENDIAN_BITFIELD + u8 useAnt1:1, force_xpaon:1, local_bias:1, femBandSelectUsed:1, xlnabufin:1, xlnaisel:2, xlnabufmode:1; +#else + u8 xlnabufmode:1, xlnaisel:2, xlnabufin:1, + femBandSelectUsed:1, local_bias:1, force_xpaon:1, useAnt1:1; +#endif + u8 miscBits; u16 xpaBiasLvlFreq[3]; u8 futureModal[6]; I already sent a different fix for this part. See '[PATCH] ath9k_hw: fix more bitfield related endian issues' @@ -443,7 +444,7 @@ struct modal_eep_4k_header { u8 antdiv_ctl1:4, ob_4:4; u8 db1_3:4, db1_2:4; u8 antdiv_ctl2:4, db1_4:4; - u8 db2_2:4, db2_3:4; + u8 db2_3:4, db2_2:4; u8 reserved:4, db2_4:4; #else u8 ob_2:4, ob_3:4; Looks good. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH wireless-next] ath: Rename ath_print to ath_debug
On 2010-11-29 7:07 AM, Peter Stuge wrote: Luis R. Rodriguez wrote: On Sun, Nov 28, 2010 at 3:53 PM, Joe Perches j...@perches.com wrote: Make the function name match the function purpose. ath_debug is a debug only facility. ath_print seems too generic a name for a debug only use. Nack, I don't see the point. The point is to improve readability. I like the patch. And how exactly does this improve readability? Don't get me wrong, I generally like to see more cleanups merged to the ath/ath9k drivers (they do need it, after all). But in my opinion, simple renaming churn like this does nothing but annoy people who want to get other work done at the same time. Consider the large number of lines touched (and the potential merge conflicts with other code because of that), relative to the microscopic aesthetic gain (if any). Instead I'd like to see more cleanups that go beyond trivial function renames. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH wireless-next] ath: Rename ath_print to ath_debug
On 2010-11-30 2:39 AM, Joe Perches wrote: On Mon, 2010-11-29 at 23:41 +0100, Felix Fietkau wrote: On 2010-11-29 7:07 AM, Peter Stuge wrote: Luis R. Rodriguez wrote: On Sun, Nov 28, 2010 at 3:53 PM, Joe Perches j...@perches.com wrote: Make the function name match the function purpose. ath_debug is a debug only facility. ath_print seems too generic a name for a debug only use. Nack, I don't see the point. The point is to improve readability. I like the patch. And how exactly does this improve readability? Don't get me wrong, I generally like to see more cleanups merged to the ath/ath9k drivers (they do need it, after all). It's considered polite to cc the patch author. Sorry, I forgot to add you back. The Cc list was cleared somewhere in this thread and I added a few entries back, but apparently accidentally dropped some of them again. print is generic, debug is specific. This function has a specific use for debugging not a generic use all for logging. If it was ath_print(level, etc) with KERN_level passed as the first argument that'd be similar to other kernel calls. As is, it's not. But in my opinion, simple renaming churn like this does nothing but annoy people who want to get other work done at the same time. This sort of thing can be done when other changes have just been merged to minimize conflicts. Consider the large number of lines touched (and the potential merge conflicts with other code because of that), relative to the microscopic aesthetic gain (if any). Instead I'd like to see more cleanups that go beyond trivial function renames. I gauge my willingness to spend time on subsystems on the maintainers willingness to merge things that improve readability and correctness. I'm not trying to discourage you from spending time on improving this code, just doubting the usefulness of such simple function renames. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC] ath9k: fix beacon race conditions
On 2010-11-03 6:36 PM, Björn Smedman wrote: Hi all, The beacon processing in ath9k is done in a tasklet. This tasklet may race against beacon allocation/deallocation in process context. The patch below is an attempt to point out / avoid these race conditions. My hope is that this will stabilize ath9k in a use-case I'm interested in where ap vif interfaces are added and removed quite often. This is purely an RFC, and it's probably synchronizing a bit too much. I'm currently testing this patch with no obvious problems so far (except for the part in xmit.c which I've commented out). More testing is necessary but so is a rewrite of the patch anyway. Any thoughts? Best regards, Björn --- diff --git a/drivers/net/wireless/ath/ath9k/beacon.c b/drivers/net/wireless/ath/ath9k/beacon.c index 19891e7..f91e0b5 100644 --- a/drivers/net/wireless/ath/ath9k/beacon.c +++ b/drivers/net/wireless/ath/ath9k/beacon.c @@ -192,6 +192,7 @@ static struct ath_buf *ath_beacon_generate(struct ieee80211_hw *hw, if (sc-nvifs 1) { ath_print(common, ATH_DBG_BEACON, Flushing previous cabq traffic\n); + ath9k_hw_stoptxdma(sc-sc_ah, cabq-axq_qnum); ath_draintxq(sc, cabq, false); } } Makes sense diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c index b52f1cf..c3d2a36 100644 --- a/drivers/net/wireless/ath/ath9k/main.c +++ b/drivers/net/wireless/ath/ath9k/main.c @@ -290,9 +290,11 @@ int ath_set_channel(struct ath_softc *sc, struct ieee80211_hw *hw, ath9k_hw_set_interrupts(ah, ah-imask); if (!(sc-sc_flags (SC_OP_OFFCHANNEL))) { + tasklet_disable(sc-bcon_tasklet); ath_beacon_config(sc, NULL); ieee80211_queue_delayed_work(sc-hw, sc-tx_complete_work, 0); ath_start_ani(common); + tasklet_enable(sc-bcon_tasklet); } ps_restore: Why? @@ -838,6 +840,8 @@ static void ath9k_bss_assoc_info(struct ath_softc *sc, struct ath_common *common = ath9k_hw_common(ah); if (bss_conf-assoc) { + tasklet_disable(sc-bcon_tasklet); + ath_print(common, ATH_DBG_CONFIG, Bss Info ASSOC %d, bssid: %pM\n, bss_conf-aid, common-curbssid); @@ -861,6 +865,8 @@ static void ath9k_bss_assoc_info(struct ath_softc *sc, sc-sc_flags |= SC_OP_ANI_RUN; ath_start_ani(common); + + tasklet_enable(sc-bcon_tasklet); } else { ath_print(common, ATH_DBG_CONFIG, Bss Info DISASSOC\n); common-curaid = 0; Unnecessary, does not have anything to do with *sending* beacons. @@ -903,8 +909,11 @@ void ath_radio_enable(struct ath_softc *sc, struct ieee80211_hw *hw) } spin_unlock_bh(sc-rx.pcu_lock); - if (sc-sc_flags SC_OP_BEACONS) + if (sc-sc_flags SC_OP_BEACONS) { + tasklet_disable(sc-bcon_tasklet); ath_beacon_config(sc, NULL);/* restart beacons */ + tasklet_enable(sc-bcon_tasklet); + } /* Re-Enable interrupts */ ath9k_hw_set_interrupts(ah, ah-imask); @@ -1008,8 +1017,11 @@ int ath_reset(struct ath_softc *sc, bool retry_tx) */ ath_update_txpow(sc); - if ((sc-sc_flags SC_OP_BEACONS) || !(sc-sc_flags (SC_OP_OFFCHANNEL))) + if ((sc-sc_flags SC_OP_BEACONS) || !(sc-sc_flags (SC_OP_OFFCHANNEL))) { + tasklet_disable(sc-bcon_tasklet); ath_beacon_config(sc, NULL);/* restart beacons */ + tasklet_enable(sc-bcon_tasklet); + } ath9k_hw_set_interrupts(ah, ah-imask); Why? @@ -1517,6 +1529,8 @@ static void ath9k_remove_interface(struct ieee80211_hw *hw, mutex_lock(sc-mutex); + tasklet_disable(sc-bcon_tasklet); + /* Stop ANI */ sc-sc_flags = ~SC_OP_ANI_RUN; del_timer_sync(common-ani.timer); @@ -1544,6 +1558,8 @@ static void ath9k_remove_interface(struct ieee80211_hw *hw, sc-nvifs--; + tasklet_enable(sc-bcon_tasklet); + mutex_unlock(sc-mutex); } Makes sense. @@ -1817,6 +1833,8 @@ static int ath9k_conf_tx(struct ieee80211_hw *hw, u16 queue, mutex_lock(sc-mutex); + tasklet_disable(sc-bcon_tasklet); + memset(qi, 0, sizeof(struct ath9k_tx_queue_info)); qi.tqi_aifs = params-aifs; @@ -1839,6 +1857,8 @@ static int ath9k_conf_tx(struct ieee80211_hw *hw, u16 queue, if ((qnum == sc-tx.hwq_map[WME_AC_BE]) !ret) ath_beaconq_config(sc); + tasklet_enable(sc-bcon_tasklet); + mutex_unlock(sc-mutex); return ret; I don't think that one's necessary. @@ -1930,13 +1950,16 @@ static void ath9k_bss_info_changed(struct ieee80211_hw *hw, /* Enable
Re: [ath9k-devel] [RFC] ath9k: fix tx queue selection
On 2010-11-03 12:35 PM, Björn Smedman wrote: This is one good looking patch. :) And I agree, looking at the header qos is good to avoid. But there is still the risk of queue selection mismatch as I see it... See comments below. /Björn - /* XXX: Remove me once we don't depend on ath9k_channel for all * this redundant data */ void ath9k_update_ichannel(struct ath_softc *sc, struct ieee80211_hw *hw, @@ -1244,7 +1193,6 @@ static int ath9k_tx(struct ieee80211_hw struct ath_tx_control txctl; int padpos, padsize; struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb-data; - int qnum; if (aphy-state != ATH_WIPHY_ACTIVE aphy-state != ATH_WIPHY_SCAN) { ath_print(common, ATH_DBG_XMIT, @@ -1317,8 +1265,7 @@ static int ath9k_tx(struct ieee80211_hw memmove(skb-data, skb-data + padsize, padpos); } - qnum = ath_get_hal_qnum(skb_get_queue_mapping(skb), sc); - txctl.txq = sc-tx.txq[qnum]; + txctl.txq = sc-tx.txq_map[skb_get_queue_mapping(skb)]; Could we be indexing txq_map[] out of bounds here? I guess this question is the fundamental one: can we be sure that skb_get_queue_mapping(skb) will return an AC i.e. range 0-3? Not only now but forever? Or do we need a comment in mac80211 saying driver will crash if anything else is returned? mac80211 already makes the same assumption in several places. It should never be out of bounds unless something in the network stack is broken. And if there is a need for a defensive check for this, then ath9k is definitely not the right place for it. @@ -1756,8 +1744,9 @@ int ath_tx_start(struct ieee80211_hw *hw * we will at least have to run TX completionon one buffer * on the queue */ spin_lock_bh(txq-axq_lock); - if (!txq-stopped txq-axq_depth 1) { - ath_mac80211_stop_queue(sc, skb_get_queue_mapping(skb)); + if (txq == sc-tx.txq_map[q] !txq-stopped + txq-axq_depth 1) { + ath_mac80211_stop_queue(sc, q); Again, possible index out of bounds, no? Also, what happens if txq != sc-tx.txq_map[q]? I guess that's less catastrophic but still; meaningful code will not execute. I added that check primarily for the case where buffered frames go through the cabq. txq-stopped = 1; } spin_unlock_bh(txq-axq_lock); @@ -1767,13 +1756,10 @@ int ath_tx_start(struct ieee80211_hw *hw return r; } - q = skb_get_queue_mapping(skb); - if (q = 4) - q = 0; - spin_lock_bh(txq-axq_lock); - if (++sc-tx.pending_frames[q] ATH_MAX_QDEPTH !txq-stopped) { - ath_mac80211_stop_queue(sc, skb_get_queue_mapping(skb)); + if (txq == sc-tx.txq_map[q] + ++txq-pending_frames ATH_MAX_QDEPTH !txq-stopped) { + ath_mac80211_stop_queue(sc, q); Same as above. txq-stopped = 1; } spin_unlock_bh(txq-axq_lock); @@ -1887,12 +1873,12 @@ static void ath_tx_complete(struct ath_s if (unlikely(tx_info-pad[0] ATH_TX_INFO_FRAME_TYPE_INTERNAL)) ath9k_tx_status(hw, skb); else { - q = skb_get_queue_mapping(skb); - if (q = 4) - q = 0; + struct ath_txq *txq; - if (--sc-tx.pending_frames[q] 0) - sc-tx.pending_frames[q] = 0; + q = skb_get_queue_mapping(skb); + txq = sc-tx.txq_map[q]; + if (--txq-pending_frames 0) + txq-pending_frames = 0; This is off topic, cut do we really need this? Where do those missing frames go? :) I would much prefer a BUG_ON(txq-pending_frames 0). BUG_ON is not a good idea, it's only supposed to be used for cases with potentially severe side effects, things like random memory corruption. A counting imbalance here would be completely harmless, so at most we should have a WARN_ON_ONCE here. spin_lock_bh(txq-axq_lock); if (!list_empty(txq-txq_fifo_pending)) { @@ -2375,7 +2366,7 @@ void ath_tx_node_init(struct ath_softc * for (acno = 0, ac = an-ac[acno]; acno WME_NUM_AC; acno++, ac++) { ac-sched= false; - ac-qnum = sc-tx.hwq_map[acno]; + ac-txq = sc-tx.txq_map[acno]; INIT_LIST_HEAD(ac-tid_q); } } @@ -2385,17 +2376,13 @@ void ath_tx_node_cleanup(struct ath_soft struct ath_atx_ac *ac; struct ath_atx_tid *tid; struct ath_txq *txq; - int i, tidno; + int tidno; for (tidno = 0, tid = an-tid[tidno]; tidno WME_NUM_TID; tidno++, tid++) { - i = tid-ac-qnum; - - if (!ATH_TXQ_SETUP(sc, i)) -
Re: [ath9k-devel] [RFC] ath9k: fix tx queue selection
On 2010-11-03 5:27 PM, Björn Smedman wrote: It comes down to this: either we look at the header qos when we select the queue (so the above cannot happen) or we relay on mac80211 to set the header qos and the skb queue mapping in a certain way. If we choose the later I vote for a BUG_ON(txctl-txq != tid-ac-txq) in ath_tx_send_ampdu(). For regular QoS data frames (and no other frames ever hit the aggregation code) there is only one possible way to map tid - ac - queue. I did review those code paths, and I believe them to be safe. If you want, we can add a WARN_ON_ONCE later, but definitely no BUG_ON. I've briefly looked through the IEEE Std 802.11e-2005. There is a clear requirement that There shall be no reordering of unicast MSDUs with the same TID value and addressed to the same destination in analog to what Hulmut pointed out earlier. Other than that the only reference I can find is that: The MAC data service for QSTAs shall incorporate a TID with each MA-UNITDATA.request service. This TID associates the MSDU with the AC or TS queue for the indicated traffic. Why are you sure there is only one way to map tid - ac - queue? I don't think it's hard to come up with a case where you want to map differently (e.g. depending on RA or even TA). Take a look at Table 9-1 on page 253 (PDF page 301) in 802.11-2007. Ok, regardless. So lets say there is a bug in mac80211 that allows a mismatch between header qos tid and skb queue mapping to occur (which in fact there is because this happens all the time with my frame injection heavy app). Then it's ok for ath9k to screw up the locking, possibly corrupt data and so on, silently? I don't see potential for locking issues or data corruption here, even if such a bug were to show up. Other than that I guess that it's basically an argument about aesthetics, and you may very well be right. All I know is that I've been following ath9k development now for almost two years and I'm amazed by the severity of bugs that are still found, and I guess yet to be found. We're dma:ing all over the place, deadlocking queues and so on, on a regular basis, or at least we where 3 months ago. After each one of these is fixed the attitude seems to be now everything is perfect and suggesting there could be some more problems or will be in the future is just plain rude. Then yet another is found... I'm not saying we should assume that everything is always fine, but I do object to adding defensive code against made up scenarios of potential bugs that might be introduced at some point in the future. If you relay on something so fragile as the contents of frame data matching skb_get_queue_mapping() I think you owe me at least a WARN_ON_ONCE before you start corrupting memory. ;) The assumption that I make is not just about some random field in the frame contents. I'm assuming that ieee80211_select_queue() makes a sane decision that matches the description in the standard, and that the network stack preserves the decision that this function made. And besides - it's not like part of ath9k that cares about the TID is going to live on for much longer :) - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC] ath9k: fix tx queue selection
On 2010-11-03 6:31 PM, Björn Smedman wrote: 2010/11/3 Felix Fietkau n...@openwrt.org: On 2010-11-03 5:27 PM, Björn Smedman wrote: Ok, regardless. So lets say there is a bug in mac80211 that allows a mismatch between header qos tid and skb queue mapping to occur (which in fact there is because this happens all the time with my frame injection heavy app). Then it's ok for ath9k to screw up the locking, possibly corrupt data and so on, silently? I don't see potential for locking issues or data corruption here, even if such a bug were to show up. Then I think this is the only point we really disagree on. :) It goes like this. When we get to ath_tx_start_dma() there already is a txq assigned (passed as txctl-txq) and we lock that txq. Then, if it's aggregation data we look up the tid: an = (struct ath_node *)tx_info-control.sta-drv_priv; tid = ATH_AN_2_TID(an, bf-bf_tidno); Notice how bf-bf_tidno is used. This contains the TID from the 802.11 header qos field. That means tid-ac-txq may not be the same as txctl-txq if there is a mismatch between frame data and skb queue mapping. Now we call ath_tx_send_ampdu() which presumes the txq (and therefore the associated tid) is already locked and starts fiddling with e.g. tid-buf_q, in this case without holding tid-ac-txq-axq_lock. This is racy e.g. against ath_draintxq() / ath_txq_drain_pending_buffers() which does not know about this madness and locks the correct txq. Hmm, I guess you have a point there ;) I'm not saying we should assume that everything is always fine, but I do object to adding defensive code against made up scenarios of potential bugs that might be introduced at some point in the future. I can see your point. I don't want lots of defensive stuff (like what you removed in your patch). But I still feel the balance is wrong. Take one recent case for example: We're not 100% sure we can always stop RX dma. In fact, a few weeks ago we weren't even sure what we didn't start it when we weren't supposed to. Yet for some reason there seems to be a consensus it is a good idea to keep ds_data of all those dma descriptors pointing at arbitrary kernel data. I realize it takes some time and adds some clutter to do ds_data = 0. I also understand it does not help in all cases. But I think it's a reasonable precaution under the circumstances. It's like in medicine, patients will die but when they do you want to be able to say we did everything we could. ;) Actually, when dealing with hardware pointers, I'm not sure setting them to 0 makes things any better, since 0 still points to a physical RAM location :) - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC] ath9k: fix tx queue selection
On 2010-11-02 6:13 PM, Felix Fietkau wrote: On 2010-11-02 5:13 PM, Björn Smedman wrote: Hi all, The following patch attempts to fix some problems with ath9k tx queue selection: 1. There was a posible mismatch between the queue selected for QoS packets (on which locking, queue start/stop and statistics where performed) and the queue actually used for TX. This is fixed by selecting the tx queue based on the TID of the 802.11 header for this type of packet. This should not be necessary. mac80211 should take care of queue selection properly for QoS frames. If it doesn't, then that is the bug that needs to be fixed... 2. Even with the above fix there was still a risk of mac80211 queue deadlock because the queue to stop was selected on the basis of the skb queue mapping whereas the queue to start was selected based on the hardware tx queue to mac80211 queue mapping. These may not in all situations be one and the same. Instead of working around this, we need to make sure that those are always identical. This patch is against latest wireless-testing but I've only tested it against compat-wireless-2010-10-19 with openwrt patches on top (including Luis latest pcu lock patch) and some other patches I'm working on. If you can run wireless-testing directly, please give it a spin. For me it's a big improvement in stability under high tx/rx load. Any thoughts? Let's do a thorough review of the tx path and figure out where the mismatch is actually coming from. Björn, how about this instead? Please test if it improves the stability in your tests. --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -1747,6 +1747,7 @@ int ath_tx_start(struct ieee80211_hw *hw return -1; } + q = ath_get_mac80211_qnum(txq-axq_class, sc); r = ath_tx_setup_buffer(hw, bf, skb, txctl); if (unlikely(r)) { ath_print(common, ATH_DBG_FATAL, TX mem alloc failure\n); @@ -1756,8 +1757,8 @@ int ath_tx_start(struct ieee80211_hw *hw * we will at least have to run TX completionon one buffer * on the queue */ spin_lock_bh(txq-axq_lock); - if (!txq-stopped txq-axq_depth 1) { - ath_mac80211_stop_queue(sc, skb_get_queue_mapping(skb)); + if (q = 0 !txq-stopped txq-axq_depth 1) { + ath_mac80211_stop_queue(sc, q); txq-stopped = 1; } spin_unlock_bh(txq-axq_lock); @@ -1767,13 +1768,10 @@ int ath_tx_start(struct ieee80211_hw *hw return r; } - q = skb_get_queue_mapping(skb); - if (q = 4) - q = 0; - spin_lock_bh(txq-axq_lock); - if (++sc-tx.pending_frames[q] ATH_MAX_QDEPTH !txq-stopped) { - ath_mac80211_stop_queue(sc, skb_get_queue_mapping(skb)); + if (q = 0 ++sc-tx.pending_frames[q] ATH_MAX_QDEPTH + !txq-stopped) { + ath_mac80211_stop_queue(sc, q); txq-stopped = 1; } spin_unlock_bh(txq-axq_lock); @@ -1841,7 +1839,8 @@ exit: /*/ static void ath_tx_complete(struct ath_softc *sc, struct sk_buff *skb, - struct ath_wiphy *aphy, int tx_flags) + struct ath_wiphy *aphy, struct ath_txq *txq, + int tx_flags) { struct ieee80211_hw *hw = sc-hw; struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb); @@ -1887,11 +1886,8 @@ static void ath_tx_complete(struct ath_s if (unlikely(tx_info-pad[0] ATH_TX_INFO_FRAME_TYPE_INTERNAL)) ath9k_tx_status(hw, skb); else { - q = skb_get_queue_mapping(skb); - if (q = 4) - q = 0; - - if (--sc-tx.pending_frames[q] 0) + q = ath_get_mac80211_qnum(txq-axq_class, sc); + if (q = 0 --sc-tx.pending_frames[q] 0) sc-tx.pending_frames[q] = 0; ieee80211_tx_status(hw, skb); @@ -1928,7 +1924,7 @@ static void ath_tx_complete_buf(struct a complete(sc-paprd_complete); } else { ath_debug_stat_tx(sc, txq, bf, ts); - ath_tx_complete(sc, skb, bf-aphy, tx_flags); + ath_tx_complete(sc, skb, bf-aphy, txq, tx_flags); } /* At this point, skb (bf-bf_mpdu) is consumed...make sure we don't * accidentally reference it later. ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC] ath9k: fix tx queue selection
On 2010-11-02 8:16 PM, Björn Smedman wrote: 2010/11/2 Felix Fietkau n...@openwrt.org: On 2010-11-02 7:20 PM, Björn Smedman wrote: 2010/11/2 Felix Fietkau n...@openwrt.org: + q = ath_get_mac80211_qnum(txq-axq_class, sc); r = ath_tx_setup_buffer(hw, bf, skb, txctl); if (unlikely(r)) { ath_print(common, ATH_DBG_FATAL, TX mem alloc failure\n); @@ -1756,8 +1757,8 @@ int ath_tx_start(struct ieee80211_hw *hw * we will at least have to run TX completionon one buffer * on the queue */ spin_lock_bh(txq-axq_lock); - if (!txq-stopped txq-axq_depth 1) { - ath_mac80211_stop_queue(sc, skb_get_queue_mapping(skb)); + if (q = 0 !txq-stopped txq-axq_depth 1) { + ath_mac80211_stop_queue(sc, q); txq-stopped = 1; } You cannot be sure that you are stopping the queue that the skb actually came in on here since mac80211 queues are mapped to hw queues by ath_get_hal_qnum() and that mapping is not reversible (due to the default statement): How does the default statement matter here? The queue number always comes from an index of the ieee802_1d_to_ac[] array, which only contains numbers from 0 to 3. That should make the conversion reversible. True, but then you have a functional dependency on that data/code (with catastrophic consequences if it ever changes). I understand the name of that array suggests that it will be fixed forever but I don't think we can be sure that a queue mapping will always be an AC. I think it may be very reasonable to expand it to be a TID (0-7) or even a separate queue per RA and TID. Are you prepared to put a BUG_ON() under that default? If so that's a start. But it's not only the default statement that may make that mapping non-reversible. It could also be that e.g. sc-tx.hwq_map[WME_AC_VI] == sc-tx.hwq_map[WME_AC_VO]. You need some BUG_ONs there too and you better not try to support a chipset with less than 4 hw queues. How about this then? I decoupled the WME_AC_* definitions from the ath9k_hw queue subtypes, so that I could redefine them to the numbers used by mac80211. That gets rid of another crappy abstraction. With this patch, pending frames will only be counted for the case where the txq is correct wrt. skb queue mapping. I still don't think fetching the tid from the data buffer again is the right thing to do. I double checked ath9k's tid - ac conversion and it looks correct to me. --- a/drivers/net/wireless/ath/ath9k/ath9k.h +++ b/drivers/net/wireless/ath/ath9k/ath9k.h @@ -195,7 +195,6 @@ enum ATH_AGGR_STATUS { #define ATH_TXFIFO_DEPTH 8 struct ath_txq { - int axq_class; u32 axq_qnum; u32 *axq_link; struct list_head axq_q; @@ -208,11 +207,12 @@ struct ath_txq { struct list_head txq_fifo_pending; u8 txq_headidx; u8 txq_tailidx; + int pending_frames; }; struct ath_atx_ac { + struct ath_txq *txq; int sched; - int qnum; struct list_head list; struct list_head tid_q; }; @@ -290,12 +290,11 @@ struct ath_tx_control { struct ath_tx { u16 seq_no; u32 txqsetup; - int hwq_map[WME_NUM_AC]; spinlock_t txbuflock; struct list_head txbuf; struct ath_txq txq[ATH9K_NUM_TX_QUEUES]; struct ath_descdma txdma; - int pending_frames[WME_NUM_AC]; + struct ath_txq *txq_map[WME_NUM_AC]; }; struct ath_rx_edma { @@ -325,7 +324,6 @@ void ath_rx_cleanup(struct ath_softc *sc int ath_rx_tasklet(struct ath_softc *sc, int flush, bool hp); struct ath_txq *ath_txq_setup(struct ath_softc *sc, int qtype, int subtype); void ath_tx_cleanupq(struct ath_softc *sc, struct ath_txq *txq); -int ath_tx_setup(struct ath_softc *sc, int haltype); void ath_drain_all_txq(struct ath_softc *sc, bool retry_tx); void ath_draintxq(struct ath_softc *sc, struct ath_txq *txq, bool retry_tx); @@ -665,7 +663,6 @@ struct ath_wiphy { void ath9k_tasklet(unsigned long data); int ath_reset(struct ath_softc *sc, bool retry_tx); -int ath_get_mac80211_qnum(u32 queue, struct ath_softc *sc); int ath_cabq_update(struct ath_softc *); static inline void ath_read_cachesize(struct ath_common *common, int *csz) --- a/drivers/net/wireless/ath/ath9k/beacon.c +++ b/drivers/net/wireless/ath/ath9k/beacon.c @@ -28,7 +28,7 @@ int ath_beaconq_config(struct ath_softc struct ath_hw *ah = sc-sc_ah; struct ath_common *common = ath9k_hw_common(ah); struct ath9k_tx_queue_info qi, qi_be; - int qnum; + struct ath_txq *txq; ath9k_hw_get_txq_props(ah, sc-beacon.beaconq, qi); if (sc-sc_ah-opmode == NL80211_IFTYPE_AP) { @@ -38,8 +38,8 @@ int ath_beaconq_config(struct ath_softc qi.tqi_cwmax = 0; } else { /* Adhoc mode; important thing is to use 2x cwmin
Re: [ath9k-devel] ath9k: race conditions in dma
On 2010-11-01 5:44 PM, Luis R. Rodriguez wrote: 2010/11/1 Björn Smedman bjorn.smed...@venatech.se: On Mon, Nov 1, 2010 at 4:43 PM, Ben Gamari bgam...@gmail.com wrote: On Mon, 1 Nov 2010 16:17:23 +0100, Björn Smedman bjorn.smed...@venatech.se wrote: Hi all, I have an application that creates and destroys a lot of ap vifs and does a lot of monitor frame injection. The recent ath9k rx locking fixes have helped with stability in this use-case but there still seems to be some tx/beacon related race condition(s). These manifests themselves as follows on an AR913x based router running compat-wireless-2010-10-19 (with locking fixes etc from openwrt): 1. TX DMA hangs under simultaneous high RX and TX load 2. TX is completely hung but chip is never reset I have also observed both of these behaviors with just a standard hostapd single VIF configuration. Quite annoying. It seems to be better with recent wireless-testing trees. - Ben The next thing that looks racy to me is ath_beacon_alloc() vs ath_beacon_tasklet() in beacon.c. Beacon queue TX DMA is always stopped in main.c before calling ath_beacon_alloc() but ath_beacon_tasklet() is scheduled when we get an SWBA interrupt. My guess is that these keep coming even if we stop TX DMA on the beacon queue, no? My TX PCU patches for ath9k are not merged yet, try those or wait until John merges them. They are merged in OpenWrt. Björn, which OpenWrt revision did you use in your tests? - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH v2] ath9k: Properly initialize ath_common-cc_lock.
On 2010-10-16 12:04 AM, gree...@candelatech.com wrote: From: Ben Greear gree...@candelatech.com Otherwise, lockdep splats, at the least: [...] diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c index a4c5ed4..fdc25f9 100644 --- a/drivers/net/wireless/ath/ath9k/init.c +++ b/drivers/net/wireless/ath/ath9k/init.c @@ -577,6 +577,7 @@ static int ath9k_init_softc(u16 devid, struct ath_softc *sc, u16 subsysid, common-hw = sc-hw; common-priv = sc; common-debug_mask = ath9k_debug; + spin_lock_init(common-cc_lock); spin_lock_init(sc-wiphy_lock); spin_lock_init(sc-sc_resetlock); Makes sense. Please update ath5k as well, it also uses the lock. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC] ath9k: Insert wmb before linking dma descriptors
On 2010-10-09 5:19 PM, Björn Smedman wrote: On Tue, Oct 5, 2010 at 9:50 PM, Luis R. Rodriguez lrodrig...@atheros.com wrote: Felix is more familiar with this area so I'll let him chime with his ACK/NACK. Luis So Felix, what do you think? I realize it may not be a common problem on any currently popular platform, but don't you agree it is in principle unsafe to write to the ds_link field of dma descriptors without a guarantee that those writes will be performed in the intended order? Haven't had time to review this in detail yet. I'll take another look at it later... - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k, multiple stations, and AMPDUs
On 2010-09-22 6:33 AM, Ben Greear wrote: On 09/21/2010 03:41 PM, Felix Fietkau wrote: On 2010-09-21 10:19 PM, Ben Greear wrote: On 09/21/2010 12:32 PM, Felix Fietkau wrote: On 2010-09-21 9:28 PM, Johannes Berg wrote: On Tue, 2010-09-21 at 20:00 +0200, Felix Fietkau wrote: Could we just poke a pointer to the STA into the ath_buf structure? No, that doesn't work because of RCU. Well, it could work, if you walk all the structures upon sta_notify and remove now stale pointers (or just drop the frames or something). I think it would be much better to just add the helper function that checks the RA on STA lookup. Keeps things simple, especially since nothing else in the tx path needs the vif. How about this. Seems to do the trick on my system: diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c index 85a7323..09815a1 100644 --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -328,8 +328,7 @@ static void ath_tx_complete_aggr(struct ath_softc *sc, struct ath_txq *txq, rcu_read_lock(); - /* XXX: use ieee80211_find_sta! */ - sta = ieee80211_find_sta_by_hw(hw, hdr-addr1); + sta = tx_info-control.sta; As I mentioned in another email: at the time we get the tx status report, we have to consider the sta pointer stale. It may or may not still be valid. How about this one. I think it ensures that the sta will never be stale, No, it doesn't. At least not in AP mode. since it flushes the tx queue on vif removal. Minimal testing shows it working, but of course I might be missing something. In AP mode, a vif has multiple sta. And draining the queue when a sta gets removed is not a good idea. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] Crash Report
On 2010-09-22 1:51 PM, Masahiro Inoue wrote: Hello, The log when crashing with compat-wireless-2010-09-20 is sent. My kernel is 2.6.27. [ cut here ] kernel BUG at /home/miyabi/test/compat-wireless/compat-wireless-2010-09-20/drivers/net/wireless/ath/ath9k/xmit.c:190! I've sent some fixes for this to linux-wireless on September 20, which will make it into a new compat-wireless snapshot soon. If you want to test them manually, look for [PATCH 1/5] ath9k: clean up block ack window handling and the two following patches on the linux-wireless list and apply those. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k, multiple stations, and AMPDUs
On 2010-09-21 7:25 AM, Ben Greear wrote: I think I have narrowed down some problems I see when I create two STA interfaces on the same radio and associate with the same AP. I'm using wireless-testing plus some patches to the rx logic I posted earlier. It appears that the ath9k NIC does not transport AMPDUs properly, and after a few seconds, it's backlog is exhausted and the queues are stopped in ath_tx_start. ath9k currently looks up the sta by hw and not by vif on tx completion. That completely breaks aggregation in such a setup. Unfortunately I don't know an easy way to fix this yet. This will be fixed by my sw aggregation rewrite, but I don't know when I'll get that completed yet. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k-devel: mac80211 Rate Control
On 2010-09-21 10:37 AM, Lorna González wrote: Hello, I have some questions regarding mac80211 rate control: 1. Is minstrel the default rate control algorithm? minstrel_ht is for 802.11n, for legacy it falls back to minstrel. ath9k still uses its own rate control by default though - but I sent a patch that allows you to turn that off :) 2. If I built two interfaces on the mac80211 layer and I connect one interface i.e. wlan0 to an ieee802.11g AP and the second one i.e. wlan1 to a iee802.11n AP. Will the rate control work independent for each network or it would take for both networks the rate control of the iee802.11g/n? Has someone made a similar test? 3. Does rate control also aply for virtual wiphys? or since the mac80211 does not know about multiple radios there is not rate control implemented? No need to test for this specifically, since rate control works on each individual peer anyway, so it doesn't matter whether there's one virtual interface or any other number. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k, multiple stations, and AMPDUs
On 2010-09-21 2:08 PM, Ben Greear wrote: On 09/21/2010 03:10 AM, Felix Fietkau wrote: On 2010-09-21 7:25 AM, Ben Greear wrote: I think I have narrowed down some problems I see when I create two STA interfaces on the same radio and associate with the same AP. I'm using wireless-testing plus some patches to the rx logic I posted earlier. It appears that the ath9k NIC does not transport AMPDUs properly, and after a few seconds, it's backlog is exhausted and the queues are stopped in ath_tx_start. ath9k currently looks up the sta by hw and not by vif on tx completion. That completely breaks aggregation in such a setup. Unfortunately I don't know an easy way to fix this yet. This will be fixed by my sw aggregation rewrite, but I don't know when I'll get that completed yet. If you have any more details on this, please let me know. I'm going to attempt to fix it...I certainly have a good test case :) ath_tx_complete_aggr completes an A-MPDU frame, which typically triggers the release of the next A-MPDU to the hw queue. To keep track of the Block ACK window, it needs to look up the TID, for which it needs a STA pointer. At that level, the driver typically doesn't have access to the vif. It might be possible to fix this by adding another sta lookup helper function in mac80211 that takes another address argument for the BSSID, so that it can get the sta entry for the correct vif. I don't know if Johannes wants something like that though. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k, multiple stations, and AMPDUs
On 2010-09-21 7:25 PM, Ben Greear wrote: On 09/21/2010 05:19 AM, Felix Fietkau wrote: On 2010-09-21 2:08 PM, Ben Greear wrote: If you have any more details on this, please let me know. I'm going to attempt to fix it...I certainly have a good test case :) ath_tx_complete_aggr completes an A-MPDU frame, which typically triggers the release of the next A-MPDU to the hw queue. To keep track of the Block ACK window, it needs to look up the TID, for which it needs a STA pointer. At that level, the driver typically doesn't have access to the vif. It might be possible to fix this by adding another sta lookup helper function in mac80211 that takes another address argument for the BSSID, so that it can get the sta entry for the correct vif. I don't know if Johannes wants something like that though. Could we just poke a pointer to the STA into the ath_buf structure? No, that doesn't work because of RCU. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k, multiple stations, and AMPDUs
On 2010-09-21 8:04 PM, Ben Greear wrote: On 09/21/2010 11:00 AM, Felix Fietkau wrote: On 2010-09-21 7:25 PM, Ben Greear wrote: On 09/21/2010 05:19 AM, Felix Fietkau wrote: On 2010-09-21 2:08 PM, Ben Greear wrote: If you have any more details on this, please let me know. I'm going to attempt to fix it...I certainly have a good test case :) ath_tx_complete_aggr completes an A-MPDU frame, which typically triggers the release of the next A-MPDU to the hw queue. To keep track of the Block ACK window, it needs to look up the TID, for which it needs a STA pointer. At that level, the driver typically doesn't have access to the vif. It might be possible to fix this by adding another sta lookup helper function in mac80211 that takes another address argument for the BSSID, so that it can get the sta entry for the correct vif. I don't know if Johannes wants something like that though. Could we just poke a pointer to the STA into the ath_buf structure? No, that doesn't work because of RCU. Would it also be bad to use skb-dev to find an STA? It would be bad to keep a STA pointer around anywhere in the skb or the ath_buf. You can only use it within a rcu_read_lock()/rcu_read_unlock() pair. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k, multiple stations, and AMPDUs
On 2010-09-21 9:28 PM, Johannes Berg wrote: On Tue, 2010-09-21 at 20:00 +0200, Felix Fietkau wrote: Could we just poke a pointer to the STA into the ath_buf structure? No, that doesn't work because of RCU. Well, it could work, if you walk all the structures upon sta_notify and remove now stale pointers (or just drop the frames or something). I think it would be much better to just add the helper function that checks the RA on STA lookup. Keeps things simple, especially since nothing else in the tx path needs the vif. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k, multiple stations, and AMPDUs
On 2010-09-21 10:19 PM, Ben Greear wrote: On 09/21/2010 12:32 PM, Felix Fietkau wrote: On 2010-09-21 9:28 PM, Johannes Berg wrote: On Tue, 2010-09-21 at 20:00 +0200, Felix Fietkau wrote: Could we just poke a pointer to the STA into the ath_buf structure? No, that doesn't work because of RCU. Well, it could work, if you walk all the structures upon sta_notify and remove now stale pointers (or just drop the frames or something). I think it would be much better to just add the helper function that checks the RA on STA lookup. Keeps things simple, especially since nothing else in the tx path needs the vif. How about this. Seems to do the trick on my system: diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c index 85a7323..09815a1 100644 --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -328,8 +328,7 @@ static void ath_tx_complete_aggr(struct ath_softc *sc, struct ath_txq *txq, rcu_read_lock(); - /* XXX: use ieee80211_find_sta! */ - sta = ieee80211_find_sta_by_hw(hw, hdr-addr1); + sta = tx_info-control.sta; As I mentioned in another email: at the time we get the tx status report, we have to consider the sta pointer stale. It may or may not still be valid. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
On 2010-07-26 7:10 PM, Björn Smedman wrote: Hi all, I've been running a lot of iperf on AR913x / compat-wireless-2010-07-16 (w/ openwrt/tr...@22388). I think there are some (in theory) simple improvements that can be done to the tx aggregation / rate control logic. A proof of concept of one such improvement is provided below. Basically, it's a hack that makes ath9k output aggregates with only the first rate in the rate series. The reasoning is that a failure is not a problem for aggregates because there is software retry. Retrying in hardware at a slower rate is counter productive. So, better to fail and do a software retry at possibly another rate. Also, since the aggregate size is often limited by the slowest rate in the MRR series (4 ms txop limit) having a slow rate in the series may affect performance even if it is never used by the hardware. In my (not so scientific) tests max AP downstream throughput increases about 30-40% with the patch below (from 33.9 to 55.7 Mbit/s with HT20 in noisy environment with 20 meters and a few walls between AP and client). Of course, if all rates in the series are high then this patch has no effect. I think it makes sense to rely less on on-chip MRR for fallback, but I think to make this workable, we really should use the MRR table for something, otherwise the rate control algorithm will take much longer to adapt. It's probably better to fix this properly after I'm done with my A-MPDU rewrite, because then I can more easily push parts of the software retransmission behaviour into minstrel_ht directly. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k: performance regressions / tx semi-stuck somehow
On 2010-07-22 12:17 AM, Björn Smedman wrote: Hi all, I just tried out compat-wireless-2010-07-16 on AR913x (with openwrt/tr...@r22321) and saw some weird performance problems. That's in exactly the same spot I was getting 16Mbps consistently with AP was 11n! Any debuging I can do to help? Bisecting is unfortunately a lot of work on this embedded system... Could you please try if the earlier version that was in OpenWrt (2010-07-06) has the same issues? - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] Question about ath9k signal strength (AP mode)
On 2010-07-11 3:37 AM, Peter Stuge wrote: Lars Hardy wrote: I have tried with 2 different Atheros chipset in AP mode, the AR5416 and AR9280. dd-wrt gives a stronger signal with both chipsets compared with openwrt. I know the ath9k is under development, so my question is if this is known by the development team and therefor will be worked on? I continue to have problems with ath9k and development is slower than I first expected since Atheros seems to have very limited focus on support, and only for the very latest generation hardware. (I wish they made this fact more clear.) Community resources are of course also limited, but great improvements have been made to ath9k by members of the OpenWRT community and I guess they will continue their work. Without looking much at the code, but following this list for half a year, my gut feeling is that there is still too much major work required on ath9k at this point for fine tuning such as signal strength optimization to come up on the agenda in the near future.. I disagree. The major work has been done and hardware support has mostly caught up with Atheros' own codebase. What's missing is debugging work and fine tuning - and other than that, maybe a few things that aren't very hardware specific. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k: ap tsf seems random and only uses lower 24 bits or so
On 2010-06-29 5:20 PM, Björn Smedman wrote: 2010/6/29 Felix Fietkau n...@openwrt.org: IMHO the most likely problem source is stuck beacons. Please compile the driver with the debug option enabled and load it with insmod ath9k debug=0x0100 It looks like it could be: ... Jan 1 00:06:21 OpenWrt user.debug kernel: ath: slot 2 [tsf 1986567 tsftu 1940 intval 100] vif (null) Jan 1 00:06:21 OpenWrt user.debug kernel: ath: slot 1 [tsf 2012168 tsftu 1965 intval 100] vif (null) Jan 1 00:06:21 OpenWrt user.debug kernel: ath: slot 0 [tsf 2037767 tsftu 1990 intval 100] vif 80945e70 Jan 1 00:06:21 OpenWrt user.debug kernel: ath: slot 0 [tsf 79033 tsftu 77 intval 100] vif 80945e70 Jan 1 00:06:21 OpenWrt user.debug kernel: ath: missed 1 consecutive beacons Jan 1 00:06:21 OpenWrt user.debug kernel: ath: resume beacon xmit after 1 misses Jan 1 00:06:21 OpenWrt user.debug kernel: ath: slot 3 [tsf 117790 tsftu 115 intval 100] vif (null) Jan 1 00:06:21 OpenWrt user.debug kernel: ath: slot 2 [tsf 143368 tsftu 140 intval 100] vif (null) Jan 1 00:06:21 OpenWrt user.debug kernel: ath: slot 1 [tsf 168967 tsftu 165 intval 100] vif (null) ... Jan 1 00:09:08 OpenWrt user.debug kernel: ath: slot 1 [tsf 14197768 tsftu 13865 intval 100] vif (null) Jan 1 00:09:08 OpenWrt user.debug kernel: ath: slot 0 [tsf 14223368 tsftu 13890 intval 100] vif 80945e70 Jan 1 00:09:08 OpenWrt user.debug kernel: ath: slot 3 [tsf 14248967 tsftu 13915 intval 100] vif (null) Jan 1 00:09:08 OpenWrt user.debug kernel: ath: slot 0 [tsf 79180 tsftu 77 intval 100] vif 80945e70 Jan 1 00:09:08 OpenWrt user.debug kernel: ath: missed 1 consecutive beacons Jan 1 00:09:08 OpenWrt user.debug kernel: ath: resume beacon xmit after 1 misses Jan 1 00:09:08 OpenWrt user.debug kernel: ath: slot 3 [tsf 117791 tsftu 115 intval 100] vif (null) Jan 1 00:09:08 OpenWrt user.debug kernel: ath: slot 2 [tsf 143366 tsftu 140 intval 100] vif (null) Jan 1 00:09:08 OpenWrt user.debug kernel: ath: slot 1 [tsf 168967 tsftu 165 intval 100] vif (null) Jan 1 00:09:08 OpenWrt user.debug kernel: ath: slot 0 [tsf 194567 tsftu 190 intval 100] vif 80945e70 ... What can cause a missed beacon? Are they just a fact of life? In any case I can't find any code that resets the tsf in this (single missed beacon) case. Will the hardware reset the tsf automatically whenever a single beacon is missed? Isn't that a bit overkill? Will it not cause problems for clients? One beacon miss should never cause a TSF reset. Only a lot of consecutive beacon misses trigger a hardware reset, which then resets the TSF. Looking at your log, it appears that the beacon miss is a symptom rather than a cause of the TSF jumps. Can you add a debug statement to the hw reset function to see if it's called before the TSF jumps? - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k: ap tsf seems random and only uses lower 24 bits or so
On 2010-06-29 6:36 PM, Björn Smedman wrote: 2010/6/29 Felix Fietkau n...@openwrt.org: One beacon miss should never cause a TSF reset. Only a lot of consecutive beacon misses trigger a hardware reset, which then resets the TSF. Looking at your log, it appears that the beacon miss is a symptom rather than a cause of the TSF jumps. Can you add a debug statement to the hw reset function to see if it's called before the TSF jumps? Yup, seems to be a hardware reset. Added an ath_print (Reset HW!) at the beginning of ath9k_hw_reset() and used debug mask 0x101: ... Jan 1 00:01:59 OpenWrt user.debug kernel: ath: slot 3 [tsf 14863367 tsftu 14515 intval 100] vif (null) Jan 1 00:01:59 OpenWrt user.debug kernel: ath: slot 2 [tsf 14888967 tsftu 14540 intval 100] vif (null) Jan 1 00:01:59 OpenWrt user.debug kernel: ath: slot 1 [tsf 14914568 tsftu 14565 intval 100] vif (null) Jan 1 00:01:59 OpenWrt user.debug kernel: ath: Reset HW! Jan 1 00:01:59 OpenWrt user.debug kernel: ath: ah-misc_mode 0xc Jan 1 00:01:59 OpenWrt user.debug kernel: ath: Setting CFG 0x10a Jan 1 00:01:59 OpenWrt user.debug kernel: ath: slot 0 [tsf 80123 tsftu 78 intval 100] vif 80945e70 Jan 1 00:01:59 OpenWrt user.debug kernel: ath: missed 1 consecutive beacons Jan 1 00:01:59 OpenWrt user.debug kernel: ath: Reset HW! Jan 1 00:01:59 OpenWrt user.debug kernel: ath: ah-misc_mode 0xc Jan 1 00:01:59 OpenWrt user.debug kernel: ath: Setting CFG 0x10a Jan 1 00:01:59 OpenWrt user.debug kernel: ath: slot 0 [tsf 80989 tsftu 79 intval 100] vif 80945e70 Jan 1 00:01:59 OpenWrt user.debug kernel: ath: missed 1 consecutive beacons Jan 1 00:01:59 OpenWrt user.debug kernel: ath: resume beacon xmit after 1 misses Jan 1 00:01:59 OpenWrt user.debug kernel: ath: slot 3 [tsf 117792 tsftu 115 intval 100] vif (null) Jan 1 00:01:59 OpenWrt user.debug kernel: ath: slot 2 [tsf 143368 tsftu 140 intval 100] vif (null) Jan 1 00:01:59 OpenWrt user.debug kernel: ath: slot 1 [tsf 168967 tsftu 165 intval 100] vif (null) ... Please add another print to the end of ath9k_hw_check_alive() before the 'return false'. Make sure it prints the value of the 'reg' variable. If you see it in the log, then it's probably the baseband getting stuck. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k: ap tsf seems random and only uses lower 24 bits or so
On 2010-06-29 11:40 PM, Björn Smedman wrote: 2010/6/29 Björn Smedman bjorn.smed...@venatech.se: Yes, hw reset is due to reg = 0x01702400 every 4 - 40 seconds or so: ... When the chip is really stuck, does 'reg' (at 'return false') change over time? If I add a second requirement that ath9k_hw_check_alive() must fail three times in a row (in different invocations of ath9k_tasklet()) before we reset the chip the ap seems fine. I sometimes get several of these reg = 0x01702400 every second but only one or at the max two in a row. Under load I sometimes see some reg = 0x00f02400 as well. I also see an occasional reset now and then (about once a minute) that must be caused by something else. Any insight into what these reg values mean? Do you think they can safely be ignored as per above? I had a similar thought about the multiple invocations thing. I think that's a good approach in general, but we need to ensure that we make it safe. The main point of this function is to detect baseband hangs. If we experience such a hang, I'm not sure we will always get enough interrupts to do multiple consecutive tests. One way to make it safe would be to reschedule the tasklet each time we ignore the result of the ath9k_hw_check_alive(), that way we keep the detection time low as well. Maybe we could also use a timer for leaving 10 ms time between attempts. Another thing that I'm working on right now is to ensure that the TSF gets preserved across resets. For some AR9280 based cards the code already preserves TSF in software over the chip reset, I could simply extend that to cover SoC as well. But before I post such a patch, I'll do a test on AR9160 - to see if it would be better to make the TSF preserve unconditional. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k: ap tsf seems random and only uses lower 24 bits or so
On 2010-06-30 12:50 AM, Björn Smedman wrote: 2010/6/29 Felix Fietkau n...@openwrt.org: I had a similar thought about the multiple invocations thing. I think that's a good approach in general, but we need to ensure that we make it safe. The main point of this function is to detect baseband hangs. If we experience such a hang, I'm not sure we will always get enough interrupts to do multiple consecutive tests. One way to make it safe would be to reschedule the tasklet each time we ignore the result of the ath9k_hw_check_alive(), that way we keep the detection time low as well. Maybe we could also use a timer for leaving 10 ms time between attempts. The xmit logic has sc-tx_complete_work that periodically checks if the tx is hung and resets the chip if so. Maybe we could refactor that into a common periodic health checkup task in main.c that could call into both xmit.c and recv.c? Or does ath9k_hw_check_alive() have to run in the interrupt context in some way? I'd like to keep them separate. I think a tx queue hang is very rare, and the check only runs every second or so, whereas the baseband hang check needs to run very frequently, as in some situations, the hangs can probably occur frequently as well. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC/RFT] minstrel_ht: new rate control module for 802.11n
On 2010-06-28 12:20 PM, Björn Smedman wrote: 2010/6/28 Felix Fietkau n...@openwrt.org: On 2010-06-28 2:01 AM, Björn Smedman wrote: [snip] I guess the real solution is your rewrite... But in the mean time perhaps we can memcpy the tx_info control from the last subframe to the first before calling ath_buf_set_rate() in ath_tx_sched_aggr()? Could that have any side effects? It could make the aggregate size go over the 4 ms limit I guess... How bad is that? There's an easy solution which would take into account the 4ms frame limit properly, and which could work without any memcpy() hacks: I could just grab a pointer to the last buffer in the tid queue in the ath_tx_sched_aggr() function, then pass it to ath_lookup_rate() via ath_tx_form_aggr(), and also to ath_buf_set_rate(). Then I make those functions use this last buffer as reference for the rate lookup. Sounds better to use the rate control from the last buffer in the tid queue. But be careful if you don't memcpy it to the first frame of the aggregate then the feedback calculated in ath_tx_rc_status() after tx will be incorrect again, no? Right. I intend to let ath_buf_set_rate() set the rates array of the first subframe accordingly. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k: ap tsf seems random and only uses lower 24 bits or so
On 2010-06-29 12:31 AM, Björn Smedman wrote: Hi all, I'm getting weird values from the debugfs file ieee80211/phy0/tsf: the value goes up and down rather randomly and only the lower 24 bits or so seem to ever be used (see below for details). The only thing running on phy0 is a single ap interface (and the monitor companion that hostapd sets up). I was expecting tsf to increase monotonically until all 64 bits had been used. For a moment I thought it might be the kernel snprintf (on mips) playing a trick on me so I tried the following patch. But the result is the same. IMHO the most likely problem source is stuck beacons. Please compile the driver with the debug option enabled and load it with insmod ath9k debug=0x0100 - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC/RFT] minstrel_ht: new rate control module for 802.11n
On 2010-06-28 2:01 AM, Björn Smedman wrote: 2010/6/23 Felix Fietkau n...@openwrt.org: On 2010-06-23 6:36 PM, Björn Smedman wrote: [snip] As far as I can tell, whenever the first subframe of an aggregate fails and is software retried, the rate control feedback for that aggregate is lost (ath_tx_rc_status() is never called with update_rc = true in xmit.c). I think you misread that part. The loop iterates over all subframes in the aggregate, and the first successful or swretry-expired frame will trigger an AMPDU status report, which will update the RC. The first subframe of the A-MPDU is not getting any special treatment here. You're (still) right I misread that part. But I think there is another problem when the first subframe of an A-MPDU is not acked: if it has not expired yet it is (as I understand it) prepended to the tid queue for software retry and will therefore be the first subframe of the next aggregate as well, which will then be transmitted with the same old rates and counts as the previous aggregate. So the feedback from xmit to rc works, but the control information flow from rc to xmit is delayed. I guess the real solution is your rewrite... But in the mean time perhaps we can memcpy the tx_info control from the last subframe to the first before calling ath_buf_set_rate() in ath_tx_sched_aggr()? Could that have any side effects? It could make the aggregate size go over the 4 ms limit I guess... How bad is that? There's an easy solution which would take into account the 4ms frame limit properly, and which could work without any memcpy() hacks: I could just grab a pointer to the last buffer in the tid queue in the ath_tx_sched_aggr() function, then pass it to ath_lookup_rate() via ath_tx_form_aggr(), and also to ath_buf_set_rate(). Then I make those functions use this last buffer as reference for the rate lookup. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ATH9K does not transmit beacons in AD-HOC mode
On 2010-06-26 8:47 AM, Alphan Ulusoy wrote: Dear Felix, Thank you for your reply, as I was going over the code yesterday I've changed several parts and also the part your first patch covers. However I have also felt that beacon staggering is somewhat problematic. I've made a total of 4 changes and ad-hoc beacons work now: 1) Applied your second patch. 2) Disabled beacon staggering by setting ATH_BCBUF (ath9k.h) to 1. I believe, this covers your second patch. Unnecessary, with my patch, as long as the opmode is set to ad-hoc, it will use a beacon interval that will only leave one beacon slot active anyway. 3) main.c:ath9k_bss_info_changed() calls ath_beacon_alloc() when called by mac80211. However, ath_beacon_alloc() honors the sc-sc_ah-opmode field for deciding which beacon config function to run while ath9k_bss_info_changed() does not update this field from vif-type. Thus, I've added the following at line 2051 of main.c:ath9k_bss_info_changed(). ah-opmode = vif-type; sc-ah-opmode should have been set by the code that is called when the interface is brought up, setting it from ath9k_bss_info_changed() is definitely wrong. 4) beacon.c:ath_beacon_config_adhoc() relies on sc-beacon.bc_tstamp field to determine the next beacon time. However, during initialization I've found this field to be very large in number causing the first beacon to suffer a long delay causing problems in association. Maybe this field is missing the required initialization somewhere? Instead, I've added the following at line 752 of beacon.c so that first beacon is sent sooner. nexttbtt = tsftu + 5*intval; It's normal for this to be large, however maybe this needs some range checking against the local tsf. The intention behind using bc_tstamp is that the hardware syncs its own TSF against the one from the beacon automatically. After these modifications ad-hoc beacons work just fine, beacon tasklet is called appropriately and I did not observe any missed beacons. I would like to hear your comments especially regarding points 3, 4. I'd suggest testing a recent version of mac80211/ath9k before taking another look at any of the changes that you made. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ATH9K does not transmit beacons in AD-HOC mode
On 2010-06-25 11:07 AM, Alphan Ulusoy wrote: Hi, I have noticed that ATH9K does not transmit beacons in IBSS. I can only see probe request/response frames but no periodic beacons. I have even applied the patch proposed by Felix (https://patchwork.kernel.org/patch/99373/) that disables VEOL and uses SWBA interrupts instead. It looks like beacon tasklet is never called. I don't think this is normal. Is there a missing line that must kick-off beacon transmission (e.g. ath9k_hw_puttxbuf, ath9k_hw_txstart) ? The second patch (the one you applied) doesn't work without the first patch (https://patchwork.kernel.org/patch/99372/) - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC/RFT] minstrel_ht: new rate control module for 802.11n
On 2010-06-23 6:36 PM, Björn Smedman wrote: 2010/3/2 Felix Fietkau n...@openwrt.org: On 2010-03-02 4:47 PM, Björn Smedman wrote: 2010/3/2 Felix Fietkau n...@openwrt.org: [snip] You mean the hardware interprets the block-ack and keeps retrying the un-acked frames? I thought it stopped as soon as it got a block-ack to let software sort out the acked and un-acked frames and handle the partial A-MPDU retry. Not sure, actually. I just looked at the ath9k tx path again, and it seems that you're right. However it looks like it's not sending rate control updates until it's done with the software retry, so that's probably the reason why I wasn't able to make it more precise yet. I had another look at the code now and if I read it correctly this delay in the rate control feedback is really scary. In the extreme case where all the rates in the MRR stop working you have to make 10 (ATH_MAX_SW_RETRIES) aggregate software retries (of about 20 frames each) with approx 10 hardware retries each before you give the rate control algorithm any feedback whatsoever. That is a worst case of several thousand (pointless) subframe retransmissions before the rate control algorithm has a chance to adjust... Yes, the extreme case is currently not handled properly. However the extreme case is also extremely unlikely to trigger. With minstrel_ht, the max_prob_rate is always in the MRR series. If the conditions jump from all rates working down to even max_prob_rate failing, then something must be so wrong with the radio, that there's probably no possibility of a graceful fallback at all. I do agree that this should be fixed, though. If I'm not wrong above then the rate control feedback must also be incorrect: a disaster of that magnitude simply cannot be conveyed to the rate control algorithm through the thin tx status interface. As far as I can tell, whenever the first subframe of an aggregate fails and is software retried, the rate control feedback for that aggregate is lost (ath_tx_rc_status() is never called with update_rc = true in xmit.c). I think you misread that part. The loop iterates over all subframes in the aggregate, and the first successful or swretry-expired frame will trigger an AMPDU status report, which will update the RC. The first subframe of the A-MPDU is not getting any special treatment here. Any ideas on how to fix this? To me the aggregation and rate control code seems to need a major overhaul, something which would require changes to the interface between mac80211 and drivers, e.g. ath9k. That's out of my league unfortunately... I've already made a lot of progress rewriting the entire aggregation logic (it'll be in mac80211 instead of ath9k). As soon as I'm done fixing the current batch of bugs that I'm debugging at the moment, I will post my changes as RFC on the linux-wireless list. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [RFC/RFT] minstrel_ht: new rate control module for 802.11n
On 2010-06-23 8:47 PM, Björn Smedman wrote: 2010/6/23 Felix Fietkau n...@openwrt.org: On 2010-06-23 6:36 PM, Björn Smedman wrote: [snip] If I'm not wrong above then the rate control feedback must also be incorrect: a disaster of that magnitude simply cannot be conveyed to the rate control algorithm through the thin tx status interface. As far as I can tell, whenever the first subframe of an aggregate fails and is software retried, the rate control feedback for that aggregate is lost (ath_tx_rc_status() is never called with update_rc = true in xmit.c). I think you misread that part. The loop iterates over all subframes in the aggregate, and the first successful or swretry-expired frame will trigger an AMPDU status report, which will update the RC. The first subframe of the A-MPDU is not getting any special treatment here. You're right, I missed that part. And I guess that's also why the worst case is so rare. But on the other hand doesn't that also mean that MRR rates and counts in the tx status will be incorrect? I.e. one set of rates/counts used to transmit the aggregate (first subframe) and (possibly) another set reported back in tx status (first acked/expired subframe)? No, MRR info is just fine. The retry stats are reported for the whole A-MPDU, not for indvididual subframes. It's always present in the last descriptor of the whole batch. Also, since my code cleanup a while ago, the converted rx/tx status info is not longer part of the allocated descriptor space, but instead kept on stack in the function that processes the tx status. This on stack tx status is then passed to the rc update function, which sends the data to mac80211 along with the AMPDU tx status report. Also, bf-bf_retries is used in ath_tx_rc_status() but the logic makes no sense if bf_retries holds the number of software retries... Hmm, that part might indeed be buggy. I'll review it in detail when I'm at home again. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k driver instability
On 2010-06-14 1:58 PM, Ben Gamari wrote: On Sat, 29 May 2010 22:16:13 -0700 (PDT), Cloud Strife piroisl...@yahoo.com wrote: Sometimes the card stops working all together (cannot scan, connect, etc.), I have to ifconfig wlan0 down; rmmod ath9k; modprobe ath9k; ifconfig wlan0 up to get it started again. There is also no interference in my area as well. I too have been experiencing this sort of failure with my AR5008 cards. I have a AR5008 rev. 1 card in my server running in master mode, which dies about once per day. During this period all stations attempting to associate with the access point timeout. Unfortunately, there is no useful error message produced by the driver, so I have no idea what might be going wrong. The only way to bring the card back is to reload the driver. I would recommend testing the latest bleeding-edge compat-wireless tarball. That's what I use in OpenWrt, which is getting a lot of testing specifically in AP mode. I have another similar card in my personal laptop, which generally runs in managed mode. Here, I experience similar types of failures several times a day. The symptoms are almost identical, with inexplicable timeouts I'm aware that the Atheros developers aren't actively supporting AR5008 hardware but even some hints as to where to begin in debugging would be appreciated. I've more than once stated my complete willingness to do whatever necessary to troubleshoot this issue yet Atheros has remained entirely silent. There is just too much stuff to do, which makes it hard to find the time to take care of stability on old hardware as well, especially since debugging such issues can be very time consuming. I would be more than willing to buy hardware based on a newer Atheros chipset if there were products available. However, despite the dozens of products listed on the Ath9k products page[1], I have not seen a single AR92xx-based full-sized mini-PCIe card available to consumers (most of the hardware is produce for OEM it appears, a market to which most individuals have no access). Meanwhile, AR5008 hardware is cheap, available, and works, albeit not with ath9k. However, I would be most appreciative of anyone who could point me in the direction of where to acquire a newer card compatible with my hardware. I got myself an Ubiquiti SR-71e, but unfortunately the card seems to have disappeared from the manufacturer's website. Maybe you can still find it in a shop somewhere. It does work well for me with ath9k. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k driver instability
On 2010-06-16 1:32 AM, RHS Linux User wrote: Hi All, I notice that the wireless driver version in xpud works reliably, while the version in puppy does not? The puppy version comes up OK, but then dies after a short time (minutes)? I wonder if the wireless driver can (easily?) be arranged to be kernel version independent? It would sure help with debug. Maybe some sort of compatibility layer or special conversion/debug interface driver with a common interface to the CPU ? Compat-wireless takes care of the backporting stuff... - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k Segmentation fault on the OpenWRT. 5416+5133
On 2010-06-01 8:27 AM, linux_pro wrote: [ 30.408000] PCI error 1 at PCI addr 0x100010c0 [ 30.408000] Data bus error, epc == 80c848bc, ra == 80c848a8 [ 30.408000] Oops[#1]: [ 30.408000] Cpu 0 [...] Please enable CONFIG_KERNEL_KALLSYMS in your OpenWrt .config when posting kernel crash logs. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] About A-MPDU aggregation. ATH_AMPDU_SUBFRAME_DEFAULT flag.
On 2010-04-05 4:11 PM, Rakesh Kumar wrote: Hi, I notice this parameter ATH_AMPDU_SUBFRAME_DEFAULT and in the code in xmit.c, function ath_tx_form_aggr limits the number of sub-frames that can be included in a A-MPDU to half the total size. The accompanying comments say: /* do not exceed subframe limit */ But why halve the baw_size for that? I looked around the standard documents, couldn't find any reference to halving the station-specfic baw_size (negotiated during association) to half. Maybe because ath9k tries to keep two A-MPDU aggregates in the xmit queue and has to handle retransmission of failed subframes in software. IMHO it makes sense to use half of the BAW for each of the queue entries under those conditions. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] Ath9k MIMO Performance versus Proprietary Drivers
On 2010-02-21 9:41 PM, Galen wrote: Hello All, I have been testing out ath9k in a variety of situations. I have observed it appears to have poorer handling in MIMO-intensive environments than the binary drivers under Mac OS X and Windows. I have two computers with identical radios (3x3:2 AR5008 Mini PCI-e) and as similar of antenna configurations as possible. One computer runs Windows XP + latest Atheros binary drivers and Linux 2.6.32 + latest compat-wireless. (I have also tested some older versions with similar results.) The other computer runs Mac OS X 10.6.2 which contains the latest Atheros binary drivers. *** Further Testing *** I plan to create a triple-boot environment so I can compare any OS combination on exactly the same hardware. Absolute care will be used when rebooting as not to move anything. I will run a standard series of network tests under Linux and OS X. (It is a significantly larger headache to setup Cygwin to use the same tools on Windows and I will only do so if OS X versus Linux testing is inconclusive.) I will also have another system in monitor mode and be recording the raw 802.11 frames for later review. I do not expect a significant change in behavior in my testing environment, but I do expect more accurate and precise data, which can hopefully help me identify the cause of the performance differences. *** Discussion *** While I realize some of my examples are rather extreme, they clearly demonstrate the huge disparity between ath9k and proprietary drivers. I suspect that ath9k may have inferior MIMO support code (MRC, beamforming, etc.) as compared to the proprietary drivers. I believe that STBC is still not supported yet either. All of the currently available common Atheros hardware such as AR9280 and earlier chip generations do not have MRC, Beamforming and similar advanced features. Except for STBC, ath9k seems to have pretty much the same hardware features as Atheros' other drivers. There may be some workarounds for various hw issues missing, I have not extensively reviewed that yet. While I haven't done any tests with it yet, I believe STBC could potentially make a difference in strong multipath environments, especially when dealing with lower signal strength. The signal strength issues might also be related to ANI, you should probably disable that in ath9k to make sure that it's not screwing up your test results. Can anybody comment on this issue? Have you experienced it yourself? While I haven't done any extensive testing in that area, nor compared it against proprietary APs directly, your description of ath9k issues sounds somewhat similar to what I'm seeing with AR9280 hardware in my tests. Does anybody have ideas on how this might be improved? I have been considering ath9k for an embedded installation, but these multi-path performance differences are pretty disturbing. Atheros has a proprietary driver available with source for a very hefty license fee, but I'd rather put energies into ath9k - the kind of licensing fees they are working with can pay for a lot of developer time. I'm currently working on a new rate control algorithm for 11n, and I also intend to add STBC support to ath9k soon (it's only a few flags missing, nothing major). Maybe I should do STBC first, as it should be fairly straightforward. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] Ath9k MIMO Performance versus Proprietary Drivers
On 2010-02-22 8:43 PM, Galen wrote: On Feb 22, 2010, at 6:29 AM, Felix Fietkau wrote: *** Discussion *** While I realize some of my examples are rather extreme, they clearly demonstrate the huge disparity between ath9k and proprietary drivers. I suspect that ath9k may have inferior MIMO support code (MRC, beamforming, etc.) as compared to the proprietary drivers. I believe that STBC is still not supported yet either. All of the currently available common Atheros hardware such as AR9280 and earlier chip generations do not have MRC, Beamforming and similar advanced features. As Atheros does not release technical specifications without NDA, I do not have a clear picture of what is and is not supported by a particular chipset. Reviewing the ath9k source code is a very intensive process and only provides a partial picture. Some features might be implemented 100% in hardware, making them difficult to discern from ath9k source alone. Since I do commercial work for some companies using Atheros based stuff, I know a few things about the other codebases ;) Except for STBC, ath9k seems to have pretty much the same hardware features as Atheros' other drivers. There may be some workarounds for various hw issues missing, I have not extensively reviewed that yet. I would be interested in knowing more about these. LDPC? Others? There appear good software implementations of LDPC out there: http://planete-bcast.inrialpes.fr/article.php3?id_article=7 I'm pretty sure the current hardware also doesn't do LDPC yet. While I haven't done any tests with it yet, I believe STBC could potentially make a difference in strong multipath environments, especially when dealing with lower signal strength. Yes, I would tend to agree this could help significantly. Are there details about what you are implementing? Are you implementing encoding or decoding or both? Are you using an orthogonal or quasi-orthagonal code? Have you considered a hybrid system using a quasi-orthagonal code at low SNR and orthogonal at higher SNR? I think the hardware can only do 2:1 STBC The signal strength issues might also be related to ANI, you should probably disable that in ath9k to make sure that it's not screwing up your test results. If I am not mistaken, ANI seems unlikely to have a negative impact on performance. Do you believe it could be aversely affecting performance, or do you believe that it is simply causing the signal numbers to be mis-reported? ANI messes with the AGC parameters, OFDM or CCK self-correlation during reception, spur mitigation, and a few other things. Atheros has published a patent on this a while back. Look it up and read it if you're curious about the details. Short answer: I've seen it mess with the reported signal strength in their legacy chips, and I believe there's a good chance it still holds true for the 11n variants. Can anybody comment on this issue? Have you experienced it yourself? While I haven't done any extensive testing in that area, nor compared it against proprietary APs directly, your description of ath9k issues sounds somewhat similar to what I'm seeing with AR9280 hardware in my tests. I have a good testing environment and strong interests in seeing better performance, so let me know if there is anything I can test for you. I have AR5008, AR9280, AR9281 all on-hand, including radio modules from multiple vendors for each of these chips. I will probably be equipped to test AR9220 and AR9160 soon. I have a wide variety antennas from essentially zero gain to 30 dB and a mix of indoor and outdoor line of sight testing possible. All testing machines are setup with the latest Debian testing, so they always have the latest kernel, etc. You should use compat-wireless based on the wireless-testing sources, it contains a lot of stuff that hasn't made it into stable kernels yet. This is also what I use for the mac80211+drivers packages in OpenWrt. Does anybody have ideas on how this might be improved? I have been considering ath9k for an embedded installation, but these multi-path performance differences are pretty disturbing. Atheros has a proprietary driver available with source for a very hefty license fee, but I'd rather put energies into ath9k - the kind of licensing fees they are working with can pay for a lot of developer time. I'm currently working on a new rate control algorithm for 11n, and I also intend to add STBC support to ath9k soon (it's only a few flags missing, nothing major). Maybe I should do STBC first, as it should be fairly straightforward. I tend to think the STBC is probably a lot more foundational than rate control. That said, I am curious, what changes to the rate control are you planning / working on? I started adapting the minstrel algorithm for 11n (it's a rewrite, actually), but found out by testing that without modifications the general approach isn't really suitable for 11n yet. So I'm playing
Re: [ath9k-devel] ath9k in wireless-testing won't work in AP mode
On 2010-02-03 1:08 AM, Luis R. Rodriguez wrote: We have reviewed this. The 64 value came from interoperability tests against another 802.11n device which had increased delayed BlockAcks when CTS-to-self was enabled. Although this is a higher value than what the standard says to use we recommend to just leave the value as-is and actually use the values from the initvals as the minimum possible value as those are the values that have been used for a large array of tests, including WMM interop tests. We cannot gaurantee proper functionality against other devices otherwise. Since the issues so far are obaserved on AR9160 and AR9220 (and not AR9280) and AR9271 (sujith) this might be a bus issue and the only way to zero in on the issue would be by getting full register dumps to ensure every other register related to ACK Timeout is programmed properly (AR_USEC_USEC I think is one) and taking it from there. Testing different values are welcomed but upstream we should just use what we have tested with until we do WMM interop tests with different values and not sure if we'll be doing that for a while. So how should we handle ACK timeout for different coverage class values? That's my primary concern, since I wrote the patch to support that. Should I just send a patch that adds an offset of 45? (= 64us - 19us, based on the diff between calculated and initval) - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k in wireless-testing won't work in AP mode
On 2010-02-03 1:27 AM, Luis R. Rodriguez wrote: On Tue, Feb 2, 2010 at 4:18 PM, Felix Fietkau n...@openwrt.org wrote: On 2010-02-03 1:08 AM, Luis R. Rodriguez wrote: We have reviewed this. The 64 value came from interoperability tests against another 802.11n device which had increased delayed BlockAcks when CTS-to-self was enabled. Although this is a higher value than what the standard says to use we recommend to just leave the value as-is and actually use the values from the initvals as the minimum possible value as those are the values that have been used for a large array of tests, including WMM interop tests. We cannot gaurantee proper functionality against other devices otherwise. Since the issues so far are obaserved on AR9160 and AR9220 (and not AR9280) and AR9271 (sujith) this might be a bus issue and the only way to zero in on the issue would be by getting full register dumps to ensure every other register related to ACK Timeout is programmed properly (AR_USEC_USEC I think is one) and taking it from there. Testing different values are welcomed but upstream we should just use what we have tested with until we do WMM interop tests with different values and not sure if we'll be doing that for a while. So how should we handle ACK timeout for different coverage class values? That's my primary concern, since I wrote the patch to support that. Should I just send a patch that adds an offset of 45? (= 64us - 19us, based on the diff between calculated and initval) Well so what I meant is that we should ensure hardware is not programmed with an ACK/CTS Timeout value lower than what is on the initvals already. If changing the coverage class means a different ACK timeout is produced we just take the max of the two values. Taking the max doesn't make any sense to me if this is about working around delay in the transmission of BlockAcks. Since the coverage class is meant to compensate delay in the air propagation time, the ACK timeout should increase along with it, because along with increasing distance, the worst case delay of the BA of a distant node will get higher as well. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k in wireless-testing won't work in AP mode
On 2010-01-30 8:39 PM, Pavel Roskin wrote: On Sat, 2010-01-30 at 00:34 +0100, Felix Fietkau wrote: Please try this patch and see if it helps. Yes, it worked perfectly!!! I added some debug printks, and it turns out that ah-slottime is negative. The card is Ubiquiti SR71-12, 2 GHz only. phy1: Atheros AR9280 Rev:2 mem=0xc900107c, irq=19 mac80211: debugfs: failed to rename debugfs dir to netdev:wlan0 udev: renamed network interface wlan0 to wlan13 ah-slottime = -1, ah-coverage_class = 0, sifstime = 10 acktimeout = 9, conf-channel = a01c2520, conf-channel-band = 0 ah-slottime = -1, ah-coverage_class = 0, sifstime = 10 acktimeout = 9, conf-channel = a01c2520, conf-channel-band = 0 ah-slottime = -1, ah-coverage_class = 0, sifstime = 10 acktimeout = 9, conf-channel = a01c2520, conf-channel-band = 0 ah-slottime = -1, ah-coverage_class = 0, sifstime = 10 acktimeout = 9, conf-channel = a01c25d4, conf-channel-band = 0 ah-slottime = 9, ah-coverage_class = 0, sifstime = 10 acktimeout = 19, conf-channel = a01c25d4, conf-channel-band = 0 phy1: Allocated STA 00:19:e3:05:25:10 phy1: Inserted STA 00:19:e3:05:25:10 I see that ah-slottime is initialized to -1 in ath9k_hw_init_defaults(). Maybe we want a number that is large, but doesn't overflow? Or we could start with 54, which would give 64 in the first iteration. The workaround value of '64' is actually wrong. When I had trouble associating in 2.4 GHz in a case where the slot time was actually set correctly, I simply used it, because that's what was being set in the initvals. We shouldn't base the slot time on this though - the initvals don't do this either. The slottime == -1 thing is obviously a bug, and I'll send a patch to fix it later. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k in wireless-testing won't work in AP mode
On 2010-01-30 9:37 PM, Pavel Roskin wrote: On Sat, 2010-01-30 at 21:10 +0100, Felix Fietkau wrote: The workaround value of '64' is actually wrong. When I had trouble associating in 2.4 GHz in a case where the slot time was actually set correctly, I simply used it, because that's what was being set in the initvals. We shouldn't base the slot time on this though - the initvals don't do this either. I could reduce the minimal value of acktimeout from 64 to 20. With 20, I get a reliable communication every time. With 19, it's doesn't work ever. The distance between STA and AP is about 1 meter. The slottime == -1 thing is obviously a bug, and I'll send a patch to fix it later. It turns out that simply increasing the initial value to 54 is not enough, as ah-slottime will be set to 9 from sc-beacon.slottime, so acktimeout becomes 19. That's why my patch forces it to a minimum of 64 just before it gets set. But using the value of 10 instead of 9 for sc-beacon.slottime does the trick. That's the minimal patch that works for me. Perhaps we should be using ATH9K_SLOT_TIME_9 there (see mac.h) and define it to 10. That value would still be wrong, though. According to the 802.11 docs, the slot time for 2.4 GHz has to be set to 9 usec, not 10. If this is a bug in the hardware, we need to know about the implications before we pick a value that just happened to work in our tests, but may not work in all cases. slottime=9, acktimeout=64 is what the initvals use, so that's tested by Atheros. If we continue to receive no comments from Atheros on this issue, we should probably use those values by default. By the way, 5 GHz seems to be unaffected by this issue, which makes this a little suspicious. I've Cc'd Luis, maybe he can forward this to the right people. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] Significiant performance differences between ath5k and ath9k in 802.11a
On 2010-01-29 9:10 AM, Joerg Pommnitz wrote: --- rootki...@yahoo.it rootki...@yahoo.it wrote: Can you try in AP-client mode? I think you'll get more throughput so. No, IBSS is what I'm interested in. And the point is, that there is a 30% performance difference between ath5k (and Madwifi) and ath9k. Even if the performance in AP mode would be better, this does not change the problem in IBSS mode. I can confirm these numbers for AP/Client mode as well. I've even tested legacy communication between ath9k ap and sta, producing the same result. Additionally, I can rule out the rate control algorithm, as using minstrel instead of the ath9k RC produces a very similar throughput test result. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] ath9k in wireless-testing won't work in AP mode
On 2010-01-30 12:05 AM, Pavel Roskin wrote: Hello ath9k in wireless-testing won't work in AP mode. Stations fail to associate: # hostapd /etc/hostapd/wlan13.conf Configuration file: /etc/hostapd/wlan13.conf Using interface wlan13 with hwaddr 00:15:6d:84:1f:37 and ssid 'mike2' wlan13: STA 00:17:c4:3b:fc:88 IEEE 802.11: did not acknowledge authentication response I haven't used ath9k in AP mode for a long time if ever, so I don't know when it broke. Please try this patch and see if it helps. --- a/drivers/net/wireless/ath/ath9k/hw.c +++ b/drivers/net/wireless/ath/ath9k/hw.c @@ -1229,6 +1229,11 @@ void ath9k_hw_init_global_settings(struc /* As defined by IEEE 802.11-2007 17.3.8.6 */ slottime = ah-slottime + 3 * ah-coverage_class; acktimeout = slottime + sifstime; + + /* Workaround for a hw issue */ + if (conf-channel conf-channel-band == IEEE80211_BAND_2GHZ) + acktimeout = max(64, acktimeout); + ath9k_hw_setslottime(ah, slottime); ath9k_hw_set_ack_timeout(ah, acktimeout); ath9k_hw_set_cts_timeout(ah, acktimeout); ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] D-Link DWA-547 completely freezes system
On 2010-01-22 11:23 PM, Luis R. Rodriguez wrote: Adding Ben and Cliff just to keep them in the loop. Note: this e-mail is on a public mailing list. On Fri, Jan 22, 2010 at 02:17:43PM -0800, Pavel Roskin wrote: On Fri, 2010-01-22 at 19:26 +0100, Jorge Boncompte [DTI2] wrote: Hi Pavel, i had done some tests and did found that my SR71-12 does works if I change in hw.c::ath9k_hw_init_11a_eeprom_fix()... if ((ah-hw_version.devid == AR9280_DEVID_PCI) test_bit(ATH9K_MODE_11A, ah-caps.wireless_modes)) { to... if (ah-hw_version.devid == AR9280_DEVID_PCI) { Unfortunately, I don't know what that 'fix' fixes :-) You patch fixes Ubiquiti SR71-12 for me! And it looks quite reasonable. I'm only getting deadbeef from AR_IMR_S2 now, which is exactly what I'm getting with SR71-15. The right fix would be probably to find out when the fixup is actually needed. By the way, I'm not very fond of checking PCI ID in the driver when there are other ways to check for the chip revision and capabilities. I'm baffled, why would initializing some 5 Ghz stuff be required for a 2ghz only card. Unless hardware requires this.. we're still looking into this. Looking at the fixups more closely, I'd say that this is not actually 5 GHz specific stuff, but rather a bogus check that was added over time. When looking back at a much, much older version of ath9k (before it was merged to the wireless tree), it didn't have any 11a mode capability check before running the INI fixups. I think dropping the 11a capability check is the right fix in this case, as the code that gets called then fixes up a static preinitialized value from the initval with a value from the eeprom, and nothing in the fixup code indicates that it's supposed to be 11a specific. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] No probe response from AP after 500ms, disconnecting.
Hi Peter, On 2010-01-11 4:03 PM, Peter Stuge wrote: Since I never saw this behavior in exactly the same kernel with another mac80211 driver (ipw2200) the problem must be in the ath9k driver or in my AR5416 MAC/BB Rev:2 AR5133 RF Rev:81 hardware. ipw2200 is not a mac80211 driver. It probably handles reconnects internally inside the libipw stack, whereas mac80211 does not do this and leaves it to user space instead. The difference in behaviour thus does not necessarily rule out a problem in the RF environment. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] Enhancing ath9k for embedded systems
On 2009-12-15 2:18 PM, Ayee Goundan wrote: I apologize for the delay. Please use Sasi Subramaniam as your first point of contact for OpenWRT related questions, while cc'ing Senthil Balasubramanian. Sasi will coordinate appropriate help within the organization to provide as quick a response as possible. Hi Sasi, I've quoted my initial problem description below. Please let me know if you have any questions about my details of my test setup. Thanks, - Felix I'm running OpenWrt with our Linux 2.6.30 based kernel and ath9k from compat-wireless on two identical devices based on AR7240 and AR9283 (Python+Merlin). The devices are 'Rocket M5' by Ubiquiti Networks. One of them is configured to run as an AP, the other one is connected to it as STA. The main problem that I have been unable to track down so far is that whenever I run traffic through those two devices, I get very bad throughput (max. 60 mbit/s, mostly only 50 mbit/s). Rate control stats show that the rate in use is mostly 216 MBit/s, with a low packet error rate (fluctuating between 0 and 7%). Packet captures from an external device confirm this. Aggregation is definitely working properly, and the CPU load on both devices usually stays well below 50%. I also ran similar tests with ath9k on completely different hardware, with exactly the same issues. The other hardware combinations that I tested were Intel IXP4xx based boards with AR5416, AR9160 and AR9220 based cards. The kernel that I used on those boards was 2.6.28. I can rule out issues with the Kernel or the Ethernet driver, since a modified version of the Carrier LSDK driver is performing well in the same setup (I included the carrier driver in the same firmware build that I tested ath9k with). The weird part to this is that whenever I run the same set of tests with an older MacBook as STA against one of the ath9k devices as AP, I get more than 95 MBit/s in both directions (limited by the Ethernet connection speed). I also ran the same tests with a Ralink RT2870 device instead of the MacBook card, with equally good results. I've spent a considerable amount of time trying to track this issue down (including comparing register traces against a working driver), with no promising lead so far, so I'd appreciate any help from your side. ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH] ath9k: enable 2GHz band only if the device supports it
Luis R. Rodriguez wrote: On Wed, Nov 25, 2009 at 12:29:15PM -0800, Gabor Juhos wrote: Currently, the 2GHz band is enabled unconditionally, even if the device does not support it. This is true, but we don't have any 11n 5 GHz only devices, The patch is fine too though but it'd be better if you fail with a proper ath_print ATH_DBG_FATAL message if neither band has been marked as supported. We do have some ;) Ubiquiti Networks has a few Python+Merlin based devices that are 5GHz only, and we're working on supporting them. - Felix ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] [PATCH v2 16/21] mac80211: add helper for management / no-ack frame rate decision
Luis R. Rodriguez wrote: All current rate control algorithms agree to send management and no-ack frames at the lowest rate. They also agree to do this when sta and the private rate control data is NULL. We add a hlper to mac80211 for this and simplify the rate control algorithm code. Developers wishing to make enhancements to rate control algorithms are for broadcast/multicast can opt to not use this in their gate_rate() mac80211 callback. Cc: Zhu Yi yi@intel.com Cc: Reinette Chatre reinette.cha...@intel.com Cc: ipw3945-de...@lists.sourceforge.net Cc: Gabor Juhos juh...@openwrt.org Cc: Felix Fietkau n...@openwrt.org Cc: Derek Smithies de...@indranet.co.nz Cc: Chittajit Mitra chittajit.mi...@atheros.com Signed-off-by: Luis R. Rodriguez lrodrig...@atheros.com --- drivers/net/wireless/ath/ath9k/rc.c| 14 + drivers/net/wireless/iwlwifi/iwl-3945-rs.c | 13 ++- drivers/net/wireless/iwlwifi/iwl-agn-rs.c |7 +- include/net/mac80211.h | 23 ++ net/mac80211/rate.c| 29 net/mac80211/rc80211_minstrel.c| 22 + net/mac80211/rc80211_pid_algo.c| 11 +- 7 files changed, 59 insertions(+), 60 deletions(-) Acked-by: Felix Fietkau n...@openwrt.org ___ ath9k-devel mailing list ath9k-devel@lists.ath9k.org https://lists.ath9k.org/mailman/listinfo/ath9k-devel