Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
On 07/27/2010 01:11 AM, Felix Fietkau wrote:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau:
>>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>>> I think there are some (in theory) simple improvements that can be
>>>> done to the tx aggregation / rate control logic. A proof of concept of
>>>> one such improvement is provided below. Basically, it's a hack that
>>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>>> think to make this workable, we really should use the MRR table for
>>> something, otherwise the rate control algorithm will take much longer to
>>> adapt.
>>> It's probably better to fix this properly after I'm done with my A-MPDU
>>> rewrite, because then I can more easily push parts of the software
>>> retransmission behaviour into minstrel_ht directly.
>> Sounds very reasonable. I'm sure you've thought of it, but now that
>> it's fresh in my head it would be great if the new aggregation design
>> allowed us to experiment with stuff like this:
>>
>> * The rate control logic treats the average aggregate length as a
>> measured independent variable, when in fact it depends heavily on the
>> rates selected (via the 4 ms txop limit).
> Yes, with the new design maybe we could use the initial rate lookup only
> for setting the sampling flag, and then do a separate per-AMPDU
> lookup, which properly takes the AMPDU length into account.
>
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.
>
>> * When setting up a hardware MRR for an aggregate the focus should be
>> on throughput (as explained earlier in this thread). But there are
>> situations when reliability is important: e.g. when a subframe in the
>> aggregate is about to expire (because of time or block ack window). It
>> may even be advantageous to tx the subframes that are about to expire
>> in their own aggregate with a lower / more reliable bitrate?
> Yes, that's what I was thinking as well. We should probably make this
> decision based on the number of sw-retransmitted frames, and maybe
> consider the offset of seqno vs baw_tail as well.
>
>> * In many busy radio environments the packet success rate depends very
>> much on the protection method being used (none, cts-to-self or
>> rts-cts), often more so than on the bitrate itself. It would be
>> interesting to experiment with including the protection method in the
>> rate selection, i.e. to probe for the optimal protection method and
>> bitrate combination.
> Sounds good.
>
>> * In order to have the best possible rate control in very dynamic RF
>> environments it's important to keep the hardware queue short and
>> select rates as late as possible (to not introduce unnecessary delay
>> when selecting new rates). I have no idea how to do this, but it would
>> be great if the tx queue could be kept long enough to never stall tx,
>> but no longer.
> This would work with what I suggested above - per-AMPDU rate lookup.
> With software scheduling that's easy to do, since we already restrict
> the queue to max. 2 AMPDUs.
>
>> * If I understand correctly, the Atheros hardware does not adjust the
>> rts / cts-to-self duration field when going through the MRR
>> (correct?). In that case it may be even more advantageous to use
>> software retry as much as possible when some form of protection is
>> enabled.
> Not sure, but I think it does adjust the duration field according to the
> rate, while transmitting.

[ranga] Yes, it does. If you enable RTS on all rates, you will see RTS
frames with different durations.

>> Looking forward to the new aggregation code!
> That will still take some time. I recently came up with some better
> design ideas, which require some larger changes to the code that I
> already wrote.
>
> - Felix
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel
Re: [ath9k-devel] Compilation Error
On Mon, Jul 26, 2010 at 05:54:34PM -0700, Rajan Giri wrote:
> Hi Everybody
>
> I am trying to compile compat-wireless-2.6.35-rc6.tar.bz2, but I get the
> following errors.
> Please help me fix the errors. Where do I have to make changes?
> I am sending the error list as an attachment.

Please paste your results inline; I don't want to open OpenOffice to see your errors! Jeesh.

Luis
[ath9k-devel] Compilation Error
Hi Everybody,

I am trying to compile compat-wireless-2.6.35-rc6.tar.bz2, but I get the following errors. Please help me fix the errors. Where do I have to make changes? I am sending the error list as an attachment.

Regards,
SP

Attachment: error.docx (application/vnd.openxmlformats-officedocument.wordprocessingml.document)
Re: [ath9k-devel] ath9k: performance regressions / tx semi-stuck somehow
Björn Smedman wrote:
> If I then switch the AP channel (from 1 to 11) performance is looking
> good again:

Does it stay consistent and associated for you over, say, a weekend in an office building, a work week, or a month?

//Peter
Re: [ath9k-devel] ath9k: performance regressions / tx semi-stuck somehow
2010/7/23 Björn Smedman :
> 2010/7/22 Felix Fietkau :
>> On 2010-07-22 12:17 AM, Björn Smedman wrote:
>>> I just tried out compat-wireless-2010-07-16 on AR913x (with
>>> openwrt/tr...@r22321) and saw some weird performance problems.
>>>
>> Could you please try if the earlier version that was in OpenWrt
>> (2010-07-06) has the same issues?

I had some trouble reproducing this, but now I am convinced that this performance issue was caused by interference, although I think ath9k could do a better job in difficult radio environments. Note how downstream throughput drops from about 55 Mbit/s to between 1.4 and 3.6 Mbit/s. Nothing is moved in this experiment.

bjorn-smedmans-macbook-2:~ bjornsmedman$ iperf -w 256K -s
Server listening on TCP port 5001
TCP window size: 256 KByte
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 58060
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-30.0 sec   194 MBytes   54.1 Mbits/sec
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 58926
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-30.0 sec   203 MBytes   56.6 Mbits/sec
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 58990
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-30.0 sec   190 MBytes   53.2 Mbits/sec
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 59236
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-30.0 sec   189 MBytes   52.8 Mbits/sec
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 64327
[Some interference source probably comes in here]
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-34.5 sec  5.64 MBytes   1.37 Mbits/sec
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 64424
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-30.3 sec  8.06 MBytes   2.23 Mbits/sec
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 64625
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-31.3 sec  13.6 MBytes   3.64 Mbits/sec

If I then switch the AP channel (from 1 to 11) performance is looking good again:

bjorn-smedmans-macbook-2:~ bjornsmedman$ iperf -w 256K -s
Server listening on TCP port 5001
TCP window size: 256 KByte
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 0
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-120.0 sec   621 MBytes  43.4 Mbits/sec
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 55562
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-120.0 sec   764 MBytes  53.4 Mbits/sec
[ 4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 55568
[ ID] Interval       Transfer     Bandwidth
[ 4]  0.0-10.0 sec   68.1 MBytes  56.9 Mbits/sec

/Björn
Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
On 2010-07-26 10:37 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau :
>> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>>> 2010/7/26 Felix Fietkau :
>>> * When tx is aggregated most rate control probe frames end up inside
>>> aggregates and are never used for probing (effective probe frequency
>>> is divided by average aggregate length).
>> Nope, a probing frame never ends up inside an aggregate. It's always
>> sent out as a single frame, which is why I had to make the decision
>> about sending a probing frame more complex in minstrel_ht, compared to
>> minstrel - the previous 10% stuff was limiting aggregation size.
>
> Ok, I must have jumped to conclusions. I looked quickly at the code
> and had the impression that it only cared about the RATE_PROBE flag if
> it was on the first subframe of the aggregate, and then I compared
> debug output from rc and xmit like this:

Oh, wait. It seems that you may be right after all. I think I was remembering stuff from the wrong codebase again.
Well, at least what I described is what I think the code should be doing ;)

- Felix
Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
2010/7/26 Felix Fietkau :
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau :
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.

Ok, I must have jumped to conclusions. I looked quickly at the code and had the impression that it only cared about the RATE_PROBE flag if it was on the first subframe of the aggregate, and then I compared debug output from rc and xmit like this:

r...@openwrt:/sys/kernel/debug# cat ieee80211/phy0/stations/00\:1e\:52\:c7\:cf\:63/rc_stats ; cat ath9k/phy0/xmit
type         rate    throughput  ewma prob  this prob  this succ/attempt  success  attempts
HT20/LGI     MCS0        5.8       87.3        50.0         0(  0)            48        54
HT20/LGI     MCS1       12.6       94.6       100.0         0(  0)            46        48
HT20/LGI     MCS2       18.9       95.8       100.0         0(  0)            52        73
HT20/LGI     MCS3       24.8       94.8       100.0         0(  0)            53        62
HT20/LGI     MCS4       38.4       99.2       100.0         0(  0)            45        55
HT20/LGI     MCS5       47.4       94.0       100.0         0(  0)            56        72
HT20/LGI     MCS6       55.4       98.7       100.0         0(  0)            60        78
HT20/LGI  P  MCS7       56.2       88.8        66.6         0(  0)           112       143
HT20/LGI     MCS8       10.8       81.4        50.0         0(  0)            50        62
HT20/LGI     MCS9       23.6       90.4       100.0         0(  0)            66        81
HT20/LGI     MCS10      30.6       79.0        50.0         0(  0)            51        64
HT20/LGI     MCS11      50.1       99.2       100.0         0(  0)            56        63
HT20/LGI     MCS12      60.1       80.6       100.0         0(  0)           217       382
HT20/LGI     MCS13      66.6       70.6        50.0         0(  0)          2440      3042
HT20/LGI  t  MCS14      82.9       77.9        65.9         0(  0)         70446     86949
HT20/LGI  T  MCS15      85.5       73.5        77.1       264(342)         31170     43240

Total packet count::    ideal 117093      lookaround 1322
Average A-MPDU length: 10.6

                      BE       BK    VI    VO
MPDUs Queued:         120      0     0     224
MPDUs Completed:      120      0     0     224
Aggregates:           7555     0     0     0
AMPDUs Queued:        118358   0     0     50
AMPDUs Completed:     118247   0     0     20
AMPDUs Retried:       15406    0     0     300
AMPDUs XRetried:      21       0     0     30
FIFO Underrun:        0        0     0     0
TXOP Exceeded:        0        0     0     0
TXTIMER Expiry:       0        0     0     0
DESC CFG Error:       0        0     0     0
DATA Underrun:        0        0     0     0
DELIM Underrun:       0        0     0     0

Rate control reports 1322 lookaround frames (= probe frames?), but ath9k xmit reports only 120 + 224 MPDUs.

/Björn
Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
On 2010-07-26 9:23 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau :
>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>> I think there are some (in theory) simple improvements that can be
>>> done to the tx aggregation / rate control logic. A proof of concept of
>>> one such improvement is provided below. Basically, it's a hack that
>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>> think to make this workable, we really should use the MRR table for
>> something, otherwise the rate control algorithm will take much longer to
>> adapt.
>> It's probably better to fix this properly after I'm done with my A-MPDU
>> rewrite, because then I can more easily push parts of the software
>> retransmission behaviour into minstrel_ht directly.
>
> Sounds very reasonable. I'm sure you've thought of it, but now that
> it's fresh in my head it would be great if the new aggregation design
> allowed us to experiment with stuff like this:
>
> * The rate control logic treats the average aggregate length as a
> measured independent variable, when in fact it depends heavily on the
> rates selected (via the 4 ms txop limit).

Yes, with the new design maybe we could use the initial rate lookup only for setting the sampling flag, and then do a separate per-AMPDU lookup, which properly takes the AMPDU length into account.

> * When tx is aggregated most rate control probe frames end up inside
> aggregates and are never used for probing (effective probe frequency
> is divided by average aggregate length).

Nope, a probing frame never ends up inside an aggregate. It's always sent out as a single frame, which is why I had to make the decision about sending a probing frame more complex in minstrel_ht, compared to minstrel - the previous 10% stuff was limiting aggregation size.

> * When setting up a hardware MRR for an aggregate the focus should be
> on throughput (as explained earlier in this thread). But there are
> situations when reliability is important: e.g. when a subframe in the
> aggregate is about to expire (because of time or block ack window). It
> may even be advantageous to tx the subframes that are about to expire
> in their own aggregate with a lower / more reliable bitrate?

Yes, that's what I was thinking as well. We should probably make this decision based on the number of sw-retransmitted frames, and maybe consider the offset of seqno vs baw_tail as well.

> * In many busy radio environments the packet success rate depends very
> much on the protection method being used (none, cts-to-self or
> rts-cts), often more so than on the bitrate itself. It would be
> interesting to experiment with including the protection method in the
> rate selection, i.e. to probe for the optimal protection method and
> bitrate combination.

Sounds good.

> * In order to have the best possible rate control in very dynamic RF
> environments it's important to keep the hardware queue short and
> select rates as late as possible (to not introduce unnecessary delay
> when selecting new rates). I have no idea how to do this, but it would
> be great if the tx queue could be kept long enough to never stall tx,
> but no longer.

This would work with what I suggested above - per-AMPDU rate lookup. With software scheduling that's easy to do, since we already restrict the queue to max. 2 AMPDUs.

> * If I understand correctly, the Atheros hardware does not adjust the
> rts / cts-to-self duration field when going through the MRR
> (correct?). In that case it may be even more advantageous to use
> software retry as much as possible when some form of protection is
> enabled.

Not sure, but I think it does adjust the duration field according to the rate, while transmitting.

> Looking forward to the new aggregation code!

That will still take some time. I recently came up with some better design ideas, which require some larger changes to the code that I already wrote.

- Felix
Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
2010/7/26 Felix Fietkau :
> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>> I think there are some (in theory) simple improvements that can be
>> done to the tx aggregation / rate control logic. A proof of concept of
>> one such improvement is provided below. Basically, it's a hack that
> I think it makes sense to rely less on on-chip MRR for fallback, but I
> think to make this workable, we really should use the MRR table for
> something, otherwise the rate control algorithm will take much longer to
> adapt.
> It's probably better to fix this properly after I'm done with my A-MPDU
> rewrite, because then I can more easily push parts of the software
> retransmission behaviour into minstrel_ht directly.

Sounds very reasonable. I'm sure you've thought of it, but now that it's fresh in my head it would be great if the new aggregation design allowed us to experiment with stuff like this:

* The rate control logic treats the average aggregate length as a measured independent variable, when in fact it depends heavily on the rates selected (via the 4 ms txop limit).

* When tx is aggregated most rate control probe frames end up inside aggregates and are never used for probing (effective probe frequency is divided by average aggregate length).

* When setting up a hardware MRR for an aggregate the focus should be on throughput (as explained earlier in this thread). But there are situations when reliability is important: e.g. when a subframe in the aggregate is about to expire (because of time or block ack window). It may even be advantageous to tx the subframes that are about to expire in their own aggregate with a lower / more reliable bitrate?

* In many busy radio environments the packet success rate depends very much on the protection method being used (none, cts-to-self or rts-cts), often more so than on the bitrate itself. It would be interesting to experiment with including the protection method in the rate selection, i.e. to probe for the optimal protection method and bitrate combination.

* In order to have the best possible rate control in very dynamic RF environments it's important to keep the hardware queue short and select rates as late as possible (to not introduce unnecessary delay when selecting new rates). I have no idea how to do this, but it would be great if the tx queue could be kept long enough to never stall tx, but no longer.

* If I understand correctly, the Atheros hardware does not adjust the rts / cts-to-self duration field when going through the MRR (correct?). In that case it may be even more advantageous to use software retry as much as possible when some form of protection is enabled.

Looking forward to the new aggregation code!

/Björn
Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
On 2010-07-26 7:10 PM, Björn Smedman wrote:
> Hi all,
>
> I've been running a lot of iperf on AR913x /
> compat-wireless-2010-07-16 (w/ openwrt/tr...@22388).
>
> I think there are some (in theory) simple improvements that can be
> done to the tx aggregation / rate control logic. A proof of concept of
> one such improvement is provided below. Basically, it's a hack that
> makes ath9k output aggregates with only the first rate in the rate
> series. The reasoning is that a failure is not a problem for
> aggregates because there is software retry. Retrying in hardware at a
> slower rate is counterproductive. So, better to fail and do a
> software retry at possibly another rate. Also, since the aggregate
> size is often limited by the slowest rate in the MRR series (4 ms txop
> limit), having a slow rate in the series may affect performance even if
> it is never used by the hardware.
>
> In my (not so scientific) tests max AP downstream throughput increases
> by about 64% with the patch below (from 33.9 to 55.7 Mbit/s with HT20
> in a noisy environment with 20 meters and a few walls between AP and
> client).
>
> Of course, if all rates in the series are high then this patch has no effect.

I think it makes sense to rely less on on-chip MRR for fallback, but I think to make this workable, we really should use the MRR table for something; otherwise the rate control algorithm will take much longer to adapt.
It's probably better to fix this properly after I'm done with my A-MPDU rewrite, because then I can more easily push parts of the software retransmission behaviour into minstrel_ht directly.

- Felix
[ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate
Hi all,

I've been running a lot of iperf on AR913x / compat-wireless-2010-07-16 (w/ openwrt/tr...@22388).

I think there are some (in theory) simple improvements that can be done to the tx aggregation / rate control logic. A proof of concept of one such improvement is provided below. Basically, it's a hack that makes ath9k output aggregates with only the first rate in the rate series. The reasoning is that a failure is not a problem for aggregates because there is software retry. Retrying in hardware at a slower rate is counterproductive. So, better to fail and do a software retry at possibly another rate. Also, since the aggregate size is often limited by the slowest rate in the MRR series (4 ms txop limit), having a slow rate in the series may affect performance even if it is never used by the hardware.

In my (not so scientific) tests max AP downstream throughput increases by about 64% with the patch below (from 33.9 to 55.7 Mbit/s with HT20 in a noisy environment with 20 meters and a few walls between AP and client).

Of course, if all rates in the series are high then this patch has no effect.

/Björn

---
diff -urpN a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
--- a/drivers/net/wireless/ath/ath9k/xmit.c	2010-07-26 15:35:17.0 +0200
+++ b/drivers/net/wireless/ath/ath9k/xmit.c	2010-07-26 17:11:33.0 +0200
@@ -565,7 +565,7 @@ static u32 ath_lookup_rate(struct ath_so
 	 */
 	max_4ms_framelen = ATH_AMPDU_LIMIT_MAX;
 
-	for (i = 0; i < 4; i++) {
+	for (i = 0; i < 1; i++) {
 		if (rates[i].count) {
 			int modeidx;
 			if (!(rates[i].flags & IEEE80211_TX_RC_MCS)) {
@@ -1553,6 +1553,9 @@ static void ath_buf_set_rate(struct ath_
 	if (sc->sc_flags & SC_OP_PREAMBLE_SHORT)
 		ctsrate |= rate->hw_value_short;
 
+	if (bf_isaggr(bf))
+		rates[1].count = rates[2].count = rates[3].count = 0;
+
 	for (i = 0; i < 4; i++) {
 		bool is_40, is_sgi, is_sp;
 		int phy;