Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Felix Fietkau
On 2010-07-26 7:10 PM, Björn Smedman wrote:
> Hi all,
>
> I've been running a lot of iperf on AR913x /
> compat-wireless-2010-07-16 (w/ openwrt/tr...@22388).
>
> I think there are some (in theory) simple improvements that can be
> done to the tx aggregation / rate control logic. A proof of concept of
> one such improvement is provided below. Basically, it's a hack that
> makes ath9k output aggregates with only the first rate in the rate
> series. The reasoning is that a failure is not a problem for
> aggregates because there is software retry. Retrying in hardware at a
> slower rate is counterproductive. So, better to fail and do a
> software retry at possibly another rate. Also, since the aggregate
> size is often limited by the slowest rate in the MRR series (4 ms txop
> limit) having a slow rate in the series may affect performance even if
> it is never used by the hardware.
>
> In my (not so scientific) tests max AP downstream throughput increases
> about 30-40% with the patch below (from 33.9 to 55.7 Mbit/s with HT20
> in noisy environment with 20 meters and a few walls between AP and
> client).
>
> Of course, if all rates in the series are high then this patch has no effect.

I think it makes sense to rely less on on-chip MRR for fallback, but I
think to make this workable, we really should use the MRR table for
something, otherwise the rate control algorithm will take much longer to
adapt.
It's probably better to fix this properly after I'm done with my A-MPDU
rewrite, because then I can more easily push parts of the software
retransmission behaviour into minstrel_ht directly.

- Felix
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel
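
To put rough numbers on the 4 ms txop point in Björn's mail, the sketch
below estimates how many payload bytes fit into one aggregate when the
slowest rate in the series sets the limit. It is an illustration only,
not ath9k code. Assumptions: the function names are made up, PHY
preamble and A-MPDU delimiter overhead are ignored, and the rates are
the nominal HT20 long-GI values (6.5 Mbit/s for MCS0, 117 for MCS14,
130 for MCS15).

```python
TXOP_LIMIT_US = 4000      # 4 ms txop limit mentioned in the mail
MAX_AMPDU_BYTES = 65535   # 802.11n upper bound on A-MPDU length


def max_aggregate_bytes(rate_mbps, txop_us=TXOP_LIMIT_US):
    """Payload bytes that fit in txop_us at rate_mbps (overhead ignored)."""
    bits = rate_mbps * txop_us          # Mbit/s * microseconds = bits
    return min(int(bits // 8), MAX_AMPDU_BYTES)


def series_limit_bytes(series_mbps):
    """Aggregate size limit if the slowest rate in the series decides.

    Hypothetical helper: it models the behaviour described in the mail,
    not the actual driver logic.
    """
    return max_aggregate_bytes(min(series_mbps))


if __name__ == "__main__":
    for series in ([130.0], [130.0, 117.0], [130.0, 117.0, 6.5]):
        print(series, series_limit_bytes(series))
```

Under these assumptions a series containing only MCS15 allows 65000
bytes per aggregate, while adding MCS0 as the last fallback drops the
limit to 3250 bytes, i.e. about two 1500-byte frames, even if the
hardware never falls back that far.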


Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Björn Smedman
2010/7/26 Felix Fietkau n...@openwrt.org:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau n...@openwrt.org:
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.

Ok, I must have jumped to conclusions. I looked quickly at the code
and had the impression that it only cared about the RATE_PROBE flag if
it was on the first subframe of the aggregate, and then I compared
debug output from rc and xmit like this:

r...@openwrt:/sys/kernel/debug# cat ieee80211/phy0/stations/00\:1e\:52\:c7\:cf\:63/rc_stats ; cat ath9k/phy0/xmit
type         rate  throughput  ewma prob  this prob  this succ/attempt  success  attempts
HT20/LGI     MCS0         5.8       87.3       50.0             0(  0)       48        54
HT20/LGI     MCS1        12.6       94.6      100.0             0(  0)       46        48
HT20/LGI     MCS2        18.9       95.8      100.0             0(  0)       52        73
HT20/LGI     MCS3        24.8       94.8      100.0             0(  0)       53        62
HT20/LGI     MCS4        38.4       99.2      100.0             0(  0)       45        55
HT20/LGI     MCS5        47.4       94.0      100.0             0(  0)       56        72
HT20/LGI     MCS6        55.4       98.7      100.0             0(  0)       60        78
HT20/LGI   P MCS7        56.2       88.8       66.6             0(  0)      112       143
HT20/LGI     MCS8        10.8       81.4       50.0             0(  0)       50        62
HT20/LGI     MCS9        23.6       90.4      100.0             0(  0)       66        81
HT20/LGI     MCS10       30.6       79.0       50.0             0(  0)       51        64
HT20/LGI     MCS11       50.1       99.2      100.0             0(  0)       56        63
HT20/LGI     MCS12       60.1       80.6      100.0             0(  0)      217       382
HT20/LGI     MCS13       66.6       70.6       50.0             0(  0)     2440      3042
HT20/LGI  t  MCS14       82.9       77.9       65.9             0(  0)    70446     86949
HT20/LGI T   MCS15       85.5       73.5       77.1           264(342)    31170     43240

Total packet count::ideal 117093  lookaround 1322
Average A-MPDU length: 10.6
                            BE        BK        VI        VO

MPDUs Queued:              120         0         0       224
MPDUs Completed:           120         0         0       224
Aggregates:               7555         0         0         0
AMPDUs Queued:          118358         0         0        50
AMPDUs Completed:       118247         0         0        20
AMPDUs Retried:          15406         0         0       300
AMPDUs XRetried:            21         0         0        30
FIFO Underrun:               0         0         0         0
TXOP Exceeded:               0         0         0         0
TXTIMER Expiry:              0         0         0         0
DESC CFG Error:              0         0         0         0
DATA Underrun:               0         0         0         0
DELIM Underrun:              0         0         0         0

Rate control says 1322 lookaround (=probe frames?) but ath9k xmit says
only 120 + 224 MPDUs.

/Björn
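
The figures in the debug output above can be cross-checked with a few
lines of arithmetic. The sketch below only reuses numbers from that
output (1322 lookaround, 117093 ideal, average A-MPDU length 10.6,
120 + 224 unaggregated MPDUs). The variable names are made up, and it
assumes that "ideal" and "lookaround" are disjoint counts. Whether
"lookaround" maps one-to-one to probe frames is exactly the open
question here, so treat the result as a sanity check, not a conclusion.

```python
# Counters copied from the rc_stats and xmit output in this mail.
ideal = 117093
lookaround = 1322
avg_ampdu_len = 10.6
single_mpdus = {"BE": 120, "VO": 224}

# Share of packets that rate control marked as lookaround.
lookaround_share = lookaround / (ideal + lookaround)

# What the lookaround count would shrink to if it were divided by the
# average aggregate length (the hypothesis from earlier in the thread).
lookaround_per_aggregate = lookaround / avg_ampdu_len

# Frames that ath9k reports as sent outside of aggregates.
total_single = sum(single_mpdus.values())

if __name__ == "__main__":
    print(f"lookaround share of all packets: {lookaround_share:.2%}")
    print(f"lookaround / avg A-MPDU length:  {lookaround_per_aggregate:.1f}")
    print(f"unaggregated MPDUs (BE + VO):    {total_single}")
```

Run as a script, this prints a lookaround share of roughly 1.1%, a
scaled count of about 125, and 344 unaggregated MPDUs. The 344 is far
below the 1322 lookaround packets, which is the mismatch pointed out
above.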


Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Ranga Rao Ravuri


On 07/27/2010 01:11 AM, Felix Fietkau wrote:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau n...@openwrt.org:
>>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>>> I think there are some (in theory) simple improvements that can be
>>>> done to the tx aggregation / rate control logic. A proof of concept of
>>>> one such improvement is provided below. Basically, it's a hack that
>>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>>> think to make this workable, we really should use the MRR table for
>>> something, otherwise the rate control algorithm will take much longer to
>>> adapt.
>>> It's probably better to fix this properly after I'm done with my A-MPDU
>>> rewrite, because then I can more easily push parts of the software
>>> retransmission behaviour into minstrel_ht directly.
>> Sounds very reasonable. I'm sure you've thought of it but now that
>> it's fresh in my head it would be great if the new aggregation design
>> allowed us to experiment with stuff like this:

>> * The rate control logic treats the average aggregate length as a
>> measured independent variable, when in fact it depends heavily on the
>> rates selected (via the 4 ms txop limit).
> Yes, with the new design maybe we could use the initial rate lookup only
> for setting the sampling flag, and then doing a separate per-AMPDU
> lookup, which properly takes the AMPDU length into account.

>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.

>> * When setting up a hardware MRR for an aggregate the focus should be
>> on throughput (as explained earlier in this thread). But there are
>> situations when reliability is important: e.g. when a subframe in the
>> aggregate is about to expire (because of time or block ack window). It
>> may even be advantageous to tx the subframes that are about to expire
>> in their own aggregate with lower / more reliable bitrate?
> Yes, that's what I was thinking as well. We should probably make this
> decision based on the number of sw-retransmitted frames, and maybe
> consider the offset of seqno vs baw_tail as well.

>> * In many busy radio environments the packet success rate depends very
>> much on the protection method being used (none, cts-to-self or
>> rts-cts), often more so than on the bitrate itself. It would be
>> interesting to experiment with including the protection method in the
>> rate selection, i.e. to probe for the optimal protection method and
>> bitrate combination.
> Sounds good.

>> * In order to have the best possible rate control in very dynamic rf
>> environments it's important to keep the hardware queue short and
>> select rates as late as possible (to not introduce unnecessary delay
>> when selecting new rates). I have no idea how to do this but it would
>> be great if the tx queue could be kept long enough to never stall tx,
>> but no longer.
> This would work with what I suggested above - per-AMPDU rate lookup.
> With software scheduling that's easy to do, since we already restrict
> the queue to max. 2 AMPDUs

>> * If I understand correctly the Atheros hardware does not adjust the
>> rts / cts-to-self duration field when going through the MRR
>> (correct?). In that case it may be even more advantageous to use
>> software retry as much as possible when some form of protection is
>> enabled.
> Not sure, but I think it does adjust the duration field according to the
> rate, while transmitting.
[ranga] Yes it does. If you enable RTS on all rates, you would see
different RTSs coming with different duration.
>> Looking forward to the new aggregation code!
> That will still take some time, I recently came up with some better
> design ideas, which require some larger changes to the code that I
> already wrote.
>
> - Felix
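
Felix's suggestion in the quoted exchange above (decide based on the
number of sw-retransmitted frames and the offset of seqno vs baw_tail)
could be sketched roughly as follows. This is a hypothetical
illustration, not code from ath9k or minstrel_ht: the function names,
the thresholds (3 retries, 75% of the window) and the use of
`newest_seqno` as a stand-in for baw_tail are all assumptions made up
for the example. The only fixed facts used are the 12-bit 802.11
sequence number space and the 64-frame maximum block ack window of
802.11n.

```python
SEQ_SPACE = 4096      # 802.11 sequence numbers are 12 bits wide
MAX_BAW_SIZE = 64     # largest block ack window in 802.11n


def seq_distance(older, newer):
    """Distance from older to newer, with 12-bit wrap-around."""
    return (newer - older) % SEQ_SPACE


def wants_reliable_rate(sw_retries, seqno, newest_seqno,
                        baw_size=MAX_BAW_SIZE,
                        retry_threshold=3,        # assumed value
                        window_fraction=0.75):    # assumed value
    """Return True if a subframe should get a lower, more reliable rate.

    A subframe qualifies when it has been software-retried often, or
    when it lags so far behind the newest queued sequence number that
    it is close to stalling the block ack window.
    """
    lag = seq_distance(seqno, newest_seqno)
    return sw_retries >= retry_threshold or lag >= baw_size * window_fraction


if __name__ == "__main__":
    print(wants_reliable_rate(0, 100, 110))    # fresh frame, small lag
    print(wants_reliable_rate(3, 100, 110))    # retried three times
    print(wants_reliable_rate(0, 4090, 42))    # lag of 48 across the wrap
```

Subframes for which this returns True could then be collected into
their own aggregate at a more robust rate, as proposed in the thread,
while everything else keeps the throughput-oriented first rate.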