Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Ranga Rao Ravuri


On 07/27/2010 01:11 AM, Felix Fietkau wrote:
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau:
>>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>>> I think there are some (in theory) simple improvements that can be
>>>> done to the tx aggregation / rate control logic. A proof of concept of
>>>> one such improvement is provided below. Basically, it's a hack that
>>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>>> think to make this workable, we really should use the MRR table for
>>> something, otherwise the rate control algorithm will take much longer to
>>> adapt.
>>> It's probably better to fix this properly after I'm done with my A-MPDU
>>> rewrite, because then I can more easily push parts of the software
>>> retransmission behaviour into minstrel_ht directly.
>> Sounds very reasonable. I'm sure you've thought of it but now that
>> it's fresh in my head it would be great if the new aggregation design
>> allowed us to experiment with stuff like this:
>>
>> * The rate control logic treats the average aggregate length as a
>> measured independent variable, when in fact it depends heavily on the
>> rates selected (via the 4 ms txop limit).
> Yes, with the new design maybe we could use the initial rate lookup only
> for setting the sampling flag, and then doing a separate per-AMPDU
> lookup, which properly takes the AMPDU length into account.
>
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.
>
>> * When setting up a hardware MRR for an aggregate the focus should be
>> on throughput (as explained earlier in this thread). But there are
>> situations when reliability is important: e.g. when a subframe in the
>> aggregate is about to expire (because of time or block ack window). It
>> may even be advantageous to tx the subframes that are about to expire
>> in their own aggregate with lower / more reliable bitrate?
> Yes, that's what I was thinking as well. We should probably make this
> decision based on the number of sw-retransmitted frames, and maybe
> consider the offset of seqno vs baw_tail as well.
>
>> * In many busy radio environments the packet success rate depends very
>> much on the protection method being used (none, cts-to-self or
>> rts-cts), often more so than on the bitrate itself. It would be
>> interesting to experiment with including the protection method in the
>> rate selection, i.e. to probe for the optimal protection method and
>> bitrate combination.
> Sounds good.
>
>> * In order to have the best possible rate control in very dynamic rf
>> environments it's important to keep the hardware queue short and
>> select rates as late as possible (to not introduce unnecessary delay
>> when selecting new rates). I have no idea how to do this but it would
>> be great if the tx queue could be kept long enough to never stall tx,
>> but no longer.
> This would work with what I suggested above - per-AMPDU rate lookup.
> With software scheduling that's easy to do, since we already restrict
> the queue to max. 2 AMPDUs
>
>> * If I understand correctly the Atheros hardware does not adjust the
>> rts / cts-to-self duration field when going through the MRR
>> (correct?). In that case it may be even more advantageous to use
>> software retry as much as possible when some form of protection is
>> enabled.
> Not sure, but I think it does adjust the duration field according to the
> rate, while transmitting.
[ranga] Yes it does. If you enable RTS on all rates, you will see
different RTS frames with different durations.
>> Looking forward to the new aggregation code!
> That will still take some time, I recently came up with some better
> design ideas, which require some larger changes to the code that I
> already wrote.
>
> - Felix
> ___
> ath9k-devel mailing list
> ath9k-devel@lists.ath9k.org
> https://lists.ath9k.org/mailman/listinfo/ath9k-devel
___
ath9k-devel mailing list
ath9k-devel@lists.ath9k.org
https://lists.ath9k.org/mailman/listinfo/ath9k-devel


Re: [ath9k-devel] Compilation Error

2010-07-26 Thread Luis R. Rodriguez
On Mon, Jul 26, 2010 at 05:54:34PM -0700, Rajan Giri wrote:
> Hi Everybody
> 
> I am trying to compile compat-wireless-2.6.35-rc6.tar.bz2, but I get the
> following errors.
> Please help me fix them; where do I have to make changes?
> I am sending the error list in an attachment.

Please paste your results; I don't want to open OpenOffice to
see your errors! Jeesh.

  Luis


[ath9k-devel] Compilation Error

2010-07-26 Thread Rajan Giri
Hi Everybody

I am trying to compile compat-wireless-2.6.35-rc6.tar.bz2, but I get the
following errors.
Please help me fix them; where do I have to make changes?
I am sending the error list in an attachment.

Regards
SP


error.docx
Description: application/vnd.openxmlformats-officedocument.wordprocessingml.document


Re: [ath9k-devel] ath9k: performance regressions / tx semi-stuck somehow

2010-07-26 Thread Peter Stuge
Björn Smedman wrote:
> If I then switch the AP channel (from 1 to 11) performance is looking
> good again:

Does it stay consistent and associated for you over say a weekend in
an office building, or a work week, or a month?


//Peter


Re: [ath9k-devel] ath9k: performance regressions / tx semi-stuck somehow

2010-07-26 Thread Björn Smedman
2010/7/23 Björn Smedman :
> 2010/7/22 Felix Fietkau :
>> On 2010-07-22 12:17 AM, Björn Smedman wrote:
>>> I just tried out compat-wireless-2010-07-16 on AR913x (with
>>> openwrt/tr...@r22321) and saw some weird performance problems.
>>>
>> Could you please try if the earlier version that was in OpenWrt
>> (2010-07-06) has the same issues?

I had some trouble reproducing this but now I feel convinced that this
performance issue was caused by interference, although I think ath9k
could do a better job in difficult radio environments.

Note how downstream throughput goes from ~55 Mbps down to around 3
Mbps. Nothing is moved in this experiment.

bjorn-smedmans-macbook-2:~ bjornsmedman$ iperf -w 256K -s

Server listening on TCP port 5001
TCP window size:   256 KByte

[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 58060
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-30.0 sec194 MBytes  54.1 Mbits/sec
[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 58926
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-30.0 sec203 MBytes  56.6 Mbits/sec
[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 58990
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-30.0 sec190 MBytes  53.2 Mbits/sec
[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 59236
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-30.0 sec189 MBytes  52.8 Mbits/sec
[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 64327

[Some interference source probably comes in here]

[ ID] Interval   Transfer Bandwidth
[  4]  0.0-34.5 sec  5.64 MBytes  1.37 Mbits/sec
[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 64424
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-30.3 sec  8.06 MBytes  2.23 Mbits/sec
[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 64625
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-31.3 sec  13.6 MBytes  3.64 Mbits/sec


If I then switch the AP channel (from 1 to 11) performance is looking
good again:

bjorn-smedmans-macbook-2:~ bjornsmedman$ iperf -w 256K -s

Server listening on TCP port 5001
TCP window size:   256 KByte

[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 0
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-120.0 sec621 MBytes  43.4 Mbits/sec
[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 55562
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-120.0 sec764 MBytes  53.4 Mbits/sec
[  4] local 192.168.78.119 port 5001 connected with 192.168.78.211 port 55568
[ ID] Interval   Transfer Bandwidth
[  4]  0.0-10.0 sec  68.1 MBytes  56.9 Mbits/sec


/Björn


Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Felix Fietkau
On 2010-07-26 10:37 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau :
>> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>>> 2010/7/26 Felix Fietkau :
>>> * When tx is aggregated most rate control probe frames end up inside
>>> aggregates and are never used for probing (effective probe frequency
>>> is divided by average aggregate length).
>> Nope, a probing frame never ends up inside an aggregate. It's always
>> sent out as a single frame, which is why I had to make the decision
>> about sending a probing frame more complex in minstrel_ht, compared to
>> minstrel - the previous 10% stuff was limiting aggregation size.
> 
> Ok, I must have jumped to conclusions. I looked quickly at the code
> and had the impression that it only cared about the RATE_PROBE flag if
> it was on the first subframe of the aggregate, and then I compared
> debug output from rc and xmit like this:
Oh, wait. It seems that you may be right after all. I think I was
remembering stuff from the wrong codebase again. Well, at least what I
described is what I think the code should be doing ;)

- Felix


Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Björn Smedman
2010/7/26 Felix Fietkau :
> On 2010-07-26 9:23 PM, Björn Smedman wrote:
>> 2010/7/26 Felix Fietkau :
>> * When tx is aggregated most rate control probe frames end up inside
>> aggregates and are never used for probing (effective probe frequency
>> is divided by average aggregate length).
> Nope, a probing frame never ends up inside an aggregate. It's always
> sent out as a single frame, which is why I had to make the decision
> about sending a probing frame more complex in minstrel_ht, compared to
> minstrel - the previous 10% stuff was limiting aggregation size.

Ok, I must have jumped to conclusions. I looked quickly at the code
and had the impression that it only cared about the RATE_PROBE flag if
it was on the first subframe of the aggregate, and then I compared
debug output from rc and xmit like this:

r...@openwrt:/sys/kernel/debug# cat ieee80211/phy0/stations/00\:1e\:52\:c7\:cf\:63/rc_stats ; cat ath9k/phy0/xmit
type       rate   throughput  ewma prob  this prob  this succ/attempt  success  attempts
HT20/LGI   MCS0      5.8       87.3       50.0       0(  0)                48        54
HT20/LGI   MCS1     12.6       94.6      100.0       0(  0)                46        48
HT20/LGI   MCS2     18.9       95.8      100.0       0(  0)                52        73
HT20/LGI   MCS3     24.8       94.8      100.0       0(  0)                53        62
HT20/LGI   MCS4     38.4       99.2      100.0       0(  0)                45        55
HT20/LGI   MCS5     47.4       94.0      100.0       0(  0)                56        72
HT20/LGI   MCS6     55.4       98.7      100.0       0(  0)                60        78
HT20/LGI P MCS7     56.2       88.8       66.6       0(  0)               112       143
HT20/LGI   MCS8     10.8       81.4       50.0       0(  0)                50        62
HT20/LGI   MCS9     23.6       90.4      100.0       0(  0)                66        81
HT20/LGI   MCS10    30.6       79.0       50.0       0(  0)                51        64
HT20/LGI   MCS11    50.1       99.2      100.0       0(  0)                56        63
HT20/LGI   MCS12    60.1       80.6      100.0       0(  0)               217       382
HT20/LGI   MCS13    66.6       70.6       50.0       0(  0)              2440      3042
HT20/LGI t MCS14    82.9       77.9       65.9       0(  0)             70446     86949
HT20/LGI T MCS15    85.5       73.5       77.1     264(342)             31170     43240

Total packet count::  ideal 117093  lookaround 1322
Average A-MPDU length: 10.6
                      BE      BK      VI      VO
MPDUs Queued:        120       0       0     224
MPDUs Completed:     120       0       0     224
Aggregates:         7555       0       0       0
AMPDUs Queued:    118358       0       0      50
AMPDUs Completed: 118247       0       0      20
AMPDUs Retried:    15406       0       0     300
AMPDUs XRetried:      21       0       0      30
FIFO Underrun:         0       0       0       0
TXOP Exceeded:         0       0       0       0
TXTIMER Expiry:        0       0       0       0
DESC CFG Error:        0       0       0       0
DATA Underrun:         0       0       0       0
DELIM Underrun:        0       0       0       0

Rate control says 1322 lookaround (=probe frames?) but ath9k xmit says
only 120 + 224 MPDUs.
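
A quick back-of-the-envelope check of the numbers above (a sketch only; the
assumption being tested, not confirmed from the code, is that each lookaround
lookup should leave the driver as one standalone MPDU):

```python
# Sanity check of the debug counters quoted above. Assumption (not
# confirmed by the code): every minstrel_ht lookaround (probe) lookup
# should produce one unaggregated MPDU in ath9k's xmit counters.

lookaround = 1322           # from rc_stats: "lookaround 1322"
single_mpdus = 120 + 224    # from xmit: "MPDUs Queued" on BE and VO

# Probe lookups that cannot be matched to a standalone MPDU:
missing = lookaround - single_mpdus
print(missing)  # 978 -> most probes apparently went out inside aggregates
```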

/Björn


Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Felix Fietkau
On 2010-07-26 9:23 PM, Björn Smedman wrote:
> 2010/7/26 Felix Fietkau :
>> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>>> I think there are some (in theory) simple improvements that can be
>>> done to the tx aggregation / rate control logic. A proof of concept of
>>> one such improvement is provided below. Basically, it's a hack that
>> I think it makes sense to rely less on on-chip MRR for fallback, but I
>> think to make this workable, we really should use the MRR table for
>> something, otherwise the rate control algorithm will take much longer to
>> adapt.
>> It's probably better to fix this properly after I'm done with my A-MPDU
>> rewrite, because then I can more easily push parts of the software
>> retransmission behaviour into minstrel_ht directly.
> 
> Sounds very reasonable. I'm sure you've thought of it but now that
> it's fresh in my head it would be great if the new aggregation design
> allowed us to experiment with stuff like this:
> 
> * The rate control logic treats the average aggregate length as a
> measured independent variable, when in fact it depends heavily on the
> rates selected (via the 4 ms txop limit).
Yes, with the new design maybe we could use the initial rate lookup only
for setting the sampling flag, and then doing a separate per-AMPDU
lookup, which properly takes the AMPDU length into account.

> * When tx is aggregated most rate control probe frames end up inside
> aggregates and are never used for probing (effective probe frequency
> is divided by average aggregate length).
Nope, a probing frame never ends up inside an aggregate. It's always
sent out as a single frame, which is why I had to make the decision
about sending a probing frame more complex in minstrel_ht, compared to
minstrel - the previous 10% stuff was limiting aggregation size.

> * When setting up a hardware MRR for an aggregate the focus should be
> on throughput (as explained earlier in this thread). But there are
> situations when reliability is important: e.g. when a subframe in the
> aggregate is about to expire (because of time or block ack window). It
> may even be advantageous to tx the subframes that are about to expire
> in their own aggregate with lower / more reliable bitrate?
Yes, that's what I was thinking as well. We should probably make this
decision based on the number of sw-retransmitted frames, and maybe
consider the offset of seqno vs baw_tail as well.

> * In many busy radio environments the packet success rate depends very
> much on the protection method being used (none, cts-to-self or
> rts-cts), often more so than on the bitrate itself. It would be
> interesting to experiment with including the protection method in the
> rate selection, i.e. to probe for the optimal protection method and
> bitrate combination.
Sounds good.

> * In order to have the best possible rate control in very dynamic rf
> environments it's important to keep the hardware queue short and
> select rates as late as possible (to not introduce unnecessary delay
> when selecting new rates). I have no idea how to do this but it would
> be great if the tx queue could be kept long enough to never stall tx,
> but no longer.
This would work with what I suggested above - per-AMPDU rate lookup.
With software scheduling that's easy to do, since we already restrict
the queue to max. 2 AMPDUs.

> * If I understand correctly the Atheros hardware does not adjust the
> rts / cts-to-self duration field when going through the MRR
> (correct?). In that case it may be even more advantageous to use
> software retry as much as possible when some form of protection is
> enabled.
Not sure, but I think it does adjust the duration field according to the
rate, while transmitting.

> Looking forward to the new aggregation code!
That will still take some time, I recently came up with some better
design ideas, which require some larger changes to the code that I
already wrote.

- Felix


Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Björn Smedman
2010/7/26 Felix Fietkau :
> On 2010-07-26 7:10 PM, Björn Smedman wrote:
>> I think there are some (in theory) simple improvements that can be
>> done to the tx aggregation / rate control logic. A proof of concept of
>> one such improvement is provided below. Basically, it's a hack that
> I think it makes sense to rely less on on-chip MRR for fallback, but I
> think to make this workable, we really should use the MRR table for
> something, otherwise the rate control algorithm will take much longer to
> adapt.
> It's probably better to fix this properly after I'm done with my A-MPDU
> rewrite, because then I can more easily push parts of the software
> retransmission behaviour into minstrel_ht directly.

Sounds very reasonable. I'm sure you've thought of it but now that
it's fresh in my head it would be great if the new aggregation design
allowed us to experiment with stuff like this:

* The rate control logic treats the average aggregate length as a
measured independent variable, when in fact it depends heavily on the
rates selected (via the 4 ms txop limit).

* When tx is aggregated most rate control probe frames end up inside
aggregates and are never used for probing (effective probe frequency
is divided by average aggregate length).

* When setting up a hardware MRR for an aggregate the focus should be
on throughput (as explained earlier in this thread). But there are
situations when reliability is important: e.g. when a subframe in the
aggregate is about to expire (because of time or block ack window). It
may even be advantageous to tx the subframes that are about to expire
in their own aggregate with lower / more reliable bitrate?

* In many busy radio environments the packet success rate depends very
much on the protection method being used (none, cts-to-self or
rts-cts), often more so than on the bitrate itself. It would be
interesting to experiment with including the protection method in the
rate selection, i.e. to probe for the optimal protection method and
bitrate combination.

* In order to have the best possible rate control in very dynamic rf
environments it's important to keep the hardware queue short and
select rates as late as possible (to not introduce unnecessary delay
when selecting new rates). I have no idea how to do this but it would
be great if the tx queue could be kept long enough to never stall tx,
but no longer.

* If I understand correctly the Atheros hardware does not adjust the
rts / cts-to-self duration field when going through the MRR
(correct?). In that case it may be even more advantageous to use
software retry as much as possible when some form of protection is
enabled.
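
The probe-dilution point above can be put in rough numbers (a sketch with
illustrative values: 0.10 is the nominal minstrel sampling share mentioned
in this thread, 10.6 an average A-MPDU length from the debug output posted
here):

```python
# Sketch of the probe-dilution effect: if only the first subframe of an
# aggregate can act as a probe, the effective probe rate drops by the
# average aggregate length. Illustrative numbers, not measurements.

probe_fraction = 0.10   # nominal minstrel sampling share
avg_ampdu_len = 10.6    # average A-MPDU length from the debug dump

effective = probe_fraction / avg_ampdu_len
print(round(effective, 4))  # ~0.0094: roughly 1% of frames actually probe
```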

Looking forward to the new aggregation code!

/Björn


Re: [ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Felix Fietkau
On 2010-07-26 7:10 PM, Björn Smedman wrote:
> Hi all,
> 
> I've been running a lot of iperf on AR913x /
> compat-wireless-2010-07-16 (w/ openwrt/tr...@22388).
> 
> I think there are some (in theory) simple improvements that can be
> done to the tx aggregation / rate control logic. A proof of concept of
> one such improvement is provided below. Basically, it's a hack that
> makes ath9k output aggregates with only the first rate in the rate
> series. The reasoning is that a failure is not a problem for
> aggregates because there is software retry. Retrying in hardware at a
> slower rate is counter productive. So, better to fail and do a
> software retry at possibly another rate. Also, since the aggregate
> size is often limited by the slowest rate in the MRR series (4 ms txop
> limit) having a slow rate in the series may affect performance even if
> it is never used by the hardware.
> 
> In my (not so scientific) tests max AP downstream throughput increases
> about 30-40% with the patch below (from 33.9 to 55.7 Mbit/s with HT20
> in noisy environment with 20 meters and a few walls between AP and
> client).
> 
> Of course, if all rates in the series are high then this patch has no effect.
I think it makes sense to rely less on on-chip MRR for fallback, but I
think to make this workable, we really should use the MRR table for
something, otherwise the rate control algorithm will take much longer to
adapt.
It's probably better to fix this properly after I'm done with my A-MPDU
rewrite, because then I can more easily push parts of the software
retransmission behaviour into minstrel_ht directly.

- Felix


[ath9k-devel] [RFC] ath9k: improve aggregation throughput by using only first rate

2010-07-26 Thread Björn Smedman
Hi all,

I've been running a lot of iperf on AR913x /
compat-wireless-2010-07-16 (w/ openwrt/tr...@22388).

I think there are some (in theory) simple improvements that can be
done to the tx aggregation / rate control logic. A proof of concept of
one such improvement is provided below. Basically, it's a hack that
makes ath9k output aggregates with only the first rate in the rate
series. The reasoning is that a failure is not a problem for
aggregates because there is software retry. Retrying in hardware at a
slower rate is counter productive. So, better to fail and do a
software retry at possibly another rate. Also, since the aggregate
size is often limited by the slowest rate in the MRR series (4 ms txop
limit) having a slow rate in the series may affect performance even if
it is never used by the hardware.

In my (not so scientific) tests max AP downstream throughput increases
about 30-40% with the patch below (from 33.9 to 55.7 Mbit/s with HT20
in noisy environment with 20 meters and a few walls between AP and
client).

Of course, if all rates in the series are high then this patch has no effect.
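
To put rough numbers on the 4 ms txop argument (a sketch that ignores
preamble, IFS and block-ack overhead; the PHY rates are the standard HT20
long-GI values):

```python
# Upper bound on A-MPDU payload that fits in a 4 ms txop at a given PHY
# rate. Overheads (preamble, IFS, block ack) are ignored, so these are
# optimistic bounds; the point is the ratio between fast and slow rates.

TXOP_US = 4000  # 4 ms txop limit, in microseconds

def max_ampdu_bytes(rate_mbps):
    # a rate in Mbit/s equals bits per microsecond
    return int(TXOP_US * rate_mbps / 8)

mcs15 = max_ampdu_bytes(130.0)  # HT20 long-GI MCS15
mcs0 = max_ampdu_bytes(6.5)     # HT20 long-GI MCS0
print(mcs15, mcs0)  # 65000 3250: one slow MRR entry caps the whole aggregate
```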

/Björn
---
diff -urpN a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c
--- a/drivers/net/wireless/ath/ath9k/xmit.c 2010-07-26 15:35:17.0 +0200
+++ b/drivers/net/wireless/ath/ath9k/xmit.c 2010-07-26 17:11:33.0 +0200
@@ -565,7 +565,7 @@ static u32 ath_lookup_rate(struct ath_so
 */
max_4ms_framelen = ATH_AMPDU_LIMIT_MAX;

-   for (i = 0; i < 4; i++) {
+   for (i = 0; i < 1; i++) {
if (rates[i].count) {
int modeidx;
if (!(rates[i].flags & IEEE80211_TX_RC_MCS)) {
@@ -1553,6 +1553,9 @@ static void ath_buf_set_rate(struct ath_
if (sc->sc_flags & SC_OP_PREAMBLE_SHORT)
ctsrate |= rate->hw_value_short;

+   if (bf_isaggr(bf))
+   rates[1].count = rates[2].count = rates[3].count = 0;
+
for (i = 0; i < 4; i++) {
bool is_40, is_sgi, is_sp;
int phy;