Re: FW: [PATCH] ath10k: fix return value check in wake_tx_q op

2019-02-24 Thread Yibo Zhao

在 2019-02-07 22:25,Kalle Valo 写道:

Yibo Zhao  writes:


We have met performance issue on our two-core system after applying
your patch. In WDS mode, we found that the peak throughput in TCP-DL
and UDP-DL dropped more than 10% compared with previous one. And in
some cases, though throughput stays the same, one CPU usage rises
about 20% which leads to 10% in total CPU usage. With your change, I
think driver will try its best to push as many packets as it can.
During this time, the driver's queue lock will be held for too much
time in one CPU and as a result, the other CPU will be blocked if it
wants to acquired the same lock. Working in this way seems not
efficiency.

So I think it is better to revert the change till we come up with a
new solution.


I don't think reverting is a clear option at this stage because that
again creates problems for SDIO. IIRC without this patch SDIO was
sending one packet a time (or something like that, can't remember all
the details right now).



Below is the aqm result after 1 min test with Erik's patch.

target 1us interval 9us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit
collisions tx-bytes tx-packets flags
0 2 0 0 5342229 0 0 0 0 3867657029 5342229 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 2 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 2 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

we see no difference between tx-packets and new-flows.
Whereas if we revert the patch, we get:

target 1us interval 9us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit
collisions tx-bytes tx-packets flags
0 2 0 0 2233059 3 0 9236 12 1159661783 6380867 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 1 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 1 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

new-flows are roughly one-third of the total tx-packets.

IMHO, with Erik's change, Erik's change has changed the way fq's 
schedule behavior and it looks like there is no other packets in the fq 
after a packet has been dequeued. And as a result, this flow's deficit 
will be refill and then removed from fq list at once in the same CPU. 
And during this time, the other CPU could be blocked. When new packet 
comes, same thing happens. So we get equal new flows and tx-packets.


Things would be different without Erik's change. After a packet has been 
dequeued, this flow's deficit will not be refill immediately in
CPU0. It is possible that the deficit to be refilled in CPU1 while at 
the same time CPU0 can fetch data from ethernet. So we can see more 
tx-packets delivered to FW from aqm.



Why does this happen only WDS mode? Did you test other modes, like AP 
or

client mode?
AP mode has same issue. UDP throughput drops more than 10%. As for TCP, 
CPU usage rising a lot although throughput stays similiar.

So, I'd say Erik's change does not work for us.


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: FW: [PATCH] ath10k: fix return value check in wake_tx_q op

2019-03-03 Thread Yibo Zhao

在 2019-02-25 12:40,Yibo Zhao 写道:

在 2019-02-07 22:25,Kalle Valo 写道:

Yibo Zhao  writes:


We have met performance issue on our two-core system after applying
your patch. In WDS mode, we found that the peak throughput in TCP-DL
and UDP-DL dropped more than 10% compared with previous one. And in
some cases, though throughput stays the same, one CPU usage rises
about 20% which leads to 10% in total CPU usage. With your change, I
think driver will try its best to push as many packets as it can.
During this time, the driver's queue lock will be held for too much
time in one CPU and as a result, the other CPU will be blocked if it
wants to acquired the same lock. Working in this way seems not
efficiency.

So I think it is better to revert the change till we come up with a
new solution.


I don't think reverting is a clear option at this stage because that
again creates problems for SDIO. IIRC without this patch SDIO was
sending one packet a time (or something like that, can't remember all
the details right now).



Below is the aqm result after 1 min test with Erik's patch.

target 1us interval 9us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit
collisions tx-bytes tx-packets flags
0 2 0 0 5342229 0 0 0 0 3867657029 5342229 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 2 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 2 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

we see no difference between tx-packets and new-flows.
Whereas if we revert the patch, we get:

target 1us interval 9us ecn yes
tid ac backlog-bytes backlog-packets new-flows drops marks overlimit
collisions tx-bytes tx-packets flags
0 2 0 0 2233059 3 0 9236 12 1159661783 6380867 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 1 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 1 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

new-flows are roughly one-third of the total tx-packets.

IMHO, with Erik's change, Erik's change has changed the way fq's
schedule behavior and it looks like there is no other packets in the
fq after a packet has been dequeued. And as a result, this flow's
deficit will be refill and then removed from fq list at once in the
same CPU. And during this time, the other CPU could be blocked. When
new packet comes, same thing happens. So we get equal new flows and
tx-packets.

Things would be different without Erik's change. After a packet has
been dequeued, this flow's deficit will not be refill immediately in
CPU0. It is possible that the deficit to be refilled in CPU1 while at
the same time CPU0 can fetch data from ethernet. So we can see more
tx-packets delivered to FW from aqm.


Why does this happen only WDS mode? Did you test other modes, like AP 
or

client mode?

AP mode has same issue. UDP throughput drops more than 10%. As for
TCP, CPU usage rising a lot although throughput stays similiar.
So, I'd say Erik's change does not work for us.


Hi Kalle,

May I have your comments?

--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: FW: [PATCH] ath10k: fix return value check in wake_tx_q op

2019-03-11 Thread Yibo Zhao

On 2019-03-11 14:44, Erik Stromdahl wrote:

Hi Yibo,

Sorry for a late reply, but I have been busy with other projects 
lately.

I have added my comments below

On 3/4/19 2:56 AM, Yibo Zhao wrote:

在 2019-02-25 12:40,Yibo Zhao 写道:

在 2019-02-07 22:25,Kalle Valo 写道:

Yibo Zhao  writes:


I have a few patches related to bundling of TX packets on my private 
repo.

I have not yet had the time to prepare them for submission.
This patch is related to that work, but I decided to submit it 
separately

since I considered it a bugfix.


Great! Really looking forward to your new patch.







IMHO, with Erik's change, Erik's change has changed the way fq's
schedule behavior and it looks like there is no other packets in the
fq after a packet has been dequeued. And as a result, this flow's
deficit will be refill and then removed from fq list at once in the
same CPU. And during this time, the other CPU could be blocked. When
new packet comes, same thing happens. So we get equal new flows and
tx-packets.

Things would be different without Erik's change. After a packet has
been dequeued, this flow's deficit will not be refill immediately in
CPU0. It is possible that the deficit to be refilled in CPU1 while at
the same time CPU0 can fetch data from ethernet. So we can see more
tx-packets delivered to FW from aqm.


Why does this happen only WDS mode? Did you test other modes, like 
AP or

client mode?

AP mode has same issue. UDP throughput drops more than 10%. As for
TCP, CPU usage rising a lot although throughput stays similiar.
So, I'd say Erik's change does not work for us.


Hi Kalle,

May I have your comments?



As I wrote in the commit message, the original code will always break 
out of

the loop after just one iteration.

It is OK with me to bring back the old logic, but I think we should 
skip the

loop entirely then.

Something like this:

if (ath10k_mac_tx_can_push(hw, txq)) {
ath10k_mac_tx_push_txq(hw, txq);
}
Yes, it is the exact way we tried in our private repo. And it works fine 
in our setup so far. Not sure it is ok for other situations. Have you 
tested on your sdio setup with this change? Any issue observed?



Btw, I noticed that the "fair scheduling" mechanism (derived from 
ath9k) from

Toke have been integrated.

I haven't rebased my tree for a while, so I will most likely have to 
rewrite
my patches anyway in order to make them work with the new TX 
scheduling.


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH v2] fq: fix fq_tin tx bytes overflow

2019-03-13 Thread Yibo Zhao
Currently, we are using u32 for tx_bytes in fq_tin.
If the throughput stays more than 1.2Gbps, tx_bytes
statistics overflow in about 1 min.

In order to allow us to trace the tx_bytes statistics
for longer time in high throughput, change its type
from u32 to u64.

Signed-off-by: Yibo Zhao 
---
 include/net/fq.h  | 2 +-
 net/mac80211/debugfs_netdev.c | 2 +-
 net/mac80211/debugfs_sta.c| 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/net/fq.h b/include/net/fq.h
index ac944a6..70f8b12 100644
--- a/include/net/fq.h
+++ b/include/net/fq.h
@@ -53,7 +53,7 @@ struct fq_tin {
u32 overlimit;
u32 collisions;
u32 flows;
-   u32 tx_bytes;
+   u64 tx_bytes;
u32 tx_packets;
 };
 
diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index cff0fb3..8d66f41 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -499,7 +499,7 @@ static ssize_t ieee80211_if_fmt_aqm(
len = scnprintf(buf,
buflen,
"ac backlog-bytes backlog-packets new-flows drops marks 
overlimit collisions tx-bytes tx-packets\n"
-   "%u %u %u %u %u %u %u %u %u %u\n",
+   "%u %u %u %u %u %u %u %u %llu %u\n",
txqi->txq.ac,
txqi->tin.backlog_bytes,
txqi->tin.backlog_packets,
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 3aa618d..e54a6d6 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -168,7 +168,7 @@ static ssize_t sta_aqm_read(struct file *file, char __user 
*userbuf,
continue;
txqi = to_txq_info(sta->sta.txq[i]);
p += scnprintf(p, bufsz+buf-p,
-  "%d %d %u %u %u %u %u %u %u %u %u 
0x%lx(%s%s%s)\n",
+  "%d %d %u %u %u %u %u %u %u %llu %u 
0x%lx(%s%s%s)\n",
   txqi->txq.tid,
   txqi->txq.ac,
   txqi->tin.backlog_bytes,
-- 
1.9.1


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2] fq: fix fq_tin tx bytes overflow

2019-03-17 Thread Yibo Zhao

On 2019-03-15 17:38, Johannes Berg wrote:

On Wed, 2019-03-13 at 11:08 +0800, Yibo Zhao wrote:

Currently, we are using u32 for tx_bytes in fq_tin.
If the throughput stays more than 1.2Gbps, tx_bytes
statistics overflow in about 1 min.

In order to allow us to trace the tx_bytes statistics
for longer time in high throughput, change its type
from u32 to u64.


Hmm. 64-bit values are kinda expensive on 32-bit architectures. How
badly do you need this? I mean ... worst case you just have to capture
every 30 seconds or so if you're doing really high throughput with HE 
or

something?

johannes


Hi Johannes,

I understand your concern. Yes, I am using high end AP for throughput 
test. I'd say 1.2 Gbps is not the worst case since we can achieve max 
1.4Gbps according to our test. AFAIK, for most throughput cases, 1min is 
the minimum requirement. And I think, with more and more high end 
products(even higher throughput) on the ways to the market, it is highly 
possible that 30s is not a safe time before overflow.


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] ath10k: remove iteration in wake_tx_queue

2019-04-01 Thread Yibo Zhao

On 2019-03-29 15:47, Erik Stromdahl wrote:

On 3/27/19 6:49 PM, Peter Oh wrote:



On 03/27/2019 09:29 AM, Erik Stromdahl wrote:
Iterating the TX queue and thereby dequeuing all available packets in 
the

queue could result in performance penalties on some SMP systems.


Please share the test results and numbers you've run to help others
thoughts.



I haven't run any tests with ath10k PCIe, but Yibo Zhao had noticed a 
10%

degradation without this patch.

Yibo:
Can you provide iperf results etc. that shows the performance gain?


My tests are based on ixchariot with cabled setup(two-core AP system).
WDS mode--10% deviation:
With previous change: UDP DL-1246 Mbps
W/O previous change:  UDP DL-987 Mbps

Normal mode:
With previous change: UDP DL-1380 Mbps
W/O previous change:  UDP DL-1310 Mbps

Also attached the aqm status.

With previous change:

tid ac backlog-bytes backlog-packets new-flows drops marks overlimit 
collisions tx-bytes tx-packets flags

0 2 0 0 5342229 0 0 0 0 3867657029 5342229 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 2 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 2 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

we see no difference between tx-packets and new-flows.

W/O previous change:

tid ac backlog-bytes backlog-packets new-flows drops marks overlimit 
collisions tx-bytes tx-packets flags

0 2 0 0 2233059 3 0 9236 12 1159661783 6380867 0x0(RUN)
1 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
2 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
3 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
4 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
5 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
6 0 0 0 1 0 0 0 0 144 2 0x0(RUN)
7 0 0 0 1 0 0 0 0 282 2 0x0(RUN)
8 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
9 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
10 3 0 0 0 0 0 0 0 0 0 0x0(RUN)
11 2 0 0 0 0 0 0 0 0 0 0x0(RUN)
12 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
13 1 0 0 0 0 0 0 0 0 0 0x0(RUN)
14 0 0 0 0 0 0 0 0 0 0 0x0(RUN)
15 0 0 0 0 0 0 0 0 0 0 0x0(RUN)

new-flows are roughly one-third of the total tx-packets.



--
Erik


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2] fq: fix fq_tin tx bytes overflow

2019-04-12 Thread Yibo Zhao

On 2019-04-09 04:01, Johannes Berg wrote:

On Mon, 2019-03-18 at 12:59 +0800, Yibo Zhao wrote:


I understand your concern. Yes, I am using high end AP for throughput
test. I'd say 1.2 Gbps is not the worst case since we can achieve max
1.4Gbps according to our test. AFAIK, for most throughput cases, 1min 
is

the minimum requirement. And I think, with more and more high end
products(even higher throughput) on the ways to the market, it is 
highly

possible that 30s is not a safe time before overflow.


Well, 2 Gbps (goodput) would make it overflow every 16 seconds, so I'm
not sure where you take the 1 minute from :-) Maybe from 1.2Gbps PHY
rate.

But still, the only place we expose this is debugfs, so I'm not really
sure we want to spend that.

Note that I'm generally pushing back on statistics right now - I really
want to put some trace points or eBPF hooks in places that people can
use to keep their favourite statistics, instead of trying to cover each
and every use case in the stack itself.


Cool, great!looking forward to that change. :-)


johannes


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH] mac80211: remove warning message

2019-05-10 Thread Yibo Zhao
In multiple SSID cases, it takes time to prepare every AP interface
to be ready in initializing phase. If a sta already knows everything it
needs to join one of the APs and sends authentication to the AP which
is not fully prepared at this point of time, AP's channel context
could be NULL. As a result, warning message occurs.

Even worse, if the AP is under attack via tools such as MDK3 and massive
authentication requests are received in a very short time, console will
be hung due to kernel warning messages.

If this case can be hit during normal functionality, there should be no
WARN_ON(). Those should be reserved to cases that are not supposed to be
hit at all or some other more specific cases like indicating obsolete
interface.

Signed-off-by: Zhi Chen 
Signed-off-by: Yibo Zhao 
---
 net/mac80211/ieee80211_i.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 2ae0364..f39c289 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1435,7 +1435,7 @@ struct ieee80211_local {
rcu_read_lock();
chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);
 
-   if (WARN_ON_ONCE(!chanctx_conf)) {
+   if (!chanctx_conf) {
rcu_read_unlock();
return NULL;
}
-- 
1.9.1


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH] mac80211: remove warning message

2019-05-12 Thread Yibo Zhao

On 2019-05-10 22:04, Ben Greear wrote:

On 05/10/2019 12:01 AM, Yibo Zhao wrote:

In multiple SSID cases, it takes time to prepare every AP interface
to be ready in initializing phase. If a sta already knows everything 
it

needs to join one of the APs and sends authentication to the AP which
is not fully prepared at this point of time, AP's channel context
could be NULL. As a result, warning message occurs.

Even worse, if the AP is under attack via tools such as MDK3 and 
massive
authentication requests are received in a very short time, console 
will

be hung due to kernel warning messages.


Since it is a WARN_ON_ONCE, how it the console hang due to warnings?  
You should

get no more than once per boot?


Hi Ben,

I was planning to use WARN_ON_ONCE() in the first place to replace 
WARN_ON() then after some discussion, we think removing it could be 
better. So the patch was based on my first version. Sorry for the 
confusing. Will raise another one.


I have no problem with removing it though.  Seems a harmless splat and 
I removed

it from my tree some time back as well.

Thanks,
Ben



If this case can be hit during normal functionality, there should be 
no
WARN_ON(). Those should be reserved to cases that are not supposed to 
be

hit at all or some other more specific cases like indicating obsolete
interface.

Signed-off-by: Zhi Chen 
Signed-off-by: Yibo Zhao 
---
 net/mac80211/ieee80211_i.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 2ae0364..f39c289 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1435,7 +1435,7 @@ struct ieee80211_local {
rcu_read_lock();
chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);

-   if (WARN_ON_ONCE(!chanctx_conf)) {
+   if (!chanctx_conf) {
rcu_read_unlock();
return NULL;
}



--
Yibo


[PATCH v2] mac80211: remove warning message

2019-05-14 Thread Yibo Zhao
In multiple SSID cases, it takes time to prepare every AP interface
to be ready in initializing phase. If a sta already knows everything it
needs to join one of the APs and sends authentication to the AP which
is not fully prepared at this point of time, AP's channel context
could be NULL. As a result, warning message occurs.

Even worse, if the AP is under attack via tools such as MDK3 and massive
authentication requests are received in a very short time, console will
be hung due to kernel warning messages.

If this case can be hit during normal functionality, there should be no
WARN_ON(). Those should be reserved to cases that are not supposed to be
hit at all or some other more specific cases like indicating obsolete
interface.

Signed-off-by: Zhi Chen 
Signed-off-by: Yibo Zhao 
---
 net/mac80211/ieee80211_i.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 073a823..f39c289 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1435,7 +1435,7 @@ struct ieee80211_local {
rcu_read_lock();
chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);
 
-   if (WARN_ON(!chanctx_conf)) {
+   if (!chanctx_conf) {
rcu_read_unlock();
return NULL;
}
-- 
1.9.1



Re: [PATCH v2] mac80211: remove warning message

2019-05-14 Thread Yibo Zhao

On 2019-05-14 17:05, Johannes Berg wrote:

On Tue, 2019-05-14 at 17:01 +0800, Yibo Zhao wrote:

In multiple SSID cases, it takes time to prepare every AP interface
to be ready in initializing phase. If a sta already knows everything 
it

needs to join one of the APs and sends authentication to the AP which
is not fully prepared at this point of time, AP's channel context
could be NULL. As a result, warning message occurs.


Err, what was the point in sending v2 without any changes?

johannes

Hi Johannes,

I was planning to use WARN_ON_ONCE() in the first place to replace 
WARN_ON() then after some discussion, we think removing it could be 
better. So the first patch was based on my first version which is sent 
incorrectly. Please check again.


Sorry for the confusing.


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2] mac80211: remove warning message

2019-05-20 Thread Yibo Zhao

On 2019-05-15 02:57, Johannes Berg wrote:

On Tue, 2019-05-14 at 11:54 -0700, Ben Greear wrote:


Here is the info I have in my commit that changed this to 
WARN_ON_ONCE.
I never posted it because I had to hack ath10k to get to this state, 
so maybe

this is not a valid case to debug.


Maybe Yibo Zhao has a better example.

 mac80211: don't spam kernel logs when chantx is null.

 I set up ath10k to be chandef based again in order to test
 WDS.  My WDS stations are not very functional yet, and
 when ethtool stats are queried, there is a WARN_ON splat
 generated.  Change this to WARN_ON_ONCE so that there is
 less kernel spam.


I'm totally fine with WARN_ON_ONCE, FWIW.

Sounds like different bugs though. You're talking about WDS here, and
Yibo was talking about something with AP interfaces prematurely
accepting frames or so.


Yes, they might be different bugs that hit the same point. Looks like 
others found this too many warnings issue as well. Then I believe 
WARN_ON_ONCE() seems to be our solution for now.




johannes


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH v2] mac80211: remove warning message

2019-06-13 Thread Yibo Zhao

On 2019-05-20 21:56, Yibo Zhao wrote:

On 2019-05-15 02:57, Johannes Berg wrote:

On Tue, 2019-05-14 at 11:54 -0700, Ben Greear wrote:


Here is the info I have in my commit that changed this to 
WARN_ON_ONCE.
I never posted it because I had to hack ath10k to get to this state, 
so maybe

this is not a valid case to debug.


Maybe Yibo Zhao has a better example.

 mac80211: don't spam kernel logs when chantx is null.

 I set up ath10k to be chandef based again in order to test
 WDS.  My WDS stations are not very functional yet, and
 when ethtool stats are queried, there is a WARN_ON splat
 generated.  Change this to WARN_ON_ONCE so that there is
 less kernel spam.


I'm totally fine with WARN_ON_ONCE, FWIW.

Sounds like different bugs though. You're talking about WDS here, and
Yibo was talking about something with AP interfaces prematurely
accepting frames or so.


Yes, they might be different bugs that hit the same point. Looks like
others found this too many warnings issue as well. Then I believe
WARN_ON_ONCE() seems to be our solution for now.


Hi Johannes,

May I know if it is fine that WARN_ON_ONCE() to be applied in kernel in 
the future? If a separate patch for it is needed, please let me know so 
that I can raise a new one.




johannes


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH] mac80211: suppress kernel warning message

2019-06-14 Thread Yibo Zhao
In multiple SSID cases, it takes time to prepare every AP interface
to be ready in initializing phase. If a sta already knows everything it
needs to join one of the APs and sends authentication to the AP which
is not fully prepared at this point of time, AP's channel context
could be NULL. As a result, warning message occurs.

Even worse, if the AP is under attack via tools such as MDK3 and massive
authentication requests are received in a very short time, console will
be hung due to kernel warning messages.

WARN_ON_ONCE() could be a better way for indicating warning messages
without duplicate messages to flood the console.

Signed-off-by: Zhi Chen 
Signed-off-by: Yibo Zhao 
---
 net/mac80211/ieee80211_i.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index a8af4aa..682d0ab 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1435,7 +1435,7 @@ struct ieee80211_local {
rcu_read_lock();
chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);
 
-   if (WARN_ON(!chanctx_conf)) {
+   if (WARN_ON_ONCE(!chanctx_conf)) {
rcu_read_unlock();
return NULL;
}
-- 
1.9.1


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-16 Thread Yibo Zhao
In a loop txqs dequeue scenario, if the first txq in the rbtree gets
removed from rbtree immediately in the ieee80211_return_txq(), the
loop will break soon in the ieee80211_next_txq() due to schedule_pos
not leading to the second txq in the rbtree. Thus, defering the
removal right before the end of this schedule round.

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63 +++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
 
 #define IEEE80211_MAX_TX_RETRY 31
 
+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate *rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, it will be added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is eligible. If a txq
  * is returned, it should be returned with ieee80211_return_txq() after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct ieee80211_hw 
*hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by ieee80211_txq_schedule_end().
+ * Release locks previously acquired by ieee80211_txq_schedule_end(). Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct ieee80211_hw *hw, 
struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);
 
 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is allowed to transmit
  *
  * This function is used to check whether given txq is allowed to transmit by
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;
 
/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];
 
+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;
 
const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t 
priv_data_len,
 
for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
 
+   timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
+   mod_timer(&local->remove_timer,
+ jiffies + 
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);
 
@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct ieee80211_hw *hw)
tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local->tasklet);
 
+   del_timer_sync(&local->remove_timer);
 #ifdef CONFIG_INET
unregister_inetaddr_notifier(&local->ifa_notifier);
 #endif
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d00baaa..42ca010 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1450,6 +1450,7 @@ void ieee80211_txq_init(struct ieee80211_sub_if_data 
*sdata,
codel_stats_init(&txqi->cstats);
__skb_queue_head_init(&txqi->frags);
RB_CLEAR_NODE(&txqi->schedule_order);
+   INIT_LIST_HEAD(&txqi->candidate);
 
txqi->txq.vif = &a

[PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler

2019-09-16 Thread Yibo Zhao
From: Toke Høiland-Jørgensen 

This switches the airtime scheduler in mac80211 to use a virtual time-based
scheduler instead of the round-robin scheduler used before. This has a
couple of advantages:

- No need to sync up the round-robin scheduler in firmware/hardware with
  the round-robin airtime scheduler.

- If several stations are eligible for transmission we can schedule both of
  them; no need to hard-block the scheduling rotation until the head of the
  queue has used up its quantum.

- The check of whether a station is eligible for transmission becomes
  simpler (in ieee80211_txq_may_transmit()).

The drawback is that scheduling becomes slightly more expensive, as we need
to maintain an rbtree of TXQs sorted by virtual time. This means that
ieee80211_register_airtime() becomes O(logN) in the number of currently
scheduled TXQs. However, hopefully this number rarely grows too big (it's
only TXQs currently backlogged, not all associated stations), so it
shouldn't be too big of an issue.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/debugfs.c |  48 +-
 net/mac80211/debugfs_sta.c |  16 ++--
 net/mac80211/ieee80211_i.h |  14 ++-
 net/mac80211/main.c|   2 +-
 net/mac80211/sta_info.c|  19 +++-
 net/mac80211/sta_info.h|   3 +-
 net/mac80211/tx.c  | 217 +
 7 files changed, 223 insertions(+), 96 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 2d43bc1..4847168 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -150,6 +150,46 @@ static ssize_t aqm_write(struct file *file,
.llseek = default_llseek,
 };
 
+static ssize_t airtime_read(struct file *file,
+   char __user *user_buf,
+   size_t count,
+   loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[200];
+   u64 v_t[IEEE80211_NUM_ACS];
+   u64 wt[IEEE80211_NUM_ACS];
+   int len = 0, ac;
+
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   v_t[ac] = local->airtime_v_t[ac];
+   wt[ac] = local->airtime_weight_sum[ac];
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
+   len = scnprintf(buf, sizeof(buf),
+   "\tVO VI BE BK\n"
+   "Virt-t\t%-10llu %-10llu %-10llu %-10llu\n"
+   "Weight\t%-10llu %-10llu %-10llu %-10llu\n",
+   v_t[0],
+   v_t[1],
+   v_t[2],
+   v_t[3],
+   wt[0],
+   wt[1],
+   wt[2],
+   wt[3]);
+
+   return simple_read_from_buffer(user_buf, count, ppos,
+  buf, len);
+}
+
+static const struct file_operations airtime_ops = {
+   .read = airtime_read,
+   .open = simple_open,
+   .llseek = default_llseek,
+};
+
 #ifdef CONFIG_PM
 static ssize_t reset_write(struct file *file, const char __user *user_buf,
   size_t count, loff_t *ppos)
@@ -386,8 +426,12 @@ void debugfs_hw_add(struct ieee80211_local *local)
if (local->ops->wake_tx_queue)
DEBUGFS_ADD_MODE(aqm, 0600);
 
-   debugfs_create_u16("airtime_flags", 0600,
-  phyd, &local->airtime_flags);
+   if (wiphy_ext_feature_isset(local->hw.wiphy,
+   NL80211_EXT_FEATURE_AIRTIME_FAIRNESS)) {
+   DEBUGFS_ADD_MODE(airtime, 0600);
+   debugfs_create_u16("airtime_flags", 0600,
+  phyd, &local->airtime_flags);
+   }
 
statsd = debugfs_create_dir("statistics", phyd);
 
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 3aa618d..80028da 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -203,7 +203,7 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
size_t bufsz = 200;
char *buf = kzalloc(bufsz, GFP_KERNEL), *p = buf;
u64 rx_airtime = 0, tx_airtime = 0;
-   s64 deficit[IEEE80211_NUM_ACS];
+   u64 v_t[IEEE80211_NUM_ACS];
ssize_t rv;
int ac;
 
@@ -214,20 +214,20 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
spin_lock_bh(&local->active_txq_lock[ac]);
rx_airtime += sta->airtime[ac].rx_airtime;
tx_airtime += sta->airtime[ac].tx_airtime;
-   deficit[ac] = sta->airtime[ac].deficit;
+   v_t[ac] = sta->airtime[ac].v_t;
spin_unlock_bh(&local->active_txq_lock[ac]);
}
 
p += scnprintf(p, bufsz + buf - p,
"RX: %llu us\nTX: %llu us\nWeight: %u\n"
-   "Deficit: VO: %lld us VI: %lld us BE: %lld 

[PATCH 3/4] mac80211: fix low throughput in push pull mode

2019-09-16 Thread Yibo Zhao
If station is ineligible for transmission in ieee80211_txq_may_transmit(),
no packet will be delivered to FW. During the tests in push-pull mode with
many clients, after several seconds, not a single station is an eligible
candidate for transmission since global time is smaller than all the
station's virtual airtime. As a consequence, the Tx has been blocked and
throughput is quite low.

To avoid this situation to occur in push-pull mode, the new proposal is:

- Increase the airtime grace period a little more to reduce the
  unexpected sync

- If global virtual time is less than the virtual airtime of any station,
  sync it to the airtime of first station in the red-black tree

- Round the division result since the process of global virtual time
  involves the division calculation

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/sta_info.c |  3 ++-
 net/mac80211/sta_info.h |  2 +-
 net/mac80211/tx.c   | 16 +++-
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 9d01fdd..feac975 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -1852,7 +1852,8 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta 
*pubsta, u8 tid,
 
weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
 
-   local->airtime_v_t[ac] += airtime / weight_sum;
+   /* Round the calculation of global vt */
+   local->airtime_v_t[ac] += (airtime + (weight_sum >> 1)) / weight_sum;
sta->airtime[ac].v_t += airtime / sta->airtime_weight;
ieee80211_resort_txq(&local->hw, txq);
 
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 5c1cac9..5055f94 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -130,7 +130,7 @@ enum ieee80211_agg_stop_reason {
 /* Debugfs flags to enable/disable use of RX/TX airtime in scheduler */
 #define AIRTIME_USE_TX BIT(0)
 #define AIRTIME_USE_RX BIT(1)
-#define AIRTIME_GRACE 500 /* usec of grace period before reset */
+#define AIRTIME_GRACE 2000 /* usec of grace period before reset */
 
 struct airtime_info {
u64 rx_airtime;
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 42ca010..60cf569 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3867,15 +3867,29 @@ bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
 {
struct ieee80211_local *local = hw_to_local(hw);
-   struct txq_info *txqi = to_txq_info(txq);
+   struct txq_info *first_txqi, *txqi = to_txq_info(txq);
+   struct rb_node *node = NULL;
struct sta_info *sta;
u8 ac = txq->ac;
+   first_txqi = NULL;
 
lockdep_assert_held(&local->active_txq_lock[ac]);
 
if (!txqi->txq.sta)
return true;
 
+   node = rb_first_cached(&local->active_txqs[ac]);
+   if (node) {
+   first_txqi = container_of(node, struct txq_info,
+ schedule_order);
+   if (first_txqi->txq.sta) {
+   sta = container_of(first_txqi->txq.sta,
+  struct sta_info, sta);
+   if (local->airtime_v_t[ac] < sta->airtime[ac].v_t)
+   local->airtime_v_t[ac] = sta->airtime[ac].v_t;
+   }
+   }
+
sta = container_of(txqi->txq.sta, struct sta_info, sta);
return (sta->airtime[ac].v_t <= local->airtime_v_t[ac]);
 }
-- 
1.9.1


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-16 Thread Yibo Zhao
Global airtime weight sum is updated only when txq is added/removed
from rbtree. If upper layer configures sta weight during high load,
airtime weight sum will not be updated since txq is most likely on the
tree. It could a little late for upper layer to reconfigure sta weight
when txq is already in the rbtree. And thus, incorrect airtime weight sum
will lead to incorrect global virtual time calculation as well as global
airtime weight sum overflow of airtime weight sum during txq removed.

Hence, need to update airtime weight sum upon receiving event for
configuring sta weight once sta's txq is on the rbtree.

Besides, if airtime weight sum of ACs and sta weight is synced under the
same per AC lock protection, there can be a very short window causing
incorrct airtime weight sum calculation as below:

active_txq_lock_VO  .
VO weight sum is syncd  .
sta airtime weight sum is synced.
active_txq_unlock_VO.
.   .
active_txq_lock_VI  .
VI weight sum is syncd  .
sta airtime weight sum  active_txq_lock_BE
active_txq_unlock_VI  Remove txq and thus sum
. is calculated with synced
. sta airtime weight
.   active_txq_unlock_BE

So introduce a per ac synced station airtime weight synced with per
AC synced weight sum together. And the per-AC station airtime weight
is used to calculate weight sum.

Signed-off-by: Yibo Zhao 
---
 net/mac80211/cfg.c  | 27 +--
 net/mac80211/sta_info.c |  6 --
 net/mac80211/sta_info.h |  3 +++
 net/mac80211/tx.c   |  4 ++--
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index d65aa01..4b420bb 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1284,7 +1284,8 @@ static int sta_apply_parameters(struct ieee80211_local 
*local,
int ret = 0;
struct ieee80211_supported_band *sband;
struct ieee80211_sub_if_data *sdata = sta->sdata;
-   u32 mask, set;
+   u32 mask, set, tid, ac;
+   struct txq_info *txqi;
 
sband = ieee80211_get_sband(sdata);
if (!sband)
@@ -1452,8 +1453,30 @@ static int sta_apply_parameters(struct ieee80211_local 
*local,
if (ieee80211_vif_is_mesh(&sdata->vif))
sta_apply_mesh_params(local, sta, params);
 
-   if (params->airtime_weight)
+   if (params->airtime_weight &&
+   params->airtime_weight != sta->airtime_weight) {
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
+   if (!sta->sta.txq[tid] ||
+   ac != ieee80211_ac_from_tid(tid))
+   continue;
+
+   sta->airtime_weight_synced[ac] =
+   params->airtime_weight;
+
+   txqi = to_txq_info(sta->sta.txq[tid]);
+   if (RB_EMPTY_NODE(&txqi->schedule_order))
+   continue;
+
+   local->airtime_weight_sum[ac] = 
local->airtime_weight_sum[ac] +
+   
params->airtime_weight -
+   
sta->airtime_weight;
+   }
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
sta->airtime_weight = params->airtime_weight;
+   }
 
/* set the STA state after all sta info from usermode has been set */
if (test_sta_flag(sta, WLAN_STA_TDLS_PEER) ||
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index feac975..b00812f 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -389,6 +389,7 @@ struct sta_info *sta_info_alloc(struct 
ieee80211_sub_if_data *sdata,
for (i = 0; i < IEEE80211_NUM_ACS; i++) {
skb_queue_head_init(&sta->ps_tx_buf[i]);
skb_queue_head_init(&sta->tx_filtered[i]);
+   sta->airtime_weight_synced[i] = sta->airtime_weight;
}
 
for (i = 0; i < IEEE80211_NUM_TIDS; i++)
@@ -1850,11 +1851,12 @@ void ieee80211_sta_register_airtime(struct 
ieee80211_sta *pubsta, u8 tid,
sta->airtime[ac].tx_airtime += tx_airtime;
sta->airtime[ac].rx_airtime += rx_airtime;
 
-   weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
+   weight_

Re: [PATCH 3/4] mac80211: fix low throughput in push pull mode

2019-09-16 Thread Yibo Zhao

On 2019-09-16 23:27, Johannes Berg wrote:

Without really looking at the code -

If station is ineligible for transmission in 
ieee80211_txq_may_transmit(),
no packet will be delivered to FW. During the tests in push-pull mode 
with
many clients, after several seconds, not a single station is an 
eligible

candidate for transmission since global time is smaller than all the
station's virtual airtime. As a consequence, the Tx has been blocked 
and

throughput is quite low.


You should rewrite this to be, erm, a bit more understandable in
mac80211 context. I assume you're speaking (mostly?) about ath10k, but 
I

have very little context there. "push pull mode"? "firmware"? These
things are not something mac80211 knows about.

Hi Johannes,

Thanks for your kindly reminder. Will rewrite the commit log.




Co-developed-by: Yibo Zhao 


That also seems wrong, should be Toke I guess, unless you intended for 
a

From: Toke to be present?

Do you mean it should be something like:

Co-developed-by: Toke Høiland-Jørgensen 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 

Am I understanding right?


johannes


--
Yibo


[PATCH V2 2/4] mac80211: defer txqs removal from rbtree

2019-09-18 Thread Yibo Zhao
In a loop txqs dequeue scenario, if the first txq in the rbtree gets
removed from rbtree immediately in the ieee80211_return_txq(), the
loop will break soon in the ieee80211_next_txq() due to schedule_pos
not leading to the second txq in the rbtree. Thus, defering the
removal right before the end of this schedule round.

Co-developed-by: Toke Høiland-Jørgensen 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63 +++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
 
 #define IEEE80211_MAX_TX_RETRY 31
 
+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate *rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, it will be added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is eligible. If a txq
  * is returned, it should be returned with ieee80211_return_txq() after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct ieee80211_hw 
*hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by ieee80211_txq_schedule_end().
+ * Release locks previously acquired by ieee80211_txq_schedule_end(). Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct ieee80211_hw *hw, 
struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);
 
 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is allowed to transmit
  *
  * This function is used to check whether given txq is allowed to transmit by
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;
 
/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];
 
+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;
 
const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t 
priv_data_len,
 
for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
 
+   timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
+   mod_timer(&local->remove_timer,
+ jiffies + 
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);
 
@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct ieee80211_hw *hw)
tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local->tasklet);
 
+   del_timer_sync(&local->remove_timer);
 #ifdef CONFIG_INET
unregister_inetaddr_notifier(&local->ifa_notifier);
 #endif
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d00baaa..42ca010 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1450,6 +1450,7 @@ void ieee80211_txq_init(struct ieee80211_sub_if_data 
*sdata,
codel_stats_init(&txqi->cstats);
__skb_queue_head_init(&txqi->frags);
RB_CLEAR_NODE(&txqi->schedule_order);
+   INIT_LIST_HEAD(&txqi->candidate);
 
 

[PATCH V2 1/4] mac80211: Switch to a virtual time-based airtime scheduler

2019-09-18 Thread Yibo Zhao
From: Toke Høiland-Jørgensen 

This switches the airtime scheduler in mac80211 to use a virtual time-based
scheduler instead of the round-robin scheduler used before. This has a
couple of advantages:

- No need to sync up the round-robin scheduler in firmware/hardware with
  the round-robin airtime scheduler.

- If several stations are eligible for transmission we can schedule both of
  them; no need to hard-block the scheduling rotation until the head of the
  queue has used up its quantum.

- The check of whether a station is eligible for transmission becomes
  simpler (in ieee80211_txq_may_transmit()).

The drawback is that scheduling becomes slightly more expensive, as we need
to maintain an rbtree of TXQs sorted by virtual time. This means that
ieee80211_register_airtime() becomes O(logN) in the number of currently
scheduled TXQs. However, hopefully this number rarely grows too big (it's
only TXQs currently backlogged, not all associated stations), so it
shouldn't be too big of an issue.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/debugfs.c |  48 +-
 net/mac80211/debugfs_sta.c |  16 ++--
 net/mac80211/ieee80211_i.h |  14 ++-
 net/mac80211/main.c|   2 +-
 net/mac80211/sta_info.c|  19 +++-
 net/mac80211/sta_info.h|   3 +-
 net/mac80211/tx.c  | 217 +
 7 files changed, 223 insertions(+), 96 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 2d43bc1..4847168 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -150,6 +150,46 @@ static ssize_t aqm_write(struct file *file,
.llseek = default_llseek,
 };
 
+static ssize_t airtime_read(struct file *file,
+   char __user *user_buf,
+   size_t count,
+   loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[200];
+   u64 v_t[IEEE80211_NUM_ACS];
+   u64 wt[IEEE80211_NUM_ACS];
+   int len = 0, ac;
+
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   v_t[ac] = local->airtime_v_t[ac];
+   wt[ac] = local->airtime_weight_sum[ac];
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
+   len = scnprintf(buf, sizeof(buf),
+   "\tVO VI BE BK\n"
+   "Virt-t\t%-10llu %-10llu %-10llu %-10llu\n"
+   "Weight\t%-10llu %-10llu %-10llu %-10llu\n",
+   v_t[0],
+   v_t[1],
+   v_t[2],
+   v_t[3],
+   wt[0],
+   wt[1],
+   wt[2],
+   wt[3]);
+
+   return simple_read_from_buffer(user_buf, count, ppos,
+  buf, len);
+}
+
+static const struct file_operations airtime_ops = {
+   .read = airtime_read,
+   .open = simple_open,
+   .llseek = default_llseek,
+};
+
 #ifdef CONFIG_PM
 static ssize_t reset_write(struct file *file, const char __user *user_buf,
   size_t count, loff_t *ppos)
@@ -386,8 +426,12 @@ void debugfs_hw_add(struct ieee80211_local *local)
if (local->ops->wake_tx_queue)
DEBUGFS_ADD_MODE(aqm, 0600);
 
-   debugfs_create_u16("airtime_flags", 0600,
-  phyd, &local->airtime_flags);
+   if (wiphy_ext_feature_isset(local->hw.wiphy,
+   NL80211_EXT_FEATURE_AIRTIME_FAIRNESS)) {
+   DEBUGFS_ADD_MODE(airtime, 0600);
+   debugfs_create_u16("airtime_flags", 0600,
+  phyd, &local->airtime_flags);
+   }
 
statsd = debugfs_create_dir("statistics", phyd);
 
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 3aa618d..80028da 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -203,7 +203,7 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
size_t bufsz = 200;
char *buf = kzalloc(bufsz, GFP_KERNEL), *p = buf;
u64 rx_airtime = 0, tx_airtime = 0;
-   s64 deficit[IEEE80211_NUM_ACS];
+   u64 v_t[IEEE80211_NUM_ACS];
ssize_t rv;
int ac;
 
@@ -214,20 +214,20 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
spin_lock_bh(&local->active_txq_lock[ac]);
rx_airtime += sta->airtime[ac].rx_airtime;
tx_airtime += sta->airtime[ac].tx_airtime;
-   deficit[ac] = sta->airtime[ac].deficit;
+   v_t[ac] = sta->airtime[ac].v_t;
spin_unlock_bh(&local->active_txq_lock[ac]);
}
 
p += scnprintf(p, bufsz + buf - p,
"RX: %llu us\nTX: %llu us\nWeight: %u\n"
-   "Deficit: VO: %lld us VI: %lld us BE: %lld 

[PATCH V2 3/4] mac80211: fix low throughput in multi-clients situation

2019-09-18 Thread Yibo Zhao
Not long after the start of multi-clients test, not a single station is
an eligible candidate for transmission since global virtual time(g_vt) is
smaller than the virtual airtime(s_vt) of all the stations. As a result,
the Tx has been blocked and throughput is quite low.

This may mainly due to sync mechanism and accumulative deviation from the
devision calculation of g_vt.

For example:
Suppose we have 50 clients in first round.
Round 1:
STA weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
.   .   .   .   .
.   .   .   .   .
.   .   .   .   .

After this round, all the stations are not valid for next transmission due
to accumulative deviation.

And if we add a new #51,
Round 2:
STA weight  Tx_time_round   wt_sum  s_vtg_vt  valid_for_next_Tx
.   .   .   .   .
.   .   .   .   .
.   .   .   .   .

Sync is done by:
max(g_vt of last round - grace period, s_vt)
and s_vt of #51 = max(2000 - 500, 0) + 1024 = 2524, and it is more than the
final g_vt of this round.

After this round, no more station is valid for transmission.

The real situation can be more complicate, above is one of the extremely
case.

To avoid this situation to occur, the new proposal is:

- Increase the airtime grace period a little more to reduce the
  unexpected sync

- If global virtual time is less than the virtual airtime of any station,
  sync it to the airtime of first station in the red-black tree

- Round the division result

Co-developed-by: Toke Høiland-Jørgensen 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/sta_info.c |  3 ++-
 net/mac80211/sta_info.h |  2 +-
 net/mac80211/tx.c   | 16 +++-
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 9d01fdd..feac975 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -1852,7 +1852,8 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta 
*pubsta, u8 tid,
 
weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
 
-   local->airtime_v_t[ac] += airtime / weight_sum;
+   /* Round the calculation of global vt */
+   local->airtime_v_t[ac] += (airtime + (weight_sum >> 1)) / weight_sum;
sta->airtime[ac].v_t += airtime / sta->airtime_weight;
ieee80211_resort_txq(&local->hw, txq);
 
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 5c1cac9..5055f94 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -130,7 +130,7 @@ enum ieee80211_agg_stop_reason {
 /* Debugfs flags to enable/disable use of RX/TX airtime in scheduler */
 #define AIRTIME_USE_TX BIT(0)
 #define AIRTIME_USE_RX BIT(1)
-#define AIRTIME_GRACE 500 /* usec of grace period before reset */
+#define AIRTIME_GRACE 2000 /* usec of grace period before reset */
 
 struct airtime_info {
u64 rx_airtime;
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 42ca010..60cf569 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3867,15 +3867,29 @@ bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
 {
struct ieee80211_local *local = hw_to_local(hw);
-   struct txq_info *txqi = to_txq_info(txq);
+   struct txq_info *first_txqi, *txqi = to_txq_info(txq);
+   struct rb_node *node = NULL;
struct sta_info *sta;
u8 ac = txq->ac;
+   first_txqi = NULL;
 
lockdep_assert_held(&local->active_txq_lock[ac]);
 
if (!txqi->txq.sta)
return true;
 
+   node = rb_first_cached(&local->active_txqs[ac]);
+   if (node) {
+   first_txqi = container_of(node, struct txq_info,
+ schedule_order);
+   if (first_txqi->txq.sta) {
+   sta = container_of(first_txqi->txq.sta,
+  struct sta_info, sta);
+   if (local->airtime_v_t[ac] < sta->airtime[ac].v_t)
+   local->airtime_v_t[ac] = sta->airtime[ac].v_t;
+   }
+   }
+
sta = container_of(txqi->txq.sta, struct sta_info, sta);
return (sta->airtime[ac].v_t <= local->airtime_v_t[ac]);
 }
-- 
1.9.1



[PATCH V2 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-18 Thread Yibo Zhao
Global airtime weight sum is updated only when txq is added/removed
from rbtree. If upper layer configures sta weight during high load,
airtime weight sum will not be updated since txq is most likely on the
tree. It could a little late for upper layer to reconfigure sta weight
when txq is already in the rbtree. And thus, incorrect airtime weight sum
will lead to incorrect global virtual time calculation as well as overflow
of airtime weight sum during txq removed.

Hence, need to update airtime weight sum upon receiving event for
configuring sta weight once sta's txq is on the rbtree.

Besides, if airtime weight sum of ACs and sta weight is synced under the
same per AC lock protection, there can be a very short window causing
incorrct airtime weight sum calculation as below:

active_txq_lock_VO  .
VO weight sum is syncd  .
sta airtime weight sum is synced.
active_txq_unlock_VO.
.   .
active_txq_lock_VI  .
VI weight sum is syncd  .
sta airtime weight sum  active_txq_lock_BE
active_txq_unlock_VI  Remove txq and thus sum
. is calculated with synced
. sta airtime weight
.   active_txq_unlock_BE

So introduce a per ac synced station airtime weight synced with per
AC synced weight sum together. And the per-AC station airtime weight
is used to calculate weight sum.

Signed-off-by: Yibo Zhao 
---
 net/mac80211/cfg.c  | 27 +--
 net/mac80211/sta_info.c |  6 --
 net/mac80211/sta_info.h |  3 +++
 net/mac80211/tx.c   |  4 ++--
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index d65aa01..4b420bb 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1284,7 +1284,8 @@ static int sta_apply_parameters(struct ieee80211_local 
*local,
int ret = 0;
struct ieee80211_supported_band *sband;
struct ieee80211_sub_if_data *sdata = sta->sdata;
-   u32 mask, set;
+   u32 mask, set, tid, ac;
+   struct txq_info *txqi;
 
sband = ieee80211_get_sband(sdata);
if (!sband)
@@ -1452,8 +1453,30 @@ static int sta_apply_parameters(struct ieee80211_local 
*local,
if (ieee80211_vif_is_mesh(&sdata->vif))
sta_apply_mesh_params(local, sta, params);
 
-   if (params->airtime_weight)
+   if (params->airtime_weight &&
+   params->airtime_weight != sta->airtime_weight) {
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
+   if (!sta->sta.txq[tid] ||
+   ac != ieee80211_ac_from_tid(tid))
+   continue;
+
+   sta->airtime_weight_synced[ac] =
+   params->airtime_weight;
+
+   txqi = to_txq_info(sta->sta.txq[tid]);
+   if (RB_EMPTY_NODE(&txqi->schedule_order))
+   continue;
+
+   local->airtime_weight_sum[ac] = 
local->airtime_weight_sum[ac] +
+   
params->airtime_weight -
+   
sta->airtime_weight;
+   }
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
sta->airtime_weight = params->airtime_weight;
+   }
 
/* set the STA state after all sta info from usermode has been set */
if (test_sta_flag(sta, WLAN_STA_TDLS_PEER) ||
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index feac975..b00812f 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -389,6 +389,7 @@ struct sta_info *sta_info_alloc(struct 
ieee80211_sub_if_data *sdata,
for (i = 0; i < IEEE80211_NUM_ACS; i++) {
skb_queue_head_init(&sta->ps_tx_buf[i]);
skb_queue_head_init(&sta->tx_filtered[i]);
+   sta->airtime_weight_synced[i] = sta->airtime_weight;
}
 
for (i = 0; i < IEEE80211_NUM_TIDS; i++)
@@ -1850,11 +1851,12 @@ void ieee80211_sta_register_airtime(struct 
ieee80211_sta *pubsta, u8 tid,
sta->airtime[ac].tx_airtime += tx_airtime;
sta->airtime[ac].rx_airtime += rx_airtime;
 
-   weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
+   weight_sum = local->airtime_we

Re: [PATCH 3/4] mac80211: fix low throughput in push pull mode

2019-09-18 Thread Yibo Zhao

On 2019-09-18 05:12, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-16 23:27, Johannes Berg wrote:

Without really looking at the code -


If station is ineligible for transmission in
ieee80211_txq_may_transmit(),
no packet will be delivered to FW. During the tests in push-pull 
mode

with
many clients, after several seconds, not a single station is an
eligible
candidate for transmission since global time is smaller than all the
station's virtual airtime. As a consequence, the Tx has been blocked
and
throughput is quite low.


You should rewrite this to be, erm, a bit more understandable in
mac80211 context. I assume you're speaking (mostly?) about ath10k, 
but

I
have very little context there. "push pull mode"? "firmware"? These
things are not something mac80211 knows about.

Hi Johannes,

Thanks for your kindly reminder. Will rewrite the commit log.




Co-developed-by: Yibo Zhao 


That also seems wrong, should be Toke I guess, unless you intended 
for

a
From: Toke to be present?

Do you mean it should be something like:

Co-developed-by: Toke Høiland-Jørgensen 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 

Am I understanding right?


I think the right thing here, as with the previous patch, is to just
drop my sign-off; you're writing this patch, and I'll add ack/reviews 
as

appropriate. And in that case, well, no need to have co-developed-by
yourself when your name is on the patch as author :)

-Toke
Sorry, I think I have missed checking your reply, please ignore the 
wrong signed-off in PATCH-V2.


--
Yibo


Re: [PATCH 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-18 Thread Yibo Zhao

On 2019-09-18 05:24, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


Global airtime weight sum is updated only when txq is added/removed
from rbtree. If upper layer configures sta weight during high load,
airtime weight sum will not be updated since txq is most likely on the
tree. It could a little late for upper layer to reconfigure sta weight
when txq is already in the rbtree. And thus, incorrect airtime weight 
sum
will lead to incorrect global virtual time calculation as well as 
global

airtime weight sum overflow of airtime weight sum during txq removed.

Hence, need to update airtime weight sum upon receiving event for
configuring sta weight once sta's txq is on the rbtree.

Besides, if airtime weight sum of ACs and sta weight is synced under 
the

same per AC lock protection, there can be a very short window causing
incorrct airtime weight sum calculation as below:

active_txq_lock_VO  .
VO weight sum is syncd  .
sta airtime weight sum is synced.
active_txq_unlock_VO.
.   .
active_txq_lock_VI  .
VI weight sum is syncd  .
sta airtime weight sum  active_txq_lock_BE
active_txq_unlock_VI  Remove txq and thus sum
. is calculated with synced
. sta airtime weight
.   active_txq_unlock_BE

So introduce a per ac synced station airtime weight synced with per
AC synced weight sum together. And the per-AC station airtime weight
is used to calculate weight sum.

Signed-off-by: Yibo Zhao 
---
 net/mac80211/cfg.c  | 27 +--
 net/mac80211/sta_info.c |  6 --
 net/mac80211/sta_info.h |  3 +++
 net/mac80211/tx.c   |  4 ++--
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index d65aa01..4b420bb 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1284,7 +1284,8 @@ static int sta_apply_parameters(struct 
ieee80211_local *local,

int ret = 0;
struct ieee80211_supported_band *sband;
struct ieee80211_sub_if_data *sdata = sta->sdata;
-   u32 mask, set;
+   u32 mask, set, tid, ac;
+   struct txq_info *txqi;

sband = ieee80211_get_sband(sdata);
if (!sband)
@@ -1452,8 +1453,30 @@ static int sta_apply_parameters(struct 
ieee80211_local *local,

if (ieee80211_vif_is_mesh(&sdata->vif))
sta_apply_mesh_params(local, sta, params);

-   if (params->airtime_weight)
+   if (params->airtime_weight &&
+   params->airtime_weight != sta->airtime_weight) {
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
+   if (!sta->sta.txq[tid] ||
+   ac != ieee80211_ac_from_tid(tid))
+   continue;
+
+   sta->airtime_weight_synced[ac] =
+   params->airtime_weight;
+
+   txqi = to_txq_info(sta->sta.txq[tid]);
+   if (RB_EMPTY_NODE(&txqi->schedule_order))
+   continue;
+
+   local->airtime_weight_sum[ac] = 
local->airtime_weight_sum[ac] +
+   
params->airtime_weight -
+   
sta->airtime_weight;
+   }
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
sta->airtime_weight = params->airtime_weight;


With this, airtime_weight is basically only used to return to and from
userspace, right? I.e., after the above loop has run, it will match the
contents of airtime_weight_synced; so why not just turn airtime_weight
into  a per-ac array? You could just use airtime_weight[0] as the value
to return to userspace...
Yes, I also feel it is a little weird to keep both of them. I am fine 
with suggestion.




-Toke


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 3/4] mac80211: fix low throughput in push pull mode

2019-09-18 Thread Yibo Zhao

On 2019-09-18 18:16, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 05:12, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-16 23:27, Johannes Berg wrote:

Without really looking at the code -


If station is ineligible for transmission in
ieee80211_txq_may_transmit(),
no packet will be delivered to FW. During the tests in push-pull
mode
with
many clients, after several seconds, not a single station is an
eligible
candidate for transmission since global time is smaller than all 
the
station's virtual airtime. As a consequence, the Tx has been 
blocked

and
throughput is quite low.


You should rewrite this to be, erm, a bit more understandable in
mac80211 context. I assume you're speaking (mostly?) about ath10k,
but
I
have very little context there. "push pull mode"? "firmware"? These
things are not something mac80211 knows about.

Hi Johannes,

Thanks for your kindly reminder. Will rewrite the commit log.




Co-developed-by: Yibo Zhao 


That also seems wrong, should be Toke I guess, unless you intended
for
a
From: Toke to be present?

Do you mean it should be something like:

Co-developed-by: Toke Høiland-Jørgensen 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 

Am I understanding right?


I think the right thing here, as with the previous patch, is to just
drop my sign-off; you're writing this patch, and I'll add ack/reviews
as
appropriate. And in that case, well, no need to have co-developed-by
yourself when your name is on the patch as author :)

-Toke

Sorry, I think I have missed checking your reply, please ignore the
wrong signed-off in PATCH-V2.


While you're re-spinning, could you please add a changelog for the
changes you make? Makes it easier to keep track :)

You can add a cover-letter with a full changelog instead of having a
separate changelog for each patch; that's what I usually do...

-Toke

Sure, thanks.
--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-18 Thread Yibo Zhao

On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


In a loop txqs dequeue scenario, if the first txq in the rbtree gets
removed from rbtree immediately in the ieee80211_return_txq(), the
loop will break soon in the ieee80211_next_txq() due to schedule_pos
not leading to the second txq in the rbtree. Thus, defering the
removal right before the end of this schedule round.

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 


I didn't write this patch, so please don't use my sign-off. I'll add
ack or review tags as appropriate in reply; but a few comments first:


---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63 
+++---

 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {

 #define IEEE80211_MAX_TX_RETRY 31

+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate 
*rate,

  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct 
ieee80211_hw *hw,

  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to 
ieee80211_txq_schedule_start()

- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, it will be 
added

+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is eligible. 
If a txq
  * is returned, it should be returned with ieee80211_return_txq() 
after the

  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct 
ieee80211_hw *hw, u8 ac)

  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by ieee80211_txq_schedule_end().
+ * Release locks previously acquired by ieee80211_txq_schedule_end(). 
Check

+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct ieee80211_hw 
*hw, struct ieee80211_txq *txq)

__acquires(txq_lock) __releases(txq_lock);

 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is allowed to 
transmit

  *
  * This function is used to check whether given txq is allowed to 
transmit by

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;

/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];

+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;

const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw 
*ieee80211_alloc_hw_nm(size_t priv_data_len,


for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;

+   timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
+   mod_timer(&local->remove_timer,
+		  jiffies + 
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));

+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);

@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct ieee80211_hw 
*hw)

tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local->tasklet);

+   del_timer_sync(&local->remove_timer);
 #ifdef CONFIG_INET
unregister_inetaddr_notifier(&local->ifa_notifier);
 #endif
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d00baaa..42ca010 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1450,6 +1450,7 @@ void ieee80211_txq_init(struct 
ieee80211_sub_if_dat

Re: [PATCH V2 3/4] mac80211: fix low throughput in multi-clients situation

2019-09-18 Thread Yibo Zhao

On 2019-09-18 17:59, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:

Not long after the start of multi-clients test, not a single station 
is
an eligible candidate for transmission since global virtual time(g_vt) 
is
smaller than the virtual airtime(s_vt) of all the stations. As a 
result,

the Tx has been blocked and throughput is quite low.

This may mainly due to sync mechanism and accumulative deviation from 
the

devision calculation of g_vt.

For example:
Suppose we have 50 clients in first round.
Round 1:
STA weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
.   .   .   .   .
.   .   .   .   .
.   .   .   .   .

After this round, all the stations are not valid for next transmission 
due

to accumulative deviation.

And if we add a new #51,
Round 2:
STA weight  Tx_time_round   wt_sum  s_vtg_vt  valid_for_next_Tx
.   .   .   .   .
.   .   .   .   .
.   .   .   .   .

Sync is done by:
max(g_vt of last round - grace period, s_vt)
and s_vt of #51 = max(2000 - 500, 0) + 1024 = 2524, and it is more 
than the

final g_vt of this round.

After this round, no more station is valid for transmission.


I'm not sure I understand this. Was there supposed to be numbers in
those tables above?
Yes, it looks like there are some display issues. Will fix it in next 
version.


-Toke


--
Yibo


Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-19 Thread Yibo Zhao

On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


In a loop txqs dequeue scenario, if the first txq in the rbtree gets
removed from rbtree immediately in the ieee80211_return_txq(), the
loop will break soon in the ieee80211_next_txq() due to schedule_pos
not leading to the second txq in the rbtree. Thus, defering the
removal right before the end of this schedule round.

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 


I didn't write this patch, so please don't use my sign-off. I'll add
ack or review tags as appropriate in reply; but a few comments first:


---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63
+++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {

 #define IEEE80211_MAX_TX_RETRY 31

+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate
*rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct
ieee80211_hw *hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to
ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, it will 
be

added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is 
eligible.

If a txq
  * is returned, it should be returned with ieee80211_return_txq()
after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
ieee80211_hw *hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by 
ieee80211_txq_schedule_end().
+ * Release locks previously acquired by 
ieee80211_txq_schedule_end().

Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct 
ieee80211_hw

*hw, struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);

 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is allowed to
transmit
  *
  * This function is used to check whether given txq is allowed to
transmit by
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;

/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];

+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;

const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw
*ieee80211_alloc_hw_nm(size_t priv_data_len,

for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;

+   timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
+   mod_timer(&local->remove_timer,
+ jiffies +
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);

@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct 
ieee80211_hw

*hw)
tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local->tasklet);

+   del_timer_sync(&local->remove_timer);
 #ifdef CONFIG_INET
unregister_inetaddr_notifier(&local->ifa_notifier);
 #endif
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d00baaa..42ca010 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1450,6 +1450,7 

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-20 Thread Yibo Zhao

On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:

In a loop txqs dequeue scenario, if the first txq in the rbtree 
gets

removed from rbtree immediately in the ieee80211_return_txq(), the
loop will break soon in the ieee80211_next_txq() due to 
schedule_pos

not leading to the second txq in the rbtree. Thus, defering the
removal right before the end of this schedule round.

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 


I didn't write this patch, so please don't use my sign-off. I'll 
add
ack or review tags as appropriate in reply; but a few comments 
first:



---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63
+++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {

 #define IEEE80211_MAX_TX_RETRY 31

+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct 
ieee80211_tx_rate

*rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct
ieee80211_hw *hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to
ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, it will
be
added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is
eligible.
If a txq
  * is returned, it should be returned with ieee80211_return_txq()
after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
ieee80211_hw *hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by
ieee80211_txq_schedule_end().
+ * Release locks previously acquired by
ieee80211_txq_schedule_end().
Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
ieee80211_hw
*hw, struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);

 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is allowed to
transmit
  *
  * This function is used to check whether given txq is allowed to
transmit by
diff --git a/net/mac80211/ieee80211_i.h 
b/net/mac80211/ieee80211_i.h

index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;

/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];

+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;

const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw
*ieee80211_alloc_hw_nm(size_t priv_data_len,

for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;

+   timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
+   mod_timer(&local->remove_timer,
+ jiffies +
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);

@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct
ieee80211_hw
*hw)
tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local->tasklet);

+   del_timer_sync(&local->remove_timer);
 #ifdef CONFIG_INET
unregister_inetaddr_notifier(&local->ifa_notifier);
 #endif
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d00b

Re: [PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler

2019-09-20 Thread Yibo Zhao

On 2019-09-18 05:31, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


From: Toke Høiland-Jørgensen 

This switches the airtime scheduler in mac80211 to use a virtual 
time-based

scheduler instead of the round-robin scheduler used before. This has a
couple of advantages:

- No need to sync up the round-robin scheduler in firmware/hardware 
with

  the round-robin airtime scheduler.

- If several stations are eligible for transmission we can schedule 
both of
  them; no need to hard-block the scheduling rotation until the head 
of the

  queue has used up its quantum.

- The check of whether a station is eligible for transmission becomes
  simpler (in ieee80211_txq_may_transmit()).

The drawback is that scheduling becomes slightly more expensive, as we 
need

to maintain an rbtree of TXQs sorted by virtual time. This means that
ieee80211_register_airtime() becomes O(logN) in the number of 
currently
scheduled TXQs. However, hopefully this number rarely grows too big 
(it's

only TXQs currently backlogged, not all associated stations), so it
shouldn't be too big of an issue.

Signed-off-by: Toke Høiland-Jørgensen 


I'll note that this patch still has the two issues that Felix pointed
out when I posted the RFC version. Namely:

- The use of divisions in the fast path. I guess I need to go write 
some

  reciprocal-calculation code, since that is also an issue with the AQL
  patches I linked to before.

- The fact that we don't count the airtime usage of multicast traffic,
  which with this series means that the vif TXQ will get priority over
  the others. I think we agreed to fix this by just adding an airtime
  v_t to the vif as well and use that for scheduling the TXQ. Does
  ath10k report airtime usage for multicast as well, or only for
  stations?


I remember we have Felix' patches reducing the time the lock is held in 
mac80211 for DRR, do we need to integrate it into this version?

-Toke


--
Yibo


Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Yibo Zhao

On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


In a loop txqs dequeue scenario, if the first txq in the rbtree
gets
removed from rbtree immediately in the ieee80211_return_txq(), 
the

loop will break soon in the ieee80211_next_txq() due to
schedule_pos
not leading to the second txq in the rbtree. Thus, defering the
removal right before the end of this schedule round.

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 


I didn't write this patch, so please don't use my sign-off. I'll
add
ack or review tags as appropriate in reply; but a few comments
first:


---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63
+++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {

 #define IEEE80211_MAX_TX_RETRY 31

+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct
ieee80211_tx_rate
*rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff 
*ieee80211_tx_dequeue(struct

ieee80211_hw *hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to
ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, it 
will

be
added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is
eligible.
If a txq
  * is returned, it should be returned with 
ieee80211_return_txq()

after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
ieee80211_hw *hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by
ieee80211_txq_schedule_end().
+ * Release locks previously acquired by
ieee80211_txq_schedule_end().
Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
ieee80211_hw
*hw, struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);

 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is allowed to
transmit
  *
  * This function is used to check whether given txq is allowed 
to

transmit by
diff --git a/net/mac80211/ieee80211_i.h
b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;

/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];

+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;

const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw
*ieee80211_alloc_hw_nm(size_t priv_data_len,

for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;

+   timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
+   mod_timer(&local->remove_timer,
+ jiffies +
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);

@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct
ieee80211_hw
*hw)
tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local->tasklet);

+   del_timer_sync(&local->remove_timer);
 #ifdef CONFIG_INET
unregister_inetaddr_notifier(&local->ifa_notifier);
 #endif
diff 

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Yibo Zhao

On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:

In a loop txqs dequeue scenario, if the first txq in the 
rbtree

gets
removed from rbtree immediately in the ieee80211_return_txq(),
the
loop will break soon in the ieee80211_next_txq() due to
schedule_pos
not leading to the second txq in the rbtree. Thus, defering 
the

removal right before the end of this schedule round.

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 


I didn't write this patch, so please don't use my sign-off. 
I'll

add
ack or review tags as appropriate in reply; but a few comments
first:


---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63
+++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {

 #define IEEE80211_MAX_TX_RETRY 31

+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct
ieee80211_tx_rate
*rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff
*ieee80211_tx_dequeue(struct
ieee80211_hw *hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to
ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, it
will
be
added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is
eligible.
If a txq
  * is returned, it should be returned with
ieee80211_return_txq()
after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
ieee80211_hw *hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by
ieee80211_txq_schedule_end().
+ * Release locks previously acquired by
ieee80211_txq_schedule_end().
Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 
ac)

__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
ieee80211_hw
*hw, struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);

 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is allowed 
to

transmit
  *
  * This function is used to check whether given txq is 
allowed

to
transmit by
diff --git a/net/mac80211/ieee80211_i.h
b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;

/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];

+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;

const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw
*ieee80211_alloc_hw_nm(size_t priv_data_len,

for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;

+   timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
+   mod_timer(&local->remove_timer,
+ jiffies +
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);

@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct
ieee80211_hw
*hw)
tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local->tasklet);

+   del_timer_sync(&local->remove_timer);
 #ifdef CONFIG_INET
  

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Yibo Zhao

On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:

In a loop txqs dequeue scenario, if the first txq in the 
rbtree

gets
removed from rbtree immediately in the ieee80211_return_txq(),
the
loop will break soon in the ieee80211_next_txq() due to
schedule_pos
not leading to the second txq in the rbtree. Thus, defering 
the

removal right before the end of this schedule round.

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 


I didn't write this patch, so please don't use my sign-off. 
I'll

add
ack or review tags as appropriate in reply; but a few comments
first:


---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63
+++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {

 #define IEEE80211_MAX_TX_RETRY 31

+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct
ieee80211_tx_rate
*rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff
*ieee80211_tx_dequeue(struct
ieee80211_hw *hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to
ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, it
will
be
added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is
eligible.
If a txq
  * is returned, it should be returned with
ieee80211_return_txq()
after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct
ieee80211_hw *hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by
ieee80211_txq_schedule_end().
+ * Release locks previously acquired by
ieee80211_txq_schedule_end().
Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 
ac)

__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
ieee80211_hw
*hw, struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);

 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is allowed 
to

transmit
  *
  * This function is used to check whether given txq is 
allowed

to
transmit by
diff --git a/net/mac80211/ieee80211_i.h
b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;

/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];

+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;

const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw
*ieee80211_alloc_hw_nm(size_t priv_data_len,

for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;

+   timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
+   mod_timer(&local->remove_timer,
+ jiffies +
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);

@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct
ieee80211_hw
*hw)
tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local->tasklet);

+   del_timer_sync(&local->remove_timer);
 #ifdef CONFIG_INET
  

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Yibo Zhao

On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


In a loop txqs dequeue scenario, if the first txq in the
rbtree
gets
removed from rbtree immediately in the 
ieee80211_return_txq(),

the
loop will break soon in the ieee80211_next_txq() due to
schedule_pos
not leading to the second txq in the rbtree. Thus, defering
the
removal right before the end of this schedule round.

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 


I didn't write this patch, so please don't use my sign-off.
I'll
add
ack or review tags as appropriate in reply; but a few 
comments

first:


---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63
+++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {

 #define IEEE80211_MAX_TX_RETRY 31

+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct
ieee80211_tx_rate
*rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff
*ieee80211_tx_dequeue(struct
ieee80211_hw *hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to
ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, 
it

will
be
added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is
eligible.
If a txq
  * is returned, it should be returned with
ieee80211_return_txq()
after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void 
ieee80211_txq_schedule_start(struct

ieee80211_hw *hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by
ieee80211_txq_schedule_end().
+ * Release locks previously acquired by
ieee80211_txq_schedule_end().
Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8
ac)
__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
ieee80211_hw
*hw, struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);

 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is 
allowed

to
transmit
  *
  * This function is used to check whether given txq is
allowed
to
transmit by
diff --git a/net/mac80211/ieee80211_i.h
b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;

/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];

+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;

const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw
*ieee80211_alloc_hw_nm(size_t priv_data_len,

for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;

+	timer_setup(&local->remove_timer, ieee80211_txqs_check, 
0);

+   mod_timer(&local->remove_timer,
+ jiffies +
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);

@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct
ieee80211_hw
*hw)
tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local-

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-21 Thread Yibo Zhao

On 2019-09-21 22:00, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-21 21:02, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-21 19:27, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-20 17:15, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-19 18:37, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 19:23, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


On 2019-09-18 05:10, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


In a loop txqs dequeue scenario, if the first txq in the
rbtree
gets
removed from rbtree immediately in the
ieee80211_return_txq(),
the
loop will break soon in the ieee80211_next_txq() due to
schedule_pos
not leading to the second txq in the rbtree. Thus, 
defering

the
removal right before the end of this schedule round.

Co-developed-by: Yibo Zhao 
Signed-off-by: Yibo Zhao 
Signed-off-by: Toke Høiland-Jørgensen 


I didn't write this patch, so please don't use my sign-off.
I'll
add
ack or review tags as appropriate in reply; but a few
comments
first:


---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63
+++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h 
b/include/net/mac80211.h

index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {

 #define IEEE80211_MAX_TX_RETRY 31

+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct
ieee80211_tx_rate
*rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff
*ieee80211_tx_dequeue(struct
ieee80211_hw *hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to
ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty,
it
will
be
added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue 
is

eligible.
If a txq
  * is returned, it should be returned with
ieee80211_return_txq()
after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void
ieee80211_txq_schedule_start(struct
ieee80211_hw *hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by
ieee80211_txq_schedule_end().
+ * Release locks previously acquired by
ieee80211_txq_schedule_end().
Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, 
u8

ac)
__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct
ieee80211_hw
*hw, struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);

 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is
allowed
to
transmit
  *
  * This function is used to check whether given txq is
allowed
to
transmit by
diff --git a/net/mac80211/ieee80211_i.h
b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;

/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];

+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;

const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw
*ieee80211_alloc_hw_nm(size_t priv_data_len,

for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;

+   timer_setup(&local->remove_timer, ieee80211_txqs_check,
0);
+   mod_timer(&local->remove_timer,
+ jiffies +
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);

@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct
ieee80211_hw
*hw)
tasklet_kill

[PATCH V3 0/4] Enable virtual time-based airtime scheduler support on ath10k

2019-09-23 Thread Yibo Zhao
This series fix some issues when enabling virtual time-based airtime scheduler 
on ath10k.

Changes since v2:
  Changes station airtime weight to be per-AC based to avoid sync issue
  Remove Co-developed-by and Toke's sign-off as Toke suggested

Changes since v1:
  Modify the author of Co-developed-by as Johannes suggested

Toke Høiland-Jørgensen (1):
  mac80211: Switch to a virtual time-based airtime scheduler

Yibo Zhao (3):
  mac80211: defer txqs removal from rbtree
  mac80211: fix low throughput in multi-clients situation
  mac80211: Sync airtime weight sum with per AC synced sta airtime
weight together

 include/net/mac80211.h |  16 ++-
 net/mac80211/cfg.c |  29 -
 net/mac80211/debugfs.c |  48 +++-
 net/mac80211/debugfs_sta.c |  18 +--
 net/mac80211/ieee80211_i.h |  17 ++-
 net/mac80211/main.c|   8 +-
 net/mac80211/sta_info.c|  25 +++-
 net/mac80211/sta_info.h|   8 +-
 net/mac80211/tx.c  | 282 +
 9 files changed, 347 insertions(+), 104 deletions(-)

-- 
1.9.1



[PATCH V3 2/4] mac80211: defer txqs removal from rbtree

2019-09-23 Thread Yibo Zhao
In a loop txqs dequeue scenario, if the first txq in the rbtree gets
removed from rbtree immediately in the ieee80211_return_txq(), the
loop will break soon in the ieee80211_next_txq() due to schedule_pos
not leading to the second txq in the rbtree. Thus, defering the
removal right before the end of this schedule round.

Signed-off-by: Yibo Zhao 
---
 include/net/mac80211.h | 16 ++--
 net/mac80211/ieee80211_i.h |  3 +++
 net/mac80211/main.c|  6 +
 net/mac80211/tx.c  | 63 +++---
 4 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index ac2ed8e..ba5a345 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -925,6 +925,8 @@ struct ieee80211_tx_rate {
 
 #define IEEE80211_MAX_TX_RETRY 31
 
+#define IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS 100
+
 static inline void ieee80211_rate_set_vht(struct ieee80211_tx_rate *rate,
  u8 mcs, u8 nss)
 {
@@ -6232,7 +6234,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw 
*hw,
  * @ac: AC number to return packets from.
  *
  * Should only be called between calls to ieee80211_txq_schedule_start()
- * and ieee80211_txq_schedule_end().
+ * and ieee80211_txq_schedule_end(). If the txq is empty, it will be added
+ * to a remove list and get removed later.
  * Returns the next txq if successful, %NULL if no queue is eligible. If a txq
  * is returned, it should be returned with ieee80211_return_txq() after the
  * driver has finished scheduling it.
@@ -6268,7 +6271,8 @@ void ieee80211_txq_schedule_start(struct ieee80211_hw 
*hw, u8 ac)
  * @hw: pointer as obtained from ieee80211_alloc_hw()
  * @ac: AC number to acquire locks for
  *
- * Release locks previously acquired by ieee80211_txq_schedule_end().
+ * Release locks previously acquired by ieee80211_txq_schedule_end(). Check
+ * and remove the empty txq from rb-tree.
  */
 void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
__releases(txq_lock);
@@ -6287,6 +6291,14 @@ void ieee80211_schedule_txq(struct ieee80211_hw *hw, 
struct ieee80211_txq *txq)
__acquires(txq_lock) __releases(txq_lock);
 
 /**
+ * ieee80211_txqs_check - Check txqs waiting for removal
+ *
+ * @tmr: pointer as obtained from local
+ *
+ */
+void ieee80211_txqs_check(struct timer_list *tmr);
+
+/**
  * ieee80211_txq_may_transmit - check whether TXQ is allowed to transmit
  *
  * This function is used to check whether given txq is allowed to transmit by
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index a4556f9..49aa143e 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   struct list_head candidate;
unsigned long flags;
 
/* keep last! */
@@ -1145,6 +1146,8 @@ struct ieee80211_local {
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];
 
+   struct list_head remove_list[IEEE80211_NUM_ACS];
+   struct timer_list remove_timer;
u16 airtime_flags;
 
const struct ieee80211_ops *ops;
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index e9ffa8e..78fe24a 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -667,10 +667,15 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t 
priv_data_len,
 
for (i = 0; i < IEEE80211_NUM_ACS; i++) {
local->active_txqs[i] = RB_ROOT_CACHED;
+   INIT_LIST_HEAD(&local->remove_list[i]);
spin_lock_init(&local->active_txq_lock[i]);
}
local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
 
+   timer_setup(&local->remove_timer, ieee80211_txqs_check, 0);
+   mod_timer(&local->remove_timer,
+ jiffies + 
msecs_to_jiffies(IEEE80211_AIRTIME_TXQ_RM_CHK_INTV_IN_MS));
+
INIT_LIST_HEAD(&local->chanctx_list);
mutex_init(&local->chanctx_mtx);
 
@@ -1305,6 +1310,7 @@ void ieee80211_unregister_hw(struct ieee80211_hw *hw)
tasklet_kill(&local->tx_pending_tasklet);
tasklet_kill(&local->tasklet);
 
+   del_timer_sync(&local->remove_timer);
 #ifdef CONFIG_INET
unregister_inetaddr_notifier(&local->ifa_notifier);
 #endif
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d00baaa..42ca010 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1450,6 +1450,7 @@ void ieee80211_txq_init(struct ieee80211_sub_if_data 
*sdata,
codel_stats_init(&txqi->cstats);
__skb_queue_head_init(&txqi->frags);
RB_CLEAR_NODE(&txqi->schedule_order);
+   INIT_LIST_HEAD(&txqi->candidate);
 
txqi->txq.vif = &sdata->vif;
 
@@ -3724,6 +3725,9 @@ void ieee80211_schedule_txq(str

[PATCH V3 1/4] mac80211: Switch to a virtual time-based airtime scheduler

2019-09-23 Thread Yibo Zhao
From: Toke Høiland-Jørgensen 

This switches the airtime scheduler in mac80211 to use a virtual time-based
scheduler instead of the round-robin scheduler used before. This has a
couple of advantages:

- No need to sync up the round-robin scheduler in firmware/hardware with
  the round-robin airtime scheduler.

- If several stations are eligible for transmission we can schedule both of
  them; no need to hard-block the scheduling rotation until the head of the
  queue has used up its quantum.

- The check of whether a station is eligible for transmission becomes
  simpler (in ieee80211_txq_may_transmit()).

The drawback is that scheduling becomes slightly more expensive, as we need
to maintain an rbtree of TXQs sorted by virtual time. This means that
ieee80211_register_airtime() becomes O(logN) in the number of currently
scheduled TXQs. However, hopefully this number rarely grows too big (it's
only TXQs currently backlogged, not all associated stations), so it
shouldn't be too big of an issue.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/debugfs.c |  48 +-
 net/mac80211/debugfs_sta.c |  16 ++--
 net/mac80211/ieee80211_i.h |  14 ++-
 net/mac80211/main.c|   2 +-
 net/mac80211/sta_info.c|  19 +++-
 net/mac80211/sta_info.h|   3 +-
 net/mac80211/tx.c  | 217 +
 7 files changed, 223 insertions(+), 96 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 2d43bc1..4847168 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -150,6 +150,46 @@ static ssize_t aqm_write(struct file *file,
.llseek = default_llseek,
 };
 
+static ssize_t airtime_read(struct file *file,
+   char __user *user_buf,
+   size_t count,
+   loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[200];
+   u64 v_t[IEEE80211_NUM_ACS];
+   u64 wt[IEEE80211_NUM_ACS];
+   int len = 0, ac;
+
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   v_t[ac] = local->airtime_v_t[ac];
+   wt[ac] = local->airtime_weight_sum[ac];
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
+   len = scnprintf(buf, sizeof(buf),
+   "\tVO VI BE BK\n"
+   "Virt-t\t%-10llu %-10llu %-10llu %-10llu\n"
+   "Weight\t%-10llu %-10llu %-10llu %-10llu\n",
+   v_t[0],
+   v_t[1],
+   v_t[2],
+   v_t[3],
+   wt[0],
+   wt[1],
+   wt[2],
+   wt[3]);
+
+   return simple_read_from_buffer(user_buf, count, ppos,
+  buf, len);
+}
+
+static const struct file_operations airtime_ops = {
+   .read = airtime_read,
+   .open = simple_open,
+   .llseek = default_llseek,
+};
+
 #ifdef CONFIG_PM
 static ssize_t reset_write(struct file *file, const char __user *user_buf,
   size_t count, loff_t *ppos)
@@ -386,8 +426,12 @@ void debugfs_hw_add(struct ieee80211_local *local)
if (local->ops->wake_tx_queue)
DEBUGFS_ADD_MODE(aqm, 0600);
 
-   debugfs_create_u16("airtime_flags", 0600,
-  phyd, &local->airtime_flags);
+   if (wiphy_ext_feature_isset(local->hw.wiphy,
+   NL80211_EXT_FEATURE_AIRTIME_FAIRNESS)) {
+   DEBUGFS_ADD_MODE(airtime, 0600);
+   debugfs_create_u16("airtime_flags", 0600,
+  phyd, &local->airtime_flags);
+   }
 
statsd = debugfs_create_dir("statistics", phyd);
 
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 3aa618d..80028da 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -203,7 +203,7 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
size_t bufsz = 200;
char *buf = kzalloc(bufsz, GFP_KERNEL), *p = buf;
u64 rx_airtime = 0, tx_airtime = 0;
-   s64 deficit[IEEE80211_NUM_ACS];
+   u64 v_t[IEEE80211_NUM_ACS];
ssize_t rv;
int ac;
 
@@ -214,20 +214,20 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
spin_lock_bh(&local->active_txq_lock[ac]);
rx_airtime += sta->airtime[ac].rx_airtime;
tx_airtime += sta->airtime[ac].tx_airtime;
-   deficit[ac] = sta->airtime[ac].deficit;
+   v_t[ac] = sta->airtime[ac].v_t;
spin_unlock_bh(&local->active_txq_lock[ac]);
}
 
p += scnprintf(p, bufsz + buf - p,
"RX: %llu us\nTX: %llu us\nWeight: %u\n"
-   "Deficit: VO: %lld us VI: %lld us BE: %lld 

[PATCH V3 3/4] mac80211: fix low throughput in multi-clients situation

2019-09-23 Thread Yibo Zhao
Not long after the start of multi-clients test, not a single station is
an eligible candidate for transmission since global virtual time(g_vt) is
smaller than the virtual airtime(s_vt) of all the stations. As a result,
the Tx has been blocked and throughput is quite low.

This may mainly due to sync mechanism and accumulative deviation from the
devision calculation of g_vt.

For example:
Suppose we have 50 clients in first round.
Round 1:
STA weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
1   256 204812800   20482000N
2   256 20482048N
.   .   .   .   .
.   .   .   .   .
.   .   .   .   .
50  256 20482048N

After this round, all the stations are not valid for next transmission due to
accumulative deviation.

And if we add a new #51,
STA weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
1   256 204813056   20482020N
2   256 20482048N
.   .   .   .
.   .   .   .
.   .   .   .
50  256 20482048N
51  256 10242524N

Sync is done by:
max(g_vt of last round - grace period, s_vt)
and s_vt of #51 = max(2000 - 500, 0) + 1024 = 2524, and it is more than the 
final
g_vt of this round.

After this round, no more station is valid for transmission.

The real situation can be more complicate, above is one of the extremely case.

To avoid this situation to occur, the new proposal is:

- Increase the airtime grace period a little more to reduce the
  unexpected sync

- If global virtual time is less than the virtual airtime of any station,
  sync it to the airtime of first station in the red-black tree

- Round the division result

Signed-off-by: Yibo Zhao 
---
 net/mac80211/sta_info.c |  3 ++-
 net/mac80211/sta_info.h |  2 +-
 net/mac80211/tx.c   | 16 +++-
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 9d01fdd..feac975 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -1852,7 +1852,8 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta 
*pubsta, u8 tid,
 
weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
 
-   local->airtime_v_t[ac] += airtime / weight_sum;
+   /* Round the calculation of global vt */
+   local->airtime_v_t[ac] += (airtime + (weight_sum >> 1)) / weight_sum;
sta->airtime[ac].v_t += airtime / sta->airtime_weight;
ieee80211_resort_txq(&local->hw, txq);
 
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 5c1cac9..5055f94 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -130,7 +130,7 @@ enum ieee80211_agg_stop_reason {
 /* Debugfs flags to enable/disable use of RX/TX airtime in scheduler */
 #define AIRTIME_USE_TX BIT(0)
 #define AIRTIME_USE_RX BIT(1)
-#define AIRTIME_GRACE 500 /* usec of grace period before reset */
+#define AIRTIME_GRACE 2000 /* usec of grace period before reset */
 
 struct airtime_info {
u64 rx_airtime;
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 42ca010..60cf569 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3867,15 +3867,29 @@ bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
 {
struct ieee80211_local *local = hw_to_local(hw);
-   struct txq_info *txqi = to_txq_info(txq);
+   struct txq_info *first_txqi, *txqi = to_txq_info(txq);
+   struct rb_node *node = NULL;
struct sta_info *sta;
u8 ac = txq->ac;
+   first_txqi = NULL;
 
lockdep_assert_held(&local->active_txq_lock[ac]);
 
if (!txqi->txq.sta)
return true;
 
+   node = rb_first_cached(&local->active_txqs[ac]);
+   if (node) {
+   first_txqi = container_of(node, struct txq_info,
+ schedule_order);
+   if (first_txqi->txq.sta) {
+   sta = container_of(first_txqi->txq.sta,
+  struct sta_info, sta);
+   if (local->airtime_v_t[ac] < sta->airtime[ac].v_t)
+   local->airtime_v_t[ac] = sta->airtime[ac].v_t;
+   }
+   }
+
sta = container_of(txqi->txq.sta, struct sta_info, sta);
return (sta->airtime[ac].v_t <= local->airtime_v_t[ac]);
 }
-- 
1.9.1


___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH V3 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-23 Thread Yibo Zhao
Global airtime weight sum is updated only when txq is added/removed
from rbtree. If upper layer configures sta weight during high load,
airtime weight sum will not be updated since txq is most likely on the
tree. It could a little late for upper layer to reconfigure sta weight
when txq is already in the rbtree. And thus, incorrect airtime weight sum
will lead to incorrect global virtual time calculation as well as overflow
of airtime weight sum during txq removed.

Hence, need to update airtime weight sum upon receiving event for
configuring sta weight once sta's txq is on the rbtree.

Besides, if airtime weight sum of ACs and sta weight is synced under the
same per AC lock protection, there can be a very short window causing
incorrct airtime weight sum calculation as below:

active_txq_lock_VO  .
VO weight sum is syncd  .
sta airtime weight sum is synced.
active_txq_unlock_VO.
.   .
active_txq_lock_VI  .
VI weight sum is syncd  .
sta airtime weight sum  active_txq_lock_BE
active_txq_unlock_VI  Remove txq and thus sum
. is calculated with synced
. sta airtime weight
.   active_txq_unlock_BE

So introduce a per ac synced station airtime weight synced with per
AC synced weight sum together. And the per-AC station airtime weight
is used to calculate weight sum.

Signed-off-by: Yibo Zhao 
---
 net/mac80211/cfg.c | 29 ++---
 net/mac80211/debugfs_sta.c |  2 +-
 net/mac80211/sta_info.c|  9 -
 net/mac80211/sta_info.h|  5 +++--
 net/mac80211/tx.c  |  4 ++--
 5 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index d65aa01..c8a0683 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1284,7 +1284,8 @@ static int sta_apply_parameters(struct ieee80211_local 
*local,
int ret = 0;
struct ieee80211_supported_band *sband;
struct ieee80211_sub_if_data *sdata = sta->sdata;
-   u32 mask, set;
+   u32 mask, set, tid, ac, pre_weight;
+   struct txq_info *txqi;
 
sband = ieee80211_get_sband(sdata);
if (!sband)
@@ -1452,8 +1453,30 @@ static int sta_apply_parameters(struct ieee80211_local 
*local,
if (ieee80211_vif_is_mesh(&sdata->vif))
sta_apply_mesh_params(local, sta, params);
 
-   if (params->airtime_weight)
-   sta->airtime_weight = params->airtime_weight;
+   if (params->airtime_weight &&
+   params->airtime_weight != sta->airtime_weight) {
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
+   if (!sta->sta.txq[tid] ||
+   ac != ieee80211_ac_from_tid(tid))
+   continue;
+
+   pre_weight = sta->airtime_weight[ac];
+   sta->airtime_weight[ac] =
+   params->airtime_weight;
+
+   txqi = to_txq_info(sta->sta.txq[tid]);
+   if (RB_EMPTY_NODE(&txqi->schedule_order))
+   continue;
+
+   local->airtime_weight_sum[ac] = 
local->airtime_weight_sum[ac] +
+   
params->airtime_weight -
+   pre_weight;
+   }
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
+   }
 
/* set the STA state after all sta info from usermode has been set */
if (test_sta_flag(sta, WLAN_STA_TDLS_PEER) ||
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 80028da..43a7e6a 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -223,7 +223,7 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
"Virt-T: VO: %lld us VI: %lld us BE: %lld us BK: %lld us\n",
rx_airtime,
tx_airtime,
-   sta->airtime_weight,
+   sta->airtime_weight[0],
v_t[0],
v_t[1],
v_t[2],
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index feac975..e599cf1 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -384,11 +384,10 @@ struct sta_info *sta_info_alloc(struct 
ieee80

Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-23 Thread Yibo Zhao

On 2019-09-23 18:47, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:

So, instead we need to keep next_txq() the way it is, and just add


Right, should keep next_txq() the way it is.



local->schedule_pos[ac] = rb_prev(node);

whenever we remove a node (both in return_txq() and resort_txq()).


Agree, and also we may need to consider case like A is removed and 
soon

be added back just the same as ii),
B->C->A->D->E
then B is schedule, removed and soon added back,
C->A->B->D->E
A and B will have a second chance to be scheduled and this may happen 
to

others as well leading to the infinite loop as you have mentioned
previously, so do we need to maintain a schedule_round like we do in
DRR? Like,
 - If the node is in the same round, by pass schedule, go to
rb_next(), either continue loop this round or end this round.
 - Increase the schedule_round at the schedule_start() only when 
the

schedule_pos is NULL.


Hmm, yeah, I guess we could end up with a loop like that as well.
Keeping the schedule_round would be a way to fix it, but I'm not sure 
we

should just skip that station; maybe we should just end the round
instead?
I am not sure. I believe, in some cases, the rest of the nodes which 
could be most of the nodes in the tree will not have the chance to be 
scheduled in this round.





We'd still need a check in resort_txq() then, but it would make it
safe
to unschedule in return_txq()...

Yes, agree with that.





--
Yibo


Re: [PATCH V3 2/4] mac80211: defer txqs removal from rbtree

2019-09-23 Thread Yibo Zhao

On 2019-09-23 18:56, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:


In a loop txqs dequeue scenario, if the first txq in the rbtree gets
removed from rbtree immediately in the ieee80211_return_txq(), the
loop will break soon in the ieee80211_next_txq() due to schedule_pos
not leading to the second txq in the rbtree. Thus, defering the
removal right before the end of this schedule round.


Didn't we agree that we could fix this by making __unschedule_txq()
aware of schedule_pos instead of this deferred removal mechanism?


Yes, V3 is actually used to update the commit log of [PATCH V3 3/4] so 
that we can discuss in parallel with others, and [PATCH V3 4/4] for 
discussion. Please ignore [PATCH V3 2/4]. we can keep our discussion in 
V2 until we conclude the final one and then a new version will be sent 
out along with [mac80211: Switch to a virtual time-based airtime 
scheduler] which includes the reducing lock fix.


Sorry for the confusion, ;).



-Toke


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH V3 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-09-23 Thread Yibo Zhao

On 2019-09-23 19:00, Toke Høiland-Jørgensen wrote:


-   if (params->airtime_weight)
-   sta->airtime_weight = params->airtime_weight;
+   if (params->airtime_weight &&
+   params->airtime_weight != sta->airtime_weight) {


This check doesn't work I think? You're not using the array-based
sta->airtime_weight[], and there are locking issues by just checking
like this; so maybe just keep the if() on params->airtime_weight, and 
do

the checking inside the loop while holding the lock?


It should be array-based sta->airtime_weight[] and I am missing that 
part during the porting. But you are right about we should check it 
inside the loop with the lock.




Or could we just turn the weights into atomics to avoid the locking
entirely?


Actually, we still need the active txq locking to make sure the txq is 
on the rbtree. Otherwise, no need to change airtime weight sum.





+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
+   if (!sta->sta.txq[tid] ||
+   ac != ieee80211_ac_from_tid(tid))
+   continue;
+
+   pre_weight = sta->airtime_weight[ac];
+   sta->airtime_weight[ac] =
+   params->airtime_weight;
+
+   txqi = to_txq_info(sta->sta.txq[tid]);
+   if (RB_EMPTY_NODE(&txqi->schedule_order))
+   continue;
+
+   local->airtime_weight_sum[ac] = 
local->airtime_weight_sum[ac] +
+   
params->airtime_weight -
+   pre_weight;
+   }
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
+   }




--
Yibo


Re: [PATCH V3 3/4] mac80211: fix low throughput in multi-clients situation

2019-09-24 Thread Yibo Zhao

On 2019-09-23 18:55, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:

Not long after the start of multi-clients test, not a single station 
is
an eligible candidate for transmission since global virtual time(g_vt) 
is
smaller than the virtual airtime(s_vt) of all the stations. As a 
result,

the Tx has been blocked and throughput is quite low.

This may mainly due to sync mechanism and accumulative deviation from 
the

devision calculation of g_vt.

For example:
Suppose we have 50 clients in first round.
Round 1:
STA weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
1   256 204812800   20482000N
2   256 20482048N
.   .   .   .   .
.   .   .   .   .
.   .   .   .   .
50  256 20482048N

After this round, all the stations are not valid for next transmission 
due to

accumulative deviation.

And if we add a new #51,
STA weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
1   256 204813056   20482020N
2   256 20482048N
.   .   .   .
.   .   .   .
.   .   .   .
50  256 20482048N
51  256 10242524N


That's better :)


Sync is done by:
max(g_vt of last round - grace period, s_vt)
and s_vt of #51 = max(2000 - 500, 0) + 1024 = 2524, and it is more 
than the final

g_vt of this round.

After this round, no more station is valid for transmission.

The real situation can be more complicate, above is one of the 
extremely case.


To avoid this situation to occur, the new proposal is:

- Increase the airtime grace period a little more to reduce the
  unexpected sync

- If global virtual time is less than the virtual airtime of any 
station,

  sync it to the airtime of first station in the red-black tree

- Round the division result


I can see why we need the second part (basically, this happens because 
I
forgot to add a check for "no eligible stations" in may_transmit(), 
like

the one in next_txq()). And rounding up the division result doesn't
hurt, I guess. But why does it help to change the grace period if we're
doing all the other stuff?
In multi-clients case, it is possible a TXQ sometimes gets drained due 
to FW has deep queue and few packets in TXQ at that time. So the TXQ is 
removed from the rbtree after dequeuing. When it is about to added back 
very soon after the removal, the g_vt might have gone a little far away 
from sta vt where sync is needed. With this sync, the station is forced 
to catch up with the g_vt, however, its chance for transmission has been 
reduced. I think 500us is quite a short period in multi-clients case.


-Toke


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


Re: [PATCH 2/4] mac80211: defer txqs removal from rbtree

2019-09-24 Thread Yibo Zhao

On 2019-09-24 15:26, Toke Høiland-Jørgensen wrote:

Hmm, yeah, I guess we could end up with a loop like that as well.
Keeping the schedule_round would be a way to fix it, but I'm not sure
we
should just skip that station; maybe we should just end the round
instead?

I am not sure. I believe, in some cases, the rest of the nodes which
could be most of the nodes in the tree will not have the chance to be
scheduled in this round.


My guess would be that it doesn't really matter, because in most cases
each schedule round will only actually end up queueing packets from one
or two stations; as the driver will pull multiple packets from that one
station which will often fill up the firmware queues (especially once 
we

start throttling that with the AQL stuff).

So I guess we can just skip TXQs that we've already seen this 
scheduling

round, and let the v_t compare determine transmit eligibility :)


I am a little confused. So do you mean it is fine for you to skip the 
TXQs we met in this round before and continue the loop until the end or 
vt comparison failure?




-Toke


--
Yibo


Re: [PATCH V3 3/4] mac80211: fix low throughput in multi-clients situation

2019-09-24 Thread Yibo Zhao

On 2019-09-24 16:48, Toke Høiland-Jørgensen wrote:

Yibo Zhao  writes:

I can see why we need the second part (basically, this happens 
because

I
forgot to add a check for "no eligible stations" in may_transmit(),
like
the one in next_txq()). And rounding up the division result doesn't
hurt, I guess. But why does it help to change the grace period if 
we're

doing all the other stuff?

In multi-clients case, it is possible a TXQ sometimes gets drained due
to FW has deep queue and few packets in TXQ at that time. So the TXQ 
is
removed from the rbtree after dequeuing. When it is about to added 
back
very soon after the removal, the g_vt might have gone a little far 
away
from sta vt where sync is needed. With this sync, the station is 
forced
to catch up with the g_vt, however, its chance for transmission has 
been

reduced. I think 500us is quite a short period in multi-clients case.


That's a good point, actually: Having the grace period be too small 
will
allow stations that leave and re-enter the queue to "skip ahead" and 
use

more than its share. However, I think it's a separate issue from what
this patch is about; so how about I just increase the grace period in
the next version of the base patch?


Sure, no problem. :)


-Toke


--
Yibo


Re: [PATCH V3 0/4] Enable virtual time-based airtime scheduler support on ath10k

2019-10-01 Thread Yibo Zhao

On 2019-10-01 18:19, Johannes Berg wrote:

On Mon, 2019-09-23 at 15:19 +0800, Yibo Zhao wrote:
This series fix some issues when enabling virtual time-based airtime 
scheduler on ath10k.



Given the lengthy discussion here and also over in the related thread
about AQL, I'm assuming you're going to repost this eventually.


Yes, will repost once Toke have updated "mac80211: Switch to a virtual 
time-based airtime scheduler" with his new ideas.


johannes


--
Yibo

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH V4 0/4] Enable virtual time-based airtime scheduler support on ath10k

2019-12-12 Thread Yibo Zhao
This series fix some issues when enabling virtual time-based airtime scheduler 
on ath10k.

Changes since v3:
  Change schedule_pos to previous node once it has chance to be moved/removed
  from current position in the tree in loop scenario and bring back 
schedule_round
  in case that same node is to be scheduled again in the mean time.

  Increase airtime grace period to 2000 us in the first patch.

  Put per-AC station weight checking in its lock during configuration from 
application.

Changes since v2:
  Changes station airtime weight to be per-AC based to avoid sync issue
  Remove Co-developed-by and Toke's sign-off as Toke suggested

Changes since v1:
  Modify the author of Co-developed-by as Johannes suggested

Toke Høiland-Jørgensen (1):
  mac80211: Switch to a virtual time-based airtime scheduler

Yibo Zhao (3):
  mac80211: fix issue in loop scenario
  mac80211: fix low throughput in multi-clients situation
  mac80211: Sync airtime weight sum with per AC synced sta airtime
weight together

 net/mac80211/cfg.c |  29 +-
 net/mac80211/debugfs.c |  48 +-
 net/mac80211/debugfs_sta.c |  18 ++--
 net/mac80211/ieee80211_i.h |  12 ++-
 net/mac80211/main.c|   2 +-
 net/mac80211/sta_info.c|  25 +++--
 net/mac80211/sta_info.h|   8 +-
 net/mac80211/tx.c  | 234 +++--
 8 files changed, 278 insertions(+), 98 deletions(-)

-- 
1.9.1

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH 1/4] mac80211: Switch to a virtual time-based airtime scheduler

2019-12-12 Thread Yibo Zhao
From: Toke Høiland-Jørgensen 

This switches the airtime scheduler in mac80211 to use a virtual time-based
scheduler instead of the round-robin scheduler used before. This has a
couple of advantages:

- No need to sync up the round-robin scheduler in firmware/hardware with
  the round-robin airtime scheduler.

- If several stations are eligible for transmission we can schedule both of
  them; no need to hard-block the scheduling rotation until the head of the
  queue has used up its quantum.

- The check of whether a station is eligible for transmission becomes
  simpler (in ieee80211_txq_may_transmit()).

The drawback is that scheduling becomes slightly more expensive, as we need
to maintain an rbtree of TXQs sorted by virtual time. This means that
ieee80211_register_airtime() becomes O(logN) in the number of currently
scheduled TXQs. However, hopefully this number rarely grows too big (it's
only TXQs currently backlogged, not all associated stations), so it
shouldn't be too big of an issue.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/mac80211/debugfs.c |  48 +-
 net/mac80211/debugfs_sta.c |  16 ++--
 net/mac80211/ieee80211_i.h |  14 ++-
 net/mac80211/main.c|   2 +-
 net/mac80211/sta_info.c|  19 +++-
 net/mac80211/sta_info.h|   3 +-
 net/mac80211/tx.c  | 217 +
 7 files changed, 223 insertions(+), 96 deletions(-)

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 2d43bc1..4847168 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -150,6 +150,46 @@ static ssize_t aqm_write(struct file *file,
.llseek = default_llseek,
 };
 
+static ssize_t airtime_read(struct file *file,
+   char __user *user_buf,
+   size_t count,
+   loff_t *ppos)
+{
+   struct ieee80211_local *local = file->private_data;
+   char buf[200];
+   u64 v_t[IEEE80211_NUM_ACS];
+   u64 wt[IEEE80211_NUM_ACS];
+   int len = 0, ac;
+
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   v_t[ac] = local->airtime_v_t[ac];
+   wt[ac] = local->airtime_weight_sum[ac];
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
+   len = scnprintf(buf, sizeof(buf),
+   "\tVO VI BE BK\n"
+   "Virt-t\t%-10llu %-10llu %-10llu %-10llu\n"
+   "Weight\t%-10llu %-10llu %-10llu %-10llu\n",
+   v_t[0],
+   v_t[1],
+   v_t[2],
+   v_t[3],
+   wt[0],
+   wt[1],
+   wt[2],
+   wt[3]);
+
+   return simple_read_from_buffer(user_buf, count, ppos,
+  buf, len);
+}
+
+static const struct file_operations airtime_ops = {
+   .read = airtime_read,
+   .open = simple_open,
+   .llseek = default_llseek,
+};
+
 #ifdef CONFIG_PM
 static ssize_t reset_write(struct file *file, const char __user *user_buf,
   size_t count, loff_t *ppos)
@@ -386,8 +426,12 @@ void debugfs_hw_add(struct ieee80211_local *local)
if (local->ops->wake_tx_queue)
DEBUGFS_ADD_MODE(aqm, 0600);
 
-   debugfs_create_u16("airtime_flags", 0600,
-  phyd, &local->airtime_flags);
+   if (wiphy_ext_feature_isset(local->hw.wiphy,
+   NL80211_EXT_FEATURE_AIRTIME_FAIRNESS)) {
+   DEBUGFS_ADD_MODE(airtime, 0600);
+   debugfs_create_u16("airtime_flags", 0600,
+  phyd, &local->airtime_flags);
+   }
 
statsd = debugfs_create_dir("statistics", phyd);
 
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 3aa618d..80028da 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -203,7 +203,7 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
size_t bufsz = 200;
char *buf = kzalloc(bufsz, GFP_KERNEL), *p = buf;
u64 rx_airtime = 0, tx_airtime = 0;
-   s64 deficit[IEEE80211_NUM_ACS];
+   u64 v_t[IEEE80211_NUM_ACS];
ssize_t rv;
int ac;
 
@@ -214,20 +214,20 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
spin_lock_bh(&local->active_txq_lock[ac]);
rx_airtime += sta->airtime[ac].rx_airtime;
tx_airtime += sta->airtime[ac].tx_airtime;
-   deficit[ac] = sta->airtime[ac].deficit;
+   v_t[ac] = sta->airtime[ac].v_t;
spin_unlock_bh(&local->active_txq_lock[ac]);
}
 
p += scnprintf(p, bufsz + buf - p,
"RX: %llu us\nTX: %llu us\nWeight: %u\n"
-   "Deficit: VO: %lld us VI: %lld us BE: %lld 

[PATCH 3/4] mac80211: fix low throughput in multi-clients situation

2019-12-12 Thread Yibo Zhao
Not long after the start of multi-clients test, not a single station is
an eligible candidate for transmission since global virtual time(g_vt) is
smaller than the virtual airtime(s_vt) of all the stations. As a result,
the Tx has been blocked and throughput is quite low.

This may mainly due accumulative deviation from the devision calculation
of g_vt.

For example:
Suppose we have 50 clients in first round.
Round 1:
STA weight  Tx_time_round  wt_sum   s_vtg_vt  valid_for_next_Tx
1   256 204812800   20482000N
2   256 20482048N
.   .   .   .   .
.   .   .   .   .
.   .   .   .   .
50  256 20482048N

After this round, all the stations are not valid for next transmission due to
accumulative deviation.

The real situation can be more complicate, above is one of the extremely case.

To avoid this situation to occur, the new proposal is:

- If global virtual time is less than the virtual airtime of any station,
  sync it to the airtime of first station in the red-black tree

- Round the division result

Signed-off-by: Yibo Zhao 
---
 net/mac80211/sta_info.c |  3 ++-
 net/mac80211/tx.c   | 16 +++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 9d01fdd..feac975 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -1852,7 +1852,8 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta 
*pubsta, u8 tid,
 
weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
 
-   local->airtime_v_t[ac] += airtime / weight_sum;
+   /* Round the calculation of global vt */
+   local->airtime_v_t[ac] += (airtime + (weight_sum >> 1)) / weight_sum;
sta->airtime[ac].v_t += airtime / sta->airtime_weight;
ieee80211_resort_txq(&local->hw, txq);
 
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index c1444e7..b40cf91 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3822,15 +3822,29 @@ bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
struct ieee80211_txq *txq)
 {
struct ieee80211_local *local = hw_to_local(hw);
-   struct txq_info *txqi = to_txq_info(txq);
+   struct txq_info *first_txqi, *txqi = to_txq_info(txq);
+   struct rb_node *node = NULL;
struct sta_info *sta;
u8 ac = txq->ac;
+   first_txqi = NULL;
 
lockdep_assert_held(&local->active_txq_lock[ac]);
 
if (!txqi->txq.sta)
return true;
 
+   node = rb_first_cached(&local->active_txqs[ac]);
+   if (node) {
+   first_txqi = container_of(node, struct txq_info,
+ schedule_order);
+   if (first_txqi->txq.sta) {
+   sta = container_of(first_txqi->txq.sta,
+  struct sta_info, sta);
+   if (local->airtime_v_t[ac] < sta->airtime[ac].v_t)
+   local->airtime_v_t[ac] = sta->airtime[ac].v_t;
+   }
+   }
+
sta = container_of(txqi->txq.sta, struct sta_info, sta);
return (sta->airtime[ac].v_t <= local->airtime_v_t[ac]);
 }
-- 
1.9.1

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH 2/4] mac80211: fix issue in loop scenario

2019-12-12 Thread Yibo Zhao
In a loop txqs dequeue scenario, if the first txq in the rbtree gets
removed from rbtree immediately in the ieee80211_return_txq(), the
loop will break soon in the ieee80211_next_txq() due to schedule_pos
not leading to the second txq in the rbtree. Thus update schedule_pos
to previous node once the node of schedule_pos is either removed from
rbtree or move to other location in rbtree due to airtime update.

Signed-off-by: Yibo Zhao 
---
 net/mac80211/ieee80211_i.h |  2 ++
 net/mac80211/tx.c  | 14 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index a4556f9..ed85400 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -847,6 +847,7 @@ struct txq_info {
struct codel_stats cstats;
struct sk_buff_head frags;
struct rb_node schedule_order;
+   u16 schedule_round;
unsigned long flags;
 
/* keep last! */
@@ -1144,6 +1145,7 @@ struct ieee80211_local {
struct rb_node *schedule_pos[IEEE80211_NUM_ACS];
u64 airtime_v_t[IEEE80211_NUM_ACS];
u64 airtime_weight_sum[IEEE80211_NUM_ACS];
+   u16 schedule_round[IEEE80211_NUM_ACS];
 
u16 airtime_flags;
 
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d00baaa..c1444e7 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3644,6 +3644,7 @@ struct ieee80211_txq *ieee80211_next_txq(struct 
ieee80211_hw *hw, u8 ac)
 
lockdep_assert_held(&local->active_txq_lock[ac]);
 
+begin:
if (!node) {
node = rb_first_cached(&local->active_txqs[ac]);
first = true;
@@ -3668,7 +3669,10 @@ struct ieee80211_txq *ieee80211_next_txq(struct 
ieee80211_hw *hw, u8 ac)
}
}
 
+   if (txqi->schedule_round == local->schedule_round[ac])
+   goto begin;
 
+   txqi->schedule_round = local->schedule_round[ac];
local->schedule_pos[ac] = node;
return &txqi->txq;
 }
@@ -3752,6 +3756,9 @@ void ieee80211_resort_txq(struct ieee80211_hw *hw,
u8 ac = txq->ac;
 
if (!RB_EMPTY_NODE(&txqi->schedule_order)) {
+   if (local->schedule_pos[ac] == &txqi->schedule_order)
+   local->schedule_pos[ac] = 
rb_prev(&txqi->schedule_order);
+
rb_erase_cached(&txqi->schedule_order,
&local->active_txqs[ac]);
RB_CLEAR_NODE(&txqi->schedule_order);
@@ -3771,6 +3778,9 @@ static void __ieee80211_unschedule_txq(struct 
ieee80211_hw *hw,
if (RB_EMPTY_NODE(&txqi->schedule_order))
return;
 
+   if (local->schedule_pos[ac] == &txqi->schedule_order)
+   local->schedule_pos[ac] = rb_prev(&txqi->schedule_order);
+
if (txq->sta) {
struct sta_info *sta = container_of(txq->sta,
struct sta_info, sta);
@@ -3803,7 +3813,7 @@ void ieee80211_return_txq(struct ieee80211_hw *hw,
lockdep_assert_held(&local->active_txq_lock[txq->ac]);
 
if (!RB_EMPTY_NODE(&txqi->schedule_order) &&
-   (skb_queue_empty(&txqi->frags) && !txqi->tin.backlog_packets))
+   !txq_has_queue(&txqi->txq))
__ieee80211_unschedule_txq(hw, txq);
 }
 EXPORT_SYMBOL(ieee80211_return_txq);
@@ -3832,6 +3842,8 @@ void ieee80211_txq_schedule_start(struct ieee80211_hw 
*hw, u8 ac)
struct ieee80211_local *local = hw_to_local(hw);
 
spin_lock_bh(&local->active_txq_lock[ac]);
+   local->schedule_round[ac]++;
+
 }
 EXPORT_SYMBOL(ieee80211_txq_schedule_start);
 
-- 
1.9.1

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH 4/4] mac80211: Sync airtime weight sum with per AC synced sta airtime weight together

2019-12-12 Thread Yibo Zhao
Global airtime weight sum is updated only when txq is added/removed
from rbtree. If upper layer configures sta weight during high load,
airtime weight sum will not be updated since txq is most likely on the
tree. It could a little late for upper layer to reconfigure sta weight
when txq is already in the rbtree. And thus, incorrect airtime weight sum
will lead to incorrect global virtual time calculation as well as overflow
of airtime weight sum during txq removed.

Hence, need to update airtime weight sum upon receiving event for
configuring sta weight once sta's txq is on the rbtree.

Besides, if airtime weight sum of ACs and sta weight is synced under the
same per AC lock protection, there can be a very short window causing
incorrct airtime weight sum calculation as below:

active_txq_lock_VO  .
VO weight sum is syncd  .
sta airtime weight sum is synced.
active_txq_unlock_VO.
.   .
active_txq_lock_VI  .
VI weight sum is syncd  .
sta airtime weight sum  active_txq_lock_BE
active_txq_unlock_VI  Remove txq and thus sum
. is calculated with synced
. sta airtime weight
.   active_txq_unlock_BE

So introduce a per ac synced station airtime weight synced with per
AC synced weight sum together. And the per-AC station airtime weight
is used to calculate weight sum.

Signed-off-by: Yibo Zhao 
---
 net/mac80211/cfg.c | 29 ++---
 net/mac80211/debugfs_sta.c |  2 +-
 net/mac80211/sta_info.c|  9 -
 net/mac80211/sta_info.h|  5 +++--
 net/mac80211/tx.c  |  4 ++--
 5 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index d65aa01..298b61d 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1284,7 +1284,8 @@ static int sta_apply_parameters(struct ieee80211_local 
*local,
int ret = 0;
struct ieee80211_supported_band *sband;
struct ieee80211_sub_if_data *sdata = sta->sdata;
-   u32 mask, set;
+   u32 mask, set, tid, ac, pre_weight;
+   struct txq_info *txqi;
 
sband = ieee80211_get_sband(sdata);
if (!sband)
@@ -1452,8 +1453,30 @@ static int sta_apply_parameters(struct ieee80211_local 
*local,
if (ieee80211_vif_is_mesh(&sdata->vif))
sta_apply_mesh_params(local, sta, params);
 
-   if (params->airtime_weight)
-   sta->airtime_weight = params->airtime_weight;
+   if (params->airtime_weight) {
+   for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+   spin_lock_bh(&local->active_txq_lock[ac]);
+   for (tid = 0; tid < IEEE80211_NUM_TIDS + 1; tid++) {
+   if (params->airtime_weight == 
sta->airtime_weight[ac] ||
+   !sta->sta.txq[tid] ||
+   ac != ieee80211_ac_from_tid(tid))
+   continue;
+
+   pre_weight = sta->airtime_weight[ac];
+   sta->airtime_weight[ac] =
+   params->airtime_weight;
+
+   txqi = to_txq_info(sta->sta.txq[tid]);
+   if (RB_EMPTY_NODE(&txqi->schedule_order))
+   continue;
+
+   local->airtime_weight_sum[ac] = 
local->airtime_weight_sum[ac] +
+   
params->airtime_weight -
+   pre_weight;
+   }
+   spin_unlock_bh(&local->active_txq_lock[ac]);
+   }
+   }
 
/* set the STA state after all sta info from usermode has been set */
if (test_sta_flag(sta, WLAN_STA_TDLS_PEER) ||
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 80028da..43a7e6a 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -223,7 +223,7 @@ static ssize_t sta_airtime_read(struct file *file, char 
__user *userbuf,
"Virt-T: VO: %lld us VI: %lld us BE: %lld us BK: %lld us\n",
rx_airtime,
tx_airtime,
-   sta->airtime_weight,
+   sta->airtime_weight[0],
v_t[0],
v_t[1],
v_t[2],
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index feac975..e599cf1 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -384,11 +384,10 @@ struct sta_info *sta_i

[PATCH] Allow qca988x family to support ack rssi of tx data packets.

2020-02-12 Thread Yibo Zhao
Hardwares tested : QCA9887
Firmwares tested : 10.4-3.9.0.1-00036

Signed-off-by: Yibo Zhao 
---
 drivers/net/wireless/ath/ath10k/hw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/ath/ath10k/hw.c 
b/drivers/net/wireless/ath/ath10k/hw.c
index 2451e0f..57c58af 100644
--- a/drivers/net/wireless/ath/ath10k/hw.c
+++ b/drivers/net/wireless/ath/ath10k/hw.c
@@ -1131,6 +1131,7 @@ static int ath10k_get_htt_tx_data_rssi_pad(struct 
htt_resp *resp)
 
 const struct ath10k_hw_ops qca988x_ops = {
.set_coverage_class = ath10k_hw_qca988x_set_coverage_class,
+   .is_rssi_enable = ath10k_htt_tx_rssi_enable,
 };
 
 static int ath10k_qca99x0_rx_desc_get_l3_pad_bytes(struct htt_rx_desc *rxd)
-- 
1.9.1

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


[PATCH] ath10k: fix not registering airtime of 11a station with WMM disable

2020-02-19 Thread Yibo Zhao
The tid of 11a station with WMM disable reported by FW is 0x10 in
tx completion. The tid 16 is mapped to a NULL txq since buffer
MMPDU capbility is not supported. Then 11a station's airtime will
not be registered due to NULL txq check. As a results, airtime of
11a station keeps unchanged in debugfs system.

Mask the tid along with IEEE80211_QOS_CTL_TID_MASK to make it in
the valid range.

Hardwares tested : QCA9984
Firmwares tested : 10.4-3.10-00047

Signed-off-by: Yibo Zhao 
---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c 
b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 38a5814..f883f2a 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -2744,7 +2744,8 @@ static void ath10k_htt_rx_tx_compl_ind(struct ath10k *ar,
continue;
}
 
-   tid = FIELD_GET(HTT_TX_PPDU_DUR_INFO0_TID_MASK, info0);
+   tid = FIELD_GET(HTT_TX_PPDU_DUR_INFO0_TID_MASK, info0) &
+   IEEE80211_QOS_CTL_TID_MASK;
tx_duration = __le32_to_cpu(ppdu_dur->tx_duration);
 
ieee80211_sta_register_airtime(peer->sta, tid, tx_duration, 0);
-- 
1.9.1

___
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k