Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-19 Thread Rajkumar Manoharan

On 2018-09-19 07:50, Toke Høiland-Jørgensen wrote:
> Kalle Valo  writes:
>
>> Toke Høiland-Jørgensen  writes:
>>
>>>> Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So
>>>> I also applied this "ath10k: report tx rate using ieee80211_tx_status"
>>>> change.
>>>
>>> Yeah, that and the patch that computes the last used rate will probably
>>> be necessary; but they can be pretty much applied as-is, right?
>>
>> Unfortunately not. I think the plan is now to follow Johannes' proposal:
>>
>>    "I'd recommend against doing this and disentangling the necessary
>>     code in mac80211, e.g. with ieee80211_tx_status_ext() or adding
>>     similar APIs."
>>
>>    https://patchwork.kernel.org/patch/10353959/
>
> Ahh, right... *that* patch :)
>
> Was thinking on this one with the "as-is" comment:
>
> https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588189

It is useful only when the driver calls tx_status_noskb(). It was
recommended not to call both the tx_status() and tx_status_noskb() APIs
from the same driver. Hence Anil was trying to piggyback the tx rate
report on tx_status() itself.

https://chromium.googlesource.com/chromiumos/third_party/kernel/+/1e034d84bd444fd29b7f902c5e033a8c737a58b2%5E%21/
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/2a8da427fc9dfb527516e7ac395b1e6af73bff84%5E%21/

-Rajkumar


Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-19 Thread Toke Høiland-Jørgensen
Kalle Valo  writes:

> Toke Høiland-Jørgensen  writes:
>
>>> Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So
>>> I also applied this "ath10k: report tx rate using ieee80211_tx_status"
>>> change.
>>
>> Yeah, that and the patch that computes the last used rate will probably
>> be necessary; but they can be pretty much applied as-is, right?
>
> Unfortunately not. I think the plan is now to follow Johannes' proposal:
>
>    "I'd recommend against doing this and disentangling the necessary
>     code in mac80211, e.g. with ieee80211_tx_status_ext() or adding
>     similar APIs."
>
>    https://patchwork.kernel.org/patch/10353959/

Ahh, right... *that* patch :)

Was thinking on this one with the "as-is" comment:

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588189

-Toke


Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-19 Thread Kalle Valo
Toke Høiland-Jørgensen  writes:

>> Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So
>> I also applied this "ath10k: report tx rate using ieee80211_tx_status"
>> change.
>
> Yeah, that and the patch that computes the last used rate will probably
> be necessary; but they can be pretty much applied as-is, right?

Unfortunately not. I think the plan is now to follow Johannes' proposal:

   "I'd recommend against doing this and disentangling the necessary
code in mac80211, e.g. with ieee80211_tx_status_ext() or adding
similar APIs."

   https://patchwork.kernel.org/patch/10353959/

-- 
Kalle Valo


Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-19 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-09-18 13:41, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
>>>>> Also an option to add the node at head or tail would be preferred. If
>>>>> return_txq adds node at head of list, then it is forcing the driver to
>>>>> serve same txq until it becomes empty. Also this will not allow the
>>>>> driver to send N frames from each txqs.
>>>>
>>>> The whole point of this patch set is to move those kinds of decisions
>>>> out of the driver and into mac80211. The airtime scheduler won't achieve
>>>> fairness if it allows queues to be queued to the end of the rotation
>>>> before its deficit turns negative. And obviously there's some lag in
>>>> this since we're using after-the-fact airtime information.
>>>>
>>> Hmm.. As you know ath10k kind of doing fairness by serving fixed frames
>>> from each txq. This approach will be removed from ath10k.
>>>
>>>> For ath9k this has not really been a problem in my tests; if the lag
>>>> turns out to be too great for ath10k (which I suppose is a possibility
>>>> since we don't get airtime information on every TX-compl), I figure we
>>>> can use the same estimated airtime value that is used for throttling the
>>>> queues to adjust the deficit immediately...
>>>>
>>> Thats true. I am porting Kan's changes of airtime estimation for each
>>> msdu for firmware that does not report airtime.
>>
>> Right. My thinking with this was that we could put the per-frame airtime
>> estimation into ieee80211_tx_dequeue(), which could track the
>> outstanding airtime and just return NULL if it goes over the threshold.
>> I think this is fairly straight-forward to do on its own; the biggest
>> problem is probably finding the space in the mac80211 cb?
>>
>> Is this what you are working on porting? Because then I'll wait for your
>> patch rather than starting to write this code myself :)
>> 
> Kind of.. something like below.
>
> tx_dequeue(){
>  compute airtime_est from last_tx_rate
>  if (sta->airtime[ac].deficit < airtime_est)
>  return NULL;
>  dequeue skb and store airtime_est in cb
> }

I think I would decouple it further and not use the deficit. But rather:

 tx_dequeue(){
  if (sta->airtime[ac].outstanding > AIRTIME_OUTSTANDING_MAX)
    return NULL
  compute airtime_est from last_tx_rate
  dequeue skb and store airtime_est in cb
  sta->airtime[ac].outstanding += airtime_est;
 }
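
To make the pseudocode above concrete, here is a minimal userspace sketch of
the outstanding-airtime check. It is not mac80211 code: struct frame,
struct sta_ac, struct txq, airtime_estimate(), tx_complete() and the value of
AIRTIME_OUTSTANDING_MAX are all hypothetical stand-ins, and the estimate only
counts payload bits at last_tx_rate (no preamble, aggregation or retries).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold, in microseconds of estimated airtime. */
#define AIRTIME_OUTSTANDING_MAX 4000u

struct frame {                 /* stand-in for an skb */
    uint32_t len;              /* payload length in bytes */
    uint32_t airtime_est;      /* stand-in for the value stored in the cb */
};

struct sta_ac {                /* stand-in for sta->airtime[ac] */
    uint32_t outstanding;      /* estimated airtime handed to hardware, us */
    uint32_t last_tx_rate;     /* last reported rate, kbit/s */
};

struct txq {                   /* trivial FIFO over an array of frames */
    struct frame *frames;
    size_t head;
    size_t count;
};

/* bits / (kbit/s) yields milliseconds, so scale by 1000 to get us. */
uint32_t airtime_estimate(uint32_t len, uint32_t rate_kbps)
{
    if (rate_kbps == 0)
        return 0;              /* no rate known yet: charge nothing */
    return (uint32_t)(((uint64_t)len * 8 * 1000) / rate_kbps);
}

/* Refuse to hand out frames while too much airtime is outstanding. */
struct frame *tx_dequeue(struct sta_ac *air, struct txq *q)
{
    struct frame *f;

    if (air->outstanding > AIRTIME_OUTSTANDING_MAX)
        return NULL;
    if (q->head == q->count)
        return NULL;           /* queue is empty */

    f = &q->frames[q->head++];
    f->airtime_est = airtime_estimate(f->len, air->last_tx_rate);
    air->outstanding += f->airtime_est;
    return f;
}

/* On TX completion, give back what was charged at dequeue time. */
void tx_complete(struct sta_ac *air, const struct frame *f)
{
    if (f->airtime_est > air->outstanding)
        air->outstanding = 0;
    else
        air->outstanding -= f->airtime_est;
}
```

With 1500-byte frames at 6000 kbit/s each estimate is 2000 us, so three
frames are released before the check trips, and completing one frame lets
the next one through.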

> Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So
> I also applied this "ath10k: report tx rate using ieee80211_tx_status"
> change.

Yeah, that and the patch that computes the last used rate will probably
be necessary; but they can be pretty much applied as-is, right?

>> This mechanism on its own will get us the queue limiting and latency
>> reduction goodness for firmwares with deep queues. And for that it can
>> be completely independent of the airtime fairness scheduler, which can
>> use the after-tx-compl airtime information to presumably get more
>> accurate fairness which includes retransmissions etc.
>> 
>> Now, we could *also* use the ahead-of-time airtime estimation for
>> fairness; either just as a fallback for drivers that can't get actual
>> airtime usage information for the hardware, or as an alternative in
>> cases where it works better for other reasons. But I think that
>> separating the two in the initial implementation makes more sense; that
>> will make it easier to experiment with different combinations of the
>> two.
>> 
>> Does that make sense? :)
>> 
> Completely agree. I was thinking of using this as fallback for devices
> that does not report airtime but tx rate.

Great! Seems we are converging on a workable solution, then :)

-Toke


Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-18 Thread Rajkumar Manoharan

On 2018-09-18 13:41, Toke Høiland-Jørgensen wrote:
> Rajkumar Manoharan  writes:
>
>>>> Also an option to add the node at head or tail would be preferred. If
>>>> return_txq adds node at head of list, then it is forcing the driver to
>>>> serve same txq until it becomes empty. Also this will not allow the
>>>> driver to send N frames from each txqs.
>>>
>>> The whole point of this patch set is to move those kinds of decisions
>>> out of the driver and into mac80211. The airtime scheduler won't achieve
>>> fairness if it allows queues to be queued to the end of the rotation
>>> before its deficit turns negative. And obviously there's some lag in
>>> this since we're using after-the-fact airtime information.
>>>
>> Hmm.. As you know ath10k kind of doing fairness by serving fixed frames
>> from each txq. This approach will be removed from ath10k.
>>
>>> For ath9k this has not really been a problem in my tests; if the lag
>>> turns out to be too great for ath10k (which I suppose is a possibility
>>> since we don't get airtime information on every TX-compl), I figure we
>>> can use the same estimated airtime value that is used for throttling the
>>> queues to adjust the deficit immediately...
>>>
>> Thats true. I am porting Kan's changes of airtime estimation for each
>> msdu for firmware that does not report airtime.
>
> Right. My thinking with this was that we could put the per-frame airtime
> estimation into ieee80211_tx_dequeue(), which could track the
> outstanding airtime and just return NULL if it goes over the threshold.
> I think this is fairly straight-forward to do on its own; the biggest
> problem is probably finding the space in the mac80211 cb?
>
> Is this what you are working on porting? Because then I'll wait for your
> patch rather than starting to write this code myself :)
>
Kind of.. something like below.

tx_dequeue(){
 compute airtime_est from last_tx_rate
 if (sta->airtime[ac].deficit < airtime_est)
     return NULL;
 dequeue skb and store airtime_est in cb
}

Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So I
also applied this "ath10k: report tx rate using ieee80211_tx_status"
change.

> This mechanism on its own will get us the queue limiting and latency
> reduction goodness for firmwares with deep queues. And for that it can
> be completely independent of the airtime fairness scheduler, which can
> use the after-tx-compl airtime information to presumably get more
> accurate fairness which includes retransmissions etc.
>
> Now, we could *also* use the ahead-of-time airtime estimation for
> fairness; either just as a fallback for drivers that can't get actual
> airtime usage information for the hardware, or as an alternative in
> cases where it works better for other reasons. But I think that
> separating the two in the initial implementation makes more sense; that
> will make it easier to experiment with different combinations of the
> two.
>
> Does that make sense? :)
>
Completely agree. I was thinking of using this as fallback for devices
that does not report airtime but tx rate.

-Rajkumar


Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-18 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

>>> Also an option to add the node at head or tail would be preferred. If
>>> return_txq adds node at head of list, then it is forcing the driver to
>>> serve same txq until it becomes empty. Also this will not allow the
>>> driver to send N frames from each txqs.
>> 
>> The whole point of this patch set is to move those kinds of decisions
>> out of the driver and into mac80211. The airtime scheduler won't achieve
>> fairness if it allows queues to be queued to the end of the rotation
>> before its deficit turns negative. And obviously there's some lag in
>> this since we're using after-the-fact airtime information.
>> 
> Hmm.. As you know ath10k kind of doing fairness by serving fixed frames
> from each txq. This approach will be removed from ath10k.
>
>> For ath9k this has not really been a problem in my tests; if the lag
>> turns out to be too great for ath10k (which I suppose is a possibility
>> since we don't get airtime information on every TX-compl), I figure we
>> can use the same estimated airtime value that is used for throttling the
>> queues to adjust the deficit immediately...
>> 
> Thats true. I am porting Kan's changes of airtime estimation for each
> msdu for firmware that does not report airtime.

Right. My thinking with this was that we could put the per-frame airtime
estimation into ieee80211_tx_dequeue(), which could track the
outstanding airtime and just return NULL if it goes over the threshold.
I think this is fairly straight-forward to do on its own; the biggest
problem is probably finding the space in the mac80211 cb?

Is this what you are working on porting? Because then I'll wait for your
patch rather than starting to write this code myself :)

This mechanism on its own will get us the queue limiting and latency
reduction goodness for firmwares with deep queues. And for that it can
be completely independent of the airtime fairness scheduler, which can
use the after-tx-compl airtime information to presumably get more
accurate fairness which includes retransmissions etc.

Now, we could *also* use the ahead-of-time airtime estimation for
fairness; either just as a fallback for drivers that can't get actual
airtime usage information for the hardware, or as an alternative in
cases where it works better for other reasons. But I think that
separating the two in the initial implementation makes more sense; that
will make it easier to experiment with different combinations of the
two.

Does that make sense? :)

-Toke


Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-18 Thread Rajkumar Manoharan

On 2018-09-18 03:29, Toke Høiland-Jørgensen wrote:
> Rajkumar Manoharan  writes:
>
>> On 2018-09-16 10:42, Toke Høiland-Jørgensen wrote:
>> return_txq() should return a bool to inform the driver that whether
>> txq is queued back or not.
>
> What would the driver do with that return value, exactly?
>
never mind.. got lost with earlier schedule_txq API.

>> Otherwise the same txq will be served indefinitely until txq becomes
>> empty. This problem occurs when the driver is running out of hw
>> descriptors or driver sends only N frames (< backlog_packets).
>
> No, if it's using next_txq(), the API guarantees that the same TXQ will
> not be returned more than once between a set of calls to
> schedule_start()/schedule_end() (by way of the seqno mechanism). I
> didn't add the same check to may_transmit(), because I assumed the
> driver would not be looping in this case. Is that not correct?
>
Yeah.. you are correct. sorry for the noise.

>> Also an option to add the node at head or tail would be preferred. If
>> return_txq adds node at head of list, then it is forcing the driver to
>> serve same txq until it becomes empty. Also this will not allow the
>> driver to send N frames from each txqs.
>
> The whole point of this patch set is to move those kinds of decisions
> out of the driver and into mac80211. The airtime scheduler won't achieve
> fairness if it allows queues to be queued to the end of the rotation
> before its deficit turns negative. And obviously there's some lag in
> this since we're using after-the-fact airtime information.
>
Hmm.. As you know ath10k kind of doing fairness by serving fixed frames
from each txq. This approach will be removed from ath10k.

> For ath9k this has not really been a problem in my tests; if the lag
> turns out to be too great for ath10k (which I suppose is a possibility
> since we don't get airtime information on every TX-compl), I figure we
> can use the same estimated airtime value that is used for throttling the
> queues to adjust the deficit immediately...
>
Thats true. I am porting Kan's changes of airtime estimation for each msdu
for firmware that does not report airtime.

-Rajkumar


Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-18 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-09-16 10:42, Toke Høiland-Jørgensen wrote:
>> +/**
>> + * ieee80211_return_txq - return a TXQ previously acquired by ieee80211_next_txq()
>> + *
>> + * @hw: pointer as obtained from ieee80211_alloc_hw()
>> + * @txq: pointer obtained from station or virtual interface
>> + *
>> + * Should only be called between calls to ieee80211_txq_schedule_start()
>> + * and ieee80211_txq_schedule_end().
>> + */
>> +void ieee80211_return_txq(struct ieee80211_hw *hw, struct ieee80211_txq *txq);
>> +
>> 
> return_txq() should return a bool to inform the driver that whether
> txq is queued back or not.

What would the driver do with that return value, exactly?

> Otherwise the same txq will be served indefinitely until txq becomes
> empty. This problem occurs when the driver is running out of hw
> descriptors or driver sends only N frames (< backlog_packets).

No, if it's using next_txq(), the API guarantees that the same TXQ will
not be returned more than once between a set of calls to
schedule_start()/schedule_end() (by way of the seqno mechanism). I
didn't add the same check to may_transmit(), because I assumed the
driver would not be looping in this case. Is that not correct?
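
The round-counter idea behind this guarantee can be sketched as follows.
This is a simplified userspace model, not the code from the patch:
struct sched, the schedule_round and round fields, run_session() and
MAX_TXQS are hypothetical names, there is a single AC, and locking is
omitted entirely.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TXQS 8                 /* hypothetical capacity of the rotation */

struct txq {
    int backlog;                   /* frames still waiting in this queue */
    uint32_t schedule_round;       /* session in which it was last handed out */
};

struct sched {
    struct txq *active[MAX_TXQS];  /* rotation, head at index 0 */
    size_t len;
    uint32_t round;                /* bumped once per scheduling session */
};

/* Models schedule_start(): begin a new session. */
void sched_start(struct sched *s)
{
    s->round++;
}

/* Models next_txq(): pop the head unless it was already served this session. */
struct txq *next_txq(struct sched *s)
{
    struct txq *q;
    size_t i;

    if (s->len == 0)
        return NULL;
    q = s->active[0];
    if (q->schedule_round == s->round)
        return NULL;               /* we have wrapped around: stop the loop */

    for (i = 1; i < s->len; i++)   /* remove the head from the rotation */
        s->active[i - 1] = s->active[i];
    s->len--;
    q->schedule_round = s->round;
    return q;
}

/* Models return_txq(): requeue at the tail only if frames remain. */
void return_txq(struct sched *s, struct txq *q)
{
    if (q->backlog > 0 && s->len < MAX_TXQS)
        s->active[s->len++] = q;
}

/* A driver-style loop sending at most `budget` frames per queue. */
int run_session(struct sched *s, int budget)
{
    struct txq *q;
    int served = 0;

    sched_start(s);
    while ((q = next_txq(s)) != NULL) {
        int n = q->backlog < budget ? q->backlog : budget;

        q->backlog -= n;           /* pretend n frames went to hardware */
        return_txq(s, q);
        served++;
    }
    return served;
}
```

In this model a driver that sends only N frames per queue still terminates:
once the first queue comes back to the head with the current round stamped on
it, next_txq() returns NULL and the loop ends.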

> Also an option to add the node at head or tail would be preferred. If
> return_txq adds node at head of list, then it is forcing the driver to
> serve same txq until it becomes empty. Also this will not allow the
> driver to send N frames from each txqs.

The whole point of this patch set is to move those kinds of decisions
out of the driver and into mac80211. The airtime scheduler won't achieve
fairness if it allows queues to be queued to the end of the rotation
before its deficit turns negative. And obviously there's some lag in
this since we're using after-the-fact airtime information.

For ath9k this has not really been a problem in my tests; if the lag
turns out to be too great for ath10k (which I suppose is a possibility
since we don't get airtime information on every TX-compl), I figure we
can use the same estimated airtime value that is used for throttling the
queues to adjust the deficit immediately...
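
As a rough illustration of the deficit rule described above (a queue stays
at the head of the rotation until its deficit turns negative, and airtime
is charged after the fact), here is a small userspace sketch. It is an
assumption about the general shape of such a scheduler, not code from this
series: struct sta, struct rotation, next_sta(), report_airtime(), serve(),
NSTA and QUANTUM_US are all hypothetical, and the rotation is assumed to be
non-empty.

```c
#include <stddef.h>
#include <stdint.h>

#define NSTA 2                     /* hypothetical: two stations in rotation */
#define QUANTUM_US 300             /* hypothetical airtime quantum per round */

struct sta {
    int32_t deficit;               /* remaining airtime credit, us */
    uint32_t airtime_used;         /* total airtime reported so far, us */
    uint32_t frame_cost_us;        /* airtime one frame costs this station */
};

struct rotation {
    struct sta *order[NSTA];       /* head of the rotation at index 0 */
};

/* Return the head station; rotate stations whose deficit is negative. */
struct sta *next_sta(struct rotation *r)
{
    for (;;) {
        struct sta *head = r->order[0];
        size_t i;

        if (head->deficit >= 0)
            return head;           /* still has credit: keep serving it */

        head->deficit += QUANTUM_US;          /* refill, then move to tail */
        for (i = 1; i < NSTA; i++)
            r->order[i - 1] = r->order[i];
        r->order[NSTA - 1] = head;
    }
}

/* After-the-fact accounting, as reported on TX completion. */
void report_airtime(struct sta *s, uint32_t used_us)
{
    s->deficit -= (int32_t)used_us;
    s->airtime_used += used_us;
}

/* Send `frames` frames in total, one per scheduling decision. */
void serve(struct rotation *r, int frames)
{
    int i;

    for (i = 0; i < frames; i++) {
        struct sta *s = next_sta(r);

        report_airtime(s, s->frame_cost_us);
    }
}
```

Because a station is only moved to the tail once its deficit has gone
negative, a slow station that overshoots on a single frame has to sit out
several rounds, which keeps long-run airtime roughly equal between a fast
and a slow station.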

>> +/**
>> + * ieee80211_txq_schedule_start - acquire locks for safe scheduling of an AC
>> + *
>> + * @hw: pointer as obtained from ieee80211_alloc_hw()
>> + * @ac: AC number to acquire locks for
>> + *
>> + * Acquire locks needed to schedule TXQs from the given AC. Should be called
>> + * before ieee80211_next_txq() or ieee80211_schedule_txq().
>> + */
> Typo error. s/schedule_txq()/return_txq()/.

Yup, will fix :)

-Toke


Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-17 Thread Rajkumar Manoharan

On 2018-09-16 10:42, Toke Høiland-Jørgensen wrote:
> +/**
> + * ieee80211_return_txq - return a TXQ previously acquired by ieee80211_next_txq()
> + *
> + * @hw: pointer as obtained from ieee80211_alloc_hw()
> + * @txq: pointer obtained from station or virtual interface
> + *
> + * Should only be called between calls to ieee80211_txq_schedule_start()
> + * and ieee80211_txq_schedule_end().
> + */
> +void ieee80211_return_txq(struct ieee80211_hw *hw, struct ieee80211_txq *txq);
> +
>
return_txq() should return a bool to inform the driver that whether txq is
queued back or not. Otherwise the same txq will be served indefinitely
until txq becomes empty. This problem occurs when the driver is running out
of hw descriptors or driver sends only N frames (< backlog_packets).

Also an option to add the node at head or tail would be preferred. If
return_txq adds node at head of list, then it is forcing the driver to
serve same txq until it becomes empty. Also this will not allow the driver
to send N frames from each txqs.

> +/**
> + * ieee80211_txq_schedule_start - acquire locks for safe scheduling of an AC
> + *
> + * @hw: pointer as obtained from ieee80211_alloc_hw()
> + * @ac: AC number to acquire locks for
> + *
> + * Acquire locks needed to schedule TXQs from the given AC. Should be called
> + * before ieee80211_next_txq() or ieee80211_schedule_txq().
> + */
>
Typo error. s/schedule_txq()/return_txq()/.

-Rajkumar


[PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API

2018-09-16 Thread Toke Høiland-Jørgensen
This adds an API to mac80211 to handle scheduling of TXQs. The interface
between driver and mac80211 for TXQ handling is changed by adding two new
functions: ieee80211_next_txq(), which will return the next TXQ to schedule
in the current round-robin rotation, and ieee80211_return_txq(), which the
driver uses to indicate that it has finished scheduling a TXQ (which will
then be put back in the scheduling rotation if it isn't empty).

The driver must call ieee80211_txq_schedule_start() at the start of each
scheduling session, and ieee80211_txq_schedule_end() at the end. The API
then guarantees that the same TXQ is not returned twice in the same
session (so a driver can loop on ieee80211_next_txq() without worrying
about breaking the loop).

Usage of the new API is optional, so drivers can be ported one at a time.
In this patch, the actual scheduling performed by mac80211 is simple
round-robin, but a subsequent commit adds airtime fairness awareness to the
scheduler.

Signed-off-by: Toke Høiland-Jørgensen 
---
 include/net/mac80211.h     |   62 +---
 net/mac80211/agg-tx.c      |    2 +
 net/mac80211/driver-ops.h  |    9 ++
 net/mac80211/ieee80211_i.h |    9 ++
 net/mac80211/main.c        |    5 
 net/mac80211/sta_info.c    |    2 +
 net/mac80211/tx.c          |   59 +-
 7 files changed, 141 insertions(+), 7 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index c4fadbafbf21..5ca1484cba58 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -108,9 +108,16 @@
  * The driver is expected to initialize its private per-queue data for stations
  * and interfaces in the .add_interface and .sta_add ops.
  *
- * The driver can't access the queue directly. To dequeue a frame, it calls
- * ieee80211_tx_dequeue(). Whenever mac80211 adds a new frame to a queue, it
- * calls the .wake_tx_queue driver op.
+ * The driver can't access the queue directly. To dequeue a frame from a
+ * txq, it calls ieee80211_tx_dequeue(). Whenever mac80211 adds a new frame to a
+ * queue, it calls the .wake_tx_queue driver op.
+ *
+ * Drivers can optionally delegate responsibility for scheduling queues to
+ * mac80211, to take advantage of airtime fairness accounting. In this case, to
+ * obtain the next queue to pull frames from, the driver calls
+ * ieee80211_next_txq(). The driver is then expected to re-schedule the txq
+ * using ieee80211_schedule_txq() if it is still active after the driver has
+ * finished pulling packets from it.
  *
  * For AP powersave TIM handling, the driver only needs to indicate if it has
  * buffered packets in the driver specific data structures by calling
@@ -6045,13 +6052,60 @@ void ieee80211_unreserve_tid(struct ieee80211_sta *sta, u8 tid);
  * ieee80211_tx_dequeue - dequeue a packet from a software tx queue
  *
  * @hw: pointer as obtained from ieee80211_alloc_hw()
- * @txq: pointer obtained from station or virtual interface
+ * @txq: pointer obtained from station or virtual interface, or from
+ *   ieee80211_next_txq()
  *
  * Returns the skb if successful, %NULL if no frame was available.
  */
 struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 struct ieee80211_txq *txq);
 
+/**
+ * ieee80211_next_txq - get next tx queue to pull packets from
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @ac: AC number to return packets from.
+ *
+ * Should only be called between calls to ieee80211_txq_schedule_start()
+ * and ieee80211_txq_schedule_end().
+ * Returns the next txq if successful, %NULL if no queue is eligible. If a txq
+ * is returned, it should be returned with ieee80211_return_txq() after the
+ * driver has finished scheduling it.
+ */
+struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac);
+
+/**
+ * ieee80211_return_txq - return a TXQ previously acquired by ieee80211_next_txq()
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Should only be called between calls to ieee80211_txq_schedule_start()
+ * and ieee80211_txq_schedule_end().
+ */
+void ieee80211_return_txq(struct ieee80211_hw *hw, struct ieee80211_txq *txq);
+
+/**
+ * ieee80211_txq_schedule_start - acquire locks for safe scheduling of an AC
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @ac: AC number to acquire locks for
+ *
+ * Acquire locks needed to schedule TXQs from the given AC. Should be called
+ * before ieee80211_next_txq() or ieee80211_schedule_txq().
+ */
+void ieee80211_txq_schedule_start(struct ieee80211_hw *hw, u8 ac);
+
+/**
+ * ieee80211_txq_schedule_end - release locks for safe scheduling of an AC
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @ac: AC number to release locks for
+ *
+ * Release locks previously acquired by ieee80211_txq_schedule_start().
+ */
+void ieee80211_txq_schedule_end(struct