Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
On 2018-09-19 07:50, Toke Høiland-Jørgensen wrote:
> Kalle Valo writes:
>
>> Toke Høiland-Jørgensen writes:
>>
>>>> Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So
>>>> I also applied this "ath10k: report tx rate using ieee80211_tx_status"
>>>> change.
>>>
>>> Yeah, that and the patch that computes the last used rate will probably
>>> be necessary; but they can be pretty much applied as-is, right?
>>
>> Unfortunately not. I think the plan is now to follow Johannes' proposal:
>>
>> "I'd recommend against doing this and disentangling the necessary
>> code in mac80211, e.g. with ieee80211_tx_status_ext() or adding
>> similar APIs."
>>
>> https://patchwork.kernel.org/patch/10353959/
>
> Ahh, right... *that* patch :)
>
> Was thinking on this one with the "as-is" comment:
> https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588189

It is useful only when the driver calls tx_status_noskb(). It was
recommended not to call both the tx_status() and tx_status_noskb() APIs
from the same driver. Hence Anil was trying to piggyback the tx rate
report on tx_status itself.

https://chromium.googlesource.com/chromiumos/third_party/kernel/+/1e034d84bd444fd29b7f902c5e033a8c737a58b2%5E%21/
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/2a8da427fc9dfb527516e7ac395b1e6af73bff84%5E%21/

-Rajkumar
Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
Kalle Valo writes:

> Toke Høiland-Jørgensen writes:
>
>>> Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So
>>> I also applied this "ath10k: report tx rate using ieee80211_tx_status"
>>> change.
>>
>> Yeah, that and the patch that computes the last used rate will probably
>> be necessary; but they can be pretty much applied as-is, right?
>
> Unfortunately not. I think the plan is now to follow Johannes' proposal:
>
> "I'd recommend against doing this and disentangling the necessary
> code in mac80211, e.g. with ieee80211_tx_status_ext() or adding
> similar APIs."
>
> https://patchwork.kernel.org/patch/10353959/

Ahh, right... *that* patch :)

Was thinking on this one with the "as-is" comment:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588189

-Toke
Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
Toke Høiland-Jørgensen writes:

>> Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So
>> I also applied this "ath10k: report tx rate using ieee80211_tx_status"
>> change.
>
> Yeah, that and the patch that computes the last used rate will probably
> be necessary; but they can be pretty much applied as-is, right?

Unfortunately not. I think the plan is now to follow Johannes' proposal:

"I'd recommend against doing this and disentangling the necessary
code in mac80211, e.g. with ieee80211_tx_status_ext() or adding
similar APIs."

https://patchwork.kernel.org/patch/10353959/

-- 
Kalle Valo
Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
Rajkumar Manoharan writes:

> On 2018-09-18 13:41, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan writes:
>>
>>>>> Also an option to add the node at head or tail would be preferred. If
>>>>> return_txq adds node at head of list, then it is forcing the driver to
>>>>> serve same txq until it becomes empty. Also this will not allow the
>>>>> driver to send N frames from each txq.
>>>>
>>>> The whole point of this patch set is to move those kinds of decisions
>>>> out of the driver and into mac80211. The airtime scheduler won't achieve
>>>> fairness if it allows queues to be queued to the end of the rotation
>>>> before its deficit turns negative. And obviously there's some lag in
>>>> this since we're using after-the-fact airtime information.
>>>
>>> Hmm.. As you know ath10k is kind of doing fairness by serving fixed
>>> frames from each txq. This approach will be removed from ath10k.
>>>
>>>> For ath9k this has not really been a problem in my tests; if the lag
>>>> turns out to be too great for ath10k (which I suppose is a possibility
>>>> since we don't get airtime information on every TX-compl), I figure we
>>>> can use the same estimated airtime value that is used for throttling
>>>> the queues to adjust the deficit immediately...
>>>
>>> That's true. I am porting Kan's changes of airtime estimation for each
>>> msdu for firmware that does not report airtime.
>>
>> Right. My thinking with this was that we could put the per-frame airtime
>> estimation into ieee80211_tx_dequeue(), which could track the
>> outstanding airtime and just return NULL if it goes over the threshold.
>> I think this is fairly straightforward to do on its own; the biggest
>> problem is probably finding the space in the mac80211 cb?
>>
>> Is this what you are working on porting? Because then I'll wait for your
>> patch rather than starting to write this code myself :)
>>
> Kind of.. something like below.
>
> tx_dequeue(){
>     compute airtime_est from last_tx_rate
>     if (sta->airtime[ac].deficit < airtime_est)
>         return NULL;
>     dequeue skb and store airtime_est in cb
> }

I think I would decouple it further and not use the deficit. But rather:

tx_dequeue(){
    if (sta->airtime[ac].outstanding > AIRTIME_OUTSTANDING_MAX)
        return NULL;
    compute airtime_est from last_tx_rate
    dequeue skb and store airtime_est in cb
    sta->airtime[ac].outstanding += airtime_est;
}

> Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So
> I also applied this "ath10k: report tx rate using ieee80211_tx_status"
> change.

Yeah, that and the patch that computes the last used rate will probably
be necessary; but they can be pretty much applied as-is, right?

>> This mechanism on its own will get us the queue limiting and latency
>> reduction goodness for firmwares with deep queues. And for that it can
>> be completely independent of the airtime fairness scheduler, which can
>> use the after-tx-compl airtime information to presumably get more
>> accurate fairness which includes retransmissions etc.
>>
>> Now, we could *also* use the ahead-of-time airtime estimation for
>> fairness; either just as a fallback for drivers that can't get actual
>> airtime usage information for the hardware, or as an alternative in
>> cases where it works better for other reasons. But I think that
>> separating the two in the initial implementation makes more sense; that
>> will make it easier to experiment with different combinations of the
>> two.
>>
>> Does that make sense? :)
>>
> Completely agree. I was thinking of using this as a fallback for devices
> that do not report airtime but do report the tx rate.

Great! Seems we are converging on a workable solution, then :)

-Toke
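The outstanding-airtime variant sketched above can be made concrete as a small, standalone C model. This is not mac80211 code: `struct sta_ac`, `tx_dequeue()`, `tx_complete()`, `estimate_airtime_us()`, the 4000 us cap and the 6 Mbit/s fallback rate are all hypothetical, and the estimate deliberately ignores preamble, ACK, retries and aggregation. It only illustrates the bookkeeping: check the cap before dequeuing, charge the estimate at dequeue time (stored with the frame, as the thread proposes doing in the cb), and release that same stored estimate on TX completion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit on estimated airtime in flight per station/AC (us). */
#define AIRTIME_OUTSTANDING_MAX_US 4000u
/* Hypothetical rate assumed when no tx rate has been reported yet. */
#define FALLBACK_RATE_KBPS 6000u

/* Hypothetical frame; airtime_est plays the role of a field in the skb cb. */
struct frame {
	uint32_t len;         /* bytes */
	uint32_t airtime_est; /* us, filled in at dequeue time */
};

/* Hypothetical per-station, per-AC state. */
struct sta_ac {
	uint32_t last_tx_rate_kbps; /* learned from TX status reports */
	uint32_t outstanding_us;    /* estimated airtime handed to firmware */
	struct frame *queue;        /* backlog, consumed front to back */
	size_t head, len;
};

/* Naive estimate: payload bits divided by the last reported rate.
 * bits / (kbit/s) gives ms, so multiply by 1000 to get us. */
static uint32_t estimate_airtime_us(uint32_t len_bytes, uint32_t rate_kbps)
{
	if (rate_kbps == 0)
		rate_kbps = FALLBACK_RATE_KBPS;
	return (uint32_t)(((uint64_t)len_bytes * 8u * 1000u) / rate_kbps);
}

/* Throttled dequeue: refuse once too much estimated airtime is in flight. */
static struct frame *tx_dequeue(struct sta_ac *s)
{
	struct frame *f;

	if (s->outstanding_us > AIRTIME_OUTSTANDING_MAX_US)
		return NULL;
	if (s->head == s->len)
		return NULL; /* nothing queued */

	f = &s->queue[s->head++];
	f->airtime_est = estimate_airtime_us(f->len, s->last_tx_rate_kbps);
	s->outstanding_us += f->airtime_est;
	return f;
}

/* TX completion: give back exactly what was charged at dequeue time. */
static void tx_complete(struct sta_ac *s, const struct frame *f)
{
	if (f->airtime_est > s->outstanding_us)
		s->outstanding_us = 0;
	else
		s->outstanding_us -= f->airtime_est;
}
```

Because the cap is checked before the frame is charged, the outstanding total can overshoot the limit by at most one frame's estimate. With 1500-byte frames at 6 Mbit/s (2000 us each), three frames get through before dequeue starts returning NULL, and one completion re-opens the gate.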
Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
On 2018-09-18 13:41, Toke Høiland-Jørgensen wrote:
> Rajkumar Manoharan writes:
>
>>>> Also an option to add the node at head or tail would be preferred. If
>>>> return_txq adds node at head of list, then it is forcing the driver to
>>>> serve same txq until it becomes empty. Also this will not allow the
>>>> driver to send N frames from each txq.
>>>
>>> The whole point of this patch set is to move those kinds of decisions
>>> out of the driver and into mac80211. The airtime scheduler won't achieve
>>> fairness if it allows queues to be queued to the end of the rotation
>>> before its deficit turns negative. And obviously there's some lag in
>>> this since we're using after-the-fact airtime information.
>>
>> Hmm.. As you know ath10k is kind of doing fairness by serving fixed
>> frames from each txq. This approach will be removed from ath10k.
>>
>>> For ath9k this has not really been a problem in my tests; if the lag
>>> turns out to be too great for ath10k (which I suppose is a possibility
>>> since we don't get airtime information on every TX-compl), I figure we
>>> can use the same estimated airtime value that is used for throttling
>>> the queues to adjust the deficit immediately...
>>
>> That's true. I am porting Kan's changes of airtime estimation for each
>> msdu for firmware that does not report airtime.
>
> Right. My thinking with this was that we could put the per-frame airtime
> estimation into ieee80211_tx_dequeue(), which could track the
> outstanding airtime and just return NULL if it goes over the threshold.
> I think this is fairly straightforward to do on its own; the biggest
> problem is probably finding the space in the mac80211 cb?
>
> Is this what you are working on porting? Because then I'll wait for your
> patch rather than starting to write this code myself :)

Kind of.. something like below.

tx_dequeue(){
    compute airtime_est from last_tx_rate
    if (sta->airtime[ac].deficit < airtime_est)
        return NULL;
    dequeue skb and store airtime_est in cb
}

Unfortunately ath10k is not reporting last_tx_rate in tx_status(). So
I also applied this "ath10k: report tx rate using ieee80211_tx_status"
change.

> This mechanism on its own will get us the queue limiting and latency
> reduction goodness for firmwares with deep queues. And for that it can
> be completely independent of the airtime fairness scheduler, which can
> use the after-tx-compl airtime information to presumably get more
> accurate fairness which includes retransmissions etc.
>
> Now, we could *also* use the ahead-of-time airtime estimation for
> fairness; either just as a fallback for drivers that can't get actual
> airtime usage information for the hardware, or as an alternative in
> cases where it works better for other reasons. But I think that
> separating the two in the initial implementation makes more sense; that
> will make it easier to experiment with different combinations of the
> two.
>
> Does that make sense? :)

Completely agree. I was thinking of using this as a fallback for devices
that do not report airtime but do report the tx rate.

-Rajkumar
Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
Rajkumar Manoharan writes:

>>> Also an option to add the node at head or tail would be preferred. If
>>> return_txq adds node at head of list, then it is forcing the driver to
>>> serve same txq until it becomes empty. Also this will not allow the
>>> driver to send N frames from each txq.
>>
>> The whole point of this patch set is to move those kinds of decisions
>> out of the driver and into mac80211. The airtime scheduler won't achieve
>> fairness if it allows queues to be queued to the end of the rotation
>> before its deficit turns negative. And obviously there's some lag in
>> this since we're using after-the-fact airtime information.
>>
> Hmm.. As you know ath10k is kind of doing fairness by serving fixed
> frames from each txq. This approach will be removed from ath10k.
>
>> For ath9k this has not really been a problem in my tests; if the lag
>> turns out to be too great for ath10k (which I suppose is a possibility
>> since we don't get airtime information on every TX-compl), I figure we
>> can use the same estimated airtime value that is used for throttling
>> the queues to adjust the deficit immediately...
>>
> That's true. I am porting Kan's changes of airtime estimation for each
> msdu for firmware that does not report airtime.

Right. My thinking with this was that we could put the per-frame airtime
estimation into ieee80211_tx_dequeue(), which could track the
outstanding airtime and just return NULL if it goes over the threshold.
I think this is fairly straightforward to do on its own; the biggest
problem is probably finding the space in the mac80211 cb?

Is this what you are working on porting? Because then I'll wait for your
patch rather than starting to write this code myself :)

This mechanism on its own will get us the queue limiting and latency
reduction goodness for firmwares with deep queues. And for that it can
be completely independent of the airtime fairness scheduler, which can
use the after-tx-compl airtime information to presumably get more
accurate fairness which includes retransmissions etc.

Now, we could *also* use the ahead-of-time airtime estimation for
fairness; either just as a fallback for drivers that can't get actual
airtime usage information for the hardware, or as an alternative in
cases where it works better for other reasons. But I think that
separating the two in the initial implementation makes more sense; that
will make it easier to experiment with different combinations of the
two.

Does that make sense? :)

-Toke
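The fairness rule discussed in this thread (a queue only goes to the end of the rotation once its deficit has turned negative, with the deficit charged from after-the-fact airtime) is essentially deficit round robin. Below is a hypothetical standalone sketch of that rule; the names, the fixed 300 us quantum and the array-based rotation are assumptions for illustration, not the mac80211 implementation, which this v4 patch still leaves as plain round-robin.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TXQS   8
#define QUANTUM_US 300 /* hypothetical airtime credit added per rotation */

struct txq {
	int id;
	int backlog; /* frames waiting */
	int deficit; /* airtime credit in us; may go negative */
};

struct sched {
	struct txq *rot[MAX_TXQS]; /* rotation order, rot[0] is the head */
	size_t n;
};

static void move_head_to_tail(struct sched *s)
{
	struct txq *h = s->rot[0];

	for (size_t i = 1; i < s->n; i++)
		s->rot[i - 1] = s->rot[i];
	s->rot[s->n - 1] = h;
}

static void remove_head(struct sched *s)
{
	for (size_t i = 1; i < s->n; i++)
		s->rot[i - 1] = s->rot[i];
	s->n--;
}

/* Hand out the head txq once it has non-negative credit. A txq that has
 * overspent gets one quantum and is moved to the end of the rotation.
 * Terminates because every pass adds QUANTUM_US > 0 to some deficit. */
static struct txq *next_txq(struct sched *s)
{
	while (s->n > 0) {
		struct txq *t = s->rot[0];

		if (t->deficit < 0) {
			t->deficit += QUANTUM_US;
			move_head_to_tail(s);
			continue;
		}
		return t;
	}
	return NULL;
}

/* A backlogged txq stays at the head and keeps its turn; an empty one
 * leaves the rotation. */
static void return_txq(struct sched *s, struct txq *t)
{
	if (s->n > 0 && s->rot[0] == t && t->backlog == 0)
		remove_head(s);
}

/* Charge airtime reported after TX completion. */
static void account_airtime(struct txq *t, int airtime_us)
{
	t->deficit -= airtime_us;
}
```

In this model, pushing the returned queue to the tail immediately (the "add at tail" option requested earlier in the thread) would hand every queue the same number of turns regardless of how much airtime each turn consumed, which is the behavior the deficit check is meant to avoid.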
Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
On 2018-09-18 03:29, Toke Høiland-Jørgensen wrote:
> Rajkumar Manoharan writes:
>
>> On 2018-09-16 10:42, Toke Høiland-Jørgensen wrote:
>> return_txq() should return a bool to inform the driver whether the txq
>> is queued back or not.
>
> What would the driver do with that return value, exactly?

Never mind.. got lost with the earlier schedule_txq API.

>> Otherwise the same txq will be served indefinitely until txq becomes
>> empty. This problem occurs when the driver is running out of hw
>> descriptors or driver sends only N frames (< backlog_packets).
>
> No, if it's using next_txq(), the API guarantees that the same TXQ will
> not be returned more than once between a set of calls to
> schedule_start()/schedule_end() (by way of the seqno mechanism). I didn't
> add the same check to may_transmit(), because I assumed the driver would
> not be looping in this case. Is that not correct?

Yeah.. you are correct. Sorry for the noise.

>> Also an option to add the node at head or tail would be preferred. If
>> return_txq adds node at head of list, then it is forcing the driver to
>> serve same txq until it becomes empty. Also this will not allow the
>> driver to send N frames from each txq.
>
> The whole point of this patch set is to move those kinds of decisions
> out of the driver and into mac80211. The airtime scheduler won't achieve
> fairness if it allows queues to be queued to the end of the rotation
> before its deficit turns negative. And obviously there's some lag in
> this since we're using after-the-fact airtime information.

Hmm.. As you know ath10k is kind of doing fairness by serving fixed
frames from each txq. This approach will be removed from ath10k.

> For ath9k this has not really been a problem in my tests; if the lag
> turns out to be too great for ath10k (which I suppose is a possibility
> since we don't get airtime information on every TX-compl), I figure we
> can use the same estimated airtime value that is used for throttling
> the queues to adjust the deficit immediately...

That's true. I am porting Kan's changes of airtime estimation for each
msdu for firmware that does not report airtime.

-Rajkumar
Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
Rajkumar Manoharan writes:

> On 2018-09-16 10:42, Toke Høiland-Jørgensen wrote:
>> +/**
>> + * ieee80211_return_txq - return a TXQ previously acquired by ieee80211_next_txq()
>> + *
>> + * @hw: pointer as obtained from ieee80211_alloc_hw()
>> + * @txq: pointer obtained from station or virtual interface
>> + *
>> + * Should only be called between calls to ieee80211_txq_schedule_start()
>> + * and ieee80211_txq_schedule_end().
>> + */
>> +void ieee80211_return_txq(struct ieee80211_hw *hw, struct ieee80211_txq *txq);
>> +
>>
> return_txq() should return a bool to inform the driver whether the txq
> is queued back or not.

What would the driver do with that return value, exactly?

> Otherwise the same txq will be served indefinitely until txq becomes
> empty. This problem occurs when the driver is running out of hw
> descriptors or driver sends only N frames (< backlog_packets).

No, if it's using next_txq(), the API guarantees that the same TXQ will
not be returned more than once between a set of calls to
schedule_start()/schedule_end() (by way of the seqno mechanism). I didn't
add the same check to may_transmit(), because I assumed the driver would
not be looping in this case. Is that not correct?

> Also an option to add the node at head or tail would be preferred. If
> return_txq adds node at head of list, then it is forcing the driver to
> serve same txq until it becomes empty. Also this will not allow the
> driver to send N frames from each txq.

The whole point of this patch set is to move those kinds of decisions
out of the driver and into mac80211. The airtime scheduler won't achieve
fairness if it allows queues to be queued to the end of the rotation
before its deficit turns negative. And obviously there's some lag in
this since we're using after-the-fact airtime information.

For ath9k this has not really been a problem in my tests; if the lag
turns out to be too great for ath10k (which I suppose is a possibility
since we don't get airtime information on every TX-compl), I figure we
can use the same estimated airtime value that is used for throttling
the queues to adjust the deficit immediately...

>> +/**
>> + * ieee80211_txq_schedule_start - acquire locks for safe scheduling of an AC
>> + *
>> + * @hw: pointer as obtained from ieee80211_alloc_hw()
>> + * @ac: AC number to acquire locks for
>> + *
>> + * Acquire locks needed to schedule TXQs from the given AC. Should be called
>> + * before ieee80211_next_txq() or ieee80211_schedule_txq().
>> + */
> Typo error. s/schedule_txq()/return_txq()/.

Yup, will fix :)

-Toke
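The per-session guarantee described above (next_txq() returns a given TXQ at most once between schedule_start() and schedule_end(), enforced with a sequence number) can be modeled as follows. This is a hypothetical, simplified C model using plain round-robin selection, which is what the v4 patch does before airtime fairness is layered on; the struct layout, the `schedule_round` field and `driver_push()` are invented for illustration and are not the mac80211 API. It shows that a driver pushing only N frames per TXQ still terminates its loop even when a TXQ stays backlogged, which was the concern raised in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TXQS 3

struct txq {
	int backlog;                 /* frames waiting */
	unsigned int schedule_round; /* round in which it was last handed out */
};

struct sched {
	struct txq txqs[NUM_TXQS];
	size_t pos;         /* round-robin cursor */
	unsigned int round; /* sequence number, bumped per session */
	bool in_session;
};

static void schedule_start(struct sched *s)
{
	s->round++; /* every txq becomes eligible once more */
	s->in_session = true;
}

static void schedule_end(struct sched *s)
{
	s->in_session = false;
}

/* Round robin over backlogged txqs, skipping any txq already handed out
 * in the current round, so a looping caller always runs out of txqs. */
static struct txq *next_txq(struct sched *s)
{
	assert(s->in_session); /* only valid between start and end */
	for (size_t i = 0; i < NUM_TXQS; i++) {
		struct txq *t = &s->txqs[(s->pos + i) % NUM_TXQS];

		if (t->backlog == 0 || t->schedule_round == s->round)
			continue;
		t->schedule_round = s->round;
		s->pos = (s->pos + i + 1) % NUM_TXQS;
		return t;
	}
	return NULL;
}

/* Hypothetical driver loop: push at most 'budget' frames per txq. */
static int driver_push(struct sched *s, int budget)
{
	struct txq *t;
	int sent = 0;

	schedule_start(s);
	while ((t = next_txq(s)) != NULL) {
		for (int n = 0; n < budget && t->backlog > 0; n++) {
			t->backlog--; /* stands in for dequeuing one frame */
			sent++;
		}
		/* the txq would be handed back via a return_txq() call here */
	}
	schedule_end(s);
	return sent;
}
```

With backlogs of 5, 0 and 2 frames and a budget of 2, the first session sends 4 frames and stops although the first queue still holds 3; that queue becomes eligible again in the next session, when schedule_start() bumps the round number.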
Re: [PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
On 2018-09-16 10:42, Toke Høiland-Jørgensen wrote:
> +/**
> + * ieee80211_return_txq - return a TXQ previously acquired by ieee80211_next_txq()
> + *
> + * @hw: pointer as obtained from ieee80211_alloc_hw()
> + * @txq: pointer obtained from station or virtual interface
> + *
> + * Should only be called between calls to ieee80211_txq_schedule_start()
> + * and ieee80211_txq_schedule_end().
> + */
> +void ieee80211_return_txq(struct ieee80211_hw *hw, struct ieee80211_txq *txq);
> +

return_txq() should return a bool to inform the driver whether the txq is
queued back or not. Otherwise the same txq will be served indefinitely
until txq becomes empty. This problem occurs when the driver is running
out of hw descriptors or driver sends only N frames (< backlog_packets).

Also an option to add the node at head or tail would be preferred. If
return_txq adds node at head of list, then it is forcing the driver to
serve same txq until it becomes empty. Also this will not allow the
driver to send N frames from each txq.

> +/**
> + * ieee80211_txq_schedule_start - acquire locks for safe scheduling of an AC
> + *
> + * @hw: pointer as obtained from ieee80211_alloc_hw()
> + * @ac: AC number to acquire locks for
> + *
> + * Acquire locks needed to schedule TXQs from the given AC. Should be called
> + * before ieee80211_next_txq() or ieee80211_schedule_txq().
> + */

Typo error. s/schedule_txq()/return_txq()/.

-Rajkumar
[PATCH RFC v4 1/4] mac80211: Add TXQ scheduling API
This adds an API to mac80211 to handle scheduling of TXQs. The interface
between driver and mac80211 for TXQ handling is changed by adding two new
functions: ieee80211_next_txq(), which will return the next TXQ to
schedule in the current round-robin rotation, and ieee80211_return_txq(),
which the driver uses to indicate that it has finished scheduling a TXQ
(which will then be put back in the scheduling rotation if it isn't
empty).

The driver must call ieee80211_txq_schedule_start() at the start of each
scheduling session, and ieee80211_txq_schedule_end() at the end. The API
then guarantees that the same TXQ is not returned twice in the same
session (so a driver can loop on ieee80211_next_txq() without worrying
about breaking the loop).

Usage of the new API is optional, so drivers can be ported one at a time.
In this patch, the actual scheduling performed by mac80211 is simple
round-robin, but a subsequent commit adds airtime fairness awareness to
the scheduler.

Signed-off-by: Toke Høiland-Jørgensen
---
 include/net/mac80211.h     | 62 +---
 net/mac80211/agg-tx.c      |  2 +
 net/mac80211/driver-ops.h  |  9 ++
 net/mac80211/ieee80211_i.h |  9 ++
 net/mac80211/main.c        |  5
 net/mac80211/sta_info.c    |  2 +
 net/mac80211/tx.c          | 59 +-
 7 files changed, 141 insertions(+), 7 deletions(-)

diff --git a/include/net/mac80211.h b/include/net/mac80211.h
index c4fadbafbf21..5ca1484cba58 100644
--- a/include/net/mac80211.h
+++ b/include/net/mac80211.h
@@ -108,9 +108,16 @@
  * The driver is expected to initialize its private per-queue data for stations
  * and interfaces in the .add_interface and .sta_add ops.
  *
- * The driver can't access the queue directly. To dequeue a frame, it calls
- * ieee80211_tx_dequeue(). Whenever mac80211 adds a new frame to a queue, it
- * calls the .wake_tx_queue driver op.
+ * The driver can't access the queue directly. To dequeue a frame from a
+ * txq, it calls ieee80211_tx_dequeue(). Whenever mac80211 adds a new frame to a
+ * queue, it calls the .wake_tx_queue driver op.
+ *
+ * Drivers can optionally delegate responsibility for scheduling queues to
+ * mac80211, to take advantage of airtime fairness accounting. In this case, to
+ * obtain the next queue to pull frames from, the driver calls
+ * ieee80211_next_txq(). The driver is then expected to re-schedule the txq
+ * using ieee80211_schedule_txq() if it is still active after the driver has
+ * finished pulling packets from it.
  *
  * For AP powersave TIM handling, the driver only needs to indicate if it has
  * buffered packets in the driver specific data structures by calling
@@ -6045,13 +6052,60 @@ void ieee80211_unreserve_tid(struct ieee80211_sta *sta, u8 tid);
  * ieee80211_tx_dequeue - dequeue a packet from a software tx queue
  *
  * @hw: pointer as obtained from ieee80211_alloc_hw()
- * @txq: pointer obtained from station or virtual interface
+ * @txq: pointer obtained from station or virtual interface, or from
+ * ieee80211_next_txq()
  *
  * Returns the skb if successful, %NULL if no frame was available.
  */
 struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
                                      struct ieee80211_txq *txq);

+/**
+ * ieee80211_next_txq - get next tx queue to pull packets from
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @ac: AC number to return packets from.
+ *
+ * Should only be called between calls to ieee80211_txq_schedule_start()
+ * and ieee80211_txq_schedule_end().
+ * Returns the next txq if successful, %NULL if no queue is eligible. If a txq
+ * is returned, it should be returned with ieee80211_return_txq() after the
+ * driver has finished scheduling it.
+ */
+struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac);
+
+/**
+ * ieee80211_return_txq - return a TXQ previously acquired by ieee80211_next_txq()
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ *
+ * Should only be called between calls to ieee80211_txq_schedule_start()
+ * and ieee80211_txq_schedule_end().
+ */
+void ieee80211_return_txq(struct ieee80211_hw *hw, struct ieee80211_txq *txq);
+
+/**
+ * ieee80211_txq_schedule_start - acquire locks for safe scheduling of an AC
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @ac: AC number to acquire locks for
+ *
+ * Acquire locks needed to schedule TXQs from the given AC. Should be called
+ * before ieee80211_next_txq() or ieee80211_schedule_txq().
+ */
+void ieee80211_txq_schedule_start(struct ieee80211_hw *hw, u8 ac);
+
+/**
+ * ieee80211_txq_schedule_end - release locks for safe scheduling of an AC
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @ac: AC number to acquire locks for
+ *
+ * Release locks previously acquired by ieee80211_txq_schedule_end().
+ */
+void ieee80211_txq_schedule_end(struct