Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-08-29 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Wed, 2018-08-29 at 11:22 +0200, Toke Høiland-Jørgensen wrote:
>
>> > We're also working on adding a TXQ for (bufferable) management packets
>> > - I wonder how that should interact here? Always be scheduled first?
>> 
>> Ah, cool! It may be that it should be given priority, yeah. Presently,
>> the multicast queue just rotates in the RR with the others, but is never
>> throttled as it doesn't have an airtime measure (though perhaps it
>> should)? But that might not be desirable for management frames, as
>> presumable those need to go out as fast as possible?
>
> Good question. I guess the multicast should have an airtime measure,
> but I don't think we want to do accounting on the management. That
> really should be few frames, and we want them out ASAP in most cases.

Yup, makes sense.

>> Hmm, I seem to recall thinking about just putting the call to
>> schedule_txq() into drv_wake_tx_queue; don't remember why I ended up
>> dropping that; but will take another look at whether it makes sense to
>> combine things.
>
> I was thinking the other way around - but that doesn't work since you'd
> loop from the driver to itself. This way might work, I guess, hadn't
> considered that.
>
> Might be better anyway though to make a new inline that does both, since
> the drv_() calls usually really don't do much on their own (other than
> checks, and in one case backward compatibility code, but still)

ACK, I'll look into that.

-Toke


Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-08-29 Thread Johannes Berg
On Wed, 2018-08-29 at 11:22 +0200, Toke Høiland-Jørgensen wrote:

> > We're also working on adding a TXQ for (bufferable) management packets
> > - I wonder how that should interact here? Always be scheduled first?
> 
> Ah, cool! It may be that it should be given priority, yeah. Presently,
> the multicast queue just rotates in the RR with the others, but is never
> throttled as it doesn't have an airtime measure (though perhaps it
> should)? But that might not be desirable for management frames, as
> presumable those need to go out as fast as possible?

Good question. I guess the multicast should have an airtime measure, but
I don't think we want to do accounting on the management. That really
should be few frames, and we want them out ASAP in most cases.

> Hmm, I seem to recall thinking about just putting the call to
> schedule_txq() into drv_wake_tx_queue; don't remember why I ended up
> dropping that; but will take another look at whether it makes sense to
> combine things.

I was thinking the other way around - but that doesn't work since you'd
loop from the driver to itself. This way might work, I guess, hadn't
considered that.

Might be better anyway though to make a new inline that does both, since
the drv_() calls usually really don't do much on their own (other than
checks, and in one case backward compatibility code, but still)

johannes


Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-08-29 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> Rather belatedly reviewing this too ...
>
>> + * The driver can't access the queue directly. To dequeue a frame from a
>> + * txq, it calls ieee80211_tx_dequeue(). Whenever mac80211 adds a new frame 
>> to a
>> + * queue, it calls the .wake_tx_queue driver op.
>> + *
>> + * Drivers can optionally delegate responsibility for scheduling queues to
>> + * mac80211, to take advantage of airtime fairness accounting. In this 
>> case, to
>> + * obtain the next queue to pull frames from, the driver calls
>> + * ieee80211_next_txq(). The driver is then expected to re-schedule the txq
>> + * using ieee80211_schedule_txq() if it is still active after the driver has
>> + * finished pulling packets from it.
>
> Maybe you could clarify what "is still active" means here? I'm
> guessing it means "tx_dequeue() would return non-NULL" but I doubt you
> really want such a strong requirement, perhaps "the last tx_dequeue()
> returned non-NULL" is sufficient?

Yeah, the latter should suffice. Really it should be put back "if it is
not known to be empty", but, well, that doesn't read so well...

> We're also working on adding a TXQ for (bufferable) management packets
> - I wonder how that should interact here? Always be scheduled first?

Ah, cool! It may be that it should be given priority, yeah. Presently,
the multicast queue just rotates in the RR with the others, but is never
throttled as it doesn't have an airtime measure (though perhaps it
should)? But that might not be desirable for management frames, as
presumable those need to go out as fast as possible?

>> +/**
>> + * ieee80211_schedule_txq - add txq to scheduling loop
>> + *
>> + * @hw: pointer as obtained from ieee80211_alloc_hw()
>> + * @txq: pointer obtained from station or virtual interface
>> + * @reset_seqno: Whether to reset the internal scheduling sequence number,
>> + *   allowing this txq to appear again in the current scheduling
>> + *   round (see doc for ieee80211_next_txq()).
>> + *
>> + * Returns %true if the txq was actually added to the scheduling,
>> + * %false otherwise.
>> + */
>> +bool ieee80211_schedule_txq(struct ieee80211_hw *hw,
>> +struct ieee80211_txq *txq,
>> +bool reset_seqno);
>
> I concur with Rajkumar in a way that "seqno" is a really bad name for
> this since it's so loaded in the context of wifi. He didn't say it this
> way, but said it was an "internal to mac80211" concept here, and was
> perhaps also/more referring to the manipulations by drivers.
>
> Perhaps calling it something like scheduling_round or something would be
> better? That's not really what it is, schedule_id? Even schedule_seq[no]
> would be clearer.

Yeah, definitely going to change that :)

>> +/**
>> + * ieee80211_next_txq - get next tx queue to pull packets from
>> + *
>> + * @hw: pointer as obtained from ieee80211_alloc_hw()
>> + * @ac: filter returned txqs with this AC number. Pass -1 for no filtering.
>> + * @inc_seqno: Whether to increase the scheduling sequence number. Setting 
>> this
>> + * to true signifies the start of a new scheduling round. Each 
>> TXQ
>> + * will only be returned exactly once in each round (unless its
>> + * sequence number is explicitly reset when calling
>> + * ieee80211_schedule_txq()).
>
> Here you already talk about "scheduling sequence number" after all :)
>
>> +ieee80211_schedule_txq(>hw, >txq, true);
>>  drv_wake_tx_queue(local, txqi);
>
> These always seem to come paired - perhaps a helper is useful?
>
> Although it looks like there are sometimes different locking contexts,
> not sure if that's really necessary though.

Hmm, I seem to recall thinking about just putting the call to
schedule_txq() into drv_wake_tx_queue; don't remember why I ended up
dropping that; but will take another look at whether it makes sense to
combine things.

-Toke


Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-08-29 Thread Johannes Berg
Rather belatedly reviewing this too ...

> + * The driver can't access the queue directly. To dequeue a frame from a
> + * txq, it calls ieee80211_tx_dequeue(). Whenever mac80211 adds a new frame 
> to a
> + * queue, it calls the .wake_tx_queue driver op.
> + *
> + * Drivers can optionally delegate responsibility for scheduling queues to
> + * mac80211, to take advantage of airtime fairness accounting. In this case, 
> to
> + * obtain the next queue to pull frames from, the driver calls
> + * ieee80211_next_txq(). The driver is then expected to re-schedule the txq
> + * using ieee80211_schedule_txq() if it is still active after the driver has
> + * finished pulling packets from it.

Maybe you could clarify what "is still active" means here? I'm guessing
it means "tx_dequeue() would return non-NULL" but I doubt you really
want such a strong requirement, perhaps "the last tx_dequeue() returned
non-NULL" is sufficient?


We're also working on adding a TXQ for (bufferable) management packets -
I wonder how that should interact here? Always be scheduled first?

> +/**
> + * ieee80211_schedule_txq - add txq to scheduling loop
> + *
> + * @hw: pointer as obtained from ieee80211_alloc_hw()
> + * @txq: pointer obtained from station or virtual interface
> + * @reset_seqno: Whether to reset the internal scheduling sequence number,
> + *   allowing this txq to appear again in the current scheduling
> + *   round (see doc for ieee80211_next_txq()).
> + *
> + * Returns %true if the txq was actually added to the scheduling,
> + * %false otherwise.
> + */
> +bool ieee80211_schedule_txq(struct ieee80211_hw *hw,
> + struct ieee80211_txq *txq,
> + bool reset_seqno);

I concur with Rajkumar in a way that "seqno" is a really bad name for
this since it's so loaded in the context of wifi. He didn't say it this
way, but said it was an "internal to mac80211" concept here, and was
perhaps also/more referring to the manipulations by drivers.

Perhaps calling it something like scheduling_round or something would be
better? That's not really what it is, schedule_id? Even schedule_seq[no]
would be clearer.

> +/**
> + * ieee80211_next_txq - get next tx queue to pull packets from
> + *
> + * @hw: pointer as obtained from ieee80211_alloc_hw()
> + * @ac: filter returned txqs with this AC number. Pass -1 for no filtering.
> + * @inc_seqno: Whether to increase the scheduling sequence number. Setting 
> this
> + * to true signifies the start of a new scheduling round. Each 
> TXQ
> + * will only be returned exactly once in each round (unless its
> + * sequence number is explicitly reset when calling
> + * ieee80211_schedule_txq()).

Here you already talk about "scheduling sequence number" after all :)

> + ieee80211_schedule_txq(>hw, >txq, true);
>   drv_wake_tx_queue(local, txqi);

These always seem to come paired - perhaps a helper is useful?

Although it looks like there are sometimes different locking contexts,
not sure if that's really necessary though.

johannes


Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-30 Thread Rajkumar Manoharan

On 2018-07-30 15:48, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:

Hmm, the driver really shouldn't need to do any locking apart from 
using

the next_txq() API. But I think you are right that the seqno mechanism
doesn't play well with unserialised access to through next_txq() from
multiple CPUs. I'll see what I can do about that, and also incorporate
the other changes we've been discussing into a new RFC series.


Great.. :)


Hmm.. reorder_txq() API may not needed. Calling next_txq() takes care
of reordering list though the driver is accessing txqs directly.


Right, as long as the next_txq() path is still called even while
fetch_ind() is active, that should be fine.


I am still wondering how and when to refill deficit in case next_txq()
won't be called. :(

Will post RFC change on top of yours.

-Rajkumar



Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-30 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

>>> A simple change in argument may break algo. What would be seqno of
>>> first packet of txq if queue_skb() isn't reset seqno?
>> 
>> It would be 0, which would be less than the current seqno in all cases
>> except just when the seqno counter wraps.
>> 
>
> One point should be mentioned in comment section of next_txq() that
> the driver has to ensure that next_txq() iteration is serialized i.e
> no parallel txq traversal are encouraged. Though txqs_list is maintained
> in mac80211, parallel iteration may change the behavior. For example
> I thought of get rid of txqs_lock in ath10k as txqs_list is moved to 
> mac80211.
> But it is still needed. right?

Hmm, the driver really shouldn't need to do any locking apart from using
the next_txq() API. But I think you are right that the seqno mechanism
doesn't play well with unserialised access to through next_txq() from
multiple CPUs. I'll see what I can do about that, and also incorporate
the other changes we've been discussing into a new RFC series.

> Hmm.. reorder_txq() API may not needed. Calling next_txq() takes care
> of reordering list though the driver is accessing txqs directly.

Right, as long as the next_txq() path is still called even while
fetch_ind() is active, that should be fine.

>>> If we don't handle this case, then ath10k driver can not claim
>>> mac80211 ATF support. Agree that MU-MIMO won't work with DDR
>>> scheduler. and it will impact MU-MIMO performace in ath10k when
>>> mac80211 ATF is claimed by ath10k.
>> 
>> Yeah, so the question is if this is an acceptable tradeoff? Do you have
>> any idea how often MU-MIMO actually provides a benefit in the real
>> world?
>> 
> Hmm.. We have some criteria of ~1.9 gain in Mu-MIMO test cases with 50
> 11ac clients.

Hmm, yeah, that would be a shame to lose; although I do think 50-client
deployments are still relatively rare for many people. So maybe we can
make airtime fairness something that can be switched on and off for
ath10k, depending on whether users think they will be needing MU-MIMO?
Until we come up with a scheduler that can handle it, of course...

-Toke


Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-30 Thread Rajkumar Manoharan

On 2018-07-24 04:08, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:



Sorry for the late reply.


I would prefer to keep separate list for each AC ;) Prepared one on
top of your earlier merged changes. Now it needs revisit.


Fine with me. Only reason I used a single list + the filtering 
mechanism

was to keep things compatible with ath10k ;)


Thanks :). If so, I will defer the per-ac-list later.


A simple change in argument may break algo. What would be seqno of
first packet of txq if queue_skb() isn't reset seqno?


It would be 0, which would be less than the current seqno in all cases
except just when the seqno counter wraps.



One point should be mentioned in comment section of next_txq() that
the driver has to ensure that next_txq() iteration is serialized i.e
no parallel txq traversal are encouraged. Though txqs_list is maintained
in mac80211, parallel iteration may change the behavior. For example
I thought of get rid of txqs_lock in ath10k as txqs_list is moved to 
mac80211.

But it is still needed. right?


I am fine in passing start of loop as arg to next_ntx(). But prefer to
keep loop detecting within mac80211 itself by tracking txq same as
ath10k. So that no change is needed in schedule_txq() arg list.
thoughts?


If we remove the reset argument to schedule_txq we lose the ability for
the driver to signal that it is OK to dequeue another series of packets
from the same TXQ in the current scheduling round. I think that things
would mostly still work, but we may risk the hardware running idle for
short periods of time (in ath9k at least). I am not quite sure which
conditions this would happen under, though; and I haven't been able to
trigger it myself with my test hardware. So one approach would be to 
try
it out without the parameter and put it back later if it turns out to 
be

problematic...


Agree. Based on experiments, additional arguments can be extended later.
Currently in ath10k, txqs are added back to list when there are pending
frames and are not considered again in the same iteration. It will be
processed in next loop.



ieee80211_reorder_txq() - after dequeuing skb (in direct txq access),
   txq will be deleted from list and if there 
are

pending
   frames, it will be requeued.


This could work, but reorder_txq() can't do the reordering from the
middle of the rotation. I.e, it can only reorder if the TXQ being 
passed

in is currently at the head of the list of active TXQs.

If we do go for this, I think it would make sense to use it everywhere.
I.e., next_txq() doesn't remove the queue, it just returns what is
currently at the head; and the driver will have to call reorder() after
processing a TXQ, instead of schedule_txq().


Hmm.. reorder_txq() API may not needed. Calling next_txq() takes care of
reordering list though the driver is accessing txqs directly.

And also as we discussed earlier, I will drop txq_may_pull() API and 
move

the deficit check under tx_dequeue().


Right, so with the current DRR scheduler, the only thing we can do with
this setup is throttle fetch_ind() if the TXQ isn't eligible for
scheduling. What happens if we do that? As far as I can tell,
fetch_ind() is triggered on tx completion from the same TXQ, right? So
if we throttle that, the driver falls back to the tx_push_pending()
rotation?

The fallback push will be effective only if current queued count is 
lesser than
allowed. Otherwise push_pending() won't be effective even after 
throttling
fetch_ind(). The same txq will be served in further fetch_ind() when the 
driver

reports to firmware about pending counts.


I think adding the throttling to tx_fetch_ind() would basically amount
to disabling that mechanism for most transmission scenarios...

If you don't throttle fetch_ind(), fairness won't be effective. Also I 
am
thinking of reordering push_pending() after processing fetch_ind_q. This 
will

give enough chances for fetch_ind().


If we don't handle this case, then ath10k driver can not claim
mac80211 ATF support. Agree that MU-MIMO won't work with DDR
scheduler. and it will impact MU-MIMO performace in ath10k when
mac80211 ATF is claimed by ath10k.


Yeah, so the question is if this is an acceptable tradeoff? Do you have
any idea how often MU-MIMO actually provides a benefit in the real
world?

Hmm.. We have some criteria of ~1.9 gain in Mu-MIMO test cases with 50 
11ac clients.


-Rajkumar


Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-24 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-07-21 13:54, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
>>> On 2018-07-19 07:18, Toke Høiland-Jørgensen wrote:
 Rajkumar Manoharan  writes:
 
> On 2018-07-13 06:39, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>>> 
>> It is not generally predictable how many times this will loop 
>> before
>> exiting...
>> 
> Agree.. It would be better If the driver does not worry about txq
> sequence numbering. Perhaps one more API (ieee80211_first_txq) could
> solve this. Will leave it to you.
 
 That is basically what the second parameter to next_txq() does in the
 current patchset. It just needs to be renamed...
 
>>> Agree. As next_txq() increments seqno while starting loop for each AC,
>>> It seems bit confusing. i.e A single ath_txq_schedule_all call will
>>> bump schedule_seqno by 4. right?
>> 
>> Hmm, good point. Guess there should be one seqno per ac...
>> 
> I would prefer to keep separate list for each AC ;) Prepared one on
> top of your earlier merged changes. Now it needs revisit.

Fine with me. Only reason I used a single list + the filtering mechanism
was to keep things compatible with ath10k ;)

>>> Let assume below case where CPU1 is dequeuing skb from isr context and
>>> CPU2 is enqueuing skbs into same txq.
>>> 
>>> CPU1  CPU2
>>>   
>>> ath_txq_schedule
>>>-> ieee80211_next_txq(first)
>>>  -> inc_seq
>>>  -> delete from list
>>>  -> txq->seqno = local->seqno
>>>   ieee80211_tx/fast_xmit
>>>  -> 
>>> ieee80211_queue_skb
>>>  ->
>>> ieee80211_schedule_txq(reset_seqno)
>>>   -> list_add
>>>   -> 
>>> txqi->seqno
>>> = local->seqno - 1
>>> 
>>> In above sequence, it is quite possible that the same txq will be
>>> served repeatedly and other txqs will be starved. am I right? IMHO
>>> seqno mechanism will not guarantee that txqs will be processed only
>>> once in an iteration.
>> 
>> Yeah, ieee80211_queue_skb() shouldn't set reset_seqno; was 
>> experimenting
>> with that, and guess I ended up picking the wrong value. Thanks for
>> pointing that out :)
>> 
> A simple change in argument may break algo. What would be seqno of
> first packet of txq if queue_skb() isn't reset seqno?

It would be 0, which would be less than the current seqno in all cases
except just when the seqno counter wraps.

> I am fine in passing start of loop as arg to next_ntx(). But prefer to
> keep loop detecting within mac80211 itself by tracking txq same as
> ath10k. So that no change is needed in schedule_txq() arg list.
> thoughts?

If we remove the reset argument to schedule_txq we lose the ability for
the driver to signal that it is OK to dequeue another series of packets
from the same TXQ in the current scheduling round. I think that things
would mostly still work, but we may risk the hardware running idle for
short periods of time (in ath9k at least). I am not quite sure which
conditions this would happen under, though; and I haven't been able to
trigger it myself with my test hardware. So one approach would be to try
it out without the parameter and put it back later if it turns out to be
problematic...

>> Well, it could conceivably be done in a way that doesn't require
>> taking
>> the activeq_lock. Adding another STOP flag to the TXQ, for 
>> instance.
>> From an API point of view I think that is more consistent with what
>> we
>> have already...
>> 
> 
> Make sense. ieee80211_txq_may_pull would be better place to decide
> whether given txq is allowed for transmission. It also makes drivers
> do not have to worry about deficit. Still I may need
> ieee80211_reorder_txq API after processing txq. isn't it?
 
 The way I was assuming this would work (and what ath9k does), is that 
 a
 driver only operates on one TXQ at a time; so it can get that txq,
 process it, and re-schedule it. In which case no other API is needed;
 the rotating can be done in next_txq(), and schedule_txq() can insert
 the txq back into the rotation as needed.
 
 However, it sounds like this is not how ath10k does things? See 
 below.
 
>>> correct. The current rotation works only in push-only mode. i.e when
>>> firmware is not deciding txqs and the driver picks priority txq from
>>> active_txqs list. Unfortunately rotation won't work when the driver
>>> selects txq other than first in the list. In any case separate API
>>> needed for rotation when the driver is processing few packets from txq
>>> instead of all pending frames.
>> 
>> Rotation is not dependent on 

Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-23 Thread Rajkumar Manoharan

On 2018-07-21 13:54, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:


On 2018-07-19 07:18, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:


On 2018-07-13 06:39, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:


It is not generally predictable how many times this will loop 
before

exiting...


Agree.. It would be better If the driver does not worry about txq
sequence numbering. Perhaps one more API (ieee80211_first_txq) could
solve this. Will leave it to you.


That is basically what the second parameter to next_txq() does in the
current patchset. It just needs to be renamed...


Agree. As next_txq() increments seqno while starting loop for each AC,
It seems bit confusing. i.e A single ath_txq_schedule_all call will
bump schedule_seqno by 4. right?


Hmm, good point. Guess there should be one seqno per ac...

I would prefer to keep separate list for each AC ;) Prepared one on top 
of

your earlier merged changes. Now it needs revisit.


Let assume below case where CPU1 is dequeuing skb from isr context and
CPU2 is enqueuing skbs into same txq.

CPU1  CPU2
  
ath_txq_schedule
   -> ieee80211_next_txq(first)
 -> inc_seq
 -> delete from list
 -> txq->seqno = local->seqno
  ieee80211_tx/fast_xmit
 -> 
ieee80211_queue_skb

 ->
ieee80211_schedule_txq(reset_seqno)
  -> list_add
  -> 
txqi->seqno

= local->seqno - 1

In above sequence, it is quite possible that the same txq will be
served repeatedly and other txqs will be starved. am I right? IMHO
seqno mechanism will not guarantee that txqs will be processed only
once in an iteration.


Yeah, ieee80211_queue_skb() shouldn't set reset_seqno; was 
experimenting

with that, and guess I ended up picking the wrong value. Thanks for
pointing that out :)


A simple change in argument may break algo. What would be seqno of first
packet of txq if queue_skb() isn't reset seqno? I am fine in passing 
start

of loop as arg to next_ntx(). But prefer to keep loop detecting within
mac80211 itself by tracking txq same as ath10k. So that no change is 
needed

in schedule_txq() arg list. thoughts?


Well, it could conceivably be done in a way that doesn't require
taking
the activeq_lock. Adding another STOP flag to the TXQ, for 
instance.

From an API point of view I think that is more consistent with what
we
have already...



Make sense. ieee80211_txq_may_pull would be better place to decide
whether given txq is allowed for transmission. It also makes drivers
do not have to worry about deficit. Still I may need
ieee80211_reorder_txq API after processing txq. isn't it?


The way I was assuming this would work (and what ath9k does), is that 
a

driver only operates on one TXQ at a time; so it can get that txq,
process it, and re-schedule it. In which case no other API is needed;
the rotating can be done in next_txq(), and schedule_txq() can insert
the txq back into the rotation as needed.

However, it sounds like this is not how ath10k does things? See 
below.



correct. The current rotation works only in push-only mode. i.e when
firmware is not deciding txqs and the driver picks priority txq from
active_txqs list. Unfortunately rotation won't work when the driver
selects txq other than first in the list. In any case separate API
needed for rotation when the driver is processing few packets from txq
instead of all pending frames.


Rotation is not dependent on the TXQ draining completely. Dequeueing a
few packets, then rescheduling the TXQ is fine.


The problem is that when the driver accesses txq directly, it wont call
next_txq(). So the txq will not dequeued from list and schedule_txq() 
wont

be effective. right?

ieee80211_txq_may_pull() - check whether txq is allowed for tx_dequeue()
  that helps to keep deficit check in mac80211 
itself


ieee80211_reorder_txq() - after dequeuing skb (in direct txq access),
  txq will be deleted from list and if there are 
pending

  frames, it will be requeued.


And if so, how does that interact with ath10k_mac_tx_push_pending()?
That function is basically doing the same thing that I explained 
above
for ath9k; in the previous version of this patch series I modified 
that

to use next_txq(). But is that a different TX path, or am I
misunderstanding you?

If you could point me to which parts of the ath10k code I should be
looking at, that would be helpful as well :)


Depends on firmware htt_tx_mode (push/push_pull),
ath10k_mac_tx_push_pending() downloads all/partial frames to firmware.
Please take a look at ath10k_mac_tx_can_push() in push_pending(). Let
me know If you 

Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-21 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-07-19 07:18, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
>>> On 2018-07-13 06:39, Toke Høiland-Jørgensen wrote:
 Rajkumar Manoharan  writes:
>
 It is not generally predictable how many times this will loop before
 exiting...
 
>>> Agree.. It would be better If the driver does not worry about txq
>>> sequence numbering. Perhaps one more API (ieee80211_first_txq) could
>>> solve this. Will leave it to you.
>> 
>> That is basically what the second parameter to next_txq() does in the
>> current patchset. It just needs to be renamed...
>> 
> Agree. As next_txq() increments seqno while starting loop for each AC,
> It seems bit confusing. i.e A single ath_txq_schedule_all call will
> bump schedule_seqno by 4. right?

Hmm, good point. Guess there should be one seqno per ac...

> Let assume below case where CPU1 is dequeuing skb from isr context and
> CPU2 is enqueuing skbs into same txq.
>
> CPU1  CPU2
>   
> ath_txq_schedule
>-> ieee80211_next_txq(first)
>  -> inc_seq
>  -> delete from list
>  -> txq->seqno = local->seqno
>   ieee80211_tx/fast_xmit
>  -> ieee80211_queue_skb
>  -> 
> ieee80211_schedule_txq(reset_seqno)
>   -> list_add
>   -> txqi->seqno 
> = local->seqno - 1
>
> In above sequence, it is quite possible that the same txq will be
> served repeatedly and other txqs will be starved. am I right? IMHO
> seqno mechanism will not guarantee that txqs will be processed only
> once in an iteration.

Yeah, ieee80211_queue_skb() shouldn't set reset_seqno; was experimenting
with that, and guess I ended up picking the wrong value. Thanks for
pointing that out :)

 Well, it could conceivably be done in a way that doesn't require 
 taking
 the activeq_lock. Adding another STOP flag to the TXQ, for instance.
 From an API point of view I think that is more consistent with what 
 we
 have already...
 
>>> 
>>> Make sense. ieee80211_txq_may_pull would be better place to decide
>>> whether given txq is allowed for transmission. It also makes drivers
>>> do not have to worry about deficit. Still I may need
>>> ieee80211_reorder_txq API after processing txq. isn't it?
>> 
>> The way I was assuming this would work (and what ath9k does), is that a
>> driver only operates on one TXQ at a time; so it can get that txq,
>> process it, and re-schedule it. In which case no other API is needed;
>> the rotating can be done in next_txq(), and schedule_txq() can insert
>> the txq back into the rotation as needed.
>> 
>> However, it sounds like this is not how ath10k does things? See below.
>> 
> correct. The current rotation works only in push-only mode. i.e when
> firmware is not deciding txqs and the driver picks priority txq from
> active_txqs list. Unfortunately rotation won't work when the driver
> selects txq other than first in the list. In any case separate API
> needed for rotation when the driver is processing few packets from txq
> instead of all pending frames.

Rotation is not dependent on the TXQ draining completely. Dequeueing a
few packets, then rescheduling the TXQ is fine.

>> And if so, how does that interact with ath10k_mac_tx_push_pending()?
>> That function is basically doing the same thing that I explained above
>> for ath9k; in the previous version of this patch series I modified that
>> to use next_txq(). But is that a different TX path, or am I
>> misunderstanding you?
>> 
>> If you could point me to which parts of the ath10k code I should be
>> looking at, that would be helpful as well :)
>> 
> Depends on firmware htt_tx_mode (push/push_pull),
> ath10k_mac_tx_push_pending() downloads all/partial frames to firmware.
> Please take a look at ath10k_mac_tx_can_push() in push_pending(). Let
> me know If you need any other details.

Right, so looking at this, it looks like the driver will loop through
all the available TXQs, trying to dequeue from each of them if
ath10k_mac_tx_can_push() returns true, right? This should actually work
fine with the next_txq() / schedule_txq() API. Without airtime fairness
that will just translate into basically what the driver is doing now,
and I don't think more API functions are needed, as long as that is the
only point in the driver that pushes packets to the device...

With airtime fairness, what is going to happen is that the loop is only
going to get a single TXQ (the first one with a positive deficit), then
exit. Once that station has transmitted something and its deficit runs
negative, it'll get rotated and the next one will get a chance. So I
expect that airtime fairness will probably work, but MU-MIMO 

Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-20 Thread Rajkumar Manoharan

On 2018-07-19 07:18, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:


On 2018-07-13 06:39, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:



It is not generally predictable how many times this will loop before
exiting...


Agree.. It would be better If the driver does not worry about txq
sequence numbering. Perhaps one more API (ieee80211_first_txq) could
solve this. Will leave it to you.


That is basically what the second parameter to next_txq() does in the
current patchset. It just needs to be renamed...


Agree. As next_txq() increments seqno while starting loop for each AC,
It seems bit confusing. i.e A single ath_txq_schedule_all call will bump
schedule_seqno by 4. right?

Let assume below case where CPU1 is dequeuing skb from isr context and
CPU2 is enqueuing skbs into same txq.

CPU1  CPU2
  
ath_txq_schedule
  -> ieee80211_next_txq(first)
-> inc_seq
-> delete from list
-> txq->seqno = local->seqno
 ieee80211_tx/fast_xmit
-> ieee80211_queue_skb
-> 
ieee80211_schedule_txq(reset_seqno)

 -> list_add
 -> txqi->seqno 
= local->seqno - 1


In above sequence, it is quite possible that the same txq
will be served repeatedly and other txqs will be starved. am I right?
IMHO seqno mechanism will not guarantee that txqs will be processed
only once in an iteration.

[...]

Well, it could conceivably be done in a way that doesn't require 
taking

the activeq_lock. Adding another STOP flag to the TXQ, for instance.
From an API point of view I think that is more consistent with what 
we

have already...



Make sense. ieee80211_txq_may_pull would be better place to decide
whether given txq is allowed for transmission. It also makes drivers
do not have to worry about deficit. Still I may need
ieee80211_reorder_txq API after processing txq. isn't it?


The way I was assuming this would work (and what ath9k does), is that a
driver only operates on one TXQ at a time; so it can get that txq,
process it, and re-schedule it. In which case no other API is needed;
the rotating can be done in next_txq(), and schedule_txq() can insert
the txq back into the rotation as needed.

However, it sounds like this is not how ath10k does things? See below.

correct. The current rotation works only in push-only mode. i.e when 
firmware
is not deciding txqs and the driver picks priority txq from active_txqs 
list.
Unfortunately rotation won't work when the driver selects txq other than 
first
in the list. In any case separate API needed for rotation when the 
driver is

processing few packets from txq instead of all pending frames.


In push-pull method, driver reports to firmware that number of frames
queued for each tid per station by wake_tx_queue. Later firmware will
query N frames from each TID and after dequeue driver will update
remaining frames for that tid. In ATF case, when driver is not able to
dequeue frames, driver will simply update remaining frames. The
consecutive fetch_ind get opportunity to dequeue the frames. By This
way, transmission for serving client will be paused for a while and
opportunity will be given to others.


This sounds like the driver doesn't do anything other than notify the
firmware that a txq has pending frames, and everything else is handled
by the firmware? Is this the case?



Correct. In push-pull method, DRR algorithm running in firmware and so
firmware decides txq.


And if so, how does that interact with ath10k_mac_tx_push_pending()?
That function is basically doing the same thing that I explained above
for ath9k; in the previous version of this patch series I modified that
to use next_txq(). But is that a different TX path, or am I
misunderstanding you?

If you could point me to which parts of the ath10k code I should be
looking at, that would be helpful as well :)

Depends on firmware htt_tx_mode (push/push_pull), 
ath10k_mac_tx_push_pending()
downloads all/partial frames to firmware. Please take a look at 
ath10k_mac_tx_can_push()

in push_pending(). Let me know If you need any other details.

-Rajkumar



Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-19 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-07-13 06:39, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
> [...]
>
>>> Hmm... I thought driver will call ieee80211_schedule_txq when it runs
>>> out of hardware descriptor and break the loop. The serving txq will be
>>> added back to head of activeq list. no?
>> 
>> Yes, and then the next one will be serviced... It's basically:
>> 
>> while (!hwq_is_full()) {
>>  txq = next_txq():
>>  build_one_aggr(txq); // may or may not succeed
>>  if (!empty(txq))
>>schedule_txq(txq);
>> }
>> 
>> It is not generally predictable how many times this will loop before
>> exiting...
>> 
> Agree.. It would be better If the driver does not worry about txq
> sequence numbering. Perhaps one more API (ieee80211_first_txq) could
> solve this. Will leave it to you.

That is basically what the second parameter to next_txq() does in the
current patchset. It just needs to be renamed...

> 
> ieee80211_txq_get_depth
>   - return deficit status along with frm_cnt
> 
> ieee80211_reorder_txq
>   - if txq deficit > 0
> - return;
>   - if txq is last
>  - return
>   - delete txq from list
>   - move it to tail
>   - update deficit by quantum
> 
> ath10k_htt_rx_tx_fetch_ind
>   - get txq deficit status
>   - if txq deficit > 0
> - dequeue skb
>   - else if deficit < 0
> - return NULL
> 
> Please share your thoughts.
 
 Hmm, not sure exactly how this would work; seems a little 
 complicated?
 Also, I'd rather if drivers were completely oblivious to the deficit;
 that is a bit of an implementation detail...
 
>>> Agree.. Initially I thought of adding deficit check in
>>> ieee80211_tx_dequeue.
>>> But It will be overhead of taking activeq_lock for every skbs. Perhaps
>>> it can be renamed as allowed_to_dequeue instead of deficit.
>>> 
 We could have an ieee80211_txq_may_pull(); or maybe just have
 ieee80211_tx_dequeue() return NULL if the deficit is negative?
 
>>> As I said earlier, checking deficit for every skb will be an overhead.
>>> It should be done once before accessing txq.
>> 
>> Well, it could conceivably be done in a way that doesn't require taking
>> the activeq_lock. Adding another STOP flag to the TXQ, for instance.
>> From an API point of view I think that is more consistent with what we
>> have already...
>> 
>
> Make sense. ieee80211_txq_may_pull would be better place to decide
> whether given txq is allowed for transmission. It also makes drivers
> do not have to worry about deficit. Still I may need
> ieee80211_reorder_txq API after processing txq. isn't it?

The way I was assuming this would work (and what ath9k does), is that a
driver only operates on one TXQ at a time; so it can get that txq,
process it, and re-schedule it. In which case no other API is needed;
the rotating can be done in next_txq(), and schedule_txq() can insert
the txq back into the rotation as needed.

However, it sounds like this is not how ath10k does things? See below.

 the reasonable thing for the driver to do, then, would be to ask
 ieee80211_next_txq() for another TXQ to pull from if the current one
 doesn't work for whatever reason.
 
 Would that work for push-pull mode?
 
>>> Not really. Driver shouldn't send other txq data instead of asked one.
>> 
>> I didn't necessarily mean immediately. As long as it does it 
>> eventually.
>> If a TXQ's deficit runs negative, that TXQ will not be allowed to send
>> again until its deficit has been restored to positive through enough
>> cycles of the loop in next_txq().
>> 
>
> Thats true. Are you suggesting to run the loop until the txq deficit
> becomes positive?

Yeah, or rather until the first station with a positive deficit is
found.

>>> In MU-MIMO, firmware will query N packets from given set of {STA,TID}.
>>> So the driver not supposed to send other txq's data.
>> 
>> Hmm, it'll actually be interesting to see how the airtime fairness
>> scheduler interacts with MU-MIMO. I'm not quite sure that it'll be in a
>> good way; the DRR scheduler generally only restores one TXQ to positive
>> deficit at a time, so it may be that MU-MIMO will break completely and
>> we'll have to come up with another scheduling algorithm.
>> 
>
> In push-pull method, driver reports to firmware that number of frames
> queued for each tid per station by wake_tx_queue. Later firmware will
> query N frames from each TID and after dequeue driver will update
> remaining frames for that tid. In ATF case, when driver is not able to
> dequeue frames, driver will simply update remaining frames. The
> consecutive fetch_ind get opportunity to dequeue the frames. By This
> way, transmission for serving client will be paused for a while and
> opportunity will be given to others.

This sounds like the driver doesn't do anything other than notify the

Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-16 Thread Rajkumar Manoharan

On 2018-07-13 06:39, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:


[...]


Hmm... I thought driver will call ieee80211_schedule_txq when it runs
out of hardware descriptor and break the loop. The serving txq will be
added back to head of activeq list. no?


Yes, and then the next one will be serviced... It's basically:

while (!hwq_is_full()) {
 txq = next_txq():
 build_one_aggr(txq); // may or may not succeed
 if (!empty(txq))
   schedule_txq(txq);
}

It is not generally predictable how many times this will loop before
exiting...

Agree.. It would be better If the driver does not worry about txq 
sequence

numbering. Perhaps one more API (ieee80211_first_txq) could solve this.
Will leave it to you.



ieee80211_txq_get_depth
  - return deficit status along with frm_cnt

ieee80211_reorder_txq
  - if txq deficit > 0
- return;
  - if txq is last
 - return
  - delete txq from list
  - move it to tail
  - update deficit by quantum

ath10k_htt_rx_tx_fetch_ind
  - get txq deficit status
  - if txq deficit > 0
- dequeue skb
  - else if deficit < 0
- return NULL

Please share your thoughts.


Hmm, not sure exactly how this would work; seems a little 
complicated?

Also, I'd rather if drivers were completely oblivious to the deficit;
that is a bit of an implementation detail...


Agree.. Initially I thought of adding deficit check in
ieee80211_tx_dequeue.
But It will be overhead of taking activeq_lock for every skbs. Perhaps
it can be renamed as allowed_to_dequeue instead of deficit.


We could have an ieee80211_txq_may_pull(); or maybe just have
ieee80211_tx_dequeue() return NULL if the deficit is negative?


As I said earlier, checking deficit for every skb will be an overhead.
It should be done once before accessing txq.


Well, it could conceivably be done in a way that doesn't require taking
the activeq_lock. Adding another STOP flag to the TXQ, for instance.
From an API point of view I think that is more consistent with what we
have already...



Make sense. ieee80211_txq_may_pull would be better place to decide 
whether
given txq is allowed for transmission. It also makes drivers do not have 
to

worry about deficit. Still I may need ieee80211_reorder_txq API after
processing txq. isn't it?


the reasonable thing for the driver to do, then, would be to ask
ieee80211_next_txq() for another TXQ to pull from if the current one
doesn't work for whatever reason.

Would that work for push-pull mode?


Not really. Driver shouldn't send other txq data instead of asked one.


I didn't necessarily mean immediately. As long as it does it 
eventually.

If a TXQ's deficit runs negative, that TXQ will not be allowed to send
again until its deficit has been restored to positive through enough
cycles of the loop in next_txq().



Thats true. Are you suggesting to run the loop until the txq deficit 
becomes

positive?


In MU-MIMO, firmware will query N packets from given set of {STA,TID}.
So the driver not supposed to send other txq's data.


Hmm, it'll actually be interesting to see how the airtime fairness
scheduler interacts with MU-MIMO. I'm not quite sure that it'll be in a
good way; the DRR scheduler generally only restores one TXQ to positive
deficit at a time, so it may be that MU-MIMO will break completely and
we'll have to come up with another scheduling algorithm.



In push-pull method, driver reports to firmware that number of frames
queued for each tid per station by wake_tx_queue. Later firmware will 
query
N frames from each TID and after dequeue driver will update remaining 
frames
for that tid. In ATF case, when driver is not able to dequeue frames, 
driver
will simply update remaining frames. The consecutive fetch_ind get 
opportunity
to dequeue the frames. By This way, transmission for serving client will 
be paused

for a while and opportunity will be given to others.

-Rajkumar


Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-13 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-07-12 16:13, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
>>> On 2018-07-11 13:48, Toke Høiland-Jørgensen wrote:
 Rajkumar Manoharan  writes:
 
> On 2018-07-09 09:37, Toke Høiland-Jørgensen wrote:
> [...]
>> +/**
> [...]
>
 Erm, how would this prevent an infinite loop? With this scheme, at 
 some
 point, ieee80211_next_txq() removes the last txq from activeq, and
 returns that. Then, when it is called again the next time the driver
 loops, it's back to the first case (activeq empty, waitq non-empty); 
 so
 it'll move waitq back as activeq and start over... Only the driver
 really knows when it is starting a logical "loop through all active
 TXQs".
 
>>> Oops.. My bad.. The idea is that ieee80211_next_txq process txq from
>>> activeq only and keep processed txqs separately. Having single list
>>> eventually needs tracking mechanism. The point is that once activeq
>>> becomes empty, splice waitq list and return NULL. So that driver can
>>> break from the loop.
>>> 
>>> ieee80211_next_txq
>>>   - if activeq empty,
>>>- move waitq list into activeq
>>>- return NULL
>>> 
>>>   - if activeq not empty
>>>- fetch appropriate txq from activeq
>>>- remove txq from activeq list.
>>> 
>>>   - If txq found, return txq else return NULL
>> 
>> Right, so this would ensure the driver only sees each TXQ once *if it
>> keeps looping*. But it doesn't necessarily; if the hardware queues fill
>> up, for instance, it might abort earlier. In that case it would need to
>> signal mac80211 that it is done for now, which is equivalent to
>> signalling when it starts a scheduler round.
>> 
> Hmm... I thought driver will call ieee80211_schedule_txq when it runs
> out of hardware descriptor and break the loop. The serving txq will be
> added back to head of activeq list. no?

Yes, and then the next one will be serviced... It's basically:

while (!hwq_is_full()) {
 txq = next_txq():
 build_one_aggr(txq); // may or may not succeed
 if (!empty(txq))
   schedule_txq(txq);
}

It is not generally predictable how many times this will loop before
exiting...

 Also, for airtime fairness, the queues are not scheduled strictly
 round-robin, so the dual-list scheme wouldn't work there either...
 
>>> As you know, ath10k driver will operate in two tx modes (push-only,
>>> push-pull). These modes will be switched dynamically depends on
>>> firmware load or resource availability. In push-pull mode, firmware
>>> will query N number of frames for set of STA, TID.
>> 
>> Ah, so it will look up the TXQ without looping through the list of
>> pending queues at all? Didn't realise that is what it's doing; yeah,
>> that would be problematic for airtime fairness :)
>> 
>>> So the driver will directly dequeue N number of frames from given txq.
>>> In this case txq ordering alone wont help. I am planning to add below
>>> changes in exiting API and add new API ieee80211_reorder_txq.
>>> 
>>> ieee80211_txq_get_depth
>>>   - return deficit status along with frm_cnt
>>> 
>>> ieee80211_reorder_txq
>>>   - if txq deficit > 0
>>> - return;
>>>   - if txq is last
>>>  - return
>>>   - delete txq from list
>>>   - move it to tail
>>>   - update deficit by quantum
>>> 
>>> ath10k_htt_rx_tx_fetch_ind
>>>   - get txq deficit status
>>>   - if txq deficit > 0
>>> - dequeue skb
>>>   - else if deficit < 0
>>> - return NULL
>>> 
>>> Please share your thoughts.
>> 
>> Hmm, not sure exactly how this would work; seems a little complicated?
>> Also, I'd rather if drivers were completely oblivious to the deficit;
>> that is a bit of an implementation detail...
>> 
> Agree.. Initially I thought of adding deficit check in 
> ieee80211_tx_dequeue.
> But It will be overhead of taking activeq_lock for every skbs. Perhaps
> it can be renamed as allowed_to_dequeue instead of deficit.
>
>> We could have an ieee80211_txq_may_pull(); or maybe just have
>> ieee80211_tx_dequeue() return NULL if the deficit is negative?
>> 
> As I said earlier, checking deficit for every skb will be an overhead.
> It should be done once before accessing txq.

Well, it could conceivably be done in a way that doesn't require taking
the activeq_lock. Adding another STOP flag to the TXQ, for instance.
>From an API point of view I think that is more consistent with what we
have already...

>> the reasonable thing for the driver to do, then, would be to ask
>> ieee80211_next_txq() for another TXQ to pull from if the current one
>> doesn't work for whatever reason.
>> 
>> Would that work for push-pull mode?
>> 
> Not really. Driver shouldn't send other txq data instead of asked one.

I didn't necessarily mean immediately. As long as it does it eventually.
If a TXQ's deficit runs negative, that TXQ will not be allowed to send
again until its 

Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-12 Thread Rajkumar Manoharan

On 2018-07-12 16:13, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:


On 2018-07-11 13:48, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:


On 2018-07-09 09:37, Toke Høiland-Jørgensen wrote:
[...]

+/**

[...]

Erm, how would this prevent an infinite loop? With this scheme, at 
some

point, ieee80211_next_txq() removes the last txq from activeq, and
returns that. Then, when it is called again the next time the driver
loops, it's back to the first case (activeq empty, waitq non-empty); 
so

it'll move waitq back as activeq and start over... Only the driver
really knows when it is starting a logical "loop through all active
TXQs".


Oops.. My bad.. The idea is that ieee80211_next_txq process txq from
activeq only and keep processed txqs separately. Having single list
eventually needs tracking mechanism. The point is that once activeq
becomes empty, splice waitq list and return NULL. So that driver can
break from the loop.

ieee80211_next_txq
  - if activeq empty,
   - move waitq list into activeq
   - return NULL

  - if activeq not empty
   - fetch appropriate txq from activeq
   - remove txq from activeq list.

  - If txq found, return txq else return NULL


Right, so this would ensure the driver only sees each TXQ once *if it
keeps looping*. But it doesn't necessarily; if the hardware queues fill
up, for instance, it might abort earlier. In that case it would need to
signal mac80211 that it is done for now, which is equivalent to
signalling when it starts a scheduler round.

Hmm... I thought driver will call ieee80211_schedule_txq when it runs 
out

of hardware descriptor and break the loop. The serving txq will be added
back to head of activeq list. no?


Also, for airtime fairness, the queues are not scheduled strictly
round-robin, so the dual-list scheme wouldn't work there either...


As you know, ath10k driver will operate in two tx modes (push-only,
push-pull). These modes will be switched dynamically depends on
firmware load or resource availability. In push-pull mode, firmware
will query N number of frames for set of STA, TID.


Ah, so it will look up the TXQ without looping through the list of
pending queues at all? Didn't realise that is what it's doing; yeah,
that would be problematic for airtime fairness :)


So the driver will directly dequeue N number of frames from given txq.
In this case txq ordering alone wont help. I am planning to add below
changes in exiting API and add new API ieee80211_reorder_txq.

ieee80211_txq_get_depth
  - return deficit status along with frm_cnt

ieee80211_reorder_txq
  - if txq deficit > 0
- return;
  - if txq is last
 - return
  - delete txq from list
  - move it to tail
  - update deficit by quantum

ath10k_htt_rx_tx_fetch_ind
  - get txq deficit status
  - if txq deficit > 0
- dequeue skb
  - else if deficit < 0
- return NULL

Please share your thoughts.


Hmm, not sure exactly how this would work; seems a little complicated?
Also, I'd rather if drivers were completely oblivious to the deficit;
that is a bit of an implementation detail...

Agree.. Initially I thought of adding deficit check in 
ieee80211_tx_dequeue.

But It will be overhead of taking activeq_lock for every skbs. Perhaps
it can be renamed as allowed_to_dequeue instead of deficit.


We could have an ieee80211_txq_may_pull(); or maybe just have
ieee80211_tx_dequeue() return NULL if the deficit is negative?


As I said earlier, checking deficit for every skb will be an overhead.
It should be done once before accessing txq.


the reasonable thing for the driver to do, then, would be to ask
ieee80211_next_txq() for another TXQ to pull from if the current one
doesn't work for whatever reason.

Would that work for push-pull mode?


Not really. Driver shouldn't send other txq data instead of asked one.
In MU-MIMO, firmware will query N packets from given set of {STA,TID}.
So the driver not supposed to send other txq's data.

-Rajkumar


Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-12 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-07-11 13:48, Toke Høiland-Jørgensen wrote:
>> Rajkumar Manoharan  writes:
>> 
>>> On 2018-07-09 09:37, Toke Høiland-Jørgensen wrote:
>>> [...]
 +/**
 + * ieee80211_schedule_txq - add txq to scheduling loop
 + *
 + * @hw: pointer as obtained from ieee80211_alloc_hw()
 + * @txq: pointer obtained from station or virtual interface
 + * @reset_seqno: Whether to reset the internal scheduling sequence
 number,
 + *   allowing this txq to appear again in the current
 scheduling
 + *   round (see doc for ieee80211_next_txq()).
 + *
 + * Returns %true if the txq was actually added to the scheduling,
 + * %false otherwise.
 + */
 +bool ieee80211_schedule_txq(struct ieee80211_hw *hw,
 +  struct ieee80211_txq *txq,
 +  bool reset_seqno);
 +
 +/**
 + * ieee80211_next_txq - get next tx queue to pull packets from
 + *
 + * @hw: pointer as obtained from ieee80211_alloc_hw()
 + * @ac: filter returned txqs with this AC number. Pass -1 for no
 filtering.
 + * @inc_seqno: Whether to increase the scheduling sequence number.
 Setting this
 + * to true signifies the start of a new scheduling 
 round.
 Each TXQ
 + * will only be returned exactly once in each round
 (unless its
 + * sequence number is explicitly reset when calling
 + * ieee80211_schedule_txq()).
 + *
>>> Toke,
>>> 
>>> Seems like seqno is internal to mac80211 and meant for active_txq list
>>> manipulation. If so, why would drivers have to worry about increment
>>> or resetting seqno?
>> 
>> The drivers need to be able to signal when they start a new "scheduling
>> round" (which is the parameter to next_txq()), and when a queue should
>> be immediately rescheduled (which is the parameter to schedule_txq()).
>> See the subsequent patch to ath9k for how this is used; the latter is
>> signalled when a TXQ successfully dequeued an aggregate...
>> 
>> Now, you're right that the choice to track this via a sequence number 
>> is
>> an implementation detail internal to mac80211... so maybe the 
>> parameters
>> should be called something different.
>> 
>>> IMHO to avoid over serving same txq, two lists (activeq and waitq) can
>>> be used and always add new txq into waitq list. So that driver will
>>> not worry about mac80211 txq manipulation. Please correct me If Im
>>> wrong.
>>> 
>>> ieee80211_schedule_txq
>>> - if schedule_order empty, add txq into waitq list tail.
>>> 
>>> ieee80211_next_txq
>>> - if activeq empty,
>>>  - move waitq list into activeq
>>> 
>>> - if activeq not empty
>>>  - fetch appropriate txq from activeq
>>>  - remove txq from activeq list.
>>> 
>>> - If txq found, return txq else return NULL
>> 
>> 
>> Erm, how would this prevent an infinite loop? With this scheme, at some
>> point, ieee80211_next_txq() removes the last txq from activeq, and
>> returns that. Then, when it is called again the next time the driver
>> loops, it's back to the first case (activeq empty, waitq non-empty); so
>> it'll move waitq back as activeq and start over... Only the driver
>> really knows when it is starting a logical "loop through all active
>> TXQs".
>> 
> Oops.. My bad.. The idea is that ieee80211_next_txq process txq from
> activeq only and keep processed txqs separately. Having single list
> eventually needs tracking mechanism. The point is that once activeq
> becomes empty, splice waitq list and return NULL. So that driver can
> break from the loop.
>
> ieee80211_next_txq
>   - if activeq empty,
>- move waitq list into activeq
>- return NULL
>
>   - if activeq not empty
>- fetch appropriate txq from activeq
>- remove txq from activeq list.
>
>   - If txq found, return txq else return NULL

Right, so this would ensure the driver only sees each TXQ once *if it
keeps looping*. But it doesn't necessarily; if the hardware queues fill
up, for instance, it might abort earlier. In that case it would need to
signal mac80211 that it is done for now, which is equivalent to
signalling when it starts a scheduler round.

>> Also, for airtime fairness, the queues are not scheduled strictly
>> round-robin, so the dual-list scheme wouldn't work there either...
>> 
> As you know, ath10k driver will operate in two tx modes (push-only,
> push-pull). These modes will be switched dynamically depends on
> firmware load or resource availability. In push-pull mode, firmware
> will query N number of frames for set of STA, TID.

Ah, so it will look up the TXQ without looping through the list of
pending queues at all? Didn't realise that is what it's doing; yeah,
that would be problematic for airtime fairness :)

> So the driver will directly dequeue N number of frames from given txq.
> In this case txq 

Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-12 Thread Rajkumar Manoharan

On 2018-07-11 13:48, Toke Høiland-Jørgensen wrote:

Rajkumar Manoharan  writes:


On 2018-07-09 09:37, Toke Høiland-Jørgensen wrote:
[...]

+/**
+ * ieee80211_schedule_txq - add txq to scheduling loop
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ * @reset_seqno: Whether to reset the internal scheduling sequence
number,
+ *   allowing this txq to appear again in the current
scheduling
+ *   round (see doc for ieee80211_next_txq()).
+ *
+ * Returns %true if the txq was actually added to the scheduling,
+ * %false otherwise.
+ */
+bool ieee80211_schedule_txq(struct ieee80211_hw *hw,
+   struct ieee80211_txq *txq,
+   bool reset_seqno);
+
+/**
+ * ieee80211_next_txq - get next tx queue to pull packets from
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @ac: filter returned txqs with this AC number. Pass -1 for no
filtering.
+ * @inc_seqno: Whether to increase the scheduling sequence number.
Setting this
+ * to true signifies the start of a new scheduling 
round.

Each TXQ
+ * will only be returned exactly once in each round
(unless its
+ * sequence number is explicitly reset when calling
+ * ieee80211_schedule_txq()).
+ *

Toke,

Seems like seqno is internal to mac80211 and meant for active_txq list
manipulation. If so, why would drivers have to worry about increment
or resetting seqno?


The drivers need to be able to signal when they start a new "scheduling
round" (which is the parameter to next_txq()), and when a queue should
be immediately rescheduled (which is the parameter to schedule_txq()).
See the subsequent patch to ath9k for how this is used; the latter is
signalled when a TXQ successfully dequeued an aggregate...

Now, you're right that the choice to track this via a sequence number 
is
an implementation detail internal to mac80211... so maybe the 
parameters

should be called something different.


IMHO to avoid over serving same txq, two lists (activeq and waitq) can
be used and always add new txq into waitq list. So that driver will
not worry about mac80211 txq manipulation. Please correct me If Im
wrong.

ieee80211_schedule_txq
- if schedule_order empty, add txq into waitq list tail.

ieee80211_next_txq
- if activeq empty,
 - move waitq list into activeq

- if activeq not empty
 - fetch appropriate txq from activeq
 - remove txq from activeq list.

- If txq found, return txq else return NULL



Erm, how would this prevent an infinite loop? With this scheme, at some
point, ieee80211_next_txq() removes the last txq from activeq, and
returns that. Then, when it is called again the next time the driver
loops, it's back to the first case (activeq empty, waitq non-empty); so
it'll move waitq back as activeq and start over... Only the driver
really knows when it is starting a logical "loop through all active
TXQs".

Oops.. My bad.. The idea is that ieee80211_next_txq process txq from 
activeq only
and keep processed txqs separately. Having single list eventually needs 
tracking
mechanism. The point is that once activeq becomes empty, splice waitq 
list and

return NULL. So that driver can break from the loop.

ieee80211_next_txq
 - if activeq empty,
  - move waitq list into activeq
  - return NULL

 - if activeq not empty
  - fetch appropriate txq from activeq
  - remove txq from activeq list.

 - If txq found, return txq else return NULL


Also, for airtime fairness, the queues are not scheduled strictly
round-robin, so the dual-list scheme wouldn't work there either...

As you know, ath10k driver will operate in two tx modes (push-only, 
push-pull).
These modes will be switched dynamically depends on firmware load or 
resource
availability. In push-pull mode, firmware will query N number of frames 
for set of
STA, TID. So the driver will directly dequeue N number of frames from 
given txq.
In this case txq ordering alone wont help. I am planning to add below 
changes in

exiting API and add new API ieee80211_reorder_txq.

ieee80211_txq_get_depth
 - return deficit status along with frm_cnt

ieee80211_reorder_txq
 - if txq deficit > 0
   - return;
 - if txq is last
- return
 - delete txq from list
 - move it to tail
 - update deficit by quantum

ath10k_htt_rx_tx_fetch_ind
 - get txq deficit status
 - if txq deficit > 0
   - dequeue skb
 - else if deficit < 0
   - return NULL

Please share your thoughts.

-Rajkumar


Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-11 Thread Toke Høiland-Jørgensen
Rajkumar Manoharan  writes:

> On 2018-07-09 09:37, Toke Høiland-Jørgensen wrote:
> [...]
>> +/**
>> + * ieee80211_schedule_txq - add txq to scheduling loop
>> + *
>> + * @hw: pointer as obtained from ieee80211_alloc_hw()
>> + * @txq: pointer obtained from station or virtual interface
>> + * @reset_seqno: Whether to reset the internal scheduling sequence 
>> number,
>> + *   allowing this txq to appear again in the current 
>> scheduling
>> + *   round (see doc for ieee80211_next_txq()).
>> + *
>> + * Returns %true if the txq was actually added to the scheduling,
>> + * %false otherwise.
>> + */
>> +bool ieee80211_schedule_txq(struct ieee80211_hw *hw,
>> +struct ieee80211_txq *txq,
>> +bool reset_seqno);
>> +
>> +/**
>> + * ieee80211_next_txq - get next tx queue to pull packets from
>> + *
>> + * @hw: pointer as obtained from ieee80211_alloc_hw()
>> + * @ac: filter returned txqs with this AC number. Pass -1 for no 
>> filtering.
>> + * @inc_seqno: Whether to increase the scheduling sequence number. 
>> Setting this
>> + * to true signifies the start of a new scheduling round. 
>> Each TXQ
>> + * will only be returned exactly once in each round 
>> (unless its
>> + * sequence number is explicitly reset when calling
>> + * ieee80211_schedule_txq()).
>> + *
> Toke,
>
> Seems like seqno is internal to mac80211 and meant for active_txq list
> manipulation. If so, why would drivers have to worry about increment
> or resetting seqno?

The drivers need to be able to signal when they start a new "scheduling
round" (which is the parameter to next_txq()), and when a queue should
be immediately rescheduled (which is the parameter to schedule_txq()).
See the subsequent patch to ath9k for how this is used; the latter is
signalled when a TXQ successfully dequeued an aggregate...

Now, you're right that the choice to track this via a sequence number is
an implementation detail internal to mac80211... so maybe the parameters
should be called something different.

> IMHO to avoid over serving same txq, two lists (activeq and waitq) can
> be used and always add new txq into waitq list. So that driver will
> not worry about mac80211 txq manipulation. Please correct me If Im
> wrong.
>
> ieee80211_schedule_txq
> - if schedule_order empty, add txq into waitq list tail.
>
> ieee80211_next_txq
> - if activeq empty,
>  - move waitq list into activeq
>
> - if activeq not empty
>  - fetch appropriate txq from activeq
>  - remove txq from activeq list.
>
> - If txq found, return txq else return NULL


Erm, how would this prevent an infinite loop? With this scheme, at some
point, ieee80211_next_txq() removes the last txq from activeq, and
returns that. Then, when it is called again the next time the driver
loops, it's back to the first case (activeq empty, waitq non-empty); so
it'll move waitq back as activeq and start over... Only the driver
really knows when it is starting a logical "loop through all active
TXQs".

Also, for airtime fairness, the queues are not scheduled strictly
round-robin, so the dual-list scheme wouldn't work there either...

-Toke



Re: [RFC v2 1/4] mac80211: Add TXQ scheduling API

2018-07-10 Thread Rajkumar Manoharan

On 2018-07-09 09:37, Toke Høiland-Jørgensen wrote:
[...]

+/**
+ * ieee80211_schedule_txq - add txq to scheduling loop
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @txq: pointer obtained from station or virtual interface
+ * @reset_seqno: Whether to reset the internal scheduling sequence 
number,
+ *   allowing this txq to appear again in the current 
scheduling

+ *   round (see doc for ieee80211_next_txq()).
+ *
+ * Returns %true if the txq was actually added to the scheduling,
+ * %false otherwise.
+ */
+bool ieee80211_schedule_txq(struct ieee80211_hw *hw,
+   struct ieee80211_txq *txq,
+   bool reset_seqno);
+
+/**
+ * ieee80211_next_txq - get next tx queue to pull packets from
+ *
+ * @hw: pointer as obtained from ieee80211_alloc_hw()
+ * @ac: filter returned txqs with this AC number. Pass -1 for no 
filtering.
+ * @inc_seqno: Whether to increase the scheduling sequence number. 
Setting this
+ * to true signifies the start of a new scheduling round. 
Each TXQ
+ * will only be returned exactly once in each round 
(unless its

+ * sequence number is explicitly reset when calling
+ * ieee80211_schedule_txq()).
+ *

Toke,

Seems like seqno is internal to mac80211 and meant for active_txq list 
manipulation.
If so, why would drivers have to worry about increment or resetting 
seqno? IMHO to
avoid over serving same txq, two lists (activeq and waitq) can be used 
and always
add new txq into waitq list. So that driver will not worry about 
mac80211 txq

manipulation. Please correct me If Im wrong.

ieee80211_schedule_txq
   - if schedule_order empty, add txq into waitq list tail.

ieee80211_next_txq
   - if activeq empty,
- move waitq list into activeq

   - if activeq not empty
- fetch appropriate txq from activeq
- remove txq from activeq list.

   - If txq found, return txq else return NULL

-Rajkumar