Re: [Make-wifi-fast] [PATCH v2] mac80211: Move crypto IV generation to after TXQ dequeue.

2016-08-26 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

> On Mon, 2016-08-22 at 16:47 +0200, Toke Høiland-Jørgensen wrote:
>> 
>> I suppose that could be a way to do it (i.e. have
>> ieee80211_tx_dequeue call all the TX hooks etc), but am not sure
>> whether there would be problems doing all this work in the loop
>> that's building aggregates (which is what would happen for ath9k at
>> least).
>
> I don't know, but it seems that it's worth trying.
>
>> An alternative could be to split the process up in two: An "early"
>> and "late" stage, where the early stage does everything that is not
>> sensitive to reordering and the occasional drop, and the late stage
>> is everything that is. Then the queueing step could happen in-between 
>> the two stages, and the non-queueing path could just call both stages
>> at once. In effect, this would just make the current work-arounds be
>> more explicit in the structure, rather than being marked as
>> exceptions.
>
> I'm not sure that works the way you think it does.
>
> What you did works for fast-xmit, but *only* because that doesn't do
> software crypto. If, for some reason, the TXQ stuff combines with
> software crypto, which doesn't seem impossible (ath9k even has a module
> parameter, iirc), then you have no way for this to work.

Yeah, I realised that when I started reviewing the slow path (sorry for
not realising that straight away). The v3 takes the "split handlers"
approach for this reason. That saved having to deal with fragmentation
on TXQ dequeue, and it means that some of the processing can be done
before queueing (such as GSO splitting; having packets be as small as
possible before applying FQ to them is a good thing if we want to
realise the full potential).

It seems there are still some bugs to work out with that patch, but I'd
be grateful if you could glance at it and comment on whether you think
this is a viable way forward (provided we can work out all the bugs, of
course).

>> > Now, it's unlikely to be that simple - fragmentation, for example,
>> > might mess this up.
>> > 
>> > Overall though, I'm definitely wondering if it should be this way,
>> > since all the special cases just add complexity.
>> 
>> I agree that the work-arounds are iffy, but I do also think it's
>> important to keep in mind that we are improving latency by orders of
>> magnitude here. A few special cases is worth it to achieve that, IMO.
>> And then iterating towards a design that don't need them, of course
>> :)
>
> I don't really agree, I'm not going to treat this unlike any other
> feature, which gets merged when it's ready for that.
>
> Right now, your code here obviously isn't, since it doesn't even
> address the cases that ath9k could run into, so either ath9k shouldn't
> use this mac80211 feature, or the mac80211 feature needs to be fixed
> before ath9k can use it.

Yeah, I agree now that I've looked at it some more :)

> I have no problems with documenting that the TXQ stuff can only be
> used with full hardware crypto, but then we should add some checks and
> warnings in mac80211 to ensure that, i.e. not allow software keys when
> TXQ stuff is used, nor allow keys with mac80211 PN assignment, etc.

I'd much rather fix things so it works in all cases. My patch to ath9k
to use this stuff completely removes the old TX path, and things like
the airtime fairness scheduler needs the intermediate queues to work.

-Toke


Re: [Make-wifi-fast] [PATCH v2] mac80211: Move crypto IV generation to after TXQ dequeue.

2016-08-26 Thread Johannes Berg
On Mon, 2016-08-22 at 16:47 +0200, Toke Høiland-Jørgensen wrote:
> 
> I suppose that could be a way to do it (i.e. have
> ieee80211_tx_dequeue call all the TX hooks etc), but am not sure
> whether there would be problems doing all this work in the loop
> that's building aggregates (which is what would happen for ath9k at
> least).

I don't know, but it seems that it's worth trying.

> An alternative could be to split the process up in two: An "early"
> and "late" stage, where the early stage does everything that is not
> sensitive to reordering and the occasional drop, and the late stage
> is everything that is. Then the queueing step could happen in-between 
> the two stages, and the non-queueing path could just call both stages
> at once. In effect, this would just make the current work-arounds be
> more explicit in the structure, rather than being marked as
> exceptions.

I'm not sure that works the way you think it does.

What you did works for fast-xmit, but *only* because that doesn't do
software crypto. If, for some reason, the TXQ stuff combines with
software crypto, which doesn't seem impossible (ath9k even has a module
parameter, iirc), then you have no way for this to work.

> > Now, it's unlikely to be that simple - fragmentation, for example,
> > might mess this up.
> > 
> > Overall though, I'm definitely wondering if it should be this way,
> > since all the special cases just add complexity.
> 
> I agree that the work-arounds are iffy, but I do also think it's
> important to keep in mind that we are improving latency by orders of
> magnitude here. A few special cases is worth it to achieve that, IMO.
> And then iterating towards a design that don't need them, of course
> :)

I don't really agree, I'm not going to treat this unlike any other
feature, which gets merged when it's ready for that.

Right now, your code here obviously isn't, since it doesn't even
address the cases that ath9k could run into, so either ath9k shouldn't
use this mac80211 feature, or the mac80211 feature needs to be fixed
before ath9k can use it.

I have no problems with documenting that the TXQ stuff can only be used
with full hardware crypto, but then we should add some checks and
warnings in mac80211 to ensure that, i.e. not allow software keys when
TXQ stuff is used, nor allow keys with mac80211 PN assignment, etc.

Even QoS-seqno assignment will be broken btw, so you do need a bunch
more offloads to make this work.

johannes


Re: [Make-wifi-fast] [PATCH v2] mac80211: Move crypto IV generation to after TXQ dequeue.

2016-08-22 Thread Toke Høiland-Jørgensen
Johannes Berg  writes:

>> well, we're getting there. the results of both patch attempts were
>> really nice, and brought encrypted performance with fq back into line
>> with unencrypted. Still running crypted tests as I write...
>> 
>> So fixing TKIP would be next, forcing the AP to use that? What other
>> scenarios do we have to worry about? WDS?
>> 
>
> I don't think there's anything else, I just don't really feel it's
> getting anywhere. This is a mere symptom of the design.
>
> Felix had worked around the SN assignment in a similar way, but I feel
> that perhaps the whole thing isn't quite the right architecture. Why
> are we applying FQ after the wifi conversion, when clearly that doesn't
> work well? Seems to me that it would make more sense to let the frames
> sit on the queues as they come in, and do most of the wifi handling
> only when needed (obviously, things like control port would still have
> to be done).

I suppose that could be a way to do it (i.e. have ieee80211_tx_dequeue
call all the TX hooks etc), but am not sure whether there would be
problems doing all this work in the loop that's building aggregates
(which is what would happen for ath9k at least).

An alternative could be to split the process up in two: An "early" and
"late" stage, where the early stage does everything that is not
sensitive to reordering and the occasional drop, and the late stage is
everything that is. Then the queueing step could happen in-between the
two stages, and the non-queueing path could just call both stages at
once. In effect, this would just make the current work-arounds be more
explicit in the structure, rather than being marked as exceptions.

> We even count those packets that are dropped for TX statistics, which
> would seem to be a big behavioural difference vs. applying a qdisc.

While you're right in principle, in practice I don't think this has too
big of an impact. In normal operation, CoDel drops (at most) dozens of
packets per *minute*, so it's not going to skew the statistics too much.

> Now, it's unlikely to be that simple - fragmentation, for example,
> might mess this up.
>
> Overall though, I'm definitely wondering if it should be this way,
> since all the special cases just add complexity.

I agree that the work-arounds are iffy, but I do also think it's
important to keep in mind that we are improving latency by orders of
magnitude here. A few special cases is worth it to achieve that, IMO.
And then iterating towards a design that don't need them, of course :)

-Toke


Re: [Make-wifi-fast] [PATCH v2] mac80211: Move crypto IV generation to after TXQ dequeue.

2016-08-17 Thread Johannes Berg

> well, we're getting there. the results of both patch attempts were
> really nice, and brought encrypted performance with fq back into line
> with unencrypted. Still running crypted tests as I write...
> 
> So fixing TKIP would be next, forcing the AP to use that? What other
> scenarios do we have to worry about? WDS?
> 

I don't think there's anything else, I just don't really feel it's
getting anywhere. This is a mere symptom of the design.

Felix had worked around the SN assignment in a similar way, but I feel
that perhaps the whole thing isn't quite the right architecture. Why
are we applying FQ after the wifi conversion, when clearly that doesn't
work well? Seems to me that it would make more sense to let the frames
sit on the queues as they come in, and do most of the wifi handling
only when needed (obviously, things like control port would still have
to be done).
We even count those packets that are dropped for TX statistics, which
would seem to be a big behavioural difference vs. applying a qdisc.

Now, it's unlikely to be that simple - fragmentation, for example,
might mess this up.

Overall though, I'm definitely wondering if it should be this way,
since all the special cases just add complexity.

johannes


Re: [Make-wifi-fast] [PATCH v2] mac80211: Move crypto IV generation to after TXQ dequeue.

2016-08-17 Thread Dave Taht
On Wed, Aug 17, 2016 at 9:49 PM, Johannes Berg
 wrote:
> Hi,
>
> You need to work on coding style, a lot of your indentation is
> completely messed up.
>
>> + switch (sdata->vif.type) {
>> + case NL80211_IFTYPE_STATION:
>> + if (sdata->u.mgd.use_4addr) {
>> + pn_offs = 30;
>> + break;
>> + }
>> + pn_offs = 24;
>> + break;
>> + case NL80211_IFTYPE_AP_VLAN:
>> + if (sdata->wdev.use_4addr) {
>> + pn_offs = 30;
>> + break;
>> + }
>> + /* fall through */
>> + case NL80211_IFTYPE_ADHOC:
>> + case NL80211_IFTYPE_AP:
>> + pn_offs = 24;
>> + break;
>> + default:
>> + return;
>> + }
>> +
>> + if (sta->sta.wme) {
>> + pn_offs += 2;
>> + }
>
> I think you just reinvented ieee80211_hdrlen(). No?
>
>> - if (fast_tx->pn_offs) {
>> - u64 pn;
>> - u8 *crypto_hdr = skb->data + fast_tx->pn_offs;
>
> No need to undo the pn_offs optimisation for the !txq case, you can
> pass it in to the new function that will fill it.
>
> However, you're still doing it wrong - now you haven't fixed anything
> for TKIP, which won't hit the fastpath.

well, we're getting there. the results of both patch attempts were
really nice, and brought encrypted performance with fq back into line
with unencrypted. Still running crypted tests as I write...

So fixing TKIP would be next, forcing the AP to use that? What other
scenarios do we have to worry about? WDS?


> johannes
> ___
> Make-wifi-fast mailing list
> make-wifi-f...@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org