from:"Dave Taht"

Re: [Make-wifi-fast] [PATCH RFC v5 3/4] mac80211: Add airtime accounting and scheduling to TXQs

2018-10-13 Thread Dave Taht

On Fri, Oct 12, 2018 at 12:38 AM Rajkumar Manoharan wrote:
>
> On 2018-10-11 03:38, Toke Høiland-Jørgensen wrote:
> > Rajkumar Manoharan  writes:
> >
> >> Hmm... mine is bit different. txqs are refilled only once for all
> >> txqs. It will give more opportunity for non-served txqs.
> >> drv_wake_tx_queue won't be called from may_tx as the driver anyway
> >> will not push packets in pull-mode.
> >
> > So, as far as I can tell, this requires the hardware to "keep trying"?
> > I.e., if it just stops scheduling a TXQ after may_transmit() returns
> > false, there is no guarantee that that TXQ will ever get re-awoken
> > unless a new packet arrives for it?
> >
> That is true and even now ath10k operates the same way in pull mode. Not
> just packet arrival, even napi poll routine tries to pushes the packets.
> One more thing, fetch indication may pull ~4ms/8ms of packets from each
> tid. This makes deficit too low and so refilling txqs by just
> airtime_weight becomes cumbersome. In may_transmit, the deficit are
> incremented by 20 * airtime_weight. In future this will be also replaced
> by station specific quantum. we can revisit this once BQL in place.
> Performance issue is resolved by this approach.
> Do you foresee any issues?

I'll have some time in the coming weeks to test this stuff. I'm mostly
interested in algorithmic correctness more than in the API changes...

Is there a version of these patches that is stable enough on ath9k or ath10k?

Do I foresee any issues? Jeeze, no, we *never* have any issues with wifi.

"fetch indication may pull ~4ms/8ms of packets from each tid"

made me really twitchy.
>
> #define IEEE80211_TXQ_MAY_TX_QUANTUM  20
> bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
> 				struct ieee80211_txq *txq)
> {
> 	struct ieee80211_local *local = hw_to_local(hw);
> 	struct txq_info *txqi = to_txq_info(txq);
> 	struct sta_info *sta;
> 	u8 ac = txq->ac;
>
> 	lockdep_assert_held(&local->active_txq_lock[ac]);
>
> 	if (!txqi->txq.sta)
> 		goto out;
>
> 	sta = container_of(txqi->txq.sta, struct sta_info, sta);
> 	if (sta->airtime[ac].deficit >= 0)
> 		goto out;
>
> 	list_for_each_entry(txqi, &local->active_txqs[ac], schedule_order) {
> 		if (!txqi->txq.sta)
> 			continue;
> 		sta = container_of(txqi->txq.sta, struct sta_info, sta);
> 		sta->airtime[ac].deficit +=
> 			(IEEE80211_TXQ_MAY_TX_QUANTUM * sta->airtime_weight);
> 	}
>
> 	return false;
>
> out:
> 	list_del_init(&txqi->schedule_order);
> 	return true;
> }
>
> -Rajkumar
> ___
> Make-wifi-fast mailing list
> make-wifi-f...@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
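
A user-space sketch of the refill rule in the quoted ieee80211_txq_may_transmit(), to see how it behaves when one station has just pulled a large burst. This is not mac80211 code: struct sim_sta and the sim_* helpers are hypothetical, the active-list handling and locking are left out, and the airtime_weight of 256 used in the example is only an assumed value. With that weight each refill adds 20 * 256 = 5120 us, so a station charged 8 ms is refused twice before it may send again, while every other active station is credited 10240 us along the way.

```c
/* Mirrors IEEE80211_TXQ_MAY_TX_QUANTUM from the quoted patch. */
#define SIM_MAY_TX_QUANTUM 20

/* Hypothetical stand-in for one station's per-AC airtime state. */
struct sim_sta {
	int deficit;        /* remaining airtime credit, in microseconds */
	int airtime_weight; /* per-station weight (example value: 256) */
};

/*
 * Same decision rule as the quoted function: a station with a
 * non-negative deficit may send; otherwise every active station is
 * topped up by QUANTUM * weight and the caller is told "no".
 */
static int sim_may_transmit(struct sim_sta *stas, int n, int idx)
{
	if (stas[idx].deficit >= 0)
		return 1;
	for (int i = 0; i < n; i++)
		stas[i].deficit += SIM_MAY_TX_QUANTUM * stas[i].airtime_weight;
	return 0;
}

/* Charge a station for airtime it consumed (e.g. one fetch indication). */
static void sim_charge(struct sim_sta *sta, int airtime_us)
{
	sta->deficit -= airtime_us;
}

/* Count how many refusals a station sees before it may send again. */
static int sim_refusals_until_allowed(struct sim_sta *stas, int n, int idx)
{
	int refusals = 0;

	/* Without a positive weight the deficit would never recover. */
	if (stas[idx].deficit < 0 && stas[idx].airtime_weight <= 0)
		return -1;
	while (!sim_may_transmit(stas, n, idx))
		refusals++;
	return refusals;
}
```

The larger each fetch is relative to the quantum, the more refusal rounds (and the more credit handed to everyone else) a single burst produces.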



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

Re: Tool to debug wifi pkt sniffs?

2018-10-10 Thread Dave Taht

On Wed, Oct 10, 2018 at 1:44 PM Ben Greear  wrote:
>
> On 10/10/2018 12:13 PM, Dave Taht wrote:
> > On Wed, Oct 10, 2018 at 10:10 AM Ben Greear  wrote:
> >>
> >> On 10/03/2018 01:29 PM, Dave Taht wrote:
> >>> On Wed, Oct 3, 2018 at 1:16 PM Toke Høiland-Jørgensen wrote:
> >>>>
> >>>> Ben Greear  writes:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I often find myself wanting to figure out what equipment is to blame
> >>>>> (and why) in a wifi environment.
> >>>>>
> >>>>> I am thinking of writing a tool that would parse a pcap file and look
> >>>>> at frames in enough detail to flag block-ack bugs, rate-ctrl bugs,
> >>>>> guess at the sniffer's capture ability, etc.
> >>>>>
> >>>>> Does anyone have anything already written that they would like to
> >>>>> share, or know of projects that might already do some of this?
> >>>>
> >>>> Not sure if this fits your criteria, but Sven's tool to create airtime
> >>>> charts from packet sniffing data immediately came to mind:
> >>>>
> >>>> https://github.com/cloudtrax/airtime-pie-chart
> >>>
> >>> I have used that. Oy, it's a PITA. Some of kathie's code over here
> >>> (example: https://github.com/pollere/pping ) uses the slightly less
> >>> painful http://libtins.github.io/ library for parsing packets.
> >>
> >> I couldn't find anything that did what I wanted, so I wrote my own.
> >>
> >> The (perl) code is in the wifi-diag directory of this public repo:
> >>
> >> https://github.com/greearb/lanforge-scripts
> >>
> >> The rest of the scripts in that repo are not related to the wifi-diag 
> >> script, so just ignore those.
> >>
> >> Here is example output for what I have so far:
> >>
> >> https://www.candelatech.com/oss/wifi-diag/netgear-up-5s/index.html
> >
> > I *miss* writing in perl. :)

I did take a quick look at the perl. It's been too long.
> >
> > My guess, from looking at that output, is that it was a udp flood test.
> > Do I win the internets?
>
> Yes, UDP upload test with 20 emulated stations, sending ~500 byte UDP frames.
> One thing we notice in the case we are debugging, is
> that the average time from transmitter station device receiving BA from the AP
> to the transmitter station device putting the next AMPDU frame on air
> is 0.728ms for the problem AP, and 0.448ms for the good AP.

I'm not big on averages. A cdf plot would show you if the delay was consistent
across the range or had a knee in it.
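
A minimal sketch of reading a CDF (or a few percentiles) off the BA-to-next-AMPDU gaps instead of the mean, assuming the gaps have already been extracted from the capture into an array of milliseconds. The function names are hypothetical and not part of wifi-diag. A knee would show up as a large jump between, say, the 50th and 90th percentiles even when two APs have similar averages.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
	double x = *(const double *)a, y = *(const double *)b;

	return (x > y) - (x < y);
}

/* Sort samples in place so percentiles and CDF points can be read off. */
static void cdf_prepare(double *samples, size_t n)
{
	qsort(samples, n, sizeof(*samples), cmp_double);
}

/* Nearest-rank percentile (pct in 0..100) of already-sorted samples. */
static double cdf_percentile(const double *sorted, size_t n, unsigned int pct)
{
	size_t rank;

	if (n == 0)
		return 0.0;		/* no samples: nothing meaningful to report */
	if (pct > 100)
		pct = 100;
	rank = ((size_t)pct * n + 99) / 100;	/* ceil(pct * n / 100) */
	if (rank == 0)
		rank = 1;		/* 0th percentile -> minimum */
	return sorted[rank - 1];
}

/* Fraction of samples <= x, i.e. the empirical CDF evaluated at x. */
static double cdf_at(const double *sorted, size_t n, double x)
{
	size_t count = 0;

	while (count < n && sorted[count] <= x)
		count++;
	return n ? (double)count / (double)n : 0.0;
}
```

Evaluating cdf_at() over a grid of x values gives the points for a plot; comparing the good and problem AP curves side by side shows whether the extra delay is uniform or concentrated in a tail.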

>
> I checked that the wmm config in the beacons for the two APs is the same.
>
> I am at a loss as to what could cause this delay, other than possibly the
> problem AP has a funky transmitter that puts a bit of extra noise on the
> air after it is done transmitting a frame?

A possible explanation would be garbage at one or more of the MCS rates
tried (and not successfully captured). Minstrel, at least, tries
multiple MCS rates.

>
> The problem AP also has 5% retransmits vs about 2% for the good AP, and
> problem AP is typically using MCS8 instead of MCS9, but even so, I do not
> see how that would explain the extra BA to AMPDU delay.

It's highly probable I'm misunderstanding you and would need to look
directly at the cap.

"typically using" says to me "more often trying the wrong rate"

>
> Thanks,
> Ben
>
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com
>



Re: Tool to debug wifi pkt sniffs?

2018-10-10 Thread Dave Taht

On Wed, Oct 10, 2018 at 10:10 AM Ben Greear  wrote:
>
> On 10/03/2018 01:29 PM, Dave Taht wrote:
> > On Wed, Oct 3, 2018 at 1:16 PM Toke Høiland-Jørgensen  wrote:
> >>
> >> Ben Greear  writes:
> >>
> >>> Hello,
> >>>
> >>> I often find myself wanting to figure out what equipment is to blame
> >>> (and why) in a wifi environment.
> >>>
> >>> I am thinking of writing a tool that would parse a pcap file and look at
> >>> frames in enough detail to flag block-ack bugs, rate-ctrl bugs, guess at
> >>> the sniffer's capture ability, etc.
> >>>
> >>> Does anyone have anything already written that they would like to share,
> >>> or know of projects that might already do some of this?
> >>
> >> Not sure if this fits your criteria, but Sven's tool to create airtime
> >> charts from packet sniffing data immediately came to mind:
> >>
> >> https://github.com/cloudtrax/airtime-pie-chart
> >
> > I have used that. Oy, it's a PITA. Some of kathie's code over here
> > (example: https://github.com/pollere/pping ) uses the slightly less
> > painful http://libtins.github.io/ library for parsing packets.
>
> I couldn't find anything that did what I wanted, so I wrote my own.
>
> The (perl) code is in the wifi-diag directory of this public repo:
>
> https://github.com/greearb/lanforge-scripts
>
> The rest of the scripts in that repo are not related to the wifi-diag script, 
> so just ignore those.
>
> Here is example output for what I have so far:
>
> https://www.candelatech.com/oss/wifi-diag/netgear-up-5s/index.html

I *miss* writing in perl. :)

My guess, from looking at that output, is that it was a udp flood test.
Do I win the internets?

>
> The general idea is to get a performance test going, and then use tshark or
> similar to grab a short sample (my script is slow, it can process only about
> 400 packets per second on my desktop, so a 5 sec capture at full speed takes
> around 5 minutes to process), and then pipe that decoded pcap into my script.
>
> It tries to pay attention to latencies between block-ack and next AMPDU frame,
> MCS distributions, packet-type distributions, retries, and other
> such things.  I'm guessing tweaking wmm settings (or changing QoS in the
> generated traffic) would be visible in this kind of metric, for instance.
>
> The goal is to be able to answer the question of why one AP is faster or
> slower than another when running the same test case.
>
> Feedback (and even patches) is welcome...what other things can I report that
> would be helpful?
>
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com
>



Re: [PATCH 9/9] mac80211: rc80211_minstrel: remove variance / stddev calculation

2018-10-06 Thread Dave Taht

On Sat, Oct 6, 2018 at 11:18 AM Felix Fietkau  wrote:
>
> On 2018-10-06 19:59, Dave Taht wrote:
> > On Sat, Oct 6, 2018 at 10:37 AM Felix Fietkau  wrote:
> >>
> >> When there are few packets (e.g. for sampling attempts), the exponentially
> >> weighted variance is usually vastly overestimated, making the resulting
> >> data essentially useless. As far as I know, there has not been any
> >> practical use for this, so let's not waste any cycles on it.
> >>
> >> Signed-off-by: Felix Fietkau 
> >> ---
> >>  net/mac80211/rc80211_minstrel.c|  6 -
> >>  net/mac80211/rc80211_minstrel.h| 26 +-
> >>  net/mac80211/rc80211_minstrel_debugfs.c| 14 
> >>  net/mac80211/rc80211_minstrel_ht_debugfs.c | 14 
> >>  4 files changed, 9 insertions(+), 51 deletions(-)
> >>
> >> diff --git a/net/mac80211/rc80211_minstrel.c b/net/mac80211/rc80211_minstrel.c
> >> index dead57ba9eac..a34e9c2ca626 100644
> >> --- a/net/mac80211/rc80211_minstrel.c
> >> +++ b/net/mac80211/rc80211_minstrel.c
> >> @@ -167,12 +167,6 @@ minstrel_calc_rate_stats(struct minstrel_rate_stats *mrs)
> >> if (unlikely(!mrs->att_hist)) {
> >> mrs->prob_ewma = cur_prob;
> >> } else {
> >> -   /* update exponential weighted moving variance */
> >> -   mrs->prob_ewmv = minstrel_ewmv(mrs->prob_ewmv,
> >> -   cur_prob,
> >> -   mrs->prob_ewma,
> >> -   EWMA_LEVEL);
> >> -
> >> /*update exponential weighted moving avarage */
> >> mrs->prob_ewma = minstrel_ewma(mrs->prob_ewma,
> >>cur_prob,
> >> diff --git a/net/mac80211/rc80211_minstrel.h b/net/mac80211/rc80211_minstrel.h
> >> index 54b2b2c3e10a..23ec953e3a24 100644
> >> --- a/net/mac80211/rc80211_minstrel.h
> >> +++ b/net/mac80211/rc80211_minstrel.h
> >> @@ -35,19 +35,6 @@ minstrel_ewma(int old, int new, int weight)
> >> return old + incr;
> >>  }
> >>
> >> -/*
> >> - * Perform EWMV (Exponentially Weighted Moving Variance) calculation
> >> - */
> >
> > I worry about this one. Where are you getting your proof from?
> I've done quite a few measurements myself to see if this can be usable
> for further rate control improvements or for the upcoming TPC work.
> The data this generates simply fluctuates wildly and incoherently based
> on the sampling behavior, making it completely useless.
> Together with Thomas (who introduced this code), I tried a few times to
> fix this, but couldn't find any way to make it coherent and usable.
>
> Thomas and I both agreed that it's better to just remove it until
> somebody has a better idea what to do.
>
> Also, this was only used for debugfs statistics, not for any actual rate
> control behavior.

OK, thanks. I'm totally delighted to see this patchset otherwise.

> - Felix
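
To see the effect Felix describes, the removed helpers can be run in user space. This is a sketch: the EWMA_LEVEL, EWMA_DIV and MINSTREL_SCALE values are my recollection of the header at the time and should be treated as assumptions (the diff context does not show them), the sim_* names are hypothetical, and probabilities use the 4096 == 100% fixed-point scale that MINSTREL_SCALE 12 implies. One success followed by one failed sampling attempt is enough to make the reported standard deviation jump to about 43%.

```c
#include <stdint.h>

/* Assumed to match rc80211_minstrel.h of this era. */
#define EWMA_LEVEL	96
#define EWMA_DIV	128
#define MINSTREL_SCALE	12
#define MINSTREL_TRUNC(val) ((val) >> MINSTREL_SCALE)

/* As in the minstrel_ewma() context shown in the diff. */
static int minstrel_ewma(int old, int new, int weight)
{
	int diff = new - old;
	int incr = (EWMA_DIV - weight) * diff / EWMA_DIV;

	return old + incr;
}

/* The minstrel_ewmv() helper that the patch removes. */
static int minstrel_ewmv(int old_ewmv, int cur_prob, int prob_ewma, int weight)
{
	int diff = cur_prob - prob_ewma;
	int incr = (EWMA_DIV - weight) * diff / EWMA_DIV;

	return weight * (old_ewmv + MINSTREL_TRUNC(diff * incr)) / EWMA_DIV;
}

/* Hypothetical subset of struct minstrel_rate_stats. */
struct sim_rate_stats {
	int prob_ewma;	/* delivery probability, 4096 == 100% */
	int prob_ewmv;
	int samples;
};

/* Same update order as minstrel_calc_rate_stats() before the patch. */
static void sim_calc_rate_stats(struct sim_rate_stats *s, int cur_prob)
{
	if (s->samples++ == 0) {
		s->prob_ewma = cur_prob;
		return;
	}
	s->prob_ewmv = minstrel_ewmv(s->prob_ewmv, cur_prob, s->prob_ewma,
				     EWMA_LEVEL);
	s->prob_ewma = minstrel_ewma(s->prob_ewma, cur_prob, EWMA_LEVEL);
}

/* Naive integer square root, standing in for the kernel's int_sqrt(). */
static unsigned int sim_int_sqrt(uint64_t x)
{
	uint64_t r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return (unsigned int)r;
}

/* Like the removed minstrel_get_ewmsd10(): std. deviation in 0.1% units.
 * Uses 64-bit math to avoid the overflow a 32-bit product could hit. */
static unsigned int sim_ewmsd10(const struct sim_rate_stats *s)
{
	uint64_t ewmv = (uint64_t)s->prob_ewmv;

	return sim_int_sqrt(MINSTREL_TRUNC(ewmv * 1000 * 1000));
}
```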




Re: [PATCH 9/9] mac80211: rc80211_minstrel: remove variance / stddev calculation

2018-10-06 Thread Dave Taht

On Sat, Oct 6, 2018 at 10:37 AM Felix Fietkau  wrote:
>
> When there are few packets (e.g. for sampling attempts), the exponentially
> weighted variance is usually vastly overestimated, making the resulting data
> essentially useless. As far as I know, there has not been any practical use
> for this, so let's not waste any cycles on it.
>
> Signed-off-by: Felix Fietkau 
> ---
>  net/mac80211/rc80211_minstrel.c|  6 -
>  net/mac80211/rc80211_minstrel.h| 26 +-
>  net/mac80211/rc80211_minstrel_debugfs.c| 14 
>  net/mac80211/rc80211_minstrel_ht_debugfs.c | 14 
>  4 files changed, 9 insertions(+), 51 deletions(-)
>
> diff --git a/net/mac80211/rc80211_minstrel.c b/net/mac80211/rc80211_minstrel.c
> index dead57ba9eac..a34e9c2ca626 100644
> --- a/net/mac80211/rc80211_minstrel.c
> +++ b/net/mac80211/rc80211_minstrel.c
> @@ -167,12 +167,6 @@ minstrel_calc_rate_stats(struct minstrel_rate_stats *mrs)
> if (unlikely(!mrs->att_hist)) {
> mrs->prob_ewma = cur_prob;
> } else {
> -   /* update exponential weighted moving variance */
> -   mrs->prob_ewmv = minstrel_ewmv(mrs->prob_ewmv,
> -   cur_prob,
> -   mrs->prob_ewma,
> -   EWMA_LEVEL);
> -
> /*update exponential weighted moving avarage */
> mrs->prob_ewma = minstrel_ewma(mrs->prob_ewma,
>cur_prob,
> diff --git a/net/mac80211/rc80211_minstrel.h b/net/mac80211/rc80211_minstrel.h
> index 54b2b2c3e10a..23ec953e3a24 100644
> --- a/net/mac80211/rc80211_minstrel.h
> +++ b/net/mac80211/rc80211_minstrel.h
> @@ -35,19 +35,6 @@ minstrel_ewma(int old, int new, int weight)
> return old + incr;
>  }
>
> -/*
> - * Perform EWMV (Exponentially Weighted Moving Variance) calculation
> - */

I worry about this one. Where are you getting your proof from?

> -static inline int
> -minstrel_ewmv(int old_ewmv, int cur_prob, int prob_ewma, int weight)
> -{
> -   int diff, incr;
> -
> -   diff = cur_prob - prob_ewma;
> -   incr = (EWMA_DIV - weight) * diff / EWMA_DIV;
> -   return weight * (old_ewmv + MINSTREL_TRUNC(diff * incr)) / EWMA_DIV;
> -}
> -
>  struct minstrel_rate_stats {
> /* current / last sampling period attempts/success counters */
> u16 attempts, last_attempts;
> @@ -56,11 +43,8 @@ struct minstrel_rate_stats {
> /* total attempts/success counters */
> u32 att_hist, succ_hist;
>
> -   /* statistis of packet delivery probability
> -*  prob_ewma - exponential weighted moving average of prob
> -*  prob_ewmsd - exp. weighted moving standard deviation of prob */
> +   /* prob_ewma - exponential weighted moving average of prob */
> u16 prob_ewma;
> -   u16 prob_ewmv;
>
> /* maximum retry counts */
> u8 retry_count;
> @@ -140,14 +124,6 @@ struct minstrel_debugfs_info {
> char buf[];
>  };
>
> -/* Get EWMSD (Exponentially Weighted Moving Standard Deviation) * 10 */
> -static inline int
> -minstrel_get_ewmsd10(struct minstrel_rate_stats *mrs)
> -{
> -   unsigned int ewmv = mrs->prob_ewmv;
> -   return int_sqrt(MINSTREL_TRUNC(ewmv * 1000 * 1000));
> -}
> -
>  extern const struct rate_control_ops mac80211_minstrel;
> void minstrel_add_sta_debugfs(void *priv, void *priv_sta, struct dentry *dir);
>
> diff --git a/net/mac80211/rc80211_minstrel_debugfs.c b/net/mac80211/rc80211_minstrel_debugfs.c
> index 698a668b5316..c8afd85b51a0 100644
> --- a/net/mac80211/rc80211_minstrel_debugfs.c
> +++ b/net/mac80211/rc80211_minstrel_debugfs.c
> @@ -70,14 +70,13 @@ minstrel_stats_open(struct inode *inode, struct file *file)
> p = ms->buf;
> p += sprintf(p, "\n");
> p += sprintf(p,
> -"best   __rate_
> statisticslast___sum-of\n");
> +"best   __rate_statistics___
> last___sum-of\n");
> p += sprintf(p,
> -"rate  [name idx airtime max_tp]  [avg(tp) avg(prob) 
> sd(prob)]  [retry|suc|att]  [#success | #attempts]\n");
> +"rate  [name idx airtime max_tp]  [avg(tp) avg(prob)]  
> [retry|suc|att]  [#success | #attempts]\n");
>
> for (i = 0; i < mi->n_rates; i++) {
> struct minstrel_rate *mr = &mi->r[i];
> struct minstrel_rate_stats *mrs = &mi->r[i].stats;
> -   unsigned int prob_ewmsd;
>
> *(p++) = (i == mi->max_tp_rate[0]) ? 'A' : ' ';
> *(p++) = (i == mi->max_tp_rate[1]) ? 'B' : ' ';
> @@ -93,15 +92,13 @@ minstrel_stats_open(struct inode *inode, struct file 
>

Re: Tool to debug wifi pkt sniffs?

2018-10-03 Thread Dave Taht

On Wed, Oct 3, 2018 at 1:16 PM Toke Høiland-Jørgensen  wrote:
>
> Ben Greear  writes:
>
> > Hello,
> >
> > I often find myself wanting to figure out what equipment is to blame
> > (and why) in a wifi environment.
> >
> > I am thinking of writing a tool that would parse a pcap file and look at
> > frames in enough detail to flag block-ack bugs, rate-ctrl bugs, guess at
> > the sniffer's capture ability, etc.
> >
> > Does anyone have anything already written that they would like to share,
> > or know of projects that might already do some of this?
>
> Not sure if this fits your criteria, but Sven's tool to create airtime
> charts from packet sniffing data immediately came to mind:
>
> https://github.com/cloudtrax/airtime-pie-chart

I have used that. Oy, it's a PITA. Some of kathie's code over here
(example: https://github.com/pollere/pping ) uses the slightly less
painful http://libtins.github.io/ library for parsing packets.


>
> -Toke



-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619

Re: [RFC] mac80211: budget outstanding airtime for transmission

2018-09-20 Thread Dave Taht

As a side note (good work!) - I would dearly like to visibly account
for management frames somewhere that can be seen from userspace.

Re: [Make-wifi-fast] [PATCH] ath10k: Re-enable TXQs for all devices

2017-11-09 Thread Dave Taht

On Thu, Nov 9, 2017 at 4:10 PM, Toke Høiland-Jørgensen  wrote:
> Rajkumar Manoharan  writes:
>
>>> Commit 4ca1807815aa6801aaced7fdefa9edacc2521767 disables the use of the
>>> mac80211 TXQs for some devices because of a theoretical throughput
>>> regression. We have not seen this regression for a while now, so it
>>> should be safe to re-enable TXQs.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen 
>>> ---
>>> This has been in LEDE trunk for a couple of months now with good results.
>>>
>> Toke,
>>
>> Good to know that the performance drop is not seen with the chips that
>> does not have push-pull support. The issue was originally reported with
>> ap152 + qca988x by community [1]. Hope this combination is also
>> considered in LEDE.
>
> Ah, was that the original bug report? Thank you, I have not been able to
> find that anywhere!
>
> The issue that seems to point to has been fixed a while ago; I'll send
> an updated patch with a better commit message (also forgot to cc the
> ath10k list, I see).
>
> -Toke

Hmm. I remember that thread. I thought we'd basically resolved that
issue (45% of the time spent in fq_codel_drop under udp flood)
back then, with Eric adding the batch drop fix to fq_codel itself:

See commit: 
https://osdn.net/projects/android-x86/scm/git/kernel/commits/9d18562a227874289fda8ca5d117d8f503f1dcca

which fixed up the problem beautifully:

https://lists.bufferbloat.net/pipermail/make-wifi-fast/2016-May/000590.html

So if we've been carrying this darn patch for ath10k for over a year
to work around something that we'd actually fixed elsewhere in the
stack... sigh.


Re: [Make-wifi-fast] [PATCH v3] mac80211: Dynamically set CoDel parameters per station

2017-04-13 Thread Dave Taht

On Thu, Apr 6, 2017 at 8:58 AM, Toke Høiland-Jørgensen  wrote:
> Eric Dumazet  writes:
>
>> On Thu, 2017-04-06 at 11:38 +0200, Toke Høiland-Jørgensen wrote:
>>
>>> +
>>> +	if (thr && thr < STA_SLOW_THRESHOLD * sta->local->num_sta) {
>>> +		sta->cparams.target = MS2TIME(50);
>>> +		sta->cparams.interval = MS2TIME(300);
>>> +		sta->cparams.ecn = false;
>>> +	} else {
>>> +		sta->cparams.target = MS2TIME(20);
>>> +		sta->cparams.interval = MS2TIME(100);
>>> +		sta->cparams.ecn = true;
>>> +	}
>>> +}
>>
>> Why ECN is flipped on/off like that ?
>
> The reasoning is that at really low bandwidths we're better off dropping
> the packet and getting potentially latency-sensitive data queued behind
> it through (see Dave's various rants with the topic "Packets have
> mass").

My general take on wifi is that if you are running at (and particularly,
stuck at) a low rate (sub-6 Mbit/s in the case of this code), you have
so many other problems like retransmits, interference, etc. in the
first place that the presence or absence of codel here is just a
small contributor to that noise.

We could leave ECN at whatever it is set to here and not flip it on or
off. It does seem sane to twiddle the parameters enough to make sure
codel doesn't trigger at less than one MTU's transmission time at the
achieved rate.
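
One way to read that: keep the default target, but raise it to at least the time one full-size packet takes at the station's current rate. A minimal sketch, assuming a 1500-byte MTU, rates in kbit/s, no MAC/PHY overhead, and an interval kept at 20x the target (the 5%-of-100ms ratio); the sim_* names are hypothetical and this is not the mac80211 code under review. At 6 Mbit/s one MTU takes 2 ms, under the 5 ms default; at 1 Mbit/s it takes 12 ms, so the target would rise to 12 ms and the interval to 240 ms.

```c
#include <stdint.h>

#define SIM_CODEL_DEFAULT_TARGET_US	5000u	/* 5% of a 100 ms interval */
#define SIM_MTU_BYTES			1500u	/* assumed MTU */

/* Time to serialize one packet at rate_kbps, ignoring all MAC overhead. */
static uint32_t sim_pkt_time_us(uint32_t bytes, uint32_t rate_kbps)
{
	if (rate_kbps == 0)
		return UINT32_MAX;
	return (uint32_t)((uint64_t)bytes * 8 * 1000 / rate_kbps);
}

/* Default target, but never below one MTU's airtime at this rate. */
static uint32_t sim_codel_target_us(uint32_t rate_kbps)
{
	uint32_t mtu_us = sim_pkt_time_us(SIM_MTU_BYTES, rate_kbps);

	return mtu_us > SIM_CODEL_DEFAULT_TARGET_US ?
		mtu_us : SIM_CODEL_DEFAULT_TARGET_US;
}

/* Keep the target at 5% of the interval, i.e. interval = 20 * target. */
static uint64_t sim_codel_interval_us(uint32_t rate_kbps)
{
	return 20ull * sim_codel_target_us(rate_kbps);
}
```

Unlike the two fixed parameter sets in the patch, this scales continuously with rate, so there is no threshold at which behaviour flips.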

>> ECN really should be an admin choice.
>
> Well, the trouble is that the mac80211 queues don't really have an admin
> interface currently. So it'll always use ECN (before this change).

Should we add a sysfs api to this?

>> Also, this change in parameters looks suspect to me, adding a bimodal
>> behavior. I would consult Kathleen and Van on this possibility.

It's sort of trimodal, actually. I think a more effective approach
would be to keep codel's default at the normal 5% of 100ms, bump it up
(as per the above) when we're having bad connectivity, and instead
tackle excessive retransmits harder and address the side impacts of
multicast, as much bigger parts of the problem.

> Yeah, I agree that it's somewhat of a hack from a theoretical point of
> view. I've also been experimenting with Kathy's recommended way of
> dealing with bursty MACs (turning off CoDel when there's less than an
> MTU's worth of data left), but have not had a lot of success with it.

I'm not in a position to resume trying myself.

> Guess I can go back and try some variants on that, unless someone else
> has better ideas?

Just as stuck as you are!

>
> -Toke

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

Re: Packet throughput (and those iperf data rate) with mac80211/ath9k is 20% worse than net80211/madwifi

2017-01-30 Thread Dave Taht

On Mon, Jan 30, 2017 at 8:17 AM, Toke Høiland-Jørgensen  wrote:
> Klaus Kinski  writes:
>
>> Hello all,
>>
>> this is a blast from the past, but something that still bothers me.
>> I have two systems with Atheros/QCA cards:
>>
>> System A:
>>   OS and driver: Linux 3.18.36 with last Madwifi/sample code from trunk
>>
>>   WLAN card: AR5413 (Senao EMP-8602 PLUS-S)
>>
>> System B:
>>   OS and driver: Linux 3.18.36 with mac80211/minstrel and ath9k from backports-4.2
>>
>>   WLAN card: AR9280 (Compex WLE200NX)
>>
>> While doing the performance measurements both systems are connected to a
>> reference system with a HF cable, so there should be no outside influences.
>>
>> Both systems are running in 802.11a mode on channel 40.
>> The following table shows 802.11 data packets sent from system A and B
>> generated by iperf in UDP mode over a 2s interval:
>
> What version of iperf, and configured to which rate? Some versions of
> iperf will send its traffic in very large bursts (see
> http://burntchrome.blogspot.se/2016/09/iperf3-and-microbursts.html?m=1)
> which could cause the queue inside ath9k to overflow (it is only 123
> packets pre-4.10).
>
> Did you try the latest mac80211/ath9k from 4.10? The queueing structure
> changed dramatically, which would impact this, at least if it's a queue
> overflow problem...

Packet captures would be helpful. Aircaps, if possible, also.

> -Toke




Re: [PATCH] mac80211: prevent skb/txq mismatch

2017-01-12 Thread Dave Taht

Yay! This sounds like a potential fix for this?

https://bugs.lede-project.org/index.php?do=details_id=368

Do all the ath10k chipsets excluded by commit:

4ca1807815aa6801aaced7fdefa9edacc2521767

still need to be excluded?

Re: [PATCH net-next] bridge: multicast to unicast

2017-01-10 Thread Dave Taht

On Tue, Jan 10, 2017 at 9:23 AM, Felix Fietkau <n...@nbd.name> wrote:
> On 2017-01-10 18:17, Dave Taht wrote:
>> In the case of wifi I have 3 issues with this line of thought.
>>
>> multicast in wifi has generally supposed to be unreliable. This makes
>> it reliable. reliability comes at a cost -
>>
>> multicast is typically set at a fixed low rate today. unicast is
>> retried at different rates until it succeeds - for every station
>> listening. If one station is already at the lowest rate, the total
>> cost of the transmit increases, rather than decreases.
>>
>> unicast gets block acks until it succeeds. Again, more delay.
>>
>> I think there is something like 31 soft-retries in the ath9k driver
> If I remember correctly, hardware retries are counted here as well.

I chopped this to something more reasonable but never got around to
quantifying it, so never pushed the patch. I figured I'd measure ATF
in a noisy environment (which I'd be doing now if it weren't for
https://bugs.lede-project.org/index.php?do=details_id=368 )
first.

>> what happens to diffserv markings here? for unicast CS1 goes into the
>> BE queue, CS6, the VO queue. Do we go from one flat queue for all of
>> multicast to punching it through one of the hardware queues based on
>> the diffserv mark now with this patch?

I meant CS1=BK here. Tracing the path through the bridge code made my
head hurt; I can go look at some aircaps to see if the mcast->unicast
conversion respects those markings or not (my vote is *not*).

>> I would like it if there was a way to preserve the unreliability
>> (which multiple mesh protocols depend on), send stuff with QoSNoack,
>> etc - or dynamically choose (based on the rates of the stations)
>> between conventional multicast and unicast.
>>
>> Or - better, IMHO, keep sending multicast as is but pick the best of
>> the rates available to all the listening stations for it.

> The advantage of the multicast-to-unicast conversion goes beyond simply
> selecting a better rate - aggregation matters a lot as well, and that is
> simply incompatible with normal multicast.

Except for the VO queue which cannot aggregate. And for that matter,
using any other hardware queue than BE tends to eat a txop that would
otherwise possibly be combined with an aggregate.

(and the VI queue has always misbehaved, long on my todo list)

> Some multicast streams use lots of small-ish packets, the airtime impact
> of those is vastly reduced, even if the transmission has to be
> duplicated for a few stations.

The question was basically how far it scales. Arguably, for a
very few, well connected stations, this patch would help. For a
network with more, and more badly connected, stations, I think it
would hurt.
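
A back-of-the-envelope sketch of that scaling question, counting only payload airtime. Preamble, ACK/BA, contention, retries and aggregation are all ignored: aggregation (Felix's point) would help the unicast side and retries would hurt it, and neither is modeled. The function names are hypothetical and the rates illustrative. A 1000-byte frame sent once as multicast at 6 Mbit/s costs about 1333 us; as three unicast copies at 54 Mbit/s it costs about 444 us, but if one of those three stations is stuck at 6 Mbit/s the total rises to about 1629 us, more than the single multicast.

```c
#include <stdint.h>

/* Payload-only airtime in microseconds at rate_kbps. */
static uint32_t sim_airtime_us(uint32_t bytes, uint32_t rate_kbps)
{
	if (rate_kbps == 0)
		return UINT32_MAX;
	return (uint32_t)((uint64_t)bytes * 8 * 1000 / rate_kbps);
}

/* One multicast transmission at a fixed (typically low) basic rate. */
static uint32_t sim_mcast_cost_us(uint32_t bytes, uint32_t basic_rate_kbps)
{
	return sim_airtime_us(bytes, basic_rate_kbps);
}

/* Multicast-to-unicast: one copy per listening station, each at its rate. */
static uint64_t sim_unicast_cost_us(uint32_t bytes, const uint32_t *rates_kbps,
				    int n)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++)
		total += sim_airtime_us(bytes, rates_kbps[i]);
	return total;
}
```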

What sorts of multicast traffic are being observed that flood the
network sufficiently to be worth optimizing out? arp? nd? upnp? mdns?
uftp? tv?

(My questions above are basically about trying to set up a sane
A/B test. I've been building up a new testbed in a noisy environment to
match the one I have in a quiet one, and don't have any "good" mcast
tests defined. Has anyone done an A/B test of this code with some
repeatable test already?)

(In my observations, the only truly heavy creator of a multicast
"burp" has tended to be upnp and mdns on smaller networks. Things like
nd and arp get more problematic as the number of stations goes up, too.
I can try things like abusing vlc or uftp to see what happens?)

I certainly agree multicast is a "problem" (I've seen 20-80% or more
of a given wifi network eaten by multicast) but I'm not convinced that
making it reliable, aggregatable unicast scales much past
basement-level testing of a few "good" stations, and I don't know which
protocols are making it worse, or worst, in typical environments.
Certainly Apple gear puts out a lot of multicast.

...

As best as I recall a recommendation in the 802.11-2012 standard was
that multicast packets be rate-limited so that you'd have a fixed
amount of crap after each beacon sufficient to keep the rest of the
unicast traffic flowing rapidly, instead of dumping everything into a
given beacon transmit.

That, combined with (maybe) picking the "best" union of known rates
per station, was essentially the strategy I'd intended[1] to pursue
for tackling the currently infinite wifi multicast queue: fq the
entries, have a fairly short queue (codel is not the best choice here),
drop from head, and limit the number of packets transmitted per beacon
to spread them out. That would solve the issue for sparse multicast
(dhcp etc), and smooth out the burps from bigger chunks while
impacting conventional unicast minimally.
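
A minimal sketch of the per-beacon release part of that strategy: a short head-drop queue for buffered multicast that hands out only a fixed number of packets after each beacon. The queue length (8) and per-beacon budget (3) are hypothetical placeholders, the fq layer and any rate selection are not modeled, and none of the names correspond to existing mac80211 code.

```c
/* Hypothetical tuning knobs; the thread gives no numbers. */
#define SIM_MCAST_QLEN		8	/* "a fairly short queue" */
#define SIM_MCAST_PER_BEACON	3	/* packets released after each beacon */

/* Single FIFO ring for buffered multicast; fq is not modeled. */
struct sim_mcast_q {
	int pkts[SIM_MCAST_QLEN];
	int head;		/* index of the oldest packet */
	int count;		/* packets currently queued */
	unsigned int drops;	/* packets dropped from the head */
};

/* Enqueue at the tail; when full, drop the oldest packet (head drop). */
static void sim_mcast_enqueue(struct sim_mcast_q *q, int pkt)
{
	if (q->count == SIM_MCAST_QLEN) {
		q->head = (q->head + 1) % SIM_MCAST_QLEN;
		q->count--;
		q->drops++;
	}
	q->pkts[(q->head + q->count) % SIM_MCAST_QLEN] = pkt;
	q->count++;
}

/*
 * Called once per beacon: release at most SIM_MCAST_PER_BEACON packets
 * into out[] so buffered multicast is spread across beacon intervals
 * instead of being dumped all at once. Returns the number released.
 */
static int sim_mcast_after_beacon(struct sim_mcast_q *q, int *out, int max_out)
{
	int n = 0;

	while (q->count > 0 && n < SIM_MCAST_PER_BEACON && n < max_out) {
		out[n++] = q->pkts[q->head];
		q->head = (q->head + 1) % SIM_MCAST_QLEN;
		q->count--;
	}
	return n;
}
```

Head drop keeps the freshest packets (the ones most likely to still matter), and a sparse stream like dhcp never hits either limit.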

There's also the pursuit of less multicast overall, at least in some protocols:

https://tools.ietf.org/html/draft-ietf-dnssd-hybrid-05

>
> - Felix

[1] but make-wifi-fast has been out of funding since August


Re: [PATCH net-next] bridge: multicast to unicast

2017-01-10 Thread Dave Taht

In the case of wifi I have 3 issues with this line of thought.

multicast in wifi has generally been supposed to be unreliable. This makes
it reliable. reliability comes at a cost -

multicast is typically set at a fixed low rate today. unicast is
retried at different rates until it succeeds - for every station
listening. If one station is already at the lowest rate, the total
cost of the transmit increases, rather than decreases.

unicast gets block acks until it succeeds. Again, more delay.

I think there is something like 31 soft-retries in the ath9k driver

what happens to diffserv markings here? for unicast CS1 goes into the
BE queue, CS6, the VO queue. Do we go from one flat queue for all of
multicast to punching it through one of the hardware queues based on
the diffserv mark now with this patch?

I would like it if there was a way to preserve the unreliability
(which multiple mesh protocols depend on), send stuff with QoSNoack,
etc - or dynamically choose (based on the rates of the stations)
between conventional multicast and unicast.

Or - better, IMHO, keep sending multicast as is but pick the best of
the rates available to all the listening stations for it.
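
One way to read "pick the best of the rates available to all the listening stations": intersect each station's set of rates it currently receives reliably, and send multicast at the fastest rate in that intersection. A hypothetical sketch; the bitmask encoding, and the assumption that a higher bit index means a faster rate, are mine and not mac80211's.

```c
#include <stdint.h>

/*
 * Each mask has bit i set if that station currently receives rate
 * index i reliably; a higher index is assumed to be a faster rate.
 * Returns the fastest index usable by every station, or -1 if none.
 */
static int sim_best_common_rate(const uint32_t *masks, int n)
{
	uint32_t common;
	int best = -1;

	if (n <= 0)
		return -1;
	common = masks[0];
	for (int i = 1; i < n; i++)
		common &= masks[i];
	for (int bit = 0; bit < 32; bit++)
		if (common & (UINT32_C(1) << bit))
			best = bit;
	return best;
}
```

A return of -1 means no single rate suits everyone, which is the point where falling back to the basic rate (or to unicast) would have to be decided.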

Has anyone actually looked at the effects of this with, say, 5-10
stations at middling to poor quality (longer distance), using something
to measure the real effect of the multicast conversion (uftp, mdns?)?

Re: scheduled scan interval

2016-11-21 Thread Dave Taht

On Mon, Nov 21, 2016 at 7:08 AM, Luca Coelho  wrote:
> Hi Arend,
>
> On Mon, 2016-11-21 at 13:03 +0100, Arend Van Spriel wrote:
>> On 21-11-2016 12:30, Arend Van Spriel wrote:
>> > On 21-11-2016 12:19, Arend Van Spriel wrote:
>> > > Hi Johannes, Luca,
>> > >
>> > > The gscan work made me look at scheduled scan and the implementation of
>> > > it in brcmfmac. The driver ignored the interval parameter from
>> > > user-space. Now I am fixing that. One thing is that our firmware has a
>> > > minimum interval which can not be indicated in struct wiphy. The other
>> > > issue is how the maximum interval is used in the nl80211.c.
>> > >
>> > > In nl80211_parse_sched_scan_plans() it is used against value passed in
>> > > NL80211_ATTR_SCHED_SCAN_INTERVAL and NL80211_SCHED_SCAN_PLAN_INTERVAL.
>> > > For the first one it caps the value to the maximum, but for the second
>> > > one it returns -EINVAL. I suspect this is done because maximum interval
>> > > was introduced with schedule scan plans, but it feels inconsistent.
>> >
>> > It also maybe simply wrong to cap. At least brcmfmac does not set the
>> > maximum so it will always get interval being zero. Maybe better to do:
>> >
>> > 	if (wiphy->max_sched_scan_plan_interval &&
>> > 	    request->scan_plans[0].interval >
>> > 	    wiphy->max_sched_scan_plan_interval)
>> > 		return -EINVAL;
>> >
>> > > Thoughts?
>>
>> Digging deeper. Looking at v4.3 before introduction of sched_scan_plans.
>> struct sched_scan_request::interval was specified in milliseconds! Below
>> the drivers that I see having scheduled scan support:
>>
>> iwlmvm: cap interval, convert to seconds.
>> ath6kl: cap to 1sec minimum, no max check, convert to seconds.
>> wl12xx: no checking in driver, fw need milliseconds.
>> wl18xx: no checking in driver, fw need milliseconds.
>>
>> The milliseconds conversion seems to be taken care of by multiplying
>> with MSEC_PER_SEC in wl{12,18}xx drivers.
>>
>> It seems in 4.8 only iwlmvm set wiphy->max_sched_scan_plan_interval so
>> other drivers will get interval of zero which only ath6kl handles.
>
> With the introduction of scheduled scan plans, we sort of deprecated
> the "generic" scheduled scan interval.  It doesn't make sense to have
> both passed at the same time, so nl80211 forbids
> NL80211_ATTR_SCHED_SCAN_INTERVAL if we pass
> NL80211_ATTR_SCHED_SCAN_PLANS.
>
> The original NL80211_ATTR_SCHED_SCAN_INTERVAL was specified in msecs,
> which is silly because we can never get millisecond accuracy in this.
> Thus, in the plans API, we decided to use seconds instead (because it
> makes much more sense).  Additionally, the interval is considered
> "advisory", because the FW may not be able guarantee the exact
> intervals (for instance, the iwlwifi driver actually starts the
> interval timer after scan completion, so if you specify 10 seconds
> intervals, in practice they'll be 13-14 seconds).
>
> I'm not sure I'm answering your question, because I'm also not sure I
> understood the question. :)

I'm not sure I understand the discussion and hooks myself, but
recent fixes for making channel scans from userspace saner ended up
here, after some discussion.

https://bugzilla.gnome.org/show_bug.cgi?id=766482

Anything that can reduce the impact of this behavior, I'm for!

http://www.taht.net/~d/channel_scans_destroying_latency_under_load_for_10s.png
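For reference, the two interval-handling behaviours Arend describes in the quoted discussion can be sketched as follows. This is a simplified model, not nl80211 code: the helper names are hypothetical, and the round-up conversion from the legacy millisecond attribute to seconds is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MSEC_PER_SEC 1000u /* same value as the kernel constant */

/* Legacy path: NL80211_ATTR_SCHED_SCAN_INTERVAL arrives in milliseconds.
 * Convert to whole seconds (rounding up, assumed) and cap to the driver
 * maximum. A driver that never sets the maximum (max_s == 0) gets 0. */
static uint32_t legacy_interval_capped(uint32_t interval_ms, uint32_t max_s)
{
    uint32_t secs = (interval_ms + MSEC_PER_SEC - 1) / MSEC_PER_SEC;

    if (secs > max_s)
        secs = max_s;
    return secs;
}

/* Arend's suggestion: only enforce the maximum when the driver advertises
 * one, and reject an out-of-range interval instead of silently capping it. */
static int plan_interval_check(uint32_t interval_s, uint32_t max_s)
{
    if (max_s && interval_s > max_s)
        return -EINVAL;
    return 0;
}
```

The first helper reproduces the "interval of zero" problem for drivers like brcmfmac that leave the maximum unset; the second lets such drivers keep the requested interval.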



>
> --
> Cheers,
> Luca.



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

make-wifi-fast linuxplumbers talk summarized on lwn.net

2016-11-08 Thread Dave Taht

and available here:

https://lwn.net/SubscriberLink/705884/1bdb9c4aa048b0d5/

After the talk, I discussed applying the same debloating techniques to
other chipsets with several folks.

I don't remember, unfortunately, who all those folk were, nor the
candidate chipsets!


Re: [RFC] mac80211: set wifi_acked[_valid] bits for transmitted SKBs

2016-10-27 Thread Dave Taht

On Thu, Oct 27, 2016 at 7:21 AM, Johannes Berg
 wrote:
> From: Johannes Berg 
>
> There may be situations in which the in-kernel originator of an
> SKB cares about its wifi transmission status. To have that, set
> the wifi_acked[_valid] bits before freeing/orphaning the SKB if
> the destructor is set. The originator can then use it in there.
>
> Signed-off-by: Johannes Berg 
> ---
>  net/mac80211/status.c | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/net/mac80211/status.c b/net/mac80211/status.c
> index ddf71c648cab..dc3132d0effe 100644
> --- a/net/mac80211/status.c
> +++ b/net/mac80211/status.c
> @@ -541,6 +541,11 @@ static void ieee80211_report_used_skb(struct ieee80211_local *local,
>  	} else if (info->ack_frame_id) {
>  		ieee80211_report_ack_skb(local, info, acked, dropped);
>  	}
> +
> +	if (!dropped && skb->destructor) {
> +		skb->wifi_acked_valid = 1;
> +		skb->wifi_acked = acked;
> +	}
>  }

One thing I've been curious about trying to take advantage of one day
is the pacing available from sch_fq, in a world where we are trying
to take advantage of the chocolatey goodness of the new TCP BBR
congestion control algorithm (sch_fq is apparently required for BBR
to work right).

By moving the fq_codel algorithm into the softmac layer, as we are
doing, we currently expose a "noqueue" interface to the qdisc layer.
That works great for routers, but for dual use (acting as a NAS host
and routing) it seems less than ideal.

Now it turns out that you can indeed slap the fq qdisc on top of the
new wifi intermediate queues code...

dave@nemesis:~/slashdot$ tc -s qdisc show dev wlp3s0
qdisc fq 8001: root refcnt 5 limit 1p flow_limit 100p buckets 1024
orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay
40.0ms
 Sent 30828141202 bytes 20530733 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  1127 flows (1127 inactive, 0 throttled)
  0 gc, 117 highprio, 714646 throttled

but since 1127 inactive flows have been sitting there for a day now and
don't show up in netstat, I suspect that somewhere in here we aren't
"retiring" skbs in a way the TCP stack understands.

root@nemesis:~/slashdot# tc qdisc del dev wlp3s0 root
root@nemesis:~/slashdot# tc -s qdisc show dev wlp3s0
qdisc noqueue 0: root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

>
>  /*
> --
> 2.9.3
>
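As a user-space sketch of what the quoted patch enables on the originator's side: when the skb is released, its destructor can tell "acked", "not acked" and "unknown" apart. Everything here is a hypothetical mock (`struct mock_skb`, `report_used_skb`, `count_ack_destructor`), not the kernel's `sk_buff` API; only the two field names and the `!dropped && destructor` condition come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock of the two sk_buff bits the patch sets, plus a destructor hook. */
struct mock_skb {
    unsigned int wifi_acked_valid : 1;
    unsigned int wifi_acked : 1;
    void (*destructor)(struct mock_skb *skb);
};

static int acked_count, unacked_count;

/* Example originator callback: only trust wifi_acked when wifi_acked_valid is set. */
static void count_ack_destructor(struct mock_skb *skb)
{
    if (!skb->wifi_acked_valid)
        return; /* dropped, or no status available */
    if (skb->wifi_acked)
        acked_count++;
    else
        unacked_count++;
}

/* Mirrors the logic the RFC adds to ieee80211_report_used_skb(). */
static void report_used_skb(struct mock_skb *skb, bool acked, bool dropped)
{
    assert(skb != NULL);
    if (!dropped && skb->destructor) {
        skb->wifi_acked_valid = 1;
        skb->wifi_acked = acked;
    }
    if (skb->destructor)
        skb->destructor(skb); /* stands in for freeing/orphaning the skb */
}
```

A dropped frame leaves `wifi_acked_valid` clear, so the callback counts neither an ACK nor a missing ACK for it.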


Re: Bayesian rate control

2016-10-24 Thread Dave Taht

On Sun, Oct 23, 2016 at 6:57 AM, Björn Smedman  wrote:
> Hi all,
>
> I've been thinking about rate control a bit lately. I've written up
> some of my thoughts in a blog post
> (http://www.openias.org/bayesian-wifi-rate-control), but very briefly

It is nice to see some newer thinking here.

> put I'd like to build a rate control algorithm based on Bayesian
> statistical inference, possibly by modeling the rate control problem
> as a "multi-armed bandit" problem and/or using Thompson sampling.

The paper on minstrel's design was never widely published. I linked to it here:

http://blog.cerowrt.org/post/minstrel/

Looking harder at rate control has long been on my todo list, but at
the top of my list to finish first has been the fair queuing
(fq_codel) and airtime fairness work.

https://blog.tohojo.dk/2016/06/fixing-the-wifi-performance-anomaly-on-ath9k.html#results

http://blog.cerowrt.org/post/real_results

Once you are statistically hitting more stations, more often, on a
more regular basis, with smaller txops, I felt that many things that
were perceived as rate control problems would go away, and other
things would become easier.

A basic "fix" to minstrel is to sample opportunistically (which, as far
as I know, minstrel-blues does) rather than at a fixed rate.
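For contrast with minstrel's fixed-rate sampling, here is a minimal sketch of the Thompson-sampling idea Björn describes: keep a Beta posterior on each rate's delivery probability, draw one sample per rate, and send at the rate whose draw maximises expected goodput. It is not minstrel or any driver's code; the rate table (802.11n MCS0-7, 20 MHz), the delivery probabilities, the 64-sample decay window and the xorshift PRNG are all assumptions made for the simulation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NRATES 8
#define WINDOW 64 /* assumed: halve counts past this so old evidence fades */

struct rate_stats {
    unsigned succ, fail; /* posterior on delivery probability: Beta(succ+1, fail+1) */
};

static uint32_t rng_state = 2463534242u; /* deterministic xorshift32, for reproducibility */
static double uniform01(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return (rng_state >> 8) / 16777216.0; /* 24 random bits -> [0, 1) */
}

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Beta(k, n+1-k) for integer parameters is the k-th smallest of n uniforms. */
static double beta_sample(unsigned succ, unsigned fail)
{
    double u[WINDOW + 1];
    unsigned n = succ + fail + 1, k = succ + 1, i;

    assert(n <= WINDOW + 1);
    for (i = 0; i < n; i++)
        u[i] = uniform01();
    qsort(u, n, sizeof(u[0]), cmp_double);
    return u[k - 1];
}

/* Thompson sampling: pick the rate maximising (sampled delivery prob) * rate. */
static int pick_rate(const struct rate_stats *st, const double *rate_mbps)
{
    int i, best = 0;
    double best_val = -1.0;

    for (i = 0; i < NRATES; i++) {
        double v = beta_sample(st[i].succ, st[i].fail) * rate_mbps[i];
        if (v > best_val) {
            best_val = v;
            best = i;
        }
    }
    return best;
}

static void record_tx(struct rate_stats *s, int acked)
{
    if (acked)
        s->succ++;
    else
        s->fail++;
    if (s->succ + s->fail > WINDOW) {
        s->succ /= 2;
        s->fail /= 2;
    }
}

/* Simulate `frames` transmissions over a channel with fixed per-rate success
 * probabilities; picks[i] counts how often rate i was chosen. */
static void simulate(const double *rate_mbps, const double *p_true, int frames, int *picks)
{
    struct rate_stats st[NRATES] = {{0, 0}};
    int f, i;

    for (i = 0; i < NRATES; i++)
        picks[i] = 0;
    for (f = 0; f < frames; f++) {
        int r = pick_rate(st, rate_mbps);
        picks[r]++;
        record_tx(&st[r], uniform01() < p_true[r]);
    }
}
```

With delivery probabilities of 0.99, 0.98, 0.95, 0.90, 0.80, 0.50, 0.20 and 0.05, the 39 Mbit/s rate (expected goodput 31.2 Mbit/s) ends up chosen far more often than any other, while sampling of the neighbouring rates happens only as often as their posteriors remain plausible.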

btw: I called my early (unpublished) attempt at a "minstrel-2", "bard". :)

The now-enormous search space is a big problem in present-day
minstrel, followed by excessive retries and latency when sampling; and
hidden stations are becoming more and more of a problem as densities
go up. (There is a long list of minstrel issues at the first link I
posted above.)

> A couple of questions for the list:
>
> 1. Is there anybody else out there thinking along similar lines?

Yes and no. At the moment I am thinking about the insights from the
TCP "BBR" work Google just published (paywalled, but at
http://queue.acm.org/app/ ), where they also point to max-plus algebra
as being helpful for solving the problems they had.

> I'd very much like to find collaborators interested in working on
> this. It could serve as a pretty nice master's thesis problem, for
> example.

Please join us over on the make-wifi-fast list. There are more than a
few good papers to be had out of it.

>
> 2. What would be the best hardware/software stack to base this work on?

Presently ath9k is the only game in town, and developing/debugging on
x86 is the easiest.

> I'm thinking the best driver for rate control experimentation would be
> ath9k, right? If so then a TP-Link TL-WA901ND router (apparently based
> on Qualcomm QCA956x SOC) with OpenWrt, and a TP-Link TL-WDN4800 PCIe
> card (apparently based on Atheros AR9380 with PCI ID 168c:0030) for my
> desktop sounds like a good combo, no? But would I have to run a custom
> kernel on my desktop then (or can I somehow get by with an Ubuntu
> standard kernel)?

These days I am using a PC Engines APU2 as my primary x86 testbed, with
ath9k and ath10k cards in it (and one day mt72). The new Turris Omnia
also looks like a good platform. I've been trying to use hardware newer
than AR92xx there.

Another box I really like is the ubnt uap-lite.

Prior to all those, it was the wndr3800, archer c7v2, and nanostation
m5s for outdoor work.

>
> Any other thoughts or pointers are also more than welcome.
>
> Many thanks,
>
> Björn Smedman


Re: [PATCH v3] ath10k: implement NAPI support

2016-08-26 Thread Dave Taht

On Fri, Aug 26, 2016 at 4:12 AM, Johannes Berg
<johan...@sipsolutions.net> wrote:
> On Fri, 2016-08-26 at 03:48 -0700, Dave Taht wrote:
>> I'm always rather big on people testing latency under load, and napi
>> tends to add some.
>
> That's a completely useless comment.
>
> Obviously, everybody uses NAPI; it's necessary for system load and thus
> performance, and lets drivers take advantage of TCP merging to reduce
> ACKs, which is tremendously helpful (over wifi in particular.)
>
> Please stop making such drive-by comments that focus only on the single
> thing you find important above all; not all people can care only about
> that single thing, and unconstructively reiterating it over and over
> doesn't help.

Well, I apologize for being testy. I spent a lot of time testing
Michal's patchset for ath10k back in May, and I *will* go and retest
ath10k when these patches land. My principal concern with NAPI is how
it behaves at lower rates than the maxes typically reported in a
patchset.

But it would be nice if people always tested for latency under load
when making improvements, before the patches get to me. Despite our
having made a very comprehensive test suite (flent) available that
tests all sorts of things for wifi, getting people to actually use it
to see real problems (in addition to latency under load!) while their
fingers are still hot in the codebase, and to track and plot their
results, remains an ongoing issue across the entire industry.

http://blog.cerowrt.org/post/fq_codel_on_ath10k/

There are many other problems in wifi, of course, that engineers
could stand to internalize, like airtime fairness and the
misbehavior of the hardware queues,

http://blog.cerowrt.org/post/cs5_lockout/

wifi channel scans

http://blog.cerowrt.org/post/disabling_channel_scans/

and so on.

I have a ton more datasets and blog entries left to write up from the
ath9k work thus far, which point to some other issues (minstrel,
aggregation, retries).

> Thanks,
> johannes


Re: [PATCH v3] ath10k: implement NAPI support

2016-08-26 Thread Dave Taht

I'm always rather big on people testing latency under load, and napi
tends to add some.

Re: [Make-wifi-fast] [PATCH v2] mac80211: Move crypto IV generation to after TXQ dequeue.

2016-08-17 Thread Dave Taht

On Wed, Aug 17, 2016 at 9:49 PM, Johannes Berg
 wrote:
> Hi,
>
> You need to work on coding style, a lot of your indentation is
> completely messed up.
>
>> + switch (sdata->vif.type) {
>> + case NL80211_IFTYPE_STATION:
>> + if (sdata->u.mgd.use_4addr) {
>> + pn_offs = 30;
>> + break;
>> + }
>> + pn_offs = 24;
>> + break;
>> + case NL80211_IFTYPE_AP_VLAN:
>> + if (sdata->wdev.use_4addr) {
>> + pn_offs = 30;
>> + break;
>> + }
>> + /* fall through */
>> + case NL80211_IFTYPE_ADHOC:
>> + case NL80211_IFTYPE_AP:
>> + pn_offs = 24;
>> + break;
>> + default:
>> + return;
>> + }
>> +
>> + if (sta->sta.wme) {
>> + pn_offs += 2;
>> + }
>
> I think you just reinvented ieee80211_hdrlen(). No?
>
>> - if (fast_tx->pn_offs) {
>> - u64 pn;
>> - u8 *crypto_hdr = skb->data + fast_tx->pn_offs;
>
> No need to undo the pn_offs optimisation for the !txq case, you can
> pass it in to the new function that will fill it.
>
> However, you're still doing it wrong - now you haven't fixed anything
> for TKIP, which won't hit the fastpath.

Well, we're getting there. The results of both patch attempts were
really nice, and brought encrypted performance with fq back into line
with unencrypted. Still running encrypted tests as I write...

So fixing TKIP would be next, testing it by forcing the AP to use TKIP?
What other scenarios do we have to worry about? WDS?
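Johannes's point in the quoted review is that the PN offset the patch computes case by case is just the length of the 802.11 data header in front of the crypto header. A minimal sketch of that arithmetic; `pn_offset` is a hypothetical helper, and it ignores the 4-byte HT Control field that ieee80211_hdrlen() also accounts for when the Order bit is set on QoS data frames.

```c
#include <assert.h>
#include <stdbool.h>

/* Offset of the CCMP/GCMP header (which carries the PN) = 802.11 header length. */
static unsigned pn_offset(bool four_addr, bool qos)
{
    unsigned len = 24; /* frame control 2 + duration 2 + addr1-3 18 + seq ctrl 2 */

    if (four_addr)
        len += 6; /* addr4: 4-address / WDS frames (use_4addr in the patch) */
    if (qos)
        len += 2; /* QoS Control field (sta->sta.wme in the patch) */
    return len;
}
```

These are exactly the 24, 26, 30 and 32 byte cases the patch's switch statement enumerates.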


> johannes
> ___
> Make-wifi-fast mailing list
> make-wifi-f...@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast




Re: [PATCH] ath10k: disable wake_tx_queue for older devices

2016-08-04 Thread Dave Taht

On Thu, Aug 4, 2016 at 12:07 PM, Roman Yeryomin <leroi.li...@gmail.com> wrote:
> On 1 August 2016 at 12:04, Dave Taht <dave.t...@gmail.com> wrote:
>> On Mon, Aug 1, 2016 at 1:35 AM, Roman Yeryomin <leroi.li...@gmail.com> wrote:
>>> On 7 July 2016 at 19:30, Valo, Kalle <kv...@qca.qualcomm.com> wrote:
>>>> Michal Kazior <michal.kaz...@tieto.com> writes:
>>>>
>>>>> Ideally wake_tx_queue should be used regardless as
>>>>> it is a requirement for reducing bufferbloat and
>>>>> implementing airtime fairness in the future.
>>>>>
>>>>> However some setups (typically low-end platforms
>>>>> hosting QCA988X) suffer performance regressions
>>>>> with the current wake_tx_queue implementation.
>>>>> Therefore disable it unless it is really
>>>>> beneficial with current codebase (which is when
>>>>> firmware supports smart pull-push tx scheduling).
>>>>>
>>>>> Signed-off-by: Michal Kazior <michal.kaz...@tieto.com>
>>>>
>>>> I think it's too late to send this to 4.7 anymore (and this due to my
>>>> vacation). So I'm planning to queue this to 4.8, but if the feedback is
>>>> positive we can always send this to a 4.7 stable release.
>>>>
>>>
>>> Sorry guys, drowned.
>>> So, yes, applying this patch does the job. That is gets me to the
>>> results similar to
>>> https://lists.openwrt.org/pipermail/openwrt-devel/2016-May/041448.html
>>>
>>> Going to try latest code on same system...
>>
>> Can you try increasing the quantum to 1514, and reducing the codel
>> target to 5ms? (without this patch?)
>>
>
> So it was 1514 already...

Based on some testing of 20, the codel target should be 5ms, and it isn't.

https://github.com/torvalds/linux/commit/5caa328e3811b7cfa33fd02c93280ffa622deb0e
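Why the target matters: CoDel only starts dropping once the queueing delay has stayed above `target` for a full `interval`, so a standing queue shorter than the target is never touched. A simplified sketch, loosely after RFC 8289 and not the mac80211 implementation: it omits the backlog/MTU check, makes at most one drop decision per packet, and feeds packets with a constant sojourn time; the 20 ms and 5 ms targets and the 100 ms interval are chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct codel_sketch { /* all times in microseconds */
    uint64_t target, interval;
    uint64_t first_above_time; /* 0 = delay currently below target */
    uint64_t drop_next;        /* next scheduled drop while dropping */
    uint32_t count;            /* drops in this dropping episode */
    bool dropping;
};

static uint64_t isqrt64(uint64_t x) /* floor(sqrt(x)), bit-by-bit */
{
    uint64_t r = 0, bit = (uint64_t)1 << 62;

    while (bit > x)
        bit >>= 2;
    while (bit) {
        if (x >= r + bit) {
            x -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return r;
}

/* t + interval / sqrt(count), with sqrt scaled by 1024 to stay in integers. */
static uint64_t control_law(uint64_t t, uint64_t interval, uint32_t count)
{
    return t + interval * 1024 / isqrt64((uint64_t)count << 20);
}

static bool ok_to_drop(struct codel_sketch *c, uint64_t sojourn, uint64_t now)
{
    if (sojourn < c->target) {
        c->first_above_time = 0;
        return false;
    }
    if (c->first_above_time == 0) {
        c->first_above_time = now + c->interval;
        return false;
    }
    return now >= c->first_above_time;
}

static bool codel_should_drop(struct codel_sketch *c, uint64_t sojourn, uint64_t now)
{
    bool ok = ok_to_drop(c, sojourn, now);

    if (c->dropping) {
        if (!ok) {
            c->dropping = false; /* delay is back under target */
            return false;
        }
        if (now >= c->drop_next) {
            c->count++;
            c->drop_next = control_law(c->drop_next, c->interval, c->count);
            return true;
        }
        return false;
    }
    if (ok) {
        c->dropping = true;
        c->count = 1;
        c->drop_next = control_law(now, c->interval, c->count);
        return true;
    }
    return false;
}

/* One packet per millisecond, constant sojourn; returns drops, first drop index in *first_drop. */
static unsigned run_constant_delay(uint64_t target_us, uint64_t sojourn_us,
                                   unsigned npkts, int *first_drop)
{
    struct codel_sketch c = { .target = target_us, .interval = 100000 };
    unsigned i, drops = 0;

    *first_drop = -1;
    for (i = 0; i < npkts; i++)
        if (codel_should_drop(&c, sojourn_us, (uint64_t)i * 1000) && drops++ == 0)
            *first_drop = (int)i;
    return drops;
}
```

With a steady 10 ms of queueing delay, a 20 ms target never drops anything, while a 5 ms target starts dropping after one 100 ms interval.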

> Regards,
> Roman
>
> ___
> ath10k mailing list
> ath...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath10k



--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] ath10k: disable wake_tx_queue for older devices

2016-08-01 Thread Dave Taht

On Mon, Aug 1, 2016 at 1:35 AM, Roman Yeryomin  wrote:
> On 7 July 2016 at 19:30, Valo, Kalle  wrote:
>> Michal Kazior  writes:
>>
>>> Ideally wake_tx_queue should be used regardless as
>>> it is a requirement for reducing bufferbloat and
>>> implementing airtime fairness in the future.
>>>
>>> However some setups (typically low-end platforms
>>> hosting QCA988X) suffer performance regressions
>>> with the current wake_tx_queue implementation.
>>> Therefore disable it unless it is really
>>> beneficial with current codebase (which is when
>>> firmware supports smart pull-push tx scheduling).
>>>
>>> Signed-off-by: Michal Kazior 
>>
>> I think it's too late to send this to 4.7 anymore (and this due to my
>> vacation). So I'm planning to queue this to 4.8, but if the feedback is
>> positive we can always send this to a 4.7 stable release.
>>
>
> Sorry guys, drowned.
> So, yes, applying this patch does the job. That is gets me to the
> results similar to
> https://lists.openwrt.org/pipermail/openwrt-devel/2016-May/041448.html
>
> Going to try latest code on same system...

Can you try increasing the quantum to 1514, and reducing the codel
target to 5ms? (without this patch?)

>
> Regards,
> Roman
>




Re: TCP performance regression in mac80211 triggered by the fq code

2016-07-18 Thread Dave Taht

Just to add another datapoint: the "RACK" optimization for TCP entered
the kernel recently. It has some "interesting" timing- and
batching-sensitive behaviors. While the TSO case is described, the
packet aggregation case, which seems similar, is not.

https://www.ietf.org/proceedings/96/slides/slides-96-tcpm-3.pdf


10 Jan 2016

https://kernelnewbies.org/Linux_4.4#head-2583c31a65e6592bef9af426a78940078df7f630

The draft was significantly updated this month.

https://tools.ietf.org/html/draft-cheng-tcpm-rack-01
 -- Andrew Shewmaker

On Mon, Jul 18, 2016 at 2:49 PM, Toke Høiland-Jørgensen  wrote:
> Toke Høiland-Jørgensen  writes:
>
>> Felix Fietkau  writes:
>>
>>> Hi,
>>>
>>> With Toke's ath9k txq patch I've noticed a pretty nasty performance
>>> regression when running local iperf on an AP (running the txq stuff) to
>>> a wireless client.
>>>
>>> Here's some things that I found:
>>> - when I use only one TCP stream I get around 90-110 Mbit/s
>>> - when running multiple TCP streams, I get only 35-40 Mbit/s total
>>> - fairness between TCP streams looks completely fine
>>> - there's no big queue buildup, the code never actually drops any packets
>>> - if I put a hack in the fq code to force the hash to a constant value
>>> (effectively disabling fq without disabling codel), the problem
>>> disappears and even multiple streams get proper performance.
>>>
>>> Please let me know if you have any ideas.
>>
>> Hmm, I see two TCP streams get about the same aggregate throughput as
>> one, both when started from the AP and when started one hop away.
>
> So while I have still not been able to reproduce the issue you
> described, I have seen something else that is at least puzzling, and may
> or may not be related:
>
> When monitoring the output of /sys/kernel/debug/ieee80211/phy0/aqm I see
> that all stations have their queues empty all the way to zero several
> times per second. This is a bit puzzling; the queue should be kept under
> control, but really shouldn't empty completely. I figure this might also
> be the reason why you're seeing degraded performance...
>
> Since the stats output doesn't include a counter for drops, I haven't
> gotten any further with figuring out if it's CoDel that's being too
> aggressive, or what is happening. But will probably add that in and take
> another look.
>
> -Toke




Re: TCP performance regression in mac80211 triggered by the fq code

2016-07-13 Thread Dave Taht

On Wed, Jul 13, 2016 at 10:53 AM, Felix Fietkau  wrote:

>> To me this implies a contending lock issue, too much work in the irq
>> handler or too delayed work in the softirq handler
>>
>> I thought you were very brave to try and backport this.
> I don't think this has anything to do with contending locks, CPU
> utilization, etc. The code does something to the packets that TCP really
> doesn't like.

With your 70% idle figure, I am inclined to agree. Could you get an
aircap of the two different tests, as well as a regular packet capture
taken at the client or server, and put them somewhere I can get at them?

What version of OSX are you running?

I will set up an ath9k box shortly...


Re: TCP performance regression in mac80211 triggered by the fq code

2016-07-13 Thread Dave Taht

On Tue, Jul 12, 2016 at 4:02 PM, Dave Taht <dave.t...@gmail.com> wrote:
> On Tue, Jul 12, 2016 at 3:21 PM, Felix Fietkau <n...@nbd.name> wrote:
>> On 2016-07-12 14:13, Dave Taht wrote:
>>> On Tue, Jul 12, 2016 at 12:09 PM, Felix Fietkau <n...@nbd.name> wrote:
>>>> Hi,
>>>>
>>>> With Toke's ath9k txq patch I've noticed a pretty nasty performance
>>>> regression when running local iperf on an AP (running the txq stuff) to
>>>> a wireless client.
>>>
>>> Your kernel? cpu architecture?
>> QCA9558, 720 MHz, running Linux 4.4.14

So this is a single core at the near-bottom end of the range. I guess
we should also find a MIPS 24c derivative that runs at 400MHz or so.

What HZ? (I no longer know how much higher HZ settings make any
difference, but I'm usually at NOHZ and 250, rather than 100.)

And all the testing to date was on much higher end multi-cores.

>>> What happens when going through the AP to a server from the wireless client?
>> Will test that next.

And?

>>
>>> Which direction?
>> AP->STA, iperf running on the AP. Client is a regular MacBook Pro
>> (Broadcom).
>
> There are always 2 wifi chips in play. Like the Sith.
>
>>>> Here's some things that I found:
>>>> - when I use only one TCP stream I get around 90-110 Mbit/s
>>>
>>> with how much cpu left over?
>> ~20%
>>
>>>> - when running multiple TCP streams, I get only 35-40 Mbit/s total
>>> with how much cpu left over?
>> ~30%

To me this implies lock contention, too much work in the irq
handler, or overly delayed work in the softirq handler.

I thought you were very brave to try and backport this.

>
> Hmm.
>
> Care to try netperf?
>
>>
>>> context switch difference between the two tests?
>> What's the easiest way to track that?
>
> if you have gnu "time" time -v the_process
>
> or:
>
> perf record -e context-switches -ag
>
> or: process /proc/$PID/status for cntx
>
>>> tcp_limit_output_bytes is?
>> 262144
>
> I keep hoping to be able to reduce this to something saner like 4096
> one day. It got bumped to 64k based on bad wifi performance once, and
> then to its current size to make the Xen folk happier.
>
> The other param I'd like to see fiddled with is tcp_notsent_lowat.
>
> In both cases, reductions will increase your context switches but
> reduce memory pressure and lead to a more reactive TCP.
>
> And in neither case do I think this is the real cause of this problem.
>
>
>>> got perf?
>> Need to make a new build for that.
>>
>>>> - fairness between TCP streams looks completely fine
>>>
>>> A codel will get to long term fairness pretty fast. Packet captures
>>> from a fq will show much more regular interleaving of packets,
>>> regardless.
>>>
>>>> - there's no big queue buildup, the code never actually drops any packets
>>>
>>> A "trick" I have been using to observe codel behavior has been to
>>> enable ecn on server and client, then checking in wireshark for ect(3)
>>> marked packets.
>> I verified this with printk. The same issue already appears if I have
>> just the fq patch (with the codel patch reverted).
>
> OK. A four flow test "should" trigger codel
>
> Running out of cpu (or hitting some other bottleneck), without
> loss/marking "should" result in a tcptrace -G and xplot.org of the
> packet capture showing the window continuing to increase
>
>
>>>> - if I put a hack in the fq code to force the hash to a constant value
>>>
>>> You could also set "flows" to 1 to keep the hash being generated, but
>>> not actually use it.
>>>
>>>> (effectively disabling fq without disabling codel), the problem
>>>> disappears and even multiple streams get proper performance.
>>>
>>> Meaning you get 90-110Mbits ?
>> Right.
>>
>>> Do you have a "before toke" figure for this platform?
>> It's quite similar.
>>
>>>> Please let me know if you have any ideas.
>>>
>>> I am in berlin, packing hardware...
>> Nice!
>>
>> - Felix
>>
>
>
>




Re: TCP performance regression in mac80211 triggered by the fq code

2016-07-12 Thread Dave Taht

On Tue, Jul 12, 2016 at 3:21 PM, Felix Fietkau <n...@nbd.name> wrote:
> On 2016-07-12 14:13, Dave Taht wrote:
>> On Tue, Jul 12, 2016 at 12:09 PM, Felix Fietkau <n...@nbd.name> wrote:
>>> Hi,
>>>
>>> With Toke's ath9k txq patch I've noticed a pretty nasty performance
>>> regression when running local iperf on an AP (running the txq stuff) to
>>> a wireless client.
>>
>> Your kernel? cpu architecture?
> QCA9558, 720 MHz, running Linux 4.4.14
>
>> What happens when going through the AP to a server from the wireless client?
> Will test that next.
>
>> Which direction?
> AP->STA, iperf running on the AP. Client is a regular MacBook Pro
> (Broadcom).

There are always 2 wifi chips in play. Like the Sith.

>>> Here's some things that I found:
>>> - when I use only one TCP stream I get around 90-110 Mbit/s
>>
>> with how much cpu left over?
> ~20%
>
>>> - when running multiple TCP streams, I get only 35-40 Mbit/s total
>> with how much cpu left over?
> ~30%

Hmm.

Care to try netperf?

>
>> context switch difference between the two tests?
> What's the easiest way to track that?

If you have GNU time: /usr/bin/time -v the_process

or:

perf record -e context-switches -ag

or: parse /proc/$PID/status for the voluntary_ctxt_switches and
nonvoluntary_ctxt_switches counters
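A minimal sketch of the /proc route, assuming the usual `key:<tab>value` line format of /proc/PID/status; the helper names are hypothetical. It matches keys only at the start of a line, because a plain substring search for `voluntary_ctxt_switches` would also hit `nonvoluntary_ctxt_switches`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Value of "key:" in /proc/<pid>/status-style text, or -1 if absent. */
static long status_field(const char *text, const char *key)
{
    const char *line = text;
    size_t klen;

    assert(text != NULL && key != NULL);
    klen = strlen(key);
    while (line && *line) {
        long val;

        if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
            sscanf(line + klen + 1, "%ld", &val) == 1)
            return val;
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Read a status file such as /proc/self/status and extract one field. */
static long proc_status_field(const char *path, const char *key)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return status_field(buf, key);
}
```

Sampling both counters for the iperf process before and after the single-stream and multi-stream runs and diffing them gives the context-switch comparison asked about above.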

>> tcp_limit_output_bytes is?
> 262144

I keep hoping to be able to reduce this to something saner like 4096
one day. It got bumped to 64k based on bad wifi performance once, and
then to its current size to make the Xen folk happier.

The other param I'd like to see fiddled with is tcp_notsent_lowat.

In both cases, reductions will increase your context switches but
reduce memory pressure and lead to a more reactive TCP.

And in neither case do I think this is the real cause of this problem.


>> got perf?
> Need to make a new build for that.
>
>>> - fairness between TCP streams looks completely fine
>>
>> A codel will get to long term fairness pretty fast. Packet captures
>> from a fq will show much more regular interleaving of packets,
>> regardless.
>>
>>> - there's no big queue buildup, the code never actually drops any packets
>>
>> A "trick" I have been using to observe codel behavior has been to
>> enable ecn on server and client, then checking in wireshark for ect(3)
>> marked packets.
> I verified this with printk. The same issue already appears if I have
> just the fq patch (with the codel patch reverted).

OK. A four-flow test "should" trigger codel.

Running out of CPU (or hitting some other bottleneck) without
loss or marking "should" show up in a tcptrace -G and xplot.org plot
of the packet capture as the window continuing to increase.


>>> - if I put a hack in the fq code to force the hash to a constant value
>>
>> You could also set "flows" to 1 to keep the hash being generated, but
>> not actually use it.
>>
>>> (effectively disabling fq without disabling codel), the problem
>>> disappears and even multiple streams get proper performance.
>>
>> Meaning you get 90-110Mbits ?
> Right.
>
>> Do you have a "before toke" figure for this platform?
> It's quite similar.
>
>>> Please let me know if you have any ideas.
>>
>> I am in berlin, packing hardware...
> Nice!
>
> - Felix
>




Re: TCP performance regression in mac80211 triggered by the fq code

2016-07-12 Thread Dave Taht

On Tue, Jul 12, 2016 at 2:57 PM, Toke Høiland-Jørgensen <t...@toke.dk> wrote:
> Dave Taht <dave.t...@gmail.com> writes:
>
>>> As for why this would happen... There could be a bug in the dequeue code
>>> somewhere, but since you get better performance from sticking everything
>>> into one queue, my best guess would be that the client is choking on the
>>> interleaved packets? I.e. expending more CPU when it can't stick
>>> subsequent packets into the same TCP flow?
>>
>> I share this concern.
>>
>> The quantum is? I am not opposed to a larger quantum (2 full size
>> packets = 3028 in this case?).
>
> The quantum is hard-coded to 300 bytes in the current implementation
> (see net/fq_impl.h).

Don't do that. :)

A single full size packet is preferable, and saves going around the
main dequeue loop 5-6 times per flow on this workload.

My tests on the prior patch set were mostly at the larger quantum.
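The "5-6 times" figure falls out of deficit round robin arithmetic. A minimal sketch of one backlogged flow, assuming the fq_codel-style rule that a flow may dequeue while its deficit is positive (it can go negative) and otherwise gets another quantum and is rotated to the tail, and assuming a new flow starts with one quantum; `drr_refills` is a hypothetical helper.

```c
#include <assert.h>

/* Number of deficit refills (extra passes around the dequeue loop) needed
 * for one backlogged flow to send npkts packets of pkt_len bytes. */
static unsigned drr_refills(int quantum, int pkt_len, int npkts)
{
    int deficit = quantum; /* assumed: a new flow starts with one quantum */
    unsigned refills = 0;
    int sent = 0;

    assert(quantum > 0 && pkt_len > 0 && npkts >= 0);
    while (sent < npkts) {
        if (deficit <= 0) {
            deficit += quantum; /* rotate to the tail, top up */
            refills++;
            continue;
        }
        deficit -= pkt_len; /* dequeue one packet */
        sent++;
    }
    return refills;
}
```

For 100 full-size (1514-byte) packets this gives 499 refills with a 300-byte quantum (5, occasionally 6, between packets), 99 with a 1514-byte quantum, and 49 with 3028.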


> -Toke




Re: TCP performance regression in mac80211 triggered by the fq code

2016-07-12 Thread Dave Taht

On Tue, Jul 12, 2016 at 2:28 PM, Toke Høiland-Jørgensen  wrote:
> Felix Fietkau  writes:
>
>> Hi,
>>
>> With Toke's ath9k txq patch I've noticed a pretty nasty performance
>> regression when running local iperf on an AP (running the txq stuff) to
>> a wireless client.
>>
>> Here's some things that I found:
>> - when I use only one TCP stream I get around 90-110 Mbit/s
>> - when running multiple TCP streams, I get only 35-40 Mbit/s total
>> - fairness between TCP streams looks completely fine
>> - there's no big queue buildup, the code never actually drops any packets
>> - if I put a hack in the fq code to force the hash to a constant value
>> (effectively disabling fq without disabling codel), the problem
>> disappears and even multiple streams get proper performance.
>>
>> Please let me know if you have any ideas.
>
> Hmm, I see two TCP streams get about the same aggregate throughput as
> one, both when started from the AP and when started one hop away.
> However, do see TCP flows take a while to ramp up when started from the
> AP - a short test gets ~70Mbps when run from one hop away and ~50Mbps
> when run from the AP. how long are you running the tests for?
>
> (I seem to recall the ramp-up issue to be there pre-patch as well,
> though).

The original ath10k code took a "swag" at hooking in an estimator from
rate control. With minstrel in play, that can be done better in ath9k.

> As for why this would happen... There could be a bug in the dequeue code
> somewhere, but since you get better performance from sticking everything
> into one queue, my best guess would be that the client is choking on the
> interleaved packets? I.e. expending more CPU when it can't stick
> subsequent packets into the same TCP flow?

I share this concern.

The quantum is? I am not opposed to a larger quantum (2 full size
packets = 3028 in this case?).

> -Toke



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

Re: TCP performance regression in mac80211 triggered by the fq code

2016-07-12 Thread Dave Taht

On Tue, Jul 12, 2016 at 12:09 PM, Felix Fietkau  wrote:
> Hi,
>
> With Toke's ath9k txq patch I've noticed a pretty nasty performance
> regression when running local iperf on an AP (running the txq stuff) to
> a wireless client.

Which kernel? Which CPU architecture?

What happens when going through the AP to a server from the wireless client?

Which direction?

> Here's some things that I found:
> - when I use only one TCP stream I get around 90-110 Mbit/s

With how much CPU left over?

> - when running multiple TCP streams, I get only 35-40 Mbit/s total

With how much CPU left over?
What is the context switch difference between the two tests?
What is tcp_limit_output_bytes set to?

Got perf?

> - fairness between TCP streams looks completely fine

CoDel will get to long-term fairness pretty fast. Packet captures
from an FQ will show much more regular interleaving of packets,
regardless.

> - there's no big queue buildup, the code never actually drops any packets

A "trick" I have been using to observe codel behavior is to enable
ECN on server and client, then check in Wireshark for CE-marked
packets (ECN field = 3).

> - if I put a hack in the fq code to force the hash to a constant value

You could also set "flows" to 1, so the hash is still generated but
not actually used.

> (effectively disabling fq without disabling codel), the problem
> disappears and even multiple streams get proper performance.

Meaning you get 90-110 Mbit/s?

Do you have a "before Toke" figure for this platform?

> Please let me know if you have any ideas.

I am in Berlin, packing hardware...

>
> - Felix

Re: A question about MAC80211 (3.9)

2016-07-11 Thread Dave Taht

What are you trying to accomplish? I look forward to seeing any rate
control algorithm that can address the issues in minstrel!
( http://blog.cerowrt.org/post/minstrel/ )

It has generally been my hope to implement some form of better service
sharing between the VO, VI, and BE queues than what currently exists
in Linux mainline... after the swq, fq_codel and airtime fairness
stuff finishes landing. (
https://blog.tohojo.dk/2016/06/fixing-the-wifi-performance-anomaly-on-ath9k.html
)

For example: VO could offer service only once per 10ms per destination;
VI could cap its backlog at no more than 50ms; neither should starve BE;
and BK could often be folded into BE, and so on. We've got bits of an
architecture for managing these queues sketched out (more discussion at
the IETF next week), folding flows into the most appropriate queue
rather than treating the queues as totally separate, so as to take
better advantage of aggregation.

Some of the other things in the make-wifi-fast backlog are in the appendix here:

https://docs.google.com/document/d/1Se36svYE1Uzpppe1HWnEyat_sAGghB3kE285LElJBW4/edit#heading=h.3ankl68j6jjo

On Mon, Jul 11, 2016 at 2:24 PM, Joan Josep Aleixendri
 wrote:
> Hello everybody!
>
>
> I'm a student working on a project about mac80211. I'm trying to modify the
> behaviour of the AC queues, but I could really use some help on this
> subject.
>
>
> We are implementing a rate control algorithm. To do this we enqueue packets
> in the ieee80211_tx_frags() function, just after we are sure the packet is going
> to be submitted to the driver under the standard code. You can see the code
> below.
>
>
> The problem is that once we send a packet that was going to the driver to
> the pending queue (because of our custom logic), the tasklet takes over and
> tries to resubmit the packet to tx_frags(). The result is that we start
> losing connection and queuing the same packet over and over. Is there a way
> to stop the tasklet for some defined period of time? Or any other advice
> on how to implement rate limiting in mac80211?
>
>
> Thanks for your time! I really appreciate your help!
>
>
> Joan Josep Aleixendri Cruelles
>
>
> Code:
>
> ieee80211_tx_frags():
>
> spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
> if (local->queue_stop_reasons[q] ||
>     (!txpending && !skb_queue_empty(&local->pending[q]))) {
>         if (unlikely(info->flags &
>                      IEEE80211_TX_INTFL_OFFCHAN_TX_OK)) {
>                 if (local->queue_stop_reasons[q] &
>                     ~BIT(IEEE80211_QUEUE_STOP_REASON_OFFCHANNEL)) {
>                         /*
>                          * Drop off-channel frames if queues
>                          * are stopped for any reason other
>                          * than off-channel operation. Never
>                          * queue them.
>                          */
>                         spin_unlock_irqrestore(
>                                 &local->queue_stop_reason_lock,
>                                 flags);
>                         ieee80211_purge_tx_queue(&local->hw,
>                                                  skbs);
>                         return true;
>                 }
>         } else {
>                 /*
>                  * Since queue is stopped, queue up frames for
>                  * later transmission from the tx-pending
>                  * tasklet when the queue is woken again.
>                  */
>                 if (txpending)
>                         skb_queue_splice_init(skbs,
>                                               &local->pending[q]);
>                 else
>                         skb_queue_splice_tail_init(skbs,
>                                                    &local->pending[q]);
>
>                 spin_unlock_irqrestore(&local->queue_stop_reason_lock,
>                                        flags);
>                 return false;
>         }
> }
> ++ // CUSTOM CODE
> ++ if (packet_meets_requirements_to_be_xmitted()) {
> ++         goto xmit;
> ++ } else {
> ++         // Send the packet back to pending
> ++         if (txpending)
> ++                 skb_queue_splice_init(skbs,
> ++                                       &local->pending[q]);
> ++         else
> ++                 skb_queue_splice_tail_init(skbs,
> ++                                            &local->pending[q]);
> ++         spin_unlock_irqrestore(&local->queue_stop_reason_lock,
> ++                                flags);
> ++         return false;
> ++ }
> ++ xmit:
> spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
>
> info->control.vif = vif;
> control.sta = sta;
>
> __skb_unlink(skb, skbs);
> drv_tx(local, &control, skb);
> }
>
> return true;

Re: [ath9k-devel] [PATCH] ath9k: Support 4.9Ghz channels on AR9580 adapter.

2016-06-21 Thread Dave Taht

On Tue, Jun 21, 2016 at 2:41 AM, Jouni Malinen  wrote:
> On Tue, Jun 21, 2016 at 11:02:20AM +1000, Julian Calaby wrote:
>> I've only done this work as I hate to see people's efforts go to
>> waste and I feel that there's enough roadblocks in the way of
>> actually using this functionality that casual idiots won't be able
>> to.
>
> Are these really ready to go to the upstream kernel in this state and
> without the other changes that would be needed to operate correctly?
> What is the use case for these and how have these been tested?

So far as I know the use case for these is to make it possible to build
open source wifi systems that enable emergency services. This
strikes me as a worthy goal.

Re: [ath9k-devel] [PATCH 1/2] ath9k: use mac80211 intermediate software queues

2016-06-17 Thread Dave Taht

>>  struct ath_atx_tid {
>>   struct list_head list;
>> + struct sk_buff_head i_q;
> Do we really need a third queue here? Instead of adding yet another
> layer of queueing here, I think we should even get rid of buf_q.

Less queues, more filling!

>
> Channel context based queue handling can be dealt with by
> stopping/starting relevant queues on channel context changes.

What can be done to reduce the impact of channel scans?

http://blog.cerowrt.org/post/disabling_channel_scans/

> buf_q becomes unnecessary when you remove all code in the drv_tx
> codepath that moves frames to the intermediate queue.
>
> Any frame that was pulled from the intermediate queue and prepared for
> tx, but which can't be sent right now can simply be queued to retry_q.
>
> This will also help with getting the diffstat insertion/deletion ratio
> under control ;)

The ideas here can apply elsewhere, also. Are you still actively
working with the mt76?

Anything else "out there" besides that and the ath5k worth looking at?

Am I seeing patches and firmware changes for better statistics keeping
on the ath10k that look promising for airtime fairness... or am I
delusional?

> elsewhere powersave was mentioned

How big can a powersave queue get?

Re: How to use fractional center frequencies (4942.5, for instance)?

2016-06-06 Thread Dave Taht

On Mon, Jun 6, 2016 at 8:04 AM, Ben Greear  wrote:
>
> It appears that some cisco equipment, at least, uses fractional center
> frequencies for 5MHz channels.

I was not aware that anything other than ath9k and ath5k supported
5MHz channels. Thanks. Can you identify which Cisco gear supports this?

I have long advocated that meshy networks in increasingly dense areas
use narrower channels.

https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final142.pdf

was one of the better papers that went into it back in the day.


> Has anyone attempted to support this with ath9k?
>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com

Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues

2016-06-06 Thread Dave Taht

For the record, Michal's latest patchset for the ath10k is here:

https://github.com/kazikcz/linux/tree/fqmac-v5

which includes the reworked codel.h support (which also landed in
net-next as of April 22). (No, I haven't tried it yet; I'm only a day
back from vacation.)

... but it would pay to leverage rate control more for the ath9k, and
I'd like folks to agree on a standardized set of statistics, in a
standard location, that can be polled for all implementations (ath9k,
ath10k, mt76).


I am also reviewing this:

http://info.iet.unipi.it/~luigi/papers/20160511-mysched-preprint.pdf

as we have a chance to innovate and use less locking with all this
stuff happening at the mac80211 layer.

Re: [RFC/RFT 2/5] ath9k: use mac80211 intermediate software queues

2016-06-06 Thread Dave Taht

On Sun, Jun 5, 2016 at 10:50 PM, Tim Shepard  wrote:
>
>
>> > Thanks.  I've gotten no other feedback that suggests anyone else has
>> > read the code.  So I very much appreciate it.
>>
>> I not only read it but tested it extensively against the ath9k stock,
>> ath10k stock, ath10k -michal's fqmac 3.5 patches, and your ath9k
>> patches...
>
> My patch to ath9k shouldn't have any effect on any of the ath10k
> stuff.  It only cuts ath9k over to use the new intermediate queues.

Like the Sith, there are always at least two wifi chips on a link. One
behaving significantly differently (like flow queuing and codeling and
striving for airtime fairness) does affect the behavior of the other.

While I did do some ath9k-to-ath9k testing, I was mostly testing the
"whole enchilada". There are many, many moving parts left to swap out.

And the patchset I was testing included your fq_codel port for ath9k
but it was based on codel5.h. Michal's latest stuff reworked mainline
codel to be generically usable, and it *is* a different variant of the
algorithm.

The airtime-fair stuff does not as yet include fq_codel on top. It's a
MPU, and still puzzling as to why it did not get closer to perfect
fairness.

I also fiddled with the idea of dynamically altering the TXOP sizes
advertised in the beacon. A really short best-effort TXOP (94) was
"interesting". I need to take apart how beacons from more non-Linux
vendors are constructed.

The whole enchilada is tasting pretty good thus far.

> I should point out again Avery's observation that Michal's
> mac80211 flow queueing patches and mac80211 codel stuff aren't needed
> for the improvement between competing client stations.

With *2* competing client stations, this is somewhat true at present.
There are at least 5 other (pending) factors to consider.

* Toke's preliminary airtime-fair patches already showed a net gain in
bandwidth for the higher-rate competing station. The "performance
anomaly", identified way back in 2003, is still with us unless we also
strive for airtime fairness.
* In order to hold latencies low with > 2 stations active, I advocate
gradually using shorter TXOPs, which will improve behavior for
stations doing interactive tasks and offer service sooner to
infrequently seen stations.
* In order to do gang scheduling for MU-MIMO, we need to have several
2-3ms-sized queues outstanding for the devices that can be MU-MIMOed.
* Getting minstrel-blues in there to sample more dynamically would be nice.
* Reducing retries and retransmits would be nice when already
congested. I'd also like to try QoS NoAck.

> All that's
> needed is to use the new mac80211 per-station per-tid intermediate
> queues and also set the IFF_NO_QUEUE bit.

It's a heck of a start.

>
> For ath9k, Felix's mac80211 intermediate queues patch (already in
> mainline Linux over a year ago), my patch to ath9k, and just
> Michal's first patch alone "[PATCHv3 1/5] mac80211: skip netdev
> queue control with software queuing" should (and seems to in
> testing I've done so far) get all the latency improvement there is
> to be had when the competing traffic is to a different client
> station.

I think the presently observed 7-12ms minimum at 160 Mbit/s can be
shaved by another 2-3x. Also, the codel implementation is not yet
controlling queue size as tightly as I'd like. I haven't pushed it as
hard at sub-20 Mbit/s rates as I'd like (the last thing I was doing was
coping with being enraged at NetworkManager's overuse of channel scans),
but I'm seeing queue depths of well over 25ms at that rate right now
with the ath9k patch.

>
>
>
>> After losing my temper at the impact of channel scans...
>>
>> ( https://plus.google.com/u/0/107942175615993706558/posts/WA915Pt4SRN
>> ), I got told how to get rid of them for testing, and started redoing
>> the work when I got back from vacation.
>
> Heh... I've never seen that problem.  But I'm not running
> network-manager on any machine in my testbed.

I tend to think it is important to measure what happens in the real
world, to clearly identify what the real world problems actually are.

I let everybody else, with attenuators, and emulators, do whatever
they want. Me, I'm perpetually setting up real-world labs like the
yurtlab and sflab and the isclab to try to see what happens in
practice.

I now plan to put some work into making channel scans less nasty, and
also to look into what it takes to implement 802.11k, 802.11r and
802.11v.

Or, at the very least, nag people into getting it more right.

https://bugzilla.gnome.org/show_bug.cgi?id=766482

NetworkManager's author suggests here that

https://blogs.gnome.org/dcbw/2016/05/16/networkmanager-and-wifi-scans/

"You can also advocate that your favorite WiFi interface add support
for NetworkManager’s RequestScan()"

>
>> > I looked for a way to ask mac80211 if there are any packets left in
>> > the intermediate queue without dequeueing a packet and I failed to
>> > find such an interface.
>>
>> qdisc->peek like function?

Re: iwlwifi: mvm: add reorder buffer per queue

2016-05-16 Thread Dave Taht

I can't even describe how much I hate the concept of the reorder
buffer in general. Ordering is the endpoints' problem.

Someday, after we get fq_codel'd, short queues again, I'll be able to show why.

On Mon, May 16, 2016 at 4:41 AM, Luca Coelho  wrote:
> On Fri, 2016-05-13 at 11:54 +0300, Dan Carpenter wrote:
>> Hello Sara Sharon,
>>
>> The patch b915c10174fb: "iwlwifi: mvm: add reorder buffer per queue"
>> from Mar 23, 2016, leads to the following static checker warnings:
>>
>>   drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c:912
>> iwl_mvm_rx_mpdu_mq()
>>   error: potential NULL dereference 'sta'.
>>
>>   drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c:912
>> iwl_mvm_rx_mpdu_mq()
>>   error: we previously assumed 'sta' could be null (see line 796)
>
> Thanks for the analysis and report, Dan!
>
> I have queued a fix for this through our internal tree.
>
> --
> Cheers,
> Luca.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Limit rate of BCM34348 on Raspberry PI-3 via brcmfmac

2016-05-09 Thread Dave Taht

Ugh.

A somewhat dumb question: how would I disable Bluetooth entirely on the RPi3?

I had done some initial tests on the RPi3 for the fq_codel-on-wifi
work and gave up due to dismal results on the flent tests. I hadn't
gotten around to writing them up here,

http://blog.cerowrt.org/tags/wifi/

but perhaps disabling Bluetooth would help.



On Mon, May 9, 2016 at 9:59 AM, Barry Reinhold
 wrote:
> Arend (and Hante),
>
> I appreciate the feedback on the core problem - that brings the issue into a 
> lot sharper focus. Based on your response I assume we should be able to see 
> successful coexistence with Bluetooth and 802.11 STA mode, as well as STA and 
> AP (with Bluetooth turned off). However, BT with AP mode should fail, and BT 
> with AP and STA will fail.
>
> I will rerun some of our crude tests to see if I can corroborate your 
> understanding by testing the different "should coexist" and "should not 
> coexist" cases.
>
> You have a very interesting lead in  statement in your response (bottom of 
> message) but the sentence just ends with "because of...". Would you be 
> willing to complete that thought, I would like to further understand the 
> nature of the problem.
>
> Is there a known reason why the brcmfmac does not support the 
> set_bitrate_mask() callback (such as the associated family of chips do not 
> support rate limiting) or is this something that nobody has cared about to
> date, i.e., is it something that could be done if there was interest and
> resources?
>
> 
> From: Arend Van Spriel 
> Sent: Monday, May 9, 2016 8:23
> To: Barry Reinhold; linux-wireless@vger.kernel.org
> Cc: Tom Harada; brcm80211-dev-list
> Subject: Re: Limit rate of BCM34348 on Raspberry PI-3 via brcmfmac
>
> On 7-5-2016 21:24, Barry Reinhold wrote:
>> I have observed erratic behavior with http connectivity over the WiFi 
>> interface of the Raspberry PI 3. This appears to be consistent with issues 
>> that a number of other people have reported. I fear, but can not provide 
>> definitive evidence, that these failures could be an RF design/layout issue 
>> with the RP-3 itself.
>>
>> The purpose of this post is to see if this possible issue can be confirmed 
>> by others, and to seek a possible work around by re configuring the BCM43438 
>> chip via the brcmfmac driver; or the other associated wifi modules.
>>
>> How the issue is being seen:
>>
>> Note: The testing I have done is limited and has the potential to be 
>> misleading, so any input on improving the test process would be appreciated.
>>
>> There are two metrics we are using to define/see failure: (1) Loss/delay in 
>> ICMP Echo requests/replies (pings), and (2) the output of messages in
>> journalctl from the wpa_supplicant or hostapd (sudo journalctl -u 
>> wpa_supplicant -u hostapd -f) indicating a disconnect event with associated 
>> reason - typically 0 - (wlan0: CTRL-EVENT-DISCONNECTED 
>> bssid=60:02:92:cd:c9:30 reason=0).
>> Ping times vary from 1 to several hundred ms, to outright loss.
>> There are also failures to reassociate (wlan0: CTRL-EVENT-ASSOC-REJECT 
>> status_code=16).
>>
>> The test Environment is composed of:
>> An official  Raspberry PI-3 model B with an official Raspberry PI-3 power 
>> supply.
>> Raspbian release: Jessie (March 18)
>> Kernel 4.4.6
>> wpa_supplicant 2.3
>> brcmfmac 7.45.41.23 (as reported by ethtool)
>> BCM43438 firmware: 01-cc44eda9c
>> BlueZ 5.23
>>
>> We are running both wpa_supplicant and hostapd, (disabling hostapd does not 
>> impact the results of the tests).
>> We have an application that is monitoring for BTLE/Bluetooth connections so 
>> it is scanning on a regular basis, as well as sending out Bluetooth INQUIRE 
>> messages.
>>
>> WiFi Access Points:
>> 1. Cisco DPC3939B (supports n)
>> 2. Cisco Linksys E1200 (supports n)
>> 3. Netgear WNDR3400 (supports n)
>> 4. Linksys WAP54G v3 (does not support n)
>>
>>
>>
>> Test Process
>>
>>
>> While the application is running (thus generating Bluetooth activity)
>> 1. Connect a PC to the RPi3's software access point and ping the RPi3 
>> continuously.
>> 2. Connect the RPi3 to an AP from the set above.
>> 3. Let the system run for 10 minutes while counting wpa_supplicant 
>> disconnects and lost pings.
>>
>> Observations:
>> In our testing we noticed that either we essentially got no errors, or we
>> got 12+ errors. Some error rates were high enough that we couldn't count them,
>> as they just scrolled off our screen. Hence we considered things to either work
>> (fewer than 2 errors) or fail (more than 10 errors).
>>
>> The results table for the different APs is as follows:
>> DPC3939B - Failed
>> E1200 - Failed
>> WNDR3400 - Failed
>> WAP54G - Passed
>>
>> Since only the WAP54G passed (no n support), we modified the data rate on
>> the Netgear WNDR3400 and limited its data rate to 54 Mbit/s; at this point the
>> WNDR3400 passed.
>>
>> We then tried changing channels. this

Re: [PATCHv4 5/5] mac80211: add debug knobs for codel

2016-05-06 Thread Dave Taht

On Thu, May 5, 2016 at 11:33 PM, Michal Kazior <michal.kaz...@tieto.com> wrote:
> On 6 May 2016 at 07:51, Dave Taht <dave.t...@gmail.com> wrote:
>> On Thu, May 5, 2016 at 10:27 PM, Michal Kazior <michal.kaz...@tieto.com> 
>> wrote:
>>> On 5 May 2016 at 17:21, Dave Taht <dave.t...@gmail.com> wrote:
>>>> On Thu, May 5, 2016 at 4:00 AM, Michal Kazior <michal.kaz...@tieto.com> 
>>>> wrote:
>>>>> This adds a few debugfs entries to make it easier
>>>>> to test, debug and experiment.
>>>>
>>>> I might argue in favor of moving all these (inc the fq ones) into
>>>> their own dir, maybe "aqm" or "sqm".
>>>>
>>>> The mixture of read only stats and configuration vars is a bit confusing.
>>>>
>>>> Also in my testing of the previous patch, actually seeing the stats
>>>> get updated seemed to be highly async or inaccurate. For example, it
>>>> was obvious from the captures themselves that codel_ce_mark-ing was
>>>> happening, but the actual numbers were out of whack with the marks and
>>>> fq_backlog seen. (I can go back to revisit this.)
>>>
>>> That's kind of expected since all of these bits are exposed as
>>> separate debugfs entries/files. To avoid that it'd be necessary to
>>> provide a single debugfs entry/file whose contents are generated on
>>> open() while holding local->fq.lock. But then you could argue it
>>> should contain all per-sta-tid info as well (backlog, flows, drops) as
>>> well instead of having them in netdev*/stations/*/txqs.
>>> Hmm..
>>
>> I have not had time to write up todays results to any full extent, but
>> they were pretty spectacular.
>>
>> I have a comparison of the baseline ath10k driver vs your 3.5 patchset
>> here on the second plot:
>>
>> http://blog.cerowrt.org/post/predictive_codeling/
>>
>> The raw data is here:
>> https://github.com/dtaht/blog-cerowrt/tree/master/content/flent/qca-10.2-fqmac35-codel-5
>
> It's probably good to explicitly mention that you test(ed) ath10k with
> my RFC DQL patch applied. Without it the fqcodel benefits are a lot
> less significant.

Yes. I am trying to establish a baseline before and after, starting at
the max rate my ath9k (2x2) can take the ath10k (2x2) at a distance of
about 12 feet. Without moving anything.

https://github.com/dtaht/blog-cerowrt/tree/master/content/flent/stock-4.4.1-22
has the baseline stats from that Ubuntu 16.04 kernel, but the comparison
plots I'd generated there were against the ct-10.1 firmware, and from
before I'd realized you'd used the smaller quantum. Life is *even*
better using the bigger quantum in the qca-10.2-fqmac35-codel-5
patchset.

>
> (oh, and the "3.5" is pre-PATCHv4 before fq/codel split work:
> https://github.com/kazikcz/linux/tree/fqmac-v3.5 )

I have insufficient time in life to track any but the most advanced
patchset, and I am catching up as fast as I can. First up was finding
the max ath9k performance (5x reduction in latency, no reduction in
throughput, at about 110 Mbit/s).

Then I'll try locking the bitrate at, say, 24 Mbit/s for another run. You
already showed a latency reduction of about 100 to 1 at 6 Mbit/s, so I
don't plan to repeat that.

Then I'll get another ath10k 3x3 up and wash, rinse, repeat.

I would not mind if your patch 4.1 had good stats generation (maybe
put all the relevant stats in a single file?) and defaulted to quantum
1514, since it seems unlikely I'll finish this first test run before
Monday.

Any additional test suggestions? I plan to add the tcp_square_wave
tests to the next run to show how much better the congestion control
is, and I'll add iperf3 floods too.

I am not sure how Avery is planning to test each individual piece.

>
>>
>> ...
>>
>> a note: quantum of the mtu (typically 1514) is a saner default than 300,
>>
>> (the older patch I had, set it to 300, dunno what your default is now).
>
> I still use 300.
>
>
>> and quantum 1514, codel target 5ms rather than 20ms for this test
>> series was *just fine* (but more testing of the lower target is
>> needed)
>
> I would keep 20ms for now until we get more test data. I'm mostly
> concerned about MU performance on ath10k which requires significant
> amount of buffering.

ok.

>
>> However:
>>
>> quantum "300" only makes sense for very, very low bandwidths (say <
>> 6mbits), in other scenarios it just eats extra cpu (5 passes through
>> the loop to send a big packet) and disables
>> the "new/old" queue feature which helps "push" new flows to flo

Re: [PATCHv4 5/5] mac80211: add debug knobs for codel

2016-05-05 Thread Dave Taht

On Thu, May 5, 2016 at 10:27 PM, Michal Kazior <michal.kaz...@tieto.com> wrote:
> On 5 May 2016 at 17:21, Dave Taht <dave.t...@gmail.com> wrote:
>> On Thu, May 5, 2016 at 4:00 AM, Michal Kazior <michal.kaz...@tieto.com> 
>> wrote:
>>> This adds a few debugfs entries to make it easier
>>> to test, debug and experiment.
>>
>> I might argue in favor of moving all these (inc the fq ones) into
>> their own dir, maybe "aqm" or "sqm".
>>
>> The mixture of read only stats and configuration vars is a bit confusing.
>>
>> Also in my testing of the previous patch, actually seeing the stats
>> get updated seemed to be highly async or inaccurate. For example, it
>> was obvious from the captures themselves that codel_ce_mark-ing was
>> happening, but the actual numbers were out of whack with the marks and
>> fq_backlog seen. (I can go back to revisit this.)
>
> That's kind of expected since all of these bits are exposed as
> separate debugfs entries/files. To avoid that it'd be necessary to
> provide a single debugfs entry/file whose contents are generated on
> open() while holding local->fq.lock. But then you could argue it
> should contain all per-sta-tid info as well (backlog, flows, drops) as
> well instead of having them in netdev*/stations/*/txqs.
> Hmm..

I have not had time to write up today's results to any full extent, but
they were pretty spectacular.

I have a comparison of the baseline ath10k driver vs your 3.5 patchset
here on the second plot:

http://blog.cerowrt.org/post/predictive_codeling/

The raw data is here:
https://github.com/dtaht/blog-cerowrt/tree/master/content/flent/qca-10.2-fqmac35-codel-5

...

A note: a quantum of one MTU (typically 1514) is a saner default than 300.

(The older patch I had set it to 300; dunno what your default is now.)

Quantum 1514 with a codel target of 5ms rather than 20ms was *just
fine* for this test series (but more testing of the lower target is
needed).

However:

A quantum of "300" only makes sense for very, very low bandwidths (say
< 6 Mbit/s). In other scenarios it just eats extra CPU (5 passes through
the loop to send a big packet) and disables the "new/old" queue
feature, which helps "push" new flows to flow balance. I'd default it
to the larger value.




...

In other news, SpaceX just landed on the barge a few minutes ago.

The webcast is still going on
https://www.youtube.com/watch?v=L0bMeDj76ig and you can reverse it to
the landing.
:awesome:

>
>
> Michał




Re: [PATCHv4 4/5] mac80211: implement codel on fair queuing flows

2016-05-05 Thread Dave Taht

On Thu, May 5, 2016 at 4:00 AM, Michal Kazior  wrote:
> There is no other limit other than a global
> packet count limit when using software queuing.
> This means a single flow queue can grow insanely
> long. This is particularly bad for TCP congestion
> algorithms which require a little more
> sophisticated frame dropping scheme than a mere
> headdrop on limit overflow.
>
> Hence apply (a slightly modified, to fit the knobs)
> CoDel5 on flow queues. This improves TCP
> convergence and stability when combined with
> wireless driver which keeps its own tx queue/fifo
> at a minimum fill level for given link conditions.
>
> Signed-off-by: Michal Kazior 
> ---
>
> Notes:
> v4:
>  * removed internal codel.h and re-used in-kernel one
>
>  include/net/mac80211.h |  14 +-
>  net/mac80211/ieee80211_i.h |   5 +++
>  net/mac80211/tx.c  | 109 
> -
>  3 files changed, 126 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/mac80211.h b/include/net/mac80211.h
> index ffb90dfe0d70..cc534f1b0f8e 100644
> --- a/include/net/mac80211.h
> +++ b/include/net/mac80211.h
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  /**
> @@ -895,7 +896,18 @@ struct ieee80211_tx_info {
> unsigned long jiffies;
> };
> /* NB: vif can be NULL for injected frames */
> -   struct ieee80211_vif *vif;
> +   union {
> +   /* NB: vif can be NULL for injected frames */
> +   struct ieee80211_vif *vif;
> +
> +   /* When packets are enqueued on txq it's easy
> +* to re-construct the vif pointer. There's no
> +* more space in tx_info so it can be used to
> +* store the necessary enqueue time for packet
> +* sojourn time computation.
> +*/
> +   codel_time_t enqueue_time;
> +   };

Can't skb->timestamp be used instead? (Or does that still stomp on tcp?)

(My longstanding dream, of course, has been to always timestamp packets
coming off the rx ring, rather than having to do it on entry to the
codel enqueue routine here. That would fold total system processing time
into the queue measurement, allow for offloaded timestamping, etc., but
would involve changing all of linux to use it.)
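The enqueue_time squeezed into the tx_info union above exists so that codel can compute each packet's sojourn time (dequeue time minus enqueue time) and tell a standing queue from a transient burst. Below is a minimal userspace sketch of that check, modeled on the published CoDel algorithm rather than copied from the in-kernel codel code. Assumptions: times are plain microsecond counters, the names (toy_codel, toy_codel_should_drop) are hypothetical, and it leaves out parts of the real thing such as the "backlog no larger than one MTU" exemption and the interval/sqrt(count) drop-scheduling control law.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for codel_time_t: microseconds. */
typedef uint64_t toy_usec;

struct toy_codel {
	toy_usec target;           /* tolerated standing delay, e.g. 5 ms */
	toy_usec interval;         /* how long it must persist, e.g. 100 ms */
	toy_usec first_above_time; /* 0 while the sojourn time is below target */
};

/* Called at dequeue with the timestamp saved at enqueue. Returns true
 * once the sojourn time has stayed at or above target for a full
 * interval, i.e. when codel would start dropping (or ECN-marking). */
static bool toy_codel_should_drop(struct toy_codel *c,
				  toy_usec enqueue_time, toy_usec now)
{
	toy_usec sojourn = now - enqueue_time;

	if (sojourn < c->target) {
		/* Queue is draining fine: forget any earlier excursion. */
		c->first_above_time = 0;
		return false;
	}
	if (c->first_above_time == 0) {
		/* First packet above target: allow one interval to recover. */
		c->first_above_time = now + c->interval;
		return false;
	}
	return now >= c->first_above_time;
}
```

With target = 5000 and interval = 100000, packets that each sat in the queue for 8 ms return false at t = 8 ms (the clock starts), still false at 58 ms, and true at 108 ms; a later packet that waited only 2 ms resets the state. The point of the interval is that a short burst never triggers drops, only a queue that refuses to drain.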

> struct ieee80211_key_conf *hw_key;
> u32 flags;
> /* 4 bytes free */
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index 6f8375f1df88..54edfb6fc1d1 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -812,10 +812,12 @@ enum txq_info_flags {
>   * @tin: contains packets split into multiple flows
>   * @def_flow: used as a fallback flow when a packet destined to @tin hashes 
> to
>   * a fq_flow which is already owned by a different tin
> + * @def_cvars: codel vars for @def_flow
>   */
>  struct txq_info {
> struct fq_tin tin;
> struct fq_flow def_flow;
> +   struct codel_vars def_cvars;
> unsigned long flags;
>
> /* keep last! */
> @@ -1108,6 +1110,9 @@ struct ieee80211_local {
> struct ieee80211_hw hw;
>
> struct fq fq;
> +   struct codel_vars *cvars;
> +   struct codel_params cparams;
> +   struct codel_stats cstats;
>
> const struct ieee80211_ops *ops;
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 47936b939591..013b382f6888 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -25,6 +25,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>
> @@ -1269,11 +1271,92 @@ static struct txq_info *ieee80211_get_txq(struct 
> ieee80211_local *local,
> return NULL;
>  }
>
> +static void ieee80211_set_skb_enqueue_time(struct sk_buff *skb)
> +{
> +   IEEE80211_SKB_CB(skb)->control.enqueue_time = codel_get_time();
> +}
> +
> +static void ieee80211_set_skb_vif(struct sk_buff *skb, struct txq_info *txqi)
> +{
> +   IEEE80211_SKB_CB(skb)->control.vif = txqi->txq.vif;
> +}
> +
> +static u32 codel_skb_len_func(const struct sk_buff *skb)
> +{
> +   return skb->len;
> +}
> +
> +static codel_time_t codel_skb_time_func(const struct sk_buff *skb)
> +{
> +   const struct ieee80211_tx_info *info;
> +
> +   info = (const struct ieee80211_tx_info *)skb->cb;
> +   return info->control.enqueue_time;
> +}
> +
> +static struct sk_buff *codel_dequeue_func(struct codel_vars *cvars,
> + void *ctx)
> +{
> +   struct ieee80211_local *local;
> +   struct txq_info *txqi;
> +   struct fq *fq;
> +

Re: [PATCHv4 5/5] mac80211: add debug knobs for codel

2016-05-05 Thread Dave Taht

On Thu, May 5, 2016 at 4:00 AM, Michal Kazior  wrote:
> This adds a few debugfs entries to make it easier
> to test, debug and experiment.

I might argue in favor of moving all these (inc the fq ones) into
their own dir, maybe "aqm" or "sqm".

The mixture of read only stats and configuration vars is a bit confusing.

Also, in my testing of the previous patch, the stats seemed to get
updated highly asynchronously or inaccurately. For example, it
was obvious from the captures themselves that codel_ce_mark-ing was
happening, but the actual numbers were out of whack with the marks or
fq_backlog seen. (I can go back to revisit this.)

>
> Signed-off-by: Michal Kazior 
> ---
>
> Notes:
> v4:
>  * stats adjustments (in-kernel codel has more of them)
>
>  net/mac80211/debugfs.c | 40 
>  1 file changed, 40 insertions(+)
>
> diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
> index 43592b6f79f0..c7cfedc61fc4 100644
> --- a/net/mac80211/debugfs.c
> +++ b/net/mac80211/debugfs.c
> @@ -124,6 +124,15 @@ static const struct file_operations name## _ops = {  
>   \
> res;\
>  })
>
> +#define DEBUGFS_RW_BOOL(arg)   \
> +({ \
> +   int res;\
> +   int val;\
> +   res = mac80211_parse_buffer(userbuf, count, ppos, "%d", &val);  \
> +   arg = !!(val);  \
> +   res;\
> +})
> +
>  DEBUGFS_READONLY_FILE(fq_flows_cnt, "%u",
>   local->fq.flows_cnt);
>  DEBUGFS_READONLY_FILE(fq_backlog, "%u",
> @@ -132,6 +141,16 @@ DEBUGFS_READONLY_FILE(fq_overlimit, "%u",
>   local->fq.overlimit);
>  DEBUGFS_READONLY_FILE(fq_collisions, "%u",
>   local->fq.collisions);
> +DEBUGFS_READONLY_FILE(codel_maxpacket, "%u",
> + local->cstats.maxpacket);
> +DEBUGFS_READONLY_FILE(codel_drop_count, "%u",
> + local->cstats.drop_count);
> +DEBUGFS_READONLY_FILE(codel_drop_len, "%u",
> + local->cstats.drop_len);
> +DEBUGFS_READONLY_FILE(codel_ecn_mark, "%u",
> + local->cstats.ecn_mark);
> +DEBUGFS_READONLY_FILE(codel_ce_mark, "%u",
> + local->cstats.ce_mark);
>
>  DEBUGFS_RW_FILE(fq_limit,
> DEBUGFS_RW_EXPR_FQ("%u", &local->fq.limit),
> @@ -139,6 +158,18 @@ DEBUGFS_RW_FILE(fq_limit,
>  DEBUGFS_RW_FILE(fq_quantum,
> DEBUGFS_RW_EXPR_FQ("%u", &local->fq.quantum),
> "%u", local->fq.quantum);
> +DEBUGFS_RW_FILE(codel_interval,
> +   DEBUGFS_RW_EXPR_FQ("%u", &local->cparams.interval),
> +   "%u", local->cparams.interval);
> +DEBUGFS_RW_FILE(codel_target,
> +   DEBUGFS_RW_EXPR_FQ("%u", &local->cparams.target),
> +   "%u", local->cparams.target);
> +DEBUGFS_RW_FILE(codel_mtu,
> +   DEBUGFS_RW_EXPR_FQ("%u", &local->cparams.mtu),
> +   "%u", local->cparams.mtu);
> +DEBUGFS_RW_FILE(codel_ecn,
> +   DEBUGFS_RW_BOOL(local->cparams.ecn),
> +   "%d", local->cparams.ecn ? 1 : 0);
>
>  #ifdef CONFIG_PM
>  static ssize_t reset_write(struct file *file, const char __user *user_buf,
> @@ -333,6 +364,15 @@ void debugfs_hw_add(struct ieee80211_local *local)
> DEBUGFS_ADD(fq_collisions);
> DEBUGFS_ADD(fq_limit);
> DEBUGFS_ADD(fq_quantum);
> +   DEBUGFS_ADD(codel_maxpacket);
> +   DEBUGFS_ADD(codel_drop_count);
> +   DEBUGFS_ADD(codel_drop_len);
> +   DEBUGFS_ADD(codel_ecn_mark);
> +   DEBUGFS_ADD(codel_ce_mark);
> +   DEBUGFS_ADD(codel_interval);
> +   DEBUGFS_ADD(codel_target);
> +   DEBUGFS_ADD(codel_mtu);
> +   DEBUGFS_ADD(codel_ecn);
>
> statsd = debugfs_create_dir("statistics", phyd);
>
> --
> 2.1.4
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

Re: General VHT rate-ctrl question

2016-04-13 Thread Dave Taht

On Wed, Apr 13, 2016 at 6:18 AM, Ben Greear  wrote:
>
>
> On 04/13/2016 01:01 AM, Johannes Berg wrote:
>>
>> On Tue, 2016-04-12 at 16:48 -0700, Ben Greear wrote:
>>>
>>> If a station and its peer can both do VHT, is there ever a good
>>> reason to even try HT rates?
>>>
>>
>> Not really; perhaps if you could do HT greenfield preamble (which VHT
>> doesn't have) you could get something out of it, beyond that I don't
>> see a reason to try.
>>
>> Unless, for some strange reason, it supports only single stream VHT and
>> dual-stream HT or something really weird?
>
>
> I was wondering if there was ever a reason that, say 450Mbps HT
> would work better than MCS-1 for VHT.  Or, maybe a mid-rate HT MCS would
> have more range than VHT, or something like that.
>
> After fighting with the firmware's rate-ctrl all day, I am even more
> interested in trying to make it use minstrel_ht.

I just put up Andrew's old paper on minstrel, if that helps any.

http://blog.cerowrt.org/post/minstrel/

>
> Thanks,
> Ben
>
> --
> Ben Greear 
> Candela Technologies Inc  http://www.candelatech.com

ietf proposed standards for dscp to 802.11e mappings

2016-04-06 Thread Dave Taht

If anyone cares

https://datatracker.ietf.org/doc/draft-szigeti-tsvwg-ieee-802-11/?include_text=1

these will be discussed on the tsvwg mailing list.

https://www.ietf.org/mailman/listinfo/tsvwg

There are several things in here I object to - no discussion of
multicast, and arbitrarily dropping CS6 and CS7 on APs, but I figure
that will be sorted out eventually.

--
Dave Täht

Re: [PATCHv2 1/2] mac80211: implement fair queuing per txq

2016-04-06 Thread Dave Taht

On Wed, Apr 6, 2016 at 12:21 AM, Johannes Berg
 wrote:
> [removing other lists since they spam me with moderation bounces]

I have added your email address to the accept list for the codel and
make-wifi-fast lists. My apologies for the bounces.

The people on those lists generally do not have the time to tackle the
volume of traffic on linux-wireless.

>> The hope had been the original codel.h would have been reusable,
>> which is not the case at present.
>
> So what's the strategy for making it happen?

Strategy? to meander towards a result that gives low latency to all
stations, no matter their bandwidth, on several chipsets.

The holy grail from my viewpoint is to get airtime fairness, better
mac utilization, slow stations not starving fast ones, more stations
serviceable, and so on, and my focus has generally been on having an
architecture that applied equally to APs and clients. Getting clients
alone to have a queuing latency reduction of these orders of magnitude
on uploads at low rates would be a huge win, but not the holy grail.

It was really nice to have michal's proof of concept(s) show up and
show fq_codel-like benefits at both low and high speeds on wifi, but
it is clear more architectural rework is required to fit the theory
into the reality.

> Unless there is one, I
> don't see the point in making the code more complicated than it already
> has to be anyway.

+1.

Next steps were:
- get toke's and my testbeds up
- avery/tim/myself to keep hammering at the ath9k
- michal exploring dql
- jonathon poking at it with cake-like ideas
- and anyone else that cares to join in on finally fixing bufferbloat on wifi.

and maybe put together a videoconference in 2-3 weeks or so with where
we are stuck at (felix will be off vacation, too, I think). There are
still multiple points where we all talk past each other.

I, for example, am overly fixated on having a per-station queue to
start with (which in the case of a client is two stations: one for
multicast/management frames and one for regular traffic) and not dealing
with tids until much later in the process. Unfortunately, it seems the
hook is very late in the process.
>
> johannes

Re: [PATCHv2 1/2] mac80211: implement fair queuing per txq

2016-04-05 Thread Dave Taht

thx for the review!

On Tue, Apr 5, 2016 at 6:57 AM, Johannes Berg  wrote:
> On Thu, 2016-03-31 at 12:28 +0200, Michal Kazior wrote:
>
>> +++ b/net/mac80211/codel.h
>> +++ b/net/mac80211/codel_i.h
>
> Do we really need all this code in .h files? It seems very odd to me to
> have all the algorithm implementation there rather than a C file, you
> should (can?) only include codel.h into a single C file anyway.

The hope had been the original codel.h would have been reusable, which
is not the case at present.

>
>>  struct txq_info {
>> - struct sk_buff_head queue;
>> + struct txq_flow flow;
>> + struct list_head new_flows;
>> + struct list_head old_flows;
>
> This is confusing, can you please document that? Why are there two
> lists of flows, *and* an embedded flow? Is the embedded flow on any of
> the lists?

To explain the new and old flow concepts, there's
https://tools.ietf.org/html/draft-ietf-aqm-fq-codel-06 which is in the
ietf editors queue for final publication and doesn't have a final name
yet.

The embedded flow concept is michal's and I'm not convinced it's the
right idea as yet.

>
>> + u32 backlog_bytes;
>> + u32 backlog_packets;
>> + u32 drop_codel;
>
> Would it make some sense to at least conceptually layer this a bit?
> I.e. rather than calling this "drop_codel" call it "drop_congestion" or
> something like that?

Is there a more generic place overall in ieee80211 to record per-sta
backlogs, drops and marks?

>> + skb = codel_dequeue(flow,
>> + >backlog,
>> + 0,
>> + >cvars,
>> + >cparams,
>> + codel_get_time(),
>> + false);
>
> What happened here? :)

Magic.

>
>> + if (!skb) {
>> + if ((head == >new_flows) &&
>> + !list_empty(>old_flows)) {
>> + list_move_tail(>flowchain, >old_flows);
>> + } else {
>> + list_del_init(>flowchain);
>> + flow->txqi = NULL;
>> + }
>> + goto begin;
>> + }
>
> Ouch. Any way you can make that easier to follow?

It made my brain hurt in the original code, too, but it is eric
optimizing out cycles at his finest.

If a flow taken from the new_flows list turns out to be empty, it is
moved to the tail of the old_flows list (as long as old_flows is not
empty); otherwise it is dropped from the lists entirely. Either way, go
back and try selecting another flow to pull from (which may or may not
exist). See the pending rfc for a more elongated version.
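To make the new_flows/old_flows dance concrete, here is a small userspace model of the scheduling loop quoted above. It is a sketch under stated assumptions, not mac80211 code: flows are just packet counters, the lists are fixed-size ring buffers standing in for list_head, every packet is 1500 bytes, the quantum is 1500 bytes, and an empty flow plays the role of codel_dequeue() returning NULL. All names (toy_sched, toy_enqueue, toy_dequeue, ...) are hypothetical. As in the quoted code, the deficit check comes before the empty check.

```c
#include <stdbool.h>

#define MAX_FLOWS 8
#define PKT_LEN   1500 /* assumed: every packet is full-size */
#define QUANTUM   1500 /* assumed DRR quantum, in bytes */

struct toy_flow {
	int pkts;    /* packets queued on this flow */
	int deficit; /* DRR byte credit */
	bool listed; /* on new_flows or old_flows? */
};

/* Ring buffer of flow ids, standing in for a struct list_head list. */
struct flow_list {
	int ids[MAX_FLOWS];
	int head;
	int len;
};

struct toy_sched {
	struct toy_flow flows[MAX_FLOWS];
	struct flow_list new_flows;
	struct flow_list old_flows;
};

static void fl_push_tail(struct flow_list *l, int id)
{
	l->ids[(l->head + l->len) % MAX_FLOWS] = id;
	l->len++;
}

static int fl_pop_head(struct flow_list *l)
{
	int id = l->ids[l->head];

	l->head = (l->head + 1) % MAX_FLOWS;
	l->len--;
	return id;
}

/* A flow that is not on either list starts out as a "new" flow. */
static void toy_enqueue(struct toy_sched *s, int id)
{
	struct toy_flow *f = &s->flows[id];

	f->pkts++;
	if (!f->listed) {
		f->listed = true;
		f->deficit = QUANTUM;
		fl_push_tail(&s->new_flows, id);
	}
}

/* Returns the id of the flow a packet was taken from, or -1 if idle. */
static int toy_dequeue(struct toy_sched *s)
{
	for (;;) {
		struct flow_list *head = &s->new_flows;

		if (head->len == 0)
			head = &s->old_flows;
		if (head->len == 0)
			return -1;

		int id = head->ids[head->head];
		struct toy_flow *f = &s->flows[id];

		if (f->deficit <= 0) {
			/* Out of credit: top up, rotate to the back of old_flows. */
			f->deficit += QUANTUM;
			fl_pop_head(head);
			fl_push_tail(&s->old_flows, id);
			continue;
		}
		if (f->pkts == 0) {
			/* Flow ran dry (codel_dequeue() returned NULL). */
			fl_pop_head(head);
			if (head == &s->new_flows && s->old_flows.len > 0)
				fl_push_tail(&s->old_flows, id);
			else
				f->listed = false;
			continue;
		}
		f->pkts--;
		f->deficit -= PKT_LEN;
		return id;
	}
}
```

Queueing three packets on flow 0, dequeuing one, then queueing a single packet on flow 1 yields the order 0, 1, 0, 0, then -1 once everything drains: the sparse flow enters on new_flows and is served ahead of the bulk flow's remaining backlog. Moving an emptied new flow to old_flows (instead of deleting it) is what keeps a flow from re-entering new_flows over and over to starve the old ones.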

>
> johannes

Re: [RFC] ath10k: implement dql for htt tx

2016-03-29 Thread Dave Taht

As a side note on wifi ideas complementary to codel, please see:

http://blog.cerowrt.org/post/selective_unprotect/

On Tue, Mar 29, 2016 at 12:49 AM, Michal Kazior <michal.kaz...@tieto.com> wrote:
> On 26 March 2016 at 17:44, Dave Taht <dave.t...@gmail.com> wrote:
>> Dear Michal:
> [...]
>> I am running behind on this patch set, but a couple quick comments.
> [...]
>>>  - no rrul tests, sorry Dave! :)
>>
>> rrul would be a good baseline to have, but no need to waste your time
>> on running it every time as yet. It stresses out both sides of the
>> link so whenever you get two devices with these driver changes on them
>> it would be "interesting". It's the meanest, nastiest test we have...
>> if you can get past the rrul, you've truly won.
>>
>> Consistently using tcp_fair_up with 1,2,4 flows and 1-4 stations as
>> you are now is good enough.
>>
>> doing a more voip-like test with slamming d-itg into your test would be 
>> good...
>>
>>>
>>> Observations / conclusions:
>>>  - DQL builds up throughput slowly on "veryfast"; in some tests it
>>> doesn't get to reach peak (roughly 210mbps average) because the test
>>> is too short
>>
>> It looks like having access to the rate control info here for the
>> initial and ongoing estimates will react faster and better than dql
>> can. I loved the potential here in getting full rate for web traffic
>> in the usual 2-second burst you get it in (see above blog entries)
>
> On one hand - yes, rate control should in theory be "faster".
>
> On the other hand DQL will react also to host system interrupt service
> time. On slow CPUs (typically found on routers and such) you might end
> up grinding the CPU so much you need deeper tx queues to keep the hw
> busy (and therefore keep performance maxed). DQL should automatically
> adjust to that while "txop limit" might not.

Mmmm, the current generation of multi-core arm routers should be fast enough.

Otherwise, point taken (possibly). Even intel i3 boxes need offloads to get to
line rate.


>>
>> It is always good to test codel and fq_codel separately, particularly
>> on a new codel implementation. There are so many ways to get codel
>> wrong or add an optimization that doesn't work (speaking as someone
>> that has got it wrong often)
>>
>> If you are getting a fq result of 12 ms, that means you are getting
>> data into the device with a ~12ms standing queue there. On a good day
>> you'd see perhaps 17-22ms for "codel target 5ms" in that case, on the
>> rtt_fair_up series of tests.
>
> This will obviously depend on the number of stations you have data
> queued to. Estimating codel target time requires smarter tx
> scheduling. My earlier (RFC) patch tried doing that.

and I loved it. ;)

>
>> if you are getting a pure codel result of 160ms, that means the
>> implementation is broken. But I think (after having read your
>> description twice), the baseline result today of 160ms of queuing was
>> with a fq_codel *qdisc* doing the work on top of huge buffers,
>
> Yes. The 160ms is with fq_codel qdisc with ath10k doing DQL at 6mbps.
> Without DQL ath10k would clog up all tx slots (1424 of them) with
> frames. At 6mbps you typically want/need a handful (5-10) of frames to
> be queued.
>
>> the
>> results a few days ago were with a fq_codel 802.11 layer, and the
>> results today you are comparing, are pure fq (no codel) in the 802.11e
>> stack, with fixed (and dql) buffering?
>
> Yes. codel target in fq_codel-in-mac80211 is hardcoded at 20ms now
> because there's no scheduling and hence no data to derive the target
> dynamically.

Well, for these simple 2 station tests, you could halve it, easily.

With ecn enabled on both sides, I tend to look at the groupings of the ecn
marks in wireshark.

>
>
>> if so. Yea! Science!
>>
>> ...
>>
>> One of the flaws of the flent tests is that conceptually they were
>> developed before the fq stuff won so big, and looking hard at the
>> per-queue latency for the fat flows requires either looking hard at
>> the packet captures or sampling the actual queue length. There is that
>> sampling capability in various flent tests, but at the moment it only
>> samples what tc provides (Drops, marks, and length) and it does not
>> look like there is a snapshot queue length exported from that ath10k
>> driver?
>
> Exporting tx queue length snapshot should be fairly easy. 2 debugfs
> entries for ar->htt.max_num_pending_tx and ar->htt.num_pending_tx.

K. Still running *way* behind you on getting stuff up and

Re: Bonjour mDNS broacast can be lost during BT-WLAN coexistence schemes?

2016-03-27 Thread Dave Taht

These folk claim an open source prototype.

http://www.sigcomm.org/sites/default/files/ccr/papers/2014/January/2567561-2567567.pdf

Re: [RFC] ath10k: implement dql for htt tx

2016-03-26 Thread Dave Taht

Dear Michal:

I commented on and put up your results for the baseline driver here:

http://blog.cerowrt.org/post/rtt_fair_on_wifi/

And the wonderful result you got for the first ever fq_codel-ish
implementation here:

http://blog.cerowrt.org/post/fq_codel_on_ath10k/

I am running behind on this patch set, but a couple quick comments.

On Fri, Mar 25, 2016 at 2:55 AM, Michal Kazior  wrote:
> On 25 March 2016 at 10:39, Michal Kazior  wrote:
>> This implements a very naive dynamic queue limits
>> on the flat HTT Tx. In some of my tests (using
>> flent) it seems to reduce induced latency by
>> orders of magnitude (e.g. when enforcing 6mbps
>> tx rates 2500ms -> 150ms). But at the same time it
>> introduces TCP throughput buildup over time
>> (instead of immediate bump to max). More
>> importantly I didn't observe it to make things
>> much worse (yet).
>>
>> Signed-off-by: Michal Kazior 
>> ---
>>
>> I'm not sure yet if it's worth to consider this
>> patch for merging per se. My motivation was to
>> have something to prove mac80211 fq works and to
>> see if DQL can learn the proper queue limit in
>> face of wireless rate control at all.
>>
>> I'll do a follow up post with flent test results
>> and some notes.
>
> Here's a short description what-is-what test naming:
>  - sw/fq contains only txq/flow stuff (no scheduling, no txop queue limits)
>  - sw/ath10k_dql contains only ath10k patch which applies DQL to
> driver-firmware tx queue naively
>  - sw/fq+ath10k_dql is obvious
>  - sw/base today's ath.git/master checkout used as base
>  - "veryfast" tests TCP tput to reference receiver (4 antennas)
>  - "fast" tests TCP tput to ref receiver (1 antenna)
>  - "slow" tests TCP tput to ref receiver (1 *unplugged* antenna)
>  - "fast+slow" tests sharing between "fast" and "slow"
>  - "autorate" uses default rate control
>  - "rate6m" uses fixed-tx-rate at 6mbps
>  - the test uses QCA9880 w/ 10.1.467
>  - no rrul tests, sorry Dave! :)

rrul would be a good baseline to have, but no need to waste your time
on running it every time as yet. It stresses out both sides of the
link so whenever you get two devices with these driver changes on them
it would be "interesting". It's the meanest, nastiest test we have...
if you can get past the rrul, you've truly won.

Consistently using tcp_fair_up with 1,2,4 flows and 1-4 stations as
you are now is good enough.

doing a more voip-like test with slamming d-itg into your test would be good...

>
> Observations / conclusions:
>  - DQL builds up throughput slowly on "veryfast"; in some tests it
> doesn't get to reach peak (roughly 210mbps average) because the test
> is too short

It looks like having access to the rate control info here for the
initial and ongoing estimates will react faster and better than dql
can. I loved the potential here in getting full rate for web traffic
in the usual 2-second burst you get it in (see above blog entries)

>  - DQL shows better latency results in almost all cases compared to
> the txop based scheduling from my mac80211 RFC (but i haven't
> thoroughly looked at *all* the data; I might've missed a case where it
> performs worse)

Well, if you are not saturating the link, latency will be better.
Showing how much less latency is possible is good too, but

>  - latency improvement seen on sw/ath10k_dql @ rate6m,fast compared to
> sw/base (1800ms -> 160ms) can be explained by the fact that txq AC
> limit is 256 and since all TCP streams run on BE (and fq_codel as the
> qdisc) the induced txq latency is 256 * (1500 / (6*1024*1024/8.)) / 4
> = ~122ms which is pretty close to the test data (the formula ignores
> MAC overhead, so the latency in practice is larger). Once you consider
> the overhead and in-flight packets on driver-firmware tx queue 160ms
> doesn't seem strange. Moreover when you compare the same case with
> sw/fq+ath10k_dql you can clearly see the advantage of having fq_codel
> in mac80211 software queuing - the latency drops by (another) order of
> magnitude because now incomming ICMPs are treated as new, bursty flows
> and get fed to the device quickly.
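The estimate quoted above is plain queue-drain arithmetic: frames * (bits per frame / link rate). A minimal sketch of it, with hypothetical helper names. To reproduce the ~122 ms figure exactly, "6mbps" is read as 6 * 1024 * 1024 bit/s and the trailing "/ 4" is kept as a generic divisor, since the message does not say what it stands for. Like the original formula, it ignores MAC overhead, retries and aggregation, so real latencies will be higher.

```c
/* Seconds needed to drain `frames` frames of `frame_bytes` bytes each
 * at `rate_bps` bits/s, divided by `divisor` (kept only to mirror the
 * quoted estimate; pass 1.0 for a plain drain time). */
static double drain_time_s(double frames, double frame_bytes,
			   double rate_bps, double divisor)
{
	return frames * (frame_bytes * 8.0 / rate_bps) / divisor;
}

/* Inverse: how many such frames make up `delay_s` seconds of queue. */
static double frames_for_delay(double delay_s, double frame_bytes,
			       double rate_bps)
{
	return delay_s * rate_bps / (frame_bytes * 8.0);
}
```

drain_time_s(256, 1500, 6.0 * 1024 * 1024, 4) gives about 0.122 s (about 0.128 s if 6 mbps is taken as 6e6 bit/s). If all 1424 driver/firmware tx slots held full-size frames, drain_time_s(1424, 1500, 6.0 * 1024 * 1024, 1) gives about 2.7 s, the same order of magnitude as the multi-second latency reported above for the unpatched driver at 6 mbps. Going the other way, frames_for_delay(0.010, 1500, 6e6) is 5, so the "handful (5-10)" of frames mentioned above amounts to roughly 10-20 ms of queue at that rate.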

It is always good to test codel and fq_codel separately, particularly
on a new codel implementation. There are so many ways to get codel
wrong or add an optimization that doesn't work (speaking as someone
that has got it wrong often)

If you are getting a fq result of 12 ms, that means you are getting
data into the device with a ~12ms standing queue there. On a good day
you'd see perhaps 17-22ms for "codel target 5ms" in that case, on the
rtt_fair_up series of tests.

if you are getting a pure codel result of 160ms, that means the
implementation is broken. But I think (after having read your
description twice), the baseline result today of 160ms of queuing was
with a fq_codel *qdisc* doing the work on top of huge buffers, the
results a few days ago were with a fq_codel 802.11 layer, and the
results today

Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-22 Thread Dave Taht

We have a huge cc list on this thread, and admittedly this work does
potentially cut across a great deal of wireless, but does netdev need
to be on it? There's been nothing codel-specific on it in a while, so I
cut those from the cc.

On Tue, Mar 22, 2016 at 1:05 AM, Michal Kazior <michal.kaz...@tieto.com> wrote:
> On 21 March 2016 at 18:10, Dave Taht <dave.t...@gmail.com> wrote:
>> thx.
>>
>> a lot to digest.
>>
>> A) quick notes on "flent-gui bursts_11e-2016-03-21T09*.gz"
>>
>> 1) the new bursts_11e test *should* have stuck stuff in the VI and VO
>> queues, and there *should* have been some sort of difference shown on
>> the plots with it. There wasn't.
>
> traffic-gen generates only BE traffic. Everything else runs UDP_RR
> which doesn't generate a lot of traffic.
>
>
>> For diffserv markings I used BE=CS0, BK=CS1, VI=CS5, and VO=EF.
>> CS6/CS7 should also land in VO (at least with the soft mac handler
>> last I looked). Is there a way to check if you are indeed exercising
>> all four 802.11e hardware queues in this test? in ath9k it is the
>> "xmit" sysfs var
>
> Hmm.. there are no txq stats. I guess it makes sense to have them?

ath9k xmit has been useful to capture. I'm kind of unconvinced those
stats are correct, at the moment, but...

> There is /sys/kernel/debug/ieee80211/phy*/fq which dumps state of all
> queues which will be mostly empty with UDP_RR. You can run netperf UDP
> stream with diffserv marking to see onto which tid they are mapped.
> You can see tid-AC mappings here:
> https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues

We can try to capture those, but sampling summary per-station stats
ties back better to actual traffic analysis.

Also useful to capture has been the minstrel stats, the minstrel-blues
version provided these in a handy csv format.

> I just checked and EF ends up as tid5 which is VI. It's actually the
> same as CS5. You can use CS7 to run on tid7 which is VO.

The intent of CS6 is somewhat incompatible with VO's intent, but we
can argue diffserv's usefulness and mappings another day.

I have changed the bursts_11e test to use CS7, which will break
parsing our previous test runs' data, but will actually test what I'd
intended to test in the first place.
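Michal's observation that EF lands on tid5 (VI) together with CS5, while CS7 lands on tid7 (VO), follows from a simple rule: the default classifier takes the top three (precedence) bits of the DSCP as the 802.1D priority, i.e. the TID, and the TID then selects the access category via the standard 802.1D-priority-to-AC table. A sketch of that rule, assuming the 2016-era default discussed in this thread (newer kernels may classify some codepoints differently); all names are hypothetical.

```c
#include <stdint.h>

enum toy_ac { TOY_AC_VO, TOY_AC_VI, TOY_AC_BE, TOY_AC_BK };

/* 802.1D user priority (TID 0-7) -> 802.11e access category. */
static const enum toy_ac toy_tid_to_ac[8] = {
	TOY_AC_BE, TOY_AC_BK, TOY_AC_BK, TOY_AC_BE,
	TOY_AC_VI, TOY_AC_VI, TOY_AC_VO, TOY_AC_VO,
};

/* Assumed default classifier: precedence bits of the 6-bit DSCP. */
static unsigned toy_dscp_to_tid(uint8_t dscp)
{
	return (dscp >> 3) & 0x7;
}

/* Codepoints used in the bursts_11e discussion above. */
enum {
	DSCP_CS0 = 0, DSCP_CS1 = 8, DSCP_CS5 = 40,
	DSCP_EF = 46, DSCP_CS6 = 48, DSCP_CS7 = 56,
};
```

By this rule the original BE=CS0, BK=CS1, VI=CS5, VO=EF markings put both the "VI" and the "VO" flows on tid5 and the VI queue; marking VO traffic CS7 (or CS6) moves it to tid7 (or tid6) and therefore onto the VO queue.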

>> 2) In all the old cases the BE UDP_RR flow died on the first burst
>> (why?), and the fullpatch preserved it.
>
> I think it's related to my setup which involves veth pairs. I use them
> to simulate bridging/AP behavior but maybe it's not doing the job
> right, hmm..
>
>
>> (I would have kind of hoped to
>> have seen the BK flow die, actually, in the fullpatch)
>
> There's no extra weight priority to BK. The difference between BE and
> BK in 802.11 is contention window access time so BK gets less txops
> statistically. Both share the same txop, which is 5.484ms in most
> cases.

Um, well, another day.

>
>> 3) I am also confused on 802.11ac - can VO aggregate? (It can't in 802.11n.)
>
> Yes, it should be albeit VI and VO have shorter txop compared to
> BE/BK: 3.008ms and 1.504ms respectively.

Not being able to aggregate in VO in n was a bad thing. There is an
awful lot I like about ac over n.
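The TXOP limits quoted above (5.484 ms for BE/BK, 3.008 ms for VI, 1.504 ms for VO) bound how much one channel access can carry, which caps how far VI and VO can aggregate. A rough sketch, assuming full-size 1500-byte frames and ignoring preamble, MAC headers, inter-frame spaces and block-ack overhead, so it overestimates what really fits; the helper name is hypothetical.

```c
/* Whole frames of `frame_bytes` that fit in a TXOP of `txop_us`
 * microseconds at a PHY rate of `rate_mbps` (= bits per microsecond).
 * Pure payload airtime; all protocol overhead is ignored. */
static unsigned frames_per_txop(double txop_us, unsigned frame_bytes,
				double rate_mbps)
{
	double frame_us = frame_bytes * 8.0 / rate_mbps;

	return (unsigned)(txop_us / frame_us);
}
```

At 6 mbps a single 1500-byte frame already takes 2 ms of airtime, so by this arithmetic BE's TXOP fits 2 frames, VI's fits 1, and VO's fits none; at a hypothetical 300 mbps the same limits fit 137, 75 and 37 frames.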

>
> UDP_RR doesn't really create a lot of opportunities for aggregation.
> If you want to see how different queues behave when loaded you'll need
> to modify traffic-gen and add bursts across different ACs in the
> bursts_11e test.

or flood the queues with other tests like rrul or toke's enhancement
to traffic-gen. :) I liked being able to arbitrarily mark udp packets
ecn capable...

>
>
> Michał

Re: making fq_codel default

2016-03-22 Thread Dave Taht

On Tue, Mar 22, 2016 at 3:05 AM, Reinoud Koornstra
 wrote:
> Thanks, that answers my question.

Adding this to /etc/sysctl.conf or /etc/sysctl.d/bufferbloat.conf is
generally what we do:

net.core.default_qdisc=fq_codel

A lot of us run with ecn enabled by default, so we add this as well:

net.ipv4.tcp_ecn=1

> Ok, so currently Cake didn't make it into the 4.4 kernel yet I noticed.
> Are there plans to add this or are there still many issues to be worked out?

The last major test series in late December showed issues with the
then codel implementation, which was presumably fixed, and then also
the new "triple queue isolation" code landed, which is still something
of a head scratcher to test for the problem it's trying to solve
(better per host fairness while still getting ).

It has been my hope to see cake mainlined for about 4 linux kernel
versions now, but the code has sprouted a great deal more
instrumentation than I, personally, would like (like nearly all the
stats below), plus additional complexity that may not be needed. It
still could use some fixes (particularly in GRO peeling), diffserv
modeling still has disputes, and it still needs performance analysis
and tuning on the kinds of hardware (arm, mips) it is intended for,
especially at a gbit.

It is being incorporated into a couple of openwrt builds and could use
more eyeballs and testers to firmly escape second-system syndrome.
Discussions are held on the cake mailing list.

https://lists.bufferbloat.net/pipermail/cake/2015-December/001755.html

I hope we can resume major testing on it again by the end of april.

> In the mean time I followed some instructions to build the module and
> iproute2 for cake.

Comparison tests on your workloads and your specific devices between
pfifo, fq_codel, and cake are welcomed, but I am pessimistic about any
effects without a bql-like layer underneath it. The consensus
generally is that while some cake-like algos might apply to wifi, the
work needs to happen at the ieee80211 layer rather than the qdisc
layer.

The principal use case for cake for wifi has been in analyzing codel's
behavior in the face of shifting rates, by changing the shaped bandwidth on the fly:

tc change dev wifi0 root cake bandwidth 10mbit
sleep 2
tc change dev wifi0 root cake bandwidth 50mbit
sleep 2
tc change dev wifi0 root cake bandwidth 20mbit

> sudo tc qdisc add  dev wlp4s0 root cake (iwlwifi)
>
> reinoud@router-dev:~/Downloads/linux-4.4.5/net/sched$ sudo tc -s qdisc show
>
> qdisc cake 8002: dev wlp4s0 root refcnt 5 unlimited diffserv4 flows
> rtt 100.0ms raw
> Sent 71025 bytes 516 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
> memory used: 0b of 15140Kb
> capacity estimate: 0bit
>              Tin 0     Tin 1     Tin 2     Tin 3
>   thresh      0bit      0bit      0bit      0bit
>   target     5.0ms     5.0ms     5.0ms     5.0ms
> interval   100.0ms   100.0ms   100.0ms   100.0ms
> Pk-delay       0us       0us       0us       0us
> Av-delay       0us       0us       0us       0us
> Sp-delay       0us       0us       0us       0us
>     pkts         0         0         0         0
>    bytes         0         0         0         0
> way-inds         0         0         0         0
> way-miss         0         0         0         0
> way-cols         0         0         0         0
>    drops         0         0         0         0
>    marks         0         0         0         0
> Sp-flows         0         0         0         0
> Bk-flows         0         0         0         0
> last-len         0         0         0         0
>  max-len         0         0         0         0

Groovy. Go pound it flat with some tests and let us know if any of
these statistics are useful to you.

> On Tue, Mar 22, 2016 at 3:43 AM, Matthias May  
> wrote:
>> On 22/03/16 10:37, Reinoud Koornstra wrote:
>>>
>>> Hi Everyone,
>>>
>>> Everytime I boot I need to set fq_codel for my wireless interface:
>>>
>>> sudo tc qdisc add   dev wlp4s0 root fq_codel
>>>
>>> I also need to sudo sysctl -w net.core.default_qdisc=fq_codel
>>>
>>> Is there a good way to have this as the default in the kernel config
>>> instead of pfifo?
>>> Also, are there plans for cake support or do fq_codel in this case mean
>>> cake?
>>> Thanks,
>>>
>>> Reinoud.
>>>
>>
>> You might want to take a look at the patches in openwrt.
>> Specifically this one:
>> https://dev.openwrt.org/browser/trunk/target/linux/generic/patches-4.4/662-use_fq_codel_by_default.patch
>>
>> Best regards
>> Matthias

Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-19 Thread Dave Taht

On Thu, Mar 17, 2016 at 1:55 AM, Michal Kazior <michal.kaz...@tieto.com> wrote:

> I suspect the BK/BE latency difference has to do with the fact that
> there's bulk traffic going on BE queues (this isn't reflected
> explicitly in the plots). The `bursts` flent test includes short
> bursts of traffic on tid0 (BE) which is shared with ICMP and BE UDP_RR
> (seen as green and blue lines on the plot). Due to (intended) limited
> outflow (6mbps) BE queues build up and don't drain for the duration of
> the entire test creating more opportunities for aggregating BE traffic
> while other queues are near-empty and very short (time wise as well).

I agree with your explanation. Access to the media and queue length
are the two variables at play here.

I just committed a new flent test, "bursts_11e", that should exercise
the vo, vi, be, and bk queues. I dropped the conventional ping from it
and rely only on netperf's udp_rr for each queue. It seems to "do the
right thing" on the ath9k.

And while I'm all in favor of getting 802.11e's behaviors more right,
and this seems like a good way to get there...

netperf's udp_rr is not how much traffic conventionally behaves. It
doesn't do tcp slow start or congestion control in particular...

In the case of the VO queue, for example, the (2004) intended behavior
was 1 isochronous packet per 10ms per voice sending station and one
from the ap, not a "ping". And at the time, VI was intended to be
unicast video. TCP was an afterthought. (wifi's original (1993) mac
was actually designed for ipx/spx!)

I long for regular "rrul" and "rrul_be" test runs against the new
stuff, to blow it up thoroughly and serve as references along the way
(plus tcp_upload, tcp_download, and several of the rtt_fair tests
between stations). I will get formal about it here as soon as we end
up on the same kernel trees.

Furthermore, 802.11e is not widely used: at present, little
internet-bound or internet-sourced traffic falls into anything other
than BE and BK. In some cases it is weirder still: Comcast, btw,
remarks a very large percentage of inbound traffic to the home as CS1
(BK), while stations tend to use CS0. Data comes in on BK, acks go out
on BE.
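To make the "data comes in on BK, acks go out on BE" point concrete, here is a minimal C sketch of a precedence-based classifier, modelled on the default DSCP-to-AC mapping Linux used at the time: the top three bits of the DS field become the 802.1d priority, which is then mapped to an access category. The names are illustrative rather than the kernel's, and newer kernels may special-case some codepoints.

```c
#include <stdint.h>

/* Access categories, in the same order as mac80211's enum (VO=0 ... BK=3). */
enum ac { AC_VO = 0, AC_VI = 1, AC_BE = 2, AC_BK = 3 };

/* 802.1d user priority (0..7) to 802.11 access category. */
static const enum ac up_to_ac[8] = {
	AC_BE, AC_BK, AC_BK, AC_BE, AC_VI, AC_VI, AC_VO, AC_VO
};

/*
 * Classify by IP precedence: take the top three bits of the DS field
 * (the full TOS byte) as the user priority. CS1 (DSCP 8, DS field 0x20)
 * gives priority 1, which maps to BK; CS0 gives priority 0, i.e. BE.
 */
static enum ac classify_dsfield(uint8_t dsfield)
{
	uint8_t up = dsfield >> 5;

	return up_to_ac[up];
}
```

With this mapping, CS1-remarked downloads land in BK while the CS0 acks from stations stay in BE, so a single TCP flow ends up straddling two access categories.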

I/we will try to come up with intermediate tests between the burst
tests and the rrul tests as we go along the way.

> If you consider Wi-Fi is half-duplex and latency in the entire stack

In the context of this test regime...

Saying wifi is "half"-duplex is a misleading way to think about it in
many respects. It is a shared medium, more like early non-switched
ethernet, with a weird MAC that governs which sorts of packets get
access to the medium (a txop) first, across all stations cooperating
within EDCA.

Half or full duplex is a notion that mostly applies to p2p serial
connections (or p2p wifi), not P2MP. Additionally, characteristics like
exponential backoff would make no sense were wifi any form of duplex,
full or half.

Certainly much stuff within a txop (block acks for example) can be
considered half duplex in a microcosmic context.

I wish we actually had words that accurately described wifi's actual behavior.

> (for processing ICMP and UDP_RR) is greater than 11e contention window
> timings you can get your BE flow responses with extra delay (since
> other queues might have responses ready quicker).

Yes. Always having a request pending in each of the 802.11e queues is
actually not the best idea. It is better to take advantage of the
better aggregation afforded by 802.11n/ac: have only one or two of the
queues in use against any given station, and promote or demote traffic
into a more appropriate queue.

A simple example of the damage done by having all 4 queues always
contending can be seen by running the rrul and rrul_be tests against
nearly any AP.

>
> I've modified traffic-gen and re-run tests with bursts on all tested
> tids/ACs (tid0, tid1, tid5). I'm attaching the results.
>
> With bursts on all tids you can clearly see BK has much higher latency than 
> BE.

The long-term goal here, of course, is for BK (or the other queues) to
have not seconds of queuing latency, but something bounded to around
2x the media access time...

> (Note, I've changed my AP to QCA988X with oldie firmware 10.1.467 for
> this test; it doesn't have the weird hiccups I was seeing on QCA99X0
> and newer QCA988X firmware reports bogus expected throughput which is
> most likely a result of my sloppy proof-of-concept change in ath10k).

So I should avoid Ben Greear's firmware for now?

>
>
> Michał
>
> On 16 March 2016 at 20:48, Jasmine Strong <j...@eero.com> wrote:
>> BK usually has 0 txop, so it doesn't do aggregation.
>>
>> On Wed, Mar 16, 2016 at 11:55 AM, Bob Copeland <m...@bobcopeland.com> wrote:
>>>
>>> On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
>>> > That is the sanest 802.11e queu

Re: [RFC/RFT] mac80211: implement fq_codel for software queuing

2016-03-10 Thread Dave Taht

>> regular fq_codel uses 1024 and there has not been much reason to
>> change it. In the case of an AP which has more limited memory, 256 or
>> 1024 would be a good setting, per station. I'd stick to 1024 for now.
>
> Do note that the 4096 is shared _across_ station-tid queues. It is not
> per-station. If you have 10 stations you still have 4096 flows
> (actually 4096 + 16*10, because each tid - and there are 16 - has it's
> own fallback flow in case of hash collision on the global flowmap to
> maintain per-sta-tid queuing).

I have to admit I didn't parse this well, and still haven't; I think I
need to draw it. (Got a picture?)

Where is this part happening: in the code, or in the firmware?

" because each tid - and there are 16 - has it's
 own fallback flow in case of hash collision on the global flowmap to
 maintain per-sta-tid queuing"

"fallback flow - hash collision on global flowmap" - huh?

> With that in mind do you still think 1024 is enough?

Can't answer that question without understanding what you said above.

I assembled a few of the patches to date (your fq_codel patch, plus
Avery's and Tim's ath9k work) and tested them, to no measurable effect,
against Linus's tree a day or two back. I also acquired an ath10k card;
would one of these suit?

http://www.amazon.com/gp/product/B011SIMFR8?psc=1=true_=oh_aui_detailpage_o08_s00

per sta queuing - the ath9k statistics

2016-03-07 Thread Dave Taht

I have put together some of the patches flying around on
linux-wireless for fq_codel and per-station queuing inside the
mac80211 portion of the stack, to no real visible effect as yet.

I'm mostly testing uploads at the moment, from an x86-based client.
It's not clear whether I even have the new code path enabled, nor how
to check that from userspace (?). The topology is:

x86 <-wifi-> wndr3800 <-ethernet-> pi

Latency is still poor, and throughput is down slightly. I will start
printk-ing tomorrow.

I do have a few puzzling things:

A) re the ath9k statistics

At the client (Ubuntu x86, 4.5-rc7 + patches, Atheros AR5418 Wireless
Network Adapter [AR5008E]), I see:

http://pastebin.com/rvKJnc1y

AMPDUs Queued HW:        0       0       0       0
AMPDUs Queued SW:        0       0       0       0
AMPDUs Completed:  1098389    7050   14967       0

At the AP (cerowrt 3.10.50) I see

http://pastebin.com/RTt7MNT6

AMPDUs Queued SW:  3009455 364214557331 0
AMPDUs Completed:  2961055 363353556982 0
AMPDUs Retried: 115311   7833 21489 0

In both cases, TX-Pkts-all is close to the AMPDUs Completed figure.

B) In the regular packet captures I see no TCP losses. In an aircap, I
can see wifi retrying for every lost packet.

C) I do not see any ECN marks (which would presumably be generated by
the codel implementation).

D) I do see things nicely "fq"'d in the captures, but that might be
cerowrt's doing rather than the ieee80211 mac layer's.

E) "tc qdisc show dev wlp2s0" shows the tc-layer qdisc disabled:

qdisc noqueue 0: root refcnt 2

which does imply that at least part of the new codepath is working,
but there are no stats out of that side yet...

Ah, well, at least the patchset compiled and didn't crash the box.

--
Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi

Re: [RFC/RFT] mac80211: implement fq_codel for software queuing

2016-03-07 Thread Dave Taht

 *txq = sta->sta.txq[tid];
> +   struct ieee80211_sub_if_data *sdata;
> +   struct ieee80211_fq *fq;
> struct txq_info *txqi;
>
> if (!txq)
> return;
>
> txqi = to_txq_info(txq);
> +   sdata = vif_to_sdata(txq->vif);
> +   fq = &sdata->local->fq;
>
> /* Lock here to protect against further seqno updates on dequeue */
> -   spin_lock_bh(&txqi->queue.lock);
> +   spin_lock_bh(&fq->lock);
> set_bit(IEEE80211_TXQ_STOP, &txqi->flags);
> -   spin_unlock_bh(&txqi->queue.lock);
> +   spin_unlock_bh(&fq->lock);
>  }
>
>  static void
> diff --git a/net/mac80211/codel.h b/net/mac80211/codel.h
> new file mode 100644
> index ..f6f1b9b73a9a
> --- /dev/null
> +++ b/net/mac80211/codel.h
> @@ -0,0 +1,260 @@
> +#ifndef __NET_MAC80211_CODEL_H
> +#define __NET_MAC80211_CODEL_H
> +
> +/*
> + * Codel - The Controlled-Delay Active Queue Management algorithm
> + *
> + *  Copyright (C) 2011-2012 Kathleen Nichols <nich...@pollere.com>
> + *  Copyright (C) 2011-2012 Van Jacobson <v...@pollere.net>
> + *  Copyright (C) 2016 Michael D. Taht <dave.t...@bufferbloat.net>
> + *  Copyright (C) 2012 Eric Dumazet <eduma...@google.com>
> + *  Copyright (C) 2015 Jonathan Morton <chromati...@gmail.com>
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions, and the following disclaimer,
> + *without modification.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * 3. The names of the authors may not be used to endorse or promote products
> + *derived from this software without specific prior written permission.
> + *
> + * Alternatively, provided that this notice is retained in full, this
> + * software may be distributed under the terms of the GNU General
> + * Public License ("GPL") version 2, in which case the provisions of the
> + * GPL apply INSTEAD OF those given above.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
> + * DAMAGE.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "codel_i.h"
> +
> +/* Controlling Queue Delay (CoDel) algorithm
> + * =
> + * Source : Kathleen Nichols and Van Jacobson
> + * http://queue.acm.org/detail.cfm?id=2209336
> + *
> + * Implemented on linux by Dave Taht and Eric Dumazet
> + */
> +
> +/* CoDel5 uses a real clock, unlike codel */
> +
> +static inline codel_time_t codel_get_time(void)
> +{
> +   return ktime_get_ns();
> +}
> +
> +static inline u32 codel_time_to_us(codel_time_t val)
> +{
> +   do_div(val, NSEC_PER_USEC);
> +   return (u32)val;
> +}
> +
> +/* sizeof_in_bits(rec_inv_sqrt) */
> +#define REC_INV_SQRT_BITS (8 * sizeof(u16))
> +/* needed shift to get a Q0.32 number from rec_inv_sqrt */
> +#define REC_INV_SQRT_SHIFT (32 - REC_INV_SQRT_BITS)
> +
> +/* Newton approximation method needs more iterations at small inputs,
> + * so cache them.
> + */
> +
> +static void codel_vars_init(struct codel_vars *vars)
> +{
> +   memset(vars, 0, sizeof(*vars));
> +}
> +
> +/*
> + * http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Iterative_methods_for_reciprocal_square_roots
> + * new_invsqrt = (invsqrt / 2) * (3 - count * invsqrt^2)
> + *
> + * Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
> + */
> +static inline void codel_Newton_step(struct codel_vars *vars)
>
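The quoted patch breaks off at codel_Newton_step. As a sketch of the update its comment describes, here is a standalone fixed-point version that keeps the full Q0.32 estimate in a uint32_t. That is an assumption for clarity (Linux's codel caches only 16 bits in rec_inv_sqrt and shifts by REC_INV_SQRT_SHIFT), and newton_step and track_inv_sqrt are hypothetical names, not the patch's.

```c
#include <stdint.h>

/*
 * One Newton step for y ~= 1/sqrt(count), with y held as Q0.32
 * (value = invsqrt / 2^32):  y' = (y / 2) * (3 - count * y^2).
 * Valid while count * y^2 <= 3, which holds when count grows by one
 * at a time from the previous estimate, as CoDel's drop count does.
 */
static uint32_t newton_step(uint32_t invsqrt, uint32_t count)
{
	/* y^2 in Q0.32 */
	uint32_t invsqrt2 = (uint32_t)(((uint64_t)invsqrt * invsqrt) >> 32);
	/* 3 - count * y^2, in Q32.32 */
	uint64_t val = (3ULL << 32) - (uint64_t)count * invsqrt2;

	val >>= 2;				/* keep the next multiply in 64 bits */
	val = (val * invsqrt) >> (32 - 2 + 1);	/* times y, halved, back to Q0.32 */
	return (uint32_t)val;
}

/* Follow count = 1, 2, ..., n, taking `steps` Newton steps at each value. */
static uint32_t track_inv_sqrt(uint32_t n, int steps)
{
	uint32_t y = 0xFFFFFFFFu;	/* ~1.0, i.e. 1/sqrt(1) */

	for (uint32_t c = 2; c <= n; c++)
		for (int i = 0; i < steps; i++)
			y = newton_step(y, c);
	return y;
}
```

A single step per increment, as CoDel does, gives a looser approximation; the estimate approaches 1/sqrt(count) from below, which is also what keeps count * y^2 from ever exceeding 3.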

Re: [RFC/RFT] mac80211: implement fq_codel for software queuing

2016-03-07 Thread Dave Taht

On Mon, Mar 7, 2016 at 9:14 AM, Avery Pennarun <apenw...@gmail.com> wrote:
> On Mon, Mar 7, 2016 at 11:54 AM, Dave Taht <dave.t...@gmail.com> wrote:
>> If I can just get a coherent patch set that I can build, I will gladly
>> join you on the testing ath9k in particular... can probably do ath10k,
>> too - and do a bit of code review... this week. it's very exciting
>> watching all this activity...
>>
>> but I confess that I've totally lost track of what set of trees and
>> patchwork I should try to pull from. wireless-drivers-next? ath10k?
>> wireless-next? net-next? toke and I have a ton of x86 platforms
>> available to test on
>>
>> Avery - which patches did you use??? on top of what?
>
> The patch series I'm currently using can be found here:
>
>   git fetch https://gfiber.googlesource.com/vendor/opensource/backports
> ath9k_txq+fq_codel

No common commits, but ok, thx for a buildable-looking tree.

d@dancer:~/git/linux$ git clone -b ath9k_txq+fq_codel --reference
net-next https://gfiber.googlesource.com/vendor/opensource/backports
Cloning into 'backports'...
warning: no common commits
remote: Sending approximately 30.48 MiB ...
remote: Counting objects: 4758, done
remote: Finding sources: 100% (5/5)
remote: Total 19312 (delta 12999), reused 19308 (delta 12999)
Receiving objects: 100% (19312/19312), 30.48 MiB | 6.23 MiB/s, done.
Resolving deltas: 100% (12999/12999), done.


>
> That's again backports-20160122, which comes from linux-next as of
> 20160122.  You can either build backports against whatever kernel
> you're using (probably easiest) or try to use that version of
> linux-next, or rebase the patches onto your favourite kernel.
>
>> In terms of "smoothing" codel...
>>
>> I emphatically do not think codel in it's current form is "ready" for
>> wireless, at the very least the target should not be much lower than
>> 20ms in your 2 station tests.  There is another bit in codel where the
>> algo "turns off" with only a single MTU's worth of packets
>> outstanding, which could get bumped to the ideal size of the
>> aggregate. "ideal" kind of being a variable based on a ton of other
>> factors...
>
> Yeah, I figured that sort of thing would come up.  I'm feeling forward
> progress just by finally seeing the buggy oscillations finally happen,
> though. :)

It's *very* exciting to see y'all break things in a measurable, yet
positive direction.

>
>> the underlying code needs to be striving successfully for per-station
>> airtime fairness for this to work at all, and the driver/card
>> interface nearly as tight as BQL is for the fq portion to behave
>> sanely. I'd configure codel at a higher target and try to observe what
>> is going on at the fq level til that got saner.
>
> That seems like two good goals.  So Emmanuel's BQL-like thing seems
> like we'll need it soon.
>
> As for per-station airtime fairness, what's a good approximation of
> that?  Perhaps round-robin between stations, one aggregate per turn,
> where each aggregate has a maximum allowed latency?

Strict round robin is a start, and simplest, yes. Sure.

"Oldest station queues first" on a round (probably) has higher
potential for maximizing txops, but requires more overhead. (Shortest
queue first would be bad.) There's another algorithm, based on the last
packets received from a station, that may be worth fiddling with in the
long run...

As for "maximum allowed latency": to me that is eventually also a
variable, based on the number of stations that have to be scheduled in
that round. Getting away from 10 stations eating 5.7ms each, plus
return traffic, in a round would be nicer. If you want a constant for
now, aim for 2048us or 1TU.
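A minimal C sketch of strict round robin with a per-turn airtime cap, assuming each station's backlog can be expressed in microseconds of airtime (a real driver would have to estimate that from rates). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical per-station state: airtime (us) worth of queued traffic. */
struct sta {
	uint32_t backlog_us;
};

/*
 * One strict round-robin round: every backlogged station gets one
 * aggregate, capped at max_agg_us of airtime. Records what each
 * station was served and returns the total airtime of the round.
 */
static uint32_t rr_round(struct sta *stas, int n, uint32_t max_agg_us,
			 uint32_t *served_us)
{
	uint32_t total = 0;

	for (int i = 0; i < n; i++) {
		uint32_t agg = stas[i].backlog_us < max_agg_us ?
			       stas[i].backlog_us : max_agg_us;

		stas[i].backlog_us -= agg;
		served_us[i] = agg;
		total += agg;
	}
	return total;
}
```

With ten stations each holding 5.7ms of traffic, an uncapped round would take 57ms of airtime; with a 2048us cap it is bounded to about 20.5ms (10 x 2048us), at the cost of more rounds.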

> I don't know how
> the current code works, but it's probably almost like that, as long as
> we only put one aggregate's worth of stuff into each hwq, which I
> guess is what the BQL-like thing will do.

I would avoid trying to think about or use 802.11e's 4 queues at the
moment[1]. We also have fallout from MU-MIMO to deal with eventually,
but gang scheduling starts to fall out naturally from these structures
and methods...

>
> So if I understand correctly, what we need is, in the following order:
> 1) Disable fq_codel for now, and get BQL-like thing working in ath9k
> (and ensure we're getting airtime fairness even without fq_codel);
> 2) Re-enable fq_codel and increase fq_codel's target up to 20ms for now;
> 3) Tweak fq_codel's "turn off" size to be larger (how important is this?)
>
> Is that right?

Sounds good. I have not reviewed the codel5-based implementation; it
may not have idea "#3" in it at all at the moment. The relevant
line in codel.h i

Re: [RFC/RFT] mac80211: implement fq_codel for software queuing

2016-03-07 Thread Dave Taht

If I can just get a coherent patch set that I can build, I will gladly
join you in testing ath9k in particular... I can probably do ath10k
too, and a bit of code review, this week. It's very exciting watching
all this activity...

But I confess that I've totally lost track of which set of trees and
patchwork I should try to pull from. wireless-drivers-next? ath10k?
wireless-next? net-next? Toke and I have a ton of x86 platforms
available to test on.

Avery: which patches did you use, and on top of what?

In terms of "smoothing" codel...

I emphatically do not think codel in its current form is "ready" for
wireless; at the very least, the target should not be much lower than
20ms in your 2-station tests. There is another bit in codel where the
algo "turns off" with only a single MTU's worth of packets
outstanding, which could be bumped up to the ideal size of the
aggregate, "ideal" being a variable that depends on a ton of other
factors...
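The "turns off" check is, roughly, CoDel's test that the sojourn time is below target or that the backlog is no larger than one maximum-size packet. Below is a minimal C sketch with that byte floor pulled out as a hypothetical min_backlog parameter, so it could be raised toward an aggregate's worth; the struct, the function, and the example values are illustrative assumptions, not codel.h.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical CoDel parameters: min_backlog is the byte floor below
 * which CoDel never considers dropping. Stock CoDel uses one maximum
 * packet (about one MTU); for wifi it could be about one aggregate.
 */
struct codel_floor {
	uint64_t target_ns;
	uint32_t min_backlog;	/* bytes */
};

/* True when CoDel should stay "off" (no drop considered) for this packet. */
static bool codel_stay_off(const struct codel_floor *p,
			   uint64_t sojourn_ns, uint32_t backlog_bytes)
{
	return sojourn_ns < p->target_ns || backlog_bytes <= p->min_backlog;
}
```

With a one-MTU floor, a queue holding just the next aggregate can still be shot at once its sojourn time exceeds target; an aggregate-sized floor leaves that queue alone.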

The underlying code needs to be striving successfully for per-station
airtime fairness for this to work at all, and the driver/card
interface needs to be nearly as tight as BQL makes it for the fq
portion to behave sanely. I'd configure codel with a higher target and
try to observe what is going on at the fq level until that gets saner.

There are so many other variables and things left unaccounted for, as yet.

Re: [RFC/RFT] mac80211: implement fq_codel for software queuing

2016-03-03 Thread Dave Taht

On Tue, Mar 1, 2016 at 11:38 PM, Michal Kazior  wrote:
> On 1 March 2016 at 15:02, Johannes Berg  wrote:
>> On Fri, 2016-02-26 at 14:09 +0100, Michal Kazior wrote:
>>>
>>> +typedef u64 codel_time_t;
>>
>> Do we really need this? And if yes, does it have to be in the public
>> header file? Why a typedef anyway?
>
> Hmm.. I don't think it's strictly necessary. I just wanted to keep as
> much from the original codel implementation as possible. I'm fine with
> using just u64.

This is an artifact of the original codel keeping time in (nsec >> 10)
to fit into a 32-bit int.

In codel5 we switched to native 64-bit timekeeping, as it is simpler,
improves logging, and is easier to reason about.

u64 is fine.
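For comparison, a small C sketch of the two representations: the original's 32-bit (nsec >> 10) clock needs wrap-safe comparisons, while 64-bit nanoseconds can be compared directly. The helper names are made up for illustration; CODEL_SHIFT mirrors the original's shift of 10.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original CoDel style: nanoseconds >> 10 (ticks of ~1.024us) in 32 bits. */
#define CODEL_SHIFT 10
typedef uint32_t codel32_time_t;

static codel32_time_t codel32_from_ns(uint64_t ns)
{
	return (codel32_time_t)(ns >> CODEL_SHIFT);	/* wraps every ~73 minutes */
}

/* Wrap-safe "a is after b": valid while the stamps are < 2^31 ticks apart. */
static bool codel32_after(codel32_time_t a, codel32_time_t b)
{
	return (int32_t)(a - b) > 0;
}

/* codel5 style: plain 64-bit nanoseconds; an ordinary comparison suffices. */
static bool codel64_after(uint64_t a, uint64_t b)
{
	return a > b;
}
```

The 32-bit clock wraps after 2^42 ns (about 73 minutes), which is harmless with signed-difference comparisons but is one more thing to reason about when reading logs.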

>
>>> - * @txq_ac_max_pending: maximum number of frames per AC pending in
>>> all txq
>>> - *   entries for a vif.
>>> + * @txq_cparams: codel parameters to control tx queueing dropping
>>> behavior
>>> + * @txq_limit: maximum number of frames queuesd
>>
>> typo - queued
>>
>>> @@ -2133,7 +2155,8 @@ struct ieee80211_hw {
>>>   u8 uapsd_max_sp_len;
>>>   u8 n_cipher_schemes;
>>>   const struct ieee80211_cipher_scheme *cipher_schemes;
>>> - int txq_ac_max_pending;
>>> + struct codel_params txq_cparams;
>>
>> Should this really be a parameter the driver determines?
>
> I would think so, or at least it should be able to influence it in
> *some* way. You can have varying degree of induced latency depending
> on fw/hw tx queue depth, air conditions and possible tx rates implying
> different/varying RTT. Cake[1] even has a few RTT presets like: lan,
> internet, satellite.

While those presets have been useful in testing codel (and, more
generally, in cake we can rapidly change the bandwidth from userspace
for testing), in the real world you don't move from orbit to desktop
and back as rapidly as wifi does.

> I don't really have a plan how exactly a driver could make use of it
> for benefit though. It might end up not being necessary after all if
> generic tx scheduling materializes in mac80211.

What we envisioned here is EWMA-smoothing the target based on the
total service time needed for all active stations, per round. (There
are other possible approaches.)

Say you serve 10 stations at 1ms each in one round: a codel target of
5ms will try to push things down too far. If in the next round you
serve only 2 stations at 1ms each (but get back 10 responses at .5ms
each), you're still too high. If it's just one station, well, you can
get below 2ms if the driver is only sending 1ms, but maybe it's
sending 5ms...
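One hypothetical shape for that estimator, as a C sketch: sum the airtime each active station was served in a round, smooth it with an EWMA of weight 1/8, and never let the target drop below a base value. The weight, the floor, and all names are assumptions, not a worked-out design.

```c
#include <stdint.h>

/* Hypothetical smoothed estimate of per-round service time. */
struct target_est {
	uint64_t ewma_us;
	int primed;
};

/* Feed one scheduling round: svc_us[i] is airtime served to station i. */
static void target_est_round(struct target_est *e,
			     const uint32_t *svc_us, int nsta)
{
	uint64_t round_us = 0;

	for (int i = 0; i < nsta; i++)
		round_us += svc_us[i];

	if (!e->primed) {
		e->ewma_us = round_us;	/* first sample seeds the average */
		e->primed = 1;
	} else {
		e->ewma_us = (7 * e->ewma_us + round_us) / 8;	/* weight 1/8 */
	}
}

/* Codel target: the smoothed round time, but never below base_us. */
static uint64_t target_est_get(const struct target_est *e, uint64_t base_us)
{
	return e->ewma_us > base_us ? e->ewma_us : base_us;
}
```

Ten stations at 1ms each yield a 10ms target instead of 5ms; when the load drops to a single station, the average decays back toward the floor over a handful of rounds.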

If you have a large multicast burst, that burp will cause lost packets.

Merely getting typical wifi latencies under load down below the 20ms
range would be a good start; after that, some testing, hard thought,
and evaluation are going to be needed. For early testing, I think a
20ms fixed target would be safer than the existing 5ms.

Pushing the fq part of fq_codel on a per station basis as close to the
hardware as possible, and having better airtime fairness between
stations is a huge win in itself.





>
> [1]: http://www.bufferbloat.net/projects/codel/wiki/Cake
>
>
>>> +static void ieee80211_if_setup_no_queue(struct net_device *dev)
>>> +{
>>> + ieee80211_if_setup(dev);
>>> + dev->priv_flags |= IFF_NO_QUEUE;
>>> + /* Note for backporters: use dev->tx_queue_len = 0 instead
>>> of IFF_ */
>>
>> Heh. Remove that comment; we have an spatch in backports already :)
>
> I've put it in the RFC specifically in case anyone wants to port this
> bypassing backports, e.g. to openwrt's quilt (so when compilation
> fails, you can quickly fix it up). I'll remove it before proper
> submission obviously :)
>
>
>>> --- a/net/mac80211/sta_info.h
>>> +++ b/net/mac80211/sta_info.h
>>> @@ -19,6 +19,7 @@
>>>  #include 
>>>  #include 
>>>  #include "key.h"
>>> +#include "codel_i.h"
>>>
>>>  /**
>>>   * enum ieee80211_sta_info_flags - Stations flags
>>> @@ -327,6 +328,32 @@ struct mesh_sta {
>>>
>>>  DECLARE_EWMA(signal, 1024, 8)
>>>
>>> +struct txq_info;
>>> +
>>> +/**
>>> + * struct txq_flow - per traffic flow queue
>>> + *
>>> + * This structure is used to distinguish and queue different traffic
>>> flows
>>> + * separately for fair queueing/AQM purposes.
>>> + *
>>> + * @txqi: txq_info structure it is associated at given time
>>> + * @flowchain: can be linked to other flows for RR purposes
>>> + * @backlogchain: can be linked to other flows for backlog sorting
>>> purposes
>>> + * @queue: sk_buff queue
>>> + * @cvars: codel state vars
>>> + * @backlog: number of bytes pending in the queue
>>> + * @deficit: used for fair queueing balancing
>>> + */
>>> +struct txq_flow {
>>> + struct txq_info *txqi;
>>> + struct list_head flowchain;
>>> + struct list_head backlogchain;
>>> + struct sk_buff_head queue;
>>> + struct codel_vars cvars;
>>> + u32 backlog;
>>> + u32 deficit;
>>> +};
>>> +
>>>  /**
>>>   * struct sta_info - STA information

Re: [PATCH] mac80211: fix AP buffered multicast frames with queue control and txq

2016-03-03 Thread Dave Taht

On Thu, Mar 3, 2016 at 7:14 AM, Johannes Berg <johan...@sipsolutions.net> wrote:
> On Sun, 2016-02-28 at 09:35 -0800, Dave Taht wrote:
>> On Sun, Feb 28, 2016 at 6:19 AM, Felix Fietkau <n...@openwrt.org>
>> wrote:
>> > Buffered multicast frames must be passed to the driver directly via
>> > drv_tx instead of going through the txq, otherwise they cannot
>> > easily be
>> > scheduled to be sent after DTIM.
>> >
>> > Signed-off-by: Felix Fietkau <n...@openwrt.org>
>> > ---
>> >  net/mac80211/tx.c | 3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>> > index 3a7475f..b294820 100644
>> > --- a/net/mac80211/tx.c
>> > +++ b/net/mac80211/tx.c
>> > @@ -1247,7 +1247,8 @@ static void ieee80211_drv_tx(struct
>> > ieee80211_local *local,
>> > struct txq_info *txqi;
>> > u8 ac;
>> >
>> > -   if (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE)
>> > +   if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
>> > +   (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
>> > goto tx_normal;
>> >
>> > if (!ieee80211_is_data(hdr->frame_control))
>> > --
>> > 2.2.2
>>
>> I would like
>
> Feel free to propose patches for anything you like :)

At the moment all I can do is cheer people on, and try to assemble
enough gear to test comprehensively when enough patches have landed in
your tree... Go, Felix! Go, Michal! Go, Mohammed! Go, Ben! Go, Tim! Go,
Emmanuel! Go, Johannes!


>
> johannes
>>

Re: [PATCH] mac80211: fix AP buffered multicast frames with queue control and txq

2016-02-28 Thread Dave Taht

On Sun, Feb 28, 2016 at 6:19 AM, Felix Fietkau  wrote:
> Buffered multicast frames must be passed to the driver directly via
> drv_tx instead of going through the txq, otherwise they cannot easily be
> scheduled to be sent after DTIM.
>
> Signed-off-by: Felix Fietkau 
> ---
>  net/mac80211/tx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 3a7475f..b294820 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1247,7 +1247,8 @@ static void ieee80211_drv_tx(struct ieee80211_local 
> *local,
> struct txq_info *txqi;
> u8 ac;
>
> -   if (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE)
> +   if ((info->flags & IEEE80211_TX_CTL_SEND_AFTER_DTIM) ||
> +   (info->control.flags & IEEE80211_TX_CTRL_PS_RESPONSE))
> goto tx_normal;
>
> if (!ieee80211_is_data(hdr->frame_control))
> --
> 2.2.2

I would like hooks to emerge that can keep the level of multicast at
a dull roar relative to other traffic, and that make the impact of a
multicast burst measurable (a stat exposed to userspace, and something
reporting back to the main tx queues that a burst just happened and
how long it took). On receive, too.



Re: [PATCH] codel: add forgotten inline to functions in header file

2016-02-11 Thread Dave Taht

On Thu, Feb 11, 2016 at 7:05 AM, Grumbach, Emmanuel
 wrote:
> fixing linux-wireless address ...
>
> On 02/11/2016 04:30 PM, Eric Dumazet wrote:
>> On Thu, 2016-02-11 at 16:08 +0200, Emmanuel Grumbach wrote:
>>> Signed-off-by: Emmanuel Grumbach 
>>> ---
>>> -static bool codel_should_drop(const struct sk_buff *skb,
>>> -  struct Qdisc *sch,
>>> -  struct codel_vars *vars,
>>> -  struct codel_params *params,
>>> -  struct codel_stats *stats,
>>> -  codel_time_t now)
>>> +static inline bool codel_should_drop(const struct sk_buff *skb,
>>> + struct Qdisc *sch,
>>> + struct codel_vars *vars,
>>> + struct codel_params *params,
>>> + struct codel_stats *stats,
>>> + codel_time_t now)
>>
>> The lack of inline was done on purpose.
>>
>> This include file is kind of special, being included by codel and
>> fq_codel.
>>
>> Hint : we do not want to force the compiler to inline
>> codel_should_drop() (or any other function).
>>
>>
>> See this file as if it was a .c really.
>>
>>
>
> Yeah :) codel_should_drop seemed very long indeed... I wanted to use the
> codel_get_time and associated utils (_before, _after) in iwlwifi.
> They're better than jiffies... So maybe I can just copy that code to
> iwlwifi.

I need to stress that codel as-is is not the right thing for wifi,
particularly point-to-multipoint wifi in highly contended scenarios.
It IS a starting point. We have generally felt that the target needs
to be offset against the actual service opportunities, and that the
effects of multicast (with powersave) and other "background" frames
need to be smoothed out.

The lack of hardware that can do that, or of adequate sims, has
stalled attempts to come up with "the right thing". It looks like you
are putting more of the pieces in place to get there, in some tree
somewhere?

Re: [PATCH] codel: add forgotten inline to functions in header file

2016-02-11 Thread Dave Taht

On Thu, Feb 11, 2016 at 7:29 AM, Grumbach, Emmanuel
 wrote:
>
>
> On 02/11/2016 05:12 PM, Eric Dumazet wrote:
>> On Thu, 2016-02-11 at 15:05 +, Grumbach, Emmanuel wrote:
>>
>>
>>> Yeah :) codel_should_drop seemed very long indeed... I wanted to use the
>>> codel_get_time and associated utils (_before, _after) in iwlwifi.
>>> They're better than jiffies... So maybe I can just copy that code to
>>> iwlwifi.

Definitely better than jiffies.

>>
>> You certainly can submit a patch adding the inline, but not on all
>> functions present in this file ;)
>>
>> Thanks !
>>
>
> Actually... All I need *has* the inline, but if I include codel.h,
> codel_dequeue is defined but not used and you definitely don't want to
> inline that one. So I guess the only other option I have is to split
> that header file which I don't think is really worth it. So, unless you
> object it, I'll just copy the code.

I think it is best to start with another base implementation of codel
for wifi, yes.

What I think is currently the best-performing codel implementation we
have (on the wire, for ethernet) is in:

https://github.com/dtaht/bcake/codel5.h

which has a few differences from Eric's implementation (64-bit
timestamps, inlining, not a lot of CPU profiling on it yet; we are
still aiming for algorithmic correctness here, and it is closer to the
original paper... We used a different means of injecting the callback
in it, too).

The one currently in the main cake had issues in the last test round
but has been updated since. (sch_cake is also on github).

In neither case is it the right thing for wifi, either.

The "starting plan", such as it was, was to get to "one aggregate in
the hardware, one in the driver, one ready to be formed on the
completion interrupt", and to pull a smoothed service time from start
to completion interrupt to dynamically modify the codel target. (Other
headaches, like multicast, abound.)
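A rough C sketch of the bookkeeping that plan implies, assuming the driver records the submit time of each outstanding aggregate: at most two are outstanding (hardware and driver), the next one is only formed once a completion frees a slot, and each completion feeds a smoothed service time that could later drive the codel target. Everything here (names, the 1/8 EWMA weight) is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_OUTSTANDING 2	/* one in the hardware, one in the driver */

struct agg_pipe {
	uint64_t submit_ns[MAX_OUTSTANDING];	/* FIFO of submit timestamps */
	int head, count;
	uint64_t srtt_ns;	/* smoothed service time; 0 = no sample yet */
};

/* The next aggregate is only built when a slot is free. */
static bool pipe_may_submit(const struct agg_pipe *p)
{
	return p->count < MAX_OUTSTANDING;
}

static void pipe_submit(struct agg_pipe *p, uint64_t now_ns)
{
	p->submit_ns[(p->head + p->count) % MAX_OUTSTANDING] = now_ns;
	p->count++;
}

/* Completion interrupt for the oldest aggregate: return its service time. */
static uint64_t pipe_complete(struct agg_pipe *p, uint64_t now_ns)
{
	uint64_t svc = now_ns - p->submit_ns[p->head];

	p->head = (p->head + 1) % MAX_OUTSTANDING;
	p->count--;
	p->srtt_ns = p->srtt_ns ? (7 * p->srtt_ns + svc) / 8 : svc;
	return svc;
}
```

Measuring from submission means the service time includes waiting behind the aggregate already in the hardware, which is the delay the codel target would have to accommodate.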

(It's the per station queue + fq as close to the hardware as possible
that matters most, IMHO.)


Re: [RFC v4] mac80211: add A-MSDU tx support

2016-02-08 Thread Dave Taht

On Mon, Feb 8, 2016 at 3:23 AM, Felix Fietkau  wrote:
> On 2016-02-08 12:06, Krishna Chaitanya wrote:
>> On Mon, Feb 8, 2016 at 4:34 PM, Felix Fietkau  wrote:
>>> On 2016-02-08 10:54, Krishna Chaitanya wrote:
 On Mon, Feb 8, 2016 at 2:56 PM, Emmanuel Grumbach  
 wrote:
> On Mon, Feb 8, 2016 at 10:38 AM, Felix Fietkau  wrote:
>> Requires software tx queueing support. frag_list support (for zero-copy)
>> is optional.
>>
>> Signed-off-by: Felix Fietkau 
>> ---
>
>
> Ok - looks fine, but... and here comes the hard stuff.
> The frame size in the PLCP is limited in a way that you can't - from a
> spec POV - enable A-MSDU for low rates. Of course, you don't want to
> do that for low rates at all regardless of the spec.
> Since you build the A-MSDU in the mac80211 Tx queue which is not aware
> of the link quality, how do we prevent A-MSDU if the rate is low /
> dropping.
> I'd even argue that when the rates get lower, you'll  have more
> packets piling up in the software queue and ... even more chance to
> get A-MSDU in the exact case where you really want to avoid it?

 Similar to triggering AMPDU setup, we should put this control
 in RC (minstrel) to start/stop AMSDU based on link quality/if the rates
 drop below a pre-defined MCS (or) only for best-throughput rates.
>>> I think starting/stopping A-MSDU based on the rate is a bad idea.
>>> Even with low rates, using A-MSDU can be a good thing (especially for
>>> TCP ACKs), it just needs different size limits.
>>
>> By low rates, i was referring to bad channel conditions (more
>> retries/crc errors)
>> so using AMSDU might trigger more TCP level retries and for case
>> of TCP ACK's its still worse in that it triggers TCP data retires from the
>> peer.
> Based on the research and data from the Bufferbloat project, I'd say
> that in this case the latency due to queue buildup is a lot more harmful
> than lost packets.

+1.

> With unmanaged queues, the latency will cause unnecessary
> retransmissions anyway.

Typically at more than 250ms, yes.

At this point I hope everyone agrees that more than 250ms of queuing
is bad, except for actual transit to the moon and back. My hope is
that everyone can aim for less than 20ms in the more general case.

I was very happy to see the iwl patch go by that cuts latency under
load from seconds down to 30-250ms, but I am now puzzled.

One iwl user has reported that AMPDUs are disabled by default on iwl,
and that he had to set this:

options iwlwifi 11n_disable=8

to get a fourfold reduction in latency under load in his tests (iwl to ath9k).
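As I read the iwlwifi module parameter description, 11n_disable is a bitmap rather than a boolean: 1 disables 11n entirely, 2 disables TX aggregation, 4 disables RX aggregation, and 8 enables TX aggregation. So the 11n_disable=8 above turns AMPDU transmission on rather than disabling anything. A small decoder follows; the enum names mirror the driver's as I remember them (check them against the source for your kernel), and decode_11n_disable() is a hypothetical helper.

```c
#include <stdbool.h>

enum iwl_disable_11n {
	IWL_DISABLE_HT_ALL   = 1 << 0, /* 1: no 11n at all */
	IWL_DISABLE_HT_TXAGG = 1 << 1, /* 2: no TX aggregation (AMPDU) */
	IWL_DISABLE_HT_RXAGG = 1 << 2, /* 4: no RX aggregation */
	IWL_ENABLE_HT_TXAGG  = 1 << 3, /* 8: turn TX aggregation on */
};

struct ht_flags {
	bool ht_off, tx_agg_off, rx_agg_off, tx_agg_on;
};

/* Split an 11n_disable value into its individual bits. */
static struct ht_flags decode_11n_disable(unsigned int v)
{
	struct ht_flags f = {
		.ht_off     = (v & IWL_DISABLE_HT_ALL) != 0,
		.tx_agg_off = (v & IWL_DISABLE_HT_TXAGG) != 0,
		.rx_agg_off = (v & IWL_DISABLE_HT_RXAGG) != 0,
		.tx_agg_on  = (v & IWL_ENABLE_HT_TXAGG) != 0,
	};
	return f;
}
```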

What enables or disables AMPDU and AMSDU in the general case?

> With managed queues, packet drops start increasing with latency until
> TCP starts behaving properly.
> In both cases you have extra TCP retransmissions...

One thrust of mine has been to get ECN more widely adopted, which
would eliminate the TCP retransmit problem on devices using it for
congestion control (except on actual loss).
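The mechanism itself is small: when an AQM such as CoDel decides to signal congestion on a packet, it sets the Congestion Experienced codepoint instead of dropping, but only if the sender marked the packet ECN-capable (RFC 3168 codepoints in the low two bits of the IPv4 TOS / IPv6 traffic class). A sketch on a bare TOS byte; aqm_signal() is a hypothetical name, and a real IPv4 path must also update the header checksum, which is omitted here.

```c
#include <stdint.h>

/* ECN field values (RFC 3168), low two bits of the TOS byte. */
enum { ECN_NOT_ECT = 0x0, ECN_ECT1 = 0x1, ECN_ECT0 = 0x2, ECN_CE = 0x3 };

enum aqm_action { AQM_MARKED, AQM_DROP };

/*
 * Called once the AQM has decided to signal congestion on this packet.
 * ECN-capable packets get CE set and are kept; others must be dropped.
 */
static enum aqm_action aqm_signal(uint8_t *tos)
{
	if ((*tos & 0x3) == ECN_NOT_ECT)
		return AQM_DROP;
	*tos |= ECN_CE;
	return AQM_MARKED;
}
```

A marked packet still reaches the receiver, so the sender backs off without having to retransmit anything.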

Apple turned ECN on universally on all developer devices last August.
They ran into trouble with one major ISP and had to turn it off; I am
not sure whether they shipped with it on.

https://plus.google.com/u/0/107942175615993706558/posts/1j8nXtLGZDm

Stuart Cheshire also points at a real case where excessive queuing
hurts and FQ doesn't help (screen sharing), with a nice demo
combining fq_codel and the TCP_NOTSENT_LOWAT option. (His talk starts
at about 16 minutes in, he gets to the relevant bits at about 24
minutes in, and I'm sorry for always pointing people at talks rather
than papers.)
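For anyone who wants to reproduce the socket side of that demo on Linux: TCP_NOTSENT_LOWAT is set per socket with setsockopt() and limits how much not-yet-sent data the kernel will buffer for that socket, so the backlog stays in the application, where stale screen updates can still be replaced. (There is also a system-wide net.ipv4.tcp_notsent_lowat sysctl.) The helper names are mine, and the 16384-byte value used when trying it is an arbitrary example, not the value from the talk.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25 /* Linux value, for older libc headers */
#endif

/* Cap the unsent bytes the kernel queues for this TCP socket. */
static int set_notsent_lowat(int fd, int bytes)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
			  &bytes, sizeof(bytes));
}

/* Read the current value back; returns -1 on error. */
static int get_notsent_lowat(int fd)
{
	int v = 0;
	socklen_t len = sizeof(v);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &v, &len) < 0)
		return -1;
	return v;
}
```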

Toke's paper ("The Good, the Bad and the WiFi") finds that, in other
scenarios, excessive queuing is *not* a problem so long as the FQ
portion is sufficiently close to the hardware.

http://www.sciencedirect.com/science/article/pii/S1389128615002479
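The "fq portion" in fq_codel is deficit round robin over per-flow queues: each flow earns a byte quantum per round and keeps sending while its deficit is positive, so a flow of small packets is not starved behind a bulk flow. A minimal sketch of just that scheduling step, with a fixed number of flows, packet lengths standing in for skbs, and none of fq_codel's new/old flow lists or CoDel; all names are hypothetical.

```c
#define NFLOWS 2
#define QLEN   16

/* One per-flow FIFO holding packet lengths only (no real skbs). */
struct flow {
	int len[QLEN];
	unsigned int head, tail; /* empty when head == tail */
	int deficit;             /* may go negative, as in fq_codel */
};

static void flow_push(struct flow *f, int len)
{
	f->len[f->tail % QLEN] = len; /* caller keeps < QLEN queued */
	f->tail++;
}

static int flow_empty(const struct flow *f)
{
	return f->head == f->tail;
}

/*
 * Deficit round robin: returns the index of the flow whose head packet
 * was sent (its length in *out_len), or -1 if every flow is empty.
 * *cur remembers which flow is being served between calls.
 */
static int drr_dequeue(struct flow *flows, int *cur, int quantum, int *out_len)
{
	int i, any = 0;

	for (i = 0; i < NFLOWS; i++)
		any |= !flow_empty(&flows[i]);
	if (!any)
		return -1;

	for (;;) {
		struct flow *f = &flows[*cur];

		if (flow_empty(f)) {
			f->deficit = 0;
			*cur = (*cur + 1) % NFLOWS;
		} else if (f->deficit <= 0) {
			f->deficit += quantum; /* new round for this flow */
			*cur = (*cur + 1) % NFLOWS;
		} else {
			*out_len = f->len[f->head % QLEN];
			f->head++;
			f->deficit -= *out_len;
			return *cur;
		}
	}
}
```

With four 1500-byte packets in flow 0, four 100-byte packets in flow 1 and a 1500-byte quantum, the service order is 0, 1, 1, 1, 1, 0, 0, 0: the sparse flow is done after one bulk packet.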

Now, as to whether a driver or device doing retransmits should (or
even could) drop or mark packets, that is a good question. I am biased
towards low latency and per-device fairness, so I lean towards giving
up a lot sooner in bad conditions and moving on.

> With bad conditions you also get a strong increase in per-TXOP latency.

Having an agreed-upon table for all the "bad" conditions and what
should be done in these cases would be good. I don't regard
"contention" as a bad condition, but as an indicator to be smarter
about bunching up and/or discarding packets. Interference is a "bad"
condition. Low rates for one station compared to the others is not a
bad condition, as striving for airtime fairness between all stations
is a reasonable goal... etc...

> With A-MSDU you need fewer TXOPs for the same amount of data in the queue.

I like it. :)
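Felix's point that A-MSDU still helps at low rates, "it just needs different size limits", can be expressed as a cap on payload airtime rather than an on/off switch. A sketch, assuming a hypothetical per-A-MSDU airtime budget and ignoring subframe headers, padding and PHY/MAC overhead; 7935 bytes is the larger of the two 802.11n maximum A-MSDU lengths, and amsdu_limit() is a made-up name.

```c
#include <stdint.h>

#define AMSDU_MAX_HT 7935u /* larger of the two 802.11n A-MSDU limits */

/*
 * Largest A-MSDU (bytes) whose payload airtime at rate_kbps fits into
 * budget_us. A small result still lets small frames such as TCP ACKs
 * be aggregated at low rates.
 */
static uint32_t amsdu_limit(uint32_t rate_kbps, uint32_t budget_us)
{
	uint64_t bytes = (uint64_t)rate_kbps * budget_us / 8000;

	return bytes < AMSDU_MAX_HT ? (uint32_t)bytes : AMSDU_MAX_HT;
}
```

At 6.5 Mbit/s with a 1 ms budget this allows 812 bytes (about a dozen bare TCP ACKs); at 65 Mbit/s the spec maximum becomes the limit instead.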

I note that for stations, I am perpetually seeing this sort of behavior:

ap -> transmits a bunch of packets to station
station -> takes 2 or more txops to respond

Re: [RFC v2] iwlwifi: pcie: transmit queue auto-sizing

2016-02-05 Thread Dave Taht

> A bursted txop can be as big as 5-10ms. If you consider you want to
> queue 5-10ms worth of data for *each* station at any given time you
> obviously introduce a lot of lag. If you have 10 stations you might
> end up with service period at 10*10ms = 100ms. This gets even worse if
> you consider MU-MIMO because you need to do an expensive sounding
> procedure before transmitting. So while SU aggregation can probably
> still work reasonably well with shorter bursts (1-2ms) MU needs at
> least 3ms to get *any* gain when compared to SU (which obviously means
> you want more to actually make MU pay off).

I am not sure where you get these numbers. Got a spreadsheet?

Gradually reducing the maximum sized txop as a function of the number
of stations makes sense. If you have 10 stations pending delivery and
reduced the max txop to 1ms, you hurt bandwidth at that instant, but
by offering more service to more stations, in less time, they will
converge on a reasonable share of the bandwidth for each, faster[1].
And I'm sure that the person videoconferencing on a link like that
would appreciate getting some service inside of a 10ms interval,
rather than a 100ms one.
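The arithmetic, roughly: with N stations backlogged, one service round takes about N times the per-station TXOP, so 10 stations at 10ms gives the 100ms above. Capping the TXOP at a target round time divided by N bounds that wait. A sketch with hypothetical names that ignores per-TXOP overhead; the 10ms round is just the example figure from this paragraph, and the floor keeps TXOPs from shrinking below what is still efficient.

```c
#include <stdint.h>

/*
 * Per-station TXOP cap so that one round over n_active stations takes
 * about round_us, clamped to [floor_us, max_txop_us].
 */
static uint32_t txop_cap_us(uint32_t n_active, uint32_t round_us,
			    uint32_t floor_us, uint32_t max_txop_us)
{
	uint32_t t;

	if (n_active == 0)
		return max_txop_us;
	t = round_us / n_active;
	if (t < floor_us)
		t = floor_us;
	if (t > max_txop_us)
		t = max_txop_us;
	return t;
}
```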

Yes, there's overhead, and 1ms is not the right number; the right one
would vary across g, n, ac and their successors.

You will also get more opportunities to use MU-MIMO with shorter
bursts extant and more stations being regularly serviced.

[1] https://www.youtube.com/watch?v=Rb-UnHDw02o at about 13:50

> The rule of thumb is the
> longer you wait the bigger capacity you can get.

This is not strictly true, as the "fountain" of packets is regulated by
ACKs from the other side of the link, and ramps up or down as a function
of service time and loss.

>
> Apparently there's interest in maximizing throughput but it stands in
> direct opposition of keeping the latency down so I've been thinking
> how to satisfy both.
>
> The current approach ath10k is taking (patches in review [1][2]) is to
> use mac80211 software queues for per-station queuing, exposing queue
> state to firmware (it decides where frames should be dequeued from)
> and making it possible to stop/wake per-station tx subqueue with fake
> netdev queues. I'm starting to think this is not the right way though
> because it's inherently hard to control latency and there's a huge
> memory overhead associated with the fake netdev queues.

What is this overhead?

Applying things like CoDel tends to dramatically reduce the number of
skbs extant, and modern 802.11ac-capable hardware has tons more
memory...

> Also fq_codel
> is a less effective with this kind of setup.

fq_codel's principal problems on wifi are many, and are
documented in the talk above.

> My current thinking is that the entire problem should be solved via
> (per-AC) qdiscs, e.g. fq_codel. I guess one could use
> limit/target/interval/quantum knobs to tune it for higher latency of
> aggregation-oriented Wi-Fi links where long service time (think
> 100-200ms) is acceptable. However fq_codel is oblivious to how Wi-Fi
> works in the first place, i.e. Wi-Fi gets better throughput if you
> deliver bursts of packets destined to the same station. Moreover this
> gets even more complicated with MU-MIMO where you may want to consider
> spatial location (which influences signal quality when grouped) of
> each station when you decide which set of stations you're going to
> aggregate to in parallel. Since drivers have a finite tx ring this it
> is important to deliver bursts that can actually be aggregated
> efficiently. This means driver would need to be able to tell qdisc
> about per-flow conditions to influence the RR scheme in some way
> (assuming a qdiscs even understands flows; do we need a unified way of
> talking about flows between qdiscs and drivers?).

This is a very good summary of the problems in layering fq_codel as it
exists today on top of wifi as it exists today. :/ Our conclusion
several years ago was that, because the information needed to do things
right lived in the mac80211 layer, we could not evolve the qdisc
layer to suit, and needed to move the core ideas into the mac80211
layer.

Things have evolved since, but I still think we can't get enough info
up to the qdisc layer (locks and so on) to use it sanely.

>
> [1]: https://www.spinics.net/lists/linux-wireless/msg146187.html
> [2]: https://www.spinics.net/lists/linux-wireless/msg146512.html

I will review!

>
 For reference, ath10k has around 1400 tx descriptors, though
 in practice not all are usable, and in stock firmware, I'm guessing
 the NIC will never be able to actually fill up it's tx descriptors
 and stop traffic.  Instead, it just allows the stack to try to
 TX, then drops the frame...
>>>
>>>
>>> 1400 descriptors, ok... but they are not organised in queues?
>>> (forgive my ignorance of athX drivers)
>>
>>
>> I think all the details are in the firmware, at least for now.
>
> Yeah. Basically ath10k has a flat set of tx descriptors which are
> AC-agnostic.

Re: [RFC RESEND] iwlwifi: pcie: transmit queue auto-sizing

2016-02-04 Thread Dave Taht

ook to do that is in use on the mt72 chipset that Felix is working
on... but nowhere else so far as I know (as yet).

The iwl does its own aggregation (I think?), but estimates can
still be made...

There are WAY more details, of course (per-station queuing, a separate
multicast queue; only some are in that talk!), but my hope was that under
good conditions we'd get wireless-n down below 12ms of driver overhead,
even at 6Mbit, before something like fq_codel could kick in (under
good conditions! There are plenty of other potential latency sources
besides excessive queuing in wifi!). In my ideal world we would hold it
under 1250us at higher rates.
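To put those targets in bytes: a time budget t at link rate r holds r*t/8 bytes. At 6 Mbit/s, 12ms is 9000 bytes (six 1500-byte packets); at an example rate of 100 Mbit/s, 1250us is 15625 bytes. A trivial helper (the name is mine) that ignores MAC/PHY overhead:

```c
#include <stdint.h>

/* Bytes that drain in budget_us at rate_bps, ignoring MAC/PHY overhead. */
static uint64_t budget_bytes(uint64_t rate_bps, uint64_t budget_us)
{
	return rate_bps * budget_us / 8000000u;
}
```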

Periodically sampling seems like a reasonable approach under lab
conditions but it would be nicer to have feedback from the firmware -
"I transmitted the last tx as an X byte aggregate, at MCS1, I had to
retransmit a few packets once, it took me 6ms to acquire the media, I
heard 3 other stations transmitting, etc.".

We know we can get the above info from a few chipsets, but not enough
was known about the iwl last I looked. And one reason why fq_codel,
unassisted, is not quite the right thing on top of this is that
multicast can take a really long time...

Regardless, I would love to see and use this patch myself in a variety
of real-world conditions and see what happens. Incremental progress is
the only way forward. Thanks for cheering me up.

>
> Cc: Stephen Hemminger <step...@networkplumber.org>
> Cc: Dave Taht <dave.t...@gmail.com>
> Cc: Jonathan Corbet <cor...@lwn.net>
> Signed-off-by: Emmanuel Grumbach <emmanuel.grumb...@intel.com>
> ---
> Fix Dave's email address
> ---
>  drivers/net/wireless/intel/iwlwifi/pcie/internal.h |  6 
>  drivers/net/wireless/intel/iwlwifi/pcie/tx.c   | 32 
> --
>  2 files changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h 
> b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
> index 2f95916..d83eb56 100644
> --- a/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
> +++ b/drivers/net/wireless/intel/iwlwifi/pcie/internal.h
> @@ -192,6 +192,11 @@ struct iwl_cmd_meta {
> u32 flags;
>  };
>
> +struct iwl_txq_auto_size {
> +   int min_space;
> +   unsigned long reset_ts;
> +};
> +
>  /*
>   * Generic queue structure
>   *
> @@ -293,6 +298,7 @@ struct iwl_txq {
> bool block;
> unsigned long wd_timeout;
> struct sk_buff_head overflow_q;
> +   struct iwl_txq_auto_size auto_sz;
>  };
>
>  static inline dma_addr_t
> diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c 
> b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
> index 837a7d5..4d1dee6 100644
> --- a/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/pcie/tx.c
> @@ -572,6 +572,8 @@ static int iwl_pcie_txq_init(struct iwl_trans *trans, 
> struct iwl_txq *txq,
>
> spin_lock_init(&txq->lock);
> __skb_queue_head_init(&txq->overflow_q);
> +   txq->auto_sz.min_space = 240;
> +   txq->auto_sz.reset_ts = jiffies;
>
> /*
>  * Tell nic where to find circular buffer of Tx Frame Descriptors for
> @@ -1043,10 +1045,14 @@ void iwl_trans_pcie_reclaim(struct iwl_trans *trans, 
> int txq_id, int ssn,
>  q->read_ptr != tfd_num;
>  q->read_ptr = iwl_queue_inc_wrap(q->read_ptr)) {
> struct sk_buff *skb = txq->entries[txq->q.read_ptr].skb;
> +   struct ieee80211_tx_info *info;
> +   unsigned long tx_time;
>
> if (WARN_ON_ONCE(!skb))
> continue;
>
> +   info = IEEE80211_SKB_CB(skb);
> +
> iwl_pcie_free_tso_page(skb);
>
> __skb_queue_tail(skbs, skb);
> @@ -1056,6 +1062,18 @@ void iwl_trans_pcie_reclaim(struct iwl_trans *trans, 
> int txq_id, int ssn,
> iwl_pcie_txq_inval_byte_cnt_tbl(trans, txq);
>
> iwl_pcie_txq_free_tfd(trans, txq);
> +
> +   tx_time = 
> (uintptr_t)info->driver_data[IWL_TRANS_FIRST_DRIVER_DATA + 2];
> +   if (time_before(jiffies, tx_time + msecs_to_jiffies(10))) {
> +   txq->auto_sz.min_space -= 10;
> +   txq->auto_sz.min_space =
> +   max(txq->auto_sz.min_space, txq->q.high_mark);
> +   } else if (time_after(jiffies,
> + tx_time + msecs_to_jiffies(20))) {
> +   txq->auto_sz.min_space += 10;
> +   txq->auto_sz.min_space =
> +   min(txq->auto_sz.min_space, 252);
>
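Stripped of the iwlwifi plumbing, the rule in the quoted hunk is a simple additive controller on the free space the queue must keep: frames that complete within 10ms shrink min_space by 10 (but not below high_mark), and frames that take more than 20ms grow it by 10 (capped at 252). A standalone restatement using millisecond delays instead of jiffies; the function name is mine, not the patch's.

```c
/* Restatement of the quoted auto-sizing step (hypothetical helper). */
static int txq_adjust_min_space(int min_space, int high_mark,
				unsigned int completion_delay_ms)
{
	if (completion_delay_ms < 10) {
		/* fast completions: let the queue fill further */
		min_space -= 10;
		if (min_space < high_mark)
			min_space = high_mark;
	} else if (completion_delay_ms > 20) {
		/* slow completions: demand more free space, shorter queue */
		min_space += 10;
		if (min_space > 252)
			min_space = 252;
	}
	return min_space;
}
```

Delays between 10 and 20ms leave min_space unchanged, which gives the loop some hysteresis.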

Re: [ath9k-devel] AR9462 problems connecting again..

2015-02-24 Thread Dave Taht

On Tue, Feb 24, 2015 at 2:26 AM, Jouni Malinen j...@w1.fi wrote:
 On Tue, Feb 24, 2015 at 01:29:27PM +1100, Andrew McGregor wrote:
 Over the weekend I found a bug in minstrel-ht that might well be
 implicated here.

 The last retransmit rate is meant to be a 'get the packet there
 reliably' rate; minstrel-ht doesn't do that right, and can pick a
 fairly flaky rate instead.

 Can't generate a proper patch right now, so this diff might not apply
 cleanly, but the fix is simply to change 75 to 99 in the two places
 below:

 While this may indeed be helpful, I don't think it is sufficient for
 this EAPOL frame related issue. What I would like to see is minstrel_ht
 using a basic rate (something non-HT) at the end of the retry series for
 EAPOL frames.

 The current behavior looks very suspicious to me. The early EAPOL frames
 after association are being used to probe for higher rates. This results
 in the total number of retry attempts actually getting smaller than any
 other frame, i.e., minstrel_ht seems to be using significantly _less_
 robust choices for the EAPOL frames than following normal data frames!
 This should really be the other way around..

 As an example, I'm seeing this on 5 GHz band (with the 75 to 99 change
 in place, but behavior was more or less identical without it):
 - the first EAPOL frame (msg 2/4) getting one attempt at MCS 3, 2
   attempts at MCS 0, 2 attempts at MCS 0 (yes, identical to the previous
   one) with total maximum of 5 attempts
 - the second EAPOL frame (msg 4/4) getting one attempt at MCS 9, 2
   attempts at MCS 0, 2 attempts at MCS 0 with total maximum of 5
   attempts
 - another data frame after this: 5 attempts at MCS 9, 5 attempts at MCS
   3, 5 attempts at MCS 3 with total maximum of 15 attempts(!!)

In general, I would prefer that the excessive retries in the present
wifi driver layers be dramatically reduced, the packet dropped, and the
problem punted to higher layers.

 This cannot be the best approach here..

Falling back faster to the lowest possible rate with minimal retries,
and then giving up sooner, would be better. 15 attempts? Jeez.
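A sketch of the kind of retry series being argued for here: one attempt at the probed rate, a quick fall back to a robust MCS, a small total budget, and, for control-port (EAPOL) frames as Jouni asks, a final non-HT basic-rate entry with several attempts. The struct, the rate encoding and the attempt counts are hypothetical; this is not minstrel_ht's or mac80211's rate table format.

```c
#include <stdbool.h>

#define MAX_RATES 4

/* Hypothetical retry-chain entry. */
struct rate_try {
	bool ht;   /* HT MCS if true, legacy (non-HT) rate otherwise */
	int rate;  /* MCS index, or legacy rate in 100 kbps units (60 = 6 Mbps) */
	int count; /* attempts at this rate */
};

/* Fill r[] (room for MAX_RATES entries); returns the number used. */
static int build_retry_chain(struct rate_try *r, int best_mcs, int robust_mcs,
			     bool control_port, int basic_rate)
{
	int n = 0;

	r[n++] = (struct rate_try){ .ht = true, .rate = best_mcs, .count = 1 };
	r[n++] = (struct rate_try){ .ht = true, .rate = robust_mcs, .count = 2 };
	if (control_port) /* EAPOL: end on a non-HT basic rate */
		r[n++] = (struct rate_try){ .ht = false, .rate = basic_rate,
					    .count = 4 };
	return n;
}

static int total_attempts(const struct rate_try *r, int n)
{
	int i, sum = 0;

	for (i = 0; i < n; i++)
		sum += r[i].count;
	return sum;
}
```

Ordinary data frames get 3 attempts in total instead of 15; EAPOL frames get 7, with the most robust ones last.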

 For the
 IEEE80211_TX_CTRL_PORT_CTRL_PROTO cases, there are identified issues
 where failing to deliver the frame results is significant issues either
 in getting connected in the first place or getting disconnected if
 rekeying fails.

 I'm not sure how this would be implemented cleanly in minstrel_ht or
 whether that is even the best place (i.e., rate.c could do this
 instead), but I'd like that third attempt for control port cases to be
 dropped to use a (lowish) basic rate and non-MCS at that since there may
 be some interop issues with HT MCS early during association.
 Alternatively with drivers like ath9k that support 4 rate values, it
 would also be fine to add this basic rate attempt (or well, I'd have
 multiple, say 4, such attempts) as an additional 4th entry which does
 not currently seem to get used with minstrel at all.

 The (lowish) basic rate here could be defined as 6 Mbps OFDM for 5 GHz
 band and either that or maybe even 2 Mbps or 5.5 Mbps on 2.4 GHz (if
 included by the AP in basic rate set).

 --
 Jouni MalinenPGP id EFC895FA
 ___
 ath9k-devel mailing list
 ath9k-de...@lists.ath9k.org
 https://lists.ath9k.org/mailman/listinfo/ath9k-devel



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

Re: [ath9k-devel] AR9462 problems connecting again..

2015-02-22 Thread Dave Taht

On Sun, Feb 22, 2015 at 10:30 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
 On Sun, Feb 22, 2015 at 10:24 AM, Adrian Chadd adr...@freebsd.org wrote:

 Just a wild shot - try disabling fast authentication and see if that
 makes a difference?

 wpa_supplicant.conf:

 fast_reauth=0

 I recall having issues with fast_reauth once, but I never stuck around
 that location long enough to debug it.

 Nope. Did that, killed wpa_supplicant (which restarts it), tried
 connecting, still failed.

Hint: several UniFi (and most UBNT) products are well supported by
OpenWrt directly. Reflashing your device to it via their web
interface would

A) probably fix the problem, and
B) give you more insight into fixing it, if it persists, by giving you
full access to both sides of the connection.

https://downloads.openwrt.org/snapshots/trunk/ar71xx/generic/

I have been replacing UBNT's default firmware on first boot for 6+
years now. It is good hardware, once you do that.

-- 
Dave Täht

http://www.bufferbloat.net/projects/cerowrt/wiki

Wifi outside the faraday cage (was: Throughput regression with `tcp: refine TSO autosizing`)

2015-02-12 Thread Dave Taht

On Fri, Feb 6, 2015 at 1:57 AM, Michal Kazior michal.kaz...@tieto.com wrote:
 On 5 February 2015 at 20:50, Dave Taht dave.t...@gmail.com wrote:
 [...]
 And I really, really, really wish, that just once during this thread,
 someone had bothered to try running a test
 at a real world MCS rate - say MCS1, or MCS4, and measured the latency
 under load of that...

 Time between frame submission to firmware and tx-completion on one of
 my ath10k machines:

THANK YOU for running these tests!


 Legacy 54mbps: ~18ms
 Legacy 6mbps: ~37ms

Legacy rates are what many people actually achieve, given the
limited market penetration 802.11ac has in clients and APs.

 11n MCS 3 (nss=0): ~13ms
 11n MCS 8 (nss=1): ~6-8ms
 11ac NSS=1 MCS=2: ~4-6ms
 11ac NSS=2 MCS=0: ~5-8ms

 Keep in mind this is a clean room environment so retransmissions are
 kept at minimum. Obviously with a noisy environment you'll get retries
 at different rates and higher latency.

It is difficult to reconcile the results you get in the clean room
with the results I get from measurements in the real world. I encourage
you to go test your code in coffee shops, in offices with wifi, and in
hotels and apartment buildings, rather than only in the lab.

I typically measure induced delays in the 3 to 6 second range in the
typical conference scenario (I measure at every conference I go
to). My latest talk, including data on that, is the Friday morning one,
starting at 2:15 or so, at NZNOG:

http://new.livestream.com/i-filmservices/NZNOG2015/videos/75358960

1) In the real world, I rarely see the maximum *rates*.

I am personally quite fond of designing stuff with gears out of the
middle of the Boston Gear Catalog [1]. Looking over my largely
outdoor wifi network, I see a cluster of values around MCS11,
followed by MCS4, 3, 7 and 0, and *nothing* at MCS15. David Lang is
planning on doing some measurements at the SCALE conference next week,
and I expect heaps of data from that, but I strongly suspect that the
vast majority of connections in every circumstance except the
test bench are not even coming close to the maximum MCS rate in the
standard(s).

I would have thought that the lessons of the Chromecast, where *every*
attempt at reviewing it in an office environment failed, might have
given the industry a clue that even 20Mbit to a given station is
impossible in many cases due to airtime contention.

Aggregates should be sized so that at most 2 full ones are stacked up
(at the rate being achieved for the destination), with the rest
backlogged in the qdisc layer, if possible. 37ms backed up in the
firmware is a lot, considering that the test above had no airtime
contention and no multicast.

Drivers need to be aware that every TXOP is precious. I could see
setting a watchdog timer when the first packet arrives in a wifi
driver, waiting a few hundred usec longer before firing off the write
to the hardware, in order to maximize aggregation by accumulating more
packets to aggregate.
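The watchdog idea, sketched as a decision function: hold the first packet for up to a short window so more packets can join the aggregate, but flush immediately once a full aggregate's worth has accumulated. The struct, the function names, the hold window and the byte threshold are all hypothetical parameters; a real driver would arm a timer rather than be polled like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-TID batching state. */
struct agg_batch {
	uint64_t first_enqueue_us; /* arrival time of the oldest held packet */
	uint32_t bytes;            /* bytes currently held back */
};

static void batch_add(struct agg_batch *b, uint64_t now_us, uint32_t len)
{
	if (b->bytes == 0)
		b->first_enqueue_us = now_us;
	b->bytes += len;
}

/* Should the held packets be written to the hardware now? */
static bool batch_should_flush(const struct agg_batch *b, uint64_t now_us,
			       uint32_t full_bytes, uint64_t hold_us)
{
	if (b->bytes == 0)
		return false;
	if (b->bytes >= full_bytes)
		return true; /* aggregate is already full */
	return now_us - b->first_enqueue_us >= hold_us; /* window expired */
}
```

With a 300us window, a lone packet waits at most 300us, while a burst that fills the aggregate goes out with no added delay.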

I also have hopes for xmit_more being useful, but I am really not sure
how well it works on single cores, or how it interacts with NAPI and
with other wifi aggregates. It looks like adding xmit_more to the ag71xx
driver will be easy...

2) In the real world I see media acquisition times *far* greater than 1ms.

Please feel free to test your drivers in coffee shops, in the office,
at hotels, in apartments...

And retries... let's not talk about retries...


3) Longer AMPDUs lead to more tail loss and retries

I have a paper around here somewhere that shows AMPDU loss and retries
go up disproportionately as the length of transmission approaches 4ms.
I hate drawing a conclusion from a paper I can't find, but my overall
take on it is that as media acquisition time and retransmits go up,
reducing AMPDU size from the maximum down to about 1ms at the current
rate would lead to fairer, more responsive, and faster-feeling wifi for
everyone, and improve ACK clocking, flow mixing for web traffic, etc.

4) There is some fairly decent academic work on other aspects of
excessive buffering at lower rates

http://hph16.uwaterloo.ca/~bshihada/publications/buffer-AMPDU.pdf

(There are problems with this paper, but at least it tests 802.11n.)

Also see Google Scholar for bufferbloat-related papers on wifi and
LTE from 2014 and later.

5) As for rate control, Minstrel was designed in an era when there
wasn't one AP for every 4 people in the USA. Other people's rate
controllers are even dumber, and minstrel-ht itself needs a hard look
at n speeds, much less ac speeds.

6) Everything I say above applies to both stations and APs.

APs have FAR worse problems, where per-TID (station) queuing is really
needed in order to aggregate effectively when two or more stations are
in use. Statistically, with two or more stations passing traffic,
aggregation possibilities will go down rapidly on a FIFO (and go down
even faster with FQ in place without per-station queuing!), and with the
usual fixed buffer size underneath that, without per-tid

Re: Throughput regression with `tcp: refine TSO autosizing`

2015-02-12 Thread Dave Taht

On Wed, Feb 11, 2015 at 11:48 PM, Michal Kazior michal.kaz...@tieto.com wrote:
 On 11 February 2015 at 09:57, Michal Kazior michal.kaz...@tieto.com wrote:
 On 10 February 2015 at 15:19, Johannes Berg johan...@sipsolutions.net 
 wrote:
 On Tue, 2015-02-10 at 11:33 +0100, Michal Kazior wrote:

 +   if (msdu->sk) {
 +   ewma_add(ar->tx_delay_us,
 +ktime_to_ns(ktime_sub(ktime_get(), skb_cb->stamp)) /
 +NSEC_PER_USEC);
 +
 +   ACCESS_ONCE(msdu->sk->sk_tx_completion_delay_cushion) =
 +   (ewma_read(ar->tx_delay_us) *
 +msdu->sk->sk_pacing_rate) >> 20;
 +   }

 To some extent, every wifi driver is going to have this problem. Perhaps
 we should do this in mac80211?

 Good point. I was actually thinking about it. I can try cooking a
 patch unless you want to do it yourself :-)

 I've taken a look into this. The most obvious place to add the
 timestamp for each packet would be ieee80211_tx_info (i.e. the
 skb->cb[48]). The problem is it's very tight there. Even squeezing 2
 bytes (allowing up to 64ms of tx completion delay which I'm worried

I will argue strongly in favor of never allowing more than 4ms worth of
packets to accumulate in the firmware.

 won't be enough) will be troublesome. Some drivers already use every
 last byte of their allowance on 64bit archs (e.g. ar5523 uses entire
 40 bytes of driver_data).

 I wonder if it's okay to bump skb->cb to 56 bytes to avoid the cascade
 of changes required to implement the tx completion delay accounting?


 Michał
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
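What the quoted ath10k hunk computes, restated outside the driver: an EWMA of submit-to-completion delay in microseconds, multiplied by the socket's pacing rate in bytes per second, then shifted right by 20 (dividing by 2^20, roughly 10^6) to approximate the bytes sent during one completion delay. The clamp to 4ms reflects my argument above and is not part of Michal's patch; the EWMA here is a plain fixed-point one with weight 1/8, not the kernel's ewma helpers, and all names are hypothetical.

```c
#include <stdint.h>

#define MAX_FW_DELAY_US 4000u /* never credit more than 4ms in firmware */

struct tx_delay_est {
	uint64_t avg_us; /* EWMA of tx completion delay, weight 1/8 */
};

static void tx_delay_add(struct tx_delay_est *e, uint64_t sample_us)
{
	e->avg_us = e->avg_us ? (e->avg_us * 7 + sample_us) / 8 : sample_us;
}

/* Extra bytes a socket may have outstanding to cover completion delay. */
static uint64_t completion_cushion(const struct tx_delay_est *e,
				  uint64_t pacing_rate_Bps)
{
	uint64_t d = e->avg_us < MAX_FW_DELAY_US ? e->avg_us : MAX_FW_DELAY_US;

	return (d * pacing_rate_Bps) >> 20;
}
```

The clamp is the point: without it, a firmware that sits on packets for 37ms would be rewarded with an ever larger cushion.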



-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks

Re: Throughput regression with `tcp: refine TSO autosizing`

2015-02-05 Thread Dave Taht

On Fri, Feb 6, 2015 at 2:44 AM, Michal Kazior michal.kaz...@tieto.com wrote:

 On 5 February 2015 at 14:19, Eric Dumazet eric.duma...@gmail.com wrote:
  On Thu, 2015-02-05 at 04:57 -0800, Eric Dumazet wrote:
 
  The intention is to control the queues to the following :
 
  1 ms of buffering, but limited to a configurable value.
 
  On a 40Gbps flow, 1ms represents 5 MB, which is insane.
 
  We do not want to queue 5 MB of traffic, this would destroy latencies
  for all concurrent flows. (Or would require having fq_codel or fq as
  packet schedulers, instead of default pfifo_fast)
 
  This is why having 1.5 ms delay between the transmit and TX completion
  is a problem in your case.

 I do get your point. But 1.5ms is really tough on Wi-Fi.

 Just look at this:

 ; ping 192.168.1.2 -c 3
 PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
 64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=1.83 ms
 64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=2.02 ms
 64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=1.98 ms

 ; ping 192.168.1.2 -c 3 -Q 224
 PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
 64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.939 ms
 64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.906 ms
 64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.946 ms

 This was run with no load so batching code in the driver itself should
 have no measurable effect. The channel was near-ideal: low noise
 floor, cabled rf, no other traffic.

 The lower latency ping is when 802.11 QoS Voice Access Category is
 used. I also get 400mbps instead of 250mbps in this case with 5 flows
 (net/master).


The VO queue is now nearly useless in a real-world environment. While
it does grab the media mildly faster in some cases, on a good day with
no other competing APs, it cannot aggregate packets, and wastes TXOPs.
It is far saner to aim for better aggregation (use the VI queue if you
must try to get better media acquisition).

It is disabled in multiple products I know of.
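Why ping -Q 224 lands in VO: by default, in kernels of this era, cfg80211 derives the 802.1d user priority from the top three bits of the IPv4 TOS byte, and user priorities map to access categories per the 802.11e table (1,2 to BK; 0,3 to BE; 4,5 to VI; 6,7 to VO). A sketch of that default path only; it ignores QoS maps, skb->priority overrides and IPv6, and the names are mine.

```c
#include <stdint.h>

enum ac { AC_VO, AC_VI, AC_BE, AC_BK };

/* 802.1d user priority to 802.11 access category (802.11e table). */
static const enum ac up_to_ac[8] = {
	AC_BE, AC_BK, AC_BK, AC_BE, AC_VI, AC_VI, AC_VO, AC_VO,
};

/* User priority = top three bits of the IPv4 TOS byte. */
static enum ac tos_to_ac(uint8_t tos)
{
	return up_to_ac[(tos & 0xfc) >> 5];
}
```

TOS 224 gives user priority 7, hence VO; the common EF marking (TOS 0xB8) gives priority 5, which lands in VI.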

And I really, really, really wish that, just once during this thread,
someone had bothered to try running a test
at a real-world MCS rate (say MCS1 or MCS4) and measured the latency
under load of that...

or tried talking to two or more stations at the same time.

Instead of trying for 1.5Gbit/s in a Faraday cage.
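For reference, the per-flow limit Eric describes in this thread (about 1 ms of data at the flow's pacing rate, bounded by a configurable value, the tcp_limit_output_bytes sysctl) works out roughly as below. This is my paraphrase of the idea, not the kernel source, which has more cases; the function name is hypothetical. It also reproduces his 40Gbps arithmetic: 5*10^9 bytes/s shifted right by 10 is about 4.9 MB.

```c
#include <stdint.h>

/*
 * Approximate TCP small-queues byte limit: ~1 ms at the pacing rate
 * (rate >> 10, i.e. /1024), at least two skbs, at most the sysctl.
 */
static uint64_t tsq_limit(uint64_t pacing_rate_Bps, uint64_t skb_truesize,
			  uint64_t sysctl_limit)
{
	uint64_t limit = pacing_rate_Bps >> 10;

	if (limit < 2 * skb_truesize)
		limit = 2 * skb_truesize;
	return limit < sysctl_limit ? limit : sysctl_limit;
}
```

On wifi the problem is the other side of this: if TX completion takes several ms rather than 1ms, a flow hits this limit and stalls even though the air is idle.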



 Dealing with black box firmware blobs is a pain.


+10



  Note that TCP stack could detect when this happens, *if* ACK where
  delivered before the TX completions, or when TX completion happens,
  we could detect that the clone of the freed packet was freed.
 
  In my test, when I did ethtool -C eth0 tx-usecs 1024 tx-frames 64, and
  disabling GSO, TCP stack sends a bunch of packets (a bit less than 64),
  blocks on tcp_limit_output_bytes.
 
  Then we receive 2 stretch ACKS after ~50 usec.
 
  TCP stack tries to push again some packets but blocks on
  tcp_limit_output_bytes again.
 
  1ms later, TX completion happens, tcp_wfree() is called, and TCP stack
  push following ~60 packets.
 
 
  TCP could  eventually dynamically adjust the tcp_limit_output_bytes,
  using a per flow dynamic value, but I would rather not add a kludge in
  TCP stack only to deal with a possible bug in ath10k driver.
 
  niu has a similar issue and simply had to call skb_orphan() :
 
  drivers/net/ethernet/sun/niu.c:6669:skb_orphan(skb);

 Ok. I tried calling skb_orphan() right after I submit each Tx frame
 (similar to niu which does this in start_xmit):

 --- a/drivers/net/wireless/ath/ath10k/htt_tx.c
 +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
 @@ -564,6 +564,8 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct
 sk_buff *msdu)
 if (res)
 goto err_unmap_msdu;

 +   skb_orphan(msdu);
 +
 return 0;

  err_unmap_msdu:


 Now, with {net/master + ath10k GRO + the above} I get 620mbps on a
 single flow (even better then before). Wow.

 Does this look ok/safe as a solution to you?


 Michał




