Re: [Cerowrt-devel] [Codel] codel "oversteer"

2012-06-20 Thread Jonathan Morton
Is the cwnd also oscillating wildly or is it just an artefact of the visible 
part of the queue only being a fraction of the real queue?

Are ACK packets being aggregated by wireless? That would be a good explanation 
for large bursts that flood the buffer, if the rwnd opens a lot suddenly. This 
would also be an argument that 2*n is too small for the ECN drop threshold. 

The key to knowledge is not to rely on others to teach you it. 

On 20 Jun 2012, at 04:32, Dave Taht  wrote:

> I've been forming a theory regarding codel behavior in some
> pathological conditions. For the sake of developing the theory I'm
> going to return to the original car analogy published here, and add a
> new one - "oversteer".
> 
> Briefly:
> 
> If the underlying interface device driver is overbuffered, when the
> packet backlog finally makes it into the qdisc layer, that bursts up
> rapidly and codel rapidly ramps up its drop strategy, which corrects
> the problem, but we are back in a state where we are, as in the case
> of an auto on ice, or a very loose connection to the steering wheel,
> "oversteering" because codel is actually not measuring the entire
> time-width of the queue and unable to control it well, even if it
> could.
> 
> What I observe on wireless now with fq_codel under heavy load is
> oscillation in the qdisc layer between 0 length queue and 70 or more
> packets backlogged, a burst of drops when that happens, and far more
> drops, relative to ECN marks, than I expected (with the new
> (arbitrary) "drop ECN packets if > 2 * target" idea I was fiddling
> with illustrating the point better, now). It's difficult to gain
> further direct insight
> without time and packet traces, and maybe exporting more data to
> userspace, but this kind of explains a report I got privately on x86
> (no ecn drop enabled), and the behavior of fq_codel on wireless on the
> present version of cerowrt.
> 
> (I could always have inserted a bug, too, if it wasn't for the private
> report and having to get on a plane shortly I wouldn't be posting this
> now)
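The "(arbitrary) drop ECN packets if > 2 * target" tweak mentioned in the parenthetical above can be sketched as follows. This is only a toy illustration of the decision logic, not the actual patch: the 5 ms target is CoDel's customary default, the factor of 2 comes from the post, and real CoDel marks/drops on its control-law schedule rather than for every packet over target.

```python
# Toy illustration of the "drop ECN packets if > 2 * target" idea --
# not the actual patch.  Values are assumptions from the discussion.
TARGET_MS = 5.0          # CoDel's customary target sojourn time
ECN_DROP_FACTOR = 2.0    # the arbitrary "2 * target" threshold

def codel_action(sojourn_ms, ect):
    """Decide what to do with one packet, given its queue sojourn time
    and whether it is ECN-capable (ECT)."""
    if sojourn_ms <= TARGET_MS:
        return "forward"                    # queue is under control
    if ect and sojourn_ms <= ECN_DROP_FACTOR * TARGET_MS:
        return "mark"                       # signal congestion without loss
    return "drop"                           # too far behind: drop even ECT

print(codel_action(3.0, ect=True))     # forward
print(codel_action(7.0, ect=True))     # mark
print(codel_action(12.0, ect=True))    # drop
print(codel_action(7.0, ect=False))    # drop
```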
> 
> Further testing ideas (others!) could try would be:
> 
> Increase BQL's setting to over-large values on a BQL enabled interface
> and see what happens
> Test with an overbuffered ethernet interface in the first place
> Improve the ns3 model to have an emulated network interface with
> user-settable buffering
> 
> Assuming I'm right and others can reproduce this, this implies that
> focusing much harder on BQL and overbuffering related issues on the
> dozens? hundreds? of non-BQL enabled ethernet drivers is needed at
> this point. And we already know that much more hard work on fixing
> wifi is needed.
> 
> Despite this I'm generally pleased with the fq_codel results over
> wireless I'm currently getting from today's build of cerowrt, and
> certainly the BQL-enabled ethernet drivers I've worked with (ar71xx,
> e1000) don't display this behavior, neither does soft rate limiting
> using htb - instead achieving a steady state for the packet backlog,
> accepting bursts, and otherwise being "nice".
> 
> -- 
> Dave Täht
> SKYPE: davetaht
> http://ronsravings.blogspot.com/
> ___
> Codel mailing list
> co...@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/codel
___
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel


Re: [Cerowrt-devel] [Bloat] cerowrt 3.3.8-17: nice latency improvements, some issues with bind

2012-08-18 Thread Jonathan Morton

On 18 Aug, 2012, at 12:38 pm, Török Edwin wrote:

> Shouldn't wireless N be able to do 200 - 300 Mbps though? If I enable 
> debugging in iwl4965 I see that it
> starts TX aggregation, so not sure what's wrong (router or laptop?). With 
> encryption off I can get at most 160 Mbps.

That's only the raw data rate - many non-ideal effects conspire to reduce this 
by at least half in practice.

I don't think anyone has ever seen the full theoretical throughput on wireless 
- at this point it's just a marketing number to indicate "this one is newer and 
better" to the technically illiterate.

 - Jonathan



Re: [Cerowrt-devel] [Codel] FQ_Codel lwn draft article review

2012-11-28 Thread Jonathan Morton
It may be worth noting that fq-codel is not stochastic in its fairness
mechanism. SFQ suffers from the birthday effect because it hashes flows
into a fixed number of buckets, which is what makes it stochastic.

- Jonathan Morton
 On Nov 28, 2012 6:02 PM, "Paul E. McKenney" 
wrote:

> Dave gave me back the pen, so I looked to see what I had expanded
> FQ-CoDel to.  The answer was...  Nothing.  Nothing at all.
>
> So I added a Quick Quiz as follows:
>
> Quick Quiz 2: What does the FQ-CoDel acronym expand to?
>
> Answer: There are some differences of opinion on this. The
> comment header in net/sched/sch_fq_codel.c says
> “Fair Queue CoDel” (presumably by analogy to SFQ's
> expansion of “Stochastic Fairness Queueing”), and
> “CoDel” is generally agreed to expand to “controlled
> delay”. However, some prefer “Flow Queue Controlled
> Delay” and still others prefer to prepend a silent and
> invisible "S", expanding to “Stochastic Flow Queue
> Controlled Delay” or “Smart Flow Queue Controlled
> Delay”. No doubt additional expansions will appear in
> the fullness of time.
>
> In the meantime, this article focuses on the concepts,
> implementation, and performance, leaving naming debates
> to others.
>
> This level of snarkiness would go over reasonably well in an LWN
> article; I would -not- suggest this approach in an academic paper,
> just in case you were wondering.  But if there is too much discomfort
> with snarking, I just might be convinced to take another approach.
>
> Thanx, Paul
>
> On Tue, Nov 27, 2012 at 08:38:38PM -0800, Paul E. McKenney wrote:
> > I guess I just have to be grateful that people mostly agree on the
> acronym,
> > regardless of the expansion.
> >
> >   Thanx, Paul
> >
> > On Tue, Nov 27, 2012 at 07:43:56PM -0800, Kathleen Nichols wrote:
> > >
> > > It would be me that tries to say "stochastic flow queuing with CoDel"
> > > as I like to be accurate. But I think FQ-Codel is Flow queuing with
> CoDel.
> > > JimG suggests "smart flow queuing" because he is ever mindful of the
> > > big audience.
> > >
> > > On 11/27/12 4:27 PM, Paul E. McKenney wrote:
> > > > On Tue, Nov 27, 2012 at 04:53:34PM -0700, Greg White wrote:
> > > >> BTW, I've heard some use the term "stochastic flow queueing" as a
> > > >> replacement to avoid the term "fair".  Seems like a more apt term
> anyway.
> > > >
> > > > Would that mean that FQ-CoDel is Flow Queue Controlled Delay?  ;-)
> > > >
> > > >   Thanx, Paul
> > > >
> > > >> -Greg
> > > >>
> > > >>
> > > >> On 11/27/12 3:49 PM, "Paul E. McKenney" 
> wrote:
> > > >>
> > > >>> Thank you for the review and comments, Jim!  I will apply them when
> > > >>> I get the pen back from Dave.  And yes, that is the thing about
> > > >>> "fairness" -- there are a great many definitions, many of the most
> > > >>> useful of which appear to many to be patently unfair.  ;-)
> > > >>>
> > > >>> As you suggest, it might well be best to drop discussion of
> fairness,
> > > >>> or to at the least supply the corresponding definition.
> > > >>>
> > > >>> Thanx, Paul
> > > >>>
> > > >>> On Tue, Nov 27, 2012 at 05:03:02PM -0500, Jim Gettys wrote:
> > > >>>> Some points worth making:
> > > >>>>
> > > >>>> 1) It is important to point out that (and how) fq_codel avoids
> > > >>>> starvation:
> > > >>>> unpleasant as elephant flows are, it would be very unfriendly to
> never
> > > >>>> service them at all until they time out.
> > > >>>>
> > > >>>> 2) "fairness" is not necessarily what we ultimately want at all;
> you'd
> > > >>>> really like to penalize those who induce congestion the most.
>  But we
> > > >>>> don't
> > > >>>> currently have a solution (th

Re: [Cerowrt-devel] [Bloat] bufferbloat and the web service providers

2012-12-09 Thread Jonathan Morton

On 9 Dec, 2012, at 6:14 pm, Maciej Soltysiak wrote:

> What are the heaviest (amount of elements, css, images, scripts, js bugs, ad 
> trackers, all that filth) websites out there?

Amazon, RS Components, ICanHazCheezburger, AOL (shudder) - those spring to mind 
immediately.

 - Jonathan



Re: [Cerowrt-devel] hardware hacking on fq_codel in FPGA form at 10GigE

2012-12-20 Thread Jonathan Morton
A small CPU can be made in perhaps 35K gates - something like an ARM7TDMI
or a Cortex-M0. It is common to stick one of those in a special purpose
chip to help with control logic.

But that would operate at a few hundred MHz, which leaves only a few cycles
per packet for small packets. That's not enough to run even a relatively
simple algorithm like codel.

Dedicated logic that *is* fast enough to run the algorithm on each packet
shouldn't be any bigger than such a CPU.
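As a back-of-envelope check of that cycle budget (assuming 10GigE line rate, minimum-size Ethernet frames, and 300 MHz as a stand-in for "a few hundred MHz"):

```python
# Cycles available per packet for a small core at 10GigE line rate.
LINE_RATE_BPS = 10e9
WIRE_BYTES = 64 + 8 + 12          # min frame + preamble + inter-frame gap
CPU_HZ = 300e6                    # "a few hundred MHz" (assumed)

pps = LINE_RATE_BPS / (WIRE_BYTES * 8)     # packets per second at line rate
cycles_per_packet = CPU_HZ / pps

print(round(pps / 1e6, 2))         # 14.88 million packets/s
print(round(cycles_per_packet))    # ~20 cycles per packet
```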

- Jonathan Morton
 On Dec 20, 2012 10:17 AM, "Hal Murray"  wrote:

>
> If I was going to do something like that, I'd build a small/simple CPU and
> do
> the work in microcode.
>
> > implementing {n,e,s}fq_codel onboard looks very feasible
>
> How many lines of assembler code would it take?
>
> How many registers do you need?  Do you need any memory other than queues?
> Maybe counters?
>
>
> > The only thing that is seriously serial about fq_codel is shooting the
> > biggest flow when the queue limit is exceeded, and that could be made
> > embarrassingly parallel with enough gates. There are no doubt other tricky
> > issues.
>
> Would it be better to do the fq work in the main CPU and let the FPGA grab
> packets from some shared  data structure in memory?  Can you work out a
> memory structure that doesn't need locks?
>
>
> --
> These are my opinions.  I hate spam.
>
>
>
> ___
> Bloat-devel mailing list
> bloat-de...@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat-devel
>


Re: [Cerowrt-devel] [Bloat] Latest codel, fq_codel, and pie sim study from cablelabs now available

2013-05-01 Thread Jonathan Morton

On 1 May, 2013, at 11:26 pm, Simon Barber wrote:

> Interesting to note that sfq-codel's reaction to a non conforming flow is of 
> course to start dropping more aggressively to make it conform, leading to the 
> high loss rates for whatever is hashed together with a VoIP flow that does 
> not reduce its bandwidth.
> 
> One downside to SFQ really.

The only real solution, for the scenario where this happens, would be to 
somehow identify all the BitTorrent traffic and stuff it into a single bucket, 
where it has to compete on equal terms with the single VoIP flow.  The big 
unanswered question is then: can this realistically be done?  Does BitTorrent 
traffic get marked as the bulk, low priority traffic it is, for example?


Re: [Cerowrt-devel] [Bloat] Latest codel, fq_codel, and pie sim study from cablelabs now available

2013-05-02 Thread Jonathan Morton

On 2 May, 2013, at 5:20 am, Simon Barber wrote:

> Or one could use more queues in SFQ, so that the chance of 2 streams sharing 
> a queue is small.

CableLabs actually did try that - increasing the number of queues - and found 
that it made things worse.  This, I think, extends to true "fair queueing" with 
flows explicitly rather than stochastically identified.  The reason is that 
with a very large number of flows, the bandwidth (or the packet throughput) is 
still shared evenly between them, and there is not enough bandwidth in the VoIP 
flow's share to allow it to work correctly.  With a relatively small number of 
flow buckets, the responsive flows hashed to the same bucket get out of the way 
of the unresponsive VoIP flow.

In short, a very large number of flow buckets prioritises BitTorrent over 
anything latency-sensitive, because BitTorrent uses a very large number of 
individual flows.
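A deliberately crude model of that effect, with illustrative numbers, shows why more buckets can make things worse for the VoIP flow:

```python
# Crude model: the share of the link a single unresponsive VoIP flow
# can claim against N bulk flows, as a function of bucket count.
def voip_share_kbps(link_kbps, n_bulk_flows, n_buckets):
    if n_buckets >= n_bulk_flows + 1:
        # Enough buckets that (ideally) each flow gets its own queue:
        # the link is split evenly across all flows.
        return link_kbps / (n_bulk_flows + 1)
    # Few buckets: the VoIP flow's bucket gets a 1/n_buckets share of
    # the link, and the responsive flows hashed into that bucket back
    # off, leaving the share to the unresponsive VoIP flow.
    return link_kbps / n_buckets

# A 20 Mbps link against 1000 BitTorrent flows:
print(round(voip_share_kbps(20000, 1000, 4096)))  # 20 kbps: VoIP starves
print(round(voip_share_kbps(20000, 1000, 32)))    # 625 kbps: plenty for VoIP
```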

By contrast, putting all the BitTorrent flows into one bucket (or a 
depressed-priority queue with its own SFQ buckets), or else elevating the VoIP 
traffic explicitly to a prioritised queue, would share the bandwidth more 
favourably to the VoIP flow, allowing it to use as much as it needed.  Either, 
or indeed both simultaneously, would do the job reasonably well, although an 
elevated priority queue should be bandwidth limited to a fraction of capacity 
to avoid the temptation of abuse by bulk flows.  Then there would be no 
performance objection to using a large number of flow buckets.

I can easily see a four-tier system working for most consumers, just so long as 
the traffic for each tier can be identified - each tier would have its own 
fq_codel queue:

1) Network control traffic, eg. DNS, ICMP, even SYNs and pure ACKs - max 1/16th 
bandwidth, top priority

2) Latency-sensitive unresponsive flows, eg. VoIP and gaming - max 1/4 
bandwidth, high priority

3) Ordinary bulk traffic, eg. web browsing, email, general purpose protocols - 
no bandwidth limit, normal priority

4) Background traffic, eg. BitTorrent - no bandwidth limit, low priority, 
voluntarily marked, competes at 1:4 with normal.

Obviously, the classification system implementing that must have some idea of 
what bandwidth is actually available at any given moment, but it is not 
necessary to explicitly restrict the top tiers' bandwidth when the link is 
otherwise idle.  Practical algorithms could be found to approximate the correct 
behaviour on a saturated link, while simply letting all traffic through on an 
unsaturated link.  Basic installations could already do this using HTB and 
assuming a link bandwidth.
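As a sketch, the four tiers might be represented like this. The DSCP mapping and ceiling fractions are illustrative assumptions taken from the list above; tier 1 would in practice be matched on protocol and packet type (DNS, ICMP, SYNs, pure ACKs) rather than on DSCP alone.

```python
# Illustrative four-tier classifier; mapping and numbers are assumptions.
TIERS = {
    1: ("network control",   1 / 16),  # max 1/16 bandwidth, top priority
    2: ("latency-sensitive", 1 / 4),   # max 1/4 bandwidth, high priority
    3: ("ordinary bulk",     None),    # no limit, normal priority
    4: ("background",        None),    # no limit, weighted 1:4 vs bulk
}

def tier_for_dscp(dscp):
    if dscp == 46:     # EF: VoIP, gaming
        return 2
    if dscp == 8:      # CS1: voluntarily-marked background (BitTorrent)
        return 4
    return 3           # everything else: ordinary bulk

def ceil_kbps(tier, link_kbps):
    """Per-tier ceiling; only binding when the link is saturated."""
    frac = TIERS[tier][1]
    return link_kbps if frac is None else link_kbps * frac

print(tier_for_dscp(46), ceil_kbps(2, 20000))  # 2 5000.0
print(tier_for_dscp(8),  ceil_kbps(4, 20000))  # 4 20000
```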

Even better, of course, would be some system that allows BitTorrent to yield as 
though it were a smaller number of flows than it really is.  The "swarm" 
behaviour is very unusual among network protocols.  LEDBAT via uTP already does 
a good job on a per-flow basis, but since it's all under control of a single 
application, the necessary information should be available at that point.  I am 
reminded of the way Azureus adjusts global bandwidth limits - both incoming and 
outgoing - to match reality, based on both periodic and continuous measurements.

 - Jonathan Morton



Re: [Cerowrt-devel] optimizing for very small bandwidths with fq_codel better?

2013-05-02 Thread Jonathan Morton

On 3 May, 2013, at 1:07 am, Dave Taht wrote:

> 1) I think there's a bug in either the kernel or tc or me on tos matching,

So this works:

tc filter add dev eth2 parent a: protocol ip prio 8 u32 match ip tos 0x2e fc 
flowid a:b

But this doesn't:

tc filter add dev eth2 parent a: protocol ip prio 10 u32 match ip tos 0x08 0xfc 
flowid a:b

I notice, near the end, that one has fc and the other has 0xfc.

 - Jonathan Morton



Re: [Cerowrt-devel] [Codel] [Bloat] Latest codel, fq_codel, and pie sim study from cablelabs now available

2013-05-06 Thread Jonathan Morton

On 6 May, 2013, at 8:54 pm, Jesper Dangaard Brouer wrote:

> A flow is considered "new" if no packets for the given flow exists in
> the queue.  It does not have to be a truly new-flow, it just have to
> send packets "slow"/paced enough, that the queue is empty when the next
> packet arrive.
> 
> Perhaps VoIP would fit this traffic profile, and thus would work better
> with the Linux fq_codel implementation, compared to the SFQ-Codel used
> in the simulation.

That doesn't work, because with a sufficient number of BT flows, the flow 
queue containing the VoIP flow is the fullest queue, not the emptiest.  That's 
independent of the number of flow queues, including the infinite case.  Think 
about it carefully.
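Jesper's "new flow" rule, and the objection to it, can be shown with a toy model (the helper names here are invented for illustration, not the kernel code):

```python
from collections import defaultdict

# Packets currently queued per hash bucket -- a stand-in for fq_codel's
# per-flow queues.
queue_backlog = defaultdict(int)

def enqueue(bucket):
    """Return True if this arrival counts as a 'new' flow, i.e. the
    bucket's queue was empty when the packet arrived."""
    is_new = queue_backlog[bucket] == 0
    queue_backlog[bucket] += 1
    return is_new

def drain(bucket, n=1):
    queue_backlog[bucket] = max(0, queue_backlog[bucket] - n)

# A paced VoIP flow alone in its bucket drains between arrivals, so it
# keeps earning the new-flow boost:
print(enqueue("voip"))    # True
drain("voip")
print(enqueue("voip"))    # True
drain("voip")

# But once other traffic shares the bucket and the backlog never
# empties, the VoIP packets stop qualifying as "new":
enqueue("shared"); enqueue("shared"); enqueue("shared")
drain("shared")           # the bucket fills faster than it drains
print(enqueue("shared"))  # False
```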

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] fq_codel is two years old

2014-05-15 Thread Jonathan Morton
There is, I think, one good way to make Diffserv actually work. It does
require several steps in tandem.

Step one is to accept and admit that differential pricing based on scarcity
economics does not work on the internet. That's going to be tough to
swallow for the big commercial players.

Step two is to define service levels in such a way that asking for a bonus
in one category inherently requires taking a deficit in some other
category. This permits trusting the Diffserv field, wherever it happens to
come from.

That part is where the old TOS flags went wrong, because they tried to
define mutually exclusive characteristics of traffic orthogonally. It was
possible for traffic to request service that was simultaneously higher
bandwidth, higher reliability, lower latency, *and* cheaper than service
without any flags set. This was obviously nonsensical.

My suggested definition is a straight trade-off of priority for bandwidth.
If you want maximum bandwidth, you're going to have to put up with lower
priority relative to traffic which has effectively requested low latency,
which in turn will find itself throttled to some fraction of the available
bandwidth in return for that priority. It forces whoever is setting the
flags to make a genuine engineering trade-off, and happily it can trivially
be made compatible with the legacy Precedence interpretation of the
Diffserv field.

Codepoint 00, naturally, corresponds to full bandwidth, minimum
priority traffic, and is the default.

To implement it, we're going to need a throttled priority queue. This
should be straightforward - a set of 64 TBFs with the special properties
that higher priority buckets refill more slowly, and that spending from a
bucket also spends the same amount from all lower-priority buckets. Then at
dequeue, take a packet from the highest priority queue with a positive
bucket and a waiting packet, then refill each bucket with the appropriate
fraction of the dequeued packet size. (Implementation detail: what to do if
no such packet exists; also, what fraction to use for each bucket.)
Naturally, each TBF can and should support a child qdisc such as fq_codel.
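A minimal sketch of that structure, with four levels instead of 64 and an assumed refill schedule (halving the rate per priority step); the exact fractions, and what to do when no affordable packet exists, are exactly the open implementation details noted above:

```python
class ThrottledPriority:
    """Toy model: one token bucket per priority level.  Higher levels
    refill more slowly, and serving level p also charges every
    lower-priority bucket, so priority is paid for with bandwidth."""

    def __init__(self, link_bps, levels=4, burst_bytes=15000):
        # Assumption: halve the refill rate per step up in priority.
        # Level 0 (lowest priority) refills at the full link rate.
        self.rates = [link_bps / 8 / (2 ** p) for p in range(levels)]  # bytes/s
        self.tokens = [float(burst_bytes)] * levels
        self.burst = burst_bytes
        self.last = 0.0

    def _refill(self, now):
        dt = now - self.last
        self.last = now
        for p, rate in enumerate(self.rates):
            self.tokens[p] = min(self.burst, self.tokens[p] + rate * dt)

    def dequeue(self, now, backlogged, size):
        """Serve the highest backlogged level whose bucket can afford
        `size` bytes; return that level, or None if none can."""
        self._refill(now)
        for p in sorted(backlogged, reverse=True):
            if self.tokens[p] >= size:
                for q in range(p + 1):   # charge this and all lower buckets
                    self.tokens[q] -= size
                return p
        return None

tp = ThrottledPriority(link_bps=8e6)       # a 1 MB/s (8 Mbps) link
print(tp.dequeue(0.0, {0, 3}, 1500))       # 3: priority wins while tokens last
for _ in range(9):
    tp.dequeue(0.0, {0, 3}, 1500)          # exhaust the burst allowance
print(tp.dequeue(0.0, {0, 3}, 1500))       # None: every bucket is empty
print(tp.dequeue(0.005, {0, 3}, 1500))     # 0: the lowest bucket refills fastest
```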

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] BQL, txqueue lengths and the internet of things

2014-06-11 Thread Jonathan Morton

On 12 Jun, 2014, at 4:05 am, David P. Reed wrote:

> Maybe you can do a quick blog howto?  I'd bet the same could be done for 
> raspberry pi and perhaps my other toy the wandboard which has a gigE adapter 
> and Scsi making it a nice iscsi target or nfs server. 

FYI, the Raspberry Pi's built-in Ethernet is attached via USB.  It's a chip 
that also includes a USB hub, which is why the cheaper model which drops 
Ethernet also loses a USB port.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Marketing problems

2014-07-27 Thread Jonathan Morton
A marketing number?  Well, as we know, consumers respond best to "bigger is
better" statistics. So anything reporting delay or ratio in the ways
mentioned so far is doomed to failure - even if we convince the industry
(or the regulators, more likely) to adopt them.

Another problem that needs solving is that marketing statistics tend to get
gamed a lot.  They must therefore be defined in such a way that gaming them
is difficult without actually producing a corresponding improvement in the
service.  That's similar in nature to a security problem, by the way.

I have previously suggested defining a "responsiveness" measurement as a
frequency. This is the inverse of latency, so it gets bigger as latency
goes down. It would be relatively simple to declare that responsiveness is
to be measured under a saturating load.
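The conversion is trivial, which is part of its appeal as a consumer-facing number:

```python
# "Responsiveness" as a frequency: the inverse of latency measured
# under saturating load, so the number grows as latency falls.
def responsiveness_hz(latency_ms):
    return 1000.0 / latency_ms

print(responsiveness_hz(100))   # 10.0 Hz -- a badly bloated link
print(responsiveness_hz(10))    # 100.0 Hz -- a well-managed link
```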

Trickier would be defining where in the world/network the measurement
should be taken from and to. An ISP which hosted a test server on its
internal network would hold an unfair advantage over other ISPs, so the
sane solution is to insist that test servers are at least one neutral
peering hop away from the ISP. ISPs that are geographically distant from
the nearest test server would be disadvantaged, so test servers need to be
provided throughout the densely populated parts of the world - say one per
timezone and ten degrees of latitude if there's a major city in it.

At the opposite end of the measurement, we have the CPE supplied with the
connection. That will of course be crucial to the upload half of the
measurement.

While we're at it, we could try redefining bandwidth as an average, not a
peak value. If the ISP has a "fair usage cap" of 300GB per 30 days, then
they aren't allowed to claim an average bandwidth greater than 926kbps.
National broadband availability initiatives can then be based on that
figure.
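The arithmetic behind that figure, assuming decimal gigabytes:

```python
# A 300 GB "fair usage cap" over 30 days, expressed as the sustained
# average bandwidth it actually permits.
cap_bytes = 300e9                 # 300 GB (decimal units assumed)
period_s = 30 * 24 * 3600         # 30 days in seconds
avg_bps = cap_bytes * 8 / period_s

print(round(avg_bps / 1e3))       # 926 kbps
```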

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] still trying to find hardware for the next generation worth hacking on

2014-08-15 Thread Jonathan Morton
> one promising project is this one: https://www.turris.cz/en/

That does look promising. The existing software is OpenWRT, so porting
CeroWRT shouldn't be difficult.

The P2020 CPU is a PowerPC Book E type - basically a 603e with the FPU
ripped out, then turned into an SoC. It should have loads of performance,
and enough I/O to feed those GigE ports effectively.

The only real software concern should be that it's big-endian, but since I
already use an old PowerBook as a firewall, that's unlikely to be a big
hurdle. Fq_codel works well on it.

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-08-29 Thread Jonathan Morton

On 29 Aug, 2014, at 7:57 pm, Aaron Wood wrote:

> That's roughly 10K cpu cycles per packet, which seems like an awful lot.

I could analyse the chief algorithms to see how many clock cycles per packet 
are theoretically possible - a number one could approach with an embedded core 
in the NIC, rather than as part of a full kernel.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-08-29 Thread Jonathan Morton

On 29 Aug, 2014, at 7:57 pm, Aaron Wood wrote:

> Comcast has upped the download rates in my area, from 50Mbps to 100Mbps.

FWIW, it looks like the unshaped latency has about halved with the doubling of 
capacity.  That's consistent with the buffer size and (lack of) management 
remaining the same.
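The halving in numbers: a fixed-size, unmanaged buffer drains twice as fast at double the link rate, so its worst-case queueing delay halves. The buffer size below is an illustrative guess, not Comcast's actual figure.

```python
# Worst-case delay of a fixed buffer at two link rates.
BUFFER_BYTES = 1_000_000          # hypothetical ~1 MB modem buffer

def buffer_delay_ms(rate_mbps):
    return BUFFER_BYTES * 8 / (rate_mbps * 1e6) * 1000

print(round(buffer_delay_ms(50)))    # 160 ms at 50 Mbps
print(round(buffer_delay_ms(100)))   # 80 ms at 100 Mbps
```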

If PIE were enabled, it'd look a whole lot better than that, I'm sure.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-08-30 Thread Jonathan Morton

On 29 Aug, 2014, at 9:06 pm, Dave Taht wrote:

> The cpu caches are 32k/32k, the memory interface 16 bit. The rate limiter
> (the thing eating all the cycles, not the fq_codel algorithm!) is
> single threaded and has global locks,
> and is at least partially interrupt bound at 100Mbits/sec.

Looking at the code, HTB is considerably more complex than TBF in Linux, and 
not all of the added complexity is due to being classful (though a lot of it 
is).  It seems that TBF has dire warnings all over it about having limited 
packet-rate capacity which depends on the value of HZ, while HTB has some sort 
of solution to that problem.

Meanwhile, FQ has per-flow throttling which looks like it could be torn out and 
used as a simple replacement for TBF.  I should take a closer look and check 
whether it would just suffer from the same problems, but if it won't, then that 
could be a potential life-extender for the 3800.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-08-30 Thread Jonathan Morton

On 29 Aug, 2014, at 9:06 pm, Dave Taht wrote:

> In the future, finding something that could be easily implemented in hardware 
> would be good.

Does "implemented in firmware for a tiny ARM core" count?  I imagine that NICs 
could be made with those, if they aren't already, and it would probably make 
the lead-time shorter and engineering risk smaller.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-08-30 Thread Jonathan Morton

On 30 Aug, 2014, at 4:03 pm, Toke Høiland-Jørgensen wrote:

> Jonathan Morton  writes:
> 
>> Looking at the code, HTB is considerably more complex than TBF in
>> Linux, and not all of the added complexity is due to being classful
>> (though a lot of it is). It seems that TBF has dire warnings all over
>> it about having limited packet-rate capacity which depends on the
>> value of HZ, while HTB has some sort of solution to that problem.
> 
> Last I checked, those warnings were out-dated. Everything is in
> nanosecond resolution now, including TBF. I've been successfully using
> TBF in my experiments at bandwidths up to 100Mbps (on Intel Core2 x86
> boxes, that is).

Closer inspection of the kernel code does trace to the High Resolution Timers, 
which is good.  I wish they'd update the comments to go with that sort of thing.

I've managed to run some tests now, and my old PowerBook G4 can certainly 
handle either HTB or TBF in the region of 200Mbps, at least for simple tests 
over a LAN.  The ancient Sun GEM chipset (integrated into the PowerBook's 
northbridge, actually) doesn't seem willing to push more than about 470Mbps 
outbound, even without filtering - but that might be normal for a decidedly 
PCI/AGP-era machine.  I'll need to investigate more closely to see whether 
there's a CPU load difference between HTB and TBF in practice.

I have two other machines which are able to talk to each other at ~980Mbps.  
They're both AMD based, and one of them is a "nettop" style MiniITX system, 
based around the E-450 APU.  The choice of NIC, and more specifically the way 
it is attached to the system, seems to matter most - these both use an 
RTL8111-family PCIe chipset.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-08-30 Thread Jonathan Morton

On 30 Aug, 2014, at 8:19 pm, Aaron Wood wrote:

> Do you think this is a limitation of MIPS as a whole, or just the particular 
> MIPS cores in use on these platforms?  

There were historically a great many MIPS designs.  Several of the high-end 
designs were 64-bit and used in famous workstations.  The one we see in CPE 
today, however, is the MIPS equivalent of the AMD Geode, based on an old 
version of the MIPS architecture, and further crippled by embedded-style 
hardware choices.  It would have been a good CPU in 1989, considering that it 
would have competed against the 486 in the PC space, but it wouldn't have been 
hobbled by a 16-bit memory bus back then.

I'm not sure how much effort is going into improving the embeddable versions of 
MIPS cores, but certainly ARM seems to be a more active participant in the 
embedded space.  Their current range of embeddable cores scales from the 
Cortex-M0 (whose chief selling point is that it takes only a fraction of a 
square millimetre of die space) to some quite decent 64-bit multicore CPUs 
(which AMD is developing a server platform for), with a number of intermediate 
points along that continuum catered for.

So if a particular core works but proves to have inadequate performance, a 
better one can be integrated into the next version of the hardware, without any 
risk of having to rewrite all the software.  That future-proofing is probably 
important to manufacturers, and isn't very obviously available with MIPS cores.

I wouldn't be surprised to see something like a Cortex-A5, or possibly even a 
multicore Cortex-A7 in CPE.  These are capable of running conventional 
multitasking OSes like Linux (and hence OpenWRT), and have a lot of 
fully-mature toolchain support.  But perhaps they would leave out the FPU, or 
configure only the most basic type of FPU (VFPv3-D16), to save money compared 
to the NEON unit you'd normally find in a smartphone.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-08-30 Thread Jonathan Morton

On 30 Aug, 2014, at 8:33 pm, Jonathan Morton wrote:

> I'll need to investigate more closely to see whether there's a CPU load 
> difference between HTB and TBF in practice.

Replying to myself, but...

The surprising result is that TBF seems to consume about TWICE the CPU time, at 
the same bandwidth, as HTB does.  Even FQ in flow-rate-limit mode is roughly on 
par with HTB on that score.  There is clearly something very wrong with how TBF 
is doing it.

HTB (running with only a default class) and FQ are both able to send and 
regulate 200Mbps using about an eighth of a 1.5GHz PowerPC, all told.  It 
doesn't even seem to matter which qdisc I slap on top of HTB.  TBF needs almost 
a quarter of the same CPU to do the same thing.
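Putting those figures in cycles-per-packet terms (assuming ~1500-byte packets, as in these LAN tests) lands close to the "10K cycles per packet" figure quoted earlier in the thread:

```python
# Cycles per packet implied by "200Mbps on an eighth of a 1.5GHz CPU".
CPU_HZ = 1.5e9
CPU_FRACTION = 1 / 8            # "about an eighth" of the CPU
RATE_BPS = 200e6
PKT_BYTES = 1500                # assumed full-size packets

pps = RATE_BPS / (PKT_BYTES * 8)
cycles_per_packet = CPU_HZ * CPU_FRACTION / pps

print(round(pps))                # 16667 packets/s
print(round(cycles_per_packet))  # 11250 cycles per packet
```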

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-08-31 Thread Jonathan Morton

On 31 Aug, 2014, at 1:30 am, Dave Taht wrote:

> Could I get you to also try HFSC?

Once I got a kernel running that included it, and figured out how to make it do 
what I wanted...

...it seems to be indistinguishable from HTB and FQ in terms of CPU load.

Actually, I think most of the CPU load is due to overheads in the 
userspace-kernel interface and the device driver, rather than the qdiscs 
themselves.  Something about TBF causes more overhead - it goes through periods 
of lower CPU use similar to the other shapers, but then spends periods at 
considerably higher CPU load, all without changing the overall throughput.

The flip side of this is that TBF might be producing a smoother stream of 
packets.  The receiving computer (which is fast enough to notice such things) 
reports a substantially larger number of recv() calls are required to take in 
the data from TBF than from anything else - averaging about 4.4KB rather than 
9KB or so.  But at these data rates, it probably matters little.

FWIW, apparently Apple's variant of the GEM chipset doesn't support jumbo 
frames.  This does, however, mean that I'm definitely working with an MTU of 
1500, similar to what would be sent over the Internet.

These tests were all run using nttcp.  I wanted to finally try out RRUL, but 
the wrappers fail to install via pip on my Gentoo boxes.  I'll need to 
investigate further before I can make pretty graphs like everyone else.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-01 Thread Jonathan Morton

On 1 Sep, 2014, at 8:01 pm, Dave Taht wrote:

> On Sun, Aug 31, 2014 at 3:18 AM, Jonathan Morton  
> wrote:
>> 
>> On 31 Aug, 2014, at 1:30 am, Dave Taht wrote:
>> 
>>> Could I get you to also try HFSC?
>> 
>> Once I got a kernel running that included it, and figured out how to make it 
>> do what I wanted...
>> 
>> ...it seems to be indistinguishable from HTB and FQ in terms of CPU load.
> 
> If you are feeling really inspired, try cbq. :) One thing I sort of like 
> about cbq is that it (I think)
> (unlike htb presently) operates off an estimated size for the next packet 
> (which isn't dynamic, sadly),
> where the others buffer up an extra packet until they can be delivered.

It's also hilariously opaque to configure, which is probably why nobody uses it 
- the RED problem again - and the top link when I Googled for best practice on 
it gushes enthusiastically about Linux 2.2!  The idea of manually specifying an 
"average packet size" in particular feels intuitively wrong to me.  Still, I 
might be able to try it later on.

Most class-based shapers are probably more complex to set up for simple needs 
than they need to be.  I have to issue three separate 'tc' invocations for a 
minimal configuration of each of them, repeating several items of data between 
them.  They scale up reasonably well to complex situations, but such uses are 
relatively rare.
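For concreteness, a minimal HTB configuration of the kind described might look like this (interface name, rate and handles are just illustrative placeholders - note the rate having to appear twice):

```shell
# Hypothetical minimal HTB shaper on eth0 at 50 Mbit/s -
# three separate tc invocations, repeating data between them.
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 50mbit ceil 50mbit
tc qdisc add dev eth0 parent 1:10 handle 110: fq_codel
```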

> In my quest for absolutely minimal latency I'd love to be rid of that
> last extra non-in-the-fq_codel-qdisc packet... either with a "peek"
> operation or with a running estimate.

I suspect that something like fq_codel which included its own shaper (with the 
knobs set sensibly by default) would gain more traction via ease of use - and 
might even answer your wish.

> It would be cool to be able to program the ethernet hardware itself to
> return completion interrupts at a given transmit rate (so you could
> program the hardware to be any bandwidth not just 10/100/1000). Some
> hardware so far as I know supports this with a "pacing" feature.

Is there a summary of hardware features like this anywhere?  It'd be nice to 
see what us GEM and RTL proles are missing out on.  :-)

>> Actually, I think most of the CPU load is due to overheads in the 
>> userspace-kernel interface and the device driver, rather than the qdiscs 
>> themselves.
> 
> You will see it bound by the softirq thread, but, what, exactly,
> inside that, is kind of unknown. (I presently lack time to build up
> profilable kernels on these low end arches. )

When I eventually got RRUL running (on one of the AMD boxes, so the PowerBook 
only has to run the server end of netperf), the bandwidth maxed out at about 
300Mbps each way, and the softirq was bouncing around 60% CPU.  I'm pretty sure 
most of that is shoving stuff across the PCI bus (even though it's internal to 
the northbridge), or at least waiting for it to go there.  I'm happy to assume 
that the rest was mostly kernel-userspace interface overhead to the netserver 
instances.

But this doesn't really answer the question of why the WNDR has so much lower a 
ceiling with shaping than without.  The G4 is powerful enough that the overhead 
of shaping simply disappears next to the overhead of shoving data around.  Even 
when I turn up the shaping knob to a value quite close to the hardware's 
unshaped capabilities (eg. 400Mbps one-way), most of the shapers stick to the 
requested limit like glue, and even the worst offender is within 10%.  I 
estimate that it's using only about 500 clocks per packet *unless* it saturates 
the PCI bus.
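As a rough sanity check on that estimate (the 1% shaper-CPU share below is an assumption for illustration, not a measurement):

```shell
# ~400 Mbit/s one-way of 1500-byte packets on a 1.5 GHz CPU.
echo $(( 400000000 / (1500 * 8) ))   # packets per second
# If shaping cost on the order of 1% of the CPU (assumed), that is
# 1.5e9 * 0.01 / 33333 ~ 450 clocks per packet - the same ballpark
# as the 500-clock figure above.
```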

It's possible, however, that we're not really looking at a CPU limitation, but 
a timer problem.  The PowerBook is a "proper" desktop computer with hardware to 
match (modulo its age).  If all the shapers now depend on the high-resolution 
timer, how high-resolution is the WNDR's timer?

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-01 Thread Jonathan Morton
s old, it's also been die-shrunk over the 
intervening years, so it runs a lot faster than it originally did.  I very much 
doubt that it's as refined as my G4, but it could probably hold its own 
relative to a comparable ARM SoC such as the Raspberry Pi.  (Unfortunately, the 
latter doesn't have the I/O capacity to do high-speed networking - USB only.)  
Atheros publicity materials indicate that they increased the I-cache to 64KB 
for performance reasons, but saw no need to increase the D-cache at the same 
time.

Which brings me back to the timers, and other items of black magic.

Incidentally, transfer speed benchmarks involving wireless will certainly be 
limited by the wireless link.  I assume that's not a factor here.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-02 Thread Jonathan Morton

On 2 Sep, 2014, at 1:14 am, Aaron Wood wrote:

>> For the purposes of shaping, the CPU shouldn't need to touch the majority of 
>> the payload - only the headers, which are relatively small.  The bulk of the 
>> payload should DMA from one NIC to RAM, then DMA back out of RAM to the 
>> other NIC.  It has to do that anyway to route them, and without shaping 
>> there'd be more of them to handle.  The difference might be in the data 
>> structures used by the shaper itself, but I think those are also reasonably 
>> compact.  It doesn't even have to touch userspace, since it's not acting as 
>> the endpoint as my PowerBook was during my tests.
> 
> In an ideal case, yes.  But is that how this gets managed?  (I have no idea, 
> I'm certainly not a kernel developer).

It would be monumentally stupid to integrate two GigE MACs onto an SoC, and 
then to call it a "network processor", without adequate DMA support.  I don't 
think Atheros are that stupid.

Here's a more detailed datasheet:

http://pdf.datasheetarchive.com/indexerfiles/Datasheets-SW6/DSASW00118777.pdf

"Another memory factor is the ability to support multiple I/O operations in 
parallel via the WNPU's various ports. The on-chip SRAM in AR7100 WNPUs has 5 
ports that enable simultaneous access to and from five sources: the two gigabit 
Ethernet ports, the PCI port, the USB 2.0 port and the MIPS processor."

It's a reasonable question, however, whether the driver uses that support 
properly.  Mainline Linux kernel code seems to support the SoC but not the 
Ethernet; if it were just a minor variant of some other Atheros hardware, I'd 
have expected to see it integrated into one of the existing drivers.  Or maybe 
it is, and my greps just aren't showing it.

At minimum, however, there are MMIO ranges reported for each MAC during 
OpenWRT's boot sequence.  That's where the ring buffers are.  The most the CPU 
has to do is read each packet from RAM and write it into those buffers, or vice 
versa for receive - I think that's what my PowerBook has to do.  Ideally, a 
bog-standard DMA engine would take over that simple duty.  Either way, that's 
something that has to happen whether it's shaped or not, so it's unlikely to be 
our problem.

The same goes for the wireless MACs, incidentally.  These are standard ath9k 
mini-PCI cards, and the drivers *are* in mainline.  There shouldn't be any 
surprises with them.

> If the packet data is getting moved about from buffer to buffer (for instance 
> to do the htb calculations?) could that substantially change the processing 
> load?

The qdiscs only deal with packet and socket headers, not the full packet data.  
Even then, they largely pass pointers around, inserting the headers into linked 
lists rather than copying them into arrays.  I believe a lot of attention has 
been directed at cache-friendliness in this area, and the MIPS caches are of 
conventional type.

>> Which brings me back to the timers, and other items of black magic.
> 
> Which would point to under-utilizing the processor core, while still having 
> high load? (I'm not seeing that, I'm curious if that would be the case).

It probably wouldn't manifest as high system load.  Rather, poor timer 
resolution or latency would show up as excessive delays between packets, during 
which the CPU is idle.  The packet egress times may turn out to be quantised - 
that would be a smoking gun, if detectable.
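One low-tech way to look for that smoking gun is to histogram the inter-packet gaps from a capture. A sketch, assuming timestamps have first been extracted with something like `tcpdump -tt` (synthetic timestamps stand in for real capture output here):

```shell
# Inter-packet gaps in microseconds, most frequent first; a coarse
# timer would show up as a few sharply peaked gap values rather
# than a smooth spread.
printf '0.000000\n0.001000\n0.002000\n0.002500\n0.003500\n' |
  awk 'NR > 1 { printf "%.0f\n", ($1 - prev) * 1e6 } { prev = $1 }' |
  sort -n | uniq -c | sort -rn
```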

>> Incidentally, transfer speed benchmarks involving wireless will certainly be 
>> limited by the wireless link.  I assume that's not a factor here.
> 
> That's the usual suspicion.  But these are RF-chamber, short-range lab setups 
> where the radios are running at full speed in perfect environments...

Sure.  But even turbocharged 'n' gear tops out at 450Mbps signalling, and much 
less than that is available even theoretically for TCP/IP throughput.  My point 
is that you're probably not running *your* tests over wireless.

> What this makes me realize is that I should go instrument the cpu stats with 
> each of the various operating modes:
> 
> * no shaping, anywhere
> * egress shaping
> * egress and ingress shaping at various limited levels:
> * 10Mbps
> * 20Mbps
> * 50Mbps
> * 100Mbps

Smaller increments at the high end of the range may prove to be useful.  I 
would expect the CPU usage to climb nonlinearly (busy-waiting) if there's a 
bottleneck in a peripheral device, such as the PCI bus.  The way the kernel 
classifies that usage may also be revealing.

> Heck, what about running HTB simply from a 1ms timer instead of from a data 
> driven timer?

That might be what's already happening.  We have to figure that out before we 
can work out a solution.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-02 Thread Jonathan Morton

On 1 Sep, 2014, at 9:32 pm, Dave Taht wrote:

>>> It would be cool to be able to program the ethernet hardware itself to
>>> return completion interrupts at a given transmit rate (so you could
>>> program the hardware to be any bandwidth not just 10/100/1000). Some
>>> hardware so far as I know supports this with a "pacing" feature.
>> 
>> Is there a summary of hardware features like this anywhere?  It'd be nice to 
>> see what us GEM and RTL proles are missing out on.  :-)
> 
> I'd like one.

Is there at least a list of drivers (both wired and wireless) which are BQL 
enabled?  If GEM is not in that list, it might explain why the PCI bus gets 
jammed solid on my PowerBook.

> There are certain 3rd party firmwares like octeon's
> where it seems possible to add more features to the firmware
> co-processor, in particular.

Octeon is basically a powerful, multi-core MIPS64 SoC that happens to have 
Ethernet hardware attached, and is available in NIC form.  These "NICs" look 
like miniature motherboards in PCIe-card format, complete with mini-SIMM slots. 
 Utter overkill for normal applications; they're meant to do encryption on the 
fly, and were originally introduced as Ethernet-less coprocessor cards for that 
purpose.  At least they represent a good example of what high-end MIPS is like 
these days.

The original Bigfoot KillerNIC was along those lines, too, but slightly less 
overdone.  It still managed to cost $250+, and Newegg still lists a price in 
that general range despite being permanently out of stock.  As well as running 
Linux on the card itself, the drivers apparently replaced large parts of the 
Windows network stack in the quest for efficiency and low latency.  Results 
varied; Anandtech suggested that the biggest improvements probably came on 
cheaper PCs, whose owners wouldn't be able to justify such a high-priced NIC - 
and that was in 2007.

I can't tell what the newer products under the Killer brand (taken over by 
Qualcomm/Atheros) really are, but they are sufficiently reduced in cost, size 
and complexity to be integrated into "gamer" PC motherboards and laptops, and 
they respond to being driven like standard (newish) Atheros hardware.  In 
particular, it's unclear whether they do most of their special sauce in 
software (so Windows-specific) or firmware.

Comments I hear sometimes seem to imply that *some* Atheros hardware runs 
internal firmware.  Whether that is strictly wireless hardware, or whether it 
extends into Ethernet, I can't yet tell.  Since it's widely deployed, it would 
theoretically be a good platform for experimentation - but in practice?

> tc qdisc add dev eth0 cake bandwidth 50mbit diffservmap std

Or even having the "diffservmap std" part be in the defaults.  I try not to 
spend too much mental effort understanding diffserv - it's widely 
misunderstood, and most end-user applications ignore it.  Supporting the basic 
eight precedences, and maybe some userspace effort to introduce marking, should 
be enough.

I like the name, though.  :-)

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-02 Thread Jonathan Morton

On 2 Sep, 2014, at 6:37 pm, Dave Taht wrote:

> The ath10k has a cpu and firmware. The ath9k does not.

So what's this then?  http://wireless.kernel.org/en/users/Drivers/ar9170.fw

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-02 Thread Jonathan Morton

On 2 Sep, 2014, at 6:37 pm, Dave Taht wrote:

> > Is there at least a list of drivers (both wired and wireless) which are BQL 
> > enabled?  If GEM is not in that list, it might explain why the PCI bus gets 
> > jammed solid on my PowerBook.
> 
> A fairly current list (and the means to generate a more current one) is at:
> 
> https://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel

Ah, so GEM doesn't have BQL.

...now it does.  :-D



sungem-bql.patch.gz
Description: GNU Zip compressed data


 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-02 Thread Jonathan Morton

On 2 Sep, 2014, at 8:41 pm, Dave Taht wrote:

> unfortunately you may need to initialize things correctly with
> netdev_reset_queue in the appropriate initialization or
> recovery-from-error bits.
> 
> (this is the part that tends to be tricky)

I poked around a bit and found that gem_clean_rings() seems to be called from 
everywhere relevant, including from gem_init_rings() and gem_do_stop().  I was 
therefore able to add a single call there.

I've taken the other suggestion at face value.

> Do you have a before/after test result?

At gigabit link speeds, there seems to be no measurable difference - the 
machine just isn't capable of filling the buffer fast enough.  I have yet to 
try it at slower link rates.



sungem-bql.patch.gz
Description: GNU Zip compressed data


 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-03 Thread Jonathan Morton

On 2 Sep, 2014, at 6:37 pm, Dave Taht wrote:

> > > tc qdisc add dev eth0 cake bandwidth 50mbit diffservmap std
> >
> > Or even having the "diffservmap std" part be in the defaults.  I try not to 
> > spend too much mental effort understanding diffserv - it's widely 
> > misunderstood, and most end-user applications ignore it.  Supporting the 
> > basic eight precedences, and maybe some userspace effort to introduce 
> > marking, should be enough.
> 
> The various ietf wgs seem to think AFxx is a useful concept.

I'm sure they do.  And I'm sure that certain networks make use of it 
internally.  But the Internet does not support such fine distinctions in 
practice - at least, not at the moment.  We have enough difficulty getting SQM 
of *any* colour deployed where it's needed.

A good default handling of Precedence would already be an improvement over the 
status quo, and I've worked out a CPU-efficient way of doing so.  It takes 
explicit advantage of the fact that the overall shaping bandwidth is known, but 
degrades gracefully in case the actual bandwidth temporarily falls below that 
value.  As I suggested previously, it gives weighted priority to 
higher-precedence packets, but limits their permitted bandwidth to prevent 
abuse.

As it happens, simply following the Precedence field, and ignoring the 
low-order bits of the Diffserv codepoint, satisfies the letter of the AF spec.  
The Class field is encoded as a Precedence value, and the drop-precedence 
subclasses then have equal drop probability, which the inequality equations 
permit.  The same equations say nothing obvious about how a packet marked 
*only* with Precedence 1-4 should be treated relative to AF-marked packets in 
the same Precedence band, which is part of what gives me a headache about the 
whole idea.

EF is also neatly handled by ignoring the low-order bits, since its encoding 
has a high Precedence value.  So, at the very least, more refined AF/EF 
handling can be deferred to a "version 2" implementation.
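In code terms, "following the Precedence field and ignoring the low-order bits" reduces to a three-bit shift - a sketch of the idea, not the eventual implementation:

```shell
# The Precedence band is the top three bits of the 6-bit DSCP.
dscp_to_band() { echo $(( $1 >> 3 )); }
dscp_to_band 46   # EF   (101110) -> 5
dscp_to_band 34   # AF41 (100010) -> 4
dscp_to_band 10   # AF11 (001010) -> 1
dscp_to_band 0    # best effort   -> 0
```

Note how the AF classes land in distinct bands while the drop-precedence subclasses within each class collapse together, as described above.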

Reading the HTB code also gives me a headache.  I have so far been unable to 
distinguish any fundamental timing differences in its single-class behaviour 
relative to TBF.  The only clues I have so far are:

1) HTB uses a different timer call to schedule a future wakeup than TBF or FQ 
do.
2) FQ doesn't use a bucket of tokens, and explicitly avoids producing a "burst" 
of packets, but HTB and TBF both do and explicitly can.
3) TBF is the only one of the three to exhibit unusually high softirq load on 
egress.  But this varies over time, even with constant bandwidth and packet 
size.

> > I like the name, though.  :-)
> 
> It is partially a reference to a scene in the 2010 sequel to 2001.


I need to re-watch that.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-03 Thread Jonathan Morton

On 3 Sep, 2014, at 9:15 am, Aaron Wood wrote:

> What this makes me realize is that I should go instrument the cpu stats with 
> each of the various operating modes:
> 
> * no shaping, anywhere
> * egress shaping
> * egress and ingress shaping at various limited levels:
> * 10Mbps
> * 20Mbps
> * 50Mbps
> * 100Mbps
> 
> So I set this up tonight, and have a big pile of data to go through.  But the 
> headline finding is that the WNDR3800 can't do more than 200Mbps ingress, 
> with shaping turned off.  The GbE switch fabric and my setup were just fine 
> (pushed some very nice numbers through those interfaces when on the switch), 
> but going through the routing engine (NATing), and 200Mbps is about all it 
> could do.
> 
> I took tcp captures of it shaping past its limit (configured for 150/12), 
> with the rrul, tcp_download, tcp_upload tests.
> 
> And I took a series of tests walking down from 100/12, 90/12, 80/12, ... down 
> to 40/12, while capturing /proc/stats and /proc/softirqs once a second 
> (roughly), so that can be processed to pull out where the load might be 
> (initial peeking hints that it's all time spent in softirq).
> 
>  If anyone wants the raw data, let me know, I'll upload it somewhere.  The 
> rrul pcap is large, the rest of it can be e-mailed easily.

Given that the CPU load is confirmed as high, the pcap probably isn't as 
useful.  The rest would be interesting to look at.

Are you able to test with smaller packet sizes?  That might help to isolate 
packet-throughput (ie. connection tracking) versus byte-throughput problems.
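For example - assuming netperf/netserver are already running between the hosts, and with the address and sizes as placeholders - comparing near-MTU against small datagrams would separate per-packet from per-byte costs:

```shell
# If the router's ceiling is set by per-packet work (e.g. connection
# tracking), the small-datagram run will hit a much lower bit rate
# at a similar packet rate; if it is byte-limited, the bit rates
# will be comparable.
netperf -H 192.168.1.100 -t UDP_STREAM -- -m 1470
netperf -H 192.168.1.100 -t UDP_STREAM -- -m 64
```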

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-03 Thread Jonathan Morton

On 4 Sep, 2014, at 3:33 am, Dave Taht wrote:

> Gigabit "routers", indeed, when only the switch is capable of that!

I have long thought that advertising regulators need to have a *lot* more 
teeth.  Right now, even when a decision comes down that an advert is blatantly 
misleading, all they can really do is say "please don't do it again".  Here's a 
reasonably typical example:

http://www.asa.org.uk/Rulings/Adjudications/2014/8/British-Telecommunications-plc/SHP_ADJ_265259.aspx

Many adverts and marketing techniques that I believe are misleading (at best) 
are never even considered by the regulators, probably because few people 
outside the technical community even understand that a problem exists, and 
those that do tend to seriously bungle the solution (not least because they get 
lobbied by the special interests).

It's bad enough that there's an ISO standard inexplicably defining a megabyte 
as 1,024,000 bytes, for storage-media purposes.  Yes, that's not a typo - it's 
2^10 * 10^3.  That official standard supposedly justifies all those "1.44MB" 
floppy disks (with a raw unformatted capacity of 1440KB), and the "terabyte" 
hard disks that are actually a full 10% smaller than 2^40 bytes.  SSDs often 
use the "slack" between the definitions to implement the necessary 
error-correction and wear-levelling overhead without changing the marketable 
number (so 256GB of flash chips installed, 256GB capacity reported to the 
consumer, but there's a 7% difference between the two).

Honestly though, they can get away with calling them "gigabit routers" because 
they have "gigabit" external interfaces.  They can also point to all the PCI 
GigE NICs that can only do 750Mbps, because that's where the PCI bus saturates, 
but nobody prevents *them* from being labelled 1000base-T and therefore 
"gigabit ethernet".

It's worse in the wireless world because the headline rate is the maximum 
signalling rate under ideal conditions.  The actual throughput under typical 
home/office/conference conditions bears zero resemblance to that figure for any 
number of reasons, but even under ideal conditions the actual throughput is a 
surprisingly small fraction of the signalling rate.

Consumer reports type stuff could be interesting, though.  I haven't seen any 
of the big tech-review sites take on networking seriously, except for basic 
throughput checks on bare Ethernet (which mostly reveal whether a GigE chipset 
is attached via PCI or PCIe).  It's a complicated subject; Anandtech conceded 
that accurate tests of the KillerNIC's marketing claims were particularly 
difficult to arrange, but they did a lot of subjective testing in an attempt to 
compensate.

One could, in principle, give out a bronze award for equipment which fails to 
meet (the spirit of) its marketing claims, but is still useful in the real 
world.  A silver award for equipment which *does* meet its marketing claims and 
generally works as it should.  A gold award would be reserved for equipment 
which both merits a silver award and genuinely stands out in the market.  And 
at the opposite end of the scale, a "rusty pipe" award for truly execrable 
efforts, similar to LowEndMac's "Road Apple" award.  All protected by copyright 
and trademark laws, which are rather easier to enforce in a legally binding 
manner than advertising regulations.

Incidentally, for those amused (or frustrated) by embedded hardware design 
decisions, the "Road Apple" awards list is well worth a read - and potentially 
eye-opening.  Watch out for the PowerPC Mac with dual 16-bit I/O buses.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

2014-09-04 Thread Jonathan Morton

On 4 Sep, 2014, at 10:04 am, Sebastian Moeller wrote:

> IPv6 NOTE: Everyone with a real dual-stack IPv6 and IPv4 connection to the 
> internet (so not tunneled over IPv4) and an ATM-based DSL connection (might 
> be the empty set...)

I believe at least A&A (Andrews & Arnold) in the UK have that setup.  They are 
unusual among ADSL ISPs in supporting IPv6 properly - among other things.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Fixing bufferbloat: How about an open letter to the web benchmarkers?

2014-09-11 Thread Jonathan Morton

On 12 Sep, 2014, at 3:35 am, dpr...@reed.com wrote:

> Among friends of mine, we can publicize this widely.  But those friends 
> probably would like to see how the measurement would work.

Could we make use of the existing test servers (running netperf) for that 
demonstration?  How hard is the protocol to fake in Javascript?

Or would a netperf-wrapper demonstration suffice?  We've already got that, but 
we'd need to extract the single-figures-of-merit from the data.

I wonder if the speedof.me API can already be tricked into doing the right 
thing?

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Fixing bufferbloat: How about an open letter to the web benchmarkers?

2014-09-11 Thread Jonathan Morton

On 12 Sep, 2014, at 4:49 am, Joel Wirāmu Pauling wrote:

> So if ookla implemented a udp based test, changed its statistical weighting 
> and data mining methods overnight. At least in NZ that might help. 

Isn't that the whole point of this discussion?

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Fwd: Will Edwards to give Mill talk in Estonia on 12/10/2014

2014-12-09 Thread Jonathan Morton
Watching all the talks takes more than just an hour, but I've just spent a
couple of days doing so. This is certainly intriguing. At first glance it
all looks too good to be true, but the technical details actually look
plausible and elegant.

One way to look at it is: Mill is what Itanic could have been, if it wasn't
designed by committee. It neatly sidesteps a lot of the pitfalls that tend
to go with VLIW designs, and therefore has a chance of actually working as
advertised, unlike Itanic.

The only thing that I'm really not convinced about is the way they produce
a custom instruction encoding for each member of the architecture family.
However, that does make the architecture scalable, unlike Itanic.

It's possible that some of the more useful ideas it presents could have
been incorporated into a conventional RISC architecture. I might play with
that idea privately.

Estonia is even quite close to where I am, just the opposite side of the
Gulf of Finland, so I'm almost tempted to go and take a look. On the other
hand, it's probably just as educational to watch the video at home
afterwards, and doing so avoids scheduling conflict.

- Jonathan Morton


Re: [Cerowrt-devel] cake: changing bandwidth on the rate limiter dynamically

2014-12-12 Thread Jonathan Morton
> On 12 Dec, 2014, at 17:52, Dave Taht  wrote:
> 
> Now, it turns out that cake makes altering the bandwidth really easy,
> you can just change it from the command line.
> 
> http://pastebin.com/Jr9s6EBW
> 
> I am pretty sure changing it is currently pretty damaging to stuff in
> flight (don't remember), but it needent be.

I don’t think it is harmful.  All that happens is that the packet issue 
deadlines start to be advanced at a different rate with each packet 
transmitted.  A change in the rate by that mechanism doesn’t cause any packet 
flushing or other trouble.

The flow-discrimination mode and the diffserv mode can also be changed in the 
same way, but this is more disruptive; it may at least result in some packets 
arriving out of order within flows, if not also some spurious drops.  But the 
network can tolerate that happening occasionally.

 - Jonathan Morton



Re: [Cerowrt-devel] cake: changing bandwidth on the rate limiter dynamically

2014-12-12 Thread Jonathan Morton

> On 12 Dec, 2014, at 19:44, Dave Taht  wrote:
> 
> While fiddling with the idea a bit, I found that you can add a
> bandwidth limit to cake on the fly, but once added you cant remove it
> with the syntax at hand.

Yes you can:

# tc qdisc change dev ifb0 handle 2: cake unlimited

 - Jonathan Morton



Re: [Cerowrt-devel] cake: changing bandwidth on the rate limiter dynamically

2014-12-14 Thread Jonathan Morton

> On 13 Dec, 2014, at 05:57, Dave Taht  wrote:
> 
> I guess all that is needed is a marie antoinette mode, then, huh?

For that, we would need to have something called “bread” for cake to be an 
alternative to.  But then again, what was the French for bread again?  ;-)

 - Jonathan Morton



Re: [Cerowrt-devel] Fwd: Throughput regression with `tcp: refine TSO autosizing`

2015-02-01 Thread Jonathan Morton
Since this is going to be a big job, it's worth prioritising parts of it
appropriately.

Minstrel is probably already the single best feature of the Linux Wi-Fi
stack. AFAIK it still outperforms any other rate selector we know about. So
I don't consider improving it further to be a high priority, although that
trick of using it as a sneaky random packet loss inducer is intriguing.

Much more important and urgent is getting some form of functioning SQM
closer to the hardware, where the information is. I don't think we need to
get super fancy here to do some real good, in the same way that PIE is a
major improvement over drop-tail. I'd settle for a variant of fq_codel that
gets and uses information about whether the current packet request might be
aggregated with the previous packet provided, and adjusts its choice of
packet accordingly.

At the same time, models would undoubtedly be useful to help test and
repeatably demonstrate the advantages of both simple and more sophisticated
solutions. Ns3 allows laying out a reasonably complex radio environment,
which is great for this. To counter the prevalence of one-station Faraday
cage tests in the industry, the simulated environments should represent
realistic, challenging use cases:

1) the family home, with half a dozen client devices competing with several
interference sources (Bluetooth, RC toys, microwave oven, etc). This is a
relatively easy environment, representing the expected environment for
consumer equipment.

2) the apartment block, with fewer clients per AP but lots of APs
distributed throughout a large building. Walls and floors may provide
valuable attenuation here - unless you're in Japan, where they can be
notoriously thin.

3) the railway carriage, consisting of eighty passengers in a 20x3 m space,
and roughly the same number of client devices. The uplink is 3G based and
has some inherent latency. Add some Bluetooth for flavour, stir gently.
This one is rather challenging, but there is scope to optimise AP antenna
placement, and to scale the test down slightly by reducing seat occupancy.

4) the jumbo jet, consisting of several hundred passengers crammed in like
sardines. The uplink has satellite latencies built in. Good luck.

5) the business hotel. Multiple APs will be needed to provide adequate
coverage for this environment, which should encompass the rooms as well as
lounge, conference and dining areas. Some visitors may bring their own APs,
and the system must be able to cope with this without seriously degrading
performance.

6) the trade conference. A large arena filled with thousands of people.
Multiple APs required. Good luck.

I also feel that ultimately we're going to have to get industry on board.
Not just passively letting us play around as with ath9k, but actively
taking note of our findings and implementing at least a few of our ideas
themselves. Of course, tools, models and real-world results are likely to
make that easier.

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] infrastructure fixes for bufferbloat.net

2015-02-09 Thread Jonathan Morton
Count a vote in general for static pages for web hosting where possible.
Running a server side script (in any language, but especially PHP) and
making database calls for every hit is a performance and security nightmare.

Also, spammers can't spam a comment system that doesn't exist.

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] Two d-link products tested for bloat...

2015-02-20 Thread Jonathan Morton
Out of curiosity, perhaps you could talk to A&A about their FireBrick
router. They make a big point of having written the firmware for it
themselves, and they might be more interested in having researchers poke at
it in interesting ways than the average big name.  A&A are an ISP, not a
hardware manufacturer by trade.

Meanwhile, I suspect the ultimate hardware vendors don't care because their
customers, the big brands, don't care. They in turn don't care because
neither ISPs nor consumers care (on average). A coherent, magazine style
review system with specific areas given star ratings might have a chance of
fixing that, if it becomes visible enough. I'm not sure that a rant blog
would gain the same sort of traction.

Some guidance can be gained from the business of reviewing other computer
hardware. Power supplies are generally, at their core, one of a few
standard designs made by one of a couple of big subcontractors. The quality
of the components used to implement that design, and ancillary hardware
such as heatsinks and cabling, are what distinguish them in the
marketplace. Likewise motherboards are all built around a standard CPU
socket, chipset and form factor, but the manufacturers find lots of little
ways to distinguish themselves on razor thin margins; likewise graphics
cards. Laptops are usually badly designed in at least one stupid way
despite the best efforts of reviewers, but thanks to them it is now
possible to sort through the general mess and find one that doesn't
completely suck at a reasonable price.

As for the rating system itself:

- the Communications Black Hole, for when we can't get it to work at all.
Maybe we can shrink a screen grab from Interstellar for the job.

- the Tin Cans & String, for when it passes packets okay (out of the box)
but is horrible in every other important respect.

- the Carrier Pigeon. Bonus points if we can show it defecating on the
message (or the handler's wrist).

- the Telegraph Pole (or Morse Code Key). Maybe put the Titanic in the
background just to remind people how hard they are failing.

- the Dial-Up Modem. Perhaps products which become reliable and useful if
the user installs OpenWRT should get at least this rating.

- the Silver RJ45, for products which contrive to be overall competent in
all important respects.

- the Golden Fibre, for the very best, most outstanding examples of best
practice, without any significant faults at all. Bonus Pink Floyd reference.

I've been toying with the idea of putting up a website on a completely
different subject, but which might have similar structure. Being able to
use the same infrastructure for two different sites might spread the costs
in an interesting way...

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] Two d-link products tested for bloat...

2015-02-25 Thread Jonathan Morton
> Here's a comparison plot of box totals:
http://www.candelatech.com/downloads/rtt_fair4be-comparison-box-plot.png

That's a real mess. All of them utterly fail to get download bandwidth
anywhere near the upload (am I right in assuming it should ideally be about
equal?), and the only ones with even halfway acceptable latency are the
ones with least throughput in either direction.

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] Bufferbloat and the policy debate on packet loss in nanog

2015-03-01 Thread Jonathan Morton
And the reason for that, of course, is that "pile 'em high and sell 'em
cheap" works pretty well in the consumer marketplace - something that Far
Eastern companies have capitalised on a great deal. Amortising a not
inconsiderable R&D cost over the largest possible number of units makes
economic sense.

I think we'd all just rather they sorted out a better design in that
initial R&D phase. That's something that doesn't appeal to the mindsets of
most of those Far Eastern countries very well. Japan is the most likely
exception, but only because they tend to make stuff for themselves first
and others second.

Funny story from the early days of the Raspberry Pi: they were using a
Chinese factory because they needed cheap, and didn't really know how many
would sell - ten thousand was hoped for, as that would break even quite
nicely. But they went to a lot of trouble to be sure of getting something
that actually worked back from them. Engineering samples had come back to
the UK and tested fine, at last, so they gave the green light.

Then the first batch of 2000 Pis arrived, and the Ethernet port didn't work
on a single one of them. The factory had swapped out the RJ45 socket for a
cheaper one after completing the engineering samples, without noticing that
it didn't have the integrated magnetics that the design relied on, and as a
consequence also had a completely incompatible pinout. They quickly learned
their lesson on that point when the batch was sent back for repair, which
entailed hand desoldering and resoldering to swap the socket for the
correct one. That alone probably tripled the factory's costs, even at Asian
labour rates, but it was their own fault. Penny wise...

Of course the Pi sold slightly better than predicted, so they were soon
able to find a factory in Wales that fitted the budget.

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] [aqm] ping loss "considered harmful"

2015-03-02 Thread Jonathan Morton

> On 2 Mar, 2015, at 12:17, Mikael Abrahamsson  wrote:
> 
> On Mon, 2 Mar 2015, Brian Trammell wrote:
> 
>> Gaming protocols do this right - latency measurement is built into the 
>> protocol.
> 
> I believe this is the only way to do it properly, and the most likely easiest 
> way to get this deployed would be to use the TCP stack.
> 
> We need to give users an easy-to-understand metric on how well their Internet 
> traffic is working. So the problem here is that the users can't tell how well 
> it's working without resorting to ICMP PING to try to figure out what's going 
> on.
> 
> For instance, if their web browser had insight into what the TCP stack was 
> doing then it could present information a lot better to the user. Instead of 
> telling the user "time to first byte" (which is L4 information), it could 
> tell the less novice user about packet loss, PDV, reordering, RTT, how well 
> concurrent connections to the same IP address are doing, tell more about 
> *why* some connections are slow instead of just saying "it took 5.3 seconds 
> to load this webpage and here are the connections and how long each took". 
> For the novice user there should be some kind of expert system that collects 
> data that you can send to the ISP that also has an expert system to say "it 
> seems your local connection delays packets; please connect to a wired 
> connection and try again". It would know if the problem was excessive delay, 
> excessive delay that varied a lot, packet loss, reordering, or whatever.
> 
> We have a huge amount of information in our TCP stacks that is locked in 
> there and not used properly to help users figure out what's going on, and 
> there is basically zero information flow between the applications using TCP 
> and the TCP stack itself. Each just tries to do its best on its own layer.

This seems like an actually good idea.  Several of those statistics, at least, 
could be exposed to userspace without incurring any additional overhead in the 
stack (except for the queries themselves), which is important for 
high-performance server users.  TCP stacks already track RTT, and sometimes 
MinRTT - the difference between these values is a reasonable lower-bound 
estimate of induced latency.

For stacks which don’t already track all the desirable data, a socket option 
could be used to turn that on, allocating extra space to do so.  To maximise 
portability, therefore, it might be necessary to require that option before 
statistics requests will be valid, even on stacks which do collect it all 
anyway.
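
The RTT-minus-MinRTT estimate mentioned above can be sketched in a few lines. This is an illustrative model of the logic only, not any real stack's API; the sample values are invented:

```python
# Track the minimum RTT seen so far; the excess of each new sample over
# that minimum is a lower-bound estimate of induced (queueing) latency.
class InducedLatencyEstimator:
    def __init__(self):
        self.min_rtt = None

    def update(self, rtt_ms):
        if self.min_rtt is None or rtt_ms < self.min_rtt:
            self.min_rtt = rtt_ms
        return rtt_ms - self.min_rtt

est = InducedLatencyEstimator()
samples = [40, 42, 38, 120, 250]           # ms; rising values suggest bufferbloat
induced = [est.update(s) for s in samples]
print(induced)  # [0, 2, 0, 82, 212]
```

A real stack already holds both values, which is why exposing them costs nearly nothing.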

Recent versions of Windows, even, have a semi-magic system which gives a little 
indicator of whether your connection has functioning Internet connectivity or 
not.  This could be extended, if Microsoft saw fit, to interpret these 
statistics and notify the user that their connection was behaving badly in the 
ways we now find interesting.  Whether Microsoft will do such a thing (which 
would undoubtedly piss off every major ISP on the planet) is another matter, 
but it’s a concept that can be used by Linux desktops as well, and with less 
political fallout.

Now, who’s going to knuckle down and implement it?

 - Jonathan Morton



[Cerowrt-devel] The next slice of cake

2015-03-17 Thread Jonathan Morton
 close approximation.  Shared-medium 
links *can* behave like that, if they’re shaped to a miserly enough degree, but 
we really need something different for wifi - although several of cake’s 
components and ideas could be used in such a qdisc.

Roll on cake3.

 - Jonathan Morton



Re: [Cerowrt-devel] The next slice of cake

2015-03-17 Thread Jonathan Morton

> On 17 Mar, 2015, at 22:34, Carlos R. Pasqualini 
>  wrote:
> 
> would you mind to point me to a repository or download area and some
> docs about how to get it working and test it's performance?
> 
> in a (too)fast (and lazy) search at google can't find anything
> 
> Here, i have only 3 DSL links with 3Mbps bandwidth each, for aprox. 300
> student's computers.

That certainly sounds like a situation where cake could help.

Dave Täht made patches a few months ago, based on linux-net-next, which are 
available here:

http://snapon.lab.bufferbloat.net/~d/codel_patches/new_codels.tgz

Those include *two* versions of cake, one of which is configured to use a 
different version of the Codel algorithm than the other.  The intent at the 
time was to compare those two versions against each other, but they also happen 
to have the most up-to-date version of cake.  It really has been a while since 
I’ve been able to work on it.

You’ll need to build the kernel with “sch_cake” or “sch_cake2” turned on.  If 
you copy over your existing kernel config and run “make oldconfig”, you should 
get asked about them (as well as other things).

You’ll also need a patched version of the iproute2 utilities to configure cake. 
 Patches here:

http://snapon.lab.bufferbloat.net/~d/codel_patches/iproute_patches.tgz

Then it’s as simple as running:

# tc qdisc replace dev ethX root cake besteffort bandwidth <rate>Kbps atm

That will take care of your outbound traffic, if you replace “ethX” and “<rate>” 
with whatever is appropriate (and “cake2” if you built that version).  If you 
have control of both ends of the link, then you can do the same thing to handle 
inbound traffic.

If you only have control of one end of the link, you’ll need to use ingress 
shaping to handle inbound traffic.  This is a little bit more complicated to 
set up (via an Intermediate Functional Block device) than the usual egress 
shaping, and has a couple of disadvantages, but it does work and does help:

# ifconfig ifb0 up
# tc qdisc replace dev ethX handle ffff: ingress
# tc filter add dev ethX parent ffff: protocol all u32 match u32 0 0 action 
mirred egress redirect dev ifb0
# tc qdisc replace dev ifb0 root cake besteffort bandwidth <rate>Kbps atm

Both the upload and download rates should be slightly below your actual link 
rates, to ensure 
that cake controls the bottleneck queue.  The “atm” flag is there to take 
account of ATM framing, which ADSL uses.  You can experiment with the precise 
rates without disrupting existing traffic flows:

# tc -s qdisc

(the above is to look up the correct handle figures to use below)

# tc qdisc change dev <dev> handle N:M cake bandwidth ...

Have fun!

 - Jonathan Morton



Re: [Cerowrt-devel] DOCSIS 3+ recommendation?

2015-03-17 Thread Jonathan Morton
DOCSIS 3.1 mandates support for AQM (at minimum the PIE algorithm) in both
CPE and head end. If you can get hold of a D3.1 modem, you'll at least be
ready for the corresponding upgrade by your ISP.

Unfortunately I don't know which cable modems support which DOCSIS
versions, but it should be straightforward to look that up for any given
model.

- Jonathan Morton


Re: [Cerowrt-devel] [Codel] The next slice of cake

2015-03-18 Thread Jonathan Morton
> I wonder, are the low priority classes configured with a guaranteed minimum 
> bandwidth to avoid starvation? And will they opportunistically grab all left 
> over bandwidth to fill the pipe? Then speed test should just work as long as 
> there is no competing traffic…

The problem is that, in the present version, *only* the bulk/background class 
can use the full pipe.  Best effort gets a large fraction as its limit, but 
it’s not full.  Existing speed tests use best-effort traffic, and that’s not 
likely to change soon.

The next version should change that.

> I am probably out of my mind, but couldn't it help if cake would allow a 
> fixed cycle mode where it would process 50ms or so worth of packets pass them 
> to the interface, and then sleep until the next 50ms block start. This should 
> just be a fallback mode to not degrade badly under overload…

There is already such a mode to cope with limited-resolution timers and the 
existing overheads.  Without it, the Pentium-MMX is limited to a rather low 
rate (since it then has to wait for a timer interrupt for alternate packets).  
At 50Mbps+, it’s not too far off what it can bridge without shaping (60Mbps+).  
For some reason, the little CPE boxes still lose more performance than that to 
shaping.

Note that due to the very nature of shaping, the link is always either 
effectively idle (in which case an arriving packet is dispatched immediately, 
without waiting for a timer), or overloaded (in which case packets are 
delivered according to a schedule).  The question is whether the shaping rate 
also overloads the hardware.

In any case, bursting for fifty whole milliseconds at a time would absolutely 
*destroy* cake’s latency performance.  I’m not going to do that.  Accommodating 
timer performance is the only concession to bursting I’m willing to make.

> I think the highest priority band should only get its bandwidth quota, and 
> have no silent priority demotion; but otherwise I think that idea that 
> classes can pick up unused bandwidth sounds sane, especially for best effort 
> and background.

Let’s see how well it works this way.  It should be fairly easy to adjust this 
aspect of behaviour later on.

 - Jonathan Morton



Re: [Cerowrt-devel] DOCSIS 3+ recommendation?

2015-03-18 Thread Jonathan Morton
Right, so until 3.1 modems actually become available, it's probably best to
stick with a modem that already supports your subscribed speed, and manage
the bloat separately with shaping and AQM.

- Jonathan Morton


Re: [Cerowrt-devel] hires timer dependency?

2015-03-18 Thread Jonathan Morton
Yes, I recognise that behaviour from when I didn't have the timer
resolution workaround in cake. The PowerBook was fine (modern hardware,
good timers) but the Pentium-MMX only has the basic PC-AT timer hardware,
operating at 1kHz.

Because ACK packets end up spaced the same way, it caps the download as well
- at some multiple of the effective upload bandwidth - even if the download
itself is not shaped. The multiple depends on how the receiving host spaces
its ACKs.
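
As a rough illustration of the effect (assumed packet sizes and the 1 kHz timer mentioned above; these are back-of-envelope figures, not measurements):

```python
# A 1 kHz timer releasing one packet per tick caps shaped upload at roughly
timer_hz = 1000
mtu_bytes = 1500
upload_cap_mbps = timer_hz * mtu_bytes * 8 / 1e6        # 12.0 Mbit/s

# ACKs leaving at that same 1 kHz spacing clock out the download too; with
# delayed ACKs (one ACK per two 1448-byte segments) the implied ceiling is
mss_bytes = 1448
segments_per_ack = 2
download_cap_mbps = timer_hz * segments_per_ack * mss_bytes * 8 / 1e6
print(upload_cap_mbps, round(download_cap_mbps, 1))  # 12.0 23.2
```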

- Jonathan Morton


Re: [Cerowrt-devel] DOCSIS 3+ recommendation?

2015-03-19 Thread Jonathan Morton
> On 19 Mar, 2015, at 19:04, Michael Richardson  wrote:
> 
> Jim Gettys  wrote:
>> Moral 1: anything not tested by being used on an ongoing basis,
>> doesn't work.
> 
>> Moral 2: Companies like Comcast do not (currently) control their own
>> destiny, since they outsourced too much of the technology to others.
> 
> Moral 2 might be something that the C* suite types might actuall get.
> I don't know how to get that message there, though.

Be careful what you wish for: if the cable companies controlled the hardware 
more tightly, how much less experimentation would we be able to do?  The 
general hackability of your average CPE router is a benefit to our research 
efforts, even if the default configuration they come with is still utterly 
terrible.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] DOCSIS 3+ recommendation?

2015-03-20 Thread Jonathan Morton

> On 20 Mar, 2015, at 16:54, Michael Welzl  wrote:
> 
> I'd like people to understand that packet loss often also comes with delay - 
> for having to retransmit.

Or, turning it upside down, it’s always a win to drop packets (in the service 
of signalling congestion) if the induced delay exceeds the inherent RTT.

With ECN, of course, you don’t even have that caveat.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] DOCSIS 3+ recommendation?

2015-03-20 Thread Jonathan Morton
They will tend to get home and spend 
their leisure time on the Internet at roughly the same time as each other.  The 
son fires up his console to frag some noobs, and Mother calls her sister over 
VoIP; so far so good.  But then Father decides on which movie to watch later 
that evening and starts downloading it, and the daughter starts uploading 
photos from her school field trip to goodness knows where. So there are now two 
latency-sensitive and two throughput-sensitive applications using this 
single link simultaneously, and the throughput-sensitive ones have immediately 
loaded the link to capacity in both directions (one each).

So what happens then?  You tell me - you know your hardware the best.  Or 
haven’t you measured its behaviour under those conditions? Oh, for shame!

Okay, I’ll tell you what happens with 99.9% of head-end and CPE hardware out 
there today:  Mother can’t hear her sister properly any more, nor vice versa.  
And not just because the son has just stormed out of his bedroom yelling about 
lag and how he would have pwned that lamer if only that crucial shot had 
actually gone where he knows he aimed it.  But as far as Father and the 
daughter are concerned, the Internet is still working just fine - look, the 
progress bars are ticking along nicely! - until, that is, Father wants to read 
the evening news, but the news site’s front page takes half a minute to load, 
and half the images are missing when it does.

And Father knows that calling the ISP in the morning (when their call centre is 
open) won’t help.  They’ll run tests and find absolutely nothing wrong, and 
not-so-subtly imply that he (or more likely his wife) is an idiotic 
time-waster.  Of course, a weekday morning isn't when everyone’s using it, so 
nothing *is* wrong.  The link is uncongested at the time of testing, latency is 
as low as it should be, and there’s no line-quality packet loss.  The problem 
has mysteriously disappeared - only to reappear in the evening.  It’s not even 
weather related, and the ISP insists that they have adequate backhaul and 
peering capacity.

So why?  Because the throughput-sensitive applications fill not only the link 
capacity but the buffers in front of it (on both sides).  Since it takes time 
for a packet at the back of each queue to reach the link, this induces latency 
- typically *hundreds* of milliseconds of it, and sometimes even much more than 
that; *minutes* in extreme cases.  But both a VoIP call and a typical online 
game require latencies *below one hundred* milliseconds for optimum 
performance.  That’s why Mother and the son had their respective evening 
activities ruined, and Father’s experience with the news site is representative 
of a particularly bad case.
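
The arithmetic behind those hundreds of milliseconds is simple. Taking an assumed 256 KiB device buffer and a 3 Mbit/s link (illustrative numbers, not from any particular modem):

```python
# Time for a packet at the back of a full buffer to reach the link:
buffer_bytes = 256 * 1024
link_bps = 3_000_000
sojourn_ms = buffer_bytes * 8 / link_bps * 1000
print(round(sojourn_ms))  # 699 -- several times the <100 ms budget of VoIP or gaming
```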

The better AQM systems now available (eg. fq_codel) can separate 
latency-sensitive traffic from throughput-sensitive traffic and give them both 
the service they need.  This will give your customers a far better experience 
in the reasonably common situation I just outlined - but only if you put it in 
your hardware product and make sure that it actually works.  Otherwise, you’ll 
start losing customers to the first competitor who does.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] marketing #102 - giving netperf-wrapper a better name?

2015-03-20 Thread Jonathan Morton

> On 20 Mar, 2015, at 22:08, Bill Ver Steeg (versteb)  wrote:
> 
> We should call the metric "sucks-less". As in "Box A sucks less than Box B", 
> or "Box C scored a 17 on the sucks less test".

I suspect real marketing drones would get nervous at a negative-sounding name.

My idea - which I’ve floated in the past, more than once - is that the metric 
should be “responsiveness”, measured in Hertz.  The baseline standard would be 
10Hz, corresponding to a dumb 100ms buffer.  Get down into the single-digit 
millisecond range, as fq_codel does, and the Responsiveness goes up above 
100Hz, approaching 1000Hz.

Crucially, that’s a positive sort of term, as well as trending towards bigger 
numbers with actual improvements in performance, and is thus more potentially 
marketable.
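
The proposed conversion is just the reciprocal of the queueing latency:

```python
def responsiveness_hz(latency_ms: float) -> float:
    """Responsiveness metric as proposed: the reciprocal of latency."""
    return 1000.0 / latency_ms

print(responsiveness_hz(100))  # 10.0 Hz -- the dumb 100 ms buffer baseline
print(responsiveness_hz(5))    # 200.0 Hz -- fq_codel-class single-digit ms
```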

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] DOCSIS 3+ recommendation?

2015-03-20 Thread Jonathan Morton

> On 21 Mar, 2015, at 02:25, David Lang  wrote:
> 
> As I said, there are two possibilities
> 
> 1. if you mark packets sooner than you would drop them, advantage non-ECN
> 
> 2. if you mark packets and don't drop them until higher levels, advantage 
> ECN, and big advantage to fake ECN

3: if you have flow isolation with drop-from-longest-queue-on-overflow, faking 
ECN doesn’t matter to other traffic - it just turns the faker’s allocation of 
queue into a dumb, non-AQM one.  No problem.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] DOCSIS 3+ recommendation?

2015-03-20 Thread Jonathan Morton

> On 21 Mar, 2015, at 02:38, David Lang  wrote:
> 
> On Sat, 21 Mar 2015, Jonathan Morton wrote:
> 
>>> On 21 Mar, 2015, at 02:25, David Lang  wrote:
>>> 
>>> As I said, there are two possibilities
>>> 
>>> 1. if you mark packets sooner than you would drop them, advantage non-ECN
>>> 
>>> 2. if you mark packets and don't drop them until higher levels, advantage 
>>> ECN, and big advantage to fake ECN
>> 
>> 3: if you have flow isolation with drop-from-longest-queue-on-overflow, 
>> faking ECN doesn’t matter to other traffic - it just turns the faker’s 
>> allocation of queue into a dumb, non-AQM one.  No problem.
> 
> so if every flow is isolated so that what it generates has no effect on any 
> other traffic, what value does ECN provide?

A *genuine* ECN flow benefits from reduced packet loss and smoother progress, 
because the AQM can signal congestion to it without dropping.

> and how do you decide what the fair allocation of bandwidth is between all 
> the threads?

Using DRR.  This is what fq_codel does already, as it happens.  As does cake.
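
For reference, the core of DRR can be sketched in a few lines. This is a single-round simplification (real DRR carries deficits across rounds and drops flows whose queues empty; fq_codel additionally runs Codel per queue):

```python
from collections import deque

def drr_round(queues, quantum=1500):
    """One round of deficit round robin over per-flow queues of packet
    sizes in bytes.  Each flow may send packets while its quantum of
    byte credit covers them, so small-packet flows aren't starved."""
    sent = []
    for i, q in enumerate(queues):
        deficit = quantum
        while q and q[0] <= deficit:
            deficit -= q.popleft()
            sent.append(i)
    return sent

# One bulk flow with big packets, one sparse flow with small ones:
flows = [deque([1500, 1500, 1500]), deque([100, 100, 100])]
print(drr_round(flows))  # [0, 1, 1, 1] -- each flow gets about one quantum of service
```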

In other words, the last half-dozen posts have been an argument about a solved 
problem.

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-22 Thread Jonathan Morton

> On 23 Mar, 2015, at 02:24, Dave Taht  wrote:
> 
> I don't know how to have it match all traffic, including ipv6
> traffic(anyone??), but that was encouraging.

I use "protocol all u32 match u32 0 0”.

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-22 Thread Jonathan Morton

> On 23 Mar, 2015, at 02:24, Dave Taht  wrote:
> 
> I swear I'd poked into this and fixed it in cerowrt 3.10, but I guess
> I'll have to go poking through the patch set. Something involving
> random number obtaining, as best as I recall.

If it’s reseeding an RNG using the current time, that’s fairly bad practice, 
especially if it’s for any sort of cryptographic purpose.  For general 
purposes, seed a good RNG once before first use, using /dev/urandom, then just 
keep pulling values from it as needed.  Or, if cryptographic quality is 
required, use an actual crypto library’s RNG.
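
A minimal sketch of the seed-once pattern in Python (illustrative only; the code being discussed is not shown here and may be in another language):

```python
import os
import random
import secrets

# Seed a general-purpose RNG exactly once from the kernel entropy pool,
# then keep drawing values from it; never reseed from the current time.
rng = random.Random(os.urandom(16))
values = [rng.randrange(2**32) for _ in range(3)]

# For anything cryptographic, skip the above and use a crypto-grade source:
token = secrets.token_hex(16)
```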

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-22 Thread Jonathan Morton

> On 23 Mar, 2015, at 02:24, Dave Taht  wrote:
> 
> I have long maintained it was possible to build a better fq_codel-like
> policer without doing htb rate shaping, ("bobbie"), and I am tempted
> to give it a go in the coming months.

I have a hazy picture in my mind, now, of how it could be made to work.

A policer doesn’t actually maintain a queue, but it is possible to calculate 
when the currently-arriving packet would be scheduled for sending if a shaped 
FIFO was present, in much the same way that cake actually performs such 
scheduling at the head of a real queue.  The difference between that time and 
the current time is a virtual sojourn time which can be fed into the Codel 
algorithm.  Then, when Codel says to drop a packet, you do so.

Because there’s no queue management, timer interrupts nor flow segregation, the 
overhead should be significantly lower than an actual queue.  And there’s a 
reasonable hope that involving Codel will give better results than either a 
brick-wall or a token bucket.
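
That virtual-FIFO calculation can be sketched as follows. This is my own simplification: a fixed sojourn threshold stands in for the real Codel control law (which acts only on sustained excess), and the rate and target values are assumed:

```python
class VirtualQueuePolicer:
    """Policer that maintains no queue, only the departure time a packet
    *would* have had from a shaped FIFO; the gap between that and the
    arrival time is the virtual sojourn time."""
    def __init__(self, rate_bps, target_s=0.005):
        self.rate = rate_bps
        self.target = target_s
        self.virtual_departure = 0.0   # when the virtual shaped FIFO next frees up

    def offer(self, now, size_bytes):
        # When would this packet leave the shaped FIFO, had one existed?
        departure = max(now, self.virtual_departure) + size_bytes * 8 / self.rate
        sojourn = departure - now      # virtual sojourn time, as fed to Codel
        if sojourn > self.target:
            return False               # drop; the virtual queue does not grow
        self.virtual_departure = departure
        return True                    # forward the packet

p = VirtualQueuePolicer(rate_bps=10_000_000)        # 10 Mbit/s, 5 ms target
verdicts = [p.offer(0.0, 1500) for _ in range(8)]   # a burst arriving at once
print(verdicts)  # first four fit under the 5 ms virtual delay, the rest are shed
```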

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-22 Thread Jonathan Morton

> On 23 Mar, 2015, at 03:45, David Lang  wrote:
> 
> are we running into performance issues with fq_codel? I thought all the 
> problems were with HTB or ingress shaping.

Cake is, in part, a response to the HTB problem; it is a few percent more 
efficient so far than an equivalent HTB+fq_codel combination.  It will have a 
few other novel features, too.

Bobbie is a response to the ingress-shaping problem.  A policer (with no queue) 
can be run without involving an IFB device, which we believe has a large 
overhead.

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-23 Thread Jonathan Morton

> On 23 Mar, 2015, at 08:09, Sebastian Moeller  wrote:
> 
> It obviously degrade local performance of se00 and hence be not a true 
> solution unless one is happy to fully dedicate a box as shaper ;)

Dedicating a box as a router/shaper isn’t so much of a problem, but shaping 
traffic between wired and wireless - and sharing the incoming WAN bandwidth 
between them, too - is.  It’s a valid test, though, for this particular purpose.

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-23 Thread Jonathan Morton

> On 23 Mar, 2015, at 19:07, David Lang  wrote:
> 
> I have a few spare 3800s if some of you developers need one.
> 
> unfortunantly I don't have a fast connection to test on.

It might be an idea if I had one, since then I could at least reproduce 
everyone else’s results.  Can you reasonably ship to Europe?

I don’t have a fast Internet connection either, but I do have enough computing 
hardware lying around to set up lab tests at >100Mbps quite well (though I 
could stand to get hold of a few extra GigE NICs).  Verification on a real 
connection is of course good, but netem should make a reasonable substitute if 
configured sanely.

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-23 Thread Jonathan Morton

> On 24 Mar, 2015, at 02:00, Sebastian Moeller  wrote:
> 
> So I got around to a bit of rrul testing of the dual egress idea to asses the 
> cost of IFB, but the results are complicated (so most likely I screwed up).

IFB is normally used on the download direction (as a substitute for a lack of 
AQM at the ISP), so that’s the one which matters.  Can you try a unidirectional 
test which exercises only the download direction?  This should get the clearest 
signal - without CPU-load interference from the upload direction.

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-24 Thread Jonathan Morton
What I'm seeing in your first tests is that double egress gives you
slightly more download at the expense of slightly less upload throughput.
The aggregate is higher.

Your second set of tests tells me almost nothing, because it exercises the
upload more and the download less - hence I'm asking for effectively the
opposite test. The aggregate is still significantly higher with double
egress, though.

The ping numbers also tell me that there's no significant latency penalty
either way. Even when the CPU is saturated, it still controls latency
better than leaving the pipe open.

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] some notes on the archer c7v2's suitability for make-wifi-fast

2015-03-26 Thread Jonathan Morton

> On 27 Mar, 2015, at 04:10, Dave Taht  wrote:
> 
> I couldn't crash it with a full workload nor
> overheat it with external temps at 23C. I had tested the 3800 with an
> external temp of 44C, and I would prefer to test any new product at
> that before wanting to use it here.

I wish thermal testing had been done on my 3G dongle.  It frequently overheats 
and shuts itself down at 25°C ambient.  It’s approaching the point where I want 
to move my firewall out onto the (usually cooler) balcony.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] some notes on the archer c7v2's suitability for make-wifi-fast

2015-03-26 Thread Jonathan Morton

> On 27 Mar, 2015, at 04:10, Dave Taht  wrote:
> 
> I think cake can be improved quite a bit more and we really need to do some 
> profiling to find other bottlenecks.

I’ve got far enough with the improved Diffserv logic to see that, at the very 
least, cake3 will need to do less work to figure out that it’s throttled.  
That’s because the hard shaper is now global rather than class-local, so I can 
hoist it before any of the class-specific work.  If it gets past that, it can 
be confident that it’s got a packet to deliver.

This is important, because cake_dequeue() often gets called twice per packet - 
once just after cake_enqueue(), when it might be too soon to transmit, and 
again when the watchdog timer fires to denote the correct transmit time.

The class selection loop is also smaller and simpler (fewer edge cases to cope 
with), and I worked out a shortcut to put in further down, so it doesn’t have 
to re-run the class selection if a flow happens to be in deficit.  That’s 
another likely win.

So those might turn out to be significant efficiency improvements, altogether.  
Of course, if the real overhead is elsewhere, the improvements in throughput 
might turn out to be small, but for the moment I’m actually focusing on 
behaviour rather than throughput.

On that note, I’ve added a four-class Diffserv mapping alongside the existing 
eight-class one.  This new mapping is:

Latency Sensitive  (CS7, CS6, EF, VA, CS5, CS4)
Streaming Media(AF4x, AF3x, CS3, AF2x, TOS4, CS2, TOS1)
Best Effort(CS0, AF1x, TOS2, and all not otherwise specified)
Background Traffic (CS1)
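For reference, that mapping can be written out as a lookup table.  A minimal sketch (mine, not cake's actual source; the numeric values are the standard DSCP codepoints, with the legacy "TOSn" names taken as the old ToS byte shifted right by two bits):

```python
# Four-tin DSCP mapping as described above.  Anything not listed
# defaults to Best Effort.
LATENCY, STREAMING, BEST_EFFORT, BACKGROUND = range(4)

DSCP_TO_TIN = {
    # Latency Sensitive: CS7, CS6, EF, VA, CS5, CS4
    56: LATENCY, 48: LATENCY, 46: LATENCY, 44: LATENCY, 40: LATENCY, 32: LATENCY,
    # Streaming Media: AF4x, AF3x, CS3, AF2x, TOS4, CS2, TOS1
    34: STREAMING, 36: STREAMING, 38: STREAMING,   # AF41/42/43
    26: STREAMING, 28: STREAMING, 30: STREAMING,   # AF31/32/33
    24: STREAMING,                                 # CS3
    18: STREAMING, 20: STREAMING, 22: STREAMING,   # AF21/22/23
    4: STREAMING, 16: STREAMING, 1: STREAMING,     # TOS4, CS2, TOS1
    # Best Effort extras: AF1x, TOS2 (CS0 and the rest default here)
    10: BEST_EFFORT, 12: BEST_EFFORT, 14: BEST_EFFORT, 2: BEST_EFFORT,
    # Background: CS1
    8: BACKGROUND,
}

def classify(dscp):
    """All codepoints not otherwise specified land in Best Effort."""
    return DSCP_TO_TIN.get(dscp, BEST_EFFORT)
```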

> So I saw fairly long delays (7ms or more) when running at these speeds 
> through the router.

TBH, it’s a sign of how far we’ve come that we now consider 7ms to be painful.  
:-)

 - Jonathan Morton



Re: [Cerowrt-devel] [Commotion-dev] Commotion Router v1.2 enters release testing

2015-03-27 Thread Jonathan Morton

> On 28 Mar, 2015, at 04:05, Josh King  wrote:
> 
> I think the biggest problem we've had with including traffic shaping by
> default in our images is figuring out how best to provide an interface
> to users that is easy to understand and utilize. Any suggestions or help
> in that regard would be welcome.

Most people don’t have the first clue about it, or even that it is a necessary 
or desirable thing.  So ideally, you need either something invisible with no 
knobs (or a simple checkbox to turn it on), or something with as few knobs as 
possible and clear instructions on how to set them.

I’m currently working on something along those lines.

The one parameter that’s definitely situation-dependent is the bandwidth to 
shape at.  If you have an integrated DSL modem, you can probably derive that 
number from querying the sync rate.  If you don’t, you’ll have to ask the user 
to set it.  Even if you can detect a sync rate, though, you should let the user 
override it - some DSL ISPs throttle the connection by other means than sync 
rate.  NB: DSL sync rates do change occasionally, so poll it every minute and 
use “tc qdisc change…” to adjust without spilling the existing queues.
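A hypothetical sketch of that polling loop (the interface name and sync-rate query are placeholders for whatever your modem exposes; the key point is using "tc qdisc change" so the existing queues survive the adjustment):

```shell
#!/bin/sh
# Shape slightly below the detected DSL sync rate, adjusting in place.
MARGIN=95   # shape at 95% of sync to keep the queue on the router

get_sync_rate() {
    # placeholder: however your modem reports downstream sync, in kbit/s
    echo 8000
}

compute_rate() {
    echo $(( $1 * MARGIN / 100 ))
}

apply_rate() {
    # adjusts the shaper without tearing it down and spilling the queues
    tc qdisc change dev "$1" root cake bandwidth "${2}kbit"
}

# One polling step; run this every 60 seconds from cron or a loop.
rate=$(compute_rate "$(get_sync_rate)")
echo "shaping at ${rate} kbit/s"
# apply_rate pppoe-wan "$rate"   # needs root and an existing cake qdisc
```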

For everything else, put in sensible defaults, calculating from the given 
shaping rate where necessary, and do your best to avoid bothering the user with 
details up front.  At the same time, it’s probably wise to provide a way to see 
what’s being done behind the scenes, but make that an extra click.

If you want to also put in an advanced mode for people who do know what they’re 
doing, you can, but hide it behind an “advanced, here be dragons” button and 
make it easy to go back to the sane defaults.

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-28 Thread Jonathan Morton

> On 29 Mar, 2015, at 04:14, Sebastian Moeller  wrote:
> 
> I do not think my measurements show that ingress handling via IFB is so 
> costly (< 5% bandwidth) that avoiding it will help much.

> The current diffserv implementation also costs around 5% bandwidth.

That’s useful information.  I may be able to calibrate that against similar 
tests on other hardware.

But presumably, if you remove the ingress shaping completely, it can then 
handle full line rate downstream?  What’s the comparable overhead figure for 
that?  You see, if we were to use a policer instead of ingress shaping, we’d 
not only be getting IFB and ingress Diffserv mangling out of the way, but HTB 
as well.

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-29 Thread Jonathan Morton

> On 29 Mar, 2015, at 14:16, Sebastian Moeller  wrote:

Okay, so it looks like you get another 5% without any shaping running.  So in 
summary:

- With no shaping at all, the router is still about 10% down compared to 
downstream line rate.
- Upstream is fine *if* unidirectional.  The load of servicing downstream 
traffic hurts upstream badly.
- Turning on HTB + fq_codel loses you 5%.
- Using ingress filtering via IFB loses you another 5%.
- Mangling the Diffserv field loses you yet another 5%.

Those 5% penalties add up.  People might grudgingly accept a 10% loss of 
bandwidth to be sure of lower latency, and faster hardware would do better than 
that, but losing 25% is a bit much.

I should be able to run similar tests through my Pentium-MMX within a couple of 
days, so we can see whether I get similar overhead numbers out of that; I can 
even try plugging in your shaping settings, since they’re (just) within the 
line rate of the 100baseTX cards installed in it.  I could also compare cake’s 
throughput to that of HTB + fq_codel; I’ve already seen an improvement with 
older versions of cake, but I want to see what the newest version gets too.

Come to think of it, I should probably try swapping the same cards into a 
faster machine as well, to see how much they influence the result.

>> You see, if we were to use a policer instead of ingress shaping, we’d not 
>> only be getting IFB and ingress Diffserv mangling out of the way, but HTB as 
>> well.
> 
> But we still would run HTB for egress I assume, and the current results with 
> policers Dave hinted at do not seem like good candidates for replacing 
> shaping…

The point of this exercise was to find out whether a theoretical, ideal policer 
on ingress might - in theory, mind - give a noticeable improvement of 
efficiency and thus throughput.

The existing policers available are indeed pretty unsuitable, as Dave’s tests 
proved, but there may be a way to do better by adapting AQM techniques to the 
role.  In particular, Codel’s approach of gradually increasing a sparse drop 
rate seems like it would work better than the “brick wall” imposed by a plain 
token bucket.

Your results suggest that investigating this possibility might still be 
worthwhile.  Whether anything will come of it, I don’t know.
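To make the idea concrete, here is a rough sketch (entirely my own, untested against real traffic) of a Codel-flavoured policer: it meters a *virtual* queue against the configured rate and, once the virtual sojourn time stays above target for a full interval, drops sparsely on Codel's interval/sqrt(count) schedule rather than hard-dropping everything over the bucket:

```python
import math

class CodelPolicer:
    """Virtual-queue policer sketch; nothing is actually buffered."""

    def __init__(self, rate_bps, target=0.005, interval=0.100):
        self.rate = rate_bps / 8.0          # drain rate, bytes per second
        self.target, self.interval = target, interval
        self.vqueue, self.last = 0.0, None  # virtual backlog in bytes
        self.first_above = None             # when sojourn first topped target
        self.dropping, self.count = False, 0
        self.next_drop = 0.0

    def admit(self, now, size):
        """Return True to forward the packet, False to drop it."""
        if self.last is not None:           # drain the virtual queue
            self.vqueue = max(0.0, self.vqueue - (now - self.last) * self.rate)
        self.last = now
        sojourn = self.vqueue / self.rate   # time to drain current backlog
        if sojourn < self.target:
            self.first_above, self.dropping = None, False
            self.vqueue += size
            return True
        if self.first_above is None:
            self.first_above = now + self.interval
        if not self.dropping and now >= self.first_above:
            self.dropping = True            # re-enter with a reduced count
            self.count = max(1, self.count - 2)
            self.next_drop = now
        if self.dropping and now >= self.next_drop:
            self.count += 1
            self.next_drop = now + self.interval / math.sqrt(self.count)
            return False                    # sparse drop; nothing enqueued
        self.vqueue += size
        return True
```

Traffic arriving below the configured rate never builds a virtual backlog and passes untouched; sustained overload draws a gradually increasing trickle of drops instead of a burst.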

 - Jonathan Morton



Re: [Cerowrt-devel] archer c7 v2, policing, hostapd, test openwrt build

2015-03-29 Thread Jonathan Morton
>> - Turning on HTB + fq_codel loses you 5%.
> 
> I assume that this partly is caused by the need to shape below the physical 
> link bandwidth, it might be possible to get closer to the limit (if the true 
> bottleneck bandwidth is known, but see above).

> Downstream:   (((1500 - 8 - 40 - 20) * 8) * (98407 * 1000) / ((1500 + 14 + 16) 
> * 8)) / 1000 = 92103.8 Kbps; measured: 85.35 Mbps (dual egress); 82.76 Mbps 
> (IFB ingress)

I interpret that as meaning: you have set HTB at 98407 Kbps, and after 
subtracting overheads you expect to get 92103 Kbps goodput.  You got pretty 
close to that on the raw line, and the upstream number gets pretty close to 
your calculated figure, so I can’t account for the missing 6700 Kbps (7%) due 
to link capacity simply not being there.  HTB, being a token-bucket-type 
shaper, should compensate for short lulls, so subtle timing effects probably 
don’t explain it either.

>> Those 5% penalties add up.  People might grudgingly accept a 10% loss of 
>> bandwidth to be sure of lower latency, and faster hardware would do better 
>> than that, but losing 25% is a bit much.
> 
>   But IPv4 simple.qos IFB ingress shaping: ingress 82.3 Mbps versus 93.48 
> Mbps (no SQM) =>  100 * 82.3 / 93.48 = 88.04%, so we only lose 12% (for the 
> sum of diffserv classification, IFB ingress shaping and HTB) which seems more 
> reasonable (that or my math is wrong).

Getting 95% three times leaves you with about 86%, so it’s a useful 
rule-of-thumb figure.  The more precise figure (100% - 88.04% ^ (1/3)) would be 
4.16% per stage.
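That per-stage figure is easy to verify numerically (a quick check of the arithmetic, nothing more):

```python
# Three shaping stages that each keep a fraction f of throughput leave
# f**3 overall; from the measured 82.3/93.48 ratio we recover f.
total = 82.3 / 93.48                          # overall fraction kept with SQM
per_stage_kept = total ** (1 / 3)
per_stage_loss = (1 - per_stage_kept) * 100   # percent lost per stage
```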

However, if the no-SQM throughput is really limited by the ISP rather than the 
router, then simply adding HTB + fq_codel might have a bigger impact on 
throughput for someone with a faster service; they would be limited to the same 
speed with SQM, but might have higher throughput without it.  So your 
measurements really give 5% as a lower bound for that case.

>   But anyway I do not argue that we should not aim at decreasing 
> overheads, but just that even without these overheads we are still a (binary) 
> order of magnitude short of the goal, a shaper that can do up to symmetric 
> 150Mbps shaping let alone Dave’s goal of symmetric 300 Mbps shaping.

Certainly, better hardware will perform better.  I personally use a decade-old 
PowerBook for my shaping needs; a 1.5GHz PowerPC 7447 (triple issue, out of 
order, 512KB+ on-die cache) is massively more powerful than a 680MHz MIPS 24K 
(single issue, in order, a few KB cache), and it shows when I conduct LAN 
throughput tests.  But I don’t get the chance to push that much data over the 
Internet.

The MIPS 74K in the Archer C7 v2 is dual issue, out of order; that certainly 
helps.  Multi-core (or at least multi-thread) would probably also help by 
reducing context switch overhead, and allowing more than one device’s 
interrupts to get serviced in parallel.  I happen to have one router with a 
MIPS 34K, which is multi-thread, but the basic pipeline is that of the 24K and 
the clock speed is much lower.

Still, it’s also good to help people get the most out of what they’ve already 
got.  Cake is part of that, but efficiency (by using a simpler shaper than HTB 
and eliminating one qdisc-to-qdisc interface) is only one of its goals.  Ease 
of configuration, and providing state-of-the-art behaviour, are equally 
important to me.

>> The point of this exercise was to find out whether a theoretical, ideal 
>> policer on ingress might - in theory, mind - give a noticeable improvement 
>> of efficiency and thus throughput.
> 
>   I think we only have 12% left on the table and there is a need to keep 
> the shaped/policed ingress rate below the real bottleneck rate with a margin, 
> to keep instances of buffering “bleeding” back into the real bottleneck 
> rare…, 

That’s 12% as a lower bound - and that’s already enough to be noticeable in 
practice.  Obviously we can’t be sure of getting all of it back, but we might 
get enough to bring *you* up to line rate.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] Latency Measurements in Speed Test suites (was: DOCSIS 3+ recommendation?)

2015-03-30 Thread Jonathan Morton

> On 29 Mar, 2015, at 20:36, Pedro Tumusok  wrote:
> 
> Dslreports got a new speedtester up, anybody know Justin or some of the other 
> people over there?
> 
> http://www.dslreports.com/speedtest
> 
> Maybe somebody on here could even lend a hand in getting them to implement 
> features like ping under load etc.

I gave that test a quick try.  It measured my download speed well enough, but 
the upload…

Let’s just say it effectively measured the speed to my local webcache, not to 
the server itself.

 - Jonathan Morton




Re: [Cerowrt-devel] cake3 vs sqm+fq_codel at 115/12 mbit (basically comcast´s blast service)

2015-04-02 Thread Jonathan Morton
Awesome.

Oddly enough, cake3 actually gets slightly less throughput than
htb+fq_codel on the Pentium-MMX. However that's with the simplest possible
htb configuration (since I'm manually typing it in), and no firewall rules
or NAT going on (just a bridge between two Ethernet ports).

A couple of notes on the statistics that are now reported:

The rate for each class is now a threshold rather than a limit. The class
is permitted to use more than that bandwidth (up to the global limit), but
will yield to lower priority classes in that condition. This is consistent
with both user expectations and standard PHB specs, and means that traffic
benefits from high priority markings only if it's appropriately sparse.

On that note, here are roughly the expected uses of each class:

0 - background bulk traffic, CS1 marked, ie. BitTorrent. Use as many
parallel connections as you like, without worrying about ordinary traffic.

1 - best effort, the great majority of ordinary traffic - web pages,
software updates, whatever. If in doubt, leave it here (default CS0 lands
here).

2 - elevated priority, bandwidth sensitive traffic, such as streaming video
or a vlan.

3 - low volume, latency sensitive traffic such as VoIP, online games, NTP,
etc. EF traffic lands here.

A minor frustration for me here - firewall rules on ingress are processed
only after the traffic has already passed through ifb. This means I can't
custom mark my inbound traffic.

Three delay statistics are now reported, all of which are based on EWMAs of
packet sojourn times at dequeue. Pk is biased heavily to high delays (so
should usually report on fat flows), Sp to low delays (so should capture
sparse flows), and Av keeps a true average. The concept of a biased EWMA is
borrowed from ReplayGain and the whole "loudness war" problem that it aims
to solve; some broadcast studios (including the BBC) use audio meters which
work this way.
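The biased EWMA can be sketched in a few lines (the smoothing factors below are illustrative choices of mine, not cake's actual constants):

```python
class BiasedEwma:
    """Asymmetric EWMA: one smoothing factor for samples above the
    current average, another for samples below it."""

    def __init__(self, up, down):
        self.up, self.down, self.avg = up, down, 0.0

    def update(self, sample):
        a = self.up if sample > self.avg else self.down
        self.avg += a * (sample - self.avg)
        return self.avg

# Pk chases peaks (fat flows), Sp chases troughs (sparse flows),
# and Av is an ordinary symmetric EWMA.
pk, sp, av = BiasedEwma(0.9, 0.01), BiasedEwma(0.01, 0.9), BiasedEwma(0.1, 0.1)
```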

The new set-associative hash function also generates extra statistics. The
same 1024 queues are now divided into 128 sets of 8 "ways", and a tag on
each queue tracks which flow is presently using it. This allows hash
collisions to be resolved in most cases, with limited worst case overhead,
greatly improving flow isolation under severely stressed conditions. (It's
difficult to provoke this on a home network, but offices may well
appreciate this feature.)

The "way miss" counter is incremented whenever an empty queue's tag is
changed to assign it to a new flow, signalling a departure from the fast
path for that packet. Expect to see a small percentage of these with normal
traffic.

The "way indirect hit" counter tracks the situations where a hash collision
would have occurred with a plain hash function, but was resolved by the set
associativity. This is also a departure from the fast path.

The "way collision" counter indicates when even set associative hashing is
insufficient - there are more than 8 distinct flows attempting to occupy
queues in the same set. In such a case, the search for an empty queue is
terminated and the packet is placed in the queue matching the plain hash.
NB: so far this code path is completely untested to my knowledge!
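A simplified model of that lookup (my sketch, not cake's code; real cake clears tags when queues empty so they can be reused, which this toy omits):

```python
# 1024 queues as 128 sets of 8 ways, each queue tagged with the flow
# currently using it.  Counter names follow the statistics above.
SETS, WAYS = 128, 8
tags = [[None] * WAYS for _ in range(SETS)]
stats = {"way_miss": 0, "way_indirect_hit": 0, "way_collision": 0}

def find_queue(flow_id):
    h = hash(flow_id)
    s, direct = h % SETS, (h // SETS) % WAYS   # the "plain hash" queue
    row = tags[s]
    if row[direct] in (None, flow_id):         # fast path
        if row[direct] is None:
            stats["way_miss"] += 1             # empty queue's tag changed
            row[direct] = flow_id
        return s * WAYS + direct
    for w in range(WAYS):                      # collision: search the set
        if row[w] == flow_id:
            stats["way_indirect_hit"] += 1
            return s * WAYS + w
    for w in range(WAYS):                      # claim any empty way
        if row[w] is None:
            stats["way_miss"] += 1
            row[w] = flow_id
            return s * WAYS + w
    stats["way_collision"] += 1                # all 8 ways occupied: give up
    return s * WAYS + direct                   # fall back to the plain hash
```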

- Jonathan Morton


Re: [Cerowrt-devel] [Cake] documentation review request and out of tree cake builds for openwrt/etc.

2015-04-28 Thread Jonathan Morton
> An integral shaper (that can be on or off or tuned dynamically)
> ---
> is much "tighter" than htb - works on current low end hardware without
running out of cpu. See attached graphs.

This seems to imply that "tighter" means "uses less CPU". In fact they are
two separate benefits; "tighter" means "bursts less".

Also, what graphs?

As for installing kernel headers, on Debian based distros the right package
should be linux-headers-`uname -r` .

- Jonathan Morton


Re: [Cerowrt-devel] [Cake] documentation review request and out of tree cake builds for openwrt/etc.

2015-04-30 Thread Jonathan Morton
It took me a while to get around to thinking about this, partly because my
phone inexplicably refuses to believe snapon exists.

I have two possible explanations for these results. Maybe both apply to
some extent.

Dropping packets rather than marking them results in an increase in ack
density in the reverse direction, because delayed acks get temporarily
disabled. The strength of this effect depends on the BDP and the depth of
delayed acks.

Increasing the number of simultaneous flows might increase the CPU load of
connection tracking for NAT. Are you shaping and doing NAT on the same box?
I think this might be the basic reason for increased latency.

- Jonathan Morton


Re: [Cerowrt-devel] [Cake] documentation review request and out of tree cake builds for openwrt/etc.

2015-04-30 Thread Jonathan Morton
I'm not concerned about aggregation effects on cable, because it's not
station specific as it is on Wi-Fi. It might be a source of one extra
access grant delay at most; after that there'll be enough packets in the
modem's FIFO to justify a full sized grant. Here the modem's buffer really
does exist for a good reason, and we can rely on it to do the job.

I'm also not concerned about ack bunching, because that isn't really
caused by FQ. Given a 10:1 bandwidth ratio, and a 3:1 delayed ack
factor, there'll be 3.33 acks for each data packet in the slow direction,
while at our default 300 quantum the DRR will cycle five times per data
packet. So acks for a given flow will only be delivered bunched if they
arrived bunched.

- Jonathan Morton


Re: [Cerowrt-devel] [Cake] openwrt build with latest cake and other qdiscs

2015-05-14 Thread Jonathan Morton

> On 14 May, 2015, at 16:09, Alan Jenkins  
> wrote:
> 
> On 14/05/15 11:53, Jonathan Morton wrote:
>>> On 14 May, 2015, at 13:50, Alan Jenkins 
>>>  wrote:
>>> 
>>> generic-receive-offload: on

>> This implies that adding GRO peeling to cake might be a worthwhile priority.
>> 
>>  - Jonathan Morton
> 
> Ah, not on my account, it seems.
> 
> # tc -stat qdisc |grep maxpacket
>  maxpacket 590 drop_overlimit 0 new_flow_count 1 ecn_mark 0
>  maxpacket 256 drop_overlimit 0 new_flow_count 0 ecn_mark 0
> ...
>  maxpacket 1696 drop_overlimit 0 new_flow_count 305 ecn_mark 0
>  maxpacket 1749 drop_overlimit 0 new_flow_count 274 ecn_mark 0

A maxpacket of 1749 *does* imply that GRO or GSO is in use.  Otherwise I’d 
expect to see 1514 or less.

 - Jonathan Morton



Re: [Cerowrt-devel] [Cake] openwrt build with latest cake and other qdiscs

2015-05-14 Thread Jonathan Morton
Ah - looking at it from that perspective, your largest packet includes a
1500 byte payload, 40 bytes of PPPoE framing, and 44 more bytes of AAL5
padding, all wrapped up in 33 ATM cells. With even slightly less overhead
or a fractionally reduced payload, you'd go down to 32 cells.
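The cell arithmetic made explicit (my restatement of the figures above; the quoted 44 bytes of AAL5 padding includes the 8-byte AAL5 trailer):

```python
import math

# An AAL5 frame is the payload plus an 8-byte trailer, padded up to a
# whole number of 48-byte ATM cell payloads.
def atm_cells(frame_bytes):
    return math.ceil((frame_bytes + 8) / 48)

frame = 1500 + 40               # 1500-byte payload plus PPPoE framing
cells = atm_cells(frame)        # 33 cells
padding = cells * 48 - frame    # trailer plus padding: 44 bytes
```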

- Jonathan Morton


Re: [Cerowrt-devel] [Cake] openwrt build with latest cake and other qdiscs

2015-05-14 Thread Jonathan Morton
A 64k aggregate would be broken up at any speed below half a gigabit. So
the 1ms heuristic seems sane from that perspective.
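The arithmetic behind that statement (my check, not code from any driver): serialising a full 64 KB aggregate takes exactly 1 ms at roughly half a gigabit, so any slower link exceeds the 1 ms budget and the aggregate is split.

```python
# Rate at which a 64 KB aggregate takes exactly 1 ms to serialise.
aggregate_bits = 64 * 1024 * 8
threshold_bps = aggregate_bits / 0.001   # roughly 524 Mbit/s
```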

- Jonathan Morton


Re: [Cerowrt-devel] [Bloat] heisenbug: dslreports 16 flow test vs cablemodems

2015-05-15 Thread Jonathan Morton

> On 15 May, 2015, at 14:27, Bill Ver Steeg (versteb)  wrote:
> 
> But the TCP timestamps are impacted by packet loss. You will sometimes get an 
> accurate RTT reading, and you will sometimes get multiples of the RTT due to 
> packet loss and retransmissions. I would hate to see a line classified as 
> bloated when the real problem is simple packet loss. Head of line blocking, 
> cumulative acks, yada, yada, yada.

TCP stacks supporting Timestamps already implement an algorithm to get a 
relatively reliable RTT measurement out of them.  The algorithm is described in 
the relevant RFC.  That’s the entire point of having Timestamps, and it 
wouldn’t be difficult to replicate that externally by observing both directions 
of traffic past an intermediate point; you’d get the partial RTTs to each 
endpoint of the flow, the sum of which is the total RTT.
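A toy sketch of that observer (my own illustration; a real implementation must handle timestamp wrap, retransmissions, and delayed echoes, per the Timestamps RFC, 7323):

```python
class PassiveRtt:
    """Record when each TSval passes in one direction; when the echoed
    TSecr appears in the other direction, the elapsed time is the
    partial RTT from the observer to the far endpoint.  Summing the
    partial RTTs of both directions gives the full path RTT."""

    def __init__(self):
        self.seen = {}          # (direction, tsval) -> capture time

    def observe(self, now, direction, tsval, tsecr):
        rtt = None
        key = (1 - direction, tsecr)        # was this TSval seen going out?
        if key in self.seen:
            rtt = now - self.seen.pop(key)  # partial RTT beyond the observer
        self.seen.setdefault((direction, tsval), now)
        return rtt
```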

But what you’d get is the RTT of that particular TCP flow.  This is likely to 
be longer than the RTT of a competing sparse flow, if the bottleneck queue uses 
any kind of competent flow isolation.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] heisenbug: dslreports 16 flow test vs cablemodems

2015-05-18 Thread Jonathan Morton

> On 18 May, 2015, at 15:30, Simon Barber  wrote:
> 
> implementing AQM without implementing a low priority traffic class (such as 
> DSCP 8 - CS1) will prevent solutions like LEDBAT from working

I note that the LEDBAT RFC itself points out this fact, and also that an AQM 
which successfully “defeats” LEDBAT in fact achieves LEDBAT’s goal (it’s in the 
name: Low Extra Delay), just in a different way.

There’s a *different* reason for having a “background” traffic class, which is 
that certain applications use multiple flows, and thus tend to outcompete 
conventional single-flow applications.  Some of these multiple-flow 
applications currently use LEDBAT to mitigate this effect, but in an FQ 
environment (not with pure AQM!) this particular effect of LEDBAT is frustrated 
and even reversed.

That is the main reason why cake includes Diffserv support.  It allows 
multiple-flow LEDBAT applications to altruistically move themselves out of the 
way; it also allows applications which are latency-sensitive to request an 
appropriate boost over heavy best-effort traffic.  The trick is arrange such 
boosts so that requesting them doesn’t give an overwhelming advantage to bulk 
applications; this is necessary to avoid abuse of the Diffserv facility.

I think Cake does achieve that, but some day I’d like some data confirming it.  
A test I happened to run yesterday (involving 50 uploads and 1 download, with 
available bandwidth heavily in the download’s favour) does confirm that the 
Diffserv mechanism does its job properly when asked to, but that doesn’t 
address the abuse angle.

NB: the abuse angle is separate from the attack angle.  It’s always possible to 
flood the system in order to degrade service; that’s an attack.  Abuse, by 
contrast, is gaming the system to gain an unfair advantage.  The latter is what 
cake’s traffic classes are intended to prevent, by limiting the advantage that 
misrepresenting traffic classes can obtain.  If abuse is inherently discouraged 
by the system, then it becomes possible to *trust* DSCPs to some extent, making 
them more useful in practice.

For some reason, I haven’t actually subscribed to IETF AQM yet.  Perhaps I 
should catch up.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] heisenbug: dslreports 16 flow test vs cablemodems

2015-05-18 Thread Jonathan Morton

> On 18 May, 2015, at 18:09, dpr...@reed.com wrote:
> 
> I'm curious as to why one would need low priority class if you were using 
> fq_codel?  Are the LEDBAT flows indistinguishable?  Is there no congestion 
> signalling (no drops, no ECN)? The main reason I ask is that end-to-end flows 
> should share capacity well enough without magical and rarely implemented 
> things like diffserv and intserv.

The Cloonan paper addresses this question.  
http://snapon.lab.bufferbloat.net/~d/trimfat/Cloonan_Paper.pdf

Let me summarise, with some more up-to-date additions:

Consider a situation where a single application is downloading using many (say 
50) flows in parallel.  It’s rather easy to provoke BitTorrent into doing 
exactly this.  BitTorrent also happens to use LEDBAT by default (via uTP).

With a dumb FIFO, LEDBAT will sense the queue depth via the increased latency, 
and will tend to back off when some other traffic arrives to share that queue.

With AQM, the queue depth doesn’t increase much before ECN marks and/or packet 
drops appear.  LEDBAT then behaves like a conventional TCP, since it has lost 
the delay signal.  Hence LEDBAT is indistinguishable from conventional TCP 
under AQM.

With FQ, each flow gets a fair share of the bandwidth.  But the *application* 
using 50 flows gets 50 times as much bandwidth as the application using only 1 
flow.  If the single-flow application is something elastic like a Web browser 
or checking e-mail, that might be tolerable.

But if the single-flow application is inelastic (as VoIP usually is), and needs 
more than 2% of the link bandwidth to work properly, that’s a problem if it’s 
competing against 50 flows.  That’s one of the Cloonan paper’s results; what 
they recommended was to use FQ with a small number of queues, so that this 
drawback was mitigated by way of hash collisions.

Adding Diffserv and recommending that LEDBAT applications use the “background” 
traffic class (CS1 DSCP) solves this problem more elegantly.  The share of 
bandwidth used by BitTorrent (say) is then independent of the number of flows 
it uses, and it also makes sense to configure FQ for ideal flow isolation 
rather than for mitigation.

 - Jonathan Morton



Re: [Cerowrt-devel] [Bloat] heisenbug: dslreports 16 flow test vs cablemodems

2015-05-18 Thread Jonathan Morton

> On 18 May, 2015, at 20:03, Sebastian Moeller  wrote:
> 
>> Adding Diffserv and recommending that LEDBAT applications use the
>> “background” traffic class (CS1 DSCP) solves this problem more
>> elegantly.  The share of bandwidth used by BitTorrent (say) is then
>> independent of the number of flows it uses, and it also makes sense to
>> configure FQ for ideal flow isolation rather than for mitigation.
> 
> I wonder, for this to work well wouldn't we need to allow/honor at least CS1 
> marks on ingress? I remember there was some discussion about mislabeled 
> traffic on ingress (Comcast I believe), do you see an easy way around that 
> issue?

I don’t know much about the characteristics of this mislabelling.  Presumably 
though, Comcast is using DSCP remarking in an attempt to manage internal 
congestion.  If latency-sensitive and/or inelastic traffic is getting marked 
CS1, that would be a real problem, and Comcast would need leaning on to fix it. 
 It’s slightly less serious if general best-effort traffic gets CS1 markings.

One solution would be to re-mark the traffic at the CPE on ingress, using local 
knowledge of what traffic is important and which ports are associated with 
BitTorrent.  Unfortunately, the ingress qdisc runs before iptables, making that 
more difficult.  I think it would be necessary to do re-marking using an 
ingress action before passing it to the qdisc.  Either that, or a pseudo-qdisc 
which just does the re-marking before handing the packet up the stack.

I’m not sure whether it’s possible to attach two ingress actions to the same 
interface, though.  If not, the re-marking action module would also need to 
incorporate act_mirred functionality, or a minimal subset thereof.
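For illustration, one plausible shape for that action chain (untested sketch: the port and DSCP are stand-ins, pedit syntax varies across iproute2 versions, and this form of ToS rewrite clobbers the ECN bits too):

```shell
# Redirect WAN ingress through ifb0, first re-marking assumed
# BitTorrent traffic (port 6881 is a stand-in) to CS1 (ToS byte 0x20).
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip prio 1 u32 \
    match ip dport 6881 0xffff \
    action pedit munge ip tos set 0x20 pipe \
    action mirred egress redirect dev ifb0
# Everything else goes to ifb0 unmodified.
tc filter add dev eth0 parent ffff: protocol all prio 99 u32 \
    match u32 0 0 \
    action mirred egress redirect dev ifb0
```

Note that this chains two actions on one filter (pedit piped into mirred), which sidesteps the question of attaching two separate ingress actions.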

 - Jonathan Morton



Re: [Cerowrt-devel] FireWRT router: cheap, AC1200 and runs OpenWRT

2015-05-24 Thread Jonathan Morton

> On 19 May, 2015, at 03:56, Jeremy Iliev  wrote:
> 
> Might be worth a look into as a potential candidate for the make-wifi-fast 
> router.

Apparently, the wifi chipsets are "MT7612E and MT7602E”.  Is the driver support 
for them any good?

 - Jonathan Morton



Re: [Cerowrt-devel] ingress rate limiting falling short

2015-06-03 Thread Jonathan Morton
> On the 3800, it never meets the rate, but it's only off by maybe 5%.

That's about right for Ethernet, IPv4 and TCP header overheads with 1500
MTU. The measured throughput is application level, while HTB controls at
the Ethernet level.
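A back-of-envelope check of that ~5% (my arithmetic, assuming IPv4 with TCP timestamps and a 1500-byte MTU):

```python
# HTB meters whole Ethernet frames; the measured figure is TCP goodput.
MSS = 1500 - 20 - 20 - 12        # IPv4 + TCP headers + timestamp option
frame = 1500 + 14                # plus the Ethernet header HTB sees
goodput_fraction = MSS / frame   # about 0.956, i.e. a 4-5% gap
```

Counting the Ethernet FCS, preamble, and inter-frame gap as well pushes the shortfall closer to 6%, so "maybe 5%" is right in the expected range.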

- Jonathan Morton


Re: [Cerowrt-devel] ingress rate limiting falling short

2015-06-03 Thread Jonathan Morton
Remind me: does HTB have a divide in the fast path? ARMv6 and ARMv7-A CPUs
don't have a hardware integer divide, so that can really hurt.

This is fixed I think in ARMv8 and definitely in AArch64, but divides are
still expensive instructions on any CPU.

- Jonathan Morton


Re: [Cerowrt-devel] [Cake] openwrt build available with latest cake and fq_pie

2015-06-14 Thread Jonathan Morton

> On 14 Jun, 2015, at 19:09, Dave Taht  wrote:
> 
> I do pretty strongly think count - 1 is the rightest thing still.

I really don’t.  Here’s why:

Every time Codel triggers the dropping state, it will mark or drop at least one 
packet, and increment count by that number.  With count decremented only by 1 
on recovery, it will effectively remain constant *if*, by some miracle, the 
queue empties before the second signal was sent; it cannot decrease between 
episodes unless it resets or wraps.

With count decremented by 2 on recovery, it is possible for count to decrease 
slowly in that ideal case, but it’ll remain constant if two signals were sent 
before the queue cleared, and - this is important - it will always continue to 
increase if three or more signals are sent before the queue empties.

If one signal did suffice to clear the queue, then logically the value of count 
was irrelevant to that congestion episode and shouldn’t be preserved.  This is 
true regardless of the actual reason the queue emptied.

The problem arises when more than one signal is sent before the queue is 
observed to clear.  This could be a sign of several distinct network conditions:

- The RTT is longer than interval / sqrt(count), in which case one signal would 
still have been sufficient, and the ideal value of count is less than its 
current value.  On non-ECN TCP flows, this results in more retransmissions than 
necessary.

- The RTT is much shorter than interval / sqrt(count), so the congestion window 
is recovering faster than the signalling rate, and count needs to increase to 
compensate for that.

- There is more than one flow sharing the queue, and it was necessary to signal 
to all of them, in which case count should reflect the flow count and be 
capable of adjusting both up and down.

- The flow is unresponsive, so count should adjust to provide the correct 
dropping rate, and RTT is irrelevant.  With default parameters, the maximum 
drop rate is presently 25600 pps (which would cause count to wrap after a few 
seconds, until I put in the saturating arithmetic).

How does Codel distinguish between those cases?  It can’t - at least, not 
reliably.  So it must allow count to increase until the queue is observed to be 
controlled, and then decrease count by some other means to cover the case where 
it was overestimated.  For this latter phase, count-2 is obviously insufficient 
to cope with the case where count is actually correct, but more than one signal 
per episode is required.

*That* is why I put in count/2.  A multiplicative decrease allows count to 
stabilise at some value which adequately controls the queue, rather than 
continuously increasing past it.  For the typical cake case where there is one 
flow per Codel instance and the RTT is of Internet scale, this should work at 
least as well as an additive decrease; in particular, the behaviour is 
identical where count ended at 2, 3 or 4 (it can’t end at 1).

Of course, hard data would help to evaluate it, but I do think it’s 
theoretically sound.
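The argument above can be illustrated with a toy model (my own sketch, not Codel's actual state machine): hold the number of signals per congestion episode constant at three, and compare how count evolves under each recovery rule.

```python
def next_count(count, signals, strategy):
    """Count after one dropping episode that emitted `signals`
    marks/drops, followed by recovery under the given strategy."""
    count += signals                    # each signal increments count
    if strategy == "minus1":
        return max(count - 1, 1)
    if strategy == "minus2":
        return max(count - 2, 1)
    return max(count // 2, 1)           # "half": multiplicative decrease

results = {}
for strategy in ("minus1", "minus2", "half"):
    count = 1
    for _ in range(10):                 # ten congestion episodes
        count = next_count(count, signals=3, strategy=strategy)
    results[strategy] = count
    print(strategy, count)
```

With three signals per episode, the additive rules grow without bound (count reaches 21 and 11 after ten episodes) while count/2 stabilises at a small value, matching the claim that additive decrease cannot cope when three or more signals are needed per episode.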

 - Jonathan Morton



Re: [Cerowrt-devel] [Cake] openwrt build available with latest cake and fq_pie

2015-06-14 Thread Jonathan Morton

> On 14 Jun, 2015, at 20:38, Dave Taht  wrote:
> 
>> Every time Codel triggers the dropping state, it will mark or drop at least 
>> one packet, and increment count by that number.  With count decremented only 
>> by 1 on recovery, it will effectively remain constant *if*, by some miracle, 
>> the queue empties before the second signal was sent; it cannot decrease 
>> between episodes unless it resets or wraps.
> 
> It aint a miracle, it is hopefully within an rtt.

No, it is *at minimum* one RTT.  It takes that long for the congestion signal 
to reach the receiver, be reflected back to the sender, and for the sender’s 
reaction to *begin to* appear at the queue.  Then the queue *starts* to empty, 
if the signal is what’s required to make it do so.

> When resuming the drop phase of codel, it is almost *already* too late
> to catch that burst incurring the latency.

Yes, but that’s what FQ is for.  And ELR, if we ever get that properly started.

> Sometimes I think we need to do away with the count idea and measure
> slopes of curves instead, and "harmonics”.


> Is there any reason why the decrease couldn't be some sort of decay?
> I.e. a function of how long ago the drop state was exited?

Such things are theoretically possible, but require further thought to 
determine how best to do them.

 - Jonathan Morton



Re: [Cerowrt-devel] [Cake] openwrt build available with latest cake and fq_pie

2015-06-14 Thread Jonathan Morton

> On 14 Jun, 2015, at 21:24, Dave Taht  wrote:
> 
> Flows, btw, do end quite rapidly in the real world. What was it, 95%
> of all web flows ended inside of IW10?

It might be worth thinking about how heavily loaded a network needs to be for 
Codel to trigger on such a flow.  However, with perfect flow isolation, count 
will start at 1, making it less relevant to the present thread.

Cake will start triggering on an instantly-arriving burst (call it a packet 
salvo) after 35ms.  Thus, a ten-packet burst will not trigger provided at least 
4.5 Mbps is available to that flow.  However, on many links that is still a 
tall order, since if the flows really are that short, there are probably lots 
of them in parallel.

A paced burst is much more friendly.  At a 100ms RTT, IW10 could be delivered 
at 100pps (1.5 Mbps), extending the range of link speeds on which congestion 
signalling will not occur by at least 3x (and generally more).  Since this 
greatly reduces the risk of packet loss, it might actually reduce the average 
time that the sender needs to maintain the connection’s buffers, despite the 
deliberate 1-RTT delay introduced.
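As a back-of-the-envelope check on those figures (assuming 1500-byte packets and applying the 35ms trigger directly; my arithmetic comes out somewhat below the 4.5 Mbps quoted above, the gap plausibly covered by per-packet framing overhead or a different effective interval):

```python
def min_rate_bps(packets, packet_bytes, trigger_s):
    """Lowest link rate at which the last packet of an instantly-arriving
    burst drains before its sojourn time reaches the trigger threshold."""
    return packets * packet_bytes * 8 / trigger_s

rate = min_rate_bps(packets=10, packet_bytes=1500, trigger_s=0.035)
print(f"{rate / 1e6:.2f} Mbps")   # → 3.43 Mbps under these assumptions
```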

 - Jonathan Morton



Re: [Cerowrt-devel] Latest build test - new sqm-scripts seem to work; "cake overhead 40" [still] didn't

2015-06-20 Thread Jonathan Morton
I can probably glean more information from "tc -s qdisc".

However, my hypothesis is still that the version of cake you have is old,
while the version of tc that you have is newer.

- Jonathan Morton


Re: [Cerowrt-devel] Latest build test - new sqm-scripts seem to work; "cake overhead 40" [still] didn't

2015-06-20 Thread Jonathan Morton
It looks like your cake is new enough to support set associative hashing,
but not the new overhead handling. The ATM flag was put in a long time ago.

Looking at the code which grabs those options (cake_change), there doesn't
seem to be a way to detect whether an unsupported option was provided by
userspace, unless nla_parse_nested returns an error if the provided option
struct would be overflowed. Clearly it doesn't, but just truncates it to
fit.

- Jonathan Morton


Re: [Cerowrt-devel] performance numbers from WRT1200AC (Re: Latest build test - new sqm-scripts seem to work; "cake overhead 40" didn't)

2015-06-23 Thread Jonathan Morton
Not so easy to find those in Finland, it seems, but I assume Amazon carry
them.

- Jonathan Morton


Re: [Cerowrt-devel] performance numbers from WRT1200AC (Re: Latest build test - new sqm-scripts seem to work; "cake overhead 40" didn't)

2015-06-26 Thread Jonathan Morton
Hypothesis: this might have to do with the receive path. Some devices might
have more capacity than others to buffer inbound packets until the CPU can
get around to servicing them.

- Jonathan Morton


Re: [Cerowrt-devel] performance numbers from WRT1200AC (Re: Latest build test - new sqm-scripts seem to work; "cake overhead 40" didn't)

2015-06-26 Thread Jonathan Morton
These would be hardware tail drops - there might not be a physical counter
recording them. But you could instrument the driver to see whether the
receive buffer is full when serviced.

- Jonathan Morton


Re: [Cerowrt-devel] performance numbers from WRT1200AC (Re: Latest build test - new sqm-scripts seem to work; "cake overhead 40" didn't)

2015-06-28 Thread Jonathan Morton
To be honest, HTB + cake isn't really the preferred configuration.

- Jonathan Morton


Re: [Cerowrt-devel] performance numbers from WRT1200AC (Re: Latest build test - new sqm-scripts seem to work; "cake overhead 40" didn't)

2015-06-29 Thread Jonathan Morton
I'd also like to be able to try it out on CPE hardware. However, what I've
got is a Buffalo H300N, so I'll need build instructions (preferably
starting from an existing stock build) as well as setup.

The Buffalo isn't as powerful as some others, being based around a 34K core.

- Jonathan Morton


Re: [Cerowrt-devel] [Cake] peeling harder with cake

2015-07-02 Thread Jonathan Morton

> On 3 Jul, 2015, at 04:27, Dave Taht  wrote:
> 
> Also got more throughput for some reason.

Is the NIC doing software GSO or does it have hardware support?  If the former, 
it would suggest that software GSO is a universally bad idea and should be 
excised.  If the latter, GSO should be disabled full stop for this hardware, so 
we can stop fannying about with peeling.

 - Jonathan Morton



Re: [Cerowrt-devel] Correct syntax for cake commands and atm issues.

2015-07-10 Thread Jonathan Morton
You're already using correct syntax - I've written it to be quite lenient
and use sensible defaults for missing information. There are several sets
of keywords and parameters which are mutually orthogonal, and don't depend
on each other, so "besteffort" has nothing to do with "overhead" or "atm".

What's probably happening is that you're using a slightly old version of
the cake kernel module which lacks the overhead parameter entirely, but a
more up to date tc which does support it. We've seen this combination crop
up ourselves recently.

- Jonathan Morton


Re: [Cerowrt-devel] Correct syntax for cake commands and atm issues.

2015-07-10 Thread Jonathan Morton
Qdiscs should be used on any link that might become a bottleneck. In most
consumer cases, that will indeed be your WAN interface.

- Jonathan Morton


Re: [Cerowrt-devel] Correct syntax for cake commands and atm issues.

2015-07-10 Thread Jonathan Morton
> qdisc cake 8002: dev pppoe-wan root refcnt 2 bandwidth 850Kbit besteffort 
> flows raw

> qdisc cake 8001: dev ifb4pppoe-wan root refcnt 2 bandwidth 11500Kbit 
> besteffort flows atm overhead 40

> Download:  6.8 Mbps
>   Upload:  0.59 Mbps

Does anyone else see the discrepancy here?

Simply put, if the shaper isn’t set to a lower bandwidth than the link rate, it 
won’t control the queue.
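To make the point concrete, a common rule of thumb (my own sketch; the 10% margin is an assumption, not a value from this thread) is to set the shaper a few percent below the measured link rate, so the queue forms where the qdisc can see it rather than in the ISP's buffer:

```python
def shaper_kbit(measured_kbit, margin=0.10):
    """Suggested shaper 'bandwidth' setting: measured link rate minus a
    margin, so this qdisc - not the upstream buffer - is the bottleneck."""
    return int(measured_kbit * (1.0 - margin))

# Hypothetical figures for an 11500/850 kbit line like the one quoted:
print(shaper_kbit(11500))   # downstream → 10350
print(shaper_kbit(850))     # upstream   → 765
```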

 - Jonathan Morton



Re: [Cerowrt-devel] Cerowrt-devel Digest, Vol 44, Issue 24

2015-07-19 Thread Jonathan Morton
> We were on the verge of enabling it on our (the UUNET) end when
Louis Mamakos identified the fundamental show-stopper to doing it.
>
> It gives DOS attacks nuclear weapons.
>
> Simply set the DOS packets to the highest priority and pound away.

I identified this problem when designing cake, and came up with a
solution:  Every request for higher priority (low latency) is also
interpreted as a relinquishment of rights over high bandwidth.

In an early version, this tenet was enforced using hard limits. This worked
as designed, but caused problems for users attempting to tune their
bandwidth setting using best effort traffic, since there was also a least
effort class below that.

In the current version, a bandwidth threshold is used instead. If the
traffic in the class remains below the threshold, then they get the (non
strict) priority requested. If it strays above, the priority is demoted
below other classes instead. In the absence of competing traffic, any class
can use the full available bandwidth, but there's always room for other
classes to start up.
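The threshold mechanism described above can be sketched roughly as follows (my own illustrative pseudologic, not cake's actual code; the names, rates and priority values are invented):

```python
def effective_priority(requested, class_rate_bps, threshold_bps, demoted=9):
    """Below its bandwidth threshold, a class keeps the (soft) priority
    it requested; above the threshold, it is demoted below the other
    classes instead of being allowed to starve them."""
    return requested if class_rate_bps <= threshold_bps else demoted

# A low-rate voice class keeps its elevated priority (0 = highest)...
assert effective_priority(0, class_rate_bps=100_000, threshold_bps=500_000) == 0
# ...while a flood claiming the same marking drops below best effort.
assert effective_priority(0, class_rate_bps=5_000_000, threshold_bps=500_000) == 9
print("ok")
```

This is what defuses the DoS scenario: marking attack traffic as high priority only buys it priority while it stays low-rate, which a flood by definition does not.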

None of this behaviour is specified, suggested or even identified as
desirable in the relevant RFCs. I had to invent it out of whole cloth,
after recognising that Diffserv is simply not specified in a way that can
be practically implemented, nor from an implementor's point of view. The old
version of the TOS byte was much clearer in that respect - three bits of
precedence, three or four bits of routing preferences (although the latter
was also poorly specified, it was at least clear what it meant).

Frankly I think IETF dropped the ball there. "Rough consensus and working
code." I find it difficult to believe that they had working code
implementing a complete Diffserv system.

- Jonathan Morton

