In short, people who build hardware devices, or device drivers, don't understand TCP.
There is a first-class education failure in all this.

We have found almost no device that isn't bloated; the only question is
how badly.
                        - Jim

On Thu, Feb 28, 2013 at 3:58 PM, <dpr...@reed.com> wrote:

> At least someone actually saw what I've been seeing for years now in
> Metro area HSPA and LTE deployments.
>
> As you know, when I first reported this on the e2e list I was told it
> could not possibly be happening and that I didn't know what I was
> talking about. No one in the phone companies was even interested in
> replicating my experiments, just dismissing them. It was sad.
>
> However, I had the same experience on the original Honeywell 6180 dual
> CPU Multics deployment in about 1973. One day all my benchmarks were
> running about 5 times slower every other time I ran the code. I
> suggested that one of the CPUs was running 5x slower, and it was
> probably due to the CPU cache being turned off. The hardware engineer
> on site said that that was *impossible*. After 4 more hours of testing,
> I was sure I was right. That evening, I got him to take the system
> down, and we hauled out an oscilloscope. Sure enough, the gate that
> received the "cache hit" signal had died in one of the processors. The
> machine continued to run, since all that caused was for memory to be
> fetched every time, rather than using the cache.
>
> Besides the value of finding the "root cause" of anomalies, the story
> points out that you really need to understand software and hardware
> sometimes. The hardware engineer didn't understand the role of a cache,
> even though he fully understood timing margins, TTL logic, core memory
> (yes, this machine used core memory), etc.
>
> We both understood oscilloscopes, fortunately.
>
> In some ways this is like the LTE designers understanding TCP. They
> don't. But sometimes you need to know about both in some depth.
>
> Congratulations, Jim. More Internet Plumbing Merit Badges for you.
>
> -----Original Message-----
> From: "Jim Gettys" <j...@freedesktop.org>
> Sent: Thursday, February 28, 2013 3:03pm
> To: "Dave Taht" <dave.t...@gmail.com>
> Cc: "David P Reed" <dpr...@reed.com>,
>     "cerowrt-devel@lists.bufferbloat.net"
>     <cerowrt-devel@lists.bufferbloat.net>
> Subject: Re: [Cerowrt-devel] Google working on experimental 3.8 Linux
>     kernel for Android
>
> I've got a bit more insight into LTE than I did in the past, courtesy
> of the last couple days.
>
> To begin with, LTE runs with several classes of service (they call them
> bearers). Your VOIP traffic goes into one of them.
>
> And I think there is another as well that is for guaranteed bit rate
> traffic. One transmit opportunity may have a bunch of chunks of data,
> and that data may be destined for more than one device (IIRC). It's
> substantially different than WiFi.
>
> But most of what we think of as Internet stuff (web surfing, dns, etc)
> all gets dumped into a single best effort ("BE") class.
>
> The BE class is definitely badly bloated; I can't say how much because
> I don't really know yet (the test my colleague ran wasn't run long
> enough to be confident it filled the buffers). But I will say worse
> than most cable modems I've seen. I expect this will be true to
> different degrees on different hardware. The other traffic classes
> haven't been tested yet for bufferbloat, though I suspect they will
> have it too. I was told that those classes have much shorter queues,
> and when they grow, they dump the whole queues (because delivering late
> real time traffic is useless). But trust *and* verify.... Verification
> hasn't been done for anything but BE traffic, and that hasn't been
> quantified.
>
> But each device gets a "fair" shot at bandwidth in the cell (or sector
> of a cell; they run 3 radios in each cell), where fair is basically
> time based; if you are at the edge of a cell, you'll get a lot less
> bandwidth than someone near a tower; and this fairness is guaranteed by
> a scheduler that runs in the base station (called an eNodeB, IIRC). So
> the base station guarantees some sort of "fairness" between devices (a
> place where Linux's WiFi stack today fails utterly, since there is a
> single queue per device, rather than one per station).
>
> Whether there are bloat problems at the link level in LTE due to error
> correction I don't know yet; but it wouldn't surprise me; I know there
> were in 3G. The people I talked to this morning aren't familiar with
> the HARQ layer in the system.
>
> The base stations are complicated beasts; they have both a Linux system
> in them and a device based on a real-time operating system inside. We
> don't know where the bottleneck(s) are yet. I spent lunch upping their
> paranoia and getting them through some conceptual hurdles (e.g.
> multiple bottlenecks that may move, and the like). They will try to get
> me some of the data so I can help them figure it out. I don't know if
> the data flow goes through the Linux system in the eNodeB or not, for
> example.
>
> Most carriers are now trying to ensure that their backhauls from the
> base station are never congested, though that is another known source
> of problems. And then there is the lack of AQM at peering point
> routers.... You'd think they might run WRED there, but many/most do
> not.
>
> - Jim
>
> On Thu, Feb 28, 2013 at 2:08 PM, Dave Taht <dave.t...@gmail.com> wrote:
>
>> On Thu, Feb 28, 2013 at 1:57 PM, <dpr...@reed.com> wrote:
>>
>>> Doesn't fq_codel need an estimate of link capacity?
>>
>> No, it just measures delay.
>> Since so far as I know the outgoing portion of LTE is not soft-rate
>> limited, but sensitive to the actual available link bandwidth,
>> fq_codel should work pretty well (if the underlying interfaces weren't
>> horribly overbuffered) in that direction.
>>
>> I'm looking forward to some measurements of actual buffering at the
>> device driver/device levels.
>>
>> I don't know how inbound to the handset is managed via LTE.
>>
>> Still quite a few assumptions left to smash in the above.
>>
>> ...
>> in the home router case....
>> ...
>>
>> When there are artificial rate limits in play (in, for example, a
>> cable modem/CMTS, hooked up via gigE yet rate limiting to 24up/4mbit
>> down), then a rate limiter (tbf, htb, hfsc) needs to be applied
>> locally to move that rate limiter/queue management into the local
>> device, so we can manage it better.
>>
>> I'd like to be rid of the need to use htb and come up with a rate
>> limiter that could be adjusted dynamically from a daemon in userspace,
>> probing for short all bandwidth fluctuations while monitoring the
>> load. It needn't send that much data very often, to come up with a
>> stable result....
>>
>> You've described one soft-rate sensing scheme (piggybacking on TCP),
>> and I've thought up a few others, that could feed back from a daemon
>> some samples into a soft(er) rate limiter that would keep control of
>> the queues in the home router. I am thinking it's going to take way
>> too long to fix the CPE and far easier to fix the home router via this
>> method, and certainly it's too painful and inaccurate to merely
>> measure the bandwidth once, then set a hard rate, when
>>
>> So far as I know the Gargoyle project was experimenting with this
>> approach.
>>
>> A problem is in places that connect more than one device to the cable
>> modem... then you end up with those needing to communicate their
>> perception of the actual bandwidth beyond the link.
>>
>>> Where will it get that from the 4G or 3G uplink?
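
To make the exchange above concrete ("Doesn't fq_codel need an estimate
of link capacity?" / "No, it just measures delay"), here is a minimal
Python sketch of a delay-based drop decision. It is not the fq_codel
code: it leaves out flow queuing and CoDel's drop-scheduling control
law, and the names DelayQueue, TARGET and INTERVAL are made up for
illustration (5 ms and 100 ms are the values commonly cited as CoDel
defaults). The only point is that the decision uses how long a packet
sat in the queue, and never a bandwidth figure.

```python
from collections import deque

TARGET = 0.005    # acceptable standing queue delay, seconds (assumed default)
INTERVAL = 0.100  # delay must stay above TARGET this long before a drop


class DelayQueue:
    """Hypothetical FIFO that drops on packet sojourn time alone."""

    def __init__(self):
        self.packets = deque()   # entries are (enqueue_time, payload)
        self.first_above = None  # time after which a drop is allowed
        self.dropped = 0

    def enqueue(self, payload, now):
        # Timestamp on entry; no link rate is recorded anywhere.
        self.packets.append((now, payload))

    def dequeue(self, now):
        """Return the next payload to transmit, or None if empty."""
        if not self.packets:
            self.first_above = None
            return None
        enqueued_at, payload = self.packets.popleft()
        sojourn = now - enqueued_at
        if sojourn < TARGET:
            # Queue is draining quickly enough; leave the drop state.
            self.first_above = None
            return payload
        if self.first_above is None:
            # First packet seen above TARGET: start the grace interval.
            self.first_above = now + INTERVAL
            return payload
        if now >= self.first_above and self.packets:
            # Delay stayed high for a whole INTERVAL: drop this packet
            # and transmit the one behind it instead.
            self.dropped += 1
            _, payload = self.packets.popleft()
        return payload
```

Because the timestamps travel with the packets, the same logic works
whether the link underneath is running fast or slow at that moment,
which is the property that matters on a variable-rate uplink.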
>>>
>>> -----Original Message-----
>>> From: "Maciej Soltysiak" <mac...@soltysiak.com>
>>> Sent: Thursday, February 28, 2013 1:03pm
>>> To: cerowrt-devel@lists.bufferbloat.net
>>> Subject: [Cerowrt-devel] Google working on experimental 3.8 Linux
>>>     kernel for Android
>>>
>>> Hiya,
>>>
>>> Looks like Google's experimenting with 3.8 for Android:
>>> https://android.googlesource.com/kernel/common/+/experimental/android-3.8
>>>
>>> Sounds great if this means they will utilize fq_codel, TFO, BQL, etc.
>>>
>>> Anyway my nexus 7 says it has 3.1.10 and this 3.8 will probably go to
>>> Android 5.0 so I hope Nexus 7 will get it too some day or at least
>>> 3.3+
>>>
>>> Phoronix coverage:
>>> http://www.phoronix.com/scan.php?page=news_item&px=MTMxMzc
>>>
>>> Their 3.8 changelog:
>>> https://android.googlesource.com/kernel/common/+log/experimental/android-3.8
>>>
>>> Regards,
>>> Maciej
>>> _______________________________________________
>>> Cerowrt-devel mailing list
>>> Cerowrt-devel@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>>
>> --
>> Dave Täht
>>
>> Fixing bufferbloat with cerowrt:
>> http://www.teklibre.com/cerowrt/subscribe.html
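
The home router part of the thread asks for a rate limiter whose rate
can be adjusted on the fly by a userspace daemon, instead of a hard
rate set once. A rough Python sketch of that idea follows. It is an
illustration only: TokenBucket, set_rate and try_send are hypothetical
names, this is not how tbf, htb or hfsc are implemented, and how the
daemon would actually estimate the available bandwidth is left out
entirely.

```python
class TokenBucket:
    """Hypothetical token-bucket shaper with a rate that can change."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bps = rate_bps        # current shaping rate, bits/s
        self.burst_bytes = burst_bytes  # bucket depth, bytes
        self.tokens = burst_bytes       # start with a full bucket
        self.last = 0.0                 # time of the last refill

    def _refill(self, now):
        # Credit tokens earned since the last update at the current rate,
        # capped at the bucket depth.
        elapsed = now - self.last
        earned = elapsed * self.rate_bps / 8
        self.tokens = min(self.burst_bytes, self.tokens + earned)
        self.last = now

    def set_rate(self, rate_bps, now):
        # Called by the (assumed) daemon when its bandwidth estimate
        # changes. Settle up at the old rate first so the change only
        # applies from 'now' onward.
        self._refill(now)
        self.rate_bps = rate_bps

    def try_send(self, size_bytes, now):
        """Return True if a packet of size_bytes may be sent now."""
        self._refill(now)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

A daemon would call set_rate() whenever its estimate of the link moves,
so the queue stays in the home router rather than in the modem, and an
AQM such as fq_codel can then manage that local queue.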