Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-06 Thread Sebastian Moeller
Hi Sergey,


> On May 4, 2020, at 19:04, Sergey Fedorov  wrote:
> 
> Sergey - I wasn't assuming anything about fast.com. The document you shared 
> wasn't clear about the methodology's details here. Others sadly, have 
> actually used ICMP pings in the way I described. I was making a generic 
> comment of concern.
> 
> That said, it sounds like what you are doing is really helpful (esp. given 
> that your measure is aimed at end user experiential qualities).
> David - my apologies, I incorrectly interpreted your statement as being said 
> in context of fast.com measurements. The blog post linked indeed doesn't 
> provide the latency measurement details - was written before we added the 
> extra metrics. We'll see if we can publish an update. 
> 
> 1) a clear definition of lag under load that is from end-to-end in latency, 
> and involves, ideally, independent traffic from multiple sources through the 
> bottleneck.
> Curious if by multiple sources you mean multiple clients (devices) or 
> multiple connections sending data? 

Not trying to speak for David obviously, but the dslreports speedtest, 
when using multiple streams, mostly recruited streams from different server 
locations and reported these locations in some of the detailed report parts. 
For normal use that level of detail is overkill, but for problematic cases it 
was really elucidating (they reported the retransmit count for up to 5 server 
sites):




Server                               Nett Speed     Avg RTT / Jitter  Avg Re-xmit  Avg Cwnd
Singapore (softlayer)                d1  7.3 Mb/s   200.5±7ms         0.1%         154
Houston, USA (softlayer)             d3  3.07 Mb/s  157.6±3.6ms       0.4%         125
Dallas, USA (softlayer)              d3  2.65 Mb/s  150.1±3.3ms       0.6%         131
San Jose, USA (softlayer)            d3  2.77 Mb/s  185.6±5ms         0.5%         126
Nashville, TN, USA (Twinlakes coop)  d3  2.34 Mb/s  127.6±4ms         0.6%         76


Run Log:
0.00s setting download file size to 40mb max for Safari
0.00s Start testing DSL
00.43s Servers available: 10
00.46s pinging 10 locations
01.66s geo location failed
05.47s 19ms Amsterdam, Netherlands, EU
05.47s 63ms Nashville, TN, USA
05.47s 72ms Dallas, USA
05.47s 75ms Houston, USA
05.47s 89ms San Jose, USA
05.47s 96ms Singapore
05.47s could not reach Silver Spring, MD, USA https://t70.dslreports.com
05.47s could not reach Newcastle, Delaware, USA https://t68.dslreports.com
05.47s could not reach Westland, Michigan, USA https://t67.dslreports.com
05.47s could not reach Beaverton, Oregon, USA https://t69.dslreports.com
05.48s 5 seconds measuring idle buffer bloat
10.96s Trial download normal
10.99s Using GET for upload testing
10.99s preference https set to 1
10.99s preference fixrids set to 1
10.99s preference streamsDown set to 16
10.99s preference dnmethod set to websocket
10.99s preference upmethod set to websocket
10.99s preference upduration set to 30
10.99s preference streamsUp set to 16
10.99s preference dnduration set to 30
10.99s preference bloathf set to 1
10.99s preference rids set to [object Object]
10.99s preference compress set to 1
19.11s  stream0 4.71 megabit Amsterdam, Netherlands, EU
19.11s  stream1 2.74 megabit Dallas, USA
19.11s  stream2 4.68 megabit Singapore
19.11s  stream3 2.23 megabit Dallas, USA
19.11s  stream4 3.31 megabit Houston, USA
19.11s  stream5 3.19 megabit Houston, USA
19.11s  stream6 2.83 megabit Amsterdam, Netherlands, EU
19.11s  stream7 1.13 megabit Dallas, USA
19.11s  stream8 2.15 megabit Amsterdam, Netherlands, EU
19.11s  stream9 2.35 megabit San Jose, USA
19.11s  stream10 1.46 megabit Nashville, TN, USA
19.11s  stream11 1.42 megabit Nashville, TN, USA
19.11s  stream12 2.92 megabit Nashville, TN, USA
19.11s  stream13 2.19 megabit Houston, USA
19.11s  stream14 2.16 megabit San Jose, USA
19.11s  stream15 1.2 megabit San Jose, USA
41.26s End of download testing. Starting upload in 2 seconds
43.27s Capping upload streams to 6 because of download result
43.27s starting websocket upload with 16 streams
43.27s minimum upload speed of 0.3 per stream
43.48s sent first packet to t56.dslreports.com
44.08s sent first packet to t59.dslreports.com
44.48s sent first packet to t59.dslreports.com
44.48s sent first packet to t57.dslreports.com
44.68s sent first packet to t56.dslreports.com
44.78s sent first packet to t58.dslreports.com
44.79s got first reply from t56.dslreports.com 221580
44.98s sent first packet to t58.dslreports.com
45.08s sent first packet to t56.dslreports.com
45.14s got first reply from t59.dslreports.com 221580
45.28s sent first packet to t59.dslreports.com
45.53s got first reply from t59.dslreports.com 155106
45.55s got first reply from t57.dslreports.com 70167
45.78s got first reply from t5
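
As a rough illustration of the per-site view discussed above, the per-stream lines of a run log like this one can be summed per server location. This is only a sketch: the line format is inferred from the log excerpt above, and `per_site_throughput` and `STREAM_LINE` are hypothetical names, not part of the dslreports tooling.

```python
import re
from collections import defaultdict

# Matches run-log lines such as
# "19.11s  stream0 4.71 megabit Amsterdam, Netherlands, EU".
# The format is inferred from the excerpt above (an assumption).
STREAM_LINE = re.compile(
    r"^\s*[\d.]+s\s+stream(\d+)\s+([\d.]+)\s+megabit\s+(.+?)\s*$"
)


def per_site_throughput(log_text):
    """Return {location: (stream_count, total_megabit)} for a run log."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in log_text.splitlines():
        match = STREAM_LINE.match(line)
        if match:
            location = match.group(3)
            totals[location][0] += 1
            totals[location][1] += float(match.group(2))
    # Round only for display; lines that do not match are ignored.
    return {loc: (n, round(mbit, 2)) for loc, (n, mbit) in totals.items()}


sample = """\
19.11s  stream0 4.71 megabit Amsterdam, Netherlands, EU
19.11s  stream1 2.74 megabit Dallas, USA
19.11s  stream2 4.68 megabit Singapore
19.11s  stream3 2.23 megabit Dallas, USA
"""

for loc, (n, mbit) in sorted(per_site_throughput(sample).items()):
    print(f"{loc}: {n} stream(s), {mbit} megabit")
```

Grouping by location makes it easier to spot a single slow or lossy server site, which is the kind of detail the table above exposed.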

Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-06 Thread Sebastian Moeller
Dear David,

Thanks for the elaboration below, and indeed I was not appreciating the full 
scope of the challenge.

> On May 3, 2020, at 17:06, David P. Reed  wrote:
> 
> Thanks Sebastian. I do agree that in many cases, reflecting the ICMP off the 
> entry device that has the external IP address for the NAT gets most of the 
> RTT measure, and if there's no queueing built up in the NAT device, that's a 
> reasonable measure. But...

Yes, I see; I really hope that with IPv6 coming more and more online, 
and hence less NAT, end-to-end RTT measurements will be simpler in the future. 
But cue the people who will, for example, recommend to drop/ignore ICMP in the 
name of security theater... It's the same mindset that basically recommends to 
ignore ICMP and/or IP timestamps because of "information leakage", while all 
the information that leaks for a standards-conformant host is the time since 
midnight UTC (and potentially an idea about the difference between the local 
clock setting)... I fail to understand the rationale/threat model behind 
eschewing this... For our purposes one-way timestamps would be most excellent 
to have, to be able to assess on which "leg" overload actually happens.
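
To make the "which leg" idea concrete, here is a minimal sketch of how the three ICMP timestamp fields (originate, receive, transmit, each in milliseconds since midnight UTC per RFC 792) plus the local arrival time could be split into a forward and a return part. The function names are hypothetical, and the example numbers are made up. Because the two clocks are generally not synchronized, the absolute values include the clock offset; only the change between an idle and a loaded measurement is meaningful here.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # ICMP timestamps count ms since midnight UTC


def wrapped_diff(later, earlier):
    """Signed difference in ms that tolerates the midnight wrap-around."""
    diff = (later - earlier) % MS_PER_DAY
    return diff - MS_PER_DAY if diff >= MS_PER_DAY // 2 else diff


def one_way_delays(originate, receive, transmit, arrival):
    """Split one timestamp exchange into (forward, remote, backward) in ms.

    originate: local clock when the request left
    receive:   remote clock when the request arrived
    transmit:  remote clock when the reply left
    arrival:   local clock when the reply came back
    forward and backward each contain the unknown clock offset,
    with opposite signs.
    """
    forward = wrapped_diff(receive, originate)
    remote = wrapped_diff(transmit, receive)
    backward = wrapped_diff(arrival, transmit)
    return forward, remote, backward


# Made-up example: the remote clock runs 500 ms ahead of the local one.
idle = one_way_delays(1000, 1510, 1511, 1021)    # (510, 1, -490)
loaded = one_way_delays(5000, 5590, 5591, 5101)  # (590, 1, -490)

# The offset cancels when comparing loaded against idle:
forward_increase = loaded[0] - idle[0]    # 80 ms more on the forward leg
backward_increase = loaded[2] - idle[2]   # 0 ms more on the return leg
print(forward_increase, backward_increase)
```

In this made-up case the extra delay under load shows up only on the forward leg, which is the kind of localization plain RTT cannot provide.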

> 
> However, if the router has "taken up the queueing delay" by rate limiting its 
> uplink traffic to slightly less than the capacity (as with Cake and other TC 
> shaping that isn't as good as cake), then there is a queue in the TC layer 
> itself. This is what concerns me as a distortion in the measurement that can 
> fool one into thinking the TC shaper is doing a good job, when in fact, lag 
> under load may be quite high from inside the routed domain (the home).

As long as the shaper is instantiated on the NAT box, the latency 
probes reflected by that NAT box will also travel through the shaper. But now 
that you mention it: in SQM we do ingress shaping via an IFB and hence also 
shape the incoming latency probes. However, I have started to recommend doing 
ingress shaping as egress shaping on the LAN-ward interface of a router (to 
avoid the computational cost of the IFB redirection dance, and to allow people 
to use iptables for ingress*), and in such a configuration router-reflected or 
router-emitted WAN probes will avoid the ingress TC queues... 

*) With nftables having a hook at ingress, that second rationale will become 
moot in the near future...
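
For readers unfamiliar with the two setups, the following sketch only assembles and prints the kind of `ip`/`tc` commands involved; it does not run them. The interface names (`eth0`, `ifb0`, `br-lan`) and the rate are placeholders, the helper names are hypothetical, and this is not a copy of what sqm-scripts actually executes.

```python
def ifb_ingress_commands(wan="eth0", ifb="ifb0", rate="50mbit"):
    """Variant 1: shape download traffic on the WAN side via an IFB device."""
    return [
        f"ip link add name {ifb} type ifb",
        f"ip link set dev {ifb} up",
        # Attach an ingress qdisc to the WAN interface ...
        f"tc qdisc add dev {wan} handle ffff: ingress",
        # ... redirect every incoming packet to the IFB ...
        f"tc filter add dev {wan} parent ffff: protocol all u32 match u32 0 0 "
        f"action mirred egress redirect dev {ifb}",
        # ... and shape it there. Packets addressed to the router itself
        # (e.g. echo requests from the WAN) pass this queue too.
        f"tc qdisc add dev {ifb} root cake bandwidth {rate}",
    ]


def lan_egress_commands(lan="br-lan", rate="50mbit"):
    """Variant 2: shape download traffic as egress on the LAN-facing side."""
    # Only traffic forwarded towards the LAN passes this queue; probes that
    # terminate at the router never reach it.
    return [f"tc qdisc add dev {lan} root cake bandwidth {rate}"]


for command in ifb_ingress_commands() + lan_egress_commands():
    print(command)
```

The comments mark the difference that matters for this discussion: in the second variant a probe answered by the router itself never enters the download shaper's queue.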


> 
> As you point out this unmeasured queueing delay can also be a problem with 
> WiFi inside the home. But it isn't limited to that.
> 
> A badly set up shaping/congestion management subsystem inside the NAT can 
> look "very good" in its echo of ICMP packets, but be terrible in response 
> time to trivial HTTP requests from inside, or equally terrible in twitch 
> games and video conferencing.

Good point, and one of Dave's pet peeves: in former times people 
recommended up-prioritizing ICMP packets to make RTT look good, falling exactly 
into the trap you described.

> 
> So, for example, for tuning settings with "Cake" it is useless.

I believe that at least for the way we instantiate things by default in 
SQM-scripts we avoid that pit-fall. What do you think @Toke?

> 
> To be fair, usually the Access Provider has no control of what is done after 
> the cable is terminated at the home, so as a way to decide if the provider is 
> badly engineering its side, a ping from a server is a reasonable quality 
> measure of the provider. 

Most providers in Germany will try to steer customers to rent a wifi 
router from the ISP, so bloat in the wifi link would also be under the 
responsibility of the ISP to some degree, no?


> 
> But not a good measure of the user experience, and if the provider provides 
> the NAT box, even if it has a good shaper in it, like Cake or fq_codel, it 
> will just confuse the user and create the opportunity for a "finger pointing" 
> argument where neither side understands what is going on.
> 
> This is why we need 
> 
> 1) a clear definition of lag under load that is from end-to-end in latency, 
> and involves, ideally, independent traffic from multiple sources through the 
> bottleneck.

I am all for it; in addition, in the past we also reasoned that this 
definition needs to be relatively simple so it can be easily explained, to turn 
naive laypersons into informed amateurs ;) The multiple sources thing is 
something that dslreports did well: they typically tried to serve from 
multiple server sites and reported some stats per site. Now that it is 
basically gone, it becomes clear how much clue went into that speedtest; a pity 
that most of the competition did not follow their lead yet (I am especially 
looking at you, Ookla...).

> 
> 2) ideally, a better way to localize where the queues are building up and 
> present that to users and access providers.  

Yes. Now how to do this robustly and reliably escapes me, albeit 
enabling one-way timestamps might help, then a saturating speedtest could be 
accompani

Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-05 Thread David P. Reed

I think the real test should be multiple clients, not multiple sources, but 
coordinating is hard. The middleboxes on the way may treat distinct IP host 
addresses specially, and of course there is an edge case because a single NIC 
by definition never sends two datagrams at once, which distorts things as you 
look at edge performance issues.
 
The classic problem (Jim Gettys' "Daddy why is the Internet broken?" when 
uploading a big file from Dad's computer affects the web performance of the kid 
in the kid's bedroom) is an example of a UX issue that *really matters*. At HP 
Cambridge Research Lab, I used to have the local network management come to my 
office and yell at me because I was often uploading huge datasets to other HP 
locations, and it absolutely destroyed every other person's web usability when 
I did. (as usual, RTT went to multiple seconds, not affecting my file uploads 
at all, but it was the first example of what was later called Bufferbloat that 
got me focused on the issue of overbuffering.) Turned out that that problem was 
in choosing to use a Frame Relay link with the "don't ever discard packets" 
setting.
That was ALSO the first time I encountered "network experts" who absolutely 
denied that more buffering was bad. They thought that more buffering was GOOD. 
This was shocking, after I realized that almost no-one understood congestion 
was about excess queueing delay.
 
I still see badly misconfigured networks that destroy the ability to do Zoom or 
any other teleconferencing when someone is uploading files. And for some weird, 
weird reason, the work done by the Bloat team is constantly disparaged at IETF, 
to the point that their work isn't influencing anyone outside the 
Linux-based-router community. (Including Arista Networks, where they build 
overbuffered high speed switches and claim that is "a feature", and Andy 
Bechtolsheim refuses to listen to me or anyone else about it).
 
 
On Monday, May 4, 2020 1:04pm, "Sergey Fedorov"  said:



Sergey - I wasn't assuming anything about fast.com. The document you shared 
wasn't clear about the methodology's details here. Others sadly, have actually 
used ICMP pings in the way I described. I was making a generic comment of 
concern.
 
That said, it sounds like what you are doing is really helpful (esp. given that 
your measure is aimed at end user experiential qualities).
David - my apologies, I incorrectly interpreted your statement as being said in 
context of fast.com measurements. The blog post linked indeed doesn't provide 
the latency measurement details - was written before we added the extra 
metrics. We'll see if we can publish an update.
 
1) a clear definition of lag under load that is from end-to-end in latency, and 
involves, ideally, independent traffic from multiple sources through the 
bottleneck.
 
Curious if by multiple sources you mean multiple clients (devices) or multiple 
connections sending data? 
 
SERGEY FEDOROV
Director of Engineering
sfedo...@netflix.com
121 Albright Way | Los Gatos, CA 95032

On Sun, May 3, 2020 at 8:07 AM David P. Reed <dpr...@deepplum.com> wrote:
Thanks Sebastian. I do agree that in many cases, reflecting the ICMP off the 
entry device that has the external IP address for the NAT gets most of the RTT 
measure, and if there's no queueing built up in the NAT device, that's a 
reasonable measure. But...
 
However, if the router has "taken up the queueing delay" by rate limiting its 
uplink traffic to slightly less than the capacity (as with Cake and other TC 
shaping that isn't as good as cake), then there is a queue in the TC layer 
itself. This is what concerns me as a distortion in the measurement that can 
fool one into thinking the TC shaper is doing a good job, when in fact, lag 
under load may be quite high from inside the routed domain (the home).
 
As you point out this unmeasured queueing delay can also be a problem with WiFi 
inside the home. But it isn't limited to that.
 
A badly set up shaping/congestion management subsystem inside the NAT can look 
"very good" in its echo of ICMP packets, but be terrible in response time to 
trivial HTTP requests from inside, or equally terrible in twitch games and 
video conferencing.
 
So, for example, for tuning settings with "Cake" it is useless.
 
To be fair, usually the Access Provider has no control of what is done after 
the cable is terminated at the home, so as a way to decide if the provider is 
badly engineering its side, a ping from a server is a reasonable quality 
measure of the provider. 
 
But not a good measure of the user experience, and if the provider provides the 
NAT box, even if it has a good shaper in it, like Cake or fq_codel, it will 
just confuse the user and create the opportunity for a "finger pointing" 
argument where neither side understands what is going on.
 
This is why we need 
 
1) a

Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-04 Thread Sergey Fedorov via Bloat
>
> Sergey - I wasn't assuming anything about fast.com. The document you
> shared wasn't clear about the methodology's details here. Others sadly,
> have actually used ICMP pings in the way I described. I was making a
> generic comment of concern.
>
> That said, it sounds like what you are doing is really helpful (esp. given
> that your measure is aimed at end user experiential qualities).

David - my apologies, I incorrectly interpreted your statement as being
said in context of fast.com measurements. The blog post linked indeed
doesn't provide the latency measurement details - was written before we
added the extra metrics. We'll see if we can publish an update.

> 1) a clear definition of lag under load that is from end-to-end in latency,
> and involves, ideally, independent traffic from multiple sources through
> the bottleneck.

 Curious if by multiple sources you mean multiple clients (devices) or
multiple connections sending data?


SERGEY FEDOROV

Director of Engineering

sfedo...@netflix.com

121 Albright Way | Los Gatos, CA 95032




On Sun, May 3, 2020 at 8:07 AM David P. Reed  wrote:

> Thanks Sebastian. I do agree that in many cases, reflecting the ICMP off
> the entry device that has the external IP address for the NAT gets most of
> the RTT measure, and if there's no queueing built up in the NAT device,
> that's a reasonable measure. But...
>
>
>
> However, if the router has "taken up the queueing delay" by rate limiting
> its uplink traffic to slightly less than the capacity (as with Cake and
> other TC shaping that isn't as good as cake), then there is a queue in the
> TC layer itself. This is what concerns me as a distortion in the
> measurement that can fool one into thinking the TC shaper is doing a good
> job, when in fact, lag under load may be quite high from inside the routed
> domain (the home).
>
>
>
> As you point out this unmeasured queueing delay can also be a problem with
> WiFi inside the home. But it isn't limited to that.
>
>
>
> A badly set up shaping/congestion management subsystem inside the NAT can
> look "very good" in its echo of ICMP packets, but be terrible in response
> time to trivial HTTP requests from inside, or equally terrible in twitch
> games and video conferencing.
>
>
>
> So, for example, for tuning settings with "Cake" it is useless.
>
>
>
> To be fair, usually the Access Provider has no control of what is done
> after the cable is terminated at the home, so as a way to decide if the
> provider is badly engineering its side, a ping from a server is a
> reasonable quality measure of the provider.
>
>
>
> But not a good measure of the user experience, and if the provider
> provides the NAT box, even if it has a good shaper in it, like Cake or
> fq_codel, it will just confuse the user and create the opportunity for a
> "finger pointing" argument where neither side understands what is going on.
>
>
>
> This is why we need
>
>
>
> 1) a clear definition of lag under load that is from end-to-end in
> latency, and involves, ideally, independent traffic from multiple sources
> through the bottleneck.
>
>
>
> 2) ideally, a better way to localize where the queues are building up and
> present that to users and access providers.  The flent graphs are not
> interpretable by most non-experts. What we need is a simple visualization
> of a sketch-map of the path (like traceroute might provide) with queueing
> delay measures  shown at key points that the user can understand.
>
> On Saturday, May 2, 2020 4:19pm, "Sebastian Moeller" said:
>
> > Hi David,
> >
> > in principle I agree, a NATed IPv4 ICMP probe will be at best reflected
> > at the NAT router (CPE) (some commercial home gateways do not respond to
> > ICMP echo requests in the name of security theatre). So it is pretty hard
> > to measure the full end to end path in that configuration. I believe that
> > IPv6 should make that easier/simpler in that NAT hopefully will be out of
> > the path (but let's see what ingenuity ISPs will come up with).
> > Then again, traditionally the relevant bottlenecks often are a) the
> > internet access link itself and there the CPE is in a reasonable position
> > as a reflector on the other side of the bottleneck as seen from an
> > internet server, b) the home network between CPE and end-host, often with
> > variable rate wifi, here I agree reflecting echos at the CPE hides part of
> > the issue.
> >
> >
> >
> > > On May 2, 2020, at 19:38, David P. Reed  wrote:
> > >
> > > I am still a bit worried about properly defining "latency under load"
> > > for a NAT routed situation. If the test is based on ICMP Ping packets
> > > *from the server*, it will NOT be measuring the full path latency, and if
> > > the potential congestion is in the uplink path from the access provider's
> > > residential box to the access provider's router/switch, it will NOT
> > > measure congestion caused by bufferbloat reliably on either side, since
> > > the bufferbloat will be o

Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-03 Thread Dave Taht
Hmm. Can webrtc set/see the ttl field? dscp? ecn?
I figure it might be able to on linux and osx, but not windows.
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-03 Thread David P. Reed

Thanks Sebastian. I do agree that in many cases, reflecting the ICMP off the 
entry device that has the external IP address for the NAT gets most of the RTT 
measure, and if there's no queueing built up in the NAT device, that's a 
reasonable measure. But...
 
However, if the router has "taken up the queueing delay" by rate limiting its 
uplink traffic to slightly less than the capacity (as with Cake and other TC 
shaping that isn't as good as cake), then there is a queue in the TC layer 
itself. This is what concerns me as a distortion in the measurement that can 
fool one into thinking the TC shaper is doing a good job, when in fact, lag 
under load may be quite high from inside the routed domain (the home).
 
As you point out this unmeasured queueing delay can also be a problem with WiFi 
inside the home. But it isn't limited to that.
 
A badly set up shaping/congestion management subsystem inside the NAT can look 
"very good" in its echo of ICMP packets, but be terrible in response time to 
trivial HTTP requests from inside, or equally terrible in twitch games and 
video conferencing.
 
So, for example, for tuning settings with "Cake" it is useless.
 
To be fair, usually the Access Provider has no control of what is done after 
the cable is terminated at the home, so as a way to decide if the provider is 
badly engineering its side, a ping from a server is a reasonable quality 
measure of the provider. 
 
But not a good measure of the user experience, and if the provider provides the 
NAT box, even if it has a good shaper in it, like Cake or fq_codel, it will 
just confuse the user and create the opportunity for a "finger pointing" 
argument where neither side understands what is going on.
 
This is why we need 
 
1) a clear definition of lag under load that is from end-to-end in latency, and 
involves, ideally, independent traffic from multiple sources through the 
bottleneck.
 
2) ideally, a better way to localize where the queues are building up and 
present that to users and access providers.  The flent graphs are not 
interpretable by most non-experts. What we need is a simple visualization of a 
sketch-map of the path (like traceroute might provide) with queueing delay 
measures  shown at key points that the user can understand.
On Saturday, May 2, 2020 4:19pm, "Sebastian Moeller"  said:



> Hi David,
> 
> in principle I agree, a NATed IPv4 ICMP probe will be at best reflected at
> the NAT router (CPE) (some commercial home gateways do not respond to ICMP
> echo requests in the name of security theatre). So it is pretty hard to
> measure the full end to end path in that configuration. I believe that IPv6
> should make that easier/simpler in that NAT hopefully will be out of the path
> (but let's see what ingenuity ISPs will come up with).
> Then again, traditionally the relevant bottlenecks often are a) the internet
> access link itself and there the CPE is in a reasonable position as a
> reflector on the other side of the bottleneck as seen from an internet
> server, b) the home network between CPE and end-host, often with variable
> rate wifi, here I agree reflecting echos at the CPE hides part of the issue.
> 
> 
> 
> > On May 2, 2020, at 19:38, David P. Reed  wrote:
> >
> > I am still a bit worried about properly defining "latency under load" for a
> > NAT routed situation. If the test is based on ICMP Ping packets *from the
> > server*, it will NOT be measuring the full path latency, and if the
> > potential congestion is in the uplink path from the access provider's
> > residential box to the access provider's router/switch, it will NOT measure
> > congestion caused by bufferbloat reliably on either side, since the
> > bufferbloat will be outside the ICMP Ping path.
> 
> Puzzled, as i believe it is going to be the residential box that will respond
> here, or will it be the AFTRs for CG-NAT that reflect the ICMP echo requests?
> 
> >
> > I realize that a browser based speed test has to be basically run from the
> > "server" end, because browsers are not that good at time measurement on a
> > packet basis. However, there are ways to solve this and avoid the ICMP Ping
> > issue, with a cooperative server.
> >
> > I once built a test that fixed this issue reasonably well. It carefully
> > created a TCP based RTT measurement channel (over HTTP) that made the echo
> > have to traverse the whole end-to-end path, which is the best and only way
> > to accurately define lag under load from the user's perspective. The client
> > end of an unloaded TCP connection can depend on TCP (properly prepared by
> > getting it past slowstart) to generate a single packet response.
> >
> > This "TCP ping" is thus compatible with getting the end-to-end measurement
> > on the server end of a true RTT.
> >
> > It's like tcp-traceroute tool, in that it tricks anyone in the middle boxes
> > into thinking this is a real, serious packet, not an optional low priority
> > packet.
> >
> > The same issue comes up

Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-02 Thread David P. Reed

Sergey - I wasn't assuming anything about fast.com. The document you shared 
wasn't clear about the methodology's details here. Others sadly, have actually 
used ICMP pings in the way I described. I was making a generic comment of 
concern.
 
That said, it sounds like what you are doing is really helpful (esp. given that 
your measure is aimed at end user experiential qualities).
 
Good luck!
 
 
On Saturday, May 2, 2020 3:00pm, "Sergey Fedorov"  said:





Dave, thanks for sharing interesting thoughts and context.
 
> I am still a bit worried about properly defining "latency under load" for a 
> NAT routed situation. If the test is based on ICMP Ping packets *from the 
> server*, it will NOT be measuring the full path latency, and if the 
> potential congestion is in the uplink path from the access provider's 
> residential box to the access provider's router/switch, it will NOT measure 
> congestion caused by bufferbloat reliably on either side, since the 
> bufferbloat will be outside the ICMP Ping path.
> 
> I realize that a browser based speed test has to be basically run from the 
> "server" end, because browsers are not that good at time measurement on a 
> packet basis. However, there are ways to solve this and avoid the ICMP Ping 
> issue, with a cooperative server.
 
This erroneously assumes that fast.com measures latency from the server side. 
It does not. The measurements are done from the client, over http, with a 
parallel connection(s) to the same or similar set of servers, by sending empty 
requests over a previously established connection (you can see that in the 
browser web inspector).
 
It should be noted that the value is not precisely the "RTT on a TCP/UDP flow 
that is loaded with traffic", but "user delay given the presence of heavy 
parallel flows". With that, some of the challenges you mentioned do not apply.
 
In line with another point I've shared earlier - the goal is to measure and 
explain the user experience, not to be a diagnostic tool showing internal 
transport metrics.
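
The client-side approach described here (small requests over an already established HTTP connection) can be sketched roughly as follows. This is not fast.com's code: `timed_requests` and `probe` are hypothetical names, and using `HEAD` as the "empty request" is an assumption made for illustration.

```python
import http.client
import time


def timed_requests(conn, path="/", count=5):
    """Time `count` HEAD requests on an existing HTTP connection (in ms)."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # drain the (empty) body so the connection is reusable
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def probe(host, port=80, path="/", count=5):
    """Open one connection, warm it up, then return `count` latency samples."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        # Warm-up request: pays for the TCP handshake so that the timed
        # samples below only cover request/response round trips.
        timed_requests(conn, path, 1)
        return timed_requests(conn, path, count)
    finally:
        conn.close()
```

Running `probe` once on an idle link and once while bulk transfers are in progress gives the "user delay given the presence of heavy parallel flows" style of comparison, measured entirely from the client.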

SERGEY FEDOROV
Director of Engineering
sfedo...@netflix.com
121 Albright Way | Los Gatos, CA 95032


On Sat, May 2, 2020 at 10:38 AM David P. Reed <dpr...@deepplum.com> wrote:
I am still a bit worried about properly defining "latency under load" for a NAT 
routed situation. If the test is based on ICMP Ping packets *from the server*,  
it will NOT be measuring the full path latency, and if the potential congestion 
is in the uplink path from the access provider's residential box to the access 
provider's router/switch, it will NOT measure congestion caused by bufferbloat 
reliably on either side, since the bufferbloat will be outside the ICMP Ping 
path.
 
I realize that a browser based speed test has to be basically run from the 
"server" end, because browsers are not that good at time measurement on a 
packet basis. However, there are ways to solve this and avoid the ICMP Ping 
issue, with a cooperative server.
 
I once built a test that fixed this issue reasonably well. It carefully created 
a TCP based RTT measurement channel (over HTTP) that made the echo have to 
traverse the whole end-to-end path, which is the best and only way to 
accurately define lag under load from the user's perspective. The client end of 
an unloaded TCP connection can depend on TCP (properly prepared by getting it 
past slowstart) to generate a single packet response.
 
This "TCP ping" is thus compatible with getting the end-to-end measurement on 
the server end of a true RTT.
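
A much simplified relative of such a "TCP ping" is sketched below: one-byte echoes timed over a single established TCP connection with Nagle's algorithm disabled. It is not the test described above (that one ran over HTTP and measured at the server end); here the timing is done at the client, the echo peer is a tiny loopback server used only for demonstration, and all function names are hypothetical.

```python
import socket
import threading
import time


def echo_once(server_sock):
    """Accept one connection and echo every byte back until the peer closes."""
    conn, _ = server_sock.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(64)
            if not data:
                break
            conn.sendall(data)


def tcp_ping(host, port, count=5):
    """Return RTTs in ms measured with 1-byte echoes on one TCP connection."""
    rtts = []
    with socket.create_connection((host, port), timeout=5) as sock:
        # Send each probe byte immediately instead of letting Nagle batch it.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.perf_counter()
            sock.sendall(b"x")
            if not sock.recv(1):
                break  # peer closed the connection
            rtts.append((time.perf_counter() - start) * 1000.0)
    return rtts


def demo():
    """Run tcp_ping against a throwaway echo server on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    threading.Thread(target=echo_once, args=(server,), daemon=True).start()
    try:
        return tcp_ping("127.0.0.1", server.getsockname()[1])
    finally:
        server.close()
```

Because the probes are ordinary TCP segments between the two end hosts, they cross NAT, Wi-Fi and any shaper queues on the way, unlike an ICMP echo that is answered by the home router.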
 
It's like tcp-traceroute tool, in that it tricks anyone in the middle boxes 
into thinking this is a real, serious packet, not an optional low priority 
packet.
 
The same issue comes up with non-browser-based techniques for measuring true 
lag-under-load.
 
Now as we move HTTP to QUIC, this actually gets easier to do.
 
One other opportunity I haven't explored, but which is pregnant with potential 
is the use of WebRTC, which runs over UDP internally. Since JavaScript has 
direct access to create WebRTC connections (multiple ones), this makes detailed 
testing in the browser quite reasonable.
 
And the time measurements can resolve well below 100 microseconds, if the JS is 
based on modern JIT compilation (Chrome, Firefox, Edge all compile to machine 
code speed if the code is restricted and in a loop). Then again, there is 
WebAssembly if you want to write C code that runs fast in the browser. 
WebAssembly is a low-level language that is compiled to machine code by the 
browser, and still has access to all the browser networking facilities.
 
On Saturday, May 2, 2020 12:52pm, "Dave Taht" <dave.t...@gmail.com> said:



> On Sat, May 2, 2020 at 9:37 AM Benjamin Cronce <bcro...@gmail.com> wrote:
> >
> > > Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms
> 
> I guess one of my questions

Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-02 Thread Sebastian Moeller
Hi David,

in principle I agree, a NATed IPv4 ICMP probe will be at best reflected at the 
NAT router (CPE)  (some commercial home gateways do not respond to ICMP echo 
requests in the name of security theatre). So it is pretty hard to measure the 
full end to end path in that configuration. I believe that IPv6 should make 
that easier/simpler in that NAT hopefully will be out of the path (but let's 
see what ingenuity ISPs will come up with).
Then again, traditionally the relevant bottlenecks often are a) the internet 
access link itself and there the CPE is in a reasonable position as a reflector 
on the other side of the bottleneck as seen from an internet server, b) the 
home network between CPE and end-host, often with variable rate wifi, here I 
agree reflecting echos at the CPE hides part of the issue.



> On May 2, 2020, at 19:38, David P. Reed  wrote:
> 
> I am still a bit worried about properly defining "latency under load" for a 
> NAT routed situation. If the test is based on ICMP Ping packets *from the 
> server*,  it will NOT be measuring the full path latency, and if the 
> potential congestion is in the uplink path from the access provider's 
> residential box to the access provider's router/switch, it will NOT measure 
> congestion caused by bufferbloat reliably on either side, since the 
> bufferbloat will be outside the ICMP Ping path.

Puzzled, as I believe it is going to be the residential box that will 
respond here, or will it be the AFTRs for CG-NAT that reflect the ICMP echo 
requests?

>  
> I realize that a browser based speed test has to be basically run from the 
> "server" end, because browsers are not that good at time measurement on a 
> packet basis. However, there are ways to solve this and avoid the ICMP Ping 
> issue, with a cooperative server.
>  
> I once built a test that fixed this issue reasonably well. It carefully 
> created a TCP based RTT measurement channel (over HTTP) that made the echo 
> have to traverse the whole end-to-end path, which is the best and only way to 
> accurately define lag under load from the user's perspective. The client end 
> of an unloaded TCP connection can depend on TCP (properly prepared by getting 
> it past slowstart) to generate a single packet response.
>  
> This "TCP ping" is thus compatible with getting the end-to-end measurement on 
> the server end of a true RTT.
>  
> It's like tcp-traceroute tool, in that it tricks anyone in the middle boxes 
> into thinking this is a real, serious packet, not an optional low priority 
> packet.
>  
> The same issue comes up with non-browser-based techniques for measuring true 
> lag-under-load.
>  
> Now as we move HTTP to QUIC, this actually gets easier to do.
>  
> One other opportunity I haven't explored, but which is pregnant with 
> potential is the use of WebRTC, which runs over UDP internally. Since 
> JavaScript has direct access to create WebRTC connections (multiple ones), 
> this makes detailed testing in the browser quite reasonable.
>  
> And the time measurements can resolve well below 100 microseconds, if the JS 
> is based on modern JIT compilation (Chrome, Firefox, Edge all compile to 
> machine code speed if the code is restricted and in a loop). Then again, 
> there is Web Assembly if you want to write C code that runs in the browser 
> fast. WebAssembly is a low level language that compiles to machine code in 
> the browser execution, and still has access to all the browser networking 
> facilities.

Mmmh, according to https://github.com/w3c/hr-time/issues/56, due to 
Spectre side-channel vulnerabilities many browsers seem to have lowered the 
timer resolution, but even a ~1ms resolution should be fine for typical RTTs.
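
To put a rough number on that, here is a minimal sketch of how much error a
clamped clock adds to a single RTT sample. It is only an illustration: the 1 ms
resolution, the floor-style rounding, and the function names are assumptions,
not something taken from the linked issue or from any browser.

```python
import math


def quantize(t_ms: float, resolution_ms: float) -> float:
    """Round a timestamp down to the timer resolution, as a coarse clock might."""
    return math.floor(t_ms / resolution_ms) * resolution_ms


def measured_rtt(send_ms: float, recv_ms: float, resolution_ms: float) -> float:
    """RTT as seen through the coarse clock: both timestamps are quantized."""
    return quantize(recv_ms, resolution_ms) - quantize(send_ms, resolution_ms)


def rtt_error(send_ms: float, recv_ms: float, resolution_ms: float) -> float:
    """Absolute difference between the coarse-clock RTT and the true RTT."""
    true_rtt = recv_ms - send_ms
    return abs(measured_rtt(send_ms, recv_ms, resolution_ms) - true_rtt)
```

Because each timestamp is off by less than one resolution step, the error on
any single sample stays below that step, so a 20 ms RTT read through a 1 ms
clock is off by less than 5%, and averaging several samples shrinks it further.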

Best Regards
Sebastian

P.S.: I assume that I simply do not see/understand the full scope of the issue 
at hand yet.


>  
> On Saturday, May 2, 2020 12:52pm, "Dave Taht"  said:
> 
> > On Sat, May 2, 2020 at 9:37 AM Benjamin Cronce  wrote:
> > >
> > > > Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms
> > 
> > I guess one of my questions is that with a switch to BBR netflix is
> > going to do pretty well. If fast.com is using bbr, well... that
> > excludes much of the current side of the internet.
> > 
> > > For download, I show 6ms unloaded and 6-7 loaded. But for upload the
> > > loaded shows as 7-8 and I see it blip upwards of 12ms. But I am no longer
> > > using any traffic shaping. Any anti-bufferbloat is from my ISP. A graph of
> > > the bloat would be nice.
> > 
> > The tests do need to last a fairly long time.
> > 
> > > On Sat, May 2, 2020 at 9:51 AM Jannie Hanekom wrote:
> > > >>
> > > >> Michael Richardson :
> > > >> > Does it find/use my nearest Netflix cache?
> > > >>
> > > >> Thankfully, it appears so. The DSLReports bloat test was interesting, but
> > > >> the jitter on the ~240ms base latency from South Africa (and other parts of
> > > >> the world) was significant enough that the fig

Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-02 Thread Sergey Fedorov via Bloat
--- Begin Message ---
Dave, thanks for sharing interesting thoughts and context.

> I am still a bit worried about properly defining "latency under load" for
> a NAT routed situation. If the test is based on ICMP Ping packets *from the
> server*,  it will NOT be measuring the full path latency, and if the
> potential congestion is in the uplink path from the access provider's
> residential box to the access provider's router/switch, it will NOT measure
> congestion caused by bufferbloat reliably on either side, since the
> bufferbloat will be outside the ICMP Ping path.
>
> I realize that a browser based speed test has to be basically run from the
> "server" end, because browsers are not that good at time measurement on a
> packet basis. However, there are ways to solve this and avoid the ICMP Ping
> issue, with a cooperative server.

This erroneously assumes that fast.com measures latency from the server
side. It does not. The measurements are done from the client, over HTTP,
with parallel connection(s) to the same or a similar set of servers, by
sending empty requests over a previously established connection (you can
see that in the browser web inspector).
It should be noted that the value is not precisely the "RTT on a
TCP/UDP flow that is loaded with traffic", but "user delay given the
presence of heavy parallel flows". With that, some of the challenges you
mentioned do not apply.
In line with another point I've shared earlier - the goal is to measure and
explain the user experience, not to be a diagnostic tool showing internal
transport metrics.
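
As a rough illustration of that client-side approach, a minimal Python sketch
follows. It is not the fast.com implementation: `send_empty_request` is a
hypothetical callable standing in for "issue an empty request on an already
open connection and block until the response arrives", and summarising with
the median is an assumption made for this example.

```python
import statistics
import time
from typing import Callable, Dict, List


def probe_rtts_ms(send_empty_request: Callable[[], None], count: int) -> List[float]:
    """Time `count` request/response round trips from the client side, in ms."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        send_empty_request()  # hypothetical: blocks until the response is back
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def latency_report(unloaded: List[float], loaded: List[float]) -> Dict[str, float]:
    """Compare delay without and with heavy parallel flows, using medians."""
    unloaded_ms = statistics.median(unloaded)
    loaded_ms = statistics.median(loaded)
    return {
        "unloaded_ms": unloaded_ms,
        "loaded_ms": loaded_ms,
        "added_ms": loaded_ms - unloaded_ms,
    }
```

One would call `probe_rtts_ms` once before starting the bulk transfers and once
while they are running, then pass both sample lists to `latency_report`; the
`added_ms` value is the extra user-visible delay attributable to the load.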

SERGEY FEDOROV

Director of Engineering

sfedo...@netflix.com

121 Albright Way | Los Gatos, CA 95032



On Sat, May 2, 2020 at 10:38 AM David P. Reed  wrote:

> I am still a bit worried about properly defining "latency under load" for
> a NAT routed situation. If the test is based on ICMP Ping packets *from the
> server*,  it will NOT be measuring the full path latency, and if the
> potential congestion is in the uplink path from the access provider's
> residential box to the access provider's router/switch, it will NOT measure
> congestion caused by bufferbloat reliably on either side, since the
> bufferbloat will be outside the ICMP Ping path.
>
>
>
> I realize that a browser based speed test has to be basically run from the
> "server" end, because browsers are not that good at time measurement on a
> packet basis. However, there are ways to solve this and avoid the ICMP Ping
> issue, with a cooperative server.
>
>
>
> I once built a test that fixed this issue reasonably well. It carefully
> created a TCP based RTT measurement channel (over HTTP) that made the echo
> have to traverse the whole end-to-end path, which is the best and only way
> to accurately define lag under load from the user's perspective. The client
> end of an unloaded TCP connection can depend on TCP (properly prepared by
> getting it past slowstart) to generate a single packet response.
>
>
>
> This "TCP ping" is thus compatible with getting the end-to-end measurement
> on the server end of a true RTT.
>
>
>
> It's like tcp-traceroute tool, in that it tricks anyone in the middle
> boxes into thinking this is a real, serious packet, not an optional low
> priority packet.
>
>
>
> The same issue comes up with non-browser-based techniques for measuring
> true lag-under-load.
>
>
>
> Now as we move HTTP to QUIC, this actually gets easier to do.
>
>
>
> One other opportunity I haven't explored, but which is pregnant with
> potential is the use of WebRTC, which runs over UDP internally. Since
> JavaScript has direct access to create WebRTC connections (multiple ones),
> this makes detailed testing in the browser quite reasonable.
>
>
>
> And the time measurements can resolve well below 100 microseconds, if the
> JS is based on modern JIT compilation (Chrome, Firefox, Edge all compile to
> machine code speed if the code is restricted and in a loop). Then again,
> there is Web Assembly if you want to write C code that runs in the browser
> fast. WebAssembly is a low level language that compiles to machine code in
> the browser execution, and still has access to all the browser networking
> facilities.
>
>
>
> On Saturday, May 2, 2020 12:52pm, "Dave Taht"  said:
>
> > On Sat, May 2, 2020 at 9:37 AM Benjamin Cronce wrote:
> > >
> > > > Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms
> >
> > I guess one of my questions is that with a switch to BBR netflix is
> > going to do pretty well. If fast.com is using bbr, well... that
> > excludes much of the current side of the internet.
> >
> > > For download, I show 6ms unloaded and 6-7 loaded. But for upload the
> > > loaded shows as 7-8 and I see it blip upwards of 12ms. But I am no
> > > longer using any traffic shaping. Any anti-bufferbloat is from my ISP.
> > > A graph of the bloat would be nice.
> >
> > The tests do need to last a fairly long time.
> >
> > > On Sat, May 2, 2020 at 9:51 AM Jannie Hanekom wrote:
> > >>

Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-02 Thread Benjamin Cronce
> Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms

For download, I show 6ms unloaded and 6-7 loaded. But for upload the loaded
shows as 7-8 and I see it blip upwards of 12ms. But I am no longer using
any traffic shaping. Any anti-bufferbloat is from my ISP. A graph of the
bloat would be nice.

On Sat, May 2, 2020 at 9:51 AM Jannie Hanekom  wrote:

> Michael Richardson :
> > Does it find/use my nearest Netflix cache?
>
> Thankfully, it appears so.  The DSLReports bloat test was interesting, but
> the jitter on the ~240ms base latency from South Africa (and other parts of
> the world) was significant enough that the figures returned were often
> unreliable and largely unusable - at least in my experience.
>
> Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms and
> mentions servers located in local cities.  I finally have a test I can share
> with local non-technical people!
>
> (Agreed, upload test would be nice, but this is a huge step forward from
> what I had access to before.)
>
> Jannie Hanekom
>
> ___
> Cake mailing list
> c...@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
>
___
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat


Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-02 Thread David P. Reed

I am still a bit worried about properly defining "latency under load" for a NAT 
routed situation. If the test is based on ICMP Ping packets *from the server*,  
it will NOT be measuring the full path latency, and if the potential congestion 
is in the uplink path from the access provider's residential box to the access 
provider's router/switch, it will NOT measure congestion caused by bufferbloat 
reliably on either side, since the bufferbloat will be outside the ICMP Ping 
path.
 
I realize that a browser based speed test has to be basically run from the 
"server" end, because browsers are not that good at time measurement on a 
packet basis. However, there are ways to solve this and avoid the ICMP Ping 
issue, with a cooperative server.
 
I once built a test that fixed this issue reasonably well. It carefully created 
a TCP based RTT measurement channel (over HTTP) that made the echo have to 
traverse the whole end-to-end path, which is the best and only way to 
accurately define lag under load from the user's perspective. The client end of 
an unloaded TCP connection can depend on TCP (properly prepared by getting it 
past slowstart) to generate a single packet response.
 
This "TCP ping" is thus compatible with getting the end-to-end measurement on 
the server end of a true RTT.
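
A minimal Python sketch of the basic idea, for illustration only: it times a
one-byte echo over an already established TCP connection. It is not the test
described above (that one ran over HTTP and measured on the server end); the
localhost echo server, the function names, and the use of TCP_NODELAY are
assumptions made so the example is self-contained.

```python
import socket
import threading
import time
from typing import List


def tcp_ping_ms(sock: socket.socket, payload: bytes = b"x") -> float:
    """Send a tiny payload on an open TCP connection and time the echo, in ms."""
    start = time.perf_counter()
    sock.sendall(payload)
    received = b""
    while len(received) < len(payload):
        chunk = sock.recv(len(payload) - len(received))
        if not chunk:
            raise ConnectionError("peer closed before echoing the probe")
        received += chunk
    return (time.perf_counter() - start) * 1000.0


def start_echo_server() -> int:
    """Start a throwaway single-connection echo server on localhost; return its port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)

    def serve() -> None:
        conn, _ = listener.accept()
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:
                    break
                conn.sendall(data)
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return listener.getsockname()[1]


def demo(count: int = 5) -> List[float]:
    """Run `count` TCP pings against the local echo server."""
    port = start_echo_server()
    with socket.create_connection(("127.0.0.1", port)) as sock:
        # Disable Nagle so each small probe is sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return [tcp_ping_ms(sock) for _ in range(count)]
```

Against a real cooperative server the same `tcp_ping_ms` call would be made on
a connection that crosses the bottleneck while other flows load it, so the probe
has to traverse the full end-to-end path, NAT included.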
 
It's like the tcp-traceroute tool, in that it tricks any middleboxes on the 
path into thinking this is a real, serious packet, not an optional low-priority 
packet.
 
The same issue comes up with non-browser-based techniques for measuring true 
lag-under-load.
 
Now as we move HTTP to QUIC, this actually gets easier to do.
 
One other opportunity I haven't explored, but which is pregnant with potential 
is the use of WebRTC, which runs over UDP internally. Since JavaScript has 
direct access to create WebRTC connections (multiple ones), this makes detailed 
testing in the browser quite reasonable.
 
And the time measurements can resolve well below 100 microseconds, if the JS is 
based on modern JIT compilation (Chrome, Firefox, Edge all compile to machine 
code speed if the code is restricted and in a loop). Then again, there is 
WebAssembly if you want to write C code that runs fast in the browser. 
WebAssembly is a low-level language that compiles to machine code in the 
browser and still has access to all the browser networking facilities.
 
On Saturday, May 2, 2020 12:52pm, "Dave Taht"  said:



> On Sat, May 2, 2020 at 9:37 AM Benjamin Cronce  wrote:
> >
> > > Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms
> 
> I guess one of my questions is that with a switch to BBR netflix is
> going to do pretty well. If fast.com is using bbr, well... that
> excludes much of the current side of the internet.
> 
> > For download, I show 6ms unloaded and 6-7 loaded. But for upload the loaded
> > shows as 7-8 and I see it blip upwards of 12ms. But I am no longer using any
> > traffic shaping. Any anti-bufferbloat is from my ISP. A graph of the bloat
> > would be nice.
> 
> The tests do need to last a fairly long time.
> 
> > On Sat, May 2, 2020 at 9:51 AM Jannie Hanekom wrote:
> >>
> >> Michael Richardson :
> >> > Does it find/use my nearest Netflix cache?
> >>
> >> Thankfully, it appears so. The DSLReports bloat test was interesting, but
> >> the jitter on the ~240ms base latency from South Africa (and other parts of
> >> the world) was significant enough that the figures returned were often
> >> unreliable and largely unusable - at least in my experience.
> >>
> >> Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms and
> >> mentions servers located in local cities. I finally have a test I can share
> >> with local non-technical people!
> >>
> >> (Agreed, upload test would be nice, but this is a huge step forward from
> >> what I had access to before.)
> >>
> >> Jannie Hanekom
> >>
> 
> 
> 
> --
> Make Music, Not War
> 
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-435-0729


Re: [Bloat] [Cake] [Make-wifi-fast] dslreports is no longer free

2020-05-02 Thread Dave Taht
On Sat, May 2, 2020 at 9:37 AM Benjamin Cronce  wrote:
>
> > Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms

I guess one of my questions is that with a switch to BBR Netflix is
going to do pretty well. If fast.com is using BBR, well... that
excludes much of the current side of the internet.

> For download, I show 6ms unloaded and 6-7 loaded. But for upload the loaded 
> shows as 7-8 and I see it blip upwards of 12ms. But I am no longer using any 
> traffic shaping. Any anti-bufferbloat is from my ISP. A graph of the bloat 
> would be nice.

The tests do need to last a fairly long time.

> On Sat, May 2, 2020 at 9:51 AM Jannie Hanekom  wrote:
>>
>> Michael Richardson :
>> > Does it find/use my nearest Netflix cache?
>>
>> Thankfully, it appears so.  The DSLReports bloat test was interesting, but
>> the jitter on the ~240ms base latency from South Africa (and other parts of
>> the world) was significant enough that the figures returned were often
>> unreliable and largely unusable - at least in my experience.
>>
>> Fast.com reports my unloaded latency as 4ms, my loaded latency as ~7ms and
>> mentions servers located in local cities.  I finally have a test I can share
>> with local non-technical people!
>>
>> (Agreed, upload test would be nice, but this is a huge step forward from
>> what I had access to before.)
>>
>> Jannie Hanekom
>>



-- 
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729