Re: [Bloat] virtio_net: BQL?

2021-05-17 Thread Stephen Hemminger
On Mon, 17 May 2021 16:32:21 -0700
Dave Taht  wrote:

> On Mon, May 17, 2021 at 4:00 PM Stephen Hemminger
>  wrote:
> >
> > On Mon, 17 May 2021 14:48:46 -0700
> > Dave Taht  wrote:
> >  
> > > On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
> > >  wrote:  
> > > >
> > > > On Mon, May 17, 2021 at 2:44 PM Dave Taht  wrote:  
> > > > >
> > > > > Not really related to this patch, but is there some reason why virtio
> > > > > has no support for BQL?  
> > > >
> > > > There have been a few attempts to add it over the years.
> > > >
> > > > Most recently, 
> > > > https://lore.kernel.org/lkml/20181205225323.12555-2-...@redhat.com/
> > > >
> > > > That thread has a long discussion. I think the key open issue remains
> > > >
> > > > "The tricky part is the mode switching between napi and no napi."  
> > >
> > > Oy, vey.
> > >
> > > I didn't pay any attention to that discussion, sadly enough.
> > >
> > > It's been about that long (2018) since I paid any attention to
> > > bufferbloat in the cloud and my cloudy provider (linode) switched to
> > > using virtio when I wasn't looking. For over a year now, I'd been
> > > getting reports saying that comcast's pie rollout wasn't working as
> > > well as expected, that evenroute's implementation of sch_cake and sqm
> > > on inbound wasn't working right, nor pf_sense's and numerous other
> > > issues at Internet scale.
> > >
> > > Last week I ran a string of benchmarks against starlink's new services
> > > and was really aghast at what I found there, too. but the problem
> > > seemed deeper than in just the dishy...
> > >
> > > Without BQL, there's no backpressure for fq_codel to do its thing.
> > > None. My measurement servers aren't FQ-codeling
> > > no matter how much load I put on them. Since that qdisc is the default
> > > now in most linux distributions, I imagine that the bulk of the cloud
> > > is now behaving as erratically as linux was in 2011 with enormous
> > > swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
> > > rings, [1], breaking various rate estimators in codel, pie and the tcp
> > > stack itself.
> > >
> > > See:
> > >
> > > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png
> > >
> > > See the swings in latency there? that's symptomatic of tx/rx rings
> > > filling and emptying.
> > >
> > > it wasn't until I switched my measurement server temporarily over to
> > > sch_fq that I got a rrul result that was close to the results we used
> > > to get from the virtualized e1000e drivers we were using in 2014.
> > >
> > > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png
> > >
> > > While I have long supported the use of sch_fq for tcp-heavy workloads,
> > > it still behaves better with bql in place, and fq_codel is better for
> > > generic workloads... but needs bql based backpressure to kick in.
> > >
> > > [1] I really hope I'm overreacting but, um, er, could someone(s) spin
> > > up a new patch that does bql in some way even half right for this
> > > driver and help test it? I haven't built a kernel in a while.
> > >  
> >
> > The Azure network driver (netvsc) also does not have BQL. Several years ago
> > I tried adding it but it benchmarked worse and there is the added complexity
> > of handling the accelerated networking VF path.  
> 
> I certainly agree it adds complexity, but the question is what sort of
> network behavior resulted without backpressure inside the VM?
> 
> What sorts of benchmarks did you do?
> 
> I will get set up to do some testing of this that is less ad hoc.

Less of an issue than it seems for most users.

For the most common case, all transmits are passed through to the underlying
VF network device (Mellanox), and since the Mellanox driver supports BQL,
that path works. The special case is when accelerated networking is disabled
or the host is being serviced, so the slow path is used. Optimizing the slow
path is not that interesting.

I wonder if the use of SR-IOV with virtio (which requires another layer
with the failover device) behaves the same way?


Re: [Bloat] virtio_net: BQL?

2021-05-17 Thread Dave Taht
On Mon, May 17, 2021 at 4:00 PM Stephen Hemminger
 wrote:
>
> On Mon, 17 May 2021 14:48:46 -0700
> Dave Taht  wrote:
>
> > On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
> >  wrote:
> > >
> > > On Mon, May 17, 2021 at 2:44 PM Dave Taht  wrote:
> > > >
> > > > Not really related to this patch, but is there some reason why virtio
> > > > has no support for BQL?
> > >
> > > There have been a few attempts to add it over the years.
> > >
> > > Most recently, 
> > > https://lore.kernel.org/lkml/20181205225323.12555-2-...@redhat.com/
> > >
> > > That thread has a long discussion. I think the key open issue remains
> > >
> > > "The tricky part is the mode switching between napi and no napi."
> >
> > Oy, vey.
> >
> > I didn't pay any attention to that discussion, sadly enough.
> >
> > It's been about that long (2018) since I paid any attention to
> > bufferbloat in the cloud and my cloudy provider (linode) switched to
> > using virtio when I wasn't looking. For over a year now, I'd been
> > getting reports saying that comcast's pie rollout wasn't working as
> > well as expected, that evenroute's implementation of sch_cake and sqm
> > on inbound wasn't working right, nor pf_sense's and numerous other
> > issues at Internet scale.
> >
> > Last week I ran a string of benchmarks against starlink's new services
> > and was really aghast at what I found there, too. but the problem
> > seemed deeper than in just the dishy...
> >
> > Without BQL, there's no backpressure for fq_codel to do its thing.
> > None. My measurement servers aren't FQ-codeling
> > no matter how much load I put on them. Since that qdisc is the default
> > now in most linux distributions, I imagine that the bulk of the cloud
> > is now behaving as erratically as linux was in 2011 with enormous
> > swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
> > rings, [1], breaking various rate estimators in codel, pie and the tcp
> > stack itself.
> >
> > See:
> >
> > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png
> >
> > See the swings in latency there? that's symptomatic of tx/rx rings
> > filling and emptying.
> >
> > it wasn't until I switched my measurement server temporarily over to
> > sch_fq that I got a rrul result that was close to the results we used
> > to get from the virtualized e1000e drivers we were using in 2014.
> >
> > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png
> >
> > While I have long supported the use of sch_fq for tcp-heavy workloads,
> > it still behaves better with bql in place, and fq_codel is better for
> > generic workloads... but needs bql based backpressure to kick in.
> >
> > [1] I really hope I'm overreacting but, um, er, could someone(s) spin
> > up a new patch that does bql in some way even half right for this
> > driver and help test it? I haven't built a kernel in a while.
> >
>
> The Azure network driver (netvsc) also does not have BQL. Several years ago
> I tried adding it but it benchmarked worse and there is the added complexity
> of handling the accelerated networking VF path.

I certainly agree it adds complexity, but the question is what sort of
network behavior resulted without backpressure inside the VM?

What sorts of benchmarks did you do?

I will get set up to do some testing of this that is less ad hoc.


-- 
Latest Podcast:
https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/

Dave Täht CTO, TekLibre, LLC


Re: [Bloat] virtio_net: BQL?

2021-05-17 Thread Stephen Hemminger
On Mon, 17 May 2021 14:48:46 -0700
Dave Taht  wrote:

> On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
>  wrote:
> >
> > On Mon, May 17, 2021 at 2:44 PM Dave Taht  wrote:  
> > >
> > > Not really related to this patch, but is there some reason why virtio
> > > has no support for BQL?  
> >
> > There have been a few attempts to add it over the years.
> >
> > Most recently, 
> > https://lore.kernel.org/lkml/20181205225323.12555-2-...@redhat.com/
> >
> > That thread has a long discussion. I think the key open issue remains
> >
> > "The tricky part is the mode switching between napi and no napi."  
> 
> Oy, vey.
> 
> I didn't pay any attention to that discussion, sadly enough.
> 
> It's been about that long (2018) since I paid any attention to
> bufferbloat in the cloud and my cloudy provider (linode) switched to
> using virtio when I wasn't looking. For over a year now, I'd been
> getting reports saying that comcast's pie rollout wasn't working as
> well as expected, that evenroute's implementation of sch_cake and sqm
> on inbound wasn't working right, nor pf_sense's and numerous other
> issues at Internet scale.
> 
> Last week I ran a string of benchmarks against starlink's new services
> and was really aghast at what I found there, too. but the problem
> seemed deeper than in just the dishy...
> 
> Without BQL, there's no backpressure for fq_codel to do its thing.
> None. My measurement servers aren't FQ-codeling
> no matter how much load I put on them. Since that qdisc is the default
> now in most linux distributions, I imagine that the bulk of the cloud
> is now behaving as erratically as linux was in 2011 with enormous
> swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
> rings, [1], breaking various rate estimators in codel, pie and the tcp
> stack itself.
> 
> See:
> 
> http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png
> 
> See the swings in latency there? that's symptomatic of tx/rx rings
> filling and emptying.
> 
> it wasn't until I switched my measurement server temporarily over to
> sch_fq that I got a rrul result that was close to the results we used
> to get from the virtualized e1000e drivers we were using in 2014.
> 
> http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png
> 
> While I have long supported the use of sch_fq for tcp-heavy workloads,
> it still behaves better with bql in place, and fq_codel is better for
> generic workloads... but needs bql based backpressure to kick in.
> 
> [1] I really hope I'm overreacting but, um, er, could someone(s) spin
> up a new patch that does bql in some way even half right for this
> driver and help test it? I haven't built a kernel in a while.
> 

The Azure network driver (netvsc) also does not have BQL. Several years ago
I tried adding it but it benchmarked worse and there is the added complexity
of handling the accelerated networking VF path.



Re: [Bloat] virtio_net: BQL?

2021-05-17 Thread Dave Taht
On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn
 wrote:
>
> On Mon, May 17, 2021 at 2:44 PM Dave Taht  wrote:
> >
> > Not really related to this patch, but is there some reason why virtio
> > has no support for BQL?
>
> There have been a few attempts to add it over the years.
>
> Most recently, 
> https://lore.kernel.org/lkml/20181205225323.12555-2-...@redhat.com/
>
> That thread has a long discussion. I think the key open issue remains
>
> "The tricky part is the mode switching between napi and no napi."

Oy, vey.

I didn't pay any attention to that discussion, sadly enough.

It's been about that long (2018) since I paid any attention to
bufferbloat in the cloud and my cloudy provider (linode) switched to
using virtio when I wasn't looking. For over a year now, I'd been
getting reports saying that comcast's pie rollout wasn't working as
well as expected, that evenroute's implementation of sch_cake and sqm
on inbound wasn't working right, nor pf_sense's and numerous other
issues at Internet scale.

Last week I ran a string of benchmarks against starlink's new services
and was really aghast at what I found there, too. but the problem
seemed deeper than in just the dishy...

Without BQL, there's no backpressure for fq_codel to do its thing.
None. My measurement servers aren't FQ-codeling
no matter how much load I put on them. Since that qdisc is the default
now in most linux distributions, I imagine that the bulk of the cloud
is now behaving as erratically as linux was in 2011 with enormous
swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
rings, [1], breaking various rate estimators in codel, pie and the tcp
stack itself.
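
(Aside, for anyone who wants to check their own guest: with CONFIG_BQL the
kernel exposes /sys/class/net/<dev>/queues/tx-*/byte_queue_limits/; on a
driver that never calls the BQL hooks, the inflight count there stays at
zero and the limit never engages.)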

See:

http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png

See the swings in latency there? that's symptomatic of tx/rx rings
filling and emptying.

it wasn't until I switched my measurement server temporarily over to
sch_fq that I got a rrul result that was close to the results we used
to get from the virtualized e1000e drivers we were using in 2014.

http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png

While I have long supported the use of sch_fq for tcp-heavy workloads,
it still behaves better with bql in place, and fq_codel is better for
generic workloads... but needs bql based backpressure to kick in.

[1] I really hope I'm overreacting but, um, er, could someone(s) spin
up a new patch that does bql in some way even half right for this
driver and help test it? I haven't built a kernel in a while.
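
For whoever takes a run at it: the BQL driver API itself is tiny. Here's a
minimal sketch of the hook points against a generic driver shape (the
function names are placeholders, this is not actual virtio_net code, and
the napi/no-napi mode switching that stalled the 2018 attempt is
deliberately left out):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Transmit path: account bytes once the skb is posted to the tx ring. */
static netdev_tx_t sketch_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct netdev_queue *txq =
		netdev_get_tx_queue(dev, skb_get_queue_mapping(skb));

	/* ... post the skb to the device ring here ... */

	netdev_tx_sent_queue(txq, skb->len);	/* BQL: bytes now in flight */
	return NETDEV_TX_OK;
}

/* Completion path: credit back what the device finished (napi poll). */
static void sketch_tx_complete(struct net_device *dev, unsigned int qidx,
			       unsigned int pkts, unsigned int bytes)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, qidx);

	/* BQL: returns completed bytes, wakes the queue if it was stopped,
	 * and adapts the limit so the ring holds just enough to avoid
	 * starvation. That adaptive limit is the backpressure fq_codel
	 * needs to see. */
	netdev_tx_completed_queue(txq, pkts, bytes);
}

/* Plus netdev_tx_reset_queue(txq) on device reset/queue teardown. */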


> > On Mon, May 17, 2021 at 11:41 AM Xianting Tian
> >  wrote:
> > >
> > > BUG_ON() wraps its condition in unlikely(), which the compiler can
> > > optimize; open-coding if () + BUG() loses that hint.
> > >
> > > Signed-off-by: Xianting Tian 
> > > ---
> > >   drivers/net/virtio_net.c | 5 ++---
> > >   1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index c921ebf3ae82..212d52204884 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1646,10 +1646,9 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > >  	else
> > >  		hdr = skb_vnet_hdr(skb);
> > >  
> > > -	if (virtio_net_hdr_from_skb(skb, &hdr->hdr,
> > > +	BUG_ON(virtio_net_hdr_from_skb(skb, &hdr->hdr,
> > >  				    virtio_is_little_endian(vi->vdev), false,
> > > -				    0))
> > > -		BUG();
> > > +				    0));
> > >  
> > >  	if (vi->mergeable_rx_bufs)
> > >  		hdr->num_buffers = 0;
> > > --
> > > 2.17.1
> > >
> >
> >
> > --
> > Latest Podcast:
> > https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/
> >
> > Dave Täht CTO, TekLibre, LLC



--
Latest Podcast:
https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/

Dave Täht CTO, TekLibre, LLC


Re: [Bloat] [EXTERNAL] Re: Terminology for Laypeople

2021-05-17 Thread Matt Mathis via Bloat
I just got a cool idea: I wonder if it is original?

Write or adapt a spec based on "A One-way Active Measurement Protocol"
(OWAMP, RFC 4656) as an application-layer lag metric.  Suitably framed
OWAMP messages could be injected as close as possible to the socket write
in the sending application, and decoded as close as possible to the
receiving application's read, independent of all other protocol details.

This could expose lag, latency, and jitter in a standardized way that can
be reported by applications and replicated by measurement diagnostics, so
the results can be compared apples-to-apples.  The default data collection
should probably be histograms of one-way delays.

This would expose problematic delays in all parts of the stack, including
excess socket buffers, etc.

This could be adapted to any application protocol that has an appropriate
framing layer, including ndt7.
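
To make the shape concrete, here's a toy sketch of the two ends in C (the
struct and wire format are invented for illustration rather than being the
real OWAMP encoding, and one-way delay of course assumes the endpoint
clocks are synchronized, e.g. by NTP or PTP):

#include <stdint.h>
#include <time.h>

struct probe {			/* framed into the stream at the socket write */
	uint32_t seq;
	int64_t  sent_ns;	/* CLOCK_REALTIME at send, nanoseconds */
};

static int64_t now_ns(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_REALTIME, &ts);
	return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

static struct probe make_probe(uint32_t seq)	/* just before write() */
{
	struct probe p = { .seq = seq, .sent_ns = now_ns() };
	return p;
}

/* log2-spaced histogram of one-way delays, ~1 us up to ~1 s */
static uint64_t owd_hist[31];

static void record_owd(const struct probe *p)	/* just after read() */
{
	int64_t us = (now_ns() - p->sent_ns) / 1000;
	int bucket = 0;

	while (us > 1 && bucket < 30) {	/* bucket = floor(log2(us)) */
		us >>= 1;
		bucket++;
	}
	owd_hist[bucket]++;		/* negative clock skew lands in bucket 0 */
}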

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance;
       however our response must be carefully measured:
            too strong would be hypocritical and risks spiraling out of control;
            too weak risks being mistaken for tacit approval.


On Mon, May 17, 2021 at 4:14 AM Jonathan Morton 
wrote:

> On 13 May, 2021, at 12:10 am, Michael Richardson  wrote:
>
> But, I'm looking for terminology that I can use with my mother-in-law.
>
>
> Here's a slide I used a while ago, which seems to be relevant here:
>
>
> The important thing about the term "quick" in this context is that
> throughput capacity can contribute to it in some circumstances, but is
> mostly irrelevant in others.  For small requests, throughput is irrelevant
> and quickness is a direct result of low latency.
>
> For a grandmother-friendly analogy, consider what you'd do if you wanted
> milk for your breakfast cereal, but found the fridge was empty.  The ideal
> solution to this problem would be to walk down the road to the village shop
> and buy a bottle of milk, then walk back home.  That might take about ten
> minutes - reasonably "quick".  It might take twice that long if you have to
> wait for someone who wants to scratch off a dozen lottery tickets right at
> the counter while paying by cheque; it's politer for such people to step
> out of the way.
>
> My village doesn't have a shop, so that's not an option.  But I've seen
> dairy tankers going along the main road, so I could consider flagging one
> of them down.  Most of them ignore the lunatic trying to do that, and the
> one that does (five hours later) decides to offload a thousand gallons of
> milk instead of the pint I actually wanted, to make it worth his while.
> That made rather a mess of my kitchen and was quite expensive.  Dairy
> tankers are set up for "fast" transport of milk - high throughput, not
> optimised for latency.
>
> The non-lunatic alternative would be to get on my bicycle and go to the
> supermarket in town.  That takes about two hours, there and back.  It takes
> me basically the same amount of time to fetch that one bottle of milk as it
> would to conduct a full shopping trip, and I can't reduce that time at all
> without upgrading to something faster than a bicycle, or moving house to
> somewhere closer to town.  That's latency for you.
>
>  - Jonathan Morton


Re: [Bloat] [Starlink] starlink bloat in review

2021-05-17 Thread Nathan Owens
Here's someone's monitoring setup with high-frequency pings:
https://snapshot.raintank.io/dashboard/snapshot/eL3CqijxCvIn0yJz05QQkg47OTNlk05A?orgId=2
Looks better than the 50-115ms reported.

On Mon, May 17, 2021 at 8:34 AM Neal Cardwell  wrote:

> On Sat, May 15, 2021 at 7:00 PM Matt Mathis via Bloat
>  wrote:
> >
> > I don't understand: starlink doesn't terminate the TCP connection,
> > does it?   Or are you referring to YT's BBR adequately addressing
> > Starlink's variable RTT?   "Adequately" is probably the operative word.
> > It is not too hard to imagine what goes wrong with BBR if the actual
> > path length varies, and on an underloaded network, you may not be able
> > to even detect the symptoms.
>
> On that note, the article mentions:
>   "Starlink itself measures ping times for Counter-Strike: Go and
> Fortnite in its app, and I rarely saw those numbers dip below 50ms,
> mostly hovering around 85-115ms."
>
> If the range 50ms to 115ms is representative of two-way propagation
> delays on their network, then it sounds like BBR can probably perform
> reasonably well in that environment. The algorithm is designed to
> tolerate factor-of-two variations in RTT and still maintain full
> utilization, if there is reasonable buffering.
>
> neal


Re: [Bloat] starlink bloat in review

2021-05-17 Thread Jim Gettys
As always, we have the problem of the last mile: in this case the hop into
the starlink network, and whatever is going on at the home router end. Most
Wi-Fi bloat is much worse than the last-mile bloat, but you have to set out
to measure each independently.

When I first ran into bufferbloat, I measured 8-second latencies on the bed
upstairs, which might drop to something sane if you moved the laptop even a
few inches.

The customer doesn't care where the bloat is, just that it's happening...

Jim

On Mon, May 17, 2021, 11:08 AM Neal Cardwell via Bloat <
bloat@lists.bufferbloat.net> wrote:

> On Sat, May 15, 2021 at 7:00 PM Matt Mathis via Bloat
>  wrote:
> >
> > I don't understand: starlink doesn't terminate the TCP connection,
> > does it?   Or are you referring to YT's BBR adequately addressing
> > Starlink's variable RTT?   "Adequately" is probably the operative word.
> > It is not too hard to imagine what goes wrong with BBR if the actual
> > path length varies, and on an underloaded network, you may not be able
> > to even detect the symptoms.
>
> On that note, the article mentions:
>   "Starlink itself measures ping times for Counter-Strike: Go and
> Fortnite in its app, and I rarely saw those numbers dip below 50ms,
> mostly hovering around 85-115ms."
>
> If the range 50ms to 115ms is representative of two-way propagation
> delays on their network, then it sounds like BBR can probably perform
> reasonably well in that environment. The algorithm is designed to
> tolerate factor-of-two variations in RTT and still maintain full
> utilization, if there is reasonable buffering.
>
> neal


Re: [Bloat] starlink bloat in review

2021-05-17 Thread Neal Cardwell via Bloat
On Sat, May 15, 2021 at 7:00 PM Matt Mathis via Bloat
 wrote:
>
> I don't understand: starlink doesn't terminate the TCP connection,
> does it?   Or are you referring to YT's BBR adequately addressing
> Starlink's variable RTT?   "Adequately" is probably the operative word.
> It is not too hard to imagine what goes wrong with BBR if the actual
> path length varies, and on an underloaded network, you may not be able
> to even detect the symptoms.

On that note, the article mentions:
  "Starlink itself measures ping times for Counter-Strike: Go and
Fortnite in its app, and I rarely saw those numbers dip below 50ms,
mostly hovering around 85-115ms."

If the range 50ms to 115ms is representative of two-way propagation
delays on their network, then it sounds like BBR can probably perform
reasonably well in that environment. The algorithm is designed to
tolerate factor-of-two variations in RTT and still maintain full
utilization, if there is reasonable buffering.
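
(Rough numbers, just to illustrate the factor-of-two point: at 100 Mbit/s
with min_rtt = 50ms, the BDP is about 625 kB; BBR's cwnd gain of 2 allows
roughly 1.25 MB in flight, so the pipe can stay full even if the actual
RTT inflates to 100ms.)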

neal


Re: [Bloat] [EXTERNAL] Re: Terminology for Laypeople

2021-05-17 Thread Jonathan Morton
> On 13 May, 2021, at 12:10 am, Michael Richardson  wrote:
> 
> But, I'm looking for terminology that I can use with my mother-in-law.

Here's a slide I used a while ago, which seems to be relevant here:



The important thing about the term "quick" in this context is that throughput 
capacity can contribute to it in some circumstances, but is mostly irrelevant 
in others.  For small requests, throughput is irrelevant and quickness is a 
direct result of low latency.

For a grandmother-friendly analogy, consider what you'd do if you wanted milk 
for your breakfast cereal, but found the fridge was empty.  The ideal solution 
to this problem would be to walk down the road to the village shop and buy a 
bottle of milk, then walk back home.  That might take about ten minutes - 
reasonably "quick".  It might take twice that long if you have to wait for 
someone who wants to scratch off a dozen lottery tickets right at the counter 
while paying by cheque; it's politer for such people to step out of the way.

My village doesn't have a shop, so that's not an option.  But I've seen dairy 
tankers going along the main road, so I could consider flagging one of them 
down.  Most of them ignore the lunatic trying to do that, and the one that does 
(five hours later) decides to offload a thousand gallons of milk instead of the 
pint I actually wanted, to make it worth his while.  That made rather a mess of 
my kitchen and was quite expensive.  Dairy tankers are set up for "fast" 
transport of milk - high throughput, not optimised for latency.

The non-lunatic alternative would be to get on my bicycle and go to the 
supermarket in town.  That takes about two hours, there and back.  It takes me 
basically the same amount of time to fetch that one bottle of milk as it would 
to conduct a full shopping trip, and I can't reduce that time at all without 
upgrading to something faster than a bicycle, or moving house to somewhere 
closer to town.  That's latency for you.

 - Jonathan Morton