Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Jesper Dangaard Brouer
On Sat, 07 Nov 2020 14:00:04 +0100
Thomas Rosenstein via Bloat  wrote:
 
> Here's an extract from the ethtool https://pastebin.com/cabpWGFz just in 
> case there's something hidden.

Yes, there is something hiding in the data from ethtool_stats.pl[1]:
(10G Mellanox ConnectX cards via 10G SFP+ DAC)

 stat:1 (  1) <= outbound_pci_stalled_wr_events /sec
 stat:339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec

I've not seen this counter 'rx_buffer_passed_thres_phy' before; looking
at the kernel driver code, it is related to "rx_buffer_almost_full".
The number per second is excessive (but it may be related to a driver bug,
as it ends up reading "high" -> rx_buffer_almost_full_high in the
extended counters).

 stat: 29583661 ( 29,583,661) <= rx_bytes /sec
 stat: 30343677 ( 30,343,677) <= rx_bytes_phy /sec

You are receiving around 236 Mbit/s on a 10 Gbit/s link.  There is a
difference between what the OS sees (rx_bytes) and what the NIC
hardware sees (rx_bytes_phy), a diff of approx. 6 Mbit/s.

 stat:19552 ( 19,552) <= rx_packets /sec
 stat:19950 ( 19,950) <= rx_packets_phy /sec

The RX packet counters above also indicate that the HW is seeing more
packets than the OS is receiving.

The next counters are likely your problem:

 stat:  718 (718) <= tx_global_pause /sec
 stat:   954035 (954,035) <= tx_global_pause_duration /sec
 stat:  714 (714) <= tx_pause_ctrl_phy /sec

It looks like you have enabled Ethernet Flow-Control, and something is
causing pause frames to be generated.  It seems strange that this happens
on a 10 Gbit/s link carrying only 236 Mbit/s.
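
A quick way to check this directly (a sketch; eth4 is an assumption here,
substitute the actual RX interface) is to query the pause settings and the
raw pause counters:

 ethtool -a eth4                  # show Autonegotiate / RX / TX pause state
 ethtool -S eth4 | grep -i pause  # raw pause-frame counters from the NIC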

The TX byte counters are also very strange:

 stat:26063 ( 26,063) <= tx_bytes /sec
 stat:71950 ( 71,950) <= tx_bytes_phy /sec

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

[1] 
https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl

Strange size distribution:
 stat: 19922 ( 19,922) <= rx_1519_to_2047_bytes_phy /sec
 stat:14 ( 14) <= rx_65_to_127_bytes_phy /sec



Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Thomas Rosenstein via Bloat



On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:


> On Sat, 07 Nov 2020 14:00:04 +0100
> Thomas Rosenstein via Bloat  wrote:
>
>> Here's an extract from the ethtool https://pastebin.com/cabpWGFz just in
>> case there's something hidden.
>
> Yes, there is something hiding in the data from ethtool_stats.pl[1]:
> (10G Mellanox ConnectX cards via 10G SFP+ DAC)
>
>  stat:1 (  1) <= outbound_pci_stalled_wr_events /sec
>  stat:339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
>
> I've not seen this counter 'rx_buffer_passed_thres_phy' before; looking
> at the kernel driver code, it is related to "rx_buffer_almost_full".
> The number per second is excessive (but it may be related to a driver bug,
> as it ends up reading "high" -> rx_buffer_almost_full_high in the
> extended counters).
>
>  stat: 29583661 ( 29,583,661) <= rx_bytes /sec
>  stat: 30343677 ( 30,343,677) <= rx_bytes_phy /sec
>
> You are receiving around 236 Mbit/s on a 10 Gbit/s link.  There is a
> difference between what the OS sees (rx_bytes) and what the NIC
> hardware sees (rx_bytes_phy), a diff of approx. 6 Mbit/s.
>
>  stat:19552 ( 19,552) <= rx_packets /sec
>  stat:19950 ( 19,950) <= rx_packets_phy /sec


Could these packets be from VLAN interfaces that are not used in the OS?



> The RX packet counters above also indicate that the HW is seeing more
> packets than the OS is receiving.
>
> The next counters are likely your problem:
>
>  stat:  718 (718) <= tx_global_pause /sec
>  stat:   954035 (954,035) <= tx_global_pause_duration /sec
>  stat:  714 (714) <= tx_pause_ctrl_phy /sec


As far as I can see that's only the TX, and we are only doing RX on this 
interface - so maybe that's irrelevant?




> It looks like you have enabled Ethernet Flow-Control, and something is
> causing pause frames to be generated.  It seems strange that this happens
> on a 10 Gbit/s link carrying only 236 Mbit/s.
>
> The TX byte counters are also very strange:
>
>  stat:26063 ( 26,063) <= tx_bytes /sec
>  stat:71950 ( 71,950) <= tx_bytes_phy /sec


Also, it's TX, and we are only doing RX; as I said already somewhere, 
it's asymmetric routing, so the TX data comes back via another router.




> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>
> [1] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
>
> Strange size distribution:
>  stat: 19922 ( 19,922) <= rx_1519_to_2047_bytes_phy /sec
>  stat:14 ( 14) <= rx_65_to_127_bytes_phy /sec



Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Jesper Dangaard Brouer
On Mon, 09 Nov 2020 11:09:33 +0100
"Thomas Rosenstein"  wrote:

> On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:
> 
> > On Sat, 07 Nov 2020 14:00:04 +0100
> > Thomas Rosenstein via Bloat  wrote:
> >  
> >> Here's an extract from the ethtool https://pastebin.com/cabpWGFz just in
> >> case there's something hidden.  
> >
> > Yes, there is something hiding in the data from ethtool_stats.pl[1]:
> > (10G Mellanox ConnectX cards via 10G SFP+ DAC)
> >
> >  stat:1 (  1) <= outbound_pci_stalled_wr_events /sec
> >  stat:339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
> >
> > I've not seen this counter 'rx_buffer_passed_thres_phy' before; looking
> > at the kernel driver code, it is related to "rx_buffer_almost_full".
> > The number per second is excessive (but it may be related to a driver bug,
> > as it ends up reading "high" -> rx_buffer_almost_full_high in the
> > extended counters).

Notice this indication is a strong red-flag that something is wrong.

> >  stat: 29583661 ( 29,583,661) <= rx_bytes /sec
> >  stat: 30343677 ( 30,343,677) <= rx_bytes_phy /sec
> >
> > You are receiving around 236 Mbit/s on a 10 Gbit/s link.  There is a
> > difference between what the OS sees (rx_bytes) and what the NIC
> > hardware sees (rx_bytes_phy), a diff of approx. 6 Mbit/s.
> >
> >  stat:19552 ( 19,552) <= rx_packets /sec
> >  stat:19950 ( 19,950) <= rx_packets_phy /sec  
> 
> Could these packets be from VLAN interfaces that are not used in the OS?
> 
> >
> > The RX packet counters above also indicate that the HW is seeing more
> > packets than the OS is receiving.
> >
> > The next counters are likely your problem:
> >
> >  stat:  718 (718) <= tx_global_pause /sec
> >  stat:   954035 (954,035) <= tx_global_pause_duration /sec
> >  stat:  714 (714) <= tx_pause_ctrl_phy /sec  
> 
> As far as I can see that's only the TX, and we are only doing RX on this 
> interface - so maybe that's irrelevant?
> 
> >
> > It looks like you have enabled Ethernet Flow-Control, and something is
> > causing pause frames to be generated.  It seems strange that this
> > happens on a 10 Gbit/s link carrying only 236 Mbit/s.
> >
> > The TX byte counters are also very strange:
> >
> >  stat:26063 ( 26,063) <= tx_bytes /sec
> >  stat:71950 ( 71,950) <= tx_bytes_phy /sec  
> 
> Also, it's TX, and we are only doing RX; as I said already somewhere, 
> it's asymmetric routing, so the TX data comes back via another router.

Okay, but as this is a router you also need to transmit this
(asymmetric) traffic out another interface, right?

Could you also provide ethtool_stats for the TX interface?

Notice that the tool[1] ethtool_stats.pl supports monitoring several
interfaces at the same time, e.g. run:

 ethtool_stats.pl --sec 3 --dev eth4 --dev ethTX

And provide output as pastebin.


> > [1] 
> > https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
> >
> > Strange size distribution:
> >  stat: 19922 ( 19,922) <= rx_1519_to_2047_bytes_phy /sec
> >  stat:14 ( 14) <= rx_65_to_127_bytes_phy /sec  
> 

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Toke Høiland-Jørgensen via Bloat
>> Also, it's TX, and we are only doing RX; as I said already somewhere, 
>> it's asymmetric routing, so the TX data comes back via another router.
>
> Okay, but as this is a router you also need to transmit this
> (asymmetric) traffic out another interface, right?
>
> Could you also provide ethtool_stats for the TX interface?
>
> Notice that the tool[1] ethtool_stats.pl supports monitoring several
> interfaces at the same time, e.g. run:
>
>  ethtool_stats.pl --sec 3 --dev eth4 --dev ethTX
>
> And provide output as pastebin.

Also, from your drawing, everything goes through the same switch, right?
And since pause frames are involved... Maybe it's the switch being
overwhelmed?

In general, I find that pause frames give more problems than they solve,
so I always turn them off...
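
For reference, the usual way to do that with ethtool looks like this (a
sketch; the interface name is a placeholder for the actual NIC):

 ethtool -A eth4 autoneg off rx off tx off   # disable pause autoneg, RX and TX pause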

-Toke


Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Thomas Rosenstein via Bloat



On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote:


> On Mon, 09 Nov 2020 11:09:33 +0100
> "Thomas Rosenstein"  wrote:
>
>> On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:
>>
>>> On Sat, 07 Nov 2020 14:00:04 +0100
>>> Thomas Rosenstein via Bloat  wrote:
>>>
>>>> Here's an extract from the ethtool https://pastebin.com/cabpWGFz just in
>>>> case there's something hidden.
>>>
>>> Yes, there is something hiding in the data from ethtool_stats.pl[1]:
>>> (10G Mellanox ConnectX cards via 10G SFP+ DAC)
>>>
>>>  stat:1 (  1) <= outbound_pci_stalled_wr_events /sec
>>>  stat:339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
>>>
>>> I've not seen this counter 'rx_buffer_passed_thres_phy' before; looking
>>> at the kernel driver code, it is related to "rx_buffer_almost_full".
>>> The number per second is excessive (but it may be related to a driver bug,
>>> as it ends up reading "high" -> rx_buffer_almost_full_high in the
>>> extended counters).


> Notice this indication is a strong red flag that something is wrong.
>
> Okay, but as this is a router you also need to transmit this
> (asymmetric) traffic out another interface, right?


The asymmetric traffic comes back on another router: this is router-02, 
traffic from the internet comes back on router-01.

I also added the interface names.

See the updated diagram: 
https://drive.google.com/file/d/15oAsxiNfsbjB9a855Q_dh6YvFZBDdY5I/view?usp=sharing




> Could you also provide ethtool_stats for the TX interface?
>
> Notice that the tool[1] ethtool_stats.pl supports monitoring several
> interfaces at the same time, e.g. run:
>
>  ethtool_stats.pl --sec 3 --dev eth4 --dev ethTX
>
> And provide output as pastebin.


I have disabled pause control, as Toke suggested, via:

ethtool -A eth4 autoneg off rx off tx off
ethtool -A eth5 autoneg off rx off tx off

Afterwards, here is an ethtool output, first "without" traffic for a few 
seconds, then with the problematic flow.


Since the output is > 512KB I had to upload it on gdrive:

https://drive.google.com/file/d/1EVKt1LseaBuD40QE-SqFvqYSeWUEcGA_/view?usp=sharing





>>> [1] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
>>>
>>> Strange size distribution:
>>>  stat: 19922 ( 19,922) <= rx_1519_to_2047_bytes_phy /sec
>>>  stat:14 ( 14) <= rx_65_to_127_bytes_phy /sec


I assume it's because of the VLAN tagging, and therefore 1522 bytes per 
packet with an MTU of 1500?
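
That would be consistent with the frame-size arithmetic, assuming the PHY
counters include the 4-byte FCS (an assumption, not confirmed here):

 1500 (IP MTU) + 14 (Ethernet header) + 4 (802.1Q VLAN tag) + 4 (FCS) = 1522 bytes

which lands in the rx_1519_to_2047_bytes_phy bucket.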







> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer



Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Thomas Rosenstein via Bloat



On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote:


> On Mon, 09 Nov 2020 11:09:33 +0100
> "Thomas Rosenstein"  wrote:
>
>> On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:
>>
>>> On Sat, 07 Nov 2020 14:00:04 +0100
>>> Thomas Rosenstein via Bloat  wrote:
>>>
>>>> Here's an extract from the ethtool https://pastebin.com/cabpWGFz just in
>>>> case there's something hidden.
>>>
>>> Yes, there is something hiding in the data from ethtool_stats.pl[1]:
>>> (10G Mellanox ConnectX cards via 10G SFP+ DAC)
>>>
>>>  stat:1 (  1) <= outbound_pci_stalled_wr_events /sec
>>>  stat:339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
>>>
>>> I've not seen this counter 'rx_buffer_passed_thres_phy' before; looking
>>> at the kernel driver code, it is related to "rx_buffer_almost_full".
>>> The number per second is excessive (but it may be related to a driver bug,
>>> as it ends up reading "high" -> rx_buffer_almost_full_high in the
>>> extended counters).


I have now tested with a new kernel, a 5.9.4 build configured from the 
3.10 config with make oldconfig, and I noticed an interesting effect.


For the first ca. 2 minutes the router behaves completely normally, as 
with 3.10; after that the ping times go crazy.


I have recorded this with ethtool, and also the ping times.

Ethtool: (13 MB)
https://drive.google.com/file/d/1Ojp64UUw0zKwrgF_CisZb3BCdidAJYZo/view?usp=sharing

The transfer was first doing around 50 - 70 MB/s; once the ping times 
got worse it dropped to ~12 MB/s.

At ca. line 74324 of the log the transfer speed drops to 12 MB/s.

Seems you are right about rx_buffer_passed_thres_phy: if you check 
just those lines, they appear more often once the speed dropped.
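
For example, a minimal way to pull just those lines (with their positions in
the log) out of the capture, where the filename is hypothetical:

 grep -n 'rx_buffer_passed_thres_phy' ethtool_5.9.4_output.txt

grep -n prints the log line number of each match, which can be compared
against the line where the speed drops.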

Not sure if that's the cause or an effect of the underlying problem!

Pings:
https://drive.google.com/file/d/16phOxM5IFU6RAl4Ua4pRqMNuLYBc4RK7/view?usp=sharing

Pause frames were activated again after the restart.

(Here is a link for reference on the ethtool variables: 
https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)



Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

2020-11-09 Thread Thomas Rosenstein via Bloat



On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote:


> On Mon, 09 Nov 2020 11:09:33 +0100
> "Thomas Rosenstein"  wrote:
>
> Could you also provide ethtool_stats for the TX interface?
>
> Notice that the tool[1] ethtool_stats.pl supports monitoring several
> interfaces at the same time, e.g. run:
>
>  ethtool_stats.pl --sec 3 --dev eth4 --dev ethTX
>
> And provide output as pastebin.


I have now also repeated the same test with 3.10, here are the ethtool 
outputs:


https://drive.google.com/file/d/1c98MVV0JYl6Su6xZTpqwS7m-6OlbmAFp/view?usp=sharing

and the ping times:

https://drive.google.com/file/d/1xhbGJHb5jUbPsee4frbx-c-uqh-7orXY/view?usp=sharing

Sadly the parameters we were looking at are not supported below 4.14.

But I immediately saw one thing that is very different:

ethtool --statistics eth4 | grep discards
 rx_discards_phy: 0
 tx_discards_phy: 0

If we check the ethtool output from 5.9.4, we have:

 rx_discards_phy: 151793

Also, the outbound_pci_stalled_wr_events get more frequent the lower 
the total bandwidth / the higher the ping is.
Logically there must be something blocking the buffers: either they 
are not getting freed, not rotated correctly, or processing is too 
slow.
I would exclude the processing, simply based on the 0% CPU load, and 
on the fact that it doesn't happen with 3.10.
It is also suspicious that the issue only appears after a certain time 
of activity (maybe total traffic?!)
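
A minimal sketch for watching the suspect counters over time on the live box
(eth4 is assumed from earlier in the thread; adjust the counter list as
needed):

 while true; do
     date
     ethtool -S eth4 | grep -E 'rx_discards_phy|outbound_pci_stalled_wr_events|rx_buffer_passed_thres_phy'
     sleep 10
 done

That would show whether the discards and PCI stall events start climbing at
the same moment the latency degrades.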


