You have been subscribed to a public bug:

== Comment: #0 - KISHORE KUMAR G <kishor...@in.ibm.com> - 2022-09-19 04:39:42 ==
---Problem Description---
On an Ubuntu/s390 system that houses a Mellanox CX5 adapter with two ports
connected to a pair of TOR switches and acts as the entry point (edge node)
through which a cluster of compute nodes accesses the public network, with the
following mlx firmware level:

ethtool -i p0

driver: mlx5e_rep
version: 5.4.0-104.118-
firmware-version: 16.27.1016 (MT_0000000013)
expansion-rom-version:
bus-info: 0100:00:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no


The LAG affinity module of mlx5_core in the upstream 5.4 kernel listens to
routing events and sets the LAG affinity accordingly, whereas one of the
custom services, the Fabcon service, also listens to the routing events and
sets the LAG affinity in the Mellanox driver accordingly.

The edge node routes defined in the compute nodes use both interfaces
(port 1 - p0 and port 2 - p1) for the LAG affinity. For instance:
10.66.0.170 proto bgp src 10.66.11.43 metric 20 
nexthop via 172.31.22.42 dev p0 weight 1 
nexthop via 172.31.22.170 dev p1 weight 1

For example, after an edge node boots, the LAG mapping converges to use
both port 1 (p0) and port 2 (p1) by default:
root@pok1-qz1-sr1-rk011-s20:/# dmesg | grep lag
[  282.043011] mlx5_core 0100:00:00.0: lag map port 1:2 port 2:2
[  282.083541] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:2
(<------ Both ports are equally mapped)

The issue arises when the mlx5_core driver cannot derive the LAG
configuration from specific routes. For instance, disabling an interface
on the edge node above (10.66.0.170), or adding/removing an interface,
causes the mlx5_core driver to react to the routing change and switch the
LAG affinity to a single network interface.

In the following example, a new static route entry to a single
destination (10.66.47.34) is added:

 ip route add 10.66.47.34 proto static src 10.66.11.43 metric 20 via 172.31.22.42 dev p0

This caused the LAG mapping to change to port 1 (p0), as shown below:

root@pok1-qz1-sr1-rk011-s20:/# dmesg | grep lag
[  282.043011] mlx5_core 0100:00:00.0: lag map port 1:2 port 2:2
[  282.083541] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:2
[  757.878626] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:1
<---- mapping now directs all traffic through P0.

This behaviour causes all the traffic in 10.x to use a single network
interface.
The TOR switches (fabric) are not aware of such a LAG affinity change, and
therefore packets are dropped on the "not in use" interface (e.g. port 2
(P1)).
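
The mapping state can be read mechanically from the kernel log. The following
is a minimal sketch, assuming the message format shown above
("lag map port 1:X port 2:Y") and that the last such line reflects the current
mapping. The function name lag_map_state is made up for illustration and is
not part of any mlx5 tooling.

```shell
# Hypothetical helper: classify the most recent mlx5_core "lag map" line
# read from stdin.
#   balanced       the two ports map to different physical ports
#   collapsed:<N>  both ports map to physical port N
#   unknown        no "lag map" line was found
lag_map_state() {
    # Reduce every matching line to "<port1 target> <port2 target>".
    sed -n 's/.*lag map port 1:\([0-9][0-9]*\) port 2:\([0-9][0-9]*\).*/\1 \2/p' |
        tail -n 1 |
        {
            if ! read -r first second; then
                echo unknown
            elif [ "$first" = "$second" ]; then
                echo "collapsed:$first"
            else
                echo balanced
            fi
        }
}
```

Running "dmesg | lag_map_state" on the log above prints "collapsed:1", which
matches the state where all traffic goes through P0.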

So the Mellanox driver (mlx5_core) should not change the LAG mapping/config
based on the last route event; rather, it should rely on the default
routes only.

Mellanox agreed to patch this, and the fixes are available in kernels
5.15.29 and 5.15.39 respectively.
The following commits resolve this issue:
1. net/mlx5e: Lag, Only handle events from highest priority multipath entry
   (upstream kernel 5.15.29)
   https://github.com/torvalds/linux/commit/ad11c4f1d8fd1f03639460e425a36f7fd0ea83f5

2. net/mlx5e: Lag, Don't skip fib events on current dst (5.15.29)
   https://github.com/torvalds/linux/commit/4a2a664ed87962c4ddb806a84b5c9634820bcf55

3. net/mlx5e: Lag, Fix fib_info pointer assignment (5.15.39)
   https://github.com/torvalds/linux/commit/a6589155ec9847918e00e7279b8aa6d4c272bea7

4. net/mlx5e: Lag, Fix use-after-free in fib event handler (5.15.39)
   https://github.com/torvalds/linux/commit/27b0420fd959e38e3500e60b637d39dfab065645


The request is to have the above commits backported to the Ubuntu 20.04.x
series, including the Ubuntu 18.04 HWE kernel.


 
Contact Information = Kishore Kumar G/kishore.pil...@in.ibm.com 
utsav.shrivas...@ibm.com 
 
---Additional Hardware Info---
Mellanox CX5 adapter with firmware-version: 16.27.1016 (MT_0000000013)
 

 
---uname output---
Linux version version: 5.4.0-104.118
 
Machine Type = s390x LPAR 
 
---Debugger---
A debugger is not configured
 
---Steps to Reproduce---
 ...
"
default proto bgp src 10.66.11.41 metric 20
        nexthop via 172.31.22.40 dev p0 weight 1
        nexthop via 172.31.22.168 dev p1 weight 1"
......
172.31.22.40/31 dev p0 proto kernel scope link src 172.31.22.41  
172.31.22.168/31 dev p1 proto kernel scope link src 172.31.22.169

..

We also have around 64 SR-IOV devices for VM consumption.

In the above case, the LAG mapping works as expected and uses both
ports (p0 and p1) for traffic:

root@pok1-qz1-sr1-rk011-s20:/# dmesg | grep lag

[  282.043011] mlx5_core 0100:00:00.0: lag map port 1:2 port 2:2

[  282.083541] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:2
<<<---behavior expected


The issue arises when we add a route to a single IP in the underlying
network with a single next hop: all the traffic is then shifted to that
single next-hop port, as the example below shows.


root@pok1-qz1-sr1-rk011-s20:/# ip route add 10.66.47.34 proto static src 10.66.11.41 metric 20 via 172.31.22.40 dev p0


root@pok1-qz1-sr1-rk011-s20:/# dmesg | grep lag

[  282.043011] mlx5_core 0100:00:00.0: lag map port 1:2 port 2:2

[  282.083541] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:2

[  757.878626] mlx5_core 0100:00:00.0: modify lag map port 1:1 port 2:1
<<<<------- Issue
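
To spot routes of the kind that triggered the remap in this reproduction, the
routing table can be filtered for single-next-hop entries that leave through
one of the LAG ports. This is a rough sketch, assuming the usual
"ip route show" text layout (multipath next hops on indented continuation
lines) and the port names p0 and p1 from this report. The function name
single_nexthop_routes is hypothetical.

```shell
# Hypothetical helper: read "ip route show" output on stdin and print the
# destination of every route that has exactly one gateway ("via") and leaves
# through p0 or p1.
# Skipped: indented "nexthop" lines of multipath routes, and directly
# connected routes (they have no "via").
single_nexthop_routes() {
    awk '!/^[ \t]/ && / via / && / dev (p0|p1)( |$)/ { print $1 }'
}
```

Typical use would be "ip route show | single_nexthop_routes". Whether a listed
entry actually changes the LAG mapping depends on the driver; the filter only
narrows down the candidates.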


 
Stack trace output:
 no
 
Oops output:
 no
 
System Dump Info:
  The system is not configured to capture a system dump.
 
*Additional Instructions for Kishore Kumar G/kishore.pil...@in.ibm.com 
utsav.shrivas...@ibm.com: 
-Attach sysctl -a output to the bug.

** Affects: linux (Ubuntu)
     Importance: Undecided
     Assignee: Skipper Bug Screeners (skipper-screen-team)
         Status: New


** Tags: architecture-s3903164 bugnameltc-200004 severity-high 
targetmilestone-inin---
-- 
[UBUNTU 20.04] Unexpected  LAG affinity behaviour with  mlx5_core driver in 
Ubuntu 20.04
https://bugs.launchpad.net/bugs/1990275
You received this bug notification because you are a member of Kernel Packages, 
which is subscribed to linux in Ubuntu.

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
