SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2018-October/096376.html

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Cosmic)
       Status: New => In Progress

** Changed in: linux (Ubuntu Cosmic)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Cosmic)
     Assignee: (unassigned) => Joseph Salisbury (jsalisbury)

** Description changed:

+ 
+ == SRU Justification ==
+ The requested commit fixes a regression introduced by mainline commit
+ 3a2f70331226 in v4.18-rc1.  The commit is only needed in Cosmic.  Due to
+ the regression, a Mellanox CX5 stops pinging with rx_wqe_err (mlx5_core).
+ 
+ == Fix ==
+ 37fdffb217a4 ("net/mlx5: WQ, fixes for fragmented WQ buffers API")
+ 
+ == Regression Potential ==
+ Low. This commit has been cc'd to stable, so it has had additional
+ upstream review.
+ 
+ == Test Case ==
+ A test kernel was built with this patch and tested by the original bug reporter.
+ The bug reporter states the test kernel resolved the bug.
+ 
+ 
+ 
  == Comment: #0 - Michael Ranweiler  - 2018-10-18 11:34:40 ==
  
+ ---Problem Description---
+ On the system, if you run
+ ethtool -S enP48p1s0f0 | grep wqe_err
+      rx_wqe_err: 1
+      rx0_wqe_err: 0
+      rx1_wqe_err: 0
+      rx2_wqe_err: 0
+      rx3_wqe_err: 1
+      rx4_wqe_err: 0
+      rx5_wqe_err: 0
+      rx6_wqe_err: 0
+      rx7_wqe_err: 0
+      rx8_wqe_err: 0
+      rx9_wqe_err: 0
+      rx10_wqe_err: 0
+      rx11_wqe_err: 0
+      rx12_wqe_err: 0
+      rx13_wqe_err: 0
+      rx14_wqe_err: 0
+      rx15_wqe_err: 0
  
- ---Problem Description---
- On the system, if you run 
- ethtool -S enP48p1s0f0 | grep wqe_err
-      rx_wqe_err: 1
-      rx0_wqe_err: 0
-      rx1_wqe_err: 0
-      rx2_wqe_err: 0
-      rx3_wqe_err: 1
-      rx4_wqe_err: 0
-      rx5_wqe_err: 0
-      rx6_wqe_err: 0
-      rx7_wqe_err: 0
-      rx8_wqe_err: 0
-      rx9_wqe_err: 0
-      rx10_wqe_err: 0
-      rx11_wqe_err: 0
-      rx12_wqe_err: 0
-      rx13_wqe_err: 0
-      rx14_wqe_err: 0
-      rx15_wqe_err: 0
+ You will see that the rx side is hitting an issue.
  
- You will see that the rx side is hitting an issue. 
-  
-  
  ---Additional Hardware Info---
  Mellanox CX5 Ethernet 100G
  lspci
  0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
-  
-  
- Machine Type = P9 
-  
+ 
+ Machine Type = P9
+ 
  ---Debugger---
  A debugger is not configured
-  
+ 
  ---Steps to Reproduce---
- Using a CX5 Ethernet 100G card 
+ Using a CX5 Ethernet 100G card
  lspci
  0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  
- Just configure the IP address: 
+ Just configure the IP address:
  ifconfig enP48p1s0f0 33.33.33.33 netmask 255.255.255.0 up
  Then configure an IP address on the partner system and run ping -f from it:
  ping -f 33.33.33.33
  PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
  ........................................^C
  --- 33.33.33.33 ping statistics ---
  5413 packets transmitted, 5373 received, 0% packet loss, time 934ms
  rtt min/avg/max/mdev = 0.015/0.019/0.669/0.010 ms, ipg/ewma 0.172/0.020 ms
  # ping 33.33.33.33
  PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
  ^C
  --- 33.33.33.33 ping statistics ---
  2 packets transmitted, 0 received, 100% packet loss, time 1071ms
  
- Then, on the receiving system, run: 
+ Then, on the receiving system, run:
  ethtool -S enP48p1s0f0 | grep wqe_err
-      rx_wqe_err: 1
-      rx0_wqe_err: 0
-      rx1_wqe_err: 0
-      rx2_wqe_err: 0
-      rx3_wqe_err: 1
-      rx4_wqe_err: 0
-      rx5_wqe_err: 0
-      rx6_wqe_err: 0
-      rx7_wqe_err: 0
-      rx8_wqe_err: 0
-      rx9_wqe_err: 0
-      rx10_wqe_err: 0
-      rx11_wqe_err: 0
-      rx12_wqe_err: 0
-      rx13_wqe_err: 0
-      rx14_wqe_err: 0
-      rx15_wqe_err: 0
- You will see rx_wqe_err with a non-zero counter. 
+      rx_wqe_err: 1
+      rx0_wqe_err: 0
+      rx1_wqe_err: 0
+      rx2_wqe_err: 0
+      rx3_wqe_err: 1
+      rx4_wqe_err: 0
+      rx5_wqe_err: 0
+      rx6_wqe_err: 0
+      rx7_wqe_err: 0
+      rx8_wqe_err: 0
+      rx9_wqe_err: 0
+      rx10_wqe_err: 0
+      rx11_wqe_err: 0
+      rx12_wqe_err: 0
+      rx13_wqe_err: 0
+      rx14_wqe_err: 0
+      rx15_wqe_err: 0
+ You will see rx_wqe_err with a non-zero counter.
  
  This is fixed by this patch:
  https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=37fdffb217a45609edccbb8b407d031143f551c0
  
  == Comment: #1 - Carol L. Soto  - 2018-10-18 11:46:00 ==
- I cloned the cosmic git tree and loaded the kernel on a system. 
+ I cloned the cosmic git tree and loaded the kernel on a system.
  
  With kernel 4.18.12, I can recreate it.
  
  lspci | grep Mell | grep ConnectX-5
  0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0000:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.1 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  :~# ethtool -S enp1s0f0 | grep wqe_err
-      rx_wqe_err: 2
-      rx0_wqe_err: 1
-      rx1_wqe_err: 1
-      rx2_wqe_err: 0
-      rx3_wqe_err: 0
-      rx4_wqe_err: 0
-      rx5_wqe_err: 0
-      rx6_wqe_err: 0
-      rx7_wqe_err: 0
-      rx8_wqe_err: 0
-      rx9_wqe_err: 0
-      rx10_wqe_err: 0
+      rx_wqe_err: 2
+      rx0_wqe_err: 1
+      rx1_wqe_err: 1
+      rx2_wqe_err: 0
+      rx3_wqe_err: 0
+      rx4_wqe_err: 0
+      rx5_wqe_err: 0
+      rx6_wqe_err: 0
+      rx7_wqe_err: 0
+      rx8_wqe_err: 0
+      rx9_wqe_err: 0
+      rx10_wqe_err: 0
  ...
- 
  
  Let me check if the proposed patch needs backport or not.
  
  == Comment: #3 - Carol L. Soto  - 2018-10-18 13:34:46 ==
- I was able to apply the proposed patch as is to the cosmic git tree with no 
- issues (no need to backport), using a 4.18.12+ kernel. 
+ I was able to apply the proposed patch as is to the cosmic git tree with no
+ issues (no need to backport), using a 4.18.12+ kernel.
  
- With the proposed patch, I do not see any wqe_err counts and ping does not stop. 
+ With the proposed patch, I do not see any wqe_err counts and ping does not stop.
  ethtool -S enp1s0f0 | grep wqe_err
-      rx_wqe_err: 0
-      rx0_wqe_err: 0
-      rx1_wqe_err: 0
-      rx2_wqe_err: 0
-      rx3_wqe_err: 0
-      rx4_wqe_err: 0
-      rx5_wqe_err: 0
-      rx6_wqe_err: 0
-      rx7_wqe_err: 0
-      rx8_wqe_err: 0
-      rx9_wqe_err: 0
-      rx10_wqe_err: 0
+      rx_wqe_err: 0
+      rx0_wqe_err: 0
+      rx1_wqe_err: 0
+      rx2_wqe_err: 0
+      rx3_wqe_err: 0
+      rx4_wqe_err: 0
+      rx5_wqe_err: 0
+      rx6_wqe_err: 0
+      rx7_wqe_err: 0
+      rx8_wqe_err: 0
+      rx9_wqe_err: 0
+      rx10_wqe_err: 0
  ...

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1799393

Title:
  Mellanox CX5 stops pinging with rx_wqe_err (mlx5_core)

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Cosmic:
  In Progress

Bug description:
  
  == SRU Justification ==
  The requested commit fixes a regression introduced by mainline commit
  3a2f70331226 in v4.18-rc1.  The commit is only needed in Cosmic.  Due to
  the regression, a Mellanox CX5 stops pinging with rx_wqe_err (mlx5_core).

  == Fix ==
  37fdffb217a4 ("net/mlx5: WQ, fixes for fragmented WQ buffers API")

  == Regression Potential ==
  Low. This commit has been cc'd to stable, so it has had additional
  upstream review.

  == Test Case ==
  A test kernel was built with this patch and tested by the original bug reporter.
  The bug reporter states the test kernel resolved the bug.


  
  == Comment: #0 - Michael Ranweiler  - 2018-10-18 11:34:40 ==

  ---Problem Description---
  On the system, if you run
  ethtool -S enP48p1s0f0 | grep wqe_err
       rx_wqe_err: 1
       rx0_wqe_err: 0
       rx1_wqe_err: 0
       rx2_wqe_err: 0
       rx3_wqe_err: 1
       rx4_wqe_err: 0
       rx5_wqe_err: 0
       rx6_wqe_err: 0
       rx7_wqe_err: 0
       rx8_wqe_err: 0
       rx9_wqe_err: 0
       rx10_wqe_err: 0
       rx11_wqe_err: 0
       rx12_wqe_err: 0
       rx13_wqe_err: 0
       rx14_wqe_err: 0
       rx15_wqe_err: 0

  You will see that the rx side is hitting an issue.

  ---Additional Hardware Info---
  Mellanox CX5 Ethernet 100G
  lspci
  0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

  Machine Type = P9

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  Using a CX5 Ethernet 100G card
  lspci
  0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]

  Just configure the IP address:
  ifconfig enP48p1s0f0 33.33.33.33 netmask 255.255.255.0 up
  Then configure an IP address on the partner system and run ping -f from it:
  ping -f 33.33.33.33
  PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
  ........................................^C
  --- 33.33.33.33 ping statistics ---
  5413 packets transmitted, 5373 received, 0% packet loss, time 934ms
  rtt min/avg/max/mdev = 0.015/0.019/0.669/0.010 ms, ipg/ewma 0.172/0.020 ms
  # ping 33.33.33.33
  PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
  ^C
  --- 33.33.33.33 ping statistics ---
  2 packets transmitted, 0 received, 100% packet loss, time 1071ms

  Then, on the receiving system, run:
  ethtool -S enP48p1s0f0 | grep wqe_err
       rx_wqe_err: 1
       rx0_wqe_err: 0
       rx1_wqe_err: 0
       rx2_wqe_err: 0
       rx3_wqe_err: 1
       rx4_wqe_err: 0
       rx5_wqe_err: 0
       rx6_wqe_err: 0
       rx7_wqe_err: 0
       rx8_wqe_err: 0
       rx9_wqe_err: 0
       rx10_wqe_err: 0
       rx11_wqe_err: 0
       rx12_wqe_err: 0
       rx13_wqe_err: 0
       rx14_wqe_err: 0
       rx15_wqe_err: 0
  You will see rx_wqe_err with a non-zero counter.
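
The counter check in the steps above can be scripted. The following is a minimal sketch and not part of the original report: `nonzero_wqe_err` is a hypothetical helper name, and it assumes the `name: value` line layout seen in the `ethtool -S` output quoted here.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not from the bug report).
# Reads `ethtool -S <iface>` output on stdin and prints each *wqe_err
# counter whose value is non-zero, as name=value.
# Exit status: 0 if at least one non-zero counter was found, 1 otherwise.
nonzero_wqe_err() {
    awk -F':' '
        /wqe_err/ {
            name = $1; val = $2
            gsub(/^[ \t]+/, "", name)             # drop leading indentation
            gsub(/^[ \t]+|[ \t\r]+$/, "", val)    # trim the value
            if (val + 0 != 0) { print name "=" val; found = 1 }
        }
        END { exit(found ? 0 : 1) }
    '
}
```

On the receiving system this could be used as `ethtool -S enP48p1s0f0 | nonzero_wqe_err`. Fed the counters listed above, it would print `rx_wqe_err=1` and `rx3_wqe_err=1`.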

  This is fixed by this patch:
  https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=37fdffb217a45609edccbb8b407d031143f551c0

  == Comment: #1 - Carol L. Soto  - 2018-10-18 11:46:00 ==
  I cloned the cosmic git tree and loaded the kernel on a system.

  With kernel 4.18.12, I can recreate it.

  lspci | grep Mell | grep ConnectX-5
  0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0000:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  0030:01:00.1 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
  :~# ethtool -S enp1s0f0 | grep wqe_err
       rx_wqe_err: 2
       rx0_wqe_err: 1
       rx1_wqe_err: 1
       rx2_wqe_err: 0
       rx3_wqe_err: 0
       rx4_wqe_err: 0
       rx5_wqe_err: 0
       rx6_wqe_err: 0
       rx7_wqe_err: 0
       rx8_wqe_err: 0
       rx9_wqe_err: 0
       rx10_wqe_err: 0
  ...

  Let me check if the proposed patch needs backport or not.

  == Comment: #3 - Carol L. Soto  - 2018-10-18 13:34:46 ==
  I was able to apply the proposed patch as is to the cosmic git tree with no
  issues (no need to backport), using a 4.18.12+ kernel.

  With the proposed patch, I do not see any wqe_err counts and ping does not stop.
  ethtool -S enp1s0f0 | grep wqe_err
       rx_wqe_err: 0
       rx0_wqe_err: 0
       rx1_wqe_err: 0
       rx2_wqe_err: 0
       rx3_wqe_err: 0
       rx4_wqe_err: 0
       rx5_wqe_err: 0
       rx6_wqe_err: 0
       rx7_wqe_err: 0
       rx8_wqe_err: 0
       rx9_wqe_err: 0
       rx10_wqe_err: 0
  ...
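
The second half of this check (ping keeps answering) can also be scripted. As a sketch under assumptions: `ping_received` is a hypothetical helper, and it relies on the summary line format seen in the ping output earlier in this report ("N packets transmitted, M received, ...").

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not from the bug report).
# Reads `ping` output on stdin and prints the number of replies received,
# taken from the "N packets transmitted, M received, ..." summary line.
# Prints nothing if no such line is present.
ping_received() {
    awk '
        /packets transmitted/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^received,?$/) { print $(i - 1); exit }
        }
    '
}
```

For example, `ping -c 5 33.33.33.33 | ping_received` (the count of 5 is an arbitrary choice) should print 0 once the interface has stopped responding, as in the failing ping shown in the steps to reproduce, and a non-zero count on a patched kernel.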

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1799393/+subscriptions
