[Kernel-packages] [Bug 1763325] Re: [bionic] ConnectX5 Large message size throughput degradation in TCP

2018-04-12 Thread Seth Forshee
*** This bug is a duplicate of bug 1763366 ***
https://bugs.launchpad.net/bugs/1763366

The fix was included in the 4.15.17 stable update. Duping this bug to
that one.

** This bug has been marked a duplicate of bug 1763366
   Bionic update to v4.15.17 stable release

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1763325

Title:
  [bionic] ConnectX5 Large message size throughput degradation in TCP

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  We see ~20% degradation on ConnectX-5/4 in the following case:
  TCP, 1 QP, 1 stream, unidirectional, single port.
  Message sizes of 1M and up show this degradation.

  Compared with running with TX moderation turned off, the default mode
  shows up to 40% packet rate and up to 23% bandwidth degradation.

  There is an upstream commit that fixes this issue; I will backport it
  and send it to kernel-t...@lists.ubuntu.com

  
  commit 48bfc39791b8b4a25f165e711f18b9c1617cefbc
  Author: Tal Gilboa
  Date:   Fri Mar 30 15:50:08 2018 -0700
  net/mlx5e: Set EQE based as default TX interrupt moderation mode
  
  The default TX moderation mode was mistakenly set to CQE based. The
  intention was to add a control ability in order to improve some specific
  use-cases. In general, we prefer to use EQE based moderation as it gives
  much better numbers for the common cases.

  CQE based causes a degradation in the common case since it resets the
  moderation timer on CQE generation. This causes an issue when TSO is
  well utilized (large TSO sessions). The timer is set to 16us so traffic
  of ~64KB TSO sessions per second would mean timer reset (CQE per TSO
  session -> long time between CQEs). In this case we quickly reach the
  tcp_limit_output_bytes (256KB by default) and cause a halt in TX traffic.

  By setting EQE based moderation we make sure the timer expires after
  16us regardless of the packet rate.
  This fixes an up to 40% packet rate and up to 23% bandwidth degradation.

  Fixes: 0088cbbc4b66 ("net/mlx5e: Enable CQE based moderation on TX CQ")
  Signed-off-by: Tal Gilboa 
  Signed-off-by: Saeed Mahameed 
  Signed-off-by: David S. Miller 

  diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
  index c71f4f10283b..0aab3afc6885 100644
  --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
  +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
  @@ -4137,7 +4137,7 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
   				struct mlx5e_params *params,
   				u16 max_channels, u16 mtu)
   {
  -	u8 cq_period_mode = 0;
  +	u8 rx_cq_period_mode;
   
   	params->sw_mtu = mtu;
   	params->hard_mtu = MLX5E_ETH_HARD_MTU;
  @@ -4173,12 +4173,12 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
   	params->lro_timeout = mlx5e_choose_lro_timeout(mdev, MLX5E_DEFAULT_LRO_TIMEOUT);
   
   	/* CQ moderation params */
  -	cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
  +	rx_cq_period_mode = MLX5_CAP_GEN(mdev, cq_period_start_from_cqe) ?
   			MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
   			MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
   	params->rx_dim_enabled = MLX5_CAP_GEN(mdev, cq_moderation);
  -	mlx5e_set_rx_cq_mode_params(params, cq_period_mode);
  -	mlx5e_set_tx_cq_mode_params(params, cq_period_mode);
  +	mlx5e_set_rx_cq_mode_params(params, rx_cq_period_mode);
  +	mlx5e_set_tx_cq_mode_params(params, MLX5_CQ_PERIOD_MODE_START_FROM_EQE);
   
   	/* TX inline */
   	params->tx_min_inline_mode = mlx5e_params_calculate_tx_min_inline(mdev);
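  For context, the stall the commit message describes follows from simple
  arithmetic. The numbers below are taken from the commit text; this is an
  illustrative sketch, not driver behaviour:

```shell
# One CQE per ~64 KiB TSO session; tcp_limit_output_bytes defaults to 256 KiB.
tso_session_bytes=$((64 * 1024))
tcp_limit_output_bytes=$((256 * 1024))

# How many TSO sessions TCP can queue before it stalls waiting for a
# completion interrupt (which CQE-based moderation keeps deferring by
# restarting the 16us timer at each CQE):
echo $((tcp_limit_output_bytes / tso_session_bytes))   # prints 4
```

  With EQE-based moderation the timer fires after at most 16us regardless of
  CQE arrival, so completions are never deferred past that bound.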

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1763325/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1763325] Re: [bionic] ConnectX5 Large message size throughput degradation in TCP

2018-04-12 Thread Talat Batheesh
I tested this patch with bionic and it works properly.

Before applying the patch:

# ethtool --show-priv-flags enp6s0f0 
Private flags for enp6s0f0:  
rx_cqe_moder   : on  
tx_cqe_moder   : on  
rx_cqe_compress: off 

After applying the patch:

# ethtool --show-priv-flags enp6s0f0 
Private flags for enp6s0f0:  
rx_cqe_moder   : on  
tx_cqe_moder   : off 
rx_cqe_compress: off
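
For reference, a kernel without this patch can be switched to the same
behaviour at runtime through the mlx5 private flag shown above. A sketch,
assuming an mlx5 interface named enp6s0f0 as in the output above:

```shell
# Disable CQE-based TX moderation, i.e. fall back to EQE-based moderation,
# the default this patch restores (interface name is an example):
ethtool --set-priv-flags enp6s0f0 tx_cqe_moder off

# Confirm the flag changed
ethtool --show-priv-flags enp6s0f0
```

The setting does not persist across reboots, so it is only a stopgap until
the patched kernel lands.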
