date:20171104

Alert **acj-ymca.org

2017-11-04 Thread acj-ymca.orgr

Dear email user,

Your mailbox has exceeded the storage limit set by the administrator. You
may not be able to send or receive new mail until you revalidate your
mailbox. To revalidate your mailbox, please send the following details:

Name:
Username:
Password:
Retype password:
Email address:
Phone number:

If you do not revalidate your mailbox, it will be deactivated!!!

Thank you,
System Administrator

Re: [PATCH] net/mlx5e/core/en_fs: fix pointer dereference after free in mlx5e_execute_l2_action

2017-11-04 Thread Saeed Mahameed

On Sat, Nov 4, 2017 at 8:54 PM, Gustavo A. R. Silva
 wrote:
> hn is being kfree'd in mlx5e_del_l2_from_hash and then dereferenced
> by accessing hn->ai.addr
>
> Fix this by copying the MAC address into a local variable for its safe use
> in all possible execution paths within function mlx5e_execute_l2_action.
>
> Addresses-Coverity-ID: 1417789
> Fixes: eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS")
> Signed-off-by: Gustavo A. R. Silva 

Acked-by: Saeed Mahameed 

Looks good.
Thank you Gustavo.

> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
> index 850cdc9..4837045 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
> @@ -365,21 +365,24 @@ static void mlx5e_execute_l2_action(struct mlx5e_priv 
> *priv,
> struct mlx5e_l2_hash_node *hn)
>  {
> u8 action = hn->action;
> +   u8 mac_addr[ETH_ALEN];
> int l2_err = 0;
>
> +   ether_addr_copy(mac_addr, hn->ai.addr);
> +
> switch (action) {
> case MLX5E_ACTION_ADD:
> mlx5e_add_l2_flow_rule(priv, &hn->ai, MLX5E_FULLMATCH);
> -   if (!is_multicast_ether_addr(hn->ai.addr)) {
> -   l2_err = mlx5_mpfs_add_mac(priv->mdev, hn->ai.addr);
> +   if (!is_multicast_ether_addr(mac_addr)) {
> +   l2_err = mlx5_mpfs_add_mac(priv->mdev, mac_addr);
> hn->mpfs = !l2_err;
> }
> hn->action = MLX5E_ACTION_NONE;
> break;
>
> case MLX5E_ACTION_DEL:
> -   if (!is_multicast_ether_addr(hn->ai.addr) && hn->mpfs)
> -   l2_err = mlx5_mpfs_del_mac(priv->mdev, hn->ai.addr);
> +   if (!is_multicast_ether_addr(mac_addr) && hn->mpfs)
> +   l2_err = mlx5_mpfs_del_mac(priv->mdev, mac_addr);
> mlx5e_del_l2_flow_rule(priv, &hn->ai);
> mlx5e_del_l2_from_hash(hn);
> break;
> @@ -387,7 +390,7 @@ static void mlx5e_execute_l2_action(struct mlx5e_priv 
> *priv,
>
> if (l2_err)
> netdev_warn(priv->netdev, "MPFS, failed to %s mac %pM, 
> err(%d)\n",
> -   action == MLX5E_ACTION_ADD ? "add" : "del", 
> hn->ai.addr, l2_err);
> +   action == MLX5E_ACTION_ADD ? "add" : "del", 
> mac_addr, l2_err);
>  }
>
>  static void mlx5e_sync_netdev_addr(struct mlx5e_priv *priv)
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
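
The fix above follows a common pattern: copy whatever is still needed out of an object before the call that may free it. A minimal userspace sketch of that pattern, under the assumption that `struct l2_node`, `node_del()` and `node_del_and_report()` are hypothetical stand-ins (they are not the driver's types or functions):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for the driver's L2 hash node. */
struct l2_node {
	uint8_t addr[ETH_ALEN];
	int action;
};

/* Hypothetical stand-in for a helper that unlinks and frees the node. */
static void node_del(struct l2_node *n)
{
	free(n);
}

/*
 * Delete the node and hand its address back through 'out'.
 * The address is copied into a local buffer while the node is still
 * valid, so nothing reads n->addr after node_del() has freed it.
 */
static void node_del_and_report(struct l2_node *n, uint8_t out[ETH_ALEN])
{
	uint8_t mac[ETH_ALEN];

	memcpy(mac, n->addr, ETH_ALEN);	/* copy while n is still valid */
	node_del(n);			/* n must not be touched after this */
	memcpy(out, mac, ETH_ALEN);	/* safe: reads the local copy only */
}
```

The local copy costs six bytes of stack and removes the need to reason about which execution path freed the node.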

[net-next V2 01/12] net/dcb: Add dscp to priority selector type

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

IEEE specification P802.1Qcd/D2.1 defines priority selector 5.
This APP TLV selector defines DSCP to priority map.
This patch defines such DSCP selector.

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 include/uapi/linux/dcbnl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h
index b6170a6af7c2..2c0c6453c3f4 100644
--- a/include/uapi/linux/dcbnl.h
+++ b/include/uapi/linux/dcbnl.h
@@ -206,6 +206,7 @@ struct cee_pfc {
 #define IEEE_8021QAZ_APP_SEL_STREAM 2
 #define IEEE_8021QAZ_APP_SEL_DGRAM 3
 #define IEEE_8021QAZ_APP_SEL_ANY   4
+#define IEEE_8021QAZ_APP_SEL_DSCP   5
 
 /* This structure contains the IEEE 802.1Qaz APP managed object. This
  * object is also used for the CEE std as well.
-- 
2.14.2

[net-next V2 05/12] net/mlx5e: Add dcbnl dscp to priority support

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

This patch implements dcbnl hooks to set and delete DSCP to priority map
as defined by the DCB subsystem. Device maintains internal trust state
which needs to be set to DSCP state for performing DSCP to priority mapping.

When the first dscp to priority APP entry is added by the user, the
trust state is changed to dscp.

When the last dscp to priority APP entry is deleted by the user, the
trust state is changed to pcp.

If the user sends multiple dscp to priority APP entries on the same dscp,
the last one sent takes effect. All previously sent entries are
deleted.

The dscp to priority APP entries are added and deleted in the net/dcb
APP database using dcb_ieee_setapp/getapp.
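
The entry counting described above can be sketched in plain C. This is only an illustration of the rule (first entry switches to dscp trust, last deletion switches back to pcp, a later entry for the same dscp replaces the earlier one). `struct dscp_map`, `PRIO_UNSET` and the function names are hypothetical and do not come from the patch:

```c
#include <stdint.h>

#define MAX_DSCP   64
#define PRIO_UNSET 0xff	/* hypothetical marker: no APP entry for this dscp */

enum trust_state { TRUST_PCP, TRUST_DSCP };

struct dscp_map {
	uint8_t prio[MAX_DSCP];	/* priority per dscp, or PRIO_UNSET */
	uint8_t app_cnt;	/* number of dscp values that have an entry */
	enum trust_state trust;
};

static void dscp_map_init(struct dscp_map *m)
{
	for (int i = 0; i < MAX_DSCP; i++)
		m->prio[i] = PRIO_UNSET;
	m->app_cnt = 0;
	m->trust = TRUST_PCP;
}

/* Add or replace the entry for 'dscp'. Returns 0 on success, -1 on bad input. */
static int dscp_map_set(struct dscp_map *m, uint8_t dscp, uint8_t prio)
{
	if (dscp >= MAX_DSCP || prio > 7)
		return -1;

	if (m->prio[dscp] == PRIO_UNSET) {
		if (m->app_cnt == 0)
			m->trust = TRUST_DSCP;	/* first entry added */
		m->app_cnt++;
	}
	m->prio[dscp] = prio;	/* a later entry replaces the earlier one */
	return 0;
}

/* Delete the entry for 'dscp'. Returns 0 on success, -1 if there is none. */
static int dscp_map_del(struct dscp_map *m, uint8_t dscp)
{
	if (dscp >= MAX_DSCP || m->prio[dscp] == PRIO_UNSET)
		return -1;

	m->prio[dscp] = PRIO_UNSET;
	if (--m->app_cnt == 0)
		m->trust = TRUST_PCP;	/* last entry deleted */
	return 0;
}
```

Counting distinct dscp values rather than set calls is what keeps a replaced entry from inflating the count.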

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  15 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 204 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  15 +-
 3 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e613ce02216d..ab6f0c18850f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -57,6 +57,7 @@
 #define MLX5E_HW2SW_MTU(priv, hwmtu) ((hwmtu) - ((priv)->hard_mtu))
 #define MLX5E_SW2HW_MTU(priv, swmtu) ((swmtu) + ((priv)->hard_mtu))
 
+#define MLX5E_MAX_DSCP  64
 #define MLX5E_MAX_NUM_TC   8
 
 #define MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE 0x6
@@ -260,11 +261,17 @@ enum {
 struct mlx5e_dcbx {
enum mlx5_dcbx_oper_mode   mode;
 struct mlx5e_cee_config cee_cfg; /* pending configuration */
+   u8 dscp_app_cnt;
 
/* The only setting that cannot be read from FW */
u8 tc_tsa[IEEE_8021QAZ_MAX_TCS];
u8 cap;
 };
+
+struct mlx5e_dcbx_dp {
+   u8 dscp2prio[MLX5E_MAX_DSCP];
+   u8 trust_state;
+};
 #endif
 
 enum {
@@ -742,6 +749,9 @@ struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
+#ifdef CONFIG_MLX5_CORE_EN_DCB
+   struct mlx5e_dcbx_dp   dcbx_dp;
+#endif
/* priv data path fields - end */
 
unsigned long  state;
@@ -800,6 +810,8 @@ struct mlx5e_profile {
mlx5e_fp_handle_rx_cqe handle_rx_cqe;
mlx5e_fp_handle_rx_cqe handle_rx_cqe_mpwqe;
} rx_handlers;
+   void(*netdev_registered_init)(struct mlx5e_priv *priv);
+   void(*netdev_registered_remove)(struct mlx5e_priv *priv);
int max_tc;
 };
 
@@ -968,6 +980,8 @@ extern const struct ethtool_ops mlx5e_ethtool_ops;
 extern const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops;
 int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets 
*ets);
 void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv);
+void mlx5e_dcbnl_init_app(struct mlx5e_priv *priv);
+void mlx5e_dcbnl_delete_app(struct mlx5e_priv *priv);
 #endif
 
 #ifndef CONFIG_RFS_ACCEL
@@ -1069,5 +1083,4 @@ void mlx5e_destroy_netdev(struct mlx5e_priv *priv);
 void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
-
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 51c4cc00a186..aa59c4324159 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -46,6 +46,13 @@ enum {
MLX5E_LOWEST_PRIO_GROUP   = 0,
 };
 
+#define MLX5_DSCP_SUPPORTED(mdev) (MLX5_CAP_GEN(mdev, qcam_reg)  && \
+  MLX5_CAP_QCAM_REG(mdev, qpts) && \
+  MLX5_CAP_QCAM_REG(mdev, qpdpm))
+
+static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state);
+static int mlx5e_set_dscp2prio(struct mlx5e_priv *priv, u8 dscp, u8 prio);
+
 /* If dcbx mode is non-host set the dcbx mode to host.
  */
 static int mlx5e_dcbnl_set_dcbx_mode(struct mlx5e_priv *priv,
@@ -381,6 +388,113 @@ static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 
mode)
return 0;
 }
 
+static int mlx5e_dcbnl_ieee_setapp(struct net_device *dev, struct dcb_app *app)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct dcb_app temp;
+   bool is_new;
+   int err;
+
+   if (app->selector != IEEE_8021QAZ_APP_SEL_DSCP)
+   return -EINVAL;
+
+   if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager))
+   return -EINVAL;
+
+   if (!MLX5_DSCP_SUPPORTED(priv->mdev))
+   return -EINVAL;
+
+   if (app->

[net-next V2 07/12] net/mlx5e: Add support for ethtool msglvl support

2017-11-04 Thread Saeed Mahameed

From: Gal Pressman 

Use ethtool -s <devname> msglvl <type> on/off to toggle debug messages.

Signed-off-by: Gal Pressman 
Signed-off-by: Inbar Karmy 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 11 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 13 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c|  1 +
 3 files changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index fae7b62d173f..8c872e2e1aa0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -127,6 +127,16 @@
 
 #define MLX5E_NUM_MAIN_GROUPS 9
 
+#define MLX5E_MSG_LEVEL NETIF_MSG_LINK
+
+#define mlx5e_dbg(mlevel, priv, format, ...)\
+do {\
+   if (NETIF_MSG_##mlevel & (priv)->msglevel)  \
+   netdev_warn(priv->netdev, format,   \
+   ##__VA_ARGS__); \
+} while (0)
+
+
 static inline u16 mlx5_min_rx_wqes(int wq_type, u32 wq_size)
 {
switch (wq_type) {
@@ -754,6 +764,7 @@ struct mlx5e_priv {
 #endif
/* priv data path fields - end */
 
+   u32 msglevel;
unsigned long  state;
struct mutex   state_lock; /* Protects Interface state */
 struct mlx5e_rq drop_rq;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index b34aa8efb036..63d1ac695a75 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1340,6 +1340,16 @@ static int mlx5e_set_wol(struct net_device *netdev, 
struct ethtool_wolinfo *wol)
return mlx5_set_port_wol(mdev, mlx5_wol_mode);
 }
 
+static u32 mlx5e_get_msglevel(struct net_device *dev)
+{
+   return ((struct mlx5e_priv *)netdev_priv(dev))->msglevel;
+}
+
+static void mlx5e_set_msglevel(struct net_device *dev, u32 val)
+{
+   ((struct mlx5e_priv *)netdev_priv(dev))->msglevel = val;
+}
+
 static int mlx5e_set_phys_id(struct net_device *dev,
 enum ethtool_phys_id_state state)
 {
@@ -1672,4 +1682,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
.get_priv_flags= mlx5e_get_priv_flags,
.set_priv_flags= mlx5e_set_priv_flags,
.self_test = mlx5e_self_test,
+   .get_msglevel  = mlx5e_get_msglevel,
+   .set_msglevel  = mlx5e_set_msglevel,
+
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a97ee38143aa..73d7c672c4ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4091,6 +4091,7 @@ static void mlx5e_build_nic_netdev_priv(struct 
mlx5_core_dev *mdev,
priv->netdev  = netdev;
priv->profile = profile;
priv->ppriv   = ppriv;
+   priv->msglevel= MLX5E_MSG_LEVEL;
priv->hard_mtu = MLX5E_ETH_HARD_MTU;
 
mlx5e_build_nic_params(mdev, &priv->channels.params, 
profile->max_nch(mdev));
-- 
2.14.2
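
For readers unfamiliar with message-level gating, here is a small userspace sketch of the same idea as the mlx5e_dbg() macro in the patch: a per-device bitmask decides whether a message of a given class is emitted. The names `sketch_dev`, `sketch_dbg`, `MSG_LINK` and `MSG_HW` are hypothetical, the bit values are illustrative, and the `printed` counter exists only so the behaviour can be checked:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative message-class bits, one bit per class. */
#define MSG_LINK 0x0004
#define MSG_HW   0x2000

/* Hypothetical device state holding the enabled message classes. */
struct sketch_dev {
	uint32_t msglevel;	/* bitmask of enabled message classes */
	unsigned int printed;	/* messages actually emitted */
};

/*
 * Emit the message only if class 'mlevel' is enabled on device 'd'.
 * MSG_##mlevel pastes the class name onto the prefix, so callers
 * write sketch_dbg(HW, ...) rather than sketch_dbg(MSG_HW, ...).
 */
#define sketch_dbg(mlevel, d, ...)			\
do {							\
	if (MSG_##mlevel & (d)->msglevel) {		\
		fprintf(stderr, __VA_ARGS__);		\
		(d)->printed++;				\
	}						\
} while (0)
```

The do/while (0) wrapper lets the macro be used as a single statement, for example in an unbraced if/else.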

[net-next V2 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware
command.

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Reviewed-by: Eli Cohen 
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/device.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 6d79b3f79458..409ffb14298a 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -49,11 +49,15 @@
 #define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
 #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
 #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
+#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
 #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
 #define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
+#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - 
(__mlx5_bit_off(typ, fld) & 0xf))
 #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - 
(__mlx5_bit_off(typ, fld) & 0x1f))
 #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
 #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << 
__mlx5_dw_bit_off(typ, fld))
+#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_mask(typ, fld) (__mlx5_mask16(typ, fld) << 
__mlx5_16_bit_off(typ, fld))
 #define __mlx5_st_sz_bits(typ) sizeof(struct mlx5_ifc_##typ##_bits)
 
 #define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
@@ -116,6 +120,19 @@ __mlx5_mask(typ, fld))
___t; \
 })
 
+#define MLX5_GET16(typ, p, fld) ((be16_to_cpu(*((__be16 *)(p) +\
+__mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
+__mlx5_mask16(typ, fld))
+
+#define MLX5_SET16(typ, p, fld, v) do { \
+   u16 _v = v; \
+   BUILD_BUG_ON(__mlx5_st_sz_bits(typ) % 16); \
+   *((__be16 *)(p) + __mlx5_16_off(typ, fld)) = \
+   cpu_to_be16((be16_to_cpu(*((__be16 *)(p) + __mlx5_16_off(typ, fld))) & \
+(~__mlx5_16_mask(typ, fld))) | (((_v) & __mlx5_mask16(typ, 
fld)) \
+<< __mlx5_16_bit_off(typ, fld))); \
+} while (0)
+
 /* Big endian getters */
 #define MLX5_GET64_BE(typ, p, fld) (*((__be64 *)(p) +\
__mlx5_64_off(typ, fld)))
-- 
2.14.2
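
The arithmetic behind the 16-bit accessors can be tried outside the kernel. The sketch below assumes a byte buffer in big-endian layout, with field offsets given in bits counted from the most significant bit, and fields that do not cross a 16-bit boundary. The helper names (`load_be16`, `get16`, `set16`, ...) are hypothetical and take explicit offset and size arguments instead of deriving them from a structure layout as the macros do:

```c
#include <stdint.h>

/* Read a big-endian 16-bit word from two bytes. */
static uint16_t load_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Write a 16-bit word as two big-endian bytes. */
static void store_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

/* Shift that moves a field of bit_sz bits at bit_off down to bit 0. */
static unsigned int fld_shift(unsigned int bit_off, unsigned int bit_sz)
{
	return 16 - bit_sz - (bit_off & 0xf);
}

/* Mask of bit_sz low bits, bit_sz in 1..16. */
static uint16_t fld_mask(unsigned int bit_sz)
{
	return (uint16_t)((1u << bit_sz) - 1);
}

/* Extract the field from the 16-bit word that contains it. */
static uint16_t get16(const uint8_t *buf, unsigned int bit_off,
		      unsigned int bit_sz)
{
	uint16_t w = load_be16(buf + 2 * (bit_off / 16));

	return (uint16_t)((w >> fld_shift(bit_off, bit_sz)) & fld_mask(bit_sz));
}

/* Read-modify-write the word so neighbouring fields are preserved. */
static void set16(uint8_t *buf, unsigned int bit_off, unsigned int bit_sz,
		  uint16_t v)
{
	uint8_t *p = buf + 2 * (bit_off / 16);
	unsigned int shift = fld_shift(bit_off, bit_sz);
	uint16_t mask = (uint16_t)(fld_mask(bit_sz) << shift);
	uint16_t w = load_be16(p);

	w = (uint16_t)((w & ~mask) | ((v & fld_mask(bit_sz)) << shift));
	store_be16(p, w);
}
```

For example, a 6-bit field at bit offset 20 lives in the second 16-bit word and is shifted left by 16 - 6 - 4 = 6 bits within it.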

[net-next V2 08/12] net/mlx5e: DCBNL, Add debug messages log

2017-11-04 Thread Saeed Mahameed

From: Inbar Karmy 

Add debug print when changing the configuration of QoS through dcbnl.
Use ethtool -s <devname> msglvl hw on/off to toggle debug messages.

Signed-off-by: Inbar Karmy 
Reviewed-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 24 +-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index b402d69a701b..c6d90b6dd80e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -241,7 +241,7 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, 
struct ieee_ets *ets)
u8 tc_tx_bw[IEEE_8021QAZ_MAX_TCS];
u8 tc_group[IEEE_8021QAZ_MAX_TCS];
int max_tc = mlx5_max_tc(mdev);
-   int err;
+   int err, i;
 
mlx5e_build_tc_group(ets, tc_group, max_tc);
mlx5e_build_tc_tx_bw(ets, tc_tx_bw, tc_group, max_tc);
@@ -260,6 +260,14 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, 
struct ieee_ets *ets)
return err;
 
memcpy(priv->dcbx.tc_tsa, ets->tc_tsa, sizeof(ets->tc_tsa));
+
+   for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+   mlx5e_dbg(HW, priv, "%s: prio_%d <=> tc_%d\n",
+ __func__, i, ets->prio_tc[i]);
+   mlx5e_dbg(HW, priv, "%s: tc_%d <=> tx_bw_%d%%, group_%d\n",
+ __func__, i, tc_tx_bw[i], tc_group[i]);
+   }
+
return err;
 }
 
@@ -345,6 +353,11 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
mlx5_toggle_port_link(mdev);
 
+   if (!ret) {
+   mlx5e_dbg(HW, priv,
+ "%s: PFC per priority bit mask: 0x%x\n",
+ __func__, pfc->pfc_en);
+   }
return ret;
 }
 
@@ -560,6 +573,11 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device 
*netdev,
}
}
 
+   for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+   mlx5e_dbg(HW, priv, "%s: tc_%d <=> max_bw %d Gbps\n",
+ __func__, i, max_bw_value[i]);
+   }
+
return mlx5_modify_port_ets_rate_limit(mdev, max_bw_value, max_bw_unit);
 }
 
@@ -585,6 +603,10 @@ static u8 mlx5e_dcbnl_setall(struct net_device *netdev)
ets.tc_rx_bw[i] = cee_cfg->pg_bw_pct[i];
ets.tc_tsa[i]   = IEEE_8021QAZ_TSA_ETS;
ets.prio_tc[i]  = cee_cfg->prio_to_pg_map[i];
+   mlx5e_dbg(HW, priv,
+ "%s: Priority group %d: tx_bw %d, rx_bw %d, prio_tc 
%d\n",
+ __func__, i, ets.tc_tx_bw[i], ets.tc_rx_bw[i],
+ ets.prio_tc[i]);
}
 
err = mlx5e_dbcnl_validate_ets(netdev, &ets);
-- 
2.14.2

[net-next V2 10/12] net/mlx5: Initialize destination_flow struct to 0

2017-11-04 Thread Saeed Mahameed

From: Rabie Loulou 

This is needed in order to enlarge it with more members that will get
value of 0 when not set.
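
A small sketch of why the initializer matters when a structure grows. `struct flow_dest` and its members are hypothetical stand-ins, not the driver's structure. The patch uses the empty-brace form `= {}`, which is a GNU C extension (standardized in C23); the sketch uses `{0}`, which has the same effect for this purpose in any standard C:

```c
/* Hypothetical stand-in for a destination descriptor. */
struct flow_dest {
	int type;
	unsigned int tir_num;
	unsigned int new_member;	/* imagine this was added later */
};

/*
 * Callers written before new_member existed only set type and tir_num.
 * Because dest is zero-initialized, new_member is 0 rather than stack
 * garbage, so the old callers need no change when the struct grows.
 */
static struct flow_dest make_dest(int type, unsigned int tir_num)
{
	struct flow_dest dest = {0};	/* every member starts at zero */

	dest.type = type;
	dest.tir_num = tir_num;
	return dest;
}
```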

Signed-off-by: Rabie Loulou 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  | 10 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c|  8 
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  4 ++--
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 12d3ced61114..610d485c4b03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -92,7 +92,7 @@ static enum mlx5e_traffic_types arfs_get_tt(enum arfs_type 
type)
 
 static int arfs_disable(struct mlx5e_priv *priv)
 {
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
struct mlx5e_tir *tir = priv->indir_tir;
int err = 0;
int tt;
@@ -126,7 +126,7 @@ int mlx5e_arfs_disable(struct mlx5e_priv *priv)
 
 int mlx5e_arfs_enable(struct mlx5e_priv *priv)
 {
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
int err = 0;
int tt;
int i;
@@ -175,7 +175,7 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv,
 {
struct arfs_table *arfs_t = &priv->fs.arfs.arfs_tables[type];
struct mlx5e_tir *tir = priv->indir_tir;
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
MLX5_DECLARE_FLOW_ACT(flow_act);
struct mlx5_flow_spec *spec;
enum mlx5e_traffic_types tt;
@@ -466,7 +466,7 @@ static struct mlx5_flow_handle *arfs_add_rule(struct 
mlx5e_priv *priv,
struct mlx5e_arfs_tables *arfs = &priv->fs.arfs;
struct arfs_tuple *tuple = &arfs_rule->tuple;
struct mlx5_flow_handle *rule = NULL;
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
MLX5_DECLARE_FLOW_ACT(flow_act);
struct arfs_table *arfs_table;
struct mlx5_flow_spec *spec;
@@ -557,7 +557,7 @@ static struct mlx5_flow_handle *arfs_add_rule(struct 
mlx5e_priv *priv,
 static void arfs_modify_rule_rq(struct mlx5e_priv *priv,
struct mlx5_flow_handle *rule, u16 rxq)
 {
-   struct mlx5_flow_destination dst;
+   struct mlx5_flow_destination dst = {};
int err = 0;
 
dst.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 850cdc980ab5..8016c8aa946d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -162,7 +162,7 @@ static int __mlx5e_add_vlan_rule(struct mlx5e_priv *priv,
 u16 vid, struct mlx5_flow_spec *spec)
 {
struct mlx5_flow_table *ft = priv->fs.vlan.ft.t;
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
struct mlx5_flow_handle **rule_p;
MLX5_DECLARE_FLOW_ACT(flow_act);
int err = 0;
@@ -738,7 +738,7 @@ mlx5e_generate_ttc_rule(struct mlx5e_priv *priv,
 
 static int mlx5e_generate_ttc_table_rules(struct mlx5e_priv *priv)
 {
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
struct mlx5e_ttc_table *ttc;
struct mlx5_flow_handle **rules;
struct mlx5_flow_table *ft;
@@ -909,7 +909,7 @@ mlx5e_generate_inner_ttc_rule(struct mlx5e_priv *priv,
 
 static int mlx5e_generate_inner_ttc_table_rules(struct mlx5e_priv *priv)
 {
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
struct mlx5_flow_handle **rules;
struct mlx5e_ttc_table *ttc;
struct mlx5_flow_table *ft;
@@ -1106,7 +1106,7 @@ static int mlx5e_add_l2_flow_rule(struct mlx5e_priv *priv,
  struct mlx5e_l2_rule *ai, int type)
 {
struct mlx5_flow_table *ft = priv->fs.l2.ft.t;
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
MLX5_DECLARE_FLOW_ACT(flow_act);
struct mlx5_flow_spec *spec;
int err = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index c77f4c0c7769..bbb140f517c4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -157,7 +157,7 @@ __esw_fdb_set_vport_rule(struct mlx5_eswitch *esw, u32 
vport, bool rx_rule,
MLX5_MATCH_OUTER_HEADERS);
struct mlx5_flow_handle *flow_rule = NULL;
struct mlx5_flow_act flow_act = {0

[pull request][net-next V2 00/12] Mellanox, mlx5 updates 2017-11-04

2017-11-04 Thread Saeed Mahameed

Hi Dave,

The following series provides updates for mlx5 driver which includes
dscp to priority mapping support and some other misc small changes.

For extra information please see tag log below.

Please pull and let me know if there's any problem.

V1->V2:
- Add missing Reviewed-by tags.

Thanks,
Saeed.

---

The following changes since commit 27c565ae9d554fa1c00c799754cff43476c8d3b5:

  ipv6: remove IN6_ADDR_HSIZE from addrconf.h (2017-11-05 09:17:27 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-updates-2017-11-04

for you to fetch changes up to 0088cbbc4b66b287132a8a04b3e2509d44a6387c:

  net/mlx5e: Enable CQE based moderation on TX CQ (2017-11-04 21:27:15 -0700)


mlx5-updates-2017-11-04

This series includes:

From Huy: dscp to priority mapping for Ethernet packet.

===
First six patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packet. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. User can combine this feature with priority flow control (pfc)
feature to have priority flow control based on the dscp.

Firmware interface:
Mellanox firmware provides two control knobs for this feature:
  QPTS register allows changing the trust state between dscp and
  pcp mode. The default is pcp mode. Once in dscp mode, firmware will
  route the packet based on its dscp value if the dscp field exists.

  QPDPM register allows mapping a specific dscp (0 to 63) to a
  specific priority (0 to 7). By default, all the dscps are mapped to
  priority zero.

Software interface:
This feature is controlled via application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for
application priority TLV. This APP TLV selector defines DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In the mlx5 driver, we add support for net
dcb's getapp and setapp callbacks. The mlx5 driver only handles the selector
id 5 application entry (dscp application priority entry).
If the user sends multiple dscp to priority APP TLV entries on the same
dscp, the last one sent takes effect. All previously sent entries are
deleted.

The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.

When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.

When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features
such as xdpsq and icosq is not modified.
===
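
To make the mapping concrete, here is a hedged sketch of how a priority could be derived under each trust state. The dscp is the upper six bits of the IPv4 TOS byte (or IPv6 traffic class), and the pcp is the upper three bits of the VLAN TCI. `struct port_qos` and the function names are hypothetical, and falling back to priority 0 when the relevant header is absent is an assumption of this sketch:

```c
#include <stdint.h>

#define MAX_DSCP 64

enum trust { TRUST_PCP, TRUST_DSCP };

/* Hypothetical per-port QoS state. */
struct port_qos {
	enum trust trust;
	uint8_t dscp2prio[MAX_DSCP];	/* all zero by default */
};

/* dscp is the upper 6 bits of the TOS / traffic class byte. */
static uint8_t tos_to_dscp(uint8_t tos)
{
	return (uint8_t)(tos >> 2);
}

/* pcp is the upper 3 bits of the 16-bit VLAN TCI. */
static uint8_t vlan_tci_to_pcp(uint16_t tci)
{
	return (uint8_t)((tci >> 13) & 0x7);
}

/*
 * Pick the priority for a packet. Assumption of this sketch: a packet
 * without the header relevant to the current trust state gets priority 0.
 */
static uint8_t pick_priority(const struct port_qos *q,
			     int has_ip, uint8_t tos,
			     int has_vlan, uint16_t tci)
{
	if (q->trust == TRUST_DSCP)
		return has_ip ? q->dscp2prio[tos_to_dscp(tos)] : 0;

	return has_vlan ? vlan_tci_to_pcp(tci) : 0;
}
```

With a map entry of dscp 46 to priority 5, a packet whose TOS byte is 0xb8 (46 << 2) is steered to priority 5 in dscp trust state, regardless of any VLAN tag.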

In addition to the dscp series, some small misc changes are included as well:

From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic
From Or Gerlitz, Enlarge the NIC TC offload table size
From Rabie, Initialize destination_flow struct to 0
From Feras, Add inner TTC table to IPoIB flow steering
From Tal, Enable CQE based moderation on TX CQ

Thanks,
Saeed.


Feras Daoud (1):
  net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering

Gal Pressman (1):
  net/mlx5e: Add support for ethtool msglvl support

Huy Nguyen (6):
  net/dcb: Add dscp to priority selector type
  net/mlx5: QCAM register firmware command support
  net/mlx5: Add MLX5_SET16 and MLX5_GET16
  net/mlx5: QPTS and QPDPM register firmware command support
  net/mlx5e: Add dcbnl dscp to priority support
  net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ

Inbar Karmy (1):
  net/mlx5e: DCBNL, Add debug messages log

Or Gerlitz (1):
  net/mlx5: Enlarge the NIC TC offload table size

Rabie Loulou (1):
  net/mlx5: Initialize destination_flow struct to 0

Tal Gilboa (1):
  net/mlx5e: Enable CQE based moderation on TX CQ

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  39 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|  12 +
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 265 -
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  52 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c|  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  59 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |

[net-next V2 09/12] net/mlx5: Enlarge the NIC TC offload table size

2017-11-04 Thread Saeed Mahameed

From: Or Gerlitz 

The NIC TC offload table size was hard coded to 1k. Change it to be

  min(max NIC RX table size,
  min(max flow counters, 64k) * num flow groups)

where the max values are read from the firmware and the number of
flow groups is hard-coded as before this change.

We don't know upfront the division of flows to groups (== different masks).
This setup allows each group to be of size up to where we want to go
(when supported, all offloaded flows use counters). Thus, we don't expect
multiple occurrences for a group which in turn would add steering hops.

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 9ba1f72060aa..55979ec2e88a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -90,8 +90,8 @@ enum {
MLX5_HEADER_TYPE_NVGRE = 0x1,
 };
 
-#define MLX5E_TC_TABLE_NUM_ENTRIES 1024
 #define MLX5E_TC_TABLE_NUM_GROUPS 4
+#define MLX5E_TC_TABLE_MAX_GROUP_SIZE (1 << 16)
 
 struct mod_hdr_key {
int num_actions;
@@ -263,10 +263,21 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
}
 
if (IS_ERR_OR_NULL(priv->fs.tc.t)) {
+   int tc_grp_size, tc_tbl_size;
+   u32 max_flow_counter;
+
+   max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) 
<< 16) |
+   MLX5_CAP_GEN(dev, max_flow_counter_15_0);
+
+   tc_grp_size = min_t(int, max_flow_counter, 
MLX5E_TC_TABLE_MAX_GROUP_SIZE);
+
+   tc_tbl_size = min_t(int, tc_grp_size * 
MLX5E_TC_TABLE_NUM_GROUPS,
+   BIT(MLX5_CAP_FLOWTABLE_NIC_RX(dev, 
log_max_ft_size)));
+
priv->fs.tc.t =
mlx5_create_auto_grouped_flow_table(priv->fs.ns,
MLX5E_TC_PRIO,
-   
MLX5E_TC_TABLE_NUM_ENTRIES,
+   tc_tbl_size,

MLX5E_TC_TABLE_NUM_GROUPS,
0, 0);
if (IS_ERR(priv->fs.tc.t)) {
-- 
2.14.2
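
The sizing formula from the commit message can be checked with a few numbers. The sketch below restates it with unsigned arithmetic (the patch itself uses min_t(int, ...)). The function and macro names are hypothetical, and the capability values are passed in as plain arguments rather than read from firmware:

```c
#include <stdint.h>

#define TC_TABLE_NUM_GROUPS     4
#define TC_TABLE_MAX_GROUP_SIZE (1u << 16)	/* 64k */

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Combine a capability that is reported as two 16-bit halves. */
static uint32_t combine_counter_cap(uint16_t hi, uint16_t lo)
{
	return ((uint32_t)hi << 16) | lo;
}

/*
 * table size = min(max NIC RX table size,
 *                  min(max flow counters, 64k) * num flow groups)
 * log_max_ft_size is assumed to be below 32.
 */
static uint32_t tc_table_size(uint32_t max_flow_counters,
			      unsigned int log_max_ft_size)
{
	uint32_t grp = min_u32(max_flow_counters, TC_TABLE_MAX_GROUP_SIZE);
	uint32_t tbl = grp * TC_TABLE_NUM_GROUPS;	/* at most 2^18 */

	return min_u32(tbl, (uint32_t)1 << log_max_ft_size);
}
```

With 1000 flow counters and a 2^20 entry limit the table gets 4000 entries. With plentiful counters it is capped at 4 * 64k = 262144, or at the RX table limit if that is smaller.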

[net-next V2 06/12] net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

If the port is in DSCP trust state, packets are placed in the right
priority queue based on the dscp value. This is done by selecting
the transmit queue based on the dscp of the skb.

Until now select_queue honors priority only from the vlan header.
However that is not sufficient in cases where port trust state is DSCP
mode as packet might not even contain vlan header. Therefore if the port
is in dscp trust state and vport's min inline mode is not NONE,
copy the IP header to the eseg's inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features such
as xdpsq and icosq is not modified.
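
The inline-mode decision can be summarized as a small pure function. This is a sketch that mirrors the two steps visible in the diff for this patch (raise NONE to L2 when the device cannot insert VLAN tags itself, then raise L2 to IP under dscp trust). The enum and function names are hypothetical and the inputs are plain flags rather than device queries:

```c
/* Hypothetical inline modes, ordered from least to most header data. */
enum inline_mode {
	INLINE_NONE,
	INLINE_L2,
	INLINE_IP,
	INLINE_TCP_UDP,
};

/*
 * queried:         the minimum inline mode reported for the port
 * has_vlan_insert: nonzero if the device can insert VLAN tags itself
 * trust_dscp:      nonzero if the port is in dscp trust state
 */
static enum inline_mode tx_min_inline(enum inline_mode queried,
				      int has_vlan_insert, int trust_dscp)
{
	enum inline_mode mode = queried;

	/* Without hardware VLAN insertion at least the L2 header is inlined. */
	if (mode == INLINE_NONE && !has_vlan_insert)
		mode = INLINE_L2;

	/* Under dscp trust, L2 is raised to IP so the IP header is inlined. */
	if (trust_dscp && mode == INLINE_L2)
		mode = INLINE_IP;

	return mode;
}
```

Note that a mode of NONE stays NONE under dscp trust when VLAN insertion is available, and modes already at IP or above are left alone.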

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_common.c| 12 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 37 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  5 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 24 --
 5 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ab6f0c18850f..fae7b62d173f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1083,4 +1083,5 @@ void mlx5e_destroy_netdev(struct mlx5e_priv *priv);
 void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
+u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index 157d02917237..784e282803db 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -171,3 +171,15 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool 
enable_uc_lb)
 
return err;
 }
+
+u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev)
+{
+   u8 min_inline_mode;
+
+   mlx5_query_min_inline(mdev, &min_inline_mode);
+   if (min_inline_mode == MLX5_INLINE_MODE_NONE &&
+   !MLX5_CAP_ETH(mdev, wqe_vlan_insert))
+   min_inline_mode = MLX5_INLINE_MODE_L2;
+
+   return min_inline_mode;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index aa59c4324159..b402d69a701b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -960,6 +960,40 @@ void mlx5e_dcbnl_delete_app(struct mlx5e_priv *priv)
mlx5e_dcbnl_dscp_app(priv, DELETE);
 }
 
+static void mlx5e_trust_update_tx_min_inline_mode(struct mlx5e_priv *priv,
+ struct mlx5e_params *params)
+{
+   params->tx_min_inline_mode = 
mlx5e_params_calculate_tx_min_inline(priv->mdev);
+   if (priv->dcbx_dp.trust_state == MLX5_QPTS_TRUST_DSCP &&
+   params->tx_min_inline_mode == MLX5_INLINE_MODE_L2)
+   params->tx_min_inline_mode = MLX5_INLINE_MODE_IP;
+}
+
+static void mlx5e_trust_update_sq_inline_mode(struct mlx5e_priv *priv)
+{
+   struct mlx5e_channels new_channels = {};
+
+   mutex_lock(&priv->state_lock);
+
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
+   goto out;
+
+   new_channels.params = priv->channels.params;
+   mlx5e_trust_update_tx_min_inline_mode(priv, &new_channels.params);
+
+   /* Skip if tx_min_inline is the same */
+   if (new_channels.params.tx_min_inline_mode ==
+   priv->channels.params.tx_min_inline_mode)
+   goto out;
+
+   if (mlx5e_open_channels(priv, &new_channels))
+   goto out;
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
+
+out:
+   mutex_unlock(&priv->state_lock);
+}
+
 static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state)
 {
int err;
@@ -968,6 +1002,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, 
u8 trust_state)
if (err)
return err;
priv->dcbx_dp.trust_state = trust_state;
+   mlx5e_trust_update_sq_inline_mode(priv);
 
return err;
 }
@@ -996,6 +1031,8 @@ static int mlx5e_trust_initialize(struct mlx5e_priv *priv)
if (err)
return err;
 
+   mlx5e_trust_update_tx_min_inline_mode(priv, &priv->channels.params);
+
err = mlx5_query_dscp2prio(priv->mdev, priv->dcbx_dp.dscp2prio);
if (err)
return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8633476fb536..a97ee38143aa 100644
---

[net-next V2 11/12] net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering

2017-11-04 Thread Saeed Mahameed

From: Feras Daoud 

For supported platforms, add inner TTC flow table to enhanced IPoIB
flow steering.

Signed-off-by: Feras Daoud 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c   |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 12 +++-
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8c872e2e1aa0..95facdf62c77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1045,6 +1045,9 @@ void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct 
mlx5e_rqt *rqt);
 int mlx5e_create_ttc_table(struct mlx5e_priv *priv);
 void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv);
 
+int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv);
+void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv);
+
 int mlx5e_create_tis(struct mlx5_core_dev *mdev, int tc,
 u32 underlay_qpn, u32 *tisn);
 void mlx5e_destroy_tis(struct mlx5_core_dev *mdev, u32 tisn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 8016c8aa946d..f0d11ad05ed2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -1005,7 +1005,7 @@ static int mlx5e_create_inner_ttc_table_groups(struct 
mlx5e_ttc_table *ttc)
return err;
 }
 
-static int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv)
+int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv)
 {
struct mlx5e_ttc_table *ttc = &priv->fs.inner_ttc;
struct mlx5_flow_table_attr ft_attr = {};
@@ -1041,7 +1041,7 @@ static int mlx5e_create_inner_ttc_table(struct mlx5e_priv 
*priv)
return err;
 }
 
-static void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv)
+void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv)
 {
struct mlx5e_ttc_table *ttc = &priv->fs.inner_ttc;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index abf270d7f556..d2a66dc4adc6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -255,15 +255,24 @@ static int mlx5i_create_flow_steering(struct mlx5e_priv 
*priv)
priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
}
 
+   err = mlx5e_create_inner_ttc_table(priv);
+   if (err) {
+   netdev_err(priv->netdev, "Failed to create inner ttc table, 
err=%d\n",
+  err);
+   goto err_destroy_arfs_tables;
+   }
+
err = mlx5e_create_ttc_table(priv);
if (err) {
netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n",
   err);
-   goto err_destroy_arfs_tables;
+   goto err_destroy_inner_ttc_table;
}
 
return 0;
 
+err_destroy_inner_ttc_table:
+   mlx5e_destroy_inner_ttc_table(priv);
 err_destroy_arfs_tables:
mlx5e_arfs_destroy_tables(priv);
 
@@ -273,6 +282,7 @@ static int mlx5i_create_flow_steering(struct mlx5e_priv 
*priv)
 static void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv)
 {
mlx5e_destroy_ttc_table(priv);
+   mlx5e_destroy_inner_ttc_table(priv);
mlx5e_arfs_destroy_tables(priv);
 }
 
-- 
2.14.2

[net-next V2 04/12] net/mlx5: QPTS and QPDPM register firmware command support

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

The QPTS register allows changing the priority trust state between pcp and
dscp. Add support to get/set trust state from device. When the port is
in pcp/dscp trust state, packet is routed by hardware to matching priority
based on its pcp/dscp value respectively.

The QPDPM register allows changing the dscp to priority mapping. Add support
to get/set dscp to priority mapping from device.
Note that to change a dscp mapping, the "e" bit of this dscp structure
must be set in the QPDPM firmware command.
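
As a rough illustration of the query/modify/write sequence described above,
the following userspace sketch models the register as an array of 16-bit
entries. The bit positions (DSCP_E_BIT, DSCP_PRIO_MASK), struct fake_qpdpm and
the fake_dev_* helpers are assumptions made for illustration and are not taken
from the mlx5 headers; the only point is that an entry is applied when its "e"
bit is set.

```c
#include <stdint.h>

#define NUM_DSCP        64
#define DSCP_E_BIT      0x8000u  /* assumed position of the "e" bit */
#define DSCP_PRIO_MASK  0x000fu  /* assumed width of the prio field */

/* Hypothetical model of the register: one 16-bit entry per dscp. */
struct fake_qpdpm {
	uint16_t dscp[NUM_DSCP];
};

/* Simulated query: copy the device state into the caller's buffer. */
static void fake_dev_read(const struct fake_qpdpm *dev, struct fake_qpdpm *out)
{
	*out = *dev;
}

/* Simulated write: only entries with the "e" bit set are applied. */
static void fake_dev_write(struct fake_qpdpm *dev, const struct fake_qpdpm *in)
{
	int i;

	for (i = 0; i < NUM_DSCP; i++)
		if (in->dscp[i] & DSCP_E_BIT)
			dev->dscp[i] = in->dscp[i] & DSCP_PRIO_MASK;
}

/* Query the current mapping, update one entry with "e" set, write back. */
static int set_dscp2prio(struct fake_qpdpm *dev, uint8_t dscp, uint8_t prio)
{
	struct fake_qpdpm reg;

	if (dscp >= NUM_DSCP || prio > DSCP_PRIO_MASK)
		return -1;

	fake_dev_read(dev, &reg);
	reg.dscp[dscp] = (uint16_t)(DSCP_E_BIT | prio);
	fake_dev_write(dev, &reg);
	return 0;
}
```

In this model, entries read back from the device never carry the "e" bit, so
the write only touches the one entry that was explicitly updated.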

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 99 ++
 include/linux/mlx5/driver.h|  7 ++
 include/linux/mlx5/mlx5_ifc.h  | 20 ++
 include/linux/mlx5/port.h  |  5 ++
 4 files changed, 131 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index b6553be841f9..c37d00cd472a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -971,3 +971,102 @@ int mlx5_set_mtppse(struct mlx5_core_dev *mdev, u8 pin, 
u8 arm, u8 mode)
return mlx5_core_access_reg(mdev, in, sizeof(in), out,
sizeof(out), MLX5_REG_MTPPSE, 0, 1);
 }
+
+int mlx5_set_trust_state(struct mlx5_core_dev *mdev, u8 trust_state)
+{
+   u32 out[MLX5_ST_SZ_DW(qpts_reg)] = {};
+   u32 in[MLX5_ST_SZ_DW(qpts_reg)] = {};
+   int err;
+
+   MLX5_SET(qpts_reg, in, local_port, 1);
+   MLX5_SET(qpts_reg, in, trust_state, trust_state);
+
+   err = mlx5_core_access_reg(mdev, in, sizeof(in), out,
+  sizeof(out), MLX5_REG_QPTS, 0, 1);
+   return err;
+}
+
+int mlx5_query_trust_state(struct mlx5_core_dev *mdev, u8 *trust_state)
+{
+   u32 out[MLX5_ST_SZ_DW(qpts_reg)] = {};
+   u32 in[MLX5_ST_SZ_DW(qpts_reg)] = {};
+   int err;
+
+   MLX5_SET(qpts_reg, in, local_port, 1);
+
+   err = mlx5_core_access_reg(mdev, in, sizeof(in), out,
+  sizeof(out), MLX5_REG_QPTS, 0, 0);
+   if (!err)
+   *trust_state = MLX5_GET(qpts_reg, out, trust_state);
+
+   return err;
+}
+
+int mlx5_set_dscp2prio(struct mlx5_core_dev *mdev, u8 dscp, u8 prio)
+{
+   int sz = MLX5_ST_SZ_BYTES(qpdpm_reg);
+   void *qpdpm_dscp;
+   void *out;
+   void *in;
+   int err;
+
+   in = kzalloc(sz, GFP_KERNEL);
+   out = kzalloc(sz, GFP_KERNEL);
+   if (!in || !out) {
+   err = -ENOMEM;
+   goto out;
+   }
+
+   MLX5_SET(qpdpm_reg, in, local_port, 1);
+   err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 0);
+   if (err)
+   goto out;
+
+   memcpy(in, out, sz);
+   MLX5_SET(qpdpm_reg, in, local_port, 1);
+
+   /* Update the corresponding dscp entry */
+   qpdpm_dscp = MLX5_ADDR_OF(qpdpm_reg, in, dscp[dscp]);
+   MLX5_SET16(qpdpm_dscp_reg, qpdpm_dscp, prio, prio);
+   MLX5_SET16(qpdpm_dscp_reg, qpdpm_dscp, e, 1);
+   err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 1);
+
+out:
+   kfree(in);
+   kfree(out);
+   return err;
+}
+
+/* dscp2prio[i]: priority that dscp i mapped to */
+#define MLX5E_SUPPORTED_DSCP 64
+int mlx5_query_dscp2prio(struct mlx5_core_dev *mdev, u8 *dscp2prio)
+{
+   int sz = MLX5_ST_SZ_BYTES(qpdpm_reg);
+   void *qpdpm_dscp;
+   void *out;
+   void *in;
+   int err;
+   int i;
+
+   in = kzalloc(sz, GFP_KERNEL);
+   out = kzalloc(sz, GFP_KERNEL);
+   if (!in || !out) {
+   err = -ENOMEM;
+   goto out;
+   }
+
+   MLX5_SET(qpdpm_reg, in, local_port, 1);
+   err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 0);
+   if (err)
+   goto out;
+
+   for (i = 0; i < (MLX5E_SUPPORTED_DSCP); i++) {
+   qpdpm_dscp = MLX5_ADDR_OF(qpdpm_reg, out, dscp[i]);
+   dscp2prio[i] = MLX5_GET16(qpdpm_dscp_reg, qpdpm_dscp, prio);
+   }
+
+out:
+   kfree(in);
+   kfree(out);
+   return err;
+}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index ed5be52282ea..a886b51511ab 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -107,8 +107,10 @@ enum {
 };
 
 enum {
+   MLX5_REG_QPTS= 0x4002,
MLX5_REG_QETCR   = 0x4005,
MLX5_REG_QTCT= 0x400a,
+   MLX5_REG_QPDPM   = 0x4013,
MLX5_REG_QCAM= 0x4019,
MLX5_REG_DCBX_PARAM  = 0x4020,
MLX5_REG_DCBX_APP= 0x4021,
@@ -142,6 +144,11 @@ enum {
MLX5_REG_MCAM= 0x907f,
 };
 
+enum mlx5_qpts_trust_state {
+   MLX5_QPTS_TRUST_PCP  = 1,
+   MLX5_QPTS_TRUST_DSCP = 2,
+};
+
 enum mlx5_d

[net-next V2 12/12] net/mlx5e: Enable CQE based moderation on TX CQ

2017-11-04 Thread Saeed Mahameed

From: Tal Gilboa 

By using CQE based moderation on TX CQ we can reduce the TX
interrupt rate. Besides the benefit of fewer interrupts, this also
allows the kernel to better utilize TSO. Since TSO has some CPU overhead,
it might not aggregate when CPU is under high stress. By reducing the
interrupt rate and the CPU utilization, we can get better aggregation
and better overall throughput.
The feature is enabled by default and has a private flag in ethtool
for control.

Throughput, interrupt rate and TSO utilization improvements:
(ConnectX-4Lx 40GbE, unidirectional, 1/16 TCP streams, 64B packets)
---------------------------------------------------------
Metric   | Streams | CQE Based | EQE Based | improvement
---------------------------------------------------------
BW       | 1       | 2.4Gb/s   | 2.15Gb/s  | +11.6%
IR       | 1       | 27Kips    | 50.6Kips  | -46.7%
TSO Util | 1       | 74.6%     | 71%       | +5%
BW       | 16      | 29Gb/s    | 25.85Gb/s | +12.2%
IR       | 16      | 482Kips   | 745Kips   | -35.3%
TSO Util | 16      | 69.1%     | 49%       | +41.1%

*BW = Bandwidth, IR = Interrupt rate, ips = interrupts per second.
TSO Util = bytes in TSO sessions / all bytes transferred

Signed-off-by: Tal Gilboa 
Signed-off-by: Saeed Mahameed 

---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  9 +++--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 39 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 38 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |  8 +++--
 4 files changed, 71 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 95facdf62c77..751f62cae969 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -106,6 +106,7 @@
 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE 0x3
 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS  0x20
 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC  0x10
+#define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC_FROM_CQE 0x10
 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS  0x20
 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES0x80
 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES_MPW0x2
@@ -198,12 +199,14 @@ extern const char mlx5e_self_tests[][ETH_GSTRING_LEN];
 
 static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = {
"rx_cqe_moder",
+   "tx_cqe_moder",
"rx_cqe_compress",
 };
 
 enum mlx5e_priv_flag {
MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0),
-   MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 1),
+   MLX5E_PFLAG_TX_CQE_BASED_MODER = (1 << 1),
+   MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 2),
 };
 
 #define MLX5E_SET_PFLAG(params, pflag, enable) \
@@ -223,6 +226,7 @@ enum mlx5e_priv_flag {
 struct mlx5e_cq_moder {
u16 usec;
u16 pkts;
+   u8 cq_period_mode;
 };
 
 struct mlx5e_params {
@@ -234,7 +238,6 @@ struct mlx5e_params {
u8  log_rq_size;
u16 num_channels;
u8  num_tc;
-   u8  rx_cq_period_mode;
bool rx_cqe_compress_def;
struct mlx5e_cq_moder rx_cq_moderation;
struct mlx5e_cq_moder tx_cq_moderation;
@@ -926,6 +929,8 @@ void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, 
int len,
   int num_channels);
 int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
 
+void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params,
+u8 cq_period_mode);
 void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
 u8 cq_period_mode);
 void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 63d1ac695a75..23425f028405 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1454,29 +1454,36 @@ static int mlx5e_get_module_eeprom(struct net_device 
*netdev,
 
 typedef int (*mlx5e_pflag_handler)(struct net_device *netdev, bool enable);
 
-static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable)
+static int set_pflag_cqe_based_moder(struct net_device *netdev, bool enable,
+bool is_rx_cq)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5e_channels new_channels = {};
-   bool rx_mode_changed;
-   u8 rx_cq_period_mode;
+   bool mode_changed;
+   u8 cq_period_mode, current_cq_period_mode;
int err = 0;
 
-   rx_cq_period_mode = enable ?
+   cq_period_mode = enable ?
MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
-   rx_mode_changed = rx_cq_

[net-next V2 02/12] net/mlx5: QCAM register firmware command support

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

The QCAM register provides capability bits for all the QoS registers
using the ACCESS_REG command.

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Reviewed-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   | 10 ++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 12 +++
 include/linux/mlx5/device.h| 14 
 include/linux/mlx5/driver.h|  2 ++
 include/linux/mlx5/mlx5_ifc.h  | 40 +-
 6 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 2c71557d1cee..5ef1b56b6a96 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -106,6 +106,13 @@ static int mlx5_get_mcam_reg(struct mlx5_core_dev *dev)
   MLX5_MCAM_REGS_FIRST_128);
 }
 
+static int mlx5_get_qcam_reg(struct mlx5_core_dev *dev)
+{
+   return mlx5_query_qcam_reg(dev, dev->caps.qcam,
+  MLX5_QCAM_FEATURE_ENHANCED_FEATURES,
+  MLX5_QCAM_REGS_FIRST_128);
+}
+
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
 {
int err;
@@ -182,6 +189,9 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
if (MLX5_CAP_GEN(dev, mcam_reg))
mlx5_get_mcam_reg(dev);
 
+   if (MLX5_CAP_GEN(dev, qcam_reg))
+   mlx5_get_qcam_reg(dev);
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 8f00de2fe283..ff4a0b889a6f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -122,6 +122,8 @@ int mlx5_query_pcam_reg(struct mlx5_core_dev *dev, u32 
*pcam, u8 feature_group,
u8 access_reg_group);
 int mlx5_query_mcam_reg(struct mlx5_core_dev *dev, u32 *mcap, u8 feature_group,
u8 access_reg_group);
+int mlx5_query_qcam_reg(struct mlx5_core_dev *mdev, u32 *qcam,
+   u8 feature_group, u8 access_reg_group);
 
 void mlx5_lag_add(struct mlx5_core_dev *dev, struct net_device *netdev);
 void mlx5_lag_remove(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index e07061f565d6..b6553be841f9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -98,6 +98,18 @@ int mlx5_query_mcam_reg(struct mlx5_core_dev *dev, u32 
*mcam, u8 feature_group,
return mlx5_core_access_reg(dev, in, sz, mcam, sz, MLX5_REG_MCAM, 0, 0);
 }
 
+int mlx5_query_qcam_reg(struct mlx5_core_dev *mdev, u32 *qcam,
+   u8 feature_group, u8 access_reg_group)
+{
+   u32 in[MLX5_ST_SZ_DW(qcam_reg)] = {};
+   int sz = MLX5_ST_SZ_BYTES(qcam_reg);
+
+   MLX5_SET(qcam_reg, in, feature_group, feature_group);
+   MLX5_SET(qcam_reg, in, access_reg_group, access_reg_group);
+
+   return mlx5_core_access_reg(mdev, in, sz, qcam, sz, MLX5_REG_QCAM, 0, 
0);
+}
+
 struct mlx5_reg_pcap {
u8  rsvd0;
u8  port_num;
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index e32dbc4934db..6d79b3f79458 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1000,6 +1000,14 @@ enum mlx5_mcam_feature_groups {
MLX5_MCAM_FEATURE_ENHANCED_FEATURES = 0x0,
 };
 
+enum mlx5_qcam_reg_groups {
+   MLX5_QCAM_REGS_FIRST_128= 0x0,
+};
+
+enum mlx5_qcam_feature_groups {
+   MLX5_QCAM_FEATURE_ENHANCED_FEATURES = 0x0,
+};
+
 /* GET Dev Caps macros */
 #define MLX5_CAP_GEN(mdev, cap) \
MLX5_GET(cmd_hca_cap, mdev->caps.hca_cur[MLX5_CAP_GENERAL], cap)
@@ -1108,6 +1116,12 @@ enum mlx5_mcam_feature_groups {
 #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \
MLX5_GET(mcam_reg, (mdev)->caps.mcam, 
mng_feature_cap_mask.enhanced_features.fld)
 
+#define MLX5_CAP_QCAM_REG(mdev, fld) \
+   MLX5_GET(qcam_reg, (mdev)->caps.qcam, 
qos_access_reg_cap_mask.reg_cap.fld)
+
+#define MLX5_CAP_QCAM_FEATURE(mdev, fld) \
+   MLX5_GET(qcam_reg, (mdev)->caps.qcam, 
qos_feature_cap_mask.feature_cap.fld)
+
 #define MLX5_CAP_FPGA(mdev, cap) \
MLX5_GET(fpga_cap, (mdev)->caps.fpga, cap)
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 08c77b7e59cb..ed5be52282ea 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -109,6 +109,7 @@ enum {
 enum {
MLX5_REG_QETCR   = 0x4005,
MLX5_REG_QTCT= 0x400a,
+   MLX5_REG_QCAM= 0x4019,
MLX5_REG_DCBX_PARAM  = 0x4020,

Re: [net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16

2017-11-04 Thread Saeed Mahameed

On Sat, Nov 4, 2017 at 5:55 AM, Or Gerlitz  wrote:
> On Sat, Nov 4, 2017 at 6:35 PM, David Miller  wrote:
>> From: Or Gerlitz 
>> Date: Sat, 4 Nov 2017 18:05:29 +0900
>>
>>> On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed  wrote:
>>>> From: Huy Nguyen
>>>>
>>>> Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware
>>>> command.
>>>>
>>>> Signed-off-by: Huy Nguyen
>>>> Reviewed-by: Parav Pandit
>>>> Signed-off-by: Saeed Mahameed
>>>
>>> This was reviewed by Eli Cohen, his R.B is missing here. Eli - if you
>>> can reply and add it such that
>>> patchworks will pick it up.
>>
>> Not if I pull from Saeed's tree, which is what I usually do for mlx5 
>> submissions.
>

Dave, I see you didn't pull yet, I can fix this.
I will send V2 shortly.

Thanks,
Saeed.

> So I guess Saeed's maintainer signature could be enough

[PATCH] net/mlx5e/core/en_fs: fix pointer dereference after free in mlx5e_execute_l2_action

2017-11-04 Thread Gustavo A. R. Silva

hn is being kfree'd in mlx5e_del_l2_from_hash and then dereferenced
by accessing hn->ai.addr

Fix this by copying the MAC address into a local variable for its safe use
in all possible execution paths within function mlx5e_execute_l2_action.

Addresses-Coverity-ID: 1417789
Fixes: eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS")
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 850cdc9..4837045 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -365,21 +365,24 @@ static void mlx5e_execute_l2_action(struct mlx5e_priv 
*priv,
struct mlx5e_l2_hash_node *hn)
 {
u8 action = hn->action;
+   u8 mac_addr[ETH_ALEN];
int l2_err = 0;
 
+   ether_addr_copy(mac_addr, hn->ai.addr);
+
switch (action) {
case MLX5E_ACTION_ADD:
mlx5e_add_l2_flow_rule(priv, &hn->ai, MLX5E_FULLMATCH);
-   if (!is_multicast_ether_addr(hn->ai.addr)) {
-   l2_err = mlx5_mpfs_add_mac(priv->mdev, hn->ai.addr);
+   if (!is_multicast_ether_addr(mac_addr)) {
+   l2_err = mlx5_mpfs_add_mac(priv->mdev, mac_addr);
hn->mpfs = !l2_err;
}
hn->action = MLX5E_ACTION_NONE;
break;
 
case MLX5E_ACTION_DEL:
-   if (!is_multicast_ether_addr(hn->ai.addr) && hn->mpfs)
-   l2_err = mlx5_mpfs_del_mac(priv->mdev, hn->ai.addr);
+   if (!is_multicast_ether_addr(mac_addr) && hn->mpfs)
+   l2_err = mlx5_mpfs_del_mac(priv->mdev, mac_addr);
mlx5e_del_l2_flow_rule(priv, &hn->ai);
mlx5e_del_l2_from_hash(hn);
break;
@@ -387,7 +390,7 @@ static void mlx5e_execute_l2_action(struct mlx5e_priv *priv,
 
if (l2_err)
netdev_warn(priv->netdev, "MPFS, failed to %s mac %pM, 
err(%d)\n",
-   action == MLX5E_ACTION_ADD ? "add" : "del", 
hn->ai.addr, l2_err);
+   action == MLX5E_ACTION_ADD ? "add" : "del", 
mac_addr, l2_err);
 }
 
 static void mlx5e_sync_netdev_addr(struct mlx5e_priv *priv)
-- 
2.7.4
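
The pattern this patch applies (copy what is still needed out of an object
before a path that may free it) can be sketched in plain userspace C as
follows. struct l2_node, l2_node_del() and l2_node_del_and_format() are
hypothetical stand-ins for illustration, not driver code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for struct mlx5e_l2_hash_node. */
struct l2_node {
	unsigned char addr[ETH_ALEN];
};

/* Frees the node, like the delete path described in the patch. */
static void l2_node_del(struct l2_node *n)
{
	free(n);
}

/*
 * Copy the address before the node can be freed, then use only the
 * local copy afterwards. Reading n->addr after l2_node_del() would be
 * a use-after-free.
 */
static void l2_node_del_and_format(struct l2_node *n, char *buf, size_t len)
{
	unsigned char mac[ETH_ALEN];

	memcpy(mac, n->addr, ETH_ALEN);
	l2_node_del(n);

	snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
		 mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
}
```

Taking the copy once at the top of the function, as the patch does, keeps
every later path safe without having to reason about which branch freed the
node.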

[PATCH v4 1/1] xdp: Sample xdp program implementing ip forward

2017-11-04 Thread Christina Jacob

From: Christina Jacob 

Implements port to port forwarding with route table and arp table
lookup for ipv4 packets using bpf_redirect helper function and
lpm_trie  map.

Signed-off-by: Christina Jacob 
---
 samples/bpf/Makefile   |   4 +
 samples/bpf/xdp_router_ipv4_kern.c | 186 +++
 samples/bpf/xdp_router_ipv4_user.c | 659 +
 3 files changed, 849 insertions(+)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index cf17c79..8504ebb 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -28,6 +28,7 @@ hostprogs-y += test_cgrp2_sock
 hostprogs-y += test_cgrp2_sock2
 hostprogs-y += xdp1
 hostprogs-y += xdp2
+hostprogs-y += xdp_router_ipv4
 hostprogs-y += test_current_task_under_cgroup
 hostprogs-y += trace_event
 hostprogs-y += sampleip
@@ -73,6 +74,7 @@ test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) 
test_cgrp2_sock2.o
 xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o
 # reuse xdp1 source intentionally
 xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o
+xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o
 test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) cgroup_helpers.o \
   test_current_task_under_cgroup_user.o
 trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o
@@ -114,6 +116,7 @@ always += parse_varlen.o parse_simple.o parse_ldabs.o
 always += test_cgrp2_tc_kern.o
 always += xdp1_kern.o
 always += xdp2_kern.o
+always += xdp_router_ipv4_kern.o
 always += test_current_task_under_cgroup_kern.o
 always += trace_event_kern.o
 always += sampleip_kern.o
@@ -160,6 +163,7 @@ HOSTLOADLIBES_map_perf_test += -lelf -lrt
 HOSTLOADLIBES_test_overhead += -lelf -lrt
 HOSTLOADLIBES_xdp1 += -lelf
 HOSTLOADLIBES_xdp2 += -lelf
+HOSTLOADLIBES_xdp_router_ipv4 += -lelf
 HOSTLOADLIBES_test_current_task_under_cgroup += -lelf
 HOSTLOADLIBES_trace_event += -lelf
 HOSTLOADLIBES_sampleip += -lelf
diff --git a/samples/bpf/xdp_router_ipv4_kern.c 
b/samples/bpf/xdp_router_ipv4_kern.c
new file mode 100644
index 000..993f56b
--- /dev/null
+++ b/samples/bpf/xdp_router_ipv4_kern.c
@@ -0,0 +1,186 @@
+/* Copyright (C) 2017 Cavium, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+#define KBUILD_MODNAME "foo"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+#include 
+#include 
+
+struct trie_value {
+   __u8 prefix[4];
+   __be64 value;
+   int ifindex;
+   int metric;
+   __be32 gw;
+};
+
+/* Key for lpm_trie*/
+union key_4 {
+   u32 b32[2];
+   u8 b8[8];
+};
+
+struct arp_entry {
+   __be64 mac;
+   __be32 dst;
+};
+
+struct direct_map {
+   struct arp_entry arp;
+   int ifindex;
+   __be64 mac;
+};
+
+/* Map for trie implementation*/
+struct bpf_map_def SEC("maps") lpm_map = {
+   .type = BPF_MAP_TYPE_LPM_TRIE,
+   .key_size = 8,
+   .value_size = sizeof(struct trie_value),
+   .max_entries = 50,
+   .map_flags = BPF_F_NO_PREALLOC,
+};
+
+/* Map for counter*/
+struct bpf_map_def SEC("maps") rxcnt = {
+   .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+   .key_size = sizeof(u32),
+   .value_size = sizeof(u64),
+   .max_entries = 256,
+};
+
+/* Map for ARP table*/
+struct bpf_map_def SEC("maps") arp_table = {
+   .type = BPF_MAP_TYPE_HASH,
+   .key_size = sizeof(__be32),
+   .value_size = sizeof(__be64),
+   .max_entries = 50,
+};
+
+/* Map to keep the exact match entries in the route table*/
+struct bpf_map_def SEC("maps") exact_match = {
+   .type = BPF_MAP_TYPE_HASH,
+   .key_size = sizeof(__be32),
+   .value_size = sizeof(struct direct_map),
+   .max_entries = 50,
+};
+
+struct bpf_map_def SEC("maps") tx_port = {
+   .type = BPF_MAP_TYPE_DEVMAP,
+   .key_size = sizeof(int),
+   .value_size = sizeof(int),
+   .max_entries = 100,
+};
+
+/* Function to set source and destination mac of the packet */
+static inline void set_src_dst_mac(void *data, void *src, void *dst)
+{
+   unsigned short *source = src;
+   unsigned short *dest  = dst;
+   unsigned short *p = data;
+
+   __builtin_memcpy(p, dest, 6);
+   __builtin_memcpy(p + 3, source, 6);
+}
+
+/* Parse IPV4 packet to get SRC, DST IP and protocol */
+static inline int parse_ipv4(void *data, u64 nh_off, void *data_end,
+__be32 *src, __be32 *dest)
+{
+   struct iphdr *iph = data + nh_off;
+
+   if (iph + 1 > data_end)
+   return 0;
+   *src = iph->saddr;
+   *dest = iph->daddr;
+   return iph->protocol;
+}
+
+SEC("xdp_router_ipv4")
+int xdp_router_ipv4_prog(struct xdp_md *ctx)
+{
+   void *data_end = (void *)(long)ctx->data_end;
+   __be64 *dest_mac = NULL, *src_mac = NULL;
+   void *data = (void *)(long)ctx->data;
+   struct trie_v

[PATCH v4 0/1] XDP program for ip forward

2017-11-04 Thread Christina Jacob

From: Christina Jacob 

The patch below implements port to port forwarding through route table and arp
table lookup for ipv4 packets using the bpf_redirect helper function and an
lpm_trie map. This improves performance over the normal kernel stack ip forward.

Implementation details.
---
The program uses one map each for arp table, route table and packet count.
The number of entries the program can process is limited by the size of the
map used.

In xdp_router_ipv4_user.c,

Initially, the routing table is read and stored in an lpm trie map.
The arp table is read and stored in an array map. There are two netlink
sockets that listen for any change in the route table and arp table.
There are two types of changes to the route table.
1.New

New entries are added to the lpm_trie with the proper key and prefix
length. If there is another entry in the route table with a different
metric (only the metric is considered), the values are compared and the
one with the lowest metric is added to the node.

2.Deletion

On deletion from the route table, the particular node is removed and the
entire route table is read again to check if there is another entry with
a different metric.

This implementation depends on "bpf: Implement map_delete_elem for
BPF_MAP_TYPE_LPM_TRIE", which is not yet upstreamed.
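
The "keep the entry with the lowest metric" rule for new routes can be
sketched in plain C as below. This is a simplified model under stated
assumptions: a small fixed array stands in for the lpm_trie map, and struct
route, route_add() and route_find() are hypothetical names, not taken from
xdp_router_ipv4_user.c.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ROUTES 50

/* Hypothetical route record keyed by (prefix, prefix length). */
struct route {
	uint32_t prefix;      /* network prefix, host byte order */
	uint8_t  prefix_len;
	int      metric;
	int      ifindex;
	int      in_use;
};

static struct route table[MAX_ROUTES];

/* Return the stored route for this key, or NULL if there is none. */
static struct route *route_find(uint32_t prefix, uint8_t prefix_len)
{
	size_t i;

	for (i = 0; i < MAX_ROUTES; i++)
		if (table[i].in_use && table[i].prefix == prefix &&
		    table[i].prefix_len == prefix_len)
			return &table[i];
	return NULL;
}

/*
 * Add a route. If the key already exists, only the metric decides:
 * the entry with the lowest metric stays in the table.
 * Returns 1 if the table changed, 0 if the old entry was kept,
 * -1 if the table is full.
 */
static int route_add(uint32_t prefix, uint8_t prefix_len, int metric,
		     int ifindex)
{
	struct route *r = route_find(prefix, prefix_len);
	size_t i;

	if (r) {
		if (metric >= r->metric)
			return 0;
		r->metric = metric;
		r->ifindex = ifindex;
		return 1;
	}

	for (i = 0; i < MAX_ROUTES; i++) {
		if (table[i].in_use)
			continue;
		table[i].prefix = prefix;
		table[i].prefix_len = prefix_len;
		table[i].metric = metric;
		table[i].ifindex = ifindex;
		table[i].in_use = 1;
		return 1;
	}
	return -1;
}
```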

There are two types of changes to the arp table.

1.New

New arp entries are added to the array map directly, with the
ip address as the key and the destination mac address as the value.

2.Delete

The entry corresponding to the particular ip is deleted from the
arp table map.

Another map is maintained for entries in the route table having a 32 bit mask.
Such entries can have a corresponding arp entry, which is stored together with
the route entry in an array map and can be accessed in O(1) time, eliminating
the trie lookup and arp lookup.

In xdp_router_ipv4_kern.c,

The array map for the 32 bit mask entries is checked to see if there is a key
that exactly matches the destination ip. If it has a non zero destination mac
entry, then the xdp data is updated accordingly. Otherwise a proper route and
arp table lookup is done using the lpm_trie and the arp table array map.
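
The lookup order described above can be modelled in plain C roughly as
follows. This is only a sketch, not the BPF program: linear scans over small
arrays stand in for the maps, all struct and function names are hypothetical,
and resolving arp on the gateway when the route has one (else on the
destination itself) is an assumption about the next hop choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plain-C models of the three maps; not the BPF map types. */
struct exact_entry { uint32_t dst; uint64_t mac; int ifindex; };
struct lpm_entry   { uint32_t prefix; uint8_t len; uint32_t gw; int ifindex; };
struct arp_entry   { uint32_t ip; uint64_t mac; };

struct tables {
	const struct exact_entry *exact; size_t n_exact;
	const struct lpm_entry *lpm;     size_t n_lpm;
	const struct arp_entry *arp;     size_t n_arp;
};

/* Netmask for a prefix length in the range 0..32. */
static uint32_t prefix_mask(uint8_t len)
{
	return len ? UINT32_MAX << (32 - len) : 0;
}

/* Returns 1 and fills mac/ifindex if dst can be forwarded, else 0. */
static int resolve(const struct tables *t, uint32_t dst,
		   uint64_t *mac, int *ifindex)
{
	const struct lpm_entry *best = NULL;
	uint32_t next_hop;
	size_t i;

	/* 1. Exact match table: /32 routes that already carry a mac. */
	for (i = 0; i < t->n_exact; i++) {
		if (t->exact[i].dst == dst && t->exact[i].mac) {
			*mac = t->exact[i].mac;
			*ifindex = t->exact[i].ifindex;
			return 1;
		}
	}

	/* 2. Longest prefix match (linear scan stands in for the trie). */
	for (i = 0; i < t->n_lpm; i++) {
		const struct lpm_entry *e = &t->lpm[i];

		if ((dst & prefix_mask(e->len)) != e->prefix)
			continue;
		if (!best || e->len > best->len)
			best = e;
	}
	if (!best)
		return 0;

	/* 3. arp lookup on the gateway if set, else on dst (assumption). */
	next_hop = best->gw ? best->gw : dst;
	for (i = 0; i < t->n_arp; i++) {
		if (t->arp[i].ip == next_hop) {
			*mac = t->arp[i].mac;
			*ifindex = best->ifindex;
			return 1;
		}
	}
	return 0;
}
```

An exact-match entry with a zero mac falls through to the route and arp
lookups, which mirrors the "non zero destination mac" condition in the text.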

Usage: as ./xdp_router_ipv4 -S  (-S for
generic xdp implementation ifindex- the index of the interface to which
the xdp program has to be attached.) in 4.14-rc3 kernel.

Changes from v1 to v2
-

* As suggested by Jesper Dangaard Brouer
1. Changed the program name to  list xdp_router_ipv4
2. Changed the commandline arguments from ifindex list to interface name
Usage : ./xdp_router_ipv4 [-S] 
-S for generic xdp implementation
-interface name list is the list of interfaces to which
the xdp program should attach to

* As suggested by Daniel Borkmann
1. Using __builtin_memcpy to update source and destination mac in the bpf
  kernel program.

2. Started using __be32 in the kernel program to be in line with the data
   type used in user program

3. Rectified a few style issues.

* Corrected the copyright issue pointed out by David Ahern

* Fixed the bug: The already attached interfaces are not detached from the
  xdp program if the program fails to attach to an interface later in the list.


Changes from v2 to v3
-
* As pointed out by Jesper Dangaard Brouer
   1. Changed the program name in the cover letter.
   2. Changed variable declarations to follow Reverse-xmas tree
  rule.
   3. Reduced the nesting in code for readability.
   4. Fixed bug: incorrect mac address being set for source and
  destination mac.
   5. Fixed comment style.

* As suggested by Stephen Hemminger 
Changed all the bzero calls to memset.

* As suggested by David Laight
removed the signed remainders calculation.

* As suggested by Stephen Hemminger and David Daney 
1. Added checks for the ioctl return value.
2. Changed data types to be64 to be sure about the size of the
   data type.
3. Verified byte order. Using the mac address from ioctl in
   network byte order, not casting to long data type
   anymore.
4. Fixed returning address of local variable.

Changes from v3 to v4
-
* As suggested by Jesper,
1. Removed redundant typecastings.
2. Modified program to use bpf_redirect_map for better
   performance.
3. Changed program name in the code as well.


Christina Jacob (1):
  xdp: Sample xdp program implementing ip forward

 samples/bpf/Makefile

Re: linux-next: manual merge of the tip tree with the net-next tree

2017-11-04 Thread Stephen Rothwell

Hi Alexei,

On Wed, 1 Nov 2017 09:27:14 -0700 Alexei Starovoitov 
 wrote:
>
> Also what do you mean by "same patch != same commit" ?
> Like if we had pushed to some 3rd tree first and then pulled
> into tip and net-next it would have been better?

Well, it would not have caused a conflict.
-- 
Cheers,
Stephen Rothwell

Re: [PATCH net-next] ipv6: remove IN6_ADDR_HSIZE from addrconf.h

2017-11-04 Thread David Miller

From: Eric Dumazet 
Date: Sat, 04 Nov 2017 08:53:27 -0700

> From: Eric Dumazet 
> 
> IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid
> confusion.
> 
> Signed-off-by: Eric Dumazet 
> ---
> Should be applied after pktgen fix, thanks !

Thanks for resolving this, scary to see something like this :)

Re: [PATCH net-next] pktgen: do not abuse IN6_ADDR_HSIZE

2017-11-04 Thread David Miller

From: Eric Dumazet 
Date: Sat, 04 Nov 2017 08:27:14 -0700

> From: Eric Dumazet 
> 
> pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an
> IPv6 address.
> 
> Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old
> bug is hitting us.
> 
> Fixes: 3f27fb23219e ("ipv6: addrconf: add per netns perturbation in 
> inet6_addr_hash()")
> Signed-off-by: Eric Dumazet 
> Reported-by: Dan Carpenter 

Applied.
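
The class of bug described in the quoted commit message (a loop over an
address bounded by an unrelated hash table size) can be sketched generically
as below. The pktgen code itself is not shown in this thread, so
HASH_TABLE_SIZE, struct ipv6_addr and ipv6_addr_is_zero() are hypothetical
names used only to illustrate the safe form.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, unrelated table-size constant; it may change over time. */
#define HASH_TABLE_SIZE 256

struct ipv6_addr {
	uint8_t bytes[16];
};

/*
 * Bound the loop by the size of the address itself. Using
 * HASH_TABLE_SIZE here would only work by accident while it is 16
 * and would read past the address once the constant grows.
 */
static int ipv6_addr_is_zero(const struct ipv6_addr *a)
{
	size_t i;

	for (i = 0; i < sizeof(a->bytes); i++)
		if (a->bytes[i])
			return 0;
	return 1;
}
```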

Re: [PATCH net-next] ipv6: remove IN6_ADDR_HSIZE from addrconf.h

2017-11-04 Thread David Ahern

On 11/5/17 12:53 AM, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid
> confusion.
> 
> Signed-off-by: Eric Dumazet 
> ---
> Should be applied after pktgen fix, thanks !
> 
>  include/net/addrconf.h |3 ---
>  net/ipv6/addrconf.c|2 ++
>  2 files changed, 2 insertions(+), 3 deletions(-)

Acked-by: David Ahern

Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

2017-11-04 Thread Serge E. Hallyn

Quoting Mahesh Bandewar (mah...@bandewar.net):
> Init-user-ns is always uncontrolled and a process that has SYS_ADMIN
> that belongs to uncontrolled user-ns can create another (child) user-
> namespace that is uncontrolled. Any other process (that either does
> not have SYS_ADMIN or belongs to a controlled user-ns) can only
> create a user-ns that is controlled.

That's a huge change though.  It means that any system that previously
used unprivileged containers will need new privileged code (which always
risks more privilege leaks through the new code) to re-enable what was
possible without privilege before.  That's a regression.

I'm very much interested in what you want to do, but it seems like
it would be worth starting with some automated code analysis that shows
exactly what code becomes accessible to unprivileged users with user
namespaces which was not accessible to unprivileged users before.  Then we
can reason about classifying that code and perhaps limiting access to
some of it.

Re: Regression in throughput between kvm guests over virtual bridge

2017-11-04 Thread Wei Xu

On Fri, Nov 03, 2017 at 12:30:12AM -0400, Matthew Rosato wrote:
> On 10/31/2017 03:07 AM, Wei Xu wrote:
> > On Thu, Oct 26, 2017 at 01:53:12PM -0400, Matthew Rosato wrote:
> >>
> >>>
> >>> Are you using the same binding as mentioned in previous mail sent by you? it
> >>> might be caused by cpu contention between pktgen and vhost, could you please
> >>> try to run pktgen from another idle cpu by adjusting the binding? 
> >>
> >> I don't think that's the case -- I can cause pktgen to hang in the guest
> >> without any cpu binding, and with vhost disabled even.
> > 
> > Yes, I did a test and it also hangs in guest, before we figure it out,
> > maybe you try udp with uperf with this case?
> > 
> > VM   -> Host
> > Host -> VM
> > VM   -> VM
> > 
> 
> Here are averaged run numbers (Gbps throughput) across 4.12, 4.13 and
> net-next with and without Jason's recent "vhost_net: conditionally
> enable tx polling" applied (referred to as 'patch' below).  1 uperf
> instance in each case:

Thanks a lot for the test. 

> 
> uperf TCP:
>            4.12    4.13    4.13+patch  net-next  net-next+patch
> ----------------------------------------------------------------
> VM->VM     35.2    16.5    20.84       22.2      24.36

Are you using the same server/test suite? You mentioned the number was around
28Gb for 4.12 and it dropped about 40% for 4.13, so it seems things have changed.
Are there any options for performance tuning on the server to maximize the cpu
utilization?

I had a similar experience on an x86 server and desktop before, where the
result numbers kept going up and down quite a lot.

> VM->Host   42.15   43.57   44.90       30.83     32.26
> Host->VM   53.17   41.51   42.18       37.05     37.30

This is a bit odd, I remember you said there was no regression while
testing Host->VM, didn't you?

> 
> uperf UDP:
>            4.12    4.13    4.13+patch  net-next  net-next+patch
> ----------------------------------------------------------------
> VM->VM     24.93   21.63   25.09       8.86      9.62
> VM->Host   40.21   38.21   39.72       8.74      9.35
> Host->VM   31.26   30.18   31.25       7.2       9.26

This case should be quite similar to pktgen: if you got an improvement with
pktgen, it was usually the same for UDP. Could you please try to disable
tso, gso, gro, ufo on all host tap devices and guest virtio-net devices?
Currently the most significant tests would be like this AFAICT:

Host->VM    4.12    4.13
 TCP:
 UDP:
pktgen:

Don't want to bother you too much, so maybe 4.12 & 4.13 without Jason's patch
should work since we have seen positive numbers for that, you can also
temporarily skip net-next as well.

If you see UDP and pktgen are aligned, then it might be helpful to continue
the other two cases, otherwise we fail in the first place.

> The net is that Jason's recent patch definitely improves things across
> the board at 4.13 as well as at net-next -- But the VM<->VM TCP numbers
> I am observing are still lower than base 4.12.

Cool.

> 
> A separate concern is why my UDP numbers look so bad on net-next (have
> not bisected this yet).

This might be another issue, I am on vacation, will try it on x86 once I am
back at work next Wednesday.

Wei


Re: [patch net-next 3/5] net: sched: introduce block mechanism to handle netif_keep_dst calls

2017-11-04 Thread Daniel Borkmann


On 11/04/2017 02:01 PM, Jiri Pirko wrote:
[...]

Ah, indeed, I missed this. I will rename TCQ_F_INGRESS to TCQ_F_CLSACT
as a part of this patchset too.


Sounds reasonable, thanks!

Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-04 Thread Alexei Starovoitov


On 11/5/17 2:31 AM, Naveen N. Rao wrote:

Hi Alexei,

Alexei Starovoitov wrote:

On 11/3/17 3:58 PM, Sandipan Das wrote:

For added security, the layout of some structures can be
randomized by enabling CONFIG_GCC_PLUGIN_RANDSTRUCT. One
such structure is task_struct. To build BPF programs, we
use Clang which does not support this feature. So, if we
attempt to read a field of a structure with a randomized
layout within a BPF program, we do not get the expected
value because of incorrect offsets. To observe this, it
is not mandatory to have CONFIG_GCC_PLUGIN_RANDSTRUCT
enabled because the structure annotations/members added
for this purpose are enough to cause this. So, all kernel
builds are affected.

For example, considering samples/bpf/offwaketime_kern.c,
if we try to print the values of pid and comm inside the
task_struct passed to waker() by adding the following
lines of code at the appropriate place

  char fmt[] = "waker(): p->pid = %u, p->comm = %s\n";
  bpf_trace_printk(fmt, sizeof(fmt), _(p->pid), _(p->comm));

it is seen that upon rebuilding and running this sample
followed by inspecting /sys/kernel/debug/tracing/trace,
the output looks like the following

                              _-----=> irqs-off
                             / _----=> need-resched
                            | / _---=> hardirq/softirq
                            || / _--=> preempt-depth
                            ||| /     delay
           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
              | |       |   ||||       |         |
          <idle>-0     [007] d.s.  1883.443594: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [018] d.s.  1883.453588: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [007] d.s.  1883.463584: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [009] d.s.  1883.483586: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [005] d.s.  1883.493583: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [009] d.s.  1883.503583: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [018] d.s.  1883.513578: 0x0001: waker(): p->pid = 0, p->comm =
 systemd-journal-3140  [003] d...  1883.627660: 0x0001: waker(): p->pid = 0, p->comm =
 systemd-journal-3140  [003] d...  1883.627704: 0x0001: waker(): p->pid = 0, p->comm =
 systemd-journal-3140  [003] d...  1883.627723: 0x0001: waker(): p->pid = 0, p->comm =

To avoid this, we add new BPF helpers that read the
correct values for some of the important task_struct
members such as pid, tgid, comm and flags which are
extensively used in BPF-based analysis tools such as
bcc. Since these helpers are built with GCC, they use
the correct offsets when referencing a member.

Signed-off-by: Sandipan Das 

...

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f90860d1f897..324508d27bd2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -338,6 +338,16 @@ union bpf_attr {
  * @skb: pointer to skb
  * Return: classid if != 0
  *
+ * u64 bpf_get_task_pid_tgid(struct task_struct *task)
+ * Return: task->tgid << 32 | task->pid
+ *
+ * int bpf_get_task_comm(struct task_struct *task)
+ * Stores task->comm into buf
+ * Return: 0 on success or negative error
+ *
+ * u32 bpf_get_task_flags(struct task_struct *task)
+ * Return: task->flags
+ *
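
The packed return value documented in the comment above (tgid in the upper 32 bits, pid in the lower 32 bits) can be sketched in plain C as below. This is a userspace illustration, not the helper from the patch; the demo_* names are hypothetical. Note the cast to a 64-bit type before shifting, which a 32-bit tgid needs for the shift by 32 to be well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Pack tgid into the upper half and pid into the lower half. */
static uint64_t demo_pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Recover the two halves on the consumer side. */
static uint32_t demo_unpack_tgid(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

static uint32_t demo_unpack_pid(uint64_t v)
{
	return (uint32_t)v;
}

/* Self-check: packing followed by unpacking is lossless. */
static void demo_check_roundtrip(uint32_t tgid, uint32_t pid)
{
	uint64_t v = demo_pack_pid_tgid(tgid, pid);

	assert(demo_unpack_tgid(v) == tgid);
	assert(demo_unpack_pid(v) == pid);
}
```

A tracing program would typically shift right by 32 to get the tgid and truncate to 32 bits to get the pid, as the two unpack functions do.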


I don't think it's a solution.
Tracing scripts read other fields too.
Making it work for these 3 fields is a drop in a bucket.


Indeed. However...


If randomization is used I think we have to accept
that existing bpf scripts won't be usable.


... the actual issue is that randomization isn't necessary for this to
show up. The annotations added to mark off the structure members results
in some structure members being moved into an anonymous structure, which
would then get padded differently. So, *all* kernels since v4.13 are
affected, afaict.


hmm. why would all 4.13+ be affected?
It's just an anonymous struct inside task_struct.
Are you saying that due to clang not adding this 'struct { };' treatment 
to task_struct?

I thought such struct shouldn't change layout.
If it is we need to fix include/linux/compiler-clang.h to do that
anon struct as well.


As such, we wanted to propose this as a short term solution, but I do
agree that this doesn't solve the real issue.


Long term solution is to support 'BPF Type Format' or BTF
(which is old C-Type Format) for kernel data structures,
so bcc scripts wouldn't need to use kernel headers and clang.
The proper offsets will be described in BTF.
We were planning to use it initially to describe map key/value,
but it applies for this case as well.
There will be a tool that will take dwarf from vmlinux and
compress it into BTF. Kernel will also be able to verify
that BTF is a valid BTF.


This is the first that I've heard about BTF. Can you share more details
about it, or point me to some place where it has been discussed?

We considered having tools derive the st

Re: [net-next 0/7] nfp: ethtool and related improvements

2017-11-04 Thread Jakub Kicinski

On Sat,  4 Nov 2017 16:48:53 +0100, Simon Horman wrote:
> Dirk van der Merwe says:
> 
> This patch series throws a couple of loosely related items into a single
> series.
> 
> Patch 1: Clang compilation fix reported by
>   Matthias Kaehlcke 
> 
> Patch 2: Driver can now do MAC reinit on load when there has been a
>   media override set in the NSP.
> 
> Patch 3: Refactor the nfp_app_reprs_set API.
> 
> Patch 4: Similar to vNICs, representors must be able to deal with media
>   override changes in the NSP.
> 
> Patch 5: Since representors can now handle media overrides, we can
>   allocate the get/set link ndo's to them.
> 
> Patch 6 & 7: Add support for FEC mode modification.

I forgot to put:

Reviewed-by: Jakub Kicinski 

on Dirk's patches, thanks for posting Simon!

Re: [PATCH] rtlwifi: remove redundant initialization to cfg_cmd

2017-11-04 Thread Larry Finger


On 11/04/2017 02:37 PM, Colin King wrote:

From: Colin Ian King 

cfg_cmd is initialized to zero and this value is never read, instead
it is over-written in the start of a do-while loop. Remove the
redundant initialization. Cleans up clang warning:

drivers/net/wireless/realtek/rtlwifi/core.c:1750:22: warning: Value
stored to 'cfg_cmd' during its initialization is never read

Signed-off-by: Colin Ian King 


Looks OK to me.

Acked-by: Larry Finger 

Thanks,

Larry


---
  drivers/net/wireless/realtek/rtlwifi/core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c 
b/drivers/net/wireless/realtek/rtlwifi/core.c
index 1147327e6f52..7a17cc20c57e 100644
--- a/drivers/net/wireless/realtek/rtlwifi/core.c
+++ b/drivers/net/wireless/realtek/rtlwifi/core.c
@@ -1748,7 +1748,7 @@ bool rtl_hal_pwrseqcmdparsing(struct rtl_priv *rtlpriv, 
u8 cut_version,
  u8 faversion, u8 interface_type,
  struct wlan_pwr_cfg pwrcfgcmd[])
  {
-   struct wlan_pwr_cfg cfg_cmd = {0};
+   struct wlan_pwr_cfg cfg_cmd;
bool polling_bit = false;
u32 ary_idx = 0;
u8 value = 0;

[PATCH] rtlwifi: remove redundant initialization to cfg_cmd

2017-11-04 Thread Colin King

From: Colin Ian King 

cfg_cmd is initialized to zero and this value is never read, instead
it is over-written in the start of a do-while loop. Remove the
redundant initialization. Cleans up clang warning:

drivers/net/wireless/realtek/rtlwifi/core.c:1750:22: warning: Value
stored to 'cfg_cmd' during its initialization is never read

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/realtek/rtlwifi/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c 
b/drivers/net/wireless/realtek/rtlwifi/core.c
index 1147327e6f52..7a17cc20c57e 100644
--- a/drivers/net/wireless/realtek/rtlwifi/core.c
+++ b/drivers/net/wireless/realtek/rtlwifi/core.c
@@ -1748,7 +1748,7 @@ bool rtl_hal_pwrseqcmdparsing(struct rtl_priv *rtlpriv, 
u8 cut_version,
  u8 faversion, u8 interface_type,
  struct wlan_pwr_cfg pwrcfgcmd[])
 {
-   struct wlan_pwr_cfg cfg_cmd = {0};
+   struct wlan_pwr_cfg cfg_cmd;
bool polling_bit = false;
u32 ary_idx = 0;
u8 value = 0;
-- 
2.14.1

Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-04 Thread Naveen N. Rao


Hi Alexei,

Alexei Starovoitov wrote:

On 11/3/17 3:58 PM, Sandipan Das wrote:

For added security, the layout of some structures can be
randomized by enabling CONFIG_GCC_PLUGIN_RANDSTRUCT. One
such structure is task_struct. To build BPF programs, we
use Clang which does not support this feature. So, if we
attempt to read a field of a structure with a randomized
layout within a BPF program, we do not get the expected
value because of incorrect offsets. To observe this, it
is not mandatory to have CONFIG_GCC_PLUGIN_RANDSTRUCT
enabled because the structure annotations/members added
for this purpose are enough to cause this. So, all kernel
builds are affected.

For example, considering samples/bpf/offwaketime_kern.c,
if we try to print the values of pid and comm inside the
task_struct passed to waker() by adding the following
lines of code at the appropriate place

  char fmt[] = "waker(): p->pid = %u, p->comm = %s\n";
  bpf_trace_printk(fmt, sizeof(fmt), _(p->pid), _(p->comm));

it is seen that upon rebuilding and running this sample
followed by inspecting /sys/kernel/debug/tracing/trace,
the output looks like the following

                              _-----=> irqs-off
                             / _----=> need-resched
                            | / _---=> hardirq/softirq
                            || / _--=> preempt-depth
                            ||| /     delay
           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
              | |       |   ||||       |         |
          <idle>-0     [007] d.s.  1883.443594: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [018] d.s.  1883.453588: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [007] d.s.  1883.463584: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [009] d.s.  1883.483586: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [005] d.s.  1883.493583: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [009] d.s.  1883.503583: 0x0001: waker(): p->pid = 0, p->comm =
          <idle>-0     [018] d.s.  1883.513578: 0x0001: waker(): p->pid = 0, p->comm =
 systemd-journal-3140  [003] d...  1883.627660: 0x0001: waker(): p->pid = 0, p->comm =
 systemd-journal-3140  [003] d...  1883.627704: 0x0001: waker(): p->pid = 0, p->comm =
 systemd-journal-3140  [003] d...  1883.627723: 0x0001: waker(): p->pid = 0, p->comm =


To avoid this, we add new BPF helpers that read the
correct values for some of the important task_struct
members such as pid, tgid, comm and flags which are
extensively used in BPF-based analysis tools such as
bcc. Since these helpers are built with GCC, they use
the correct offsets when referencing a member.

Signed-off-by: Sandipan Das 

...

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f90860d1f897..324508d27bd2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -338,6 +338,16 @@ union bpf_attr {
  * @skb: pointer to skb
  * Return: classid if != 0
  *
+ * u64 bpf_get_task_pid_tgid(struct task_struct *task)
+ * Return: task->tgid << 32 | task->pid
+ *
+ * int bpf_get_task_comm(struct task_struct *task)
+ * Stores task->comm into buf
+ * Return: 0 on success or negative error
+ *
+ * u32 bpf_get_task_flags(struct task_struct *task)
+ * Return: task->flags
+ *


I don't think it's a solution.
Tracing scripts read other fields too.
Making it work for these 3 fields is a drop in a bucket.


Indeed. However...


If randomization is used I think we have to accept
that existing bpf scripts won't be usable.


... the actual issue is that randomization isn't necessary for this to 
show up. The annotations added to mark off the structure members results 
in some structure members being moved into an anonymous structure, which 
would then get padded differently. So, *all* kernels since v4.13 are 
affected, afaict.
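
To make the padding argument concrete, here is a minimal sketch with hypothetical structures (not task_struct, and not the kernel macros). It assumes a C11 compiler with anonymous struct support. Wrapping members in an anonymous struct gives the group the alignment of its strictest member, so the first wrapped member can move even though no field was reordered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout without any wrapping. */
struct demo_plain {
	char a;
	char b;
	long c;
};

/*
 * Same members, but b and c are grouped in an anonymous struct, similar
 * in spirit to what a pair of start/end marker macros could expand to.
 */
struct demo_wrapped {
	char a;
	struct {
		char b;
		long c;
	};
};

static size_t demo_plain_b_offset(void)
{
	return offsetof(struct demo_plain, b);
}

static size_t demo_wrapped_b_offset(void)
{
	return offsetof(struct demo_wrapped, b);
}

/* Self-check: the wrapped member starts at the alignment of long. */
static void demo_check_layout(void)
{
	assert(demo_plain_b_offset() == 1);
	assert(demo_wrapped_b_offset() == _Alignof(long));
}
```

A compiler that sees the first definition and a compiler that sees the second will disagree on the offset of b, which is the kind of mismatch described above when only one toolchain applies the wrapping.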


As such, we wanted to propose this as a short term solution, but I do 
agree that this doesn't solve the real issue.



Long term solution is to support 'BPF Type Format' or BTF
(which is old C-Type Format) for kernel data structures,
so bcc scripts wouldn't need to use kernel headers and clang.
The proper offsets will be described in BTF.
We were planning to use it initially to describe map key/value,
but it applies for this case as well.
There will be a tool that will take dwarf from vmlinux and
compress it into BTF. Kernel will also be able to verify
that BTF is a valid BTF.


This is the first that I've heard about BTF. Can you share more details 
about it, or point me to some place where it has been discussed?


We considered having tools derive the structure offsets from debuginfo, 
but debuginfo may not always be present on production systems. So, it 
isn't clear if having that dependency is fine. I'm not sure how BTF will
be different.


I'm assuming that gcc randomization plugin produces dwarf
with correct offsets, if not, it would have to be fixed.


I think the offsets describ

[PATCH net-next] ipv6: remove IN6_ADDR_HSIZE from addrconf.h

2017-11-04 Thread Eric Dumazet

From: Eric Dumazet 

IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid
confusion.

Signed-off-by: Eric Dumazet 
---
Should be applied after pktgen fix, thanks !

 include/net/addrconf.h |    3 ---
 net/ipv6/addrconf.c    |    2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 3357332ea375b53dfb7704ea5eb8274a904f59b8..b623b65a79d1687602ba319cf9047a4c41b6396b 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -59,9 +59,6 @@ struct in6_validator_info {
struct netlink_ext_ack  *extack;
 };
 
-#define IN6_ADDR_HSIZE_SHIFT   8
-#define IN6_ADDR_HSIZE (1 << IN6_ADDR_HSIZE_SHIFT)
-
 int addrconf_init(void);
 void addrconf_cleanup(void);
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 69b8cdb43aa2a7289b9133a4fcad3da5d148a7fb..66d8c3d912fdb3de8d1bc157e8e7fe3750fd9005 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -157,6 +157,8 @@ static int ipv6_generate_stable_address(struct in6_addr *addr,
u8 dad_count,
const struct inet6_dev *idev);
 
+#define IN6_ADDR_HSIZE_SHIFT   8
+#define IN6_ADDR_HSIZE (1 << IN6_ADDR_HSIZE_SHIFT)
 /*
  * Configured unicast address hash table
  */

[net-next 3/7] nfp: refactor nfp_app_reprs_set

2017-11-04 Thread Simon Horman

From: Dirk van der Merwe 

The criteria that reprs cannot be replaced with another new set of reprs
has been removed. This check is not needed since the only use case that
could exercise this at the moment, would be to modify the number of
SRIOV VFs without first disabling them. This case is explicitly
disallowed in any case and subsequent patches in this series
need to be able to replace the running set of reprs.

All cases where the return code used to be checked for the
nfp_app_reprs_set function have been removed.
As stated above, it is not possible for the current code to encounter a
case where reprs exist and need to be replaced.
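
As a rough sketch of the resulting behaviour (an assumption-laden illustration, not the driver code): the setter installs the new set unconditionally and returns the previous one to the caller. The demo_* types and names are hypothetical, and the RCU and locking used by the real function are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the representor set and its owner. */
struct demo_reprs {
	int num_reprs;
};

struct demo_app {
	struct demo_reprs *reprs;
};

/*
 * Install @reprs and hand back whatever was installed before.  There is
 * no "already set" error path, so the caller never has to check for an
 * error-encoded pointer; it only decides what to do with the old set.
 */
static struct demo_reprs *
demo_reprs_set(struct demo_app *app, struct demo_reprs *reprs)
{
	struct demo_reprs *old;

	assert(app != NULL);
	old = app->reprs;
	app->reprs = reprs;
	return old;
}
```

Passing NULL as the new set clears the slot and still returns the old set, which is the usual way such a swap-style setter is used for teardown.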

Signed-off-by: Dirk van der Merwe 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/flower/main.c | 16 ++++------------
 drivers/net/ethernet/netronome/nfp/nfp_app.c     |  6 ------
 2 files changed, 4 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index e46e7c60d491..e0283bb24f06 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -142,8 +142,8 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app,
 {
u8 nfp_pcie = nfp_cppcore_pcie_unit(app->pf->cpp);
struct nfp_flower_priv *priv = app->priv;
-   struct nfp_reprs *reprs, *old_reprs;
enum nfp_port_type port_type;
+   struct nfp_reprs *reprs;
const u8 queue = 0;
int i, err;
 
@@ -194,11 +194,7 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app,
 reprs->reprs[i]->name);
}
 
-   old_reprs = nfp_app_reprs_set(app, repr_type, reprs);
-   if (IS_ERR(old_reprs)) {
-   err = PTR_ERR(old_reprs);
-   goto err_reprs_clean;
-   }
+   nfp_app_reprs_set(app, repr_type, reprs);
 
return 0;
 err_reprs_clean:
@@ -222,8 +218,8 @@ static int
 nfp_flower_spawn_phy_reprs(struct nfp_app *app, struct nfp_flower_priv *priv)
 {
struct nfp_eth_table *eth_tbl = app->pf->eth_tbl;
-   struct nfp_reprs *reprs, *old_reprs;
struct sk_buff *ctrl_skb;
+   struct nfp_reprs *reprs;
unsigned int i;
int err;
 
@@ -280,11 +276,7 @@ nfp_flower_spawn_phy_reprs(struct nfp_app *app, struct 
nfp_flower_priv *priv)
 phys_port, reprs->reprs[phys_port]->name);
}
 
-   old_reprs = nfp_app_reprs_set(app, NFP_REPR_TYPE_PHYS_PORT, reprs);
-   if (IS_ERR(old_reprs)) {
-   err = PTR_ERR(old_reprs);
-   goto err_reprs_clean;
-   }
+   nfp_app_reprs_set(app, NFP_REPR_TYPE_PHYS_PORT, reprs);
 
/* The MAC_REPR control message should be sent after the MAC
 * representors are registered using nfp_app_reprs_set().  This is
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c
index 3644d74fe304..955a9f44d244 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_app.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c
@@ -106,14 +106,8 @@ nfp_app_reprs_set(struct nfp_app *app, enum nfp_repr_type 
type,
 
old = rcu_dereference_protected(app->reprs[type],
lockdep_is_held(&app->pf->lock));
-   if (reprs && old) {
-   old = ERR_PTR(-EBUSY);
-   goto exit_unlock;
-   }
-
rcu_assign_pointer(app->reprs[type], reprs);
 
-exit_unlock:
return old;
 }
 
-- 
2.11.0

[net-next 6/7] nfp: add helpers for FEC support

2017-11-04 Thread Simon Horman

From: Dirk van der Merwe 

Implement helpers to determine and modify FEC modes via the NSP.
The NSP advertises FEC capabilities on a per port basis and provides
support for:
* Auto mode selection
* Reed Solomon
* BaseR
* None/Off
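
A small sketch of how a per-port capability bitmap over these four modes can be queried. It mirrors the bit-index plus mask pattern only; the demo_* names are hypothetical and nothing here is claimed about which bits the NSP actually sets for a given port.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions, one per FEC mode (hypothetical names). */
enum demo_fec_bit {
	DEMO_FEC_AUTO_BIT = 0,
	DEMO_FEC_BASER_BIT,
	DEMO_FEC_RS_BIT,
	DEMO_FEC_DISABLED_BIT,
};

#define DEMO_BIT(n)	(1u << (n))

/* A port supports FEC at all if any capability bit is set. */
static bool demo_can_support_fec(unsigned int supported)
{
	return supported != 0;
}

/* Check one mode, given by bit index, against the capability bitmap. */
static bool demo_fec_mode_supported(unsigned int supported,
				    enum demo_fec_bit mode)
{
	assert(mode <= DEMO_FEC_DISABLED_BIT);
	return (supported & DEMO_BIT(mode)) != 0;
}
```

Keeping the enum as bit indices and deriving masks with a macro lets the same value serve both as "the single mode to set" and as a position in the "modes supported" bitmap.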

Signed-off-by: Dirk van der Merwe 
Signed-off-by: Simon Horman 
---
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   | 30 ++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   | 64 ++
 2 files changed, 94 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index 47486d42f2d7..650ca1a5bd21 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
@@ -79,6 +79,18 @@ enum nfp_eth_aneg {
NFP_ANEG_DISABLED,
 };
 
+enum nfp_eth_fec {
+   NFP_FEC_AUTO_BIT = 0,
+   NFP_FEC_BASER_BIT,
+   NFP_FEC_REED_SOLOMON_BIT,
+   NFP_FEC_DISABLED_BIT,
+};
+
+#define NFP_FEC_AUTO   BIT(NFP_FEC_AUTO_BIT)
+#define NFP_FEC_BASER  BIT(NFP_FEC_BASER_BIT)
+#define NFP_FEC_REED_SOLOMON   BIT(NFP_FEC_REED_SOLOMON_BIT)
+#define NFP_FEC_DISABLED   BIT(NFP_FEC_DISABLED_BIT)
+
 /**
  * struct nfp_eth_table - ETH table information
  * @count: number of table entries
@@ -93,6 +105,7 @@ enum nfp_eth_aneg {
  * @speed: interface speed (in Mbps)
  * @interface: interface (module) plugged in
  * @media: media type of the @interface
+ * @fec:   forward error correction mode
  * @aneg:  auto negotiation mode
  * @mac_addr:  interface MAC address
  * @label_port:port id
@@ -105,6 +118,7 @@ enum nfp_eth_aneg {
  * @port_type: one of %PORT_* defines for ethtool
  * @port_lanes:total number of lanes on the port (sum of lanes of all 
subports)
  * @is_split:  is interface part of a split port
+ * @fec_modes_supported:   bitmap of FEC modes supported
  */
 struct nfp_eth_table {
unsigned int count;
@@ -120,6 +134,7 @@ struct nfp_eth_table {
unsigned int interface;
enum nfp_eth_media media;
 
+   enum nfp_eth_fec fec;
enum nfp_eth_aneg aneg;
 
u8 mac_addr[ETH_ALEN];
@@ -139,6 +154,8 @@ struct nfp_eth_table {
unsigned int port_lanes;
 
bool is_split;
+
+   unsigned int fec_modes_supported;
} ports[0];
 };
 
@@ -149,6 +166,19 @@ __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp 
*nsp);
 int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable);
 int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx,
   bool configed);
+int
+nfp_eth_set_fec(struct nfp_cpp *cpp, unsigned int idx, enum nfp_eth_fec mode);
+
+static inline bool nfp_eth_can_support_fec(struct nfp_eth_table_port *eth_port)
+{
+   return !!eth_port->fec_modes_supported;
+}
+
+static inline unsigned int
+nfp_eth_supported_fec_modes(struct nfp_eth_table_port *eth_port)
+{
+   return eth_port->fec_modes_supported;
+}
 
 struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx);
 int nfp_eth_config_commit_end(struct nfp_nsp *nsp);
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
index 47251396fcae..7ca589660e4d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
@@ -55,6 +55,8 @@
 #define NSP_ETH_PORT_INDEX GENMASK_ULL(15, 8)
 #define NSP_ETH_PORT_LABEL GENMASK_ULL(53, 48)
 #define NSP_ETH_PORT_PHYLABEL  GENMASK_ULL(59, 54)
+#define NSP_ETH_PORT_FEC_SUPP_BASERBIT_ULL(60)
+#define NSP_ETH_PORT_FEC_SUPP_RS   BIT_ULL(61)
 
 #define NSP_ETH_PORT_LANES_MASKcpu_to_le64(NSP_ETH_PORT_LANES)
 
@@ -67,6 +69,7 @@
 #define NSP_ETH_STATE_MEDIAGENMASK_ULL(21, 20)
 #define NSP_ETH_STATE_OVRD_CHNGBIT_ULL(22)
 #define NSP_ETH_STATE_ANEG GENMASK_ULL(25, 23)
+#define NSP_ETH_STATE_FEC  GENMASK_ULL(27, 26)
 
 #define NSP_ETH_CTRL_CONFIGUREDBIT_ULL(0)
 #define NSP_ETH_CTRL_ENABLED   BIT_ULL(1)
@@ -75,6 +78,7 @@
 #define NSP_ETH_CTRL_SET_RATE  BIT_ULL(4)
 #define NSP_ETH_CTRL_SET_LANES BIT_ULL(5)
 #define NSP_ETH_CTRL_SET_ANEG  BIT_ULL(6)
+#define NSP_ETH_CTRL_SET_FEC   BIT_ULL(7)
 
 enum nfp_eth_raw {
NSP_ETH_RAW_PORT = 0,
@@ -152,6 +156,7 @@ nfp_eth_port_translate(struct nfp_nsp *nsp, const union 
eth_table_entry *src,
   unsigned int index, struct nfp_eth_table_port *dst)
 {
unsigned int rate;
+   unsigned int fec;
u64 port, state;
 
port = le64_to_cpu(src->port);
@@ -183,6 +188,18 @@ nfp_eth_port_translate(struct nfp_nsp *nsp, const union 
eth_table_entry *src,
 
dst->override_changed = FIELD_GET(NSP_ETH_STATE_OVRD_CHNG, state

[net-next 2/7] nfp: make use of MAC reinit

2017-11-04 Thread Simon Horman

From: Jakub Kicinski 

Recent management FW images can perform full reinit of MAC cores
without requiring a reboot.  When loading the driver check if there
are changes pending and if so call NSP MAC reinit.  Full application
FW reload is still required, and all MACs need to be reinited at the
same time (not only the ones which have been reconfigured, and thus
potentially causing disruption to unrelated netdevs) therefore for
now changing MAC config without reloading the driver still remains
future work.
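
The "are there changes pending" check described above amounts to OR-ing a per-port flag over the whole port table, since all MACs are reinitialized together if any one of them changed. A minimal sketch, with hypothetical demo_* names standing in for the driver structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state; only the flag of interest is kept. */
struct demo_port {
	bool override_changed;
};

/*
 * Return true if at least one port has a pending configuration change.
 * A single set flag is enough to request a reinit of every MAC.
 */
static bool demo_needs_reinit(const struct demo_port *ports, size_t count)
{
	bool needs_reinit = false;
	size_t i;

	assert(ports != NULL || count == 0);
	for (i = 0; i < count; i++)
		needs_reinit |= ports[i].override_changed;
	return needs_reinit;
}
```

With an empty table the function returns false, so the caller skips the reinit and keeps the table it has already read.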

Signed-off-by: Jakub Kicinski 
Tested-by: Dirk van der Merwe 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_main.c  | 28 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |  2 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  |  2 +-
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |  5 
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |  6 +
 5 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index f8fa63b66739..35eaccbece36 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -346,6 +346,32 @@ nfp_fw_load(struct pci_dev *pdev, struct nfp_pf *pf, 
struct nfp_nsp *nsp)
return err < 0 ? err : 1;
 }
 
+static void
+nfp_nsp_init_ports(struct pci_dev *pdev, struct nfp_pf *pf,
+  struct nfp_nsp *nsp)
+{
+   bool needs_reinit = false;
+   int i;
+
+   pf->eth_tbl = __nfp_eth_read_ports(pf->cpp, nsp);
+   if (!pf->eth_tbl)
+   return;
+
+   if (!nfp_nsp_has_mac_reinit(nsp))
+   return;
+
+   for (i = 0; i < pf->eth_tbl->count; i++)
+   needs_reinit |= pf->eth_tbl->ports[i].override_changed;
+   if (!needs_reinit)
+   return;
+
+   kfree(pf->eth_tbl);
+   if (nfp_nsp_mac_reinit(nsp))
+   dev_warn(&pdev->dev, "MAC reinit failed\n");
+
+   pf->eth_tbl = __nfp_eth_read_ports(pf->cpp, nsp);
+}
+
 static int nfp_nsp_init(struct pci_dev *pdev, struct nfp_pf *pf)
 {
struct nfp_nsp *nsp;
@@ -366,7 +392,7 @@ static int nfp_nsp_init(struct pci_dev *pdev, struct nfp_pf 
*pf)
if (err < 0)
goto exit_close_nsp;
 
-   pf->eth_tbl = __nfp_eth_read_ports(pf->cpp, nsp);
+   nfp_nsp_init_ports(pdev, pf, nsp);
 
pf->nspi = __nfp_nsp_identify(nsp);
if (pf->nspi)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index c67b90c8d8b7..0061097c271e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -328,7 +328,7 @@ nfp_net_set_link_ksettings(struct net_device *netdev,
return -EOPNOTSUPP;
 
if (netif_running(netdev)) {
-   netdev_warn(netdev, "Changing settings not allowed on an active interface. It may cause the port to be disabled until reboot.\n");
+   netdev_warn(netdev, "Changing settings not allowed on an active interface. It may cause the port to be disabled until driver reload.\n");
return -EBUSY;
}
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index ff373acd28f3..0beb9b21557b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -597,7 +597,7 @@ nfp_net_eth_port_update(struct nfp_cpp *cpp, struct 
nfp_port *port,
return -EIO;
}
if (eth_port->override_changed) {
-   nfp_warn(cpp, "Port #%d config changed, unregistering. Reboot required before port will be operational again.\n", port->eth_id);
+   nfp_warn(cpp, "Port #%d config changed, unregistering. Driver reload required before port will be operational again.\n", port->eth_id);
port->type = NFP_PORT_INVALID;
}
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
index 37364555c42b..14a6d1ba51a9 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c
@@ -477,6 +477,11 @@ int nfp_nsp_device_soft_reset(struct nfp_nsp *state)
return nfp_nsp_command(state, SPCODE_SOFT_RESET, 0, 0, 0);
 }
 
+int nfp_nsp_mac_reinit(struct nfp_nsp *state)
+{
+   return nfp_nsp_command(state, SPCODE_MAC_INIT, 0, 0, 0);
+}
+
 int nfp_nsp_load_fw(struct nfp_nsp *state, const struct firmware *fw)
 {
return nfp_nsp_command_buf(state, SPCODE_FW_LOAD, fw->size, fw->data,
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h 
b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
index e2f028027c6f..47486d42f2d7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h
+++ b/drivers/n

[net-next 7/7] nfp: implement ethtool FEC mode settings

2017-11-04 Thread Simon Horman

From: Dirk van der Merwe 

Add support in the driver ethtool ops to modify the NFP FEC modes.

The FEC modes can be set for vNIC associated with physical ports or
for MAC representor netdevs.

Signed-off-by: Dirk van der Merwe 
Signed-off-by: Simon Horman 
---
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 117 -
 1 file changed, 116 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index d0028894667c..60c8d733a37d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -244,6 +244,30 @@ nfp_app_get_drvinfo(struct net_device *netdev, struct 
ethtool_drvinfo *drvinfo)
nfp_get_drvinfo(app, app->pdev, "*", drvinfo);
 }
 
+static void
+nfp_net_set_fec_link_mode(struct nfp_eth_table_port *eth_port,
+ struct ethtool_link_ksettings *c)
+{
+   unsigned int modes;
+
+   ethtool_link_ksettings_add_link_mode(c, supported, FEC_NONE);
+   if (!nfp_eth_can_support_fec(eth_port)) {
+   ethtool_link_ksettings_add_link_mode(c, advertising, FEC_NONE);
+   return;
+   }
+
+   modes = nfp_eth_supported_fec_modes(eth_port);
+   if (modes & NFP_FEC_BASER) {
+   ethtool_link_ksettings_add_link_mode(c, supported, FEC_BASER);
+   ethtool_link_ksettings_add_link_mode(c, advertising, FEC_BASER);
+   }
+
+   if (modes & NFP_FEC_REED_SOLOMON) {
+   ethtool_link_ksettings_add_link_mode(c, supported, FEC_RS);
+   ethtool_link_ksettings_add_link_mode(c, advertising, FEC_RS);
+   }
+}
+
 /**
  * nfp_net_get_link_ksettings - Get Link Speed settings
  * @netdev:network interface device structure
@@ -278,9 +302,11 @@ nfp_net_get_link_ksettings(struct net_device *netdev,
 
port = nfp_port_from_netdev(netdev);
eth_port = nfp_port_get_eth_port(port);
-   if (eth_port)
+   if (eth_port) {
cmd->base.autoneg = eth_port->aneg != NFP_ANEG_DISABLED ?
AUTONEG_ENABLE : AUTONEG_DISABLE;
+   nfp_net_set_fec_link_mode(eth_port, cmd);
+   }
 
if (!netif_carrier_ok(netdev))
return 0;
@@ -686,6 +712,91 @@ static int nfp_port_get_sset_count(struct net_device *netdev, int sset)
}
 }
 
+static int nfp_port_fec_ethtool_to_nsp(u32 fec)
+{
+   switch (fec) {
+   case ETHTOOL_FEC_AUTO:
+   return NFP_FEC_AUTO_BIT;
+   case ETHTOOL_FEC_OFF:
+   return NFP_FEC_DISABLED_BIT;
+   case ETHTOOL_FEC_RS:
+   return NFP_FEC_REED_SOLOMON_BIT;
+   case ETHTOOL_FEC_BASER:
+   return NFP_FEC_BASER_BIT;
+   default:
+   /* NSP only supports a single mode at a time */
+   return -EOPNOTSUPP;
+   }
+}
+
+static u32 nfp_port_fec_nsp_to_ethtool(u32 fec)
+{
+   u32 result = 0;
+
+   if (fec & NFP_FEC_AUTO)
+   result |= ETHTOOL_FEC_AUTO;
+   if (fec & NFP_FEC_BASER)
+   result |= ETHTOOL_FEC_BASER;
+   if (fec & NFP_FEC_REED_SOLOMON)
+   result |= ETHTOOL_FEC_RS;
+   if (fec & NFP_FEC_DISABLED)
+   result |= ETHTOOL_FEC_OFF;
+
+   return result ?: ETHTOOL_FEC_NONE;
+}
+
+static int
+nfp_port_get_fecparam(struct net_device *netdev,
+ struct ethtool_fecparam *param)
+{
+   struct nfp_eth_table_port *eth_port;
+   struct nfp_port *port;
+
+   param->active_fec = ETHTOOL_FEC_NONE_BIT;
+   param->fec = ETHTOOL_FEC_NONE_BIT;
+
+   port = nfp_port_from_netdev(netdev);
+   eth_port = nfp_port_get_eth_port(port);
+   if (!eth_port)
+   return -EOPNOTSUPP;
+
+   if (!nfp_eth_can_support_fec(eth_port))
+   return 0;
+
+   param->fec = nfp_port_fec_nsp_to_ethtool(eth_port->fec_modes_supported);
+   param->active_fec = nfp_port_fec_nsp_to_ethtool(eth_port->fec);
+
+   return 0;
+}
+
+static int
+nfp_port_set_fecparam(struct net_device *netdev,
+ struct ethtool_fecparam *param)
+{
+   struct nfp_eth_table_port *eth_port;
+   struct nfp_port *port;
+   int err, fec;
+
+   port = nfp_port_from_netdev(netdev);
+   eth_port = nfp_port_get_eth_port(port);
+   if (!eth_port)
+   return -EOPNOTSUPP;
+
+   if (!nfp_eth_can_support_fec(eth_port))
+   return -EOPNOTSUPP;
+
+   fec = nfp_port_fec_ethtool_to_nsp(param->fec);
+   if (fec < 0)
+   return fec;
+
+   err = nfp_eth_set_fec(port->app->cpp, eth_port->index, fec);
+   if (!err)
+   /* Only refresh if we did something */
+   nfp_net_refresh_port_table(port);
+
+   return err < 0 ? err : 0;
+}
+
 /* RX network flow classification (RSS, filters, etc)
  */
 static u32 ethtool_flow_to_nfp_flag(u32 flow_type)
@@ -1144,6 +1255,8

[net-next 4/7] nfp: resync repr state when port table sync

2017-11-04 Thread Simon Horman

From: Dirk van der Merwe 

If the NSP port table has been refreshed, resync the representor state
with the new port information. At the moment, this only entails looking
for invalid ports and killing off representors associated with them.

The repr instance becomes NULL which is safe since the app accessor
function for reprs returns NULL when it cannot access a repr.

Signed-off-by: Dirk van der Merwe 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c |  6 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 47 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h |  1 +
 3 files changed, 54 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 0beb9b21557b..c505014121c4 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -611,6 +611,7 @@ int nfp_net_refresh_port_table_sync(struct nfp_pf *pf)
struct nfp_eth_table *eth_table;
struct nfp_net *nn, *next;
struct nfp_port *port;
+   int err;
 
lockdep_assert_held(&pf->lock);
 
@@ -640,6 +641,11 @@ int nfp_net_refresh_port_table_sync(struct nfp_pf *pf)
 
kfree(eth_table);
 
+   /* Resync repr state. This may cause reprs to be removed. */
+   err = nfp_reprs_resync_phys_ports(pf->app);
+   if (err)
+   return err;
+
/* Shoot off the ports which became invalid */
list_for_each_entry_safe(nn, next, &pf->vnics, vnic_list) {
if (!nn->port || nn->port->type != NFP_PORT_INVALID)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
index d540a9dc77b3..1bce8c131bb9 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c
@@ -390,3 +390,50 @@ struct nfp_reprs *nfp_reprs_alloc(unsigned int num_reprs)
 
return reprs;
 }
+
+int nfp_reprs_resync_phys_ports(struct nfp_app *app)
+{
+   struct nfp_reprs *reprs, *old_reprs;
+   struct nfp_repr *repr;
+   int i;
+
+   old_reprs =
+   rcu_dereference_protected(app->reprs[NFP_REPR_TYPE_PHYS_PORT],
+ lockdep_is_held(&app->pf->lock));
+   if (!old_reprs)
+   return 0;
+
+   reprs = nfp_reprs_alloc(old_reprs->num_reprs);
+   if (!reprs)
+   return -ENOMEM;
+
+   for (i = 0; i < old_reprs->num_reprs; i++) {
+   if (!old_reprs->reprs[i])
+   continue;
+
+   repr = netdev_priv(old_reprs->reprs[i]);
+   if (repr->port->type == NFP_PORT_INVALID)
+   continue;
+
+   reprs->reprs[i] = old_reprs->reprs[i];
+   }
+
+   old_reprs = nfp_app_reprs_set(app, NFP_REPR_TYPE_PHYS_PORT, reprs);
+   synchronize_rcu();
+
+   /* Now we free up removed representors */
+   for (i = 0; i < old_reprs->num_reprs; i++) {
+   if (!old_reprs->reprs[i])
+   continue;
+
+   repr = netdev_priv(old_reprs->reprs[i]);
+   if (repr->port->type != NFP_PORT_INVALID)
+   continue;
+
+   nfp_app_repr_stop(app, repr);
+   nfp_repr_clean(repr);
+   }
+
+   kfree(old_reprs);
+   return 0;
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
index 32179cad062a..5d4d897bc9c6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
@@ -124,5 +124,6 @@ void
 nfp_reprs_clean_and_free_by_type(struct nfp_app *app,
 enum nfp_repr_type type);
 struct nfp_reprs *nfp_reprs_alloc(unsigned int num_reprs);
+int nfp_reprs_resync_phys_ports(struct nfp_app *app);
 
 #endif /* NFP_NET_REPR_H */
-- 
2.11.0
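
Note on the resync pattern described in this patch: the commit message
amounts to "build a filtered copy of the table, publish it, then free
what was dropped". The following is a minimal userspace sketch of that
idea, not the driver code. All names here (struct item, struct table,
table_alloc, table_resync) are hypothetical, and the plain pointer
assignment stands in for the kernel's nfp_app_reprs_set() followed by
synchronize_rcu(), which this sketch does not model.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a representor and the reprs table. */
struct item {
	int invalid;			/* non-zero: the backing port went away */
};

struct table {
	unsigned int n;
	struct item *slots[];		/* n entries, NULL means empty */
};

static struct table *table_alloc(unsigned int n)
{
	struct table *t = calloc(1, sizeof(*t) + n * sizeof(t->slots[0]));

	if (t)
		t->n = n;
	return t;
}

/* Replace *cur with a copy that only keeps valid items.
 * Returns 0 on success, -1 if the new table cannot be allocated.
 */
static int table_resync(struct table **cur)
{
	struct table *old = *cur, *new;
	unsigned int i;

	if (!old)
		return 0;

	new = table_alloc(old->n);
	if (!new)
		return -1;

	/* Carry over every item that is still valid. */
	for (i = 0; i < old->n; i++)
		if (old->slots[i] && !old->slots[i]->invalid)
			new->slots[i] = old->slots[i];

	/* Publish the new table. The kernel code swaps an RCU pointer
	 * and waits for readers here; this sketch is single-threaded.
	 */
	*cur = new;

	/* Only now release the items that were filtered out. */
	for (i = 0; i < old->n; i++)
		if (old->slots[i] && old->slots[i]->invalid)
			free(old->slots[i]);

	free(old);
	return 0;
}
```

Freeing only after the new table is published is what makes a NULL slot
safe for readers: they see either the old entry or no entry, never a
freed one.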

[net-next 5/7] nfp: add get/set link settings ndos to representors

2017-11-04 Thread Simon Horman

From: Dirk van der Merwe 

Since it is now safe to modify link settings for representors, we can
attach the get/set link settings ndos to it. The get/set link settings
are nfp_port based operations.

If a port becomes invalid, the representor will be removed in the same
way a vnic would be.

Signed-off-by: Dirk van der Merwe 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 0061097c271e..d0028894667c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -1155,6 +1155,8 @@ const struct ethtool_ops nfp_port_ethtool_ops = {
.set_dump   = nfp_app_set_dump,
.get_dump_flag  = nfp_app_get_dump_flag,
.get_dump_data  = nfp_app_get_dump_data,
+   .get_link_ksettings = nfp_net_get_link_ksettings,
+   .set_link_ksettings = nfp_net_set_link_ksettings,
 };
 
 void nfp_net_set_ethtool_ops(struct net_device *netdev)
-- 
2.11.0

[net-next 0/7] nfp: ethtool and related improvements

2017-11-04 Thread Simon Horman

Dirk van der Merwe says:

This patch series throws a couple of loosely related items into a single
series.

Patch 1: Clang compilation fix reported by
  Matthias Kaehlcke 

Patch 2: Driver can now do MAC reinit on load when there has been a
  media override set in the NSP.

Patch 3: Refactor the nfp_app_reprs_set API.

Patch 4: Similar to vNICs, representors must be able to deal with media
  override changes in the NSP.

Patch 5: Since representors can now handle media overrides, we can
  allocate the get/set link ndo's to them.

Patch 6 & 7: Add support for FEC mode modification.

Dirk van der Merwe (5):
  nfp: refactor nfp_app_reprs_set
  nfp: resync repr state when port table sync
  nfp: add get/set link settings ndos to representors
  nfp: add helpers for FEC support
  nfp: implement ethtool FEC mode settings

Jakub Kicinski (2):
  nfp: don't depend on compiler constant propagation
  nfp: make use of MAC reinit

 drivers/net/ethernet/netronome/nfp/flower/main.c   |  16 +--
 drivers/net/ethernet/netronome/nfp/nfp_app.c   |   6 -
 drivers/net/ethernet/netronome/nfp/nfp_main.c  |  28 -
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 121 -
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c  |   8 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.c  |  47 
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h  |   1 +
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c   |   5 +
 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h   |  36 ++
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   |  87 +--
 10 files changed, 325 insertions(+), 30 deletions(-)

-- 
2.11.0

[net-next 1/7] nfp: don't depend on compiler constant propagation

2017-11-04 Thread Simon Horman

From: Jakub Kicinski 

Matthias reports:

  nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to
  identify the 'mask' parameter as known to be constant at compile time,
  which is required to use the FIELD_GET() macro.

  The forced inlining does the trick for gcc, but for kernel builds with
  clang it results in undefined symbols:

  drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.o: In function `__nfp_eth_set_aneg':
  drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x787): undefined reference to `__compiletime_assert_492'
  drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x7b1): undefined reference to `__compiletime_assert_496'

  These __compiletime_assert_xyx() calls would have been optimized away if
  the compiler had seen 'mask' as a constant.

Add a macro to extract the mask and shift and pass those to
nfp_eth_set_bit_config() separately.

Reported-by: Matthias Kaehlcke 
Signed-off-by: Jakub Kicinski 
Tested-by: Dirk van der Merwe 
Signed-off-by: Simon Horman 
---
 .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c   | 23 ++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
index f6f7c085f8e0..47251396fcae 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c
@@ -469,10 +469,10 @@ int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx, bool configed)
return nfp_eth_config_commit_end(nsp);
 }
 
-/* Force inline, FIELD_* macroes require masks to be compilation-time known */
-static __always_inline int
+static int
 nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
-  const u64 mask, unsigned int val, const u64 ctrl_bit)
+  const u64 mask, const unsigned int shift,
+  unsigned int val, const u64 ctrl_bit)
 {
union eth_table_entry *entries = nfp_nsp_config_entries(nsp);
unsigned int idx = nfp_nsp_config_idx(nsp);
@@ -489,11 +489,11 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
 
/* Check if we are already in requested state */
reg = le64_to_cpu(entries[idx].raw[raw_idx]);
-   if (val == FIELD_GET(mask, reg))
+   if (val == (reg & mask) >> shift)
return 0;
 
reg &= ~mask;
-   reg |= FIELD_PREP(mask, val);
+   reg |= (val << shift) & mask;
entries[idx].raw[raw_idx] = cpu_to_le64(reg);
 
entries[idx].control |= cpu_to_le64(ctrl_bit);
@@ -503,6 +503,13 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
return 0;
 }
 
+#define NFP_ETH_SET_BIT_CONFIG(nsp, raw_idx, mask, val, ctrl_bit)  \
+   ({  \
+   __BF_FIELD_CHECK(mask, 0ULL, val, "NFP_ETH_SET_BIT_CONFIG: "); \
+   nfp_eth_set_bit_config(nsp, raw_idx, mask, __bf_shf(mask), \
+  val, ctrl_bit);  \
+   })
+
 /**
  * __nfp_eth_set_aneg() - set PHY autonegotiation control bit
  * @nsp:   NFP NSP handle returned from nfp_eth_config_start()
@@ -515,7 +522,7 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx,
  */
 int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode)
 {
-   return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE,
+   return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
  NSP_ETH_STATE_ANEG, mode,
  NSP_ETH_CTRL_SET_ANEG);
 }
@@ -544,7 +551,7 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
return -EINVAL;
}
 
-   return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE,
+   return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE,
  NSP_ETH_STATE_RATE, rate,
  NSP_ETH_CTRL_SET_RATE);
 }
@@ -561,6 +568,6 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed)
  */
 int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes)
 {
-   return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
+   return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES,
  lanes, NSP_ETH_CTRL_SET_LANES);
 }
-- 
2.11.0
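
Note on the mask/shift approach used in this patch: the idea is to
compute the field shift once, at the macro level where the mask is a
literal, and hand both mask and shift to an ordinary function. Below is
a minimal userspace sketch of that idea, not the driver code. The names
FIELD_SHIFT, set_field and SET_FIELD are hypothetical, the compile-time
mask check that the patch keeps via __BF_FIELD_CHECK() is omitted, and
__builtin_ctzll() is a GCC/Clang builtin used here as an assumed
stand-in for the kernel's __bf_shf().

```c
#include <stdint.h>

/* Index of the lowest set bit of the mask (assumes mask != 0). */
#define FIELD_SHIFT(mask) ((unsigned int)__builtin_ctzll(mask))

/* Update the field selected by mask inside *reg.
 * Returns 0 if the field already holds val, 1 if it was changed.
 * The function no longer needs mask to be a compile-time constant.
 */
static int set_field(uint64_t *reg, uint64_t mask, unsigned int shift,
		     uint64_t val)
{
	/* Already in the requested state? */
	if (val == ((*reg & mask) >> shift))
		return 0;

	*reg &= ~mask;
	*reg |= (val << shift) & mask;
	return 1;
}

/* The wrapper derives the shift from the mask at the call site. */
#define SET_FIELD(reg, mask, val) \
	set_field((reg), (mask), FIELD_SHIFT(mask), (val))
```

With this split the helper can be a normal, non-inlined function, which
is why the patch can drop __always_inline.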

[PATCH net-next] pktgen: do not abuse IN6_ADDR_HSIZE

2017-11-04 Thread Eric Dumazet

From: Eric Dumazet 

pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an
IPv6 address.

Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old
bug is hitting us.

Fixes: 3f27fb23219e ("ipv6: addrconf: add per netns perturbation in inet6_addr_hash()")
Signed-off-by: Eric Dumazet 
Reported-by: Dan Carpenter 
---
 net/core/pktgen.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 6e1e10ff433a5f4097d1d4b33848ab13d4e005c6..e3fa53a07d34b3e5f6b438e08b440f520b3cd6d4 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2165,7 +2165,7 @@ static void pktgen_setup_inject(struct pktgen_dev *pkt_dev)
+ pkt_dev->pkt_overhead;
}
 
-   for (i = 0; i < IN6_ADDR_HSIZE; i++)
+   for (i = 0; i < sizeof(struct in6_addr); i++)
if (pkt_dev->cur_in6_saddr.s6_addr[i]) {
set = 1;
break;
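
Note on the fix above: the loop scans the bytes of an IPv6 address, so
its bound has to be the size of the address, not the size of an
unrelated hash table. A minimal userspace sketch of the corrected scan
follows. The names struct addr16 and addr_is_set are hypothetical;
struct addr16 is only an assumed 16-byte stand-in for struct in6_addr.

```c
#include <stddef.h>

/* Hypothetical 16-byte stand-in for struct in6_addr. */
struct addr16 {
	unsigned char s6_addr[16];
};

/* Returns 1 if any byte of the address is non-zero, 0 otherwise.
 * The bound is taken from the array itself, so it stays correct
 * no matter how other constants change.
 */
static int addr_is_set(const struct addr16 *a)
{
	size_t i;

	for (i = 0; i < sizeof(a->s6_addr); i++)
		if (a->s6_addr[i])
			return 1;
	return 0;
}
```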

Re: [bug report] ipv6: addrconf: add per netns perturbation in inet6_addr_hash()

2017-11-04 Thread Eric Dumazet

On Sat, Nov 4, 2017 at 7:24 AM, Eric Dumazet  wrote:
> On Sat, Nov 4, 2017 at 7:13 AM, Eric Dumazet  wrote:
>> On Sat, Nov 4, 2017 at 1:31 AM, Dan Carpenter  
>> wrote:
>>> Hello Eric Dumazet,
>>>
>>> The patch 3f27fb23219e: "ipv6: addrconf: add per netns perturbation
>>> in inet6_addr_hash()" from Oct 23, 2017, leads to the following
>>> static checker warning:
>>>
>>> net/core/pktgen.c:2169 pktgen_setup_inject()
>>> error: buffer overflow 'pkt_dev->cur_in6_saddr.in6_u.u6_addr8' 16 <= 255
>>>
>>> net/core/pktgen.c
>>>   2157  if (pkt_dev->flags & F_IPV6) {
>>>   2158  int i, set = 0, err = 1;
>>>   2159  struct inet6_dev *idev;
>>>   2160
>>>   2161  if (pkt_dev->min_pkt_size == 0) {
>>>   2162  pkt_dev->min_pkt_size = 14 + sizeof(struct ipv6hdr)
>>>   2163  + sizeof(struct udphdr)
>>>   2164  + sizeof(struct pktgen_hdr)
>>>   2165  + pkt_dev->pkt_overhead;
>>>   2166  }
>>>   2167
>>>   2168  for (i = 0; i < IN6_ADDR_HSIZE; i++)
>>> ^^
>>> My guess is that this is the wrong test here, but I don't know for sure.
>>>
>>>   2169  if (pkt_dev->cur_in6_saddr.s6_addr[i]) {
>>>^^
>>> This used to work but now that IN6_ADDR_HSIZE is 256 instead of 16 we're
>>> reading beyond the end of the array.
>>>
>>>   2170  set = 1;
>>>   2171  break;
>>>   2172  }
>>>   2173
>>>   2174  if (!set) {
>>>   2175
>>>   2176  /*
>>>   2177   * Use linklevel address if unconfigured.
>>>   2178   *
>>>   2179   * use ipv6_get_lladdr if/when it's get exported
>>>   2180   */
>>>   2181
>>>
>>> regards,
>>> dan carpenter
>>
>> pktgen is obviously wrong.
>>
>> Thanks for the report.
>
> I am travelling to Seoul for netconf/netdev, please send this patch in
> an official way.
>
> Thanks !
>
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 6e1e10ff433a5f4097d1d4b33848ab13d4e005c6..e3fa53a07d34b3e5f6b438e08b440f520b3cd6d4 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -2165,7 +2165,7 @@ static void pktgen_setup_inject(struct pktgen_dev *pkt_dev)
> + pkt_dev->pkt_overhead;
> }
>
> -   for (i = 0; i < IN6_ADDR_HSIZE; i++)
> +   for (i = 0; i < sizeof(struct in6_addr); i++)
> if (pkt_dev->cur_in6_saddr.s6_addr[i]) {
> set = 1;
> break;

Also I would move

include/net/addrconf.h:62:#define IN6_ADDR_HSIZE_SHIFT  8
include/net/addrconf.h:63:#define IN6_ADDR_HSIZE        (1 << IN6_ADDR_HSIZE_SHIFT)

to net/ipv6/addrconf.c  to avoid future misuses like that.

Re: sr9800: Use common error handling code in sr9800_phy_powerup()

2017-11-04 Thread SF Markus Elfring

> If you play the "smaller executable object code" card, people expect that
> you provide the actual numbers, too.

I can offer another bit of information for this software development discussion.

The affected source file can be compiled for the processor architecture “x86_64”
by a tool like “GCC 6.4.1+r251631-1.3” from the software distribution
“openSUSE Tumbleweed” with the following command example.

my_cc=/usr/bin/gcc-6 \
&& my_module=drivers/net/usb/sr9800.ko \
&& git checkout next-20171009 \
&& make -j4 CC="${my_cc}" HOSTCC="${my_cc}" allmodconfig "${my_module}" \
&& size "${my_module}" \
&& git checkout ':/^sr9800: Use common error handling code in sr9800_phy_powerup' \
&& make -j4 CC="${my_cc}" HOSTCC="${my_cc}" allmodconfig "${my_module}" \
&& size "${my_module}"


Do you find the following details useful for further clarification?

text: -47
data: 0
bss:  0

Regards,
Markus

Re: [PATCH net-next v15] openvswitch: enable NSH support

2017-11-04 Thread Pravin Shelar

On Tue, Oct 31, 2017 at 9:03 PM, Yi Yang  wrote:
> v14->v15
>  - Check size in nsh_hdr_from_nlattr
>  - Fixed four small issues pointed out By Jiri and Eric
>
> v13->v14
>  - Rename skb_push_nsh to nsh_push per Dave's comment
>  - Rename skb_pop_nsh to nsh_pop per Dave's comment
>
> v12->v13
>  - Fix NSH header length check in set_nsh
>
> v11->v12
>  - Fix missing changes old comments pointed out
>  - Fix new comments for v11
>
> v10->v11
>  - Fix the left three disputable comments for v9
>but not fixed in v10.
>
> v9->v10
>  - Change struct ovs_key_nsh to
>struct ovs_nsh_key_base base;
>__be32 context[NSH_MD1_CONTEXT_SIZE];
>  - Fix new comments for v9
>
> v8->v9
>  - Fix build error reported by daily intel build
>because nsh module isn't selected by openvswitch
>
> v7->v8
>  - Rework nested value and mask for OVS_KEY_ATTR_NSH
>  - Change pop_nsh to adapt to nsh kernel module
>  - Fix many issues per comments from Jiri Benc
>
> v6->v7
>  - Remove NSH GSO patches in v6 because Jiri Benc
>reworked it as another patch series and they have
>been merged.
>  - Change it to adapt to nsh kernel module added by NSH
>GSO patch series
>
> v5->v6
>  - Fix the rest comments for v4.
>  - Add NSH GSO support for VxLAN-gpe + NSH and
>Eth + NSH.
>
> v4->v5
>  - Fix many comments by Jiri Benc and Eric Garver
>for v4.
>
> v3->v4
>  - Add new NSH match field ttl
>  - Update NSH header to the latest format
>which will be final format and won't change
>per its author's confirmation.
>  - Fix comments for v3.
>
> v2->v3
>  - Change OVS_KEY_ATTR_NSH to nested key to handle
>length-fixed attributes and length-variable
>attribute more flexibly.
>  - Remove struct ovs_action_push_nsh completely
>  - Add code to handle nested attribute for SET_MASKED
>  - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
>to transfer NSH header data.
>  - Fix comments and coding style issues by Jiri and Eric
>
> v1->v2
>  - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
>  - Dynamically allocate struct ovs_action_push_nsh for
>length-variable metadata.
>
> OVS master and 2.8 branch has merged NSH userspace
> patch series, this patch is to enable NSH support
> in kernel data path in order that OVS can support
> NSH in compat mode by porting this.
>
> Signed-off-by: Yi Yang 
As commented earlier, the following action-related validations can be
moved to the flow install phase.

> ---
>  include/net/nsh.h|   3 +
>  include/uapi/linux/openvswitch.h |  29 
>  net/nsh/nsh.c|  59 
>  net/openvswitch/Kconfig  |   1 +
>  net/openvswitch/actions.c| 119 +++
>  net/openvswitch/flow.c   |  51 +++
>  net/openvswitch/flow.h   |   7 +
>  net/openvswitch/flow_netlink.c   | 315 ++-
>  net/openvswitch/flow_netlink.h   |   5 +
>  9 files changed, 588 insertions(+), 1 deletion(-)
>
...

> diff --git a/net/nsh/nsh.c b/net/nsh/nsh.c
> index 58fb827..2764682 100644
> --- a/net/nsh/nsh.c
> +++ b/net/nsh/nsh.c
> @@ -14,6 +14,65 @@
>  #include 
>  #include 
>
> +int nsh_push(struct sk_buff *skb, const struct nshhdr *pushed_nh)
> +{
> +   struct nshhdr *nh;
> +   size_t length = nsh_hdr_len(pushed_nh);
> +   u8 next_proto;
> +
> +   if (skb->mac_len) {
> +   next_proto = TUN_P_ETHERNET;
> +   } else {
> +   next_proto = tun_p_from_eth_p(skb->protocol);
> +   if (!next_proto)
> +   return -EAFNOSUPPORT;
The check for supported protocols can be moved to flow install validation
in __ovs_nla_copy_actions().

> +   }
> +
> +   /* Add the NSH header */
> +   if (skb_cow_head(skb, length) < 0)
> +   return -ENOMEM;
> +
> +   skb_push(skb, length);
> +   nh = (struct nshhdr *)(skb->data);
> +   memcpy(nh, pushed_nh, length);
> +   nh->np = next_proto;
> +
> +   skb->protocol = htons(ETH_P_NSH);
> +   skb_reset_mac_header(skb);
> +   skb_reset_network_header(skb);
> +   skb_reset_mac_len(skb);
> +
> +   return 0;
> +}
> +EXPORT_SYMBOL_GPL(nsh_push);
> +
> +int nsh_pop(struct sk_buff *skb)
> +{
> +   struct nshhdr *nh;
> +   size_t length;
> +   __be16 inner_proto;
> +
> +   if (!pskb_may_pull(skb, NSH_BASE_HDR_LEN))
> +   return -ENOMEM;
> +   nh = (struct nshhdr *)(skb->data);
> +   length = nsh_hdr_len(nh);
> +   inner_proto = tun_p_to_eth_p(nh->np);
Same as above: this check can be moved to flow install validation in
__ovs_nla_copy_actions().

> +   if (!pskb_may_pull(skb, length))
> +   return -ENOMEM;
> +
> +   if (!inner_proto)
> +   return -EAFNOSUPPORT;
> +
> +   skb_pull(skb, length);
> +   skb_reset_mac_header(skb);
> +   skb_reset_network_header(skb);
> +   skb_reset_mac_len(skb);
> +   skb->protocol = inner_proto;
> +
> +   return 0;
> +}
> +EXPORT_SYMBOL_G

Re: [bug report] ipv6: addrconf: add per netns perturbation in inet6_addr_hash()

2017-11-04 Thread Eric Dumazet

On Sat, Nov 4, 2017 at 7:13 AM, Eric Dumazet  wrote:
> On Sat, Nov 4, 2017 at 1:31 AM, Dan Carpenter  
> wrote:
>> Hello Eric Dumazet,
>>
>> The patch 3f27fb23219e: "ipv6: addrconf: add per netns perturbation
>> in inet6_addr_hash()" from Oct 23, 2017, leads to the following
>> static checker warning:
>>
>> net/core/pktgen.c:2169 pktgen_setup_inject()
>> error: buffer overflow 'pkt_dev->cur_in6_saddr.in6_u.u6_addr8' 16 <= 255
>>
>> net/core/pktgen.c
>>   2157  if (pkt_dev->flags & F_IPV6) {
>>   2158  int i, set = 0, err = 1;
>>   2159  struct inet6_dev *idev;
>>   2160
>>   2161  if (pkt_dev->min_pkt_size == 0) {
>>   2162  pkt_dev->min_pkt_size = 14 + sizeof(struct ipv6hdr)
>>   2163  + sizeof(struct udphdr)
>>   2164  + sizeof(struct pktgen_hdr)
>>   2165  + pkt_dev->pkt_overhead;
>>   2166  }
>>   2167
>>   2168  for (i = 0; i < IN6_ADDR_HSIZE; i++)
>> ^^
>> My guess is that this is the wrong test here, but I don't know for sure.
>>
>>   2169  if (pkt_dev->cur_in6_saddr.s6_addr[i]) {
>>^^
>> This used to work but now that IN6_ADDR_HSIZE is 256 instead of 16 we're
>> reading beyond the end of the array.
>>
>>   2170  set = 1;
>>   2171  break;
>>   2172  }
>>   2173
>>   2174  if (!set) {
>>   2175
>>   2176  /*
>>   2177   * Use linklevel address if unconfigured.
>>   2178   *
>>   2179   * use ipv6_get_lladdr if/when it's get exported
>>   2180   */
>>   2181
>>
>> regards,
>> dan carpenter
>
> pktgen is obviously wrong.
>
> Thanks for the report.

I am travelling to Seoul for netconf/netdev, please send this patch in
an official way.

Thanks !

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 6e1e10ff433a5f4097d1d4b33848ab13d4e005c6..e3fa53a07d34b3e5f6b438e08b440f520b3cd6d4 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2165,7 +2165,7 @@ static void pktgen_setup_inject(struct pktgen_dev *pkt_dev)
+ pkt_dev->pkt_overhead;
}

-   for (i = 0; i < IN6_ADDR_HSIZE; i++)
+   for (i = 0; i < sizeof(struct in6_addr); i++)
if (pkt_dev->cur_in6_saddr.s6_addr[i]) {
set = 1;
break;

Re: [PATCH] net: sched: cls_u32: use bitwise & rather than logical && on n->flags

2017-11-04 Thread David Miller

From: Colin King 
Date: Fri,  3 Nov 2017 08:09:45 +

> From: Colin Ian King 
> 
> Currently n->flags is being operated on by a logical && operator rather
> than a bitwise & operator. This looks incorrect as these should be bit
> flag operations. Fix this.
> 
> Detected by CoverityScan, CID#1460398 ("Logical vs. bitwise operator")
> 
> Fixes: 245dc5121a9b ("net: sched: cls_u32: call block callbacks for offload")
> Signed-off-by: Colin Ian King 

Applied, thanks Colin.
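
Note on the bug class fixed here: a logical && only asks whether both
operands are non-zero, so it reports "flag set" whenever any bit in the
flags word is set. A minimal sketch of the difference follows. The flag
names and values are hypothetical and are not the cls_u32 definitions.

```c
/* Hypothetical flag bits, for illustration only. */
#define FLAG_SKIP_HW 0x1u
#define FLAG_SKIP_SW 0x2u

/* Buggy: true whenever flags and flag are both non-zero,
 * even if they share no bits.
 */
static int has_flag_logical(unsigned int flags, unsigned int flag)
{
	return flags && flag;
}

/* Fixed: true only if the requested bit is actually set. */
static int has_flag_bitwise(unsigned int flags, unsigned int flag)
{
	return (flags & flag) != 0;
}
```

For flags == FLAG_SKIP_HW, asking for FLAG_SKIP_SW gives 1 with the
logical form and 0 with the bitwise form.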

Re: [PATCH] net: usb: asix: fill null-ptr-deref in asix_suspend

2017-11-04 Thread David Miller

From: Andrey Konovalov 
Date: Thu,  2 Nov 2017 21:26:59 +0100

> When asix_suspend() is called dev->driver_priv might not have been
> assigned a value, so we need to check that it's not NULL.
> 
> Found by syzkaller.
 ...
> Signed-off-by: Andrey Konovalov 

Applied, thank you.
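
Note on the guard described in the quoted commit message: the private
pointer must be checked before anything is called through it. The patch
body is elided above, so the following is only a userspace sketch of
that kind of check under assumed types, not the actual change. The
names struct drv_priv, struct dev, dev_suspend and count_call are
hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver-private data and the device. */
struct drv_priv {
	void (*suspend)(int *calls);
};

struct dev {
	struct drv_priv *driver_priv;	/* may still be NULL */
};

/* Call the optional suspend hook only if the private data exists
 * and the hook is set.
 */
static void dev_suspend(struct dev *d, int *calls)
{
	struct drv_priv *priv = d->driver_priv;

	if (priv && priv->suspend)
		priv->suspend(calls);
}

/* Example hook that just counts how often it ran. */
static void count_call(int *calls)
{
	(*calls)++;
}
```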

Re: [PATCH v3 net-next 0/5] eBPF-based device cgroup controller

2017-11-04 Thread David Miller

From: Roman Gushchin 
Date: Thu, 2 Nov 2017 13:15:25 -0400

> This patchset introduces an eBPF-based device controller for cgroup
> v2.

This doesn't apply cleanly to net-next, please respin.

Thank you.

ipset related DEBUG_VIRTUAL crash.

2017-11-04 Thread Dave Jones

I have a script that hourly replaces an ipset list. This has been in
place for a year or so, but last night it triggered this on 4.14-rc7

[455951.731181] kernel BUG at arch/x86/mm/physaddr.c:26!
[455951.737016] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[455951.742525] CPU: 0 PID: 3850 Comm: ipset Not tainted 4.14.0-rc7-firewall+ 
#1 
[455951.753293] task: 88013033cfc0 task.stack: 8801c3d48000
[455951.758567] RIP: 0010:__phys_addr+0x5b/0x80
[455951.763742] RSP: 0018:8801c3d4f528 EFLAGS: 00010287
[455951.768838] RAX: 7800849b62b6 RBX: 849b62b6 RCX: 
9f072a5d
[455951.773881] RDX: dc00 RSI: dc00 RDI: 
a06917e0
[455951.778844] RBP: 7800049b62b6 R08: 0002 R09: 

[455951.783729] R10:  R11:  R12: 
9fca8b05
[455951.788524] R13: 8801ce844268 R14: 049b62b6 R15: 
8801ce8442ea
[455951.793239] FS:  7fb44e656c80() GS:8801d320() 
knlGS:
[455951.797904] CS:  0010 DS:  ES:  CR0: 80050033
[455951.802479] CR2: 7ffeeafd70a8 CR3: 0001b6cd2001 CR4: 
000606f0
[455951.806998] Call Trace:
[455951.811404]  kfree+0x4c/0x310
[455951.815714]  hash_ip4_ahash_destroy+0x85/0xd0
[455951.819944]  hash_ip4_destroy+0x64/0x90
[455951.824069]  ip_set_destroy+0x4f0/0x500
[455951.828098]  ? ip_set_destroy+0x5/0x500
[455951.832029]  ? __rcu_read_unlock+0xd3/0x190
[455951.835867]  ? ip_set_utest+0x560/0x560
[455951.839610]  ? ip_set_utest+0x560/0x560
[455951.843239]  nfnetlink_rcv_msg+0x73e/0x770
[455951.846780]  ? nfnetlink_rcv_msg+0x352/0x770
[455951.850229]  ? nfnetlink_rcv+0xe90/0xe90
[455951.853571]  ? native_sched_clock+0xe8/0x190
[455951.856822]  ? lock_release+0x5d3/0x7d0
[455951.859976]  netlink_rcv_skb+0x121/0x230
[455951.863037]  ? nfnetlink_rcv+0xe90/0xe90
[455951.865999]  ? netlink_ack+0x4c0/0x4c0
[455951.868866]  ? ns_capable_common+0x68/0xc0
[455951.871638]  nfnetlink_rcv+0x1ad/0xe90
[455951.874312]  ? lock_acquire+0x380/0x380
[455951.876891]  ? __rcu_read_unlock+0xd3/0x190
[455951.879378]  ? __rcu_read_lock+0x30/0x30
[455951.881764]  ? rcu_is_watching+0xa4/0xf0
[455951.884048]  ? netlink_connect+0x1e0/0x1e0
[455951.886236]  ? nfnl_err_reset+0x180/0x180
[455951.888329]  ? netlink_deliver_tap+0x128/0x560
[455951.890333]  ? netlink_deliver_tap+0x5/0x560
[455951.892229]  ? iov_iter_advance+0x172/0x7f0
[455951.894029]  ? netlink_getname+0x150/0x150
[455951.895736]  ? can_nice.part.77+0x20/0x20
[455951.897342]  ? iov_iter_copy_from_user_atomic+0x7d0/0x7d0
[455951.898877]  ? netlink_trim+0x111/0x1b0
[455951.900394]  ? netlink_skb_destructor+0xf0/0xf0
[455951.901908]  netlink_unicast+0x2b1/0x340
[455951.903397]  ? netlink_detachskb+0x30/0x30
[455951.904862]  ? lock_acquire+0x380/0x380
[455951.906299]  ? lockdep_rcu_suspicious+0x100/0x100
[455951.907729]  netlink_sendmsg+0x4f2/0x650
[455951.909141]  ? netlink_broadcast_filtered+0x9e0/0x9e0
[455951.910565]  ? _copy_from_user+0x86/0xc0
[455951.911964]  ? netlink_broadcast_filtered+0x9e0/0x9e0
[455951.913364]  SYSC_sendto+0x2f0/0x3c0
[455951.914741]  ? SYSC_connect+0x210/0x210
[455951.916111]  ? bad_area_access_error+0x230/0x230
[455951.917479]  ? ___sys_recvmsg+0x320/0x320
[455951.918811]  ? sock_wake_async+0xc0/0xc0
[455951.920112]  ? SyS_brk+0x3ae/0x3d0
[455951.921381]  ? prepare_exit_to_usermode+0xde/0x230
[455951.922642]  ? enter_from_user_mode+0x30/0x30
[455951.923913]  ? mark_held_locks+0x1b/0xa0
[455951.925179]  ? entry_SYSCALL_64_fastpath+0x5/0xad
[455951.926459]  ? trace_hardirqs_on_caller+0x185/0x260
[455951.927747]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[455951.929031]  entry_SYSCALL_64_fastpath+0x18/0xad
[455951.930314] RIP: 0033:0x7fb44df4ac53
[455951.931592] RSP: 002b:7ffeeafb6a08 EFLAGS: 0246
[455951.932914]  ORIG_RAX: 002c
[455951.934231] RAX: ffda RBX: 55b8f35d26d0 RCX: 
7fb44df4ac53
[455951.935603] RDX: 002c RSI: 55b8f35d14b8 RDI: 
0003
[455951.936991] RBP: 55b8f35cf010 R08: 7fb44dc5dbe0 R09: 
000c
[455951.938387] R10:  R11: 0246 R12: 
7fb44e43b020
[455951.939795] R13: 7ffeeafb6acc R14:  R15: 
55b8f1ca68e0
[455951.941208] Code: 80 48 39 eb 72 25 48 c7 c7 09 d6 a4 a0 e8 3e 28 2c 00 0f 
b6 0d 80 ab 9d 01 48 8d 45 00 48 d3 e8 48 85 c0 75 06 5b 48 89 e8 5d c3 <0f> 0b 
48 c7 c7 10 c0 62 a0 e8 a7 2a 2c 00 48 8b 2d 60 95 5b 01 
[455951.993251] RIP: __phys_addr+0x5b/0x80 RSP: 8801c3d4f528
[455982.040898] ---[ end trace dfb8a0f07b7c5316 ]---
[459428.674105] 
==
[459428.679829] BUG: KASAN: use-after-free in __mutex_lock+0x26c/0xf30
[459428.685463] Read of size 4 at addr 88013033d020 by task ipset/4611
[459428.696474] CPU: 0 PID: 4611 Comm: ipset Tainted: G  D 
4.14.0-rc7-firewall+ #1 
[459428.707271] Call Trace:
[459428.712489]

Re: [PATCH] netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed

2017-11-04 Thread David Miller

From: Simon Horman 
Date: Thu, 2 Nov 2017 15:46:50 +0100

> On Sat, Oct 28, 2017 at 01:33:09PM +0300, Julian Anastasov wrote:
>> 
>>  Hello,
>> 
>> On Thu, 26 Oct 2017, Ye Yin wrote:
>> 
>> > When ipvs runs in two different network namespaces on the same host and
>> > one ipvs instance forwards traffic to the ipvs instance in the other
>> > namespace, the 'ipvs_property' flag makes the second ipvs take no effect.
>> > So we should clear 'ipvs_property' when the SKB network namespace changes.
>> > 
>> > Signed-off-by: Ye Yin 
>> > Signed-off-by: Wei Zhou 
>> 
>>  Patch looks good to me. ipvs_property was added long ago
>> but skb_scrub_packet() is more recent (3.11), so:
>> 
>> Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()")
>> Signed-off-by: Julian Anastasov 
>> 
>>  I guess, DaveM can apply it directly as a bugfix
>> to the net tree.
> 
> Sounds like a good plan to me, Dave?
> 
> Signed-off-by: Simon Horman 

Sure, applied and queued up for -stable, thanks!

Re: [PATCH] tcp_nv: use do_div() instead of expensive div64_u64()

2017-11-04 Thread David Miller

From: Konstantin Khlebnikov 
Date: Thu, 02 Nov 2017 17:07:05 +0300

> Average RTT is 32-bit thus full 64-bit division is redundant.
> 
> Signed-off-by: Konstantin Khlebnikov 
> Suggested-by: Stephen Hemminger 
> Suggested-by: Eric Dumazet 

Applied to net-next, thank you.

Re: [PATCH net] cxgb4: update latest firmware version supported

2017-11-04 Thread David Miller

From: Ganesh Goudar 
Date: Thu,  2 Nov 2017 19:26:22 +0530

> Change t4fw_version.h to update latest firmware version
> number to 1.16.63.0.
> 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH net] add support of IFF_XMIT_DST_RELEASE bit in vlan

2017-11-04 Thread David Miller

From: Vadim Fedorenko 
Date: Thu, 2 Nov 2017 15:49:08 +0300

> Some time ago Eric Dumazet suggested a "hack the IFF_XMIT_DST_RELEASE
> flag on the vlan netdev". But the last comment was "does not support
> properly bonding/team.(If the real_dev->privflags IFF_XMIT_DST_RELEASE
> bit changes, we want to update all the vlans at the same time )"
> 
> I've extended that patch to support changes of IFF_XMIT_DST_RELEASE in
> bonding/team.
> Both bonding and team call netdev_change_features() after recalculation
> of features including priv_flags IFF_XMIT_DST_RELEASE bit. So the only
> thing needed to support is to recheck this bit in
> vlan_transfer_features().
> 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Vadim Fedorenko 

Applied, thank you.

Re: [PATCH net-next] phylink: make local function phylink_phy_change() static

2017-11-04 Thread David Miller

From: Wei Yongjun 
Date: Thu, 2 Nov 2017 11:14:48 +

> Fixes the following sparse warnings:
> 
> drivers/net/phy/phylink.c:570:6: warning:
>  symbol 'phylink_phy_change' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Applied, thank you Wei.

Re: [PATCH] [net-next,v2] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver

2017-11-04 Thread David Miller

From: Desnes Augusto Nunes do Rosario 
Date: Wed,  1 Nov 2017 19:03:32 -0200

> + substr = strnstr(adapter->vpd->buff, "RM", adapter->vpd->len);
> + if (!substr) {
> + dev_info(dev, "No FW level provided by VPD\n");
> + complete(&adapter->fw_done);
> + return;
> + }
> +
> + /* get length of firmware level ASCII substring */
> + fw_level_len = *(substr + 2);
> +
> + /* copy firmware version string from vpd into adapter */
> + ptr = strncpy((char *)adapter->fw_version,
> +   substr + 3, fw_level_len);

You have to be more careful here, making sure first that
(substr + 2) < (adapter->vpd->buff + adapter->vpd->len),
and next that (substr + 2 + fw_level_len) is in range
as well.
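
The two range checks requested above can be sketched in plain userspace C.
This is only an illustration, not the ibmvnic code: the helper names
(find_kw, vpd_copy_fw_level) and the buffer layout (keyword "RM", one length
byte, then the ASCII data) are assumptions taken from the quoted hunk, and the
search is written so that it never reads past the stated buffer length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find a 2-byte keyword in a length-delimited
 * buffer that is not necessarily NUL-terminated.
 */
static const unsigned char *find_kw(const unsigned char *buf, size_t len,
				    const char *kw)
{
	size_t i;

	for (i = 0; i + 1 < len; i++)
		if (buf[i] == (unsigned char)kw[0] &&
		    buf[i + 1] == (unsigned char)kw[1])
			return buf + i;
	return NULL;
}

/* Hypothetical helper: copy the field that follows "RM" and its length
 * byte into out. Returns the number of bytes copied, or -1 if the
 * keyword is missing or any access would fall outside buf or out.
 */
int vpd_copy_fw_level(const unsigned char *buf, size_t len,
		      char *out, size_t out_sz)
{
	const unsigned char *kw = find_kw(buf, len, "RM");
	size_t off, field_len;

	if (!kw || out_sz == 0)
		return -1;
	off = (size_t)(kw - buf);

	/* First check: the length byte at off + 2 must be inside buf. */
	if (off + 2 >= len)
		return -1;
	field_len = buf[off + 2];

	/* Second check: the data at [off + 3, off + 3 + field_len)
	 * must be inside buf as well.
	 */
	if (field_len > len - (off + 3))
		return -1;

	/* Leave room for the terminating NUL in the destination. */
	if (field_len >= out_sz)
		return -1;

	memcpy(out, buf + off + 3, field_len);
	out[field_len] = '\0';
	return (int)field_len;
}
```

Unlike strncpy() with a source-derived length, this version also bounds the
copy by the destination size and always terminates the result.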

Re: [patch net-next 3/5] net: sched: introduce block mechanism to handle netif_keep_dst calls

2017-11-04 Thread Jiri Pirko

Sat, Nov 04, 2017 at 11:33:58AM CET, dan...@iogearbox.net wrote:
>On 11/04/2017 10:55 AM, Jiri Pirko wrote:
>> Fri, Nov 03, 2017 at 09:15:54PM CET, dan...@iogearbox.net wrote:
>> > On 11/03/2017 06:19 PM, Jiri Pirko wrote:
>> > > From: Jiri Pirko 
>> > > 
>> > > A couple of classifiers call netif_keep_dst directly on q->dev. That is
>> > > not possible to do directly for shared blocks, where multiple qdiscs
>> > > own the block. So introduce an infrastructure to keep track of the
>> > > block owners in a list and use this list to implement a block variant of
>> > > netif_keep_dst.
>> > > 
>> > > Signed-off-by: Jiri Pirko 
>> > [...]
>> > > +struct tcf_block_owner_item {
>> > > +struct list_head list;
>> > > +struct Qdisc *q;
>> > > +enum tcf_block_binder_type binder_type;
>> > > +};
>> > > +
>> > > +static void
>> > > +tcf_block_owner_netif_keep_dst(struct tcf_block *block,
>> > > +   struct Qdisc *q,
>> > > +   enum tcf_block_binder_type binder_type)
>> > > +{
>> > > +if (block->keep_dst &&
>> > > +binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
>> > 
>> > Why we need to keep dst on TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS ?
>> > I presume this enum means sch_handle_egress() ? dst is dropped
>> > later ...
>> 
>> This is because of the bpf check:
>> if (fp->dst_needed && !(tp->q->flags & TCQ_F_INGRESS))
>> netif_keep_dst(qdisc_dev(tp->q));
>> 
>> I just maintain the same logic here.
>
>No, that's a wrong claim, really ...
>
>clsact in general hooks into the same logic as ingress, so TC_H_CLSACT
>as major needs to reuse TC_H_INGRESS, and qdiscs set up as such set
>TCQ_F_INGRESS as flags. For clsact that means both your block binder
>types for clsact here (ingress/egress).

Ah, indeed, I missed this. I will rename TCQ_F_INGRESS to TCQ_F_CLSACT
as a part of this patchset too.


>
>Please make sure that your other changes don't have similar assumption.

They don't. Thanks for the review!

Re: [net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16

2017-11-04 Thread Or Gerlitz

On Sat, Nov 4, 2017 at 6:35 PM, David Miller  wrote:
> From: Or Gerlitz 
> Date: Sat, 4 Nov 2017 18:05:29 +0900
>
>> On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed  wrote:
>>> From: Huy Nguyen 
>>>
>>> Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware
>>> command.
>>>
>>> Signed-off-by: Huy Nguyen 
>>> Reviewed-by: Parav Pandit 
>>> Signed-off-by: Saeed Mahameed 
>>
>> This was reviewed by Eli Cohen, his R.B is missing here. Eli - if you
>> can reply and add it such that
>> patchworks will pick it up.
>
> Not if I pull from Saeed's tree, which is what I usually do for mlx5
> submissions.

So I guess Saeed's maintainer signature could be enough

[PATCH] net: mvpp2: Prevent userspace from changing TX affinities

2017-11-04 Thread Marc Zyngier

The mvpp2 driver can't cope at all with the TX affinities being
changed from userspace, and spits an endless stream of

[   91.779920] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing
[   91.779930] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing
[   91.780402] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing
[   91.780406] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing
[   91.780415] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing
[   91.780418] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing

rendering the box completely useless (I've measured around 600k
interrupts/s on an 8040 box) once irqbalance kicks in and starts
doing its job.

Obviously, the driver was never designed with this in mind. So let's
work around the problem by preventing userspace from interacting
with these interrupts altogether.

Cc: sta...@vger.kernel.org
Signed-off-by: Marc Zyngier 
---
 drivers/net/ethernet/marvell/mvpp2.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvpp2.c 
b/drivers/net/ethernet/marvell/mvpp2.c
index a37af5813f33..fcf9ba5eb8d1 100644
--- a/drivers/net/ethernet/marvell/mvpp2.c
+++ b/drivers/net/ethernet/marvell/mvpp2.c
@@ -6747,6 +6747,9 @@ static int mvpp2_irqs_init(struct mvpp2_port *port)
for (i = 0; i < port->nqvecs; i++) {
struct mvpp2_queue_vector *qv = port->qvecs + i;
 
+   if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE)
+   irq_set_status_flags(qv->irq, IRQ_NO_BALANCING);
+
err = request_irq(qv->irq, mvpp2_isr, 0, port->dev->name, qv);
if (err)
goto err;
@@ -6776,6 +6779,7 @@ static void mvpp2_irqs_deinit(struct mvpp2_port *port)
struct mvpp2_queue_vector *qv = port->qvecs + i;
 
irq_set_affinity_hint(qv->irq, NULL);
+   irq_clear_status_flags(qv->irq, IRQ_NO_BALANCING);
free_irq(qv->irq, qv);
}
 }
-- 
2.11.0

Re: [PATCH net-next v15] openvswitch: enable NSH support

2017-11-04 Thread Pravin Shelar

On Thu, Nov 2, 2017 at 6:40 PM, Yang, Yi  wrote:
> On Thu, Nov 02, 2017 at 05:06:47AM -0700, Pravin Shelar wrote:
>> On Wed, Nov 1, 2017 at 7:50 PM, Yang, Yi  wrote:
>> > On Thu, Nov 02, 2017 at 08:52:40AM +0800, Pravin Shelar wrote:
>> >> On Tue, Oct 31, 2017 at 9:03 PM, Yi Yang  wrote:
>> >> >
>> >> > OVS master and 2.8 branch has merged NSH userspace
>> >> > patch series, this patch is to enable NSH support
>> >> > in kernel data path in order that OVS can support
>> >> > NSH in compat mode by porting this.
>> >> >
>> >> > Signed-off-by: Yi Yang 
>> >> > ---
>> >> I have comment related to checksum, otherwise patch looks good to me.
>> >
>> > Pravin, thank you for your comments, the below part is incremental patch
>> > for checksum, please help check it, I'll send out v16 with this after
>> > you confirm.
>> >
>> This change looks good to me.
>> I noticed couple of more issues.
>> 1. Can you move the ovs_key_nsh to the union of ipv4 and ipv6?
>> ipv4/ipv6/nsh key data is mutually exclusive so there is no need for
>> separate space for nsh key in the ovs key.
>> 2. We need to fix match_validate() with nsh check. Datapath can not
>> allow any l3 or l4 match if the flow key contains nsh match and
>> vice-versa. Such a flow key should be rejected.
>
> Pravin, the below incremental patch should fix the issues you pointed
> out, please help confirm/ack, then I'll send out v16 with all acks
> from you all for merge. BTW, it has been verified in my sfc test
> environment.
>
The following patch looks good to me. But I think we need a similar
eth_type check for the nsh set action in validate_set() and in
__ovs_nla_copy_actions() for the NSH_POP action.

Can you send patch with all changes?

Thanks.

> diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
> index 8eeae749..c670dd2 100644
> --- a/net/openvswitch/flow.h
> +++ b/net/openvswitch/flow.h
> @@ -149,8 +149,8 @@ struct sw_flow_key {
> } nd;
> };
> } ipv6;
> +   struct ovs_key_nsh nsh; /* network service header */
> };
> -   struct ovs_key_nsh nsh; /* network service header */
> struct {
> /* Connection tracking fields not packed above. */
> struct {
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index 0d7d4ae..090103c 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -178,7 +178,8 @@ static bool match_validate(const struct sw_flow_match 
> *match,
> | (1 << OVS_KEY_ATTR_ICMPV6)
> | (1 << OVS_KEY_ATTR_ARP)
> | (1 << OVS_KEY_ATTR_ND)
> -   | (1 << OVS_KEY_ATTR_MPLS));
> +   | (1 << OVS_KEY_ATTR_MPLS)
> +   | (1 << OVS_KEY_ATTR_NSH));
>
> /* Always allowed mask fields. */
> mask_allowed |= ((1 << OVS_KEY_ATTR_TUNNEL)
> @@ -287,6 +288,14 @@ static bool match_validate(const struct sw_flow_match 
> *match,
> }
> }
>
> +   if (match->key->eth.type == htons(ETH_P_NSH)) {
> +   key_expected |= 1 << OVS_KEY_ATTR_NSH;
> +   if (match->mask &&
> +   match->mask->key.eth.type == htons(0xffff)) {
> +   mask_allowed |= 1 << OVS_KEY_ATTR_NSH;
> +   }
> +   }
> +
> if ((key_attrs & key_expected) != key_expected) {
> /* Key attributes check failed. */
> OVS_NLERR(log, "Missing key (keys=%llx, expected=%llx)",

Re: [patch net-next 3/5] net: sched: introduce block mechanism to handle netif_keep_dst calls

2017-11-04 Thread Daniel Borkmann


On 11/04/2017 10:55 AM, Jiri Pirko wrote:
> Fri, Nov 03, 2017 at 09:15:54PM CET, dan...@iogearbox.net wrote:
>> On 11/03/2017 06:19 PM, Jiri Pirko wrote:
>>> From: Jiri Pirko
>>>
>>> A couple of classifiers call netif_keep_dst directly on q->dev. That is
>>> not possible to do directly for shared blocks, where multiple qdiscs
>>> own the block. So introduce an infrastructure to keep track of the
>>> block owners in a list and use this list to implement a block variant of
>>> netif_keep_dst.
>>>
>>> Signed-off-by: Jiri Pirko
>> [...]
>>> +struct tcf_block_owner_item {
>>> +   struct list_head list;
>>> +   struct Qdisc *q;
>>> +   enum tcf_block_binder_type binder_type;
>>> +};
>>> +
>>> +static void
>>> +tcf_block_owner_netif_keep_dst(struct tcf_block *block,
>>> +  struct Qdisc *q,
>>> +  enum tcf_block_binder_type binder_type)
>>> +{
>>> +   if (block->keep_dst &&
>>> +   binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
>>
>> Why we need to keep dst on TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS ?
>> I presume this enum means sch_handle_egress() ? dst is dropped
>> later ...
>
> This is because of the bpf check:
> if (fp->dst_needed && !(tp->q->flags & TCQ_F_INGRESS))
> netif_keep_dst(qdisc_dev(tp->q));
>
> I just maintain the same logic here.

No, that's a wrong claim, really ...

clsact in general hooks into the same logic as ingress, so TC_H_CLSACT
as major needs to reuse TC_H_INGRESS, and qdiscs set up as such set
TCQ_F_INGRESS as flags. For clsact that means both your block binder
types for clsact here (ingress/egress).

Please make sure that your other changes don't have similar assumption.
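
The point about the flag can be shown with a small standalone C sketch. It
is not kernel code: the SKETCH_* names, the enum and both functions are made
up for illustration, and the only behaviour modelled is the one described
above, namely that a clsact qdisc carries the ingress flag regardless of
which of its two hooks a block is bound to.

```c
#include <stdbool.h>

/* Illustrative stand-in for TCQ_F_INGRESS; the value is an assumption. */
#define SKETCH_TCQ_F_INGRESS 0x2u

/* Illustrative stand-ins for the two clsact binder types in the patch. */
enum sketch_binder {
	SKETCH_BINDER_CLSACT_INGRESS,
	SKETCH_BINDER_CLSACT_EGRESS,
};

/* The flag belongs to the qdisc, not to the hook, so both binder types
 * of one clsact qdisc observe the same qdisc flags.
 */
unsigned int sketch_clsact_qdisc_flags(enum sketch_binder binder)
{
	(void)binder;
	return SKETCH_TCQ_F_INGRESS;
}

/* Mirrors the shape of the quoted cls_bpf check. */
bool sketch_bpf_keeps_dst(bool dst_needed, unsigned int qdisc_flags)
{
	return dst_needed && !(qdisc_flags & SKETCH_TCQ_F_INGRESS);
}
```

Under these assumptions the check evaluates to false for both clsact binder
types, so a test on "binder_type != ...CLSACT_INGRESS" alone does not
reproduce the existing cls_bpf behaviour.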

Re: [PATCH net-next v2 03/15] bpf: report offload info to user space

2017-11-04 Thread Jakub Kicinski

On Sat, 4 Nov 2017 18:45:31 +0900, Alexei Starovoitov wrote:
> On Fri, Nov 03, 2017 at 01:56:18PM -0700, Jakub Kicinski wrote:
> > Extend struct bpf_prog_info to contain information about program
> > being bound to a device.  Since the netdev may get destroyed while
> > program still exists we need a flag to indicate the program is
> > loaded for a device, even if the device is gone.
> > 
> > Signed-off-by: Jakub Kicinski 
> > Reviewed-by: Simon Horman 
> > Reviewed-by: Quentin Monnet 
> > ---
> >  include/linux/bpf.h  |  1 +
> >  include/uapi/linux/bpf.h |  6 ++
> >  kernel/bpf/offload.c | 12 
> >  kernel/bpf/syscall.c |  5 +
> >  4 files changed, 24 insertions(+)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index e45d43f9ec92..98bacd0fa5cc 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -506,6 +506,7 @@ static inline int cpu_map_enqueue(struct 
> > bpf_cpu_map_entry *rcpu,
> >  
> >  int bpf_prog_offload_compile(struct bpf_prog *prog);
> >  void bpf_prog_offload_destroy(struct bpf_prog *prog);
> > +u32 bpf_prog_offload_ifindex(struct bpf_prog *prog);
> >  
> >  #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
> >  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 727a3dba13e6..e92f62cf933a 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -894,6 +894,10 @@ enum sk_action {
> >  
> >  #define BPF_TAG_SIZE   8
> >  
> > +enum bpf_prog_status {
> > +   BPF_PROG_STATUS_DEV_BOUND   = (1 << 0),
> > +};
> > +
> >  struct bpf_prog_info {
> > __u32 type;
> > __u32 id;
> > @@ -907,6 +911,8 @@ struct bpf_prog_info {
> > __u32 nr_map_ids;
> > __aligned_u64 map_ids;
> > char name[BPF_OBJ_NAME_LEN];
> > +   __u32 ifindex;
> > +   __u32 status;  
> 
> Why is status needed?
> ifindex cannot be zero, so ifindex > 0 would mean
> that the program is bound.

Devices may come and go, independently from the lifetime of the program,
therefore there is a notion of a program which has been loaded for a
particular device but the device is gone (and therefore its ifindex is
meaningless).  I tried to explain this in the commit message.
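
A rough sketch of how a consumer could tell the three cases apart is below.
The struct is a cut-down, hypothetical mirror of the two fields proposed in
this patch, not the real UAPI, and the sketch assumes that ifindex reads as 0
once the device is gone, which the patch does not state explicitly.

```c
#include <stdint.h>

/* Mirrors BPF_PROG_STATUS_DEV_BOUND from the quoted patch. */
#define SKETCH_PROG_STATUS_DEV_BOUND (1u << 0)

/* Hypothetical subset of struct bpf_prog_info with the two new fields. */
struct sketch_prog_info {
	uint32_t ifindex;
	uint32_t status;
};

enum sketch_binding {
	SKETCH_HOST,		/* never bound to a device */
	SKETCH_BOUND,		/* bound, device still present */
	SKETCH_ORPHANED,	/* bound, but the device is gone */
};

/* Assumption: a vanished device is reported as ifindex == 0. */
enum sketch_binding sketch_classify(const struct sketch_prog_info *info)
{
	if (!(info->status & SKETCH_PROG_STATUS_DEV_BOUND))
		return SKETCH_HOST;
	return info->ifindex ? SKETCH_BOUND : SKETCH_ORPHANED;
}
```

With ifindex alone, the first and third cases would be indistinguishable
under this assumption, which is the situation the status flag is meant to
cover.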

> It would also be good to have a name consistent with prog_load.
> IMO prog_target_ifindex is too long.
> Maybe call it 'ifindex' both in bpf_attr and in bpf_prog_info?

Perhaps I'm missing something, but bpf_attr is a huge union of (mostly)
unnamed anonymous structs.  I foresee that we will have to add an
ifindex member for a map command as well, therefore the prog_* prefix
seems prudent.  Should I go back to prog_ifindex in bpf_attr?

Or perhaps should I duplicate the struct for BPF_PROG_LOAD but this
time give it a member name so we can extend it without worrying about
member name conflicts?

Re: [patch net-next 3/5] net: sched: introduce block mechanism to handle netif_keep_dst calls

2017-11-04 Thread Jiri Pirko

Fri, Nov 03, 2017 at 09:15:54PM CET, dan...@iogearbox.net wrote:
>On 11/03/2017 06:19 PM, Jiri Pirko wrote:
>> From: Jiri Pirko 
>> 
>> A couple of classifiers call netif_keep_dst directly on q->dev. That is
>> not possible to do directly for shared blocks, where multiple qdiscs
>> own the block. So introduce an infrastructure to keep track of the
>> block owners in a list and use this list to implement a block variant of
>> netif_keep_dst.
>> 
>> Signed-off-by: Jiri Pirko 
>[...]
>> +struct tcf_block_owner_item {
>> +struct list_head list;
>> +struct Qdisc *q;
>> +enum tcf_block_binder_type binder_type;
>> +};
>> +
>> +static void
>> +tcf_block_owner_netif_keep_dst(struct tcf_block *block,
>> +   struct Qdisc *q,
>> +   enum tcf_block_binder_type binder_type)
>> +{
>> +if (block->keep_dst &&
>> +binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
>
>Why we need to keep dst on TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS ?
>I presume this enum means sch_handle_egress() ? dst is dropped
>later ...

This is because of the bpf check:
   if (fp->dst_needed && !(tp->q->flags & TCQ_F_INGRESS))
   netif_keep_dst(qdisc_dev(tp->q));

I just maintain the same logic here.

Re: [PATCH net-next] tools: bpftool: move p_err() and p_info() from main.h to common.c

2017-11-04 Thread Alexei Starovoitov

On Fri, Nov 03, 2017 at 01:59:07PM -0700, Jakub Kicinski wrote:
> From: Quentin Monnet 
> 
> The two functions were declared as static inline in a header file. There
> is no particular reason why they should be inlined, they just happened to
> remain in the same header file when they were turned from macros to
> functions in a previous commit.
> 
> Make them non-inlined functions and move them to common.c file instead.
> 
> Suggested-by: Joe Perches 
> Signed-off-by: Quentin Monnet 
> Signed-off-by: Jakub Kicinski 

Acked-by: Alexei Starovoitov

Re: [PATCH net-next v2 03/15] bpf: report offload info to user space

2017-11-04 Thread Alexei Starovoitov

On Fri, Nov 03, 2017 at 01:56:18PM -0700, Jakub Kicinski wrote:
> Extend struct bpf_prog_info to contain information about program
> being bound to a device.  Since the netdev may get destroyed while
> program still exists we need a flag to indicate the program is
> loaded for a device, even if the device is gone.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Simon Horman 
> Reviewed-by: Quentin Monnet 
> ---
>  include/linux/bpf.h  |  1 +
>  include/uapi/linux/bpf.h |  6 ++
>  kernel/bpf/offload.c | 12 
>  kernel/bpf/syscall.c |  5 +
>  4 files changed, 24 insertions(+)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index e45d43f9ec92..98bacd0fa5cc 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -506,6 +506,7 @@ static inline int cpu_map_enqueue(struct 
> bpf_cpu_map_entry *rcpu,
>  
>  int bpf_prog_offload_compile(struct bpf_prog *prog);
>  void bpf_prog_offload_destroy(struct bpf_prog *prog);
> +u32 bpf_prog_offload_ifindex(struct bpf_prog *prog);
>  
>  #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL)
>  int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 727a3dba13e6..e92f62cf933a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -894,6 +894,10 @@ enum sk_action {
>  
>  #define BPF_TAG_SIZE 8
>  
> +enum bpf_prog_status {
> + BPF_PROG_STATUS_DEV_BOUND   = (1 << 0),
> +};
> +
>  struct bpf_prog_info {
>   __u32 type;
>   __u32 id;
> @@ -907,6 +911,8 @@ struct bpf_prog_info {
>   __u32 nr_map_ids;
>   __aligned_u64 map_ids;
>   char name[BPF_OBJ_NAME_LEN];
> + __u32 ifindex;
> + __u32 status;

Why is status needed?
ifindex cannot be zero, so ifindex > 0 would mean
that the program is bound.
It would also be good to have a name consistent with prog_load.
IMO prog_target_ifindex is too long.
Maybe call it 'ifindex' both in bpf_attr and in bpf_prog_info?

Re: [net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16

2017-11-04 Thread David Miller

From: Or Gerlitz 
Date: Sat, 4 Nov 2017 18:05:29 +0900

> On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed  wrote:
>> From: Huy Nguyen 
>>
>> Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware
>> command.
>>
>> Signed-off-by: Huy Nguyen 
>> Reviewed-by: Parav Pandit 
>> Signed-off-by: Saeed Mahameed 
> 
> This was reviewed by Eli Cohen, his R.B is missing here. Eli - if you
> can reply and add it such that
> patchworks will pick it up.

Not if I pull from Saeed's tree, which is what I usually do for mlx5
submissions.

Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members

2017-11-04 Thread Alexei Starovoitov


On 11/3/17 3:58 PM, Sandipan Das wrote:

For added security, the layout of some structures can be
randomized by enabling CONFIG_GCC_PLUGIN_RANDSTRUCT. One
such structure is task_struct. To build BPF programs, we
use Clang which does not support this feature. So, if we
attempt to read a field of a structure with a randomized
layout within a BPF program, we do not get the expected
value because of incorrect offsets. To observe this, it
is not mandatory to have CONFIG_GCC_PLUGIN_RANDSTRUCT
enabled because the structure annotations/members added
for this purpose are enough to cause this. So, all kernel
builds are affected.

For example, considering samples/bpf/offwaketime_kern.c,
if we try to print the values of pid and comm inside the
task_struct passed to waker() by adding the following
lines of code at the appropriate place

  char fmt[] = "waker(): p->pid = %u, p->comm = %s\n";
  bpf_trace_printk(fmt, sizeof(fmt), _(p->pid), _(p->comm));

it is seen that upon rebuilding and running this sample
followed by inspecting /sys/kernel/debug/tracing/trace,
the output looks like the following

   _-=> irqs-off
  / _=> need-resched
 | / _---=> hardirq/softirq
 || / _--=> preempt-depth
 ||| / delay
TASK-PID   CPU#  TIMESTAMP  FUNCTION
   | |   |      | |
  -0 [007] d.s.  1883.443594: 0x0001: waker(): p->pid = 0, p->comm =
  -0 [018] d.s.  1883.453588: 0x0001: waker(): p->pid = 0, p->comm =
  -0 [007] d.s.  1883.463584: 0x0001: waker(): p->pid = 0, p->comm =
  -0 [009] d.s.  1883.483586: 0x0001: waker(): p->pid = 0, p->comm =
  -0 [005] d.s.  1883.493583: 0x0001: waker(): p->pid = 0, p->comm =
  -0 [009] d.s.  1883.503583: 0x0001: waker(): p->pid = 0, p->comm =
  -0 [018] d.s.  1883.513578: 0x0001: waker(): p->pid = 0, p->comm =
 systemd-journal-3140  [003] d...  1883.627660: 0x0001: waker(): p->pid = 0, p->comm =
 systemd-journal-3140  [003] d...  1883.627704: 0x0001: waker(): p->pid = 0, p->comm =
 systemd-journal-3140  [003] d...  1883.627723: 0x0001: waker(): p->pid = 0, p->comm =

To avoid this, we add new BPF helpers that read the
correct values for some of the important task_struct
members such as pid, tgid, comm and flags which are
extensively used in BPF-based analysis tools such as
bcc. Since these helpers are built with GCC, they use
the correct offsets when referencing a member.

Signed-off-by: Sandipan Das 

...

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f90860d1f897..324508d27bd2 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -338,6 +338,16 @@ union bpf_attr {
  * @skb: pointer to skb
  * Return: classid if != 0
  *
+ * u64 bpf_get_task_pid_tgid(struct task_struct *task)
+ * Return: task->tgid << 32 | task->pid
+ *
+ * int bpf_get_task_comm(struct task_struct *task)
+ * Stores task->comm into buf
+ * Return: 0 on success or negative error
+ *
+ * u32 bpf_get_task_flags(struct task_struct *task)
+ * Return: task->flags
+ *


I don't think it's a solution.
Tracing scripts read other fields too.
Making it work for these 3 fields is a drop in a bucket.
If randomization is used I think we have to accept
that existing bpf scripts won't be usable.
Long term solution is to support 'BPF Type Format' or BTF
(which is old C-Type Format) for kernel data structures,
so bcc scripts wouldn't need to use kernel headers and clang.
The proper offsets will be described in BTF.
We were planning to use it initially to describe map key/value,
but it applies for this case as well.
There will be a tool that will take dwarf from vmlinux and
compress it into BTF. Kernel will also be able to verify
that BTF is a valid BTF.
I'm assuming that gcc randomization plugin produces dwarf
with correct offsets, if not, it would have to be fixed.
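
The offset problem discussed in this thread can be reproduced in a few lines
of userspace C. Everything here is hypothetical: the two structs merely stand
in for "the layout the BPF compiler assumed" and "the layout the kernel
actually uses", and read_i32_at() stands in for a probe read at a fixed
offset. It is not a model of BTF itself, only of why a description of the
real offsets helps.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout as written in the header the BPF program was compiled against. */
struct layout_declared {
	int32_t pid;
	int32_t tgid;
	uint32_t flags;
};

/* Layout actually in use after the members were reordered. */
struct layout_randomized {
	uint32_t flags;
	int32_t pid;
	int32_t tgid;
};

/* Read a 32-bit value at a byte offset, like a probe read would. */
int32_t read_i32_at(const void *obj, size_t off)
{
	int32_t v;

	memcpy(&v, (const char *)obj + off, sizeof(v));
	return v;
}
```

Reading pid from a struct layout_randomized object at
offsetof(struct layout_declared, pid) returns whatever member happens to
live there, while the offset taken from a description of the real layout
returns the intended value.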

Re: [net-next 06/12] net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ

2017-11-04 Thread Or Gerlitz

On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed  wrote:
> From: Huy Nguyen 
>
> If the port is in DSCP trust state, packets are placed in the right
> priority queue based on the dscp value. This is done by selecting
> the transmit queue based on the dscp of the skb.
>
> Until now select_queue honors priority only from the vlan header.
> However that is not sufficient in cases where port trust state is DSCP
> mode as packet might not even contain vlan header. Therefore if the port
> is in dscp trust state and vport's min inline mode is not NONE,
> copy the IP header to the eseg's inline header if the skb has it.
> This is done by changing the transmit queue sq's min inline mode to L3.
> Note that the min inline mode of sqs that belong to other features such
> as xdpsq, icosq are not modified.
>
> Signed-off-by: Huy Nguyen 
> Reviewed-by: Parav Pandit 
> Signed-off-by: Saeed Mahameed 

Reviewed-by: Or Gerlitz

Re: [net-next 04/12] net/mlx5: QPTS and QPDPM register firmware command support

2017-11-04 Thread Or Gerlitz

On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed  wrote:
> From: Huy Nguyen 
>
> The QPTS register allows changing the priority trust state between pcp and
> dscp. Add support to get/set trust state from device. When the port is
> in pcp/dscp trust state, packet is routed by hardware to matching priority
> based on its pcp/dscp value respectively.
>
> The QPDPM register allows changing the dscp to priority mapping. Add support
> to get/set dscp to priority mapping from device.
> Note that to change a dscp mapping, the "e" bit of this dscp structure
> must be set in the QPDPM firmware command.
>
> Signed-off-by: Huy Nguyen 
> Reviewed-by: Parav Pandit 
> Signed-off-by: Saeed Mahameed 

Reviewed-by: Or Gerlitz
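
The read-modify-write rule for the "e" bit described in the commit message
can be sketched on a plain host-endian table. The bit positions used here
(enable bit in bit 15, priority in the low four bits of a 16-bit entry) and
all SKETCH_/sketch_ names are assumptions for illustration only; the real
code goes through MLX5_SET16/MLX5_GET16 on the firmware layout.

```c
#include <stdint.h>

#define SKETCH_DSCP_ENTRIES	64
#define SKETCH_E_BIT		0x8000u	/* assumed position of "e" */
#define SKETCH_PRIO_MASK	0x000fu	/* assumed width of prio */

/* Update one entry of a table that was previously read back from the
 * device, marking only that entry as "to be applied".
 */
int sketch_set_dscp2prio(uint16_t table[SKETCH_DSCP_ENTRIES],
			 unsigned int dscp, unsigned int prio)
{
	if (dscp >= SKETCH_DSCP_ENTRIES || prio > SKETCH_PRIO_MASK)
		return -1;

	table[dscp] = (uint16_t)((table[dscp] & ~SKETCH_PRIO_MASK) |
				 prio | SKETCH_E_BIT);
	return 0;
}

/* Extract the priority of one entry, ignoring the enable bit. */
unsigned int sketch_get_prio(const uint16_t table[SKETCH_DSCP_ENTRIES],
			     unsigned int dscp)
{
	return table[dscp] & SKETCH_PRIO_MASK;
}
```

Entries whose enable bit stays clear are written back unchanged in this
model, which is why the query result is copied into the input buffer before
a single entry is modified.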

Re: [net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16

2017-11-04 Thread Or Gerlitz

On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed  wrote:
> From: Huy Nguyen 
>
> Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware
> command.
>
> Signed-off-by: Huy Nguyen 
> Reviewed-by: Parav Pandit 
> Signed-off-by: Saeed Mahameed 

This was reviewed by Eli Cohen, his R.B is missing here. Eli - if you
can reply and add it such that
patchworks will pick it up.

Re: [net-next 02/12] net/mlx5: QCAM register firmware command support

2017-11-04 Thread Or Gerlitz

On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed  wrote:
>
> From: Huy Nguyen 
>
> The QCAM register provides capability bit for all the QoS registers
> using ACCESS_REG command.
>
> Signed-off-by: Huy Nguyen 
> Reviewed-by: Parav Pandit 
> Signed-off-by: Saeed Mahameed 


Reviewed-by: Or Gerlitz

[net-next 04/12] net/mlx5: QPTS and QPDPM register firmware command support

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

The QPTS register allows changing the priority trust state between pcp and
dscp. Add support to get/set trust state from device. When the port is
in pcp/dscp trust state, packet is routed by hardware to matching priority
based on its pcp/dscp value respectively.

The QPDPM register allows changing the dscp to priority mapping. Add support
to get/set dscp to priority mapping from device.
Note that to change a dscp mapping, the "e" bit of this dscp structure
must be set in the QPDPM firmware command.

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 99 ++
 include/linux/mlx5/driver.h|  7 ++
 include/linux/mlx5/mlx5_ifc.h  | 20 ++
 include/linux/mlx5/port.h  |  5 ++
 4 files changed, 131 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index b6553be841f9..c37d00cd472a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -971,3 +971,102 @@ int mlx5_set_mtppse(struct mlx5_core_dev *mdev, u8 pin, 
u8 arm, u8 mode)
return mlx5_core_access_reg(mdev, in, sizeof(in), out,
sizeof(out), MLX5_REG_MTPPSE, 0, 1);
 }
+
+int mlx5_set_trust_state(struct mlx5_core_dev *mdev, u8 trust_state)
+{
+   u32 out[MLX5_ST_SZ_DW(qpts_reg)] = {};
+   u32 in[MLX5_ST_SZ_DW(qpts_reg)] = {};
+   int err;
+
+   MLX5_SET(qpts_reg, in, local_port, 1);
+   MLX5_SET(qpts_reg, in, trust_state, trust_state);
+
+   err = mlx5_core_access_reg(mdev, in, sizeof(in), out,
+  sizeof(out), MLX5_REG_QPTS, 0, 1);
+   return err;
+}
+
+int mlx5_query_trust_state(struct mlx5_core_dev *mdev, u8 *trust_state)
+{
+   u32 out[MLX5_ST_SZ_DW(qpts_reg)] = {};
+   u32 in[MLX5_ST_SZ_DW(qpts_reg)] = {};
+   int err;
+
+   MLX5_SET(qpts_reg, in, local_port, 1);
+
+   err = mlx5_core_access_reg(mdev, in, sizeof(in), out,
+  sizeof(out), MLX5_REG_QPTS, 0, 0);
+   if (!err)
+   *trust_state = MLX5_GET(qpts_reg, out, trust_state);
+
+   return err;
+}
+
+int mlx5_set_dscp2prio(struct mlx5_core_dev *mdev, u8 dscp, u8 prio)
+{
+   int sz = MLX5_ST_SZ_BYTES(qpdpm_reg);
+   void *qpdpm_dscp;
+   void *out;
+   void *in;
+   int err;
+
+   in = kzalloc(sz, GFP_KERNEL);
+   out = kzalloc(sz, GFP_KERNEL);
+   if (!in || !out) {
+   err = -ENOMEM;
+   goto out;
+   }
+
+   MLX5_SET(qpdpm_reg, in, local_port, 1);
+   err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 0);
+   if (err)
+   goto out;
+
+   memcpy(in, out, sz);
+   MLX5_SET(qpdpm_reg, in, local_port, 1);
+
+   /* Update the corresponding dscp entry */
+   qpdpm_dscp = MLX5_ADDR_OF(qpdpm_reg, in, dscp[dscp]);
+   MLX5_SET16(qpdpm_dscp_reg, qpdpm_dscp, prio, prio);
+   MLX5_SET16(qpdpm_dscp_reg, qpdpm_dscp, e, 1);
+   err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 1);
+
+out:
+   kfree(in);
+   kfree(out);
+   return err;
+}
+
+/* dscp2prio[i]: priority that dscp i mapped to */
+#define MLX5E_SUPPORTED_DSCP 64
+int mlx5_query_dscp2prio(struct mlx5_core_dev *mdev, u8 *dscp2prio)
+{
+   int sz = MLX5_ST_SZ_BYTES(qpdpm_reg);
+   void *qpdpm_dscp;
+   void *out;
+   void *in;
+   int err;
+   int i;
+
+   in = kzalloc(sz, GFP_KERNEL);
+   out = kzalloc(sz, GFP_KERNEL);
+   if (!in || !out) {
+   err = -ENOMEM;
+   goto out;
+   }
+
+   MLX5_SET(qpdpm_reg, in, local_port, 1);
+   err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 0);
+   if (err)
+   goto out;
+
+   for (i = 0; i < (MLX5E_SUPPORTED_DSCP); i++) {
+   qpdpm_dscp = MLX5_ADDR_OF(qpdpm_reg, out, dscp[i]);
+   dscp2prio[i] = MLX5_GET16(qpdpm_dscp_reg, qpdpm_dscp, prio);
+   }
+
+out:
+   kfree(in);
+   kfree(out);
+   return err;
+}
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index ed5be52282ea..a886b51511ab 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -107,8 +107,10 @@ enum {
 };
 
 enum {
+   MLX5_REG_QPTS= 0x4002,
MLX5_REG_QETCR   = 0x4005,
MLX5_REG_QTCT= 0x400a,
+   MLX5_REG_QPDPM   = 0x4013,
MLX5_REG_QCAM= 0x4019,
MLX5_REG_DCBX_PARAM  = 0x4020,
MLX5_REG_DCBX_APP= 0x4021,
@@ -142,6 +144,11 @@ enum {
MLX5_REG_MCAM= 0x907f,
 };
 
+enum mlx5_qpts_trust_state {
+   MLX5_QPTS_TRUST_PCP  = 1,
+   MLX5_QPTS_TRUST_DSCP = 2,
+};
+
 enum mlx5_dcbx_oper_mode {
M

[net-next 07/12] net/mlx5e: Add support for ethtool msglvl support

2017-11-04 Thread Saeed Mahameed

From: Gal Pressman 

Use ethtool -s  msglvl  on/off to toggle debug messages.

Signed-off-by: Gal Pressman 
Signed-off-by: Inbar Karmy 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h | 11 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 13 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c|  1 +
 3 files changed, 25 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index fae7b62d173f..8c872e2e1aa0 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -127,6 +127,16 @@
 
 #define MLX5E_NUM_MAIN_GROUPS 9
 
+#define MLX5E_MSG_LEVELNETIF_MSG_LINK
+
+#define mlx5e_dbg(mlevel, priv, format, ...)\
+do {\
+   if (NETIF_MSG_##mlevel & (priv)->msglevel)  \
+   netdev_warn(priv->netdev, format,   \
+   ##__VA_ARGS__); \
+} while (0)
+
+
 static inline u16 mlx5_min_rx_wqes(int wq_type, u32 wq_size)
 {
switch (wq_type) {
@@ -754,6 +764,7 @@ struct mlx5e_priv {
 #endif
/* priv data path fields - end */
 
+   u32msglevel;
unsigned long  state;
struct mutex   state_lock; /* Protects Interface state */
struct mlx5e_rq            drop_rq;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index b34aa8efb036..63d1ac695a75 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1340,6 +1340,16 @@ static int mlx5e_set_wol(struct net_device *netdev, 
struct ethtool_wolinfo *wol)
return mlx5_set_port_wol(mdev, mlx5_wol_mode);
 }
 
+static u32 mlx5e_get_msglevel(struct net_device *dev)
+{
+   return ((struct mlx5e_priv *)netdev_priv(dev))->msglevel;
+}
+
+static void mlx5e_set_msglevel(struct net_device *dev, u32 val)
+{
+   ((struct mlx5e_priv *)netdev_priv(dev))->msglevel = val;
+}
+
 static int mlx5e_set_phys_id(struct net_device *dev,
 enum ethtool_phys_id_state state)
 {
@@ -1672,4 +1682,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
.get_priv_flags= mlx5e_get_priv_flags,
.set_priv_flags= mlx5e_set_priv_flags,
.self_test = mlx5e_self_test,
+   .get_msglevel  = mlx5e_get_msglevel,
+   .set_msglevel  = mlx5e_set_msglevel,
+
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a97ee38143aa..73d7c672c4ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4091,6 +4091,7 @@ static void mlx5e_build_nic_netdev_priv(struct 
mlx5_core_dev *mdev,
priv->netdev  = netdev;
priv->profile = profile;
priv->ppriv   = ppriv;
+   priv->msglevel= MLX5E_MSG_LEVEL;
priv->hard_mtu = MLX5E_ETH_HARD_MTU;
 
mlx5e_build_nic_params(mdev, &priv->channels.params, 
profile->max_nch(mdev));
-- 
2.14.2
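
For readers who want to try the gating logic outside the kernel, here is a
minimal userspace sketch of what the mlx5e_dbg() macro above does: token
pasting picks a message-level bit, and the message is printed only if that bit
is set in the per-device mask. All sketch_* names are hypothetical and printf
stands in for the kernel's netdev logging; the bit values are assumed to match
the NETIF_MSG_* flags in include/linux/netdevice.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror NETIF_MSG_DRV/LINK/HW; hypothetical names for this sketch. */
enum {
	SKETCH_MSG_DRV  = 0x0001,
	SKETCH_MSG_LINK = 0x0004,
	SKETCH_MSG_HW   = 0x2000,
};

struct sketch_priv {
	uint32_t msglevel;     /* bitmask, as written by the set_msglevel hook */
	unsigned int printed;  /* counts emitted messages, for illustration only */
};

/* Print only when the selected level bit is set in priv->msglevel. */
#define sketch_dbg(mlevel, priv, ...)                     \
do {                                                      \
	if (SKETCH_MSG_##mlevel & (priv)->msglevel) {     \
		printf(__VA_ARGS__);                      \
		(priv)->printed++;                        \
	}                                                 \
} while (0)

/* Userspace stand-in for the ethtool set_msglevel callback. */
static void sketch_set_msglevel(struct sketch_priv *priv, uint32_t val)
{
	assert(priv != NULL);
	priv->msglevel = val;
}
```

With the mask initialised to the LINK bit only, a sketch_dbg(HW, ...) call
prints nothing until the HW bit is added through sketch_set_msglevel().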

[net-next 12/12] net/mlx5e: Enable CQE based moderation on TX CQ

2017-11-04 Thread Saeed Mahameed

From: Tal Gilboa 

By using CQE based moderation on TX CQ we can reduce the TX
interrupt rate. Besides the benefit of fewer interrupts, this also
allows the kernel to better utilize TSO. Since TSO has some CPU overhead,
it might not aggregate when the CPU is under high stress. By reducing the
interrupt rate and the CPU utilization, we can get better aggregation
and better overall throughput.
The feature is enabled by default and has a private flag in ethtool
for control.

Throughput, interrupt rate and TSO utilization improvements:
(ConnectX-4Lx 40GbE, unidirectional, 1/16 TCP streams, 64B packets)
---------------------------------------------------------
Metric   | Streams | CQE Based | EQE Based | Improvement
---------------------------------------------------------
BW       | 1       | 2.4Gb/s   | 2.15Gb/s  | +11.6%
IR       | 1       | 27Kips    | 50.6Kips  | -46.7%
TSO Util | 1       | 74.6%     | 71%       | +5%
BW       | 16      | 29Gb/s    | 25.85Gb/s | +12.2%
IR       | 16      | 482Kips   | 745Kips   | -35.3%
TSO Util | 16      | 69.1%     | 49%       | +41.1%

*BW = Bandwidth, IR = Interrupt rate, ips = interrupts per second.
TSO Util = bytes in TSO sessions / all bytes transferred

Signed-off-by: Tal Gilboa 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  9 +++--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 39 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 38 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |  8 +++--
 4 files changed, 71 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 95facdf62c77..751f62cae969 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -106,6 +106,7 @@
 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE 0x3
 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS  0x20
 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC  0x10
+#define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC_FROM_CQE 0x10
 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS  0x20
 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES0x80
 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES_MPW0x2
@@ -198,12 +199,14 @@ extern const char mlx5e_self_tests[][ETH_GSTRING_LEN];
 
 static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = {
"rx_cqe_moder",
+   "tx_cqe_moder",
"rx_cqe_compress",
 };
 
 enum mlx5e_priv_flag {
MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0),
-   MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 1),
+   MLX5E_PFLAG_TX_CQE_BASED_MODER = (1 << 1),
+   MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 2),
 };
 
 #define MLX5E_SET_PFLAG(params, pflag, enable) \
@@ -223,6 +226,7 @@ enum mlx5e_priv_flag {
 struct mlx5e_cq_moder {
u16 usec;
u16 pkts;
+   u8 cq_period_mode;
 };
 
 struct mlx5e_params {
@@ -234,7 +238,6 @@ struct mlx5e_params {
u8  log_rq_size;
u16 num_channels;
u8  num_tc;
-   u8  rx_cq_period_mode;
bool rx_cqe_compress_def;
struct mlx5e_cq_moder rx_cq_moderation;
struct mlx5e_cq_moder tx_cq_moderation;
@@ -926,6 +929,8 @@ void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, 
int len,
   int num_channels);
 int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
 
+void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params,
+u8 cq_period_mode);
 void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
 u8 cq_period_mode);
 void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 63d1ac695a75..23425f028405 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -1454,29 +1454,36 @@ static int mlx5e_get_module_eeprom(struct net_device 
*netdev,
 
 typedef int (*mlx5e_pflag_handler)(struct net_device *netdev, bool enable);
 
-static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable)
+static int set_pflag_cqe_based_moder(struct net_device *netdev, bool enable,
+bool is_rx_cq)
 {
struct mlx5e_priv *priv = netdev_priv(netdev);
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5e_channels new_channels = {};
-   bool rx_mode_changed;
-   u8 rx_cq_period_mode;
+   bool mode_changed;
+   u8 cq_period_mode, current_cq_period_mode;
int err = 0;
 
-   rx_cq_period_mode = enable ?
+   cq_period_mode = enable ?
MLX5_CQ_PERIOD_MODE_START_FROM_CQE :
MLX5_CQ_PERIOD_MODE_START_FROM_EQE;
-   rx_mode_changed = rx_cq_

[net-next 09/12] net/mlx5: Enlarge the NIC TC offload table size

2017-11-04 Thread Saeed Mahameed

From: Or Gerlitz 

The NIC TC offload table size was hard coded to 1k. Change it to be

  min(max NIC RX table size,
  min(max flow counters, 64k) * num flow groups)

where the max values are read from the firmware and the number of
flow groups is hard-coded as before this change.

We don't know upfront how flows will be divided into groups (== different
masks). This setup allows each group to grow up to the size we are aiming for
(when supported, all offloaded flows use counters). Thus, we don't expect
multiple occurrences of a group, which in turn would add steering hops.

Signed-off-by: Or Gerlitz 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 9ba1f72060aa..55979ec2e88a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -90,8 +90,8 @@ enum {
MLX5_HEADER_TYPE_NVGRE = 0x1,
 };
 
-#define MLX5E_TC_TABLE_NUM_ENTRIES 1024
 #define MLX5E_TC_TABLE_NUM_GROUPS 4
+#define MLX5E_TC_TABLE_MAX_GROUP_SIZE (1 << 16)
 
 struct mod_hdr_key {
int num_actions;
@@ -263,10 +263,21 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv,
}
 
if (IS_ERR_OR_NULL(priv->fs.tc.t)) {
+   int tc_grp_size, tc_tbl_size;
+   u32 max_flow_counter;
+
+   max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) 
<< 16) |
+   MLX5_CAP_GEN(dev, max_flow_counter_15_0);
+
+   tc_grp_size = min_t(int, max_flow_counter, 
MLX5E_TC_TABLE_MAX_GROUP_SIZE);
+
+   tc_tbl_size = min_t(int, tc_grp_size * 
MLX5E_TC_TABLE_NUM_GROUPS,
+   BIT(MLX5_CAP_FLOWTABLE_NIC_RX(dev, 
log_max_ft_size)));
+
priv->fs.tc.t =
mlx5_create_auto_grouped_flow_table(priv->fs.ns,
MLX5E_TC_PRIO,
-   
MLX5E_TC_TABLE_NUM_ENTRIES,
+   tc_tbl_size,

MLX5E_TC_TABLE_NUM_GROUPS,
0, 0);
if (IS_ERR(priv->fs.tc.t)) {
-- 
2.14.2
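
The sizing rule from the commit message can be sketched as a small standalone
function. This is an illustration under assumptions, not the driver code: the
sketch_* names are hypothetical, the firmware capabilities are passed in as
plain arguments, and unsigned arithmetic is used where the patch uses
min_t(int, ...).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NUM_GROUPS      4u          /* hard-coded group count */
#define SKETCH_MAX_GROUP_SIZE  (1u << 16)  /* 64k cap per group */

static uint32_t sketch_min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * max_flow_counter: number of flow counters reported by firmware.
 * log_max_ft_size:  log2 of the maximum NIC RX flow table size.
 * Returns min(max table size, min(max_flow_counter, 64k) * num groups).
 */
static uint32_t sketch_tc_table_size(uint32_t max_flow_counter,
				     unsigned int log_max_ft_size)
{
	uint32_t grp_size, max_ft_size;

	assert(log_max_ft_size < 32);

	grp_size = sketch_min_u32(max_flow_counter, SKETCH_MAX_GROUP_SIZE);
	max_ft_size = 1u << log_max_ft_size;

	/* grp_size <= 64k, so the product cannot overflow 32 bits. */
	return sketch_min_u32(grp_size * SKETCH_NUM_GROUPS, max_ft_size);
}
```

For example, with one million counters and a 2^20-entry table limit the group
size is capped at 64k and the table gets 4 * 64k entries; with only 1000
counters it gets 4000.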

[net-next 05/12] net/mlx5e: Add dcbnl dscp to priority support

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

This patch implements dcbnl hooks to set and delete the DSCP to priority map
as defined by the DCB subsystem. The device maintains an internal trust state,
which needs to be set to the DSCP state for DSCP to priority mapping to take
effect.

When the first dscp to priority APP entry is added by the user, the
trust state is changed to dscp.

When the last dscp to priority APP entry is deleted by the user, the
trust state is changed to pcp.

If the user sends multiple dscp to priority APP entries for the same dscp,
the last one sent takes effect. All previously sent entries are
deleted.

The dscp to priority APP entries are added and deleted in the net/dcb
APP database using dcb_ieee_setapp/getapp.

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  15 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 204 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  15 +-
 3 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index e613ce02216d..ab6f0c18850f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -57,6 +57,7 @@
 #define MLX5E_HW2SW_MTU(priv, hwmtu) ((hwmtu) - ((priv)->hard_mtu))
 #define MLX5E_SW2HW_MTU(priv, swmtu) ((swmtu) + ((priv)->hard_mtu))
 
+#define MLX5E_MAX_DSCP  64
 #define MLX5E_MAX_NUM_TC   8
 
 #define MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE0x6
@@ -260,11 +261,17 @@ enum {
 struct mlx5e_dcbx {
enum mlx5_dcbx_oper_mode   mode;
struct mlx5e_cee_config    cee_cfg; /* pending configuration */
+   u8 dscp_app_cnt;
 
/* The only setting that cannot be read from FW */
u8 tc_tsa[IEEE_8021QAZ_MAX_TCS];
u8 cap;
 };
+
+struct mlx5e_dcbx_dp {
+   u8 dscp2prio[MLX5E_MAX_DSCP];
+   u8 trust_state;
+};
 #endif
 
 enum {
@@ -742,6 +749,9 @@ struct mlx5e_priv {
/* priv data path fields - start */
struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC];
int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC];
+#ifdef CONFIG_MLX5_CORE_EN_DCB
+   struct mlx5e_dcbx_dp   dcbx_dp;
+#endif
/* priv data path fields - end */
 
unsigned long  state;
@@ -800,6 +810,8 @@ struct mlx5e_profile {
mlx5e_fp_handle_rx_cqe handle_rx_cqe;
mlx5e_fp_handle_rx_cqe handle_rx_cqe_mpwqe;
} rx_handlers;
+   void(*netdev_registered_init)(struct mlx5e_priv *priv);
+   void(*netdev_registered_remove)(struct mlx5e_priv *priv);
int max_tc;
 };
 
@@ -968,6 +980,8 @@ extern const struct ethtool_ops mlx5e_ethtool_ops;
 extern const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops;
 int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets 
*ets);
 void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv);
+void mlx5e_dcbnl_init_app(struct mlx5e_priv *priv);
+void mlx5e_dcbnl_delete_app(struct mlx5e_priv *priv);
 #endif
 
 #ifndef CONFIG_RFS_ACCEL
@@ -1069,5 +1083,4 @@ void mlx5e_destroy_netdev(struct mlx5e_priv *priv);
 void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
-
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index 51c4cc00a186..aa59c4324159 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -46,6 +46,13 @@ enum {
MLX5E_LOWEST_PRIO_GROUP   = 0,
 };
 
+#define MLX5_DSCP_SUPPORTED(mdev) (MLX5_CAP_GEN(mdev, qcam_reg)  && \
+  MLX5_CAP_QCAM_REG(mdev, qpts) && \
+  MLX5_CAP_QCAM_REG(mdev, qpdpm))
+
+static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state);
+static int mlx5e_set_dscp2prio(struct mlx5e_priv *priv, u8 dscp, u8 prio);
+
 /* If dcbx mode is non-host set the dcbx mode to host.
  */
 static int mlx5e_dcbnl_set_dcbx_mode(struct mlx5e_priv *priv,
@@ -381,6 +388,113 @@ static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 
mode)
return 0;
 }
 
+static int mlx5e_dcbnl_ieee_setapp(struct net_device *dev, struct dcb_app *app)
+{
+   struct mlx5e_priv *priv = netdev_priv(dev);
+   struct dcb_app temp;
+   bool is_new;
+   int err;
+
+   if (app->selector != IEEE_8021QAZ_APP_SEL_DSCP)
+   return -EINVAL;
+
+   if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager))
+   return -EINVAL;
+
+   if (!MLX5_DSCP_SUPPORTED(priv->mdev))
+   return -EINVAL;
+
+   if (app->protocol >= MLX5E_MAX_DSC

[net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

Add MLX5_SET16 and MLX5_GET16 for 16-bit structure fields in firmware
commands.

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 include/linux/mlx5/device.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 6d79b3f79458..409ffb14298a 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -49,11 +49,15 @@
 #define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0)
 #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
 #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
+#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16)
 #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
 #define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64)
+#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - 
(__mlx5_bit_off(typ, fld) & 0xf))
 #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - 
(__mlx5_bit_off(typ, fld) & 0x1f))
 #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
 #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << 
__mlx5_dw_bit_off(typ, fld))
+#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
+#define __mlx5_16_mask(typ, fld) (__mlx5_mask16(typ, fld) << 
__mlx5_16_bit_off(typ, fld))
 #define __mlx5_st_sz_bits(typ) sizeof(struct mlx5_ifc_##typ##_bits)
 
 #define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8)
@@ -116,6 +120,19 @@ __mlx5_mask(typ, fld))
___t; \
 })
 
+#define MLX5_GET16(typ, p, fld) ((be16_to_cpu(*((__be16 *)(p) +\
+__mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \
+__mlx5_mask16(typ, fld))
+
+#define MLX5_SET16(typ, p, fld, v) do { \
+   u16 _v = v; \
+   BUILD_BUG_ON(__mlx5_st_sz_bits(typ) % 16); \
+   *((__be16 *)(p) + __mlx5_16_off(typ, fld)) = \
+   cpu_to_be16((be16_to_cpu(*((__be16 *)(p) + __mlx5_16_off(typ, fld))) & \
+(~__mlx5_16_mask(typ, fld))) | (((_v) & __mlx5_mask16(typ, 
fld)) \
+<< __mlx5_16_bit_off(typ, fld))); \
+} while (0)
+
 /* Big endian getters */
 #define MLX5_GET64_BE(typ, p, fld) (*((__be64 *)(p) +\
__mlx5_64_off(typ, fld)))
-- 
2.14.2
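
To make the offset and mask arithmetic in the macros above easier to follow,
here is a userspace sketch of the same idea: a field is described by its bit
offset from the most significant bit of the structure and its width, and it is
read or written inside the big-endian 16-bit word that contains it. The
sketch_* helpers are hypothetical and take the offset and width as explicit
arguments instead of deriving them from the mlx5_ifc_*_bits types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load/store one big-endian 16-bit word, independent of host endianness. */
static uint16_t sketch_load_be16(const uint8_t *p, size_t word)
{
	return (uint16_t)((p[2 * word] << 8) | p[2 * word + 1]);
}

static void sketch_store_be16(uint8_t *p, size_t word, uint16_t v)
{
	p[2 * word] = (uint8_t)(v >> 8);
	p[2 * word + 1] = (uint8_t)(v & 0xff);
}

/* bit_off counts from the MSB; the field must fit inside one 16-bit word. */
static uint16_t sketch_get16(const uint8_t *p, unsigned int bit_off,
			     unsigned int bit_sz)
{
	unsigned int shift;
	uint16_t mask;

	assert(bit_sz >= 1 && bit_sz <= 16 && (bit_off & 0xf) + bit_sz <= 16);
	shift = 16 - bit_sz - (bit_off & 0xf);   /* cf. __mlx5_16_bit_off */
	mask = (uint16_t)((1u << bit_sz) - 1);   /* cf. __mlx5_mask16 */

	return (uint16_t)((sketch_load_be16(p, bit_off / 16) >> shift) & mask);
}

static void sketch_set16(uint8_t *p, unsigned int bit_off,
			 unsigned int bit_sz, uint16_t v)
{
	unsigned int shift;
	uint16_t mask, word;

	assert(bit_sz >= 1 && bit_sz <= 16 && (bit_off & 0xf) + bit_sz <= 16);
	shift = 16 - bit_sz - (bit_off & 0xf);
	mask = (uint16_t)((1u << bit_sz) - 1);

	/* Read-modify-write: clear the field, then OR in the new value. */
	word = sketch_load_be16(p, bit_off / 16);
	word = (uint16_t)((word & ~(mask << shift)) | ((v & mask) << shift));
	sketch_store_be16(p, bit_off / 16, word);
}
```

Writing a 4-bit value 0xA at bit offset 20 touches only the second 16-bit word
of the buffer and leaves neighbouring fields in that word intact.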

[net-next 08/12] net/mlx5e: DCBNL, Add debug messages log

2017-11-04 Thread Saeed Mahameed

From: Inbar Karmy 

Add debug print when changing the configuration of QoS through dcbnl.
Use ethtool -s <devname> msglvl hw on/off to toggle debug messages.

Signed-off-by: Inbar Karmy 
Reviewed-by: Gal Pressman 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 24 +-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index b402d69a701b..c6d90b6dd80e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -241,7 +241,7 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, 
struct ieee_ets *ets)
u8 tc_tx_bw[IEEE_8021QAZ_MAX_TCS];
u8 tc_group[IEEE_8021QAZ_MAX_TCS];
int max_tc = mlx5_max_tc(mdev);
-   int err;
+   int err, i;
 
mlx5e_build_tc_group(ets, tc_group, max_tc);
mlx5e_build_tc_tx_bw(ets, tc_tx_bw, tc_group, max_tc);
@@ -260,6 +260,14 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, 
struct ieee_ets *ets)
return err;
 
memcpy(priv->dcbx.tc_tsa, ets->tc_tsa, sizeof(ets->tc_tsa));
+
+   for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+   mlx5e_dbg(HW, priv, "%s: prio_%d <=> tc_%d\n",
+ __func__, i, ets->prio_tc[i]);
+   mlx5e_dbg(HW, priv, "%s: tc_%d <=> tx_bw_%d%%, group_%d\n",
+ __func__, i, tc_tx_bw[i], tc_group[i]);
+   }
+
return err;
 }
 
@@ -345,6 +353,11 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
mlx5_toggle_port_link(mdev);
 
+   if (!ret) {
+   mlx5e_dbg(HW, priv,
+ "%s: PFC per priority bit mask: 0x%x\n",
+ __func__, pfc->pfc_en);
+   }
return ret;
 }
 
@@ -560,6 +573,11 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device 
*netdev,
}
}
 
+   for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+   mlx5e_dbg(HW, priv, "%s: tc_%d <=> max_bw %d Gbps\n",
+ __func__, i, max_bw_value[i]);
+   }
+
return mlx5_modify_port_ets_rate_limit(mdev, max_bw_value, max_bw_unit);
 }
 
@@ -585,6 +603,10 @@ static u8 mlx5e_dcbnl_setall(struct net_device *netdev)
ets.tc_rx_bw[i] = cee_cfg->pg_bw_pct[i];
ets.tc_tsa[i]   = IEEE_8021QAZ_TSA_ETS;
ets.prio_tc[i]  = cee_cfg->prio_to_pg_map[i];
+   mlx5e_dbg(HW, priv,
+ "%s: Priority group %d: tx_bw %d, rx_bw %d, prio_tc 
%d\n",
+ __func__, i, ets.tc_tx_bw[i], ets.tc_rx_bw[i],
+ ets.prio_tc[i]);
}
 
err = mlx5e_dbcnl_validate_ets(netdev, &ets);
-- 
2.14.2

[pull request][net-next 00/12] Mellanox, mlx5 updates 2017-11-04

2017-11-04 Thread Saeed Mahameed

Hi Dave,

The following series provides updates for the mlx5 driver, which include
dscp to priority mapping support and some other small misc changes.

For extra information please see the tag log below.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit 6ee79b6ebf6613f1c5bf2be0c3dca4e51817f2ca:

  Merge branch 'net-mini_Qdisc' (2017-11-03 21:57:35 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git 
tags/mlx5-updates-2017-11-04

for you to fetch changes up to a5b2e77eab21e56ec7bb16a0ebc8f0fb18799191:

  net/mlx5e: Enable CQE based moderation on TX CQ (2017-11-04 01:33:48 -0700)


mlx5-updates-2017-11-04

This series includes:

From Huy, dscp to priority mapping for Ethernet packets.
========================================================

The first six patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packets. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. The user can combine this feature with the priority flow control (pfc)
feature to have priority flow control based on the dscp.

Firmware interface:
Mellanox firmware provides two control knobs for this feature:
  The QPTS register allows changing the trust state between dscp and
  pcp mode. The default is pcp mode. Once in dscp mode, firmware will
  route the packet based on its dscp value if the dscp field exists.

  The QPDPM register allows mapping a specific dscp (0 to 63) to a
  specific priority (0 to 7). By default, all the dscps are mapped to
  priority zero.

Software interface:
This feature is controlled via the application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for the
application priority TLV. This APP TLV selector defines the DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In the mlx5 driver, we add support for the net
dcb getapp and setapp callbacks. The mlx5 driver only handles the selector
id 5 application entry (the dscp application priority entry).
If the user sends multiple dscp to priority APP TLV entries for the same
dscp, the last one sent takes effect. All previously sent entries are
deleted.

The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.
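
The entry counting described above can be sketched in plain C as follows. This
is an illustrative model only: the sketch_* types and functions are
hypothetical, the register writes are replaced by in-memory state, and the
trust state values 1 (pcp) and 2 (dscp) are taken from the
mlx5_qpts_trust_state enum in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_DSCP 64  /* dscp values 0 to 63 */
#define SKETCH_MAX_PRIO 8   /* priorities 0 to 7 */

enum sketch_trust_state {
	SKETCH_TRUST_PCP  = 1,
	SKETCH_TRUST_DSCP = 2,
};

struct sketch_port {
	uint8_t dscp2prio[SKETCH_MAX_DSCP];  /* stands in for QPDPM */
	bool has_entry[SKETCH_MAX_DSCP];
	unsigned int app_cnt;                /* number of dscp APP entries */
	enum sketch_trust_state trust;       /* stands in for QPTS */
};

/* Default state: pcp trust, every dscp mapped to priority zero. */
static void sketch_port_init(struct sketch_port *p)
{
	assert(p != NULL);
	memset(p, 0, sizeof(*p));
	p->trust = SKETCH_TRUST_PCP;
}

/* Add or replace an entry; the first entry switches the port to dscp trust. */
static int sketch_set_app(struct sketch_port *p, unsigned int dscp,
			  unsigned int prio)
{
	if (dscp >= SKETCH_MAX_DSCP || prio >= SKETCH_MAX_PRIO)
		return -1;

	if (!p->has_entry[dscp]) {
		if (p->app_cnt == 0)
			p->trust = SKETCH_TRUST_DSCP;
		p->has_entry[dscp] = true;
		p->app_cnt++;
	}
	p->dscp2prio[dscp] = (uint8_t)prio;  /* last one sent takes effect */
	return 0;
}

/* Delete an entry; removing the last one switches back to pcp trust. */
static int sketch_del_app(struct sketch_port *p, unsigned int dscp)
{
	if (dscp >= SKETCH_MAX_DSCP || !p->has_entry[dscp])
		return -1;

	p->has_entry[dscp] = false;
	p->dscp2prio[dscp] = 0;
	if (--p->app_cnt == 0)
		p->trust = SKETCH_TRUST_PCP;
	return 0;
}
```

Setting the same dscp twice keeps the count at one and only replaces the
priority, which matches the "last one sent takes effect" rule.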

When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.

When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires the mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features,
such as xdpsq and icosq, is not modified.
========================================================

In addition to the dscp series, some small misc changes are included as well:

From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic
From Or Gerlitz, Enlarge the NIC TC offload table size
From Rabie, Initialize destination_flow struct to 0
From Feras, Add inner TTC table to IPoIB flow steering
From Tal, Enable CQE based moderation on TX CQ

Thanks,
Saeed.


Feras Daoud (1):
  net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering

Gal Pressman (1):
  net/mlx5e: Add support for ethtool msglvl support

Huy Nguyen (6):
  net/dcb: Add dscp to priority selector type
  net/mlx5: QCAM register firmware command support
  net/mlx5: Add MLX5_SET16 and MLX5_GET16
  net/mlx5: QPTS and QPDPM register firmware command support
  net/mlx5e: Add dcbnl dscp to priority support
  net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ

Inbar Karmy (1):
  net/mlx5e: DCBNL, Add debug messages log

Or Gerlitz (1):
  net/mlx5: Enlarge the NIC TC offload table size

Rabie Loulou (1):
  net/mlx5: Initialize destination_flow struct to 0

Tal Gilboa (1):
  net/mlx5e: Enable CQE based moderation on TX CQ

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  39 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c|  12 +
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 265 -
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  52 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c|  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  59 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|

[net-next 01/12] net/dcb: Add dscp to priority selector type

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

IEEE specification P802.1Qcd/D2.1 defines priority selector 5.
This APP TLV selector defines the DSCP to priority map.
This patch defines that DSCP selector.

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 include/uapi/linux/dcbnl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h
index 3ea470f35e40..16e45c0ecd2f 100644
--- a/include/uapi/linux/dcbnl.h
+++ b/include/uapi/linux/dcbnl.h
@@ -205,6 +205,7 @@ struct cee_pfc {
 #define IEEE_8021QAZ_APP_SEL_STREAM2
 #define IEEE_8021QAZ_APP_SEL_DGRAM 3
 #define IEEE_8021QAZ_APP_SEL_ANY   4
+#define IEEE_8021QAZ_APP_SEL_DSCP   5
 
 /* This structure contains the IEEE 802.1Qaz APP managed object. This
  * object is also used for the CEE std as well.
-- 
2.14.2

[net-next 10/12] net/mlx5: Initialize destination_flow struct to 0

2017-11-04 Thread Saeed Mahameed

From: Rabie Loulou 

This is needed so that the struct can be extended with more members, which
will have a value of 0 when not set.

Signed-off-by: Rabie Loulou 
Reviewed-by: Roi Dayan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  | 10 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c|  8 
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c |  6 +++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  4 ++--
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 12d3ced61114..610d485c4b03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -92,7 +92,7 @@ static enum mlx5e_traffic_types arfs_get_tt(enum arfs_type 
type)
 
 static int arfs_disable(struct mlx5e_priv *priv)
 {
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
struct mlx5e_tir *tir = priv->indir_tir;
int err = 0;
int tt;
@@ -126,7 +126,7 @@ int mlx5e_arfs_disable(struct mlx5e_priv *priv)
 
 int mlx5e_arfs_enable(struct mlx5e_priv *priv)
 {
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
int err = 0;
int tt;
int i;
@@ -175,7 +175,7 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv,
 {
struct arfs_table *arfs_t = &priv->fs.arfs.arfs_tables[type];
struct mlx5e_tir *tir = priv->indir_tir;
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
MLX5_DECLARE_FLOW_ACT(flow_act);
struct mlx5_flow_spec *spec;
enum mlx5e_traffic_types tt;
@@ -466,7 +466,7 @@ static struct mlx5_flow_handle *arfs_add_rule(struct 
mlx5e_priv *priv,
struct mlx5e_arfs_tables *arfs = &priv->fs.arfs;
struct arfs_tuple *tuple = &arfs_rule->tuple;
struct mlx5_flow_handle *rule = NULL;
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
MLX5_DECLARE_FLOW_ACT(flow_act);
struct arfs_table *arfs_table;
struct mlx5_flow_spec *spec;
@@ -557,7 +557,7 @@ static struct mlx5_flow_handle *arfs_add_rule(struct 
mlx5e_priv *priv,
 static void arfs_modify_rule_rq(struct mlx5e_priv *priv,
struct mlx5_flow_handle *rule, u16 rxq)
 {
-   struct mlx5_flow_destination dst;
+   struct mlx5_flow_destination dst = {};
int err = 0;
 
dst.type = MLX5_FLOW_DESTINATION_TYPE_TIR;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 850cdc980ab5..8016c8aa946d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -162,7 +162,7 @@ static int __mlx5e_add_vlan_rule(struct mlx5e_priv *priv,
 u16 vid, struct mlx5_flow_spec *spec)
 {
struct mlx5_flow_table *ft = priv->fs.vlan.ft.t;
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
struct mlx5_flow_handle **rule_p;
MLX5_DECLARE_FLOW_ACT(flow_act);
int err = 0;
@@ -738,7 +738,7 @@ mlx5e_generate_ttc_rule(struct mlx5e_priv *priv,
 
 static int mlx5e_generate_ttc_table_rules(struct mlx5e_priv *priv)
 {
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
struct mlx5e_ttc_table *ttc;
struct mlx5_flow_handle **rules;
struct mlx5_flow_table *ft;
@@ -909,7 +909,7 @@ mlx5e_generate_inner_ttc_rule(struct mlx5e_priv *priv,
 
 static int mlx5e_generate_inner_ttc_table_rules(struct mlx5e_priv *priv)
 {
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
struct mlx5_flow_handle **rules;
struct mlx5e_ttc_table *ttc;
struct mlx5_flow_table *ft;
@@ -1106,7 +1106,7 @@ static int mlx5e_add_l2_flow_rule(struct mlx5e_priv *priv,
  struct mlx5e_l2_rule *ai, int type)
 {
struct mlx5_flow_table *ft = priv->fs.l2.ft.t;
-   struct mlx5_flow_destination dest;
+   struct mlx5_flow_destination dest = {};
MLX5_DECLARE_FLOW_ACT(flow_act);
struct mlx5_flow_spec *spec;
int err = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index c77f4c0c7769..bbb140f517c4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -157,7 +157,7 @@ __esw_fdb_set_vport_rule(struct mlx5_eswitch *esw, u32 
vport, bool rx_rule,
MLX5_MATCH_OUTER_HEADERS);
struct mlx5_flow_handle *flow_rule = NULL;
struct mlx5_flow_act flow_act = {0

[net-next 02/12] net/mlx5: QCAM register firmware command support

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

The QCAM register provides a capability bit for each QoS register that is
accessed using the ACCESS_REG command.

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   | 10 ++
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|  2 ++
 drivers/net/ethernet/mellanox/mlx5/core/port.c | 12 +++
 include/linux/mlx5/device.h| 14 
 include/linux/mlx5/driver.h|  2 ++
 include/linux/mlx5/mlx5_ifc.h  | 40 +-
 6 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index 2c71557d1cee..5ef1b56b6a96 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -106,6 +106,13 @@ static int mlx5_get_mcam_reg(struct mlx5_core_dev *dev)
   MLX5_MCAM_REGS_FIRST_128);
 }
 
+static int mlx5_get_qcam_reg(struct mlx5_core_dev *dev)
+{
+   return mlx5_query_qcam_reg(dev, dev->caps.qcam,
+  MLX5_QCAM_FEATURE_ENHANCED_FEATURES,
+  MLX5_QCAM_REGS_FIRST_128);
+}
+
 int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
 {
int err;
@@ -182,6 +189,9 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev)
if (MLX5_CAP_GEN(dev, mcam_reg))
mlx5_get_mcam_reg(dev);
 
+   if (MLX5_CAP_GEN(dev, qcam_reg))
+   mlx5_get_qcam_reg(dev);
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h 
b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 8f00de2fe283..ff4a0b889a6f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -122,6 +122,8 @@ int mlx5_query_pcam_reg(struct mlx5_core_dev *dev, u32 
*pcam, u8 feature_group,
u8 access_reg_group);
 int mlx5_query_mcam_reg(struct mlx5_core_dev *dev, u32 *mcap, u8 feature_group,
u8 access_reg_group);
+int mlx5_query_qcam_reg(struct mlx5_core_dev *mdev, u32 *qcam,
+   u8 feature_group, u8 access_reg_group);
 
 void mlx5_lag_add(struct mlx5_core_dev *dev, struct net_device *netdev);
 void mlx5_lag_remove(struct mlx5_core_dev *dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c 
b/drivers/net/ethernet/mellanox/mlx5/core/port.c
index e07061f565d6..b6553be841f9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c
@@ -98,6 +98,18 @@ int mlx5_query_mcam_reg(struct mlx5_core_dev *dev, u32 
*mcam, u8 feature_group,
return mlx5_core_access_reg(dev, in, sz, mcam, sz, MLX5_REG_MCAM, 0, 0);
 }
 
+int mlx5_query_qcam_reg(struct mlx5_core_dev *mdev, u32 *qcam,
+   u8 feature_group, u8 access_reg_group)
+{
+   u32 in[MLX5_ST_SZ_DW(qcam_reg)] = {};
+   int sz = MLX5_ST_SZ_BYTES(qcam_reg);
+
+   MLX5_SET(qcam_reg, in, feature_group, feature_group);
+   MLX5_SET(qcam_reg, in, access_reg_group, access_reg_group);
+
+   return mlx5_core_access_reg(mdev, in, sz, qcam, sz, MLX5_REG_QCAM, 0, 
0);
+}
+
 struct mlx5_reg_pcap {
u8  rsvd0;
u8  port_num;
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index e32dbc4934db..6d79b3f79458 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1000,6 +1000,14 @@ enum mlx5_mcam_feature_groups {
MLX5_MCAM_FEATURE_ENHANCED_FEATURES = 0x0,
 };
 
+enum mlx5_qcam_reg_groups {
+   MLX5_QCAM_REGS_FIRST_128= 0x0,
+};
+
+enum mlx5_qcam_feature_groups {
+   MLX5_QCAM_FEATURE_ENHANCED_FEATURES = 0x0,
+};
+
 /* GET Dev Caps macros */
 #define MLX5_CAP_GEN(mdev, cap) \
MLX5_GET(cmd_hca_cap, mdev->caps.hca_cur[MLX5_CAP_GENERAL], cap)
@@ -1108,6 +1116,12 @@ enum mlx5_mcam_feature_groups {
 #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \
MLX5_GET(mcam_reg, (mdev)->caps.mcam, 
mng_feature_cap_mask.enhanced_features.fld)
 
+#define MLX5_CAP_QCAM_REG(mdev, fld) \
+   MLX5_GET(qcam_reg, (mdev)->caps.qcam, 
qos_access_reg_cap_mask.reg_cap.fld)
+
+#define MLX5_CAP_QCAM_FEATURE(mdev, fld) \
+   MLX5_GET(qcam_reg, (mdev)->caps.qcam, qos_feature_cap_mask.feature_cap.fld)
+
 #define MLX5_CAP_FPGA(mdev, cap) \
MLX5_GET(fpga_cap, (mdev)->caps.fpga, cap)
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 08c77b7e59cb..ed5be52282ea 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -109,6 +109,7 @@ enum {
 enum {
MLX5_REG_QETCR   = 0x4005,
MLX5_REG_QTCT= 0x400a,
+   MLX5_REG_QCAM= 0x4019,
MLX5_REG_DCBX_PARAM  = 0x4020,
MLX5_REG_DCBX_APP

[net-next 11/12] net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering

2017-11-04 Thread Saeed Mahameed

From: Feras Daoud 

For supported platforms, add inner TTC flow table to enhanced IPoIB
flow steering.
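
The setup and unwind order that the ipoib.c hunk of this patch follows can be sketched roughly as below. This is a simplified illustration, not driver code: `struct tables`, `create_tables()` and `destroy_tables()` are hypothetical stand-ins, the arfs step is assumed to have already succeeded, and failures are injected through flags rather than coming from hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one flag per flow table that currently exists. */
struct tables {
    bool arfs;
    bool inner_ttc;
    bool ttc;
};

/* Create the inner TTC table first, then the TTC table. On failure,
 * unwind in reverse order of creation using goto labels. */
static int create_tables(struct tables *t, bool inner_fails, bool ttc_fails)
{
    int err;

    t->arfs = true; /* assumption: arfs setup is not modelled here */

    err = inner_fails ? -1 : 0; /* stand-in for creating the inner TTC table */
    if (err)
        goto err_destroy_arfs_tables;
    t->inner_ttc = true;

    err = ttc_fails ? -1 : 0; /* stand-in for creating the TTC table */
    if (err)
        goto err_destroy_inner_ttc_table;
    t->ttc = true;

    return 0;

err_destroy_inner_ttc_table:
    t->inner_ttc = false;
err_destroy_arfs_tables:
    t->arfs = false;
    return err;
}

/* Teardown mirrors creation in reverse: TTC, inner TTC, then arfs. */
static void destroy_tables(struct tables *t)
{
    t->ttc = false;
    t->inner_ttc = false;
    t->arfs = false;
}
```

Because the inner table is created before the TTC table, a TTC failure has to jump to the label that also releases the inner table, which is why the existing `goto` target changes in the diff.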

Signed-off-by: Feras Daoud 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_fs.c   |  4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 12 +++-
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 8c872e2e1aa0..95facdf62c77 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1045,6 +1045,9 @@ void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt);
 int mlx5e_create_ttc_table(struct mlx5e_priv *priv);
 void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv);
 
+int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv);
+void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv);
+
 int mlx5e_create_tis(struct mlx5_core_dev *mdev, int tc,
 u32 underlay_qpn, u32 *tisn);
 void mlx5e_destroy_tis(struct mlx5_core_dev *mdev, u32 tisn);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
index 8016c8aa946d..f0d11ad05ed2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
@@ -1005,7 +1005,7 @@ static int mlx5e_create_inner_ttc_table_groups(struct mlx5e_ttc_table *ttc)
return err;
 }
 
-static int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv)
+int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv)
 {
struct mlx5e_ttc_table *ttc = &priv->fs.inner_ttc;
struct mlx5_flow_table_attr ft_attr = {};
@@ -1041,7 +1041,7 @@ static int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv)
return err;
 }
 
-static void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv)
+void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv)
 {
struct mlx5e_ttc_table *ttc = &priv->fs.inner_ttc;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index abf270d7f556..d2a66dc4adc6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -255,15 +255,24 @@ static int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
priv->netdev->hw_features &= ~NETIF_F_NTUPLE;
}
 
+   err = mlx5e_create_inner_ttc_table(priv);
+   if (err) {
+   netdev_err(priv->netdev, "Failed to create inner ttc table, err=%d\n",
+  err);
+   goto err_destroy_arfs_tables;
+   }
+
err = mlx5e_create_ttc_table(priv);
if (err) {
netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n",
   err);
-   goto err_destroy_arfs_tables;
+   goto err_destroy_inner_ttc_table;
}
 
return 0;
 
+err_destroy_inner_ttc_table:
+   mlx5e_destroy_inner_ttc_table(priv);
 err_destroy_arfs_tables:
mlx5e_arfs_destroy_tables(priv);
 
@@ -273,6 +282,7 @@ static int mlx5i_create_flow_steering(struct mlx5e_priv *priv)
 static void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv)
 {
mlx5e_destroy_ttc_table(priv);
+   mlx5e_destroy_inner_ttc_table(priv);
mlx5e_arfs_destroy_tables(priv);
 }
 
-- 
2.14.2

[net-next 06/12] net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ

2017-11-04 Thread Saeed Mahameed

From: Huy Nguyen 

If the port is in DSCP trust state, packets are placed in the right
priority queue based on their DSCP value. This is done by selecting
the transmit queue based on the DSCP of the skb.

Until now, select_queue honored priority only from the VLAN header.
That is not sufficient when the port trust state is DSCP, because a
packet might not contain a VLAN header at all. Therefore, if the port
is in DSCP trust state and the vport's min inline mode is not NONE,
copy the IP header to the eseg's inline header if the skb has one.
This is done by changing the min inline mode of the transmit queue (SQ)
to L3. Note that the min inline mode of SQs that belong to other
features, such as xdpsq and icosq, is not modified.
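
The inline-mode selection described above can be sketched in isolation as follows. This is a minimal illustration of the decision made by mlx5e_params_calculate_tx_min_inline() and mlx5e_trust_update_tx_min_inline_mode() in the diff below, not the driver code itself: the enums and function names are hypothetical stand-ins, and the queried mode and the wqe_vlan_insert capability are passed in as plain arguments instead of being read from the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's inline modes and trust states. */
enum inline_mode { INLINE_MODE_NONE, INLINE_MODE_L2, INLINE_MODE_IP };
enum trust_state { TRUST_PCP, TRUST_DSCP };

/* Base mode: a queried mode of NONE is raised to L2 when the device
 * cannot insert the VLAN tag from the WQE. */
static enum inline_mode calc_tx_min_inline(enum inline_mode queried,
                                           bool wqe_vlan_insert)
{
    if (queried == INLINE_MODE_NONE && !wqe_vlan_insert)
        return INLINE_MODE_L2;
    return queried;
}

/* Trust adjustment: in DSCP trust state an L2 mode is raised to IP so
 * that the IP header is inlined. A mode of NONE is left untouched. */
static enum inline_mode trust_tx_min_inline(enum trust_state trust,
                                            enum inline_mode queried,
                                            bool wqe_vlan_insert)
{
    enum inline_mode mode = calc_tx_min_inline(queried, wqe_vlan_insert);

    if (trust == TRUST_DSCP && mode == INLINE_MODE_L2)
        mode = INLINE_MODE_IP;
    return mode;
}
```

In this sketch only the L2 case is promoted, which matches the condition in the patch that the vport's min inline mode must not be NONE.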

Signed-off-by: Huy Nguyen 
Reviewed-by: Parav Pandit 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_common.c| 12 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 37 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  5 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 24 --
 5 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index ab6f0c18850f..fae7b62d173f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -1083,4 +1083,5 @@ void mlx5e_destroy_netdev(struct mlx5e_priv *priv);
 void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
struct mlx5e_params *params,
u16 max_channels);
+u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev);
 #endif /* __MLX5_EN_H__ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index 157d02917237..784e282803db 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -171,3 +171,15 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb)
 
return err;
 }
+
+u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev)
+{
+   u8 min_inline_mode;
+
+   mlx5_query_min_inline(mdev, &min_inline_mode);
+   if (min_inline_mode == MLX5_INLINE_MODE_NONE &&
+   !MLX5_CAP_ETH(mdev, wqe_vlan_insert))
+   min_inline_mode = MLX5_INLINE_MODE_L2;
+
+   return min_inline_mode;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index aa59c4324159..b402d69a701b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -960,6 +960,40 @@ void mlx5e_dcbnl_delete_app(struct mlx5e_priv *priv)
mlx5e_dcbnl_dscp_app(priv, DELETE);
 }
 
+static void mlx5e_trust_update_tx_min_inline_mode(struct mlx5e_priv *priv,
+ struct mlx5e_params *params)
+{
+   params->tx_min_inline_mode = mlx5e_params_calculate_tx_min_inline(priv->mdev);
+   if (priv->dcbx_dp.trust_state == MLX5_QPTS_TRUST_DSCP &&
+   params->tx_min_inline_mode == MLX5_INLINE_MODE_L2)
+   params->tx_min_inline_mode = MLX5_INLINE_MODE_IP;
+}
+
+static void mlx5e_trust_update_sq_inline_mode(struct mlx5e_priv *priv)
+{
+   struct mlx5e_channels new_channels = {};
+
+   mutex_lock(&priv->state_lock);
+
+   if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
+   goto out;
+
+   new_channels.params = priv->channels.params;
+   mlx5e_trust_update_tx_min_inline_mode(priv, &new_channels.params);
+
+   /* Skip if tx_min_inline is the same */
+   if (new_channels.params.tx_min_inline_mode ==
+   priv->channels.params.tx_min_inline_mode)
+   goto out;
+
+   if (mlx5e_open_channels(priv, &new_channels))
+   goto out;
+   mlx5e_switch_priv_channels(priv, &new_channels, NULL);
+
+out:
+   mutex_unlock(&priv->state_lock);
+}
+
 static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state)
 {
int err;
@@ -968,6 +1002,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state)
if (err)
return err;
priv->dcbx_dp.trust_state = trust_state;
+   mlx5e_trust_update_sq_inline_mode(priv);
 
return err;
 }
@@ -996,6 +1031,8 @@ static int mlx5e_trust_initialize(struct mlx5e_priv *priv)
if (err)
return err;
 
+   mlx5e_trust_update_tx_min_inline_mode(priv, &priv->channels.params);
+
err = mlx5_query_dscp2prio(priv->mdev, priv->dcbx_dp.dscp2prio);
if (err)
return err;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8633476fb536..a97ee38143aa 100644
--- a/drivers/net/ethernet/me
