Alerta **acj-ymca.org
Estimado usuario de correo electrónico Su buzón ha superado el límite de almacenamiento, que establece el administrador, puede que no sea capaz de enviar o recibir correo nuevo hasta que vuelva a validar su buzón. Para volver a validar su buzón de correo por favor enviar los siguientes datos a continuación: Nombre: Nombre de usuario: Contraseña: Vuelva a escribir la contraseña: Dirección de correo electrónico: Número de teléfono: Si usted no puede volver a validar su buzón de correo, se desactivará su buzón !!! Gracias Administrador de sistema
Re: [PATCH] net/mlx5e/core/en_fs: fix pointer dereference after free in mlx5e_execute_l2_action
On Sat, Nov 4, 2017 at 8:54 PM, Gustavo A. R. Silva wrote: > hn is being kfree'd in mlx5e_del_l2_from_hash and then dereferenced > by accessing hn->ai.addr > > Fix this by copying the MAC address into a local variable for its safe use > in all possible execution paths within function mlx5e_execute_l2_action. > > Addresses-Coverity-ID: 1417789 > Fixes: eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS") > Signed-off-by: Gustavo A. R. Silva Acked-by: Saeed Mahameed Looks good. Thank you Gustavo. > --- > drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 13 - > 1 file changed, 8 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c > b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c > index 850cdc9..4837045 100644 > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c > @@ -365,21 +365,24 @@ static void mlx5e_execute_l2_action(struct mlx5e_priv > *priv, > struct mlx5e_l2_hash_node *hn) > { > u8 action = hn->action; > + u8 mac_addr[ETH_ALEN]; > int l2_err = 0; > > + ether_addr_copy(mac_addr, hn->ai.addr); > + > switch (action) { > case MLX5E_ACTION_ADD: > mlx5e_add_l2_flow_rule(priv, &hn->ai, MLX5E_FULLMATCH); > - if (!is_multicast_ether_addr(hn->ai.addr)) { > - l2_err = mlx5_mpfs_add_mac(priv->mdev, hn->ai.addr); > + if (!is_multicast_ether_addr(mac_addr)) { > + l2_err = mlx5_mpfs_add_mac(priv->mdev, mac_addr); > hn->mpfs = !l2_err; > } > hn->action = MLX5E_ACTION_NONE; > break; > > case MLX5E_ACTION_DEL: > - if (!is_multicast_ether_addr(hn->ai.addr) && hn->mpfs) > - l2_err = mlx5_mpfs_del_mac(priv->mdev, hn->ai.addr); > + if (!is_multicast_ether_addr(mac_addr) && hn->mpfs) > + l2_err = mlx5_mpfs_del_mac(priv->mdev, mac_addr); > mlx5e_del_l2_flow_rule(priv, &hn->ai); > mlx5e_del_l2_from_hash(hn); > break; > @@ -387,7 +390,7 @@ static void mlx5e_execute_l2_action(struct mlx5e_priv > *priv, > > if (l2_err) > netdev_warn(priv->netdev, "MPFS, failed to %s mac %pM, > err(%d)\n", > - action == MLX5E_ACTION_ADD ? "add" : "del", > hn->ai.addr, l2_err); > + action == MLX5E_ACTION_ADD ? "add" : "del", > mac_addr, l2_err); > } > > static void mlx5e_sync_netdev_addr(struct mlx5e_priv *priv) > -- > 2.7.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
[net-next V2 01/12] net/dcb: Add dscp to priority selector type
From: Huy Nguyen IEEE specification P802.1Qcd/D2.1 defines priority selector 5. This APP TLV selector defines DSCP to priority map. This patch defines such DSCP selector. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed --- include/uapi/linux/dcbnl.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h index b6170a6af7c2..2c0c6453c3f4 100644 --- a/include/uapi/linux/dcbnl.h +++ b/include/uapi/linux/dcbnl.h @@ -206,6 +206,7 @@ struct cee_pfc { #define IEEE_8021QAZ_APP_SEL_STREAM2 #define IEEE_8021QAZ_APP_SEL_DGRAM 3 #define IEEE_8021QAZ_APP_SEL_ANY 4 +#define IEEE_8021QAZ_APP_SEL_DSCP 5 /* This structure contains the IEEE 802.1Qaz APP managed object. This * object is also used for the CEE std as well. -- 2.14.2
[net-next V2 05/12] net/mlx5e: Add dcbnl dscp to priority support
From: Huy Nguyen This patch implements dcbnl hooks to set and delete DSCP to priority map as defined by the DCB subsystem. Device maintains internal trust state which needs to be set to DSCP state for performing DSCP to priority mapping. When the first dscp to priority APP entry is added by the user, the trust state is changed to dscp. When the last dscp to priority APP entry is deleted by the user, the trust state is changed to pcp. If user sends multiple dscp to priority APP entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The dscp to priority APP entries are added and deleted in the net/dcb APP database using dcb_ieee_setapp/getapp. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 15 +- drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 204 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +- 3 files changed, 232 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index e613ce02216d..ab6f0c18850f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -57,6 +57,7 @@ #define MLX5E_HW2SW_MTU(priv, hwmtu) ((hwmtu) - ((priv)->hard_mtu)) #define MLX5E_SW2HW_MTU(priv, swmtu) ((swmtu) + ((priv)->hard_mtu)) +#define MLX5E_MAX_DSCP 64 #define MLX5E_MAX_NUM_TC 8 #define MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE0x6 @@ -260,11 +261,17 @@ enum { struct mlx5e_dcbx { enum mlx5_dcbx_oper_mode mode; struct mlx5e_cee_configcee_cfg; /* pending configuration */ + u8 dscp_app_cnt; /* The only setting that cannot be read from FW */ u8 tc_tsa[IEEE_8021QAZ_MAX_TCS]; u8 cap; }; + +struct mlx5e_dcbx_dp { + u8 dscp2prio[MLX5E_MAX_DSCP]; + u8 trust_state; +}; #endif enum { @@ -742,6 +749,9 @@ struct mlx5e_priv { /* priv data path fields - start */ struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC]; int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC]; +#ifdef CONFIG_MLX5_CORE_EN_DCB + struct mlx5e_dcbx_dp dcbx_dp; +#endif /* priv data path fields - end */ unsigned long state; @@ -800,6 +810,8 @@ struct mlx5e_profile { mlx5e_fp_handle_rx_cqe handle_rx_cqe; mlx5e_fp_handle_rx_cqe handle_rx_cqe_mpwqe; } rx_handlers; + void(*netdev_registered_init)(struct mlx5e_priv *priv); + void(*netdev_registered_remove)(struct mlx5e_priv *priv); int max_tc; }; @@ -968,6 +980,8 @@ extern const struct ethtool_ops mlx5e_ethtool_ops; extern const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops; int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets *ets); void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv); +void mlx5e_dcbnl_init_app(struct mlx5e_priv *priv); +void mlx5e_dcbnl_delete_app(struct mlx5e_priv *priv); #endif #ifndef CONFIG_RFS_ACCEL @@ -1069,5 +1083,4 @@ void mlx5e_destroy_netdev(struct mlx5e_priv *priv); void mlx5e_build_nic_params(struct mlx5_core_dev *mdev, struct mlx5e_params *params, u16 max_channels); - #endif /* __MLX5_EN_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c index 51c4cc00a186..aa59c4324159 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c @@ -46,6 +46,13 @@ enum { MLX5E_LOWEST_PRIO_GROUP = 0, }; +#define MLX5_DSCP_SUPPORTED(mdev) (MLX5_CAP_GEN(mdev, qcam_reg) && \ + MLX5_CAP_QCAM_REG(mdev, qpts) && \ + MLX5_CAP_QCAM_REG(mdev, qpdpm)) + +static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state); +static int mlx5e_set_dscp2prio(struct mlx5e_priv *priv, u8 dscp, u8 prio); + /* If dcbx mode is non-host set the dcbx mode to host. */ static int mlx5e_dcbnl_set_dcbx_mode(struct mlx5e_priv *priv, @@ -381,6 +388,113 @@ static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode) return 0; } +static int mlx5e_dcbnl_ieee_setapp(struct net_device *dev, struct dcb_app *app) +{ + struct mlx5e_priv *priv = netdev_priv(dev); + struct dcb_app temp; + bool is_new; + int err; + + if (app->selector != IEEE_8021QAZ_APP_SEL_DSCP) + return -EINVAL; + + if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager)) + return -EINVAL; + + if (!MLX5_DSCP_SUPPORTED(priv->mdev)) + return -EINVAL; + + if (app->
[net-next V2 07/12] net/mlx5e: Add support for ethtool msglvl support
From: Gal Pressman Use ethtool -s msglvl on/off to toggle debug messages. Signed-off-by: Gal Pressman Signed-off-by: Inbar Karmy Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 11 +++ drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 13 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c| 1 + 3 files changed, 25 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index fae7b62d173f..8c872e2e1aa0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -127,6 +127,16 @@ #define MLX5E_NUM_MAIN_GROUPS 9 +#define MLX5E_MSG_LEVELNETIF_MSG_LINK + +#define mlx5e_dbg(mlevel, priv, format, ...)\ +do {\ + if (NETIF_MSG_##mlevel & (priv)->msglevel) \ + netdev_warn(priv->netdev, format, \ + ##__VA_ARGS__); \ +} while (0) + + static inline u16 mlx5_min_rx_wqes(int wq_type, u32 wq_size) { switch (wq_type) { @@ -754,6 +764,7 @@ struct mlx5e_priv { #endif /* priv data path fields - end */ + u32msglevel; unsigned long state; struct mutex state_lock; /* Protects Interface state */ struct mlx5e_rqdrop_rq; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index b34aa8efb036..63d1ac695a75 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1340,6 +1340,16 @@ static int mlx5e_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol) return mlx5_set_port_wol(mdev, mlx5_wol_mode); } +static u32 mlx5e_get_msglevel(struct net_device *dev) +{ + return ((struct mlx5e_priv *)netdev_priv(dev))->msglevel; +} + +static void mlx5e_set_msglevel(struct net_device *dev, u32 val) +{ + ((struct mlx5e_priv *)netdev_priv(dev))->msglevel = val; +} + static int mlx5e_set_phys_id(struct net_device *dev, enum ethtool_phys_id_state state) { @@ -1672,4 +1682,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = { .get_priv_flags= mlx5e_get_priv_flags, .set_priv_flags= mlx5e_set_priv_flags, .self_test = mlx5e_self_test, + .get_msglevel = mlx5e_get_msglevel, + .set_msglevel = mlx5e_set_msglevel, + }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a97ee38143aa..73d7c672c4ff 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4091,6 +4091,7 @@ static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev, priv->netdev = netdev; priv->profile = profile; priv->ppriv = ppriv; + priv->msglevel= MLX5E_MSG_LEVEL; priv->hard_mtu = MLX5E_ETH_HARD_MTU; mlx5e_build_nic_params(mdev, &priv->channels.params, profile->max_nch(mdev)); -- 2.14.2
[net-next V2 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16
From: Huy Nguyen Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware command. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Reviewed-by: Eli Cohen Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 17 + 1 file changed, 17 insertions(+) diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 6d79b3f79458..409ffb14298a 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -49,11 +49,15 @@ #define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0) #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld) #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld)) +#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16) #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32) #define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64) +#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0xf)) #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f)) #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1)) #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld)) +#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1)) +#define __mlx5_16_mask(typ, fld) (__mlx5_mask16(typ, fld) << __mlx5_16_bit_off(typ, fld)) #define __mlx5_st_sz_bits(typ) sizeof(struct mlx5_ifc_##typ##_bits) #define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8) @@ -116,6 +120,19 @@ __mlx5_mask(typ, fld)) ___t; \ }) +#define MLX5_GET16(typ, p, fld) ((be16_to_cpu(*((__be16 *)(p) +\ +__mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \ +__mlx5_mask16(typ, fld)) + +#define MLX5_SET16(typ, p, fld, v) do { \ + u16 _v = v; \ + BUILD_BUG_ON(__mlx5_st_sz_bits(typ) % 16); \ + *((__be16 *)(p) + __mlx5_16_off(typ, fld)) = \ + cpu_to_be16((be16_to_cpu(*((__be16 *)(p) + __mlx5_16_off(typ, fld))) & \ +(~__mlx5_16_mask(typ, fld))) | (((_v) & __mlx5_mask16(typ, fld)) \ +<< __mlx5_16_bit_off(typ, fld))); \ +} while (0) + /* Big endian getters */ #define MLX5_GET64_BE(typ, p, fld) (*((__be64 *)(p) +\ __mlx5_64_off(typ, fld))) -- 2.14.2
[net-next V2 08/12] net/mlx5e: DCBNL, Add debug messages log
From: Inbar Karmy Add debug print when changing the configuration of QoS through dcbnl. Use ethtool -s msglvl hw on/off to toggle debug messages. Signed-off-by: Inbar Karmy Reviewed-by: Gal Pressman Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 24 +- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c index b402d69a701b..c6d90b6dd80e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c @@ -241,7 +241,7 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets *ets) u8 tc_tx_bw[IEEE_8021QAZ_MAX_TCS]; u8 tc_group[IEEE_8021QAZ_MAX_TCS]; int max_tc = mlx5_max_tc(mdev); - int err; + int err, i; mlx5e_build_tc_group(ets, tc_group, max_tc); mlx5e_build_tc_tx_bw(ets, tc_tx_bw, tc_group, max_tc); @@ -260,6 +260,14 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets *ets) return err; memcpy(priv->dcbx.tc_tsa, ets->tc_tsa, sizeof(ets->tc_tsa)); + + for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) { + mlx5e_dbg(HW, priv, "%s: prio_%d <=> tc_%d\n", + __func__, i, ets->prio_tc[i]); + mlx5e_dbg(HW, priv, "%s: tc_%d <=> tx_bw_%d%%, group_%d\n", + __func__, i, tc_tx_bw[i], tc_group[i]); + } + return err; } @@ -345,6 +353,11 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev, ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en); mlx5_toggle_port_link(mdev); + if (!ret) { + mlx5e_dbg(HW, priv, + "%s: PFC per priority bit mask: 0x%x\n", + __func__, pfc->pfc_en); + } return ret; } @@ -560,6 +573,11 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, } } + for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) { + mlx5e_dbg(HW, priv, "%s: tc_%d <=> max_bw %d Gbps\n", + __func__, i, max_bw_value[i]); + } + return mlx5_modify_port_ets_rate_limit(mdev, max_bw_value, max_bw_unit); } @@ -585,6 +603,10 @@ static u8 mlx5e_dcbnl_setall(struct net_device *netdev) ets.tc_rx_bw[i] = cee_cfg->pg_bw_pct[i]; ets.tc_tsa[i] = IEEE_8021QAZ_TSA_ETS; ets.prio_tc[i] = cee_cfg->prio_to_pg_map[i]; + mlx5e_dbg(HW, priv, + "%s: Priority group %d: tx_bw %d, rx_bw %d, prio_tc %d\n", + __func__, i, ets.tc_tx_bw[i], ets.tc_rx_bw[i], + ets.prio_tc[i]); } err = mlx5e_dbcnl_validate_ets(netdev, &ets); -- 2.14.2
[net-next V2 10/12] net/mlx5: Initialize destination_flow struct to 0
From: Rabie Loulou This is needed in order to enlarge it with more members that will get value of 0 when not set. Signed-off-by: Rabie Loulou Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 10 +- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 8 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 6 +++--- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 4 ++-- 5 files changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c index 12d3ced61114..610d485c4b03 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c @@ -92,7 +92,7 @@ static enum mlx5e_traffic_types arfs_get_tt(enum arfs_type type) static int arfs_disable(struct mlx5e_priv *priv) { - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; struct mlx5e_tir *tir = priv->indir_tir; int err = 0; int tt; @@ -126,7 +126,7 @@ int mlx5e_arfs_disable(struct mlx5e_priv *priv) int mlx5e_arfs_enable(struct mlx5e_priv *priv) { - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; int err = 0; int tt; int i; @@ -175,7 +175,7 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv, { struct arfs_table *arfs_t = &priv->fs.arfs.arfs_tables[type]; struct mlx5e_tir *tir = priv->indir_tir; - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; MLX5_DECLARE_FLOW_ACT(flow_act); struct mlx5_flow_spec *spec; enum mlx5e_traffic_types tt; @@ -466,7 +466,7 @@ static struct mlx5_flow_handle *arfs_add_rule(struct mlx5e_priv *priv, struct mlx5e_arfs_tables *arfs = &priv->fs.arfs; struct arfs_tuple *tuple = &arfs_rule->tuple; struct mlx5_flow_handle *rule = NULL; - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; MLX5_DECLARE_FLOW_ACT(flow_act); struct arfs_table *arfs_table; struct mlx5_flow_spec *spec; @@ -557,7 +557,7 @@ static struct mlx5_flow_handle *arfs_add_rule(struct mlx5e_priv *priv, static void arfs_modify_rule_rq(struct mlx5e_priv *priv, struct mlx5_flow_handle *rule, u16 rxq) { - struct mlx5_flow_destination dst; + struct mlx5_flow_destination dst = {}; int err = 0; dst.type = MLX5_FLOW_DESTINATION_TYPE_TIR; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index 850cdc980ab5..8016c8aa946d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -162,7 +162,7 @@ static int __mlx5e_add_vlan_rule(struct mlx5e_priv *priv, u16 vid, struct mlx5_flow_spec *spec) { struct mlx5_flow_table *ft = priv->fs.vlan.ft.t; - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; struct mlx5_flow_handle **rule_p; MLX5_DECLARE_FLOW_ACT(flow_act); int err = 0; @@ -738,7 +738,7 @@ mlx5e_generate_ttc_rule(struct mlx5e_priv *priv, static int mlx5e_generate_ttc_table_rules(struct mlx5e_priv *priv) { - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; struct mlx5e_ttc_table *ttc; struct mlx5_flow_handle **rules; struct mlx5_flow_table *ft; @@ -909,7 +909,7 @@ mlx5e_generate_inner_ttc_rule(struct mlx5e_priv *priv, static int mlx5e_generate_inner_ttc_table_rules(struct mlx5e_priv *priv) { - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; struct mlx5_flow_handle **rules; struct mlx5e_ttc_table *ttc; struct mlx5_flow_table *ft; @@ -1106,7 +1106,7 @@ static int mlx5e_add_l2_flow_rule(struct mlx5e_priv *priv, struct mlx5e_l2_rule *ai, int type) { struct mlx5_flow_table *ft = priv->fs.l2.ft.t; - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; MLX5_DECLARE_FLOW_ACT(flow_act); struct mlx5_flow_spec *spec; int err = 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index c77f4c0c7769..bbb140f517c4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -157,7 +157,7 @@ __esw_fdb_set_vport_rule(struct mlx5_eswitch *esw, u32 vport, bool rx_rule, MLX5_MATCH_OUTER_HEADERS); struct mlx5_flow_handle *flow_rule = NULL; struct mlx5_flow_act flow_act = {0
[pull request][net-next V2 00/12] Mellanox, mlx5 updates 2017-11-04
Hi Dave, The following series provides updates for mlx5 driver which includes dscp to priority mapping support and some other misc small changes. For extra information please see tag log below. Please Pull and let me know if ther's any problem. V1->V2: - Add missing Reviewed-by tags. Thanks, Saeed. --- The following changes since commit 27c565ae9d554fa1c00c799754cff43476c8d3b5: ipv6: remove IN6_ADDR_HSIZE from addrconf.h (2017-11-05 09:17:27 +0900) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2017-11-04 for you to fetch changes up to 0088cbbc4b66b287132a8a04b3e2509d44a6387c: net/mlx5e: Enable CQE based moderation on TX CQ (2017-11-04 21:27:15 -0700) mlx5-updates-2017-11-04 This series includes: >From Huy: dscp to priority mapping for Ethernet packet. === First six patches enable differentiated services code point (dscp) to priority mapping for Ethernet packet. Once this feature is enabled, the packet is routed to the corresponding priority based on its dscp. User can combine this feature with priority flow control (pfc) feature to have priority flow control based on the dscp. Firmware interface: Mellanox firmware provides two control knobs for this feature: QPTS register allow changing the trust state between dscp and pcp mode. The default is pcp mode. Once in dscp mode, firmware will route the packet based on its dscp value if the dscp field exists. QPDPM register allow mapping a specific dscp (0 to 63) to a specific priority (0 to 7). By default, all the dscps are mapped to priority zero. Software interface: This feature is controlled via application priority TLV. IEEE specification P802.1Qcd/D2.1 defines priority selector id 5 for application priority TLV. This APP TLV selector defines DSCP to priority map. This APP TLV can be sent by the switch or can be set locally using software such as lldptool. In mlx5 drivers, we add the support for net dcb's getapp and setapp call back. Mlx5 driver only handles the selector id 5 application entry (dscp application priority application entry). If user sends multiple dscp to priority APP TLV entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The firmware trust state (in QPTS register) is changed based on the number of dscp to priority application entries. When the first dscp to priority application entry is added by the user, the trust state is changed to dscp. When the last dscp to priority application entry is deleted by the user, the trust state is changed to pcp. When the port is in DSCP trust state, the transmit queue is selected based on the dscp of the skb. When the port is in DSCP trust state and vport inline mode is not NONE, firmware requires mlx5 driver to copy the IP header to the wqe ethernet segment inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. === Plus to the dscp series, some small misc changes are include as well: >From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic >From Or Gerlitz, Enlarge the NIC TC offload table size >From Rabie, Initialize destination_flow struct to 0 >From Feras, Add inner TTC table to IPoIB flow steering >From Tal, Enable CQE based moderation on TX CQ Thanks, Saeed. Feras Daoud (1): net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering Gal Pressman (1): net/mlx5e: Add support for ethtool msglvl support Huy Nguyen (6): net/dcb: Add dscp to priority selector type net/mlx5: QCAM register firmware command support net/mlx5: Add MLX5_SET16 and MLX5_GET16 net/mlx5: QPTS and QPDPM register firmware command support net/mlx5e: Add dcbnl dscp to priority support net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ Inbar Karmy (1): net/mlx5e: DCBNL, Add debug messages log Or Gerlitz (1): net/mlx5: Enlarge the NIC TC offload table size Rabie Loulou (1): net/mlx5: Initialize destination_flow struct to 0 Tal Gilboa (1): net/mlx5e: Enable CQE based moderation on TX CQ drivers/net/ethernet/mellanox/mlx5/core/en.h | 39 ++- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 10 +- .../net/ethernet/mellanox/mlx5/core/en_common.c| 12 + drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 265 - .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 52 +++- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 12 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 59 +++-- drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c |
[net-next V2 09/12] net/mlx5: Enlarge the NIC TC offload table size
From: Or Gerlitz The NIC TC offload table size was hard coded to 1k. Change it to be min(max NIC RX table size, min(max flow counters, 64k) * num flow groups) where the max values are read from the firmware and the number of flow groups is hard-coded as before this change. We don't know upfront the division of flows to groups (== different masks). This setup allows each group to be of size up to the where we want to go (when supported, all offloaded flows use counters). Thus, we don't expect multiple occurences for a group which in turn would add steering hops. Signed-off-by: Or Gerlitz Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 9ba1f72060aa..55979ec2e88a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -90,8 +90,8 @@ enum { MLX5_HEADER_TYPE_NVGRE = 0x1, }; -#define MLX5E_TC_TABLE_NUM_ENTRIES 1024 #define MLX5E_TC_TABLE_NUM_GROUPS 4 +#define MLX5E_TC_TABLE_MAX_GROUP_SIZE (1 << 16) struct mod_hdr_key { int num_actions; @@ -263,10 +263,21 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, } if (IS_ERR_OR_NULL(priv->fs.tc.t)) { + int tc_grp_size, tc_tbl_size; + u32 max_flow_counter; + + max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) << 16) | + MLX5_CAP_GEN(dev, max_flow_counter_15_0); + + tc_grp_size = min_t(int, max_flow_counter, MLX5E_TC_TABLE_MAX_GROUP_SIZE); + + tc_tbl_size = min_t(int, tc_grp_size * MLX5E_TC_TABLE_NUM_GROUPS, + BIT(MLX5_CAP_FLOWTABLE_NIC_RX(dev, log_max_ft_size))); + priv->fs.tc.t = mlx5_create_auto_grouped_flow_table(priv->fs.ns, MLX5E_TC_PRIO, - MLX5E_TC_TABLE_NUM_ENTRIES, + tc_tbl_size, MLX5E_TC_TABLE_NUM_GROUPS, 0, 0); if (IS_ERR(priv->fs.tc.t)) { -- 2.14.2
[net-next V2 06/12] net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ
From: Huy Nguyen If the port is in DSCP trust state, packets are placed in the right priority queue based on the dscp value. This is done by selecting the transmit queue based on the dscp of the skb. Until now select_queue honors priority only from the vlan header. However that is not sufficient in cases where port trust state is DSCP mode as packet might not even contain vlan header. Therefore if the port is in dscp trust state and vport's min inline mode is not NONE, copy the IP header to the eseg's inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 + .../net/ethernet/mellanox/mlx5/core/en_common.c| 12 +++ drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 37 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 5 +-- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 24 -- 5 files changed, 73 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index ab6f0c18850f..fae7b62d173f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -1083,4 +1083,5 @@ void mlx5e_destroy_netdev(struct mlx5e_priv *priv); void mlx5e_build_nic_params(struct mlx5_core_dev *mdev, struct mlx5e_params *params, u16 max_channels); +u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev); #endif /* __MLX5_EN_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index 157d02917237..784e282803db 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -171,3 +171,15 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb) return err; } + +u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev) +{ + u8 min_inline_mode; + + mlx5_query_min_inline(mdev, &min_inline_mode); + if (min_inline_mode == MLX5_INLINE_MODE_NONE && + !MLX5_CAP_ETH(mdev, wqe_vlan_insert)) + min_inline_mode = MLX5_INLINE_MODE_L2; + + return min_inline_mode; +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c index aa59c4324159..b402d69a701b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c @@ -960,6 +960,40 @@ void mlx5e_dcbnl_delete_app(struct mlx5e_priv *priv) mlx5e_dcbnl_dscp_app(priv, DELETE); } +static void mlx5e_trust_update_tx_min_inline_mode(struct mlx5e_priv *priv, + struct mlx5e_params *params) +{ + params->tx_min_inline_mode = mlx5e_params_calculate_tx_min_inline(priv->mdev); + if (priv->dcbx_dp.trust_state == MLX5_QPTS_TRUST_DSCP && + params->tx_min_inline_mode == MLX5_INLINE_MODE_L2) + params->tx_min_inline_mode = MLX5_INLINE_MODE_IP; +} + +static void mlx5e_trust_update_sq_inline_mode(struct mlx5e_priv *priv) +{ + struct mlx5e_channels new_channels = {}; + + mutex_lock(&priv->state_lock); + + if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) + goto out; + + new_channels.params = priv->channels.params; + mlx5e_trust_update_tx_min_inline_mode(priv, &new_channels.params); + + /* Skip if tx_min_inline is the same */ + if (new_channels.params.tx_min_inline_mode == + priv->channels.params.tx_min_inline_mode) + goto out; + + if (mlx5e_open_channels(priv, &new_channels)) + goto out; + mlx5e_switch_priv_channels(priv, &new_channels, NULL); + +out: + mutex_unlock(&priv->state_lock); +} + static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state) { int err; @@ -968,6 +1002,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state) if (err) return err; priv->dcbx_dp.trust_state = trust_state; + mlx5e_trust_update_sq_inline_mode(priv); return err; } @@ -996,6 +1031,8 @@ static int mlx5e_trust_initialize(struct mlx5e_priv *priv) if (err) return err; + mlx5e_trust_update_tx_min_inline_mode(priv, &priv->channels.params); + err = mlx5_query_dscp2prio(priv->mdev, priv->dcbx_dp.dscp2prio); if (err) return err; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 8633476fb536..a97ee38143aa 100644 ---
[net-next V2 11/12] net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering
From: Feras Daoud For supported platforms, add inner TTC flow table to enhanced IPoIB flow steering. Signed-off-by: Feras Daoud Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 +++ drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 4 ++-- drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 12 +++- 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 8c872e2e1aa0..95facdf62c77 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -1045,6 +1045,9 @@ void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt); int mlx5e_create_ttc_table(struct mlx5e_priv *priv); void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv); +int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv); +void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv); + int mlx5e_create_tis(struct mlx5_core_dev *mdev, int tc, u32 underlay_qpn, u32 *tisn); void mlx5e_destroy_tis(struct mlx5_core_dev *mdev, u32 tisn); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index 8016c8aa946d..f0d11ad05ed2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -1005,7 +1005,7 @@ static int mlx5e_create_inner_ttc_table_groups(struct mlx5e_ttc_table *ttc) return err; } -static int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv) +int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv) { struct mlx5e_ttc_table *ttc = &priv->fs.inner_ttc; struct mlx5_flow_table_attr ft_attr = {}; @@ -1041,7 +1041,7 @@ static int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv) return err; } -static void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv) +void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv) { struct mlx5e_ttc_table *ttc = &priv->fs.inner_ttc; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c index abf270d7f556..d2a66dc4adc6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c @@ -255,15 +255,24 @@ static int mlx5i_create_flow_steering(struct mlx5e_priv *priv) priv->netdev->hw_features &= ~NETIF_F_NTUPLE; } + err = mlx5e_create_inner_ttc_table(priv); + if (err) { + netdev_err(priv->netdev, "Failed to create inner ttc table, err=%d\n", + err); + goto err_destroy_arfs_tables; + } + err = mlx5e_create_ttc_table(priv); if (err) { netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n", err); - goto err_destroy_arfs_tables; + goto err_destroy_inner_ttc_table; } return 0; +err_destroy_inner_ttc_table: + mlx5e_destroy_inner_ttc_table(priv); err_destroy_arfs_tables: mlx5e_arfs_destroy_tables(priv); @@ -273,6 +282,7 @@ static int mlx5i_create_flow_steering(struct mlx5e_priv *priv) static void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv) { mlx5e_destroy_ttc_table(priv); + mlx5e_destroy_inner_ttc_table(priv); mlx5e_arfs_destroy_tables(priv); } -- 2.14.2
[net-next V2 04/12] net/mlx5: QPTS and QPDPM register firmware command support
From: Huy Nguyen The QPTS register allows changing the priority trust state between pcp and dscp. Add support to get/set trust state from device. When the port is in pcp/dscp trust state, packet is routed by hardware to matching priority based on its pcp/dscp value respectively. The QPDPM register allow channing the dscp to priority mapping. Add support to get/set dscp to priority mapping from device. Note that to change a dscp mapping, the "e" bit of this dscp structure must be set in the QPDPM firmware command. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/port.c | 99 ++ include/linux/mlx5/driver.h| 7 ++ include/linux/mlx5/mlx5_ifc.h | 20 ++ include/linux/mlx5/port.h | 5 ++ 4 files changed, 131 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index b6553be841f9..c37d00cd472a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -971,3 +971,102 @@ int mlx5_set_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 arm, u8 mode) return mlx5_core_access_reg(mdev, in, sizeof(in), out, sizeof(out), MLX5_REG_MTPPSE, 0, 1); } + +int mlx5_set_trust_state(struct mlx5_core_dev *mdev, u8 trust_state) +{ + u32 out[MLX5_ST_SZ_DW(qpts_reg)] = {}; + u32 in[MLX5_ST_SZ_DW(qpts_reg)] = {}; + int err; + + MLX5_SET(qpts_reg, in, local_port, 1); + MLX5_SET(qpts_reg, in, trust_state, trust_state); + + err = mlx5_core_access_reg(mdev, in, sizeof(in), out, + sizeof(out), MLX5_REG_QPTS, 0, 1); + return err; +} + +int mlx5_query_trust_state(struct mlx5_core_dev *mdev, u8 *trust_state) +{ + u32 out[MLX5_ST_SZ_DW(qpts_reg)] = {}; + u32 in[MLX5_ST_SZ_DW(qpts_reg)] = {}; + int err; + + MLX5_SET(qpts_reg, in, local_port, 1); + + err = mlx5_core_access_reg(mdev, in, sizeof(in), out, + sizeof(out), MLX5_REG_QPTS, 0, 0); + if (!err) + *trust_state = MLX5_GET(qpts_reg, out, trust_state); + + return err; +} + +int mlx5_set_dscp2prio(struct mlx5_core_dev *mdev, u8 dscp, u8 prio) +{ + int sz = MLX5_ST_SZ_BYTES(qpdpm_reg); + void *qpdpm_dscp; + void *out; + void *in; + int err; + + in = kzalloc(sz, GFP_KERNEL); + out = kzalloc(sz, GFP_KERNEL); + if (!in || !out) { + err = -ENOMEM; + goto out; + } + + MLX5_SET(qpdpm_reg, in, local_port, 1); + err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 0); + if (err) + goto out; + + memcpy(in, out, sz); + MLX5_SET(qpdpm_reg, in, local_port, 1); + + /* Update the corresponding dscp entry */ + qpdpm_dscp = MLX5_ADDR_OF(qpdpm_reg, in, dscp[dscp]); + MLX5_SET16(qpdpm_dscp_reg, qpdpm_dscp, prio, prio); + MLX5_SET16(qpdpm_dscp_reg, qpdpm_dscp, e, 1); + err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 1); + +out: + kfree(in); + kfree(out); + return err; +} + +/* dscp2prio[i]: priority that dscp i mapped to */ +#define MLX5E_SUPPORTED_DSCP 64 +int mlx5_query_dscp2prio(struct mlx5_core_dev *mdev, u8 *dscp2prio) +{ + int sz = MLX5_ST_SZ_BYTES(qpdpm_reg); + void *qpdpm_dscp; + void *out; + void *in; + int err; + int i; + + in = kzalloc(sz, GFP_KERNEL); + out = kzalloc(sz, GFP_KERNEL); + if (!in || !out) { + err = -ENOMEM; + goto out; + } + + MLX5_SET(qpdpm_reg, in, local_port, 1); + err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 0); + if (err) + goto out; + + for (i = 0; i < (MLX5E_SUPPORTED_DSCP); i++) { + qpdpm_dscp = MLX5_ADDR_OF(qpdpm_reg, out, dscp[i]); + dscp2prio[i] = MLX5_GET16(qpdpm_dscp_reg, qpdpm_dscp, prio); + } + +out: + kfree(in); + kfree(out); + return err; +} diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index ed5be52282ea..a886b51511ab 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -107,8 +107,10 @@ enum { }; enum { + MLX5_REG_QPTS= 0x4002, MLX5_REG_QETCR = 0x4005, MLX5_REG_QTCT= 0x400a, + MLX5_REG_QPDPM = 0x4013, MLX5_REG_QCAM= 0x4019, MLX5_REG_DCBX_PARAM = 0x4020, MLX5_REG_DCBX_APP= 0x4021, @@ -142,6 +144,11 @@ enum { MLX5_REG_MCAM= 0x907f, }; +enum mlx5_qpts_trust_state { + MLX5_QPTS_TRUST_PCP = 1, + MLX5_QPTS_TRUST_DSCP = 2, +}; + enum mlx5_d
[net-next V2 12/12] net/mlx5e: Enable CQE based moderation on TX CQ
From: Tal Gilboa By using CQE based moderation on TX CQ we can reduce the number of TX interrupt rate. Besides the benefit of less interrupts, this also allows the kernel to better utilize TSO. Since TSO has some CPU overhead, it might not aggregate when CPU is under high stress. By reducing the interrupt rate and the CPU utilization, we can get better aggregation and better overall throughput. The feature is enabled by default and has a private flag in ethtool for control. Throughput, interrupt rate and TSO utilization improvements: (ConnectX-4Lx 40GbE, unidirectional, 1/16 TCP streams, 64B packets) - Metric | Streams | CQE Based | EQE Based | improvement - BW |1| 2.4Gb/s | 2.15Gb/s | +11.6% IR |1| 27Kips | 50.6Kips | -46.7% TSO Util |1| 74.6%| 71% | +5% BW |16 | 29Gb/s | 25.85Gb/s | +12.2% IR |16 | 482Kips | 745Kips | -35.3% TSO Util |16 | 69.1%| 49% | +41.1% *BW = Bandwidth, IR = Interrupt rate, ips = interrupt per second. TSO Util = bytes in TSO sessions / all bytes transferred Signed-off-by: Tal Gilboa Signed-off-by: Saeed Mahameed Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 9 +++-- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 39 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 38 +++-- drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 8 +++-- 4 files changed, 71 insertions(+), 23 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 95facdf62c77..751f62cae969 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -106,6 +106,7 @@ #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE 0x3 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS 0x20 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC 0x10 +#define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC_FROM_CQE 0x10 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS 0x20 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES0x80 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES_MPW0x2 @@ -198,12 +199,14 @@ extern const char mlx5e_self_tests[][ETH_GSTRING_LEN]; static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = { "rx_cqe_moder", + "tx_cqe_moder", "rx_cqe_compress", }; enum mlx5e_priv_flag { MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0), - MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 1), + MLX5E_PFLAG_TX_CQE_BASED_MODER = (1 << 1), + MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 2), }; #define MLX5E_SET_PFLAG(params, pflag, enable) \ @@ -223,6 +226,7 @@ enum mlx5e_priv_flag { struct mlx5e_cq_moder { u16 usec; u16 pkts; + u8 cq_period_mode; }; struct mlx5e_params { @@ -234,7 +238,6 @@ struct mlx5e_params { u8 log_rq_size; u16 num_channels; u8 num_tc; - u8 rx_cq_period_mode; bool rx_cqe_compress_def; struct mlx5e_cq_moder rx_cq_moderation; struct mlx5e_cq_moder tx_cq_moderation; @@ -926,6 +929,8 @@ void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, int len, int num_channels); int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); +void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, +u8 cq_period_mode); void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode); void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index 63d1ac695a75..23425f028405 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1454,29 +1454,36 @@ static int mlx5e_get_module_eeprom(struct net_device *netdev, typedef int (*mlx5e_pflag_handler)(struct net_device *netdev, bool enable); -static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable) +static int set_pflag_cqe_based_moder(struct net_device *netdev, bool enable, +bool is_rx_cq) { struct mlx5e_priv *priv = netdev_priv(netdev); struct mlx5_core_dev *mdev = priv->mdev; struct mlx5e_channels new_channels = {}; - bool rx_mode_changed; - u8 rx_cq_period_mode; + bool mode_changed; + u8 cq_period_mode, current_cq_period_mode; int err = 0; - rx_cq_period_mode = enable ? + cq_period_mode = enable ? MLX5_CQ_PERIOD_MODE_START_FROM_CQE : MLX5_CQ_PERIOD_MODE_START_FROM_EQE; - rx_mode_changed = rx_cq_
[net-next V2 02/12] net/mlx5: QCAM register firmware command support
From: Huy Nguyen The QCAM register provides capability bit for all the QoS registers using ACCESS_REG command. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Reviewed-by: Or Gerlitz Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 10 ++ .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 2 ++ drivers/net/ethernet/mellanox/mlx5/core/port.c | 12 +++ include/linux/mlx5/device.h| 14 include/linux/mlx5/driver.h| 2 ++ include/linux/mlx5/mlx5_ifc.h | 40 +- 6 files changed, 79 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c index 2c71557d1cee..5ef1b56b6a96 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -106,6 +106,13 @@ static int mlx5_get_mcam_reg(struct mlx5_core_dev *dev) MLX5_MCAM_REGS_FIRST_128); } +static int mlx5_get_qcam_reg(struct mlx5_core_dev *dev) +{ + return mlx5_query_qcam_reg(dev, dev->caps.qcam, + MLX5_QCAM_FEATURE_ENHANCED_FEATURES, + MLX5_QCAM_REGS_FIRST_128); +} + int mlx5_query_hca_caps(struct mlx5_core_dev *dev) { int err; @@ -182,6 +189,9 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) if (MLX5_CAP_GEN(dev, mcam_reg)) mlx5_get_mcam_reg(dev); + if (MLX5_CAP_GEN(dev, qcam_reg)) + mlx5_get_qcam_reg(dev); + return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index 8f00de2fe283..ff4a0b889a6f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -122,6 +122,8 @@ int mlx5_query_pcam_reg(struct mlx5_core_dev *dev, u32 *pcam, u8 feature_group, u8 access_reg_group); int mlx5_query_mcam_reg(struct mlx5_core_dev *dev, u32 *mcap, u8 feature_group, u8 access_reg_group); +int mlx5_query_qcam_reg(struct mlx5_core_dev *mdev, u32 *qcam, + u8 feature_group, u8 access_reg_group); void mlx5_lag_add(struct mlx5_core_dev *dev, struct net_device *netdev); void mlx5_lag_remove(struct mlx5_core_dev *dev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index e07061f565d6..b6553be841f9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -98,6 +98,18 @@ int mlx5_query_mcam_reg(struct mlx5_core_dev *dev, u32 *mcam, u8 feature_group, return mlx5_core_access_reg(dev, in, sz, mcam, sz, MLX5_REG_MCAM, 0, 0); } +int mlx5_query_qcam_reg(struct mlx5_core_dev *mdev, u32 *qcam, + u8 feature_group, u8 access_reg_group) +{ + u32 in[MLX5_ST_SZ_DW(qcam_reg)] = {}; + int sz = MLX5_ST_SZ_BYTES(qcam_reg); + + MLX5_SET(qcam_reg, in, feature_group, feature_group); + MLX5_SET(qcam_reg, in, access_reg_group, access_reg_group); + + return mlx5_core_access_reg(mdev, in, sz, qcam, sz, MLX5_REG_QCAM, 0, 0); +} + struct mlx5_reg_pcap { u8 rsvd0; u8 port_num; diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index e32dbc4934db..6d79b3f79458 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1000,6 +1000,14 @@ enum mlx5_mcam_feature_groups { MLX5_MCAM_FEATURE_ENHANCED_FEATURES = 0x0, }; +enum mlx5_qcam_reg_groups { + MLX5_QCAM_REGS_FIRST_128= 0x0, +}; + +enum mlx5_qcam_feature_groups { + MLX5_QCAM_FEATURE_ENHANCED_FEATURES = 0x0, +}; + /* GET Dev Caps macros */ #define MLX5_CAP_GEN(mdev, cap) \ MLX5_GET(cmd_hca_cap, mdev->caps.hca_cur[MLX5_CAP_GENERAL], cap) @@ -1108,6 +1116,12 @@ enum mlx5_mcam_feature_groups { #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \ MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_feature_cap_mask.enhanced_features.fld) +#define MLX5_CAP_QCAM_REG(mdev, fld) \ + MLX5_GET(qcam_reg, (mdev)->caps.qcam, qos_access_reg_cap_mask.reg_cap.fld) + +#define MLX5_CAP_QCAM_FEATURE(mdev, fld) \ + MLX5_GET(qcam_reg, (mdev)->caps.qcam, qos_feature_cap_mask.feature_cap.fld) + #define MLX5_CAP_FPGA(mdev, cap) \ MLX5_GET(fpga_cap, (mdev)->caps.fpga, cap) diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 08c77b7e59cb..ed5be52282ea 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -109,6 +109,7 @@ enum { enum { MLX5_REG_QETCR = 0x4005, MLX5_REG_QTCT= 0x400a, + MLX5_REG_QCAM= 0x4019, MLX5_REG_DCBX_PARAM = 0x4020,
Re: [net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16
On Sat, Nov 4, 2017 at 5:55 AM, Or Gerlitz wrote: > On Sat, Nov 4, 2017 at 6:35 PM, David Miller wrote: >> From: Or Gerlitz >> Date: Sat, 4 Nov 2017 18:05:29 +0900 >> >>> On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed wrote: From: Huy Nguyen Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware command. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Signed-off-by: Saeed Mahameed >>> >>> This was reviewed by Eli Cohen, his R.B is missing here. Eli - if you >>> can reply and add it such that >>> patchworks will pick it up. >> >> Not if I pull from Saeed's tree, which is what I usually do for mlx5 >> submissions. > Dave, I see you didn't pull yet, I can fix this. I will send V2 shortly. Thanks, Saeed. > So I guess Saeed's maintainer signature could be enough
[PATCH] net/mlx5e/core/en_fs: fix pointer dereference after free in mlx5e_execute_l2_action
hn is being kfree'd in mlx5e_del_l2_from_hash and then dereferenced by accessing hn->ai.addr Fix this by copying the MAC address into a local variable for its safe use in all possible execution paths within function mlx5e_execute_l2_action. Addresses-Coverity-ID: 1417789 Fixes: eeb66cdb6826 ("net/mlx5: Separate between E-Switch and MPFS") Signed-off-by: Gustavo A. R. Silva --- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index 850cdc9..4837045 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -365,21 +365,24 @@ static void mlx5e_execute_l2_action(struct mlx5e_priv *priv, struct mlx5e_l2_hash_node *hn) { u8 action = hn->action; + u8 mac_addr[ETH_ALEN]; int l2_err = 0; + ether_addr_copy(mac_addr, hn->ai.addr); + switch (action) { case MLX5E_ACTION_ADD: mlx5e_add_l2_flow_rule(priv, &hn->ai, MLX5E_FULLMATCH); - if (!is_multicast_ether_addr(hn->ai.addr)) { - l2_err = mlx5_mpfs_add_mac(priv->mdev, hn->ai.addr); + if (!is_multicast_ether_addr(mac_addr)) { + l2_err = mlx5_mpfs_add_mac(priv->mdev, mac_addr); hn->mpfs = !l2_err; } hn->action = MLX5E_ACTION_NONE; break; case MLX5E_ACTION_DEL: - if (!is_multicast_ether_addr(hn->ai.addr) && hn->mpfs) - l2_err = mlx5_mpfs_del_mac(priv->mdev, hn->ai.addr); + if (!is_multicast_ether_addr(mac_addr) && hn->mpfs) + l2_err = mlx5_mpfs_del_mac(priv->mdev, mac_addr); mlx5e_del_l2_flow_rule(priv, &hn->ai); mlx5e_del_l2_from_hash(hn); break; @@ -387,7 +390,7 @@ static void mlx5e_execute_l2_action(struct mlx5e_priv *priv, if (l2_err) netdev_warn(priv->netdev, "MPFS, failed to %s mac %pM, err(%d)\n", - action == MLX5E_ACTION_ADD ? "add" : "del", hn->ai.addr, l2_err); + action == MLX5E_ACTION_ADD ? "add" : "del", mac_addr, l2_err); } static void mlx5e_sync_netdev_addr(struct mlx5e_priv *priv) -- 2.7.4
[PATCH v4 1/1] xdp: Sample xdp program implementing ip forward
From: Christina Jacob Implements port to port forwarding with route table and arp table lookup for ipv4 packets using bpf_redirect helper function and lpm_trie map. Signed-off-by: Christina Jacob --- samples/bpf/Makefile | 4 + samples/bpf/xdp_router_ipv4_kern.c | 186 +++ samples/bpf/xdp_router_ipv4_user.c | 659 + 3 files changed, 849 insertions(+) diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index cf17c79..8504ebb 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -28,6 +28,7 @@ hostprogs-y += test_cgrp2_sock hostprogs-y += test_cgrp2_sock2 hostprogs-y += xdp1 hostprogs-y += xdp2 +hostprogs-y += xdp_router_ipv4 hostprogs-y += test_current_task_under_cgroup hostprogs-y += trace_event hostprogs-y += sampleip @@ -73,6 +74,7 @@ test_cgrp2_sock2-objs := bpf_load.o $(LIBBPF) test_cgrp2_sock2.o xdp1-objs := bpf_load.o $(LIBBPF) xdp1_user.o # reuse xdp1 source intentionally xdp2-objs := bpf_load.o $(LIBBPF) xdp1_user.o +xdp_router_ipv4-objs := bpf_load.o $(LIBBPF) xdp_router_ipv4_user.o test_current_task_under_cgroup-objs := bpf_load.o $(LIBBPF) cgroup_helpers.o \ test_current_task_under_cgroup_user.o trace_event-objs := bpf_load.o $(LIBBPF) trace_event_user.o @@ -114,6 +116,7 @@ always += parse_varlen.o parse_simple.o parse_ldabs.o always += test_cgrp2_tc_kern.o always += xdp1_kern.o always += xdp2_kern.o +always += xdp_router_ipv4_kern.o always += test_current_task_under_cgroup_kern.o always += trace_event_kern.o always += sampleip_kern.o @@ -160,6 +163,7 @@ HOSTLOADLIBES_map_perf_test += -lelf -lrt HOSTLOADLIBES_test_overhead += -lelf -lrt HOSTLOADLIBES_xdp1 += -lelf HOSTLOADLIBES_xdp2 += -lelf +HOSTLOADLIBES_xdp_router_ipv4 += -lelf HOSTLOADLIBES_test_current_task_under_cgroup += -lelf HOSTLOADLIBES_trace_event += -lelf HOSTLOADLIBES_sampleip += -lelf diff --git a/samples/bpf/xdp_router_ipv4_kern.c b/samples/bpf/xdp_router_ipv4_kern.c new file mode 100644 index 000..993f56b --- /dev/null +++ b/samples/bpf/xdp_router_ipv4_kern.c @@ -0,0 +1,186 @@ +/* Copyright (C) 2017 Cavium, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2 of the GNU General Public License + * as published by the Free Software Foundation. + */ +#define KBUILD_MODNAME "foo" +#include +#include +#include +#include +#include +#include +#include +#include "bpf_helpers.h" +#include +#include + +struct trie_value { + __u8 prefix[4]; + __be64 value; + int ifindex; + int metric; + __be32 gw; +}; + +/* Key for lpm_trie*/ +union key_4 { + u32 b32[2]; + u8 b8[8]; +}; + +struct arp_entry { + __be64 mac; + __be32 dst; +}; + +struct direct_map { + struct arp_entry arp; + int ifindex; + __be64 mac; +}; + +/* Map for trie implementation*/ +struct bpf_map_def SEC("maps") lpm_map = { + .type = BPF_MAP_TYPE_LPM_TRIE, + .key_size = 8, + .value_size = sizeof(struct trie_value), + .max_entries = 50, + .map_flags = BPF_F_NO_PREALLOC, +}; + +/* Map for counter*/ +struct bpf_map_def SEC("maps") rxcnt = { + .type = BPF_MAP_TYPE_PERCPU_ARRAY, + .key_size = sizeof(u32), + .value_size = sizeof(u64), + .max_entries = 256, +}; + +/* Map for ARP table*/ +struct bpf_map_def SEC("maps") arp_table = { + .type = BPF_MAP_TYPE_HASH, + .key_size = sizeof(__be32), + .value_size = sizeof(__be64), + .max_entries = 50, +}; + +/* Map to keep the exact match entries in the route table*/ +struct bpf_map_def SEC("maps") exact_match = { + .type = BPF_MAP_TYPE_HASH, + .key_size = sizeof(__be32), + .value_size = sizeof(struct direct_map), + .max_entries = 50, +}; + +struct bpf_map_def SEC("maps") tx_port = { + .type = BPF_MAP_TYPE_DEVMAP, + .key_size = sizeof(int), + .value_size = sizeof(int), + .max_entries = 100, +}; + +/* Function to set source and destination mac of the packet */ +static inline void set_src_dst_mac(void *data, void *src, void *dst) +{ + unsigned short *source = src; + unsigned short *dest = dst; + unsigned short *p = data; + + __builtin_memcpy(p, dest, 6); + __builtin_memcpy(p + 3, source, 6); +} + +/* Parse IPV4 packet to get SRC, DST IP and protocol */ +static inline int parse_ipv4(void *data, u64 nh_off, void *data_end, +__be32 *src, __be32 *dest) +{ + struct iphdr *iph = data + nh_off; + + if (iph + 1 > data_end) + return 0; + *src = iph->saddr; + *dest = iph->daddr; + return iph->protocol; +} + +SEC("xdp_router_ipv4") +int xdp_router_ipv4_prog(struct xdp_md *ctx) +{ + void *data_end = (void *)(long)ctx->data_end; + __be64 *dest_mac = NULL, *src_mac = NULL; + void *data = (void *)(long)ctx->data; + struct trie_v
[PATCH v4 0/1] XDP program for ip forward
From: Christina Jacob The patch below implements port to port forwarding through route table and arp table lookup for ipv4 packets using bpf_redirect helper function and lpm_trie map. This has an improved performance over the normal kernel stack ip forward. Implementation details. --- The program uses one map each for arp table, route table and packet count. The number of entries the program can process is limited by the size of the map used. In the xdp_router_ipv4_user.c, initially, the routing table is read and is stored in an lpm trie map. The arp table is read and stored in an array map There are two netlink sockets that listens to any change in the route table and arp table. There are two types of changes to the route table. 1.New The new entries are added to the lpm_trie with proper key and prefix length If there is a another entry in the route table with a different metric(only metric is considered). Then the values are compared and the one with lowest metric is added to the node. 2.Deletion On deletion from the route table, The particular node is removed and the entire route table is again read to check if there is another entry with a different metric. This implementation depends on bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE which is not yet upstreamed. There are two types of changes to the route table 1.New The new arp entries are added in the in the array map directly with the ip address as the key and the destination mac address as the value. 2.Delete The entry corresponding to the particular ip is deleted from the arp table map. Another map is maintained for entries in the route table having 32 bit mask. such entries can have a corresponding arp entry which if stored together with the route entry in an array map and can be accessed in O(1) time. Eliminating the trie lookup and arp lookup. In the xdp_router_ipv4_kern.c, The array map for the 32 bit mask entries checked to see if there is a key that exactly matches with the destination ip. If it has a non zero destination mac entry then the xdp data is updated accordingly Otherwise a proper route and arp table lookup is done using the lpm_trie and the arp table array map. Usage: as ./xdp_router_ipv4 -S (-S for generic xdp implementation ifindex- the index of the interface to which the xdp program has to be attached.) in 4.14-rc3 kernel. Changes from v1 to v2 - * As suggested by Jesper Dangaard Brouer 1. Changed the program name to list xdp_router_ipv4 2. Changed the commandline arguments from ifindex list to interface name Usage : ./xdp_router_ipv4 [-S] -S for generic xdp implementation -interface name list is the list of interfaces to which the xdp program should attach to * As suggested by Daniel Borkmann 1. Using __builin_memcpy to update source and destination mac in the bpf kernel program. 2. Started using __be32 in the kernel program to be inline with the data type used in user program 3. Rectified few style issues. * Corrected the copyright issue pointed out by David Ahern * Fixed the bug: The already attached interfaces are not detached from the xdp program if the program fails to attach to an interface later in the list. Changes from v2 to v3 - * As pointed out by Jesper Dangaard Brouer 1. Changed the program name in the cover letter. 2. Changed variable declararions to follow Reverse-xmas tree rule. 3. Reduced the nesting in code for readability. 4. Fixed bug: incorrect mac address being set for source and destination mac. 5. Fixed comment style. * As suggested by Stephen Hemminger Changed all the bzeros' to memset. * As suggested by David Laight removed the signed remainders calculation. * As suggested by Stephen Hemminger and David Daney 1. Added checks for the ioctl return value. 2. Changed data types to be64 to be sure about the size of the data type. 3. Verified byte order. Using the mac address from ioctl in network byte order. not casting to to long data type anymore. 4. Fixed returning address of local variable. Changes from v3 to v4 - * As suggested by Jesper, 1. Removed redundant typecastings. 2. Modified program to use bpf_redirect_map for better performance. 3. Changed program name in the code as well. Christina Jacob (1): xdp: Sample xdp program implementing ip forward samples/bpf/Makefile
Re: linux-next: manual merge of the tip tree with the net-next tree
Hi Alexei, On Wed, 1 Nov 2017 09:27:14 -0700 Alexei Starovoitov wrote: > > Also what do you mean by "same patch != same commit" ? > Like if we had pushed to some 3rd tree first and then pulled > into tip and net-next it would have been better? Well, it would not have caused a conflict. -- Cheers, Stephen Rothwell
Re: [PATCH net-next] ipv6: remove IN6_ADDR_HSIZE from addrconf.h
From: Eric Dumazet Date: Sat, 04 Nov 2017 08:53:27 -0700 > From: Eric Dumazet > > IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid > confusion. > > Signed-off-by: Eric Dumazet > --- > Should be applied after pktgen fix, thanks ! Thanks for resolving this, scary to see something like this :)
Re: [PATCH net-next] pktgen: do not abuse IN6_ADDR_HSIZE
From: Eric Dumazet Date: Sat, 04 Nov 2017 08:27:14 -0700 > From: Eric Dumazet > > pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an > IPv6 address. > > Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old > bug is hitting us. > > Fixes: 3f27fb23219e ("ipv6: addrconf: add per netns perturbation in > inet6_addr_hash()") > Signed-off-by: Eric Dumazet > Reported-by: Dan Carpenter Applied.
Re: [PATCH net-next] ipv6: remove IN6_ADDR_HSIZE from addrconf.h
On 11/5/17 12:53 AM, Eric Dumazet wrote: > From: Eric Dumazet > > IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid > confusion. > > Signed-off-by: Eric Dumazet > --- > Should be applied after pktgen fix, thanks ! > > include/net/addrconf.h |3 --- > net/ipv6/addrconf.c|2 ++ > 2 files changed, 2 insertions(+), 3 deletions(-) Acked-by: David Ahern
Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Quoting Mahesh Bandewar (mah...@bandewar.net): > Init-user-ns is always uncontrolled and a process that has SYS_ADMIN > that belongs to uncontrolled user-ns can create another (child) user- > namespace that is uncontrolled. Any other process (that either does > not have SYS_ADMIN or belongs to a controlled user-ns) can only > create a user-ns that is controlled. That's a huge change though. It means that any system that previously used unprivileged containers will need new privileged code (which always risks more privilege leaks through the new code) to re-enable what was possible without privilege before. That's a regression. I'm very much interested in what you want to do, But it seems like it would be worth starting with some automated code analysis that shows exactly what code becomes accessible to unprivileged users with user namespaces which was accessible to unprivileged users before. Then we can reason about classifying that code and perhaps limiting access to some of it.
Re: Regression in throughput between kvm guests over virtual bridge
On Fri, Nov 03, 2017 at 12:30:12AM -0400, Matthew Rosato wrote: > On 10/31/2017 03:07 AM, Wei Xu wrote: > > On Thu, Oct 26, 2017 at 01:53:12PM -0400, Matthew Rosato wrote: > >> > >>> > >>> Are you using the same binding as mentioned in previous mail sent by you? > >>> it > >>> might be caused by cpu convention between pktgen and vhost, could you > >>> please > >>> try to run pktgen from another idle cpu by adjusting the binding? > >> > >> I don't think that's the case -- I can cause pktgen to hang in the guest > >> without any cpu binding, and with vhost disabled even. > > > > Yes, I did a test and it also hangs in guest, before we figure it out, > > maybe you try udp with uperf with this case? > > > > VM -> Host > > Host -> VM > > VM -> VM > > > > Here are averaged run numbers (Gbps throughput) across 4.12, 4.13 and > net-next with and without Jason's recent "vhost_net: conditionally > enable tx polling" applied (referred to as 'patch' below). 1 uperf > instance in each case: Thanks a lot for the test. > > uperf TCP: >4.12 4.134.13+patch net-nextnet-next+patch > -- > VM->VM 35.2 16.520.84 22.224.36 Are you using the same server/test suite? You mentioned the number was around 28Gb for 4.12 and it dropped about 40% for 4.13, it seems thing changed, are there any options for performance tuning on the server to maximize the cpu utilization? I had similar experience on x86 server and desktop before and it made that the result number always went up and down pretty much. > VM->Host 42.1543.57 44.90 30.83 32.26 > Host->VM 53.1741.51 42.18 37.05 37.30 This is a bit odd, I remember you said there was no regression while testing Host>VM, wasn't it? > > uperf UDP: >4.12 4.134.13+patch net-nextnet-next+patch > -- > VM->VM 24.93 21.63 25.09 8.869.62 > VM->Host 40.2138.21 39.72 8.749.35 > Host->VM 31.2630.18 31.25 7.2 9.26 This case should be quite similar with pkgten, if you got improvement with pktgen, usually it was also the same for UDP, could you please try to disable tso, gso, gro, ufo on all host tap devices and guest virtio-net devices? Currently the most significant tests would be like this AFAICT: Host->VM 4.124.13 TCP: UDP: pktgen: Don't want to bother you too much, so maybe 4.12 & 4.13 without Jason's patch should work since we have seen positive number for that, you can also temporarily skip net-next as well. If you see UDP and pktgen are aligned, then it might be helpful to continue the other two cases, otherwise we fail in the first place. > The net is that Jason's recent patch definitely improves things across > the board at 4.13 as well as at net-next -- But the VM<->VM TCP numbers > I am observing are still lower than base 4.12. Cool. > > A separate concern is why my UDP numbers look so bad on net-next (have > not bisected this yet). This might be another issue, I am in vacation, will try it on x86 once back to work on next Wednesday. Wei >
Re: [patch net-next 3/5] net: sched: introduce block mechanism to handle netif_keep_dst calls
On 11/04/2017 02:01 PM, Jiri Pirko wrote: [...] Ah, indeed, I missed this. I will rename TCQ_F_INGRESS to TCQ_F_CLSACT as a part of this patchset too. Sounds reasonable, thanks!
Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members
On 11/5/17 2:31 AM, Naveen N. Rao wrote: Hi Alexei, Alexei Starovoitov wrote: On 11/3/17 3:58 PM, Sandipan Das wrote: For added security, the layout of some structures can be randomized by enabling CONFIG_GCC_PLUGIN_RANDSTRUCT. One such structure is task_struct. To build BPF programs, we use Clang which does not support this feature. So, if we attempt to read a field of a structure with a randomized layout within a BPF program, we do not get the expected value because of incorrect offsets. To observe this, it is not mandatory to have CONFIG_GCC_PLUGIN_RANDSTRUCT enabled because the structure annotations/members added for this purpose are enough to cause this. So, all kernel builds are affected. For example, considering samples/bpf/offwaketime_kern.c, if we try to print the values of pid and comm inside the task_struct passed to waker() by adding the following lines of code at the appropriate place char fmt[] = "waker(): p->pid = %u, p->comm = %s\n"; bpf_trace_printk(fmt, sizeof(fmt), _(p->pid), _(p->comm)); it is seen that upon rebuilding and running this sample followed by inspecting /sys/kernel/debug/tracing/trace, the output looks like the following _-=> irqs-off / _=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / delay TASK-PID CPU# TIMESTAMP FUNCTION | | | | | -0 [007] d.s. 1883.443594: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.453588: 0x0001: waker(): p->pid = 0, p->comm = -0 [007] d.s. 1883.463584: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.483586: 0x0001: waker(): p->pid = 0, p->comm = -0 [005] d.s. 1883.493583: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.503583: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.513578: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627660: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627704: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627723: 0x0001: waker(): p->pid = 0, p->comm = To avoid this, we add new BPF helpers that read the correct values for some of the important task_struct members such as pid, tgid, comm and flags which are extensively used in BPF-based analysis tools such as bcc. Since these helpers are built with GCC, they use the correct offsets when referencing a member. Signed-off-by: Sandipan Das ... diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index f90860d1f897..324508d27bd2 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -338,6 +338,16 @@ union bpf_attr { * @skb: pointer to skb * Return: classid if != 0 * + * u64 bpf_get_task_pid_tgid(struct task_struct *task) + * Return: task->tgid << 32 | task->pid + * + * int bpf_get_task_comm(struct task_struct *task) + * Stores task->comm into buf + * Return: 0 on success or negative error + * + * u32 bpf_get_task_flags(struct task_struct *task) + * Return: task->flags + * I don't think it's a solution. Tracing scripts read other fields too. Making it work for these 3 fields is a drop in a bucket. Indeed. However... If randomization is used I think we have to accept that existing bpf scripts won't be usable. ... the actual issue is that randomization isn't necessary for this to show up. The annotations added to mark off the structure members results in some structure members being moved into an anonymous structure, which would then get padded differently. So, *all* kernels since v4.13 are affected, afaict. hmm. why would all 4.13+ be affected? It's just an anonymous struct inside task_struct. Are you saying that due to clang not adding this 'struct { };' treatment to task_struct? I thought such struct shouldn't change layout. If it is we need to fix include/linux/compiler-clang.h to do that anon struct as well. As such, we wanted to propose this as a short term solution, but I do agree that this doesn't solve the real issue. Long term solution is to support 'BPF Type Format' or BTF (which is old C-Type Format) for kernel data structures, so bcc scripts wouldn't need to use kernel headers and clang. The proper offsets will be described in BTF. We were planning to use it initially to describe map key/value, but it applies for this case as well. There will be a tool that will take dwarf from vmlinux and compress it into BTF. Kernel will also be able to verify that BTF is a valid BTF. This is the first that I've heard about BTF. Can you share more details about it, or point me to some place where it has been discussed? We considered having tools derive the st
Re: [net-next 0/7] nfp: ethtool and related improvements
On Sat, 4 Nov 2017 16:48:53 +0100, Simon Horman wrote: > Dirk van der Merwe says: > > This patch series throws a couple of loosely related items into a single > series. > > Patch 1: Clang compilation fix reported by > Matthias Kaehlcke > > Patch 2: Driver can now do MAC reinit on load when there has been a > media override set in the NSP. > > Patch 3: Refactor the nfp_app_reprs_set API. > > Patch 4: Similar to vNICs, representors must be able to deal with media > override changes in the NSP. > > Patch 5: Since representors can now handle media overrides, we can > allocate the get/set link ndo's to them. > > Patch 6 & 7: Add support for FEC mode modification. I forgot to put: Reviewed-by: Jakub Kicinski on Dirk's patches, thanks for posting Simon!
Re: [PATCH] rtlwifi: remove redundant initialization to cfg_cmd
On 11/04/2017 02:37 PM, Colin King wrote: From: Colin Ian King cfg_cmd is initialized to zero and this value is never read, instead it is over-written in the start of a do-while loop. Remove the redundant initialization. Cleans up clang warning: drivers/net/wireless/realtek/rtlwifi/core.c:1750:22: warning: Value stored to 'cfg_cmd' during its initialization is never read Signed-off-by: Colin Ian King Looks OK to me. Acked-by: Larry Finger Thanks, Larry --- drivers/net/wireless/realtek/rtlwifi/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c b/drivers/net/wireless/realtek/rtlwifi/core.c index 1147327e6f52..7a17cc20c57e 100644 --- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -1748,7 +1748,7 @@ bool rtl_hal_pwrseqcmdparsing(struct rtl_priv *rtlpriv, u8 cut_version, u8 faversion, u8 interface_type, struct wlan_pwr_cfg pwrcfgcmd[]) { - struct wlan_pwr_cfg cfg_cmd = {0}; + struct wlan_pwr_cfg cfg_cmd; bool polling_bit = false; u32 ary_idx = 0; u8 value = 0;
[PATCH] rtlwifi: remove redundant initialization to cfg_cmd
From: Colin Ian King cfg_cmd is initialized to zero and this value is never read, instead it is over-written in the start of a do-while loop. Remove the redundant initialization. Cleans up clang warning: drivers/net/wireless/realtek/rtlwifi/core.c:1750:22: warning: Value stored to 'cfg_cmd' during its initialization is never read Signed-off-by: Colin Ian King --- drivers/net/wireless/realtek/rtlwifi/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c b/drivers/net/wireless/realtek/rtlwifi/core.c index 1147327e6f52..7a17cc20c57e 100644 --- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -1748,7 +1748,7 @@ bool rtl_hal_pwrseqcmdparsing(struct rtl_priv *rtlpriv, u8 cut_version, u8 faversion, u8 interface_type, struct wlan_pwr_cfg pwrcfgcmd[]) { - struct wlan_pwr_cfg cfg_cmd = {0}; + struct wlan_pwr_cfg cfg_cmd; bool polling_bit = false; u32 ary_idx = 0; u8 value = 0; -- 2.14.1
Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members
Hi Alexei, Alexei Starovoitov wrote: On 11/3/17 3:58 PM, Sandipan Das wrote: For added security, the layout of some structures can be randomized by enabling CONFIG_GCC_PLUGIN_RANDSTRUCT. One such structure is task_struct. To build BPF programs, we use Clang which does not support this feature. So, if we attempt to read a field of a structure with a randomized layout within a BPF program, we do not get the expected value because of incorrect offsets. To observe this, it is not mandatory to have CONFIG_GCC_PLUGIN_RANDSTRUCT enabled because the structure annotations/members added for this purpose are enough to cause this. So, all kernel builds are affected. For example, considering samples/bpf/offwaketime_kern.c, if we try to print the values of pid and comm inside the task_struct passed to waker() by adding the following lines of code at the appropriate place char fmt[] = "waker(): p->pid = %u, p->comm = %s\n"; bpf_trace_printk(fmt, sizeof(fmt), _(p->pid), _(p->comm)); it is seen that upon rebuilding and running this sample followed by inspecting /sys/kernel/debug/tracing/trace, the output looks like the following _-=> irqs-off / _=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / delay TASK-PID CPU# TIMESTAMP FUNCTION | | | | | -0 [007] d.s. 1883.443594: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.453588: 0x0001: waker(): p->pid = 0, p->comm = -0 [007] d.s. 1883.463584: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.483586: 0x0001: waker(): p->pid = 0, p->comm = -0 [005] d.s. 1883.493583: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.503583: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.513578: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627660: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627704: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627723: 0x0001: waker(): p->pid = 0, p->comm = To avoid this, we add new BPF helpers that read the correct values for some of the important task_struct members such as pid, tgid, comm and flags which are extensively used in BPF-based analysis tools such as bcc. Since these helpers are built with GCC, they use the correct offsets when referencing a member. Signed-off-by: Sandipan Das ... diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index f90860d1f897..324508d27bd2 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -338,6 +338,16 @@ union bpf_attr { * @skb: pointer to skb * Return: classid if != 0 * + * u64 bpf_get_task_pid_tgid(struct task_struct *task) + * Return: task->tgid << 32 | task->pid + * + * int bpf_get_task_comm(struct task_struct *task) + * Stores task->comm into buf + * Return: 0 on success or negative error + * + * u32 bpf_get_task_flags(struct task_struct *task) + * Return: task->flags + * I don't think it's a solution. Tracing scripts read other fields too. Making it work for these 3 fields is a drop in a bucket. Indeed. However... If randomization is used I think we have to accept that existing bpf scripts won't be usable. ... the actual issue is that randomization isn't necessary for this to show up. The annotations added to mark off the structure members results in some structure members being moved into an anonymous structure, which would then get padded differently. So, *all* kernels since v4.13 are affected, afaict. As such, we wanted to propose this as a short term solution, but I do agree that this doesn't solve the real issue. Long term solution is to support 'BPF Type Format' or BTF (which is old C-Type Format) for kernel data structures, so bcc scripts wouldn't need to use kernel headers and clang. The proper offsets will be described in BTF. We were planning to use it initially to describe map key/value, but it applies for this case as well. There will be a tool that will take dwarf from vmlinux and compress it into BTF. Kernel will also be able to verify that BTF is a valid BTF. This is the first that I've heard about BTF. Can you share more details about it, or point me to some place where it has been discussed? We considered having tools derive the structure offsets from debuginfo, but debuginfo may not always be present on production systems. So, it isn't clear if having that dependency is fine. I'm not sure how BTF will be different. I'm assuming that gcc randomization plugin produces dwarf with correct offsets, if not, it would have to be fixed. I think the offsets describ
[PATCH net-next] ipv6: remove IN6_ADDR_HSIZE from addrconf.h
From: Eric Dumazet IN6_ADDR_HSIZE is private to addrconf.c, move it here to avoid confusion. Signed-off-by: Eric Dumazet --- Should be applied after pktgen fix, thanks ! include/net/addrconf.h |3 --- net/ipv6/addrconf.c|2 ++ 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 3357332ea375b53dfb7704ea5eb8274a904f59b8..b623b65a79d1687602ba319cf9047a4c41b6396b 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -59,9 +59,6 @@ struct in6_validator_info { struct netlink_ext_ack *extack; }; -#define IN6_ADDR_HSIZE_SHIFT 8 -#define IN6_ADDR_HSIZE (1 << IN6_ADDR_HSIZE_SHIFT) - int addrconf_init(void); void addrconf_cleanup(void); diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 69b8cdb43aa2a7289b9133a4fcad3da5d148a7fb..66d8c3d912fdb3de8d1bc157e8e7fe3750fd9005 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -157,6 +157,8 @@ static int ipv6_generate_stable_address(struct in6_addr *addr, u8 dad_count, const struct inet6_dev *idev); +#define IN6_ADDR_HSIZE_SHIFT 8 +#define IN6_ADDR_HSIZE (1 << IN6_ADDR_HSIZE_SHIFT) /* * Configured unicast address hash table */
[net-next 3/7] nfp: refactor nfp_app_reprs_set
From: Dirk van der Merwe The criteria that reprs cannot be replaced with another new set of reprs has been removed. This check is not needed since the only use case that could exercise this at the moment, would be to modify the number of SRIOV VFs without first disabling them. This case is explicitly disallowed in any case and subsequent patches in this series need to be able to replace the running set of reprs. All cases where the return code used to be checked for the nfp_app_reprs_set function have been removed. As stated above, it is not possible for the current code to encounter a case where reprs exist and need to be replaced. Signed-off-by: Dirk van der Merwe Signed-off-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/flower/main.c | 16 drivers/net/ethernet/netronome/nfp/nfp_app.c | 6 -- 2 files changed, 4 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c index e46e7c60d491..e0283bb24f06 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/main.c +++ b/drivers/net/ethernet/netronome/nfp/flower/main.c @@ -142,8 +142,8 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app, { u8 nfp_pcie = nfp_cppcore_pcie_unit(app->pf->cpp); struct nfp_flower_priv *priv = app->priv; - struct nfp_reprs *reprs, *old_reprs; enum nfp_port_type port_type; + struct nfp_reprs *reprs; const u8 queue = 0; int i, err; @@ -194,11 +194,7 @@ nfp_flower_spawn_vnic_reprs(struct nfp_app *app, reprs->reprs[i]->name); } - old_reprs = nfp_app_reprs_set(app, repr_type, reprs); - if (IS_ERR(old_reprs)) { - err = PTR_ERR(old_reprs); - goto err_reprs_clean; - } + nfp_app_reprs_set(app, repr_type, reprs); return 0; err_reprs_clean: @@ -222,8 +218,8 @@ static int nfp_flower_spawn_phy_reprs(struct nfp_app *app, struct nfp_flower_priv *priv) { struct nfp_eth_table *eth_tbl = app->pf->eth_tbl; - struct nfp_reprs *reprs, *old_reprs; struct sk_buff *ctrl_skb; + struct nfp_reprs *reprs; unsigned int i; int err; @@ -280,11 +276,7 @@ nfp_flower_spawn_phy_reprs(struct nfp_app *app, struct nfp_flower_priv *priv) phys_port, reprs->reprs[phys_port]->name); } - old_reprs = nfp_app_reprs_set(app, NFP_REPR_TYPE_PHYS_PORT, reprs); - if (IS_ERR(old_reprs)) { - err = PTR_ERR(old_reprs); - goto err_reprs_clean; - } + nfp_app_reprs_set(app, NFP_REPR_TYPE_PHYS_PORT, reprs); /* The MAC_REPR control message should be sent after the MAC * representors are registered using nfp_app_reprs_set(). This is diff --git a/drivers/net/ethernet/netronome/nfp/nfp_app.c b/drivers/net/ethernet/netronome/nfp/nfp_app.c index 3644d74fe304..955a9f44d244 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_app.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_app.c @@ -106,14 +106,8 @@ nfp_app_reprs_set(struct nfp_app *app, enum nfp_repr_type type, old = rcu_dereference_protected(app->reprs[type], lockdep_is_held(&app->pf->lock)); - if (reprs && old) { - old = ERR_PTR(-EBUSY); - goto exit_unlock; - } - rcu_assign_pointer(app->reprs[type], reprs); -exit_unlock: return old; } -- 2.11.0
[net-next 6/7] nfp: add helpers for FEC support
From: Dirk van der Merwe Implement helpers to determine and modify FEC modes via the NSP. The NSP advertises FEC capabilities on a per port basis and provides support for: * Auto mode selection * Reed Solomon * BaseR * None/Off Signed-off-by: Dirk van der Merwe Signed-off-by: Simon Horman --- .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 30 ++ .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 64 ++ 2 files changed, 94 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h index 47486d42f2d7..650ca1a5bd21 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h @@ -79,6 +79,18 @@ enum nfp_eth_aneg { NFP_ANEG_DISABLED, }; +enum nfp_eth_fec { + NFP_FEC_AUTO_BIT = 0, + NFP_FEC_BASER_BIT, + NFP_FEC_REED_SOLOMON_BIT, + NFP_FEC_DISABLED_BIT, +}; + +#define NFP_FEC_AUTO BIT(NFP_FEC_AUTO_BIT) +#define NFP_FEC_BASER BIT(NFP_FEC_BASER_BIT) +#define NFP_FEC_REED_SOLOMON BIT(NFP_FEC_REED_SOLOMON_BIT) +#define NFP_FEC_DISABLED BIT(NFP_FEC_DISABLED_BIT) + /** * struct nfp_eth_table - ETH table information * @count: number of table entries @@ -93,6 +105,7 @@ enum nfp_eth_aneg { * @speed: interface speed (in Mbps) * @interface: interface (module) plugged in * @media: media type of the @interface + * @fec: forward error correction mode * @aneg: auto negotiation mode * @mac_addr: interface MAC address * @label_port:port id @@ -105,6 +118,7 @@ enum nfp_eth_aneg { * @port_type: one of %PORT_* defines for ethtool * @port_lanes:total number of lanes on the port (sum of lanes of all subports) * @is_split: is interface part of a split port + * @fec_modes_supported: bitmap of FEC modes supported */ struct nfp_eth_table { unsigned int count; @@ -120,6 +134,7 @@ struct nfp_eth_table { unsigned int interface; enum nfp_eth_media media; + enum nfp_eth_fec fec; enum nfp_eth_aneg aneg; u8 mac_addr[ETH_ALEN]; @@ -139,6 +154,8 @@ struct nfp_eth_table { unsigned int port_lanes; bool is_split; + + unsigned int fec_modes_supported; } ports[0]; }; @@ -149,6 +166,19 @@ __nfp_eth_read_ports(struct nfp_cpp *cpp, struct nfp_nsp *nsp); int nfp_eth_set_mod_enable(struct nfp_cpp *cpp, unsigned int idx, bool enable); int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx, bool configed); +int +nfp_eth_set_fec(struct nfp_cpp *cpp, unsigned int idx, enum nfp_eth_fec mode); + +static inline bool nfp_eth_can_support_fec(struct nfp_eth_table_port *eth_port) +{ + return !!eth_port->fec_modes_supported; +} + +static inline unsigned int +nfp_eth_supported_fec_modes(struct nfp_eth_table_port *eth_port) +{ + return eth_port->fec_modes_supported; +} struct nfp_nsp *nfp_eth_config_start(struct nfp_cpp *cpp, unsigned int idx); int nfp_eth_config_commit_end(struct nfp_nsp *nsp); diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c index 47251396fcae..7ca589660e4d 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c @@ -55,6 +55,8 @@ #define NSP_ETH_PORT_INDEX GENMASK_ULL(15, 8) #define NSP_ETH_PORT_LABEL GENMASK_ULL(53, 48) #define NSP_ETH_PORT_PHYLABEL GENMASK_ULL(59, 54) +#define NSP_ETH_PORT_FEC_SUPP_BASERBIT_ULL(60) +#define NSP_ETH_PORT_FEC_SUPP_RS BIT_ULL(61) #define NSP_ETH_PORT_LANES_MASKcpu_to_le64(NSP_ETH_PORT_LANES) @@ -67,6 +69,7 @@ #define NSP_ETH_STATE_MEDIAGENMASK_ULL(21, 20) #define NSP_ETH_STATE_OVRD_CHNGBIT_ULL(22) #define NSP_ETH_STATE_ANEG GENMASK_ULL(25, 23) +#define NSP_ETH_STATE_FEC GENMASK_ULL(27, 26) #define NSP_ETH_CTRL_CONFIGUREDBIT_ULL(0) #define NSP_ETH_CTRL_ENABLED BIT_ULL(1) @@ -75,6 +78,7 @@ #define NSP_ETH_CTRL_SET_RATE BIT_ULL(4) #define NSP_ETH_CTRL_SET_LANES BIT_ULL(5) #define NSP_ETH_CTRL_SET_ANEG BIT_ULL(6) +#define NSP_ETH_CTRL_SET_FEC BIT_ULL(7) enum nfp_eth_raw { NSP_ETH_RAW_PORT = 0, @@ -152,6 +156,7 @@ nfp_eth_port_translate(struct nfp_nsp *nsp, const union eth_table_entry *src, unsigned int index, struct nfp_eth_table_port *dst) { unsigned int rate; + unsigned int fec; u64 port, state; port = le64_to_cpu(src->port); @@ -183,6 +188,18 @@ nfp_eth_port_translate(struct nfp_nsp *nsp, const union eth_table_entry *src, dst->override_changed = FIELD_GET(NSP_ETH_STATE_OVRD_CHNG, state
[net-next 2/7] nfp: make use of MAC reinit
From: Jakub Kicinski Recent management FW images can perform full reinit of MAC cores without requiring a reboot. When loading the driver check if there are changes pending and if so call NSP MAC reinit. Full application FW reload is still required, and all MACs need to be reinited at the same time (not only the ones which have been reconfigured, and thus potentially causing disruption to unrelated netdevs) therefore for now changing MAC config without reloading the driver still remains future work. Signed-off-by: Jakub Kicinski Tested-by: Dirk van der Merwe Signed-off-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_main.c | 28 +- .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 2 +- drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 2 +- .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 5 .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 6 + 5 files changed, 40 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c index f8fa63b66739..35eaccbece36 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c @@ -346,6 +346,32 @@ nfp_fw_load(struct pci_dev *pdev, struct nfp_pf *pf, struct nfp_nsp *nsp) return err < 0 ? err : 1; } +static void +nfp_nsp_init_ports(struct pci_dev *pdev, struct nfp_pf *pf, + struct nfp_nsp *nsp) +{ + bool needs_reinit = false; + int i; + + pf->eth_tbl = __nfp_eth_read_ports(pf->cpp, nsp); + if (!pf->eth_tbl) + return; + + if (!nfp_nsp_has_mac_reinit(nsp)) + return; + + for (i = 0; i < pf->eth_tbl->count; i++) + needs_reinit |= pf->eth_tbl->ports[i].override_changed; + if (!needs_reinit) + return; + + kfree(pf->eth_tbl); + if (nfp_nsp_mac_reinit(nsp)) + dev_warn(&pdev->dev, "MAC reinit failed\n"); + + pf->eth_tbl = __nfp_eth_read_ports(pf->cpp, nsp); +} + static int nfp_nsp_init(struct pci_dev *pdev, struct nfp_pf *pf) { struct nfp_nsp *nsp; @@ -366,7 +392,7 @@ static int nfp_nsp_init(struct pci_dev *pdev, struct nfp_pf *pf) if (err < 0) goto exit_close_nsp; - pf->eth_tbl = __nfp_eth_read_ports(pf->cpp, nsp); + nfp_nsp_init_ports(pdev, pf, nsp); pf->nspi = __nfp_nsp_identify(nsp); if (pf->nspi) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index c67b90c8d8b7..0061097c271e 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -328,7 +328,7 @@ nfp_net_set_link_ksettings(struct net_device *netdev, return -EOPNOTSUPP; if (netif_running(netdev)) { - netdev_warn(netdev, "Changing settings not allowed on an active interface. It may cause the port to be disabled until reboot.\n"); + netdev_warn(netdev, "Changing settings not allowed on an active interface. It may cause the port to be disabled until driver reload.\n"); return -EBUSY; } diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c index ff373acd28f3..0beb9b21557b 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c @@ -597,7 +597,7 @@ nfp_net_eth_port_update(struct nfp_cpp *cpp, struct nfp_port *port, return -EIO; } if (eth_port->override_changed) { - nfp_warn(cpp, "Port #%d config changed, unregistering. Reboot required before port will be operational again.\n", port->eth_id); + nfp_warn(cpp, "Port #%d config changed, unregistering. Driver reload required before port will be operational again.\n", port->eth_id); port->type = NFP_PORT_INVALID; } diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c index 37364555c42b..14a6d1ba51a9 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c @@ -477,6 +477,11 @@ int nfp_nsp_device_soft_reset(struct nfp_nsp *state) return nfp_nsp_command(state, SPCODE_SOFT_RESET, 0, 0, 0); } +int nfp_nsp_mac_reinit(struct nfp_nsp *state) +{ + return nfp_nsp_command(state, SPCODE_MAC_INIT, 0, 0, 0); +} + int nfp_nsp_load_fw(struct nfp_nsp *state, const struct firmware *fw) { return nfp_nsp_command_buf(state, SPCODE_FW_LOAD, fw->size, fw->data, diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h index e2f028027c6f..47486d42f2d7 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h +++ b/drivers/n
[net-next 7/7] nfp: implement ethtool FEC mode settings
From: Dirk van der Merwe Add support in the driver ethtool ops to modify the NFP FEC modes. The FEC modes can be set for vNIC associated with physical ports or for MAC representor netdevs. Signed-off-by: Dirk van der Merwe Signed-off-by: Simon Horman --- .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 117 - 1 file changed, 116 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index d0028894667c..60c8d733a37d 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -244,6 +244,30 @@ nfp_app_get_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo) nfp_get_drvinfo(app, app->pdev, "*", drvinfo); } +static void +nfp_net_set_fec_link_mode(struct nfp_eth_table_port *eth_port, + struct ethtool_link_ksettings *c) +{ + unsigned int modes; + + ethtool_link_ksettings_add_link_mode(c, supported, FEC_NONE); + if (!nfp_eth_can_support_fec(eth_port)) { + ethtool_link_ksettings_add_link_mode(c, advertising, FEC_NONE); + return; + } + + modes = nfp_eth_supported_fec_modes(eth_port); + if (modes & NFP_FEC_BASER) { + ethtool_link_ksettings_add_link_mode(c, supported, FEC_BASER); + ethtool_link_ksettings_add_link_mode(c, advertising, FEC_BASER); + } + + if (modes & NFP_FEC_REED_SOLOMON) { + ethtool_link_ksettings_add_link_mode(c, supported, FEC_RS); + ethtool_link_ksettings_add_link_mode(c, advertising, FEC_RS); + } +} + /** * nfp_net_get_link_ksettings - Get Link Speed settings * @netdev:network interface device structure @@ -278,9 +302,11 @@ nfp_net_get_link_ksettings(struct net_device *netdev, port = nfp_port_from_netdev(netdev); eth_port = nfp_port_get_eth_port(port); - if (eth_port) + if (eth_port) { cmd->base.autoneg = eth_port->aneg != NFP_ANEG_DISABLED ? AUTONEG_ENABLE : AUTONEG_DISABLE; + nfp_net_set_fec_link_mode(eth_port, cmd); + } if (!netif_carrier_ok(netdev)) return 0; @@ -686,6 +712,91 @@ static int nfp_port_get_sset_count(struct net_device *netdev, int sset) } } +static int nfp_port_fec_ethtool_to_nsp(u32 fec) +{ + switch (fec) { + case ETHTOOL_FEC_AUTO: + return NFP_FEC_AUTO_BIT; + case ETHTOOL_FEC_OFF: + return NFP_FEC_DISABLED_BIT; + case ETHTOOL_FEC_RS: + return NFP_FEC_REED_SOLOMON_BIT; + case ETHTOOL_FEC_BASER: + return NFP_FEC_BASER_BIT; + default: + /* NSP only supports a single mode at a time */ + return -EOPNOTSUPP; + } +} + +static u32 nfp_port_fec_nsp_to_ethtool(u32 fec) +{ + u32 result = 0; + + if (fec & NFP_FEC_AUTO) + result |= ETHTOOL_FEC_AUTO; + if (fec & NFP_FEC_BASER) + result |= ETHTOOL_FEC_BASER; + if (fec & NFP_FEC_REED_SOLOMON) + result |= ETHTOOL_FEC_RS; + if (fec & NFP_FEC_DISABLED) + result |= ETHTOOL_FEC_OFF; + + return result ?: ETHTOOL_FEC_NONE; +} + +static int +nfp_port_get_fecparam(struct net_device *netdev, + struct ethtool_fecparam *param) +{ + struct nfp_eth_table_port *eth_port; + struct nfp_port *port; + + param->active_fec = ETHTOOL_FEC_NONE_BIT; + param->fec = ETHTOOL_FEC_NONE_BIT; + + port = nfp_port_from_netdev(netdev); + eth_port = nfp_port_get_eth_port(port); + if (!eth_port) + return -EOPNOTSUPP; + + if (!nfp_eth_can_support_fec(eth_port)) + return 0; + + param->fec = nfp_port_fec_nsp_to_ethtool(eth_port->fec_modes_supported); + param->active_fec = nfp_port_fec_nsp_to_ethtool(eth_port->fec); + + return 0; +} + +static int +nfp_port_set_fecparam(struct net_device *netdev, + struct ethtool_fecparam *param) +{ + struct nfp_eth_table_port *eth_port; + struct nfp_port *port; + int err, fec; + + port = nfp_port_from_netdev(netdev); + eth_port = nfp_port_get_eth_port(port); + if (!eth_port) + return -EOPNOTSUPP; + + if (!nfp_eth_can_support_fec(eth_port)) + return -EOPNOTSUPP; + + fec = nfp_port_fec_ethtool_to_nsp(param->fec); + if (fec < 0) + return fec; + + err = nfp_eth_set_fec(port->app->cpp, eth_port->index, fec); + if (!err) + /* Only refresh if we did something */ + nfp_net_refresh_port_table(port); + + return err < 0 ? err : 0; +} + /* RX network flow classification (RSS, filters, etc) */ static u32 ethtool_flow_to_nfp_flag(u32 flow_type) @@ -1144,6 +1255,8
[net-next 4/7] nfp: resync repr state when port table sync
From: Dirk van der Merwe If the NSP port table has been refreshed, resync the representor state with the new port information. At the moment, this only entails looking for invalid ports and killing off representors associated with them. The repr instance becomes NULL which is safe since the app accessor function for reprs returns NULL when it cannot access a repr. Signed-off-by: Dirk van der Merwe Signed-off-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 6 +++ drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 47 +++ drivers/net/ethernet/netronome/nfp/nfp_net_repr.h | 1 + 3 files changed, 54 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c index 0beb9b21557b..c505014121c4 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c @@ -611,6 +611,7 @@ int nfp_net_refresh_port_table_sync(struct nfp_pf *pf) struct nfp_eth_table *eth_table; struct nfp_net *nn, *next; struct nfp_port *port; + int err; lockdep_assert_held(&pf->lock); @@ -640,6 +641,11 @@ int nfp_net_refresh_port_table_sync(struct nfp_pf *pf) kfree(eth_table); + /* Resync repr state. This may cause reprs to be removed. */ + err = nfp_reprs_resync_phys_ports(pf->app); + if (err) + return err; + /* Shoot off the ports which became invalid */ list_for_each_entry_safe(nn, next, &pf->vnics, vnic_list) { if (!nn->port || nn->port->type != NFP_PORT_INVALID) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c index d540a9dc77b3..1bce8c131bb9 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c @@ -390,3 +390,50 @@ struct nfp_reprs *nfp_reprs_alloc(unsigned int num_reprs) return reprs; } + +int nfp_reprs_resync_phys_ports(struct nfp_app *app) +{ + struct nfp_reprs *reprs, *old_reprs; + struct nfp_repr *repr; + int i; + + old_reprs = + rcu_dereference_protected(app->reprs[NFP_REPR_TYPE_PHYS_PORT], + lockdep_is_held(&app->pf->lock)); + if (!old_reprs) + return 0; + + reprs = nfp_reprs_alloc(old_reprs->num_reprs); + if (!reprs) + return -ENOMEM; + + for (i = 0; i < old_reprs->num_reprs; i++) { + if (!old_reprs->reprs[i]) + continue; + + repr = netdev_priv(old_reprs->reprs[i]); + if (repr->port->type == NFP_PORT_INVALID) + continue; + + reprs->reprs[i] = old_reprs->reprs[i]; + } + + old_reprs = nfp_app_reprs_set(app, NFP_REPR_TYPE_PHYS_PORT, reprs); + synchronize_rcu(); + + /* Now we free up removed representors */ + for (i = 0; i < old_reprs->num_reprs; i++) { + if (!old_reprs->reprs[i]) + continue; + + repr = netdev_priv(old_reprs->reprs[i]); + if (repr->port->type != NFP_PORT_INVALID) + continue; + + nfp_app_repr_stop(app, repr); + nfp_repr_clean(repr); + } + + kfree(old_reprs); + return 0; +} diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h index 32179cad062a..5d4d897bc9c6 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h @@ -124,5 +124,6 @@ void nfp_reprs_clean_and_free_by_type(struct nfp_app *app, enum nfp_repr_type type); struct nfp_reprs *nfp_reprs_alloc(unsigned int num_reprs); +int nfp_reprs_resync_phys_ports(struct nfp_app *app); #endif /* NFP_NET_REPR_H */ -- 2.11.0
[net-next 5/7] nfp: add get/set link settings ndos to representors
From: Dirk van der Merwe Since it is now safe to modify link settings for representors, we can attach the get/set link settings ndos to it. The get/set link settings are nfp_port based operations. If a port becomes invalid, the representor will be removed in the same way a vnic would be. Signed-off-by: Dirk van der Merwe Signed-off-by: Simon Horman --- drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c index 0061097c271e..d0028894667c 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c @@ -1155,6 +1155,8 @@ const struct ethtool_ops nfp_port_ethtool_ops = { .set_dump = nfp_app_set_dump, .get_dump_flag = nfp_app_get_dump_flag, .get_dump_data = nfp_app_get_dump_data, + .get_link_ksettings = nfp_net_get_link_ksettings, + .set_link_ksettings = nfp_net_set_link_ksettings, }; void nfp_net_set_ethtool_ops(struct net_device *netdev) -- 2.11.0
[net-next 0/7] nfp: ethtool and related improvements
Dirk van der Merwe says: This patch series throws a couple of loosely related items into a single series. Patch 1: Clang compilation fix reported by Matthias Kaehlcke Patch 2: Driver can now do MAC reinit on load when there has been a media override set in the NSP. Patch 3: Refactor the nfp_app_reprs_set API. Patch 4: Similar to vNICs, representors must be able to deal with media override changes in the NSP. Patch 5: Since representors can now handle media overrides, we can allocate the get/set link ndo's to them. Patch 6 & 7: Add support for FEC mode modification. Dirk van der Merwe (5): nfp: refactor nfp_app_reprs_set nfp: resync repr state when port table sync nfp: add get/set link settings ndos to representors nfp: add helpers for FEC support nfp: implement ethtool FEC mode settings Jakub Kicinski (2): nfp: don't depend on compiler constant propagation nfp: make use of MAC reinit drivers/net/ethernet/netronome/nfp/flower/main.c | 16 +-- drivers/net/ethernet/netronome/nfp/nfp_app.c | 6 - drivers/net/ethernet/netronome/nfp/nfp_main.c | 28 - .../net/ethernet/netronome/nfp/nfp_net_ethtool.c | 121 - drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 8 +- drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 47 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h | 1 + .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.c | 5 + .../net/ethernet/netronome/nfp/nfpcore/nfp_nsp.h | 36 ++ .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 87 +-- 10 files changed, 325 insertions(+), 30 deletions(-) -- 2.11.0
[net-next 1/7] nfp: don't depend on compiler constant propagation
From: Jakub Kicinski Matthias reports: nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to identify the 'mask' parameter as known to be constant at compile time, which is required to use the FIELD_GET() macro. The forced inlining does the trick for gcc, but for kernel builds with clang it results in undefined symbols: drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.o: In function `__nfp_eth_set_aneg': drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x787): undefined reference to `__compiletime_assert_492' drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x7b1): undefined reference to `__compiletime_assert_496' These __compiletime_assert_xyx() calls would have been optimized away if the compiler had seen 'mask' as a constant. Add a macro to extract the mask and shift and pass those to nfp_eth_set_bit_config() separately. Reported-by: Matthias Kaehlcke Signed-off-by: Jakub Kicinski Tested-by: Dirk van der Merwe Signed-off-by: Simon Horman --- .../ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c | 23 ++ 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c index f6f7c085f8e0..47251396fcae 100644 --- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c +++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c @@ -469,10 +469,10 @@ int nfp_eth_set_configured(struct nfp_cpp *cpp, unsigned int idx, bool configed) return nfp_eth_config_commit_end(nsp); } -/* Force inline, FIELD_* macroes require masks to be compilation-time known */ -static __always_inline int +static int nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx, - const u64 mask, unsigned int val, const u64 ctrl_bit) + const u64 mask, const unsigned int shift, + unsigned int val, const u64 ctrl_bit) { union eth_table_entry *entries = nfp_nsp_config_entries(nsp); unsigned int idx = nfp_nsp_config_idx(nsp); @@ -489,11 +489,11 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx, /* Check if we are already in requested state */ reg = le64_to_cpu(entries[idx].raw[raw_idx]); - if (val == FIELD_GET(mask, reg)) + if (val == (reg & mask) >> shift) return 0; reg &= ~mask; - reg |= FIELD_PREP(mask, val); + reg |= (val << shift) & mask; entries[idx].raw[raw_idx] = cpu_to_le64(reg); entries[idx].control |= cpu_to_le64(ctrl_bit); @@ -503,6 +503,13 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx, return 0; } +#define NFP_ETH_SET_BIT_CONFIG(nsp, raw_idx, mask, val, ctrl_bit) \ + ({ \ + __BF_FIELD_CHECK(mask, 0ULL, val, "NFP_ETH_SET_BIT_CONFIG: "); \ + nfp_eth_set_bit_config(nsp, raw_idx, mask, __bf_shf(mask), \ + val, ctrl_bit); \ + }) + /** * __nfp_eth_set_aneg() - set PHY autonegotiation control bit * @nsp: NFP NSP handle returned from nfp_eth_config_start() @@ -515,7 +522,7 @@ nfp_eth_set_bit_config(struct nfp_nsp *nsp, unsigned int raw_idx, */ int __nfp_eth_set_aneg(struct nfp_nsp *nsp, enum nfp_eth_aneg mode) { - return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE, + return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE, NSP_ETH_STATE_ANEG, mode, NSP_ETH_CTRL_SET_ANEG); } @@ -544,7 +551,7 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed) return -EINVAL; } - return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_STATE, + return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_STATE, NSP_ETH_STATE_RATE, rate, NSP_ETH_CTRL_SET_RATE); } @@ -561,6 +568,6 @@ int __nfp_eth_set_speed(struct nfp_nsp *nsp, unsigned int speed) */ int __nfp_eth_set_split(struct nfp_nsp *nsp, unsigned int lanes) { - return nfp_eth_set_bit_config(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES, + return NFP_ETH_SET_BIT_CONFIG(nsp, NSP_ETH_RAW_PORT, NSP_ETH_PORT_LANES, lanes, NSP_ETH_CTRL_SET_LANES); } -- 2.11.0
[PATCH net-next] pktgen: do not abuse IN6_ADDR_HSIZE
From: Eric Dumazet pktgen accidentally used IN6_ADDR_HSIZE, instead of using the size of an IPv6 address. Since IN6_ADDR_HSIZE recently was increased from 16 to 256, this old bug is hitting us. Fixes: 3f27fb23219e ("ipv6: addrconf: add per netns perturbation in inet6_addr_hash()") Signed-off-by: Eric Dumazet Reported-by: Dan Carpenter --- net/core/pktgen.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/pktgen.c b/net/core/pktgen.c index 6e1e10ff433a5f4097d1d4b33848ab13d4e005c6..e3fa53a07d34b3e5f6b438e08b440f520b3cd6d4 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -2165,7 +2165,7 @@ static void pktgen_setup_inject(struct pktgen_dev *pkt_dev) + pkt_dev->pkt_overhead; } - for (i = 0; i < IN6_ADDR_HSIZE; i++) + for (i = 0; i < sizeof(struct in6_addr); i++) if (pkt_dev->cur_in6_saddr.s6_addr[i]) { set = 1; break;
Re: [bug report] ipv6: addrconf: add per netns perturbation in inet6_addr_hash()
On Sat, Nov 4, 2017 at 7:24 AM, Eric Dumazet wrote: > On Sat, Nov 4, 2017 at 7:13 AM, Eric Dumazet wrote: >> On Sat, Nov 4, 2017 at 1:31 AM, Dan Carpenter >> wrote: >>> Hello Eric Dumazet, >>> >>> The patch 3f27fb23219e: "ipv6: addrconf: add per netns perturbation >>> in inet6_addr_hash()" from Oct 23, 2017, leads to the following >>> static checker warning: >>> >>> net/core/pktgen.c:2169 pktgen_setup_inject() >>> error: buffer overflow 'pkt_dev->cur_in6_saddr.in6_u.u6_addr8' 16 >>> <= 255 >>> >>> net/core/pktgen.c >>> 2157 if (pkt_dev->flags & F_IPV6) { >>> 2158 int i, set = 0, err = 1; >>> 2159 struct inet6_dev *idev; >>> 2160 >>> 2161 if (pkt_dev->min_pkt_size == 0) { >>> 2162 pkt_dev->min_pkt_size = 14 + sizeof(struct >>> ipv6hdr) >>> 2163 + sizeof(struct >>> udphdr) >>> 2164 + sizeof(struct >>> pktgen_hdr) >>> 2165 + >>> pkt_dev->pkt_overhead; >>> 2166 } >>> 2167 >>> 2168 for (i = 0; i < IN6_ADDR_HSIZE; i++) >>> ^^ >>> My guess is that this is the wrong test here, but I don't know for sure. >>> >>> 2169 if (pkt_dev->cur_in6_saddr.s6_addr[i]) { >>>^^ >>> This used to work but now that IN6_ADDR_HSIZE is 256 instead of 16 we're >>> reading beyond the end of the array. >>> >>> 2170 set = 1; >>> 2171 break; >>> 2172 } >>> 2173 >>> 2174 if (!set) { >>> 2175 >>> 2176 /* >>> 2177 * Use linklevel address if unconfigured. >>> 2178 * >>> 2179 * use ipv6_get_lladdr if/when it's get >>> exported >>> 2180 */ >>> 2181 >>> >>> regards, >>> dan carpenter >> >> pktgen is obviously wrong. >> >> Thanks for the report. > > I am travelling to Seoul for netconf/netdev, please send this patch in > an official way. > > Thanks ! > > diff --git a/net/core/pktgen.c b/net/core/pktgen.c > index > 6e1e10ff433a5f4097d1d4b33848ab13d4e005c6..e3fa53a07d34b3e5f6b438e08b440f520b3cd6d4 > 100644 > --- a/net/core/pktgen.c > +++ b/net/core/pktgen.c > @@ -2165,7 +2165,7 @@ static void pktgen_setup_inject(struct > pktgen_dev *pkt_dev) > + pkt_dev->pkt_overhead; > } > > - for (i = 0; i < IN6_ADDR_HSIZE; i++) > + for (i = 0; i < sizeof(struct in6_addr); i++) > if (pkt_dev->cur_in6_saddr.s6_addr[i]) { > set = 1; > break; Also I would move include/net/addrconf.h:62:#define IN6_ADDR_HSIZE_SHIFT 8 include/net/addrconf.h:63:#define IN6_ADDR_HSIZE(1 << IN6_ADDR_HSIZE_SHIFT) to net/ipv6/addrconf.c to avoid future misuses like that.
Re: sr9800: Use common error handling code in sr9800_phy_powerup()
> If you play the "smaller executable object code" card, people expect that > you provide the actual numbers, too. I can offer another bit of information for this software development discussion. The affected source file can be compiled for the processor architecture “x86_64” by a tool like “GCC 6.4.1+r251631-1.3” from the software distribution “openSUSE Tumbleweed” with the following command example. my_cc=/usr/bin/gcc-6 \ && my_module=drivers/net/usb/sr9800.ko \ && git checkout next-20171009 \ && make -j4 CC="${my_cc}" HOSTCC="${my_cc}" allmodconfig "${my_module}" \ && size "${my_module}" \ && git checkout ':/^sr9800: Use common error handling code in sr9800_phy_powerup' \ && make -j4 CC="${my_cc}" HOSTCC="${my_cc}" allmodconfig "${my_module}" \ && size "${my_module}" Do you find the following details useful for further clarification? text: -47 data: 0 bss: 0 Regards, Markus
Re: [PATCH net-next v15] openvswitch: enable NSH support
On Tue, Oct 31, 2017 at 9:03 PM, Yi Yang wrote: > v14->v15 > - Check size in nsh_hdr_from_nlattr > - Fixed four small issues pointed out By Jiri and Eric > > v13->v14 > - Rename skb_push_nsh to nsh_push per Dave's comment > - Rename skb_pop_nsh to nsh_pop per Dave's comment > > v12->v13 > - Fix NSH header length check in set_nsh > > v11->v12 > - Fix missing changes old comments pointed out > - Fix new comments for v11 > > v10->v11 > - Fix the left three disputable comments for v9 >but not fixed in v10. > > v9->v10 > - Change struct ovs_key_nsh to >struct ovs_nsh_key_base base; >__be32 context[NSH_MD1_CONTEXT_SIZE]; > - Fix new comments for v9 > > v8->v9 > - Fix build error reported by daily intel build >because nsh module isn't selected by openvswitch > > v7->v8 > - Rework nested value and mask for OVS_KEY_ATTR_NSH > - Change pop_nsh to adapt to nsh kernel module > - Fix many issues per comments from Jiri Benc > > v6->v7 > - Remove NSH GSO patches in v6 because Jiri Benc >reworked it as another patch series and they have >been merged. > - Change it to adapt to nsh kernel module added by NSH >GSO patch series > > v5->v6 > - Fix the rest comments for v4. > - Add NSH GSO support for VxLAN-gpe + NSH and >Eth + NSH. > > v4->v5 > - Fix many comments by Jiri Benc and Eric Garver >for v4. > > v3->v4 > - Add new NSH match field ttl > - Update NSH header to the latest format >which will be final format and won't change >per its author's confirmation. > - Fix comments for v3. > > v2->v3 > - Change OVS_KEY_ATTR_NSH to nested key to handle >length-fixed attributes and length-variable >attriubte more flexibly. > - Remove struct ovs_action_push_nsh completely > - Add code to handle nested attribute for SET_MASKED > - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH >to transfer NSH header data. > - Fix comments and coding style issues by Jiri and Eric > > v1->v2 > - Change encap_nsh and decap_nsh to push_nsh and pop_nsh > - Dynamically allocate struct ovs_action_push_nsh for >length-variable metadata. > > OVS master and 2.8 branch has merged NSH userspace > patch series, this patch is to enable NSH support > in kernel data path in order that OVS can support > NSH in compat mode by porting this. > > Signed-off-by: Yi Yang As commented earlier following are action related validations that can be moved to flow install phase. > --- > include/net/nsh.h| 3 + > include/uapi/linux/openvswitch.h | 29 > net/nsh/nsh.c| 59 > net/openvswitch/Kconfig | 1 + > net/openvswitch/actions.c| 119 +++ > net/openvswitch/flow.c | 51 +++ > net/openvswitch/flow.h | 7 + > net/openvswitch/flow_netlink.c | 315 > ++- > net/openvswitch/flow_netlink.h | 5 + > 9 files changed, 588 insertions(+), 1 deletion(-) > ... > diff --git a/net/nsh/nsh.c b/net/nsh/nsh.c > index 58fb827..2764682 100644 > --- a/net/nsh/nsh.c > +++ b/net/nsh/nsh.c > @@ -14,6 +14,65 @@ > #include > #include > > +int nsh_push(struct sk_buff *skb, const struct nshhdr *pushed_nh) > +{ > + struct nshhdr *nh; > + size_t length = nsh_hdr_len(pushed_nh); > + u8 next_proto; > + > + if (skb->mac_len) { > + next_proto = TUN_P_ETHERNET; > + } else { > + next_proto = tun_p_from_eth_p(skb->protocol); > + if (!next_proto) > + return -EAFNOSUPPORT; check for supported protocols can be moved to flow install validation in __ovs_nla_copy_actions(). > + } > + > + /* Add the NSH header */ > + if (skb_cow_head(skb, length) < 0) > + return -ENOMEM; > + > + skb_push(skb, length); > + nh = (struct nshhdr *)(skb->data); > + memcpy(nh, pushed_nh, length); > + nh->np = next_proto; > + > + skb->protocol = htons(ETH_P_NSH); > + skb_reset_mac_header(skb); > + skb_reset_network_header(skb); > + skb_reset_mac_len(skb); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(nsh_push); > + > +int nsh_pop(struct sk_buff *skb) > +{ > + struct nshhdr *nh; > + size_t length; > + __be16 inner_proto; > + > + if (!pskb_may_pull(skb, NSH_BASE_HDR_LEN)) > + return -ENOMEM; > + nh = (struct nshhdr *)(skb->data); > + length = nsh_hdr_len(nh); > + inner_proto = tun_p_to_eth_p(nh->np); same as above, this check can be moved to flow install __ovs_nla_copy_actions(). > + if (!pskb_may_pull(skb, length)) > + return -ENOMEM; > + > + if (!inner_proto) > + return -EAFNOSUPPORT; > + > + skb_pull(skb, length); > + skb_reset_mac_header(skb); > + skb_reset_network_header(skb); > + skb_reset_mac_len(skb); > + skb->protocol = inner_proto; > + > + return 0; > +} > +EXPORT_SYMBOL_G
Re: [bug report] ipv6: addrconf: add per netns perturbation in inet6_addr_hash()
On Sat, Nov 4, 2017 at 7:13 AM, Eric Dumazet wrote: > On Sat, Nov 4, 2017 at 1:31 AM, Dan Carpenter > wrote: >> Hello Eric Dumazet, >> >> The patch 3f27fb23219e: "ipv6: addrconf: add per netns perturbation >> in inet6_addr_hash()" from Oct 23, 2017, leads to the following >> static checker warning: >> >> net/core/pktgen.c:2169 pktgen_setup_inject() >> error: buffer overflow 'pkt_dev->cur_in6_saddr.in6_u.u6_addr8' 16 <= >> 255 >> >> net/core/pktgen.c >> 2157 if (pkt_dev->flags & F_IPV6) { >> 2158 int i, set = 0, err = 1; >> 2159 struct inet6_dev *idev; >> 2160 >> 2161 if (pkt_dev->min_pkt_size == 0) { >> 2162 pkt_dev->min_pkt_size = 14 + sizeof(struct >> ipv6hdr) >> 2163 + sizeof(struct >> udphdr) >> 2164 + sizeof(struct >> pktgen_hdr) >> 2165 + >> pkt_dev->pkt_overhead; >> 2166 } >> 2167 >> 2168 for (i = 0; i < IN6_ADDR_HSIZE; i++) >> ^^ >> My guess is that this is the wrong test here, but I don't know for sure. >> >> 2169 if (pkt_dev->cur_in6_saddr.s6_addr[i]) { >>^^ >> This used to work but now that IN6_ADDR_HSIZE is 256 instead of 16 we're >> reading beyond the end of the array. >> >> 2170 set = 1; >> 2171 break; >> 2172 } >> 2173 >> 2174 if (!set) { >> 2175 >> 2176 /* >> 2177 * Use linklevel address if unconfigured. >> 2178 * >> 2179 * use ipv6_get_lladdr if/when it's get >> exported >> 2180 */ >> 2181 >> >> regards, >> dan carpenter > > pktgen is obviously wrong. > > Thanks for the report. I am travelling to Seoul for netconf/netdev, please send this patch in an official way. Thanks ! diff --git a/net/core/pktgen.c b/net/core/pktgen.c index 6e1e10ff433a5f4097d1d4b33848ab13d4e005c6..e3fa53a07d34b3e5f6b438e08b440f520b3cd6d4 100644 --- a/net/core/pktgen.c +++ b/net/core/pktgen.c @@ -2165,7 +2165,7 @@ static void pktgen_setup_inject(struct pktgen_dev *pkt_dev) + pkt_dev->pkt_overhead; } - for (i = 0; i < IN6_ADDR_HSIZE; i++) + for (i = 0; i < sizeof(struct in6_addr); i++) if (pkt_dev->cur_in6_saddr.s6_addr[i]) { set = 1; break;
Re: [PATCH] net: sched: cls_u32: use bitwise & rather than logical && on n->flags
From: Colin King Date: Fri, 3 Nov 2017 08:09:45 + > From: Colin Ian King > > Currently n->flags is being operated on by a logical && operator rather > than a bitwise & operator. This looks incorrect as these should be bit > flag operations. Fix this. > > Detected by CoverityScan, CID#1460398 ("Logical vs. bitwise operator") > > Fixes: 245dc5121a9b ("net: sched: cls_u32: call block callbacks for offload") > Signed-off-by: Colin Ian King Applied, thanks Colin.
Re: [PATCH] net: usb: asix: fill null-ptr-deref in asix_suspend
From: Andrey Konovalov Date: Thu, 2 Nov 2017 21:26:59 +0100 > When asix_suspend() is called dev->driver_priv might not have been > assigned a value, so we need to check that it's not NULL. > > Found by syzkaller. ... > Signed-off-by: Andrey Konovalov Applied, thank you.
Re: [PATCH v3 net-next 0/5] eBPF-based device cgroup controller
From: Roman Gushchin Date: Thu, 2 Nov 2017 13:15:25 -0400 > This patchset introduces an eBPF-based device controller for cgroup > v2. This doesn't apply cleanly to net-next, please respin. Thank you.
ipset related DEBUG_VIRTUAL crash.
I have a script that hourly replaces an ipset list. This has been in place for a year or so, but last night it triggered this on 4.14-rc7 [455951.731181] kernel BUG at arch/x86/mm/physaddr.c:26! [455951.737016] invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN [455951.742525] CPU: 0 PID: 3850 Comm: ipset Not tainted 4.14.0-rc7-firewall+ #1 [455951.753293] task: 88013033cfc0 task.stack: 8801c3d48000 [455951.758567] RIP: 0010:__phys_addr+0x5b/0x80 [455951.763742] RSP: 0018:8801c3d4f528 EFLAGS: 00010287 [455951.768838] RAX: 7800849b62b6 RBX: 849b62b6 RCX: 9f072a5d [455951.773881] RDX: dc00 RSI: dc00 RDI: a06917e0 [455951.778844] RBP: 7800049b62b6 R08: 0002 R09: [455951.783729] R10: R11: R12: 9fca8b05 [455951.788524] R13: 8801ce844268 R14: 049b62b6 R15: 8801ce8442ea [455951.793239] FS: 7fb44e656c80() GS:8801d320() knlGS: [455951.797904] CS: 0010 DS: ES: CR0: 80050033 [455951.802479] CR2: 7ffeeafd70a8 CR3: 0001b6cd2001 CR4: 000606f0 [455951.806998] Call Trace: [455951.811404] kfree+0x4c/0x310 [455951.815714] hash_ip4_ahash_destroy+0x85/0xd0 [455951.819944] hash_ip4_destroy+0x64/0x90 [455951.824069] ip_set_destroy+0x4f0/0x500 [455951.828098] ? ip_set_destroy+0x5/0x500 [455951.832029] ? __rcu_read_unlock+0xd3/0x190 [455951.835867] ? ip_set_utest+0x560/0x560 [455951.839610] ? ip_set_utest+0x560/0x560 [455951.843239] nfnetlink_rcv_msg+0x73e/0x770 [455951.846780] ? nfnetlink_rcv_msg+0x352/0x770 [455951.850229] ? nfnetlink_rcv+0xe90/0xe90 [455951.853571] ? native_sched_clock+0xe8/0x190 [455951.856822] ? lock_release+0x5d3/0x7d0 [455951.859976] netlink_rcv_skb+0x121/0x230 [455951.863037] ? nfnetlink_rcv+0xe90/0xe90 [455951.865999] ? netlink_ack+0x4c0/0x4c0 [455951.868866] ? ns_capable_common+0x68/0xc0 [455951.871638] nfnetlink_rcv+0x1ad/0xe90 [455951.874312] ? lock_acquire+0x380/0x380 [455951.876891] ? __rcu_read_unlock+0xd3/0x190 [455951.879378] ? __rcu_read_lock+0x30/0x30 [455951.881764] ? rcu_is_watching+0xa4/0xf0 [455951.884048] ? netlink_connect+0x1e0/0x1e0 [455951.886236] ? nfnl_err_reset+0x180/0x180 [455951.888329] ? netlink_deliver_tap+0x128/0x560 [455951.890333] ? netlink_deliver_tap+0x5/0x560 [455951.892229] ? iov_iter_advance+0x172/0x7f0 [455951.894029] ? netlink_getname+0x150/0x150 [455951.895736] ? can_nice.part.77+0x20/0x20 [455951.897342] ? iov_iter_copy_from_user_atomic+0x7d0/0x7d0 [455951.898877] ? netlink_trim+0x111/0x1b0 [455951.900394] ? netlink_skb_destructor+0xf0/0xf0 [455951.901908] netlink_unicast+0x2b1/0x340 [455951.903397] ? netlink_detachskb+0x30/0x30 [455951.904862] ? lock_acquire+0x380/0x380 [455951.906299] ? lockdep_rcu_suspicious+0x100/0x100 [455951.907729] netlink_sendmsg+0x4f2/0x650 [455951.909141] ? netlink_broadcast_filtered+0x9e0/0x9e0 [455951.910565] ? _copy_from_user+0x86/0xc0 [455951.911964] ? netlink_broadcast_filtered+0x9e0/0x9e0 [455951.913364] SYSC_sendto+0x2f0/0x3c0 [455951.914741] ? SYSC_connect+0x210/0x210 [455951.916111] ? bad_area_access_error+0x230/0x230 [455951.917479] ? ___sys_recvmsg+0x320/0x320 [455951.918811] ? sock_wake_async+0xc0/0xc0 [455951.920112] ? SyS_brk+0x3ae/0x3d0 [455951.921381] ? prepare_exit_to_usermode+0xde/0x230 [455951.922642] ? enter_from_user_mode+0x30/0x30 [455951.923913] ? mark_held_locks+0x1b/0xa0 [455951.925179] ? entry_SYSCALL_64_fastpath+0x5/0xad [455951.926459] ? trace_hardirqs_on_caller+0x185/0x260 [455951.927747] ? trace_hardirqs_on_thunk+0x1a/0x1c [455951.929031] entry_SYSCALL_64_fastpath+0x18/0xad [455951.930314] RIP: 0033:0x7fb44df4ac53 [455951.931592] RSP: 002b:7ffeeafb6a08 EFLAGS: 0246 [455951.932914] ORIG_RAX: 002c [455951.934231] RAX: ffda RBX: 55b8f35d26d0 RCX: 7fb44df4ac53 [455951.935603] RDX: 002c RSI: 55b8f35d14b8 RDI: 0003 [455951.936991] RBP: 55b8f35cf010 R08: 7fb44dc5dbe0 R09: 000c [455951.938387] R10: R11: 0246 R12: 7fb44e43b020 [455951.939795] R13: 7ffeeafb6acc R14: R15: 55b8f1ca68e0 [455951.941208] Code: 80 48 39 eb 72 25 48 c7 c7 09 d6 a4 a0 e8 3e 28 2c 00 0f b6 0d 80 ab 9d 01 48 8d 45 00 48 d3 e8 48 85 c0 75 06 5b 48 89 e8 5d c3 <0f> 0b 48 c7 c7 10 c0 62 a0 e8 a7 2a 2c 00 48 8b 2d 60 95 5b 01 [455951.993251] RIP: __phys_addr+0x5b/0x80 RSP: 8801c3d4f528 [455982.040898] ---[ end trace dfb8a0f07b7c5316 ]--- [459428.674105] == [459428.679829] BUG: KASAN: use-after-free in __mutex_lock+0x26c/0xf30 [459428.685463] Read of size 4 at addr 88013033d020 by task ipset/4611 [459428.696474] CPU: 0 PID: 4611 Comm: ipset Tainted: G D 4.14.0-rc7-firewall+ #1 [459428.707271] Call Trace: [459428.712489]
Re: [PATCH] netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
From: Simon Horman Date: Thu, 2 Nov 2017 15:46:50 +0100 > On Sat, Oct 28, 2017 at 01:33:09PM +0300, Julian Anastasov wrote: >> >> Hello, >> >> On Thu, 26 Oct 2017, Ye Yin wrote: >> >> > When run ipvs in two different network namespace at the same host, and one >> > ipvs transport network traffic to the other network namespace ipvs. >> > 'ipvs_property' flag will make the second ipvs take no effect. So we should >> > clear 'ipvs_property' when SKB network namespace changed. >> > >> > Signed-off-by: Ye Yin >> > Signed-off-by: Wei Zhou >> >> Patch looks good to me. ipvs_property was added long ago >> but skb_scrub_packet() is more recent (3.11), so: >> >> Fixes: 621e84d6f373 ("dev: introduce skb_scrub_packet()") >> Signed-off-by: Julian Anastasov >> >> I guess, DaveM can apply it directly as a bugfix >> to the net tree. > > Sounds like a good plan to me, Dave? > > Signed-off-by: Simon Horman Sure, applied and queued up for -stable, thanks!
Re: [PATCH] tcp_nv: use do_div() instead of expensive div64_u64()
From: Konstantin Khlebnikov Date: Thu, 02 Nov 2017 17:07:05 +0300 > Average RTT is 32-bit thus full 64-bit division is redundant. > > Signed-off-by: Konstantin Khlebnikov > Suggested-by: Stephen Hemminger > Suggested-by: Eric Dumazet Applied to net-next, thank you.
Re: [PATCH net] cxgb4: update latest firmware version supported
From: Ganesh Goudar Date: Thu, 2 Nov 2017 19:26:22 +0530 > Change t4fw_version.h to update latest firmware version > number to 1.16.63.0. > > Signed-off-by: Ganesh Goudar Applied.
Re: [PATCH net] add support of IFF_XMIT_DST_RELEASE bit in vlan
From: Vadim Fedorenko Date: Thu, 2 Nov 2017 15:49:08 +0300 > Some time ago Eric Dumazet suggested a "hack the IFF_XMIT_DST_RELEASE > flag on the vlan netdev". But the last comment was "does not support > properly bonding/team.(If the real_dev->privflags IFF_XMIT_DST_RELEASE > bit changes, we want to update all the vlans at the same time )" > > I've extended that patch to support changes of IFF_XMIT_DST_RELEASE in > bonding/team. > Both bonding and team call netdev_change_features() after recalculation > of features including priv_flags IFF_XMIT_DST_RELEASE bit. So the only > thing needed to support is to recheck this bit in > vlan_transfer_features(). > > Suggested-by: Eric Dumazet > Signed-off-by: Vadim Fedorenko Applied, thank you.
Re: [PATCH net-next] phylink: make local function phylink_phy_change() static
From: Wei Yongjun Date: Thu, 2 Nov 2017 11:14:48 + > Fixes the following sparse warnings: > > drivers/net/phy/phylink.c:570:6: warning: > symbol 'phylink_phy_change' was not declared. Should it be static? > > Signed-off-by: Wei Yongjun Applied, thank you Wei.
Re: [PATCH] [net-next,v2] ibmvnic: Feature implementation of Vital Product Data (VPD) for the ibmvnic driver
From: Desnes Augusto Nunes do Rosario Date: Wed, 1 Nov 2017 19:03:32 -0200 > + substr = strnstr(adapter->vpd->buff, "RM", adapter->vpd->len); > + if (!substr) { > + dev_info(dev, "No FW level provided by VPD\n"); > + complete(&adapter->fw_done); > + return; > + } > + > + /* get length of firmware level ASCII substring */ > + fw_level_len = *(substr + 2); > + > + /* copy firmware version string from vpd into adapter */ > + ptr = strncpy((char *)adapter->fw_version, > + substr + 3, fw_level_len); You have to be more careful here, making sure first that (substr + 2) < (adapter->vpd->buff + adapter->vpd->len), and next that (substr + 2 + fw_level_len) is in range as well.
Re: [patch net-next 3/5] net: sched: introduce block mechanism to handle netif_keep_dst calls
Sat, Nov 04, 2017 at 11:33:58AM CET, dan...@iogearbox.net wrote: >On 11/04/2017 10:55 AM, Jiri Pirko wrote: >> Fri, Nov 03, 2017 at 09:15:54PM CET, dan...@iogearbox.net wrote: >> > On 11/03/2017 06:19 PM, Jiri Pirko wrote: >> > > From: Jiri Pirko >> > > >> > > Couple of classifiers call netif_keep_dst directly on q->dev. That is >> > > not possible to do directly for shared blocke where multiple qdiscs are >> > > owning the block. So introduce a infrastructure to keep track of the >> > > block owners in list and use this list to implement block variant of >> > > netif_keep_dst. >> > > >> > > Signed-off-by: Jiri Pirko >> > [...] >> > > +struct tcf_block_owner_item { >> > > +struct list_head list; >> > > +struct Qdisc *q; >> > > +enum tcf_block_binder_type binder_type; >> > > +}; >> > > + >> > > +static void >> > > +tcf_block_owner_netif_keep_dst(struct tcf_block *block, >> > > + struct Qdisc *q, >> > > + enum tcf_block_binder_type binder_type) >> > > +{ >> > > +if (block->keep_dst && >> > > +binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) >> > >> > Why we need to keep dst on TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS ? >> > I presume this enum means sch_handle_egress() ? dst is dropped >> > later ... >> >> This is because of the bpf check: >> if (fp->dst_needed && !(tp->q->flags & TCQ_F_INGRESS)) >> netif_keep_dst(qdisc_dev(tp->q)); >> >> I just maintain the same logic here. > >No, that's a wrong claim, really ... > >clsact in general hooks into the same logic as ingress, so TC_H_CLSACT >as major needs to reuse TC_H_INGRESS, and qdiscs set up as such set >TCQ_F_INGRESS as flags. For clsact that means both your block binder >types for clsact here (ingress/egress). Ah, indeed, I missed this. I will rename TCQ_F_INGRESS to TCQ_F_CLSACT as a part of this patchset too. > >Please make sure that your other changes don't have similar assumption. They don't. Thanks for the review!
Re: [net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16
On Sat, Nov 4, 2017 at 6:35 PM, David Miller wrote: > From: Or Gerlitz > Date: Sat, 4 Nov 2017 18:05:29 +0900 > >> On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed wrote: >>> From: Huy Nguyen >>> >>> Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware >>> command. >>> >>> Signed-off-by: Huy Nguyen >>> Reviewed-by: Parav Pandit >>> Signed-off-by: Saeed Mahameed >> >> This was reviewed by Eli Cohen, his R.B is missing here. Eli - if you >> can reply and add it such that >> patchworks will pick it up. > > Not if I pull from Saeed's tree, which is what I usually do for mlx5 > submissions. So I guess Saeed's maintainer signature could be enough
[PATCH] net: mvpp2: Prevent userspace from changing TX affinities
The mvpp2 driver can't cope at all with the TX affinities being changed from userspace, and spit an endless stream of [ 91.779920] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing [ 91.779930] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing [ 91.780402] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing [ 91.780406] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing [ 91.780415] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing [ 91.780418] mvpp2 f400.ethernet eth2: wrong cpu on the end of Tx processing rendering the box completely useless (I've measured around 600k interrupts/s on a 8040 box) once irqbalance kicks in and start doing its job. Obviously, the driver was never designed with this in mind. So let's work around the problem by preventing userspace from interacting with these interrupts altogether. Cc: sta...@vger.kernel.org Signed-off-by: Marc Zyngier --- drivers/net/ethernet/marvell/mvpp2.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/marvell/mvpp2.c b/drivers/net/ethernet/marvell/mvpp2.c index a37af5813f33..fcf9ba5eb8d1 100644 --- a/drivers/net/ethernet/marvell/mvpp2.c +++ b/drivers/net/ethernet/marvell/mvpp2.c @@ -6747,6 +6747,9 @@ static int mvpp2_irqs_init(struct mvpp2_port *port) for (i = 0; i < port->nqvecs; i++) { struct mvpp2_queue_vector *qv = port->qvecs + i; + if (qv->type == MVPP2_QUEUE_VECTOR_PRIVATE) + irq_set_status_flags(qv->irq, IRQ_NO_BALANCING); + err = request_irq(qv->irq, mvpp2_isr, 0, port->dev->name, qv); if (err) goto err; @@ -6776,6 +6779,7 @@ static void mvpp2_irqs_deinit(struct mvpp2_port *port) struct mvpp2_queue_vector *qv = port->qvecs + i; irq_set_affinity_hint(qv->irq, NULL); + irq_clear_status_flags(qv->irq, IRQ_NO_BALANCING); free_irq(qv->irq, qv); } } -- 2.11.0
Re: [PATCH net-next v15] openvswitch: enable NSH support
On Thu, Nov 2, 2017 at 6:40 PM, Yang, Yi wrote: > On Thu, Nov 02, 2017 at 05:06:47AM -0700, Pravin Shelar wrote: >> On Wed, Nov 1, 2017 at 7:50 PM, Yang, Yi wrote: >> > On Thu, Nov 02, 2017 at 08:52:40AM +0800, Pravin Shelar wrote: >> >> On Tue, Oct 31, 2017 at 9:03 PM, Yi Yang wrote: >> >> > >> >> > OVS master and 2.8 branch has merged NSH userspace >> >> > patch series, this patch is to enable NSH support >> >> > in kernel data path in order that OVS can support >> >> > NSH in compat mode by porting this. >> >> > >> >> > Signed-off-by: Yi Yang >> >> > --- >> >> I have comment related to checksum, otherwise patch looks good to me. >> > >> > Pravin, thank you for your comments, the below part is incremental patch >> > for checksum, please help check it, I'll send out v16 with this after >> > you confirm. >> > >> This change looks good to me. >> I noticed couple of more issues. >> 1. Can you move the ovs_key_nsh to the union of ipv4 an ipv6? >> ipv4/ipv6/nsh key data is mutually exclusive so there is no need for >> separate space for nsh key in the ovs key. >> 2. We need to fix match_validate() with nsh check. Datapath can not >> allow any l3 or l4 match if the flow key contains nsh match and >> vice-versa. such flow key should be rejected. > > Pravin, the below incremental patch should fix the issues you pionted > out, please help confirm/ack, then I'll send out v16 with all acks > from you all for merge. BTW, it has been verified in my sfc test > environment. > Following patch looks good to me. But I think we needs similar eth_type check for nsh set action in validate_set() and in __ovs_nla_copy_actions() for NSH_POP action. Can you send patch with all changes? Thanks. > diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h > index 8eeae749..c670dd2 100644 > --- a/net/openvswitch/flow.h > +++ b/net/openvswitch/flow.h > @@ -149,8 +149,8 @@ struct sw_flow_key { > } nd; > }; > } ipv6; > + struct ovs_key_nsh nsh; /* network service header */ > }; > - struct ovs_key_nsh nsh; /* network service header */ > struct { > /* Connection tracking fields not packed above. */ > struct { > diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c > index 0d7d4ae..090103c 100644 > --- a/net/openvswitch/flow_netlink.c > +++ b/net/openvswitch/flow_netlink.c > @@ -178,7 +178,8 @@ static bool match_validate(const struct sw_flow_match > *match, > | (1 << OVS_KEY_ATTR_ICMPV6) > | (1 << OVS_KEY_ATTR_ARP) > | (1 << OVS_KEY_ATTR_ND) > - | (1 << OVS_KEY_ATTR_MPLS)); > + | (1 << OVS_KEY_ATTR_MPLS) > + | (1 << OVS_KEY_ATTR_NSH)); > > /* Always allowed mask fields. */ > mask_allowed |= ((1 << OVS_KEY_ATTR_TUNNEL) > @@ -287,6 +288,14 @@ static bool match_validate(const struct sw_flow_match > *match, > } > } > > + if (match->key->eth.type == htons(ETH_P_NSH)) { > + key_expected |= 1 << OVS_KEY_ATTR_NSH; > + if (match->mask && > + match->mask->key.eth.type == htons(0x)) { > + mask_allowed |= 1 << OVS_KEY_ATTR_NSH; > + } > + } > + > if ((key_attrs & key_expected) != key_expected) { > /* Key attributes check failed. */ > OVS_NLERR(log, "Missing key (keys=%llx, expected=%llx)",
Re: [patch net-next 3/5] net: sched: introduce block mechanism to handle netif_keep_dst calls
On 11/04/2017 10:55 AM, Jiri Pirko wrote: Fri, Nov 03, 2017 at 09:15:54PM CET, dan...@iogearbox.net wrote: On 11/03/2017 06:19 PM, Jiri Pirko wrote: From: Jiri Pirko Couple of classifiers call netif_keep_dst directly on q->dev. That is not possible to do directly for shared blocke where multiple qdiscs are owning the block. So introduce a infrastructure to keep track of the block owners in list and use this list to implement block variant of netif_keep_dst. Signed-off-by: Jiri Pirko [...] +struct tcf_block_owner_item { + struct list_head list; + struct Qdisc *q; + enum tcf_block_binder_type binder_type; +}; + +static void +tcf_block_owner_netif_keep_dst(struct tcf_block *block, + struct Qdisc *q, + enum tcf_block_binder_type binder_type) +{ + if (block->keep_dst && + binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) Why we need to keep dst on TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS ? I presume this enum means sch_handle_egress() ? dst is dropped later ... This is because of the bpf check: if (fp->dst_needed && !(tp->q->flags & TCQ_F_INGRESS)) netif_keep_dst(qdisc_dev(tp->q)); I just maintain the same logic here. No, that's a wrong claim, really ... clsact in general hooks into the same logic as ingress, so TC_H_CLSACT as major needs to reuse TC_H_INGRESS, and qdiscs set up as such set TCQ_F_INGRESS as flags. For clsact that means both your block binder types for clsact here (ingress/egress). Please make sure that your other changes don't have similar assumption.
Re: [PATCH net-next v2 03/15] bpf: report offload info to user space
On Sat, 4 Nov 2017 18:45:31 +0900, Alexei Starovoitov wrote: > On Fri, Nov 03, 2017 at 01:56:18PM -0700, Jakub Kicinski wrote: > > Extend struct bpf_prog_info to contain information about program > > being bound to a device. Since the netdev may get destroyed while > > program still exists we need a flag to indicate the program is > > loaded for a device, even if the device is gone. > > > > Signed-off-by: Jakub Kicinski > > Reviewed-by: Simon Horman > > Reviewed-by: Quentin Monnet > > --- > > include/linux/bpf.h | 1 + > > include/uapi/linux/bpf.h | 6 ++ > > kernel/bpf/offload.c | 12 > > kernel/bpf/syscall.c | 5 + > > 4 files changed, 24 insertions(+) > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > index e45d43f9ec92..98bacd0fa5cc 100644 > > --- a/include/linux/bpf.h > > +++ b/include/linux/bpf.h > > @@ -506,6 +506,7 @@ static inline int cpu_map_enqueue(struct > > bpf_cpu_map_entry *rcpu, > > > > int bpf_prog_offload_compile(struct bpf_prog *prog); > > void bpf_prog_offload_destroy(struct bpf_prog *prog); > > +u32 bpf_prog_offload_ifindex(struct bpf_prog *prog); > > > > #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL) > > int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr); > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 727a3dba13e6..e92f62cf933a 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -894,6 +894,10 @@ enum sk_action { > > > > #define BPF_TAG_SIZE 8 > > > > +enum bpf_prog_status { > > + BPF_PROG_STATUS_DEV_BOUND = (1 << 0), > > +}; > > + > > struct bpf_prog_info { > > __u32 type; > > __u32 id; > > @@ -907,6 +911,8 @@ struct bpf_prog_info { > > __u32 nr_map_ids; > > __aligned_u64 map_ids; > > char name[BPF_OBJ_NAME_LEN]; > > + __u32 ifindex; > > + __u32 status; > > why status is needed? > ifindex cannot be zero, so if it's set > 0 would mean > that the program is bound. Devices may come and go, independently from the lifetime of the program, therefore there is a notion of a program which has been loaded for a particular device but the device is gone (and therefore its ifindex is meaningless). I tried to explain this in the commit message. > Also would be good to have consistent name with prog_load. > imo prog_target_ifindex is too long. > May be call it 'ifindex' both in bpf_attr and in bpf_prog_info ? Perhaps I'm missing something, but bpf_attr is a huge union of (mostly) unnamed anonymous structs. I foresee that we will have to add an ifindex member for a map command as well, therefore the prog_* prefix seems prudent. Should I go back to prog_ifindex in bpf_attr? Or perhaps should I duplicate the struct for BPF_PROG_LOAD but this time give it a member name so we can extend it without worrying about member name conflicts?
Re: [patch net-next 3/5] net: sched: introduce block mechanism to handle netif_keep_dst calls
Fri, Nov 03, 2017 at 09:15:54PM CET, dan...@iogearbox.net wrote: >On 11/03/2017 06:19 PM, Jiri Pirko wrote: >> From: Jiri Pirko >> >> Couple of classifiers call netif_keep_dst directly on q->dev. That is >> not possible to do directly for shared blocke where multiple qdiscs are >> owning the block. So introduce a infrastructure to keep track of the >> block owners in list and use this list to implement block variant of >> netif_keep_dst. >> >> Signed-off-by: Jiri Pirko >[...] >> +struct tcf_block_owner_item { >> +struct list_head list; >> +struct Qdisc *q; >> +enum tcf_block_binder_type binder_type; >> +}; >> + >> +static void >> +tcf_block_owner_netif_keep_dst(struct tcf_block *block, >> + struct Qdisc *q, >> + enum tcf_block_binder_type binder_type) >> +{ >> +if (block->keep_dst && >> +binder_type != TCF_BLOCK_BINDER_TYPE_CLSACT_INGRESS) > >Why we need to keep dst on TCF_BLOCK_BINDER_TYPE_CLSACT_EGRESS ? >I presume this enum means sch_handle_egress() ? dst is dropped >later ... This is because of the bpf check: if (fp->dst_needed && !(tp->q->flags & TCQ_F_INGRESS)) netif_keep_dst(qdisc_dev(tp->q)); I just maintain the same logic here.
Re: [PATCH net-next] tools: bpftool: move p_err() and p_info() from main.h to common.c
On Fri, Nov 03, 2017 at 01:59:07PM -0700, Jakub Kicinski wrote: > From: Quentin Monnet > > The two functions were declared as static inline in a header file. There > is no particular reason why they should be inlined, they just happened to > remain in the same header file when they were turned from macros to > functions in a precious commit. > > Make them non-inlined functions and move them to common.c file instead. > > Suggested-by: Joe Perches > Signed-off-by: Quentin Monnet > Signed-off-by: Jakub Kicinski Acked-by: Alexei Starovoitov
Re: [PATCH net-next v2 03/15] bpf: report offload info to user space
On Fri, Nov 03, 2017 at 01:56:18PM -0700, Jakub Kicinski wrote: > Extend struct bpf_prog_info to contain information about program > being bound to a device. Since the netdev may get destroyed while > program still exists we need a flag to indicate the program is > loaded for a device, even if the device is gone. > > Signed-off-by: Jakub Kicinski > Reviewed-by: Simon Horman > Reviewed-by: Quentin Monnet > --- > include/linux/bpf.h | 1 + > include/uapi/linux/bpf.h | 6 ++ > kernel/bpf/offload.c | 12 > kernel/bpf/syscall.c | 5 + > 4 files changed, 24 insertions(+) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index e45d43f9ec92..98bacd0fa5cc 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -506,6 +506,7 @@ static inline int cpu_map_enqueue(struct > bpf_cpu_map_entry *rcpu, > > int bpf_prog_offload_compile(struct bpf_prog *prog); > void bpf_prog_offload_destroy(struct bpf_prog *prog); > +u32 bpf_prog_offload_ifindex(struct bpf_prog *prog); > > #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL) > int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr); > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 727a3dba13e6..e92f62cf933a 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -894,6 +894,10 @@ enum sk_action { > > #define BPF_TAG_SIZE 8 > > +enum bpf_prog_status { > + BPF_PROG_STATUS_DEV_BOUND = (1 << 0), > +}; > + > struct bpf_prog_info { > __u32 type; > __u32 id; > @@ -907,6 +911,8 @@ struct bpf_prog_info { > __u32 nr_map_ids; > __aligned_u64 map_ids; > char name[BPF_OBJ_NAME_LEN]; > + __u32 ifindex; > + __u32 status; why status is needed? ifindex cannot be zero, so if it's set > 0 would mean that the program is bound. Also would be good to have consistent name with prog_load. imo prog_target_ifindex is too long. May be call it 'ifindex' both in bpf_attr and in bpf_prog_info ?
Re: [net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16
From: Or Gerlitz Date: Sat, 4 Nov 2017 18:05:29 +0900 > On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed wrote: >> From: Huy Nguyen >> >> Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware >> command. >> >> Signed-off-by: Huy Nguyen >> Reviewed-by: Parav Pandit >> Signed-off-by: Saeed Mahameed > > This was reviewed by Eli Cohen, his R.B is missing here. Eli - if you > can reply and add it such that > patchworks will pick it up. Not if I pull from Saeed's tree, which is what I usually do for mlx5 submissions.
Re: [RFC PATCH] bpf: Add helpers to read useful task_struct members
On 11/3/17 3:58 PM, Sandipan Das wrote: For added security, the layout of some structures can be randomized by enabling CONFIG_GCC_PLUGIN_RANDSTRUCT. One such structure is task_struct. To build BPF programs, we use Clang which does not support this feature. So, if we attempt to read a field of a structure with a randomized layout within a BPF program, we do not get the expected value because of incorrect offsets. To observe this, it is not mandatory to have CONFIG_GCC_PLUGIN_RANDSTRUCT enabled because the structure annotations/members added for this purpose are enough to cause this. So, all kernel builds are affected. For example, considering samples/bpf/offwaketime_kern.c, if we try to print the values of pid and comm inside the task_struct passed to waker() by adding the following lines of code at the appropriate place char fmt[] = "waker(): p->pid = %u, p->comm = %s\n"; bpf_trace_printk(fmt, sizeof(fmt), _(p->pid), _(p->comm)); it is seen that upon rebuilding and running this sample followed by inspecting /sys/kernel/debug/tracing/trace, the output looks like the following _-=> irqs-off / _=> need-resched | / _---=> hardirq/softirq || / _--=> preempt-depth ||| / delay TASK-PID CPU# TIMESTAMP FUNCTION | | | | | -0 [007] d.s. 1883.443594: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.453588: 0x0001: waker(): p->pid = 0, p->comm = -0 [007] d.s. 1883.463584: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.483586: 0x0001: waker(): p->pid = 0, p->comm = -0 [005] d.s. 1883.493583: 0x0001: waker(): p->pid = 0, p->comm = -0 [009] d.s. 1883.503583: 0x0001: waker(): p->pid = 0, p->comm = -0 [018] d.s. 1883.513578: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627660: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627704: 0x0001: waker(): p->pid = 0, p->comm = systemd-journal-3140 [003] d... 1883.627723: 0x0001: waker(): p->pid = 0, p->comm = To avoid this, we add new BPF helpers that read the correct values for some of the important task_struct members such as pid, tgid, comm and flags which are extensively used in BPF-based analysis tools such as bcc. Since these helpers are built with GCC, they use the correct offsets when referencing a member. Signed-off-by: Sandipan Das ... diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index f90860d1f897..324508d27bd2 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -338,6 +338,16 @@ union bpf_attr { * @skb: pointer to skb * Return: classid if != 0 * + * u64 bpf_get_task_pid_tgid(struct task_struct *task) + * Return: task->tgid << 32 | task->pid + * + * int bpf_get_task_comm(struct task_struct *task) + * Stores task->comm into buf + * Return: 0 on success or negative error + * + * u32 bpf_get_task_flags(struct task_struct *task) + * Return: task->flags + * I don't think it's a solution. Tracing scripts read other fields too. Making it work for these 3 fields is a drop in a bucket. If randomization is used I think we have to accept that existing bpf scripts won't be usable. Long term solution is to support 'BPF Type Format' or BTF (which is old C-Type Format) for kernel data structures, so bcc scripts wouldn't need to use kernel headers and clang. The proper offsets will be described in BTF. We were planning to use it initially to describe map key/value, but it applies for this case as well. There will be a tool that will take dwarf from vmlinux and compress it into BTF. Kernel will also be able to verify that BTF is a valid BTF. I'm assuming that gcc randomization plugin produces dwarf with correct offsets, if not, it would have to be fixed.
Re: [net-next 06/12] net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ
On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed wrote: > From: Huy Nguyen > > If the port is in DSCP trust state, packets are placed in the right > priority queue based on the dscp value. This is done by selecting > the transmit queue based on the dscp of the skb. > > Until now select_queue honors priority only from the vlan header. > However that is not sufficient in cases where port trust state is DSCP > mode as packet might not even contain vlan header. Therefore if the port > is in dscp trust state and vport's min inline mode is not NONE, > copy the IP header to the eseg's inline header if the skb has it. > This is done by changing the transmit queue sq's min inline mode to L3. > Note that the min inline mode of sqs that belong to other features such > as xdpsq, icosq are not modified. > > Signed-off-by: Huy Nguyen > Reviewed-by: Parav Pandit > Signed-off-by: Saeed Mahameed Reviewed-by: Or Gerlitz
Re: [net-next 04/12] net/mlx5: QPTS and QPDPM register firmware command support
On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed wrote: > From: Huy Nguyen > > The QPTS register allows changing the priority trust state between pcp and > dscp. Add support to get/set trust state from device. When the port is > in pcp/dscp trust state, packet is routed by hardware to matching priority > based on its pcp/dscp value respectively. > > The QPDPM register allow channing the dscp to priority mapping. Add support > to get/set dscp to priority mapping from device. > Note that to change a dscp mapping, the "e" bit of this dscp structure > must be set in the QPDPM firmware command. > > Signed-off-by: Huy Nguyen > Reviewed-by: Parav Pandit > Signed-off-by: Saeed Mahameed Reviewed-by: Or Gerlitz
Re: [net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16
On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed wrote: > From: Huy Nguyen > > Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware > command. > > Signed-off-by: Huy Nguyen > Reviewed-by: Parav Pandit > Signed-off-by: Saeed Mahameed This was reviewed by Eli Cohen, his R.B is missing here. Eli - if you can reply and add it such that patchworks will pick it up.
Re: [net-next 02/12] net/mlx5: QCAM register firmware command support
On Sat, Nov 4, 2017 at 5:50 PM, Saeed Mahameed wrote: > > From: Huy Nguyen > > The QCAM register provides capability bit for all the QoS registers > using ACCESS_REG command. > > Signed-off-by: Huy Nguyen > Reviewed-by: Parav Pandit > Signed-off-by: Saeed Mahameed Reviewed-by: Or Gerlitz
[net-next 04/12] net/mlx5: QPTS and QPDPM register firmware command support
From: Huy Nguyen The QPTS register allows changing the priority trust state between pcp and dscp. Add support to get/set trust state from device. When the port is in pcp/dscp trust state, packet is routed by hardware to matching priority based on its pcp/dscp value respectively. The QPDPM register allow channing the dscp to priority mapping. Add support to get/set dscp to priority mapping from device. Note that to change a dscp mapping, the "e" bit of this dscp structure must be set in the QPDPM firmware command. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/port.c | 99 ++ include/linux/mlx5/driver.h| 7 ++ include/linux/mlx5/mlx5_ifc.h | 20 ++ include/linux/mlx5/port.h | 5 ++ 4 files changed, 131 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index b6553be841f9..c37d00cd472a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -971,3 +971,102 @@ int mlx5_set_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 arm, u8 mode) return mlx5_core_access_reg(mdev, in, sizeof(in), out, sizeof(out), MLX5_REG_MTPPSE, 0, 1); } + +int mlx5_set_trust_state(struct mlx5_core_dev *mdev, u8 trust_state) +{ + u32 out[MLX5_ST_SZ_DW(qpts_reg)] = {}; + u32 in[MLX5_ST_SZ_DW(qpts_reg)] = {}; + int err; + + MLX5_SET(qpts_reg, in, local_port, 1); + MLX5_SET(qpts_reg, in, trust_state, trust_state); + + err = mlx5_core_access_reg(mdev, in, sizeof(in), out, + sizeof(out), MLX5_REG_QPTS, 0, 1); + return err; +} + +int mlx5_query_trust_state(struct mlx5_core_dev *mdev, u8 *trust_state) +{ + u32 out[MLX5_ST_SZ_DW(qpts_reg)] = {}; + u32 in[MLX5_ST_SZ_DW(qpts_reg)] = {}; + int err; + + MLX5_SET(qpts_reg, in, local_port, 1); + + err = mlx5_core_access_reg(mdev, in, sizeof(in), out, + sizeof(out), MLX5_REG_QPTS, 0, 0); + if (!err) + *trust_state = MLX5_GET(qpts_reg, out, trust_state); + + return err; +} + +int mlx5_set_dscp2prio(struct mlx5_core_dev *mdev, u8 dscp, u8 prio) +{ + int sz = MLX5_ST_SZ_BYTES(qpdpm_reg); + void *qpdpm_dscp; + void *out; + void *in; + int err; + + in = kzalloc(sz, GFP_KERNEL); + out = kzalloc(sz, GFP_KERNEL); + if (!in || !out) { + err = -ENOMEM; + goto out; + } + + MLX5_SET(qpdpm_reg, in, local_port, 1); + err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 0); + if (err) + goto out; + + memcpy(in, out, sz); + MLX5_SET(qpdpm_reg, in, local_port, 1); + + /* Update the corresponding dscp entry */ + qpdpm_dscp = MLX5_ADDR_OF(qpdpm_reg, in, dscp[dscp]); + MLX5_SET16(qpdpm_dscp_reg, qpdpm_dscp, prio, prio); + MLX5_SET16(qpdpm_dscp_reg, qpdpm_dscp, e, 1); + err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 1); + +out: + kfree(in); + kfree(out); + return err; +} + +/* dscp2prio[i]: priority that dscp i mapped to */ +#define MLX5E_SUPPORTED_DSCP 64 +int mlx5_query_dscp2prio(struct mlx5_core_dev *mdev, u8 *dscp2prio) +{ + int sz = MLX5_ST_SZ_BYTES(qpdpm_reg); + void *qpdpm_dscp; + void *out; + void *in; + int err; + int i; + + in = kzalloc(sz, GFP_KERNEL); + out = kzalloc(sz, GFP_KERNEL); + if (!in || !out) { + err = -ENOMEM; + goto out; + } + + MLX5_SET(qpdpm_reg, in, local_port, 1); + err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_QPDPM, 0, 0); + if (err) + goto out; + + for (i = 0; i < (MLX5E_SUPPORTED_DSCP); i++) { + qpdpm_dscp = MLX5_ADDR_OF(qpdpm_reg, out, dscp[i]); + dscp2prio[i] = MLX5_GET16(qpdpm_dscp_reg, qpdpm_dscp, prio); + } + +out: + kfree(in); + kfree(out); + return err; +} diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index ed5be52282ea..a886b51511ab 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -107,8 +107,10 @@ enum { }; enum { + MLX5_REG_QPTS= 0x4002, MLX5_REG_QETCR = 0x4005, MLX5_REG_QTCT= 0x400a, + MLX5_REG_QPDPM = 0x4013, MLX5_REG_QCAM= 0x4019, MLX5_REG_DCBX_PARAM = 0x4020, MLX5_REG_DCBX_APP= 0x4021, @@ -142,6 +144,11 @@ enum { MLX5_REG_MCAM= 0x907f, }; +enum mlx5_qpts_trust_state { + MLX5_QPTS_TRUST_PCP = 1, + MLX5_QPTS_TRUST_DSCP = 2, +}; + enum mlx5_dcbx_oper_mode { M
[net-next 07/12] net/mlx5e: Add support for ethtool msglvl support
From: Gal Pressman Use ethtool -s msglvl on/off to toggle debug messages. Signed-off-by: Gal Pressman Signed-off-by: Inbar Karmy Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 11 +++ drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 13 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c| 1 + 3 files changed, 25 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index fae7b62d173f..8c872e2e1aa0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -127,6 +127,16 @@ #define MLX5E_NUM_MAIN_GROUPS 9 +#define MLX5E_MSG_LEVELNETIF_MSG_LINK + +#define mlx5e_dbg(mlevel, priv, format, ...)\ +do {\ + if (NETIF_MSG_##mlevel & (priv)->msglevel) \ + netdev_warn(priv->netdev, format, \ + ##__VA_ARGS__); \ +} while (0) + + static inline u16 mlx5_min_rx_wqes(int wq_type, u32 wq_size) { switch (wq_type) { @@ -754,6 +764,7 @@ struct mlx5e_priv { #endif /* priv data path fields - end */ + u32msglevel; unsigned long state; struct mutex state_lock; /* Protects Interface state */ struct mlx5e_rqdrop_rq; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index b34aa8efb036..63d1ac695a75 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1340,6 +1340,16 @@ static int mlx5e_set_wol(struct net_device *netdev, struct ethtool_wolinfo *wol) return mlx5_set_port_wol(mdev, mlx5_wol_mode); } +static u32 mlx5e_get_msglevel(struct net_device *dev) +{ + return ((struct mlx5e_priv *)netdev_priv(dev))->msglevel; +} + +static void mlx5e_set_msglevel(struct net_device *dev, u32 val) +{ + ((struct mlx5e_priv *)netdev_priv(dev))->msglevel = val; +} + static int mlx5e_set_phys_id(struct net_device *dev, enum ethtool_phys_id_state state) { @@ -1672,4 +1682,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = { .get_priv_flags= mlx5e_get_priv_flags, .set_priv_flags= mlx5e_set_priv_flags, .self_test = mlx5e_self_test, + .get_msglevel = mlx5e_get_msglevel, + .set_msglevel = mlx5e_set_msglevel, + }; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a97ee38143aa..73d7c672c4ff 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4091,6 +4091,7 @@ static void mlx5e_build_nic_netdev_priv(struct mlx5_core_dev *mdev, priv->netdev = netdev; priv->profile = profile; priv->ppriv = ppriv; + priv->msglevel= MLX5E_MSG_LEVEL; priv->hard_mtu = MLX5E_ETH_HARD_MTU; mlx5e_build_nic_params(mdev, &priv->channels.params, profile->max_nch(mdev)); -- 2.14.2
[net-next 12/12] net/mlx5e: Enable CQE based moderation on TX CQ
From: Tal Gilboa By using CQE based moderation on TX CQ we can reduce the number of TX interrupt rate. Besides the benefit of less interrupts, this also allows the kernel to better utilize TSO. Since TSO has some CPU overhead, it might not aggregate when CPU is under high stress. By reducing the interrupt rate and the CPU utilization, we can get better aggregation and better overall throughput. The feature is enabled by default and has a private flag in ethtool for control. Throughput, interrupt rate and TSO utilization improvements: (ConnectX-4Lx 40GbE, unidirectional, 1/16 TCP streams, 64B packets) - Metric | Streams | CQE Based | EQE Based | improvement - BW |1| 2.4Gb/s | 2.15Gb/s | +11.6% IR |1| 27Kips | 50.6Kips | -46.7% TSO Util |1| 74.6%| 71% | +5% BW |16 | 29Gb/s | 25.85Gb/s | +12.2% IR |16 | 482Kips | 745Kips | -35.3% TSO Util |16 | 69.1%| 49% | +41.1% *BW = Bandwidth, IR = Interrupt rate, ips = interrupt per second. TSO Util = bytes in TSO sessions / all bytes transferred Signed-off-by: Tal Gilboa Signed-off-by: Saeed Mahameed Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 9 +++-- .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 39 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 38 +++-- drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 8 +++-- 4 files changed, 71 insertions(+), 23 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 95facdf62c77..751f62cae969 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -106,6 +106,7 @@ #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_USEC_FROM_CQE 0x3 #define MLX5E_PARAMS_DEFAULT_RX_CQ_MODERATION_PKTS 0x20 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC 0x10 +#define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_USEC_FROM_CQE 0x10 #define MLX5E_PARAMS_DEFAULT_TX_CQ_MODERATION_PKTS 0x20 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES0x80 #define MLX5E_PARAMS_DEFAULT_MIN_RX_WQES_MPW0x2 @@ -198,12 +199,14 @@ extern const char mlx5e_self_tests[][ETH_GSTRING_LEN]; static const char mlx5e_priv_flags[][ETH_GSTRING_LEN] = { "rx_cqe_moder", + "tx_cqe_moder", "rx_cqe_compress", }; enum mlx5e_priv_flag { MLX5E_PFLAG_RX_CQE_BASED_MODER = (1 << 0), - MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 1), + MLX5E_PFLAG_TX_CQE_BASED_MODER = (1 << 1), + MLX5E_PFLAG_RX_CQE_COMPRESS = (1 << 2), }; #define MLX5E_SET_PFLAG(params, pflag, enable) \ @@ -223,6 +226,7 @@ enum mlx5e_priv_flag { struct mlx5e_cq_moder { u16 usec; u16 pkts; + u8 cq_period_mode; }; struct mlx5e_params { @@ -234,7 +238,6 @@ struct mlx5e_params { u8 log_rq_size; u16 num_channels; u8 num_tc; - u8 rx_cq_period_mode; bool rx_cqe_compress_def; struct mlx5e_cq_moder rx_cq_moderation; struct mlx5e_cq_moder tx_cq_moderation; @@ -926,6 +929,8 @@ void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, int len, int num_channels); int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); +void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params, +u8 cq_period_mode); void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params, u8 cq_period_mode); void mlx5e_set_rq_type_params(struct mlx5_core_dev *mdev, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index 63d1ac695a75..23425f028405 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1454,29 +1454,36 @@ static int mlx5e_get_module_eeprom(struct net_device *netdev, typedef int (*mlx5e_pflag_handler)(struct net_device *netdev, bool enable); -static int set_pflag_rx_cqe_based_moder(struct net_device *netdev, bool enable) +static int set_pflag_cqe_based_moder(struct net_device *netdev, bool enable, +bool is_rx_cq) { struct mlx5e_priv *priv = netdev_priv(netdev); struct mlx5_core_dev *mdev = priv->mdev; struct mlx5e_channels new_channels = {}; - bool rx_mode_changed; - u8 rx_cq_period_mode; + bool mode_changed; + u8 cq_period_mode, current_cq_period_mode; int err = 0; - rx_cq_period_mode = enable ? + cq_period_mode = enable ? MLX5_CQ_PERIOD_MODE_START_FROM_CQE : MLX5_CQ_PERIOD_MODE_START_FROM_EQE; - rx_mode_changed = rx_cq_
[net-next 09/12] net/mlx5: Enlarge the NIC TC offload table size
From: Or Gerlitz The NIC TC offload table size was hard coded to 1k. Change it to be min(max NIC RX table size, min(max flow counters, 64k) * num flow groups) where the max values are read from the firmware and the number of flow groups is hard-coded as before this change. We don't know upfront the division of flows to groups (== different masks). This setup allows each group to be of size up to the where we want to go (when supported, all offloaded flows use counters). Thus, we don't expect multiple occurences for a group which in turn would add steering hops. Signed-off-by: Or Gerlitz Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 9ba1f72060aa..55979ec2e88a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -90,8 +90,8 @@ enum { MLX5_HEADER_TYPE_NVGRE = 0x1, }; -#define MLX5E_TC_TABLE_NUM_ENTRIES 1024 #define MLX5E_TC_TABLE_NUM_GROUPS 4 +#define MLX5E_TC_TABLE_MAX_GROUP_SIZE (1 << 16) struct mod_hdr_key { int num_actions; @@ -263,10 +263,21 @@ mlx5e_tc_add_nic_flow(struct mlx5e_priv *priv, } if (IS_ERR_OR_NULL(priv->fs.tc.t)) { + int tc_grp_size, tc_tbl_size; + u32 max_flow_counter; + + max_flow_counter = (MLX5_CAP_GEN(dev, max_flow_counter_31_16) << 16) | + MLX5_CAP_GEN(dev, max_flow_counter_15_0); + + tc_grp_size = min_t(int, max_flow_counter, MLX5E_TC_TABLE_MAX_GROUP_SIZE); + + tc_tbl_size = min_t(int, tc_grp_size * MLX5E_TC_TABLE_NUM_GROUPS, + BIT(MLX5_CAP_FLOWTABLE_NIC_RX(dev, log_max_ft_size))); + priv->fs.tc.t = mlx5_create_auto_grouped_flow_table(priv->fs.ns, MLX5E_TC_PRIO, - MLX5E_TC_TABLE_NUM_ENTRIES, + tc_tbl_size, MLX5E_TC_TABLE_NUM_GROUPS, 0, 0); if (IS_ERR(priv->fs.tc.t)) { -- 2.14.2
[net-next 05/12] net/mlx5e: Add dcbnl dscp to priority support
From: Huy Nguyen This patch implements dcbnl hooks to set and delete DSCP to priority map as defined by the DCB subsystem. Device maintains internal trust state which needs to be set to DSCP state for performing DSCP to priority mapping. When the first dscp to priority APP entry is added by the user, the trust state is changed to dscp. When the last dscp to priority APP entry is deleted by the user, the trust state is changed to pcp. If user sends multiple dscp to priority APP entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The dscp to priority APP entries are added and deleted in the net/dcb APP database using dcb_ieee_setapp/getapp. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 15 +- drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 204 + drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +- 3 files changed, 232 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index e613ce02216d..ab6f0c18850f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -57,6 +57,7 @@ #define MLX5E_HW2SW_MTU(priv, hwmtu) ((hwmtu) - ((priv)->hard_mtu)) #define MLX5E_SW2HW_MTU(priv, swmtu) ((swmtu) + ((priv)->hard_mtu)) +#define MLX5E_MAX_DSCP 64 #define MLX5E_MAX_NUM_TC 8 #define MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE0x6 @@ -260,11 +261,17 @@ enum { struct mlx5e_dcbx { enum mlx5_dcbx_oper_mode mode; struct mlx5e_cee_configcee_cfg; /* pending configuration */ + u8 dscp_app_cnt; /* The only setting that cannot be read from FW */ u8 tc_tsa[IEEE_8021QAZ_MAX_TCS]; u8 cap; }; + +struct mlx5e_dcbx_dp { + u8 dscp2prio[MLX5E_MAX_DSCP]; + u8 trust_state; +}; #endif enum { @@ -742,6 +749,9 @@ struct mlx5e_priv { /* priv data path fields - start */ struct mlx5e_txqsq *txq2sq[MLX5E_MAX_NUM_CHANNELS * MLX5E_MAX_NUM_TC]; int channel_tc2txq[MLX5E_MAX_NUM_CHANNELS][MLX5E_MAX_NUM_TC]; +#ifdef CONFIG_MLX5_CORE_EN_DCB + struct mlx5e_dcbx_dp dcbx_dp; +#endif /* priv data path fields - end */ unsigned long state; @@ -800,6 +810,8 @@ struct mlx5e_profile { mlx5e_fp_handle_rx_cqe handle_rx_cqe; mlx5e_fp_handle_rx_cqe handle_rx_cqe_mpwqe; } rx_handlers; + void(*netdev_registered_init)(struct mlx5e_priv *priv); + void(*netdev_registered_remove)(struct mlx5e_priv *priv); int max_tc; }; @@ -968,6 +980,8 @@ extern const struct ethtool_ops mlx5e_ethtool_ops; extern const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops; int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets *ets); void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv); +void mlx5e_dcbnl_init_app(struct mlx5e_priv *priv); +void mlx5e_dcbnl_delete_app(struct mlx5e_priv *priv); #endif #ifndef CONFIG_RFS_ACCEL @@ -1069,5 +1083,4 @@ void mlx5e_destroy_netdev(struct mlx5e_priv *priv); void mlx5e_build_nic_params(struct mlx5_core_dev *mdev, struct mlx5e_params *params, u16 max_channels); - #endif /* __MLX5_EN_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c index 51c4cc00a186..aa59c4324159 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c @@ -46,6 +46,13 @@ enum { MLX5E_LOWEST_PRIO_GROUP = 0, }; +#define MLX5_DSCP_SUPPORTED(mdev) (MLX5_CAP_GEN(mdev, qcam_reg) && \ + MLX5_CAP_QCAM_REG(mdev, qpts) && \ + MLX5_CAP_QCAM_REG(mdev, qpdpm)) + +static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state); +static int mlx5e_set_dscp2prio(struct mlx5e_priv *priv, u8 dscp, u8 prio); + /* If dcbx mode is non-host set the dcbx mode to host. */ static int mlx5e_dcbnl_set_dcbx_mode(struct mlx5e_priv *priv, @@ -381,6 +388,113 @@ static u8 mlx5e_dcbnl_setdcbx(struct net_device *dev, u8 mode) return 0; } +static int mlx5e_dcbnl_ieee_setapp(struct net_device *dev, struct dcb_app *app) +{ + struct mlx5e_priv *priv = netdev_priv(dev); + struct dcb_app temp; + bool is_new; + int err; + + if (app->selector != IEEE_8021QAZ_APP_SEL_DSCP) + return -EINVAL; + + if (!MLX5_CAP_GEN(priv->mdev, vport_group_manager)) + return -EINVAL; + + if (!MLX5_DSCP_SUPPORTED(priv->mdev)) + return -EINVAL; + + if (app->protocol >= MLX5E_MAX_DSC
[net-next 03/12] net/mlx5: Add MLX5_SET16 and MLX5_GET16
From: Huy Nguyen Add MLX5_SET16 and MLX5_GET16 for 16bit structure field in firmware command. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 17 + 1 file changed, 17 insertions(+) diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 6d79b3f79458..409ffb14298a 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -49,11 +49,15 @@ #define __mlx5_nullp(typ) ((struct mlx5_ifc_##typ##_bits *)0) #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld) #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld)) +#define __mlx5_16_off(typ, fld) (__mlx5_bit_off(typ, fld) / 16) #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32) #define __mlx5_64_off(typ, fld) (__mlx5_bit_off(typ, fld) / 64) +#define __mlx5_16_bit_off(typ, fld) (16 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0xf)) #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f)) #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1)) #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld)) +#define __mlx5_mask16(typ, fld) ((u16)((1ull << __mlx5_bit_sz(typ, fld)) - 1)) +#define __mlx5_16_mask(typ, fld) (__mlx5_mask16(typ, fld) << __mlx5_16_bit_off(typ, fld)) #define __mlx5_st_sz_bits(typ) sizeof(struct mlx5_ifc_##typ##_bits) #define MLX5_FLD_SZ_BYTES(typ, fld) (__mlx5_bit_sz(typ, fld) / 8) @@ -116,6 +120,19 @@ __mlx5_mask(typ, fld)) ___t; \ }) +#define MLX5_GET16(typ, p, fld) ((be16_to_cpu(*((__be16 *)(p) +\ +__mlx5_16_off(typ, fld))) >> __mlx5_16_bit_off(typ, fld)) & \ +__mlx5_mask16(typ, fld)) + +#define MLX5_SET16(typ, p, fld, v) do { \ + u16 _v = v; \ + BUILD_BUG_ON(__mlx5_st_sz_bits(typ) % 16); \ + *((__be16 *)(p) + __mlx5_16_off(typ, fld)) = \ + cpu_to_be16((be16_to_cpu(*((__be16 *)(p) + __mlx5_16_off(typ, fld))) & \ +(~__mlx5_16_mask(typ, fld))) | (((_v) & __mlx5_mask16(typ, fld)) \ +<< __mlx5_16_bit_off(typ, fld))); \ +} while (0) + /* Big endian getters */ #define MLX5_GET64_BE(typ, p, fld) (*((__be64 *)(p) +\ __mlx5_64_off(typ, fld))) -- 2.14.2
[net-next 08/12] net/mlx5e: DCBNL, Add debug messages log
From: Inbar Karmy Add debug print when changing the configuration of QoS through dcbnl. Use ethtool -s msglvl hw on/off to toggle debug messages. Signed-off-by: Inbar Karmy Reviewed-by: Gal Pressman Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 24 +- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c index b402d69a701b..c6d90b6dd80e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c @@ -241,7 +241,7 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets *ets) u8 tc_tx_bw[IEEE_8021QAZ_MAX_TCS]; u8 tc_group[IEEE_8021QAZ_MAX_TCS]; int max_tc = mlx5_max_tc(mdev); - int err; + int err, i; mlx5e_build_tc_group(ets, tc_group, max_tc); mlx5e_build_tc_tx_bw(ets, tc_tx_bw, tc_group, max_tc); @@ -260,6 +260,14 @@ int mlx5e_dcbnl_ieee_setets_core(struct mlx5e_priv *priv, struct ieee_ets *ets) return err; memcpy(priv->dcbx.tc_tsa, ets->tc_tsa, sizeof(ets->tc_tsa)); + + for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) { + mlx5e_dbg(HW, priv, "%s: prio_%d <=> tc_%d\n", + __func__, i, ets->prio_tc[i]); + mlx5e_dbg(HW, priv, "%s: tc_%d <=> tx_bw_%d%%, group_%d\n", + __func__, i, tc_tx_bw[i], tc_group[i]); + } + return err; } @@ -345,6 +353,11 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev, ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en); mlx5_toggle_port_link(mdev); + if (!ret) { + mlx5e_dbg(HW, priv, + "%s: PFC per priority bit mask: 0x%x\n", + __func__, pfc->pfc_en); + } return ret; } @@ -560,6 +573,11 @@ static int mlx5e_dcbnl_ieee_setmaxrate(struct net_device *netdev, } } + for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) { + mlx5e_dbg(HW, priv, "%s: tc_%d <=> max_bw %d Gbps\n", + __func__, i, max_bw_value[i]); + } + return mlx5_modify_port_ets_rate_limit(mdev, max_bw_value, max_bw_unit); } @@ -585,6 +603,10 @@ static u8 mlx5e_dcbnl_setall(struct net_device *netdev) ets.tc_rx_bw[i] = cee_cfg->pg_bw_pct[i]; ets.tc_tsa[i] = IEEE_8021QAZ_TSA_ETS; ets.prio_tc[i] = cee_cfg->prio_to_pg_map[i]; + mlx5e_dbg(HW, priv, + "%s: Priority group %d: tx_bw %d, rx_bw %d, prio_tc %d\n", + __func__, i, ets.tc_tx_bw[i], ets.tc_rx_bw[i], + ets.prio_tc[i]); } err = mlx5e_dbcnl_validate_ets(netdev, &ets); -- 2.14.2
[pull request][net-next 00/12] Mellanox, mlx5 updates 2017-11-04
Hi Dave, The following series provides updates for mlx5 driver which includes dscp to priority mapping support and some other misc small changes. For extra information please see tag log below. Please Pull and let me know if ther's any problem. Thanks, Saeed. --- The following changes since commit 6ee79b6ebf6613f1c5bf2be0c3dca4e51817f2ca: Merge branch 'net-mini_Qdisc' (2017-11-03 21:57:35 +0900) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2017-11-04 for you to fetch changes up to a5b2e77eab21e56ec7bb16a0ebc8f0fb18799191: net/mlx5e: Enable CQE based moderation on TX CQ (2017-11-04 01:33:48 -0700) mlx5-updates-2017-11-04 This series includes: >From Huy, dscp to priority mapping for Ethernet packet. = First six patches enable differentiated services code point (dscp) to priority mapping for Ethernet packet. Once this feature is enabled, the packet is routed to the corresponding priority based on its dscp. User can combine this feature with priority flow control (pfc) feature to have priority flow control based on the dscp. Firmware interface: Mellanox firmware provides two control knobs for this feature: QPTS register allow changing the trust state between dscp and pcp mode. The default is pcp mode. Once in dscp mode, firmware will route the packet based on its dscp value if the dscp field exists. QPDPM register allow mapping a specific dscp (0 to 63) to a specific priority (0 to 7). By default, all the dscps are mapped to priority zero. Software interface: This feature is controlled via application priority TLV. IEEE specification P802.1Qcd/D2.1 defines priority selector id 5 for application priority TLV. This APP TLV selector defines DSCP to priority map. This APP TLV can be sent by the switch or can be set locally using software such as lldptool. In mlx5 drivers, we add the support for net dcb's getapp and setapp call back. Mlx5 driver only handles the selector id 5 application entry (dscp application priority application entry). If user sends multiple dscp to priority APP TLV entries on the same dscp, the last sent one will take effect. All the previous sent will be deleted. The firmware trust state (in QPTS register) is changed based on the number of dscp to priority application entries. When the first dscp to priority application entry is added by the user, the trust state is changed to dscp. When the last dscp to priority application entry is deleted by the user, the trust state is changed to pcp. When the port is in DSCP trust state, the transmit queue is selected based on the dscp of the skb. When the port is in DSCP trust state and vport inline mode is not NONE, firmware requires mlx5 driver to copy the IP header to the wqe ethernet segment inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. == Plus to the dscp series, some small misc changes are include as well: >From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic >From Or Gerlitz, Enlarge the NIC TC offload table size >From Rabie, Initialize destination_flow struct to 0 >From Feras, Add inner TTC table to IPoIB flow steering >From Tal, Enable CQE based moderation on TX CQ Thanks, Saeed. Feras Daoud (1): net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering Gal Pressman (1): net/mlx5e: Add support for ethtool msglvl support Huy Nguyen (6): net/dcb: Add dscp to priority selector type net/mlx5: QCAM register firmware command support net/mlx5: Add MLX5_SET16 and MLX5_GET16 net/mlx5: QPTS and QPDPM register firmware command support net/mlx5e: Add dcbnl dscp to priority support net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ Inbar Karmy (1): net/mlx5e: DCBNL, Add debug messages log Or Gerlitz (1): net/mlx5: Enlarge the NIC TC offload table size Rabie Loulou (1): net/mlx5: Initialize destination_flow struct to 0 Tal Gilboa (1): net/mlx5e: Enable CQE based moderation on TX CQ drivers/net/ethernet/mellanox/mlx5/core/en.h | 39 ++- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 10 +- .../net/ethernet/mellanox/mlx5/core/en_common.c| 12 + drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 265 - .../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 52 +++- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 12 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 59 +++-- drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 8 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|
[net-next 01/12] net/dcb: Add dscp to priority selector type
From: Huy Nguyen IEEE specification P802.1Qcd/D2.1 defines priority selector 5. This APP TLV selector defines DSCP to priority map. This patch defines such DSCP selector. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Signed-off-by: Saeed Mahameed --- include/uapi/linux/dcbnl.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h index 3ea470f35e40..16e45c0ecd2f 100644 --- a/include/uapi/linux/dcbnl.h +++ b/include/uapi/linux/dcbnl.h @@ -205,6 +205,7 @@ struct cee_pfc { #define IEEE_8021QAZ_APP_SEL_STREAM2 #define IEEE_8021QAZ_APP_SEL_DGRAM 3 #define IEEE_8021QAZ_APP_SEL_ANY 4 +#define IEEE_8021QAZ_APP_SEL_DSCP 5 /* This structure contains the IEEE 802.1Qaz APP managed object. This * object is also used for the CEE std as well. -- 2.14.2
[net-next 10/12] net/mlx5: Initialize destination_flow struct to 0
From: Rabie Loulou This is needed in order to enlarge it with more members that will get value of 0 when not set. Signed-off-by: Rabie Loulou Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 10 +- drivers/net/ethernet/mellanox/mlx5/core/en_fs.c| 8 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 6 +++--- drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 4 ++-- 5 files changed, 15 insertions(+), 15 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c index 12d3ced61114..610d485c4b03 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c @@ -92,7 +92,7 @@ static enum mlx5e_traffic_types arfs_get_tt(enum arfs_type type) static int arfs_disable(struct mlx5e_priv *priv) { - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; struct mlx5e_tir *tir = priv->indir_tir; int err = 0; int tt; @@ -126,7 +126,7 @@ int mlx5e_arfs_disable(struct mlx5e_priv *priv) int mlx5e_arfs_enable(struct mlx5e_priv *priv) { - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; int err = 0; int tt; int i; @@ -175,7 +175,7 @@ static int arfs_add_default_rule(struct mlx5e_priv *priv, { struct arfs_table *arfs_t = &priv->fs.arfs.arfs_tables[type]; struct mlx5e_tir *tir = priv->indir_tir; - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; MLX5_DECLARE_FLOW_ACT(flow_act); struct mlx5_flow_spec *spec; enum mlx5e_traffic_types tt; @@ -466,7 +466,7 @@ static struct mlx5_flow_handle *arfs_add_rule(struct mlx5e_priv *priv, struct mlx5e_arfs_tables *arfs = &priv->fs.arfs; struct arfs_tuple *tuple = &arfs_rule->tuple; struct mlx5_flow_handle *rule = NULL; - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; MLX5_DECLARE_FLOW_ACT(flow_act); struct arfs_table *arfs_table; struct mlx5_flow_spec *spec; @@ -557,7 +557,7 @@ static struct mlx5_flow_handle *arfs_add_rule(struct mlx5e_priv *priv, static void arfs_modify_rule_rq(struct mlx5e_priv *priv, struct mlx5_flow_handle *rule, u16 rxq) { - struct mlx5_flow_destination dst; + struct mlx5_flow_destination dst = {}; int err = 0; dst.type = MLX5_FLOW_DESTINATION_TYPE_TIR; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index 850cdc980ab5..8016c8aa946d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -162,7 +162,7 @@ static int __mlx5e_add_vlan_rule(struct mlx5e_priv *priv, u16 vid, struct mlx5_flow_spec *spec) { struct mlx5_flow_table *ft = priv->fs.vlan.ft.t; - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; struct mlx5_flow_handle **rule_p; MLX5_DECLARE_FLOW_ACT(flow_act); int err = 0; @@ -738,7 +738,7 @@ mlx5e_generate_ttc_rule(struct mlx5e_priv *priv, static int mlx5e_generate_ttc_table_rules(struct mlx5e_priv *priv) { - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; struct mlx5e_ttc_table *ttc; struct mlx5_flow_handle **rules; struct mlx5_flow_table *ft; @@ -909,7 +909,7 @@ mlx5e_generate_inner_ttc_rule(struct mlx5e_priv *priv, static int mlx5e_generate_inner_ttc_table_rules(struct mlx5e_priv *priv) { - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; struct mlx5_flow_handle **rules; struct mlx5e_ttc_table *ttc; struct mlx5_flow_table *ft; @@ -1106,7 +1106,7 @@ static int mlx5e_add_l2_flow_rule(struct mlx5e_priv *priv, struct mlx5e_l2_rule *ai, int type) { struct mlx5_flow_table *ft = priv->fs.l2.ft.t; - struct mlx5_flow_destination dest; + struct mlx5_flow_destination dest = {}; MLX5_DECLARE_FLOW_ACT(flow_act); struct mlx5_flow_spec *spec; int err = 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index c77f4c0c7769..bbb140f517c4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -157,7 +157,7 @@ __esw_fdb_set_vport_rule(struct mlx5_eswitch *esw, u32 vport, bool rx_rule, MLX5_MATCH_OUTER_HEADERS); struct mlx5_flow_handle *flow_rule = NULL; struct mlx5_flow_act flow_act = {0
[net-next 02/12] net/mlx5: QCAM register firmware command support
From: Huy Nguyen The QCAM register provides capability bit for all the QoS registers using ACCESS_REG command. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 10 ++ .../net/ethernet/mellanox/mlx5/core/mlx5_core.h| 2 ++ drivers/net/ethernet/mellanox/mlx5/core/port.c | 12 +++ include/linux/mlx5/device.h| 14 include/linux/mlx5/driver.h| 2 ++ include/linux/mlx5/mlx5_ifc.h | 40 +- 6 files changed, 79 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c index 2c71557d1cee..5ef1b56b6a96 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -106,6 +106,13 @@ static int mlx5_get_mcam_reg(struct mlx5_core_dev *dev) MLX5_MCAM_REGS_FIRST_128); } +static int mlx5_get_qcam_reg(struct mlx5_core_dev *dev) +{ + return mlx5_query_qcam_reg(dev, dev->caps.qcam, + MLX5_QCAM_FEATURE_ENHANCED_FEATURES, + MLX5_QCAM_REGS_FIRST_128); +} + int mlx5_query_hca_caps(struct mlx5_core_dev *dev) { int err; @@ -182,6 +189,9 @@ int mlx5_query_hca_caps(struct mlx5_core_dev *dev) if (MLX5_CAP_GEN(dev, mcam_reg)) mlx5_get_mcam_reg(dev); + if (MLX5_CAP_GEN(dev, qcam_reg)) + mlx5_get_qcam_reg(dev); + return 0; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h index 8f00de2fe283..ff4a0b889a6f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h @@ -122,6 +122,8 @@ int mlx5_query_pcam_reg(struct mlx5_core_dev *dev, u32 *pcam, u8 feature_group, u8 access_reg_group); int mlx5_query_mcam_reg(struct mlx5_core_dev *dev, u32 *mcap, u8 feature_group, u8 access_reg_group); +int mlx5_query_qcam_reg(struct mlx5_core_dev *mdev, u32 *qcam, + u8 feature_group, u8 access_reg_group); void mlx5_lag_add(struct mlx5_core_dev *dev, struct net_device *netdev); void mlx5_lag_remove(struct mlx5_core_dev *dev); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/port.c b/drivers/net/ethernet/mellanox/mlx5/core/port.c index e07061f565d6..b6553be841f9 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/port.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/port.c @@ -98,6 +98,18 @@ int mlx5_query_mcam_reg(struct mlx5_core_dev *dev, u32 *mcam, u8 feature_group, return mlx5_core_access_reg(dev, in, sz, mcam, sz, MLX5_REG_MCAM, 0, 0); } +int mlx5_query_qcam_reg(struct mlx5_core_dev *mdev, u32 *qcam, + u8 feature_group, u8 access_reg_group) +{ + u32 in[MLX5_ST_SZ_DW(qcam_reg)] = {}; + int sz = MLX5_ST_SZ_BYTES(qcam_reg); + + MLX5_SET(qcam_reg, in, feature_group, feature_group); + MLX5_SET(qcam_reg, in, access_reg_group, access_reg_group); + + return mlx5_core_access_reg(mdev, in, sz, qcam, sz, MLX5_REG_QCAM, 0, 0); +} + struct mlx5_reg_pcap { u8 rsvd0; u8 port_num; diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index e32dbc4934db..6d79b3f79458 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1000,6 +1000,14 @@ enum mlx5_mcam_feature_groups { MLX5_MCAM_FEATURE_ENHANCED_FEATURES = 0x0, }; +enum mlx5_qcam_reg_groups { + MLX5_QCAM_REGS_FIRST_128= 0x0, +}; + +enum mlx5_qcam_feature_groups { + MLX5_QCAM_FEATURE_ENHANCED_FEATURES = 0x0, +}; + /* GET Dev Caps macros */ #define MLX5_CAP_GEN(mdev, cap) \ MLX5_GET(cmd_hca_cap, mdev->caps.hca_cur[MLX5_CAP_GENERAL], cap) @@ -1108,6 +1116,12 @@ enum mlx5_mcam_feature_groups { #define MLX5_CAP_MCAM_FEATURE(mdev, fld) \ MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_feature_cap_mask.enhanced_features.fld) +#define MLX5_CAP_QCAM_REG(mdev, fld) \ + MLX5_GET(qcam_reg, (mdev)->caps.qcam, qos_access_reg_cap_mask.reg_cap.fld) + +#define MLX5_CAP_QCAM_FEATURE(mdev, fld) \ + MLX5_GET(qcam_reg, (mdev)->caps.qcam, qos_feature_cap_mask.feature_cap.fld) + #define MLX5_CAP_FPGA(mdev, cap) \ MLX5_GET(fpga_cap, (mdev)->caps.fpga, cap) diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 08c77b7e59cb..ed5be52282ea 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -109,6 +109,7 @@ enum { enum { MLX5_REG_QETCR = 0x4005, MLX5_REG_QTCT= 0x400a, + MLX5_REG_QCAM= 0x4019, MLX5_REG_DCBX_PARAM = 0x4020, MLX5_REG_DCBX_APP
[net-next 11/12] net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering
From: Feras Daoud For supported platforms, add inner TTC flow table to enhanced IPoIB flow steering. Signed-off-by: Feras Daoud Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 +++ drivers/net/ethernet/mellanox/mlx5/core/en_fs.c | 4 ++-- drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 12 +++- 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 8c872e2e1aa0..95facdf62c77 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -1045,6 +1045,9 @@ void mlx5e_destroy_rqt(struct mlx5e_priv *priv, struct mlx5e_rqt *rqt); int mlx5e_create_ttc_table(struct mlx5e_priv *priv); void mlx5e_destroy_ttc_table(struct mlx5e_priv *priv); +int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv); +void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv); + int mlx5e_create_tis(struct mlx5_core_dev *mdev, int tc, u32 underlay_qpn, u32 *tisn); void mlx5e_destroy_tis(struct mlx5_core_dev *mdev, u32 tisn); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c index 8016c8aa946d..f0d11ad05ed2 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs.c @@ -1005,7 +1005,7 @@ static int mlx5e_create_inner_ttc_table_groups(struct mlx5e_ttc_table *ttc) return err; } -static int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv) +int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv) { struct mlx5e_ttc_table *ttc = &priv->fs.inner_ttc; struct mlx5_flow_table_attr ft_attr = {}; @@ -1041,7 +1041,7 @@ static int mlx5e_create_inner_ttc_table(struct mlx5e_priv *priv) return err; } -static void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv) +void mlx5e_destroy_inner_ttc_table(struct mlx5e_priv *priv) { struct mlx5e_ttc_table *ttc = &priv->fs.inner_ttc; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c index abf270d7f556..d2a66dc4adc6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c @@ -255,15 +255,24 @@ static int mlx5i_create_flow_steering(struct mlx5e_priv *priv) priv->netdev->hw_features &= ~NETIF_F_NTUPLE; } + err = mlx5e_create_inner_ttc_table(priv); + if (err) { + netdev_err(priv->netdev, "Failed to create inner ttc table, err=%d\n", + err); + goto err_destroy_arfs_tables; + } + err = mlx5e_create_ttc_table(priv); if (err) { netdev_err(priv->netdev, "Failed to create ttc table, err=%d\n", err); - goto err_destroy_arfs_tables; + goto err_destroy_inner_ttc_table; } return 0; +err_destroy_inner_ttc_table: + mlx5e_destroy_inner_ttc_table(priv); err_destroy_arfs_tables: mlx5e_arfs_destroy_tables(priv); @@ -273,6 +282,7 @@ static int mlx5i_create_flow_steering(struct mlx5e_priv *priv) static void mlx5i_destroy_flow_steering(struct mlx5e_priv *priv) { mlx5e_destroy_ttc_table(priv); + mlx5e_destroy_inner_ttc_table(priv); mlx5e_arfs_destroy_tables(priv); } -- 2.14.2
[net-next 06/12] net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ
From: Huy Nguyen If the port is in DSCP trust state, packets are placed in the right priority queue based on the dscp value. This is done by selecting the transmit queue based on the dscp of the skb. Until now select_queue honors priority only from the vlan header. However that is not sufficient in cases where port trust state is DSCP mode as packet might not even contain vlan header. Therefore if the port is in dscp trust state and vport's min inline mode is not NONE, copy the IP header to the eseg's inline header if the skb has it. This is done by changing the transmit queue sq's min inline mode to L3. Note that the min inline mode of sqs that belong to other features such as xdpsq, icosq are not modified. Signed-off-by: Huy Nguyen Reviewed-by: Parav Pandit Signed-off-by: Saeed Mahameed --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 + .../net/ethernet/mellanox/mlx5/core/en_common.c| 12 +++ drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 37 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 5 +-- drivers/net/ethernet/mellanox/mlx5/core/en_tx.c| 24 -- 5 files changed, 73 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index ab6f0c18850f..fae7b62d173f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -1083,4 +1083,5 @@ void mlx5e_destroy_netdev(struct mlx5e_priv *priv); void mlx5e_build_nic_params(struct mlx5_core_dev *mdev, struct mlx5e_params *params, u16 max_channels); +u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev); #endif /* __MLX5_EN_H__ */ diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index 157d02917237..784e282803db 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -171,3 +171,15 @@ int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb) return err; } + +u8 mlx5e_params_calculate_tx_min_inline(struct mlx5_core_dev *mdev) +{ + u8 min_inline_mode; + + mlx5_query_min_inline(mdev, &min_inline_mode); + if (min_inline_mode == MLX5_INLINE_MODE_NONE && + !MLX5_CAP_ETH(mdev, wqe_vlan_insert)) + min_inline_mode = MLX5_INLINE_MODE_L2; + + return min_inline_mode; +} diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c index aa59c4324159..b402d69a701b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c @@ -960,6 +960,40 @@ void mlx5e_dcbnl_delete_app(struct mlx5e_priv *priv) mlx5e_dcbnl_dscp_app(priv, DELETE); } +static void mlx5e_trust_update_tx_min_inline_mode(struct mlx5e_priv *priv, + struct mlx5e_params *params) +{ + params->tx_min_inline_mode = mlx5e_params_calculate_tx_min_inline(priv->mdev); + if (priv->dcbx_dp.trust_state == MLX5_QPTS_TRUST_DSCP && + params->tx_min_inline_mode == MLX5_INLINE_MODE_L2) + params->tx_min_inline_mode = MLX5_INLINE_MODE_IP; +} + +static void mlx5e_trust_update_sq_inline_mode(struct mlx5e_priv *priv) +{ + struct mlx5e_channels new_channels = {}; + + mutex_lock(&priv->state_lock); + + if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) + goto out; + + new_channels.params = priv->channels.params; + mlx5e_trust_update_tx_min_inline_mode(priv, &new_channels.params); + + /* Skip if tx_min_inline is the same */ + if (new_channels.params.tx_min_inline_mode == + priv->channels.params.tx_min_inline_mode) + goto out; + + if (mlx5e_open_channels(priv, &new_channels)) + goto out; + mlx5e_switch_priv_channels(priv, &new_channels, NULL); + +out: + mutex_unlock(&priv->state_lock); +} + static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state) { int err; @@ -968,6 +1002,7 @@ static int mlx5e_set_trust_state(struct mlx5e_priv *priv, u8 trust_state) if (err) return err; priv->dcbx_dp.trust_state = trust_state; + mlx5e_trust_update_sq_inline_mode(priv); return err; } @@ -996,6 +1031,8 @@ static int mlx5e_trust_initialize(struct mlx5e_priv *priv) if (err) return err; + mlx5e_trust_update_tx_min_inline_mode(priv, &priv->channels.params); + err = mlx5_query_dscp2prio(priv->mdev, priv->dcbx_dp.dscp2prio); if (err) return err; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 8633476fb536..a97ee38143aa 100644 --- a/drivers/net/ethernet/me