[PATCH v4 net-next 2/9] devlink: Add generic parameter msix_vec_per_pf_max

2018-10-03 Thread Vasundhara Volam
msix_vec_per_pf_max - This param sets the number of MSIX vectors
that the device requests from the host on driver initialization.
The value is stored in the device and applies per PF.

Cc: Jiri Pirko 
Cc: Michael Chan 
Signed-off-by: Vasundhara Volam 
---
 include/net/devlink.h | 4 ++++
 net/core/devlink.c    | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 90d8343..59be17b 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -363,6 +363,7 @@ enum devlink_param_generic_id {
DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV,
DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI,
+   DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
 
/* add new param generic ids above here*/
__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -384,6 +385,9 @@ enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_IGNORE_ARI_NAME "ignore_ari"
 #define DEVLINK_PARAM_GENERIC_IGNORE_ARI_TYPE DEVLINK_PARAM_TYPE_BOOL
 
+#define DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_NAME "msix_vec_per_pf_max"
+#define DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_TYPE DEVLINK_PARAM_TYPE_U32
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
 {  \
.id = DEVLINK_PARAM_GENERIC_ID_##_id,   \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 3349a4d..ce9fe63 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2680,6 +2680,11 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
.name = DEVLINK_PARAM_GENERIC_IGNORE_ARI_NAME,
.type = DEVLINK_PARAM_GENERIC_IGNORE_ARI_TYPE,
},
+   {
+   .id = DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
+   .name = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_NAME,
+   .type = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_TYPE,
+   },
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
1.8.3.1



[PATCH v4 net-next 4/9] bnxt_en: Use ignore_ari devlink parameter

2018-10-03 Thread Vasundhara Volam
This patch adds support for the ignore_ari generic permanent mode
devlink parameter. The parameter is disabled by default and can be
enabled using devlink param commands.

ignore_ari - If enabled, the device ignores the ARI (Alternative
Routing-ID Interpretation) capability even when the platform supports
it, and creates the same number of partitions as it would when the
platform does not support ARI.

Cc: Michael Chan 
Signed-off-by: Vasundhara Volam 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 6 ++++++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 790c684..5173881 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -24,6 +24,8 @@
 static const struct bnxt_dl_nvm_param nvm_params[] = {
{DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV, NVM_OFF_ENABLE_SRIOV,
 BNXT_NVM_SHARED_CFG, 1},
+   {DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI, NVM_OFF_IGNORE_ARI,
+BNXT_NVM_SHARED_CFG, 1},
 };
 
 static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
@@ -108,6 +110,10 @@ static int bnxt_dl_nvm_param_set(struct devlink *dl, u32 id,
  BIT(DEVLINK_PARAM_CMODE_PERMANENT),
  bnxt_dl_nvm_param_get, bnxt_dl_nvm_param_set,
  NULL),
+   DEVLINK_PARAM_GENERIC(IGNORE_ARI,
+ BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+ bnxt_dl_nvm_param_get, bnxt_dl_nvm_param_set,
+ NULL),
 };
 
 int bnxt_dl_register(struct bnxt *bp)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
index 2f68dc0..3d07c8f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
@@ -33,6 +33,7 @@ static inline void bnxt_link_bp_to_dl(struct bnxt *bp, struct devlink *dl)
}
 }
 
+#define NVM_OFF_IGNORE_ARI 164
 #define NVM_OFF_ENABLE_SRIOV   401
 
 enum bnxt_nvm_dir_type {
-- 
1.8.3.1



[PATCH v4 net-next 3/9] devlink: Add generic parameter msix_vec_per_pf_min

2018-10-03 Thread Vasundhara Volam
msix_vec_per_pf_min - This param sets the minimum number of MSIX
vectors required for device initialization. The value is stored in
the device and limits MSIX vectors per PF.

Cc: Jiri Pirko 
Cc: Michael Chan 
Signed-off-by: Vasundhara Volam 
---
 include/net/devlink.h | 4 ++++
 net/core/devlink.c    | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 59be17b..361f525 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -364,6 +364,7 @@ enum devlink_param_generic_id {
DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI,
DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
+   DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
 
/* add new param generic ids above here*/
__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -388,6 +389,9 @@ enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_NAME "msix_vec_per_pf_max"
 #define DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_TYPE DEVLINK_PARAM_TYPE_U32
 
+#define DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_NAME "msix_vec_per_pf_min"
+#define DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_TYPE DEVLINK_PARAM_TYPE_U32
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
 {  \
.id = DEVLINK_PARAM_GENERIC_ID_##_id,   \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index ce9fe63..25d3bfa 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2685,6 +2685,11 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
.name = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_NAME,
.type = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MAX_TYPE,
},
+   {
+   .id = DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
+   .name = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_NAME,
+   .type = DEVLINK_PARAM_GENERIC_MSIX_VEC_PER_PF_MIN_TYPE,
+   },
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
1.8.3.1



[PATCH v4 net-next 5/9] bnxt_en: return proper error when FW returns HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED

2018-10-03 Thread Vasundhara Volam
Return a proper error code when firmware returns
HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED for HWRM_NVM_GET/SET_VARIABLE
commands.

Cc: Michael Chan 
Signed-off-by: Vasundhara Volam 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index 5173881..dc566fd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -80,8 +80,12 @@ static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
memcpy(buf, data_addr, bytesize);
 
dma_free_coherent(&bp->pdev->dev, bytesize, data_addr, data_dma_addr);
-   if (rc)
+   if (rc == HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED) {
+   netdev_err(bp->dev, "PF does not have admin privileges to modify NVM config\n");
+   return -EACCES;
+   } else if (rc) {
return -EIO;
+   }
return 0;
 }
 
-- 
1.8.3.1
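
The error mapping in this patch can be sketched in plain userspace C. This is only an illustration: EX_FW_ERR_ACCESS_DENIED and ex_fw_rc_to_errno() are hypothetical names, and the numeric value is a placeholder rather than the real firmware constant.

```c
#include <errno.h>

/* Placeholder value; the real constant comes from the bnxt firmware headers. */
#define EX_FW_ERR_ACCESS_DENIED 5

/* Translate a firmware return code into a negative errno value. */
static int ex_fw_rc_to_errno(int fw_rc)
{
	if (fw_rc == EX_FW_ERR_ACCESS_DENIED)
		return -EACCES;	/* caller lacks the privilege to touch NVM */
	if (fw_rc)
		return -EIO;	/* any other firmware failure */
	return 0;
}
```

Keeping the access-denied case distinct lets userspace tell a permission problem apart from a generic I/O failure.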



[PATCH v4 net-next 6/9] bnxt_en: Use msix_vec_per_pf_max and msix_vec_per_pf_min devlink params.

2018-10-03 Thread Vasundhara Volam
This patch adds support for the following generic permanent mode
devlink parameters. They can be modified using devlink param
commands.

msix_vec_per_pf_max - This param sets the number of MSIX vectors
that the device requests from the host on driver initialization.
The value is stored in the device and limits MSIX vectors per PF.

msix_vec_per_pf_min - This param sets the minimum number of MSIX
vectors required for device initialization. A value of 0 indicates
that a default value is selected. The value is stored in the device
and limits MSIX vectors per PF.

Cc: Michael Chan 
Signed-off-by: Vasundhara Volam 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 50 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  5 +++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index dc566fd..de7e74a 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -26,6 +26,10 @@
 BNXT_NVM_SHARED_CFG, 1},
{DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI, NVM_OFF_IGNORE_ARI,
 BNXT_NVM_SHARED_CFG, 1},
+   {DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX,
+NVM_OFF_MSIX_VEC_PER_PF_MAX, BNXT_NVM_SHARED_CFG, 10},
+   {DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
+NVM_OFF_MSIX_VEC_PER_PF_MIN, BNXT_NVM_SHARED_CFG, 7},
 };
 
 static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
@@ -57,8 +61,22 @@ static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
idx = bp->pf.fw_fid - BNXT_FIRST_PF_FID;
 
bytesize = roundup(nvm_param.num_bits, BITS_PER_BYTE) / BITS_PER_BYTE;
-   if (nvm_param.num_bits == 1)
-   buf = &val->vbool;
+   switch (bytesize) {
+   case 1:
+   if (nvm_param.num_bits == 1)
+   buf = &val->vbool;
+   else
+   buf = &val->vu8;
+   break;
+   case 2:
+   buf = &val->vu16;
+   break;
+   case 4:
+   buf = &val->vu32;
+   break;
+   default:
+   return -EFAULT;
+   }
 
data_addr = dma_zalloc_coherent(&bp->pdev->dev, bytesize,
&data_dma_addr, GFP_KERNEL);
@@ -109,6 +127,26 @@ static int bnxt_dl_nvm_param_set(struct devlink *dl, u32 id,
return bnxt_hwrm_nvm_req(bp, id, &req, sizeof(req), &ctx->val);
 }
 
+static int bnxt_dl_msix_validate(struct devlink *dl, u32 id,
+union devlink_param_value val,
+struct netlink_ext_ack *extack)
+{
+   u32 max_val = 0;
+
+   if (id == DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MAX)
+   max_val = BNXT_MSIX_VEC_MAX;
+
+   if (id == DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN)
+   max_val = BNXT_MSIX_VEC_MIN_MAX;
+
+   if (val.vu32 > max_val) {
+   NL_SET_ERR_MSG_MOD(extack, "MSIX value is exceeding the range");
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 static const struct devlink_param bnxt_dl_params[] = {
DEVLINK_PARAM_GENERIC(ENABLE_SRIOV,
  BIT(DEVLINK_PARAM_CMODE_PERMANENT),
@@ -118,6 +156,14 @@ static int bnxt_dl_nvm_param_set(struct devlink *dl, u32 id,
  BIT(DEVLINK_PARAM_CMODE_PERMANENT),
  bnxt_dl_nvm_param_get, bnxt_dl_nvm_param_set,
  NULL),
+   DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MAX,
+ BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+ bnxt_dl_nvm_param_get, bnxt_dl_nvm_param_set,
+ bnxt_dl_msix_validate),
+   DEVLINK_PARAM_GENERIC(MSIX_VEC_PER_PF_MIN,
+ BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+ bnxt_dl_nvm_param_get, bnxt_dl_nvm_param_set,
+ bnxt_dl_msix_validate),
 };
 
 int bnxt_dl_register(struct bnxt *bp)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
index 3d07c8f..2bfd082 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
@@ -33,9 +33,14 @@ static inline void bnxt_link_bp_to_dl(struct bnxt *bp, struct devlink *dl)
}
 }
 
+#define NVM_OFF_MSIX_VEC_PER_PF_MAX108
+#define NVM_OFF_MSIX_VEC_PER_PF_MIN114
 #define NVM_OFF_IGNORE_ARI 164
 #define NVM_OFF_ENABLE_SRIOV   401
 
+#define BNXT_MSIX_VEC_MAX  1280
+#define BNXT_MSIX_VEC_MIN_MAX  128
+
 enum bnxt_nvm_dir_type {
BNXT_NVM_SHARED_CFG = 40,
BNXT_NVM_PORT_CFG,
-- 
1.8.3.1
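
The buffer-size selection and range check added by this patch can be modelled outside the kernel as follows. This is a rough sketch, not the driver code: the ex_ helpers are hypothetical, and the field widths simply mirror the switch statement in the diff above.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_BITS_PER_BYTE 8

/* Bytes needed to hold num_bits, i.e. roundup(num_bits, 8) / 8. */
static size_t ex_bytesize(unsigned int num_bits)
{
	return (num_bits + EX_BITS_PER_BYTE - 1) / EX_BITS_PER_BYTE;
}

/* Width in bytes of the value field that would be used, 0 if unsupported. */
static size_t ex_field_width(unsigned int num_bits)
{
	switch (ex_bytesize(num_bits)) {
	case 1:
		return 1;	/* vbool for 1 bit, vu8 otherwise */
	case 2:
		return 2;	/* vu16 */
	case 4:
		return 4;	/* vu32 */
	default:
		return 0;	/* e.g. 3 bytes: the patch returns an error here */
	}
}

/* Non-zero when val does not exceed the per-parameter upper bound. */
static int ex_msix_in_range(uint32_t val, uint32_t max_val)
{
	return val <= max_val;
}
```

With the bit widths from nvm_params[], the 10-bit msix_vec_per_pf_max entry needs two bytes and the 7-bit msix_vec_per_pf_min entry needs one.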



[PATCH v4 net-next 8/9] devlink: Add Documentation/networking/devlink-params.txt

2018-10-03 Thread Vasundhara Volam
This patch adds a new file documenting some of the generic
configuration parameters set via devlink.

Cc: "David S. Miller" 
Cc: Jonathan Corbet 
Cc: linux-...@vger.kernel.org
Cc: Jiri Pirko 
Cc: Michael Chan 
Signed-off-by: Vasundhara Volam 
---
 Documentation/networking/devlink-params.txt | 42 +
 1 file changed, 42 insertions(+)
 create mode 100644 Documentation/networking/devlink-params.txt

diff --git a/Documentation/networking/devlink-params.txt b/Documentation/networking/devlink-params.txt
new file mode 100644
index 000..ae444ff
--- /dev/null
+++ b/Documentation/networking/devlink-params.txt
@@ -0,0 +1,42 @@
+Devlink configuration parameters
+
+Following is the list of configuration parameters exposed via the devlink
+interface. Each parameter is either generic or driver-specific, and all
+are device-level parameters.
+
+Note that the driver-specific files should list the generic params
+they support, along with the supported config modes.
+
+Each parameter can be set in different configuration modes:
+   runtime - set while the driver is running, no reset required.
+   driverinit  - applied while the driver initializes, requires a
+   driver restart via the devlink reload command.
+   permanent   - written to the device's non-volatile memory, hard
+   reset required.
+
+Following is the list of parameters:
+
+enable_sriov   [DEVICE, GENERIC]
+   Enable Single Root I/O Virtualisation (SRIOV) in
+   the device.
+   Type: Boolean
+
+ignore_ari [DEVICE, GENERIC]
+   Ignore Alternative Routing-ID Interpretation (ARI)
+   capability. If enabled, the adapter ignores the ARI
+   capability even when the platform has the support
+   enabled, and creates the same number of partitions
+   as when the platform does not support ARI.
+   Type: Boolean
+
+msix_vec_per_pf_max[DEVICE, GENERIC]
+   Provides the maximum number of MSIX interrupts that
+   a device can create. The value is the same across all
+   physical functions (PFs) in the device.
+   Type: u32
+
+msix_vec_per_pf_min[DEVICE, GENERIC]
+   Provides the minimum number of MSIX interrupts required
+   for device initialization. The value is the same across
+   all physical functions (PFs) in the device.
+   Type: u32
-- 
1.8.3.1
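
The three configuration modes documented above are passed by drivers as a bitmask, as in the BIT(DEVLINK_PARAM_CMODE_PERMANENT) arguments used in the bnxt_en patches of this series. A small sketch of that idea, using hypothetical ex_ names rather than the kernel's enum:

```c
/* Hypothetical stand-ins for the runtime/driverinit/permanent modes. */
enum ex_cmode {
	EX_CMODE_RUNTIME,
	EX_CMODE_DRIVERINIT,
	EX_CMODE_PERMANENT,
};

#define EX_BIT(n) (1UL << (n))

/* Non-zero when mode m is present in the driver's supported-modes mask. */
static int ex_cmode_supported(unsigned long supported, enum ex_cmode m)
{
	return (supported & EX_BIT(m)) != 0;
}
```

A parameter registered with only EX_BIT(EX_CMODE_PERMANENT) would therefore be rejected for a runtime or driverinit request in this model.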



[PATCH v4 net-next 9/9] devlink: Add Documentation/networking/devlink-params-bnxt.txt

2018-10-03 Thread Vasundhara Volam
This patch adds a new file documenting the configuration parameters
that the bnxt_en driver supports via devlink.

Cc: "David S. Miller" 
Cc: Jonathan Corbet 
Cc: linux-...@vger.kernel.org
Cc: Jiri Pirko 
Cc: Michael Chan 
Signed-off-by: Vasundhara Volam 
---
 Documentation/networking/devlink-params-bnxt.txt | 18 ++
 1 file changed, 18 insertions(+)
 create mode 100644 Documentation/networking/devlink-params-bnxt.txt

diff --git a/Documentation/networking/devlink-params-bnxt.txt b/Documentation/networking/devlink-params-bnxt.txt
new file mode 100644
index 000..481aa30
--- /dev/null
+++ b/Documentation/networking/devlink-params-bnxt.txt
@@ -0,0 +1,18 @@
+enable_sriov   [DEVICE, GENERIC]
+   Configuration mode: Permanent
+
+ignore_ari [DEVICE, GENERIC]
+   Configuration mode: Permanent
+
+msix_vec_per_pf_max[DEVICE, GENERIC]
+   Configuration mode: Permanent
+
+msix_vec_per_pf_min[DEVICE, GENERIC]
+   Configuration mode: Permanent
+
+gre_ver_check  [DEVICE, DRIVER-SPECIFIC]
+   Enables the Generic Routing Encapsulation (GRE) version
+   check in the device. If disabled, the device skips
+   version checking for incoming packets.
+   Type: Boolean
+   Configuration mode: Permanent
-- 
1.8.3.1



[PATCH v4 net-next 1/9] devlink: Add generic parameter ignore_ari

2018-10-03 Thread Vasundhara Volam
ignore_ari - The device ignores the ARI (Alternative Routing-ID
Interpretation) capability even when the platform supports it, and
creates the same number of partitions as it would when the platform
does not support ARI.

Cc: Jiri Pirko 
Cc: Michael Chan 
Signed-off-by: Vasundhara Volam 
---
 include/net/devlink.h | 4 ++++
 net/core/devlink.c    | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index b9b89d6..90d8343 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -362,6 +362,7 @@ enum devlink_param_generic_id {
DEVLINK_PARAM_GENERIC_ID_MAX_MACS,
DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV,
DEVLINK_PARAM_GENERIC_ID_REGION_SNAPSHOT,
+   DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI,
 
/* add new param generic ids above here*/
__DEVLINK_PARAM_GENERIC_ID_MAX,
@@ -380,6 +381,9 @@ enum devlink_param_generic_id {
 #define DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME "region_snapshot_enable"
 #define DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE DEVLINK_PARAM_TYPE_BOOL
 
+#define DEVLINK_PARAM_GENERIC_IGNORE_ARI_NAME "ignore_ari"
+#define DEVLINK_PARAM_GENERIC_IGNORE_ARI_TYPE DEVLINK_PARAM_TYPE_BOOL
+
 #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate) \
 {  \
.id = DEVLINK_PARAM_GENERIC_ID_##_id,   \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 8c0ed22..3349a4d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -2675,6 +2675,11 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
.name = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_NAME,
.type = DEVLINK_PARAM_GENERIC_REGION_SNAPSHOT_TYPE,
},
+   {
+   .id = DEVLINK_PARAM_GENERIC_ID_IGNORE_ARI,
+   .name = DEVLINK_PARAM_GENERIC_IGNORE_ARI_NAME,
+   .type = DEVLINK_PARAM_GENERIC_IGNORE_ARI_TYPE,
+   },
 };
 
 static int devlink_param_generic_verify(const struct devlink_param *param)
-- 
1.8.3.1



[PATCH v4 net-next 7/9] bnxt_en: Add a driver specific gre_ver_check devlink parameter.

2018-10-03 Thread Vasundhara Volam
This patch adds the following driver-specific permanent mode boolean
parameter.

gre_ver_check - Enables the Generic Routing Encapsulation (GRE) version
check in the device. If disabled, the device skips version checking
for GRE packets.

Cc: Michael Chan 
Signed-off-by: Vasundhara Volam 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 24 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  1 +
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
index de7e74a..8a10e01 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c
@@ -21,6 +21,11 @@
 #endif /* CONFIG_BNXT_SRIOV */
 };
 
+enum bnxt_dl_param_id {
+   BNXT_DEVLINK_PARAM_ID_BASE = DEVLINK_PARAM_GENERIC_ID_MAX,
+   BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK,
+};
+
 static const struct bnxt_dl_nvm_param nvm_params[] = {
{DEVLINK_PARAM_GENERIC_ID_ENABLE_SRIOV, NVM_OFF_ENABLE_SRIOV,
 BNXT_NVM_SHARED_CFG, 1},
@@ -30,6 +35,8 @@
 NVM_OFF_MSIX_VEC_PER_PF_MAX, BNXT_NVM_SHARED_CFG, 10},
{DEVLINK_PARAM_GENERIC_ID_MSIX_VEC_PER_PF_MIN,
 NVM_OFF_MSIX_VEC_PER_PF_MIN, BNXT_NVM_SHARED_CFG, 7},
+   {BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK, NVM_OFF_DIS_GRE_VER_CHECK,
+BNXT_NVM_SHARED_CFG, 1},
 };
 
 static int bnxt_hwrm_nvm_req(struct bnxt *bp, u32 param_id, void *msg,
@@ -112,9 +119,15 @@ static int bnxt_dl_nvm_param_get(struct devlink *dl, u32 id,
 {
struct hwrm_nvm_get_variable_input req = {0};
struct bnxt *bp = bnxt_get_bp_from_dl(dl);
+   int rc;
 
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_NVM_GET_VARIABLE, -1, -1);
-   return bnxt_hwrm_nvm_req(bp, id, &req, sizeof(req), &ctx->val);
+   rc = bnxt_hwrm_nvm_req(bp, id, &req, sizeof(req), &ctx->val);
+   if (!rc)
+   if (id == BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK)
+   ctx->val.vbool = !ctx->val.vbool;
+
+   return rc;
 }
 
 static int bnxt_dl_nvm_param_set(struct devlink *dl, u32 id,
@@ -124,6 +137,10 @@ static int bnxt_dl_nvm_param_set(struct devlink *dl, u32 id,
struct bnxt *bp = bnxt_get_bp_from_dl(dl);
 
bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_NVM_SET_VARIABLE, -1, -1);
+
+   if (id == BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK)
+   ctx->val.vbool = !ctx->val.vbool;
+
return bnxt_hwrm_nvm_req(bp, id, &req, sizeof(req), &ctx->val);
 }
 
@@ -164,6 +181,11 @@ static int bnxt_dl_msix_validate(struct devlink *dl, u32 id,
  BIT(DEVLINK_PARAM_CMODE_PERMANENT),
  bnxt_dl_nvm_param_get, bnxt_dl_nvm_param_set,
  bnxt_dl_msix_validate),
+   DEVLINK_PARAM_DRIVER(BNXT_DEVLINK_PARAM_ID_GRE_VER_CHECK,
+"gre_ver_check", DEVLINK_PARAM_TYPE_BOOL,
+BIT(DEVLINK_PARAM_CMODE_PERMANENT),
+bnxt_dl_nvm_param_get, bnxt_dl_nvm_param_set,
+NULL),
 };
 
 int bnxt_dl_register(struct bnxt *bp)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
index 2bfd082..5b6b2c7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h
@@ -36,6 +36,7 @@ static inline void bnxt_link_bp_to_dl(struct bnxt *bp, struct devlink *dl)
 #define NVM_OFF_MSIX_VEC_PER_PF_MAX108
 #define NVM_OFF_MSIX_VEC_PER_PF_MIN114
 #define NVM_OFF_IGNORE_ARI 164
+#define NVM_OFF_DIS_GRE_VER_CHECK  171
 #define NVM_OFF_ENABLE_SRIOV   401
 
 #define BNXT_MSIX_VEC_MAX  1280
-- 
1.8.3.1
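
The NVM offset in this patch is named NVM_OFF_DIS_GRE_VER_CHECK, which suggests the stored bit means "check disabled" while the devlink parameter means "check enabled". That would explain the negation in both the get and set paths. A minimal sketch under that assumption, with a simulated NVM bit and hypothetical ex_ names:

```c
#include <stdbool.h>

/* Simulated NVM bit: true means the GRE version check is disabled. */
static bool ex_nvm_dis_gre_ver_check;

/* User-facing view: true means the check is enabled. */
static bool ex_gre_ver_check_get(void)
{
	return !ex_nvm_dis_gre_ver_check;
}

/* Store the user-facing value in its inverted NVM form. */
static void ex_gre_ver_check_set(bool enable)
{
	ex_nvm_dis_gre_ver_check = !enable;
}
```

Inverting in both directions keeps a set followed by a get returning the value the user wrote.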



[PATCH v4 net-next 0/9] bnxt_en: devlink param updates

2018-10-03 Thread Vasundhara Volam
This patchset adds support for 3 generic and 1 driver-specific devlink
parameters, and adds documentation for these configuration parameters.

It also returns a proper error code when the HWRM_NVM_GET/SET_VARIABLE
commands fail with HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED.

v3->v4:
-Remove extra definition of NVM_OFF_HW_TC_OFFLOAD from bnxt_devlink.h
-Remove type information for generic parameters from
devlink-params-bnxt.txt

v2->v3:
-Remove description of generic parameters from devlink-params-bnxt.txt

v1->v2:
-Remove hw_tc_offload parameter.
-Update all patches with Cc of MAINTAINERS.
-Add more description in commit message for device specific parameter.
-Add a new Documentation/networking/devlink-params.txt with some
generic devlink parameters information.
-Add a new Documentation/networking/devlink-params-bnxt.txt with devlink
parameters information that are supported by bnxt_en driver.

Vasundhara Volam (9):
  devlink: Add generic parameter ignore_ari
  devlink: Add generic parameter msix_vec_per_pf_max
  devlink: Add generic parameter msix_vec_per_pf_min
  bnxt_en: Use ignore_ari devlink parameter
  bnxt_en: return proper error when FW returns
HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED
  bnxt_en: Use msix_vec_per_pf_max and msix_vec_per_pf_min devlink
params.
  bnxt_en: Add a driver specific gre_ver_check devlink parameter.
  devlink: Add Documentation/networking/devlink-params.txt
  devlink: Add Documentation/networking/devlink-params-bnxt.txt

 Documentation/networking/devlink-params-bnxt.txt  | 18 +
 Documentation/networking/devlink-params.txt   | 42 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c | 86 +--
 drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.h |  7 ++
 include/net/devlink.h | 12 
 net/core/devlink.c| 15 
 6 files changed, 176 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/networking/devlink-params-bnxt.txt
 create mode 100644 Documentation/networking/devlink-params.txt

-- 
1.8.3.1



[RFC 2/2] nfp: register remote block callbacks for vxlan/geneve

2018-10-03 Thread Jakub Kicinski
From: John Hurley 

Test stub to illustrate how the NFP could register for and receive
callbacks from remote block setups.

Signed-off-by: John Hurley 
Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/flower/main.c  |  12 ++
 .../net/ethernet/netronome/nfp/flower/main.h  |  10 ++
 .../ethernet/netronome/nfp/flower/offload.c   | 156 ++
 .../netronome/nfp/flower/tunnel_conf.c|   8 +
 4 files changed, 186 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.c b/drivers/net/ethernet/netronome/nfp/flower/main.c
index e57d23746585..34b0c3602ab2 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.c
@@ -587,8 +587,17 @@ static int nfp_flower_init(struct nfp_app *app)
goto err_cleanup_metadata;
}
 
+   app_priv->indir_cb_owner = tc_indr_block_owner_create();
+   if (!app_priv->indir_cb_owner)
+   goto err_cleanup_lag;
+
+   INIT_LIST_HEAD(&app_priv->nfp_indr_block_cb_list);
+
return 0;
 
+err_cleanup_lag:
+   if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
+   nfp_flower_lag_cleanup(&app_priv->nfp_lag);
 err_cleanup_metadata:
nfp_flower_metadata_cleanup(app);
 err_free_app_priv:
@@ -607,6 +616,9 @@ static void nfp_flower_clean(struct nfp_app *app)
if (app_priv->flower_ext_feats & NFP_FL_FEATS_LAG)
nfp_flower_lag_cleanup(&app_priv->nfp_lag);
 
+   tc_indr_block_owner_clean(app_priv->indir_cb_owner);
+   nfp_flower_clean_indr_block_cbs(app_priv);
+
nfp_flower_metadata_cleanup(app);
vfree(app->priv);
app->priv = NULL;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h
index 81d941ab895c..5f27318ecdbd 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/main.h
+++ b/drivers/net/ethernet/netronome/nfp/flower/main.h
@@ -161,6 +161,7 @@ struct nfp_fl_lag {
  * @reify_wait_queue:  wait queue for repr reify response counting
  * @mtu_conf:  Configuration of repr MTU value
  * @nfp_lag:   Link aggregation data block
+ * @indir_cb_owner:Master structure for indirect TC block callback
  */
 struct nfp_flower_priv {
struct nfp_app *app;
@@ -191,6 +192,8 @@ struct nfp_flower_priv {
wait_queue_head_t reify_wait_queue;
struct nfp_mtu_conf mtu_conf;
struct nfp_fl_lag nfp_lag;
+   struct list_head nfp_indr_block_cb_list;
+   struct tcf_indr_block_owner *indir_cb_owner;
 };
 
 /**
@@ -293,5 +296,12 @@ int nfp_flower_lag_populate_pre_action(struct nfp_app *app,
   struct nfp_fl_pre_lag *pre_act);
 int nfp_flower_lag_get_output_id(struct nfp_app *app,
 struct net_device *master);
+void
+nfp_flower_register_indr_block(struct nfp_flower_priv *app_priv,
+  struct net_device *netdev);
+void
+nfp_flower_unregister_indr_block(struct nfp_flower_priv *app_priv,
+struct net_device *netdev);
+void nfp_flower_clean_indr_block_cbs(struct nfp_flower_priv *app_priv);
 
 #endif
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index bd19624f10cf..14f1b91b7b90 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -707,3 +707,159 @@ int nfp_flower_setup_tc(struct nfp_app *app, struct net_device *netdev,
return -EOPNOTSUPP;
}
 }
+
+struct indr_block_cb_priv {
+   struct net_device *netdev;
+   struct nfp_flower_priv *app_priv;
+   struct list_head list;
+};
+
+static struct indr_block_cb_priv *
+indr_block_cb_priv_lookup(struct nfp_flower_priv *app_priv,
+ struct net_device *netdev)
+{
+   struct indr_block_cb_priv *cb_priv;
+
+   /* All callback list access should be protected by RTNL. */
+   ASSERT_RTNL();
+
+   list_for_each_entry(cb_priv, &app_priv->nfp_indr_block_cb_list, list)
+   if (cb_priv->netdev == netdev)
+   return cb_priv;
+
+   return NULL;
+}
+
+void nfp_flower_clean_indr_block_cbs(struct nfp_flower_priv *app_priv)
+{
+   struct indr_block_cb_priv *cb_priv, *temp;
+
+   list_for_each_entry_safe(cb_priv, temp,
+&app_priv->nfp_indr_block_cb_list, list)
+   kfree(cb_priv);
+}
+
+static int
+nfp_flower_indr_offload(struct net_device *netdev,
+   struct tc_cls_flower_offload *flower)
+{
+   if (flower->common.chain_index)
+   return -EOPNOTSUPP;
+
+   if (!eth_proto_is_802_3(flower->common.protocol))
+   return -EOPNOTSUPP;
+
+   switch (flower->command) {
+   case TC_CLSFLOWER_REPLACE:
+   netdev_info(netdev, "Flower replace\n");
+   break;
+  

[RFC 1/2] net: sched: register callbacks for remote tc block binds

2018-10-03 Thread Jakub Kicinski
From: John Hurley 

Currently drivers can register for TC block binds/unbinds by implementing
the setup_tc ndo. However, drivers may also be interested in binds to
higher level devices (e.g. tunnel drivers) to potentially offload filters
applied to them.

Introduce indirect block setups which allow drivers to register callbacks
for block binds on other devices. The calling driver is expected to
allocate a struct containing an initialised list head for all its block
setup callbacks. This is used to track the callbacks from a given driver
and free them if the driver is removed while the upper level device is
still active. Freeing a setup cb will also trigger an unbind event (if
necessary) to direct the driver to unregister any block callbacks.

Allow registering an indirect block setup cb for a device that is already
bound to a block. In this case (if it is an ingress block), register and
also trigger the callback - meaning that any already installed rules can
be replayed to the calling driver if it chooses.
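
The ownership model described above (an owner struct tracking a driver's callbacks, a replayed bind when the device is already bound, and an unbind on cleanup) can be illustrated with a small userspace model. It is a loose sketch only: the ex_ types and functions are hypothetical, devices are plain integers, and none of this reflects the actual cls_api.c implementation.

```c
#include <stdlib.h>

/* Callback invoked with bind = 1 on bind and bind = 0 on unbind. */
typedef int ex_bind_cb_t(int dev_id, void *cb_priv, int bind);

struct ex_reg {
	int dev_id;
	int bound;		/* has a bind been delivered for this entry? */
	ex_bind_cb_t *cb;
	void *cb_priv;
	struct ex_reg *next;
};

/* Per-driver owner: head of the list of that driver's registrations. */
struct ex_owner {
	struct ex_reg *head;
};

/* Register cb for dev_id; replay the bind if the device is already bound. */
static int ex_register(struct ex_owner *o, int dev_id, ex_bind_cb_t *cb,
		       void *cb_priv, int already_bound)
{
	struct ex_reg *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->dev_id = dev_id;
	r->bound = already_bound;
	r->cb = cb;
	r->cb_priv = cb_priv;
	r->next = o->head;
	o->head = r;
	if (already_bound)
		cb(dev_id, cb_priv, 1);
	return 0;
}

/* Driver removal: send an unbind where needed, then free every entry. */
static void ex_owner_clean(struct ex_owner *o)
{
	while (o->head) {
		struct ex_reg *r = o->head;

		o->head = r->next;
		if (r->bound)
			r->cb(r->dev_id, r->cb_priv, 0);
		free(r);
	}
}

/* Sample callback: cb_priv points at int[2] = { unbinds, binds }. */
static int ex_count_cb(int dev_id, void *cb_priv, int bind)
{
	int *events = cb_priv;

	(void)dev_id;
	events[bind ? 1 : 0]++;
	return 0;
}
```

In this model, registering against an already bound device immediately delivers one bind event, and cleaning the owner delivers a matching unbind before the entry is freed.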

Signed-off-by: John Hurley 
Signed-off-by: Jakub Kicinski 
---
 include/net/pkt_cls.h |  56 +++
 include/net/sch_generic.h |   3 +
 net/sched/cls_api.c   | 297 +-
 3 files changed, 355 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 338ef054bf16..85e335162982 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -37,6 +37,7 @@ struct tcf_block_ext_info {
 };
 
 struct tcf_block_cb;
+struct tcf_indr_block_owner;
 bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 
 #ifdef CONFIG_NET_CLS
@@ -81,6 +82,20 @@ void __tcf_block_cb_unregister(struct tcf_block *block,
   struct tcf_block_cb *block_cb);
 void tcf_block_cb_unregister(struct tcf_block *block,
 tc_setup_cb_t *cb, void *cb_ident);
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb, void *cb_ident,
+   struct tcf_indr_block_owner *owner);
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident,
+ struct tcf_indr_block_owner *owner);
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb, void *cb_ident);
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb,
+void *cb_ident);
+
+struct tcf_indr_block_owner *tc_indr_block_owner_create(void);
+void tc_indr_block_owner_clean(struct tcf_indr_block_owner *owner);
 
 int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
 struct tcf_result *res, bool compat_mode);
@@ -183,6 +198,47 @@ void tcf_block_cb_unregister(struct tcf_block *block,
 {
 }
 
+static inline
+int __tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+   tc_indr_block_bind_cb_t *cb,
+   void *cb_ident,
+   struct tcf_indr_block_owner *owner)
+{
+   return 0;
+}
+
+static inline
+int tc_indr_block_cb_register(struct net_device *dev, void *cb_priv,
+ tc_indr_block_bind_cb_t *cb, void *cb_ident,
+ struct tcf_indr_block_owner *owner)
+{
+   return 0;
+}
+
+static inline
+void __tc_indr_block_cb_unregister(struct net_device *dev,
+  tc_indr_block_bind_cb_t *cb,
+  void *cb_ident)
+{
+}
+
+static inline
+void tc_indr_block_cb_unregister(struct net_device *dev,
+tc_indr_block_bind_cb_t *cb,
+void *cb_ident)
+{
+}
+
+static inline struct tcf_indr_block_owner *tc_indr_block_owner_create(void)
+{
+   /* NULL would mean an error, only CONFIG_NET_CLS can dereference this */
+   return (void *)1;
+}
+
+static inline void tc_indr_block_owner_clean(struct tcf_indr_block_owner *owner)
+{
+}
+
 static inline int tcf_classify(struct sk_buff *skb, const struct tcf_proto *tp,
   struct tcf_result *res, bool compat_mode)
 {
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index de972403d31e..da73864c001c 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -24,6 +24,9 @@ struct bpf_flow_keys;
 typedef int tc_setup_cb_t(enum tc_setup_type type,
  void *type_data, void *cb_priv);
 
+typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
+  enum tc_setup_type type, void *type_data);
+
 struct qdisc_rate_table {
struct tc_ratespec rate;
u32 data[256];
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 3de47e99b78

[RFC 0/2] net: sched: indirect/remote setup tc block cb registering

2018-10-03 Thread Jakub Kicinski
Hi!

This set contains a rough RFC implementation of a proposed [1] replacement
for egdev cls_flower offloads.  I did some last minute restructuring
and removal of parts I felt were unnecessary, so if there are glaring bugs
they are probably mine, not John's :)  but hopefully this will give an idea
of the general direction.  We need to beef up the driver part to see how
it fully comes together.

[1] http://vger.kernel.org/netconf2018_files/JakubKicinski_netconf2018.pdf
slides 10-13

John says:

This patchset introduces an alternative to egdev offload by allowing a
driver to register for block updates when an external device (e.g. tunnel
netdev) is bound to a TC block. Drivers can track new netdevs or register
to existing ones to receive information on such events. Based on this,
they may register for block offload rules using already existing
functions.

Included with this RFC is a patch to the NFP driver. This is only supposed
to provide an example of how the remote block setup can be used.
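
To make the registration flow described above more concrete, here is a minimal userspace sketch. It is a toy model and not the kernel code from this RFC: the callback typedef has the same shape as the one added to sch_generic.h, but the toy_* functions, the fixed-size table, the -EEXIST/-ENOMEM return values and the omission of the owner argument are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_REGS 8

/* Minimal stand-ins for the kernel types; only what the sketch needs. */
struct net_device { int ifindex; };
enum tc_setup_type { TC_SETUP_BLOCK };

/* Same shape as the typedef this RFC adds to include/net/sch_generic.h. */
typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
				    enum tc_setup_type type, void *type_data);

/* One registration: "call cb when dev is bound to a block". */
struct toy_indr_reg {
	struct net_device *dev;
	void *cb_priv;
	tc_indr_block_bind_cb_t *cb;
	void *cb_ident;
};

static struct toy_indr_reg toy_regs[TOY_MAX_REGS];

static struct toy_indr_reg *toy_lookup(struct net_device *dev,
				       tc_indr_block_bind_cb_t *cb,
				       void *cb_ident)
{
	for (size_t i = 0; i < TOY_MAX_REGS; i++)
		if (toy_regs[i].cb && toy_regs[i].dev == dev &&
		    toy_regs[i].cb == cb && toy_regs[i].cb_ident == cb_ident)
			return &toy_regs[i];
	return NULL;
}

/* Hypothetical: returns 0, -EEXIST on a duplicate, -ENOMEM when full. */
int toy_indr_block_cb_register(struct net_device *dev, void *cb_priv,
			       tc_indr_block_bind_cb_t *cb, void *cb_ident)
{
	if (toy_lookup(dev, cb, cb_ident))
		return -EEXIST;
	for (size_t i = 0; i < TOY_MAX_REGS; i++) {
		if (!toy_regs[i].cb) {
			toy_regs[i] = (struct toy_indr_reg){ dev, cb_priv, cb, cb_ident };
			return 0;
		}
	}
	return -ENOMEM;
}

void toy_indr_block_cb_unregister(struct net_device *dev,
				  tc_indr_block_bind_cb_t *cb, void *cb_ident)
{
	struct toy_indr_reg *reg = toy_lookup(dev, cb, cb_ident);

	if (reg)
		*reg = (struct toy_indr_reg){ 0 };
}

/* Models "dev was bound to a TC block": notify every callback registered
 * for that device and return how many were invoked.
 */
int toy_indr_block_bind(struct net_device *dev, enum tc_setup_type type,
			void *type_data)
{
	int called = 0;

	for (size_t i = 0; i < TOY_MAX_REGS; i++) {
		if (toy_regs[i].cb && toy_regs[i].dev == dev) {
			toy_regs[i].cb(dev, toy_regs[i].cb_priv, type, type_data);
			called++;
		}
	}
	return called;
}

/* Example driver callback: counts bind events in an int owned by the caller. */
int toy_count_bind_cb(struct net_device *dev, void *cb_priv,
		      enum tc_setup_type type, void *type_data)
{
	(void)dev;
	(void)type;
	(void)type_data;
	(*(int *)cb_priv)++;
	return 0;
}
```

In this model a driver would call toy_indr_block_cb_register() for a tunnel netdev it wants to track, and toy_indr_block_bind() stands in for the point where that netdev is bound to a block. Keying the lookup on (dev, cb, cb_ident) follows the argument list of the unregister helpers in the patch; whether the real code rejects duplicates the same way is not confirmed here.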

John Hurley (2):
  net: sched: register callbacks for remote tc block binds
  nfp: register remote block callbacks for vxlan/geneve

 .../net/ethernet/netronome/nfp/flower/main.c  |  12 +
 .../net/ethernet/netronome/nfp/flower/main.h  |  10 +
 .../ethernet/netronome/nfp/flower/offload.c   | 156 +
 .../netronome/nfp/flower/tunnel_conf.c|   8 +
 include/net/pkt_cls.h |  56 
 include/net/sch_generic.h |   3 +
 net/sched/cls_api.c   | 297 +-
 7 files changed, 541 insertions(+), 1 deletion(-)

-- 
2.17.1



Re: [PATCH iproute2] lib/libnetlink: fix response seq check

2018-10-03 Thread Stephen Hemminger
On Wed, 3 Oct 2018 16:01:40 -0700
Vlad Dumitrescu  wrote:

> Hi,
> 
> On Fri, Sep 28, 2018 at 10:14 AM  wrote:
> >
> > From: Vlad Dumitrescu 
> >
> > Taking a one-iovec example, with rtnl->seq at 42. iovlen == 1, seq
> > becomes 43 on line 604, and a message is sent with nlmsg_seq == 43. If
> > a response with nlmsg_seq of 42 is received, the condition being fixed
> > in this patch would incorrectly accept it.
> >
> > Fixes: 72a2ff3916e5 ("lib/libnetlink: Add a new function rtnl_talk_iov")
> > Signed-off-by: Vlad Dumitrescu 
> > ---
> >  lib/libnetlink.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> > index f18dceac..4d2416bf 100644
> > --- a/lib/libnetlink.c
> > +++ b/lib/libnetlink.c
> > @@ -647,7 +647,7 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, struct iovec *iov,
> >
> > if (nladdr.nl_pid != 0 ||
> > h->nlmsg_pid != rtnl->local.nl_pid ||
> > -   h->nlmsg_seq > seq || h->nlmsg_seq < seq - iovlen) {
> > +   h->nlmsg_seq > seq || h->nlmsg_seq < seq - iovlen + 1) {
> > /* Don't forget to skip that message. */
> > status -= NLMSG_ALIGN(len);
> > h = (struct nlmsghdr *)((char *)h + NLMSG_ALIGN(len));
> > --
> > 2.19.0.605.g01d371f741-goog  
> 
> Did anybody get a chance to review this? I'm not 100% sure I'm fixing
> the right thing.
> 
> Thanks,
> Vlad

Could you give an example where this failed?
Better yet, one of the tests.
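
For reference, the one-iovec case from the quoted commit message can be modelled in isolation. This is only a sketch of the two conditions from the diff, not a reproducer against libnetlink itself; the function names are made up and all values are treated as uint32_t, which is an assumption about the types involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Condition before the patch: a response is skipped when
 * h_seq > seq || h_seq < seq - iovlen, and accepted otherwise.
 */
bool seq_accepted_old(uint32_t h_seq, uint32_t seq, uint32_t iovlen)
{
	return !(h_seq > seq || h_seq < seq - iovlen);
}

/* Condition after the patch: the lower bound becomes seq - iovlen + 1,
 * so only the iovlen most recent sequence numbers are accepted.
 */
bool seq_accepted_new(uint32_t h_seq, uint32_t seq, uint32_t iovlen)
{
	return !(h_seq > seq || h_seq < seq - iovlen + 1);
}
```

With seq == 43 and iovlen == 1, the old condition accepts both 42 and 43, while the new one accepts only 43, which matches the description in the commit message.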


Re: Kernel oops with mlx5 and dual XDP redirect programs

2018-10-03 Thread Saeed Mahameed
On Wed, 2018-10-03 at 11:30 +0200, Toke Høiland-Jørgensen wrote:
> Hi Saeed
> 
> I can reliably oops the kernel with the mlx5 driver, by installing
> XDP_REDIRECT programs on two devices so they redirect to each other,
> and then remove them while there is traffic on the interface.
> 
> Steps to reproduce:
> 
> # cd ~/build/linux/samples/bpf
> # ./xdp_redirect_map $( $( # ./xdp_redirect_map $( $( 
> Now, run some traffic (e.g., using pktgen) across the interfaces, and
> while the traffic is running, interrupt one of the xdp_redirect_map
> commands (thus unloading the eBPF program). This results in a kernel
> oops with the backtrace below. I get no crash if there's only a
> single
> XDP program.

Hi Toke,

It looks like, while the traffic is being redirected to the other
device, the driver is unloading the program and restarting the rings.
From the call trace below we can see:

[ 1400.972318] RIP: 0010:mlx5e_xdp_xmit+0x7b/0x2a0 [mlx5_core]
[ 1401.077409]  bq_xmit_all+0x5e/0x160
[ 1401.080897]  dev_map_enqueue+0x12e/0x140
[ 1401.084823]  xdp_do_redirect+0x1a9/0x2a0
[ 1401.088756]  mlx5e_xdp_handle+0x24f/0x2b0 [mlx5_core]

and
[ 1401.154559] RIP: 0010:mlx5e_open_channels+0x65e/0x1390 [mlx5_core]
[ 1401.222834]  ? mlx5e_open_channels+0x5e1/0x1390 [mlx5_core]
[ 1401.228404]  ? rcu_exp_wait_wake+0x550/0x550
[ 1401.232674]  ? free_one_page+0x68/0x370
[ 1401.236519]  mlx5e_open_locked+0x28/0xa0 [mlx5_core]
[ 1401.241491]  mlx5e_xdp+0x2b2/0x300 [mlx5_core]
[ 1401.245936]  dev_xdp_install+0x4c/0x70
[ 1401.249686]  do_setlink+0xcdb/0xd10

I think that the mlx5 driver doesn't know how to tell the other device
to stop transmitting to it while it is resetting. Maybe Tariq or
Jesper know more about this?
I will look at this tomorrow afternoon and will try to reproduce it.

What is interesting is that at the mlx5e_open_channels stage all previous
TX queues should still be active and not yet destroyed; only later, when
we switch to the new channels, do we stop and destroy the older TX/RX
queues. The question is how reliable this call trace is.

Thanks for the report.

> 
> Is this something you could look into, please? :)

> 
> -Toke
> 
> 
> [ 1400.937870] BUG: unable to handle kernel paging request at
> 3fa8
> [ 1400.944826] PGD 80072cc7b067 P4D 80072cc7b067 PUD
> 72cc7a067 PMD 0 
> [ 1400.951693] Oops:  [#1] SMP PTI
> [ 1400.955184] CPU: 5 PID: 10392 Comm: xdp_redirect_ma Not tainted
> 4.19.0-rc5-xdptest-g5be3ebf+ #17
> [ 1400.965344] Hardware name: LENOVO 30B3005DMT/102F, BIOS S00KT56A
> 01/15/2018
> [ 1400.972318] RIP: 0010:mlx5e_xdp_xmit+0x7b/0x2a0 [mlx5_core]
> [ 1400.977889] Code: 8b 0d 29 d9 4f 3f 39 8f 48 39 00 00 b8 fa ff ff
> ff 0f 86 45 01 00 00 48 8b 87 40 39 00 00 48 63 c9 4c 8b 24 c8 b8 9c
> ff ff ff <49> 8b 8c 24 a8 3f 00 00 4d 8d bc 24 c0 3c 00 00 83 e1 01
> 0f 84 19
> [ 1400.996624] RSP: 0018:90209fb43bb0 EFLAGS: 00010202
> [ 1401.002001] RAX: ff9c RBX:  RCX:
> 0005
> [ 1401.009122] RDX: c7627fd75190 RSI: 0010 RDI:
> 90208458
> [ 1401.016250] RBP: c7627fd75190 R08: 901f9821c100 R09:
> c7627fd75210
> [ 1401.023379] R10: 05dc R11:  R12:
> 
> [ 1401.030500] R13: 90208158 R14: 0001 R15:
> c7627fd75190
> [ 1401.037645] FS:  7f460fa96700() GS:90209fb4()
> knlGS:
> [ 1401.045718] CS:  0010 DS:  ES:  CR0: 80050033
> [ 1401.051452] CR2: 3fa8 CR3: 00076c3b6006 CR4:
> 003606e0
> [ 1401.058573] DR0:  DR1:  DR2:
> 
> [ 1401.065823] DR3:  DR6: fffe0ff0 DR7:
> 0400
> [ 1401.072943] Call Trace:
> [ 1401.075390]  
> [ 1401.077409]  bq_xmit_all+0x5e/0x160
> [ 1401.080897]  dev_map_enqueue+0x12e/0x140
> [ 1401.084823]  xdp_do_redirect+0x1a9/0x2a0
> [ 1401.088756]  mlx5e_xdp_handle+0x24f/0x2b0 [mlx5_core]
> [ 1401.093821]  ? resched_cpu+0x5f/0x70
> [ 1401.097399]  ? __xdp_return+0x189/0x400
> [ 1401.101242]  mlx5e_skb_from_cqe_linear+0xdd/0x180 [mlx5_core]
> [ 1401.106987]  mlx5e_handle_rx_cqe+0x43/0xe0 [mlx5_core]
> [ 1401.112130]  mlx5e_poll_rx_cq+0xcb/0x940 [mlx5_core]
> [ 1401.117094]  mlx5e_napi_poll+0xa6/0xc90 [mlx5_core]
> [ 1401.121966]  ? smp_reschedule_interrupt+0x16/0xd0
> [ 1401.126789]  ? reschedule_interrupt+0xf/0x20
> [ 1401.131057]  ? reschedule_interrupt+0xa/0x20
> [ 1401.135321]  net_rx_action+0x279/0x3d0
> [ 1401.139071]  __do_softirq+0xf2/0x28e
> [ 1401.142651]  irq_exit+0xb6/0xc0
> [ 1401.145792]  do_IRQ+0x52/0xd0
> [ 1401.148785]  common_interrupt+0xf/0xf
> [ 1401.152445]  
> [ 1401.154559] RIP: 0010:mlx5e_open_channels+0x65e/0x1390 [mlx5_core]
> [ 1401.160734] Code: 8b 00 48 05 a8 00 00 00 48 89 85 78 3c 00 00 48
> 8b 83 f8 8d 01 00 48 89 85 80 3c 00 00 48 8b 83 f0 8d 01 00 8b 80 a8
> fb 03 00 <0f> c8 89 85 88 3c 00 00 41 0f b6 45 16 88 85 8c 3c 00 00
> 49 83 bd
> [ 14

[PATCH v3] net/ncsi: Add NCSI OEM command support

2018-10-03 Thread Vijay Khemka
This patch adds OEM command and response handling. It also defines the
OEM command and response structures as per the NCSI specification, along
with their handlers.

ncsi_cmd_handler_oem: This is a generic command request handler for OEM
commands.
ncsi_rsp_handler_oem: This is a generic response handler for OEM commands.
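
The length handling in the command path can be sketched outside the kernel as follows. This mirrors the arithmetic in ncsi_cmd_handler_oem and the payload selection in ncsi_xmit_cmd from the diff below, but the helper names and macro names are hypothetical, the header size is passed in rather than taken from struct ncsi_cmd_pkt_hdr, and reading the extra 4 bytes as the checksum field is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_OEM_MIN_PAYLOAD 26 /* payloads shorter than this are padded */
#define TOY_OEM_EXTRA_LEN    4 /* the "+ 4" in the handler (assumed checksum) */

/* Bytes reserved in the skb for an OEM command with the given payload. */
size_t toy_oem_cmd_len(size_t hdr_len, unsigned int payload)
{
	size_t len = hdr_len + TOY_OEM_EXTRA_LEN;

	if (payload < TOY_OEM_MIN_PAYLOAD)
		len += TOY_OEM_MIN_PAYLOAD;
	else
		len += payload;
	return len;
}

/* A non-negative table entry fixes the payload length; a negative one
 * means the caller has already set it before calling the xmit function.
 */
int toy_effective_payload(int table_payload, int caller_payload)
{
	return table_payload >= 0 ? table_payload : caller_payload;
}
```

With a 16-byte header (a value picked only for this example), an 8-byte and a 26-byte payload both yield 46 bytes, and a 40-byte payload yields 60.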

Signed-off-by: Vijay Khemka 
---
 net/ncsi/internal.h |  5 +
 net/ncsi/ncsi-cmd.c | 30 +++---
 net/ncsi/ncsi-pkt.h | 14 ++
 net/ncsi/ncsi-rsp.c | 43 ++-
 4 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 8055e3965cef..3d0a33b874f5 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -68,6 +68,10 @@ enum {
NCSI_MODE_MAX
 };
 
+/* OEM Vendor Manufacture ID */
+#define NCSI_OEM_MFR_MLX_ID 0x8119
+#define NCSI_OEM_MFR_BCM_ID 0x113d
+
 struct ncsi_channel_version {
u32 version;/* Supported BCD encoded NCSI version */
u32 alpha2; /* Supported BCD encoded NCSI version */
@@ -305,6 +309,7 @@ struct ncsi_cmd_arg {
unsigned short words[8];
unsigned int   dwords[4];
};
+   unsigned char*data;   /* NCSI OEM data */
 };
 
 extern struct list_head ncsi_dev_list;
diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index 7567ca63aae2..82b7d9201db8 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -211,6 +211,25 @@ static int ncsi_cmd_handler_snfc(struct sk_buff *skb,
return 0;
 }
 
+static int ncsi_cmd_handler_oem(struct sk_buff *skb,
+   struct ncsi_cmd_arg *nca)
+{
+   struct ncsi_cmd_oem_pkt *cmd;
+   unsigned int len;
+
+   len = sizeof(struct ncsi_cmd_pkt_hdr) + 4;
+   if (nca->payload < 26)
+   len += 26;
+   else
+   len += nca->payload;
+
+   cmd = skb_put_zero(skb, len);
+   memcpy(&cmd->mfr_id, nca->data, nca->payload);
+   ncsi_cmd_build_header(&cmd->cmd.common, nca);
+
+   return 0;
+}
+
 static struct ncsi_cmd_handler {
unsigned char type;
int   payload;
@@ -244,7 +263,7 @@ static struct ncsi_cmd_handler {
{ NCSI_PKT_CMD_GNS,0, ncsi_cmd_handler_default },
{ NCSI_PKT_CMD_GNPTS,  0, ncsi_cmd_handler_default },
{ NCSI_PKT_CMD_GPS,0, ncsi_cmd_handler_default },
-   { NCSI_PKT_CMD_OEM,0, NULL },
+   { NCSI_PKT_CMD_OEM,   -1, ncsi_cmd_handler_oem },
{ NCSI_PKT_CMD_PLDM,   0, NULL },
{ NCSI_PKT_CMD_GPUUID, 0, ncsi_cmd_handler_default }
 };
@@ -316,8 +335,13 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
return -ENOENT;
}
 
-   /* Get packet payload length and allocate the request */
-   nca->payload = nch->payload;
+   /* Get packet payload length and allocate the request
+* It is expected that if length set as negative in
+* handler structure means caller is initializing it
+* and setting length in nca before calling xmit function
+*/
+   if (nch->payload >= 0)
+   nca->payload = nch->payload;
nr = ncsi_alloc_command(nca);
if (!nr)
return -ENOMEM;
diff --git a/net/ncsi/ncsi-pkt.h b/net/ncsi/ncsi-pkt.h
index 91b4b66438df..0f2087c8d42a 100644
--- a/net/ncsi/ncsi-pkt.h
+++ b/net/ncsi/ncsi-pkt.h
@@ -151,6 +151,20 @@ struct ncsi_cmd_snfc_pkt {
unsigned char   pad[22];
 };
 
+/* OEM Request Command as per NCSI Specification */
+struct ncsi_cmd_oem_pkt {
+   struct ncsi_cmd_pkt_hdr cmd; /* Command header*/
+   __be32  mfr_id;  /* Manufacture ID*/
+   unsigned char   data[];  /* OEM Payload Data  */
+};
+
+/* OEM Response Packet as per NCSI Specification */
+struct ncsi_rsp_oem_pkt {
+   struct ncsi_rsp_pkt_hdr rsp; /* Command header*/
+   __be32  mfr_id;  /* Manufacture ID*/
+   unsigned char   data[];  /* Payload data  */
+};
+
 /* Get Link Status */
 struct ncsi_rsp_gls_pkt {
struct ncsi_rsp_pkt_hdr rsp;/* Response header   */
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index 930c1d3796f0..d66b34749027 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -596,6 +596,47 @@ static int ncsi_rsp_handler_snfc(struct ncsi_request *nr)
return 0;
 }
 
+static struct ncsi_rsp_oem_handler {
+   unsigned intmfr_id;
+   int (*handler)(struct ncsi_request *nr);
+} ncsi_rsp_oem_handlers[] = {
+   { NCSI_OEM_MFR_MLX_ID, NULL },
+   { NCSI_OEM_MFR_BCM_ID, NULL }
+};
+
+/* Response handler for OEM command */
+static int ncsi_rsp_handler_oem(struct ncsi_request *nr)
+{
+   struct ncsi_rsp_oem_pkt *rsp;
+   struct ncsi_rsp_oem_handler *nrh = NULL;
+   unsigned int mfr_id, 

Re: [PATCH iproute2] lib/libnetlink: fix response seq check

2018-10-03 Thread Vlad Dumitrescu
Hi,

On Fri, Sep 28, 2018 at 10:14 AM  wrote:
>
> From: Vlad Dumitrescu 
>
> Taking a one-iovec example, with rtnl->seq at 42. iovlen == 1, seq
> becomes 43 on line 604, and a message is sent with nlmsg_seq == 43. If
> a response with nlmsg_seq of 42 is received, the condition being fixed
> in this patch would incorrectly accept it.
>
> Fixes: 72a2ff3916e5 ("lib/libnetlink: Add a new function rtnl_talk_iov")
> Signed-off-by: Vlad Dumitrescu 
> ---
>  lib/libnetlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index f18dceac..4d2416bf 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -647,7 +647,7 @@ static int __rtnl_talk_iov(struct rtnl_handle *rtnl, struct iovec *iov,
>
> if (nladdr.nl_pid != 0 ||
> h->nlmsg_pid != rtnl->local.nl_pid ||
> -   h->nlmsg_seq > seq || h->nlmsg_seq < seq - iovlen) {
> +   h->nlmsg_seq > seq || h->nlmsg_seq < seq - iovlen + 1) {
> /* Don't forget to skip that message. */
> status -= NLMSG_ALIGN(len);
> h = (struct nlmsghdr *)((char *)h + NLMSG_ALIGN(len));
> --
> 2.19.0.605.g01d371f741-goog

Did anybody get a chance to review this? I'm not 100% sure I'm fixing
the right thing.

Thanks,
Vlad


[PATCH net-next] net/neigh: Extend dump filter to proxy neighbor dumps

2018-10-03 Thread David Ahern
From: David Ahern 

Move the attribute parsing from neigh_dump_table to neigh_dump_info, and
pass the filter arguments down to neigh_dump_table in a new struct. Add
the filter option to proxy neigh dumps as well to make them consistent.
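
As a rough illustration of what the shared filter does, the sketch below models it in userspace. struct neigh_dump_filter has the fields from the diff; everything prefixed toy_ or TOY_ is hypothetical, the flag values are local copies intended to mirror linux/netlink.h, and the two filter predicates are folded into one function based on an assumed reading of neigh_ifindex_filtered() and neigh_master_filtered().

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the netlink flag values (assumed to match the uapi). */
#define TOY_NLM_F_MULTI         0x02
#define TOY_NLM_F_DUMP_FILTERED 0x20

/* Filter arguments parsed once in the dump entry point. */
struct neigh_dump_filter {
	int master_idx;
	int dev_idx;
};

/* Stand-in for a net_device: its ifindex and its master's (0 if none). */
struct toy_dev {
	int ifindex;
	int master_ifindex;
};

/* Both table walkers derive the message flags from the filter the same way. */
unsigned int toy_dump_flags(const struct neigh_dump_filter *filter)
{
	unsigned int flags = TOY_NLM_F_MULTI;

	if (filter->dev_idx || filter->master_idx)
		flags |= TOY_NLM_F_DUMP_FILTERED;
	return flags;
}

/* True when the entry should be skipped by the dump. */
bool toy_entry_filtered(const struct toy_dev *dev,
			const struct neigh_dump_filter *filter)
{
	if (filter->dev_idx && dev->ifindex != filter->dev_idx)
		return true;
	if (filter->master_idx && dev->master_ifindex != filter->master_idx)
		return true;
	return false;
}
```

Parsing the attributes once and passing the struct down means the regular and proxy walkers cannot disagree about which filter is active, which appears to be the consistency the commit message refers to.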

Signed-off-by: David Ahern 
---
 net/core/neighbour.c | 72 ++--
 1 file changed, 42 insertions(+), 30 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 20e0d3308148..fb023df48b83 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2329,35 +2329,24 @@ static bool neigh_ifindex_filtered(struct net_device *dev, int filter_idx)
return false;
 }
 
+struct neigh_dump_filter {
+   int master_idx;
+   int dev_idx;
+};
+
 static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
-   struct netlink_callback *cb)
+   struct netlink_callback *cb,
+   struct neigh_dump_filter *filter)
 {
struct net *net = sock_net(skb->sk);
-   const struct nlmsghdr *nlh = cb->nlh;
-   struct nlattr *tb[NDA_MAX + 1];
struct neighbour *n;
int rc, h, s_h = cb->args[1];
int idx, s_idx = idx = cb->args[2];
struct neigh_hash_table *nht;
-   int filter_master_idx = 0, filter_idx = 0;
unsigned int flags = NLM_F_MULTI;
-   int err;
 
-   err = nlmsg_parse(nlh, sizeof(struct ndmsg), tb, NDA_MAX, NULL, NULL);
-   if (!err) {
-   if (tb[NDA_IFINDEX]) {
-   if (nla_len(tb[NDA_IFINDEX]) != sizeof(u32))
-   return -EINVAL;
-   filter_idx = nla_get_u32(tb[NDA_IFINDEX]);
-   }
-   if (tb[NDA_MASTER]) {
-   if (nla_len(tb[NDA_MASTER]) != sizeof(u32))
-   return -EINVAL;
-   filter_master_idx = nla_get_u32(tb[NDA_MASTER]);
-   }
-   if (filter_idx || filter_master_idx)
-   flags |= NLM_F_DUMP_FILTERED;
-   }
+   if (filter->dev_idx || filter->master_idx)
+   flags |= NLM_F_DUMP_FILTERED;
 
rcu_read_lock_bh();
nht = rcu_dereference_bh(tbl->nht);
@@ -2370,8 +2359,8 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 n = rcu_dereference_bh(n->next)) {
if (idx < s_idx || !net_eq(dev_net(n->dev), net))
goto next;
-   if (neigh_ifindex_filtered(n->dev, filter_idx) ||
-   neigh_master_filtered(n->dev, filter_master_idx))
+   if (neigh_ifindex_filtered(n->dev, filter->dev_idx) ||
+   neigh_master_filtered(n->dev, filter->master_idx))
goto next;
if (neigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
@@ -2393,12 +2382,17 @@ static int neigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 }
 
 static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
-struct netlink_callback *cb)
+struct netlink_callback *cb,
+struct neigh_dump_filter *filter)
 {
struct pneigh_entry *n;
struct net *net = sock_net(skb->sk);
int rc, h, s_h = cb->args[3];
int idx, s_idx = idx = cb->args[4];
+   unsigned int flags = NLM_F_MULTI;
+
+   if (filter->dev_idx || filter->master_idx)
+   flags |= NLM_F_DUMP_FILTERED;
 
read_lock_bh(&tbl->lock);
 
@@ -2408,10 +2402,12 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
if (idx < s_idx || pneigh_net(n) != net)
goto next;
+   if (neigh_ifindex_filtered(n->dev, filter->dev_idx) ||
+   neigh_master_filtered(n->dev, filter->master_idx))
+   goto next;
if (pneigh_fill_info(skb, n, NETLINK_CB(cb->skb).portid,
cb->nlh->nlmsg_seq,
-   RTM_NEWNEIGH,
-   NLM_F_MULTI, tbl) < 0) {
+   RTM_NEWNEIGH, flags, tbl) < 0) {
read_unlock_bh(&tbl->lock);
rc = -1;
goto out;
@@ -2432,20 +2428,36 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
 
 static int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 {
+   const struct nlmsghdr *nlh = cb->nlh;
+   struct neigh_dump_filter filter = {};
+  

[PATCH bpf-next 3/6] libbpf: Consistent prefixes for interfaces in nlattr.h.

2018-10-03 Thread Andrey Ignatov
libbpf is used more and more outside the kernel tree. That means the
library should follow good practices in library design and implementation
to play well with third party code that uses it.

One such practice is to have a common prefix (or a few) for every
interface, function or data structure the library provides. It helps to
avoid name conflicts with other libraries and keeps the API consistent.

Inconsistent names in libbpf already cause problems in real life. E.g.
an application can't use both libbpf and libnl due to conflicting
symbols.

Having a common prefix will help fix current problems and avoid future ones.

libbpf already uses the following prefixes for its interfaces:
* bpf_ for bpf system call wrappers, program/map/elf-object
  abstractions and a few other things;
* btf_ for BTF related API;
* libbpf_ for everything else.

The patch adds the libbpf_ prefix to interfaces in nlattr.h that use none
of the prefixes mentioned above and don't fit well into the first two
categories.

Since the affected part of the API is used in bpftool, the patch applies
the corresponding change to bpftool as well. Having it in a separate patch
would leave the tree in a state where bpftool is broken, which may not be
a good idea.

Signed-off-by: Andrey Ignatov 
Acked-by: Alexei Starovoitov 
---
 tools/bpf/bpftool/net.c| 10 +++--
 tools/bpf/bpftool/netlink_dumper.c | 32 ---
 tools/lib/bpf/netlink.c| 10 ++---
 tools/lib/bpf/nlattr.c | 64 --
 tools/lib/bpf/nlattr.h | 59 +--
 5 files changed, 94 insertions(+), 81 deletions(-)

diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index ef83e8a08490..d441bb7035ca 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -69,7 +69,9 @@ static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
snprintf(netinfo->devices[netinfo->used_len].devname,
 sizeof(netinfo->devices[netinfo->used_len].devname),
 "%s",
-tb[IFLA_IFNAME] ? nla_getattr_str(tb[IFLA_IFNAME]) : "");
+tb[IFLA_IFNAME]
+? libbpf_nla_getattr_str(tb[IFLA_IFNAME])
+: "");
netinfo->used_len++;
 
return do_xdp_dump(ifinfo, tb);
@@ -83,7 +85,7 @@ static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb)
if (tcinfo->is_qdisc) {
/* skip clsact qdisc */
if (tb[TCA_KIND] &&
-   strcmp(nla_data(tb[TCA_KIND]), "clsact") == 0)
+   strcmp(libbpf_nla_data(tb[TCA_KIND]), "clsact") == 0)
return 0;
if (info->tcm_handle == 0)
return 0;
@@ -101,7 +103,9 @@ static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb)
snprintf(tcinfo->handle_array[tcinfo->used_len].kind,
 sizeof(tcinfo->handle_array[tcinfo->used_len].kind),
 "%s",
-tb[TCA_KIND] ? nla_getattr_str(tb[TCA_KIND]) : "unknown");
+tb[TCA_KIND]
+? libbpf_nla_getattr_str(tb[TCA_KIND])
+: "unknown");
tcinfo->used_len++;
 
return 0;
diff --git a/tools/bpf/bpftool/netlink_dumper.c b/tools/bpf/bpftool/netlink_dumper.c
index 6f5e9cc6836c..4e9f4531269f 100644
--- a/tools/bpf/bpftool/netlink_dumper.c
+++ b/tools/bpf/bpftool/netlink_dumper.c
@@ -21,7 +21,7 @@ static void xdp_dump_prog_id(struct nlattr **tb, int attr,
if (new_json_object)
NET_START_OBJECT
NET_DUMP_STR("mode", " %s", mode);
-   NET_DUMP_UINT("id", " id %u", nla_getattr_u32(tb[attr]))
+   NET_DUMP_UINT("id", " id %u", libbpf_nla_getattr_u32(tb[attr]))
if (new_json_object)
NET_END_OBJECT
 }
@@ -32,13 +32,13 @@ static int do_xdp_dump_one(struct nlattr *attr, unsigned int ifindex,
struct nlattr *tb[IFLA_XDP_MAX + 1];
unsigned char mode;
 
-   if (nla_parse_nested(tb, IFLA_XDP_MAX, attr, NULL) < 0)
+   if (libbpf_nla_parse_nested(tb, IFLA_XDP_MAX, attr, NULL) < 0)
return -1;
 
if (!tb[IFLA_XDP_ATTACHED])
return 0;
 
-   mode = nla_getattr_u8(tb[IFLA_XDP_ATTACHED]);
+   mode = libbpf_nla_getattr_u8(tb[IFLA_XDP_ATTACHED]);
if (mode == XDP_ATTACHED_NONE)
return 0;
 
@@ -75,14 +75,14 @@ int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb)
return 0;
 
return do_xdp_dump_one(tb[IFLA_XDP], ifinfo->ifi_index,
-  nla_getattr_str(tb[IFLA_IFNAME]));
+  libbpf_nla_getattr_str(tb[IFLA_IFNAME]));
 }
 
 static int do_bpf_dump_one_act(struct nlattr *attr)
 {
struct nlattr *tb[TCA_ACT_BPF_MAX + 1];
 
-   if (nla_parse_nested(tb, TCA_ACT_BPF_MAX, attr, NULL) < 0)
+   if (libbpf_nla_parse_nested(tb, TCA_ACT_BPF_MAX, attr, NULL

[PATCH bpf-next 1/6] libbpf: Move __dump_nlmsg_t from API to implementation

2018-10-03 Thread Andrey Ignatov
This typedef is used only by the implementation in netlink.c. Nothing
uses it in the public API. Move it to netlink.c.

Signed-off-by: Andrey Ignatov 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/libbpf.h  | 3 ---
 tools/lib/bpf/netlink.c | 3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 2ed24d3f80b3..8388be525388 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -304,11 +304,8 @@ int bpf_perf_event_read_simple(void *mem, unsigned long size,
   void **buf, size_t *buf_len,
   bpf_perf_event_print_t fn, void *priv);
 
-struct nlmsghdr;
 struct nlattr;
 typedef int (*dump_nlmsg_t)(void *cookie, void *msg, struct nlattr **tb);
-typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, dump_nlmsg_t,
- void *cookie);
 int bpf_netlink_open(unsigned int *nl_pid);
 int nl_get_link(int sock, unsigned int nl_pid, dump_nlmsg_t dump_link_nlmsg,
void *cookie);
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index fde1d7bf8199..da46d9358d9d 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -18,6 +18,9 @@
 #define SOL_NETLINK 270
 #endif
 
+typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, dump_nlmsg_t,
+ void *cookie);
+
 int bpf_netlink_open(__u32 *nl_pid)
 {
struct sockaddr_nl sa;
-- 
2.17.1



[PATCH bpf-next 2/6] libbpf: Consistent prefixes for interfaces in libbpf.h.

2018-10-03 Thread Andrey Ignatov
libbpf is used more and more outside the kernel tree. That means the
library should follow good practices in library design and implementation
to play well with third party code that uses it.

One such practice is to have a common prefix (or a few) for every
interface, function or data structure the library provides. It helps to
avoid name conflicts with other libraries and keeps the API consistent.

Inconsistent names in libbpf already cause problems in real life. E.g.
an application can't use both libbpf and libnl due to conflicting
symbols.

Having a common prefix will help fix current problems and avoid future ones.

libbpf already uses the following prefixes for its interfaces:
* bpf_ for bpf system call wrappers, program/map/elf-object
  abstractions and a few other things;
* btf_ for BTF related API;
* libbpf_ for everything else.

The patch adds the libbpf_ prefix to functions and the typedef in libbpf.h
that use none of the prefixes mentioned above and don't fit well into the
first two categories.

Since the affected part of the API is used in bpftool, the patch applies
the corresponding change to bpftool as well. Having it in a separate patch
would leave the tree in a state where bpftool is broken, which may not be
a good idea.

Signed-off-by: Andrey Ignatov 
Acked-by: Alexei Starovoitov 
---
 tools/bpf/bpftool/net.c | 31 +++
 tools/lib/bpf/libbpf.h  | 20 ++--
 tools/lib/bpf/netlink.c | 37 -
 3 files changed, 45 insertions(+), 43 deletions(-)

diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index ed205ee57655..ef83e8a08490 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -127,14 +127,14 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
tcinfo.array_len = 0;
 
tcinfo.is_qdisc = false;
-   ret = nl_get_class(sock, nl_pid, dev->ifindex, dump_class_qdisc_nlmsg,
-  &tcinfo);
+   ret = libbpf_nl_get_class(sock, nl_pid, dev->ifindex,
+ dump_class_qdisc_nlmsg, &tcinfo);
if (ret)
goto out;
 
tcinfo.is_qdisc = true;
-   ret = nl_get_qdisc(sock, nl_pid, dev->ifindex, dump_class_qdisc_nlmsg,
-  &tcinfo);
+   ret = libbpf_nl_get_qdisc(sock, nl_pid, dev->ifindex,
+ dump_class_qdisc_nlmsg, &tcinfo);
if (ret)
goto out;
 
@@ -142,10 +142,9 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
filter_info.ifindex = dev->ifindex;
for (i = 0; i < tcinfo.used_len; i++) {
filter_info.kind = tcinfo.handle_array[i].kind;
-   ret = nl_get_filter(sock, nl_pid, dev->ifindex,
-   tcinfo.handle_array[i].handle,
-   dump_filter_nlmsg,
-   &filter_info);
+   ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex,
+  tcinfo.handle_array[i].handle,
+  dump_filter_nlmsg, &filter_info);
if (ret)
goto out;
}
@@ -153,22 +152,22 @@ static int show_dev_tc_bpf(int sock, unsigned int nl_pid,
/* root, ingress and egress handle */
handle = TC_H_ROOT;
filter_info.kind = "root";
-   ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle,
-   dump_filter_nlmsg, &filter_info);
+   ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
+  dump_filter_nlmsg, &filter_info);
if (ret)
goto out;
 
handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
filter_info.kind = "clsact/ingress";
-   ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle,
-   dump_filter_nlmsg, &filter_info);
+   ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
+  dump_filter_nlmsg, &filter_info);
if (ret)
goto out;
 
handle = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_EGRESS);
filter_info.kind = "clsact/egress";
-   ret = nl_get_filter(sock, nl_pid, dev->ifindex, handle,
-   dump_filter_nlmsg, &filter_info);
+   ret = libbpf_nl_get_filter(sock, nl_pid, dev->ifindex, handle,
+  dump_filter_nlmsg, &filter_info);
if (ret)
goto out;
 
@@ -196,7 +195,7 @@ static int do_show(int argc, char **argv)
usage();
}
 
-   sock = bpf_netlink_open(&nl_pid);
+   sock = libbpf_netlink_open(&nl_pid);
if (sock < 0) {
fprintf(stderr, "failed to open netlink sock\n");
return -1;
@@ -211,7 +210,7 @@ static int do_show(int argc, char **argv)
jsonw_start_array(json_wtr);
NET_START_OBJECT;
NET_START_ARRAY("xdp", "%

[PATCH bpf-next 5/6] libbpf: Make include guards consistent

2018-10-03 Thread Andrey Ignatov
Rename include guards to use consistent names with the "__LIBBPF_" prefix.

Signed-off-by: Andrey Ignatov 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/bpf.h   | 6 +++---
 tools/lib/bpf/btf.h   | 6 +++---
 tools/lib/bpf/libbpf.h| 6 +++---
 tools/lib/bpf/nlattr.h| 6 +++---
 tools/lib/bpf/str_error.h | 6 +++---
 5 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 6f38164b2618..4c78f61b7c71 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -20,8 +20,8 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this program; if not,  see 
  */
-#ifndef __BPF_BPF_H
-#define __BPF_BPF_H
+#ifndef __LIBBPF_BPF_H
+#define __LIBBPF_BPF_H
 
 #include 
 #include 
@@ -111,4 +111,4 @@ int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
 int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 *buf_len,
  __u32 *prog_id, __u32 *fd_type, __u64 *probe_offset,
  __u64 *probe_addr);
-#endif
+#endif /* __LIBBPF_BPF_H */
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 4897e0724d4e..d5d20682eeb6 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -1,8 +1,8 @@
 /* SPDX-License-Identifier: LGPL-2.1 */
 /* Copyright (c) 2018 Facebook */
 
-#ifndef __BPF_BTF_H
-#define __BPF_BTF_H
+#ifndef __LIBBPF_BTF_H
+#define __LIBBPF_BTF_H
 
 #include 
 
@@ -23,4 +23,4 @@ int btf__resolve_type(const struct btf *btf, __u32 type_id);
 int btf__fd(const struct btf *btf);
 const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
 
-#endif
+#endif /* __LIBBPF_BTF_H */
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 710ff5724980..28f83dd6022b 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -20,8 +20,8 @@
  * You should have received a copy of the GNU Lesser General Public
  * License along with this program; if not,  see 
  */
-#ifndef __BPF_LIBBPF_H
-#define __BPF_LIBBPF_H
+#ifndef __LIBBPF_LIBBPF_H
+#define __LIBBPF_LIBBPF_H
 
 #include 
 #include 
@@ -315,4 +315,4 @@ int libbpf_nl_get_qdisc(int sock, unsigned int nl_pid, int ifindex,
libbpf_dump_nlmsg_t dump_qdisc_nlmsg, void *cookie);
 int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
 libbpf_dump_nlmsg_t dump_filter_nlmsg, void *cookie);
-#endif
+#endif /* __LIBBPF_LIBBPF_H */
diff --git a/tools/lib/bpf/nlattr.h b/tools/lib/bpf/nlattr.h
index 755a3312c87f..7198584a3040 100644
--- a/tools/lib/bpf/nlattr.h
+++ b/tools/lib/bpf/nlattr.h
@@ -11,8 +11,8 @@
  * Copyright (c) 2003-2013 Thomas Graf 
  */
 
-#ifndef __NLATTR_H
-#define __NLATTR_H
+#ifndef __LIBBPF_NLATTR_H
+#define __LIBBPF_NLATTR_H
 
 #include 
 #include 
@@ -108,4 +108,4 @@ int libbpf_nla_parse_nested(struct nlattr *tb[], int maxtype,
 
 int libbpf_nla_dump_errormsg(struct nlmsghdr *nlh);
 
-#endif /* __NLATTR_H */
+#endif /* __LIBBPF_NLATTR_H */
diff --git a/tools/lib/bpf/str_error.h b/tools/lib/bpf/str_error.h
index 998eff7d6710..b9157f5eebde 100644
--- a/tools/lib/bpf/str_error.h
+++ b/tools/lib/bpf/str_error.h
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: LGPL-2.1
-#ifndef BPF_STR_ERROR
-#define BPF_STR_ERROR
+#ifndef __LIBBPF_STR_ERROR_H
+#define __LIBBPF_STR_ERROR_H
 
 char *libbpf_strerror_r(int err, char *dst, int len);
-#endif // BPF_STR_ERROR
+#endif /* __LIBBPF_STR_ERROR_H */
-- 
2.17.1



[PATCH bpf-next 4/6] libbpf: Consistent prefixes for interfaces in str_error.h.

2018-10-03 Thread Andrey Ignatov
libbpf is used more and more outside the kernel tree. That means the
library should follow good practices in library design and implementation
to play well with third party code that uses it.

One such practice is to have a common prefix (or a few) for every
interface, function or data structure the library provides. It helps to
avoid name conflicts with other libraries and keeps the API consistent.

Inconsistent names in libbpf already cause problems in real life. E.g.
an application can't use both libbpf and libnl due to conflicting
symbols.

Having a common prefix will help fix current problems and avoid future ones.

libbpf already uses the following prefixes for its interfaces:
* bpf_ for bpf system call wrappers, program/map/elf-object
  abstractions and a few other things;
* btf_ for BTF related API;
* libbpf_ for everything else.

The patch renames the function in str_error.h to have the libbpf_ prefix
since it lacks one and doesn't fit well into the first two categories.
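
For readers unfamiliar with the call pattern touched by the rename, here is a small standalone sketch of a helper with the same signature as the one declared in str_error.h. It is not the libbpf implementation: demo_strerror and DEMO_STRERR_BUFSIZE are made-up names, and the body uses plain strerror() for portability, which is an assumption and is not thread-safe.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define DEMO_STRERR_BUFSIZE 128

/* Format the message for `err` into the caller's buffer and return the
 * buffer, so the call can be used directly as a printf-style argument.
 * The output is truncated to fit and always NUL-terminated when len > 0.
 */
char *demo_strerror(int err, char *dst, int len)
{
	if (!dst || len <= 0)
		return dst;
	snprintf(dst, (size_t)len, "%s", strerror(err));
	return dst;
}
```

A caller would declare char errmsg[DEMO_STRERR_BUFSIZE] on the stack and pass it together with sizeof(errmsg), as the call sites in the diff below do with STRERR_BUFSIZE. Returning the destination buffer is what lets those call sites assign the result to cp and print it in one step.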

Signed-off-by: Andrey Ignatov 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/libbpf.c| 20 +++-
 tools/lib/bpf/str_error.c |  2 +-
 tools/lib/bpf/str_error.h |  2 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9e68fd9fcfca..02888d36b805 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -470,7 +470,8 @@ static int bpf_object__elf_init(struct bpf_object *obj)
obj->efile.fd = open(obj->path, O_RDONLY);
if (obj->efile.fd < 0) {
char errmsg[STRERR_BUFSIZE];
-   char *cp = str_error(errno, errmsg, sizeof(errmsg));
+   char *cp = libbpf_strerror_r(errno, errmsg,
+sizeof(errmsg));
 
pr_warning("failed to open %s: %s\n", obj->path, cp);
return -errno;
@@ -811,7 +812,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
  data->d_size, name, idx);
if (err) {
char errmsg[STRERR_BUFSIZE];
-   char *cp = str_error(-err, errmsg, sizeof(errmsg));
+   char *cp = libbpf_strerror_r(-err, errmsg,
+sizeof(errmsg));
 
pr_warning("failed to alloc program %s (%s): %s",
   name, obj->path, cp);
@@ -1140,7 +1142,7 @@ bpf_object__create_maps(struct bpf_object *obj)
 
*pfd = bpf_create_map_xattr(&create_attr);
if (*pfd < 0 && create_attr.btf_key_type_id) {
-   cp = str_error(errno, errmsg, sizeof(errmsg));
+   cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
   map->name, cp, errno);
create_attr.btf_fd = 0;
@@ -1155,7 +1157,7 @@ bpf_object__create_maps(struct bpf_object *obj)
size_t j;
 
err = *pfd;
-   cp = str_error(errno, errmsg, sizeof(errmsg));
+   cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("failed to create map (name: '%s'): %s\n",
   map->name, cp);
for (j = 0; j < i; j++)
@@ -1339,7 +1341,7 @@ load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type,
}
 
ret = -LIBBPF_ERRNO__LOAD;
-   cp = str_error(errno, errmsg, sizeof(errmsg));
+   cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("load bpf program failed: %s\n", cp);
 
if (log_buf && log_buf[0] != '\0') {
@@ -1655,7 +1657,7 @@ static int check_path(const char *path)
 
dir = dirname(dname);
if (statfs(dir, &st_fs)) {
-   cp = str_error(errno, errmsg, sizeof(errmsg));
+   cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("failed to statfs %s: %s\n", dir, cp);
err = -errno;
}
@@ -1691,7 +1693,7 @@ int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
}
 
if (bpf_obj_pin(prog->instances.fds[instance], path)) {
-   cp = str_error(errno, errmsg, sizeof(errmsg));
+   cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
pr_warning("failed to pin program: %s\n", cp);
return -errno;
}
@@ -1709,7 +1711,7 @@ static int make_dir(const char *path)
err = -errno;
 
if (err) {
-   cp = str_error(-err, errmsg, sizeof(errmsg));
+   cp = libbpf_strerror_r(-err, errmsg, sizeof(errmsg

[PATCH bpf-next 6/6] libbpf: Use __u32 instead of u32 in bpf_program__load

2018-10-03 Thread Andrey Ignatov
Make bpf_program__load consistent with other interfaces: use __u32
instead of u32. That in turn fixes build of samples:

In file included from ./samples/bpf/trace_output_user.c:21:0:
./tools/lib/bpf/libbpf.h:132:9: error: unknown type name ‘u32’
 u32 kern_version);
 ^

Fixes: commit 29cd77f41620d ("libbpf: Support loading individual progs")
Signed-off-by: Andrey Ignatov 
Acked-by: Alexei Starovoitov 
---
 tools/lib/bpf/libbpf.c | 2 +-
 tools/lib/bpf/libbpf.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 02888d36b805..85de1ebd4cb0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1379,7 +1379,7 @@ load_program(enum bpf_prog_type type, enum bpf_attach_type expected_attach_type,
 
 int
 bpf_program__load(struct bpf_program *prog,
- char *license, u32 kern_version)
+ char *license, __u32 kern_version)
 {
int err = 0, fd, i;
 
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 28f83dd6022b..fbfc2aec0f0d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -129,7 +129,7 @@ void bpf_program__set_ifindex(struct bpf_program *prog, __u32 ifindex);
 const char *bpf_program__title(struct bpf_program *prog, bool needs_copy);
 
 int bpf_program__load(struct bpf_program *prog, char *license,
- u32 kern_version);
+ __u32 kern_version);
 int bpf_program__fd(struct bpf_program *prog);
 int bpf_program__pin_instance(struct bpf_program *prog, const char *path,
  int instance);
-- 
2.17.1



[PATCH bpf-next 0/6] Consistent prefixes for libbpf interfaces

2018-10-03 Thread Andrey Ignatov
This patch set renames a few interfaces in libbpf, mostly netlink related,
so that all symbols provided by the library have only three possible
prefixes:

% nm -D tools/lib/bpf/libbpf.so  | \
awk '$2 == "T" {sub(/[_\(].*/, "", $3); if ($3) print $3}' | \
sort | \
uniq -c
 91 bpf
  8 btf
 14 libbpf

libbpf is used more and more outside the kernel tree. That means the library
should follow good practices in library design and implementation to
play well with third party code that uses it.

One such practice is to have a common prefix (or a few) for every
interface, function or data structure the library provides. It helps to
avoid name conflicts with other libraries and keeps the API/ABI consistent.

Inconsistent names in libbpf already cause problems in real life. E.g.
an application can't use both libbpf and libnl due to conflicting
symbols (specifically nla_parse, nla_parse_nested and a few others).

Some of the problematic global symbols are not part of the ABI and can be
restricted from export with either a visibility attribute/pragma or an export
map (which is useful by itself and can be done in addition). That won't
solve the problem for those that are part of the ABI though. Also, export
restrictions would help only in the DSO case. If a third party application
links libbpf statically it won't help, and people do it (e.g. Facebook links
most libraries statically, including libbpf).

libbpf already uses the following prefixes for its interfaces:
* bpf_ for bpf system call wrappers, program/map/elf-object
  abstractions and a few other things;
* btf_ for BTF related API;
* libbpf_ for everything else.

This patch set adds the libbpf_ prefix to interfaces that use none of the
prefixes mentioned above and don't fit well into the first two categories.

The long term benefits of having a common prefix should outweigh the possible
inconvenience of changing the API for those functions now.

Patches 2-4 add libbpf_ prefix to libbpf interfaces: separate patch per
header. Other patches are simple improvements in API.


Andrey Ignatov (6):
  libbpf: Move __dump_nlmsg_t from API to implementation
  libbpf: Consistent prefixes for interfaces in libbpf.h.
  libbpf: Consistent prefixes for interfaces in nlattr.h.
  libbpf: Consistent prefixes for interfaces in str_error.h.
  libbpf: Make include guards consistent
  libbpf: Use __u32 instead of u32 in bpf_program__load

 tools/bpf/bpftool/net.c| 41 ++-
 tools/bpf/bpftool/netlink_dumper.c | 32 ---
 tools/lib/bpf/bpf.h|  6 +--
 tools/lib/bpf/btf.h|  6 +--
 tools/lib/bpf/libbpf.c | 22 +-
 tools/lib/bpf/libbpf.h | 31 +++---
 tools/lib/bpf/netlink.c| 48 --
 tools/lib/bpf/nlattr.c | 64 +++--
 tools/lib/bpf/nlattr.h | 65 +++---
 tools/lib/bpf/str_error.c  |  2 +-
 tools/lib/bpf/str_error.h  |  8 ++--
 11 files changed, 171 insertions(+), 154 deletions(-)

-- 
2.17.1



[PATCH net] net: sched: Add policy validation for tc attributes

2018-10-03 Thread David Ahern
From: David Ahern 

A number of TC attributes are processed without proper validation
(e.g., length checks). Add a tca policy for all input attributes and use
when invoking nlmsg_parse.

The 2 Fixes tags below cover the latest additions. The other attributes
are a string (KIND), a nested attribute (OPTIONS, which does seem to be
validated in most cases), used for dumps only, or a flag.

Fixes: 5bc1701881e39 ("net: sched: introduce multichain support for filters")
Fixes: d47a6b0e7c492 ("net: sched: introduce ingress/egress block index attributes for qdisc")
Signed-off-by: David Ahern 
---
 net/sched/sch_api.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 98541c6399db..85e73f48e48f 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1311,6 +1311,18 @@ check_loop_fn(struct Qdisc *q, unsigned long cl, struct qdisc_walker *w)
  * Delete/get qdisc.
  */
 
+const struct nla_policy rtm_tca_policy[TCA_MAX + 1] = {
+   [TCA_KIND]  = { .type = NLA_STRING },
+   [TCA_OPTIONS]   = { .type = NLA_NESTED },
+   [TCA_RATE]  = { .type = NLA_BINARY,
+   .len = sizeof(struct tc_estimator) },
+   [TCA_STAB]  = { .type = NLA_NESTED },
+   [TCA_DUMP_INVISIBLE]= { .type = NLA_FLAG },
+   [TCA_CHAIN] = { .type = NLA_U32 },
+   [TCA_INGRESS_BLOCK] = { .type = NLA_U32 },
+   [TCA_EGRESS_BLOCK]  = { .type = NLA_U32 },
+};
+
 static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
struct netlink_ext_ack *extack)
 {
@@ -1327,7 +1339,8 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
return -EPERM;
 
-   err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL, extack);
+   err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, rtm_tca_policy,
+ extack);
if (err < 0)
return err;
 
@@ -1411,7 +1424,8 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n,
 
 replay:
/* Reinit, just in case something touches this. */
-   err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL, extack);
+   err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, rtm_tca_policy,
+ extack);
if (err < 0)
return err;
 
@@ -1645,7 +1659,8 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
idx = 0;
ASSERT_RTNL();
 
-   err = nlmsg_parse(nlh, sizeof(struct tcmsg), tca, TCA_MAX, NULL, NULL);
+   err = nlmsg_parse(nlh, sizeof(struct tcmsg), tca, TCA_MAX,
+ rtm_tca_policy, NULL);
if (err < 0)
return err;
 
@@ -1864,7 +1879,8 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n,
!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN))
return -EPERM;
 
-   err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, NULL, extack);
+   err = nlmsg_parse(n, sizeof(*tcm), tca, TCA_MAX, rtm_tca_policy,
+ extack);
if (err < 0)
return err;
 
-- 
2.11.0
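
For readers less familiar with netlink policies, the effect of passing
rtm_tca_policy instead of NULL to nlmsg_parse can be sketched in userspace.
This is a simplified, hypothetical model and not the kernel's validation
code: every ex_* name, the attribute numbers and the exact length rules
are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types, loosely modelled on NLA_U32/NLA_FLAG/NLA_STRING. */
enum ex_type { EX_UNSPEC = 0, EX_U32, EX_FLAG, EX_STRING };

/* Hypothetical attribute numbers; they do not match the real TCA_* values. */
enum {
	EX_ATTR_KIND = 1,
	EX_ATTR_CHAIN,
	EX_ATTR_DUMP_INVISIBLE,
	EX_ATTR_MAX = EX_ATTR_DUMP_INVISIBLE
};

struct ex_policy {
	enum ex_type type;
};

/* Policy table indexed by attribute number, in the style of rtm_tca_policy. */
const struct ex_policy ex_tca_policy[EX_ATTR_MAX + 1] = {
	[EX_ATTR_KIND]           = { .type = EX_STRING },
	[EX_ATTR_CHAIN]          = { .type = EX_U32 },
	[EX_ATTR_DUMP_INVISIBLE] = { .type = EX_FLAG },
};

/*
 * Return 0 if the payload length is acceptable for the attribute, -1 if not.
 * A NULL policy models the behaviour before the patch: nothing is checked.
 */
int ex_validate(const struct ex_policy *policy, int maxtype,
		int type, size_t payload_len)
{
	if (!policy || type <= 0 || type > maxtype)
		return 0;

	switch (policy[type].type) {
	case EX_U32:
		/* A u32 reader needs at least four payload bytes. */
		return payload_len >= sizeof(uint32_t) ? 0 : -1;
	case EX_FLAG:
		/* A flag carries no payload at all. */
		return payload_len == 0 ? 0 : -1;
	case EX_STRING:
		/* A string needs at least one byte. */
		return payload_len >= 1 ? 0 : -1;
	default:
		return 0;
	}
}
```

With a NULL policy a two-byte "chain" attribute is accepted and a later
u32 read would run past the payload; with the table it is rejected up
front, which is the class of problem the patch is aimed at.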



Re: How to post rxrpc patches for net-next with deps on net?

2018-10-03 Thread David Miller
From: David Howells 
Date: Wed, 03 Oct 2018 21:29:19 +0100

> I have some rxrpc patches to post for your net-next/master branch, but there's
> a dependency in them on the rxrpc-fixes-20180928 tag you pulled into your
> net/master branch.
> 
> What's the best way to handle this?
> 
>  (1) Wait for you to merge net into net-next?
> 
>  (2) Base it on my own merge of rxrpc-fixes-20180928 into net-next?  I had to
>  fix up drivers/net/ethernet/netronome/nfp/nfp_net_common.c in the merge
>  I'm currently using.
> 
>  (3) Let you fix up the discrepancy between the two branches?  Two lines that
>  got removed in the rxrpc-fixes-20180928 tag cause a merge failure in
>  net/rxrpc/conn_object.c when applying it to raw net-next by their
>  unexpected presence.  It's nothing too bad.

The best way is to make me aware of the dependency, and wait for net to get
merged into net-next.

I plan to do that either this evening or some time tomorrow afternoon.


Re: [net-next 00/13][pull request] 10GbE Intel Wired LAN Driver Updates 2018-10-03

2018-10-03 Thread David Miller
From: Jeff Kirsher 
Date: Wed,  3 Oct 2018 13:24:58 -0700

> This series contains updates to ixgbe/ixgbevf and few fixes for i40e & iavf.

Pulled, thanks Jeff.


Re: r8169 tx batching(?) causing performance problems

2018-10-03 Thread David Miller
From: David Howells 
Date: Wed, 03 Oct 2018 21:19:40 +0100

> David Miller  wrote:
> 
>> Probably you are seeing some interrupt mitigation.
>> 
>> It seems there is a difference in how the interrupt mitigation is
>> programmed for 8168 chips vs. others by default.  Most get
>> all zeros in the IntrMitigate register, whilst for 8168 chips
>> a value of 0x5151 is programmed.
> 
> I'm not sure what that means.  I can't seem to find a programmer's manual for
> the chip.

There is a comment which documents what might be the register layout
elsewhere in the driver:

/*
 * Undocumented corner. Supposedly:
 * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets
 */
RTL_W16(tp, IntrMitigate, 0x);
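
To make the 0x5151 value mentioned above easier to read, here is a small
sketch that splits a 16-bit value according to the layout the driver
comment describes. The layout is only "supposed", as the comment says, and
the struct and function names are hypothetical; nothing here comes from a
datasheet.

```c
#include <stdint.h>

/* Hypothetical view of IntrMitigate, following the "supposed" layout:
 * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets
 */
struct intr_mitigate_fields {
	unsigned int tx_timer;
	unsigned int tx_packets;
	unsigned int rx_timer;
	unsigned int rx_packets;
};

/* Split a raw register value into its four 4-bit fields. */
struct intr_mitigate_fields intr_mitigate_decode(uint16_t val)
{
	struct intr_mitigate_fields f;

	f.tx_timer   = (val >> 12) & 0xf;
	f.tx_packets = (val >> 8) & 0xf;
	f.rx_timer   = (val >> 4) & 0xf;
	f.rx_packets = val & 0xf;
	return f;
}

/* Pack four 4-bit fields back into a raw register value. */
uint16_t intr_mitigate_encode(struct intr_mitigate_fields f)
{
	return (uint16_t)(((f.tx_timer & 0xf) << 12) |
			  ((f.tx_packets & 0xf) << 8) |
			  ((f.rx_timer & 0xf) << 4) |
			  (f.rx_packets & 0xf));
}
```

Under that assumed layout, 0x5151 would mean a timer field of 5 and a
packet field of 1 for both Tx and Rx, whereas all zeros would leave every
field at 0.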


Re: [net] ixgbe: check return value of napi_complete_done()

2018-10-03 Thread David Miller
From: Jeff Kirsher 
Date: Wed,  3 Oct 2018 11:30:35 -0700

> From: Song Liu 
> 
> The NIC driver should only enable interrupts when napi_complete_done()
> returns true. This patch adds the check for ixgbe.
> 
> Cc: sta...@vger.kernel.org # 4.10+
> Suggested-by: Eric Dumazet 
> Signed-off-by: Song Liu 
> Tested-by: Andrew Bowers 
> Signed-off-by: Jeff Kirsher 

Applied, thanks.


Re: [RFC PATCH bpf-next v3 4/7] bpf: add bpf queue and stack maps

2018-10-03 Thread Alexei Starovoitov
On Wed, Oct 03, 2018 at 12:01:37PM -0500, Mauricio Vasquez wrote:
> 
> 
> On 10/01/2018 07:26 PM, Alexei Starovoitov wrote:
> > On Mon, Oct 01, 2018 at 08:11:43AM -0500, Mauricio Vasquez wrote:
> > > > > > +BPF_CALL_3(bpf_map_pop_elem, struct bpf_map *, map, void *,
> > > > > > value, u32, size)
> > > > > > +{
> > > > > > +    void *ptr;
> > > > > > +
> > > > > > +    if (map->value_size != size)
> > > > > > +    return -EINVAL;
> > > > > > +
> > > > > > +    ptr = map->ops->map_lookup_and_delete_elem(map, NULL);
> > > > > > +    if (!ptr)
> > > > > > +    return -ENOENT;
> > > > > > +
> > > > > > +    switch (size) {
> > > > > > +    case 1:
> > > > > > +    *(u8 *) value = *(u8 *) ptr;
> > > > > > +    break;
> > > > > > +    case 2:
> > > > > > +    *(u16 *) value = *(u16 *) ptr;
> > > > > > +    break;
> > > > > > +    case 4:
> > > > > > +    *(u32 *) value = *(u32 *) ptr;
> > > > > > +    break;
> > > > > > +    case 8:
> > > > > > +    *(u64 *) value = *(u64 *) ptr;
> > > > > > +    break;
> > > > > this is inefficient. can we pass value ptr into ops and let it
> > > > > populate it?
> > > > I don't think so, doing that implies that look_and_delete will be a
> > > > per-value op, while other ops in maps are per-reference.
> > > > For instance, how to change it in the case of peek helper that is using
> > > > the lookup operation?, we cannot change the signature of the lookup
> > > > operation.
> > > > 
> > > > This is something that worries me a little bit, we are creating new
> > > > per-value helpers based on already existing per-reference operations,
> > > > this is not probably the cleanest way.  Here we are at the beginning of
> > > > the discussion once again, how should we map helpers and syscalls to
> > > > ops.
> > > > 
> > > > What about creating pop/peek/push ops, mapping helpers one to one and
> > > > adding some logic into syscall.c to call the correct operation in case
> > > > the map is stack/queue?
> > > > Syscall mapping would be:
> > > > bpf_map_lookup_elem() -> peek
> > > > bpf_map_lookup_and_delete_elem() -> pop
> > > > bpf_map_update_elem() -> push
> > > > 
> > > > Does it make sense?
> > > Hello Alexei,
> > > 
> > > Do you have any feedback on this specific part?
> > Indeed. It seems push/pop ops will be cleaner.
> > I still think that peek() is useless due to races.
> > So BPF_MAP_UPDATE_ELEM syscall cmd will map to 'push' ops
> > and new BPF_MAP_LOOKUP_AND_DELETE_ELEM will map to 'pop' ops.
> > right?
> > 
> > 
> That's right.
> 
> While updating the push API, a question came to me: do we have any specific
> reason to support only 1, 2, 4, 8 bytes? I think we could make it general
> enough to support any number of bytes; if the user is worried about the cost
> of memcpys he could use a map of 8-byte pointers, as you mentioned some time
> ago.
> From an API point of view, pop/peek helpers already expect a void *value
> (int bpf_map_[pop, peek]_elem(map, void *value)); the only change would be
> to also use a pointer in the push instead of a u64.

Indeed. Good idea. Full copy of the value for push and pop makes sense.
On the verifier side we probably need ARG_PTR_TO_UNINIT_MAP_VALUE
in addition to normal ARG_PTR_TO_MAP_VALUE for the case of bpf_map_pop().
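
A userspace sketch of the "full copy of the value" idea, for illustration
only: it is not the proposed kernel implementation, and the ex_queue type,
its fixed capacity and the error codes are all assumptions. The point is
that push and pop copy value_size bytes with memcpy instead of switching
on 1/2/4/8, and that pop only writes to the caller's buffer, so the buffer
does not need to be initialized beforehand.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define EX_QUEUE_CAP 4   /* hypothetical number of slots */
#define EX_VALUE_MAX 16  /* hypothetical upper bound on value_size */

struct ex_queue {
	size_t value_size;   /* bytes copied per element */
	size_t head;         /* index of the oldest element */
	size_t count;        /* number of stored elements */
	unsigned char slots[EX_QUEUE_CAP][EX_VALUE_MAX];
};

int ex_queue_init(struct ex_queue *q, size_t value_size)
{
	if (value_size == 0 || value_size > EX_VALUE_MAX)
		return -EINVAL;
	memset(q, 0, sizeof(*q));
	q->value_size = value_size;
	return 0;
}

/* Copy value_size bytes from *value into the next free slot. */
int ex_queue_push(struct ex_queue *q, const void *value)
{
	size_t tail;

	if (q->count == EX_QUEUE_CAP)
		return -ENOSPC;
	tail = (q->head + q->count) % EX_QUEUE_CAP;
	memcpy(q->slots[tail], value, q->value_size);
	q->count++;
	return 0;
}

/* Copy the oldest element into *value (write-only) and remove it. */
int ex_queue_pop(struct ex_queue *q, void *value)
{
	if (q->count == 0)
		return -ENOENT;
	memcpy(value, q->slots[q->head], q->value_size);
	q->head = (q->head + 1) % EX_QUEUE_CAP;
	q->count--;
	return 0;
}
```

Because ex_queue_pop never reads from the destination, it corresponds to
the "uninit" argument case discussed above, while ex_queue_push reads its
argument and so needs an initialized value.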



[PATCH iproute2-next v2] tc: flower: expose hardware offload count

2018-10-03 Thread Vlad Buslov
The flower classifier was recently updated to expose the count of devices that
a filter is offloaded to. Add support for printing this counter as 'in_hw_count'.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
Changes from V1 to V2:
- Change print format string to "%u"

 tc/f_flower.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 59e5f572c542..ab7ea3e32f69 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -1585,8 +1585,16 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
if (flags & TCA_CLS_FLAGS_SKIP_SW)
print_bool(PRINT_ANY, "skip_sw", "\n  skip_sw", true);
 
-   if (flags & TCA_CLS_FLAGS_IN_HW)
+   if (flags & TCA_CLS_FLAGS_IN_HW) {
print_bool(PRINT_ANY, "in_hw", "\n  in_hw", true);
+
+   if (tb[TCA_FLOWER_IN_HW_COUNT]) {
+   __u32 count = rta_getattr_u32(tb[TCA_FLOWER_IN_HW_COUNT]);
+
+   print_uint(PRINT_ANY, "in_hw_count",
+  " in_hw_count %u", count);
+   }
+   }
else if (flags & TCA_CLS_FLAGS_NOT_IN_HW)
print_bool(PRINT_ANY, "not_in_hw", "\n  not_in_hw", true);
}
-- 
2.7.5



How to post rxrpc patches for net-next with deps on net?

2018-10-03 Thread David Howells
Hi Dave,

I have some rxrpc patches to post for your net-next/master branch, but there's
a dependency in them on the rxrpc-fixes-20180928 tag you pulled into your
net/master branch.

What's the best way to handle this?

 (1) Wait for you to merge net into net-next?

 (2) Base it on my own merge of rxrpc-fixes-20180928 into net-next?  I had to
 fix up drivers/net/ethernet/netronome/nfp/nfp_net_common.c in the merge
 I'm currently using.

 (3) Let you fix up the discrepancy between the two branches?  Two lines that
 got removed in the rxrpc-fixes-20180928 tag cause a merge failure in
 net/rxrpc/conn_object.c when applying it to raw net-next by their
 unexpected presence.  It's nothing too bad.

Thanks,
David


[net-next 07/13] ixgbe: Fix crash with VFs and flow director on interface flap

2018-10-03 Thread Jeff Kirsher
From: Radoslaw Tyl 

This patch fixes a crash when flow director filters are restored after an
adapter reset. In ixgbe_fdir_filter_restore(), filter->action indexes outside
of the rx_ring array, as it has a VF identifier in the upper 32 bits.

Signed-off-by: Radoslaw Tyl 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ddc22557155b..2928ce7653eb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5179,6 +5179,7 @@ static void ixgbe_fdir_filter_restore(struct ixgbe_adapter *adapter)
struct ixgbe_hw *hw = &adapter->hw;
struct hlist_node *node2;
struct ixgbe_fdir_filter *filter;
+   u64 action;
 
spin_lock(&adapter->fdir_perfect_lock);
 
@@ -5187,12 +5188,17 @@ static void ixgbe_fdir_filter_restore(struct ixgbe_adapter *adapter)
 
hlist_for_each_entry_safe(filter, node2,
  &adapter->fdir_filter_list, fdir_node) {
+   action = filter->action;
+   if (action != IXGBE_FDIR_DROP_QUEUE && action != 0)
+   action =
+   (action >> ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF) - 1;
+
ixgbe_fdir_write_perfect_filter_82599(hw,
&filter->filter,
filter->sw_idx,
-   (filter->action == IXGBE_FDIR_DROP_QUEUE) ?
+   (action == IXGBE_FDIR_DROP_QUEUE) ?
IXGBE_FDIR_DROP_QUEUE :
-   adapter->rx_ring[filter->action]->reg_idx);
+   adapter->rx_ring[action]->reg_idx);
}
 
spin_unlock(&adapter->fdir_perfect_lock);
-- 
2.17.1
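
The reason a raw filter->action cannot be used as an array index can be
sketched with a little bit arithmetic. The masks below are local copies of
what I believe the ethtool UAPI header defines for the ring cookie (ring in
the low 32 bits, VF field in the next 8 bits); treat the EX_* names and
values as assumptions rather than a quotation of that header.

```c
#include <stdint.h>

/* Assumed layout of the 64-bit ring cookie (local, hypothetical names). */
#define EX_RING_MASK    0x00000000FFFFFFFFULL  /* queue index */
#define EX_RING_VF_MASK 0x000000FF00000000ULL  /* VF field */
#define EX_RING_VF_OFF  32                     /* shift for the VF field */

/* Extract the queue index from the low 32 bits. */
uint64_t ex_cookie_ring(uint64_t cookie)
{
	return cookie & EX_RING_MASK;
}

/* Extract the VF field; 0 would mean "no VF" under this assumed layout. */
uint64_t ex_cookie_vf(uint64_t cookie)
{
	return (cookie & EX_RING_VF_MASK) >> EX_RING_VF_OFF;
}
```

For a cookie built as (3ULL << 32) | 5, the raw value is far larger than
any realistic number of Rx rings, so indexing rx_ring[] with it reads out
of bounds. The patch above shifts the value down by
ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF and subtracts one before using it.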



[net-next 12/13] ixgbe: add AF_XDP zero-copy Tx support

2018-10-03 Thread Jeff Kirsher
From: Björn Töpel 

This patch adds zero-copy Tx support for AF_XDP sockets. It implements
the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a
NAPI context. This means pulling egress packets from the Tx ring,
placing the frames on the NIC HW descriptor ring and completing sent
frames back to the application via the completion ring.

The regular XDP Tx ring is used for AF_XDP as well. The rationale for
this is as follows: XDP_REDIRECT guarantees mutual exclusion between
different NAPI contexts based on CPU id. In other words, a netdev can
XDP_REDIRECT to another netdev with a different NAPI context, since
the operation is bound to a specific core and each core has its own
hardware ring.

As the AF_XDP Tx action is running in the same NAPI context and using
the same ring, it will also be protected from XDP_REDIRECT actions
with the exact same mechanism.

As with AF_XDP Rx, all AF_XDP Tx specific functions are added to
ixgbe_xsk.c.

Signed-off-by: Björn Töpel 
Tested-by: William Tu 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  17 +-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |   4 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 175 ++
 3 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b7ee6d84d0c1..45fd670d35a6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3161,7 +3161,11 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 #endif
 
ixgbe_for_each_ring(ring, q_vector->tx) {
-   if (!ixgbe_clean_tx_irq(q_vector, ring, budget))
+   bool wd = ring->xsk_umem ?
+ ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) :
+ ixgbe_clean_tx_irq(q_vector, ring, budget);
+
+   if (!wd)
clean_complete = false;
}
 
@@ -3470,6 +3474,10 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
u32 txdctl = IXGBE_TXDCTL_ENABLE;
u8 reg_idx = ring->reg_idx;
 
+   ring->xsk_umem = NULL;
+   if (ring_is_xdp(ring))
+   ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
+
/* disable queue to avoid issues while updating state */
IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
IXGBE_WRITE_FLUSH(hw);
@@ -5942,6 +5950,11 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
u16 i = tx_ring->next_to_clean;
struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
 
+   if (tx_ring->xsk_umem) {
+   ixgbe_xsk_clean_tx_ring(tx_ring);
+   goto out;
+   }
+
while (i != tx_ring->next_to_use) {
union ixgbe_adv_tx_desc *eop_desc, *tx_desc;
 
@@ -5993,6 +6006,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
if (!ring_is_xdp(tx_ring))
netdev_tx_reset_queue(txring_txq(tx_ring));
 
+out:
/* reset next_to_use and next_to_clean */
tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
@@ -10348,6 +10362,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
.ndo_features_check = ixgbe_features_check,
.ndo_bpf= ixgbe_xdp,
.ndo_xdp_xmit   = ixgbe_xdp_xmit,
+   .ndo_xsk_async_xmit = ixgbe_xsk_async_xmit,
 };
 
 static void ixgbe_disable_txr_hw(struct ixgbe_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index 56afb685c648..53d4089f5644 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -42,5 +42,9 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
  struct ixgbe_ring *rx_ring,
  const int budget);
 void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring);
+bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
+   struct ixgbe_ring *tx_ring, int napi_budget);
+int ixgbe_xsk_async_xmit(struct net_device *dev, u32 queue_id);
+void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring);
 
 #endif /* #define _IXGBE_TXRX_COMMON_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index e876ff120758..65c3e2c979d4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -624,3 +624,178 @@ void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
}
}
 }
+
+static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
+{
+   union ixgbe_adv_tx_desc *tx_desc = NULL;
+   struct ixgbe_tx_buffer *tx_bi;
+   bool work_done = true;
+   u32 len, cmd_type;
+   dma_addr_t dm

[net-next 03/13] ixgbe: remove redundant function ixgbe_fw_recovery_mode()

2018-10-03 Thread Jeff Kirsher
From: YueHaibing 

There are no in-tree callers.

Signed-off-by: YueHaibing 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 970f71d5da04..0bd1294ba517 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -3484,17 +3484,6 @@ void ixgbe_set_vlan_anti_spoofing(struct ixgbe_hw *hw, bool enable, int vf)
IXGBE_WRITE_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg), pfvfspoof);
 }
 
-/**
- * ixgbe_fw_recovery_mode - Check if in FW NVM recovery mode
- * @hw: pointer to hardware structure
- */
-bool ixgbe_fw_recovery_mode(struct ixgbe_hw *hw)
-{
-   if (hw->mac.ops.fw_recovery_mode)
-   return hw->mac.ops.fw_recovery_mode(hw);
-   return false;
-}
-
 /**
  *  ixgbe_get_device_caps_generic - Get additional device capabilities
  *  @hw: pointer to hardware structure
-- 
2.17.1



[net-next 02/13] ixgbe: Fix ixgbe TX hangs with XDP_TX beyond queue limit

2018-10-03 Thread Jeff Kirsher
From: Radoslaw Tyl 

A Tx hang occurs when the combined number of Tx and XDP queues is more than 64.
With XDP, MTQC is always 0x0 (64 TxQs). We need more space for Tx queues.

Signed-off-by: Radoslaw Tyl 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 140e87a10ff5..ddc22557155b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3577,12 +3577,18 @@ static void ixgbe_setup_mtqc(struct ixgbe_adapter *adapter)
else
mtqc |= IXGBE_MTQC_64VF;
} else {
-   if (tcs > 4)
+   if (tcs > 4) {
mtqc = IXGBE_MTQC_RT_ENA | IXGBE_MTQC_8TC_8TQ;
-   else if (tcs > 1)
+   } else if (tcs > 1) {
mtqc = IXGBE_MTQC_RT_ENA | IXGBE_MTQC_4TC_4TQ;
-   else
-   mtqc = IXGBE_MTQC_64Q_1PB;
+   } else {
+   u8 max_txq = adapter->num_tx_queues +
+   adapter->num_xdp_queues;
+   if (max_txq > 63)
+   mtqc = IXGBE_MTQC_RT_ENA | IXGBE_MTQC_4TC_4TQ;
+   else
+   mtqc = IXGBE_MTQC_64Q_1PB;
+   }
}
 
IXGBE_WRITE_REG(hw, IXGBE_MTQC, mtqc);
-- 
2.17.1



[net-next 01/13] ixgbevf: fix msglen for ipsec mbx messages

2018-10-03 Thread Jeff Kirsher
From: Shannon Nelson 

Don't be fancy with message lengths, just set lengths to
number of dwords, not bytes.

Fixes: 0062e7cc955e ("ixgbevf: add VF IPsec offload code")
Reported-by: Dan Carpenter 
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/ipsec.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 997cea675a37..9e4f47d95d40 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -21,7 +21,6 @@ static int ixgbevf_ipsec_set_pf_sa(struct ixgbevf_adapter *adapter,
u32 msgbuf[IXGBE_VFMAILBOX_SIZE] = { 0 };
struct ixgbe_hw *hw = &adapter->hw;
struct sa_mbx_msg *sam;
-   u16 msglen;
int ret;
 
/* send the important bits to the PF */
@@ -38,16 +37,14 @@ static int ixgbevf_ipsec_set_pf_sa(struct ixgbevf_adapter *adapter,
memcpy(sam->key, xs->aead->alg_key, sizeof(sam->key));
 
msgbuf[0] = IXGBE_VF_IPSEC_ADD;
-   msglen = sizeof(*sam) + sizeof(msgbuf[0]);
 
spin_lock_bh(&adapter->mbx_lock);
 
-   ret = hw->mbx.ops.write_posted(hw, msgbuf, msglen);
+   ret = hw->mbx.ops.write_posted(hw, msgbuf, IXGBE_VFMAILBOX_SIZE);
if (ret)
goto out;
 
-   msglen = sizeof(msgbuf[0]) * 2;
-   ret = hw->mbx.ops.read_posted(hw, msgbuf, msglen);
+   ret = hw->mbx.ops.read_posted(hw, msgbuf, 2);
if (ret)
goto out;
 
@@ -80,11 +77,11 @@ static int ixgbevf_ipsec_del_pf_sa(struct ixgbevf_adapter *adapter, int pfsa)
 
spin_lock_bh(&adapter->mbx_lock);
 
-   err = hw->mbx.ops.write_posted(hw, msgbuf, sizeof(msgbuf));
+   err = hw->mbx.ops.write_posted(hw, msgbuf, 2);
if (err)
goto out;
 
-   err = hw->mbx.ops.read_posted(hw, msgbuf, sizeof(msgbuf));
+   err = hw->mbx.ops.read_posted(hw, msgbuf, 2);
if (err)
goto out;
 
-- 
2.17.1
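
The bytes-versus-dwords distinction in the commit message can be sketched
as below, assuming (as the patch implies) that the mailbox read and write
helpers take a length counted in 32-bit words. The buffer size of 16 words
and the ex_* names are hypothetical and chosen only for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MAILBOX_DWORDS 16  /* hypothetical buffer size, in 32-bit words */

/* Convert a byte count to the number of 32-bit words needed to hold it. */
size_t ex_bytes_to_dwords(size_t bytes)
{
	return (bytes + sizeof(uint32_t) - 1) / sizeof(uint32_t);
}
```

For a buffer declared as uint32_t msgbuf[EX_MAILBOX_DWORDS], sizeof(msgbuf)
is 64 while the word count is 16, so handing sizeof(msgbuf) to an interface
that expects words overstates the length by a factor of four. Passing the
word count directly, as the patch does, avoids the conversion altogether.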



[net-next 00/13][pull request] 10GbE Intel Wired LAN Driver Updates 2018-10-03

2018-10-03 Thread Jeff Kirsher
This series contains updates to ixgbe/ixgbevf and a few fixes for i40e & iavf.

Shannon Nelson fixes the message length for IPsec mailbox messages.

Radoslaw fixes a transmit hang that occurs when XDP_TX exceeds the queue
limit, and fixes a crash when we restore flow director filters after a reset.

YueHaibing cleans up dead code, which did not have any callers.

Dan Carpenter fixes an "off by one" error in IPsec for ixgbe.

Nathan Chancellor fixes the i40e driver to use the correct enum for link
speed.  He also removes a debug statement, since it was not producing useful
information and always evaluated to "TRUE".

Most notably, Björn introduces zero-copy AF_XDP support for the ixgbe
driver.  The ixgbe zero-copy code is located in its own file ixgbe_xsk.[ch],
analogous to the i40e ZC support. Again, as in i40e, code paths have
been copied from the XDP path to the zero-copy path. Going forward we
will try to generalize more code between the AF_XDP ZC drivers, and
also reduce the heavy C&P.

The following are changes since commit 072eff2d9e2d64c3a95572f0326de3563f26c392:
  Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Björn Töpel (5):
  ixgbe: added Rx/Tx ring disable/enable functions
  ixgbe: move common Rx functions to ixgbe_txrx_common.h
  ixgbe: add AF_XDP zero-copy Rx support
  ixgbe: move common Tx functions to ixgbe_txrx_common.h
  ixgbe: add AF_XDP zero-copy Tx support

Dan Carpenter (1):
  ixgbevf: off by one in ixgbevf_ipsec_tx()

Nathan Chancellor (2):
  i40e: Use proper enum in i40e_ndo_set_vf_link_state
  i40e: Remove unnecessary print statement

Radoslaw Tyl (2):
  ixgbe: Fix ixgbe TX hangs with XDP_TX beyond queue limit
  ixgbe: Fix crash with VFs and flow director on interface flap

Rami Rosen (1):
  iavf: fix a typo

Shannon Nelson (1):
  ixgbevf: fix msglen for ipsec mbx messages

YueHaibing (1):
  ixgbe: remove redundant function ixgbe_fw_recovery_mode()

 .../net/ethernet/intel/i40e/i40e_debugfs.c|   2 -
 .../ethernet/intel/i40e/i40e_virtchnl_pf.c|   2 +-
 drivers/net/ethernet/intel/iavf/iavf.h|   2 +-
 drivers/net/ethernet/intel/ixgbe/Makefile |   3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  28 +-
 .../net/ethernet/intel/ixgbe/ixgbe_common.c   |  11 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |  17 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 315 ++-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  50 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 801 ++
 drivers/net/ethernet/intel/ixgbevf/ipsec.c|  13 +-
 11 files changed, 1169 insertions(+), 75 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c

-- 
2.17.1



[net-next 06/13] i40e: Remove unnecessary print statement

2018-10-03 Thread Jeff Kirsher
From: Nathan Chancellor 

Clang warns that the address of an array will always evaluate to true
in a boolean context.

drivers/net/ethernet/intel/i40e/i40e_debugfs.c:136:9: warning: address
of array 'vsi->active_vlans' will always evaluate to 'true'
[-Wpointer-bool-conversion]
 vsi->active_vlans ? "" : "");
 ~^~~~ ~
./include/linux/device.h:1431:33: note: expanded from macro 'dev_info'
_dev_info(dev, dev_fmt(fmt), ##__VA_ARGS__)
   ^~~
1 warning generated.

Given that the statement shows that active_vlans is always valid, just
remove the statement since it's not giving any useful information.

Link: https://github.com/ClangBuiltLinux/linux/issues/82
Signed-off-by: Nathan Chancellor 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 56b911a5dd8b..a20d1cf058ad 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -132,8 +132,6 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int 
seid)
dev_info(&pf->pdev->dev, "vlan_features = 0x%08lx\n",
 (unsigned long int)nd->vlan_features);
}
-   dev_info(&pf->pdev->dev, "active_vlans is %s\n",
-vsi->active_vlans ? "" : "");
dev_info(&pf->pdev->dev,
 "flags = 0x%08lx, netdev_registered = %i, 
current_netdev_flags = 0x%04x\n",
 vsi->flags, vsi->netdev_registered, vsi->current_netdev_flags);
-- 
2.17.1
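
For illustration only (not part of the patch): a minimal userspace sketch of
why the removed test could never be false. The struct and function names
below are hypothetical stand-ins, not the driver's definitions; the point is
only the difference between an embedded array and a pointer member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the VSI structure (assumption, not i40e code). */
struct demo_vsi {
	unsigned long active_vlans[4];	/* array embedded in the struct */
	unsigned long *vlan_ptr;	/* pointer member, may be NULL */
};

/* An embedded array decays to the address of its first element. That
 * address lies inside the struct itself, so it is never NULL. */
static int demo_array_is_set(const struct demo_vsi *vsi)
{
	const unsigned long *first;

	assert(vsi != NULL);
	first = vsi->active_vlans;
	return first != NULL;
}

/* A pointer member is the case where a NULL test carries information. */
static int demo_pointer_is_set(const struct demo_vsi *vsi)
{
	assert(vsi != NULL);
	return vsi->vlan_ptr != NULL;
}
```

With a zero-initialised struct, the array test still reports "set" while the
pointer test does not, which is why printing the result of the array test
gave no useful information.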



[net-next 10/13] ixgbe: add AF_XDP zero-copy Rx support

2018-10-03 Thread Jeff Kirsher
From: Björn Töpel 

This patch adds zero-copy Rx support for AF_XDP sockets. Instead of
allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are
allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain
queue.

All AF_XDP specific functions are added to a new file, ixgbe_xsk.c.

Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS
will allocate a new buffer and copy the zero-copy frame prior to passing
it to the kernel stack.

Signed-off-by: Björn Töpel 
Tested-by: William Tu 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/Makefile |   3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  27 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |  17 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  78 ++-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  15 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 626 ++
 6 files changed, 745 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c

diff --git a/drivers/net/ethernet/intel/ixgbe/Makefile 
b/drivers/net/ethernet/intel/ixgbe/Makefile
index 5414685189ce..ca6b0c458e4a 100644
--- a/drivers/net/ethernet/intel/ixgbe/Makefile
+++ b/drivers/net/ethernet/intel/ixgbe/Makefile
@@ -8,7 +8,8 @@ obj-$(CONFIG_IXGBE) += ixgbe.o
 
 ixgbe-objs := ixgbe_main.o ixgbe_common.o ixgbe_ethtool.o \
   ixgbe_82599.o ixgbe_82598.o ixgbe_phy.o ixgbe_sriov.o \
-  ixgbe_mbx.o ixgbe_x540.o ixgbe_x550.o ixgbe_lib.o ixgbe_ptp.o
+  ixgbe_mbx.o ixgbe_x540.o ixgbe_x550.o ixgbe_lib.o ixgbe_ptp.o \
+  ixgbe_xsk.o
 
 ixgbe-$(CONFIG_IXGBE_DCB) +=  ixgbe_dcb.o ixgbe_dcb_82598.o \
   ixgbe_dcb_82599.o ixgbe_dcb_nl.o
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 265db172042a..7a7679e7be84 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -228,13 +228,17 @@ struct ixgbe_tx_buffer {
 struct ixgbe_rx_buffer {
struct sk_buff *skb;
dma_addr_t dma;
-   struct page *page;
-#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
-   __u32 page_offset;
-#else
-   __u16 page_offset;
-#endif
-   __u16 pagecnt_bias;
+   union {
+   struct {
+   struct page *page;
+   __u32 page_offset;
+   __u16 pagecnt_bias;
+   };
+   struct {
+   void *addr;
+   u64 handle;
+   };
+   };
 };
 
 struct ixgbe_queue_stats {
@@ -348,6 +352,10 @@ struct ixgbe_ring {
struct ixgbe_rx_queue_stats rx_stats;
};
struct xdp_rxq_info xdp_rxq;
+   struct xdp_umem *xsk_umem;
+   struct zero_copy_allocator zca; /* ZC allocator anchor */
+   u16 ring_idx;   /* {rx,tx,xdp}_ring back reference idx */
+   u16 rx_buf_len;
 } cacheline_internodealigned_in_smp;
 
 enum ixgbe_ring_f_enum {
@@ -765,6 +773,11 @@ struct ixgbe_adapter {
 #ifdef CONFIG_XFRM_OFFLOAD
struct ixgbe_ipsec *ipsec;
 #endif /* CONFIG_XFRM_OFFLOAD */
+
+   /* AF_XDP zero-copy */
+   struct xdp_umem **xsk_umems;
+   u16 num_xsk_umems_used;
+   u16 num_xsk_umems;
 };
 
 static inline u8 ixgbe_max_rss_indices(struct ixgbe_adapter *adapter)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index d361f570ca37..62e6499e4146 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -1055,7 +1055,7 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter 
*adapter)
int txr_remaining = adapter->num_tx_queues;
int xdp_remaining = adapter->num_xdp_queues;
int rxr_idx = 0, txr_idx = 0, xdp_idx = 0, v_idx = 0;
-   int err;
+   int err, i;
 
/* only one q_vector if MSI-X is disabled. */
if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
@@ -1097,6 +1097,21 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter 
*adapter)
xdp_idx += xqpv;
}
 
+   for (i = 0; i < adapter->num_rx_queues; i++) {
+   if (adapter->rx_ring[i])
+   adapter->rx_ring[i]->ring_idx = i;
+   }
+
+   for (i = 0; i < adapter->num_tx_queues; i++) {
+   if (adapter->tx_ring[i])
+   adapter->tx_ring[i]->ring_idx = i;
+   }
+
+   for (i = 0; i < adapter->num_xdp_queues; i++) {
+   if (adapter->xdp_ring[i])
+   adapter->xdp_ring[i]->ring_idx = i;
+   }
+
return 0;
 
 err_out:
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 681ed9f1ea35..cad4c12e8e63 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -34,6 +34

[net-next 05/13] i40e: Use proper enum in i40e_ndo_set_vf_link_state

2018-10-03 Thread Jeff Kirsher
From: Nathan Chancellor 

Clang warns when one enumerated type is converted implicitly to another.

drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:4214:42: warning:
implicit conversion from enumeration type 'enum i40e_aq_link_speed' to
different enumeration type 'enum virtchnl_link_speed'
  [-Wenum-conversion]
pfe.event_data.link_event.link_speed = I40E_LINK_SPEED_40GB;
 ~ ^~~~
1 warning generated.

Use the proper enum from virtchnl_link_speed, which has the same value
as I40E_LINK_SPEED_40GB, VIRTCHNL_LINK_SPEED_40GB. This appears to be
missed by commit ff3f4cc267f6 ("virtchnl: finish conversion to virtchnl
interface").

Link: https://github.com/ClangBuiltLinux/linux/issues/81
Signed-off-by: Nathan Chancellor 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c 
b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index f4bb2779f03a..81b0e1f8d14b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -4256,7 +4256,7 @@ int i40e_ndo_set_vf_link_state(struct net_device *netdev, 
int vf_id, int link)
vf->link_forced = true;
vf->link_up = true;
pfe.event_data.link_event.link_status = true;
-   pfe.event_data.link_event.link_speed = I40E_LINK_SPEED_40GB;
+   pfe.event_data.link_event.link_speed = VIRTCHNL_LINK_SPEED_40GB;
break;
case IFLA_VF_LINK_STATE_DISABLE:
vf->link_forced = true;
-- 
2.17.1
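
For illustration only (not part of the patch): a small sketch of the
situation the warning describes. Both enum types and their values are
hypothetical stand-ins chosen for this example; only the pattern (two
distinct enum types sharing a numeric value) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two link-speed enums. The shared value
 * 0x10 is illustrative only. */
enum demo_aq_link_speed { DEMO_AQ_SPEED_40GB = 0x10 };
enum demo_virtchnl_link_speed { DEMO_VIRTCHNL_SPEED_40GB = 0x10 };

struct demo_link_event {
	enum demo_virtchnl_link_speed link_speed;
};

/* Assign the enumerator that belongs to the field's own type. Using
 * DEMO_AQ_SPEED_40GB here would store the same number, but it is an
 * implicit conversion between enum types, which Clang reports under
 * -Wenum-conversion. */
static void demo_force_link_up(struct demo_link_event *ev)
{
	assert(ev != NULL);
	ev->link_speed = DEMO_VIRTCHNL_SPEED_40GB;
}
```

Because the two enumerators have the same value in this sketch, the change
is purely about type correctness and does not alter what is stored.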



[net-next 13/13] iavf: fix a typo

2018-10-03 Thread Jeff Kirsher
From: Rami Rosen 

This trivial patch fixes a typo in iavf.h.

Signed-off-by: Rami Rosen 
Acked-by: Jesse Brandeburg 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h 
b/drivers/net/ethernet/intel/iavf/iavf.h
index a512f7521841..272d76b733aa 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -342,7 +342,7 @@ struct iavf_adapter {
struct iavf_channel_config ch_config;
u8 num_tc;
struct list_head cloud_filter_list;
-   /* lock to protest access to the cloud filter list */
+   /* lock to protect access to the cloud filter list */
spinlock_t cloud_filter_list_lock;
u16 num_cloud_filters;
 };
-- 
2.17.1



[net-next 09/13] ixgbe: move common Rx functions to ixgbe_txrx_common.h

2018-10-03 Thread Jeff Kirsher
From: Björn Töpel 

This patch prepares for the upcoming zero-copy Rx functionality, by
moving/changing linkage of common functions, used both by the regular
path and zero-copy path.

Signed-off-by: Björn Töpel 
Tested-by: William Tu 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 29 +++
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  | 26 +
 2 files changed, 37 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 47e28d9ce1e3..681ed9f1ea35 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -40,6 +40,7 @@
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
 #include "ixgbe_model.h"
+#include "ixgbe_txrx_common.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -1673,9 +1674,9 @@ static void ixgbe_update_rsc_stats(struct ixgbe_ring 
*rx_ring,
  * order to populate the hash, checksum, VLAN, timestamp, protocol, and
  * other fields within the skb.
  **/
-static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
-union ixgbe_adv_rx_desc *rx_desc,
-struct sk_buff *skb)
+void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct sk_buff *skb)
 {
struct net_device *dev = rx_ring->netdev;
u32 flags = rx_ring->q_vector->adapter->flags;
@@ -1708,8 +1709,8 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring 
*rx_ring,
skb->protocol = eth_type_trans(skb, dev);
 }
 
-static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
-struct sk_buff *skb)
+void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
+ struct sk_buff *skb)
 {
napi_gro_receive(&q_vector->napi, skb);
 }
@@ -1868,9 +1869,9 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring 
*rx_ring,
  *
  * Returns true if an error was encountered and skb was freed.
  **/
-static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
- union ixgbe_adv_rx_desc *rx_desc,
- struct sk_buff *skb)
+bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
+  union ixgbe_adv_rx_desc *rx_desc,
+  struct sk_buff *skb)
 {
struct net_device *netdev = rx_ring->netdev;
 
@@ -2186,14 +2187,6 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring 
*rx_ring,
return skb;
 }
 
-#define IXGBE_XDP_PASS 0
-#define IXGBE_XDP_CONSUMED BIT(0)
-#define IXGBE_XDP_TX   BIT(1)
-#define IXGBE_XDP_REDIRBIT(2)
-
-static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
-  struct xdp_frame *xdpf);
-
 static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 struct ixgbe_ring *rx_ring,
 struct xdp_buff *xdp)
@@ -8469,8 +8462,8 @@ static u16 ixgbe_select_queue(struct net_device *dev, 
struct sk_buff *skb,
 }
 
 #endif
-static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
-  struct xdp_frame *xdpf)
+int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+   struct xdp_frame *xdpf)
 {
struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
struct ixgbe_tx_buffer *tx_buffer;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
new file mode 100644
index ..3780d315b991
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2018 Intel Corporation. */
+
+#ifndef _IXGBE_TXRX_COMMON_H_
+#define _IXGBE_TXRX_COMMON_H_
+
+#define IXGBE_XDP_PASS 0
+#define IXGBE_XDP_CONSUMED BIT(0)
+#define IXGBE_XDP_TX   BIT(1)
+#define IXGBE_XDP_REDIRBIT(2)
+
+int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+   struct xdp_frame *xdpf);
+bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
+  union ixgbe_adv_rx_desc *rx_desc,
+  struct sk_buff *skb);
+void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
+ union ixgbe_adv_rx_desc *rx_desc,
+ struct sk_buff *skb);
+void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
+ struct sk_buff *skb);
+
+void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
+void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
+
+#endif /* #define _IXGBE_TXRX_C

[net-next 11/13] ixgbe: move common Tx functions to ixgbe_txrx_common.h

2018-10-03 Thread Jeff Kirsher
From: Björn Töpel 

This patch prepares for the upcoming zero-copy Tx functionality by
moving common functions used both by the regular path and zero-copy
path.

Signed-off-by: Björn Töpel 
Tested-by: William Tu 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c| 9 +++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 5 +
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cad4c12e8e63..b7ee6d84d0c1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -895,8 +895,8 @@ static void ixgbe_set_ivar(struct ixgbe_adapter *adapter, 
s8 direction,
}
 }
 
-static inline void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter,
- u64 qmask)
+void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter,
+   u64 qmask)
 {
u32 mask;
 
@@ -8154,9 +8154,6 @@ static inline int ixgbe_maybe_stop_tx(struct ixgbe_ring 
*tx_ring, u16 size)
return __ixgbe_maybe_stop_tx(tx_ring, size);
 }
 
-#define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
-  IXGBE_TXD_CMD_RS)
-
 static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
struct ixgbe_tx_buffer *first,
const u8 hdr_len)
@@ -10257,7 +10254,7 @@ static int ixgbe_xdp(struct net_device *dev, struct 
netdev_bpf *xdp)
}
 }
 
-static void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring)
+void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring)
 {
/* Force memory writes to complete before letting h/w know there
 * are new descriptors to fetch.
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index cf219f4e009d..56afb685c648 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -9,6 +9,9 @@
 #define IXGBE_XDP_TX   BIT(1)
 #define IXGBE_XDP_REDIRBIT(2)
 
+#define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
+  IXGBE_TXD_CMD_RS)
+
 int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
struct xdp_frame *xdpf);
 bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
@@ -19,6 +22,8 @@ void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
  struct sk_buff *skb);
 void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
  struct sk_buff *skb);
+void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring);
+void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter, u64 qmask);
 
 void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
 void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
-- 
2.17.1



[net-next 08/13] ixgbe: added Rx/Tx ring disable/enable functions

2018-10-03 Thread Jeff Kirsher
From: Björn Töpel 

Add functions for Rx/Tx ring enable/disable. Instead of resetting the
whole device, only the affected ring is disabled or enabled.

This plumbing is used in later commits, when zero-copy AF_XDP support
is introduced.

Signed-off-by: Björn Töpel 
Tested-by: William Tu 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 158 ++
 2 files changed, 159 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5c6fd42e90ed..265db172042a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -271,6 +271,7 @@ enum ixgbe_ring_state_t {
__IXGBE_TX_DETECT_HANG,
__IXGBE_HANG_CHECK_ARMED,
__IXGBE_TX_XDP_RING,
+   __IXGBE_TX_DISABLED,
 };
 
 #define ring_uses_build_skb(ring) \
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2928ce7653eb..47e28d9ce1e3 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8692,6 +8692,8 @@ static netdev_tx_t __ixgbe_xmit_frame(struct sk_buff *skb,
return NETDEV_TX_OK;
 
tx_ring = ring ? ring : adapter->tx_ring[skb->queue_mapping];
+   if (unlikely(test_bit(__IXGBE_TX_DISABLED, &tx_ring->state)))
+   return NETDEV_TX_BUSY;
 
return ixgbe_xmit_frame_ring(skb, adapter, tx_ring);
 }
@@ -10238,6 +10240,9 @@ static int ixgbe_xdp_xmit(struct net_device *dev, int n,
if (unlikely(!ring))
return -ENXIO;
 
+   if (unlikely(test_bit(__IXGBE_TX_DISABLED, &ring->state)))
+   return -ENXIO;
+
for (i = 0; i < n; i++) {
struct xdp_frame *xdpf = frames[i];
int err;
@@ -10301,6 +10306,159 @@ static const struct net_device_ops ixgbe_netdev_ops = 
{
.ndo_xdp_xmit   = ixgbe_xdp_xmit,
 };
 
+static void ixgbe_disable_txr_hw(struct ixgbe_adapter *adapter,
+struct ixgbe_ring *tx_ring)
+{
+   unsigned long wait_delay, delay_interval;
+   struct ixgbe_hw *hw = &adapter->hw;
+   u8 reg_idx = tx_ring->reg_idx;
+   int wait_loop;
+   u32 txdctl;
+
+   IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
+
+   /* delay mechanism from ixgbe_disable_tx */
+   delay_interval = ixgbe_get_completion_timeout(adapter) / 100;
+
+   wait_loop = IXGBE_MAX_RX_DESC_POLL;
+   wait_delay = delay_interval;
+
+   while (wait_loop--) {
+   usleep_range(wait_delay, wait_delay + 10);
+   wait_delay += delay_interval * 2;
+   txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(reg_idx));
+
+   if (!(txdctl & IXGBE_TXDCTL_ENABLE))
+   return;
+   }
+
+   e_err(drv, "TXDCTL.ENABLE not cleared within the polling period\n");
+}
+
+static void ixgbe_disable_txr(struct ixgbe_adapter *adapter,
+ struct ixgbe_ring *tx_ring)
+{
+   set_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
+   ixgbe_disable_txr_hw(adapter, tx_ring);
+}
+
+static void ixgbe_disable_rxr_hw(struct ixgbe_adapter *adapter,
+struct ixgbe_ring *rx_ring)
+{
+   unsigned long wait_delay, delay_interval;
+   struct ixgbe_hw *hw = &adapter->hw;
+   u8 reg_idx = rx_ring->reg_idx;
+   int wait_loop;
+   u32 rxdctl;
+
+   rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
+   rxdctl &= ~IXGBE_RXDCTL_ENABLE;
+   rxdctl |= IXGBE_RXDCTL_SWFLSH;
+
+   /* write value back with RXDCTL.ENABLE bit cleared */
+   IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl);
+
+   /* RXDCTL.EN may not change on 82598 if link is down, so skip it */
+   if (hw->mac.type == ixgbe_mac_82598EB &&
+   !(IXGBE_READ_REG(hw, IXGBE_LINKS) & IXGBE_LINKS_UP))
+   return;
+
+   /* delay mechanism from ixgbe_disable_rx */
+   delay_interval = ixgbe_get_completion_timeout(adapter) / 100;
+
+   wait_loop = IXGBE_MAX_RX_DESC_POLL;
+   wait_delay = delay_interval;
+
+   while (wait_loop--) {
+   usleep_range(wait_delay, wait_delay + 10);
+   wait_delay += delay_interval * 2;
+   rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
+
+   if (!(rxdctl & IXGBE_RXDCTL_ENABLE))
+   return;
+   }
+
+   e_err(drv, "RXDCTL.ENABLE not cleared within the polling period\n");
+}
+
+static void ixgbe_reset_txr_stats(struct ixgbe_ring *tx_ring)
+{
+   memset(&tx_ring->stats, 0, sizeof(tx_ring->stats));
+   memset(&tx_ring->tx_stats, 0, sizeof(tx_ring->tx_stats));
+}
+
+static void ixgbe_reset_rxr_stats(struct ixgbe_ring *rx_ring)
+{
+   memset(&rx_ring->stats, 0, sizeof(rx_ring->stats));
+   memset(&rx

[net-next 04/13] ixgbevf: off by one in ixgbevf_ipsec_tx()

2018-10-03 Thread Jeff Kirsher
From: Dan Carpenter 

The ipsec->tx_tbl[] array has IXGBE_IPSEC_MAX_SA_COUNT elements so the >
should be a >=.

Fixes: 0062e7cc955e ("ixgbevf: add VF IPsec offload code")
Signed-off-by: Dan Carpenter 
Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbevf/ipsec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c 
b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 9e4f47d95d40..e8a3231be0bf 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -467,7 +467,7 @@ int ixgbevf_ipsec_tx(struct ixgbevf_ring *tx_ring,
}
 
sa_idx = xs->xso.offload_handle - IXGBE_IPSEC_BASE_TX_INDEX;
-   if (unlikely(sa_idx > IXGBE_IPSEC_MAX_SA_COUNT)) {
+   if (unlikely(sa_idx >= IXGBE_IPSEC_MAX_SA_COUNT)) {
netdev_err(tx_ring->netdev, "%s: bad sa_idx=%d handle=%lu\n",
   __func__, sa_idx, xs->xso.offload_handle);
return 0;
-- 
2.17.1
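
For illustration only (not part of the patch): a minimal sketch of the
bounds check, assuming a hypothetical table of DEMO_MAX_SA_COUNT entries.
None of the names below are the driver's.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SA_COUNT 4	/* hypothetical table size */

struct demo_sa {
	int used;
};

static struct demo_sa demo_tx_tbl[DEMO_MAX_SA_COUNT];

/* Return the entry for sa_idx, or NULL when the index is out of range.
 * Valid indices are 0 .. DEMO_MAX_SA_COUNT - 1, so the rejection test has
 * to be ">=". With ">" the value DEMO_MAX_SA_COUNT itself would pass the
 * check and address one element past the end of the array. */
static struct demo_sa *demo_sa_lookup(unsigned int sa_idx)
{
	if (sa_idx >= DEMO_MAX_SA_COUNT)
		return NULL;
	return &demo_tx_tbl[sa_idx];
}
```

The last valid index is DEMO_MAX_SA_COUNT - 1; the first rejected one is
DEMO_MAX_SA_COUNT.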



Re: r8169 tx batching(?) causing performance problems

2018-10-03 Thread David Howells
David Miller  wrote:

> Probably you are seeing some interrupt mitigation.
> 
> It seems there is a difference in how the interrupt mitigation is
> programmed on for 8168 chips vs. others by default.  Most get
> all zeros in the IntrMitigate register, whilst for 8168 chips
> a value of 0x5151 is programmed.

I'm not sure what that means.  I can't seem to find a programmer's manual for
the chip.

> You can play with ethtool to mess with the coalescing settings
> to see if this is part of the problem.

These bits from "ethtool -c enp3s0"?

rx-usecs: 200
rx-frames: 4
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 200
tx-frames: 4
tx-usecs-irq: 0
tx-frames-irq: 0

> I bet this might explain the behavior you see after including
> even Heiner's TXCFG_AUTO_FIFO patch.

Thanks,
David


[net] ixgbe: check return value of napi_complete_done()

2018-10-03 Thread Jeff Kirsher
From: Song Liu 

The NIC driver should only enable interrupts when napi_complete_done()
returns true. This patch adds the check for ixgbe.

Cc: sta...@vger.kernel.org # 4.10+
Suggested-by: Eric Dumazet 
Signed-off-by: Song Liu 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f27d73a7bf16..6cdd58d9d461 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3196,11 +3196,13 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
return budget;
 
/* all work done, exit the polling mode */
-   napi_complete_done(napi, work_done);
-   if (adapter->rx_itr_setting & 1)
-   ixgbe_set_itr(q_vector);
-   if (!test_bit(__IXGBE_DOWN, &adapter->state))
-   ixgbe_irq_enable_queues(adapter, BIT_ULL(q_vector->v_idx));
+   if (likely(napi_complete_done(napi, work_done))) {
+   if (adapter->rx_itr_setting & 1)
+   ixgbe_set_itr(q_vector);
+   if (!test_bit(__IXGBE_DOWN, &adapter->state))
+   ixgbe_irq_enable_queues(adapter,
+   BIT_ULL(q_vector->v_idx));
+   }
 
return min(work_done, budget - 1);
 }
-- 
2.17.1



Re: [net-next 08/15] ice: Notify VF of link status change

2018-10-03 Thread Or Gerlitz
On Wed, Oct 3, 2018 at 6:46 PM Jeff Kirsher  wrote:
> From: Anirudh Venkataramanan 
>
> When PF gets a link status change event, notify the VFs of the same.

so you always want to block east/west traffic when the uplink is down? why?

The correct approach is to have vf/vport e-switch representor on the
host, and if the host admin puts down the administrative link of the
rep -- the vf operational link (carrier) goes down.


Re: [PATCH iproute2 net-next v1 5/6] tc: Add support for configuring the taprio scheduler

2018-10-03 Thread Vinicius Costa Gomes
Hi David,

David Ahern  writes:

> On 9/28/18 7:10 PM, Vinicius Costa Gomes wrote:
>> This traffic scheduler allows traffic classes states (transmission
>> allowed/not allowed, in the simplest case) to be scheduled, according
>> to a pre-generated time sequence. This is the basis of the IEEE
>> 802.1Qbv specification.
>> 
>> Example configuration:
>> 
>> tc qdisc replace dev enp3s0 parent root handle 100 taprio \
>>   num_tc 3 \
>>map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
>>queues 1@0 1@1 2@2 \
>>base-time 1528743495910289987 \
>>sched-entry S 01 30 \
>>sched-entry S 02 30 \
>>sched-entry S 04 30 \
>>clockid CLOCK_TAI
>> 
>> The configuration format is similar to mqprio. The main difference is
>> the presence of a schedule, built by multiple "sched-entry"
>> definitions, each entry has the following format:
>> 
>>  sched-entry <command> <gatemask> <interval>
>> 
>> The only supported <command> is "S", which means "SetGateStates",
>
> ...
>
>> +static int str_to_entry_cmd(const char *str)
>> +{
>> +if (strcmp(str, "S") == 0)
>> +return TC_TAPRIO_CMD_SET_GATES;
>> +
>> +if (strcmp(str, "H") == 0)
>> +return TC_TAPRIO_CMD_SET_AND_HOLD;
>> +
>> +if (strcmp(str, "R") == 0)
>> +return TC_TAPRIO_CMD_SET_AND_RELEASE;
>
> If 'S' is the only supported command, what are 'H' and 'R'?

It is the only command that works (for now): taprio is a software-only
implementation at the moment.

Just to give some background, 'H' (Set-And-Hold-MAC) and 'R'
(Set-And-Release-MAC) will be used when Frame Preemption is added. 'H'
means that any preemptible frame should have finished being transmitted
before this "entry" starts, and frame preemption is disabled; 'R' means
preemptible frames may be transmitted during this entry's interval, and
frame preemption is re-enabled.

Will remove these references for now.

>
>> +
>> +return -1;
>> +}
>> +
>
>> +
>> +static const char *command_to_str(__u8 cmd)
>> +{
>> +switch (cmd) {
>> +case TC_TAPRIO_CMD_SET_GATES:
>> +return "S";
>> +case TC_TAPRIO_CMD_SET_AND_HOLD:
>> +return "H";
>> +case TC_TAPRIO_CMD_SET_AND_RELEASE:
>> +return "R";
>> +default:
>> +return "Invalid";
>> +}
>> +}
>
> And can you keep str-to-command and command-to-str helpers close
> together in the code.

Sure. Will fix it for v2.


Cheers,
--
Vinicius
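
For illustration only (not from the patch): one way to keep the two helpers
discussed above next to each other is to drive both from a single table.
The enum values and all demo_* names are hypothetical; the real command
codes come from the kernel UAPI header and may differ. All three letters
are kept here only to show the pairing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical local command codes (assumption, not the UAPI values). */
enum demo_taprio_cmd {
	DEMO_CMD_SET_GATES = 0,
	DEMO_CMD_SET_AND_HOLD = 1,
	DEMO_CMD_SET_AND_RELEASE = 2,
};

/* Single source of truth for both directions of the mapping. */
static const struct {
	const char *name;
	int cmd;
} demo_cmd_table[] = {
	{ "S", DEMO_CMD_SET_GATES },
	{ "H", DEMO_CMD_SET_AND_HOLD },
	{ "R", DEMO_CMD_SET_AND_RELEASE },
};

#define DEMO_CMD_COUNT (sizeof(demo_cmd_table) / sizeof(demo_cmd_table[0]))

/* Map a command letter to its code; -1 if the letter is unknown. */
static int demo_str_to_cmd(const char *str)
{
	size_t i;

	assert(str != NULL);
	for (i = 0; i < DEMO_CMD_COUNT; i++)
		if (strcmp(str, demo_cmd_table[i].name) == 0)
			return demo_cmd_table[i].cmd;
	return -1;
}

/* Map a command code back to its letter; "Invalid" if unknown. */
static const char *demo_cmd_to_str(int cmd)
{
	size_t i;

	for (i = 0; i < DEMO_CMD_COUNT; i++)
		if (demo_cmd_table[i].cmd == cmd)
			return demo_cmd_table[i].name;
	return "Invalid";
}
```

Dropping a command then means deleting one table row, and the two helpers
cannot drift apart.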


Re: [RFC PATCH bpf-next v3 4/7] bpf: add bpf queue and stack maps

2018-10-03 Thread Mauricio Vasquez




On 10/01/2018 07:26 PM, Alexei Starovoitov wrote:

On Mon, Oct 01, 2018 at 08:11:43AM -0500, Mauricio Vasquez wrote:

+BPF_CALL_3(bpf_map_pop_elem, struct bpf_map *, map, void *,
value, u32, size)
+{
+    void *ptr;
+
+    if (map->value_size != size)
+    return -EINVAL;
+
+    ptr = map->ops->map_lookup_and_delete_elem(map, NULL);
+    if (!ptr)
+    return -ENOENT;
+
+    switch (size) {
+    case 1:
+    *(u8 *) value = *(u8 *) ptr;
+    break;
+    case 2:
+    *(u16 *) value = *(u16 *) ptr;
+    break;
+    case 4:
+    *(u32 *) value = *(u32 *) ptr;
+    break;
+    case 8:
+    *(u64 *) value = *(u64 *) ptr;
+    break;

this is inefficient. can we pass value ptr into ops and let it
populate it?

I don't think so: doing that implies that look_and_delete will be a
per-value op, while other ops in maps are per-reference.
For instance, how would we change it in the case of the peek helper, which
uses the lookup operation? We cannot change the signature of the lookup
operation.

This is something that worries me a little bit: we are creating new
per-value helpers based on already existing per-reference operations,
which is probably not the cleanest way.  Here we are at the beginning of
the discussion once again: how should we map helpers and syscalls to
ops?

What about creating pop/peek/push ops, mapping helpers one to one and
adding some logic into syscall.c to call the correct operation in case
the map is stack/queue?
Syscall mapping would be:
bpf_map_lookup_elem() -> peek
bpf_map_lookup_and_delete_elem() -> pop
bpf_map_update_elem() -> push

Does it make sense?

Hello Alexei,

Do you have any feedback on this specific part?

Indeed. It seems push/pop ops will be cleaner.
I still think that peek() is useless due to races.
So BPF_MAP_UPDATE_ELEM syscall cmd will map to 'push' ops
and new BPF_MAP_LOOKUP_AND_DELETE_ELEM will map to 'pop' ops.
right?



That's right.

While updating the push API, a question came to me: do we have any specific
reason to support only 1, 2, 4, or 8 bytes? I think we could make it general
enough to support any number of bytes; if the user is worried about the
cost of memcpys, he could use a map of 8-byte pointers, as you mentioned
some time ago.
From an API point of view, the pop/peek helpers already expect a void
*value (int bpf_map_[pop, peek]_elem(map, void *value)); the only change
would be to also use a pointer in push instead of a u64.
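
For illustration only (not from the patch): a userspace sketch of the
size-generic idea above, where push and pop both take a pointer and copy
value_size bytes with memcpy instead of switching on 1, 2, 4, or 8. The
demo_stack type, its fixed backing store and the return codes are
assumptions made for this example, not the BPF map implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_STACK_BYTES 256	/* hypothetical fixed backing store */

struct demo_stack {
	size_t value_size;	/* size of each element in bytes */
	size_t count;		/* number of stored elements */
	unsigned char data[DEMO_STACK_BYTES];
};

static void demo_stack_init(struct demo_stack *s, size_t value_size)
{
	assert(value_size > 0 && value_size <= DEMO_STACK_BYTES);
	s->value_size = value_size;
	s->count = 0;
}

/* Push by pointer: copies value_size bytes. Returns 0, or -1 if full. */
static int demo_stack_push(struct demo_stack *s, const void *value)
{
	if ((s->count + 1) * s->value_size > DEMO_STACK_BYTES)
		return -1;
	memcpy(s->data + s->count * s->value_size, value, s->value_size);
	s->count++;
	return 0;
}

/* Pop by pointer: copies value_size bytes out. Returns 0, or -1 if empty. */
static int demo_stack_pop(struct demo_stack *s, void *value)
{
	if (s->count == 0)
		return -1;
	s->count--;
	memcpy(value, s->data + s->count * s->value_size, s->value_size);
	return 0;
}
```

Any element size works the same way, including sizes such as 3 bytes that
the switch-based version would reject.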





Re: [RFC PATCH v2 bpf-next 0/2] verifier liveness simplification

2018-10-03 Thread Jiong Wang

On 03/10/2018 16:59, Alexei Starovoitov wrote:

On Wed, Oct 03, 2018 at 04:36:31PM +0100, Jiong Wang wrote:



Now this hasn't happened. I am still debugging the root cause, but kind of
feel "64-bit" attribute propagation is the issue, it seems to me it can't be
nicely integrated into the existing register read/write propagation
infrastructure.


may be share your patches that modify the liveness propagation?


OK, I will share it after some clean up.


For example, for a slightly more complex sequence which is composed of three
states:

State A
  ...
  10: r6 = *(u32 *)(r10 - 4)
  11: r7 = *(u32 *)(r10 - 8)
  12: *(u64 *)(r10 - 16) = r6
  13: *(u64 *)(r10 - 24) = r7

State B
  14: r6 += 1
  15: r7 += r6
  16: *(u32 *)(r10 - 28) = r7

State C
  ...
  17: r3 += r7
  18: r4 = 1
  19: *(u64 *)(r10 - 32) = r3
  20: *(u64 *)(r10 - 40) = r4

State A is parent of state B which is parent of state C.

Inside state C, at insn 20, r4 is a 64-bit read/use, so its define at 18 is
marked as "64-bit". There is no register source at 18, so "64-bit" attribute
propagation is stopped.

Then at insn 19, r3 is a 64-bit read/use, so its define at 17 is marked as
"64-bit" read/use. Insn 17 has two register sources, r3 and r7, they become
"64-bit" now, and their definition should be marked as "64-bit".

Now if the definition of r3 or r7 comes from parent state, then the parent


... the definition of r3 _and_ r7 ...
both need to propagate up with your algo, right?


Yes, all sources of insn 17, both r3 and r7.


state should receive a "REG_LIVE_READ64", this is necessary if later another
path reaches state C and triggers prune path, for which case that path should
know there is "64-bit" use inside state C on some registers, and should use
this information to mark "64-bit" insn.

If the definition of r3 or r7 is still inside state C, we need to keep walking
up the instruction sequences, and propagate "64-bit" attribute upward until it
goes beyond the state C.

The above propagation logic is quite different from existing register
read/write propagation.
For the latter, a write just screens off all following reads, and a
read would propagate directly to its parent if there is no previous write;
no instruction analysis is required.


correct.
with such algo REG_LIVE_WRITTEN shouldn't be screening the propagation.

I think the patches will discuss the algo.
Also I think the initial state of 'everything is 32-bit safe'
and make marks to enforce 64-bit-ness is a dangerous algorithmic choice.


Indeed, and I am actually thinking the same thing ...


Why not start at a safer state where everything is 64-bit
and work backward to find out which ones can be 32-bit?
That will be a safer algo in case there are issues with it like
the ones you described above.


... but I failed to find an algo that works from an initial state of
"everything is 64-bit". I haven't figured out a way to check that all uses
of one definition are 32-bit on all possible code paths, which looks to me
like a must for one insn to be marked as 32-bit safe.
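
The in-state part of the propagation described above can be sketched as a
backward walk over straight-line code. This is only an illustration, not the
verifier's implementation: struct insn, propagate64() and the need64 bitmask
are made-up names, the use of r10 as an address base is ignored, and control
flow and state pruning are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_REG (-1)

/* Hypothetical, simplified view of one insn for this sketch. */
struct insn {
	int dst;      /* register defined by the insn, or NO_REG */
	int src[2];   /* registers read by the insn, or NO_REG */
	bool reads64; /* insn itself consumes all 64 bits of its sources */
	bool mark64;  /* output: the definition must stay 64-bit */
};

/* Add every source register of an insn to the "needed as 64-bit" mask. */
static uint32_t add_srcs(uint32_t need64, const struct insn *in)
{
	for (int k = 0; k < 2; k++) {
		if (in->src[k] == NO_REG)
			continue;
		assert(in->src[k] >= 0 && in->src[k] < 32);
		need64 |= 1u << in->src[k];
	}
	return need64;
}

/*
 * Walk the insns of one state from last to first. Returns the mask of
 * registers that are still needed as 64-bit at the top of the state,
 * i.e. whose definitions live in a parent state.
 */
uint32_t propagate64(struct insn *insns, size_t n)
{
	uint32_t need64 = 0;

	assert(insns != NULL || n == 0);
	for (size_t i = n; i-- > 0; ) {
		struct insn *in = &insns[i];

		if (in->dst != NO_REG && (need64 & (1u << in->dst))) {
			/* A later 64-bit use reaches this definition. */
			in->mark64 = true;
			need64 &= ~(1u << in->dst);
			/* Its sources now have a 64-bit use as well. */
			need64 = add_srcs(need64, in);
		}
		if (in->reads64)
			need64 = add_srcs(need64, in);
	}
	return need64;
}
```

For state C above (insns 17 to 20) this marks insns 17 and 18 and returns a
mask with r3 and r7 set, which are the registers for which the parent state
would need something like REG_LIVE_READ64.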



Re: [net-next 00/15][pull request] 100GbE Intel Wired LAN Driver Updates 2018-10-03

2018-10-03 Thread David Miller
From: Jeff Kirsher 
Date: Wed,  3 Oct 2018 08:48:10 -0700

> This series contains updates to ice and virtchnl.

Pulled, thanks Jeff.


Re: [PATCH net-next] cxgb4: remove the unneeded locks

2018-10-03 Thread David Miller
From: Ganesh Goudar 
Date: Wed,  3 Oct 2018 18:26:32 +0530

> cxgb_set_tx_maxrate will be called holding rtnl lock,
> hence remove all unneeded locks.
> 
> Signed-off-by: Ganesh Goudar 

Applied.


Re: [PATCH iproute2 net-next v1 5/6] tc: Add support for configuring the taprio scheduler

2018-10-03 Thread David Ahern
On 9/28/18 7:10 PM, Vinicius Costa Gomes wrote:
> This traffic scheduler allows traffic classes states (transmission
> allowed/not allowed, in the simplest case) to be scheduled, according
> to a pre-generated time sequence. This is the basis of the IEEE
> 802.1Qbv specification.
> 
> Example configuration:
> 
> tc qdisc replace dev enp3s0 parent root handle 100 taprio \
>   num_tc 3 \
> map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
> queues 1@0 1@1 2@2 \
> base-time 1528743495910289987 \
> sched-entry S 01 30 \
> sched-entry S 02 30 \
> sched-entry S 04 30 \
> clockid CLOCK_TAI
> 
> The configuration format is similar to mqprio. The main difference is
> the presence of a schedule, built by multiple "sched-entry"
> definitions, each entry has the following format:
> 
>  sched-entry <command> <gatemask> <interval>
> 
> The only supported <command> is "S", which means "SetGateStates",

...

> +static int str_to_entry_cmd(const char *str)
> +{
> + if (strcmp(str, "S") == 0)
> + return TC_TAPRIO_CMD_SET_GATES;
> +
> + if (strcmp(str, "H") == 0)
> + return TC_TAPRIO_CMD_SET_AND_HOLD;
> +
> + if (strcmp(str, "R") == 0)
> + return TC_TAPRIO_CMD_SET_AND_RELEASE;

If 'S' is the only supported command, what are 'H' and 'R'?

> +
> + return -1;
> +}
> +

> +
> +static const char *command_to_str(__u8 cmd)
> +{
> + switch (cmd) {
> + case TC_TAPRIO_CMD_SET_GATES:
> + return "S";
> + case TC_TAPRIO_CMD_SET_AND_HOLD:
> + return "H";
> + case TC_TAPRIO_CMD_SET_AND_RELEASE:
> + return "R";
> + default:
> + return "Invalid";
> + }
> +}

And can you keep the str-to-command and command-to-str helpers close
together in the code?
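
One way to keep both directions together, and in sync, is a single table
that both helpers walk. A minimal sketch, not a drop-in for the patch: the
enum values are stand-ins for the TC_TAPRIO_CMD_* constants, and entry_cmds
and entry_cmd_to_str() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the TC_TAPRIO_CMD_* constants used in the patch. */
enum entry_cmd {
	CMD_SET_GATES,
	CMD_SET_AND_HOLD,
	CMD_SET_AND_RELEASE,
};

/* Single source of truth for both conversion directions. */
static const struct {
	const char *name;
	int cmd;
} entry_cmds[] = {
	{ "S", CMD_SET_GATES },
	{ "H", CMD_SET_AND_HOLD },
	{ "R", CMD_SET_AND_RELEASE },
};

#define N_ENTRY_CMDS (sizeof(entry_cmds) / sizeof(entry_cmds[0]))

/* Returns the command for a name, or -1 if the name is unknown. */
int str_to_entry_cmd(const char *str)
{
	assert(str != NULL);
	for (size_t i = 0; i < N_ENTRY_CMDS; i++)
		if (strcmp(str, entry_cmds[i].name) == 0)
			return entry_cmds[i].cmd;
	return -1;
}

/* Returns the name for a command, or "Invalid" if it is unknown. */
const char *entry_cmd_to_str(int cmd)
{
	for (size_t i = 0; i < N_ENTRY_CMDS; i++)
		if (entry_cmds[i].cmd == cmd)
			return entry_cmds[i].name;
	return "Invalid";
}
```

With a table like this, adding or dropping a command touches one line, and
the two helpers cannot drift apart.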



Re: r8169 tx batching(?) causing performance problems

2018-10-03 Thread David Miller


Probably you are seeing some interrupt mitigation.

It seems there is a difference in how the interrupt mitigation is
programmed for 8168 chips vs. others by default.  Most get
all zeros in the IntrMitigate register, whilst for 8168 chips
a value of 0x5151 is programmed.

You can play with ethtool to mess with the coalescing settings
to see if this is part of the problem.

I bet this might explain the behavior you see after including
even Heiner's TXCFG_AUTO_FIFO patch.


Re: [PATCH iproute2-next] tc: flower: expose hardware offload count

2018-10-03 Thread Vlad Buslov


On Wed 03 Oct 2018 at 16:08, Davide Caratti  wrote:
> On Wed, 2018-10-03 at 18:29 +0300, Vlad Buslov wrote:
>> Recently flower classifier was updated to expose count of devices that
>> filter is offloaded to. Add support to print this counter as 'in_hw_count'.
>> 
>> Signed-off-by: Vlad Buslov 
>> Acked-by: Jiri Pirko 
>> ---
>>  tc/f_flower.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>> 
>> diff --git a/tc/f_flower.c b/tc/f_flower.c
>> index 59e5f572c542..cbacc664d397 100644
>
> hello Vlad!
>
>> --- a/tc/f_flower.c
>> +++ b/tc/f_flower.c
>> @@ -1585,8 +1585,16 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
>>  if (flags & TCA_CLS_FLAGS_SKIP_SW)
>>  print_bool(PRINT_ANY, "skip_sw", "\n  skip_sw", true);
>>  
>> -if (flags & TCA_CLS_FLAGS_IN_HW)
>> +if (flags & TCA_CLS_FLAGS_IN_HW) {
>>  print_bool(PRINT_ANY, "in_hw", "\n  in_hw", true);
>> +
>> +if (tb[TCA_FLOWER_IN_HW_COUNT]) {
>> +__u32 count = rta_getattr_u32(tb[TCA_FLOWER_IN_HW_COUNT]);
>> +
>> +print_uint(PRINT_ANY, "in_hw_count",
>> +   " in_hw_count %d", count);
> ^^ maybe using %u in the format is better?
>
> thanks!

Hello Davide!

Sure. I'll send V2 with "%u".

Thanks,
Vlad
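
For the record, the difference only shows for counts above INT_MAX, where a
signed conversion does not match the unsigned value. A tiny standalone
sketch using plain snprintf() rather than the iproute2 print helpers;
format_count() is a made-up name for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format an unsigned 32-bit counter. "%u" matches the unsigned type,
 * so large values are printed as-is instead of as negative numbers.
 * Returns the number of characters written, as snprintf() does.
 */
int format_count(char *buf, size_t len, uint32_t count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, " in_hw_count %u", (unsigned int)count);
}
```

Small counts look the same either way, which is probably why this kind of
mismatch tends to go unnoticed.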


Re: [PATCH v2 iproute2-next 1/3] tc: support conversions to or from 64 bit nanosecond-based time

2018-10-03 Thread Dave Taht
On Mon, Aug 27, 2018 at 9:39 AM Dave Taht  wrote:
>
> On Mon, Aug 27, 2018 at 9:11 AM Stephen Hemminger
>  wrote:
> >
> > On Sun, 26 Aug 2018 19:42:28 -0700
> > Yousuk Seung  wrote:
> >
> > > +int get_time(unsigned int *time, const char *str)
> > > +{
> > > + double t;
> > > + char *p;
> > > +
> > > + t = strtod(str, &p);
> > > + if (p == str)
> > > + return -1;
> > > +
> > > + if (*p) {
> > > + if (strcasecmp(p, "s") == 0 || strcasecmp(p, "sec") == 0 ||
> > > + strcasecmp(p, "secs") == 0)
> > > + t *= TIME_UNITS_PER_SEC;
> > > > + else if (strcasecmp(p, "ms") == 0 || strcasecmp(p, "msec") == 0 ||
> > > > +  strcasecmp(p, "msecs") == 0)
> > > > + t *= TIME_UNITS_PER_SEC/1000;
> > > > + else if (strcasecmp(p, "us") == 0 || strcasecmp(p, "usec") == 0 ||
> > > > +  strcasecmp(p, "usecs") == 0)
> > > > + t *= TIME_UNITS_PER_SEC/1000000;
> > > + else
> > > + return -1;
> >
> > Do we need to really support UPPER case.
>
> But that's  ALWAYS been the case in the 32 bit version of code above.
> Imagine how many former VMS and MVS hackers you'd upset if they had to
> turn caps-lock off!

I was trying to be funny, of course. If you want us to rework the patch
to also downgrade to being lowercase for both, ok... I'd rather like to
finish getting this upstream, there's a change to netem enabling nsec
time long stuck behind it.

>
> > Isn't existing matches semantics good enough?
>
> But that's the existing case for the 32 bit api, now replicated in the
> 64 bit api. ? I think the case-insensitive ship has sailed here. Can't
> break userspace.
>
> Well.. adding UTF-8 would be cool. We could start using the actual
> greek  symbols for delta (δ) and beta (β) in particular. It would
> replace a lot of typing, with a whole bunch more shift keys on a
> single letter, fit better into
> 80 column lines, and so on, and tc inputs and outputs are already
> pretty greek to many.
> --
>
> Dave Täht
> CEO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-669-226-2619



-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
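
To make the discussion concrete, here is a rough sketch of a 64-bit,
nanosecond-based parser with the same case-insensitive suffix matching. It
is not the code from the patch: parse_time_ns() and the units table are
made-up names, the "ns" suffixes are an assumption, and a bare number
without a unit is simply rejected here to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical suffix table; every entry scales to nanoseconds. */
static const struct {
	const char *suffix;
	double ns_per_unit;
} units[] = {
	{ "s", 1e9 },  { "sec", 1e9 },   { "secs", 1e9 },
	{ "ms", 1e6 }, { "msec", 1e6 },  { "msecs", 1e6 },
	{ "us", 1e3 }, { "usec", 1e3 },  { "usecs", 1e3 },
	{ "ns", 1.0 }, { "nsec", 1.0 },  { "nsecs", 1.0 },
};

/* Parse "<number><unit>" into nanoseconds. Returns 0 on success, -1 on error. */
int parse_time_ns(int64_t *ns, const char *str)
{
	char *end;
	double t;

	assert(ns != NULL && str != NULL);
	t = strtod(str, &end);
	if (end == str || t < 0)
		return -1;

	for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		double v;

		/* Case-insensitive, as in the existing 32-bit helper. */
		if (strcasecmp(end, units[i].suffix) != 0)
			continue;
		v = t * units[i].ns_per_unit;
		if (v >= 9223372036854775807.0)
			return -1; /* would not fit in int64_t */
		*ns = (int64_t)v;
		return 0;
	}
	return -1; /* unknown or missing unit */
}
```

Whether to accept upper case is then a one-line choice between strcasecmp()
and strcmp(), which is really all the question above is about.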


Re: [PATCH iproute2-next] tc: flower: expose hardware offload count

2018-10-03 Thread Davide Caratti


On Wed, 2018-10-03 at 18:29 +0300, Vlad Buslov wrote:
> Recently flower classifier was updated to expose count of devices that
> filter is offloaded to. Add support to print this counter as 'in_hw_count'.
> 
> Signed-off-by: Vlad Buslov 
> Acked-by: Jiri Pirko 
> ---
>  tc/f_flower.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/tc/f_flower.c b/tc/f_flower.c
> index 59e5f572c542..cbacc664d397 100644

hello Vlad!

> --- a/tc/f_flower.c
> +++ b/tc/f_flower.c
> @@ -1585,8 +1585,16 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
>   if (flags & TCA_CLS_FLAGS_SKIP_SW)
>   print_bool(PRINT_ANY, "skip_sw", "\n  skip_sw", true);
>  
> - if (flags & TCA_CLS_FLAGS_IN_HW)
> + if (flags & TCA_CLS_FLAGS_IN_HW) {
>   print_bool(PRINT_ANY, "in_hw", "\n  in_hw", true);
> +
> + if (tb[TCA_FLOWER_IN_HW_COUNT]) {
> + __u32 count = rta_getattr_u32(tb[TCA_FLOWER_IN_HW_COUNT]);
> +
> + print_uint(PRINT_ANY, "in_hw_count",
> +" in_hw_count %d", count);
^^ maybe using %u in the format is better?

thanks!
-- 
davide



Re: [PATCH v2 net] mpls: allow routes on ip6gre devices

2018-10-03 Thread Simon Horman
On Fri, Sep 21, 2018 at 02:30:05PM -0700, Saif Hasan wrote:
> Summary:
> 
> This appears to be necessary and sufficient change to enable `MPLS` on
> `ip6gre` tunnels (RFC4023).
> 
> This diff allows IP6GRE devices to be recognized by MPLS kernel module
> and hence user can configure interface to accept packets with mpls
> headers as well setup mpls routes on them.
> 
> Test Plan:
> 
> Test plan consists of multiple containers connected via GRE-V6 tunnel.
> Then carrying out testing steps as below.
> 
> - Carry out necessary sysctl settings on all containers
> 
> ```
> sysctl -w net.mpls.platform_labels=65536
> sysctl -w net.mpls.ip_ttl_propagate=1
> sysctl -w net.mpls.conf.lo.input=1
> ```
> 
> - Establish IP6GRE tunnels
> 
> ```
> ip -6 tunnel add name if_1_2_1 mode ip6gre \
>   local 2401:db00:21:6048:feed:0::1 \
>   remote 2401:db00:21:6048:feed:0::2 key 1
> ip link set dev if_1_2_1 up
> sysctl -w net.mpls.conf.if_1_2_1.input=1
> ip -4 addr add 169.254.0.2/31 dev if_1_2_1 scope link
> 
> ip -6 tunnel add name if_1_3_1 mode ip6gre \
>   local 2401:db00:21:6048:feed:0::1 \
>   remote 2401:db00:21:6048:feed:0::3 key 1
> ip link set dev if_1_3_1 up
> sysctl -w net.mpls.conf.if_1_3_1.input=1
> ip -4 addr add 169.254.0.4/31 dev if_1_3_1 scope link
> ```
> 
> - Install MPLS encap rules on node-1 towards node-2
> 
> ```
> ip route add 192.168.0.11/32 nexthop encap mpls 32/64 \
>   via inet 169.254.0.3 dev if_1_2_1
> ```
> 
> - Install MPLS forwarding rules on node-2 and node-3
> ```
> // node2
> ip -f mpls route add 32 via inet 169.254.0.7 dev if_2_4_1
> 
> // node3
> ip -f mpls route add 64 via inet 169.254.0.12 dev if_4_3_1
> ```
> 
> - Ping 192.168.0.11 (node4) from 192.168.0.1 (node1) (where routing
>   towards 192.168.0.1 is via IP route directly towards node1 from node4)
> ```
> ping 192.168.0.11
> ```
> 
> - tcpdump on interface to capture ping packets wrapped within MPLS
>   header which inturn wrapped within IP6GRE header
> 
> ```
> 16:43:41.121073 IP6
>   2401:db00:21:6048:feed::1 > 2401:db00:21:6048:feed::2:
>   DSTOPT GREv0, key=0x1, length 100:
>   MPLS (label 32, exp 0, ttl 255) (label 64, exp 0, [S], ttl 255)
>   IP 192.168.0.1 > 192.168.0.11:
>   ICMP echo request, id 1208, seq 45, length 64
> 
> 0x0000:  6000 2cdb 006c 3c3f 2401 db00 0021 6048  `.,..l<?$....!`H
> 0x0010:  feed 0000 0000 0001 2401 db00 0021 6048  ........$....!`H
> 0x0020:  feed 0000 0000 0002 2f00 0401 0401 0100  ......../.......
> 0x0030:  2000 8847 0000 0001 0002 00ff 0004 01ff  ...G............
> 0x0040:  4500 0054 3280 4000 ff01 c7cb c0a8 0001  E..T2.@.........
> 0x0050:  c0a8 000b 0800 a8d7 04b8 002d 2d3c a05b  ...........--<.[
> 0x0060:  0000 0000 bcd8 0100 0000 0000 1011 1213  ................
> 0x0070:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
> 0x0080:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
> 0x0090:  3435 3637                                4567
> 
> Signed-off-by: Saif Hasan 

This appears to be consistent with work I did to enable
MPLS over SIT, IPIP and (IPv4) IPGRE.

I do not recall why I did not enable IP6GRE at the time,
it may well have been an oversight.

Reviewed-by: Simon Horman 

> ---
>  net/mpls/af_mpls.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 7a4de6d618b1..8fbe6cdbe255 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -1533,10 +1533,14 @@ static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
>   unsigned int flags;
> 
>   if (event == NETDEV_REGISTER) {
> - /* For now just support Ethernet, IPGRE, SIT and IPIP devices */
> +
> + /* For now just support Ethernet, IPGRE, IP6GRE, SIT and
> +  * IPIP devices
> +  */
>   if (dev->type == ARPHRD_ETHER ||
>   dev->type == ARPHRD_LOOPBACK ||
>   dev->type == ARPHRD_IPGRE ||
> + dev->type == ARPHRD_IP6GRE ||
>   dev->type == ARPHRD_SIT ||
>   dev->type == ARPHRD_TUNNEL) {
>   mdev = mpls_add_dev(dev);
> --
> 2.13.5
> 


[PATCH v2] net: phy: phylink: fix SFP interface autodetection

2018-10-03 Thread Baruch Siach
When connecting SFP PHY to phylink use the detected interface.
Otherwise, the link fails to come up when the configured 'phy-mode'
differs from the SFP detected mode.

Move most of phylink_connect_phy() into __phylink_connect_phy(), and
leave phylink_connect_phy() as a wrapper. phylink_sfp_connect_phy() can
now pass the SFP detected PHY interface to __phylink_connect_phy().

This fixes 1GB SFP module link up on eth3 of the Macchiatobin board that
is configured in the DT to "2500base-x" phy-mode.

Fixes: 9525ae83959b6 ("phylink: add phylink infrastructure")
Suggested-by: Russell King 
Signed-off-by: Baruch Siach 
---
v2: Leave the phylink_connect_phy() functionality unchanged. Only
phylink_sfp_connect_phy() calls __phylink_connect_phy() with the
detected interface (Russell King)
---
 drivers/net/phy/phylink.c | 48 +++
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 3ba5cf2a8a5f..7abca86c3aa9 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -717,6 +717,30 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy)
return 0;
 }
 
+static int __phylink_connect_phy(struct phylink *pl, struct phy_device *phy,
+   phy_interface_t interface)
+{
+   int ret;
+
+   if (WARN_ON(pl->link_an_mode == MLO_AN_FIXED ||
+   (pl->link_an_mode == MLO_AN_INBAND &&
+phy_interface_mode_is_8023z(interface))))
+   return -EINVAL;
+
+   if (pl->phydev)
+   return -EBUSY;
+
+   ret = phy_attach_direct(pl->netdev, phy, 0, interface);
+   if (ret)
+   return ret;
+
+   ret = phylink_bringup_phy(pl, phy);
+   if (ret)
+   phy_detach(phy);
+
+   return ret;
+}
+
 /**
  * phylink_connect_phy() - connect a PHY to the phylink instance
  * @pl: a pointer to a &struct phylink returned from phylink_create()
@@ -734,31 +758,13 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy)
  */
 int phylink_connect_phy(struct phylink *pl, struct phy_device *phy)
 {
-   int ret;
-
-   if (WARN_ON(pl->link_an_mode == MLO_AN_FIXED ||
-   (pl->link_an_mode == MLO_AN_INBAND &&
-phy_interface_mode_is_8023z(pl->link_interface))))
-   return -EINVAL;
-
-   if (pl->phydev)
-   return -EBUSY;
-
/* Use PHY device/driver interface */
if (pl->link_interface == PHY_INTERFACE_MODE_NA) {
pl->link_interface = phy->interface;
pl->link_config.interface = pl->link_interface;
}
 
-   ret = phy_attach_direct(pl->netdev, phy, 0, pl->link_interface);
-   if (ret)
-   return ret;
-
-   ret = phylink_bringup_phy(pl, phy);
-   if (ret)
-   phy_detach(phy);
-
-   return ret;
+   return __phylink_connect_phy(pl, phy, pl->link_interface);
 }
 EXPORT_SYMBOL_GPL(phylink_connect_phy);
 
@@ -1672,7 +1678,9 @@ static void phylink_sfp_link_up(void *upstream)
 
 static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
 {
-   return phylink_connect_phy(upstream, phy);
+   struct phylink *pl = upstream;
+
+   return __phylink_connect_phy(upstream, phy, pl->link_config.interface);
 }
 
 static void phylink_sfp_disconnect_phy(void *upstream)
-- 
2.19.0



Re: [RFC PATCH v2 bpf-next 0/2] verifier liveness simplification

2018-10-03 Thread Alexei Starovoitov
On Wed, Oct 03, 2018 at 04:36:31PM +0100, Jiong Wang wrote:
> On 28/09/2018 14:36, Edward Cree wrote:
> > On 26/09/18 23:16, Jiong Wang wrote:
> >> On 22/08/2018 20:00, Edward Cree wrote:
> >>> In the future this idea may be extended to form use-def chains.
> >>
> >>   1. instruction level use->def chain
> >>
> >>  - new use->def chains for each instruction. one eBPF insn could have
> >>    two uses at maximum.
> > I was thinking of something a lot weaker/simpler, just making
> > ld rX, rY
> >  copy rY.parent into rX.parent and not read-mark rY (whereas actual
> >  arithmetic, pointer deref etc. would still create read marks).
> 
> Thanks for the feedback Edward.
> 
> > But what you've described sounds interesting; perhaps it would also
> >  help later with loop-variable handling?
> 
> Haven't considered how to use this for loop-variable handling, guess you
> mean applying what I have described to your previous loop detection RFC?
> I will look into your RFC later.
> 
> At the moment the design of the use->def chain is mainly to optimize 32-bit
> code-gen. I was about to be satisfied with a local implementation and to
> share it on the ML for further discussion. However, when manually checking
> the optimization result on a testcase of medium size (~1000 eBPF insns) and
> proper complexity (making sure path prunes etc. are triggered inside the
> verifier), I found the code-gen doesn't meet my expectation.
> 
> For example, for the following sequence, insn at 25 should operate on
> full 64-bit but I found it is marked as 32-bit safe.
> 
>   25:    r7 = 1
>   26:    if r4 > r8 goto +1200 
>   27:    r1 = *(u8 *)(r1 + 0)
>   28:    r1 &= 15
>   29:    r7 = 1
>   ...
> 
> L:
>   1227:  r0 = r7
>   1228:  exit
> 
> As described in the previous email, the algorithm assumes all insns are
> 32-bit safe first, then starts to mark insns back to "64-bit" if there is
> any 64-bit use found for an insn.
> 
> Insn 25 is defining r7, which is used at 1227 where its value is propagated
> to r0, and then r0 is implicitly used at insn 1228 as it is an exit from the
> main function to external.
> 
> For the above example, as we don't know the external use of r0 at 1228 (exit
> from main to external), r0 is treated as a 64-bit implicit use. The define
> is at 1227, so insn 1227 is marked as "64-bit". The "64-bit" attribute
> should propagate to the source register operand through register move and
> arithmetic, so r7 at insn 1227 is a "64-bit" use and should make its
> definition instruction, insn 25, marked as "64-bit". This is my thinking of
> how insn 25 should be marked.

all makes sense to me.

> Now this hasn't happened. I am still debugging the root cause, but kind of
> feel "64-bit" attribute propagation is the issue, it seems to me it can't
> be nicely integrated into the existing register read/write propagation
> infrastructure.

may be share your patches that modify the liveness propagation?

> For
> example, for a slightly more complex sequence which is composed of three
> states:
> 
> State A
>   ...
>   10: r6 = *(u32 *)(r10 - 4)
>   11: r7 = *(u32 *)(r10 - 8)
>   12: *(u64 *)(r10 - 16) = r6
>   13: *(u64 *)(r10 - 24) = r7
> 
> State B
>   14: r6 += 1
>   15: r7 += r6
>   16: *(u32 *)(r10 - 28) = r7
> 
> State C
>   ...
>   17: r3 += r7
>   18: r4 = 1
>   19: *(u64 *)(r10 - 32) = r3
>   20: *(u64 *)(r10 - 40) = r4
> 
> State A is parent of state B which is parent of state C.
> 
> Inside state C, at insn 20, r4 is a 64-bit read/use, so its define at 18 is
> marked as "64-bit". There is no register source at 18, so "64-bit" attribute
> propagation is stopped.
> 
> Then at insn 19, r3 is a 64-bit read/use, so its define at 17 is marked as
> "64-bit" read/use. Insn 17 has two register sources, r3 and r7, they become
> "64-bit" now, and their definition should be marked as "64-bit".
> 
> Now if the definition of r3 or r7 comes from parent state, then the parent

... the definition of r3 _and_ r7 ...
both need to propagate up with your algo, right?

> state should receive a "REG_LIVE_READ64", this is necessary if later
> another path reaches state C and triggers prune path, for which case that
> path should know there is "64-bit" use inside state C on some registers,
> and should use this information to mark "64-bit" insn.
> 
> If the definition of r3 or r7 is still inside state C, we need to keep
> walking up the instruction sequences, and propagate "64-bit" attribute
> upward until it goes beyond the state C.
> 
> The above propagation logic is quite different from the existing register
> read/write propagation.
> For the latter, a write just screens off all following reads, and a read
> propagates directly to its parent if there is no previous write; no
> instruction analysis is required.

correct.
with such algo REG_LIVE_WRITTEN shouldn't be screening the propagation.

I think the patches will discuss the algo.
Also I think the initial state of 'everything is 32-bit safe'
and make marks to enforce 64-bit-ne

[net-next 09/15] ice: Extend malicious operations detection logic

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

This patch extends the existing malicious driver operation detection
logic to cover malicious operations by the VF driver as well.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |  8 
 drivers/net/ethernet/intel/ice/ice_main.c | 46 +++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.h  |  8 +++-
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 5a4fa22d0a83..a6679a9bfd3a 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -219,6 +219,14 @@
 #define PF_MDET_TX_PQM_VALID_M BIT(0)
 #define PF_MDET_TX_TCLAN   0x000FC000
 #define PF_MDET_TX_TCLAN_VALID_M   BIT(0)
+#define VP_MDET_RX(_VF)            (0x00294400 + ((_VF) * 4))
+#define VP_MDET_RX_VALID_M         BIT(0)
+#define VP_MDET_TX_PQM(_VF)        (0x002D2000 + ((_VF) * 4))
+#define VP_MDET_TX_PQM_VALID_M     BIT(0)
+#define VP_MDET_TX_TCLAN(_VF)      (0x000FB800 + ((_VF) * 4))
+#define VP_MDET_TX_TCLAN_VALID_M   BIT(0)
+#define VP_MDET_TX_TDPU(_VF)       (0x00040000 + ((_VF) * 4))
+#define VP_MDET_TX_TDPU_VALID_M    BIT(0)
 #define GLNVM_FLA  0x000B6108
 #define GLNVM_FLA_LOCKED_M BIT(6)
 #define GLNVM_GENS 0x000B6100
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index d7cbc2e6e5c5..948c97defeba 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -951,6 +951,7 @@ static void ice_handle_mdd_event(struct ice_pf *pf)
struct ice_hw *hw = &pf->hw;
bool mdd_detected = false;
u32 reg;
+   int i;
 
if (!test_bit(__ICE_MDD_EVENT_PENDING, pf->state))
return;
@@ -1040,6 +1041,51 @@ static void ice_handle_mdd_event(struct ice_pf *pf)
}
}
 
+   /* see if one of the VFs needs to be reset */
+   for (i = 0; i < pf->num_alloc_vfs && mdd_detected; i++) {
+   struct ice_vf *vf = &pf->vf[i];
+
+   reg = rd32(hw, VP_MDET_TX_PQM(i));
+   if (reg & VP_MDET_TX_PQM_VALID_M) {
+   wr32(hw, VP_MDET_TX_PQM(i), 0xFFFF);
+   vf->num_mdd_events++;
+   dev_info(&pf->pdev->dev, "TX driver issue detected on VF %d\n",
+i);
+   }
+
+   reg = rd32(hw, VP_MDET_TX_TCLAN(i));
+   if (reg & VP_MDET_TX_TCLAN_VALID_M) {
+   wr32(hw, VP_MDET_TX_TCLAN(i), 0xFFFF);
+   vf->num_mdd_events++;
+   dev_info(&pf->pdev->dev, "TX driver issue detected on VF %d\n",
+i);
+   }
+
+   reg = rd32(hw, VP_MDET_TX_TDPU(i));
+   if (reg & VP_MDET_TX_TDPU_VALID_M) {
+   wr32(hw, VP_MDET_TX_TDPU(i), 0xFFFF);
+   vf->num_mdd_events++;
+   dev_info(&pf->pdev->dev, "TX driver issue detected on VF %d\n",
+i);
+   }
+
+   reg = rd32(hw, VP_MDET_RX(i));
+   if (reg & VP_MDET_RX_VALID_M) {
+   wr32(hw, VP_MDET_RX(i), 0xFFFF);
+   vf->num_mdd_events++;
+   dev_info(&pf->pdev->dev, "RX driver issue detected on VF %d\n",
+i);
+   }
+
+   if (vf->num_mdd_events > ICE_DFLT_NUM_MDD_EVENTS_ALLOWED) {
+   dev_info(&pf->pdev->dev,
+"Too many MDD events on VF %d, disabled\n", i);
+   dev_info(&pf->pdev->dev,
+"Use PF Control I/F to re-enable the VF\n");
+   set_bit(ICE_VF_STATE_DIS, vf->vf_states);
+   }
+   }
+
/* re-enable MDD interrupt cause */
clear_bit(__ICE_MDD_EVENT_PENDING, pf->state);
reg = rd32(hw, PFINT_OICR_ENA);
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index a493cb1bb89d..10131e0180f9 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -9,10 +9,13 @@
 #define ICE_VLAN_PRIORITY_S 12
 #define ICE_VLAN_M 0xFFF
 #define ICE_PRIORITY_M 0x7000
-#define ICE_MAX_VLAN_PER_VF 8 /* restriction for non-trusted VF */
 
-/* Restrict number of MACs a non-trusted VF can program */
+/* Restrict number 

[net-next 06/15] ice: Add handlers for VF netdevice operations

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

This patch implements handlers for the following NDO operations:

.ndo_set_vf_spoofchk
.ndo_set_vf_mac
.ndo_get_vf_config
.ndo_set_vf_trust
.ndo_set_vf_vlan
.ndo_set_vf_link_state

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h|  12 +
 drivers/net/ethernet/intel/ice/ice_main.c |   6 +
 drivers/net/ethernet/intel/ice/ice_sriov.c|  86 
 drivers/net/ethernet/intel/ice/ice_sriov.h|   9 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 438 ++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.h  |  79 +++-
 6 files changed, 629 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index f5269f780e1c..7d2a66739e3f 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -474,4 +474,16 @@ static inline struct ice_rx_ptype_decoded ice_decode_rx_desc_ptype(u16 ptype)
 {
return ice_ptype_lkup[ptype];
 }
+
+#define ICE_LINK_SPEED_UNKNOWN     0
+#define ICE_LINK_SPEED_10MBPS      10
+#define ICE_LINK_SPEED_100MBPS     100
+#define ICE_LINK_SPEED_1000MBPS    1000
+#define ICE_LINK_SPEED_2500MBPS    2500
+#define ICE_LINK_SPEED_5000MBPS    5000
+#define ICE_LINK_SPEED_10000MBPS   10000
+#define ICE_LINK_SPEED_20000MBPS   20000
+#define ICE_LINK_SPEED_25000MBPS   25000
+#define ICE_LINK_SPEED_40000MBPS   40000
+
 #endif /* _ICE_LAN_TX_RX_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f1a116c9b527..01112ae7fdc3 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3884,6 +3884,12 @@ static const struct net_device_ops ice_netdev_ops = {
.ndo_validate_addr = eth_validate_addr,
.ndo_change_mtu = ice_change_mtu,
.ndo_get_stats64 = ice_get_stats64,
+   .ndo_set_vf_spoofchk = ice_set_vf_spoofchk,
+   .ndo_set_vf_mac = ice_set_vf_mac,
+   .ndo_get_vf_config = ice_get_vf_cfg,
+   .ndo_set_vf_trust = ice_set_vf_trust,
+   .ndo_set_vf_vlan = ice_set_vf_port_vlan,
+   .ndo_set_vf_link_state = ice_set_vf_link_state,
.ndo_vlan_rx_add_vid = ice_vlan_rx_add_vid,
.ndo_vlan_rx_kill_vid = ice_vlan_rx_kill_vid,
.ndo_set_features = ice_set_features,
diff --git a/drivers/net/ethernet/intel/ice/ice_sriov.c b/drivers/net/ethernet/intel/ice/ice_sriov.c
index 191e832134b6..027eba4e13f8 100644
--- a/drivers/net/ethernet/intel/ice/ice_sriov.c
+++ b/drivers/net/ethernet/intel/ice/ice_sriov.c
@@ -39,3 +39,89 @@ ice_aq_send_msg_to_vf(struct ice_hw *hw, u16 vfid, u32 v_opcode, u32 v_retval,
 
return ice_sq_send_cmd(hw, &hw->mailboxq, &desc, msg, msglen, cd);
 }
+
+/**
+ * ice_conv_link_speed_to_virtchnl
+ * @adv_link_support: determines the format of the returned link speed
+ * @link_speed: variable containing the link_speed to be converted
+ *
+ * Convert link speed supported by HW to link speed supported by virtchnl.
+ * If adv_link_support is true, then return link speed in Mbps.  Else return
+ * link speed as a VIRTCHNL_LINK_SPEED_* casted to a u32. Note that the caller
+ * needs to cast back to an enum virtchnl_link_speed in the case where
+ * adv_link_support is false, but when adv_link_support is true the caller can
+ * expect the speed in Mbps.
+ */
+u32 ice_conv_link_speed_to_virtchnl(bool adv_link_support, u16 link_speed)
+{
+   u32 speed;
+
+   if (adv_link_support)
+   switch (link_speed) {
+   case ICE_AQ_LINK_SPEED_10MB:
+   speed = ICE_LINK_SPEED_10MBPS;
+   break;
+   case ICE_AQ_LINK_SPEED_100MB:
+   speed = ICE_LINK_SPEED_100MBPS;
+   break;
+   case ICE_AQ_LINK_SPEED_1000MB:
+   speed = ICE_LINK_SPEED_1000MBPS;
+   break;
+   case ICE_AQ_LINK_SPEED_2500MB:
+   speed = ICE_LINK_SPEED_2500MBPS;
+   break;
+   case ICE_AQ_LINK_SPEED_5GB:
+   speed = ICE_LINK_SPEED_5000MBPS;
+   break;
+   case ICE_AQ_LINK_SPEED_10GB:
+   speed = ICE_LINK_SPEED_10000MBPS;
+   break;
+   case ICE_AQ_LINK_SPEED_20GB:
+   speed = ICE_LINK_SPEED_20000MBPS;
+   break;
+   case ICE_AQ_LINK_SPEED_25GB:
+   speed = ICE_LINK_SPEED_25000MBPS;
+   break;
+   case ICE_AQ_LINK_SPEED_40GB:
+   speed = ICE_LINK_SPEED_40000MBPS;
+   break;
+   default:
+   speed = ICE_LINK_SPEED_UNKNOWN;
+

[net-next 00/15][pull request] 100GbE Intel Wired LAN Driver Updates 2018-10-03

2018-10-03 Thread Jeff Kirsher
This series contains updates to ice and virtchnl.

Yashaswini Raghuram adds a new virtchnl capability flag to support the
exchange of additional supported speeds.

Anirudh adds support for SR-IOV for the ice driver.  Added code to
initialize, configure and use mailbox queues for PF and VF
communication.  Updated the VSI and queue management to handle both PF
and VF VSI type.  Added "Adaptive Virtual Function (AVF)" support for
the ice PF driver by implementing virtchnl commands.  Extended the
malicious driver detection logic to include the VF driver as well.
Fixed the queue region size which needs to be log base 2 of the number
of queues in region.
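
As a rough illustration of the last point: if a field expects the base-2
logarithm of the queue count rather than the count itself, the conversion
could look like the sketch below. This is not the driver code;
region_size_log2() is a made-up name, and the sketch assumes the number of
queues in a region is a power of two.

```c
#include <assert.h>

/*
 * Return log2(num_queues) for a power-of-two queue count,
 * e.g. 8 queues -> 3. Any other input is rejected by the assert.
 */
unsigned int region_size_log2(unsigned int num_queues)
{
	unsigned int log = 0;

	assert(num_queues != 0 && (num_queues & (num_queues - 1)) == 0);
	while (num_queues > 1) {
		num_queues >>= 1;
		log++;
	}
	return log;
}
```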

Brett fixes an issue which was causing switch rules to be lost, by
making a call to ice_update_pkt_fwd_rule() with the necessary changes.
Fixed how the PF and VF assigned the ITR index by adding a struct member
itr_idx to be used to dynamically program the correct ITR index.

Dave fixed a potential NULL pointer dereference by adding checks in the
filter handling.

The following are changes since commit 4e6d47206c32d1bbb4931f1d851dae3870e0df81:
  tls: Add support for inplace records encryption
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Anirudh Venkataramanan (10):
  ice: Add support to detect SR-IOV capability and mailbox queues
  ice: Add handler to configure SR-IOV
  ice: Update VSI and queue management code to handle VF VSI
  ice: Add support for VF reset events
  ice: Add handlers for VF netdevice operations
  ice: Implement virtchnl commands for AVF support
  ice: Notify VF of link status change
  ice: Extend malicious operations detection logic
  ice: Fix forward to queue group logic
  ice: Update version string

Brett Creeley (2):
  ice: Add code to go from ICE_FWD_TO_VSI_LIST to ICE_FWD_TO_VSI
  ice: Add more flexibility on how we assign an ITR index

Dave Ertman (2):
  ice: Fix potential null pointer issues
  ice: Use the right function to enable/disable VSI

Yashaswini Raghuram Prathivadi Bhayankaram (1):
  virtchnl: Added support to exchange additional speed values

 drivers/net/ethernet/intel/ice/Makefile   |1 +
 drivers/net/ethernet/intel/ice/ice.h  |   34 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |   20 +
 drivers/net/ethernet/intel/ice/ice_common.c   |   78 +-
 drivers/net/ethernet/intel/ice/ice_common.h   |4 +-
 drivers/net/ethernet/intel/ice/ice_controlq.c |   46 +-
 drivers/net/ethernet/intel/ice/ice_controlq.h |2 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   69 +
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h|   13 +
 drivers/net/ethernet/intel/ice/ice_lib.c  |  282 +-
 drivers/net/ethernet/intel/ice/ice_lib.h  |3 +-
 drivers/net/ethernet/intel/ice/ice_main.c |  150 +-
 drivers/net/ethernet/intel/ice/ice_sriov.c|  127 +
 drivers/net/ethernet/intel/ice/ice_sriov.h|   34 +
 drivers/net/ethernet/intel/ice/ice_status.h   |3 +
 drivers/net/ethernet/intel/ice/ice_switch.c   |   66 +-
 drivers/net/ethernet/intel/ice/ice_switch.h   |1 +
 drivers/net/ethernet/intel/ice/ice_txrx.h |   13 +-
 drivers/net/ethernet/intel/ice/ice_type.h |   20 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 2668 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.h  |  173 ++
 include/linux/avf/virtchnl.h  |   15 +
 22 files changed, 3714 insertions(+), 108 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_sriov.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_sriov.h
 create mode 100644 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h

-- 
2.17.1



[net-next 04/15] ice: Update VSI and queue management code to handle VF VSI

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

Until now, all the VSI and queue management code supported only the PF
VSI type (ICE_VSI_PF). Update these flows to handle the VF VSI type
(ICE_VSI_VF) as well.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice.h  |   2 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   1 +
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h|   1 +
 drivers/net/ethernet/intel/ice/ice_lib.c  | 210 +++---
 drivers/net/ethernet/intel/ice/ice_switch.h   |   1 +
 drivers/net/ethernet/intel/ice/ice_type.h |   3 +
 6 files changed, 184 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index f788cd63237a..89ec05e9983b 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -202,6 +202,8 @@ struct ice_vsi {
/* Interrupt thresholds */
u16 work_lmt;
 
+   s16 vf_id;  /* VF ID for SR-IOV VSIs */
+
/* RSS config */
u16 rss_table_size; /* HW RSS table size */
u16 rss_size;   /* Allocated RSS queues */
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index b676b3151d04..12d4c862bf05 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -312,6 +312,7 @@
 #define GLV_UPTCH(_i)  (0x0030A004 + ((_i) * 8))
 #define GLV_UPTCL(_i)  (0x0030A000 + ((_i) * 8))
 #define VSIQF_HKEY_MAX_INDEX   12
+#define VSIQF_HLUT_MAX_INDEX   15
 #define VFINT_DYN_CTLN(_i) (0x3800 + ((_i) * 4))
 #define VFINT_DYN_CTLN_CLEARPBA_M  BIT(1)
 
diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index 94504023d86e..f5269f780e1c 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -418,6 +418,7 @@ struct ice_tlan_ctx {
u8  pf_num;
u16 vmvf_num;
u8  vmvf_type;
+#define ICE_TLAN_CTX_VMVF_TYPE_VF  0
 #define ICE_TLAN_CTX_VMVF_TYPE_VMQ 1
 #define ICE_TLAN_CTX_VMVF_TYPE_PF  2
u16 src_vsi;
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 4b26705a9ab5..8139302cd92b 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -68,18 +68,20 @@ static int ice_setup_rx_ctx(struct ice_ring *ring)
 /* Enable Flexible Descriptors in the queue context which
  * allows this driver to select a specific receive descriptor format
  */
-   regval = rd32(hw, QRXFLXP_CNTXT(pf_q));
-   regval |= (rxdid << QRXFLXP_CNTXT_RXDID_IDX_S) &
-   QRXFLXP_CNTXT_RXDID_IDX_M;
-
-   /* increasing context priority to pick up profile id;
-* default is 0x01; setting to 0x03 to ensure profile
-* is programming if prev context is of same priority
-*/
-   regval |= (0x03 << QRXFLXP_CNTXT_RXDID_PRIO_S) &
-   QRXFLXP_CNTXT_RXDID_PRIO_M;
+   if (vsi->type != ICE_VSI_VF) {
+   regval = rd32(hw, QRXFLXP_CNTXT(pf_q));
+   regval |= (rxdid << QRXFLXP_CNTXT_RXDID_IDX_S) &
+   QRXFLXP_CNTXT_RXDID_IDX_M;
+
+   /* increasing context priority to pick up profile id;
+* default is 0x01; setting to 0x03 to ensure profile
+* is programming if prev context is of same priority
+*/
+   regval |= (0x03 << QRXFLXP_CNTXT_RXDID_PRIO_S) &
+   QRXFLXP_CNTXT_RXDID_PRIO_M;
 
-   wr32(hw, QRXFLXP_CNTXT(pf_q), regval);
+   wr32(hw, QRXFLXP_CNTXT(pf_q), regval);
+   }
 
/* Absolute queue number out of 2K needs to be passed */
err = ice_write_rxq_ctx(hw, &rlan_ctx, pf_q);
@@ -90,6 +92,9 @@ static int ice_setup_rx_ctx(struct ice_ring *ring)
return -EIO;
}
 
+   if (vsi->type == ICE_VSI_VF)
+   return 0;
+
/* init queue specific tail register */
ring->tail = hw->hw_addr + QRX_TAIL(pf_q);
writel(0, ring->tail);
@@ -132,6 +137,11 @@ ice_setup_tx_ctx(struct ice_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf_q)
case ICE_VSI_PF:
tlan_ctx->vmvf_type = ICE_TLAN_CTX_VMVF_TYPE_PF;
break;
+   case ICE_VSI_VF:
+   /* Firmware expects vmvf_num to be absolute VF id */
+   tlan_ctx->vmvf_num = hw->func_caps.vf_base_id + vsi->vf_id;
+   tlan_ctx->vmvf_type = ICE_TLAN_CTX_VMVF_TYPE_VF;
+   break;
default:
return;
}
@@ -285,6 +295,16 @@ static void ice_vsi_set_num_qs(struct ice_vsi *vsi)
 

[net-next 03/15] ice: Add handler to configure SR-IOV

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

This patch implements parts of ice_sriov_configure and the VF reset flow.

To create virtual functions (VFs), the user sets a value in num_vfs
through sysfs. This results in the kernel calling the .sriov_configure
handler, which is ice_sriov_configure.

VF setup starts with a VF reset, followed by allocation of the VF
VSI using ice_vf_vsi_setup. Once VF setup is complete, the state bit
ICE_VF_STATE_INIT is set in the vf->states bitmap to indicate that
the VF is ready to go.

Also, for a VF reset to take effect, it is necessary to issue a disable
queue command (ice_aqc_opc_dis_txqs). So this patch updates multiple
functions in the disable queue flow to take additional parameters that
indicate whether queues are being disabled due to a VF reset.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/Makefile   |   1 +
 drivers/net/ethernet/intel/ice/ice.h  |  24 +
 drivers/net/ethernet/intel/ice/ice_common.c   |  56 +-
 drivers/net/ethernet/intel/ice/ice_common.h   |   4 +-
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |  38 +
 drivers/net/ethernet/intel/ice/ice_lib.c  |   7 +-
 drivers/net/ethernet/intel/ice/ice_lib.h  |   3 +-
 drivers/net/ethernet/intel/ice/ice_main.c |   6 +-
 drivers/net/ethernet/intel/ice/ice_type.h |  10 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 847 ++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.h  |  74 ++
 11 files changed, 1061 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h

diff --git a/drivers/net/ethernet/intel/ice/Makefile b/drivers/net/ethernet/intel/ice/Makefile
index 45125bd074d9..1999cd09239e 100644
--- a/drivers/net/ethernet/intel/ice/Makefile
+++ b/drivers/net/ethernet/intel/ice/Makefile
@@ -16,3 +16,4 @@ ice-y := ice_main.o   \
 ice_lib.o  \
 ice_txrx.o \
 ice_ethtool.o
+ice-$(CONFIG_PCI_IOV) += ice_virtchnl_pf.o
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 639d45d1da49..f788cd63237a 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include "ice_devids.h"
 #include "ice_type.h"
@@ -35,6 +36,7 @@
 #include "ice_switch.h"
 #include "ice_common.h"
 #include "ice_sched.h"
+#include "ice_virtchnl_pf.h"
 
 extern const char ice_drv_ver[];
 #define ICE_BAR0   0
@@ -65,6 +67,12 @@ extern const char ice_drv_ver[];
 #define ICE_INVAL_Q_INDEX  0x
 #define ICE_INVAL_VFID 256
 #define ICE_MAX_VF_COUNT   256
+#define ICE_MAX_QS_PER_VF  256
+#define ICE_MIN_QS_PER_VF  1
+#define ICE_DFLT_QS_PER_VF 4
+#define ICE_MAX_INTR_PER_VF65
+#define ICE_MIN_INTR_PER_VF(ICE_MIN_QS_PER_VF + 1)
+#define ICE_DFLT_INTR_PER_VF   (ICE_DFLT_QS_PER_VF + 1)
 
 #define ICE_VSIQF_HKEY_ARRAY_SIZE  ((VSIQF_HKEY_MAX_INDEX + 1) *   4)
 
@@ -135,10 +143,20 @@ enum ice_state {
__ICE_EMPR_RECV,/* set by OICR handler */
__ICE_SUSPENDED,/* set on module remove path */
__ICE_RESET_FAILED, /* set by reset/rebuild */
+   /* When checking for the PF to be in a nominal operating state, the
+* bits that are grouped at the beginning of the list need to be
+* checked.  Bits occurring before __ICE_STATE_NOMINAL_CHECK_BITS will
+* be checked.  If you need to add a bit into consideration for nominal
+* operating state, it must be added before
+* __ICE_STATE_NOMINAL_CHECK_BITS.  Do not move this entry's position
+* without appropriate consideration.
+*/
+   __ICE_STATE_NOMINAL_CHECK_BITS,
__ICE_ADMINQ_EVENT_PENDING,
__ICE_MAILBOXQ_EVENT_PENDING,
__ICE_MDD_EVENT_PENDING,
__ICE_FLTR_OVERFLOW_PROMISC,
+   __ICE_VF_DIS,
__ICE_CFG_BUSY,
__ICE_SERVICE_SCHED,
__ICE_SERVICE_DIS,
@@ -243,6 +261,7 @@ enum ice_pf_flags {
ICE_FLAG_MSIX_ENA,
ICE_FLAG_FLTR_SYNC,
ICE_FLAG_RSS_ENA,
+   ICE_FLAG_SRIOV_ENA,
ICE_FLAG_SRIOV_CAPABLE,
ICE_PF_FLAGS_NBITS  /* must be last */
 };
@@ -259,7 +278,12 @@ struct ice_pf {
 
struct ice_vsi **vsi;   /* VSIs created by the driver */
struct ice_sw *first_sw;/* first switch created by firmware */
+   /* Virtchnl/SR-IOV config info */
+   struct ice_vf *vf;
+   int num_alloc_vfs;  /* actual number of VFs allocated */
u16 num_vfs_supported;  /* num VFs supported for this PF */
+   u16 num_vf_qps; /* num queue pairs per VF */
+   u16 num_vf_msix;/* num vectors per VF */
DECLARE_

[net-next 10/15] ice: Fix forward to queue group logic

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

When adding a rule, the queue region size needs to be provided as log base 2
of the number of queues in the region. Fix that.
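
As a rough illustration of the encoding described above, the sketch below
stores log2 of the group size in a region field. The shift, mask and ex_*
names are hypothetical and chosen only for this example; the real
ICE_SINGLE_ACT_Q_REGION_S/_M values live in the driver headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3-bit region field, for illustration only. */
#define EX_Q_REGION_S	15
#define EX_Q_REGION_M	(UINT32_C(0x7) << EX_Q_REGION_S)

/* Floor of log base 2; returns 0 when v is 0 or 1. */
static uint8_t ex_ilog2(uint32_t v)
{
	uint8_t r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Encode a queue group size as log2(size) in the region field. */
static uint32_t ex_encode_q_region(uint32_t qgrp_size)
{
	uint8_t q_rgn = qgrp_size > 0 ? ex_ilog2(qgrp_size) : 0;

	/* The illustrative field is 3 bits wide, so log2 must be <= 7. */
	assert(q_rgn <= 7);

	return ((uint32_t)q_rgn << EX_Q_REGION_S) & EX_Q_REGION_M;
}
```

With this layout a group of 8 queues is written as 3 in the region field,
rather than the raw count 8 that the old code shifted in.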

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_switch.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index 9a2664fd87b8..ff933a348acc 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -656,6 +656,7 @@ ice_fill_sw_rule(struct ice_hw *hw, struct ice_fltr_info *f_info,
u8 *eth_hdr;
u32 act = 0;
__be16 *off;
+   u8 q_rgn;
 
if (opc == ice_aqc_opc_remove_sw_rules) {
s_rule->pdata.lkup_tx_rx.act = 0;
@@ -694,14 +695,19 @@ ice_fill_sw_rule(struct ice_hw *hw, struct ice_fltr_info *f_info,
act |= (f_info->fwd_id.q_id << ICE_SINGLE_ACT_Q_INDEX_S) &
ICE_SINGLE_ACT_Q_INDEX_M;
break;
+   case ICE_DROP_PACKET:
+   act |= ICE_SINGLE_ACT_VSI_FORWARDING | ICE_SINGLE_ACT_DROP |
+   ICE_SINGLE_ACT_VALID_BIT;
+   break;
case ICE_FWD_TO_QGRP:
+   q_rgn = f_info->qgrp_size > 0 ?
+   (u8)ilog2(f_info->qgrp_size) : 0;
act |= ICE_SINGLE_ACT_TO_Q;
-   act |= (f_info->qgrp_size << ICE_SINGLE_ACT_Q_REGION_S) &
+   act |= (f_info->fwd_id.q_id << ICE_SINGLE_ACT_Q_INDEX_S) &
+   ICE_SINGLE_ACT_Q_INDEX_M;
+   act |= (q_rgn << ICE_SINGLE_ACT_Q_REGION_S) &
ICE_SINGLE_ACT_Q_REGION_M;
break;
-   case ICE_DROP_PACKET:
-   act |= ICE_SINGLE_ACT_VSI_FORWARDING | ICE_SINGLE_ACT_DROP;
-   break;
default:
return;
}
-- 
2.17.1



[net-next 13/15] ice: Add more flexibility on how we assign an ITR index

2018-10-03 Thread Jeff Kirsher
From: Brett Creeley 

This issue came about when looking at the VF function
ice_vc_cfg_irq_map_msg. Currently we are assigning the itr_setting value
to the itr_idx received from the AVF driver, which is not correct and is
not used for the VF flow anyway. Currently the only way we set the ITR
index for both the PF and VF driver is by hard coding ICE_TX_ITR or
ICE_RX_ITR for the ITR index on each q_vector.

To fix this, add the member itr_idx in struct ice_ring_container. This
can then be used to dynamically program the correct ITR index. This change
also affected the PF driver so make the necessary changes there as well.

Also, remove the itr_setting member in struct ice_ring because it is not
being used meaningfully and would otherwise be removed in a future patch
that includes dynamic ITR.

On another note, this will be useful moving forward if we decide to split
Rx/Tx rings on different q_vectors instead of sharing them as queue pairs.
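
A minimal sketch of the idea, assuming a simplified register file: each
ring container carries its own ITR index, and the configuration step uses
that index instead of a hard-coded one. All ex_* names, the index values
and the array standing in for the GLINT_ITR registers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_RX_ITR	0	/* assumed index values, for illustration */
#define EX_TX_ITR	1
#define EX_NUM_ITR_IDX	3
#define EX_NUM_VECTORS	4

struct ex_ring_container {
	uint8_t itr_idx;	/* which ITR register set this container uses */
	uint16_t itr;		/* throttle value to program */
	unsigned int num_rings;	/* rings attached to this container */
};

struct ex_q_vector {
	struct ex_ring_container rx;
	struct ex_ring_container tx;
};

/* Stand-in for the per-index, per-vector throttle registers. */
static uint16_t ex_itr_regs[EX_NUM_ITR_IDX][EX_NUM_VECTORS];

/* Program one container, using the index stored in the container itself. */
static void ex_cfg_rc(const struct ex_ring_container *rc, unsigned int vector)
{
	if (!rc->num_rings)
		return;
	assert(rc->itr_idx < EX_NUM_ITR_IDX);
	ex_itr_regs[rc->itr_idx][vector] = rc->itr;
}

static void ex_cfg_itr(const struct ex_q_vector *qv, unsigned int vector)
{
	assert(vector < EX_NUM_VECTORS);
	ex_cfg_rc(&qv->rx, vector);
	ex_cfg_rc(&qv->tx, vector);
}
```

Because the index travels with the container, a caller such as a VF
message handler could in principle pick a different index per container
without touching the programming code.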

Signed-off-by: Brett Creeley 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_lib.c  | 73 +++
 drivers/net/ethernet/intel/ice/ice_txrx.h | 13 +---
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 20 +++--
 3 files changed, 59 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 8139302cd92b..49f1940772ed 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1204,7 +1204,6 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi)
ring->vsi = vsi;
ring->dev = &pf->pdev->dev;
ring->count = vsi->num_desc;
-   ring->itr_setting = ICE_DFLT_TX_ITR;
vsi->tx_rings[i] = ring;
}
 
@@ -1224,7 +1223,6 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi)
ring->netdev = vsi->netdev;
ring->dev = &pf->pdev->dev;
ring->count = vsi->num_desc;
-   ring->itr_setting = ICE_DFLT_RX_ITR;
vsi->rx_rings[i] = ring;
}
 
@@ -1261,6 +1259,7 @@ static void ice_vsi_map_rings_to_vectors(struct ice_vsi *vsi)
tx_rings_per_v = DIV_ROUND_UP(tx_rings_rem, q_vectors - v_id);
q_vector->num_ring_tx = tx_rings_per_v;
q_vector->tx.ring = NULL;
+   q_vector->tx.itr_idx = ICE_TX_ITR;
q_base = vsi->num_txq - tx_rings_rem;
 
for (q_id = q_base; q_id < (q_base + tx_rings_per_v); q_id++) {
@@ -1276,6 +1275,7 @@ static void ice_vsi_map_rings_to_vectors(struct ice_vsi *vsi)
rx_rings_per_v = DIV_ROUND_UP(rx_rings_rem, q_vectors - v_id);
q_vector->num_ring_rx = rx_rings_per_v;
q_vector->rx.ring = NULL;
+   q_vector->rx.itr_idx = ICE_RX_ITR;
q_base = vsi->num_rxq - rx_rings_rem;
 
for (q_id = q_base; q_id < (q_base + rx_rings_per_v); q_id++) {
@@ -1683,6 +1683,37 @@ static u32 ice_intrl_usec_to_reg(u8 intrl, u8 gran)
return 0;
 }
 
+/**
+ * ice_cfg_itr - configure the initial interrupt throttle values
+ * @hw: pointer to the HW structure
+ * @q_vector: interrupt vector that's being configured
+ * @vector: HW vector index to apply the interrupt throttling to
+ *
+ * Configure interrupt throttling values for the ring containers that are
+ * associated with the interrupt vector passed in.
+ */
+static void
+ice_cfg_itr(struct ice_hw *hw, struct ice_q_vector *q_vector, u16 vector)
+{
+   u8 itr_gran = hw->itr_gran;
+
+   if (q_vector->num_ring_rx) {
+   struct ice_ring_container *rc = &q_vector->rx;
+
+   rc->itr = ITR_TO_REG(ICE_DFLT_RX_ITR, itr_gran);
+   rc->latency_range = ICE_LOW_LATENCY;
+   wr32(hw, GLINT_ITR(rc->itr_idx, vector), rc->itr);
+   }
+
+   if (q_vector->num_ring_tx) {
+   struct ice_ring_container *rc = &q_vector->tx;
+
+   rc->itr = ITR_TO_REG(ICE_DFLT_TX_ITR, itr_gran);
+   rc->latency_range = ICE_LOW_LATENCY;
+   wr32(hw, GLINT_ITR(rc->itr_idx, vector), rc->itr);
+   }
+}
+
 /**
  * ice_vsi_cfg_msix - MSIX mode Interrupt Config in the HW
  * @vsi: the VSI being configured
@@ -1693,31 +1724,13 @@ void ice_vsi_cfg_msix(struct ice_vsi *vsi)
u16 vector = vsi->hw_base_vector;
struct ice_hw *hw = &pf->hw;
u32 txq = 0, rxq = 0;
-   int i, q, itr;
-   u8 itr_gran;
+   int i, q;
 
for (i = 0; i < vsi->num_q_vectors; i++, vector++) {
struct ice_q_vector *q_vector = vsi->q_vectors[i];
 
-   itr_gran = hw->itr_gran;
+   ice_cfg_itr(hw, q_vector, vector);
 
-   q_vector->intrl = ICE_DFLT_INTRL;
-
-   if (q_vector->num_ring_rx) {
-   q_vector->rx.itr =
-   ITR_TO

[net-next 02/15] ice: Add support to detect SR-IOV capability and mailbox queues

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

A mailbox queue is a type of control queue that's used for communication
between the PF and a VF. This patch adds code to initialize, configure and
use mailbox queues.

This patch also adds support to detect and parse SR-IOV capabilities
returned by the hardware.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice.h  |  5 ++
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  2 +
 drivers/net/ethernet/intel/ice/ice_common.c   | 22 +
 drivers/net/ethernet/intel/ice/ice_controlq.c | 46 +-
 drivers/net/ethernet/intel/ice/ice_controlq.h |  2 +
 .../net/ethernet/intel/ice/ice_hw_autogen.h   | 21 +
 drivers/net/ethernet/intel/ice/ice_main.c | 47 +++
 drivers/net/ethernet/intel/ice/ice_type.h |  7 +++
 8 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 0b269c470343..639d45d1da49 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -46,6 +46,7 @@ extern const char ice_drv_ver[];
 #define ICE_INT_NAME_STR_LEN   (IFNAMSIZ + 16)
 #define ICE_ETHTOOL_FWVER_LEN  32
 #define ICE_AQ_LEN 64
+#define ICE_MBXQ_LEN   64
 #define ICE_MIN_MSIX   2
 #define ICE_NO_VSI 0x
 #define ICE_MAX_VSI_ALLOC  130
@@ -63,6 +64,7 @@ extern const char ice_drv_ver[];
 #define ICE_RES_MISC_VEC_ID(ICE_RES_VALID_BIT - 1)
 #define ICE_INVAL_Q_INDEX  0x
 #define ICE_INVAL_VFID 256
+#define ICE_MAX_VF_COUNT   256
 
 #define ICE_VSIQF_HKEY_ARRAY_SIZE  ((VSIQF_HKEY_MAX_INDEX + 1) *   4)
 
@@ -134,6 +136,7 @@ enum ice_state {
__ICE_SUSPENDED,/* set on module remove path */
__ICE_RESET_FAILED, /* set by reset/rebuild */
__ICE_ADMINQ_EVENT_PENDING,
+   __ICE_MAILBOXQ_EVENT_PENDING,
__ICE_MDD_EVENT_PENDING,
__ICE_FLTR_OVERFLOW_PROMISC,
__ICE_CFG_BUSY,
@@ -240,6 +243,7 @@ enum ice_pf_flags {
ICE_FLAG_MSIX_ENA,
ICE_FLAG_FLTR_SYNC,
ICE_FLAG_RSS_ENA,
+   ICE_FLAG_SRIOV_CAPABLE,
ICE_PF_FLAGS_NBITS  /* must be last */
 };
 
@@ -255,6 +259,7 @@ struct ice_pf {
 
struct ice_vsi **vsi;   /* VSIs created by the driver */
struct ice_sw *first_sw;/* first switch created by firmware */
+   u16 num_vfs_supported;  /* num VFs supported for this PF */
DECLARE_BITMAP(state, __ICE_STATE_NBITS);
DECLARE_BITMAP(avail_txqs, ICE_MAX_TXQS);
DECLARE_BITMAP(avail_rxqs, ICE_MAX_RXQS);
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index c100b4bda195..7d793cc96a18 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -87,6 +87,8 @@ struct ice_aqc_list_caps {
 /* Device/Function buffer entry, repeated per reported capability */
 struct ice_aqc_list_caps_elem {
__le16 cap;
+#define ICE_AQC_CAPS_SRIOV 0x0012
+#define ICE_AQC_CAPS_VF0x0013
 #define ICE_AQC_CAPS_VSI   0x0017
 #define ICE_AQC_CAPS_RSS   0x0040
 #define ICE_AQC_CAPS_RXQS  0x0041
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 68fbbb92d504..0fe054e4bfb8 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1406,6 +1406,28 @@ ice_parse_caps(struct ice_hw *hw, void *buf, u32 cap_count,
u16 cap = le16_to_cpu(cap_resp->cap);
 
switch (cap) {
+   case ICE_AQC_CAPS_SRIOV:
+   caps->sr_iov_1_1 = (number == 1);
+   ice_debug(hw, ICE_DBG_INIT,
+ "HW caps: SR-IOV = %d\n", caps->sr_iov_1_1);
+   break;
+   case ICE_AQC_CAPS_VF:
+   if (dev_p) {
+   dev_p->num_vfs_exposed = number;
+   ice_debug(hw, ICE_DBG_INIT,
+ "HW caps: VFs exposed = %d\n",
+ dev_p->num_vfs_exposed);
+   } else if (func_p) {
+   func_p->num_allocd_vfs = number;
+   func_p->vf_base_id = logical_id;
+   ice_debug(hw, ICE_DBG_INIT,
+ "HW caps: VFs allocated = %d\n",
+ func_p->num_allocd_vfs);
+   ice_debug(hw, ICE_DBG_INIT,
+ "HW caps: VF base_id = %d

[net-next 11/15] ice: Add code to go from ICE_FWD_TO_VSI_LIST to ICE_FWD_TO_VSI

2018-10-03 Thread Jeff Kirsher
From: Brett Creeley 

When a switch rule is initially created we set the filter action to
ICE_FWD_TO_VSI. The filter action changes to ICE_FWD_TO_VSI_LIST
whenever more than one VSI is subscribed to the same switch rule. When
the switch rule goes from 2 VSIs in the list to 1 VSI we remove and
delete the VSI list rule, but we currently don't update the switch rule
in hardware. This is causing switch rules to be lost, so fix that by
making a call to ice_update_pkt_fwd_rule() with the necessary changes.
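
The transition described above can be sketched with a toy model. This is
not the driver's data structure: the ex_* names, the 32-bit subscriber map
and the return codes are assumptions made for the example, and the VLAN
special case is left out.

```c
#include <assert.h>
#include <stdint.h>

enum ex_fltr_act { EX_FWD_TO_VSI, EX_FWD_TO_VSI_LIST };

struct ex_rule {
	enum ex_fltr_act act;
	uint32_t vsi_map;	/* one bit per subscribed VSI handle (0..31) */
	unsigned int fwd_vsi;	/* valid only when act == EX_FWD_TO_VSI */
};

static unsigned int ex_count_bits(uint32_t map)
{
	unsigned int n = 0;

	for (; map; map &= map - 1)
		n++;
	return n;
}

static unsigned int ex_first_bit(uint32_t map)
{
	unsigned int i = 0;

	assert(map != 0);
	while (!(map & 1u)) {
		map >>= 1;
		i++;
	}
	return i;
}

/* Unsubscribe one VSI; fall back to FWD_TO_VSI when a single VSI remains. */
static int ex_rule_remove_vsi(struct ex_rule *rule, unsigned int vsi)
{
	if (rule->act != EX_FWD_TO_VSI_LIST || vsi >= 32 ||
	    !(rule->vsi_map & (UINT32_C(1) << vsi)))
		return -1;

	rule->vsi_map &= ~(UINT32_C(1) << vsi);

	if (ex_count_bits(rule->vsi_map) == 1) {
		/* In the driver, this is the point where the hardware rule
		 * must be rewritten too, not just the software copy.
		 */
		rule->fwd_vsi = ex_first_bit(rule->vsi_map);
		rule->act = EX_FWD_TO_VSI;
	}
	return 0;
}
```

The model only tracks software state; the bug fixed here is that the
equivalent software update happened without the matching hardware update.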

Signed-off-by: Brett Creeley 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_switch.c | 38 -
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index ff933a348acc..61a1b6adaef3 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -1422,8 +1422,8 @@ ice_rem_update_vsi_list(struct ice_hw *hw, u16 vsi_handle,
fm_list->vsi_count--;
clear_bit(vsi_handle, fm_list->vsi_list_info->vsi_map);
 
-   if ((fm_list->vsi_count == 1 && lkup_type != ICE_SW_LKUP_VLAN) ||
-   (fm_list->vsi_count == 0 && lkup_type == ICE_SW_LKUP_VLAN)) {
+   if (fm_list->vsi_count == 1 && lkup_type != ICE_SW_LKUP_VLAN) {
+   struct ice_fltr_info tmp_fltr_info = fm_list->fltr_info;
struct ice_vsi_list_map_info *vsi_list_info =
fm_list->vsi_list_info;
u16 rem_vsi_handle;
@@ -1432,6 +1432,8 @@ ice_rem_update_vsi_list(struct ice_hw *hw, u16 vsi_handle,
ICE_MAX_VSI);
if (!ice_is_vsi_valid(hw, rem_vsi_handle))
return ICE_ERR_OUT_OF_RANGE;
+
+   /* Make sure VSI list is empty before removing it below */
status = ice_update_vsi_list_rule(hw, &rem_vsi_handle, 1,
  vsi_list_id, true,
  ice_aqc_opc_update_sw_rules,
@@ -1439,16 +1441,34 @@ ice_rem_update_vsi_list(struct ice_hw *hw, u16 vsi_handle,
if (status)
return status;
 
+   tmp_fltr_info.fltr_act = ICE_FWD_TO_VSI;
+   tmp_fltr_info.fwd_id.hw_vsi_id =
+   ice_get_hw_vsi_num(hw, rem_vsi_handle);
+   tmp_fltr_info.vsi_handle = rem_vsi_handle;
+   status = ice_update_pkt_fwd_rule(hw, &tmp_fltr_info);
+   if (status) {
+   ice_debug(hw, ICE_DBG_SW,
+ "Failed to update pkt fwd rule to FWD_TO_VSI on HW VSI %d, error %d\n",
+ tmp_fltr_info.fwd_id.hw_vsi_id, status);
+   return status;
+   }
+
+   fm_list->fltr_info = tmp_fltr_info;
+   }
+
+   if ((fm_list->vsi_count == 1 && lkup_type != ICE_SW_LKUP_VLAN) ||
+   (fm_list->vsi_count == 0 && lkup_type == ICE_SW_LKUP_VLAN)) {
+   struct ice_vsi_list_map_info *vsi_list_info =
+   fm_list->vsi_list_info;
+
/* Remove the VSI list since it is no longer used */
status = ice_remove_vsi_list_rule(hw, vsi_list_id, lkup_type);
-   if (status)
+   if (status) {
+   ice_debug(hw, ICE_DBG_SW,
+ "Failed to remove VSI list %d, error %d\n",
+ vsi_list_id, status);
return status;
-
-   /* Change the list entry action from VSI_LIST to VSI */
-   fm_list->fltr_info.fltr_act = ICE_FWD_TO_VSI;
-   fm_list->fltr_info.fwd_id.hw_vsi_id =
-   ice_get_hw_vsi_num(hw, rem_vsi_handle);
-   fm_list->fltr_info.vsi_handle = rem_vsi_handle;
+   }
 
list_del(&vsi_list_info->list_entry);
devm_kfree(ice_hw_to_dev(hw), vsi_list_info);
-- 
2.17.1



[net-next 14/15] ice: Use the right function to enable/disable VSI

2018-10-03 Thread Jeff Kirsher
From: Dave Ertman 

ice_ena_vsi and ice_dis_vsi should use a single differentiating
factor to determine whether the netdev_ops call or a direct call
to ice_vsi_open/ice_vsi_close is used: whether the netif is
running. If the netif is running, use ndo_open/ndo_stop.
Otherwise, use ice_vsi_open/ice_vsi_close.
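
The enable-side decision can be sketched as a small pure function. The
ex_* types and the boolean standing in for the __ICE_NEEDS_RESTART bit are
hypothetical simplifications; the sketch only reports which path would be
taken and calls nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ex_path {
	EX_PATH_NONE,	/* nothing to do */
	EX_PATH_NDO,	/* go through the netdev_ops callback */
	EX_PATH_DIRECT,	/* call the VSI open helper directly */
};

struct ex_netdev {
	bool running;	/* stands in for netif_running() */
};

struct ex_vsi {
	struct ex_netdev *netdev;
	bool needs_restart;	/* stands in for the NEEDS_RESTART state bit */
};

/* Pick the enable path; the restart flag is always consumed. */
static enum ex_path ex_ena_vsi_path(struct ex_vsi *vsi)
{
	bool restart;

	assert(vsi != NULL);

	restart = vsi->needs_restart;
	vsi->needs_restart = false;

	if (!restart || !vsi->netdev)
		return EX_PATH_NONE;

	return vsi->netdev->running ? EX_PATH_NDO : EX_PATH_DIRECT;
}
```

The only input that separates the two active paths is the running state
of the netdev, which is the single differentiating factor the message
asks for.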

Signed-off-by: Dave Ertman 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_main.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 948c97defeba..02cfd874f674 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3214,13 +3214,14 @@ static void ice_dis_vsi(struct ice_vsi *vsi)
 
set_bit(__ICE_NEEDS_RESTART, vsi->state);
 
-   if (vsi->netdev && netif_running(vsi->netdev) &&
-   vsi->type == ICE_VSI_PF) {
-   rtnl_lock();
-   vsi->netdev->netdev_ops->ndo_stop(vsi->netdev);
-   rtnl_unlock();
-   } else {
-   ice_vsi_close(vsi);
+   if (vsi->type == ICE_VSI_PF && vsi->netdev) {
+   if (netif_running(vsi->netdev)) {
+   rtnl_lock();
+   vsi->netdev->netdev_ops->ndo_stop(vsi->netdev);
+   rtnl_unlock();
+   } else {
+   ice_vsi_close(vsi);
+   }
}
 }
 
@@ -3232,12 +3233,16 @@ static int ice_ena_vsi(struct ice_vsi *vsi)
 {
int err = 0;
 
-   if (test_and_clear_bit(__ICE_NEEDS_RESTART, vsi->state))
-   if (vsi->netdev && netif_running(vsi->netdev)) {
+   if (test_and_clear_bit(__ICE_NEEDS_RESTART, vsi->state) &&
+   vsi->netdev) {
+   if (netif_running(vsi->netdev)) {
rtnl_lock();
err = vsi->netdev->netdev_ops->ndo_open(vsi->netdev);
rtnl_unlock();
+   } else {
+   err = ice_vsi_open(vsi);
}
+   }
 
return err;
 }
-- 
2.17.1



[net-next 12/15] ice: Fix potential null pointer issues

2018-10-03 Thread Jeff Kirsher
From: Dave Ertman 

Add checks in the filter handling flow to avoid dereferencing
NULL pointers.

Signed-off-by: Dave Ertman 
Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_switch.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index 61a1b6adaef3..33403f39f1b3 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -2010,12 +2010,12 @@ ice_cfg_dflt_vsi(struct ice_hw *hw, u16 vsi_handle, bool set, u8 direction)
 enum ice_status
 ice_remove_mac(struct ice_hw *hw, struct list_head *m_list)
 {
-   struct ice_fltr_list_entry *list_itr;
+   struct ice_fltr_list_entry *list_itr, *tmp;
 
if (!m_list)
return ICE_ERR_PARAM;
 
-   list_for_each_entry(list_itr, m_list, list_entry) {
+   list_for_each_entry_safe(list_itr, tmp, m_list, list_entry) {
enum ice_sw_lkup_type l_type = list_itr->fltr_info.lkup_type;
 
if (l_type != ICE_SW_LKUP_MAC)
@@ -2037,12 +2037,12 @@ ice_remove_mac(struct ice_hw *hw, struct list_head *m_list)
 enum ice_status
 ice_remove_vlan(struct ice_hw *hw, struct list_head *v_list)
 {
-   struct ice_fltr_list_entry *v_list_itr;
+   struct ice_fltr_list_entry *v_list_itr, *tmp;
 
if (!v_list || !hw)
return ICE_ERR_PARAM;
 
-   list_for_each_entry(v_list_itr, v_list, list_entry) {
+   list_for_each_entry_safe(v_list_itr, tmp, v_list, list_entry) {
enum ice_sw_lkup_type l_type = v_list_itr->fltr_info.lkup_type;
 
if (l_type != ICE_SW_LKUP_VLAN)
@@ -2142,7 +2142,7 @@ ice_add_to_vsi_fltr_list(struct ice_hw *hw, u16 vsi_handle,
struct ice_fltr_info *fi;
 
fi = &fm_entry->fltr_info;
-   if (!ice_vsi_uses_fltr(fm_entry, vsi_handle))
+   if (!fi || !ice_vsi_uses_fltr(fm_entry, vsi_handle))
continue;
 
status = ice_add_entry_to_vsi_fltr_list(hw, vsi_handle,
@@ -2259,7 +2259,8 @@ ice_replay_vsi_fltr(struct ice_hw *hw, u16 vsi_handle, u8 recp_id,
goto end;
continue;
}
-   if (!test_bit(vsi_handle, itr->vsi_list_info->vsi_map))
+   if (!itr->vsi_list_info ||
+   !test_bit(vsi_handle, itr->vsi_list_info->vsi_map))
continue;
/* Clearing it so that the logic can add it back */
clear_bit(vsi_handle, itr->vsi_list_info->vsi_map);
-- 
2.17.1



[net-next 07/15] ice: Implement virtchnl commands for AVF support

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

virtchnl is a protocol/interface specification that allows the Intel
"Adaptive Virtual Function (AVF)" driver (iavf.ko) to work with more than
one physical function driver. The AVF driver sends "virtchnl commands"
(control plane only) to the PF driver over mailbox queues, and the PF driver
executes these commands and returns a result to the VF, again over mailbox.

This patch adds AVF support for the ice PF driver by implementing the
following virtchnl commands:

VIRTCHNL_OP_VERSION
VIRTCHNL_OP_GET_VF_RESOURCES
VIRTCHNL_OP_RESET_VF
VIRTCHNL_OP_ADD_ETH_ADDR
VIRTCHNL_OP_DEL_ETH_ADDR
VIRTCHNL_OP_CONFIG_VSI_QUEUES
VIRTCHNL_OP_ENABLE_QUEUES
VIRTCHNL_OP_DISABLE_QUEUES
VIRTCHNL_OP_REQUEST_QUEUES
VIRTCHNL_OP_CONFIG_IRQ_MAP
VIRTCHNL_OP_CONFIG_RSS_KEY
VIRTCHNL_OP_CONFIG_RSS_LUT
VIRTCHNL_OP_GET_STATS
VIRTCHNL_OP_ADD_VLAN
VIRTCHNL_OP_DEL_VLAN
VIRTCHNL_OP_ENABLE_VLAN_STRIPPING
VIRTCHNL_OP_DISABLE_VLAN_STRIPPING

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice.h  |1 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |1 +
 drivers/net/ethernet/intel/ice/ice_main.c |3 +
 drivers/net/ethernet/intel/ice/ice_switch.c   |1 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 1204 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.h  |   11 +
 6 files changed, 1221 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index a9572f8ef6bf..4c4b5717a627 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -71,6 +71,7 @@ extern const char ice_drv_ver[];
 #define ICE_MAX_QS_PER_VF  256
 #define ICE_MIN_QS_PER_VF  1
 #define ICE_DFLT_QS_PER_VF 4
+#define ICE_MAX_BASE_QS_PER_VF 16
 #define ICE_MAX_INTR_PER_VF65
 #define ICE_MIN_INTR_PER_VF(ICE_MIN_QS_PER_VF + 1)
 #define ICE_DFLT_INTR_PER_VF   (ICE_DFLT_QS_PER_VF + 1)
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 2c8f590316e9..6653555f55dd 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1446,6 +1446,7 @@ enum ice_adminq_opc {
ice_aqc_opc_nvm_read= 0x0701,
 
/* PF/VF mailbox commands */
+   ice_mbx_opc_send_msg_to_pf  = 0x0801,
ice_mbx_opc_send_msg_to_vf  = 0x0802,
 
/* RSS commands */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 01112ae7fdc3..4c8e7460e16b 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -800,6 +800,9 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum ice_ctl_q q_type)
dev_err(&pf->pdev->dev,
"Could not handle link event\n");
break;
+   case ice_mbx_opc_send_msg_to_pf:
+   ice_vc_process_vf_msg(pf, &event);
+   break;
case ice_aqc_opc_fw_logging:
ice_output_fw_log(hw, &event.desc, event.msg_buf);
break;
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index e949224b5282..9a2664fd87b8 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -187,6 +187,7 @@ ice_aq_add_vsi(struct ice_hw *hw, struct ice_vsi_ctx *vsi_ctx,
if (!vsi_ctx->alloc_from_pool)
cmd->vsi_num = cpu_to_le16(vsi_ctx->vsi_num |
   ICE_AQ_VSI_IS_VALID);
+   cmd->vf_id = vsi_ctx->vf_num;
 
cmd->vsi_flags = cpu_to_le16(vsi_ctx->flags);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index cf4517fd58e8..f44292b00807 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -79,6 +79,35 @@ ice_set_pfe_link_forced(struct ice_vf *vf, struct virtchnl_pf_event *pfe,
ice_set_pfe_link(vf, pfe, link_speed, link_up);
 }
 
+/**
+ * ice_vc_notify_vf_link_state - Inform a VF of link status
+ * @vf: pointer to the VF structure
+ *
+ * send a link status message to a single VF
+ */
+static void ice_vc_notify_vf_link_state(struct ice_vf *vf)
+{
+   struct virtchnl_pf_event pfe = { 0 };
+   struct ice_link_status *ls;
+   struct ice_pf *pf = vf->pf;
+   struct ice_hw *hw;
+
+   hw = &pf->hw;
+   ls = &hw->port_info->phy.link_info;
+
+   pfe.event = VIRTCHNL_EV

[net-next 15/15] ice: Update version string

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

Update version string to 0.7.2-k

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 02cfd874f674..8f61b375e768 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -8,7 +8,7 @@
 #include "ice.h"
 #include "ice_lib.h"
 
-#define DRV_VERSION"0.7.1-k"
+#define DRV_VERSION"0.7.2-k"
 #define DRV_SUMMARY"Intel(R) Ethernet Connection E800 Series Linux Driver"
 const char ice_drv_ver[] = DRV_VERSION;
 static const char ice_driver_string[] = DRV_SUMMARY;
-- 
2.17.1



[net-next 08/15] ice: Notify VF of link status change

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

When PF gets a link status change event, notify the VFs of the same.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/ice_main.c|  2 ++
 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 12 
 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h |  3 +++
 3 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index 4c8e7460e16b..d7cbc2e6e5c5 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -665,6 +665,8 @@ ice_link_event(struct ice_pf *pf, struct ice_port_info *pi)
}
}
 
+   ice_vc_notify_link_state(pf);
+
return 0;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c 
b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index f44292b00807..20de2034e153 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -887,6 +887,18 @@ static bool ice_reset_vf(struct ice_vf *vf, bool is_vflr)
return true;
 }
 
+/**
+ * ice_vc_notify_link_state - Inform all VFs on a PF of link status
+ * @pf: pointer to the PF structure
+ */
+void ice_vc_notify_link_state(struct ice_pf *pf)
+{
+   int i;
+
+   for (i = 0; i < pf->num_alloc_vfs; i++)
+   ice_vc_notify_vf_link_state(&pf->vf[i]);
+}
+
 /**
  * ice_vc_notify_reset - Send pending reset message to all VFs
  * @pf: pointer to the PF structure
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h 
b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index 7561a678ebe6..a493cb1bb89d 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -78,6 +78,7 @@ int ice_get_vf_cfg(struct net_device *netdev, int vf_id,
 
 void ice_free_vfs(struct ice_pf *pf);
 void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event);
+void ice_vc_notify_link_state(struct ice_pf *pf);
 void ice_vc_notify_reset(struct ice_pf *pf);
 bool ice_reset_all_vfs(struct ice_pf *pf, bool is_vflr);
 
@@ -96,7 +97,9 @@ int ice_set_vf_spoofchk(struct net_device *netdev, int vf_id, 
bool ena);
 #define ice_process_vflr_event(pf) do {} while (0)
 #define ice_free_vfs(pf) do {} while (0)
 #define ice_vc_process_vf_msg(pf, event) do {} while (0)
+#define ice_vc_notify_link_state(pf) do {} while (0)
 #define ice_vc_notify_reset(pf) do {} while (0)
+
 static inline bool
 ice_reset_all_vfs(struct ice_pf __always_unused *pf,
  bool __always_unused is_vflr)
-- 
2.17.1



[net-next 05/15] ice: Add support for VF reset events

2018-10-03 Thread Jeff Kirsher
From: Anirudh Venkataramanan 

Post VF initialization, there are a couple of different ways in which a
VF reset can be triggered. One is when the underlying PF itself goes
through a reset and the other is via a VFLR interrupt. ice_reset_vf,
introduced in this patch, handles both these cases.

Also introduced in this patch is a helper function ice_aq_send_msg_to_vf
to send messages to VF over the mailbox queue. The PF uses this to send
reset notifications to VFs.

Signed-off-by: Anirudh Venkataramanan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ice/Makefile   |   2 +-
 drivers/net/ethernet/intel/ice/ice.h  |   2 +
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  17 ++
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   1 +
 drivers/net/ethernet/intel/ice/ice_main.c |  15 ++
 drivers/net/ethernet/intel/ice/ice_sriov.c|  41 +
 drivers/net/ethernet/intel/ice/ice_sriov.h|  25 +++
 drivers/net/ethernet/intel/ice/ice_status.h   |   3 +
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 163 ++
 .../net/ethernet/intel/ice/ice_virtchnl_pf.h  |   4 +
 10 files changed, 272 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/intel/ice/ice_sriov.c
 create mode 100644 drivers/net/ethernet/intel/ice/ice_sriov.h

diff --git a/drivers/net/ethernet/intel/ice/Makefile 
b/drivers/net/ethernet/intel/ice/Makefile
index 1999cd09239e..e5d6f684437e 100644
--- a/drivers/net/ethernet/intel/ice/Makefile
+++ b/drivers/net/ethernet/intel/ice/Makefile
@@ -16,4 +16,4 @@ ice-y := ice_main.o   \
 ice_lib.o  \
 ice_txrx.o \
 ice_ethtool.o
-ice-$(CONFIG_PCI_IOV) += ice_virtchnl_pf.o
+ice-$(CONFIG_PCI_IOV) += ice_virtchnl_pf.o ice_sriov.o
diff --git a/drivers/net/ethernet/intel/ice/ice.h 
b/drivers/net/ethernet/intel/ice/ice.h
index 89ec05e9983b..a9572f8ef6bf 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -37,6 +37,7 @@
 #include "ice_common.h"
 #include "ice_sched.h"
 #include "ice_virtchnl_pf.h"
+#include "ice_sriov.h"
 
 extern const char ice_drv_ver[];
 #define ICE_BAR0   0
@@ -155,6 +156,7 @@ enum ice_state {
__ICE_ADMINQ_EVENT_PENDING,
__ICE_MAILBOXQ_EVENT_PENDING,
__ICE_MDD_EVENT_PENDING,
+   __ICE_VFLR_EVENT_PENDING,
__ICE_FLTR_OVERFLOW_PROMISC,
__ICE_VF_DIS,
__ICE_CFG_BUSY,
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h 
b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 7d793cc96a18..2c8f590316e9 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1077,6 +1077,19 @@ struct ice_aqc_nvm {
__le32 addr_low;
 };
 
+/**
+ * Send to PF command (indirect 0x0801) id is only used by PF
+ *
+ * Send to VF command (indirect 0x0802) id is only used by PF
+ *
+ */
+struct ice_aqc_pf_vf_msg {
+   __le32 id;
+   u32 reserved;
+   __le32 addr_high;
+   __le32 addr_low;
+};
+
 /* Get/Set RSS key (indirect 0x0B04/0x0B02) */
 struct ice_aqc_get_set_rss_key {
 #define ICE_AQC_GSET_RSS_KEY_VSI_VALID BIT(15)
@@ -1334,6 +1347,7 @@ struct ice_aq_desc {
struct ice_aqc_query_txsched_res query_sched_res;
struct ice_aqc_add_move_delete_elem add_move_delete_elem;
struct ice_aqc_nvm nvm;
+   struct ice_aqc_pf_vf_msg virt;
struct ice_aqc_get_set_rss_lut get_set_rss_lut;
struct ice_aqc_get_set_rss_key get_set_rss_key;
struct ice_aqc_add_txqs add_txqs;
@@ -1431,6 +1445,9 @@ enum ice_adminq_opc {
/* NVM commands */
ice_aqc_opc_nvm_read= 0x0701,
 
+   /* PF/VF mailbox commands */
+   ice_mbx_opc_send_msg_to_vf  = 0x0802,
+
/* RSS commands */
ice_aqc_opc_set_rss_key = 0x0B02,
ice_aqc_opc_set_rss_lut = 0x0B03,
diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h 
b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 12d4c862bf05..5a4fa22d0a83 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -136,6 +136,7 @@
 #define PFINT_OICR_PCI_EXCEPTION_M BIT(21)
 #define PFINT_OICR_HMC_ERR_M   BIT(26)
 #define PFINT_OICR_PE_CRITERR_MBIT(28)
+#define PFINT_OICR_VFLR_M  BIT(29)
 #define PFINT_OICR_CTL 0x0016CA80
 #define PFINT_OICR_CTL_MSIX_INDX_M ICE_M(0x7FF, 0)
 #define PFINT_OICR_CTL_ITR_INDX_S  11
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
b/drivers/net/ethernet/intel/ice/ice_main.c
index 5b8c950d219a..f1a116c9b527 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -342,6 +342,10 @@ ice_prepare_for_reset(struct ice_pf *pf)
 

[net-next 01/15] virtchnl: Added support to exchange additional speed values

2018-10-03 Thread Jeff Kirsher
From: Yashaswini Raghuram Prathivadi Bhayankaram 


Introduced a new virtchnl capability flag and a struct to support exchange
of additional supported speeds.

Signed-off-by: Yashaswini Raghuram Prathivadi Bhayankaram 

Signed-off-by: Anirudh Venkataramanan 
Signed-off-by: Jeff Kirsher 
---
 include/linux/avf/virtchnl.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/avf/virtchnl.h b/include/linux/avf/virtchnl.h
index b41f7bc958ef..2c9756bd9c4c 100644
--- a/include/linux/avf/virtchnl.h
+++ b/include/linux/avf/virtchnl.h
@@ -252,6 +252,8 @@ VIRTCHNL_CHECK_STRUCT_LEN(16, virtchnl_vsi_resource);
 #define VIRTCHNL_VF_OFFLOAD_RX_ENCAP_CSUM  0X0040
 #define VIRTCHNL_VF_OFFLOAD_ADQ0X0080
 
+/* Define below the capability flags that are not offloads */
+#define VIRTCHNL_VF_CAP_ADV_LINK_SPEED 0x0080
 #define VF_BASE_MODE_OFFLOADS (VIRTCHNL_VF_OFFLOAD_L2 | \
   VIRTCHNL_VF_OFFLOAD_VLAN | \
   VIRTCHNL_VF_OFFLOAD_RSS_PF)
@@ -596,10 +598,23 @@ enum virtchnl_event_codes {
 struct virtchnl_pf_event {
enum virtchnl_event_codes event;
union {
+   /* If the PF driver does not support the new speed reporting
+* capabilities then use link_event else use link_event_adv to
+* get the speed and link information. The ability to understand
+* new speeds is indicated by setting the capability flag
+* VIRTCHNL_VF_CAP_ADV_LINK_SPEED in vf_cap_flags parameter
+* in virtchnl_vf_resource struct and can be used to determine
+* which link event struct to use below.
+*/
struct {
enum virtchnl_link_speed link_speed;
bool link_status;
} link_event;
+   struct {
+   /* link_speed provided in Mbps */
+   u32 link_speed;
+   u8 link_status;
+   } link_event_adv;
} event_data;
 
int severity;
-- 
2.17.1



Re: [RFC PATCH v2 bpf-next 0/2] verifier liveness simplification

2018-10-03 Thread Jiong Wang

On 28/09/2018 14:36, Edward Cree wrote:
> On 26/09/18 23:16, Jiong Wang wrote:
>> On 22/08/2018 20:00, Edward Cree wrote:
>>> In the future this idea may be extended to form use-def chains.
>>
>>   1. instruction level use->def chain
>>
>>  - new use->def chains for each instruction. one eBPF insn could 
have two

>>    uses at maximum.
> I was thinking of something a lot weaker/simpler, just making
> ld rX, rY
>  copy rY.parent into rX.parent and not read-mark rY (whereas actual
>  arithmetic, pointer deref etc. would still create read marks).

Thanks for the feedback Edward.

> But what you've described sounds interesting; perhaps it would also
>  help later with loop-variable handling?

I haven't considered how to use this for loop-variable handling. I guess you
mean applying what I have described to your previous loop detection RFC? I
will look into your RFC later.

At the moment the design of the use->def chain is mainly to optimize 32-bit
code-gen. I was about to be satisfied with a local implementation and to share
it on the ML for further discussion. However, when manually checking the
optimization result on a medium-sized testcase (~1000 eBPF insns) with proper
complexity (making sure path pruning etc. is triggered inside the verifier), I
found the code-gen doesn't meet my expectation.

For example, for the following sequence, the insn at 25 should operate on the
full 64 bits, but I found it is marked as 32-bit safe.

  25:    r7 = 1
  26:    if r4 > r8 goto +1200 
  27:    r1 = *(u8 *)(r1 + 0)
  28:    r1 &= 15
  29:    r7 = 1
  ...

L:
  1227:  r0 = r7
  1228:  exit

As described in my previous email, the algorithm assumes all insns are 32-bit
safe first, then starts to turn insns back to "64-bit" if any 64-bit use is
found for an insn.

Insn 25 defines r7, which is used at 1227, where its value is propagated to
r0, and then r0 is implicitly used at insn 1228, as it is an exit from the
main function to external.

For the above example, as we don't know the external use of r0 at 1228 (exit
from main to external), r0 is treated as a 64-bit implicit use. Its definition
is at 1227, so insn 1227 is marked as "64-bit". The "64-bit" attribute should
propagate to the source register operand through register moves and
arithmetic, so r7 at insn 1227 is a "64-bit" use and should make its defining
instruction, insn 25, marked as "64-bit". This is my thinking of how insn 25
should be marked.

Now this hasn't happened. I am still debugging the root cause, but I kind of
feel the "64-bit" attribute propagation is the issue: it seems to me it can't
be nicely integrated into the existing register read/write propagation
infrastructure. For example, take a slightly more complex sequence which is
composed of three states:


State A
  ...
  10: r6 = *(u32 *)(r10 - 4)
  11: r7 = *(u32 *)(r10 - 8)
  12: *(u64 *)(r10 - 16) = r6
  13: *(u64 *)(r10 - 24) = r7

State B
  14: r6 += 1
  15: r7 += r6
  16: *(u32 *)(r10 - 28) = r7

State C
  ...
  17: r3 += r7
  18: r4 = 1
  19: *(u64 *)(r10 - 32) = r3
  20: *(u64 *)(r10 - 40) = r4

State A is parent of state B which is parent of state C.

Inside state C, at insn 20, r4 is a 64-bit read/use, so its definition at 18
is marked as "64-bit". There is no register source at 18, so the "64-bit"
attribute propagation stops.

Then at insn 19, r3 is a 64-bit read/use, so its definition at 17 is marked as
"64-bit". Insn 17 has two register sources, r3 and r7; they become "64-bit"
uses now, and their definitions should be marked as "64-bit".

Now if the definition of r3 or r7 comes from a parent state, then the parent
state should receive a "REG_LIVE_READ64". This is necessary if another path
later reaches state C and triggers path pruning, in which case that path
should know there is a "64-bit" use inside state C on some registers, and
should use this information to mark "64-bit" insns.

If the definition of r3 or r7 is still inside state C, we need to keep walking
up the instruction sequence, and propagate the "64-bit" attribute upward until
it goes beyond state C.
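
A minimal sketch of the in-state backward walk described above, assuming a
hypothetical and much simplified instruction model. The names struct insn,
propagate64 and need64 are made up for illustration and are not verifier code;
the sketch also ignores branches and treats every defining insn as passing the
"64-bit" attribute on to all of its register sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_REG  (-1)
#define MAX_REG 11	/* r0..r10 */

/* Hypothetical, simplified instruction model; not the verifier's own. */
struct insn {
	int dst;	/* register defined by the insn, or NO_REG */
	int src[2];	/* registers read by the insn, NO_REG if unused */
	bool reads64;	/* insn itself consumes all 64 bits (u64 store, exit) */
	bool is64;	/* output: insn must operate on the full 64 bits */
};

/*
 * Walk the insns of one state backward.  On entry need64[r] is true if code
 * after the sequence reads r as 64-bit.  On return need64[r] is true if a
 * 64-bit use of r is not satisfied by a definition inside the sequence, i.e.
 * the attribute has to travel further up (to the parent state).
 */
void propagate64(struct insn *insns, size_t n, bool need64[MAX_REG])
{
	for (size_t i = n; i-- > 0;) {
		struct insn *in = &insns[i];

		in->is64 = in->reads64;
		if (in->dst != NO_REG) {
			assert(in->dst >= 0 && in->dst < MAX_REG);
			if (need64[in->dst])
				in->is64 = true;
			/* this definition satisfies the later uses */
			need64[in->dst] = false;
		}
		if (!in->is64)
			continue;
		/* a 64-bit insn makes all of its register sources 64-bit uses */
		for (int k = 0; k < 2; k++) {
			if (in->src[k] == NO_REG)
				continue;
			assert(in->src[k] >= 0 && in->src[k] < MAX_REG);
			need64[in->src[k]] = true;
		}
	}
}
```

Fed with insns 17 to 20 of state C, this marks insns 17 and 18 as "64-bit" and
leaves need64 set for r3 and r7, which are the registers for which the parent
state would have to receive the "REG_LIVE_READ64" mark.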

The above propagation logic is quite different from the existing register
read/write propagation. For the latter, a write just screens off all following
reads, and a read propagates directly to its parent if there is no previous
write; no instruction analysis is required.

I am just describing what I have run into and am trying to resolve; any
thoughts and suggestions are appreciated.

Regards,
Jiong


[PATCH iproute2-next] tc: flower: expose hardware offload count

2018-10-03 Thread Vlad Buslov
Recently flower classifier was updated to expose count of devices that
filter is offloaded to. Add support to print this counter as 'in_hw_count'.

Signed-off-by: Vlad Buslov 
Acked-by: Jiri Pirko 
---
 tc/f_flower.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 59e5f572c542..cbacc664d397 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -1585,8 +1585,16 @@ static int flower_print_opt(struct filter_util *qu, FILE 
*f,
if (flags & TCA_CLS_FLAGS_SKIP_SW)
print_bool(PRINT_ANY, "skip_sw", "\n  skip_sw", true);
 
-   if (flags & TCA_CLS_FLAGS_IN_HW)
+   if (flags & TCA_CLS_FLAGS_IN_HW) {
print_bool(PRINT_ANY, "in_hw", "\n  in_hw", true);
+
+   if (tb[TCA_FLOWER_IN_HW_COUNT]) {
+   __u32 count = 
rta_getattr_u32(tb[TCA_FLOWER_IN_HW_COUNT]);
+
+   print_uint(PRINT_ANY, "in_hw_count",
+  " in_hw_count %d", count);
+   }
+   }
else if (flags & TCA_CLS_FLAGS_NOT_IN_HW)
print_bool(PRINT_ANY, "not_in_hw", "\n  not_in_hw", 
true);
}
-- 
2.7.5



Re: r8169 tx batching(?) causing performance problems

2018-10-03 Thread David Howells
David Howells  wrote:

> Can someone help me figure out a performance issue that seems to be caused by
> an RTL8168g/8111g NIC that seems to be batching up transmissions - or, at
> least, not starting immediately that it's given something to transmit?

I've been told that:

commit ad5f97faff4231e72b96bd96adbe1b6e977a9b86
Author: Heiner Kallweit 
Date:   Fri Sep 28 23:51:54 2018 +0200
r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO

might fix the problem, however it doesn't seem to help entirely.  Whilst it
does now seem to be transmitting whilst I'm loading up the queue, it still
seems that I'm able to load up the queue faster than the packets are being
cleared from the queue.

So in the following excerpt:

id-0[001] d.h241.284702: net_rtl8169_interrupt: enp3s0 st=85
id-0[001] ..s241.284715: net_rtl8169_poll: enp3s0 st=85
dd-3186 [003] ...341.284741: net_rtl8169_tx: enp3s0 p=213-216 skb=2e2b0c3e
dd-3186 [003] ...341.284777: net_rtl8169_tx: enp3s0 p=213-217 skb=3950deca
dd-3186 [003] ...341.284790: net_rtl8169_tx: enp3s0 p=213-218 skb=471b2bc2
dd-3186 [003] ...341.284826: net_rtl8169_tx: enp3s0 p=213-219 skb=7c25ae16
dd-3186 [003] ...341.284839: net_rtl8169_tx: enp3s0 p=213-220 skb=cfbf719f
dd-3186 [003] ...341.284870: net_rtl8169_tx: enp3s0 p=213-221 skb=d34a1f67
dd-3186 [003] ...341.284883: net_rtl8169_tx: enp3s0 p=213-222 skb=466e20e8
dd-3186 [003] ...341.284914: net_rtl8169_tx: enp3s0 p=213-223 skb=3d36cb1c
dd-3186 [003] ...341.284927: net_rtl8169_tx: enp3s0 p=213-224 skb=399c06ea
id-0[001] ..s241.284938: net_rtl8169_tx_done: enp3s0 p=213 skb=c797fea6
id-0[001] ..s241.284939: net_rtl8169_tx_done: enp3s0 p=214 skb=c0e4d6f0
id-0[001] ..s241.284940: net_rtl8169_tx_done: enp3s0 p=215 skb=2e2b0c3e
id-0[001] ..s241.284941: net_rtl8169_tx_done: enp3s0 p=216 skb=3950deca
id-0[001] ..s241.284941: net_rtl8169_tx_done: enp3s0 p=217 skb=471b2bc2
id-0[001] ..s241.284942: net_rtl8169_tx_done: enp3s0 p=218 skb=7c25ae16
id-0[001] ..s241.284943: net_rtl8169_tx_done: enp3s0 p=219 skb=cfbf719f
id-0[001] ..s241.284944: net_rtl8169_tx_done: enp3s0 p=220 skb=d34a1f67
id-0[001] ..s241.284945: net_rtl8169_tx_done: enp3s0 p=221 skb=466e20e8
id-0[001] ..s241.284946: net_rtl8169_tx_done: enp3s0 p=222 skb=3d36cb1c
id-0[001] ..s241.284947: net_rtl8169_tx_done: enp3s0 p=223 skb=399c06ea
id-0[001] d.h241.284954: net_rtl8169_interrupt: enp3s0 st=85

packets are being queued something like 11-13uS apart, but there seems like a
big gap of about 200uS in the idle thread between the poll and the first
tx_done that might be masking things.
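
A small sketch for checking the gaps in the excerpt above, assuming the
timestamps read as 41.284741 and so on (seconds, with the digit in front
belonging to the flags column), so that only the microsecond part needs to be
compared. The names gap_stats and gap_stats_us are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest and largest gap between consecutive timestamps, in microseconds. */
struct gap_stats {
	uint32_t min_us;
	uint32_t max_us;
};

/* ts_us must hold at least two timestamps in ascending order. */
struct gap_stats gap_stats_us(const uint32_t *ts_us, size_t n)
{
	struct gap_stats s = { UINT32_MAX, 0 };

	assert(n >= 2);
	for (size_t i = 1; i < n; i++) {
		uint32_t d = ts_us[i] - ts_us[i - 1];

		if (d < s.min_us)
			s.min_us = d;
		if (d > s.max_us)
			s.max_us = d;
	}
	return s;
}
```

Under that reading, the nine net_rtl8169_tx timestamps (284741 ... 284927) give
gaps between 13uS and 36uS, and the distance from the poll (284715) to the
first tx_done (284938) is 223uS.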

David


Re: [PATCH RFC v2 net-next 00/25] rtnetlink: Add support for rigid checking of data in dump request

2018-10-03 Thread David Ahern
On 10/3/18 8:59 AM, Stephen Hemminger wrote:
> On Mon,  1 Oct 2018 17:28:26 -0700
> David Ahern  wrote:
> 
>> How to resolve the problem of not breaking old userspace yet be able to
>> move forward with new features such as kernel side filtering which are
>> crucial for efficient operation at high scale?
> 
> What about forward compatibility? How would this work when running new 
> iproute2
> command on older kernels?
> 
> I expect the new command would set the "I am smart flag" and the older
> kernel would ignore it. The if the header for the message type had
> changed, the dump would be broken.
> 

The kernel today happily ignores garbage in the request it does not
understand. If the new iproute2 sends a dump request with attributes or
fields in the header set, the kernel ignores them.

With the setsockopt option for setting the flag, userspace knows when the
kernel does not support attribute checking and kernel side filtering.

As far as changing the header (new iproute2 on old kernel), there are 3
dumps that look at the header beyond the family:
1. link dumps - but it has the expected ifinfomsg header
2. neighbor dumps (expects the right ndmsg header)
3. fdb dumps - wrongly expect ifinfomsg header but there is a patch to
   detect when the ndmsg header is sent (ip neigh vs bridge fdb)

The 4th dump that looks at the header is addresses. Those patches were
added in this development cycle. Those dumps need to be wrapped in the
'userspace has a clue' setting or reverted until this is figured out.


Re: [PATCH RFC v2 net-next 00/25] rtnetlink: Add support for rigid checking of data in dump request

2018-10-03 Thread Stephen Hemminger
On Mon,  1 Oct 2018 17:28:26 -0700
David Ahern  wrote:

> How to resolve the problem of not breaking old userspace yet be able to
> move forward with new features such as kernel side filtering which are
> crucial for efficient operation at high scale?

What about forward compatibility? How would this work when running new iproute2
command on older kernels?

I expect the new command would set the "I am smart flag" and the older
kernel would ignore it. Then if the header for the message type had
changed, the dump would be broken.



Re: [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled

2018-10-03 Thread Jesper Dangaard Brouer
On Tue, 25 Sep 2018 22:36:39 -0700
Song Liu  wrote:

> On Tue, Sep 25, 2018 at 7:26 AM Jesper Dangaard Brouer
>  wrote:
> >
> > XDP can modify (and resize) the Ethernet header in the packet.
> >
> > There is a bug in generic-XDP, because skb->protocol and skb->pkt_type
> > are setup before reaching (netif_receive_)generic_xdp.
> >
> > This bug was hit when XDP were popping VLAN headers (changing
> > eth->h_proto), as skb->protocol still contains VLAN-indication
> > (ETH_P_8021Q) causing invocation of skb_vlan_untag(skb), which corrupt
> > the packet (basically popping the VLAN again).
> >
> > This patch catch if XDP changed eth header in such a way, that SKB
> > fields needs to be updated.
> >
> > Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
> > Signed-off-by: Jesper Dangaard Brouer 
> > ---
> >  net/core/dev.c |   14 ++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index ca78dc5a79a3..db6d89f536cb 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4258,6 +4258,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff 
> > *skb,
> > struct netdev_rx_queue *rxqueue;
> > void *orig_data, *orig_data_end;
> > u32 metalen, act = XDP_DROP;
> > +   __be16 orig_eth_type;
> > +   struct ethhdr *eth;
> > +   bool orig_bcast;
> > int hlen, off;
> > u32 mac_len;
> >
> > @@ -4298,6 +4301,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff 
> > *skb,
> > xdp->data_hard_start = skb->data - skb_headroom(skb);
> > orig_data_end = xdp->data_end;
> > orig_data = xdp->data;
> > +   eth = (struct ethhdr *)xdp->data;
> > +   orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
> > +   orig_eth_type = eth->h_proto;
> >
> > rxqueue = netif_get_rxqueue(skb);
> > xdp->rxq = &rxqueue->xdp_rxq;
> > @@ -4321,6 +4327,14 @@ static u32 netif_receive_generic_xdp(struct sk_buff 
> > *skb,
> >
> > }
> >
> > +   /* check if XDP changed eth hdr such SKB needs update */
> > +   eth = (struct ethhdr *)xdp->data;
> > +   if ((orig_eth_type != eth->h_proto) ||
> > +   (orig_bcast != is_multicast_ether_addr_64bits(eth->h_dest))) {  
> 
> Is the actions below always correct for the condition above? Do we need
> to confirm the SKB is updated properly?

I cannot find the issue that you are hinting to?

If the BPF prog used bpf_xdp_adjust_head(), which the included selftest
program does, then skb->data have been appropriately adjusted just
above (with __skb_pull(skb, off) or __skb_push(skb, -off)), which makes
the call to skb_reset_mac_header(skb) inside eth_type_trans() correct.

I've double checked the code, and I cannot find anything wrong...
please let me know if I missed something!?


> > +   __skb_push(skb, mac_len);
> > +   skb->protocol = eth_type_trans(skb, skb->dev);

We could change mac_len to be ETH_HLEN, because inside eth_type_trans()
the constant ETH_HLEN is used, that way we are 100% sure the
skb_push/skb_pull are "paired".  Will that be better for you?
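
A toy userspace model of the pairing concern, assuming only what is stated
above: the caller pushes mac_len bytes and eth_type_trans() then pulls the
constant ETH_HLEN. struct toy_skb and the toy_* helpers are made up for
illustration and are not the kernel's sk_buff API.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14	/* Ethernet header length, as in the kernel */

/* Toy stand-in for an skb: only tracks where the data starts. */
struct toy_skb {
	size_t data_off;	/* offset of the data start in the buffer */
};

void toy_push(struct toy_skb *skb, size_t len)
{
	assert(skb->data_off >= len);
	skb->data_off -= len;	/* expose len more bytes in front */
}

void toy_pull(struct toy_skb *skb, size_t len)
{
	skb->data_off += len;	/* hide len bytes at the front */
}

/*
 * Drift of the data start after push(mac_len) followed by the ETH_HLEN pull
 * done inside eth_type_trans().  Zero means push and pull are paired.
 */
long push_then_type_trans_drift(size_t start_off, size_t mac_len)
{
	struct toy_skb skb = { start_off };

	toy_push(&skb, mac_len);
	toy_pull(&skb, ETH_HLEN);
	return (long)skb.data_off - (long)start_off;
}
```

With mac_len equal to ETH_HLEN the drift is zero; any other mac_len (18 is
used here purely as a hypothetical value) leaves the data start off by the
difference, which is what pushing ETH_HLEN directly would rule out.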


> > +   }
> > +
> > switch (act) {
> > case XDP_REDIRECT:
> > case XDP_TX:
> >  

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


[PATCH] net: sample/bpf/tracex3_user.c: erase "ARRAY_SIZE" redefined

2018-10-03 Thread Bo YU

There is a warning when compiling the bpf programs in samples/bpf.

BTW, I got the warning from David's net tree. I then cloned the bpf tree
and tried to compile the bpf programs, but it failed, maybe because I had
not compiled the whole kernel once. I don't know whether this is OK or not.

Signed-off-by: Bo YU 
---
samples/bpf/tracex3_user.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/samples/bpf/tracex3_user.c b/samples/bpf/tracex3_user.c
index 6c6b10f4c3ee..3d8c39b8ef24 100644
--- a/samples/bpf/tracex3_user.c
+++ b/samples/bpf/tracex3_user.c
@@ -17,7 +17,6 @@
#include "bpf_load.h"
#include "bpf_util.h"

-#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))

#define SLOTS 100

--
2.11.0



[PATCH net] be2net: don't flip hw_features when VXLANs are added/deleted

2018-10-03 Thread Davide Caratti
the be2net implementation of .ndo_tunnel_{add,del}() changes the value of
NETIF_F_GSO_UDP_TUNNEL bit in 'features' and 'hw_features', but it forgets
to call netdev_features_change(). Moreover, ethtool setting for that bit
can potentially be reverted after a tunnel is added or removed.

GSO already does software segmentation when 'hw_enc_features' is 0, even
if VXLAN offload is turned on. In addition, commit 096de2f83ebc ("benet:
stricter vxlan offloading check in be_features_check") avoids hardware
segmentation of non-VXLAN tunneled packets, or VXLAN packets having wrong
destination port. So, it's safe to avoid flipping the above feature on
addition/deletion of VXLAN tunnels.

Fixes: 630f4b70567f ("be2net: Export tunnel offloads only when a VxLAN tunnel 
is created")
Signed-off-by: Davide Caratti 
---
 drivers/net/ethernet/emulex/benet/be_main.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 74d122616e76..534787291b44 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4002,8 +4002,6 @@ static int be_enable_vxlan_offloads(struct be_adapter 
*adapter)
netdev->hw_enc_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
   NETIF_F_TSO | NETIF_F_TSO6 |
   NETIF_F_GSO_UDP_TUNNEL;
-   netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
-   netdev->features |= NETIF_F_GSO_UDP_TUNNEL;
 
dev_info(dev, "Enabled VxLAN offloads for UDP port %d\n",
 be16_to_cpu(port));
@@ -4025,8 +4023,6 @@ static void be_disable_vxlan_offloads(struct be_adapter 
*adapter)
adapter->vxlan_port = 0;
 
netdev->hw_enc_features = 0;
-   netdev->hw_features &= ~(NETIF_F_GSO_UDP_TUNNEL);
-   netdev->features &= ~(NETIF_F_GSO_UDP_TUNNEL);
 }
 
 static void be_calculate_vf_res(struct be_adapter *adapter, u16 num_vfs,
@@ -5320,6 +5316,7 @@ static void be_netdev_init(struct net_device *netdev)
struct be_adapter *adapter = netdev_priv(netdev);
 
netdev->hw_features |= NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO6 |
+   NETIF_F_GSO_UDP_TUNNEL |
NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_RXCSUM |
NETIF_F_HW_VLAN_CTAG_TX;
if ((be_if_cap_flags(adapter) & BE_IF_FLAGS_RSS))
-- 
2.17.1



PROBLEM: NETDEV WATCHDOG r8169 transmit queue time out in commit 4fd48c4ac0a0578862819295222a825c97686ac7 onwards.

2018-10-03 Thread Iain Price

[1.] One line summary of the problem:

NETDEV WATCHDOG r8169 transmit queue time out in commit 
4fd48c4ac0a0578862819295222a825c97686ac7 onwards.



[2.] Full description of the problem/report:

The exact cause is uncertain. With standard usage, uptime before the issue is
2-12 hours; however, I can accelerate the fault with high speed transmits from
that machine (rsyncing 600GB of data from that server causes the fault
quickly).
The issue is that the machine stops responding to all network traffic. Console
login reveals a NETDEV WATCHDOG timeout on the ethernet device.
A git bisect analysis blamed commit 4fd48c4ac0a0578862819295222a825c97686ac7,
and I can confirm this commit will trigger this condition.
I am currently using the previous commit
(82d3ff6dd1994d54fe11714ffd4c696cfcacbea1) and have completed the 600GB rsync
with this version without any issue.
The machines worked fine for some months; I noticed the issue early in the
4.17 release and held out for a while before starting a slow bisect (it took
2 days to confirm a good version, usually <12 hours for a bad one).


[3.] Keywords (i.e., modules, networking, kernel):

Kernel, Networking, Drivers, Realtek, RTL8169, r8169

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):

(/proc/version lies, this was built from git) - Linux version 
4.17.0-rc2+ (root@mercury) (gcc version 7.3.0 (Gentoo 7.3.0-r3 p1.4)) 
#26 SMP Tue Oct 2 12:29:13 BST 2018


Good commit version: git checkout 82d3ff6dd1994d54fe11714ffd4c696cfcacbea1
Bad commit version: git checkout 
4fd48c4ac0a0578862819295222a825c97686ac7 (onwards)


[4.2.] Kernel .config file:

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86 4.17.0-rc2 Kernel Configuration
#
CONFIG_64BIT=y
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_MMU=y
CONFIG_ARCH_MMAP_RND_BITS_MIN=28
CONFIG_ARCH_MMAP_RND_BITS_MAX=32
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MIN=8
CONFIG_ARCH_MMAP_RND_COMPAT_BITS_MAX=16
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=4
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y
CONFIG_THREAD_INFO_IN_TASK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_MIGRATION=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR=y
CONFIG_GENERIC_IRQ_RESERVATION_MODE=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
# CONFIG_GENERIC_IRQ_DEBUGFS is not set
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
CONFIG_CPU_ISOLATION=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT i

[PATCH net-next] cxgb4: remove the unneeded locks

2018-10-03 Thread Ganesh Goudar
cxgb_set_tx_maxrate will be called holding rtnl lock,
hence remove all unneeded locks.

Signed-off-by: Ganesh Goudar 
---
 drivers/net/ethernet/chelsio/cxgb4/sched.c | 68 +++---
 drivers/net/ethernet/chelsio/cxgb4/sched.h |  2 -
 2 files changed, 15 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sched.c 
b/drivers/net/ethernet/chelsio/cxgb4/sched.c
index 7fc6566..52edb68 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sched.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sched.c
@@ -38,7 +38,6 @@
 #include "cxgb4.h"
 #include "sched.h"
 
-/* Spinlock must be held by caller */
 static int t4_sched_class_fw_cmd(struct port_info *pi,
 struct ch_sched_params *p,
 enum sched_fw_ops op)
@@ -67,7 +66,6 @@ static int t4_sched_class_fw_cmd(struct port_info *pi,
return err;
 }
 
-/* Spinlock must be held by caller */
 static int t4_sched_bind_unbind_op(struct port_info *pi, void *arg,
   enum sched_bind_type type, bool bind)
 {
@@ -163,7 +161,6 @@ static int t4_sched_queue_unbind(struct port_info *pi, struct ch_sched_queue *p)
if (e && index >= 0) {
int i = 0;
 
-   spin_lock(&e->lock);
list_for_each_entry(qe, &e->queue_list, list) {
if (i == index)
break;
@@ -171,10 +168,8 @@ static int t4_sched_queue_unbind(struct port_info *pi, struct ch_sched_queue *p)
}
err = t4_sched_bind_unbind_op(pi, (void *)qe, SCHED_QUEUE,
  false);
-   if (err) {
-   spin_unlock(&e->lock);
-   goto out;
-   }
+   if (err)
+   return err;
 
list_del(&qe->list);
kvfree(qe);
@@ -182,9 +177,7 @@ static int t4_sched_queue_unbind(struct port_info *pi, struct ch_sched_queue *p)
e->state = SCHED_STATE_UNUSED;
memset(&e->info, 0, sizeof(e->info));
}
-   spin_unlock(&e->lock);
}
-out:
return err;
 }
 
@@ -210,10 +203,8 @@ static int t4_sched_queue_bind(struct port_info *pi, struct ch_sched_queue *p)
 
/* Unbind queue from any existing class */
err = t4_sched_queue_unbind(pi, p);
-   if (err) {
-   kvfree(qe);
-   goto out;
-   }
+   if (err)
+   goto out_err;
 
/* Bind queue to specified class */
memset(qe, 0, sizeof(*qe));
@@ -221,18 +212,16 @@ static int t4_sched_queue_bind(struct port_info *pi, struct ch_sched_queue *p)
memcpy(&qe->param, p, sizeof(qe->param));
 
e = &s->tab[qe->param.class];
-   spin_lock(&e->lock);
err = t4_sched_bind_unbind_op(pi, (void *)qe, SCHED_QUEUE, true);
-   if (err) {
-   kvfree(qe);
-   spin_unlock(&e->lock);
-   goto out;
-   }
+   if (err)
+   goto out_err;
 
list_add_tail(&qe->list, &e->queue_list);
atomic_inc(&e->refcnt);
-   spin_unlock(&e->lock);
-out:
+   return err;
+
+out_err:
+   kvfree(qe);
return err;
 }
 
@@ -296,8 +285,6 @@ int cxgb4_sched_class_bind(struct net_device *dev, void *arg,
   enum sched_bind_type type)
 {
struct port_info *pi = netdev2pinfo(dev);
-   struct sched_table *s;
-   int err = 0;
u8 class_id;
 
if (!can_sched(dev))
@@ -323,12 +310,8 @@ int cxgb4_sched_class_bind(struct net_device *dev, void *arg,
if (class_id == SCHED_CLS_NONE)
return -ENOTSUPP;
 
-   s = pi->sched_tbl;
-   write_lock(&s->rw_lock);
-   err = t4_sched_class_bind_unbind_op(pi, arg, type, true);
-   write_unlock(&s->rw_lock);
+   return t4_sched_class_bind_unbind_op(pi, arg, type, true);
 
-   return err;
 }
 
 /**
@@ -343,8 +326,6 @@ int cxgb4_sched_class_unbind(struct net_device *dev, void *arg,
 enum sched_bind_type type)
 {
struct port_info *pi = netdev2pinfo(dev);
-   struct sched_table *s;
-   int err = 0;
u8 class_id;
 
if (!can_sched(dev))
@@ -367,12 +348,7 @@ int cxgb4_sched_class_unbind(struct net_device *dev, void *arg,
if (!valid_class_id(dev, class_id))
return -EINVAL;
 
-   s = pi->sched_tbl;
-   write_lock(&s->rw_lock);
-   err = t4_sched_class_bind_unbind_op(pi, arg, type, false);
-   write_unlock(&s->rw_lock);
-
-   return err;
+   return t4_sched_class_bind_unbind_op(pi, arg, type, false);
 }
 
 /* If @p is NULL, fetch any available unused class */
@@ -425,7 +401,6 @@ static struct sched_class *t4_sched_class_lookup(struct port_info *pi,
 static struct sched_class *t4_sched_class_alloc(struct port_info *pi,
 

Re: __dev_kfree_skb_any() and use of dev_kfree_skb()

2018-10-03 Thread Neil Horman
On Tue, Oct 02, 2018 at 03:20:48PM -0700, Florian Fainelli wrote:
> On 10/02/2018 03:05 PM, Eric Dumazet wrote:
> > On Tue, Oct 2, 2018 at 2:54 PM Florian Fainelli wrote:
> >>
> >> On 10/02/2018 02:17 PM, Eric Dumazet wrote:
> >>> On Tue, Oct 2, 2018 at 1:07 PM Florian Fainelli wrote:
> >>>>
> >>>> Hi Eric, Neil,
> >>>>
> >>>> Should not __dev_kfree_skb_any() call kfree_skb() instead of
> >>>> dev_kfree_skb() which is aliased to consume_skb() and therefore does
> >>>> not flag the skb with SKB_REASON_DROPPED?
> >>>>
> >>>> If we take the in_irq() || irqs_disabled() branch, we will be calling
> >>>> __dev_kfree_skb_irq() which takes care of setting the skb_free_reason
> >>>> from the caller.
> >>>>
> >>>> Is there an implied semantic with dev_kfree_skb() that it means it was
> >>>> freed by the network device and therefore this equals to a consumption
> >>>> (not a drop)? The comment above dev_kfree_skb_any() seems to imply this
> >>>> should be a context unaware replacement for kfree_skb().
> >>>
> >>>
> >>> Really the problem here is that we have more than one thousand calls
> >>> to dev_kfree_skb_any()
> >>> (compared to ~ 90 calls to dev_consume_skb_any())
> >>>
> >>> So it will be a huge task cleaning all this.
> >>
> >> So you are kind of saying this is an established behavior, don't change
> >> it :)
> >>
> >> One could argue that if people were happily sprinkling
> >> dev_kfree_skb_any() in error or success paths, and all SKB freeing was
> >> accounted for as "consumed" instead of "dropped" in non-atomic context,
> >> this may not be such a big deal to reverse that and make it "dropped" in
> >> all contexts?
> >>
> > 
> > Most of these calls happening on typical hosts are from TX completion path,
> > so they really are consumed, not dropped.
> > 
> > So if you intend to pretend they are drops, this will not please
> > people using drop monitor.
> 
> I am not intending to pretend they are drops, just trying to make their
> behavior consistent depending on the calling context, hence my question
> whether this was intentional or not because __dev_kfree_skb_irq() will
> flag them as dropped correctly. Right now this is not consistent with
> either the function name or its comment in include/linux/netdevice.h.
> 
As Eric noted, dev_kfree_skb_any was added before the drop monitor code, and so
I don't think much commentary can be made in the way of 'intent'.  The way to
reconcile all these code points is to look at each one in turn, determine from
the location and the purpose of the call if it is really a drop, or simply the
end of the useful life of the skb (a consume), and either change the call to the
appropriate one, or decide that what's there is correct.  I've done this in the
past in several locations, and honestly, it's just tedium.  My suggestion is
that, if you are using drop monitor and come across a false positive (or false
negative), you submit a patch for that location to reconcile it, and slowly they
will all get corrected.

Neil

> > 
> > Really the only way would be to review all call sites and perform a
> > cleanup, then propagate the 'reason' properly
> > in the helper.
> > 
> 
> Alright, thanks!
> -- 
> Florian
> 
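
The asymmetry discussed in this thread can be modelled in a few lines of
userspace C. This is only a sketch of the behaviour as described above, not
kernel code: model_dev_kfree_skb_any() and its atomic_ctx flag are
hypothetical stand-ins for __dev_kfree_skb_any() and the
"in_irq() || irqs_disabled()" test, and the function returns the reason that
would end up recorded instead of freeing anything.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the two values of the kernel's enum skb_free_reason. */
enum skb_free_reason { SKB_REASON_CONSUMED, SKB_REASON_DROPPED };

/*
 * Hypothetical model of the dispatch described in the thread.
 * atomic_ctx stands in for "in_irq() || irqs_disabled()".
 * Returns the reason the skb would be accounted with.
 */
static enum skb_free_reason
model_dev_kfree_skb_any(bool atomic_ctx, enum skb_free_reason reason)
{
	assert(reason == SKB_REASON_CONSUMED || reason == SKB_REASON_DROPPED);

	/* Atomic context: the __dev_kfree_skb_irq() path keeps the caller's reason. */
	if (atomic_ctx)
		return reason;

	/* Otherwise: dev_kfree_skb(), i.e. consume_skb(), so never a drop. */
	return SKB_REASON_CONSUMED;
}
```

With this model, a caller passing SKB_REASON_DROPPED gets a drop recorded
only when atomic_ctx is true; in process context the same call is accounted
as a consume, which is the inconsistency Florian points out.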


r8169 tx batching(?) causing performance problems

2018-10-03 Thread David Howells
Hi,

Can someone help me figure out a performance issue that seems to be caused by
an RTL8168g/8111g NIC that seems to be batching up transmissions - or, at
least, not starting immediately that it's given something to transmit?

The setup that I'm dealing with is an AFS filesystem client (the test machine
with the troublesome NIC) and an AFS server machine connected by GigE through
a switch by 2m cables.  The network is pretty much uncontended and the server
is only serving files to the test machine.

AFS uses the RxRPC protocol which runs over UDP, one RxRPC packet per UDP
packet.  RxRPC has two packet types that are of relevance to this issue: DATA
and ACK.  Linux contains an implementation of RxRPC as the AF_RXRPC socket
family and an AFS filesystem that can be found in fs/afs/.

The symptoms I'm seeing are that whilst the client is doing multi-megabyte
direct-I/O reads over AFS, the server is occasionally stalling in its sending
of DATA packets because the ACK packets being sent by the client are appearing
in batches on the server.

I stuck some tracepoints into the r8169 driver to investigate the issue (see
attached patch).  Here's an excerpt from the trace (note that I didn't enable
the rx tracepoint):

 dd-3179 [000] .N.3 37.095116: net_rtl8169_tx: enp3s0 p=178-179 skb=ba987d36
 nd-2961 [001] d.h1 37.095250: net_rtl8169_interrupt: enp3s0 st=81
 nd-2961 [001] ..s1 37.095253: net_rtl8169_poll: enp3s0 st=81
 dd-3179 [000] ...3 37.095286: net_rtl8169_tx: enp3s0 p=178-180 skb=44d352cf
 dd-3179 [000] .N.3 37.095307: net_rtl8169_tx: enp3s0 p=178-181 skb=53ac88f1
 dd-3179 [000] .N.3 37.095315: net_rtl8169_tx: enp3s0 p=178-182 skb=4f3eedd9
 dd-3179 [000] .N.3 37.095328: net_rtl8169_tx: enp3s0 p=178-183 skb=38c81784
 dd-3179 [000] .N.3 37.095338: net_rtl8169_tx: enp3s0 p=178-184 skb=1f0a8fc3
 dd-3179 [000] .N.3 37.095345: net_rtl8169_tx: enp3s0 p=178-185 skb=281a484d
 dd-3179 [000] .N.3 37.095362: net_rtl8169_tx: enp3s0 p=178-186 skb=13d8a01a
 dd-3179 [000] .N.3 37.095370: net_rtl8169_tx: enp3s0 p=178-187 skb=68fe3d70
 dd-3179 [000] .N.3 37.095382: net_rtl8169_tx: enp3s0 p=178-188 skb=3cf64dd8
 dd-3179 [000] .N.3 37.095390: net_rtl8169_tx: enp3s0 p=178-189 skb=da35591a
 dd-3179 [000] .N.3 37.095403: net_rtl8169_tx: enp3s0 p=178-190 skb=3013974a
 dd-3179 [000] .N.3 37.095410: net_rtl8169_tx: enp3s0 p=178-191 skb=7ecddd8e
 dd-3179 [000] .N.3 37.095427: net_rtl8169_tx: enp3s0 p=178-192 skb=aa30c686
 nd-2961 [001] d.h1 37.095433: net_rtl8169_interrupt: enp3s0 st=85
 nd-2961 [001] ..s1 37.095434: net_rtl8169_poll: enp3s0 st=85
 dd-3179 [000] .N.3 37.095435: net_rtl8169_tx: enp3s0 p=178-193 skb=006b0947
 nd-2961 [001] ..s1 37.095439: net_rtl8169_tx_done: enp3s0 p=178 skb=ba987d36
 nd-2961 [001] ..s1 37.095440: net_rtl8169_tx_done: enp3s0 p=179 skb=44d352cf
 nd-2961 [001] ..s1 37.095441: net_rtl8169_tx_done: enp3s0 p=180 skb=53ac88f1
 nd-2961 [001] ..s1 37.095441: net_rtl8169_tx_done: enp3s0 p=181 skb=4f3eedd9
 nd-2961 [001] ..s1 37.095442: net_rtl8169_tx_done: enp3s0 p=182 skb=38c81784
 nd-2961 [001] ..s1 37.095443: net_rtl8169_tx_done: enp3s0 p=183 skb=1f0a8fc3
 nd-2961 [001] ..s1 37.095443: net_rtl8169_tx_done: enp3s0 p=184 skb=281a484d
 nd-2961 [001] ..s1 37.095444: net_rtl8169_tx_done: enp3s0 p=185 skb=13d8a01a
 nd-2961 [001] ..s1 37.095445: net_rtl8169_tx_done: enp3s0 p=186 skb=68fe3d70
 nd-2961 [001] ..s1 37.095445: net_rtl8169_tx_done: enp3s0 p=187 skb=3cf64dd8
 nd-2961 [001] ..s1 37.095446: net_rtl8169_tx_done: enp3s0 p=188 skb=da35591a
 nd-2961 [001] ..s1 37.095446: net_rtl8169_tx_done: enp3s0 p=189 skb=3013974a
 nd-2961 [001] ..s1 37.095447: net_rtl8169_tx_done: enp3s0 p=190 skb=7ecddd8e

As can be seen, there are a number of packets being transmitted by AF_RXRPC
from the dd command.  Each of these is an ACK packet.  ACK packets are around
850-900 bits in size on the wire according to wireshark.

The first Tx interrupt (status 0x85: TxOk is 0x04) occurs 317uS after the
packet 178 is added to an empty buffer and is deleted 6uS after that.  Packet
190 is deleted 37uS after being added.

What surprises me is that an ACK packet should take about 1uS to transmit
and the average interval between ACK packets being added to the queue is
~12uS, yet the ring is not obviously making progress in being
transmitted.
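
The ~1uS figure follows from simple arithmetic, sketched here in C. The
helper names are made up for illustration, and the calculation assumes a
1000 Mbit/s link and counts only the 850-900 bits reported by wireshark,
ignoring preamble and inter-frame gap.

```c
#include <assert.h>
#include <stdint.h>

/* Time to serialise `bits` on a link of `mbit_per_s`, in ns (rounded down). */
static uint64_t wire_time_ns(uint64_t bits, uint64_t mbit_per_s)
{
	assert(mbit_per_s > 0);
	/* 1 Mbit/s is 1 bit per microsecond, i.e. 1000 ns per bit. */
	return bits * 1000u / mbit_per_s;
}

/* How many back-to-back frames of `bits` would fit into `window_us`. */
static uint64_t frames_in_window(uint64_t window_us, uint64_t bits,
				 uint64_t mbit_per_s)
{
	uint64_t per_frame_ns = wire_time_ns(bits, mbit_per_s);

	assert(per_frame_ns > 0);
	return window_us * 1000u / per_frame_ns;
}
```

For example, wire_time_ns(900, 1000) gives 900 ns, and
frames_in_window(317, 900, 1000) gives 352: under these assumptions the
317uS between queueing packet 178 and the first Tx interrupt would have been
enough to serialise several hundred ACKs, whereas only about a dozen were
queued.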

Now, I understand that the dirty_tx and cur_tx ring indices are private to the
driver and not seen by the device, but I added an extra bit of code to count
up the number of descriptors with DescOwn still set, and it always appears to
match the number of packets in the buffer.
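
The kind of check described above can be sketched as follows. This is a
userspace model rather than the code actually added to the driver (which is
not shown in this message): struct tx_desc_model and count_nic_owned() are
hypothetical, and only the ownership bit (bit 31 of the first descriptor
word, as DescOwn is defined in r8169) is taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit position r8169 uses for "descriptor is owned by the NIC". */
#define DESC_OWN (UINT32_C(1) << 31)

/* Simplified descriptor; the real one is little-endian and DMA-mapped. */
struct tx_desc_model {
	uint32_t opts1;
	uint32_t opts2;
	uint64_t addr;
};

/*
 * Count descriptors in ring positions [dirty_tx, cur_tx) whose ownership
 * bit is still set. The indices are free-running, as in the driver, and
 * are reduced modulo the ring size on access.
 */
static unsigned int count_nic_owned(const struct tx_desc_model *ring,
				    unsigned int ring_size,
				    unsigned int dirty_tx, unsigned int cur_tx)
{
	unsigned int owned = 0;

	assert(ring != NULL && ring_size > 0);
	for (unsigned int i = dirty_tx; i != cur_tx; i++)
		if (ring[i % ring_size].opts1 & DESC_OWN)
			owned++;
	return owned;
}
```

If the result always equals cur_tx - dirty_tx, as reported, then the NIC has
not handed any of the queued descriptors back yet.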

So I'm guessing that the NIC does one of a number of things:

 (1) It delays starting transmission until it's got a large enough batch.

 (2) It delays starting transmission until some time has passed since being
 poked.

 (3) It takes time to get the transmitter going for some reason.

 (4) It's batching the upda

Kernel oops with mlx5 and dual XDP redirect programs

2018-10-03 Thread Toke Høiland-Jørgensen
Hi Saeed

I can reliably oops the kernel with the mlx5 driver, by installing
XDP_REDIRECT programs on two devices so they redirect to each other,
and then removing them while there is traffic on the interface.

Steps to reproduce:

# cd ~/build/linux/samples/bpf
# ./xdp_redirect_map $( 8b 8c 
24 a8 3f 00 00 4d 8d bc 24 c0 3c 00 00 83 e1 01 0f 84 19
[ 1400.996624] RSP: 0018:90209fb43bb0 EFLAGS: 00010202
[ 1401.002001] RAX: ff9c RBX:  RCX: 0005
[ 1401.009122] RDX: c7627fd75190 RSI: 0010 RDI: 90208458
[ 1401.016250] RBP: c7627fd75190 R08: 901f9821c100 R09: c7627fd75210
[ 1401.023379] R10: 05dc R11:  R12: 
[ 1401.030500] R13: 90208158 R14: 0001 R15: c7627fd75190
[ 1401.037645] FS:  7f460fa96700() GS:90209fb4() 
knlGS:
[ 1401.045718] CS:  0010 DS:  ES:  CR0: 80050033
[ 1401.051452] CR2: 3fa8 CR3: 00076c3b6006 CR4: 003606e0
[ 1401.058573] DR0:  DR1:  DR2: 
[ 1401.065823] DR3:  DR6: fffe0ff0 DR7: 0400
[ 1401.072943] Call Trace:
[ 1401.075390]  
[ 1401.077409]  bq_xmit_all+0x5e/0x160
[ 1401.080897]  dev_map_enqueue+0x12e/0x140
[ 1401.084823]  xdp_do_redirect+0x1a9/0x2a0
[ 1401.088756]  mlx5e_xdp_handle+0x24f/0x2b0 [mlx5_core]
[ 1401.093821]  ? resched_cpu+0x5f/0x70
[ 1401.097399]  ? __xdp_return+0x189/0x400
[ 1401.101242]  mlx5e_skb_from_cqe_linear+0xdd/0x180 [mlx5_core]
[ 1401.106987]  mlx5e_handle_rx_cqe+0x43/0xe0 [mlx5_core]
[ 1401.112130]  mlx5e_poll_rx_cq+0xcb/0x940 [mlx5_core]
[ 1401.117094]  mlx5e_napi_poll+0xa6/0xc90 [mlx5_core]
[ 1401.121966]  ? smp_reschedule_interrupt+0x16/0xd0
[ 1401.126789]  ? reschedule_interrupt+0xf/0x20
[ 1401.131057]  ? reschedule_interrupt+0xa/0x20
[ 1401.135321]  net_rx_action+0x279/0x3d0
[ 1401.139071]  __do_softirq+0xf2/0x28e
[ 1401.142651]  irq_exit+0xb6/0xc0
[ 1401.145792]  do_IRQ+0x52/0xd0
[ 1401.148785]  common_interrupt+0xf/0xf
[ 1401.152445]  
[ 1401.154559] RIP: 0010:mlx5e_open_channels+0x65e/0x1390 [mlx5_core]
[ 1401.160734] Code: 8b 00 48 05 a8 00 00 00 48 89 85 78 3c 00 00 48 8b 83 f8 
8d 01 00 48 89 85 80 3c 00 00 48 8b 83 f0 8d 01 00 8b 80 a8 fb 03 00 <0f> c8 89 
85 88 3c 00 00 41 0f b6 45 16 88 85 8c 3c 00 00 49 83 bd
[ 1401.179463] RSP: 0018:a7628dd43808 EFLAGS: 0282 ORIG_RAX: 
ffd4
[ 1401.187024] RAX: 0008 RBX: 9020845808c0 RCX: 
[ 1401.194325] RDX: a7628dd43894 RSI:  RDI: 901f8a0e
[ 1401.201463] RBP: 901f8a0d8000 R08: e1799d283800 R09: 0008
[ 1401.208582] R10:  R11: 0002 R12: 
[ 1401.215702] R13: 902084583940 R14:  R15: 
[ 1401.222834]  ? mlx5e_open_channels+0x5e1/0x1390 [mlx5_core]
[ 1401.228404]  ? rcu_exp_wait_wake+0x550/0x550
[ 1401.232674]  ? free_one_page+0x68/0x370
[ 1401.236519]  mlx5e_open_locked+0x28/0xa0 [mlx5_core]
[ 1401.241491]  mlx5e_xdp+0x2b2/0x300 [mlx5_core]
[ 1401.245936]  dev_xdp_install+0x4c/0x70
[ 1401.249686]  do_setlink+0xcdb/0xd10
[ 1401.253300]  ? flat_send_IPI_allbutself+0x6c/0xa0
[ 1401.258003]  ? __update_load_avg_se+0x20c/0x290
[ 1401.262530]  rtnl_setlink+0x104/0x140
[ 1401.266189]  rtnetlink_rcv_msg+0x269/0x310
[ 1401.270283]  ? _cond_resched+0x16/0x40
[ 1401.274029]  ? __kmalloc_node_track_caller+0x1dd/0x2a0
[ 1401.279162]  ? rtnl_calcit.isra.32+0x110/0x110
[ 1401.283601]  netlink_rcv_skb+0xdb/0x110
[ 1401.287437]  netlink_unicast+0x18b/0x250
[ 1401.291359]  netlink_sendmsg+0x2c7/0x3b0
[ 1401.295287]  sock_sendmsg+0x30/0x40
[ 1401.298776]  __sys_sendto+0xd8/0x150
[ 1401.302351]  ? __sys_getsockname+0xac/0xc0
[ 1401.306448]  ? netlink_setsockopt+0x2e/0x2b0
[ 1401.310718]  ? __sys_setsockopt+0x7c/0xe0
[ 1401.314867]  __x64_sys_sendto+0x24/0x30
[ 1401.318709]  do_syscall_64+0x4f/0x100
[ 1401.322372]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1401.327420] RIP: 0033:0x7f460f3a83dd
[ 1401.330997] Code: 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 8b 05 7a 13 2c 00 
85 c0 75 3e 45 31 c9 45 31 c0 4c 63 d1 48 63 ff b8 2c 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 0b c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 15
[ 1401.349733] RSP: 002b:7ffd28d23138 EFLAGS: 0246 ORIG_RAX: 
002c
[ 1401.357293] RAX: ffda RBX: ff90 RCX: 7f460f3a83dd
[ 1401.364413] RDX: 002c RSI: 7ffd28d23170 RDI: 0003
[ 1401.371533] RBP: 7ffd28d231e0 R08:  R09: 
[ 1401.378767] R10:  R11: 0246 R12: 0006
[ 1401.385895] R13: 7ffd28d237f0 R14: 7ffd28d23830 R15: 7ffd28d2388c
[ 1401.393016] Modules linked in: rpcrdma ib_umad sunrpc ib_ipoib rdma_ucm 
mlx5_ib binfmt_misc ib_uverbs snd_hda_codec_hdmi intel_rapl sb_edac 
x86_pkg_temp_thermal intel_powerclamp 

Re: [PATCH net] ipv6: revert degradation in IPv6 Ready Logo test results

2018-10-03 Thread 吉藤英明
Hi,

On Wed, Oct 3, 2018 at 16:57 Mike Manning wrote:
>
> On 02/10/2018 19:26, David Miller wrote:
> > From: Mike Manning 
> > Date: Tue,  2 Oct 2018 12:40:30 +0100
> >
> >> This reverts commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags
> >> smaller than min mtu"). While one should not get fragments smaller than
> >> the IPv6 minimum MTU, not handling crafted packets in the TAHI IPv6
> >> conformance test suite (v6eval) for IPv6 Ready Logo results in 18
> >> failures representing over 5% of the score.
> >>
> >> Cc: Florian Westphal 
> >> Signed-off-by: Mike Manning 
> > Sorry, I'm not just going to blindly apply a patch because some
> > TAHI tests fail.
> >
> > It's possible the TAHI tests are wrong, or that the specification
> > elements it is testing don't make any sense these days.
> >
> > Allowing all kinds of random junk in the middle of the fragment queue
> > leads to lots of unnecessary cpu overhead and potential bugs, and it
> > is triggerable remotely.
>
> Understood, thank you.
>
> It would be great if there is someone on this mailer who has influence
> with ipv6ready.org so as to get the TAHI tests for IPv6 conformance
> updated, as an upgrade to a kernel with the commit mentioned will result
> in a 5% degradation in results for the existing tests.
>

You can ignore some tests especially if you have some related,
updated RFC(s).

--yoshfuji


Re: [PATCH net] ipv6: revert degradation in IPv6 Ready Logo test results

2018-10-03 Thread Mike Manning
On 02/10/2018 19:26, David Miller wrote:
> From: Mike Manning 
> Date: Tue,  2 Oct 2018 12:40:30 +0100
>
>> This reverts commit 0ed4229b08c1 ("ipv6: defrag: drop non-last frags
>> smaller than min mtu"). While one should not get fragments smaller than
>> the IPv6 minimum MTU, not handling crafted packets in the TAHI IPv6
>> conformance test suite (v6eval) for IPv6 Ready Logo results in 18
>> failures representing over 5% of the score.
>>
>> Cc: Florian Westphal 
>> Signed-off-by: Mike Manning 
> Sorry, I'm not just going to blindly apply a patch because some
> TAHI tests fail.
>
> It's possible the TAHI tests are wrong, or that the specification
> elements it is testing don't make any sense these days.
>
> Allowing all kinds of random junk in the middle of the fragment queue
> leads to lots of unnecessary cpu overhead and potential bugs, and it
> is triggerable remotely.

Understood, thank you.

It would be great if there is someone on this mailer who has influence
with ipv6ready.org so as to get the TAHI tests for IPv6 conformance
updated, as an upgrade to a kernel with the commit mentioned will result
in a 5% degradation in results for the existing tests.
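
For readers unfamiliar with the commit being discussed, the rule it
introduces can be sketched roughly as below. This is an illustration that
assumes the check amounts to "not the last fragment and shorter than the
IPv6 minimum MTU"; drop_small_nonlast_frag() is a hypothetical helper, not
the kernel function, and it takes the fragment-offset field already
converted to host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_MIN_MTU 1280   /* minimum IPv6 link MTU */
#define IP6_MF       0x0001 /* "more fragments" flag, host byte order */

/*
 * pkt_len: length of the packet from the start of the IPv6 header.
 * frag_off_host: the fragment header's offset/flags field, host order.
 * Returns true if the fragment would be rejected under the assumed rule.
 */
static bool drop_small_nonlast_frag(size_t pkt_len, uint16_t frag_off_host)
{
	/* A fragment carries at least an IPv6 header (40) and a fragment header (8). */
	assert(pkt_len >= 48);

	return pkt_len < IPV6_MIN_MTU && (frag_off_host & IP6_MF) != 0;
}
```

Under this model a 200-byte fragment with the "more fragments" flag set is
dropped, while a 200-byte last fragment or any fragment of 1280 bytes or
more is accepted. Conformance tests that send small non-last fragments
would therefore fail, which matches the failures reported above.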



Re: [PATCH] xfrm: fix gro_cells leak when remove virtual xfrm interfaces

2018-10-03 Thread Steffen Klassert
On Sun, Sep 30, 2018 at 03:06:06PM +0800, Li RongQing wrote:
> The device gro_cells has been initialized, it should be freed,
> otherwise it will be leaked
> 
> Fixes: f203b76d78092faf2 ("xfrm: Add virtual xfrm interfaces")
> Signed-off-by: Zhang Yu 
> Signed-off-by: Li RongQing 

Applied, thanks a lot!