Re: [PATCH net-next V2 1/6] net/sched: cls_matchall: Dump skip flags

2017-02-15 Thread Or Gerlitz
On Tue, Feb 14, 2017 at 9:28 PM, Jakub Kicinski  wrote:
> On Tue, 14 Feb 2017 16:30:35 +0200, Or Gerlitz wrote:
>> The skip flags are not dumped to user-space, do that.
>>
>> Signed-off-by: Or Gerlitz 
>> Acked-by: Jiri Pirko 
>> Acked-by: Yotam Gigi 
>> ---
>>  net/sched/cls_matchall.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
>> index f2141cb..ce2f7d4 100644
>> --- a/net/sched/cls_matchall.c
>> +++ b/net/sched/cls_matchall.c
>> @@ -244,6 +244,8 @@ static int mall_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
>>   nla_put_u32(skb, TCA_MATCHALL_CLASSID, head->res.classid))
>>   goto nla_put_failure;
>>
>> + nla_put_u32(skb, TCA_MATCHALL_FLAGS, head->flags);
>> +
>
> Shouldn't the return status from nla_put_u32() be checked?

Yeah, amazing how the developer (me) and three internal reviewers
missed that :(, thanks!

Or.
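The checked-put pattern Jakub asks for (and which the V3 respin of this patch adopts) can be sketched in plain userspace C. The `struct sk_buff` and `nla_put_u32()` below are mock stand-ins for the kernel API, not the real implementations; only the control flow is the point:

```c
/* Userspace sketch (not kernel code) of the checked nla_put pattern:
 * every nla_put_u32() result feeds the nla_put_failure path so a
 * truncated netlink message is never silently emitted. */
#include <assert.h>
#include <stddef.h>

struct sk_buff { size_t tailroom; };   /* mock: remaining message space */

/* Mock of the kernel helper: fails (-1) when the attribute
 * (4-byte header + 4-byte payload here) does not fit. */
static int nla_put_u32(struct sk_buff *skb, int attrtype, unsigned int value)
{
    (void)attrtype; (void)value;
    if (skb->tailroom < 8)
        return -1;
    skb->tailroom -= 8;
    return 0;
}

/* The corrected dump step: only put non-zero flags, and check
 * the return value so the caller can trim the message. */
static int mall_dump_flags(struct sk_buff *skb, unsigned int flags)
{
    if (flags && nla_put_u32(skb, /* TCA_MATCHALL_FLAGS */ 3, flags))
        goto nla_put_failure;
    return 0;

nla_put_failure:
    return -1;
}
```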


[PATCH net-next v7 0/2] qed*: Add support for PTP

2017-02-15 Thread Yuval Mintz
This patch series adds required changes for qed/qede drivers for
supporting the IEEE Precision Time Protocol (PTP).

Dave,
Please consider applying this series to "net-next".

Thanks,
Yuval

Changes from previous versions:
---
v7: Fixed Kbuild robot warnings.

v6: Corrected broken loop iteration in previous version.
Reduced approximation error of adjfreq.

v5: Removed two divisions from the adjust-frequency loop.
Resulting logic would use 8 divisions [instead of 24].

v4: Remove the loop iteration for value '0' in the qed_ptp_hw_adjfreq()
implementation.

v3: Use div_s64 for 64-bit divisions, as do_div gives an error for signed
types.
Incorporated review comments from Richard Cochran:
  - Clear timestamp registers as soon as the timestamp is read.
  - Use a shift operation in place of 'divide by 16'.

v2: Use do_div for 64-bit divisions.
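As a rough illustration of the changelog items above (div_s64() for signed 64-bit division, a shift for 'divide by 16'): do_div() only handles unsigned 64-bit dividends, so signed PTP frequency offsets need div_s64(). The helper is re-implemented here so the sketch is self-contained; in the kernel it comes from <linux/math64.h>:

```c
/* Userspace sketch of the two changelog points; not the qed code. */
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's div_s64(): a plain C division here,
 * while the kernel helper also handles 32-bit targets portably. */
static int64_t div_s64(int64_t dividend, int32_t divisor)
{
    return dividend / divisor;
}

/* 'divide by 16' on an unsigned fixed-point value becomes a shift */
static uint64_t div16(uint64_t v)
{
    return v >> 4;
}
```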

Sudarsana Reddy Kalluru (2):
  qed: Add infrastructure for PTP support
  qede: Add driver support for PTP


 drivers/net/ethernet/qlogic/Kconfig |   1 +
 drivers/net/ethernet/qlogic/qed/Makefile|   2 +-
 drivers/net/ethernet/qlogic/qed/qed.h   |   2 +
 drivers/net/ethernet/qlogic/qed/qed_l2.c|   5 +
 drivers/net/ethernet/qlogic/qed/qed_l2.h|   1 +
 drivers/net/ethernet/qlogic/qed/qed_main.c  |  15 +
 drivers/net/ethernet/qlogic/qed/qed_ptp.c   | 323 ++
 drivers/net/ethernet/qlogic/qed/qed_ptp.h   |  47 +++
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h  |  31 ++
 drivers/net/ethernet/qlogic/qede/Makefile   |   2 +-
 drivers/net/ethernet/qlogic/qede/qede.h |   4 +
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  10 +
 drivers/net/ethernet/qlogic/qede/qede_fp.c  |   5 +
 drivers/net/ethernet/qlogic/qede/qede_main.c|  39 ++
 drivers/net/ethernet/qlogic/qede/qede_ptp.c | 536 
 drivers/net/ethernet/qlogic/qede/qede_ptp.h |  65 +++
 include/linux/qed/qed_eth_if.h  |  22 +
 17 files changed, 1108 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_ptp.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_ptp.h
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede_ptp.c
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede_ptp.h

-- 
1.9.3



[PATCH net-next v7 1/2] qed: Add infrastructure for PTP support

2017-02-15 Thread Yuval Mintz
From: Sudarsana Reddy Kalluru 

The patch adds the required qed interfaces for configuring/reading
the PTP clock on the adapter.

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/Makefile   |   2 +-
 drivers/net/ethernet/qlogic/qed/qed.h  |   2 +
 drivers/net/ethernet/qlogic/qed/qed_l2.c   |   5 +
 drivers/net/ethernet/qlogic/qed/qed_l2.h   |   1 +
 drivers/net/ethernet/qlogic/qed/qed_main.c |  15 ++
 drivers/net/ethernet/qlogic/qed/qed_ptp.c  | 323 +
 drivers/net/ethernet/qlogic/qed/qed_ptp.h  |  47 
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h |  31 +++
 include/linux/qed/qed_eth_if.h |  22 ++
 9 files changed, 447 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_ptp.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_ptp.h

diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
index 729e437..1a7300f 100644
--- a/drivers/net/ethernet/qlogic/qed/Makefile
+++ b/drivers/net/ethernet/qlogic/qed/Makefile
@@ -2,7 +2,7 @@ obj-$(CONFIG_QED) := qed.o
 
 qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o qed_init_ops.o \
 qed_int.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o qed_l2.o \
-qed_selftest.o qed_dcbx.o qed_debug.o
+qed_selftest.o qed_dcbx.o qed_debug.o qed_ptp.o
 qed-$(CONFIG_QED_SRIOV) += qed_sriov.o qed_vf.o
 qed-$(CONFIG_QED_LL2) += qed_ll2.o
 qed-$(CONFIG_QED_RDMA) += qed_roce.o
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 1f61cf3..6557f94 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -456,6 +456,8 @@ struct qed_hwfn {
u8 dcbx_no_edpm;
u8 db_bar_no_edpm;
 
+   /* p_ptp_ptt is valid for leading HWFN only */
+   struct qed_ptt *p_ptp_ptt;
struct qed_simd_fp_handler  simd_proto_handler[64];
 
 #ifdef CONFIG_QED_SRIOV
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index 7520eb3..df932be 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -214,6 +214,7 @@ int qed_sp_eth_vport_start(struct qed_hwfn *p_hwfn,
p_ramrod->vport_id  = abs_vport_id;
 
p_ramrod->mtu   = cpu_to_le16(p_params->mtu);
+   p_ramrod->handle_ptp_pkts   = p_params->handle_ptp_pkts;
p_ramrod->inner_vlan_removal_en = p_params->remove_inner_vlan;
p_ramrod->drop_ttl0_en  = p_params->drop_ttl0;
p_ramrod->untagged  = p_params->only_untagged;
@@ -1886,6 +1887,7 @@ static int qed_start_vport(struct qed_dev *cdev,
start.drop_ttl0 = params->drop_ttl0;
start.opaque_fid = p_hwfn->hw_info.opaque_fid;
start.concrete_fid = p_hwfn->hw_info.concrete_fid;
+   start.handle_ptp_pkts = params->handle_ptp_pkts;
start.vport_id = params->vport_id;
start.max_buffers_per_cqe = 16;
start.mtu = params->mtu;
@@ -2328,6 +2330,8 @@ static int qed_fp_cqe_completion(struct qed_dev *dev,
 extern const struct qed_eth_dcbnl_ops qed_dcbnl_ops_pass;
 #endif
 
+extern const struct qed_eth_ptp_ops qed_ptp_ops_pass;
+
 static const struct qed_eth_ops qed_eth_ops_pass = {
.common = &qed_common_ops_pass,
 #ifdef CONFIG_QED_SRIOV
@@ -2336,6 +2340,7 @@ static int qed_fp_cqe_completion(struct qed_dev *dev,
 #ifdef CONFIG_DCB
.dcb = &qed_dcbnl_ops_pass,
 #endif
+   .ptp = &qed_ptp_ops_pass,
.fill_dev_info = &qed_fill_eth_dev_info,
.register_ops = &qed_register_eth_ops,
.check_mac = &qed_check_mac,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.h b/drivers/net/ethernet/qlogic/qed/qed_l2.h
index 93cb932..e763abd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.h
@@ -156,6 +156,7 @@ struct qed_sp_vport_start_params {
enum qed_tpa_mode tpa_mode;
bool remove_inner_vlan;
bool tx_switching;
+   bool handle_ptp_pkts;
bool only_untagged;
bool drop_ttl0;
u8 max_buffers_per_cqe;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 93eee83..592e104 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -902,6 +902,7 @@ static int qed_slowpath_start(struct qed_dev *cdev,
struct qed_mcp_drv_version drv_version;
const u8 *data = NULL;
struct qed_hwfn *hwfn;
+   struct qed_ptt *p_ptt;
int rc = -EINVAL;
 
if (qed_iov_wq_start(cdev))
@@ -916,6 +917,14 @@ static int qed_slowpath_start(struct qed_dev *cdev,
  QED_FW_FILE_NAME);
goto err;
}
+
+ 

[PATCH net-next v7 2/2] qede: Add driver support for PTP

2017-02-15 Thread Yuval Mintz
From: Sudarsana Reddy Kalluru 

This patch adds driver support for:
  - Registering the ptp clock functionality with the OS.
  - Timestamping the Rx/Tx PTP packets.
  - Ethtool callbacks related to PTP.

Signed-off-by: Sudarsana Reddy Kalluru 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/Kconfig |   1 +
 drivers/net/ethernet/qlogic/qede/Makefile   |   2 +-
 drivers/net/ethernet/qlogic/qede/qede.h |   4 +
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  10 +
 drivers/net/ethernet/qlogic/qede/qede_fp.c  |   5 +
 drivers/net/ethernet/qlogic/qede/qede_main.c|  39 ++
 drivers/net/ethernet/qlogic/qede/qede_ptp.c | 536 
 drivers/net/ethernet/qlogic/qede/qede_ptp.h |  65 +++
 8 files changed, 661 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede_ptp.c
 create mode 100644 drivers/net/ethernet/qlogic/qede/qede_ptp.h

diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
index 3cfd105..aaa1e85 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -104,6 +104,7 @@ config QED_SRIOV
 config QEDE
tristate "QLogic QED 25/40/100Gb Ethernet NIC"
depends on QED
+   imply PTP_1588_CLOCK
---help---
  This enables the support for ...
 
diff --git a/drivers/net/ethernet/qlogic/qede/Makefile b/drivers/net/ethernet/qlogic/qede/Makefile
index 38fbee6..bc5f7c3 100644
--- a/drivers/net/ethernet/qlogic/qede/Makefile
+++ b/drivers/net/ethernet/qlogic/qede/Makefile
@@ -1,5 +1,5 @@
 obj-$(CONFIG_QEDE) := qede.o
 
-qede-y := qede_main.o qede_fp.o qede_filter.o qede_ethtool.o
+qede-y := qede_main.o qede_fp.o qede_filter.o qede_ethtool.o qede_ptp.o
 qede-$(CONFIG_DCB) += qede_dcbnl.o
 qede-$(CONFIG_QED_RDMA) += qede_roce.o
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index b423406..f2aaef2 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -137,6 +137,8 @@ struct qede_rdma_dev {
struct workqueue_struct *roce_wq;
 };
 
+struct qede_ptp;
+
 struct qede_dev {
struct qed_dev  *cdev;
struct net_device   *ndev;
@@ -148,8 +150,10 @@ struct qede_dev {
u32 flags;
#define QEDE_FLAG_IS_VF BIT(0)
#define IS_VF(edev) (!!((edev)->flags & QEDE_FLAG_IS_VF))
+#define QEDE_TX_TIMESTAMPING_EN BIT(1)
 
const struct qed_eth_ops*ops;
+   struct qede_ptp *ptp;
 
struct qed_dev_eth_info dev_info;
 #define QEDE_MAX_RSS_CNT(edev) ((edev)->dev_info.num_queues)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index baf2642..c02754d 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include "qede.h"
+#include "qede_ptp.h"
 
 #define QEDE_RQSTAT_OFFSET(stat_name) \
 (offsetof(struct qede_rx_queue, stat_name))
@@ -940,6 +941,14 @@ static int qede_set_channels(struct net_device *dev,
return 0;
 }
 
+static int qede_get_ts_info(struct net_device *dev,
+   struct ethtool_ts_info *info)
+{
+   struct qede_dev *edev = netdev_priv(dev);
+
+   return qede_ptp_get_ts_info(edev, info);
+}
+
 static int qede_set_phys_id(struct net_device *dev,
enum ethtool_phys_id_state state)
 {
@@ -1586,6 +1595,7 @@ static int qede_get_tunable(struct net_device *dev,
.get_rxfh_key_size = qede_get_rxfh_key_size,
.get_rxfh = qede_get_rxfh,
.set_rxfh = qede_set_rxfh,
+   .get_ts_info = qede_get_ts_info,
.get_channels = qede_get_channels,
.set_channels = qede_set_channels,
.self_test = qede_self_test,
diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index 26848ee..1e65038 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include "qede_ptp.h"
 
 #include 
 #include "qede.h"
@@ -1277,6 +1278,7 @@ static int qede_rx_process_cqe(struct qede_dev *edev,
qede_get_rxhash(skb, fp_cqe->bitfields, fp_cqe->rss_hash);
qede_set_skb_csum(skb, csum_flag);
skb_record_rx_queue(skb, rxq->rxq_id);
+   qede_ptp_record_rx_ts(edev, cqe, skb);
 
/* SKB is prepared - pass it to stack */
qede_skb_receive(edev, fp, rxq, skb, le16_to_cpu(fp_cqe->vlan_tag));
@@ -1451,6 +1453,9 @@ netdev_tx_t qede_start_xmit(struct sk_buff *skb, struct net_device *ndev)
first_bd->data.bd_flags.bitfields =
1 << ETH_TX_1ST_BD_FLAGS_START_BD_SHIFT;
 
+   if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP))
+   qede_ptp_tx_ts(edev, skb);
+
/* M

Re: [PATCH 1/2] net: xilinx_emaclite: fix receive buffer overflow

2017-02-15 Thread Anssi Hannula
On 14.2.2017 22:12, David Miller wrote:
> From: Anssi Hannula 
> Date: Tue, 14 Feb 2017 19:11:44 +0200
>
>> xilinx_emaclite looks at the received data to try to determine the
>> Ethernet packet length but does not properly clamp it if
>> proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
>> overflow and a panic via skb_panic() as the length exceeds the allocated
>> skb size.
>>
>> Fix those cases.
>>
>> Also add an additional unconditional check with WARN_ON() at the end.
>>
>> Signed-off-by: Anssi Hannula 
>> Fixes: bb81b2ddfa19 ("net: add Xilinx emac lite device driver")
> Why does this driver do all of this crazy stuff parsing the packet
> headers?
>
> It should be able to just read the length provided by the device
> at XEL_RPLR_OFFSET and just use that.

Looks like XEL_RPLR_OFFSET == XEL_HEADER_OFFSET + XEL_RXBUFF_OFFSET and
that is where the driver reads the on-wire Type/Length field.

Looking through the product guide [1] I don't see the actual receive
packet length provided anywhere, so I guess that is why the crazy stuff
is done.

[1]
https://www.xilinx.com/support/documentation/ip_documentation/axi_ethernetlite/v3_0/pg135-axi-ethernetlite.pdf
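A hedged sketch of the clamping the fix calls for: the constants are standard Ethernet sizes, but emaclite_rx_len() and its exact length arithmetic are illustrative stand-ins, not the driver's actual code. The key point is the unconditional clamp at the end, so a hostile or corrupt Type/Length field can never exceed the allocated skb size:

```c
/* Illustrative model of deriving a receive length from the on-wire
 * Type/Length field, with the clamp the patch adds. */
#include <assert.h>

#define ETH_P_IP      0x0800
#define ETH_HLEN      14
#define ETH_FCS_LEN   4
#define ETH_DATA_LEN  1500
#define ETH_FRAME_LEN 1514   /* max frame excluding FCS */

/* ip_total_len is only meaningful when proto_type == ETH_P_IP. */
static unsigned int emaclite_rx_len(unsigned int proto_type,
                                    unsigned int ip_total_len)
{
    unsigned int len;

    if (proto_type > ETH_DATA_LEN) {
        if (proto_type == ETH_P_IP)
            len = ip_total_len + ETH_HLEN + ETH_FCS_LEN;
        else
            len = ETH_FRAME_LEN + ETH_FCS_LEN;  /* unknown type: assume max */
    } else {
        len = proto_type + ETH_HLEN + ETH_FCS_LEN;  /* 802.3 length field */
    }

    /* The missing clamp: never trust on-wire data to bound the copy. */
    if (len > ETH_FRAME_LEN + ETH_FCS_LEN)
        len = ETH_FRAME_LEN + ETH_FCS_LEN;
    return len;
}
```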

-- 
Anssi Hannula / Bitwise Oy
+358503803997



[PATCH] average: change to declare precision, not factor

2017-02-15 Thread Johannes Berg
From: Johannes Berg 

Declaring the factor is counter-intuitive, and people are prone
to using small(-ish) values even when that makes no sense.

Change the DECLARE_EWMA() macro to take the fractional precision,
in bits, rather than a factor, and update all users.

While at it, add some more documentation.

Signed-off-by: Johannes Berg 
---
Unless I hear any objections, I will take this through my tree.
---
 drivers/net/virtio_net.c|  2 +-
 drivers/net/wireless/ath/ath5k/ath5k.h  |  2 +-
 drivers/net/wireless/ralink/rt2x00/rt2x00.h |  2 +-
 include/linux/average.h | 61 +++--
 net/batman-adv/types.h  |  2 +-
 net/mac80211/ieee80211_i.h  |  2 +-
 net/mac80211/sta_info.h |  6 +--
 7 files changed, 49 insertions(+), 28 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11e28530c83c..5e0cc9ec0f81 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -49,7 +49,7 @@ module_param(gso, bool, 0444);
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
  * term, transient changes in packet size.
  */
-DECLARE_EWMA(pkt_len, 1, 64)
+DECLARE_EWMA(pkt_len, 0, 64)
 
 /* With mergeable buffers we align buffer address and use the low bits to
  * encode its true size. Buffer size is up to 1 page so we need to align to
diff --git a/drivers/net/wireless/ath/ath5k/ath5k.h b/drivers/net/wireless/ath/ath5k/ath5k.h
index 67fedb61fcc0..979800c6f57f 100644
--- a/drivers/net/wireless/ath/ath5k/ath5k.h
+++ b/drivers/net/wireless/ath/ath5k/ath5k.h
@@ -1252,7 +1252,7 @@ struct ath5k_statistics {
 #define ATH5K_TXQ_LEN_MAX  (ATH_TXBUF / 4) /* bufs per queue */
 #define ATH5K_TXQ_LEN_LOW  (ATH5K_TXQ_LEN_MAX / 2) /* low mark */
 
-DECLARE_EWMA(beacon_rssi, 1024, 8)
+DECLARE_EWMA(beacon_rssi, 10, 8)
 
 /* Driver state associated with an instance of a device */
 struct ath5k_hw {
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00.h b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
index 26869b3bef45..340787894c69 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00.h
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
@@ -257,7 +257,7 @@ struct link_qual {
int tx_failed;
 };
 
-DECLARE_EWMA(rssi, 1024, 8)
+DECLARE_EWMA(rssi, 10, 8)
 
 /*
  * Antenna settings about the currently active link.
diff --git a/include/linux/average.h b/include/linux/average.h
index d04aa58280de..7ddaf340d2ac 100644
--- a/include/linux/average.h
+++ b/include/linux/average.h
@@ -1,45 +1,66 @@
 #ifndef _LINUX_AVERAGE_H
 #define _LINUX_AVERAGE_H
 
-/* Exponentially weighted moving average (EWMA) */
+/*
+ * Exponentially weighted moving average (EWMA)
+ *
+ * This implements a fixed-precision EWMA algorithm, with both the
+ * precision and fall-off coefficient determined at compile-time
+ * and built into the generated helper functions.
+ *
+ * The first argument to the macro is the name that will be used
+ * for the struct and helper functions.
+ *
+ * The second argument, the precision, expresses how many bits are
+ * used for the fractional part of the fixed-precision values.
+ *
+ * The third argument, the weight reciprocal, determines how the
+ * new values will be weighed vs. the old state, new values will
+ * get weight 1/weight_rcp and old values 1-1/weight_rcp. Note
+ * that this parameter must be a power of two for efficiency.
+ */
 
-#define DECLARE_EWMA(name, _factor, _weight)   \
+#define DECLARE_EWMA(name, _precision, _weight_rcp)\
struct ewma_##name {\
unsigned long internal; \
};  \
static inline void ewma_##name##_init(struct ewma_##name *e)\
{   \
-   BUILD_BUG_ON(!__builtin_constant_p(_factor));   \
-   BUILD_BUG_ON(!__builtin_constant_p(_weight));   \
-   BUILD_BUG_ON_NOT_POWER_OF_2(_factor);   \
-   BUILD_BUG_ON_NOT_POWER_OF_2(_weight);   \
+   BUILD_BUG_ON(!__builtin_constant_p(_precision));\
+   BUILD_BUG_ON(!__builtin_constant_p(_weight_rcp));   \
+   /*  \
+* Even if you want to feed it just 0/1 you should have \
+* some bits for the non-fractional part... \
+*/ \
+   BUILD_BUG_ON((_precision) > 30);\
+   BUILD_BUG_ON_NOT_POWER_OF_2(_weight_rcp);   \
e->internal = 0;\
} 
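A minimal userspace model of what the macro generates may make the precision-vs-factor change concrete. This mirrors only the fixed-point math, without the kernel's macro and type glue, and the names are illustrative; a precision of 10 bits corresponds to the old "factor 1024", and weight_rcp = 8 weighs each new sample by 1/8:

```c
/* Simplified model of the fixed-point EWMA DECLARE_EWMA() generates. */
#include <assert.h>

#define EWMA_PRECISION  10   /* fractional bits; scale is 1 << 10 = 1024 */
#define EWMA_WEIGHT_RCP 8    /* reciprocal weight; must be a power of two */

struct ewma { unsigned long internal; };  /* value << EWMA_PRECISION */

static void ewma_init(struct ewma *e) { e->internal = 0; }

static void ewma_add(struct ewma *e, unsigned long val)
{
    unsigned long scaled = val << EWMA_PRECISION;

    if (e->internal)
        e->internal = (e->internal * (EWMA_WEIGHT_RCP - 1) + scaled) /
                      EWMA_WEIGHT_RCP;
    else
        e->internal = scaled;   /* first sample seeds the average */
}

static unsigned long ewma_read(const struct ewma *e)
{
    return e->internal >> EWMA_PRECISION;   /* drop the fractional bits */
}
```

The kernel version replaces the division by shifts (hence the power-of-two requirement), but the arithmetic is the same.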

[PATCH net-next V3 1/7] net/sched: cls_flower: Properly handle classifier flags dumping

2017-02-15 Thread Or Gerlitz
Dump the classifier flags only if they are non-zero, and make sure to check
the return status of the handler that puts them into the netlink msg.

Signed-off-by: Or Gerlitz 
---
 net/sched/cls_flower.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 0826c8e..850d982 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1229,7 +1229,8 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
goto nla_put_failure;
 
-   nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);
+   if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
+   goto nla_put_failure;
 
if (tcf_exts_dump(skb, &f->exts))
goto nla_put_failure;
-- 
2.3.7



[PATCH net-next V3 0/7] net/sched: Reflect HW offload status in classifiers

2017-02-15 Thread Or Gerlitz
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Added two new flags, "in hw" and "not in hw", such that user space
can determine if a filter is actually offloaded to HW. The "in hw"
UAPI semantics was chosen to be similar to the "skip hw" flag logic.

If neither of these two flags is set, this signals that we are running
over an older kernel.

As an example, add one vlan push + fwd rule, one matchall rule and one u32 rule
without any flags, and another vlan + fwd skip_sw rule, such that the different
TC classifiers attempt to offload all of them -- all over an mlx5 SRIOV VF rep:

# tc filter add dev eth2_0 protocol ip parent : 
flower skip_sw indev eth2_0 src_mac e4:11:22:33:44:50 dst_mac 
e4:1d:2d:a5:f3:9d 
action vlan push id 52 action mirred egress redirect dev eth2

# tc filter add dev eth2_0 protocol ip parent : 
flower indev eth2_0 src_mac e4:11:22:33:44:50 dst_mac e4:11:22:33:44:51 
action vlan push id 53 action mirred egress redirect dev eth2

# tc filter add dev eth2_0 parent : matchall action mirred egress mirror 
dev veth1

# tc filter add dev eth2_0 parent : protocol ip prio 99 handle 800:0:1 
u32 ht 800: flowid 800:1 match ip src 192.168.1.0/24 action drop

Since that VF rep doesn't offload matchall/u32 and can currently offload
only one vlan push rule, we expect three of the rules not to be offloaded:

# tc filter show dev eth2_0 parent :

filter protocol ip pref 99 u32 
filter protocol ip pref 99 u32 fh 800: ht divisor 1 
filter protocol ip pref 99 u32 fh 800::1 order 1 key ht 800 bkt 0 flowid 800:1 not in_hw 
  match c0a80100/ff00 at 12
action order 1: gact action drop
 random type none pass val 0
 index 8 ref 1 bind 1
 
filter protocol all pref 49150 matchall 
filter protocol all pref 49150 matchall handle 0x1 
  not in_hw
action order 1: mirred (Egress Mirror to device veth1) pipe
index 27 ref 1 bind 1
 
filter protocol ip pref 49151 flower 
filter protocol ip pref 49151 flower handle 0x1 
  indev eth2_0
  dst_mac e4:11:22:33:44:51
  src_mac e4:11:22:33:44:50
  eth_type ipv4
  not in_hw
action order 1:  vlan push id 53 protocol 802.1Q priority 0 pipe
 index 20 ref 1 bind 1
 
action order 2: mirred (Egress Redirect to device eth2) stolen
index 26 ref 1 bind 1
 
filter protocol ip pref 49152 flower 
filter protocol ip pref 49152 flower handle 0x1 
  indev eth2_0
  dst_mac e4:1d:2d:a5:f3:9d
  src_mac e4:11:22:33:44:50
  eth_type ipv4
  skip_sw
  in_hw
action order 1:  vlan push id 52 protocol 802.1Q priority 0 pipe
 index 19 ref 1 bind 1
 
action order 2: mirred (Egress Redirect to device eth2) stolen
index 25 ref 1 bind 1

v2 --> v3 changes:

 - fixed the matchall dump flags patch to do proper checks (Jakub)
 - added the same proper checks to flower, where they were missing
 - that flower patch was added as #1, and hence all the other patches are
   offset by one

v1 --> v2 changes:
 - applied feedback from Jakub and Dave -- when none of the skip flags were
   set, the suggested approach didn't allow user space to distinguish between
   an old kernel and a case where offloading to HW worked fine.


Or Gerlitz (7):
  net/sched: cls_flower: Properly handle classifier flags dumping
  net/sched: cls_matchall: Dump the classifier flags
  net/sched: Reflect HW offload status
  net/sched: cls_flower: Reflect HW offload status
  net/sched: cls_matchall: Reflect HW offloading status
  net/sched: cls_u32: Reflect HW offload status
  net/sched: cls_bpf: Reflect HW offload status

 include/net/pkt_cls.h|  5 +
 include/uapi/linux/pkt_cls.h |  6 --
 net/sched/cls_bpf.c  | 13 +++--
 net/sched/cls_flower.c   |  8 +++-
 net/sched/cls_matchall.c | 15 +--
 net/sched/cls_u32.c  | 10 ++
 6 files changed, 50 insertions(+), 7 deletions(-)

-- 
2.3.7



[PATCH net-next V3 3/7] net/sched: Reflect HW offload status

2017-02-15 Thread Or Gerlitz
Currently there is no way of querying whether a filter is
offloaded to HW or not when using "both" policy (where none
of skip_sw or skip_hw flags are set by user-space).

Add two new flags, "in hw" and "not in hw" such that user
space can determine if a filter is actually offloaded to
hw or not. The "in hw" UAPI semantics was chosen so it's
similar to the "skip hw" flag logic.

If neither of these two flags is set, this signals that we are running
over an older kernel.

Signed-off-by: Or Gerlitz 
Reviewed-by: Amir Vadai 
Acked-by: Jiri Pirko 
---
 include/net/pkt_cls.h| 5 +
 include/uapi/linux/pkt_cls.h | 6 --
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 71b266c..15cfe15 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -475,6 +475,11 @@ static inline bool tc_flags_valid(u32 flags)
return true;
 }
 
+static inline bool tc_in_hw(u32 flags)
+{
+   return (flags & TCA_CLS_FLAGS_IN_HW) ? true : false;
+}
+
 enum tc_fl_command {
TC_CLSFLOWER_REPLACE,
TC_CLSFLOWER_DESTROY,
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 345551e..7a69f2a 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -103,8 +103,10 @@ enum {
 #define TCA_POLICE_MAX (__TCA_POLICE_MAX - 1)
 
 /* tca flags definitions */
-#define TCA_CLS_FLAGS_SKIP_HW  (1 << 0)
-#define TCA_CLS_FLAGS_SKIP_SW  (1 << 1)
+#define TCA_CLS_FLAGS_SKIP_HW  (1 << 0) /* don't offload filter to HW */
+#define TCA_CLS_FLAGS_SKIP_SW  (1 << 1) /* don't use filter in SW */
+#define TCA_CLS_FLAGS_IN_HW(1 << 2) /* filter is offloaded to HW */
+#define TCA_CLS_FLAGS_NOT_IN_HW (1 << 3) /* filter isn't offloaded to HW */
 
 /* U32 filters */
 
-- 
2.3.7
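The intended flag semantics can be sketched in userspace C. tc_in_hw() mirrors the new helper added above; reflect_offload_status() is an illustrative condensation of what the per-classifier patches in this series do, not an actual kernel function:

```c
/* Sketch of the classifier-flag semantics after this series:
 * SKIP_HW / SKIP_SW come from user space; IN_HW / NOT_IN_HW are set by
 * the kernel to reflect the actual offload outcome. */
#include <assert.h>
#include <stdbool.h>

#define TCA_CLS_FLAGS_SKIP_HW   (1 << 0)
#define TCA_CLS_FLAGS_SKIP_SW   (1 << 1)
#define TCA_CLS_FLAGS_IN_HW     (1 << 2)
#define TCA_CLS_FLAGS_NOT_IN_HW (1 << 3)

static bool tc_in_hw(unsigned int flags)
{
    return (flags & TCA_CLS_FLAGS_IN_HW) ? true : false;
}

/* Condensed form of each classifier's ->change() path: mark IN_HW when
 * the driver accepted the offload (setup_err == 0), otherwise fall back
 * to NOT_IN_HW so user space can tell the two cases apart. */
static unsigned int reflect_offload_status(unsigned int flags, int setup_err)
{
    if (!setup_err)
        flags |= TCA_CLS_FLAGS_IN_HW;
    if (!tc_in_hw(flags))
        flags |= TCA_CLS_FLAGS_NOT_IN_HW;
    return flags;
}
```

A filter dumped with neither IN_HW nor NOT_IN_HW therefore indicates a kernel that predates this series.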



[PATCH net-next V3 5/7] net/sched: cls_matchall: Reflect HW offloading status

2017-02-15 Thread Or Gerlitz
Matchall support for the "in hw" offloading flags.

Signed-off-by: Or Gerlitz 
Reviewed-by: Amir Vadai 
Acked-by: Jiri Pirko 
---
 net/sched/cls_matchall.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 35ef1c1..14de5b7 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -56,6 +56,7 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_to_netdev offload;
struct tc_cls_matchall_offload mall_offload = {0};
+   int err;
 
offload.type = TC_SETUP_MATCHALL;
offload.cls_mall = &mall_offload;
@@ -63,8 +64,12 @@ static int mall_replace_hw_filter(struct tcf_proto *tp,
offload.cls_mall->exts = &head->exts;
offload.cls_mall->cookie = cookie;
 
-   return dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol,
-&offload);
+   err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol,
+   &offload);
+   if (!err)
+   head->flags |= TCA_CLS_FLAGS_IN_HW;
+
+   return err;
 }
 
 static void mall_destroy_hw_filter(struct tcf_proto *tp,
@@ -194,6 +199,9 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
}
}
 
+   if (!(tc_in_hw(new->flags)))
+   new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
*arg = (unsigned long) head;
rcu_assign_pointer(tp->root, new);
if (head)
-- 
2.3.7



[PATCH net-next V3 7/7] net/sched: cls_bpf: Reflect HW offload status

2017-02-15 Thread Or Gerlitz
BPF classifier support for the "in hw" offloading flags.

Signed-off-by: Or Gerlitz 
Reviewed-by: Amir Vadai 
Acked-by: Jakub Kicinski 
---
 net/sched/cls_bpf.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index d9c9701..61a5b33 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -148,6 +148,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
struct net_device *dev = tp->q->dev_queue->dev;
struct tc_cls_bpf_offload bpf_offload = {};
struct tc_to_netdev offload;
+   int err;
 
offload.type = TC_SETUP_CLSBPF;
offload.cls_bpf = &bpf_offload;
@@ -159,8 +160,13 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
bpf_offload.exts_integrated = prog->exts_integrated;
bpf_offload.gen_flags = prog->gen_flags;
 
-   return dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
-tp->protocol, &offload);
+   err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+   tp->protocol, &offload);
+
+   if (!err && (cmd == TC_CLSBPF_ADD || cmd == TC_CLSBPF_REPLACE))
+   prog->gen_flags |= TCA_CLS_FLAGS_IN_HW;
+
+   return err;
 }
 
 static int cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
@@ -511,6 +517,9 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
return ret;
}
 
+   if (!(tc_in_hw(prog->gen_flags)))
+   prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
if (oldprog) {
list_replace_rcu(&oldprog->link, &prog->link);
tcf_unbind_filter(tp, &oldprog->res);
-- 
2.3.7



[PATCH net-next V3 2/7] net/sched: cls_matchall: Dump the classifier flags

2017-02-15 Thread Or Gerlitz
The classifier flags are not dumped to user-space, do that.

Signed-off-by: Or Gerlitz 
Acked-by: Jiri Pirko 
Acked-by: Yotam Gigi 
---
 net/sched/cls_matchall.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index f2141cb..35ef1c1 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -244,6 +244,9 @@ static int mall_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
nla_put_u32(skb, TCA_MATCHALL_CLASSID, head->res.classid))
goto nla_put_failure;
 
+   if (head->flags && nla_put_u32(skb, TCA_MATCHALL_FLAGS, head->flags))
+   goto nla_put_failure;
+
if (tcf_exts_dump(skb, &head->exts))
goto nla_put_failure;
 
-- 
2.3.7



[PATCH net-next V3 6/7] net/sched: cls_u32: Reflect HW offload status

2017-02-15 Thread Or Gerlitz
U32 support for the "in hw" offloading flags.

Signed-off-by: Or Gerlitz 
Reviewed-by: Amir Vadai 
---
 net/sched/cls_u32.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index a6ec3e4b..8c6cc39 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -523,6 +523,10 @@ static int u32_replace_hw_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 
err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
tp->protocol, &offload);
+
+   if (!err)
+   n->flags |= TCA_CLS_FLAGS_IN_HW;
+
if (tc_skip_sw(flags))
return err;
 
@@ -895,6 +899,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
return err;
}
 
+   if (!(tc_in_hw(new->flags)))
+   new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
u32_replace_knode(tp, tp_c, new);
tcf_unbind_filter(tp, &n->res);
call_rcu(&n->rcu, u32_delete_key_rcu);
@@ -1014,6 +1021,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
if (err)
goto errhw;
 
+   if (!(tc_in_hw(n->flags)))
+   n->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
ins = &ht->ht[TC_U32_HASH(handle)];
for (pins = rtnl_dereference(*ins); pins;
 ins = &pins->next, pins = rtnl_dereference(*ins))
-- 
2.3.7



[PATCH net-next V3 4/7] net/sched: cls_flower: Reflect HW offload status

2017-02-15 Thread Or Gerlitz
Flower support for the "in hw" offloading flags.

Signed-off-by: Or Gerlitz 
Reviewed-by: Amir Vadai 
Acked-by: Jiri Pirko 
---
 net/sched/cls_flower.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 850d982..9270c5b 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -273,6 +273,8 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
 
err = dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle, tp->protocol,
tc);
+   if (!err)
+   f->flags |= TCA_CLS_FLAGS_IN_HW;
 
if (tc_skip_sw(f->flags))
return err;
@@ -912,6 +914,9 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
goto errout;
}
 
+   if (!(tc_in_hw(fnew->flags)))
+   fnew->flags |= TCA_CLS_FLAGS_NOT_IN_HW;
+
if (fold) {
if (!tc_skip_sw(fold->flags))
rhashtable_remove_fast(&head->ht, &fold->ht_node,
-- 
2.3.7



[PATCH net-next] virtio-net: set queues after reset during xdp_set

2017-02-15 Thread Jason Wang
We set queues before reset, which will cause a crash [1]. This is
because is_xdp_raw_buffer_queue() depends on the old xdp queue pairs
number to do the correct detection. So fix this by:

- set queues after reset, to keep the old vi->curr_queue_pairs (in
  fact, setting queues before reset does not work since, after the
  feature set, all queue pairs are enabled by default during reset).
- change xdp_queue_pairs only after virtnet_reset() succeeds.
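A toy model of the corrected ordering: every structure and helper here is an illustrative stand-in for the real virtio_net code. The point is only that the buffer-free step during reset consults the old xdp_queue_pairs value, so the field may be updated only after the reset succeeds:

```c
/* Toy model of the virtio-net xdp_set ordering fix; not driver code. */
#include <assert.h>
#include <stdbool.h>

struct virtnet_info {
    int curr_queue_pairs;
    int xdp_queue_pairs;
};

/* Illustrative: the last xdp_queue_pairs TX queues hold raw XDP
 * buffers instead of skbs, so they must be freed differently. */
static bool is_xdp_raw_buffer_queue(const struct virtnet_info *vi, int q)
{
    return q >= vi->curr_queue_pairs - vi->xdp_queue_pairs;
}

static int virtnet_reset(struct virtnet_info *vi)
{
    /* frees unused buffers; relies on is_xdp_raw_buffer_queue()
     * seeing the pre-reset state */
    (void)vi;
    return 0;
}

static void virtnet_set_queues(struct virtnet_info *vi, int qp)
{
    vi->curr_queue_pairs = qp;
}

static int virtnet_xdp_set(struct virtnet_info *vi, int new_xdp_qp)
{
    int curr_qp = vi->curr_queue_pairs;
    int err = virtnet_reset(vi);        /* old xdp_queue_pairs still set */

    if (err)
        return err;                     /* nothing changed on failure */
    vi->xdp_queue_pairs = new_xdp_qp;   /* only after reset succeeded */
    virtnet_set_queues(vi, curr_qp);    /* queues set after reset */
    return 0;
}
```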

[1]

[   74.328168] general protection fault:  [#1] SMP
[   74.328625] Modules linked in: nfsd xfs libcrc32c virtio_net virtio_pci
[   74.329117] CPU: 0 PID: 2849 Comm: xdp2 Not tainted 4.10.0-rc7+ #499
[   74.329577] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
[   74.330424] task: 88007a894000 task.stack: c90004388000
[   74.330844] RIP: 0010:skb_release_head_state+0x28/0x80
[   74.331298] RSP: 0018:c9000438b8d0 EFLAGS: 00010206
[   74.331676] RAX:  RBX: 88007ad96300 RCX: 
[   74.332217] RDX: 88007fc137a8 RSI: 88007fc0db28 RDI: 0001bf0001be
[   74.332758] RBP: c9000438b8d8 R08: 0005008f R09: 05f9
[   74.333274] R10: 88007d001700 R11: 820a8a4d R12: 88007ad96300
[   74.333787] R13: 0002 R14: 880036604000 R15: 77ff8000
[   74.334308] FS:  7fc70d8a7b40() GS:88007fc0() knlGS:
[   74.334891] CS:  0010 DS:  ES:  CR0: 80050033
[   74.335314] CR2: 7fff4144a710 CR3: 7ab56000 CR4: 003406f0
[   74.335830] DR0:  DR1:  DR2: 
[   74.336373] DR3:  DR6: fffe0ff0 DR7: 0400
[   74.336895] Call Trace:
[   74.337086]  skb_release_all+0xd/0x30
[   74.337356]  consume_skb+0x2c/0x90
[   74.337607]  free_unused_bufs+0x1ff/0x270 [virtio_net]
[   74.337988]  ? vp_synchronize_vectors+0x3b/0x60 [virtio_pci]
[   74.338398]  virtnet_xdp+0x21e/0x440 [virtio_net]
[   74.338741]  dev_change_xdp_fd+0x101/0x140
[   74.339048]  do_setlink+0xcf4/0xd20
[   74.339304]  ? symcmp+0xf/0x20
[   74.339529]  ? mls_level_isvalid+0x52/0x60
[   74.339828]  ? mls_range_isvalid+0x43/0x50
[   74.340135]  ? nla_parse+0xa0/0x100
[   74.340400]  rtnl_setlink+0xd4/0x120
[   74.340664]  ? cpumask_next_and+0x30/0x50
[   74.340966]  rtnetlink_rcv_msg+0x7f/0x1f0
[   74.341259]  ? sock_has_perm+0x59/0x60
[   74.341586]  ? napi_consume_skb+0xe2/0x100
[   74.342010]  ? rtnl_newlink+0x890/0x890
[   74.342435]  netlink_rcv_skb+0x92/0xb0
[   74.342846]  rtnetlink_rcv+0x23/0x30
[   74.343277]  netlink_unicast+0x162/0x210
[   74.343677]  netlink_sendmsg+0x2db/0x390
[   74.343968]  sock_sendmsg+0x33/0x40
[   74.344233]  SYSC_sendto+0xee/0x160
[   74.344482]  ? SYSC_bind+0xb0/0xe0
[   74.344806]  ? sock_alloc_file+0x92/0x110
[   74.345106]  ? fd_install+0x20/0x30
[   74.345360]  ? sock_map_fd+0x3f/0x60
[   74.345586]  SyS_sendto+0x9/0x10
[   74.345790]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[   74.346086] RIP: 0033:0x7fc70d1b8f6d
[   74.346312] RSP: 002b:7fff4144a708 EFLAGS: 0246 ORIG_RAX: 002c
[   74.346785] RAX: ffda RBX:  RCX: 7fc70d1b8f6d
[   74.347244] RDX: 002c RSI: 7fff4144a720 RDI: 0003
[   74.347683] RBP: 0003 R08:  R09: 
[   74.348544] R10:  R11: 0246 R12: 7fff4144bd90
[   74.349082] R13: 0002 R14: 0002 R15: 7fff4144cda0
[   74.349607] Code: 00 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 58 48 85 ff 74 0e 40 f6 c7 01 74 3d 48 c7 43 58 00 00 00 00 48 8b 7b 68 48 85 ff 74 05  ff 0f 74 20 48 8b 43 60 48 85 c0 74 14 65 8b 15 f3 ab 8d 7e
[   74.351008] RIP: skb_release_head_state+0x28/0x80 RSP: c9000438b8d0
[   74.351625] ---[ end trace fe6e19fd11cfc80b ]---

Fixes: 2de2f7f40ef9 ("virtio_net: XDP support for adjust_head")
Cc: John Fastabend 
Signed-off-by: Jason Wang 
---
 drivers/net/virtio_net.c | 35 ++-
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11e2853..9ff959c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1775,7 +1775,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
struct virtnet_info *vi = netdev_priv(dev);
struct bpf_prog *old_prog;
-   u16 oxdp_qp, xdp_qp = 0, curr_qp;
+   u16 xdp_qp = 0, curr_qp;
int i, err;
 
if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
@@ -1813,24 +1813,24 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
return PTR_ERR(prog);
}
 
-   err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
-   if (err) {

Re: [PATCH 0/8] net: stmmac: misc patches

2017-02-15 Thread Giuseppe CAVALLARO

On 2/15/2017 8:39 AM, Corentin Labbe wrote:

Since all patches in v2 already hit linux-next, I just want to be sure: can I
add "Acked-by and Reviewed-by" to these 8 new patches?


Right, I have just seen that the V2 patches are in net-next.

pls consider, for this set, my

Acked-by: Giuseppe Cavallaro 


Regards
Peppe



Regards



Corentin Labbe (8):
  net: stmmac: remove useless parenthesis
  net: stmmac: likely is useless in occasional function
  net: stmmac: use SPEED_UNKNOWN/DUPLEX_UNKNOWN
  net: stmmac: set speed at SPEED_UNKNOWN in case of broken speed
  net: stmmac: run stmmac_hw_fix_mac_speed when speed is valid
  net: stmmac: split the stmmac_adjust_link 10/100 case
  net: stmmac: reduce indentation by adding a continue
  net: stmmac: invert the logic for dumping regs

 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   | 18 ++---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 40 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 82 +++---
 3 files changed, 71 insertions(+), 69 deletions(-)









Re: rtlwifi: btcoexist: fix semicolon.cocci warnings

2017-02-15 Thread Kalle Valo
Julia Lawall  wrote:
> Remove unneeded semicolon.
> 
> Generated by: scripts/coccinelle/misc/semicolon.cocci
> 
> CC: Larry Finger 
> Signed-off-by: Julia Lawall 
> Signed-off-by: Fengguang Wu 
> Acked-by: Larry Finger 

Patch applied to wireless-drivers-next.git, thanks.

7546bba385b4 rtlwifi: btcoexist: fix semicolon.cocci warnings

-- 
https://patchwork.kernel.org/patch/9565451/

Documentation about submitting wireless patches and checking status
from patchwork:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches



Re: orinoco: Use net_device_stats from struct net_device

2017-02-15 Thread Kalle Valo
Tobias Klauser  wrote:
> Instead of using a private copy of struct net_device_stats in
> struct orinoco_private, use stats from struct net_device. Also remove
> the now unnecessary .ndo_get_stats function.
> 
> Signed-off-by: Tobias Klauser 

Patch applied to wireless-drivers-next.git, thanks.

3a6282045b22 orinoco: Use net_device_stats from struct net_device

-- 
https://patchwork.kernel.org/patch/9566717/




Re: brcmfmac: Use net_device_stats from struct net_device

2017-02-15 Thread Kalle Valo
Tobias Klauser  wrote:
> Instead of using a private copy of struct net_device_stats in struct
> brcm_if, use stats from struct net_device.  Also remove the now
> unnecessary .ndo_get_stats function.
> 
> Signed-off-by: Tobias Klauser 
> Acked-by: Arend van Spriel 

Patch applied to wireless-drivers-next.git, thanks.

91b632803ee4 brcmfmac: Use net_device_stats from struct net_device

-- 
https://patchwork.kernel.org/patch/9569271/




Re: rt2500usb: don't mark register accesses as inline

2017-02-15 Thread Kalle Valo
Arnd Bergmann  wrote:
> When CONFIG_KASAN is set, we get a rather large stack here:
> 
> drivers/net/wireless/ralink/rt2x00/rt2500usb.c: In function 'rt2500usb_set_device_state':
> drivers/net/wireless/ralink/rt2x00/rt2500usb.c:1074:1: error: the frame size of 3032 bytes is larger than 100 bytes [-Werror=frame-larger-than=]
> 
> If we don't force those functions to be inline, the compiler can figure this
> out better itself and not inline the functions when doing so would be harmful,
> reducing the stack size to a mere 256 bytes.
> 
> Note that there is another problem that manifests in this driver, as a result
> of the typecheck() macro causing even larger stack frames.
> 
> Signed-off-by: Arnd Bergmann 

Patch applied to wireless-drivers-next.git, thanks.

727241660912 rt2500usb: don't mark register accesses as inline

-- 
https://patchwork.kernel.org/patch/9572947/




[PATCH v3 0/8] misc patches

2017-02-15 Thread Corentin Labbe
Hello

This is a follow-up to my previous stmmac series which addresses some
comments made on v2.

Corentin Labbe (8):
  net: stmmac: remove useless parenthesis
  net: stmmac: likely is useless in occasional function
  net: stmmac: use SPEED_UNKNOWN/DUPLEX_UNKNOWN
  net: stmmac: set speed at SPEED_UNKNOWN in case of broken speed
  net: stmmac: run stmmac_hw_fix_mac_speed when speed is valid
  net: stmmac: split the stmmac_adjust_link 10/100 case
  net: stmmac: reduce indentation by adding a continue
  net: stmmac: invert the logic for dumping regs

 .../net/ethernet/stmicro/stmmac/stmmac_ethtool.c   | 18 ++---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 40 ++-
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  | 82 +++---
 3 files changed, 71 insertions(+), 69 deletions(-)

-- 
2.10.2



[PATCH v3 1/8] net: stmmac: remove useless parenthesis

2017-02-15 Thread Corentin Labbe
This patch removes some useless parentheses.

Signed-off-by: Corentin Labbe 
Acked-by: Giuseppe Cavallaro 
Reviewed-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8 
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 7251871..ee1dbf4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -716,15 +716,15 @@ static void stmmac_adjust_link(struct net_device *dev)
new_state = 1;
switch (phydev->speed) {
case 1000:
-   if (likely((priv->plat->has_gmac) ||
-  (priv->plat->has_gmac4)))
+   if (likely(priv->plat->has_gmac ||
+  priv->plat->has_gmac4))
ctrl &= ~priv->hw->link.port;
stmmac_hw_fix_mac_speed(priv);
break;
case 100:
case 10:
-   if (likely((priv->plat->has_gmac) ||
-  (priv->plat->has_gmac4))) {
+   if (likely(priv->plat->has_gmac ||
+  priv->plat->has_gmac4)) {
ctrl |= priv->hw->link.port;
if (phydev->speed == SPEED_100) {
ctrl |= priv->hw->link.speed;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index d9893cf..a695773 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -257,7 +257,7 @@ int stmmac_mdio_register(struct net_device *ndev)
 * If an IRQ was provided to be assigned after
 * the bus probe, do it here.
 */
-   if ((!mdio_bus_data->irqs) &&
+   if (!mdio_bus_data->irqs &&
(mdio_bus_data->probed_phy_irq > 0)) {
new_bus->irq[addr] =
mdio_bus_data->probed_phy_irq;
-- 
2.10.2



[PATCH v3 2/8] net: stmmac: likely is useless in occasional function

2017-02-15 Thread Corentin Labbe
The stmmac_adjust_link() function is called too rarely for the
likely() macros to be useful.
Just remove the likely annotations from it.

Signed-off-by: Corentin Labbe 
Acked-by: Giuseppe Cavallaro 
Reviewed-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ee1dbf4..511c47c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -716,15 +716,15 @@ static void stmmac_adjust_link(struct net_device *dev)
new_state = 1;
switch (phydev->speed) {
case 1000:
-   if (likely(priv->plat->has_gmac ||
-  priv->plat->has_gmac4))
+   if (priv->plat->has_gmac ||
+   priv->plat->has_gmac4)
ctrl &= ~priv->hw->link.port;
stmmac_hw_fix_mac_speed(priv);
break;
case 100:
case 10:
-   if (likely(priv->plat->has_gmac ||
-  priv->plat->has_gmac4)) {
+   if (priv->plat->has_gmac ||
+   priv->plat->has_gmac4) {
ctrl |= priv->hw->link.port;
if (phydev->speed == SPEED_100) {
ctrl |= priv->hw->link.speed;
-- 
2.10.2



[PATCH v3 3/8] net: stmmac: use SPEED_UNKNOWN/DUPLEX_UNKNOWN

2017-02-15 Thread Corentin Labbe
It is better to use DUPLEX_UNKNOWN instead of just "-1".
Using 0 for an invalid speed is bad since 0 is a valid speed value,
so this patch replaces 0 with SPEED_UNKNOWN.

Signed-off-by: Corentin Labbe 
Acked-by: Giuseppe Cavallaro 
Reviewed-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 511c47c..a87071d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -754,8 +754,8 @@ static void stmmac_adjust_link(struct net_device *dev)
} else if (priv->oldlink) {
new_state = 1;
priv->oldlink = 0;
-   priv->speed = 0;
-   priv->oldduplex = -1;
+   priv->speed = SPEED_UNKNOWN;
+   priv->oldduplex = DUPLEX_UNKNOWN;
}
 
if (new_state && netif_msg_link(priv))
@@ -817,8 +817,8 @@ static int stmmac_init_phy(struct net_device *dev)
int interface = priv->plat->interface;
int max_speed = priv->plat->max_speed;
priv->oldlink = 0;
-   priv->speed = 0;
-   priv->oldduplex = -1;
+   priv->speed = SPEED_UNKNOWN;
+   priv->oldduplex = DUPLEX_UNKNOWN;
 
if (priv->plat->phy_node) {
phydev = of_phy_connect(dev, priv->plat->phy_node,
@@ -3434,8 +3434,8 @@ int stmmac_suspend(struct device *dev)
spin_unlock_irqrestore(&priv->lock, flags);
 
priv->oldlink = 0;
-   priv->speed = 0;
-   priv->oldduplex = -1;
+   priv->speed = SPEED_UNKNOWN;
+   priv->oldduplex = DUPLEX_UNKNOWN;
return 0;
 }
 EXPORT_SYMBOL_GPL(stmmac_suspend);
-- 
2.10.2



Re: [PATCH net-next] mlx4: do not use rwlock in fast path

2017-02-15 Thread Tariq Toukan



On 14/02/2017 6:28 PM, David Miller wrote:

From: Eric Dumazet 
Date: Thu, 09 Feb 2017 09:10:04 -0800


From: Eric Dumazet 

Using a reader-writer lock in fast path is silly, when we can
instead use RCU or a seqlock.

For mlx4 hwstamp clock, a seqlock is the way to go, removing
two atomic operations and false sharing.

Signed-off-by: Eric Dumazet 
Cc: Tariq Toukan 

Tariq or someone else at Mellanox please review, this patch has been
rotting for 5 days in patchwork.

Thank you.

Reviewed-by: Tariq Toukan 
Thanks.


[PATCH v3 4/8] net: stmmac: set speed at SPEED_UNKNOWN in case of broken speed

2017-02-15 Thread Corentin Labbe
When an invalid speed is given, stmmac_adjust_link() still records it
as the current speed.
This patch modifies the default case to set the speed to SPEED_UNKNOWN
if it is not 10/100/1000.

Signed-off-by: Corentin Labbe 
Acked-by: Giuseppe Cavallaro 
Reviewed-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index a87071d..f7664b9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -739,6 +739,7 @@ static void stmmac_adjust_link(struct net_device *dev)
default:
netif_warn(priv, link, priv->dev,
   "broken speed: %d\n", phydev->speed);
+   phydev->speed = SPEED_UNKNOWN;
break;
}
 
-- 
2.10.2



[PATCH v3 7/8] net: stmmac: reduce indentation by adding a continue

2017-02-15 Thread Corentin Labbe
As suggested by Joe Perches, replacing the "if phydev" logic with a
continue permits reducing the indentation in the for loop.

Signed-off-by: Corentin Labbe 
Acked-by: Giuseppe Cavallaro 
Reviewed-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 82 +++
 1 file changed, 40 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
index a695773..db157a4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
@@ -247,50 +247,48 @@ int stmmac_mdio_register(struct net_device *ndev)
found = 0;
for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
struct phy_device *phydev = mdiobus_get_phy(new_bus, addr);
+   int act = 0;
+   char irq_num[4];
+   char *irq_str;
+
+   if (!phydev)
+   continue;
+
+   /*
+* If an IRQ was provided to be assigned after
+* the bus probe, do it here.
+*/
+   if (!mdio_bus_data->irqs &&
+   (mdio_bus_data->probed_phy_irq > 0)) {
+   new_bus->irq[addr] = mdio_bus_data->probed_phy_irq;
+   phydev->irq = mdio_bus_data->probed_phy_irq;
+   }
 
-   if (phydev) {
-   int act = 0;
-   char irq_num[4];
-   char *irq_str;
-
-   /*
-* If an IRQ was provided to be assigned after
-* the bus probe, do it here.
-*/
-   if (!mdio_bus_data->irqs &&
-   (mdio_bus_data->probed_phy_irq > 0)) {
-   new_bus->irq[addr] =
-   mdio_bus_data->probed_phy_irq;
-   phydev->irq = mdio_bus_data->probed_phy_irq;
-   }
-
-   /*
-* If we're going to bind the MAC to this PHY bus,
-* and no PHY number was provided to the MAC,
-* use the one probed here.
-*/
-   if (priv->plat->phy_addr == -1)
-   priv->plat->phy_addr = addr;
-
-   act = (priv->plat->phy_addr == addr);
-   switch (phydev->irq) {
-   case PHY_POLL:
-   irq_str = "POLL";
-   break;
-   case PHY_IGNORE_INTERRUPT:
-   irq_str = "IGNORE";
-   break;
-   default:
-   sprintf(irq_num, "%d", phydev->irq);
-   irq_str = irq_num;
-   break;
-   }
-   netdev_info(ndev, "PHY ID %08x at %d IRQ %s (%s)%s\n",
-   phydev->phy_id, addr,
-   irq_str, phydev_name(phydev),
-   act ? " active" : "");
-   found = 1;
+   /*
+* If we're going to bind the MAC to this PHY bus,
+* and no PHY number was provided to the MAC,
+* use the one probed here.
+*/
+   if (priv->plat->phy_addr == -1)
+   priv->plat->phy_addr = addr;
+
+   act = (priv->plat->phy_addr == addr);
+   switch (phydev->irq) {
+   case PHY_POLL:
+   irq_str = "POLL";
+   break;
+   case PHY_IGNORE_INTERRUPT:
+   irq_str = "IGNORE";
+   break;
+   default:
+   sprintf(irq_num, "%d", phydev->irq);
+   irq_str = irq_num;
+   break;
}
+   netdev_info(ndev, "PHY ID %08x at %d IRQ %s (%s)%s\n",
+   phydev->phy_id, addr, irq_str, phydev_name(phydev),
+   act ? " active" : "");
+   found = 1;
}
 
if (!found && !mdio_node) {
-- 
2.10.2



[PATCH v3 8/8] net: stmmac: invert the logic for dumping regs

2017-02-15 Thread Corentin Labbe
It is easier to follow the logic after removing the not operator.

Signed-off-by: Corentin Labbe 
Acked-by: Giuseppe Cavallaro 
Reviewed-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index aab895d..5ff6bc4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -442,24 +442,24 @@ static void stmmac_ethtool_gregs(struct net_device *dev,
 
memset(reg_space, 0x0, REG_SPACE_SIZE);
 
-   if (!(priv->plat->has_gmac || priv->plat->has_gmac4)) {
+   if (priv->plat->has_gmac || priv->plat->has_gmac4) {
/* MAC registers */
-   for (i = 0; i < 12; i++)
+   for (i = 0; i < 55; i++)
reg_space[i] = readl(priv->ioaddr + (i * 4));
/* DMA registers */
-   for (i = 0; i < 9; i++)
-   reg_space[i + 12] =
+   for (i = 0; i < 22; i++)
+   reg_space[i + 55] =
readl(priv->ioaddr + (DMA_BUS_MODE + (i * 4)));
-   reg_space[22] = readl(priv->ioaddr + DMA_CUR_TX_BUF_ADDR);
-   reg_space[23] = readl(priv->ioaddr + DMA_CUR_RX_BUF_ADDR);
} else {
/* MAC registers */
-   for (i = 0; i < 55; i++)
+   for (i = 0; i < 12; i++)
reg_space[i] = readl(priv->ioaddr + (i * 4));
/* DMA registers */
-   for (i = 0; i < 22; i++)
-   reg_space[i + 55] =
+   for (i = 0; i < 9; i++)
+   reg_space[i + 12] =
readl(priv->ioaddr + (DMA_BUS_MODE + (i * 4)));
+   reg_space[22] = readl(priv->ioaddr + DMA_CUR_TX_BUF_ADDR);
+   reg_space[23] = readl(priv->ioaddr + DMA_CUR_RX_BUF_ADDR);
}
 }
 
-- 
2.10.2



[PATCH v3 5/8] net: stmmac: run stmmac_hw_fix_mac_speed when speed is valid

2017-02-15 Thread Corentin Labbe
This patch factors the code a bit by running stmmac_hw_fix_mac_speed()
once after the switch, when the speed is valid.

Signed-off-by: Corentin Labbe 
Acked-by: Giuseppe Cavallaro 
Reviewed-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index f7664b9..bebe810 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -719,7 +719,6 @@ static void stmmac_adjust_link(struct net_device *dev)
if (priv->plat->has_gmac ||
priv->plat->has_gmac4)
ctrl &= ~priv->hw->link.port;
-   stmmac_hw_fix_mac_speed(priv);
break;
case 100:
case 10:
@@ -734,7 +733,6 @@ static void stmmac_adjust_link(struct net_device *dev)
} else {
ctrl &= ~priv->hw->link.port;
}
-   stmmac_hw_fix_mac_speed(priv);
break;
default:
netif_warn(priv, link, priv->dev,
@@ -742,7 +740,8 @@ static void stmmac_adjust_link(struct net_device *dev)
phydev->speed = SPEED_UNKNOWN;
break;
}
-
+   if (phydev->speed != SPEED_UNKNOWN)
+   stmmac_hw_fix_mac_speed(priv);
priv->speed = phydev->speed;
}
 
-- 
2.10.2



[PATCH v3 6/8] net: stmmac: split the stmmac_adjust_link 10/100 case

2017-02-15 Thread Corentin Labbe
The 10/100 case has too many nested ifs.
This patch splits it to remove one of them.

Signed-off-by: Corentin Labbe 
Acked-by: Giuseppe Cavallaro 
Reviewed-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index bebe810..3cbe096 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -721,15 +721,19 @@ static void stmmac_adjust_link(struct net_device *dev)
ctrl &= ~priv->hw->link.port;
break;
case 100:
+   if (priv->plat->has_gmac ||
+   priv->plat->has_gmac4) {
+   ctrl |= priv->hw->link.port;
+   ctrl |= priv->hw->link.speed;
+   } else {
+   ctrl &= ~priv->hw->link.port;
+   }
+   break;
case 10:
if (priv->plat->has_gmac ||
priv->plat->has_gmac4) {
ctrl |= priv->hw->link.port;
-   if (phydev->speed == SPEED_100) {
-   ctrl |= priv->hw->link.speed;
-   } else {
-   ctrl &= ~(priv->hw->link.speed);
-   }
+   ctrl &= ~(priv->hw->link.speed);
} else {
ctrl &= ~priv->hw->link.port;
}
-- 
2.10.2



Re: [PATCH] net: hip04: Omit private ndo_get_stats function

2017-02-15 Thread Joe Perches
On Tue, 2017-02-14 at 15:10 +0100, Tobias Klauser wrote:
> hip04_get_stats() just returns dev->stats so we can leave it
> out altogether and let dev_get_stats() do the job.

This could be done for at least a few more drivers:

drivers/net/ethernet/chelsio/cxgb/sge.c
drivers/net/ethernet/intel/e1000/e1000_main.c
drivers/net/ethernet/intel/ixgb/ixgb_main.c
drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c



Re: [PATCH] net: hip04: Omit private ndo_get_stats function

2017-02-15 Thread Tobias Klauser
On 2017-02-15 at 11:28:08 +0100, Joe Perches  wrote:
> On Tue, 2017-02-14 at 15:10 +0100, Tobias Klauser wrote:
> > hip04_get_stats() just returns dev->stats so we can leave it
> > out altogether and let dev_get_stats() do the job.
> 
> This could be done for at least a few more drivers:
> 
> drivers/net/ethernet/chelsio/cxgb/sge.c

I don't see an .ndo_get_stats being defined/set in this file. AFAICT,
this driver sets .ndo_get_stats to t1_get_stats(), which does some
additional extraction of statistics from device registers and thus
shouldn't be removed.

> drivers/net/ethernet/intel/e1000/e1000_main.c
> drivers/net/ethernet/intel/ixgb/ixgb_main.c

Will cover these in follow-up patches, thanks.

> drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c

A patch for pch_gbe already landed in net-next.


Re: [PATCH] net: hip04: Omit private ndo_get_stats function

2017-02-15 Thread Joe Perches
On Wed, 2017-02-15 at 11:38 +0100, Tobias Klauser wrote:
> On 2017-02-15 at 11:28:08 +0100, Joe Perches  wrote:
> > On Tue, 2017-02-14 at 15:10 +0100, Tobias Klauser wrote:
> > > hip04_get_stats() just returns dev->stats so we can leave it
> > > out altogether and let dev_get_stats() do the job.
> > 
> > This could be done for at least a few more drivers:
> > 
> > drivers/net/ethernet/chelsio/cxgb/sge.c
> 
> I don't see an .ndo_get_stats being defined/set in this file. AFAICT,
> this driver sets .ndo_get_stats to t1_get_stats(), which does some
> additional extraction of statistics from device registers and thus
> shouldn't be removed.

Right.

> > drivers/net/ethernet/intel/e1000/e1000_main.c
> > drivers/net/ethernet/intel/ixgb/ixgb_main.c
> 
> Will cover these in follow-up patches, thanks.

Thanks.

> > drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
> 
> A patch for pch_gbe already landed in net-next.

Great, cheers, Joe


[patch net-next] sched: have stub for tcf_destroy_chain in case NET_CLS is not configured

2017-02-15 Thread Jiri Pirko
From: Jiri Pirko 

This fixes a broken build for !NET_CLS:

net/built-in.o: In function `fq_codel_destroy':
/home/sab/linux/net-next/net/sched/sch_fq_codel.c:468: undefined reference to `tcf_destroy_chain'

Fixes: cf1facda2f61 ("sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api")
Reported-by: Sabrina Dubroca 
Signed-off-by: Jiri Pirko 
---
 include/net/pkt_cls.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 71b266c..be5c12a 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -17,7 +17,13 @@ struct tcf_walker {
 int register_tcf_proto_ops(struct tcf_proto_ops *ops);
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
+#ifdef CONFIG_NET_CLS
 void tcf_destroy_chain(struct tcf_proto __rcu **fl);
+#else
+static inline void tcf_destroy_chain(struct tcf_proto __rcu **fl)
+{
+}
+#endif
 
 static inline unsigned long
 __cls_set_class(unsigned long *clp, unsigned long cl)
-- 
2.7.4



Re: [PATCH net] packet: Do not call fanout_release from atomic contexts

2017-02-15 Thread Anoob Soman

On 13/02/17 14:50, Anoob Soman wrote:

On 13/02/17 14:26, Eric Dumazet wrote:

On Mon, 2017-02-13 at 13:28 +, Anoob Soman wrote:


Wouldn't it be easier to call synchronize_net() before calling
fanout_release_data() and kfree(f)?
The behavior, wrt synchronize_net, would be the same as before and
fanout_release() will clean up everything without leaving any residue.

So we would require two synchronize_net() calls instead of one ?

synchronize_net() is very expensive on some hosts, it is a big hammer.





Yes, one before fanout_release_data() (will be called only if 
fanout->sk_ref == 0) and one after fanout_release().


I understand synchronize_net() is expensive, but adding another 
synchronize_net(),  before fanout_release_data(), will be no different 
from what we have in the existing code.


I can also make sure the second synchronize_net() doesn't get called
again if fanout_release() already calls synchronize_net(), by making
fanout_release() return something to indicate it has done so.


Hi Eric,

Did you get a chance to looks at my comments ?


[PATCH net-next] ixgb: Omit private ndo_get_stats function

2017-02-15 Thread Tobias Klauser
ixgb_get_stats() just returns dev->stats so we can leave it
out altogether and let dev_get_stats() do the job.

Suggested-by: Joe Perches 
Signed-off-by: Tobias Klauser 
---
 drivers/net/ethernet/intel/ixgb/ixgb_main.c | 16 
 1 file changed, 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
index fbd220d137b3..5a713199653c 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
@@ -86,7 +86,6 @@ static void ixgb_set_multi(struct net_device *netdev);
 static void ixgb_watchdog(unsigned long data);
 static netdev_tx_t ixgb_xmit_frame(struct sk_buff *skb,
   struct net_device *netdev);
-static struct net_device_stats *ixgb_get_stats(struct net_device *netdev);
 static int ixgb_change_mtu(struct net_device *netdev, int new_mtu);
 static int ixgb_set_mac(struct net_device *netdev, void *p);
 static irqreturn_t ixgb_intr(int irq, void *data);
@@ -367,7 +366,6 @@ static const struct net_device_ops ixgb_netdev_ops = {
.ndo_open   = ixgb_open,
.ndo_stop   = ixgb_close,
.ndo_start_xmit = ixgb_xmit_frame,
-   .ndo_get_stats  = ixgb_get_stats,
.ndo_set_rx_mode= ixgb_set_multi,
.ndo_validate_addr  = eth_validate_addr,
.ndo_set_mac_address= ixgb_set_mac,
@@ -1597,20 +1595,6 @@ ixgb_tx_timeout_task(struct work_struct *work)
 }
 
 /**
- * ixgb_get_stats - Get System Network Statistics
- * @netdev: network interface device structure
- *
- * Returns the address of the device statistics structure.
- * The statistics are actually updated from the timer callback.
- **/
-
-static struct net_device_stats *
-ixgb_get_stats(struct net_device *netdev)
-{
-   return &netdev->stats;
-}
-
-/**
  * ixgb_change_mtu - Change the Maximum Transfer Unit
  * @netdev: network interface device structure
  * @new_mtu: new value for maximum frame size
-- 
2.11.0




[PATCH net-next] e1000: Omit private ndo_get_stats function

2017-02-15 Thread Tobias Klauser
e1000_get_stats() just returns dev->stats so we can leave it
out altogether and let dev_get_stats() do the job.

Suggested-by: Joe Perches 
Signed-off-by: Tobias Klauser 
---
 drivers/net/ethernet/intel/e1000/e1000_main.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 93fc6c67306b..bd8b05fe8258 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -131,7 +131,6 @@ static void e1000_watchdog(struct work_struct *work);
 static void e1000_82547_tx_fifo_stall_task(struct work_struct *work);
 static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
struct net_device *netdev);
-static struct net_device_stats *e1000_get_stats(struct net_device *netdev);
 static int e1000_change_mtu(struct net_device *netdev, int new_mtu);
 static int e1000_set_mac(struct net_device *netdev, void *p);
 static irqreturn_t e1000_intr(int irq, void *data);
@@ -846,7 +845,6 @@ static const struct net_device_ops e1000_netdev_ops = {
.ndo_open   = e1000_open,
.ndo_stop   = e1000_close,
.ndo_start_xmit = e1000_xmit_frame,
-   .ndo_get_stats  = e1000_get_stats,
.ndo_set_rx_mode= e1000_set_rx_mode,
.ndo_set_mac_address= e1000_set_mac,
.ndo_tx_timeout = e1000_tx_timeout,
@@ -3530,19 +3528,6 @@ static void e1000_reset_task(struct work_struct *work)
 }
 
 /**
- * e1000_get_stats - Get System Network Statistics
- * @netdev: network interface device structure
- *
- * Returns the address of the device statistics structure.
- * The statistics are actually updated from the watchdog.
- **/
-static struct net_device_stats *e1000_get_stats(struct net_device *netdev)
-{
-   /* only return the current stats */
-   return &netdev->stats;
-}
-
-/**
  * e1000_change_mtu - Change the Maximum Transfer Unit
  * @netdev: network interface device structure
  * @new_mtu: new value for maximum frame size
-- 
2.11.0




[patch net-next] mlxsw: acl: Use PBS type for forward action

2017-02-15 Thread Jiri Pirko
From: Jiri Pirko 

Current behaviour of the "mirred redirect" action (forward) offload is a
bit odd. For matched packets the action forwards them to the desired
destination, but it also lets duplicates of the packets continue down
the original path (bridge, router, etc). That is more like "mirred
mirror". Fix this by using the PBS type, which behaves exactly like
"mirred redirect". Note that PBS does not support loopback mode.

Fixes: 4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support")
Signed-off-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
---
 .../ethernet/mellanox/mlxsw/core_acl_flex_actions.c  | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c 
b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
index 42bb18f..5f337715 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_actions.c
@@ -651,17 +651,16 @@ int mlxsw_afa_block_append_fwd(struct mlxsw_afa_block 
*block,
   u8 local_port, bool in_port)
 {
struct mlxsw_afa_fwd_entry_ref *fwd_entry_ref;
-   u32 kvdl_index = 0;
+   u32 kvdl_index;
char *act;
int err;
 
-   if (!in_port) {
-   fwd_entry_ref = mlxsw_afa_fwd_entry_ref_create(block,
-  local_port);
-   if (IS_ERR(fwd_entry_ref))
-   return PTR_ERR(fwd_entry_ref);
-   kvdl_index = fwd_entry_ref->fwd_entry->kvdl_index;
-   }
+   if (in_port)
+   return -EOPNOTSUPP;
+   fwd_entry_ref = mlxsw_afa_fwd_entry_ref_create(block, local_port);
+   if (IS_ERR(fwd_entry_ref))
+   return PTR_ERR(fwd_entry_ref);
+   kvdl_index = fwd_entry_ref->fwd_entry->kvdl_index;
 
act = mlxsw_afa_block_append_action(block, MLXSW_AFA_FORWARD_CODE,
MLXSW_AFA_FORWARD_SIZE);
@@ -669,13 +668,12 @@ int mlxsw_afa_block_append_fwd(struct mlxsw_afa_block 
*block,
err = -ENOBUFS;
goto err_append_action;
}
-   mlxsw_afa_forward_pack(act, MLXSW_AFA_FORWARD_TYPE_OUTPUT,
+   mlxsw_afa_forward_pack(act, MLXSW_AFA_FORWARD_TYPE_PBS,
   kvdl_index, in_port);
return 0;
 
 err_append_action:
-   if (!in_port)
-   mlxsw_afa_fwd_entry_ref_destroy(block, fwd_entry_ref);
+   mlxsw_afa_fwd_entry_ref_destroy(block, fwd_entry_ref);
return err;
 }
 EXPORT_SYMBOL(mlxsw_afa_block_append_fwd);
-- 
2.7.4



Re: [PATCH net-next] mlx4: do not fire tasklet unless necessary

2017-02-15 Thread Saeed Mahameed
On Fri, Feb 10, 2017 at 2:27 PM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> All rx and tx netdev interrupts are handled respectively
> by mlx4_en_rx_irq() and mlx4_en_tx_irq(), which simply schedule a NAPI.
>
> But mlx4_eq_int() also fires a tasklet to service all items that were
> queued via mlx4_add_cq_to_tasklet(), but this handler was not called
> unless user cqe was handled.
>
> This is very confusing, as "mpstat -I SCPU ..." shows a huge number of
> tasklet invocations.
>
> This patch saves this overhead, by carefully firing the tasklet directly
> from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.
>
> Signed-off-by: Eric Dumazet 
> Cc: Tariq Toukan 
> Cc: Saeed Mahameed 
> ---
>  drivers/net/ethernet/mellanox/mlx4/cq.c |6 +-
>  drivers/net/ethernet/mellanox/mlx4/eq.c |9 +
>  2 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c 
> b/drivers/net/ethernet/mellanox/mlx4/cq.c
> index 
> 6b8635378f1fcb2aae4e8ac390bcd09d552c2256..fa6d2354a0e910ee160863e3cbe21a512d77bf03
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/cq.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
> @@ -81,8 +81,9 @@ void mlx4_cq_tasklet_cb(unsigned long data)
>
>  static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
>  {
> -   unsigned long flags;
> struct mlx4_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
> +   unsigned long flags;
> +   bool kick;
>
> spin_lock_irqsave(&tasklet_ctx->lock, flags);
> /* When migrating CQs between EQs will be implemented, please note
> @@ -92,7 +93,10 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
>  */
> if (list_empty_careful(&cq->tasklet_ctx.list)) {
> atomic_inc(&cq->refcount);
> +   kick = list_empty(&tasklet_ctx->list);

So the first one in would fire the tasklet, but wouldn't this cause CQE
processing loss in the same mlx4_eq_int() loop, if the tasklet is fast
enough to schedule while other CQEs are still adding themselves to
tasklet_ctx->list?

Anyway, I tried to find race scenarios that could cause such a thing, but
the synchronization looks good.

> list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
> +   if (kick)
> +   tasklet_schedule(&tasklet_ctx->task);
> }
> spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
>  }
> diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c 
> b/drivers/net/ethernet/mellanox/mlx4/eq.c
> index 
> 0509996957d9664b612358dd805359f4bc67b8dc..39232b6a974f4b4b961d3b0b8634f04e6b9d0caa
>  100644
> --- a/drivers/net/ethernet/mellanox/mlx4/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
> @@ -494,7 +494,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct 
> mlx4_eq *eq)
>  {
> struct mlx4_priv *priv = mlx4_priv(dev);
> struct mlx4_eqe *eqe;
> -   int cqn = -1;
> +   int cqn;
> int eqes_found = 0;
> int set_ci = 0;
> int port;
> @@ -840,13 +840,6 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct 
> mlx4_eq *eq)
>
> eq_set_ci(eq, 1);
>
> -   /* cqn is 24bit wide but is initialized such that its higher bits
> -* are ones too. Thus, if we got any event, cqn's high bits should be 
> off
> -* and we need to schedule the tasklet.
> -*/
> -   if (!(cqn & ~0xff))

what if we simply change this condition to:
if (!list_empty_careful(&eq->tasklet_ctx.list))

Wouldn't this be sort of equivalent to what you did? And this way we
would simply fire the tasklet only when needed, not on every
handled CQE.

> -   tasklet_schedule(&eq->tasklet_ctx.task);
> -
> return eqes_found;
>  }
>
>
>


Re: [PATCH RFC v2 ipsec-next 0/6] IPsec GRO layer decapsulation

2017-02-15 Thread Steffen Klassert
On Mon, Feb 13, 2017 at 10:57:55AM +0100, Steffen Klassert wrote:
> This patchset adds a software GRO codepath for IPsec ESP.
> The ESP gro_receive callback functions decapsulate the
> ESP packets at the GRO layer and reinject them back with
> gro_cells_receive(). This saves a complete round through
> the stack for IPsec ESP packets.
> 
> We also need this for ESP HW offload, because HW decrypt but
> does not decapsulate the packet. We need to decapsulate before
> the inbound policy check, otherwise this check will fail.
> 
> Patches 2 and 3 prepare the generic code for packet-consuming
> gro callbacks.
> 
> Changes from v1:
> 
> - Add the 'xfrm: Extend the sec_path for IPsec offloading'
>   patch and use the secpath to encode GRO calls.
> 
> - Coding style fixes.

I have not got any further feedback, so I applied this
series to ipsec-next.


Re: [PATCH net-next] mlx4: do not fire tasklet unless necessary

2017-02-15 Thread Eric Dumazet
On Wed, 2017-02-15 at 13:10 +0200, Saeed Mahameed wrote:
> On Fri, Feb 10, 2017 at 2:27 PM, Eric Dumazet  wrote:
> > From: Eric Dumazet 
> >
> > All rx and tx netdev interrupts are handled respectively
> > by mlx4_en_rx_irq() and mlx4_en_tx_irq(), which simply schedule a NAPI.
> >
> > But mlx4_eq_int() also fires a tasklet to service all items that were
> > queued via mlx4_add_cq_to_tasklet(), but this handler was not called
> > unless user cqe was handled.
> >
> > This is very confusing, as "mpstat -I SCPU ..." shows a huge number of
> > tasklet invocations.
> >
> > This patch saves this overhead, by carefully firing the tasklet directly
> > from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.
> >
> > Signed-off-by: Eric Dumazet 
> > Cc: Tariq Toukan 
> > Cc: Saeed Mahameed 
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/cq.c |6 +-
> >  drivers/net/ethernet/mellanox/mlx4/eq.c |9 +
> >  2 files changed, 6 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c 
> > b/drivers/net/ethernet/mellanox/mlx4/cq.c
> > index 
> > 6b8635378f1fcb2aae4e8ac390bcd09d552c2256..fa6d2354a0e910ee160863e3cbe21a512d77bf03
> >  100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/cq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
> > @@ -81,8 +81,9 @@ void mlx4_cq_tasklet_cb(unsigned long data)
> >
> >  static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
> >  {
> > -   unsigned long flags;
> > struct mlx4_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
> > +   unsigned long flags;
> > +   bool kick;
> >
> > spin_lock_irqsave(&tasklet_ctx->lock, flags);
> > /* When migrating CQs between EQs will be implemented, please note
> > @@ -92,7 +93,10 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
> >  */
> > if (list_empty_careful(&cq->tasklet_ctx.list)) {
> > atomic_inc(&cq->refcount);
> > +   kick = list_empty(&tasklet_ctx->list);
> 
> So the first one in would fire the tasklet, but wouldn't this cause CQE
> processing loss in the same mlx4_eq_int() loop, if the tasklet is fast
> enough to schedule while other CQEs are still adding themselves to
> tasklet_ctx->list?


mlx4_eq_int() is a hard irq handler.

How could a tasklet run in the middle of it?

A tasklet is a softirq handler.

A softirq must wait until the current hard irq handler is done.
> 
> Anyway, I tried to find race scenarios that could cause such a thing, but
> the synchronization looks good.
> 
> > list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
> > +   if (kick)
> > +   tasklet_schedule(&tasklet_ctx->task);
> > }
> > spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
> >  }
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c 
> > b/drivers/net/ethernet/mellanox/mlx4/eq.c
> > index 
> > 0509996957d9664b612358dd805359f4bc67b8dc..39232b6a974f4b4b961d3b0b8634f04e6b9d0caa
> >  100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/eq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
> > @@ -494,7 +494,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct 
> > mlx4_eq *eq)
> >  {
> > struct mlx4_priv *priv = mlx4_priv(dev);
> > struct mlx4_eqe *eqe;
> > -   int cqn = -1;
> > +   int cqn;
> > int eqes_found = 0;
> > int set_ci = 0;
> > int port;
> > @@ -840,13 +840,6 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct 
> > mlx4_eq *eq)
> >
> > eq_set_ci(eq, 1);
> >
> > -   /* cqn is 24bit wide but is initialized such that its higher bits
> > -* are ones too. Thus, if we got any event, cqn's high bits should 
> > be off
> > -* and we need to schedule the tasklet.
> > -*/
> > -   if (!(cqn & ~0xff))
> 
> what if we simply change this condition to:
> if (!list_empty_careful(&eq->tasklet_ctx.list))
> 
> Wouldn't this be sort of equivalent to what you did ? and this way we
> would simply fire the tasklet only when needed and not on every
> handled CQE.

Still this test would be done one million times per second on my hosts.

What is the point, exactly?

Thanks.




[PATCH] Implement full-functionality option for ECN encapsulation in tunnel

2017-02-15 Thread Vadim Fedorenko
IPVS tunnel mode works as a simple tunnel (see RFC 3168), copying the ECN
field to the outer header. That results in packet drops on egress tunnels
when the egress tunnel operates as ECN-capable with the Full-functionality
option (like the ip_tunnel and ip6_tunnel kernel modules), per the RFC 3168
section 9.1.1 recommendation.

This patch implements ECN full-functionality option into ipvs xmit code.

Cc: netdev@vger.kernel.org
Cc: lvs-de...@vger.kernel.org
Signed-off-by: Vadim Fedorenko 
Reviewed-by: Konstantin Khlebnikov 
---
 net/netfilter/ipvs/ip_vs_xmit.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 01d3d89..b3286f3 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -879,6 +879,7 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff 
*skb,
 {
struct sk_buff *new_skb = NULL;
struct iphdr *old_iph = NULL;
+   __u8 old_dsfield;
 #ifdef CONFIG_IP_VS_IPV6
struct ipv6hdr *old_ipv6h = NULL;
 #endif
@@ -903,7 +904,7 @@ static inline int ip_vs_send_or_cont(int pf, struct sk_buff 
*skb,
*payload_len =
ntohs(old_ipv6h->payload_len) +
sizeof(*old_ipv6h);
-   *dsfield = ipv6_get_dsfield(old_ipv6h);
+   old_dsfield = ipv6_get_dsfield(old_ipv6h);
*ttl = old_ipv6h->hop_limit;
if (df)
*df = 0;
@@ -918,12 +919,15 @@ static inline int ip_vs_send_or_cont(int pf, struct 
sk_buff *skb,
 
/* fix old IP header checksum */
ip_send_check(old_iph);
-   *dsfield = ipv4_get_dsfield(old_iph);
+   old_dsfield = ipv4_get_dsfield(old_iph);
*ttl = old_iph->ttl;
if (payload_len)
*payload_len = ntohs(old_iph->tot_len);
}
 
+   /* Implement full-functionality option for ECN encapsulation */
+   *dsfield = INET_ECN_encapsulate(old_dsfield, old_dsfield);
+
return skb;
 error:
kfree_skb(skb);
-- 
1.9.1



Re: [PATCH net] packet: Do not call fanout_release from atomic contexts

2017-02-15 Thread Eric Dumazet
On Wed, 2017-02-15 at 11:07 +, Anoob Soman wrote:
> On 13/02/17 14:50, Anoob Soman wrote:
> > On 13/02/17 14:26, Eric Dumazet wrote:
> >> On Mon, 2017-02-13 at 13:28 +, Anoob Soman wrote:
> >>
> >>> Wouldn't it be easier to call synchronize_net(), before calling
> >>> fanout_release_data() and kfree(f).
> >>> The behavior, wrt synchronize_net, would be same as before and
> >>> fanout_release() will cleanup everything without leaving any residue.
> >> So we would require two synchronize_net() calls instead of one ?
> >>
> >> synchronize_net() is very expensive on some hosts, it is a big hammer.
> >>
> >>
> >>
> >
> > Yes, one before fanout_release_data() (will be called only if 
> > fanout->sk_ref == 0) and one after fanout_release().
> >
> > I understand synchronize_net() is expensive, but adding another 
> > synchronize_net(),  before fanout_release_data(), will be no different 
> > from what we have in the existing code.
> >
> > I can also make sure second synchronize_net() doesn't get called 
> > again, if fanout_release() calls synchronize_net(), by making 
> > fanout_release() return something to indicate it has done 
> > synchronize_net().
> 
> Hi Eric,
> 
> Did you get a chance to look at my comments?

You misunderstood my suggestion. 

I simply suggested to move the code, not adding another
synchronize_net()





Re: [PATCH net-next] mlx4: do not fire tasklet unless necessary

2017-02-15 Thread Saeed Mahameed
On Wed, Feb 15, 2017 at 3:29 PM, Eric Dumazet  wrote:
> On Wed, 2017-02-15 at 13:10 +0200, Saeed Mahameed wrote:
>> On Fri, Feb 10, 2017 at 2:27 PM, Eric Dumazet  wrote:
>> > From: Eric Dumazet 
>> >
>> > All rx and tx netdev interrupts are handled respectively
>> > by mlx4_en_rx_irq() and mlx4_en_tx_irq(), which simply schedule a NAPI.
>> >
>> > But mlx4_eq_int() also fires a tasklet to service all items that were
>> > queued via mlx4_add_cq_to_tasklet(), but this handler was not called
>> > unless user cqe was handled.
>> >
>> > This is very confusing, as "mpstat -I SCPU ..." shows a huge number of
>> > tasklet invocations.
>> >
>> > This patch saves this overhead, by carefully firing the tasklet directly
>> > from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.
>> >
>> > Signed-off-by: Eric Dumazet 
>> > Cc: Tariq Toukan 
>> > Cc: Saeed Mahameed 
>> > ---
>> >  drivers/net/ethernet/mellanox/mlx4/cq.c |6 +-
>> >  drivers/net/ethernet/mellanox/mlx4/eq.c |9 +
>> >  2 files changed, 6 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c 
>> > b/drivers/net/ethernet/mellanox/mlx4/cq.c
>> > index 
>> > 6b8635378f1fcb2aae4e8ac390bcd09d552c2256..fa6d2354a0e910ee160863e3cbe21a512d77bf03
>> >  100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/cq.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
>> > @@ -81,8 +81,9 @@ void mlx4_cq_tasklet_cb(unsigned long data)
>> >
>> >  static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
>> >  {
>> > -   unsigned long flags;
>> > struct mlx4_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
>> > +   unsigned long flags;
>> > +   bool kick;
>> >
>> > spin_lock_irqsave(&tasklet_ctx->lock, flags);
>> > /* When migrating CQs between EQs will be implemented, please note
>> > @@ -92,7 +93,10 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
>> >  */
>> > if (list_empty_careful(&cq->tasklet_ctx.list)) {
>> > atomic_inc(&cq->refcount);
>> > +   kick = list_empty(&tasklet_ctx->list);
>>
>> So the first one in would fire the tasklet, but wouldn't this cause CQE
>> processing loss in the same mlx4_eq_int() loop, if the tasklet is fast
>> enough to schedule while other CQEs are still adding themselves to
>> tasklet_ctx->list?
>
>
> mlx4_eq_int() is a hard irq handler.
>
> How could a tasklet run in the middle of it?
>

Can the tasklet run on a different core?

> A tasklet is a softirq handler.
>
> A softirq must wait until the current hard irq handler is done.
>>
>> Anyway, I tried to find race scenarios that could cause such a thing, but
>> the synchronization looks good.
>>
>> > list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
>> > +   if (kick)
>> > +   tasklet_schedule(&tasklet_ctx->task);
>> > }
>> > spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
>> >  }
>> > diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c 
>> > b/drivers/net/ethernet/mellanox/mlx4/eq.c
>> > index 
>> > 0509996957d9664b612358dd805359f4bc67b8dc..39232b6a974f4b4b961d3b0b8634f04e6b9d0caa
>> >  100644
>> > --- a/drivers/net/ethernet/mellanox/mlx4/eq.c
>> > +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
>> > @@ -494,7 +494,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct 
>> > mlx4_eq *eq)
>> >  {
>> > struct mlx4_priv *priv = mlx4_priv(dev);
>> > struct mlx4_eqe *eqe;
>> > -   int cqn = -1;
>> > +   int cqn;
>> > int eqes_found = 0;
>> > int set_ci = 0;
>> > int port;
>> > @@ -840,13 +840,6 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct 
>> > mlx4_eq *eq)
>> >
>> > eq_set_ci(eq, 1);
>> >
>> > -   /* cqn is 24bit wide but is initialized such that its higher bits
>> > -* are ones too. Thus, if we got any event, cqn's high bits should 
>> > be off
>> > -* and we need to schedule the tasklet.
>> > -*/
>> > -   if (!(cqn & ~0xff))
>>
>> what if we simply change this condition to:
>> if (!list_empty_careful(&eq->tasklet_ctx.list))
>>
>> Wouldn't this be sort of equivalent to what you did ? and this way we
>> would simply fire the tasklet only when needed and not on every
>> handled CQE.
>
> Still this test would be done one million times per second on my hosts.
>
> What is the point, exactly?
>

the point is that if the EQ is full of CQEs from different CQs, you would
do the "kick = list_empty(&tasklet_ctx->list);" test once per CQ
rather than once at the end.

in mlx4_en's case you have only two CQs on each EQ, but in RoCE/IB you
can have as many CQs as you want.

> Thanks.
>
>


Re: [PATCH net-next] mlx4: do not fire tasklet unless necessary

2017-02-15 Thread Eric Dumazet
On Wed, 2017-02-15 at 05:29 -0800, Eric Dumazet wrote:

> 
> mlx4_eq_int() is a hard irq handler.
> 
> How could a tasklet run in the middle of it?
> 
> A tasklet is a softirq handler.

Speaking of mlx4_eq_int(), 50% of cycles are spent on mb() (mfence)
in eq_set_ci().

I wonder why this very expensive mb() is required right before exiting
the interrupt handler.
 



[RESEND PATCH 1/1] can: m_can: fix bitrate setup on latest silicon

2017-02-15 Thread Quentin Schulz
From: Florian Vallee 

According to the m_can user manual changelog, the BTP register layout was
updated with core revision 3.1.0.

This change is not backward-compatible, and using the current driver along
with a recent IP results in an incorrect bitrate on the wire.

Tested with a SAMA5D2 SoC (CREL = 0x31040730)

Signed-off-by: Florian Vallee 
Tested-by: Quentin Schulz 
---
 drivers/net/can/m_can/m_can.c | 38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 195f15e..246584e 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -105,6 +105,10 @@ enum m_can_mram_cfg {
MRAM_CFG_NUM,
 };
 
+/* Core Release Register (CREL) */
+#define CRR_REL_MASK   0xfff
+#define CRR_REL_SHIFT  20
+
 /* Fast Bit Timing & Prescaler Register (FBTP) */
 #define FBTR_FBRP_MASK 0x1f
 #define FBTR_FBRP_SHIFT16
@@ -136,7 +140,7 @@ enum m_can_mram_cfg {
 #define CCCR_INIT  BIT(0)
 #define CCCR_CANFD 0x10
 
-/* Bit Timing & Prescaler Register (BTP) */
+/* Bit Timing & Prescaler Register (BTP) (M_CAN IP < 3.1.0) */
 #define BTR_BRP_MASK   0x3ff
 #define BTR_BRP_SHIFT  16
 #define BTR_TSEG1_SHIFT8
@@ -146,6 +150,16 @@ enum m_can_mram_cfg {
 #define BTR_SJW_SHIFT  0
 #define BTR_SJW_MASK   0xf
 
+/* Nominal Bit Timing & Prescaler Register (NBTP) (M_CAN IP >= 3.1.0) */
+#define NBTR_SJW_SHIFT 25
+#define NBTR_SJW_MASK  (0x7f << NBTR_SJW_SHIFT)
+#define NBTR_BRP_SHIFT 16
+#define NBTR_BRP_MASK  (0x3ff << NBTR_BRP_SHIFT)
+#define NBTR_TSEG1_SHIFT   8
+#define NBTR_TSEG1_MASK(0xff << NBTR_TSEG1_SHIFT)
+#define NBTR_TSEG2_SHIFT   0
+#define NBTR_TSEG2_MASK(0x7f << NBTR_TSEG2_SHIFT)
+
 /* Error Counter Register(ECR) */
 #define ECR_RP BIT(15)
 #define ECR_REC_SHIFT  8
@@ -200,6 +214,9 @@ enum m_can_mram_cfg {
 IR_RF1L | IR_RF0L)
 #define IR_ERR_ALL (IR_ERR_STATE | IR_ERR_BUS)
 
+/* Core Version */
+#define M_CAN_COREREL_3_1_00x310
+
 /* Interrupt Line Select (ILS) */
 #define ILS_ALL_INT0   0x0
 #define ILS_ALL_INT1   0x
@@ -357,6 +374,13 @@ static inline void m_can_disable_all_interrupts(const 
struct m_can_priv *priv)
m_can_write(priv, M_CAN_ILE, 0x0);
 }
 
+static inline int m_can_read_core_rev(const struct m_can_priv *priv)
+{
+   u32 reg = m_can_read(priv, M_CAN_CREL);
+
+   return ((reg >> CRR_REL_SHIFT) & CRR_REL_MASK);
+}
+
 static void m_can_read_fifo(struct net_device *dev, u32 rxfs)
 {
struct net_device_stats *stats = &dev->stats;
@@ -811,8 +835,16 @@ static int m_can_set_bittiming(struct net_device *dev)
sjw = bt->sjw - 1;
tseg1 = bt->prop_seg + bt->phase_seg1 - 1;
tseg2 = bt->phase_seg2 - 1;
-   reg_btp = (brp << BTR_BRP_SHIFT) | (sjw << BTR_SJW_SHIFT) |
-   (tseg1 << BTR_TSEG1_SHIFT) | (tseg2 << BTR_TSEG2_SHIFT);
+
+   if (m_can_read_core_rev(priv) < M_CAN_COREREL_3_1_0)
+   reg_btp = (brp << BTR_BRP_SHIFT) | (sjw << BTR_SJW_SHIFT) |
+   (tseg1 << BTR_TSEG1_SHIFT) |
+   (tseg2 << BTR_TSEG2_SHIFT);
+   else
+   reg_btp = (brp << NBTR_BRP_SHIFT) | (sjw << NBTR_SJW_SHIFT) |
+   (tseg1 << NBTR_TSEG1_SHIFT) |
+   (tseg2 << NBTR_TSEG2_SHIFT);
+
m_can_write(priv, M_CAN_BTP, reg_btp);
 
if (priv->can.ctrlmode & CAN_CTRLMODE_FD) {
-- 
2.9.3



Re: [PATCH net-next] sctp: change to use uint_t in uapi sctp.h

2017-02-15 Thread Neil Horman
On Tue, Feb 14, 2017 at 11:26:04AM -0500, David Miller wrote:
> From: Xin Long 
> Date: Tue, 14 Feb 2017 16:23:48 +0800
> 
> > All structures in uapi sctp.h are exported for userspace, their members'
> > types should use uint_t instead of __u.
> 
> This is not true.
> 
> __u is in fact preferred for userspace exported datastructures.
> 

I'll admit that I can never remember which is which, but the files in uapi seem
pretty evenly split between the two types, and I always thought the uint_t
was the one meant for user space

Neil



Re: [RFC 2/2] net: emac: add support for device-tree based PHY discovery and setup

2017-02-15 Thread Andrew Lunn
> > > Is the PHY just powered down by chance (BMCR_PWRDN set?) and resetting
> > > it implicitly clears the power down that seems to be what is going on.
> > 
> > Yes, the PHY is just in the BMCR_PDOWN state. I can do the same
> > on the WNDR4700, by messing with u-boot:

Hi Christian

What happens if you list the PHYs in the device tree, with their PHY
ID? That should avoid it looking for the ID and getting 0x
back. It should just probe the correct PHY driver. If the first thing
the driver's probe function does is reset the power-down bit, it might
work.

Andrew


[PATCH iproute2 net-next 3/3] iplink: bridge_slave: add support for displaying xstats

2017-02-15 Thread Nikolay Aleksandrov
This patch adds support to the bridge_slave link type for displaying
xstats by reusing the previously added bridge xstats callbacks.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/ip_common.h   | 3 +++
 ip/iplink_bridge.c   | 6 +++---
 ip/iplink_bridge_slave.c | 2 ++
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/ip/ip_common.h b/ip/ip_common.h
index 071c3db280f2..9c3cd294d79e 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -97,6 +97,9 @@ struct link_util {
 struct link_util *get_link_kind(const char *kind);
 
 void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf, size_t len);
+int bridge_parse_xstats(struct link_util *lu, int argc, char **argv);
+int bridge_print_xstats(const struct sockaddr_nl *who,
+   struct nlmsghdr *n, void *arg);
 
 __u32 ipvrf_get_table(const char *name);
 int name_is_vrf(const char *name);
diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 62ceee6b571e..818b43c89b5b 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -680,8 +680,8 @@ static void bridge_print_stats_attr(FILE *f, struct rtattr 
*attr, int ifindex)
}
 }
 
-static int bridge_print_xstats(const struct sockaddr_nl *who,
-  struct nlmsghdr *n, void *arg)
+int bridge_print_xstats(const struct sockaddr_nl *who,
+   struct nlmsghdr *n, void *arg)
 {
struct if_stats_msg *ifsm = NLMSG_DATA(n);
struct rtattr *tb[IFLA_STATS_MAX+1];
@@ -708,7 +708,7 @@ static int bridge_print_xstats(const struct sockaddr_nl 
*who,
return 0;
 }
 
-static int bridge_parse_xstats(struct link_util *lu, int argc, char **argv)
+int bridge_parse_xstats(struct link_util *lu, int argc, char **argv)
 {
while (argc > 0) {
if (strcmp(*argv, "igmp") == 0 || strcmp(*argv, "mcast") == 0) {
diff --git a/ip/iplink_bridge_slave.c b/ip/iplink_bridge_slave.c
index 6353fc533bf9..3e883328ae0c 100644
--- a/ip/iplink_bridge_slave.c
+++ b/ip/iplink_bridge_slave.c
@@ -312,4 +312,6 @@ struct link_util bridge_slave_link_util = {
.print_opt  = bridge_slave_print_opt,
.parse_opt  = bridge_slave_parse_opt,
.print_help = bridge_slave_print_help,
+   .parse_ifla_xstats = bridge_parse_xstats,
+   .print_ifla_xstats = bridge_print_xstats,
 };
-- 
2.1.4



[PATCH iproute2 net-next 0/3] iplink: add support for link xstats

2017-02-15 Thread Nikolay Aleksandrov
Hi,
This set adds support for printing link xstats per link type. Currently
only the bridge and its ports support such a call, and it dumps the mcast
stats. This model makes it easy to use the same callback for both the
bridge and bridge_slave link types. Patch 01 also updates the man page
with the new xstats link option, and you can find an example in patch
02's commit message.

Thanks,
 Nik


Nikolay Aleksandrov (3):
  iplink: add support for xstats subcommand
  iplink: bridge: add support for displaying xstats
  iplink: bridge_slave: add support for displaying xstats

 ip/Makefile  |   2 +-
 ip/ip_common.h   |  12 +++-
 ip/iplink.c  |   5 ++
 ip/iplink_bridge.c   | 153 +++
 ip/iplink_bridge_slave.c |   2 +
 ip/iplink_xstats.c   |  81 +
 man/man8/ip-link.8.in|  12 
 7 files changed, 264 insertions(+), 3 deletions(-)
 create mode 100644 ip/iplink_xstats.c

-- 
2.1.4



[PATCH iproute2 net-next 2/3] iplink: bridge: add support for displaying xstats

2017-02-15 Thread Nikolay Aleksandrov
Add support for the new parse/print_ifla_xstats callbacks and use them to
print the per-bridge multicast stats.
Example:
$ ip link xstats type bridge
br0
IGMP queries:
  RX: v1 0 v2 0 v3 0
  TX: v1 0 v2 0 v3 0
IGMP reports:
  RX: v1 0 v2 0 v3 0
  TX: v1 0 v2 0 v3 0
IGMP leaves: RX: 0 TX: 0
IGMP parse errors: 0
MLD queries:
  RX: v1 0 v2 0
  TX: v1 0 v2 0
MLD reports:
  RX: v1 0 v2 0
  TX: v1 0 v2 0
MLD leaves: RX: 0 TX: 0
MLD parse errors: 0

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 153 +
 1 file changed, 153 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index a17ff3555488..62ceee6b571e 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -12,13 +12,19 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
+#include 
 
 #include "rt_names.h"
 #include "utils.h"
 #include "ip_common.h"
 
+static unsigned int xstats_print_attr;
+static int filter_index;
+
 static void print_explain(FILE *f)
 {
fprintf(f,
@@ -582,10 +588,157 @@ static void bridge_print_help(struct link_util *lu, int 
argc, char **argv,
print_explain(f);
 }
 
+static void bridge_print_xstats_help(struct link_util *lu, FILE *f)
+{
+   fprintf(f, "Usage: ... %s [ igmp ] [ dev DEVICE ]\n", lu->id);
+}
+
+static void bridge_print_stats_attr(FILE *f, struct rtattr *attr, int ifindex)
+{
+   struct rtattr *brtb[LINK_XSTATS_TYPE_MAX+1];
+   struct br_mcast_stats *mstats;
+   struct rtattr *i, *list;
+   const char *ifname = "";
+   int rem;
+
+   parse_rtattr(brtb, LINK_XSTATS_TYPE_MAX, RTA_DATA(attr),
+   RTA_PAYLOAD(attr));
+   if (!brtb[LINK_XSTATS_TYPE_BRIDGE])
+   return;
+
+   list = brtb[LINK_XSTATS_TYPE_BRIDGE];
+   rem = RTA_PAYLOAD(list);
+   for (i = RTA_DATA(list); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
+   if (xstats_print_attr && i->rta_type != xstats_print_attr)
+   continue;
+   switch (i->rta_type) {
+   case BRIDGE_XSTATS_MCAST:
+   mstats = RTA_DATA(i);
+   ifname = ll_index_to_name(ifindex);
+   fprintf(f, "%-16s\n", ifname);
+   fprintf(f, "%-16sIGMP queries:\n", "");
+   fprintf(f, "%-16s  RX: v1 %llu v2 %llu v3 %llu\n",
+   "",
+   mstats->igmp_v1queries[BR_MCAST_DIR_RX],
+   mstats->igmp_v2queries[BR_MCAST_DIR_RX],
+   mstats->igmp_v3queries[BR_MCAST_DIR_RX]);
+   fprintf(f, "%-16s  TX: v1 %llu v2 %llu v3 %llu\n",
+   "",
+   mstats->igmp_v1queries[BR_MCAST_DIR_TX],
+   mstats->igmp_v2queries[BR_MCAST_DIR_TX],
+   mstats->igmp_v3queries[BR_MCAST_DIR_TX]);
+
+   fprintf(f, "%-16sIGMP reports:\n", "");
+   fprintf(f, "%-16s  RX: v1 %llu v2 %llu v3 %llu\n",
+   "",
+   mstats->igmp_v1reports[BR_MCAST_DIR_RX],
+   mstats->igmp_v2reports[BR_MCAST_DIR_RX],
+   mstats->igmp_v3reports[BR_MCAST_DIR_RX]);
+   fprintf(f, "%-16s  TX: v1 %llu v2 %llu v3 %llu\n",
+   "",
+   mstats->igmp_v1reports[BR_MCAST_DIR_TX],
+   mstats->igmp_v2reports[BR_MCAST_DIR_TX],
+   mstats->igmp_v3reports[BR_MCAST_DIR_TX]);
+
+   fprintf(f, "%-16sIGMP leaves: RX: %llu TX: %llu\n",
+   "",
+   mstats->igmp_leaves[BR_MCAST_DIR_RX],
+   mstats->igmp_leaves[BR_MCAST_DIR_TX]);
+
+   fprintf(f, "%-16sIGMP parse errors: %llu\n",
+   "", mstats->igmp_parse_errors);
+
+   fprintf(f, "%-16sMLD queries:\n", "");
+   fprintf(f, "%-16s  RX: v1 %llu v2 %llu\n",
+   "",
+   mstats->mld_v1queries[BR_MCAST_DIR_RX],
+   mstats->mld_v2queries[BR_MCAST_DIR_RX]);
+   fprintf(f, "%-16s  TX: v1 %llu v2 %llu\n",
+   "",
+   mstats->mld_v1queries[BR_MCAST_DIR_TX],
+   mstats->mld_v2qu

[PATCH iproute2 net-next 1/3] iplink: add support for xstats subcommand

2017-02-15 Thread Nikolay Aleksandrov
This patch adds support for a new xstats link subcommand which uses the
specified link type's new parse/print_ifla_xstats callbacks to display
extended statistics.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/Makefile   |  2 +-
 ip/ip_common.h|  9 --
 ip/iplink.c   |  5 
 ip/iplink_xstats.c| 81 +++
 man/man8/ip-link.8.in | 12 
 5 files changed, 106 insertions(+), 3 deletions(-)
 create mode 100644 ip/iplink_xstats.c

diff --git a/ip/Makefile b/ip/Makefile
index 1928489e7f90..4276a34b529e 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -8,7 +8,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o 
ipnetns.o \
 link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \
 iplink_bridge.o iplink_bridge_slave.o ipfou.o iplink_ipvlan.o \
 iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o ipmacsec.o ipila.o \
-ipvrf.o
+ipvrf.o iplink_xstats.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/ip_common.h b/ip/ip_common.h
index ab6a83431fd6..071c3db280f2 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -61,6 +61,7 @@ int do_ipvrf(int argc, char **argv);
 void vrf_reset(void);
 
 int iplink_get(unsigned int flags, char *name, __u32 filt_mask);
+int iplink_ifla_xstats(int argc, char **argv);
 
 static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
 {
@@ -84,9 +85,13 @@ struct link_util {
void(*print_opt)(struct link_util *, FILE *,
 struct rtattr *[]);
void(*print_xstats)(struct link_util *, FILE *,
-struct rtattr *);
+   struct rtattr *);
void(*print_help)(struct link_util *, int, char **,
-FILE *);
+ FILE *);
+   int (*parse_ifla_xstats)(struct link_util *,
+int, char **);
+   int (*print_ifla_xstats)(const struct sockaddr_nl *,
+struct nlmsghdr *, void *);
 };
 
 struct link_util *get_link_kind(const char *kind);
diff --git a/ip/iplink.c b/ip/iplink.c
index 2638408c23b8..00fed9006ea6 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -98,6 +98,8 @@ void iplink_usage(void)
"\n"
"   ip link show [ DEVICE | group GROUP ] [up] [master DEV] 
[vrf NAME] [type TYPE]\n");
 
+   fprintf(stderr, "\n   ip link xstats type TYPE [ ARGS ]\n");
+
if (iplink_have_newlink()) {
fprintf(stderr,
"\n"
@@ -1411,6 +1413,9 @@ int do_iplink(int argc, char **argv)
matches(*argv, "list") == 0)
return ipaddr_list_link(argc-1, argv+1);
 
+   if (matches(*argv, "xstats") == 0)
+   return iplink_ifla_xstats(argc-1, argv+1);
+
if (matches(*argv, "help") == 0) {
do_help(argc-1, argv+1);
return 0;
diff --git a/ip/iplink_xstats.c b/ip/iplink_xstats.c
new file mode 100644
index ..10f953bc4584
--- /dev/null
+++ b/ip/iplink_xstats.c
@@ -0,0 +1,81 @@
+/*
+ * iplink_xstats.c  Extended statistics commands
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ * Authors: Nikolay Aleksandrov 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "ip_common.h"
+
+static void print_explain(FILE *f)
+{
+   fprintf(f, "Usage: ... xstats type TYPE [ ARGS ]\n");
+}
+
+int iplink_ifla_xstats(int argc, char **argv)
+{
+   struct link_util *lu = NULL;
+   __u32 filt_mask;
+
+   if (!argc) {
+   fprintf(stderr, "xstats: missing argument\n");
+   return -1;
+   }
+
+   if (matches(*argv, "type") == 0) {
+   NEXT_ARG();
+   lu = get_link_kind(*argv);
+   if (!lu)
+   invarg("invalid type", *argv);
+   } else if (matches(*argv, "help") == 0) {
+   print_explain(stdout);
+   return 0;
+   } else {
+   invarg("unknown argument", *argv);
+   }
+
+   if (!lu) {
+   print_explain(stderr);
+   return -1;
+   }
+
+   if (!lu->print_ifla_xstats) {
+   fprintf(stderr, "xstats: link type %s doesn't support xstats\n",
+   lu->id);
+   return -1;
+   }
+
+   if (lu->parse_ifla_xstats &&
+   lu->parse_ifla_xstats(lu, argc-1, argv+1))
+   return -1;
+
+   if (

[PATCH V5 0/2] Add QLogic FastLinQ FCoE (qedf) driver

2017-02-15 Thread Dupuis, Chad
From: "Dupuis, Chad" 

Dave, please apply the qed patch to net-next at your earliest convenience.
Martin, the qed patch needs to be applied first as the qedf patch is dependent
on the FCoE bits in the first qed driver patch.

This series introduces the hardware offload FCoE initiator driver for the
41000 Series Converged Network Adapters (579xx chip) by Cavium. The overall
driver design includes a common module ('qed') and protocol specific
dependent modules ('qedf' for FCoE).

This driver uses the kernel components of libfc and libfcoe as is and does not
make use of the open-fcoe user space components.  Therefore, no changes will 
need to be
made to any open-fcoe components.

The 'qed' common module, under drivers/net/ethernet/qlogic/qed/, is
enhanced with functionality required for FCoE support.

Changes from V4 -> V5

- Fix code alignment, function and variable formatting based on review comments
  in qed patch

Changes from V3 -> V4

- Minor update to banner text in qed_fcoe.c|h files
- Fix kbuild robot error on 32-bit systems in qedf
- Remove unneeded double memcpy for offloaded ELS commands in qedf

Changes from V2 -> V3

- Fix uninitialized variables reported by kbuild robot in qedf
- Remove superfluous comments from qedf.h
- Introduce new qedf_ctx flag to different stopping I/O for debug purposes.
- Don't take lport->disc.disc_mutex when restarting an rport.
- Remove extra whitespace in qedf_hsi.h

Changes from V1 -> V2

Changes in qed:
- Fix compiler warning when CONFIG_DCB is not set.

Fixes in qedf:
- Add qedf to scsi directory Makefile.
- Updates to convert LightL2 and I/O processing kthreads to workqueues.

Changes from RFC -> V1

- Squash qedf patches to one patch now that the initial review has taken place
- Convert qedf to use hotplug state machine
- Return via va_end to match corresponding va_start in logging functions
- Convert qedf_ctx offloaded port list to a RCU list so searches do not need
  to make use of spinlocks.  Also eliminates the need to fcport conn_id's.
- Use IS_ERR(fp) in qedf_flogi_resp() instead of checking individual FC_EX_* 
errors.
- Remove scsi_block_target when executing TMF request.
- Checkpatch fixes in the qed and qedf patches

Arun Easi (1):
  qed: Add support for hardware offloaded FCoE.

Dupuis, Chad (1):
  qedf: Add QLogic FastLinQ offload FCoE driver framework.

 MAINTAINERS   |6 +
 drivers/net/ethernet/qlogic/Kconfig   |3 +
 drivers/net/ethernet/qlogic/qed/Makefile  |1 +
 drivers/net/ethernet/qlogic/qed/qed.h |   11 +
 drivers/net/ethernet/qlogic/qed/qed_cxt.c |   98 +-
 drivers/net/ethernet/qlogic/qed/qed_cxt.h |3 +
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c|   13 +-
 drivers/net/ethernet/qlogic/qed/qed_dcbx.h|5 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c |  205 +-
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h |   42 +
 drivers/net/ethernet/qlogic/qed/qed_fcoe.c| 1014 +++
 drivers/net/ethernet/qlogic/qed/qed_fcoe.h|   87 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  781 -
 drivers/net/ethernet/qlogic/qed/qed_hw.c  |3 +
 drivers/net/ethernet/qlogic/qed/qed_ll2.c |   25 +
 drivers/net/ethernet/qlogic/qed/qed_ll2.h |2 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c|7 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.c |3 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.h |1 +
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h|8 +
 drivers/net/ethernet/qlogic/qed/qed_sp.h  |4 +
 drivers/net/ethernet/qlogic/qed/qed_sp_commands.c |3 +
 drivers/scsi/Kconfig  |1 +
 drivers/scsi/Makefile |1 +
 drivers/scsi/qedf/Kconfig |   11 +
 drivers/scsi/qedf/Makefile|5 +
 drivers/scsi/qedf/qedf.h  |  545 
 drivers/scsi/qedf/qedf_attr.c |  165 +
 drivers/scsi/qedf/qedf_dbg.c  |  195 ++
 drivers/scsi/qedf/qedf_dbg.h  |  154 +
 drivers/scsi/qedf/qedf_debugfs.c  |  460 +++
 drivers/scsi/qedf/qedf_els.c  |  949 ++
 drivers/scsi/qedf/qedf_fip.c  |  269 ++
 drivers/scsi/qedf/qedf_hsi.h  |  422 +++
 drivers/scsi/qedf/qedf_io.c   | 2282 ++
 drivers/scsi/qedf/qedf_main.c | 3336 +
 drivers/scsi/qedf/qedf_version.h  |   15 +
 include/linux/qed/common_hsi.h|   10 +-
 include/linux/qed/fcoe_common.h   |  715 +
 include/linux/qed/qed_fcoe_if.h   |  145 +
 include/linux/qed/qed_if.h|   41 +-
 41 files changed, 12027 insertions(+), 19 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_fcoe.c
 

[PATCH V5 net-next 1/2] qed: Add support for hardware offloaded FCoE.

2017-02-15 Thread Dupuis, Chad
From: Arun Easi 

This adds the backbone required for the various HW initializations
which are necessary for the FCoE driver (qedf) for the QLogic FastLinQ
41000 Series adapters - FW notification, resource initializations, etc.

Signed-off-by: Arun Easi 
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/Kconfig   |3 +
 drivers/net/ethernet/qlogic/qed/Makefile  |1 +
 drivers/net/ethernet/qlogic/qed/qed.h |   11 +
 drivers/net/ethernet/qlogic/qed/qed_cxt.c |   98 +-
 drivers/net/ethernet/qlogic/qed/qed_cxt.h |3 +
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c|   13 +-
 drivers/net/ethernet/qlogic/qed/qed_dcbx.h|5 +-
 drivers/net/ethernet/qlogic/qed/qed_dev.c |  205 -
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h |   42 +
 drivers/net/ethernet/qlogic/qed/qed_fcoe.c| 1014 +
 drivers/net/ethernet/qlogic/qed/qed_fcoe.h|   87 ++
 drivers/net/ethernet/qlogic/qed/qed_hsi.h |  781 +++-
 drivers/net/ethernet/qlogic/qed/qed_hw.c  |3 +
 drivers/net/ethernet/qlogic/qed/qed_ll2.c |   25 +
 drivers/net/ethernet/qlogic/qed/qed_ll2.h |2 +-
 drivers/net/ethernet/qlogic/qed/qed_main.c|7 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.c |3 +
 drivers/net/ethernet/qlogic/qed/qed_mcp.h |1 +
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h|8 +
 drivers/net/ethernet/qlogic/qed/qed_sp.h  |4 +
 drivers/net/ethernet/qlogic/qed/qed_sp_commands.c |3 +
 include/linux/qed/common_hsi.h|   10 +-
 include/linux/qed/fcoe_common.h   |  715 +++
 include/linux/qed/qed_fcoe_if.h   |  145 +++
 include/linux/qed/qed_if.h|   41 +-
 25 files changed, 3211 insertions(+), 19 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_fcoe.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_fcoe.h
 create mode 100644 include/linux/qed/fcoe_common.h
 create mode 100644 include/linux/qed/qed_fcoe_if.h

diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
index 3cfd105..737b303 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -113,4 +113,7 @@ config QED_RDMA
 config QED_ISCSI
bool
 
+config QED_FCOE
+   bool
+
 endif # NET_VENDOR_QLOGIC
diff --git a/drivers/net/ethernet/qlogic/qed/Makefile b/drivers/net/ethernet/qlogic/qed/Makefile
index 729e437..e234083 100644
--- a/drivers/net/ethernet/qlogic/qed/Makefile
+++ b/drivers/net/ethernet/qlogic/qed/Makefile
@@ -7,3 +7,4 @@ qed-$(CONFIG_QED_SRIOV) += qed_sriov.o qed_vf.o
 qed-$(CONFIG_QED_LL2) += qed_ll2.o
 qed-$(CONFIG_QED_RDMA) += qed_roce.o
 qed-$(CONFIG_QED_ISCSI) += qed_iscsi.o qed_ooo.o
+qed-$(CONFIG_QED_FCOE) += qed_fcoe.o
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index 1f61cf3..0e218d0 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -60,6 +60,7 @@
 #define QED_WFQ_UNIT   100
 
 #define ISCSI_BDQ_ID(_port_id) (_port_id)
+#define FCOE_BDQ_ID(_port_id) ((_port_id) + 2)
 #define QED_WID_SIZE(1024)
 #define QED_PF_DEMS_SIZE(4)
 
@@ -167,6 +168,7 @@ struct qed_tunn_update_params {
  */
 enum qed_pci_personality {
QED_PCI_ETH,
+   QED_PCI_FCOE,
QED_PCI_ISCSI,
QED_PCI_ETH_ROCE,
QED_PCI_DEFAULT /* default in shmem */
@@ -204,6 +206,7 @@ enum QED_FEATURE {
QED_VF,
QED_RDMA_CNQ,
QED_VF_L2_QUE,
+   QED_FCOE_CQ,
QED_MAX_FEATURES,
 };
 
@@ -221,6 +224,7 @@ enum QED_PORT_MODE {
 
 enum qed_dev_cap {
QED_DEV_CAP_ETH,
+   QED_DEV_CAP_FCOE,
QED_DEV_CAP_ISCSI,
QED_DEV_CAP_ROCE,
 };
@@ -255,6 +259,10 @@ struct qed_hw_info {
u32 part_num[4];
 
unsigned char   hw_mac_addr[ETH_ALEN];
+   u64 node_wwn;
+   u64 port_wwn;
+
+   u16 num_fcoe_conns;
 
struct qed_igu_info *p_igu_info;
 
@@ -410,6 +418,7 @@ struct qed_hwfn {
struct qed_ooo_info *p_ooo_info;
struct qed_rdma_info*p_rdma_info;
struct qed_iscsi_info   *p_iscsi_info;
+   struct qed_fcoe_info*p_fcoe_info;
struct qed_pf_paramspf_params;
 
bool b_rdma_enabled_in_prs;
@@ -618,11 +627,13 @@ struct qed_dev {
 
u8  protocol;
 #define IS_QED_ETH_IF(cdev) ((cdev)->protocol == QED_PROTOCOL_ETH)
+#define IS_QED_FCOE_IF(cdev)((cdev)->protocol == QED_PROTOCOL_FCOE)
 
/* Callbacks to protocol driver */
union {
struct qed_common_cb_ops*common;
struct qed_eth_cb_ops 

Re: [PATCH net-next] mlx4: do not fire tasklet unless necessary

2017-02-15 Thread Eric Dumazet
On Wed, 2017-02-15 at 15:59 +0200, Saeed Mahameed wrote:

> can the tasklet run on a different core ?

No. tasklets are scheduled on local cpu, like softirqs in general.

They are not migrated, unless cpu is removed (hotplug)


> 
> the point is that if the EQ is full of CQEs from different CQs you would
> do the "  kick = list_empty(&tasklet_ctx->list);" test per empty CQ
> list rather than once at the end.
> 
> in mlx4_en case, you have only two CQs on each EQ but in RoCE/IB you
> can have as many CQs as you want.

list_empty() before one list_add_tail() is a single instruction.

By doing this right before manipulating the list, it comes with a zero
cache line penalty.

While if you do it at the wrong place, it incurs one extra cache line
miss.

These extra tasklet invocations can cost 3% of CPU cycles.





Re: [PATCH net-next] mlx4: do not fire tasklet unless necessary

2017-02-15 Thread Matan Barak (External)

On 15/02/2017 13:10, Saeed Mahameed wrote:

On Fri, Feb 10, 2017 at 2:27 PM, Eric Dumazet  wrote:

From: Eric Dumazet 

All rx and rx netdev interrupts are handled by respectively
by mlx4_en_rx_irq() and mlx4_en_tx_irq() which simply schedule a NAPI.

But mlx4_eq_int() also fires a tasklet to service all items that were
queued via mlx4_add_cq_to_tasklet(), but this handler was not called
unless user cqe was handled.

This is very confusing, as "mpstat -I SCPU ..." shows a huge number of
tasklet invocations.

This patch saves this overhead, by carefully firing the tasklet directly
from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.



So, in the RDMA CQ case, we add some per-CQE overhead of comparing the
list pointers and branching on the result. Maybe we could add an
invoke_tasklet boolean field on mlx4_cq and return its value from
mlx4_cq_completion.

That way we could do invoke_tasklet |= mlx4_cq_completion();

Outside the while loop we could just
if (invoke_tasklet)
tasklet_schedule

Anyway, I guess that even with per-CQE overhead, the performance impact 
here is pretty negligible - so I guess that's fine too :)





Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs

2017-02-15 Thread Michael S. Tsirkin
On Tue, Feb 14, 2017 at 11:17:41AM -0800, Benjamin Serebrin wrote:
> On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin  wrote:
> 
> > IIRC irqbalance will bail out and avoid touching affinity
> > if you set affinity from driver.  Breaking that's not nice.
> > Pls correct me if I'm wrong.
> 
> 
> I believe you're right that irqbalance will leave the affinity alone.
> 
> Irqbalance has had changes that may or may not be in the versions bundled with
> various guests, and I don't have a definitive cross-correlation of irqbalance
> version to guest version.  But in the existing code, the driver does
> set affinity for #VCPUs==#queues, so that's been happening anyway.

Right but only for the case where we are very sure we are doing the
right thing, so we don't need any help from irqbalance.

> The (original) intention of this patch was to extend the existing behavior
> to the case where we limit queue counts, to avoid the surprising discontinuity
> when #VCPU != #queues.
> 
> It's not obvious that it's wrong to cause irqbalance to leave these
> queues alone:  Generally you want the interrupt to come to the core that
> caused the work, to have cache locality and avoid lock contention.
> Doing fancier things is outside the scope of this patch.

Doing fancier things like trying to balance the load would be in scope
for irqbalance so I think you need to find a way to supply default
affinity without disabling irqbalance.

> > Doesn't look like this will handle the case of num cpus < num queues well.
> 
> I believe it's correct.  The first #VCPUs queues will have one bit set in 
> their
> xps mask, and the remaining queues have no bits set.  That means each VCPU 
> uses
> its own assigned TX queue (and the TX interrupt comes back to that VCPU).
>
> Thanks again for the review!
> Ben


RE: [PATCH 1/2] net: xilinx_emaclite: fix receive buffer overflow

2017-02-15 Thread David Laight
From: Anssi Hannula
> Sent: 15 February 2017 08:29
...
> Looking through the product guide [1] I don't see the actual receive
> packet length provided anywhere, so I guess that is why the crazy stuff
> is done.

If the hardware doesn't provide the receive packet length then I suggest
you 'fix' the hardware with an angle grinder :-)
Not fit for purpose.

David



Re: [PATCH net-next] mlx4: do not fire tasklet unless necessary

2017-02-15 Thread Eric Dumazet
On Wed, 2017-02-15 at 16:52 +0200, Matan Barak (External) wrote:

> So, in case of RDMA CQs, we add some per-CQE overhead of comparing the 
> list pointers and condition upon that. Maybe we could add an 
> invoke_tasklet boolean field on mlx4_cq and return its value from 
> mlx4_cq_completion.
> That way we could do invoke_tasklet |= mlx4_cq_completion();
> 
> Outside the while loop we could just
> if (invoke_tasklet)
>  tasklet_schedule
> 
> Anyway, I guess that even with per-CQE overhead, the performance impact 
> here is pretty negligible - so I guess that's fine too :)


The real question, or suggestion, would be to fire a tasklet only under
stress.

Firing a tasklet adds a lot of latency for user-space CQ completion,
since softirqs might have to be handled by a kernel thread (ksoftirqd)

I would be surprised if no customer was hit by your commit,
( net/mlx4_core: Use tasklet for user-space CQ completion events )
especially when using specific (RT) scheduler classes.




[PATCH] atm: idt77252, use setup_timer and mod_timer

2017-02-15 Thread Jiri Slaby
From: Jan Koniarik 

Stop accessing timer struct members directly and use setup_timer and
mod_timer helpers intended for that use. It makes the code cleaner and
will allow for easier change of the timer struct internals.

Signed-off-by: Jan Koniarik 
Signed-off-by: Jiri Slaby 
Cc: Chas Williams <3ch...@gmail.com>
Cc: 
Cc: 
---
 drivers/atm/idt77252.c | 12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c
index 471ddfd93ea8..5ec109533bb9 100644
--- a/drivers/atm/idt77252.c
+++ b/drivers/atm/idt77252.c
@@ -2132,12 +2132,8 @@ idt77252_init_est(struct vc_map *vc, int pcr)
 
est->interval = 2;  /* XXX: make this configurable */
est->ewma_log = 2;  /* XXX: make this configurable */
-   init_timer(&est->timer);
-   est->timer.data = (unsigned long)vc;
-   est->timer.function = idt77252_est_timer;
-
-   est->timer.expires = jiffies + ((HZ / 4) << est->interval);
-   add_timer(&est->timer);
+   setup_timer(&est->timer, idt77252_est_timer, (unsigned long)vc);
+   mod_timer(&est->timer, jiffies + ((HZ / 4) << est->interval));
 
return est;
 }
@@ -3638,9 +3634,7 @@ static int idt77252_init_one(struct pci_dev *pcidev,
spin_lock_init(&card->cmd_lock);
spin_lock_init(&card->tst_lock);
 
-   init_timer(&card->tst_timer);
-   card->tst_timer.data = (unsigned long)card;
-   card->tst_timer.function = tst_timer;
+   setup_timer(&card->tst_timer, tst_timer, (unsigned long)card);
 
/* Do the I/O remapping... */
card->membase = ioremap(membase, 1024);
-- 
2.11.1



Re: [PATCH net-next v3 2/8] gtp: switch from struct socket to struct sock for the GTP sockets

2017-02-15 Thread David Miller
From: Andreas Schultz 
Date: Wed, 15 Feb 2017 08:04:56 +0100 (CET)

> - On Feb 14, 2017, at 6:48 PM, David S. Miller da...@davemloft.net wrote:
> 
>> From: Andreas Schultz 
>> Date: Mon, 13 Feb 2017 16:36:18 +0100
>> 
>>> +   if (gtp->sk0) {
>>> +   udp_sk(gtp->sk0)->encap_type = 0;
>>> +   rcu_assign_sk_user_data(gtp->sk0, NULL);
>>> +   sock_put(gtp->sk0);
>>> }
>> 
>> This does "sock_put(NULL);" because you are assigning gtp->sk0 to
>> NULL before the sock_put() call.  So you are leaking the socket,
>> at best.
> 
> I don't understand how this should happen. If I where to use 
> rcu_assign_pointer,
> then yes, but rcu_assign_sk_user_data does assign to the sk_user_data member
> of struct sock and not to the argument itself.

You are right, I misread the assignment.


Re: [PATCH net-next V3 3/7] net/sched: Reflect HW offload status

2017-02-15 Thread Or Gerlitz
On Wed, Feb 15, 2017 at 10:52 AM, Or Gerlitz  wrote:
> Currently there is no way of querying whether a filter is
> offloaded to HW or not when using "both" policy (where none
> of skip_sw or skip_hw flags are set by user-space).

> Add two new flags, "in hw" and "not in hw" such that user
> space can determine if a filter is actually offloaded to
> hw or not. The "in hw" UAPI semantics was chosen so it's
> similar to the "skip hw" flag logic.

To make things a bit more clear, the semantics of the "in hw"
flag relate to the time the rule is dumped.

Currently in all of the offloading drivers/cases, when the driver returns
success for the ndo_setup_tc call, the flow is offloaded to hw.

But moving fwd that might change, a flow might be not offloaded to HW
on some window of time.

The upcoming example is support for neigh updates w.r.t. IP tunnel
encapsulation offloads in mlx5 SRIOV switchdev mode.

Today we offload tunnel encap flow only if the kernel has valid neigh
to the tunnel destination. Under the works is a code to offload/un-offload
the flow to/from HW when the neigh becomes valid/invalid or goes through
hardware address change etc.

So what I basically suggest here is to enhance that future mlx5 series
with patches under which the dump code of all the classifiers will invoke
the tc_setup_ndo with a fourth sub-command (today there are add/del/stats)
which will return the actual "in hw" status.

This is aligned with the general architecture/approach in the kernel for
switchdev and other offloads.

Note that this future change doesn't change the UAPI; it will still have
two values, "in hw" and "not in hw". The values with this series are the
actual values, and later with that change they will keep being the actual
values; just the kernel method to retrieve them will be different.


Or.

> If none of these two flags are set, this signals running
> over older kernel.

> Signed-off-by: Or Gerlitz 
> Reviewed-by: Amir Vadai 
> Acked-by: Jiri Pirko 
> ---
>  include/net/pkt_cls.h| 5 +
>  include/uapi/linux/pkt_cls.h | 6 --
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index 71b266c..15cfe15 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -475,6 +475,11 @@ static inline bool tc_flags_valid(u32 flags)
> return true;
>  }
>
> +static inline bool tc_in_hw(u32 flags)
> +{
> +   return (flags & TCA_CLS_FLAGS_IN_HW) ? true : false;
> +}
> +
>  enum tc_fl_command {
> TC_CLSFLOWER_REPLACE,
> TC_CLSFLOWER_DESTROY,
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 345551e..7a69f2a 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -103,8 +103,10 @@ enum {
>  #define TCA_POLICE_MAX (__TCA_POLICE_MAX - 1)
>
>  /* tca flags definitions */
> -#define TCA_CLS_FLAGS_SKIP_HW  (1 << 0)
> -#define TCA_CLS_FLAGS_SKIP_SW  (1 << 1)
> +#define TCA_CLS_FLAGS_SKIP_HW  (1 << 0) /* don't offload filter to HW */
> +#define TCA_CLS_FLAGS_SKIP_SW  (1 << 1) /* don't use filter in SW */
> +#define TCA_CLS_FLAGS_IN_HW(1 << 2) /* filter is offloaded to HW */
> +#define TCA_CLS_FLAGS_NOT_IN_HW (1 << 3) /* filter isn't offloaded to HW */


[PATCH net] ibmvnic: Fix endian error when requesting device capabilities

2017-02-15 Thread Thomas Falcon
When a vNIC client driver requests a faulty device setting, the
server returns an acceptable value for the client to request.
This 64 bit value was incorrectly being swapped as a 32 bit value,
resulting in loss of data. This patch corrects that by using
the 64 bit swap function.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5b66b4f..158b49a 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2389,10 +2389,10 @@ static void handle_request_cap_rsp(union ibmvnic_crq 
*crq,
case PARTIALSUCCESS:
dev_info(dev, "req=%lld, rsp=%ld in %s queue, retrying.\n",
 *req_value,
-(long int)be32_to_cpu(crq->request_capability_rsp.
+(long int)be64_to_cpu(crq->request_capability_rsp.
   number), name);
release_sub_crqs_no_irqs(adapter);
-   *req_value = be32_to_cpu(crq->request_capability_rsp.number);
+   *req_value = be64_to_cpu(crq->request_capability_rsp.number);
init_sub_crqs(adapter, 1);
return;
default:
-- 
1.8.3.1



[PATCH net] ibmvnic: Fix endian errors in error reporting output

2017-02-15 Thread Thomas Falcon
Error reports received from firmware were not being converted from
big endian values, leading to bogus error codes reported on little
endian systems.

Signed-off-by: Thomas Falcon 
---
This patch depends on 
"[PATCH net] ibmvnic: Fix endian error when requesting device capabilites"
---
 drivers/net/ethernet/ibm/ibmvnic.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 158b49a..a07b8d7 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2186,12 +2186,12 @@ static void handle_error_info_rsp(union ibmvnic_crq 
*crq,
 
if (!found) {
dev_err(dev, "Couldn't find error id %x\n",
-   crq->request_error_rsp.error_id);
+   be32_to_cpu(crq->request_error_rsp.error_id));
return;
}
 
dev_err(dev, "Detailed info for error id %x:",
-   crq->request_error_rsp.error_id);
+   be32_to_cpu(crq->request_error_rsp.error_id));
 
for (i = 0; i < error_buff->len; i++) {
pr_cont("%02x", (int)error_buff->buff[i]);
@@ -2270,8 +2270,8 @@ static void handle_error_indication(union ibmvnic_crq 
*crq,
dev_err(dev, "Firmware reports %serror id %x, cause %d\n",
crq->error_indication.
flags & IBMVNIC_FATAL_ERROR ? "FATAL " : "",
-   crq->error_indication.error_id,
-   crq->error_indication.error_cause);
+   be32_to_cpu(crq->error_indication.error_id),
+   be16_to_cpu(crq->error_indication.error_cause));
 
error_buff = kmalloc(sizeof(*error_buff), GFP_ATOMIC);
if (!error_buff)
-- 
1.8.3.1



Re: [PATCH net] packet: Do not call fanout_release from atomic contexts

2017-02-15 Thread Anoob Soman

On 15/02/17 13:46, Eric Dumazet wrote:

On Wed, 2017-02-15 at 11:07 +, Anoob Soman wrote:

On 13/02/17 14:50, Anoob Soman wrote:

On 13/02/17 14:26, Eric Dumazet wrote:

On Mon, 2017-02-13 at 13:28 +, Anoob Soman wrote:


Wouldn't it be easier to call synchronize_net(), before calling
fanout_release_data() and kfree(f).
The behavior, wrt synchronize_net, would be same as before and
fanout_release() will cleanup everything without leaving any residue.

So we would require two synchronize_net() calls instead of one ?

synchronize_net() is very expensive on some hosts, it is a big hammer.




Yes, one before fanout_release_data() (will be called only if
fanout->sk_ref == 0) and one after fanout_release().

I understand synchronize_net() is expensive, but adding another
synchronize_net(),  before fanout_release_data(), will be no different
from what we have in the existing code.

I can also make sure second synchronize_net() doesn't get called
again, if fanout_release() calls synchronize_net(), by making
fanout_release() return something to indicate it has done
synchronize_net().

Hi Eric,

Did you get a chance to looks at my comments ?

You misunderstood my suggestion.

I simply suggested to move the code, not adding another
synchronize_net()



I will move the code and send a v2.



Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX

2017-02-15 Thread Tariq Toukan



On 14/02/2017 7:29 PM, Tom Herbert wrote:

On Tue, Feb 14, 2017 at 7:51 AM, Eric Dumazet  wrote:

On Tue, 2017-02-14 at 16:56 +0200, Tariq Toukan wrote:


As the previous series caused hangs, we must run functional regression
tests over this series as well.
Run has already started, and results will be available tomorrow morning.

In general, I really like this series. The re-factorization looks more
elegant and more correct, functionally.

However, performance wise: we fear that the numbers will be drastically
lower with this transition to order-0 pages,
because of the (becoming critical) page allocator and dma operations
bottlenecks, especially on systems with costly
dma operations, such as ARM, iommu=on, etc...


So, again, performance after this patch series is higher,
once you have sensible RX queue parameters, for the expected workload.

Only in pathological cases, you might have some regression.

The old scheme was _maybe_ better _when_ memory is not fragmented.

When you run hosts for months, memory _is_ fragmented.

You never see that on benchmarks, unless you force memory being
fragmented.




We already have this exact issue in mlx5, where we moved to order-0
allocations with a fixed size cache, but that was not enough.
Customers of mlx5 have already complained about the performance
degradation, and currently this is hurting our business.
We get a clear nack from our performance regression team regarding doing
the same in mlx4.
So, the question is, can we live with this degradation until those
bottleneck challenges are addressed?

Again, there is no degradation.

We have been using order-0 pages for years at Google.

Only when we made the mistake to rebase from the upstream driver and
order-3 pages we got horrible regressions, causing production outages.

I was silly to believe that mm layer got better.


Following our perf experts' feedback, I cannot simply Ack. We need
to have a clear plan to close the perf gap or reduce the impact.

Your perf experts need to talk to me, or any experts at Google and
Facebook, really.


I agree with this 100%! To be blunt, power users like this are testing
your drivers far beyond what Mellanox is doing and understand how
performance gains in benchmarks translate to possible gains in real
production way more than your perf experts can. Listen to Eric!

Tom



Anything _relying_ on order-3 pages being available to impress
friends/customers is a lie.


Isn't it the same principle in page_frag_alloc()?
It is called from __netdev_alloc_skb()/__napi_alloc_skb().

Why is it ok to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?
By using netdev/napi_alloc_skb, you'll get an SKB whose linear data is
a frag of a huge page, and it is not going to be freed before the other
non-linear frags.
Cannot this cause the same threats (memory pinning and so on)?

Currently, mlx4 doesn't use this generic API, while most other drivers do.

Similar claims are true for TX:
https://github.com/torvalds/linux/commit/5640f7685831e088fe6c2e1f863a6805962f8e81







[PATCH v2 net] packet: Do not call fanout_release from atomic contexts

2017-02-15 Thread Anoob Soman
Commit 6664498280cf ("packet: call fanout_release, while UNREGISTERING a
netdev"), unfortunately, introduced the following issues.

1. calling mutex_lock(&fanout_mutex) (in fanout_release()) from inside an
rcu read-side critical section. rcu_read_lock() usually disables preemption,
which prohibits calling sleeping functions.

[  ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side 
critical section!
[  ]
[  ] rcu_scheduler_active = 1, debug_locks = 0
[  ] 4 locks held by ovs-vswitchd/1969:
[  ]  #0:  (cb_lock){++}, at: [] genl_rcv+0x19/0x40
[  ]  #1:  (ovs_mutex){+.+.+.}, at: [] 
ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
[  ]  #2:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
[  ]  #3:  (rcu_read_lock){..}, at: [] 
packet_notifier+0x5/0x3f0
[  ]
[  ] Call Trace:
[  ]  [] dump_stack+0x85/0xc4
[  ]  [] lockdep_rcu_suspicious+0x107/0x110
[  ]  [] ___might_sleep+0x57/0x210
[  ]  [] __might_sleep+0x70/0x90
[  ]  [] mutex_lock_nested+0x3c/0x3a0
[  ]  [] ? vprintk_default+0x1f/0x30
[  ]  [] ? printk+0x4d/0x4f
[  ]  [] fanout_release+0x1d/0xe0
[  ]  [] packet_notifier+0x2f9/0x3f0

2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
"sleeping function called from invalid context"

[  ] BUG: sleeping function called from invalid context at 
kernel/locking/mutex.c:620
[  ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
[  ] INFO: lockdep is turned off.
[  ] Call Trace:
[  ]  [] dump_stack+0x85/0xc4
[  ]  [] ___might_sleep+0x202/0x210
[  ]  [] __might_sleep+0x70/0x90
[  ]  [] mutex_lock_nested+0x3c/0x3a0
[  ]  [] fanout_release+0x1d/0xe0
[  ]  [] packet_notifier+0x2f9/0x3f0

3. calling dev_remove_pack(&fanout->prot_hook), from inside
spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
-> synchronize_net(), which might sleep.

[  ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x0002
[  ] INFO: lockdep is turned off.
[  ] Call Trace:
[  ]  [] dump_stack+0x85/0xc4
[  ]  [] __schedule_bug+0x64/0x73
[  ]  [] __schedule+0x6b/0xd10
[  ]  [] schedule+0x6b/0x80
[  ]  [] schedule_timeout+0x38d/0x410
[  ]  [] synchronize_sched_expedited+0x53d/0x810
[  ]  [] synchronize_rcu_expedited+0xe/0x10
[  ]  [] synchronize_net+0x35/0x50
[  ]  [] dev_remove_pack+0x13/0x20
[  ]  [] fanout_release+0xbe/0xe0
[  ]  [] packet_notifier+0x2f9/0x3f0

4. fanout_release() races with calls from a different CPU.

To fix the above problems, remove the call to fanout_release() under
rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) so that
netdev_run_todo will be happy that the &dev->ptype_specific list is empty. In
order to achieve this, I moved dev_{add,remove}_pack() out of
fanout_{add,release} to __fanout_{link,unlink}. So, a call to
{,__}unregister_prot_hook() will make sure fanout->prot_hook is removed as well.

Fixes: 6664498280cf ("packet: call fanout_release, while UNREGISTERING a 
netdev")
Reported-by: Eric Dumazet 
Signed-off-by: Anoob Soman 
---

Changes in v2:
 - Incorporated Eric's suggestion to do fanout_release_data() and kfree() after
synchronize_net()

 net/packet/af_packet.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d56ee46..af29510 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1497,6 +1497,8 @@ static void __fanout_link(struct sock *sk, struct 
packet_sock *po)
f->arr[f->num_members] = sk;
smp_wmb();
f->num_members++;
+   if (f->num_members == 1)
+   dev_add_pack(&f->prot_hook);
spin_unlock(&f->lock);
 }
 
@@ -1513,6 +1515,8 @@ static void __fanout_unlink(struct sock *sk, struct 
packet_sock *po)
BUG_ON(i >= f->num_members);
f->arr[i] = f->arr[f->num_members - 1];
f->num_members--;
+   if (f->num_members == 0)
+   __dev_remove_pack(&f->prot_hook);
spin_unlock(&f->lock);
 }
 
@@ -1687,7 +1691,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 
type_flags)
match->prot_hook.func = packet_rcv_fanout;
match->prot_hook.af_packet_priv = match;
match->prot_hook.id_match = match_fanout_group;
-   dev_add_pack(&match->prot_hook);
list_add(&match->list, &fanout_list);
}
err = -EINVAL;
@@ -1712,10 +1715,16 @@ static int fanout_add(struct sock *sk, u16 id, u16 
type_flags)
return err;
 }
 
-static void fanout_release(struct sock *sk)
+/* If pkt_sk(sk)->fanout->sk_ref is zero, this functuon removes
+ * pkt_sk(sk)->fanout from fanout_list and returns pkt_sk(sk)->fanout.
+ * It is the responsibility of the caller to call fanout_release_data() and
+ * free the returned packet_fanout (after synchronize_net())
+ */
+static struct packet_fanout *fanout_release(struct sock *sk)
 {
struct packet_sock *po = pkt_sk(sk);
struct packet_fanout *f;
+   bool ret_fanout = false;
 
f = po->fanout;
   

Re: [kernel-hardening] Re: [RFC][PATCH] nfsd: add +1 to reference counting scheme for struct nfsd4_session

2017-02-15 Thread Bruce Fields
On Mon, Feb 13, 2017 at 06:46:19AM -0500, David Windsor wrote:
> On Mon, Feb 13, 2017 at 5:54 AM, Hans Liljestrand  wrote:
> > On Sat, Feb 11, 2017 at 01:42:53AM -0500, David Windsor wrote:
> >>
> >> 
> >>
> >>> Signed-off-by: David Windsor 
> >>> ---
> >>>  fs/nfsd/nfs4state.c | 6 +++---
> >>>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> >>> index a0dee8a..b0f3010 100644
> >>> --- a/fs/nfsd/nfs4state.c
> >>> +++ b/fs/nfsd/nfs4state.c
> >>> @@ -196,7 +196,7 @@ static void nfsd4_put_session_locked(struct
> >>> nfsd4_session *ses)
> >>>
> >>> lockdep_assert_held(&nn->client_lock);
> >>>
> >>> -   if (atomic_dec_and_test(&ses->se_ref) && is_session_dead(ses))
> >>> +   if (!atomic_add_unless(&ses->se_ref, -1, 1) &&
> >>> is_session_des(ses))
> >>
> >>
> >> This should read:
> >> if (!atomic_add_unless(&ses->se_ref, -1, 1) && is_session_dead(ses))
> >>
> >>> free_session(ses);
> >
> >
> > Hi,
> > I'm not sure if I have this correctly; But both before and after the patch
> > free_session gets called when se_ref count was 1, shouldn't this have
> > changed with the +1 scheme?
> >
> > Also, since the !atomic_add_unless doesn't actually decrement when at 1,
> > doesn't this leave the se_ref as 1 when it's destroyed? The function seems
> > to always be locked, so perhaps this doesn't matter, but still seems a bit
> > risky.
> >
> 
> Yes; I forgot the additional call to atomic_dec_and_test() before
> free_session().  Thanks!
> 
> I'll resubmit this after seeing how the rest of this discussion goes.
> We may end up abandoning this refcounting case.

I could live with it.

My knee jerk reaction is like Jeff's--it just seems more natural to me
for reference count 0 to mean "not in use, OK to free" in cases like
this--but maybe I just need to get used to the idea.

It'd be interesting to see what the final result looks like after
conversion to refcount_t.

--b.


Re: [PATCH v3 net-next] net: phy: Add LED mode driver for Microsemi PHYs.

2017-02-15 Thread Rob Herring
On Tue, Feb 07, 2017 at 07:10:26PM +0530, Raju Lakkaraju wrote:
> From: Raju Lakkaraju 
> 
> LED Mode:
> Microsemi PHY support 2 LEDs (LED[0] and LED[1]) to display different
> status information that can be selected by setting LED mode.
> 
> LED Mode parameter (vsc8531, led-0-mode) and (vsc8531, led-1-mode) get
> from Device Tree.
> 
> Signed-off-by: Raju Lakkaraju 
> ---
> Change set:
> v0:
> - Initial version of LED driver for Microsemi PHYs.
> v1:
> - Update all review comments given by Andrew.
> - Add new header file "mscc-phy-vsc8531.h" to define DT macros.
> - Add error/range check for DT LED mode input
> v2:
> - Fixed x86_64 build error.
> v3:
> - Update all review comments.
> - Fix the error check condition.
> 
>  .../devicetree/bindings/net/mscc-phy-vsc8531.txt   | 10 +++
>  drivers/net/phy/mscc.c | 85 
> +-
>  include/dt-bindings/net/mscc-phy-vsc8531.h | 29 
>  3 files changed, 123 insertions(+), 1 deletion(-)
>  create mode 100644 include/dt-bindings/net/mscc-phy-vsc8531.h
> 
> diff --git a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt 
> b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt
> index bdefefc6..0eedabe 100644
> --- a/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt
> +++ b/Documentation/devicetree/bindings/net/mscc-phy-vsc8531.txt
> @@ -27,6 +27,14 @@ Optional properties:
> 'vddmac'.
> Default value is 0%.
> Ref: Table:1 - Edge rate change (below).
> +- vsc8531,led-0-mode : LED mode. Specify how the LED[0] should behave.
> +   Allowed values are define in
> +   "include/dt-bindings/net/mscc-phy-vsc8531.h".
> +   Default value is VSC8531_LINK_1000_ACTIVITY (1).
> +- vsc8531,led-1-mode : LED mode. Specify how the LED[1] should behave.

You failed to address my comment on v2. vsc8531 is not a vendor prefix. 
Please fix in a new patch since David already applied it.

Rob


Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs

2017-02-15 Thread Willem de Bruijn
On Tue, Feb 14, 2017 at 1:05 PM, Michael S. Tsirkin  wrote:
> On Tue, Feb 14, 2017 at 11:17:41AM -0800, Benjamin Serebrin wrote:
>> On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin  wrote:
>>
>> > IIRC irqbalance will bail out and avoid touching affinity
>> > if you set affinity from driver.  Breaking that's not nice.
>> > Pls correct me if I'm wrong.
>>
>>
>> I believe you're right that irqbalance will leave the affinity alone.
>>
>> Irqbalance has had changes that may or may not be in the versions bundled 
>> with
>> various guests, and I don't have a definitive cross-correlation of irqbalance
>> version to guest version.  But in the existing code, the driver does
>> set affinity for #VCPUs==#queues, so that's been happening anyway.
>
> Right - the idea being we load all CPUs equally so we don't
> need help from irqbalance - hopefully packets will be spread
> across queues in a balanced way.
>
> When we have less queues the load isn't balanced so we
> definitely need something fancier to take into account
> the overall system load.

For pure network load, assigning each txqueue IRQ exclusively
to one of the cores that generates traffic on that queue is the
optimal layout in terms of load spreading. Irqbalance does
not have the XPS information to make this optimal decision.

Overall system load affects this calculation both in the case of 1:1
mapping and of uneven queue distribution. In both cases, irqbalance
is hopefully smart enough to migrate other non-pinned IRQs to
cpus with lower overall load.

> But why the first N cpus? That's more or less the same as assigning them
> at random.

CPU selection is an interesting point. Spreading equally across numa
nodes would be preferable over first N. Aside from that, the first N
should work best to minimize the chance of hitting multiple
hyperthreads on the same core -- if all architectures lay out
hyperthreads in the same way as x86_64.


Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX

2017-02-15 Thread Eric Dumazet
On Wed, Feb 15, 2017 at 8:42 AM, Tariq Toukan  wrote:
>

>
> Isn't it the same principle in page_frag_alloc() ?
> It is called from __netdev_alloc_skb()/__napi_alloc_skb().
>
> Why is it ok to have order-3 pages (PAGE_FRAG_CACHE_MAX_ORDER) there?

This is not ok.

This is a very well known problem; we already mentioned it here in the past,
but at least the core networking stack uses order-0 pages on PowerPC.

mlx4 driver suffers from this problem 100% more than other drivers ;)

One problem at a time Tariq. Right now, only mlx4 has this big problem
compared to other NIC.

Then, if we _still_ hit major issues, we might also need to force
napi_get_frags()
to allocate skb->head using kmalloc() instead of a page frag.

That is a very simple fix.

Remember that skb->truesize is an approximation; it will never be completely
accurate, but we need to make it better.

mlx4 driver pretends to have a frag truesize of 1536 bytes, but this
is obviously wrong when host is under memory pressure
(2 frags per page -> truesize should be 2048)


> By using netdev/napi_alloc_skb, you'll get that the SKB's linear data is a
> frag of a huge page,
> and it is not going to be freed before the other non-linear frags.
> Cannot this cause the same threats (memory pinning and so...)?
>
> Currently, mlx4 doesn't use this generic API, while most other drivers do.
>
> Similar claims are true for TX:
> https://github.com/torvalds/linux/commit/5640f7685831e088fe6c2e1f863a6805962f8e81

We do not have such problem on TX. GFP_KERNEL allocations do not have
the same issues.

Tasks are usually not malicious in our DC, and most serious
applications use memcg or such memory control.


Re: [PATCH net-next] mlx4: do not use rwlock in fast path

2017-02-15 Thread David Miller
From: Eric Dumazet 
Date: Thu, 09 Feb 2017 09:10:04 -0800

> From: Eric Dumazet 
> 
> Using a reader-writer lock in fast path is silly, when we can
> instead use RCU or a seqlock.
> 
> For mlx4 hwstamp clock, a seqlock is the way to go, removing
> two atomic operations and false sharing. 
> 
> Signed-off-by: Eric Dumazet 

Applied, thanks Eric.


Re: [PATCH net-next v1] bpf: Remove redundant ifdef

2017-02-15 Thread David Miller
From: Mickaël Salaün 
Date: Sat, 11 Feb 2017 20:37:08 +0100

> Remove a useless ifdef __NR_bpf as requested by Wang Nan.
> 
> Inline one-line static functions as it was in the bpf_sys.h file.
> 
> Signed-off-by: Mickaël Salaün 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> Cc: David S. Miller 
> Cc: Wang Nan 
> Link: 
> https://lkml.kernel.org/r/828ab1ff-4dcf-53ff-c97b-074adb895...@huawei.com

Applied.


Re: [PATCH net-next v1] bpf: Rebuild bpf.o for any dependency update

2017-02-15 Thread David Miller
From: Mickaël Salaün 
Date: Sat, 11 Feb 2017 23:20:23 +0100

> This is needed to force a rebuild of bpf.o when one of its dependencies
> (e.g. uapi/linux/bpf.h) is updated.
> 
> Add a phony target.
> 
> Signed-off-by: Mickaël Salaün 

Applied.


Re: [PATCH v2 net] packet: Do not call fanout_release from atomic contexts

2017-02-15 Thread kbuild test robot
Hi Anoob,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.10-rc8]
[cannot apply to net/master next-20170215]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Anoob-Soman/packet-Do-not-call-fanout_release-from-atomic-contexts/20170216-004744
config: x86_64-randconfig-x008-201707 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   net/packet/af_packet.c: In function 'fanout_release':
>> net/packet/af_packet.c:1739:3: warning: 'return' with no value, in function 
>> returning non-void [-Wreturn-type]
  return;
  ^~
   net/packet/af_packet.c:1731:30: note: declared here
static struct packet_fanout *fanout_release(struct sock *sk)
 ^~

vim +/return +1739 net/packet/af_packet.c

dc99f600 David S. Miller 2011-07-05  1723   return err;
dc99f600 David S. Miller 2011-07-05  1724  }
dc99f600 David S. Miller 2011-07-05  1725  
53d5c353 Anoob Soman 2017-02-15  1726  /* If pkt_sk(sk)->fanout->sk_ref is 
zero, this function removes
53d5c353 Anoob Soman 2017-02-15  1727   * pkt_sk(sk)->fanout from 
fanout_list and returns pkt_sk(sk)->fanout.
53d5c353 Anoob Soman 2017-02-15  1728   * It is the responsibility of the 
caller to call fanout_release_data() and
53d5c353 Anoob Soman 2017-02-15  1729   * free the returned packet_fanout 
(after synchronize_net())
53d5c353 Anoob Soman 2017-02-15  1730   */
53d5c353 Anoob Soman 2017-02-15  1731  static struct packet_fanout 
*fanout_release(struct sock *sk)
dc99f600 David S. Miller 2011-07-05  1732  {
dc99f600 David S. Miller 2011-07-05  1733   struct packet_sock *po = 
pkt_sk(sk);
dc99f600 David S. Miller 2011-07-05  1734   struct packet_fanout *f;
53d5c353 Anoob Soman 2017-02-15  1735   bool ret_fanout = false;
dc99f600 David S. Miller 2011-07-05  1736  
dc99f600 David S. Miller 2011-07-05  1737   f = po->fanout;
dc99f600 David S. Miller 2011-07-05  1738   if (!f)
dc99f600 David S. Miller 2011-07-05 @1739   return;
dc99f600 David S. Miller 2011-07-05  1740  
fff3321d Pavel Emelyanov 2012-08-16  1741   mutex_lock(&fanout_mutex);
dc99f600 David S. Miller 2011-07-05  1742   po->fanout = NULL;
dc99f600 David S. Miller 2011-07-05  1743  
dc99f600 David S. Miller 2011-07-05  1744   if 
(atomic_dec_and_test(&f->sk_ref)) {
dc99f600 David S. Miller 2011-07-05  1745   list_del(&f->list);
53d5c353 Anoob Soman 2017-02-15  1746   ret_fanout = true;
dc99f600 David S. Miller 2011-07-05  1747   }

:: The code at line 1739 was first introduced by commit
:: dc99f600698dcac69b8f56dda9a8a00d645c5ffc packet: Add fanout support.

:: TO: David S. Miller 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [PATCH net-next] DCTCP drop compatibility

2017-02-15 Thread David Miller

First, this needs to be reviewed by other DCTCP experts.

Second, it is missing a proper Signed-off-by: tag.


Re: [patch net-next] sched: have stub for tcf_destroy_chain in case NET_CLS is not configured

2017-02-15 Thread Sabrina Dubroca
2017-02-15, 11:57:50 +0100, Jiri Pirko wrote:
> From: Jiri Pirko 
> 
> This fixes broken build for !NET_CLS:
> 
> net/built-in.o: In function `fq_codel_destroy':
> /home/sab/linux/net-next/net/sched/sch_fq_codel.c:468: undefined reference to 
> `tcf_destroy_chain'
> 
> Fixes: cf1facda2f61 ("sched: move tcf_proto_destroy and tcf_destroy_chain 
> helpers into cls_api")
> Reported-by: Sabrina Dubroca 
> Signed-off-by: Jiri Pirko 

Thanks for the fix.

Tested-by: Sabrina Dubroca 

-- 
Sabrina


Re: [PATCH 1/2] net: xilinx_emaclite: fix receive buffer overflow

2017-02-15 Thread David Miller
From: Anssi Hannula 
Date: Tue, 14 Feb 2017 19:11:44 +0200

> xilinx_emaclite looks at the received data to try to determine the
> Ethernet packet length but does not properly clamp it if
> proto_type == ETH_P_IP or 1500 < proto_type <= 1518, causing a buffer
> overflow and a panic via skb_panic() as the length exceeds the allocated
> skb size.
> 
> Fix those cases.
> 
> Also add an additional unconditional check with WARN_ON() at the end.
> 
> Signed-off-by: Anssi Hannula 
> Fixes: bb81b2ddfa19 ("net: add Xilinx emac lite device driver")

Applied.


Re: [PATCH 2/2] net: xilinx_emaclite: fix freezes due to unordered I/O

2017-02-15 Thread David Miller
From: Anssi Hannula 
Date: Tue, 14 Feb 2017 19:11:45 +0200

> The xilinx_emaclite uses __raw_writel and __raw_readl for register
> accesses. Those functions do not imply any kind of memory barriers and
> they may be reordered.
> 
> The driver does not seem to take that into account, though, and the
> driver does not satisfy the ordering requirements of the hardware.
> For clear examples, see xemaclite_mdio_write() and xemaclite_mdio_read()
> which try to set MDIO address before initiating the transaction.
> 
> I'm seeing system freezes with the driver with GCC 5.4 and current
> Linux kernels on Zynq-7000 SoC immediately when trying to use the
> interface.
> 
> In commit 123c1407af87 ("net: emaclite: Do not use microblaze and ppc
> IO functions") the driver was switched from non-generic
> in_be32/out_be32 (memory barriers, big endian) to
> __raw_readl/__raw_writel (no memory barriers, native endian), so
> apparently the device follows system endianness and the driver was
> originally written with the assumption of memory barriers.
> 
> Rather than try to hunt for each case of missing barrier, just switch
> the driver to use iowrite32/ioread32/iowrite32be/ioread32be depending
> on endianness instead.
> 
> Tested on little-endian Zynq-7000 ARM SoC FPGA.
> 
> Signed-off-by: Anssi Hannula 
> Fixes: 123c1407af87 ("net: emaclite: Do not use microblaze and ppc IO
> functions")

Applied.


Re: [PATCH v2 0/4] PTP attribute handling cleanup

2017-02-15 Thread David Miller
From: Dmitry Torokhov 
Date: Tue, 14 Feb 2017 10:23:30 -0800

> PTP core was creating some attributes, such as "period" and "fifo", and the
> entire "pins" attribute group, after creating the class device, which creates
> a race for userspace: uevent may arrive before all attributes are created.
> 
> This series of patches switches PTP to use is_visible() to control
> visibility of attributes in a group, and device_create_with_groups() to
> ensure that attributes are created before we notify userspace of a new
> device.

Richard, please review.


Re: [patch net-next] sched: have stub for tcf_destroy_chain in case NET_CLS is not configured

2017-02-15 Thread David Miller
From: Jiri Pirko 
Date: Wed, 15 Feb 2017 11:57:50 +0100

> From: Jiri Pirko 
> 
> This fixes broken build for !NET_CLS:
> 
> net/built-in.o: In function `fq_codel_destroy':
> /home/sab/linux/net-next/net/sched/sch_fq_codel.c:468: undefined reference to 
> `tcf_destroy_chain'
> 
> Fixes: cf1facda2f61 ("sched: move tcf_proto_destroy and tcf_destroy_chain 
> helpers into cls_api")
> Reported-by: Sabrina Dubroca 
> Signed-off-by: Jiri Pirko 

Applied, thanks Jiri.


Re: [RFC PATCH v2] net: ethtool: add support for forward error correction modes

2017-02-15 Thread Casey Leedom
  Vidya and I have been in communication on how to support Forward Error 
Correction Management on Links.  My own thoughts are that we should consider 
using the new Get/Set Link Ksettings API because FEC is a low-level Link 
parameter which is either negotiated at the same time as Link Speed if 
Auto-Negotiation is in use, or must be set identically on both Link Peers if AN 
is not in use.  I'm not convinced that FEC Management should use a separate 
ethtool/Kernel Driver API ala Pause Frame settings.

  I'm hoping to meet with Vidya and his collaborators at Cumulus Networks in 
the very near term to hash these issues out.

Casey


Re: [PATCH net] net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification

2017-02-15 Thread David Miller
From: Ido Schimmel 
Date: Wed, 15 Feb 2017 08:59:25 +0200

> On Wed, Feb 15, 2017 at 01:00:36AM +0100, Marcus Huewe wrote:
>> When setting a neigh related sysctl parameter, we always send a
>> NETEVENT_DELAY_PROBE_TIME_UPDATE netevent. For instance, when
>> executing
>> 
>>  sysctl net.ipv6.neigh.wlp3s0.retrans_time_ms=2000
>> 
>> a NETEVENT_DELAY_PROBE_TIME_UPDATE netevent is generated.
>> 
>> This is caused by commit 2a4501ae18b5 ("neigh: Send a
>> notification when DELAY_PROBE_TIME changes"). According to the
>> commit's description, it was intended to generate such an event
>> when setting the "delay_first_probe_time" sysctl parameter.
>> 
>> In order to fix this, only generate this event when actually
>> setting the "delay_first_probe_time" sysctl parameter. This fix
>> should not have any unintended side-effects, because all but one
>> registered netevent callbacks check for other netevent event
>> types (the registered callbacks were obtained by grepping for
>> "register_netevent_notifier"). The only callback that uses the
>> NETEVENT_DELAY_PROBE_TIME_UPDATE event is
>> mlxsw_sp_router_netevent_event() (in
>> drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c): in case
>> of this event, it only accesses the DELAY_PROBE_TIME of the
>> passed neigh_parms.
>> 
>> Signed-off-by: Marcus Huewe 
> 
> Fixes: 2a4501ae18b5 ("neigh: Send a notification when DELAY_PROBE_TIME 
> changes")
> Reviewed-by: Ido Schimmel 

Applied and queued up for -stable, thanks everyone.


Re: [PATCH net-next] cxgb4: Update proper netdev stats for rx drops

2017-02-15 Thread David Miller
From: Ganesh Goudar 
Date: Wed, 15 Feb 2017 11:45:25 +0530

> Count buffer group drops or truncates as rx drops rather than
> rx errors in netdev stats.
> 
> Signed-off-by: Ganesh Goudar 
> Signed-off-by: Arjun V 

Applied.


Re: [PATCH v2 net] packet: Do not call fanout_release from atomic contexts

2017-02-15 Thread Eric Dumazet
On Wed, 2017-02-15 at 16:43 +, Anoob Soman wrote:

> 
>  net/packet/af_packet.c | 29 ++---
>  1 file changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index d56ee46..af29510 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -1497,6 +1497,8 @@ static void __fanout_link(struct sock *sk, struct 
> packet_sock *po)
>   f->arr[f->num_members] = sk;
>   smp_wmb();
>   f->num_members++;
> + if (f->num_members == 1)
> + dev_add_pack(&f->prot_hook);
>   spin_unlock(&f->lock);
>  }
>  
> @@ -1513,6 +1515,8 @@ static void __fanout_unlink(struct sock *sk, struct 
> packet_sock *po)
>   BUG_ON(i >= f->num_members);
>   f->arr[i] = f->arr[f->num_members - 1];
>   f->num_members--;
> + if (f->num_members == 0)
> + __dev_remove_pack(&f->prot_hook);
>   spin_unlock(&f->lock);
>  }
>  
> @@ -1687,7 +1691,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 
> type_flags)
>   match->prot_hook.func = packet_rcv_fanout;
>   match->prot_hook.af_packet_priv = match;
>   match->prot_hook.id_match = match_fanout_group;
> - dev_add_pack(&match->prot_hook);
>   list_add(&match->list, &fanout_list);
>   }
>   err = -EINVAL;
> @@ -1712,10 +1715,16 @@ static int fanout_add(struct sock *sk, u16 id, u16 
> type_flags)
>   return err;
>  }
>  
> -static void fanout_release(struct sock *sk)
> +/* If pkt_sk(sk)->fanout->sk_ref is zero, this function removes
> + * pkt_sk(sk)->fanout from fanout_list and returns pkt_sk(sk)->fanout.
> + * It is the responsibility of the caller to call fanout_release_data() and
> + * free the returned packet_fanout (after synchronize_net())
> + */
> +static struct packet_fanout *fanout_release(struct sock *sk)
>  {
>   struct packet_sock *po = pkt_sk(sk);
>   struct packet_fanout *f;
> + bool ret_fanout = false;

No need for this new variable.

>  
>   f = po->fanout;
>   if (!f)
> @@ -1726,14 +1735,14 @@ static void fanout_release(struct sock *sk)
>  
>   if (atomic_dec_and_test(&f->sk_ref)) {
>   list_del(&f->list);
> - dev_remove_pack(&f->prot_hook);
> - fanout_release_data(f);
> - kfree(f);
> + ret_fanout = true;
>   }

} else {
 f = NULL;
}

>   mutex_unlock(&fanout_mutex);
>  
>   if (po->rollover)
>   kfree_rcu(po->rollover, rcu);
> +
> + return ret_fanout ? f : NULL;

return f;

>  }
>  
>  

Otherwise, this look good.

But make sure to respin your patch based on latest net tree.

af_packet.c got a recent fix.

Thanks





Re: [PATCH net-next] openvswitch: Set internal device max mtu to ETH_MAX_MTU.

2017-02-15 Thread David Miller
From: Jarno Rajahalme 
Date: Tue, 14 Feb 2017 21:16:28 -0800

> Commit 91572088e3fd ("net: use core MTU range checking in core net
> infra") changed the openvswitch internal device to use the core net
> infra for controlling the MTU range, but failed to actually set the
> max_mtu as described in the commit message, which now defaults to
> ETH_DATA_LEN.
> 
> This patch fixes this by setting max_mtu to ETH_MAX_MTU after
> ether_setup() call.
> 
> Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
> Signed-off-by: Jarno Rajahalme 

Applied, thank you.


Re: [PATCH v2 net-next] virtio: Fix affinity for #VCPUs != #queue pairs

2017-02-15 Thread Michael S. Tsirkin
On Wed, Feb 15, 2017 at 08:50:34AM -0800, Willem de Bruijn wrote:
> On Tue, Feb 14, 2017 at 1:05 PM, Michael S. Tsirkin  wrote:
> > On Tue, Feb 14, 2017 at 11:17:41AM -0800, Benjamin Serebrin wrote:
> >> On Wed, Feb 8, 2017 at 11:37 AM, Michael S. Tsirkin  
> >> wrote:
> >>
> >> > IIRC irqbalance will bail out and avoid touching affinity
> >> > if you set affinity from driver.  Breaking that's not nice.
> >> > Pls correct me if I'm wrong.
> >>
> >>
> >> I believe you're right that irqbalance will leave the affinity alone.
> >>
> >> Irqbalance has had changes that may or may not be in the versions bundled 
> >> with
> >> various guests, and I don't have a definitive cross-correlation of 
> >> irqbalance
> >> version to guest version.  But in the existing code, the driver does
> >> set affinity for #VCPUs==#queues, so that's been happening anyway.
> >
> > Right - the idea being we load all CPUs equally so we don't
> > need help from irqbalance - hopefully packets will be spread
> > across queues in a balanced way.
> >
> > When we have less queues the load isn't balanced so we
> > definitely need something fancier to take into account
> > the overall system load.
> 
> For pure network load, assigning each txqueue IRQ exclusively
> to one of the cores that generates traffic on that queue is the
> optimal layout in terms of load spreading. Irqbalance does
> not have the XPS information to make this optimal decision.

Try to add hints for it?

> Overall system load affects this calculation both in the case of 1:1
> mapping and of uneven queue distribution. In both cases, irqbalance
> is hopefully smart enough to migrate other non-pinned IRQs to
> cpus with lower overall load.

Not if everyone starts inserting hacks like this one in code.

> > But why the first N cpus? That's more or less the same as assigning them
> > at random.
> 
> CPU selection is an interesting point. Spreading equally across numa
> nodes would be preferable over first N. Aside from that, the first N
> should work best to minimize the chance of hitting multiple
> hyperthreads on the same core -- if all architectures lay out
> hyperthreads in the same way as x86_64.

That's another problem with this patch. If you care about hyperthreads
you want an API to probe for that.

-- 
MST


Re: [PATCH net-next v7 0/2] qed*: Add support for PTP

2017-02-15 Thread David Miller
From: Yuval Mintz 
Date: Wed, 15 Feb 2017 10:24:09 +0200

> This patch series adds required changes for qed/qede drivers for
> supporting the IEEE Precision Time Protocol (PTP).

Series applied, thanks.


[PATCH] net: ethernet: ti: cpsw: correct ale dev to cpsw

2017-02-15 Thread Ivan Khoronzhuk
The ale is a property of cpsw, so change dev to cpsw->dev,
aka pdev->dev, to be consistent.

Signed-off-by: Ivan Khoronzhuk 
---
Based on net-next/master

 drivers/net/ethernet/ti/cpsw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index e86f226..57c8308 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -3032,7 +3032,7 @@ static int cpsw_probe(struct platform_device *pdev)
goto clean_dma_ret;
}
 
-   ale_params.dev  = &ndev->dev;
+   ale_params.dev  = &pdev->dev;
ale_params.ale_ageout   = ale_ageout;
ale_params.ale_entries  = data->ale_entries;
ale_params.ale_ports= data->slaves;
-- 
2.7.4



Re: [PATCH] average: change to declare precision, not factor

2017-02-15 Thread David Miller
From: Johannes Berg 
Date: Wed, 15 Feb 2017 09:49:26 +0100

> From: Johannes Berg 
> 
> Declaring the factor is counter-intuitive, and people are prone
> to using small(-ish) values even when that makes no sense.
> 
> Change the DECLARE_EWMA() macro to take the fractional precision,
> in bits, rather than a factor, and update all users.
> 
> While at it, add some more documentation.
> 
> Signed-off-by: Johannes Berg 
> ---
> Unless I hear any objections, I will take this through my tree.

Acked-by: David S. Miller 


Re: [PATCH] net: ethernet: aquantia: switch to pci_alloc_irq_vectors

2017-02-15 Thread Pavel Belous


On 15.02.2017 10:38, Christoph Hellwig wrote:

pci_enable_msix has been long deprecated, but this driver adds a new
instance.  Convert it to pci_alloc_irq_vectors so that no new instance
of the deprecated function reaches mainline.

Signed-off-by: Christoph Hellwig 
---
 .../net/ethernet/aquantia/atlantic/aq_pci_func.c   | 101 +
 1 file changed, 25 insertions(+), 76 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
index da4bc09dac51..581de71a958a 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
@@ -22,13 +22,11 @@ struct aq_pci_func_s {
void *aq_vec[AQ_CFG_PCI_FUNC_MSIX_IRQS];
resource_size_t mmio_pa;
unsigned int msix_entry_mask;
-   unsigned int irq_type;
unsigned int ports;
bool is_pci_enabled;
bool is_regions;
bool is_pci_using_dac;
struct aq_hw_caps_s aq_hw_caps;
-   struct msix_entry msix_entry[AQ_CFG_PCI_FUNC_MSIX_IRQS];
 };

 struct aq_pci_func_s *aq_pci_func_alloc(struct aq_hw_ops *aq_hw_ops,
@@ -87,7 +85,6 @@ int aq_pci_func_init(struct aq_pci_func_s *self)
int err = 0;
unsigned int bar = 0U;
unsigned int port = 0U;
-   unsigned int i = 0U;

err = pci_enable_device(self->pdev);
if (err < 0)
@@ -145,27 +142,16 @@ int aq_pci_func_init(struct aq_pci_func_s *self)
}
}

-   for (i = 0; i < self->aq_hw_caps.msix_irqs; i++)
-   self->msix_entry[i].entry = i;
-
/*enable interrupts */
-#if AQ_CFG_FORCE_LEGACY_INT
-   self->irq_type = AQ_HW_IRQ_LEGACY;
-#else
-   err = pci_enable_msix(self->pdev, self->msix_entry,
- self->aq_hw_caps.msix_irqs);
+#if !AQ_CFG_FORCE_LEGACY_INT
+   err = pci_alloc_irq_vectors(self->pdev, self->aq_hw_caps.msix_irqs,
+ self->aq_hw_caps.msix_irqs, PCI_IRQ_MSIX);

-   if (err >= 0) {
-   self->irq_type = AQ_HW_IRQ_MSIX;
-   } else {
-   err = pci_enable_msi(self->pdev);
-
-   if (err >= 0) {
-   self->irq_type = AQ_HW_IRQ_MSI;
-   } else {
-   self->irq_type = AQ_HW_IRQ_LEGACY;
-   err = 0;
-   }
+   if (err < 0) {
+   err = pci_alloc_irq_vectors(self->pdev, 1, 1,
+   PCI_IRQ_MSI | PCI_IRQ_LEGACY);
+   if (err < 0)
+   goto err_exit;
}
 #endif

@@ -196,34 +182,22 @@ int aq_pci_func_init(struct aq_pci_func_s *self)
 int aq_pci_func_alloc_irq(struct aq_pci_func_s *self, unsigned int i,
  char *name, void *aq_vec, cpumask_t *affinity_mask)
 {
+   struct pci_dev *pdev = self->pdev;
int err = 0;

-   switch (self->irq_type) {
-   case AQ_HW_IRQ_MSIX:
-   err = request_irq(self->msix_entry[i].vector, aq_vec_isr, 0,
+   if (pdev->msix_enabled || pdev->msi_enabled)
+   err = request_irq(pci_irq_vector(pdev, i), aq_vec_isr, 0,
  name, aq_vec);
-   break;
-
-   case AQ_HW_IRQ_MSI:
-   err = request_irq(self->pdev->irq, aq_vec_isr, 0, name, aq_vec);
-   break;
-
-   case AQ_HW_IRQ_LEGACY:
-   err = request_irq(self->pdev->irq, aq_vec_isr_legacy,
+   else
+   err = request_irq(pci_irq_vector(pdev, i), aq_vec_isr_legacy,
  IRQF_SHARED, name, aq_vec);
-   break;
-
-   default:
-   err = -EFAULT;
-   break;
-   }

if (err >= 0) {
self->msix_entry_mask |= (1 << i);
self->aq_vec[i] = aq_vec;

-   if (self->irq_type == AQ_HW_IRQ_MSIX)
-   irq_set_affinity_hint(self->msix_entry[i].vector,
+   if (pdev->msix_enabled)
+   irq_set_affinity_hint(pci_irq_vector(pdev, i),
  affinity_mask);
}

@@ -232,30 +206,16 @@ int aq_pci_func_alloc_irq(struct aq_pci_func_s *self, unsigned int i,

 void aq_pci_func_free_irqs(struct aq_pci_func_s *self)
 {
+   struct pci_dev *pdev = self->pdev;
unsigned int i = 0U;

for (i = 32U; i--;) {
if (!((1U << i) & self->msix_entry_mask))
continue;

-   switch (self->irq_type) {
-   case AQ_HW_IRQ_MSIX:
-   irq_set_affinity_hint(self->msix_entry[i].vector, NULL);
-   free_irq(self->msix_entry[i].vector, self->aq_vec[i]);
-   break;
-
-   case AQ_HW_IRQ_MSI:
-   free_irq(self->pdev->irq, self->aq_vec[i]);
-   break;
-
-   case AQ_HW_IRQ_LEGACY:
- 

Re: [PATCH] net: ethernet: aquantia: switch to pci_alloc_irq_vectors

2017-02-15 Thread David Miller
From: Christoph Hellwig 
Date: Wed, 15 Feb 2017 08:38:47 +0100

> pci_enable_msix has been long deprecated, but this driver adds a new
> instance.  Convert it to pci_alloc_irq_vectors so that no new instance
> of the deprecated function reaches mainline.
> 
> Signed-off-by: Christoph Hellwig 

Applied to net-next, thanks.


Re: [PATCH net-next V3 5/7] net/sched: cls_matchall: Reflect HW offloading status

2017-02-15 Thread David Miller
From: Or Gerlitz 
Date: Wed, 15 Feb 2017 10:52:35 +0200

> @@ -194,6 +199,9 @@ static int mall_change(struct net *net, struct sk_buff *in_skb,
>   }
>   }
>  
> + if (!(tc_in_hw(new->flags)))
> + new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;

Too many parenthesis, please make this:

if (!tc_in_hw(new->flags))
new->flags |= TCA_CLS_FLAGS_NOT_IN_HW;

Thanks.


Re: [PATCH net-next V3 6/7] net/sched: cls_u32: Reflect HW offload status

2017-02-15 Thread David Miller
From: Or Gerlitz 
Date: Wed, 15 Feb 2017 10:52:36 +0200

> @@ -895,6 +899,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
>   return err;
>   }
>  
> + if (!(tc_in_hw(new->flags)))

Less parenthesis, please.

> @@ -1014,6 +1021,9 @@ static int u32_change(struct net *net, struct sk_buff *in_skb,
>   if (err)
>   goto errhw;
>  
> + if (!(tc_in_hw(n->flags)))

Likewise.


Re: [PATCH net-next V3 7/7] net/sched: cls_bpf: Reflect HW offload status

2017-02-15 Thread David Miller
From: Or Gerlitz 
Date: Wed, 15 Feb 2017 10:52:37 +0200

> @@ -511,6 +517,9 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb,
>   return ret;
>   }
>  
> + if (!(tc_in_hw(prog->gen_flags)))

Again, less parenthesis.

Thanks.


[PATCH net] ibmvnic: Use common counter for capabilities checks

2017-02-15 Thread Thomas Falcon
Two different counters were being used for capabilities
requests and queries. These commands are not called
at the same time, so there is no reason a single counter
cannot be used.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 71 --
 drivers/net/ethernet/ibm/ibmvnic.h |  3 +-
 2 files changed, 39 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 244be77..fb683f2 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1255,8 +1255,6 @@ static void release_sub_crqs(struct ibmvnic_adapter *adapter)
}
adapter->rx_scrq = NULL;
}
-
-   adapter->requested_caps = 0;
 }
 
 static void release_sub_crqs_no_irqs(struct ibmvnic_adapter *adapter)
@@ -1278,8 +1276,6 @@ static void release_sub_crqs_no_irqs(struct ibmvnic_adapter *adapter)
  adapter->rx_scrq[i]);
adapter->rx_scrq = NULL;
}
-
-   adapter->requested_caps = 0;
 }
 
 static int disable_scrq_irq(struct ibmvnic_adapter *adapter,
@@ -1567,30 +1563,36 @@ static void init_sub_crqs(struct ibmvnic_adapter *adapter, int retry)
 
crq.request_capability.capability = cpu_to_be16(REQ_TX_QUEUES);
crq.request_capability.number = cpu_to_be64(adapter->req_tx_queues);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
crq.request_capability.capability = cpu_to_be16(REQ_RX_QUEUES);
crq.request_capability.number = cpu_to_be64(adapter->req_rx_queues);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
crq.request_capability.capability = cpu_to_be16(REQ_RX_ADD_QUEUES);
crq.request_capability.number = cpu_to_be64(adapter->req_rx_add_queues);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
crq.request_capability.capability =
cpu_to_be16(REQ_TX_ENTRIES_PER_SUBCRQ);
crq.request_capability.number =
cpu_to_be64(adapter->req_tx_entries_per_subcrq);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
crq.request_capability.capability =
cpu_to_be16(REQ_RX_ADD_ENTRIES_PER_SUBCRQ);
crq.request_capability.number =
cpu_to_be64(adapter->req_rx_add_entries_per_subcrq);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
crq.request_capability.capability = cpu_to_be16(REQ_MTU);
crq.request_capability.number = cpu_to_be64(adapter->req_mtu);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
if (adapter->netdev->flags & IFF_PROMISC) {
@@ -1598,12 +1600,14 @@ static void init_sub_crqs(struct ibmvnic_adapter *adapter, int retry)
crq.request_capability.capability =
cpu_to_be16(PROMISC_REQUESTED);
crq.request_capability.number = cpu_to_be64(1);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
}
} else {
crq.request_capability.capability =
cpu_to_be16(PROMISC_REQUESTED);
crq.request_capability.number = cpu_to_be64(0);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
}
 
@@ -1955,112 +1959,112 @@ static void send_cap_queries(struct ibmvnic_adapter *adapter)
 {
union ibmvnic_crq crq;
 
-   atomic_set(&adapter->running_cap_queries, 0);
+   atomic_set(&adapter->running_cap_crqs, 0);
memset(&crq, 0, sizeof(crq));
crq.query_capability.first = IBMVNIC_CRQ_CMD;
crq.query_capability.cmd = QUERY_CAPABILITY;
 
crq.query_capability.capability = cpu_to_be16(MIN_TX_QUEUES);
-   atomic_inc(&adapter->running_cap_queries);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
crq.query_capability.capability = cpu_to_be16(MIN_RX_QUEUES);
-   atomic_inc(&adapter->running_cap_queries);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
crq.query_capability.capability = cpu_to_be16(MIN_RX_ADD_QUEUES);
-   atomic_inc(&adapter->running_cap_queries);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
crq.query_capability.capability = cpu_to_be16(MAX_TX_QUEUES);
-   atomic_inc(&adapter->running_cap_queries);
+   atomic_inc(&adapter->running_cap_crqs);
ibmvnic_send_crq(adapter, &crq);
 
crq.query_capability.capability = cpu_to_be16(MAX_RX_QUEUES);
-   atomic_inc(&adapter->running_cap_queries);
+   atomic_inc(&adapter->running_cap_crqs);

[PATCH net] ibmvnic: Handle processing of CRQ messages in a tasklet

2017-02-15 Thread Thomas Falcon
Create a tasklet to process queued commands or messages received from
firmware instead of processing them in the interrupt handler. Note that
this handler does not process network traffic, but communications related
to resource allocation and device settings.

Signed-off-by: Thomas Falcon 
---
 drivers/net/ethernet/ibm/ibmvnic.c | 18 +-
 drivers/net/ethernet/ibm/ibmvnic.h |  1 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5b66b4f..244be77 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3415,6 +3415,18 @@ static void ibmvnic_handle_crq(union ibmvnic_crq *crq,
 static irqreturn_t ibmvnic_interrupt(int irq, void *instance)
 {
struct ibmvnic_adapter *adapter = instance;
+   unsigned long flags;
+
+   spin_lock_irqsave(&adapter->crq.lock, flags);
+   vio_disable_interrupts(adapter->vdev);
+   tasklet_schedule(&adapter->tasklet);
+   spin_unlock_irqrestore(&adapter->crq.lock, flags);
+   return IRQ_HANDLED;
+}
+
+static void ibmvnic_tasklet(void *data)
+{
+   struct ibmvnic_adapter *adapter = data;
struct ibmvnic_crq_queue *queue = &adapter->crq;
struct vio_dev *vdev = adapter->vdev;
union ibmvnic_crq *crq;
@@ -3440,7 +3452,6 @@ static irqreturn_t ibmvnic_interrupt(int irq, void *instance)
}
}
spin_unlock_irqrestore(&queue->lock, flags);
-   return IRQ_HANDLED;
 }
 
 static int ibmvnic_reenable_crq_queue(struct ibmvnic_adapter *adapter)
@@ -3495,6 +3506,7 @@ static void ibmvnic_release_crq_queue(struct ibmvnic_adapter *adapter)
 
netdev_dbg(adapter->netdev, "Releasing CRQ\n");
free_irq(vdev->irq, adapter);
+   tasklet_kill(&adapter->tasklet);
do {
rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
@@ -3540,6 +3552,9 @@ static int ibmvnic_init_crq_queue(struct ibmvnic_adapter *adapter)
 
retrc = 0;
 
+   tasklet_init(&adapter->tasklet, (void *)ibmvnic_tasklet,
+(unsigned long)adapter);
+
netdev_dbg(adapter->netdev, "registering irq 0x%x\n", vdev->irq);
rc = request_irq(vdev->irq, ibmvnic_interrupt, 0, IBMVNIC_NAME,
 adapter);
@@ -3561,6 +3576,7 @@ static int ibmvnic_init_crq_queue(struct ibmvnic_adapter *adapter)
return retrc;
 
 req_irq_failed:
+   tasklet_kill(&adapter->tasklet);
do {
rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index dd775d9..0d0edc3 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -1049,5 +1049,6 @@ struct ibmvnic_adapter {
 
struct work_struct vnic_crq_init;
struct work_struct ibmvnic_xport;
+   struct tasklet_struct tasklet;
bool failover;
 };
-- 
2.7.4



[PATCH net] ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs

2017-02-15 Thread Thomas Falcon
After sending device capability queries and requests to the vNIC Server,
an interrupt is triggered and the responses are written to the driver's
CRQ response buffer. Since the interrupt can be triggered before all
responses are written and visible to the partition, there is a danger
that the interrupt handler or tasklet can terminate before all responses
are read, resulting in a failure to initialize the device.

To avoid this scenario, when capability commands are sent, we set
a flag that is checked in the following interrupt tasklet, which
handles the capability responses from the server. Once all
responses have been handled, the flag is cleared and the tasklet
is allowed to terminate.

Signed-off-by: Thomas Falcon 
---
This patch depends on
"[PATCH net] ibmvnic: Handle processing of CRQ messages in a tasklet"
and
"[PATCH net] ibmvnic: Use common counter for capabilities checks"
---
 drivers/net/ethernet/ibm/ibmvnic.c | 16 ++--
 drivers/net/ethernet/ibm/ibmvnic.h |  1 +
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index fb683f2..5a8707f 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2413,6 +2413,7 @@ static void handle_request_cap_rsp(union ibmvnic_crq *crq,
struct ibmvnic_query_ip_offload_buffer *ip_offload_buf =
&adapter->ip_offload_buf;
 
+   adapter->wait_capability = false;
adapter->ip_offload_tok = dma_map_single(dev, ip_offload_buf,
 buf_sz,
 DMA_FROM_DEVICE);
@@ -2708,9 +2709,11 @@ static void handle_query_cap_rsp(union ibmvnic_crq *crq,
}
 
 out:
-   if (atomic_read(&adapter->running_cap_crqs) == 0)
+   if (atomic_read(&adapter->running_cap_crqs) == 0) {
+   adapter->wait_capability = false;
init_sub_crqs(adapter, 0);
/* We're done querying the capabilities, initialize sub-crqs */
+   }
 }
 
 static void handle_control_ras_rsp(union ibmvnic_crq *crq,
@@ -3453,9 +3456,18 @@ static void ibmvnic_tasklet(void *data)
ibmvnic_handle_crq(crq, adapter);
crq->generic.first = 0;
} else {
-   done = true;
+   /* remain in tasklet until all
+* capabilities responses are received
+*/
+   if (!adapter->wait_capability)
+   done = true;
}
}
+   /* if capabilities CRQ's were sent in this tasklet, the following
+* tasklet must wait until all responses are received
+*/
+   if (atomic_read(&adapter->running_cap_crqs) != 0)
+   adapter->wait_capability = true;
spin_unlock_irqrestore(&queue->lock, flags);
 }
 
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index 504d05c..422824f 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -977,6 +977,7 @@ struct ibmvnic_adapter {
int login_rsp_buf_sz;
 
atomic_t running_cap_crqs;
+   bool wait_capability;
 
struct ibmvnic_sub_crq_queue **tx_scrq;
struct ibmvnic_sub_crq_queue **rx_scrq;
-- 
2.7.4



Re: [PATCH v3 0/8] misc patchs

2017-02-15 Thread David Miller
From: Corentin Labbe 
Date: Wed, 15 Feb 2017 10:46:37 +0100

> This is a follow up of my previous stmmac serie which address some comment
> done in v2.

Series applied.


Re: [patch net-next] mlxsw: acl: Use PBS type for forward action

2017-02-15 Thread David Miller
From: Jiri Pirko 
Date: Wed, 15 Feb 2017 12:09:51 +0100

> From: Jiri Pirko 
> 
> Current behaviour of "mirred redirect" action (forward) offload is a bit
> odd. For matched packets the action forwards them to the desired
> destination, but it also lets the packet duplicates to go the original
> way down (bridge, router, etc). That is more like "mirred mirror".
> Fix this by using PBS type which behaves exactly like "mirred redirect".
> Note that PBS does not support loopback mode.
> 
> Fixes: 4cda7d8d7098 ("mlxsw: core: Introduce flexible actions support")
> Signed-off-by: Jiri Pirko 
> Reviewed-by: Ido Schimmel 

Applied, thanks.

