[PATCH v3 net-next 3/8] net/ncsi: Enforce failover on link monitor timeout

2017-04-17 Thread Gavin Shan
An NCSI channel has been configured to provide service whenever its link
monitor timer is enabled, regardless of its state (inactive or active).
A timeout on the link monitor therefore means the channel is out of
service, and a failover is needed.

This sets the NCSI_DEV_RESHUFFLE flag to enforce failover on link monitor
timeout, regardless of the channel's original state (inactive or active).
Also, the link is put into the "down" state to give the failing channel
the lowest priority when the active channel is selected. The state of the
failing channel is set to active so that deinitialization and failover
are carried out.
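
As a rough userspace illustration of the timeout branch described above,
the sketch below models the three effects: requesting a reshuffle,
clearing the link-up bit, and queueing the channel as active. All type
names, flag values and the "queued" field are hypothetical stand-ins and
are not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are illustrative stand-ins, not the kernel definitions. */
enum chan_state { CHAN_INACTIVE, CHAN_ACTIVE, CHAN_INVISIBLE };

#define DEV_HWA        0x1u  /* hardware arbitration in use (hypothetical value) */
#define DEV_RESHUFFLE  0x2u  /* request a new active channel (hypothetical value) */
#define LINK_UP_BIT    0x1u  /* bit 0 of the cached link status word */

struct chan {
	enum chan_state state;
	unsigned int link_status;
	bool queued;             /* stands in for being on the channel queue */
};

struct dev {
	unsigned int flags;
};

/* Model of the monitor's timeout branch as the commit message describes it. */
static void monitor_timeout(struct dev *d, struct chan *c)
{
	assert(d && c);

	/* Failover is requested whatever the channel's previous state was. */
	if (!(d->flags & DEV_HWA))
		d->flags |= DEV_RESHUFFLE;

	/* Mark the link down so this channel becomes the last choice. */
	c->link_status &= ~LINK_UP_BIT;

	/* Queue it as "active" so it gets deinitialized before failover. */
	c->state = CHAN_ACTIVE;
	c->queued = true;
}
```

Calling monitor_timeout() on an inactive channel with HWA disabled leaves
the reshuffle flag set and the link-up bit cleared in this model.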

Signed-off-by: Gavin Shan 
---
 net/ncsi/ncsi-manage.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index c71a3a5..13ad1f26 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -170,6 +170,7 @@ static void ncsi_channel_monitor(unsigned long data)
struct ncsi_channel *nc = (struct ncsi_channel *)data;
struct ncsi_package *np = nc->package;
struct ncsi_dev_priv *ndp = np->ndp;
+   struct ncsi_channel_mode *ncm;
struct ncsi_cmd_arg nca;
bool enabled, chained;
unsigned int monitor_state;
@@ -214,20 +215,21 @@ static void ncsi_channel_monitor(unsigned long data)
case NCSI_CHANNEL_MONITOR_WAIT ... NCSI_CHANNEL_MONITOR_WAIT_MAX:
break;
default:
-   if (!(ndp->flags & NCSI_DEV_HWA) &&
-   state == NCSI_CHANNEL_ACTIVE) {
+   if (!(ndp->flags & NCSI_DEV_HWA)) {
ncsi_report_link(ndp, true);
ndp->flags |= NCSI_DEV_RESHUFFLE;
}
 
+   ncm = &nc->modes[NCSI_MODE_LINK];
spin_lock_irqsave(&nc->lock, flags);
nc->state = NCSI_CHANNEL_INVISIBLE;
+   ncm->data[2] &= ~0x1;
spin_unlock_irqrestore(&nc->lock, flags);
 
ncsi_stop_channel_monitor(nc);
 
spin_lock_irqsave(&ndp->lock, flags);
-   nc->state = NCSI_CHANNEL_INACTIVE;
+   nc->state = NCSI_CHANNEL_ACTIVE;
list_add_tail_rcu(&nc->link, &ndp->channel_queue);
spin_unlock_irqrestore(&ndp->lock, flags);
ncsi_process_next_channel(ndp);
-- 
2.7.4



[PATCH v3 net-next 1/8] net/ncsi: Disable HWA mode when no channels are found

2017-04-17 Thread Gavin Shan
When no NCSI channels are probed, HWA (Hardware Arbitration) mode is
still enabled. That is incorrect, because HWA requires that NCSI
channels exist and that all of them support HWA mode. This disables
HWA when no channels are probed.
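
The rule stated above ("at least one channel, and every channel supports
HWA") can be sketched in plain C as follows. CAP_HWA and the flat array
of capability words are assumptions made for illustration; the patch
itself walks packages and channels and tracks a has_channel flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CAP_HWA 0x1u  /* hypothetical "supports HWA" capability bit */

/* Returns true only if at least one channel exists and all support HWA. */
static bool all_channels_support_hwa(const unsigned int *caps, size_t n)
{
	size_t i;

	assert(caps || n == 0);

	/* An empty set of channels must not enable HWA. */
	if (n == 0)
		return false;

	for (i = 0; i < n; i++) {
		if (!(caps[i] & CAP_HWA))
			return false;
	}
	return true;
}
```

With n == 0 the function returns false, which is the case the patch
adds.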

Signed-off-by: Gavin Shan 
---
 net/ncsi/ncsi-manage.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index a3bd5fa..5073e15 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -839,12 +839,15 @@ static bool ncsi_check_hwa(struct ncsi_dev_priv *ndp)
struct ncsi_package *np;
struct ncsi_channel *nc;
unsigned int cap;
+   bool has_channel = false;
 
/* The hardware arbitration is disabled if any one channel
 * doesn't support explicitly.
 */
NCSI_FOR_EACH_PACKAGE(ndp, np) {
NCSI_FOR_EACH_CHANNEL(np, nc) {
+   has_channel = true;
+
cap = nc->caps[NCSI_CAP_GENERIC].cap;
if (!(cap & NCSI_CAP_GENERIC_HWA) ||
(cap & NCSI_CAP_GENERIC_HWA_MASK) !=
@@ -855,8 +858,13 @@ static bool ncsi_check_hwa(struct ncsi_dev_priv *ndp)
}
}
 
-   ndp->flags |= NCSI_DEV_HWA;
-   return true;
+   if (has_channel) {
+   ndp->flags |= NCSI_DEV_HWA;
+   return true;
+   }
+
+   ndp->flags &= ~NCSI_DEV_HWA;
+   return false;
 }
 
 static int ncsi_enable_hwa(struct ncsi_dev_priv *ndp)
-- 
2.7.4



[PATCH v3 net-next 8/8] net/ncsi: Fix length of GVI response packet

2017-04-17 Thread Gavin Shan
The length of the GVI (GetVersionInfo) response packet should be 40
instead of 36. This issue was found through
/sys/kernel/debug/ncsi/eth0/stats.

 # cat /sys/kernel/debug/ncsi/eth0/stats
 :
 RSP  OK   TIMEOUT  ERROR
 ========================
 GVI  0    0        2

With this applied, no errors are reported on GVI response packets:

 # cat /sys/kernel/debug/ncsi/eth0/stats
 :
 RSP  OK   TIMEOUT  ERROR
 ========================
 GVI  2    0        0
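
To show why a wrong table entry turns every GVI response into an error,
here is a minimal sketch of a length check driven by a handler table.
The type codes are placeholders, and treating a negative length as "not
fixed" is an assumption; only the lengths 40, 32 and -1 are taken from
the hunk below.

```c
#include <assert.h>
#include <stddef.h>

struct rsp_handler {
	unsigned char type;  /* response type code (placeholder values) */
	int payload;         /* expected payload length, negative = not fixed */
};

/* Type codes are illustrative; lengths come from the hunk below. */
static const struct rsp_handler handlers[] = {
	{ 0x01, 40 },  /* "GVI" */
	{ 0x02, 32 },  /* "GC"  */
	{ 0x03, -1 },  /* "GP"  */
};

/* Returns 0 if the length is acceptable, -1 otherwise. */
static int check_rsp_len(unsigned char type, int len)
{
	size_t i;

	assert(len >= 0);

	for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
		if (handlers[i].type != type)
			continue;
		if (handlers[i].payload < 0)
			return 0;   /* assumed: no fixed length to compare */
		return handlers[i].payload == len ? 0 : -1;
	}
	return -1;  /* unknown response type */
}
```

In this model a 40-byte GVI payload passes and a 36-byte one fails, so
with 36 in the table every real 40-byte response would be counted as an
error.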

Signed-off-by: Gavin Shan 
---
 net/ncsi/ncsi-rsp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index 092f2d8..804ccca 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -951,7 +951,7 @@ static struct ncsi_rsp_handler {
{ NCSI_PKT_RSP_EGMF,4, ncsi_rsp_handler_egmf},
{ NCSI_PKT_RSP_DGMF,4, ncsi_rsp_handler_dgmf},
{ NCSI_PKT_RSP_SNFC,4, ncsi_rsp_handler_snfc},
-   { NCSI_PKT_RSP_GVI,36, ncsi_rsp_handler_gvi },
+   { NCSI_PKT_RSP_GVI,40, ncsi_rsp_handler_gvi },
{ NCSI_PKT_RSP_GC, 32, ncsi_rsp_handler_gc  },
{ NCSI_PKT_RSP_GP, -1, ncsi_rsp_handler_gp  },
{ NCSI_PKT_RSP_GCPS,  172, ncsi_rsp_handler_gcps},
-- 
2.7.4



[PATCH v3 net-next 0/8] net/ncsi: Add debugging functionality

2017-04-17 Thread Gavin Shan
This series adds an NCSI debugging infrastructure by introducing several
debugfs files. It was inspired by reported issues in which no available
package or channel is probed successfully. We don't have a debugging
infrastructure for the NCSI stack yet.

The first 3 patches fix some issues and aren't relevant to the
subject. I included them because I expect they can be merged before
the code for the debugging infrastructure. PATCH[4,5,6/8] add debugfs
directories and files to support the debugging infrastructure for
several purposes: presenting the NCSI topology; gathering statistics on
sent and received NCSI packets; generating NCSI command packets manually.
PATCH[7,8/8] fix two issues found through the debugging functionality.

Changelog
=========
v3:
   * Use pr_debug() instead of pr_warn() upon failure to create
     debugfs directory or file                               (Joe Perches)
   * Use relative debugfs path/file names in debug messages in
     ncsi-debug.c                                            (Joe Perches)
   * Use const specifier for @ncsi_pkt_handlers and @ranges  (Joe Perches)
   * Eliminate CONFIG_NET_NCSI_DEBUG ifdef's in *.c          (Jakub Kicinski)
v2:
   * Use debugfs instead of procfs                           (Joe Perches)

Gavin Shan (8):
  net/ncsi: Disable HWA mode when no channels are found
  net/ncsi: Properly track channel monitor timer state
  net/ncsi: Enforce failover on link monitor timeout
  net/ncsi: Add debugging infrastructure
  net/ncsi: Dump NCSI packet statistics
  net/ncsi: Support NCSI packet generation
  net/ncsi: No error report on DP response to non-existing package
  net/ncsi: Fix length of GVI response packet

 net/ncsi/Kconfig   |   9 +
 net/ncsi/Makefile  |   1 +
 net/ncsi/internal.h| 105 ++
 net/ncsi/ncsi-aen.c|  13 +-
 net/ncsi/ncsi-cmd.c|  13 +-
 net/ncsi/ncsi-debug.c  | 992 +
 net/ncsi/ncsi-manage.c |  58 ++-
 net/ncsi/ncsi-rsp.c|  24 +-
 8 files changed, 1203 insertions(+), 12 deletions(-)
 create mode 100644 net/ncsi/ncsi-debug.c

-- 
2.7.4



[PATCH v3 net-next 2/8] net/ncsi: Properly track channel monitor timer state

2017-04-17 Thread Gavin Shan
The field @monitor.enabled in the NCSI channel descriptor is used
to track the state of the channel monitor timer, i.e. whether the
timer is pending or not. The handler may return without starting the
timer again. In that case, we failed to update @monitor.enabled to
false, which leads to the warning below, printed by WARN_ON_ONCE(),
when the monitor is restarted afterwards.

   [ cut here ]
   WARNING: CPU: 0 PID: 411 at /var/lib/jenkins/workspace/openbmc-build \
   /distro/ubuntu/target/palmetto/openbmc/build/tmp/work-shared/palmetto \
   net/ncsi/ncsi-manage.c:240 ncsi_start_channel_monitor+0x44/0x7c
   CPU: 0 PID: 411 Comm: kworker/0:3 Not tainted \
   4.7.10-f26558191540830589fe03932d05577957670b8d #1
   Hardware name: ASpeed SoC
   Workqueue: events ncsi_dev_work
   [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
   [] (show_stack) from [] (__warn+0xc4/0xf0)
   [] (__warn) from [] (warn_slowpath_null+0x1c/0x24)
   [] (warn_slowpath_null) from [] (ncsi_start_channel_monitor+0x44/0x7c)
   [] (ncsi_start_channel_monitor) from [] (ncsi_configure_channel+0x27c/0x2dc)
   [] (ncsi_configure_channel) from [] (ncsi_dev_work+0x39c/0x3e8)
   [] (ncsi_dev_work) from [] (process_one_work+0x1b8/0x2fc)
   [] (process_one_work) from [] (worker_thread+0x2c0/0x3f8)
   [] (worker_thread) from [] (kthread+0xd0/0xe8)
   [] (kthread) from [] (ret_from_fork+0x14/0x24)
   ---[ end trace 110cccf2b038c44d ]---

This fixes the issue by updating @monitor.enabled to false if needed.
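
The bookkeeping problem can be reproduced with a small userspace model.
The struct, the "warnings" counter (standing in for WARN_ON_ONCE()) and
the "fixed" switch are all hypothetical; the point is only that a
handler which exits without rearming must also clear the enabled flag.

```c
#include <assert.h>
#include <stdbool.h>

struct monitor {
	bool enabled;   /* what the code believes about the timer */
	bool pending;   /* whether the (simulated) timer is armed */
	int  warnings;  /* counts would-be WARN_ON_ONCE() hits */
};

static void monitor_start(struct monitor *m)
{
	assert(m);
	if (m->enabled)
		m->warnings++;      /* restart while still marked enabled */
	m->enabled = true;
	m->pending = true;
}

static void monitor_stop(struct monitor *m)
{
	assert(m);
	m->enabled = false;
	m->pending = false;
}

/* Timer handler: the timer has fired, so it is no longer pending. */
static void monitor_handler(struct monitor *m, bool rearm, bool fixed)
{
	assert(m);
	m->pending = false;
	if (rearm) {
		m->pending = true;
		return;
	}
	if (fixed)
		monitor_stop(m);    /* clear the flag on exits that do not rearm */
}
```

Running start, handler (no rearm), start with fixed == false records one
warning in this model; with fixed == true it records none.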

Reported-by: Sridevi Ramesh 
Signed-off-by: Gavin Shan 
---
 net/ncsi/ncsi-manage.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index 5073e15..c71a3a5 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -183,11 +183,16 @@ static void ncsi_channel_monitor(unsigned long data)
monitor_state = nc->monitor.state;
spin_unlock_irqrestore(&nc->lock, flags);
 
-   if (!enabled || chained)
+   if (!enabled || chained) {
+   ncsi_stop_channel_monitor(nc);
return;
+   }
+
if (state != NCSI_CHANNEL_INACTIVE &&
-   state != NCSI_CHANNEL_ACTIVE)
+   state != NCSI_CHANNEL_ACTIVE) {
+   ncsi_stop_channel_monitor(nc);
return;
+   }
 
switch (monitor_state) {
case NCSI_CHANNEL_MONITOR_START:
@@ -199,6 +204,7 @@ static void ncsi_channel_monitor(unsigned long data)
nca.req_flags = 0;
ret = ncsi_xmit_cmd(&nca);
if (ret) {
+   ncsi_stop_channel_monitor(nc);
netdev_err(ndp->ndev.dev, "Error %d sending GLS\n",
   ret);
return;
@@ -218,6 +224,8 @@ static void ncsi_channel_monitor(unsigned long data)
nc->state = NCSI_CHANNEL_INVISIBLE;
spin_unlock_irqrestore(&nc->lock, flags);
 
+   ncsi_stop_channel_monitor(nc);
+
spin_lock_irqsave(&ndp->lock, flags);
nc->state = NCSI_CHANNEL_INACTIVE;
list_add_tail_rcu(&nc->link, &ndp->channel_queue);
@@ -257,6 +265,10 @@ void ncsi_stop_channel_monitor(struct ncsi_channel *nc)
nc->monitor.enabled = false;
spin_unlock_irqrestore(&nc->lock, flags);
 
+   /* The timer isn't in pending state if we're deleting the timer
+* in its handler. del_timer_sync() can detect it and just does
+* nothing.
+*/
del_timer_sync(&nc->monitor.timer);
 }
 
-- 
2.7.4



[PATCH v3 net-next 7/8] net/ncsi: No error report on DP response to non-existing package

2017-04-17 Thread Gavin Shan
The issue was found through /sys/kernel/debug/ncsi/eth0/stats. The
first step in NCSI package/channel enumeration is to deselect all
packages by sending DP (Deselect Package) commands. The remote
NIC replies with a response while the corresponding package isn't
populated yet, and that response is wrongly treated as an error.

 # cat /sys/kernel/debug/ncsi/eth0/stats
 :
 RSP  OK   TIMEOUT  ERROR
 ========================
 CIS  3    0        0
 SP   3    0        0
 DP   2    0        1

This fixes the issue by ignoring the error in the DP response handler
when the corresponding package doesn't exist. With this applied,
no errors are reported for DP response packets.

 # cat /sys/kernel/debug/ncsi/eth0/stats
 :
 RSP  OK   TIMEOUT  ERROR
 ========================
 CIS  3    0        0
 SP   3    0        0
 DP   3    0        0
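
The effect on the counters can be sketched as below. The handler, its
"found" and "ignore_missing" parameters and the account() helper are
hypothetical; they only model how a handler's return value ends up in
the OK or ERROR column.

```c
#include <assert.h>
#include <stdbool.h>

enum { STAT_OK, STAT_TIMEOUT, STAT_ERROR, STAT_MAX };

/* Hypothetical DP response handler: 'found' says whether the package exists. */
static int handle_dp_response(bool found, bool ignore_missing)
{
	if (!found)
		return ignore_missing ? 0 : -1;
	/* ... the package's channels would be marked deselected here ... */
	return 0;
}

/* Book the handler's result into the OK or ERROR column. */
static void account(unsigned long stats[STAT_MAX], int ret)
{
	assert(stats);
	stats[ret ? STAT_ERROR : STAT_OK]++;
}
```

Feeding two responses for known packages and one for an unknown package
through this model gives 2/0/1 without the change and 3/0/0 with it,
matching the DP rows above.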

Signed-off-by: Gavin Shan 
---
 net/ncsi/ncsi-rsp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index 24154fc..092f2d8 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -118,7 +118,7 @@ static int ncsi_rsp_handler_dp(struct ncsi_request *nr)
ncsi_find_package_and_channel(ndp, rsp->rsp.common.channel,
  &np, NULL);
if (!np)
-   return -ENODEV;
+   return 0;
 
/* Change state of all channels attached to the package */
NCSI_FOR_EACH_CHANNEL(np, nc) {
-- 
2.7.4



[PATCH v3 net-next 6/8] net/ncsi: Support NCSI packet generation

2017-04-17 Thread Gavin Shan
This introduces /sys/kernel/debug/ncsi/eth0/pkt. The debugfs entry
accepts parameters to produce an NCSI command packet. The received
NCSI response packet is dumped on read. Below is an example that sends
a CIS command and dumps its response.

   # echo CIS,0,0 > /sys/kernel/debug/ncsi/eth0/pkt
   # cat /sys/kernel/debug/ncsi/eth0/pkt
   NCSI response [CIS] packet received

   00 01 dd 80 00 0004  
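
For readers who want to experiment with the "CIS,0,0" style of input,
the following userspace sketch parses the comma-separated hex parameters
that follow the command name. The function name and return convention
are assumptions; it relies only on standard sscanf() with "%x%n" and
advances by the number of characters actually consumed.

```c
#include <assert.h>
#include <stdio.h>

/* Parse 'count' comma-separated hex values from buf into out[]. */
static int parse_hex_params(const char *buf, unsigned int *out, int count)
{
	int i, num;

	assert(buf && out);

	for (i = 0; i < count; i++) {
		num = 0;
		if (sscanf(buf, "%x%n", &out[i], &num) != 1)
			return -1;

		/* Skip the characters just consumed, plus one separator. */
		buf += num;
		if (*buf == ',')
			buf++;
	}
	return 0;
}
```

For example, parse_hex_params("0,1f", v, 2) stores 0 and 0x1f in v and
returns 0, while asking for two values from "5" returns -1.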

Signed-off-by: Gavin Shan 
---
 net/ncsi/internal.h|  43 +++
 net/ncsi/ncsi-cmd.c|   1 +
 net/ncsi/ncsi-debug.c  | 697 ++---
 net/ncsi/ncsi-manage.c |   2 +
 net/ncsi/ncsi-rsp.c|   9 +
 5 files changed, 722 insertions(+), 30 deletions(-)

diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index c0b50a9..67d987b 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -221,6 +221,9 @@ struct ncsi_request {
bool used;/* Request that has been assigned  */
unsigned int flags;   /* NCSI request property   */
 #define NCSI_REQ_FLAG_EVENT_DRIVEN 1
+#ifdef CONFIG_NET_NCSI_DEBUG
+#define NCSI_REQ_FLAG_DEBUG2
+#endif
struct ncsi_dev_priv *ndp;/* Associated NCSI device  */
struct sk_buff   *cmd;/* Associated NCSI command packet  */
struct sk_buff   *rsp;/* Associated NCSI response packet */
@@ -293,6 +296,14 @@ struct ncsi_dev_priv {
unsigned long  rsp[128][NCSI_PKT_STAT_MAX];
unsigned long  aen[256][NCSI_PKT_STAT_MAX];
} stats;
+   struct {
+   struct dentry  *dentry;
+   unsigned int   req;
+#define NCSI_PKT_REQ_FREE  0
+#define NCSI_PKT_REQ_BUSY  0x
+   int            errno;
+   struct sk_buff *rsp;
+   } pkt;
struct dentry   *dentry; /* Procfs directory   */
 #endif
 };
@@ -361,6 +372,22 @@ int ncsi_aen_handler(struct ncsi_dev_priv *ndp, struct 
sk_buff *skb);
 int ncsi_dev_init_debug(struct ncsi_dev_priv *ndp);
 void ncsi_dev_update_stats(struct ncsi_dev_priv *ndp,
   int type, int subtype, int errno);
+
+static inline bool ncsi_dev_is_debug_pkt(struct ncsi_dev_priv *ndp,
+struct ncsi_request *nr)
+{
+   return ((nr->flags & NCSI_REQ_FLAG_DEBUG) && ndp->pkt.req == nr->id);
+}
+
+static inline void ncsi_dev_set_debug_pkt(struct ncsi_dev_priv *ndp,
+ struct ncsi_request *nr)
+{
+   if (nr->flags & NCSI_REQ_FLAG_DEBUG)
+   ndp->pkt.req = nr->id;
+}
+
+void ncsi_dev_reset_debug_pkt(struct ncsi_dev_priv *ndp,
+ struct sk_buff *skb, int errno);
 void ncsi_dev_release_debug(struct ncsi_dev_priv *ndp);
 int ncsi_package_init_debug(struct ncsi_package *np);
 void ncsi_package_release_debug(struct ncsi_package *np);
@@ -377,6 +404,22 @@ static inline void ncsi_dev_update_stats(struct 
ncsi_dev_priv *ndp,
 {
 }
 
+static inline bool ncsi_dev_is_debug_pkt(struct ncsi_dev_priv *ndp,
+struct ncsi_request *nr)
+{
+   return false;
+}
+
+static inline void ncsi_dev_set_debug_pkt(struct ncsi_dev_priv *ndp,
+ struct ncsi_request *nr)
+{
+}
+
+static inline void ncsi_dev_reset_debug_pkt(struct ncsi_dev_priv *ndp,
+   struct sk_buff *skb, int errno)
+{
+}
+
 static inline void ncsi_dev_release_debug(struct ncsi_dev_priv *ndp)
 {
 }
diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index 9a8dac2..baf82b3 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -361,6 +361,7 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
 */
nr->enabled = true;
mod_timer(&nr->timer, jiffies + 1 * HZ);
+   ncsi_dev_set_debug_pkt(nca->ndp, nr);
 
/* Send NCSI packet */
skb_get(nr->cmd);
diff --git a/net/ncsi/ncsi-debug.c b/net/ncsi/ncsi-debug.c
index b6df895..4f2b72c 100644
--- a/net/ncsi/ncsi-debug.c
+++ b/net/ncsi/ncsi-debug.c
@@ -23,40 +23,651 @@
 #include "ncsi-pkt.h"
 
 static struct dentry *ncsi_dentry;
+
+static const char *ncsi_pkt_type_name(unsigned int type);
+
+static int ncsi_pkt_input_default(struct ncsi_dev_priv *ndp,
+ struct ncsi_cmd_arg *nca, char *buf)
+{
+   return 0;
+}
+
+static int ncsi_pkt_input_params(char *buf, int *outval, int count)
+{
+   int num, i;
+
+   for (i = 0; i < count; i++, outval++) {
+   if (sscanf(buf, "%x%n", outval, &num) != 1)
+   return -EINVAL;
+
+   if (buf[num] == ',')
+   buf += (count + 1);
+   else
+   buf += count;
+   }
+
+   return 0;
+}
+
+static int ncsi_pkt_input_sp(struct ncsi_dev_priv *ndp,
+struct ncsi_cmd_arg *nca, char *buf)
+{
+   int param, ret;
+
+   /* The hardware ar

[PATCH v3 net-next 4/8] net/ncsi: Add debugging infrastructure

2017-04-17 Thread Gavin Shan
This creates debugfs directories as the NCSI debugging infrastructure.
With the patch applied, we will see the debugfs directories below. Every
NCSI package and channel has one corresponding directory. Other than
presenting the NCSI topology, these debugfs directories provide no
real function so far.

 /sys/kernel/debug/ncsi/eth0
 /sys/kernel/debug/ncsi/eth0/p0
 /sys/kernel/debug/ncsi/eth0/p0/c0
 /sys/kernel/debug/ncsi/eth0/p0/c1
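
The naming scheme shown above (one "pN" directory per package, one "cN"
per channel) can be expressed as a small formatting helper. This is a
userspace sketch with a hypothetical function name; it does not call
any debugfs API.

```c
#include <assert.h>
#include <stdio.h>

/* Build ".../ncsi/<netdev>/p<package>/c<channel>" into buf.
 * Returns the path length, or -1 if it did not fit. */
static int ncsi_debug_path(char *buf, size_t len, const char *netdev,
			   unsigned int package, unsigned int channel)
{
	int n;

	assert(buf && netdev);

	n = snprintf(buf, len, "/sys/kernel/debug/ncsi/%s/p%u/c%u",
		     netdev, package, channel);
	if (n < 0 || (size_t)n >= len)
		return -1;  /* output error or truncated path */
	return n;
}
```

With netdev "eth0", package 0 and channel 1 this produces the last path
in the listing above.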

Signed-off-by: Gavin Shan 
---
 net/ncsi/Kconfig   |   9 +
 net/ncsi/Makefile  |   1 +
 net/ncsi/internal.h|  45 +
 net/ncsi/ncsi-debug.c  | 103 +
 net/ncsi/ncsi-manage.c |  16 
 5 files changed, 174 insertions(+)
 create mode 100644 net/ncsi/ncsi-debug.c

diff --git a/net/ncsi/Kconfig b/net/ncsi/Kconfig
index 08a8a60..baa42501 100644
--- a/net/ncsi/Kconfig
+++ b/net/ncsi/Kconfig
@@ -10,3 +10,12 @@ config NET_NCSI
  support. Enable this only if your system connects to a network
  device via NCSI and the ethernet driver you're using supports
  the protocol explicitly.
+
+config NET_NCSI_DEBUG
+   bool "Enable NCSI debugging"
+   depends on NET_NCSI && DEBUG_FS
+   default n
+   ---help---
+ This enables the interfaces (e.g. debugfs) for NCSI debugging purpose.
+
+ If unsure, say Y.
diff --git a/net/ncsi/Makefile b/net/ncsi/Makefile
index dd12b56..2897fa0 100644
--- a/net/ncsi/Makefile
+++ b/net/ncsi/Makefile
@@ -2,3 +2,4 @@
 # Makefile for NCSI API
 #
 obj-$(CONFIG_NET_NCSI) += ncsi-cmd.o ncsi-rsp.o ncsi-aen.o ncsi-manage.o
+obj-$(CONFIG_NET_NCSI_DEBUG) += ncsi-debug.o
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 1308a56..e9ede4f 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -198,6 +198,9 @@ struct ncsi_channel {
} monitor;
struct list_head node;
struct list_head link;
+#ifdef CONFIG_NET_NCSI_DEBUG
+   struct dentry   *dentry;/* Debugfs directory*/
+#endif
 };
 
 struct ncsi_package {
@@ -208,6 +211,9 @@ struct ncsi_package {
unsigned int channel_num; /* Number of channels */
struct list_head channels;/* List of chanels*/
struct list_head node;/* Form list of packages  */
+#ifdef CONFIG_NET_NCSI_DEBUG
+   struct dentry*dentry; /* Debugfs directory   */
+#endif
 };
 
 struct ncsi_request {
@@ -276,6 +282,9 @@ struct ncsi_dev_priv {
struct work_struct  work;/* For channel management */
struct packet_type  ptype;   /* NCSI packet Rx handler */
struct list_head    node;    /* Form NCSI device list  */
+#ifdef CONFIG_NET_NCSI_DEBUG
+   struct dentry   *dentry; /* Procfs directory   */
+#endif
 };
 
 struct ncsi_cmd_arg {
@@ -337,4 +346,40 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device 
*dev,
 struct packet_type *pt, struct net_device *orig_dev);
 int ncsi_aen_handler(struct ncsi_dev_priv *ndp, struct sk_buff *skb);
 
+/* Debugging functionality */
+#ifdef CONFIG_NET_NCSI_DEBUG
+int ncsi_dev_init_debug(struct ncsi_dev_priv *ndp);
+void ncsi_dev_release_debug(struct ncsi_dev_priv *ndp);
+int ncsi_package_init_debug(struct ncsi_package *np);
+void ncsi_package_release_debug(struct ncsi_package *np);
+int ncsi_channel_init_debug(struct ncsi_channel *nc);
+void ncsi_channel_release_debug(struct ncsi_channel *nc);
+#else
+static inline int ncsi_dev_init_debug(struct ncsi_dev_priv *ndp)
+{
+   return -ENOTTY;
+}
+
+static inline void ncsi_dev_release_debug(struct ncsi_dev_priv *ndp)
+{
+}
+
+static inline int ncsi_package_init_debug(struct ncsi_package *np)
+{
+   return -ENOTTY;
+}
+
+static inline void ncsi_package_release_debug(struct ncsi_package *np)
+{
+}
+
+static inline int ncsi_channel_init_debug(struct ncsi_channel *nc)
+{
+   return -ENOTTY;
+}
+
+static inline void ncsi_channel_release_debug(struct ncsi_channel *nc)
+{
+}
+#endif /* CONFIG_NET_NCSI_DEBUG */
 #endif /* __NCSI_INTERNAL_H__ */
diff --git a/net/ncsi/ncsi-debug.c b/net/ncsi/ncsi-debug.c
new file mode 100644
index 000..f38483d
--- /dev/null
+++ b/net/ncsi/ncsi-debug.c
@@ -0,0 +1,103 @@
+/*
+ * Copyright Gavin Shan, IBM Corporation 2017.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "internal.h"
+#include "ncsi-pkt.h"
+
+static struct dentry *ncsi_dentry;
+
+int ncsi_dev_init_debug(struct ncsi_dev_priv *ndp)
+{
+   if (WARN_ON_ONCE(ndp->dentry))
+   return 0;
+
+   if (!ncsi_dentr

[PATCH v3 net-next 5/8] net/ncsi: Dump NCSI packet statistics

2017-04-17 Thread Gavin Shan
This creates /sys/kernel/debug/ncsi//stats to dump the NCSI
packets sent and received over all packages and channels. It's useful
for diagnosing NCSI problems, especially when NCSI packages and channels
aren't probed properly. The statistics can be read from the debugfs file
as shown below:

 # cat /sys/kernel/debug/ncsi/eth0/stats

 CMD  OK   TIMEOUT  ERROR
 ========================
 CIS  32   29       0
 SP   10   7        0
 DP   17   14       0
 EC   1    0        0
 ECNT 1    0        0
 AE   1    0        0
 GLS  11   0        0
 SMA  1    0        0
 EBF  1    0        0
 GVI  2    0        0
 GC   2    0        0

 RSP  OK   TIMEOUT  ERROR
 ========================
 CIS  3    0        0
 SP   3    0        0
 DP   2    0        1
 EC   1    0        0
 ECNT 1    0        0
 AE   1    0        0
 GLS  11   0        0
 SMA  1    0        0
 EBF  1    0        0
 GVI  0    0        2
 GC   2    0        0

 AEN  OK   TIMEOUT  ERROR
 ========================

Signed-off-by: Gavin Shan 
---
 net/ncsi/internal.h|  17 
 net/ncsi/ncsi-aen.c|  13 ++-
 net/ncsi/ncsi-cmd.c|  12 ++-
 net/ncsi/ncsi-debug.c  | 252 +
 net/ncsi/ncsi-manage.c |   4 +
 net/ncsi/ncsi-rsp.c|  11 ++-
 6 files changed, 306 insertions(+), 3 deletions(-)

diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index e9ede4f..c0b50a9 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -282,7 +282,17 @@ struct ncsi_dev_priv {
struct work_struct  work;/* For channel management */
struct packet_type  ptype;   /* NCSI packet Rx handler */
struct list_head    node;    /* Form NCSI device list  */
+#define NCSI_PKT_STAT_OK   0
+#define NCSI_PKT_STAT_TIMEOUT  1
+#define NCSI_PKT_STAT_ERROR2
+#define NCSI_PKT_STAT_MAX  3
 #ifdef CONFIG_NET_NCSI_DEBUG
+   struct {
+   struct dentry  *dentry;
+   unsigned long  cmd[128][NCSI_PKT_STAT_MAX];
+   unsigned long  rsp[128][NCSI_PKT_STAT_MAX];
+   unsigned long  aen[256][NCSI_PKT_STAT_MAX];
+   } stats;
struct dentry   *dentry; /* Procfs directory   */
 #endif
 };
@@ -349,6 +359,8 @@ int ncsi_aen_handler(struct ncsi_dev_priv *ndp, struct 
sk_buff *skb);
 /* Debugging functionality */
 #ifdef CONFIG_NET_NCSI_DEBUG
 int ncsi_dev_init_debug(struct ncsi_dev_priv *ndp);
+void ncsi_dev_update_stats(struct ncsi_dev_priv *ndp,
+  int type, int subtype, int errno);
 void ncsi_dev_release_debug(struct ncsi_dev_priv *ndp);
 int ncsi_package_init_debug(struct ncsi_package *np);
 void ncsi_package_release_debug(struct ncsi_package *np);
@@ -360,6 +372,11 @@ static inline int ncsi_dev_init_debug(struct ncsi_dev_priv 
*ndp)
return -ENOTTY;
 }
 
+static inline void ncsi_dev_update_stats(struct ncsi_dev_priv *ndp,
+int type, int subtype, int errno)
+{
+}
+
 static inline void ncsi_dev_release_debug(struct ncsi_dev_priv *ndp)
 {
 }
diff --git a/net/ncsi/ncsi-aen.c b/net/ncsi/ncsi-aen.c
index 6898e72..72bac7c 100644
--- a/net/ncsi/ncsi-aen.c
+++ b/net/ncsi/ncsi-aen.c
@@ -206,16 +206,27 @@ int ncsi_aen_handler(struct ncsi_dev_priv *ndp, struct 
sk_buff *skb)
}
 
if (!nah) {
+   ncsi_dev_update_stats(ndp, NCSI_PKT_AEN,
+ h->type, NCSI_PKT_STAT_ERROR);
netdev_warn(ndp->ndev.dev, "Invalid AEN (0x%x) received\n",
h->type);
return -ENOENT;
}
 
ret = ncsi_validate_aen_pkt(h, nah->payload);
-   if (ret)
+   if (ret) {
+   ncsi_dev_update_stats(ndp, NCSI_PKT_AEN,
+ h->type, NCSI_PKT_STAT_ERROR);
goto out;
+   }
 
ret = nah->handler(ndp, h);
+   if (ret)
+   ncsi_dev_update_stats(ndp, NCSI_PKT_AEN,
+ h->type, NCSI_PKT_STAT_ERROR);
+   else
+   ncsi_dev_update_stats(ndp, NCSI_PKT_AEN,
+ h->type, NCSI_PKT_STAT_OK);
 out:
consume_skb(skb);
return ret;
diff --git a/net/ncsi/ncsi-cmd.c b/net/ncsi/ncsi-cmd.c
index db7083b..9a8dac2 100644
--- a/net/ncsi/ncsi-cmd.c
+++ b/net/ncsi/ncsi-cmd.c
@@ -323,6 +323,8 @@ int ncsi_xmit_cmd(struct ncsi_cmd_arg *nca)
}
 
if (!nch) {
+   ncsi_dev_update_stats(nca->ndp, nca->type,
+ 0, NCSI_PKT_STAT_ERROR);
netdev_err(nca->ndp->ndev.dev,

Re: [PATCH 05/22] drm/i915: Make use of the new sg_map helper function

2017-04-17 Thread Daniel Vetter
On Thu, Apr 13, 2017 at 04:05:18PM -0600, Logan Gunthorpe wrote:
> This is a single straightforward conversion from kmap to sg_map.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Daniel Vetter 

Probably makes sense to merge through some other tree, but please be aware
of the considerable churn rate in i915 (i.e. make sure your tree is in
linux-next before you send a pull request for this). Plan B would be to
get the prep patch in first and then merge the i915 conversion one kernel
release later.
-Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem.c | 27 ---
>  1 file changed, 16 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 67b1fc5..1b1b91a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2188,6 +2188,15 @@ static void __i915_gem_object_reset_page_iter(struct 
> drm_i915_gem_object *obj)
>   radix_tree_delete(&obj->mm.get_page.radix, iter.index);
>  }
>  
> +static void i915_gem_object_unmap(const struct drm_i915_gem_object *obj,
> +   void *ptr)
> +{
> + if (is_vmalloc_addr(ptr))
> + vunmap(ptr);
> + else
> + sg_unmap(obj->mm.pages->sgl, ptr, SG_KMAP);
> +}
> +
>  void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
>enum i915_mm_subclass subclass)
>  {
> @@ -2215,10 +2224,7 @@ void __i915_gem_object_put_pages(struct 
> drm_i915_gem_object *obj,
>   void *ptr;
>  
>   ptr = ptr_mask_bits(obj->mm.mapping);
> - if (is_vmalloc_addr(ptr))
> - vunmap(ptr);
> - else
> - kunmap(kmap_to_page(ptr));
> + i915_gem_object_unmap(obj, ptr);
>  
>   obj->mm.mapping = NULL;
>   }
> @@ -2475,8 +2481,11 @@ static void *i915_gem_object_map(const struct 
> drm_i915_gem_object *obj,
>   void *addr;
>  
>   /* A single page can always be kmapped */
> - if (n_pages == 1 && type == I915_MAP_WB)
> - return kmap(sg_page(sgt->sgl));
> + if (n_pages == 1 && type == I915_MAP_WB) {
> + addr = sg_map(sgt->sgl, SG_KMAP);
> + if (IS_ERR(addr))
> + return NULL;
> + }
>  
>   if (n_pages > ARRAY_SIZE(stack_pages)) {
>   /* Too big for stack -- allocate temporary array instead */
> @@ -2543,11 +2552,7 @@ void *i915_gem_object_pin_map(struct 
> drm_i915_gem_object *obj,
>   goto err_unpin;
>   }
>  
> - if (is_vmalloc_addr(ptr))
> - vunmap(ptr);
> - else
> - kunmap(kmap_to_page(ptr));
> -
> + i915_gem_object_unmap(obj, ptr);
>   ptr = obj->mm.mapping = NULL;
>   }
>  
> -- 
> 2.1.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH] net/mlx4: suppress 'may be used uninitialized' warning

2017-04-17 Thread Greg Thelen
gcc 4.8.4 complains that mlx4_SW2HW_MPT_wrapper() uses an uninitialized
'mpt' variable:
  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_SW2HW_MPT_wrapper':
  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2802:12: warning: 'mpt' may be used uninitialized in this function [-Wmaybe-uninitialized]
     mpt->mtt = mtt;

I think this warning is a false positive.  mpt is only used when
mr_res_start_move_to() returns zero, and in all such cases it initializes
mpt.  But apparently gcc cannot see that.

Initialize mpt to avoid the warning.
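
The pattern behind the warning can be shown with a small standalone
sketch. The names lookup() and use() and the struct are hypothetical; the
shape is the same as described above: a helper that writes its output
pointer only on success, and a caller that dereferences it only when the
helper returned zero.

```c
#include <assert.h>
#include <stddef.h>

struct res { int id; };

/* Sets *out only on success, like the helper described above. */
static int lookup(int key, struct res *table, size_t n, struct res **out)
{
	size_t i;

	assert(table && out);

	for (i = 0; i < n; i++) {
		if (table[i].id == key) {
			*out = &table[i];
			return 0;
		}
	}
	return -1;
}

static int use(int key, struct res *table, size_t n)
{
	struct res *r = NULL;   /* explicit initializer avoids the warning */
	int err = lookup(key, table, n, &r);

	if (err)
		return err;
	return r->id;           /* only reached when lookup() set r */
}
```

The initializer does not change behaviour here; it only gives the
compiler a definite value on the path it cannot prove is unreachable.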

Signed-off-by: Greg Thelen 
---
 drivers/net/ethernet/mellanox/mlx4/resource_tracker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c 
b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index d8d5d161b8c7..4aa29ee93013 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -2749,7 +2749,7 @@ int mlx4_SW2HW_MPT_wrapper(struct mlx4_dev *dev, int 
slave,
int err;
int index = vhcr->in_modifier;
struct res_mtt *mtt;
-   struct res_mpt *mpt;
+   struct res_mpt *mpt = NULL;
int mtt_base = mr_get_mtt_addr(inbox->buf) / dev->caps.mtt_entry_sz;
int phys;
int id;
-- 
2.12.2.762.g0e3151a226-goog



Re: [PATCH RFC] sparc64: eBPF JIT

2017-04-17 Thread Alexei Starovoitov
On Mon, Apr 17, 2017 at 06:12:45PM -0700, David Miller wrote:
> > 
> >> +  if (insn->src_reg == BPF_REG_FP || insn->dst_reg == BPF_REG_FP) {
> >> +  ctx->saw_frame_pointer = true;
> >> +  if (BPF_CLASS(code) == BPF_ALU ||
> >> +  BPF_CLASS(code) == BPF_ALU64) {
> >> +  pr_err_once("ALU op on FP not supported by JIT\n");
> >> +  return -EINVAL;
> > 
> > That should be fine. The verifier checks for that:
> >   /* check whether register used as dest operand can be written to */
> >   if (regno == BPF_REG_FP) {
> >   verbose("frame pointer is read only\n");
> >   return -EACCES;
> >   }
> 
> I need to trap it as a source as well, because if that is allowed then
> I have to add special handling to every ALU operation we allow.
> 
> The reason is that the sparc64 stack is biased by 2047 bytes.  So
> I have to adjust every FP relative reference to include that 2047
> bias.
> 
> Can LLVM and CLANG emit arithmetic operations with FP as source?

The way llvm generates stack access is:
rX = r10
rX += imm
and that's the only thing verifier recognizes as valid ptr_to_stack.
Like rX -= imm will not be recognized as proper stack offset,
since llvm never does it.
I guess that counts as ALU on FP ?
Looks like we don't have such tests in test_bpf.ko. Hmm.
Pretty much all compiled C code should have such stack access.
The easiest test is tools/testing/selftests/bpf/test_progs
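
To make the bias arithmetic from the quoted text concrete, a minimal
sketch follows. The helper names are hypothetical and this is not the
JIT's code; it only adds the 2047-byte bias mentioned above to an
FP-relative offset and checks whether the result still fits the 13-bit
signed immediate that SPARC load/store instructions encode.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_BIAS 2047  /* sparc64 stack bias, as stated in the thread */

/* Displacement to use for a BPF access at FP + off once the bias is applied. */
static int biased_fp_offset(int off)
{
	assert(off <= 0);   /* BPF stack slots sit below the frame pointer */
	return off + STACK_BIAS;
}

/* True if v fits a 13-bit signed immediate field. */
static bool fits_simm13(int v)
{
	return v >= -4096 && v <= 4095;
}
```

For an access at FP - 8, biased_fp_offset() yields 2039, which fits the
immediate field.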

> >> +  /* dst = imm64 */
> >> +  case BPF_LD | BPF_IMM | BPF_DW:
> >> +  {
> >> +  const struct bpf_insn insn1 = insn[1];
> >> +  u64 imm64;
> >> +
> >> +  if (insn1.code != 0 || insn1.src_reg != 0 ||
> >> +  insn1.dst_reg != 0 || insn1.off != 0) {
> >> +  /* Note: verifier in BPF core must catch invalid
> >> +   * instructions.
> >> +   */
> >> +  pr_err_once("Invalid BPF_LD_IMM64 instruction\n");
> >> +  return -EINVAL;
> > 
> > verifier should catch that too, but extra check doesn't hurt.
> 
> I just copied from another JIT, I can remove it.

That check was added to x64 JIT, because JIT patches were submitted
before eBPF was known to user space. There was no verifier at that time.
So JIT had to do some checking, just for sanity.
It's probably ok to remove it now... or leave it as-is as a historical artifact.



Re: [PATCH RFC] ptr_ring: add ptr_ring_unconsume

2017-04-17 Thread Jason Wang



On 2017-04-17 07:19, Michael S. Tsirkin wrote:

Applications that consume a batch of entries in one go
can benefit from ability to return some of them back
into the ring.

Add an API for that - assuming there's space. If there's no space
naturally we can't do this and have to drop entries, but this implies
ring is full so we'd likely drop some anyway.

Signed-off-by: Michael S. Tsirkin 
---

Jason, in my mind the biggest issue with your batching patchset is the
packet drops on disconnect.  This API will help avoid that in the common
case.


Ok, I will rebase the series on top of this. (Though I don't think we
care about the packet loss).




I would still prefer that we understand what's going on,


I tried to reply in another thread; does it make sense?


  and I would
like to know what's the smallest batch size that's still helpful,


Yes, I've replied in another thread, the result is:


no batching   1.88Mpps
RX_BATCH=1    1.93Mpps
RX_BATCH=4    2.11Mpps
RX_BATCH=16   2.14Mpps
RX_BATCH=64   2.25Mpps
RX_BATCH=256  2.18Mpps


  but
I'm not going to block the patch on these grounds assuming packet drops
are fixed.


Thanks a lot.



Lightly tested - this is on top of consumer batching patches.

Thanks!

  include/linux/ptr_ring.h | 57 
  1 file changed, 57 insertions(+)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 783e7f5..5fbeab4 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -457,6 +457,63 @@ static inline int ptr_ring_init(struct ptr_ring *r, int 
size, gfp_t gfp)
return 0;
  }
  
+/*

+ * Return entries into ring. Destroy entries that don't fit.
+ *
+ * Note: this is expected to be a rare slow path operation.
+ *
+ * Note: producer lock is nested within consumer lock, so if you
+ * resize you must make sure all uses nest correctly.
+ * In particular if you consume ring in interrupt or BH context, you must
+ * disable interrupts/BH when doing so.
+ */
+static inline void ptr_ring_unconsume(struct ptr_ring *r, void **batch, int n,
+ void (*destroy)(void *))
+{
+   unsigned long flags;
+   int head;
+
+   spin_lock_irqsave(&(r)->consumer_lock, flags);
+   spin_lock(&(r)->producer_lock);
+
+   if (!r->size)
+   goto done;
+
+   /*
+* Clean out buffered entries (for simplicity). This way following code
+* can test entries for NULL and if not assume they are valid.
+*/
+   head = r->consumer_head - 1;
+   while (likely(head >= r->consumer_tail))
+   r->queue[head--] = NULL;
+   r->consumer_tail = r->consumer_head;
+
+   /*
+* Go over entries in batch, start moving head back and copy entries.
+* Stop when we run into previously unconsumed entries.
+*/
+   while (n--) {
+   head = r->consumer_head - 1;
+   if (head < 0)
+   head = r->size - 1;
+   if (r->queue[head]) {
+   /* This batch entry will have to be destroyed. */
+   ++n;
+   goto done;
+   }
+   r->queue[head] = batch[n];
+   r->consumer_tail = r->consumer_head = head;
+   }
+
+done:
+   /* Destroy all entries left in the batch. */
+   while (n--) {
+   destroy(batch[n]);
+   }
+   spin_unlock(&(r)->producer_lock);
+   spin_unlock_irqrestore(&(r)->consumer_lock, flags);
+}
+
  static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
   int size, gfp_t gfp,
   void (*destroy)(void *))
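
For readers who want to see the head-rewind logic in isolation, here is a
minimal user-space sketch. It is not the kernel code: the toy_* names are
hypothetical, there are no locks, and the consumer_tail bookkeeping from the
batching patches is left out. It counts down with a pre-decrement
(batch[--n]) so that n never drops below zero before the destroy loop runs.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 4

/* Hypothetical stand-in for struct ptr_ring: no locks, no batching state. */
struct toy_ring {
	void *queue[TOY_RING_SIZE];
	int consumer_head;	/* next slot the consumer would read */
};

static int toy_destroyed;	/* counts entries that did not fit */

static void toy_destroy(void *entry)
{
	(void)entry;
	toy_destroyed++;
}

/*
 * Walk the batch from its last entry to its first, stepping consumer_head
 * backwards one slot at a time. Stop at the first occupied slot and hand
 * whatever is left in the batch to destroy().
 */
static void toy_unconsume(struct toy_ring *r, void **batch, int n,
			  void (*destroy)(void *))
{
	assert(n >= 0);

	while (n) {
		int head = r->consumer_head - 1;

		if (head < 0)
			head = TOY_RING_SIZE - 1;
		if (r->queue[head])
			break;	/* previously unconsumed entry: no more room */
		r->queue[head] = batch[--n];
		r->consumer_head = head;
	}

	/* Destroy all entries left in the batch. */
	while (n)
		destroy(batch[--n]);
}

/* Two free slots, three entries to return: one entry has to be destroyed. */
static int toy_demo(struct toy_ring *r)
{
	static int a = 1, b = 2, c = 3, x = 10, y = 11;
	void *batch[] = { &a, &b, &c };

	r->queue[0] = NULL;
	r->queue[1] = NULL;
	r->queue[2] = &x;	/* still waiting to be consumed */
	r->queue[3] = &y;
	r->consumer_head = 2;
	toy_destroyed = 0;

	toy_unconsume(r, batch, 3, toy_destroy);
	return toy_destroyed;
}
```

In this sketch the entries at the end of the batch go back first, so the
oldest entry in the batch is the one that gets destroyed when space runs out.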




linux-next: build failure after merge of the block tree

2017-04-17 Thread Stephen Rothwell
Hi all,

After merging the block tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

drivers/block/nbd.c: In function 'nbd_genl_connect':
drivers/block/nbd.c:1662:10: error: too few arguments to function 
'nla_parse_nested'
ret = nla_parse_nested(socks, NBD_SOCK_MAX, attr,
  ^
In file included from include/net/rtnetlink.h:5:0,
 from include/net/sch_generic.h:12,
 from include/linux/filter.h:20,
 from include/net/sock.h:64,
 from drivers/block/nbd.c:32:
include/net/netlink.h:754:19: note: declared here
 static inline int nla_parse_nested(struct nlattr *tb[], int maxtype,
   ^
drivers/block/nbd.c: In function 'nbd_genl_reconfigure':
drivers/block/nbd.c:1818:10: error: too few arguments to function 
'nla_parse_nested'
ret = nla_parse_nested(socks, NBD_SOCK_MAX, attr,
  ^
In file included from include/net/rtnetlink.h:5:0,
 from include/net/sch_generic.h:12,
 from include/linux/filter.h:20,
 from include/net/sock.h:64,
 from drivers/block/nbd.c:32:
include/net/netlink.h:754:19: note: declared here
 static inline int nla_parse_nested(struct nlattr *tb[], int maxtype,
   ^

Caused by commits

  e46c7287b1c2 ("nbd: add a basic netlink interface")
  b7aa3d39385d ("nbd: add a reconfigure netlink command")

interacting with commit

  fceb6435e852 ("netlink: pass extended ACK struct to parsing functions")

from the net-next tree.

I have applied the following merge fix patch:

From: Stephen Rothwell 
Date: Tue, 18 Apr 2017 12:59:05 +1000
Subject: [PATCH] nbd: fix up for nla_parse_nested() API change

Signed-off-by: Stephen Rothwell 
---
 drivers/block/nbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b78f23ce2395..5049d19f3940 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1660,7 +1660,7 @@ static int nbd_genl_connect(struct sk_buff *skb, struct 
genl_info *info)
goto out;
}
ret = nla_parse_nested(socks, NBD_SOCK_MAX, attr,
-  nbd_sock_policy);
+  nbd_sock_policy, NULL);
if (ret != 0) {
printk(KERN_ERR "nbd: error processing sock 
list\n");
ret = -EINVAL;
@@ -1816,7 +1816,7 @@ static int nbd_genl_reconfigure(struct sk_buff *skb, 
struct genl_info *info)
goto out;
}
ret = nla_parse_nested(socks, NBD_SOCK_MAX, attr,
-  nbd_sock_policy);
+  nbd_sock_policy, NULL);
if (ret != 0) {
printk(KERN_ERR "nbd: error processing sock 
list\n");
ret = -EINVAL;
-- 
2.11.0

-- 
Cheers,
Stephen Rothwell


Re: [PATCH RFC (resend) net-next 0/6] virtio-net: Add support for virtio-net header extensions

2017-04-17 Thread Jason Wang



On 2017年04月16日 00:38, Vladislav Yasevich wrote:

Currently the virtio net header is fixed size and adding things to it is rather
difficult to do.  This series attempts to add the infrastructure as well as some
extensions that try to resolve some deficiencies we currently have.

First, vnet header only has space for 16 flags.  This may not be enough
in the future.  The extensions will provide space for 32 possible extension
flags and 32 possible extensions.   These flags will be carried in the
first pseudo extension header, the presence of which will be determined by
the flag in the virtio net header.

The extensions themselves will immediately follow the extension header itself.
They will be added to the packet in the same order as they appear in the
extension flags.  No padding is placed between the extensions and any
extensions negotiated, but not used by a given packet will convert to
trailing padding.


Do we need an explicit padding (e.g an extension) which could be
controlled by each side?




For example:
  | vnet mrg hdr | ext hdr | ext 1 | ext 2 | ext 5 | .. pad .. | packet data |
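
The layout described above can be sketched in user space as follows. This is
only an illustration under assumptions: the ext_* names are hypothetical, the
4-byte sizes mirror the IPv6 fragment id and VLAN structs proposed in this
series, and byte order is left as host order for brevity. The point it shows
is that both the writer and the reader step the cursor past each extension
that is present, in flag order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXT_F_IP6FRAG	(1u << 0)
#define EXT_F_VLAN	(1u << 1)

/* Hypothetical user-space mirrors of the proposed extension payloads. */
struct ext_ip6frag { uint32_t frag_id; };
struct ext_vlan { uint16_t vlan_tci; uint16_t vlan_proto; };

/* Write the flags word, then each selected extension in flag order.
 * Returns the number of bytes written. */
static size_t ext_pack(uint8_t *buf, uint32_t flags,
		       const struct ext_ip6frag *ip6f,
		       const struct ext_vlan *vlan)
{
	uint8_t *ptr = buf;

	memcpy(ptr, &flags, sizeof(flags));
	ptr += sizeof(flags);
	if (flags & EXT_F_IP6FRAG) {
		memcpy(ptr, ip6f, sizeof(*ip6f));
		ptr += sizeof(*ip6f);	/* advance past this extension */
	}
	if (flags & EXT_F_VLAN) {
		memcpy(ptr, vlan, sizeof(*vlan));
		ptr += sizeof(*vlan);
	}
	return (size_t)(ptr - buf);
}

/* Read the flags word, then each extension it announces, in the same order.
 * Returns the number of bytes consumed. */
static size_t ext_unpack(const uint8_t *buf, uint32_t *flags,
			 struct ext_ip6frag *ip6f, struct ext_vlan *vlan)
{
	const uint8_t *ptr = buf;

	memcpy(flags, ptr, sizeof(*flags));
	ptr += sizeof(*flags);
	if (*flags & EXT_F_IP6FRAG) {
		memcpy(ip6f, ptr, sizeof(*ip6f));
		ptr += sizeof(*ip6f);
	}
	if (*flags & EXT_F_VLAN) {
		memcpy(vlan, ptr, sizeof(*vlan));
		ptr += sizeof(*vlan);
	}
	return (size_t)(ptr - buf);
}
```

With only the VLAN flag set the VLAN data sits right after the flags word;
with both flags set it sits after the fragment id, which is why the cursor
has to move after every extension that is written.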


Just some rough thoughts:

- Is it better to use TLV instead of a bitmap here? One advantage of TLV
is that the length is not limited by the length of the bitmap.
- For 1.1, do we really want something like the vnet header? AFAIK, it is
not used by modern NICs; is it better to pack all meta-data into the
descriptor itself? This may need some changes in tun/macvtap, but
looks more PCIe friendly.


Thanks



Extensions proposed in this series are:
  - IPv6 fragment id extension
* Currently, the guest generated fragment id is discarded and the host
  generates an IPv6 fragment id if the packet has to be fragmented.  The
  code attempts to add time based perturbation to id generation to make
  it harder to guess the next fragment id to be used.  However, doing this
  on the host may result in less perturbation (due to different timing)
  and might make id guessing easier.  Ideally, the ids generated by the
  guest should be used.  One could also argue that we are "violating" the
  IPv6 protocol under the _strict_ interpretation of the spec.

  - VLAN header acceleration
* Currently virtio does not do vlan header acceleration and instead
  uses software tagging.  One of the first things that the host will do is
  strip the vlan header out.  When passing the packet to a guest the
  vlan header is re-inserted into the packet.  We can skip all that work
  if we can pass the vlan data in accelerated format.  Then the host will
  not do any extra work.  However, so far, this yielded a very small
  perf bump (only ~1%).  I am still looking into this.

  - UDP tunnel offload
* Similar to vlan acceleration, with this extension we can pass additional
  data to host to support GSO with udp tunnel and possibly other
  encapsulations.  This yields a significant performance improvement
 (still testing remote checksum code).

An additional extension that is unfinished (due to still testing for any
side-effects) is checksum passthrough to support drivers that set
CHECKSUM_COMPLETE.  This would eliminate the need for guests to compute
the software checksum.

This series only takes care of virtio net.  I have additional patches for the
host side (vhost and tap/macvtap as well as qemu), but wanted to get feedback
on the general approach first.

Vladislav Yasevich (6):
   virtio-net: Remove the use the padded vnet_header structure
   virtio-net: make header length handling uniform
   virtio_net: Add basic skeleton for handling vnet header extensions.
   virtio-net: Add support for IPv6 fragment id vnet header extension.
   virtio-net: Add support for vlan acceleration vnet header extension.
   virtio-net: Add support for UDP tunnel offload and extension.

  drivers/net/virtio_net.c| 132 +---
  include/linux/skbuff.h  |   5 ++
  include/linux/virtio_net.h  |  91 ++-
  include/uapi/linux/virtio_net.h |  38 
  4 files changed, 242 insertions(+), 24 deletions(-)





Re: [PATCH RFC (resend) net-next 5/6] virtio-net: Add support for vlan acceleration vnet header extension.

2017-04-17 Thread Jason Wang



On 2017年04月16日 00:38, Vladislav Yasevich wrote:

This extension allows us to pass vlan ID and vlan protocol data to the
host hypervisor as part of the vnet header and lets us take advantage
of HW accelerated vlan tagging in the host.  It requires support in the
host to negotiate the feature.  When the extension is enabled, the
virtio device will enable HW accelerated vlan features.

Signed-off-by: Vladislav Yasevich 
---
  drivers/net/virtio_net.c| 17 -
  include/linux/virtio_net.h  | 17 +
  include/uapi/linux/virtio_net.h |  7 +++
  3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 18eb0dd..696ef4a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -182,6 +182,7 @@ struct virtio_net_hdr_max {
struct virtio_net_hdr_mrg_rxbuf hdr;
struct virtio_net_ext_hdr ext_hdr;
struct virtio_net_ext_ip6frag ip6f_ext;
+   struct virtio_net_ext_vlan vlan_ext;
  };
  
  static inline u8 padded_vnet_hdr(struct virtnet_info *vi)

@@ -2276,6 +2277,11 @@ static void virtnet_init_extensions(struct virtio_device 
*vdev)
vi->hdr_len += sizeof(u32);
vi->ext_mask |= VIRTIO_NET_EXT_F_IP6FRAG;
}
+
+   if (virtio_has_feature(vdev, VIRTIO_NET_F_VLAN_OFFLOAD)) {
+   vi->hdr_len += sizeof(struct virtio_net_ext_vlan);
+   vi->ext_mask |= VIRTIO_NET_EXT_F_VLAN;
+   }
  }
  
  #define MIN_MTU ETH_MIN_MTU

@@ -2352,6 +2358,14 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
dev->features |= NETIF_F_RXCSUM;
  
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_VLAN_OFFLOAD)) {

+   dev->features |= NETIF_F_HW_VLAN_CTAG_TX |
+NETIF_F_HW_VLAN_CTAG_RX |
+NETIF_F_HW_VLAN_STAG_TX |
+NETIF_F_HW_VLAN_STAG_RX;
+   }
+
+
dev->vlan_features = dev->features;
  
  	/* MTU range: 68 - 65535 */

@@ -2395,7 +2409,8 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
  
-	if (virtio_has_feature(vdev, VIRTIO_NET_F_IP6_FRAGID))

+   if (virtio_has_feature(vdev, VIRTIO_NET_F_IP6_FRAGID) ||
+   virtio_has_feature(vdev, VIRTIO_NET_F_VLAN_OFFLOAD))
vi->hdr_ext = true;
  
  	if (vi->hdr_ext)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 3b259dc..e790191 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -113,6 +113,14 @@ static inline int virtio_net_ext_to_skb(struct sk_buff 
*skb,
ptr += sizeof(struct virtio_net_ext_ip6frag);
}
  
+	if (ext->flags & VIRTIO_NET_EXT_F_VLAN) {

+   struct virtio_net_ext_vlan *vhdr =
+   (struct virtio_net_ext_vlan *)ptr;
+
+   __vlan_hwaccel_put_tag(skb, vhdr->vlan_proto, vhdr->vlan_tci);
+   ptr += sizeof(struct virtio_net_ext_vlan);
+   }
+
return 0;
  }
  
@@ -130,6 +138,15 @@ static inline int virtio_net_ext_from_skb(const struct sk_buff *skb,

ext->flags |= VIRTIO_NET_EXT_F_IP6FRAG;


Looks like you need to advance ptr here?


}
  
+	if (ext_mask & VIRTIO_NET_EXT_F_VLAN && skb_vlan_tag_present(skb)) {

+   struct virtio_net_ext_vlan *vhdr =
+   (struct virtio_net_ext_vlan *)ptr;
+
+   vlan_get_tag(skb, &vhdr->vlan_tci);
+   vhdr->vlan_proto = skb->vlan_proto;
+   ext->flags |= VIRTIO_NET_EXT_F_VLAN;


And here?

Thanks

+   }
+
return 0;
  }
  #endif /* _LINUX_VIRTIO_NET_H */
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index eac8d94..6125de7 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -57,6 +57,7 @@
 * Steering */
  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */
  #define VIRTIO_NET_F_IP6_FRAGID 24 /* Host supports VLAN accleration */
+#define VIRTIO_NET_F_VLAN_OFFLOAD 25   /* Host supports VLAN accleration */
  
  #ifndef VIRTIO_NET_NO_LEGACY

  #define VIRTIO_NET_F_GSO  6   /* Host handles pkts w/ any GSO type */
@@ -111,6 +112,7 @@ struct virtio_net_hdr_v1 {
   */
  struct virtio_net_ext_hdr {
  #define VIRTIO_NET_EXT_F_IP6FRAG  (1<<0)
+#define VIRTIO_NET_EXT_F_VLAN  (1<<1)
__u32 flags;
__u8 extensions[];
  };
@@ -120,6 +122,11 @@ struct virtio_net_ext_ip6frag {
__be32 frag_id;
  };
  
+struct virtio_net_ext_vlan {

+   __be16 vlan_tci;
+   __be16 vlan_proto;
+};
+
  #ifndef VIRTIO_NET_NO_LEGACY
  /* This header comes first in the scatter-gather list.
   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT 

Re: [PATCH RFC (resend) net-next 3/6] virtio_net: Add basic skeleton for handling vnet header extensions.

2017-04-17 Thread Jason Wang



On 2017年04月16日 00:38, Vladislav Yasevich wrote:

This is the basic skeleton which will be fleshed out by individual
extensions.

Signed-off-by: Vladislav Yasevich 
---
  drivers/net/virtio_net.c| 21 +
  include/linux/virtio_net.h  | 12 
  include/uapi/linux/virtio_net.h | 11 +++
  3 files changed, 44 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5ad6ee6..08e2709 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -145,6 +145,10 @@ struct virtnet_info {
/* Packet virtio header size */
u8 hdr_len;
  
+	/* Header extensions were negotiated */

+   bool hdr_ext;
+   u32 ext_mask;
+
/* Active statistics */
struct virtnet_stats __percpu *stats;
  
@@ -174,6 +178,11 @@ struct virtnet_info {

u32 speed;
  };
  
+struct virtio_net_hdr_max {

+   struct virtio_net_hdr_mrg_rxbuf hdr;
+   struct virtio_net_ext_hdr ext_hdr;
+};
+
  static inline u8 padded_vnet_hdr(struct virtnet_info *vi)
  {
u8 hdr_len = vi->hdr_len;
@@ -214,6 +223,7 @@ static int rxq2vq(int rxq)
  
  static inline struct virtio_net_hdr_mrg_rxbuf *skb_vnet_hdr(struct sk_buff *skb)

  {
+   BUILD_BUG_ON(sizeof(struct virtio_net_hdr_max) > sizeof(skb->cb));
return (struct virtio_net_hdr_mrg_rxbuf *)skb->cb;
  }
  
@@ -767,6 +777,12 @@ static int receive_buf(struct virtnet_info *vi, struct receive_queue *rq,

goto frame_err;
}
  
+	if (vi->hdr_ext &&

+   virtio_net_ext_to_skb(skb,
+ (struct virtio_net_ext_hdr *)(hdr + 1))) {
+   goto frame_err;
+   }
+
skb->protocol = eth_type_trans(skb, dev);
pr_debug("Receiving skb proto 0x%04x len %i type %i\n",
 ntohs(skb->protocol), skb->len, skb->pkt_type);
@@ -1106,6 +1122,11 @@ static int xmit_skb(struct send_queue *sq, struct 
sk_buff *skb)
if (vi->mergeable_rx_bufs)
hdr->num_buffers = 0;
  
+	if (vi->hdr_ext &&

+   virtio_net_ext_from_skb(skb, (struct virtio_net_ext_hdr *)(hdr + 1),
+   vi->ext_mask))
+   BUG();
+
sg_init_table(sq->sg, skb_shinfo(skb)->nr_frags + (can_push ? 1 : 2));
if (can_push) {
__skb_push(skb, hdr_len);
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 5209b5e..eaa524f 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -100,4 +100,16 @@ static inline int virtio_net_hdr_from_skb(const struct 
sk_buff *skb,
return 0;
  }
  
+static inline int virtio_net_ext_to_skb(struct sk_buff *skb,

+   struct virtio_net_ext_hdr *ext)
+{
+   return 0;
+}
+
+static inline int virtio_net_ext_from_skb(const struct sk_buff *skb,
+ struct virtio_net_ext_hdr *ext,
+ __u32 ext_mask)
+{
+   return 0;
+}
  #endif /* _LINUX_VIRTIO_NET_H */
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index fc353b5..0039b72 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -88,6 +88,7 @@ struct virtio_net_config {
  struct virtio_net_hdr_v1 {
  #define VIRTIO_NET_HDR_F_NEEDS_CSUM   1   /* Use csum_start, csum_offset 
*/
  #define VIRTIO_NET_HDR_F_DATA_VALID   2   /* Csum is valid */
+#define VIRTIO_NET_HDR_F_VNET_EXT  4   /* Vnet extensions present */
__u8 flags;
  #define VIRTIO_NET_HDR_GSO_NONE   0   /* Not a GSO frame */
  #define VIRTIO_NET_HDR_GSO_TCPV4  1   /* GSO frame, IPv4 TCP (TSO) */
@@ -102,6 +103,16 @@ struct virtio_net_hdr_v1 {
__virtio16 num_buffers; /* Number of merged rx buffers */
  };
  
+/* If IRTIO_NET_HDR_F_VNET_EXT flags is set, this header immediately

+ * follows the virtio_net_hdr.  The flags in this header will indicate
+ * which extension will follow.  The extnsion data will immidiately follow


s/extnsion/extension/


+ * this header.
+ */
+struct virtio_net_ext_hdr {
+   __u32 flags;
+   __u8 extensions[];
+};
+
  #ifndef VIRTIO_NET_NO_LEGACY
  /* This header comes first in the scatter-gather list.
   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must




RE: [Intel-wired-lan] [PATCH] net: igbvf: Use net_device_stats from struct net_device

2017-04-17 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Tobias Klauser
> Sent: Wednesday, April 5, 2017 11:45 PM
> To: Kirsher, Jeffrey T 
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH] net: igbvf: Use net_device_stats from
> struct net_device
> 
> Instead of using a private copy of struct net_device_stats in
> struct igbvf_adapter, use stats from struct net_device. Also remove the
> now unnecessary .ndo_get_stats function.
> 
> Signed-off-by: Tobias Klauser 
> ---
>  drivers/net/ethernet/intel/igbvf/igbvf.h  |  1 -
>  drivers/net/ethernet/intel/igbvf/netdev.c | 26 +-
>  2 files changed, 5 insertions(+), 22 deletions(-)

Tested-by: Aaron Brown 


Re: [PATCH RFC] sparc64: eBPF JIT

2017-04-17 Thread David Miller
From: Alexei Starovoitov 
Date: Mon, 17 Apr 2017 16:27:42 -0700

> On Sun, Apr 16, 2017 at 11:38:25PM -0400, David Miller wrote:
>> +static void build_prologue(struct jit_ctx *ctx)
>> +{
>> +s32 stack_needed = 176;
>> +
>> +if (ctx->saw_frame_pointer)
>> +stack_needed += MAX_BPF_STACK;
>> +
>> +/* save %sp, -176, %sp */
>> +emit(SAVE | IMMED | RS1(SP) | S13(-stack_needed) | RD(SP), ctx);
>> +
>> +if (ctx->saw_ld_abs_ind) {
>> +load_skb_regs(ctx, bpf2sparc[BPF_REG_1]);
>> +} else {
>> +emit_nop(ctx);
>> +emit_nop(ctx);
>> +emit_nop(ctx);
>> +emit_nop(ctx);
> 
> why 4 nops? to keep prologue size constant w/ and w/o caching ?
> does it help somehow? I'm assuming that's prep for next step
> of tail_call.

I need to make some adjustments to how the branch offsets are done
if the prologue is variable size.  Simply an implementation issue,
which I intend to fix, nothing more.

> 
>> +if (insn->src_reg == BPF_REG_FP || insn->dst_reg == BPF_REG_FP) {
>> +ctx->saw_frame_pointer = true;
>> +if (BPF_CLASS(code) == BPF_ALU ||
>> +BPF_CLASS(code) == BPF_ALU64) {
>> +pr_err_once("ALU op on FP not supported by JIT\n");
>> +return -EINVAL;
> 
> That should be fine. The verifier checks for that:
>   /* check whether register used as dest operand can be written to */
>   if (regno == BPF_REG_FP) {
>   verbose("frame pointer is read only\n");
>   return -EACCES;
>   }

I need to trap it as a source as well, because if that is allowed then
I have to add special handling to every ALU operation we allow.

The reason is that the sparc64 stack is biased by 2047 bytes.  So
I have to adjust every FP relative reference to include that 2047
bias.
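
A minimal sketch of that adjustment, assuming hypothetical helper names (this
is not the JIT code from the patch): every FP-relative offset gets the 2047
bias added before it is encoded, and the result still has to fit the signed
13-bit immediate field (-4096..4095) of a SPARC load or store.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_BIAS 2047	/* sparc64 stack bias */

/* Offset actually encoded against %fp for a BPF FP-relative access. */
static int32_t fp_biased_off(int32_t bpf_off)
{
	return bpf_off + STACK_BIAS;
}

/* SPARC immediates in load/store instructions are signed 13-bit values. */
static bool fits_simm13(int32_t v)
{
	return v >= -4096 && v <= 4095;
}
```

For example, a BPF access at FP - 8 would be encoded as %fp + 2039, which
fits the immediate field directly.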

Can LLVM and CLANG emit arithmetic operations with FP as source?

>> +/* dst = imm64 */
>> +case BPF_LD | BPF_IMM | BPF_DW:
>> +{
>> +const struct bpf_insn insn1 = insn[1];
>> +u64 imm64;
>> +
>> +if (insn1.code != 0 || insn1.src_reg != 0 ||
>> +insn1.dst_reg != 0 || insn1.off != 0) {
>> +/* Note: verifier in BPF core must catch invalid
>> + * instructions.
>> + */
>> +pr_err_once("Invalid BPF_LD_IMM64 instruction\n");
>> +return -EINVAL;
> 
> verifier should catch that too, but extra check doesn't hurt.

I just copied from another JIT, I can remove it.

> all looks great to me.

Thanks for reviewing.

> 


Re: [PATCH net-next v2] net: ipv6: Fix UDP early demux lookup with udp_l3mdev_accept=0

2017-04-17 Thread Subash Abhinov Kasiviswanathan

+   break;

I think the break here should be removed?


Hi Yuan

This is similar to __udp4_lib_demux_lookup where we need to check if the
first socket is an exact match or break since chains may be long.

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


[PATCH net-next] ravb: Fix ravb_ptp_interrupt clear interrupt all status

2017-04-17 Thread Simon Horman
From: Tsutomu Izawa 

ravb_ptp_interrupt() currently clears every interrupt status bit in the GIS
register. Correct it to clear only the PTCF or PTMF bit.
Also fix the returned value to be IRQ_HANDLED or IRQ_NONE.

Signed-off-by: Tsutomu Izawa 
Signed-off-by: Kazuya Mizuguchi 
Signed-off-by: Simon Horman 
---
 drivers/net/ethernet/renesas/ravb.h  |  4 ++--
 drivers/net/ethernet/renesas/ravb_main.c | 12 
 drivers/net/ethernet/renesas/ravb_ptp.c  | 15 ---
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb.h 
b/drivers/net/ethernet/renesas/ravb.h
index 0525bd696d5d..fbfdefa659ce 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -1,6 +1,6 @@
 /* Renesas Ethernet AVB device driver
  *
- * Copyright (C) 2014-2015 Renesas Electronics Corporation
+ * Copyright (C) 2014-2017 Renesas Electronics Corporation
  * Copyright (C) 2015 Renesas Solutions Corp.
  * Copyright (C) 2015-2016 Cogent Embedded, Inc. 
  *
@@ -1054,7 +1054,7 @@ void ravb_modify(struct net_device *ndev, enum ravb_reg 
reg, u32 clear,
 u32 set);
 int ravb_wait(struct net_device *ndev, enum ravb_reg reg, u32 mask, u32 value);
 
-void ravb_ptp_interrupt(struct net_device *ndev);
+irqreturn_t ravb_ptp_interrupt(struct net_device *ndev);
 void ravb_ptp_init(struct net_device *ndev, struct platform_device *pdev);
 void ravb_ptp_stop(struct net_device *ndev);
 
diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 8cfc4a54f2dc..747686386513 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -820,10 +820,8 @@ static irqreturn_t ravb_interrupt(int irq, void *dev_id)
}
 
/* gPTP interrupt status summary */
-   if (iss & ISS_CGIS) {
-   ravb_ptp_interrupt(ndev);
-   result = IRQ_HANDLED;
-   }
+   if (iss & ISS_CGIS)
+   result = ravb_ptp_interrupt(ndev);
 
mmiowb();
spin_unlock(&priv->lock);
@@ -853,10 +851,8 @@ static irqreturn_t ravb_multi_interrupt(int irq, void 
*dev_id)
}
 
/* gPTP interrupt status summary */
-   if (iss & ISS_CGIS) {
-   ravb_ptp_interrupt(ndev);
-   result = IRQ_HANDLED;
-   }
+   if (iss & ISS_CGIS)
+   result = ravb_ptp_interrupt(ndev);
 
mmiowb();
spin_unlock(&priv->lock);
diff --git a/drivers/net/ethernet/renesas/ravb_ptp.c 
b/drivers/net/ethernet/renesas/ravb_ptp.c
index eede70ec37f8..403cf85631ba 100644
--- a/drivers/net/ethernet/renesas/ravb_ptp.c
+++ b/drivers/net/ethernet/renesas/ravb_ptp.c
@@ -1,6 +1,6 @@
 /* PTP 1588 clock using the Renesas Ethernet AVB
  *
- * Copyright (C) 2013-2015 Renesas Electronics Corporation
+ * Copyright (C) 2013-2017 Renesas Electronics Corporation
  * Copyright (C) 2015 Renesas Solutions Corp.
  * Copyright (C) 2015-2016 Cogent Embedded, Inc. 
  *
@@ -296,10 +296,11 @@ static const struct ptp_clock_info ravb_ptp_info = {
 };
 
 /* Caller must hold the lock */
-void ravb_ptp_interrupt(struct net_device *ndev)
+irqreturn_t ravb_ptp_interrupt(struct net_device *ndev)
 {
struct ravb_private *priv = netdev_priv(ndev);
u32 gis = ravb_read(ndev, GIS);
+   irqreturn_t result = IRQ_NONE;
 
gis &= ravb_read(ndev, GIC);
if (gis & GIS_PTCF) {
@@ -309,6 +310,9 @@ void ravb_ptp_interrupt(struct net_device *ndev)
event.index = 0;
event.timestamp = ravb_read(ndev, GCPT);
ptp_clock_event(priv->ptp.clock, &event);
+
+   result = IRQ_HANDLED;
+   gis &= ~GIS_PTCF;
}
if (gis & GIS_PTMF) {
struct ravb_ptp_perout *perout = priv->ptp.perout;
@@ -317,9 +321,14 @@ void ravb_ptp_interrupt(struct net_device *ndev)
perout->target += perout->period;
ravb_ptp_update_compare(priv, perout->target);
}
+
+   result = IRQ_HANDLED;
+   gis &= ~GIS_PTMF;
}
 
-   ravb_write(ndev, ~gis, GIS);
+   ravb_write(ndev, gis, GIS);
+
+   return result;
 }
 
 void ravb_ptp_init(struct net_device *ndev, struct platform_device *pdev)
-- 
2.7.0.rc3.207.g0ac5344



Corrupted SKB

2017-04-17 Thread Michael Ma
Hi -

We've implemented a "glue" qdisc similar to mqprio which can associate
one qdisc to multiple txqs as the root qdisc. The reference count of the
child qdiscs has been adjusted properly in this case so that it
represents the number of txqs it has been attached to. However when
sending packets we saw the skb from dequeue_skb() corrupted with the
following call stack:

[exception RIP: netif_skb_features+51]
RIP: 815292b3  RSP: 8817f6987940  RFLAGS: 00010246

 #9 [8817f6987968] validate_xmit_skb at 815294aa
#10 [8817f69879a0] validate_xmit_skb at 8152a0d9
#11 [8817f69879b0] __qdisc_run at 8154a193
#12 [8817f6987a00] dev_queue_xmit at 81529e03

It looks like the skb has already been released since its dev pointer
field is invalid.

Any clue on how this can be investigated further? My current thought
is to add some instrumentation to the place where skb is released and
analyze whether there is any race condition happening there. However
by looking through the existing code I think the case where one root
qdisc is associated with multiple txqs already exists (when mqprio is
not used) so not sure why it won't work when we group txqs and assign
each group a root qdisc. Any insight on this issue would be much
appreciated!

Thanks,
Michael


Re: linux-next: manual merge of the net-next tree with the net tree

2017-04-17 Thread Daniel Borkmann

On 04/18/2017 02:18 AM, Stephen Rothwell wrote:

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

   kernel/bpf/syscall.c

between commits:

   6b1bb01bcc5b ("bpf: fix cb access in socket filter programs on tail calls")
   c2002f983767 ("bpf: fix checking xdp_adjust_head on tail calls")

from the net tree and commit:

   e245c5c6a565 ("bpf: move fixup_bpf_calls() function")
   79741b3bdec0 ("bpf: refactor fixup_bpf_calls()")

from the net-next tree.

I fixed it up (the latter moved and changed the code modified by the
former  - I added the following fix up patch) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

From: Stephen Rothwell 
Date: Tue, 18 Apr 2017 10:16:03 +1000
Subject: [PATCH] bpf: merge fix for move of fixup_bpf_calls()

Signed-off-by: Stephen Rothwell 
---
  kernel/bpf/verifier.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 62e1e447ded9..5939b4c81fe1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3349,6 +3349,14 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
if (insn->imm == BPF_FUNC_xdp_adjust_head)
prog->xdp_adjust_head = 1;
if (insn->imm == BPF_FUNC_tail_call) {
+   /* If we tail call into other programs, we
+* cannot make any assumptions since they
+* can be replaced dynamically during runtime
+* in the program array.
+*/
+   prog->cb_access = 1;
+   prog->xdp_adjust_head = 1;
+
/* mark bpf_tail_call as different opcode to avoid
 * conditional branch in the interpeter for every normal
 * call and to prevent accidental JITing by JIT compiler



Looks good, thanks.


Re: [PATCH v2 net-next 5/8] net/ncsi: Dump NCSI packet statistics

2017-04-17 Thread Gavin Shan
On Thu, Apr 13, 2017 at 07:30:45PM -0700, Joe Perches wrote:
>On Thu, 2017-04-13 at 17:48 +1000, Gavin Shan wrote:
>> This creates /sys/kernel/debug/ncsi//stats to dump the NCSI
>> packets sent and received over all packages and channels. It's useful
>> to diagnose NCSI problems, especially when NCSI packages and channels
>> aren't probed properly. The statistics can be gained from debugfs file
>> as below:
>> 
>>  # cat /sys/kernel/debug/ncsi/eth0/stats
>> 
>>  CMD    OK   TIMEOUT  ERROR
>>  ==========================
>>  CIS    32   29       0
>>  SP     10   7        0
>>  DP     17   14       0
>>  EC     1    0        0
>>  ECNT   1    0        0
>>  AE     1    0        0
>>  GLS    11   0        0
>>  SMA    1    0        0
>>  EBF    1    0        0
>>  GVI    2    0        0
>>  GC     2    0        0
>
>more trivia:
>
>> diff --git a/net/ncsi/ncsi-debug.c b/net/ncsi/ncsi-debug.c
>[]
>> @@ -23,6 +23,235 @@
>>  #include "ncsi-pkt.h"
>>  
>>  static struct dentry *ncsi_dentry;
>> +static struct ncsi_pkt_handler {
>> +unsigned char   type;
>> +const char  *name;
>> +} ncsi_pkt_handlers[] = {
>> +{ NCSI_PKT_CMD_CIS,"CIS"},
>> +{ NCSI_PKT_CMD_SP, "SP" },
>> +{ NCSI_PKT_CMD_DP, "DP" },
>> +{ NCSI_PKT_CMD_EC, "EC" },
>> +{ NCSI_PKT_CMD_DC, "DC" },
>> +{ NCSI_PKT_CMD_RC, "RC" },
>> +{ NCSI_PKT_CMD_ECNT,   "ECNT"   },
>> +{ NCSI_PKT_CMD_DCNT,   "DCNT"   },
>> +{ NCSI_PKT_CMD_AE, "AE" },
>> +{ NCSI_PKT_CMD_SL, "SL" },
>> +{ NCSI_PKT_CMD_GLS,"GLS"},
>> +{ NCSI_PKT_CMD_SVF,"SVF"},
>> +{ NCSI_PKT_CMD_EV, "EV" },
>> +{ NCSI_PKT_CMD_DV, "DV" },
>> +{ NCSI_PKT_CMD_SMA,"SMA"},
>> +{ NCSI_PKT_CMD_EBF,"EBF"},
>> +{ NCSI_PKT_CMD_DBF,"DBF"},
>> +{ NCSI_PKT_CMD_EGMF,   "EGMF"   },
>> +{ NCSI_PKT_CMD_DGMF,   "DGMF"   },
>> +{ NCSI_PKT_CMD_SNFC,   "SNFC"   },
>> +{ NCSI_PKT_CMD_GVI,"GVI"},
>> +{ NCSI_PKT_CMD_GC, "GC" },
>> +{ NCSI_PKT_CMD_GP, "GP" },
>> +{ NCSI_PKT_CMD_GCPS,   "GCPS"   },
>> +{ NCSI_PKT_CMD_GNS,"GNS"},
>> +{ NCSI_PKT_CMD_GNPTS,  "GNPTS"  },
>> +{ NCSI_PKT_CMD_GPS,"GPS"},
>> +{ NCSI_PKT_CMD_OEM,"OEM"},
>> +{ NCSI_PKT_CMD_PLDM,   "PLDM"   },
>> +{ NCSI_PKT_CMD_GPUUID, "GPUUID" },
>
>I don't know how common these are and how
>intelligible these acronyms are to knowledgeable
>developer/users, but maybe it'd be better to
>spell out what these are instead of having to
>look up what the acronyms stand for
>
>   CIS - Clear Initial State
>   SP - Select Package
>   etc...
>
>Maybe copy the descriptions from the ncsi-pkt.h file
>

Joe, good question. As these descriptive strings are part of
the output from ncsi/eth0/stats and input to ncsi/eth0/pkt,
I intended to keep them short enough. Also, this debugging
interface would serve developers who know the NCSI protocol
and perhaps know the meanings of these acronyms.

Thanks,
Gavin
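
For comparison, a rough user-space sketch of what a table carrying both the
short name and a description could look like. The struct and function names
are hypothetical; the command values and descriptions are the ones from the
ncsi-pkt.h excerpt quoted in this mail. The short names would stay as the
interface strings, with the description available for display only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: short name for I/O, description for humans. */
struct ncsi_cmd_desc {
	unsigned char type;
	const char *name;
	const char *desc;
};

static const struct ncsi_cmd_desc ncsi_cmd_descs[] = {
	{ 0x00, "CIS", "Clear Initial State" },
	{ 0x01, "SP",  "Select Package" },
	{ 0x02, "DP",  "Deselect Package" },
	{ 0x03, "EC",  "Enable Channel" },
	{ 0x0e, "SMA", "Set MAC address" },
};

/* Return the description for a short command name, or NULL if unknown. */
static const char *ncsi_cmd_describe(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(ncsi_cmd_descs) / sizeof(ncsi_cmd_descs[0]); i++) {
		if (strcmp(ncsi_cmd_descs[i].name, name) == 0)
			return ncsi_cmd_descs[i].desc;
	}
	return NULL;
}
```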

>#define NCSI_PKT_CMD_CIS   0x00 /* Clear Initial State              */
>#define NCSI_PKT_CMD_SP    0x01 /* Select Package                   */
>#define NCSI_PKT_CMD_DP    0x02 /* Deselect Package                 */
>#define NCSI_PKT_CMD_EC    0x03 /* Enable Channel                   */
>#define NCSI_PKT_CMD_DC    0x04 /* Disable Channel                  */
>#define NCSI_PKT_CMD_RC    0x05 /* Reset Channel                    */
>#define NCSI_PKT_CMD_ECNT  0x06 /* Enable Channel Network Tx        */
>#define NCSI_PKT_CMD_DCNT  0x07 /* Disable Channel Network Tx       */
>#define NCSI_PKT_CMD_AE    0x08 /* AEN Enable                       */
>#define NCSI_PKT_CMD_SL    0x09 /* Set Link                         */
>#define NCSI_PKT_CMD_GLS   0x0a /* Get Link                         */
>#define NCSI_PKT_CMD_SVF   0x0b /* Set VLAN Filter                  */
>#define NCSI_PKT_CMD_EV    0x0c /* Enable VLAN                      */
>#define NCSI_PKT_CMD_DV    0x0d /* Disable VLAN                     */
>#define NCSI_PKT_CMD_SMA   0x0e /* Set MAC address                  */
>#define NCSI_PKT_CMD_EBF   0x10 /* Enable Broadcast Filter          */
>#define NCSI_PKT_CMD_DBF   0x11 /* Disable Broadcast Filter         */
>#define NCSI_PKT_CMD_EGMF  0x12 /* Enable Global Multicast Filter   */
>#define NCSI_PKT_CMD_DGMF  0x13 /* Disable Global Multicast Filter  */
>#define NCSI_PKT_CMD_SNFC  0x14 /* Set NCSI Flow Control            */
>#define NCSI_PKT_CMD_GVI   0x15 /* Get Version ID                   */
>#define NCSI_PKT_CMD_GC    0x16 /* Get Capabilities                 */
>#define NC

linux-next: manual merge of the net-next tree with the net tree

2017-04-17 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  kernel/bpf/syscall.c

between commits:

  6b1bb01bcc5b ("bpf: fix cb access in socket filter programs on tail calls")
  c2002f983767 ("bpf: fix checking xdp_adjust_head on tail calls")

from the net tree and commit:

  e245c5c6a565 ("bpf: move fixup_bpf_calls() function")
  79741b3bdec0 ("bpf: refactor fixup_bpf_calls()")

from the net-next tree.

I fixed it up (the latter moved and changed the code modified by the
former - I added the following fix up patch) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

From: Stephen Rothwell 
Date: Tue, 18 Apr 2017 10:16:03 +1000
Subject: [PATCH] bpf: merge fix for move of fixup_bpf_calls()

Signed-off-by: Stephen Rothwell 
---
 kernel/bpf/verifier.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 62e1e447ded9..5939b4c81fe1 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3349,6 +3349,14 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
if (insn->imm == BPF_FUNC_xdp_adjust_head)
prog->xdp_adjust_head = 1;
if (insn->imm == BPF_FUNC_tail_call) {
+   /* If we tail call into other programs, we
+* cannot make any assumptions since they
+* can be replaced dynamically during runtime
+* in the program array.
+*/
+   prog->cb_access = 1;
+   prog->xdp_adjust_head = 1;
+
/* mark bpf_tail_call as different opcode to avoid
 * conditional branch in the interpeter for every normal
 * call and to prevent accidental JITing by JIT compiler
-- 
2.11.0

-- 
Cheers,
Stephen Rothwell


Re: [PATCH v2 net-next 5/8] net/ncsi: Dump NCSI packet statistics

2017-04-17 Thread Gavin Shan
On Thu, Apr 13, 2017 at 06:50:42PM -0700, Jakub Kicinski wrote:
>On Thu, 13 Apr 2017 17:48:18 +1000, Gavin Shan wrote:
>> This creates /sys/kernel/debug/ncsi//stats to dump the NCSI
>> packets sent and received over all packages and channels. It's useful
>> to diagnose NCSI problems, especially when NCSI packages and channels
>> aren't probed properly. The statistics can be gained from debugfs file
>> as below:
>> 
>>  # cat /sys/kernel/debug/ncsi/eth0/stats
>> 
>>  CMD  OK   TIMEOUT  ERROR
>>  =======================
>>  CIS  32   29       0
>>  SP   10   7        0
>>  DP   17   14       0
>>  EC   1    0        0
>>  ECNT 1    0        0
>>  AE   1    0        0
>>  GLS  11   0        0
>>  SMA  1    0        0
>>  EBF  1    0        0
>>  GVI  2    0        0
>>  GC   2    0        0
>> 
>>  RSP  OK   TIMEOUT  ERROR
>>  =======================
>>  CIS  3    0        0
>>  SP   3    0        0
>>  DP   2    0        1
>>  EC   1    0        0
>>  ECNT 1    0        0
>>  AE   1    0        0
>>  GLS  11   0        0
>>  SMA  1    0        0
>>  EBF  1    0        0
>>  GVI  0    0        2
>>  GC   2    0        0
>> 
>>  AEN  OK   TIMEOUT  ERROR
>>  =======================
>> 
>> Signed-off-by: Gavin Shan 
>
>I'm not familiar with NC-SI but these look like some standard stats.
>Would it make sense to provide a proper netlink API for them?
>
>[...]
>> +#ifdef CONFIG_NET_NCSI_DEBUG
>> +ndp->stats.aen[h->type][NCSI_PKT_STAT_ERROR]++;
>> +#endif
>
>In any case, did you consider creating a macro or inline helper to
>limit the number of #ifdefs?
>

Jakub, thanks for the comments. NCSI does have standard statistics
about the packets passed to peer (NIC) or NCSI packets handled by
hardware. I have some patches (not posted yet and won't post in
this merge window) to create debugfs file ncsi/eth0/p0/c0/stats
and dump them there.

This debugfs file tracks NCSI packets sent and received by software
in order to see if the software has obvious bugs.

Yeah, it's definitely worth eliminating the #ifdefs. I'll do that
in the next respin.
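
One way to confine the #ifdef to a single place, along the lines Jakub
suggests, is a helper macro that expands to a no-op when the debug option
is off. This is only a sketch: NCSI_STAT_INC, ncsi_count_aen_error() and the
cut-down struct ncsi_dev_stats are hypothetical names, not taken from the
posted patches, and CONFIG_NET_NCSI_DEBUG is defined by hand here so that
the counting variant is the one that gets built.

```c
/* Normally set by Kconfig; defined here so the sketch actually counts. */
#define CONFIG_NET_NCSI_DEBUG 1

/* Hypothetical, cut-down stand-ins for the driver's types. */
enum {
	NCSI_PKT_STAT_OK,
	NCSI_PKT_STAT_TIMEOUT,
	NCSI_PKT_STAT_ERROR,
	NCSI_PKT_STAT_MAX
};

struct ncsi_dev_stats {
	unsigned long aen[256][NCSI_PKT_STAT_MAX];
};

/* The only #ifdef lives here; call sites stay unconditional. */
#ifdef CONFIG_NET_NCSI_DEBUG
#define NCSI_STAT_INC(stats, field, type, idx) \
	do { (stats)->field[(type)][(idx)]++; } while (0)
#else
#define NCSI_STAT_INC(stats, field, type, idx) \
	do { (void)(stats); (void)(type); (void)(idx); } while (0)
#endif

/* Example call site: no #ifdef needed around the increment. */
static void ncsi_count_aen_error(struct ncsi_dev_stats *stats,
				 unsigned char type)
{
	NCSI_STAT_INC(stats, aen, type, NCSI_PKT_STAT_ERROR);
}
```

A static inline function would work equally well and gives type checking
in both configurations; the macro form is shown only because it lets the
field name be passed as a token.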

Thanks,
Gavin



Re: [PATCH] net/ncsi: fix checksum validation in response packet

2017-04-17 Thread Gavin Shan
On Mon, Apr 17, 2017 at 01:36:19PM -0400, David Miller wrote:
>From: Cédric Le Goater 
>Date: Fri, 14 Apr 2017 10:56:37 +0200
>
>> htonl was used instead of ntohl. Surely a typo.
>> 
>> Signed-off-by: Cédric Le Goater 
>
>I don't think so, "checksum" is of type "u32" thus is in host byte
>order.  Therefore "htonl()" is correct.
>

Yeah, "htonl()" is correct here. "*pchecksum" is in big-endian.
I'd like to know why Cédric thinks it's a problem. I guess he might
have encountered the issue on the NCSI channel emulated by QEMU. On BCM5718
or BCM5719, the checksums in AEN and response packets are zeroed, meaning
the software shouldn't validate them at all.
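
For readers following the byte-order argument, here is a userspace sketch
of the comparison being discussed. It assumes the checksum is the two's
complement of the 32-bit sum of the packet's big-endian 16-bit words, which
is my reading of the NC-SI specification; ncsi_checksum() and
ncsi_validate_checksum() are illustrative names, not the kernel functions.
The computed value is in host order, so it is converted with htonl() before
being compared against the big-endian trailer, and an all-zero trailer is
accepted without validation.

```c
#include <arpa/inet.h>	/* htonl() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum the buffer as big-endian 16-bit words and return the two's
 * complement of that sum, in host byte order. len is expected to be even.
 */
static uint32_t ncsi_checksum(const unsigned char *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	return ~sum + 1;
}

/* pkt holds len bytes; the last 4 bytes are the big-endian checksum
 * trailer. Returns 0 if the packet is acceptable, -1 otherwise.
 */
static int ncsi_validate_checksum(const unsigned char *pkt, size_t len)
{
	uint32_t wire;	/* trailer exactly as it sits in the packet */

	if (len < 4 || (len & 1))
		return -1;

	memcpy(&wire, pkt + len - 4, sizeof(wire));

	/* An all-zero trailer means the sender did not compute one. */
	if (wire == 0)
		return 0;

	/* Convert the host-order result to wire order before comparing. */
	return wire == htonl(ncsi_checksum(pkt, len - 4)) ? 0 : -1;
}
```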

Thanks,
Gavin



RE: [PATCH net-next v2] net: ipv6: Fix UDP early demux lookup with udp_l3mdev_accept=0

2017-04-17 Thread YUAN Linyu


> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Subash Abhinov Kasiviswanathan
> Sent: Tuesday, April 18, 2017 7:25 AM
> To: d...@cumulusnetworks.com; da...@davemloft.net;
> netdev@vger.kernel.org; rshea...@brocade.com; eric.duma...@gmail.com
> Cc: Subash Abhinov Kasiviswanathan; Eric Dumazet
> Subject: [PATCH net-next v2] net: ipv6: Fix UDP early demux lookup with
> udp_l3mdev_accept=0
> 
> - return sk;
> + udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
> + if (INET6_MATCH(sk, net, rmt_addr, loc_addr, ports, dif))
> + return sk;
> + break;
I think the break here should be removed?
> + }
> + return NULL;
>  }
> --
> 1.9.1
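
To make the effect of that break concrete, here is a standalone sketch
using a plain singly linked list in place of the UDP hash chain. struct
entry, lookup_first_only() and lookup_all() are hypothetical names used only
for illustration. With the unconditional break, the loop body runs at most
once, so only the first entry in the chain can ever match. Whether that is
the intended behaviour is for the patch author to say.

```c
#include <stddef.h>

struct entry {
	int key;
	struct entry *next;
};

/* Mirrors the loop shape in the patch: test the first entry, then stop. */
static struct entry *lookup_first_only(struct entry *head, int key)
{
	struct entry *e;

	for (e = head; e; e = e->next) {
		if (e->key == key)
			return e;
		break;	/* unconditional: later entries are never visited */
	}
	return NULL;
}

/* Same loop without the break: walks the whole chain. */
static struct entry *lookup_all(struct entry *head, int key)
{
	struct entry *e;

	for (e = head; e; e = e->next) {
		if (e->key == key)
			return e;
	}
	return NULL;
}
```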



[PATCH v2 net-next] drivers: net: xgene-v2: Extend ethtool statistics

2017-04-17 Thread Iyappan Subramanian
This patch adds extended statistics reporting to ethtool.

In summary, this patch,

   - adds ethtool.h with the statistics register definitions
   - adds 'struct xge_gstrings_extd_stats' to gather extended stats
   - modifies xge_get_strings(), get_sset_count() and
 get_ethtool_stats() accordingly
   - moves 'struct xge_gstrings_stats' to ethtool.h

Signed-off-by: Iyappan Subramanian 
---
v2: Address review comments from v1
- removed duplicate statistics counters that were reported by
  xge_get_stats64()
v1:
- Initial version
---

 drivers/net/ethernet/apm/xgene-v2/ethtool.c | 78 ++---
 drivers/net/ethernet/apm/xgene-v2/ethtool.h | 78 +
 drivers/net/ethernet/apm/xgene-v2/mac.h |  3 --
 drivers/net/ethernet/apm/xgene-v2/main.h|  2 +-
 4 files changed, 151 insertions(+), 10 deletions(-)
 create mode 100644 drivers/net/ethernet/apm/xgene-v2/ethtool.h

diff --git a/drivers/net/ethernet/apm/xgene-v2/ethtool.c b/drivers/net/ethernet/apm/xgene-v2/ethtool.c
index 0c426f5..be4 100644
--- a/drivers/net/ethernet/apm/xgene-v2/ethtool.c
+++ b/drivers/net/ethernet/apm/xgene-v2/ethtool.c
@@ -21,12 +21,13 @@
 
 #include "main.h"
 
-struct xge_gstrings_stats {
-   char name[ETH_GSTRING_LEN];
-   int offset;
-};
-
 #define XGE_STAT(m){ #m, offsetof(struct xge_pdata, stats.m) }
+#define XGE_EXTD_STAT(m, n)\
+   {   \
+   #m, \
+   n,  \
+   0   \
+   }
 
 static const struct xge_gstrings_stats gstrings_stats[] = {
XGE_STAT(rx_packets),
@@ -36,7 +37,62 @@ struct xge_gstrings_stats {
XGE_STAT(rx_errors)
 };
 
+static struct xge_gstrings_extd_stats gstrings_extd_stats[] = {
+   XGE_EXTD_STAT(tx_rx_64b_frame_cntr, TR64),
+   XGE_EXTD_STAT(tx_rx_127b_frame_cntr, TR127),
+   XGE_EXTD_STAT(tx_rx_255b_frame_cntr, TR255),
+   XGE_EXTD_STAT(tx_rx_511b_frame_cntr, TR511),
+   XGE_EXTD_STAT(tx_rx_1023b_frame_cntr, TR1K),
+   XGE_EXTD_STAT(tx_rx_1518b_frame_cntr, TRMAX),
+   XGE_EXTD_STAT(tx_rx_1522b_frame_cntr, TRMGV),
+   XGE_EXTD_STAT(rx_fcs_error_cntr, RFCS),
+   XGE_EXTD_STAT(rx_multicast_pkt_cntr, RMCA),
+   XGE_EXTD_STAT(rx_broadcast_pkt_cntr, RBCA),
+   XGE_EXTD_STAT(rx_ctrl_frame_pkt_cntr, RXCF),
+   XGE_EXTD_STAT(rx_pause_frame_pkt_cntr, RXPF),
+   XGE_EXTD_STAT(rx_unk_opcode_cntr, RXUO),
+   XGE_EXTD_STAT(rx_align_err_cntr, RALN),
+   XGE_EXTD_STAT(rx_frame_len_err_cntr, RFLR),
+   XGE_EXTD_STAT(rx_code_err_cntr, RCDE),
+   XGE_EXTD_STAT(rx_carrier_sense_err_cntr, RCSE),
+   XGE_EXTD_STAT(rx_undersize_pkt_cntr, RUND),
+   XGE_EXTD_STAT(rx_oversize_pkt_cntr, ROVR),
+   XGE_EXTD_STAT(rx_fragments_cntr, RFRG),
+   XGE_EXTD_STAT(rx_jabber_cntr, RJBR),
+   XGE_EXTD_STAT(rx_dropped_pkt_cntr, RDRP),
+   XGE_EXTD_STAT(tx_multicast_pkt_cntr, TMCA),
+   XGE_EXTD_STAT(tx_broadcast_pkt_cntr, TBCA),
+   XGE_EXTD_STAT(tx_pause_ctrl_frame_cntr, TXPF),
+   XGE_EXTD_STAT(tx_defer_pkt_cntr, TDFR),
+   XGE_EXTD_STAT(tx_excv_defer_pkt_cntr, TEDF),
+   XGE_EXTD_STAT(tx_single_col_pkt_cntr, TSCL),
+   XGE_EXTD_STAT(tx_multi_col_pkt_cntr, TMCL),
+   XGE_EXTD_STAT(tx_late_col_pkt_cntr, TLCL),
+   XGE_EXTD_STAT(tx_excv_col_pkt_cntr, TXCL),
+   XGE_EXTD_STAT(tx_total_col_cntr, TNCL),
+   XGE_EXTD_STAT(tx_pause_frames_hnrd_cntr, TPFH),
+   XGE_EXTD_STAT(tx_drop_frame_cntr, TDRP),
+   XGE_EXTD_STAT(tx_jabber_frame_cntr, TJBR),
+   XGE_EXTD_STAT(tx_fcs_error_cntr, TFCS),
+   XGE_EXTD_STAT(tx_ctrl_frame_cntr, TXCF),
+   XGE_EXTD_STAT(tx_oversize_frame_cntr, TOVR),
+   XGE_EXTD_STAT(tx_undersize_frame_cntr, TUND),
+   XGE_EXTD_STAT(tx_fragments_cntr, TFRG)
+};
+
 #define XGE_STATS_LEN  ARRAY_SIZE(gstrings_stats)
+#define XGE_EXTD_STATS_LEN ARRAY_SIZE(gstrings_extd_stats)
+
+static void xge_mac_get_extd_stats(struct xge_pdata *pdata)
+{
+   u32 data;
+   int i;
+
+   for (i = 0; i < XGE_EXTD_STATS_LEN; i++) {
+   data = xge_rd_csr(pdata, gstrings_extd_stats[i].addr);
+   gstrings_extd_stats[i].value += data;
+   }
+}
 
 static void xge_get_drvinfo(struct net_device *ndev,
struct ethtool_drvinfo *info)
@@ -62,6 +118,11 @@ static void xge_get_strings(struct net_device *ndev, u32 stringset, u8 *data)
memcpy(p, gstrings_stats[i].name, ETH_GSTRING_LEN);
p += ETH_GSTRING_LEN;
}
+
+   for (i = 0; i < XGE_EXTD_STATS_LEN; i++) {
+   memcpy(p, gstrings_extd_stats[i].name, ETH_GSTRING_LEN);
+   p += ETH_GSTRING_LEN;
+   }
 }
 
 static int xge_get_sset_count(struct net_device 

Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-17 Thread Daniel Borkmann

On 04/18/2017 01:04 AM, Alexei Starovoitov wrote:

On Mon, Apr 17, 2017 at 03:49:55PM -0400, David Miller wrote:

From: Jesper Dangaard Brouer 
Date: Sun, 16 Apr 2017 22:26:01 +0200


The bpf tail-call use-case is a very good example of why the
verifier cannot deduce the needed HEADROOM upfront.


This brings up a very interesting question for me.

I notice that tail calls are implemented by JITs largely by skipping
over the prologue of that destination program.

However, many JITs preload cached SKB values into fixed registers in
the prologue.  But they only do this if the program being JITed needs
those values.

So how can it work properly if a program that does not need the SKB
values tail calls into one that does?


For x86 JIT it's fine, since caching of skb values is not part of the prologue:
   emit_prologue(&prog);
   if (seen_ld_abs)
   emit_load_skb_data_hlen(&prog);
and tail_call jumps into the next program as:
   EMIT4(0x48, 0x83, 0xC0, PROLOGUE_SIZE);   /* add rax, prologue_size */
   EMIT2(0xFF, 0xE0);/* jmp rax */
whereas inside emit_prologue() we have:
   BUILD_BUG_ON(cnt != PROLOGUE_SIZE);

arm64 has similar prologue-skipping code and it's even
simpler than x86, since it doesn't try to optimize LD_ABS/IND in assembler
and instead calls into bpf_load_pointer() from generated code,
so no caching of skb values at all.

s390 jit has partial skipping of prologue, since a bunch
of registers are saved/restored during tail_call and it looks fine
to me as well.


And ppc64 unwinds/tears down the stack of the prog before
jumping into the other program. Thus, there is no skipping of the other's
prologue; looks fine, too.


It's very hard to extend test_bpf.ko with tail_calls, since maps need
to be allocated and populated with file descriptors, which is
not feasible to do from a .ko. Instead we need a user space based test for it.
We've started building one in tools/testing/selftests/bpf/test_progs.c;
many more tests need to be added. Thorough testing of tail_calls
is on the todo list.


Re: [PATCH RFC] sparc64: eBPF JIT

2017-04-17 Thread Alexei Starovoitov
On Sun, Apr 16, 2017 at 11:38:25PM -0400, David Miller wrote:
> 
> There are a bunch of things I want to do still, and I know that I have
> to attend to sparc32 more cleanly, but I wanted to post this now that
> I have it passing the BPF testsuite completely:
> 
> [24174.315421] test_bpf: Summary: 305 PASSED, 0 FAILED, [297/297 JIT'ed]

Nice!

> +static void build_prologue(struct jit_ctx *ctx)
> +{
> + s32 stack_needed = 176;
> +
> + if (ctx->saw_frame_pointer)
> + stack_needed += MAX_BPF_STACK;
> +
> + /* save %sp, -176, %sp */
> + emit(SAVE | IMMED | RS1(SP) | S13(-stack_needed) | RD(SP), ctx);
> +
> + if (ctx->saw_ld_abs_ind) {
> + load_skb_regs(ctx, bpf2sparc[BPF_REG_1]);
> + } else {
> + emit_nop(ctx);
> + emit_nop(ctx);
> + emit_nop(ctx);
> + emit_nop(ctx);

why 4 nops? to keep prologue size constant w/ and w/o caching ?
does it help somehow? I'm assuming that's prep for next step
of tail_call.
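
If that guess is right, the point of the padding is that every program's
prologue has the same length, so a tail call can jump past it by a fixed
offset. The following sketch shows only that size invariant. It is not the
sparc64 JIT: struct jit_buf, emit() and this build_prologue() are
hypothetical, FAKE_SAVE and FAKE_LOAD are placeholder words rather than real
encodings, and the count of four cached-load instructions is assumed from
the four nops in the patch. Only the nop encoding (sethi 0, %g0) is meant to
be the real one.

```c
#include <stddef.h>
#include <stdint.h>

#define SPARC_NOP	0x01000000u	/* sethi 0, %g0 */
#define FAKE_SAVE	0xAAAAAAAAu	/* placeholder, not a real encoding */
#define FAKE_LOAD	0xBBBBBBBBu	/* placeholder, not a real encoding */
#define SKB_CACHE_INSNS	4		/* assumed from the four nops */

struct jit_buf {
	uint32_t insn[16];
	size_t len;
};

/* Append one instruction word, silently dropping it if the buffer is full. */
static void emit(struct jit_buf *b, uint32_t insn)
{
	if (b->len < sizeof(b->insn) / sizeof(b->insn[0]))
		b->insn[b->len++] = insn;
}

/* Emit the prologue and return its length in instructions. The length is
 * the same whether or not the program needs the cached skb values.
 */
static size_t build_prologue(struct jit_buf *b, int saw_ld_abs_ind)
{
	int i;

	b->len = 0;
	emit(b, FAKE_SAVE);
	for (i = 0; i < SKB_CACHE_INSNS; i++)
		emit(b, saw_ld_abs_ind ? FAKE_LOAD : SPARC_NOP);

	return b->len;
}
```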

> + if (insn->src_reg == BPF_REG_FP || insn->dst_reg == BPF_REG_FP) {
> + ctx->saw_frame_pointer = true;
> + if (BPF_CLASS(code) == BPF_ALU ||
> + BPF_CLASS(code) == BPF_ALU64) {
> + pr_err_once("ALU op on FP not supported by JIT\n");
> + return -EINVAL;

That should be fine. The verifier checks for that:
  /* check whether register used as dest operand can be written to */
  if (regno == BPF_REG_FP) {
  verbose("frame pointer is read only\n");
  return -EACCES;
  }

> + /* dst = imm64 */
> + case BPF_LD | BPF_IMM | BPF_DW:
> + {
> + const struct bpf_insn insn1 = insn[1];
> + u64 imm64;
> +
> + if (insn1.code != 0 || insn1.src_reg != 0 ||
> + insn1.dst_reg != 0 || insn1.off != 0) {
> + /* Note: verifier in BPF core must catch invalid
> +  * instructions.
> +  */
> + pr_err_once("Invalid BPF_LD_IMM64 instruction\n");
> + return -EINVAL;

verifier should catch that too, but extra check doesn't hurt.

> + /* STX XADD: lock *(u32 *)(dst + off) += src */
> + case BPF_STX | BPF_XADD | BPF_W: {
> + const u8 tmp = bpf2sparc[TMP_REG_1];
> + const u8 tmp2 = bpf2sparc[TMP_REG_2];
> + const u8 tmp3 = bpf2sparc[TMP_REG_3];
> + s32 real_off = off;
> +
> + ctx->tmp_1_used = true;
> + ctx->tmp_2_used = true;
> + ctx->tmp_3_used = true;
> + if (dst == FP)
> + real_off += STACK_BIAS;
> + emit_loadimm(real_off, tmp, ctx);
> + emit_alu3(ADD, dst, tmp, tmp, ctx);
> +
> + emit(LD32 | RS1(tmp) | RS2(G0) | RD(tmp2), ctx);
> + emit_alu3(ADD, tmp2, src, tmp3, ctx);
> + emit(CAS | ASI(ASI_P) | RS1(tmp) | RS2(tmp2) | RD(tmp3), ctx);
> + emit_cmp(tmp2, tmp3, ctx);
> + emit_branch(BNE, 4, 0, ctx);
> + emit_nop(ctx);

loops in bpf code! run for your life ;)
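
The emitted sequence is a load, add, compare-and-swap, then branch back on
mismatch, which is the usual retry loop for building an atomic add out of
CAS. A portable sketch of the same idea using C11 atomics follows; xadd32()
is an illustrative name, and this is not a translation of the JIT output.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add src to *dst using a compare-and-swap retry loop. */
static void xadd32(_Atomic uint32_t *dst, uint32_t src)
{
	uint32_t old = atomic_load(dst);

	/* On failure, old is refreshed with the current value of *dst and
	 * the new sum is recomputed on the next iteration.
	 */
	while (!atomic_compare_exchange_weak(dst, &old, old + src))
		;
}
```

The loop is bounded in practice only by contention: it repeats as long as
another writer changes the word between the load and the swap.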

> + if (bpf_jit_enable > 1)
> + bpf_jit_dump(prog->len, image_size, 2, ctx.image);
> + bpf_flush_icache(ctx.image, ctx.image + image_size);
> +
> + bpf_jit_binary_lock_ro(header);

all looks great to me.
Thanks!



[PATCH net-next v2] net: ipv6: Fix UDP early demux lookup with udp_l3mdev_accept=0

2017-04-17 Thread Subash Abhinov Kasiviswanathan
David Ahern reported that 5425077d73e0c ("net: ipv6: Add early demux
handler for UDP unicast") breaks udp_l3mdev_accept=0 since early
demux for IPv6 UDP was doing a generic socket lookup which does not
require an exact match. Fix this by making UDPv6 early demux match
connected sockets only.

v1->v2: Take reference to socket after match as suggested by Eric

Fixes: 5425077d73e0c ("net: ipv6: Add early demux handler for UDP unicast")
Reported-by: David Ahern 
Signed-off-by: Subash Abhinov Kasiviswanathan 
Cc: Eric Dumazet 
---
 net/ipv6/udp.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index b793ed1..5a4504b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -864,21 +865,25 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
return 0;
 }
 
+
 static struct sock *__udp6_lib_demux_lookup(struct net *net,
__be16 loc_port, const struct in6_addr *loc_addr,
__be16 rmt_port, const struct in6_addr *rmt_addr,
int dif)
 {
+   unsigned short hnum = ntohs(loc_port);
+   unsigned int hash2 = udp6_portaddr_hash(net, loc_addr, hnum);
+   unsigned int slot2 = hash2 & udp_table.mask;
+   struct udp_hslot *hslot2 = &udp_table.hash2[slot2];
+   const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum);
struct sock *sk;
 
-   rcu_read_lock();
-   sk = __udp6_lib_lookup(net, rmt_addr, rmt_port, loc_addr, loc_port,
-  dif, &udp_table, NULL);
-   if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
-   sk = NULL;
-   rcu_read_unlock();
-
-   return sk;
+   udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
+   if (INET6_MATCH(sk, net, rmt_addr, loc_addr, ports, dif))
+   return sk;
+   break;
+   }
+   return NULL;
 }
 
 static void udp_v6_early_demux(struct sk_buff *skb)
@@ -903,7 +908,7 @@ static void udp_v6_early_demux(struct sk_buff *skb)
else
return;
 
-   if (!sk)
+   if (!sk || !atomic_inc_not_zero_hint(&sk->sk_refcnt, 2))
return;
 
skb->sk = sk;
-- 
1.9.1



Re: [PATCH v2 net-next 5/8] net/ncsi: Dump NCSI packet statistics

2017-04-17 Thread Gavin Shan
On Thu, Apr 13, 2017 at 03:50:40AM -0700, Joe Perches wrote:
>On Thu, 2017-04-13 at 17:48 +1000, Gavin Shan wrote:
>> This creates /sys/kernel/debug/ncsi//stats to dump the NCSI
>> packets sent and received over all packages and channels. It's useful
>> to diagnose NCSI problems, especially when NCSI packages and channels
>> aren't probed properly.
>
>trivia:
>
>> diff --git a/net/ncsi/ncsi-debug.c b/net/ncsi/ncsi-debug.c
>> index 6c00e9b..29c233c 100644
>> --- a/net/ncsi/ncsi-debug.c
>> +++ b/net/ncsi/ncsi-debug.c
>> @@ -23,6 +23,235 @@
>>  #include "ncsi-pkt.h"
>>  
>>  static struct dentry *ncsi_dentry;
>> +static struct ncsi_pkt_handler {
>
>static const struct etc...
>

Yes.

>> +unsigned char   type;
>> +const char  *name;
>> +} ncsi_pkt_handlers[] = {
>> +{ NCSI_PKT_CMD_CIS,"CIS"},
>[]
>> +static bool ncsi_dev_stats_index(struct ncsi_dev_priv *ndp, loff_t pos,
>> + unsigned long *type, unsigned long *index,
>> + unsigned long *entries)
>> +{
>> +int i;
>> +unsigned long ranges[3][2] = {
>> +{ 1,
>> +  ARRAY_SIZE(ndp->stats.cmd) - 1},
>> +{ ranges[0][1] + 2,
>> +  ranges[1][0] + ARRAY_SIZE(ndp->stats.rsp) - 1 },
>> +{ ranges[1][1] + 2,
>> +  ranges[2][0] + ARRAY_SIZE(ndp->stats.aen) - 1 }
>> +};
>
>const?
>

Yes. I will change that in the next respin.

Thanks,
Gavin



Re: [PATCH v2 net-next 4/8] net/ncsi: Add debugging infrastructurre

2017-04-17 Thread Gavin Shan
On Thu, Apr 13, 2017 at 03:41:46AM -0700, Joe Perches wrote:
>On Thu, 2017-04-13 at 17:48 +1000, Gavin Shan wrote:
>> This creates debugfs directories as NCSI debugging infrastructure.
>> With the patch applied, we will see the debugfs directories below. Every
>> NCSI package and channel has one corresponding directory. Other than
>> presenting the NCSI topology, no real function is provided
>> through these debugfs directories so far.
>> 
>>  /sys/kernel/debug/ncsi/eth0
>>  /sys/kernel/debug/ncsi/eth0/p0
>>  /sys/kernel/debug/ncsi/eth0/p0/c0
>>  /sys/kernel/debug/ncsi/eth0/p0/c1
>
>[]
>
>> diff --git a/net/ncsi/ncsi-debug.c b/net/ncsi/ncsi-debug.c
>[]
>> +int ncsi_dev_init_debug(struct ncsi_dev_priv *ndp)
>> +{
>> +if (WARN_ON_ONCE(ndp->dentry))
>> +return 0;
>> +
>> +if (!ncsi_dentry) {
>> +ncsi_dentry = debugfs_create_dir("ncsi", NULL);
>> +if (!ncsi_dentry) {
>> +pr_warn("NCSI: Cannot create /sys/kernel/debug/ncsi\n");
>> +return -ENOENT;
>
>debugfs does not have a fixed path.
>
>Most error messages for this just use something like
>   pr_("Failed to create debugfs directory '%s'\n", foo)
>And most failures don't emit any error message at all.
>

Thanks, Joe. I will correct that in the next respin.

Thanks,
Gavin




Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-17 Thread Alexei Starovoitov
On Mon, Apr 17, 2017 at 03:49:55PM -0400, David Miller wrote:
> From: Jesper Dangaard Brouer 
> Date: Sun, 16 Apr 2017 22:26:01 +0200
> 
> > The bpf tail-call use-case is a very good example of why the
> > verifier cannot deduce the needed HEADROOM upfront.
> 
> This brings up a very interesting question for me.
> 
> I notice that tail calls are implemented by JITs largely by skipping
> over the prologue of that destination program.
> 
> However, many JITs preload cached SKB values into fixed registers in
> the prologue.  But they only do this if the program being JITed needs
> those values.
> 
> So how can it work properly if a program that does not need the SKB
> values tail calls into one that does?

For x86 JIT it's fine, since caching of skb values is not part of the prologue:
  emit_prologue(&prog);
  if (seen_ld_abs)
  emit_load_skb_data_hlen(&prog);
and tail_call jumps into the next program as:
  EMIT4(0x48, 0x83, 0xC0, PROLOGUE_SIZE);   /* add rax, prologue_size */
  EMIT2(0xFF, 0xE0);/* jmp rax */
whereas inside emit_prologue() we have:
  BUILD_BUG_ON(cnt != PROLOGUE_SIZE);

arm64 has similar prologue-skipping code and it's even
simpler than x86, since it doesn't try to optimize LD_ABS/IND in assembler
and instead calls into bpf_load_pointer() from generated code,
so no caching of skb values at all.

s390 jit has partial skipping of prologue, since a bunch
of registers are saved/restored during tail_call and it looks fine
to me as well.

It's very hard to extend test_bpf.ko with tail_calls, since maps need
to be allocated and populated with file descriptors, which is
not feasible to do from a .ko. Instead we need a user space based test for it.
We've started building one in tools/testing/selftests/bpf/test_progs.c;
many more tests need to be added. Thorough testing of tail_calls
is on the todo list.



[PATCH] nl80211: Fix enum type of variable in nl80211_put_sta_rate()

2017-04-17 Thread Matthias Kaehlcke
rate_flg is of type 'enum nl80211_attrs', however it is assigned with
'enum nl80211_rate_info' values. Change the type of rate_flg accordingly.

Signed-off-by: Matthias Kaehlcke 
---
 net/wireless/nl80211.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index 2312dc2ffdb9..9af21a21ea6b 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -4151,7 +4151,7 @@ static bool nl80211_put_sta_rate(struct sk_buff *msg, struct rate_info *info,
struct nlattr *rate;
u32 bitrate;
u16 bitrate_compat;
-   enum nl80211_attrs rate_flg;
+   enum nl80211_rate_info rate_flg;
 
rate = nla_nest_start(msg, attr);
if (!rate)
-- 
2.12.2.762.g0e3151a226-goog



[PATCH v3 6/9] ftgmac100: Allow configuration of phy interface via device-tree

2017-04-17 Thread Benjamin Herrenschmidt
This uses the standard phy-mode property

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 42 +---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index f76765e..7721c2a 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1051,7 +1052,7 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
schedule_work(&priv->reset_task);
 }
 
-static int ftgmac100_mii_probe(struct ftgmac100 *priv)
+static int ftgmac100_mii_probe(struct ftgmac100 *priv, phy_interface_t intf)
 {
struct net_device *netdev = priv->netdev;
struct phy_device *phydev;
@@ -1063,7 +1064,7 @@ static int ftgmac100_mii_probe(struct ftgmac100 *priv)
}
 
phydev = phy_connect(netdev, phydev_name(phydev),
-&ftgmac100_adjust_link, PHY_INTERFACE_MODE_GMII);
+&ftgmac100_adjust_link, intf);
 
if (IS_ERR(phydev)) {
netdev_err(netdev, "%s: Could not attach to PHY\n", netdev->name);
@@ -1618,6 +1619,8 @@ static int ftgmac100_setup_mdio(struct net_device *netdev)
 {
struct ftgmac100 *priv = netdev_priv(netdev);
struct platform_device *pdev = to_platform_device(priv->dev);
+   int phy_intf = PHY_INTERFACE_MODE_RGMII;
+   struct device_node *np = pdev->dev.of_node;
int i, err = 0;
u32 reg;
 
@@ -1633,6 +1636,39 @@ static int ftgmac100_setup_mdio(struct net_device *netdev)
iowrite32(reg, priv->base + FTGMAC100_OFFSET_REVR);
};
 
+   /* Get PHY mode from device-tree */
+   if (np) {
+   /* Default to RGMII. It's a gigabit part after all */
+   phy_intf = of_get_phy_mode(np);
+   if (phy_intf < 0)
+   phy_intf = PHY_INTERFACE_MODE_RGMII;
+
+   /* Aspeed only supports these. I don't know about other IP
+* block vendors so I'm going to just let them through for
+* now. Note that this is only a warning if for some obscure
+* reason the DT really means to lie about it or it's a newer
+* part we don't know about.
+*
+* On the Aspeed SoC there are additionally straps and SCU
+* control bits that could tell us what the interface is
+* (or allow us to configure it while the IP block is held
+* in reset). For now I chose to keep this driver away from
+* those SoC specific bits and assume the device-tree is
+* right and the SCU has been configured properly by pinmux
+* or the firmware.
+*/
+   if (priv->is_aspeed &&
+   phy_intf != PHY_INTERFACE_MODE_RMII &&
+   phy_intf != PHY_INTERFACE_MODE_RGMII &&
+   phy_intf != PHY_INTERFACE_MODE_RGMII_ID &&
+   phy_intf != PHY_INTERFACE_MODE_RGMII_RXID &&
+   phy_intf != PHY_INTERFACE_MODE_RGMII_TXID) {
+   netdev_warn(netdev,
+  "Unsupported PHY mode %s !\n",
+  phy_modes(phy_intf));
+   }
+   }
+
priv->mii_bus->name = "ftgmac100_mdio";
snprintf(priv->mii_bus->id, MII_BUS_ID_SIZE, "%s-%d",
 pdev->name, pdev->id);
@@ -1649,7 +1685,7 @@ static int ftgmac100_setup_mdio(struct net_device *netdev)
goto err_register_mdiobus;
}
 
-   err = ftgmac100_mii_probe(priv);
+   err = ftgmac100_mii_probe(priv, phy_intf);
if (err) {
dev_err(priv->dev, "MII Probe failed!\n");
goto err_mii_probe;
-- 
2.9.3



[PATCH v3 8/9] ftgmac100: Fix potential ordering issue in NAPI poll

2017-04-17 Thread Benjamin Herrenschmidt
We need to ensure the loads from the descriptor are done after the
MMIO store clearing the interrupts has completed, otherwise we
might still miss work.

A read back from the MMIO register will "push" the posted store and
ioread32 has a barrier on weakly ordered architectures that will
order subsequent accesses.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 45b8267..95bf5e8 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1349,6 +1349,13 @@ static int ftgmac100_poll(struct napi_struct *napi, int budget)
 */
iowrite32(FTGMAC100_INT_RXTX,
  priv->base + FTGMAC100_OFFSET_ISR);
+
+   /* Push the above (and provides a barrier vs. subsequent
+* reads of the descriptor).
+*/
+   ioread32(priv->base + FTGMAC100_OFFSET_ISR);
+
+   /* Check RX and TX descriptors for more work to do */
if (ftgmac100_check_rx(priv) ||
ftgmac100_tx_buf_cleanable(priv))
return budget;
-- 
2.9.3



[PATCH v3 7/9] ftgmac100: Display the discovered PHY device info

2017-04-17 Thread Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 7721c2a..45b8267 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1077,6 +1077,9 @@ static int ftgmac100_mii_probe(struct ftgmac100 *priv, phy_interface_t intf)
phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
phydev->advertising = phydev->supported;
 
+   /* Display what we found */
+   phy_attached_info(phydev);
+
return 0;
 }
 
-- 
2.9.3



[PATCH v3 9/9] ftgmac100: Document device-tree binding

2017-04-17 Thread Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt 
--

v3. - Update supported values for phy-mode
---
 .../devicetree/bindings/net/ftgmac100.txt  | 37 ++
 1 file changed, 37 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/ftgmac100.txt

diff --git a/Documentation/devicetree/bindings/net/ftgmac100.txt b/Documentation/devicetree/bindings/net/ftgmac100.txt
new file mode 100644
index 000..fceeede
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/ftgmac100.txt
@@ -0,0 +1,37 @@
+* Faraday Technology FTGMAC100 gigabit ethernet controller
+
+Required properties:
+- compatible: "faraday,ftgmac100"
+
+  Must also contain one of these if used as part of an Aspeed AST2400
+  or 2500 family SoC as they have some subtle tweaks to the
+  implementation:
+
+ - "aspeed,ast2400-mac"
+ - "aspeed,ast2500-mac"
+
+- reg: Address and length of the register set for the device
+- interrupts: Should contain ethernet controller interrupt
+
+Optional properties:
+- phy-mode: See ethernet.txt file in the same directory. If the property is
+  absent, "rgmii" is assumed. Supported values are "rgmii*" and "rmii" for
+  aspeed parts. Other (unknown) parts will accept any value.
+- use-ncsi: Use the NC-SI stack instead of an MDIO PHY. Currently assumes
+  rmii (100bT) but kept as a separate property in case NC-SI grows support
+  for a gigabit link.
+- no-hw-checksum: Used to disable HW checksum support. Here for backward
+  compatibility as the driver now should have correct defaults based on
+  the SoC.
+
+Example:
+
+   mac0: ethernet@1e66 {
+   compatible = "aspeed,ast2500-mac", "faraday,ftgmac100";
+   reg = <0x1e66 0x180>;
+   interrupts = <2>;
+   status = "okay";
+   use-ncsi;
+   };
+
+
-- 
2.9.3



[PATCH v3 5/9] ftgmac100: Add netpoll support

2017-04-17 Thread Benjamin Herrenschmidt
Just call the interrupt handler with interrupts locally disabled

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 40a03d5..f76765e 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1588,6 +1588,17 @@ static int ftgmac100_set_features(struct net_device *netdev,
return 0;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void ftgmac100_poll_controller(struct net_device *netdev)
+{
+   unsigned long flags;
+
+   local_irq_save(flags);
+   ftgmac100_interrupt(netdev->irq, netdev);
+   local_irq_restore(flags);
+}
+#endif
+
 static const struct net_device_ops ftgmac100_netdev_ops = {
.ndo_open   = ftgmac100_open,
.ndo_stop   = ftgmac100_stop,
@@ -1598,6 +1609,9 @@ static const struct net_device_ops ftgmac100_netdev_ops = {
.ndo_tx_timeout = ftgmac100_tx_timeout,
.ndo_set_rx_mode= ftgmac100_set_rx_mode,
.ndo_set_features   = ftgmac100_set_features,
+#ifdef CONFIG_NET_POLL_CONTROLLER
+   .ndo_poll_controller= ftgmac100_poll_controller,
+#endif
 };
 
 static int ftgmac100_setup_mdio(struct net_device *netdev)
-- 
2.9.3



[PATCH v3 3/9] ftgmac100: Add ndo_set_rx_mode() and support for multicast & promisc

2017-04-17 Thread Benjamin Herrenschmidt
This adds the ndo_set_rx_mode() callback to configure the
multicast filters, promisc and allmulti options.

Signed-off-by: Benjamin Herrenschmidt 
--

v3. - Rebase to fix conflict with #include changes
---
 drivers/net/ethernet/faraday/ftgmac100.c | 52 
 1 file changed, 52 insertions(+)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
index 949b48c..f4db6e2 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -99,6 +100,10 @@ struct ftgmac100 {
int cur_duplex;
bool use_ncsi;
 
+   /* Multicast filter settings */
+   u32 maht0;
+   u32 maht1;
+
/* Flow control settings */
bool tx_pause;
bool rx_pause;
@@ -266,6 +271,10 @@ static void ftgmac100_init_hw(struct ftgmac100 *priv)
/* Write MAC address */
ftgmac100_write_mac_addr(priv, priv->netdev->dev_addr);
 
+   /* Write multicast filter */
+   iowrite32(priv->maht0, priv->base + FTGMAC100_OFFSET_MAHT0);
+   iowrite32(priv->maht1, priv->base + FTGMAC100_OFFSET_MAHT1);
+
/* Configure descriptor sizes and increase burst sizes according
 * to values in Aspeed SDK. The FIFO arbitration is enabled and
 * the thresholds set based on the recommended values in the
@@ -319,6 +328,12 @@ static void ftgmac100_start_hw(struct ftgmac100 *priv)
/* Add other bits as needed */
if (priv->cur_duplex == DUPLEX_FULL)
maccr |= FTGMAC100_MACCR_FULLDUP;
+   if (priv->netdev->flags & IFF_PROMISC)
+   maccr |= FTGMAC100_MACCR_RX_ALL;
+   if (priv->netdev->flags & IFF_ALLMULTI)
+   maccr |= FTGMAC100_MACCR_RX_MULTIPKT;
+   else if (netdev_mc_count(priv->netdev))
+   maccr |= FTGMAC100_MACCR_HT_MULTI_EN;
 
/* Hit the HW */
iowrite32(maccr, priv->base + FTGMAC100_OFFSET_MACCR);
@@ -329,6 +344,42 @@ static void ftgmac100_stop_hw(struct ftgmac100 *priv)
iowrite32(0, priv->base + FTGMAC100_OFFSET_MACCR);
 }
 
+static void ftgmac100_calc_mc_hash(struct ftgmac100 *priv)
+{
+   struct netdev_hw_addr *ha;
+
+   priv->maht1 = 0;
+   priv->maht0 = 0;
+   netdev_for_each_mc_addr(ha, priv->netdev) {
+   u32 crc_val = ether_crc_le(ETH_ALEN, ha->addr);
+
+   crc_val = (~(crc_val >> 2)) & 0x3f;
+   if (crc_val >= 32)
+   priv->maht1 |= 1ul << (crc_val - 32);
+   else
+   priv->maht0 |= 1ul << (crc_val);
+   }
+}
+
+static void ftgmac100_set_rx_mode(struct net_device *netdev)
+{
+   struct ftgmac100 *priv = netdev_priv(netdev);
+
+   /* Setup the hash filter */
+   ftgmac100_calc_mc_hash(priv);
+
+   /* Interface down ? that's all there is to do */
+   if (!netif_running(netdev))
+   return;
+
+   /* Update the HW */
+   iowrite32(priv->maht0, priv->base + FTGMAC100_OFFSET_MAHT0);
+   iowrite32(priv->maht1, priv->base + FTGMAC100_OFFSET_MAHT1);
+
+   /* Reconfigure MACCR */
+   ftgmac100_start_hw(priv);
+}
+
 static int ftgmac100_alloc_rx_buf(struct ftgmac100 *priv, unsigned int entry,
  struct ftgmac100_rxdes *rxdes, gfp_t gfp)
 {
@@ -1503,6 +1554,7 @@ static const struct net_device_ops ftgmac100_netdev_ops = 
{
.ndo_validate_addr  = eth_validate_addr,
.ndo_do_ioctl   = ftgmac100_do_ioctl,
.ndo_tx_timeout = ftgmac100_tx_timeout,
+   .ndo_set_rx_mode= ftgmac100_set_rx_mode,
 };
 
 static int ftgmac100_setup_mdio(struct net_device *netdev)
-- 
2.9.3
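
For readers following the hash-filter logic in ftgmac100_calc_mc_hash()
above, here is a minimal userspace sketch of the same bit selection. It
assumes ether_crc_le() is the reflected CRC-32 (polynomial 0xedb88320,
initial value all ones, no final inversion). The names mc_filter,
mc_hash_bit and mc_filter_add are made up for illustration and are not
part of the driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed userspace stand-in for the kernel's ether_crc_le(). */
static uint32_t ether_crc_le(size_t len, const uint8_t *data)
{
	uint32_t crc = 0xffffffffu;

	while (len--) {
		crc ^= *data++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320u : 0);
	}
	return crc;
}

/* Hypothetical container for the two 32-bit hash table registers. */
struct mc_filter {
	uint32_t maht0;	/* hash bits 0..31 */
	uint32_t maht1;	/* hash bits 32..63 */
};

/* Same selection as the patch: inverted CRC bits 2..7 pick one of 64 bits. */
static unsigned int mc_hash_bit(const uint8_t addr[6])
{
	return (~(ether_crc_le(6, addr) >> 2)) & 0x3f;
}

/* Set the filter bit for one multicast address. */
static void mc_filter_add(struct mc_filter *f, const uint8_t addr[6])
{
	unsigned int bit = mc_hash_bit(addr);

	if (bit >= 32)
		f->maht1 |= UINT32_C(1) << (bit - 32);
	else
		f->maht0 |= UINT32_C(1) << bit;
}
```

Each address sets exactly one of the 64 bits, so distinct addresses can
collide on the same bit; the filter is a coarse pre-filter and the stack
still has to drop unwanted groups.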



[PATCH v3 4/9] ftgmac100: Add vlan HW offload

2017-04-17 Thread Benjamin Herrenschmidt
The chip supports HW vlan tag insertion and extraction. Add support
for it.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 46 +++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index f4db6e2..40a03d5 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -335,6 +336,10 @@ static void ftgmac100_start_hw(struct ftgmac100 *priv)
else if (netdev_mc_count(priv->netdev))
maccr |= FTGMAC100_MACCR_HT_MULTI_EN;
 
+   /* Vlan filtering enabled */
+   if (priv->netdev->features & NETIF_F_HW_VLAN_CTAG_RX)
+   maccr |= FTGMAC100_MACCR_RM_VLAN;
+
/* Hit the HW */
iowrite32(maccr, priv->base + FTGMAC100_OFFSET_MACCR);
 }
@@ -530,6 +535,12 @@ static bool ftgmac100_rx_packet(struct ftgmac100 *priv, 
int *processed)
/* Transfer received size to skb */
skb_put(skb, size);
 
+   /* Extract vlan tag */
+   if ((netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
+   (csum_vlan & FTGMAC100_RXDES1_VLANTAG_AVAIL))
+   __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
+  csum_vlan & 0x);
+
/* Tear down DMA mapping, do necessary cache management */
map = le32_to_cpu(rxdes->rxdes3);
 
@@ -754,6 +765,13 @@ static int ftgmac100_hard_start_xmit(struct sk_buff *skb,
if (skb->ip_summed == CHECKSUM_PARTIAL &&
!ftgmac100_prep_tx_csum(skb, &csum_vlan))
goto drop;
+
+   /* Add VLAN tag */
+   if (skb_vlan_tag_present(skb)) {
+   csum_vlan |= FTGMAC100_TXDES1_INS_VLANTAG;
+   csum_vlan |= skb_vlan_tag_get(skb) & 0x;
+   }
+
txdes->txdes1 = cpu_to_le32(csum_vlan);
 
/* Next descriptor */
@@ -1546,6 +1564,30 @@ static void ftgmac100_tx_timeout(struct net_device 
*netdev)
schedule_work(&priv->reset_task);
 }
 
+static int ftgmac100_set_features(struct net_device *netdev,
+ netdev_features_t features)
+{
+   struct ftgmac100 *priv = netdev_priv(netdev);
+   netdev_features_t changed = netdev->features ^ features;
+
+   if (!netif_running(netdev))
+   return 0;
+
+   /* Update the vlan filtering bit */
+   if (changed & NETIF_F_HW_VLAN_CTAG_RX) {
+   u32 maccr;
+
+   maccr = ioread32(priv->base + FTGMAC100_OFFSET_MACCR);
+   if (priv->netdev->features & NETIF_F_HW_VLAN_CTAG_RX)
+   maccr |= FTGMAC100_MACCR_RM_VLAN;
+   else
+   maccr &= ~FTGMAC100_MACCR_RM_VLAN;
+   iowrite32(maccr, priv->base + FTGMAC100_OFFSET_MACCR);
+   }
+
+   return 0;
+}
+
 static const struct net_device_ops ftgmac100_netdev_ops = {
.ndo_open   = ftgmac100_open,
.ndo_stop   = ftgmac100_stop,
@@ -1555,6 +1597,7 @@ static const struct net_device_ops ftgmac100_netdev_ops = 
{
.ndo_do_ioctl   = ftgmac100_do_ioctl,
.ndo_tx_timeout = ftgmac100_tx_timeout,
.ndo_set_rx_mode= ftgmac100_set_rx_mode,
+   .ndo_set_features   = ftgmac100_set_features,
 };
 
 static int ftgmac100_setup_mdio(struct net_device *netdev)
@@ -1730,7 +1773,8 @@ static int ftgmac100_probe(struct platform_device *pdev)
 
/* Base feature set */
netdev->hw_features = NETIF_F_RXCSUM | NETIF_F_HW_CSUM |
-   NETIF_F_GRO | NETIF_F_SG;
+   NETIF_F_GRO | NETIF_F_SG | NETIF_F_HW_VLAN_CTAG_RX |
+   NETIF_F_HW_VLAN_CTAG_TX;
 
/* AST2400  doesn't have working HW checksum generation */
if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
-- 
2.9.3
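
The change-detection idiom in ftgmac100_set_features() (XOR the old and
new feature sets, then act only on bits that flipped) can be sketched in
plain C as below. The feature bit, the register bit and the function name
are placeholders chosen for the example, not the real kernel values. Note
that this sketch keys the new register state off the requested set;
whether that is equivalent to testing netdev->features, as the patch
does, depends on when the core updates that field, which is not shown in
this thread.

```c
#include <stdint.h>

typedef uint64_t features_sketch_t;

#define F_VLAN_RX_SKETCH	(UINT64_C(1) << 0)	/* hypothetical feature bit */
#define MACCR_RM_VLAN_SKETCH	(UINT32_C(1) << 1)	/* hypothetical register bit */

/*
 * Return the MACCR value to write, given the current register value,
 * the currently active features and the requested features.
 */
static uint32_t maccr_for_features(uint32_t maccr,
				   features_sketch_t current,
				   features_sketch_t requested)
{
	features_sketch_t changed = current ^ requested;

	/* Touch the register bit only if the VLAN RX feature flipped. */
	if (changed & F_VLAN_RX_SKETCH) {
		if (requested & F_VLAN_RX_SKETCH)
			maccr |= MACCR_RM_VLAN_SKETCH;
		else
			maccr &= ~MACCR_RM_VLAN_SKETCH;
	}
	return maccr;
}
```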



[PATCH v3 1/9] ftgmac100: Add ethtool n-way reset call

2017-04-17 Thread Benjamin Herrenschmidt
A non-wired up implementation accidentally made its way in
a previous patch (Make ring sizes configurable via ethtool).

This removes it and wires up the generic phy_ethtool_nway_reset
instead.

Signed-off-by: Benjamin Herrenschmidt 
--

v2. - Use phy_ethtool_nway_reset() instead of custom implementation
---
 drivers/net/ethernet/faraday/ftgmac100.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index 2153c5b..4cdd25a 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1045,13 +1045,6 @@ static void ftgmac100_get_drvinfo(struct net_device 
*netdev,
strlcpy(info->bus_info, dev_name(&netdev->dev), sizeof(info->bus_info));
 }
 
-static int ftgmac100_nway_reset(struct net_device *ndev)
-{
-   if (!ndev->phydev)
-   return -ENXIO;
-   return phy_start_aneg(ndev->phydev);
-}
-
 static void ftgmac100_get_ringparam(struct net_device *netdev,
struct ethtool_ringparam *ering)
 {
@@ -1090,6 +1083,7 @@ static const struct ethtool_ops ftgmac100_ethtool_ops = {
.get_link   = ethtool_op_get_link,
.get_link_ksettings = phy_ethtool_get_link_ksettings,
.set_link_ksettings = phy_ethtool_set_link_ksettings,
+   .nway_reset = phy_ethtool_nway_reset,
.get_ringparam  = ftgmac100_get_ringparam,
.set_ringparam  = ftgmac100_set_ringparam,
 };
-- 
2.9.3



[PATCH v3 2/9] ftgmac100: Add pause frames configuration and support

2017-04-17 Thread Benjamin Herrenschmidt
Hopefully my understanding of how the hardware works is correct,
as the documentation isn't completely clear. So far I have seen
no obvious issue. Pause seems to also work with NC-SI.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/net/ethernet/faraday/ftgmac100.c | 96 +++-
 drivers/net/ethernet/faraday/ftgmac100.h |  7 +++
 2 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c 
b/drivers/net/ethernet/faraday/ftgmac100.c
index 4cdd25a..949b48c 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -99,6 +99,11 @@ struct ftgmac100 {
int cur_duplex;
bool use_ncsi;
 
+   /* Flow control settings */
+   bool tx_pause;
+   bool rx_pause;
+   bool aneg_pause;
+
/* Misc */
bool need_mac_restart;
bool is_aspeed;
@@ -219,6 +224,23 @@ static int ftgmac100_set_mac_addr(struct net_device *dev, 
void *p)
return 0;
 }
 
+static void ftgmac100_config_pause(struct ftgmac100 *priv)
+{
+   u32 fcr = FTGMAC100_FCR_PAUSE_TIME(16);
+
+   /* Throttle tx queue when receiving pause frames */
+   if (priv->rx_pause)
+   fcr |= FTGMAC100_FCR_FC_EN;
+
+   /* Enables sending pause frames when the RX queue is past a
+* certain threshold.
+*/
+   if (priv->tx_pause)
+   fcr |= FTGMAC100_FCR_FCTHR_EN;
+
+   iowrite32(fcr, priv->base + FTGMAC100_OFFSET_FCR);
+}
+
 static void ftgmac100_init_hw(struct ftgmac100 *priv)
 {
u32 reg, rfifo_sz, tfifo_sz;
@@ -912,6 +934,7 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
 {
struct ftgmac100 *priv = netdev_priv(netdev);
struct phy_device *phydev = netdev->phydev;
+   bool tx_pause, rx_pause;
int new_speed;
 
/* We store "no link" as speed 0 */
@@ -920,8 +943,21 @@ static void ftgmac100_adjust_link(struct net_device 
*netdev)
else
new_speed = phydev->speed;
 
+   /* Grab pause settings from PHY if configured to do so */
+   if (priv->aneg_pause) {
+   rx_pause = tx_pause = phydev->pause;
+   if (phydev->asym_pause)
+   tx_pause = !rx_pause;
+   } else {
+   rx_pause = priv->rx_pause;
+   tx_pause = priv->tx_pause;
+   }
+
+   /* Link hasn't changed, do nothing */
if (phydev->speed == priv->cur_speed &&
-   phydev->duplex == priv->cur_duplex)
+   phydev->duplex == priv->cur_duplex &&
+   rx_pause == priv->rx_pause &&
+   tx_pause == priv->tx_pause)
return;
 
/* Print status if we have a link or we had one and just lost it,
@@ -932,6 +968,8 @@ static void ftgmac100_adjust_link(struct net_device *netdev)
 
priv->cur_speed = new_speed;
priv->cur_duplex = phydev->duplex;
+   priv->rx_pause = rx_pause;
+   priv->tx_pause = tx_pause;
 
/* Link is down, do nothing else */
if (!new_speed)
@@ -963,6 +1001,12 @@ static int ftgmac100_mii_probe(struct ftgmac100 *priv)
return PTR_ERR(phydev);
}
 
+   /* Indicate that we support PAUSE frames (see comment in
+* Documentation/networking/phy.txt)
+*/
+   phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
+   phydev->advertising = phydev->supported;
+
return 0;
 }
 
@@ -1078,6 +1122,48 @@ static int ftgmac100_set_ringparam(struct net_device 
*netdev,
return 0;
 }
 
+static void ftgmac100_get_pauseparam(struct net_device *netdev,
+struct ethtool_pauseparam *pause)
+{
+   struct ftgmac100 *priv = netdev_priv(netdev);
+
+   pause->autoneg = priv->aneg_pause;
+   pause->tx_pause = priv->tx_pause;
+   pause->rx_pause = priv->rx_pause;
+}
+
+static int ftgmac100_set_pauseparam(struct net_device *netdev,
+   struct ethtool_pauseparam *pause)
+{
+   struct ftgmac100 *priv = netdev_priv(netdev);
+   struct phy_device *phydev = netdev->phydev;
+
+   priv->aneg_pause = pause->autoneg;
+   priv->tx_pause = pause->tx_pause;
+   priv->rx_pause = pause->rx_pause;
+
+   if (phydev) {
+   phydev->advertising &= ~ADVERTISED_Pause;
+   phydev->advertising &= ~ADVERTISED_Asym_Pause;
+
+   if (pause->rx_pause) {
+   phydev->advertising |= ADVERTISED_Pause;
+   phydev->advertising |= ADVERTISED_Asym_Pause;
+   }
+
+   if (pause->tx_pause)
+   phydev->advertising ^= ADVERTISED_Asym_Pause;
+   }
+   if (netif_running(netdev)) {
+   if (phydev && priv->aneg_pause)
+   phy_start_aneg(phydev);
+   else
+   ftgmac100_config_pause(priv);
+   }
+
+   return 0;
+}
+
 static const struct ethtoo
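
The mapping from the ethtool rx/tx pause request to the two advertised
pause bits in ftgmac100_set_pauseparam() above is compact enough to be
easy to misread, so here is a standalone sketch of the same bit
manipulation. ADV_PAUSE and ADV_ASYM_PAUSE are placeholder bit positions
picked for the example, and pause_to_advertising() is a hypothetical
helper, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADV_PAUSE	(UINT32_C(1) << 0)	/* hypothetical bit positions */
#define ADV_ASYM_PAUSE	(UINT32_C(1) << 1)

/*
 * Mirror of the sequence in the patch: clear both bits, set both when
 * rx pause is requested, then toggle the asymmetric bit for tx pause.
 */
static uint32_t pause_to_advertising(uint32_t adv, bool rx_pause, bool tx_pause)
{
	adv &= ~(ADV_PAUSE | ADV_ASYM_PAUSE);

	if (rx_pause)
		adv |= ADV_PAUSE | ADV_ASYM_PAUSE;
	if (tx_pause)
		adv ^= ADV_ASYM_PAUSE;

	return adv;
}
```

The resulting table is: neither direction advertises nothing, rx only
advertises Pause and Asym_Pause, tx only advertises Asym_Pause, and both
directions advertise Pause alone.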

[PATCH v3 0/9] ftgmac100: Rework batch 5 - Features

2017-04-17 Thread Benjamin Herrenschmidt
This is the third spin of the fifth and last batch of
updates to the ftgmac100 driver.

This contains a few additional "features" such as:

 - Support for ethtool n-way reset
 - Multicast filtering & promisc support
 - Vlan offload
 - netpoll

And a couple of misc bits. This also adds the device-tree binding
documentation.

v2. - Addresses review comments and adds a new patch fixing a
  theoretical ordering issue in my new NAPI poll implementation
- Add a bug fix (Patch 8/9) for a potential ordering issue
  in the new NAPI poll code.

v3. - Rebase on net-next (fix conflict with an unrelated #include
  change series)
- Update DT bindings better describing accepted phy-mode values



Re: [PATCH net-next] net: ipv6: Fix UDP early demux lookup with udp_l3mdev_accept=0

2017-04-17 Thread Eric Dumazet
On Mon, 2017-04-17 at 15:11 -0600, Subash Abhinov Kasiviswanathan wrote:
> David Ahern reported that 5425077d73e0c ("net: ipv6: Add early demux
> handler for UDP unicast") breaks udp_l3mdev_accept=0 since early
> demux for IPv6 UDP was doing a generic socket lookup which does not
> require an exact match. Fix this by making UDPv6 early demux match
> connected sockets only.
> 
> Fixes: 5425077d73e0c ("net: ipv6: Add early demux handler for UDP unicast")
> Reported-by: David Ahern 
> Signed-off-by: Subash Abhinov Kasiviswanathan 
> ---
>  net/ipv6/udp.c | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index b793ed1..0e307e5 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -46,6 +46,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -864,21 +865,25 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct 
> udp_table *udptable,
>   return 0;
>  }
>  
> +
>  static struct sock *__udp6_lib_demux_lookup(struct net *net,
>   __be16 loc_port, const struct in6_addr *loc_addr,
>   __be16 rmt_port, const struct in6_addr *rmt_addr,
>   int dif)
>  {
> + unsigned short hnum = ntohs(loc_port);
> + unsigned int hash2 = udp6_portaddr_hash(net, loc_addr, hnum);
> + unsigned int slot2 = hash2 & udp_table.mask;
> + struct udp_hslot *hslot2 = &udp_table.hash2[slot2];
> + const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum);
>   struct sock *sk;
>  
> - rcu_read_lock();
> - sk = __udp6_lib_lookup(net, rmt_addr, rmt_port, loc_addr, loc_port,
> -dif, &udp_table, NULL);
> - if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
> - sk = NULL;
> - rcu_read_unlock();
> -
> - return sk;
> + udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
> + if (INET6_MATCH(sk, net, rmt_addr, loc_addr, ports, dif))
> + return sk;
> + break;
> + }
> + return NULL;
>  }
>  
>  static void udp_v6_early_demux(struct sk_buff *skb)


This can not be right.

You removed the atomic_inc_not_zero() call, meaning that this code will
release a live socket.
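
As I read the objection, the caller later drops a reference on whatever
the lookup returns, so the lookup has to take one first, and only if the
object is not already on its way to being freed. A minimal C11 sketch of
that "increment unless zero" step follows. It uses <stdatomic.h> rather
than the kernel's atomic_t API, and ref_inc_not_zero() is a made-up name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the count is still non-zero.
 * Returns true when a reference was taken.
 */
static bool ref_inc_not_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}
```

A lookup that returns the object without this step hands the caller a
pointer it does not own; the later put then decrements a count that
belongs to someone else.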







Re: IGMP on IPv6

2017-04-17 Thread Cong Wang
Hello,

On Thu, Apr 13, 2017 at 9:36 AM, Murali Karicheri  wrote:
> On 03/22/2017 11:04 AM, Murali Karicheri wrote:
>> This is going directly to the slave Ethernet interface.
>>
>> When I put a WARN_ONCE, I found this is coming directly from
>> mld_ifc_timer_expire() -> mld_sendpack() -> ip6_output()
>>
>> Do you think this is fixed in latest kernel at master? If so, could
>> you point me to some commits.
>>
>>
> Ping... I see this behavior is also seen on v4.9.x Kernel. Any clue if
> this is fixed by some commit or I need to debug? I see IGMPv6 has some
> fixes on the list to make it similar to IGMPv4. So can someone clarify whether this
> is a bug in the IGMPv6 code or I need to look into the HSR driver code?
> Since IGMPv4 is going over the HSR interface I am assuming this is a
> bug in the IGMPv6 code. But since I have no experience with this code
> can some expert comment please?
>

How did you configure your network interfaces and IPv4/IPv6 multicast?
IOW, how did you reproduce this? For example, did you change your
HSR setup when this happened since you mentioned
NETDEV_CHANGEUPPER?


[PATCH net-next] net: ipv6: Fix UDP early demux lookup with udp_l3mdev_accept=0

2017-04-17 Thread Subash Abhinov Kasiviswanathan
David Ahern reported that 5425077d73e0c ("net: ipv6: Add early demux
handler for UDP unicast") breaks udp_l3mdev_accept=0 since early
demux for IPv6 UDP was doing a generic socket lookup which does not
require an exact match. Fix this by making UDPv6 early demux match
connected sockets only.

Fixes: 5425077d73e0c ("net: ipv6: Add early demux handler for UDP unicast")
Reported-by: David Ahern 
Signed-off-by: Subash Abhinov Kasiviswanathan 
---
 net/ipv6/udp.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index b793ed1..0e307e5 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -864,21 +865,25 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table 
*udptable,
return 0;
 }
 
+
 static struct sock *__udp6_lib_demux_lookup(struct net *net,
__be16 loc_port, const struct in6_addr *loc_addr,
__be16 rmt_port, const struct in6_addr *rmt_addr,
int dif)
 {
+   unsigned short hnum = ntohs(loc_port);
+   unsigned int hash2 = udp6_portaddr_hash(net, loc_addr, hnum);
+   unsigned int slot2 = hash2 & udp_table.mask;
+   struct udp_hslot *hslot2 = &udp_table.hash2[slot2];
+   const __portpair ports = INET_COMBINED_PORTS(rmt_port, hnum);
struct sock *sk;
 
-   rcu_read_lock();
-   sk = __udp6_lib_lookup(net, rmt_addr, rmt_port, loc_addr, loc_port,
-  dif, &udp_table, NULL);
-   if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
-   sk = NULL;
-   rcu_read_unlock();
-
-   return sk;
+   udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
+   if (INET6_MATCH(sk, net, rmt_addr, loc_addr, ports, dif))
+   return sk;
+   break;
+   }
+   return NULL;
 }
 
 static void udp_v6_early_demux(struct sk_buff *skb)
-- 
1.9.1



[PATCH] mac80211: ibss: Fix channel type enum in ieee80211_sta_join_ibss()

2017-04-17 Thread Matthias Kaehlcke
cfg80211_chandef_create() expects an 'enum nl80211_channel_type' as
channel type however in ieee80211_sta_join_ibss()
NL80211_CHAN_WIDTH_20_NOHT is passed in two occasions, which is of
the enum type 'nl80211_chan_width'. Change the value to NL80211_CHAN_NO_HT
(20 MHz, non-HT channel) of the channel type enum.

Signed-off-by: Matthias Kaehlcke 
---
 net/mac80211/ibss.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/ibss.c b/net/mac80211/ibss.c
index 98999d3d5262..e957351976a2 100644
--- a/net/mac80211/ibss.c
+++ b/net/mac80211/ibss.c
@@ -425,7 +425,7 @@ static void ieee80211_sta_join_ibss(struct 
ieee80211_sub_if_data *sdata,
case NL80211_CHAN_WIDTH_5:
case NL80211_CHAN_WIDTH_10:
cfg80211_chandef_create(&chandef, cbss->channel,
-   NL80211_CHAN_WIDTH_20_NOHT);
+   NL80211_CHAN_NO_HT);
chandef.width = sdata->u.ibss.chandef.width;
break;
case NL80211_CHAN_WIDTH_80:
@@ -437,7 +437,7 @@ static void ieee80211_sta_join_ibss(struct 
ieee80211_sub_if_data *sdata,
default:
/* fall back to 20 MHz for unsupported modes */
cfg80211_chandef_create(&chandef, cbss->channel,
-   NL80211_CHAN_WIDTH_20_NOHT);
+   NL80211_CHAN_NO_HT);
break;
}
 
-- 
2.12.2.762.g0e3151a226-goog



Re: [PATCH RFC] sparc64: eBPF JIT

2017-04-17 Thread Daniel Borkmann

On 04/17/2017 09:03 PM, David Miller wrote:

From: Daniel Borkmann 
Date: Mon, 17 Apr 2017 20:44:35 +0200


On 04/17/2017 05:38 AM, David Miller wrote:

+/* Map BPF registers to SPARC registers */
+static const int bpf2sparc[] = {
+ /* return value from in-kernel function, and exit value from eBPF */
+   [BPF_REG_0] = I5,
+
+   /* arguments from eBPF program to in-kernel function */
+   [BPF_REG_1] = I0,
+   [BPF_REG_2] = I1,
+   [BPF_REG_3] = I2,
+   [BPF_REG_4] = I3,
+   [BPF_REG_5] = I4,
+
+ /* callee saved registers that in-kernel function will preserve */
+   [BPF_REG_6] = L0,
+   [BPF_REG_7] = L1,
+   [BPF_REG_8] = L2,
+   [BPF_REG_9] = L3,
+
+   /* read-only frame pointer to access stack */
+   [BPF_REG_FP] = FP,


On a quick initial glance, you also need to map BPF_REG_AX. If
I understand the convention correctly, you could use L7 for that.

You can test for it through tools/testing/selftests/bpf/test_kmod.sh
which exercises the test_bpf.ko under various sysctl combinations as
part of the BPF selftest suite.


Oh I see, it's used for constant blinding.  I can use a global register
for that since it's only used as a temporary right?


Yeah, correct.


[PATCH v2] net: cx89x0: move attribute declaration before struct keyword

2017-04-17 Thread Stefan Agner
The attribute declaration is typically before the definition. Move
the __maybe_unused attribute declaration after the complete type.

Signed-off-by: Stefan Agner 
---
Changes in v2:
- Move __maybe_unused after the complete type

 drivers/net/ethernet/cirrus/cs89x0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cirrus/cs89x0.c 
b/drivers/net/ethernet/cirrus/cs89x0.c
index 3647b28e8de0..47384f7323ac 100644
--- a/drivers/net/ethernet/cirrus/cs89x0.c
+++ b/drivers/net/ethernet/cirrus/cs89x0.c
@@ -1896,7 +1896,7 @@ static int cs89x0_platform_remove(struct platform_device 
*pdev)
return 0;
 }
 
-static const struct __maybe_unused of_device_id cs89x0_match[] = {
+static const struct of_device_id __maybe_unused cs89x0_match[] = {
{ .compatible = "cirrus,cs8900", },
{ .compatible = "cirrus,cs8920", },
{ },
-- 
2.12.2



Re: [PATCH] net: cx89x0: move attribute declaration before struct keyword

2017-04-17 Thread David Miller
From: Stefan Agner 
Date: Mon, 17 Apr 2017 13:31:28 -0700

> Given that, can you reconsider?

Please put the attribute after the complete type.

Thanks.


Re: [PATCH v2 0/9] ftgmac100: Rework batch 5 - Features

2017-04-17 Thread Benjamin Herrenschmidt
On Mon, 2017-04-17 at 11:11 -0400, David Miller wrote:
> From: Benjamin Herrenschmidt 
> Date: Thu, 13 Apr 2017 14:39:07 +1000
> 
> > This is the second spin of the fifth and last batch of
> > updates to the ftgmac100 driver.
> 
> This series doesn't apply cleanly to net-next, please respin.

Sure, I'll do this today. Thanks.

Cheers,
Ben.



Re: [PATCH] net: cx89x0: move attribute declaration before struct keyword

2017-04-17 Thread Stefan Agner
On 2017-04-17 13:09, David Miller wrote:
> From: Stefan Agner 
> Date: Sun, 16 Apr 2017 23:20:32 -0700
> 
>> The attribute declaration is typically before the definition. Move
>> the __maybe_unused attribute declaration before the struct keyword.
>>
>> Signed-off-by: Stefan Agner 
> 

I did catch that while compiling with clang, and the exact error message
is:
drivers/net/ethernet/cirrus/cs89x0.c:1899:21: warning: attribute
declaration must precede definition [-Wignored-attributes]
static const struct __maybe_unused of_device_id cs89x0_match[] = {

> Well, I see it just as often after the variable name too:
> 
> net/irda/iriap.c:static const char *const ias_charset_types[] __maybe_unused 
> = {
> net/irda/irlap.c:static const char *const lap_reasons[] __maybe_unused = {
> net/irda/irlap_event.c:static const char *const irlap_event[] __maybe_unused 
> = {
> net/irda/irlmp_event.c:static const char *const irlmp_event[] __maybe_unused 
> = {
> 

That seems not to fire when compiling with clang. I guess because the
attribute is after the _complete_ type?

> Or after the struct:
> 
> drivers/net/phy/ste10Xp.c:static struct mdio_device_id __maybe_unused
> ste10Xp_tbl[] = {
> drivers/net/phy/teranetics.c:static struct mdio_device_id
> __maybe_unused teranetics_tbl[] = {
> drivers/net/phy/vitesse.c:static struct mdio_device_id __maybe_unused
> vitesse_tbl[] = {
> 

Same here...

> So unless we decide tree wide to do it in one order or another, such changes
> are largely a waste of time.

Afaik, "struct of_device_id" as a whole is a type. This case is really
odd since it puts the attribute in the middle of a type. It is the only
instance which I came across (not everything compiles fine with clang yet,
so there might be more... but a quick grep did not turn up more of the
same cases)...

> 
> Sorry I'm not applying this patch.

Given that, can you reconsider?

--
Stefan


Re: [PATCH RFC] sparc64: eBPF JIT

2017-04-17 Thread David Miller
From: Daniel Borkmann 
Date: Mon, 17 Apr 2017 20:44:35 +0200

> On a quick initial glance, you also need to map BPF_REG_AX. If
> I understand the convention correctly, you could use L7 for that.
> 
> You can test for it through tools/testing/selftests/bpf/test_kmod.sh
> which exercises the test_bpf.ko under various sysctl combinations as
> part of the BPF selftest suite.

Ok, blinding works properly with this fix:

--- arch/sparc/net/bpf_jit.h~   2017-04-13 17:44:24.084936201 -0700
+++ arch/sparc/net/bpf_jit.h2017-04-17 12:07:22.559029482 -0700
@@ -17,8 +17,10 @@
 #ifndef __ASSEMBLER__
 #define G0 0x00
 #define G1 0x01
+#define G2 0x02
 #define G3 0x03
 #define G6 0x06
+#define G7 0x07
 #define O0 0x08
 #define O1 0x09
 #define O2 0x0a
--- arch/sparc/net/bpf_jit_comp.c~  2017-04-16 20:24:49.060342700 -0700
+++ arch/sparc/net/bpf_jit_comp.c   2017-04-17 12:05:08.470048483 -0700
@@ -195,10 +195,12 @@ static const int bpf2sparc[] = {
/* read-only frame pointer to access stack */
[BPF_REG_FP] = FP,
 
+   [BPF_REG_AX] = G7,
+
/* temporary register for internal BPF JIT */
[TMP_REG_1] = G1,
-   [TMP_REG_2] = G3,
-   [TMP_REG_3] = L6,
+   [TMP_REG_2] = G2,
+   [TMP_REG_3] = G3,
 
[SKB_HLEN_REG] = L4,
[SKB_DATA_REG] = L5,
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -31,7 +31,8 @@ config SPARC
select ARCH_WANT_IPC_PARSE_VERSION
select GENERIC_PCI_IOMAP
select HAVE_NMI_WATCHDOG if SPARC64
-   select HAVE_CBPF_JIT
+   select HAVE_CBPF_JIT if SPARC32
+   select HAVE_EBPF_JIT if SPARC64
select HAVE_DEBUG_BUGVERBOSE
select GENERIC_SMP_IDLE_THREAD
select GENERIC_CLOCKEVENTS
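
One property the register table needs after a change like this is that no
two BPF registers (including BPF_REG_AX and the JIT temporaries) end up
on the same SPARC register. A minimal sketch of such a check follows. The
index enum is hypothetical, and the register encodings assume the usual
SPARC numbering (globals 0..7, outs 8..15, locals 16..23, ins 24..31,
with the frame pointer being %i6); only the G and O values appear in the
diff above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical indices standing in for BPF_REG_* and TMP_REG_*. */
enum { R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, RFP, RAX, T1, T2, T3, NREGS };

/* Assumed SPARC register encodings. */
enum {
	G1 = 0x01, G2 = 0x02, G3 = 0x03, G7 = 0x07,
	L0 = 0x10, L1, L2, L3,
	I0 = 0x18, I1, I2, I3, I4, I5,
	FP = 0x1e,
};

/* Mapping as it looks with the fix applied. */
static const int bpf2sparc_sketch[NREGS] = {
	[R0] = I5,
	[R1] = I0, [R2] = I1, [R3] = I2, [R4] = I3, [R5] = I4,
	[R6] = L0, [R7] = L1, [R8] = L2, [R9] = L3,
	[RFP] = FP,
	[RAX] = G7,
	[T1] = G1, [T2] = G2, [T3] = G3,
};

/* True when no two entries name the same hardware register. */
static bool map_is_injective(const int *map, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (map[i] == map[j])
				return false;
	return true;
}
```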


Re: [PATCH v3 3/4] bluetooth: hci_uart: add LL protocol serdev driver support

2017-04-17 Thread Adam Ford
On Thu, Apr 13, 2017 at 10:03 AM, Rob Herring  wrote:
> Turns out that the LL protocol and the TI-ST are the same thing AFAICT.
> The TI-ST adds firmware loading, GPIO control, and shared access for
> NFC, FM radio, etc. For now, we're only implementing what is needed for
> BT. This mirrors other drivers like BCM and Intel, but uses the new
> serdev bus.
>
> The firmware loading is greatly simplified by using existing
> infrastructure to send commands. It may be a bit slower than the
> original code using synchronous functions, but the real bottleneck is
> likely doing firmware load at 115.2kbps.

I am using pdata-quirks to drive my wl1283 Bluetooth on a DM3730.  I
have the Bluetooth set to 300 baud in pdata quirks.  Looking at
the binding, I don't see an option to set the baudrate.  Is there (or
will there be) a way to set the baud rate of the Bluetooth?

adam
>
> Signed-off-by: Rob Herring 
> Cc: Marcel Holtmann 
> Cc: Gustavo Padovan 
> Cc: Johan Hedberg 
> Cc: linux-blueto...@vger.kernel.org
> ---
> v3:
> - rebase on bluetooth-next
> - Add explicit of.h include
> v2:
> - Use IS_ENABLED() to fix module build
>
>  drivers/bluetooth/hci_ll.c | 262 
> -
>  1 file changed, 261 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/bluetooth/hci_ll.c b/drivers/bluetooth/hci_ll.c
> index 02692fe30279..485e8eb04542 100644
> --- a/drivers/bluetooth/hci_ll.c
> +++ b/drivers/bluetooth/hci_ll.c
> @@ -34,20 +34,24 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> +#include 
>
>  #include "hci_uart.h"
>
> @@ -76,6 +80,12 @@ struct hcill_cmd {
> u8 cmd;
>  } __packed;
>
> +struct ll_device {
> +   struct hci_uart hu;
> +   struct serdev_device *serdev;
> +   struct gpio_desc *enable_gpio;
> +};
> +
>  struct ll_struct {
> unsigned long rx_state;
> unsigned long rx_count;
> @@ -136,6 +146,9 @@ static int ll_open(struct hci_uart *hu)
>
> hu->priv = ll;
>
> +   if (hu->serdev)
> +   serdev_device_open(hu->serdev);
> +
> return 0;
>  }
>
> @@ -164,6 +177,13 @@ static int ll_close(struct hci_uart *hu)
>
> kfree_skb(ll->rx_skb);
>
> +   if (hu->serdev) {
> +   struct ll_device *lldev = 
> serdev_device_get_drvdata(hu->serdev);
> +   gpiod_set_value_cansleep(lldev->enable_gpio, 0);
> +
> +   serdev_device_close(hu->serdev);
> +   }
> +
> hu->priv = NULL;
>
> kfree(ll);
> @@ -505,9 +525,245 @@ static struct sk_buff *ll_dequeue(struct hci_uart *hu)
> return skb_dequeue(&ll->txq);
>  }
>
> +#if IS_ENABLED(CONFIG_SERIAL_DEV_BUS)
> +static int read_local_version(struct hci_dev *hdev)
> +{
> +   int err = 0;
> +   unsigned short version = 0;
> +   struct sk_buff *skb;
> +   struct hci_rp_read_local_version *ver;
> +
> +   skb = __hci_cmd_sync(hdev, HCI_OP_READ_LOCAL_VERSION, 0, NULL, 
> HCI_INIT_TIMEOUT);
> +   if (IS_ERR(skb)) {
> +   bt_dev_err(hdev, "Reading TI version information failed 
> (%ld)",
> +  PTR_ERR(skb));
> +   err = PTR_ERR(skb);
> +   goto out;
> +   }
> +   if (skb->len != sizeof(*ver)) {
> +   err = -EILSEQ;
> +   goto out;
> +   }
> +
> +   ver = (struct hci_rp_read_local_version *)skb->data;
> +   if (le16_to_cpu(ver->manufacturer) != 13) {
> +   err = -ENODEV;
> +   goto out;
> +   }
> +
> +   version = le16_to_cpu(ver->lmp_subver);
> +
> +out:
> +   if (err) bt_dev_err(hdev, "Failed to read TI version info: %d", err);
> +   kfree_skb(skb);
> +   return err ? err : version;
> +}
> +
> +/**
> + * download_firmware -
> + * internal function which parses through the .bts firmware
> + * script file intreprets SEND, DELAY actions only as of now
> + */
> +static int download_firmware(struct ll_device *lldev)
> +{
> +   unsigned short chip, min_ver, maj_ver;
> +   int version, err, len;
> +   unsigned char *ptr, *action_ptr;
> +   unsigned char bts_scr_name[40]; /* 40 char long bts scr name? */
> +   const struct firmware *fw;
> +   struct sk_buff *skb;
> +   struct hci_command *cmd;
> +
> +   version = read_local_version(lldev->hu.hdev);
> +   if (version < 0)
> +   return version;
> +
> +   chip = (version & 0x7C00) >> 10;
> +   min_ver = (version & 0x007F);
> +   maj_ver = (version & 0x0380) >> 7;
> +   if (version & 0x8000)
> +   maj_ver |= 0x0008;
> +
> +   snprintf(bts_scr_name, sizeof(bts_scr_name),
> +"ti-connectivity/TIInit_%d.%d.%d.bts",
> +chip, maj_ver, min_ver);
> +
> +   err = request_firmware(&fw, bts_scr_name, &lldev->
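
The version decoding in download_firmware() above can be tried out in
userspace. The sketch below copies the masks and shifts from the quoted
patch and builds the same file name; struct ti_ver_sketch and the two
function names are invented for the example.

```c
#include <stddef.h>
#include <stdio.h>

struct ti_ver_sketch {
	unsigned int chip;
	unsigned int maj_ver;
	unsigned int min_ver;
};

/* Split the 16-bit LMP subversion using the masks from the patch. */
static struct ti_ver_sketch decode_ti_version(unsigned short version)
{
	struct ti_ver_sketch v;

	v.chip = (version & 0x7C00) >> 10;
	v.min_ver = version & 0x007F;
	v.maj_ver = (version & 0x0380) >> 7;
	if (version & 0x8000)
		v.maj_ver |= 0x0008;
	return v;
}

/* Build the firmware script name; returns what snprintf() returns. */
static int bts_script_name(char *buf, size_t len, unsigned short version)
{
	struct ti_ver_sketch v = decode_ti_version(version);

	return snprintf(buf, len, "ti-connectivity/TIInit_%u.%u.%u.bts",
			v.chip, v.maj_ver, v.min_ver);
}
```

For example, a subversion of 0x1f0f decodes to chip 7, major 6, minor 15.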

Re: [PATCH v2] sh_eth: unmap DMA buffers when freeing rings

2017-04-17 Thread David Miller
From: Sergei Shtylyov 
Date: Mon, 17 Apr 2017 15:55:22 +0300

> The DMA API debugging (when enabled) causes:
> 
> WARNING: CPU: 0 PID: 1445 at lib/dma-debug.c:519 add_dma_entry+0xe0/0x12c
> DMA-API: exceeded 7 overlapping mappings of cacheline 0x01b2974d
> 
> to be printed after repeated initialization of the Ether device, e.g.
> suspend/resume or 'ifconfig' up/down. This is because DMA buffers mapped
> using dma_map_single() in sh_eth_ring_format() and sh_eth_start_xmit() are
> never unmapped. Resolve this problem by unmapping the buffers when freeing
> the descriptor rings; in order to do it right, we'd have to add an extra
> parameter to sh_eth_txfree() (we rename this function to sh_eth_tx_free(),
> while at it).
> 
> Based on the commit a47b70ea86bd ("ravb: unmap descriptors when freeing
> rings").
> 
> Signed-off-by: Sergei Shtylyov 

Applied, thanks.


Re: [PATCH] net: cx89x0: move attribute declaration before struct keyword

2017-04-17 Thread David Miller
From: Stefan Agner 
Date: Sun, 16 Apr 2017 23:20:32 -0700

> The attribute declaration is typically before the definition. Move
> the __maybe_unused attribute declaration before the struct keyword.
> 
> Signed-off-by: Stefan Agner 

Well, I see it just as often after the variable name too:

net/irda/iriap.c:static const char *const ias_charset_types[] __maybe_unused = {
net/irda/irlap.c:static const char *const lap_reasons[] __maybe_unused = {
net/irda/irlap_event.c:static const char *const irlap_event[] __maybe_unused = {
net/irda/irlmp_event.c:static const char *const irlmp_event[] __maybe_unused = {

Or after the struct:

drivers/net/phy/ste10Xp.c:static struct mdio_device_id __maybe_unused 
ste10Xp_tbl[] = {
drivers/net/phy/teranetics.c:static struct mdio_device_id __maybe_unused 
teranetics_tbl[] = {
drivers/net/phy/vitesse.c:static struct mdio_device_id __maybe_unused 
vitesse_tbl[] = {

So unless we decide tree wide to do it in one order or another, such changes
are largely a waste of time.

Sorry I'm not applying this patch.


Re: [PATCH net 0/2] Two BPF fixes

2017-04-17 Thread David Miller
From: Daniel Borkmann 
Date: Mon, 17 Apr 2017 03:12:05 +0200

> The set fixes cb_access and xdp_adjust_head bits in struct bpf_prog,
> that are used for requirement checks on the program rather than f.e.
> heuristics. Thus, for tail calls, we cannot make any assumptions and
> are forced to set them.

Series applied, thanks.

Tail calls bring up all kinds of caching and assumption issues, see my
question in another thread about how register cached SKB parameters
are handled in JITs across tail calls.


Re: [PATCH v3 net-next RFC] Generic XDP

2017-04-17 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Sun, 16 Apr 2017 22:26:01 +0200

> The bpf tail-call use-case is a very good example of why the
> verifier cannot deduce the needed HEADROOM upfront.

This brings up a very interesting question for me.

I notice that tail calls are implemented by JITs largely by skipping
over the prologue of that destination program.

However, many JITs preload cached SKB values into fixed registers in
the prologue.  But they only do this if the program being JITed needs
those values.

So how can it work properly if a program that does not need the SKB
values tail calls into one that does?

Daniel, Alexei?




Re: [PATCH net-next v2] bonding: deliver link-local packets with skb->dev set to link that packets arrived on

2017-04-17 Thread David Miller
From: Chonggang Li 
Date: Sun, 16 Apr 2017 12:02:18 -0700

> Bonding driver changes the skb->dev to the bonding-master before
> passing the packet to stack for further processing. This, however
> does not make sense for the link-local packets and it looses "the
> link info" once its skb->dev is changed to bonding-master.  This
> patch changes this behavior for link-local packets by not changing
> the skb->dev to the bonding-master and maintaining it as it is,
> i.e. the link on which the packet arrived.
> 
> Signed-off-by: Chonggang Li 
> Signed-off-by: Mahesh Bandewar 
> Signed-off-by: Maciej Żenczykowski 
> ---
> Changes in v2:
>   - Make the commit message more clearer.

Applied with "looses" typo fixed.


Re: [PATCH net-next] net: rtnetlink: plumb extended ack to doit function

2017-04-17 Thread David Miller
From: David Ahern 
Date: Sun, 16 Apr 2017 09:48:24 -0700

> Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
> for doit functions that call it directly.
> 
> This is the first step to using extended error reporting in rtnetlink.
> From here individual subsystems can be updated to set netlink_ext_ack as
> needed.
> 
> Signed-off-by: David Ahern 

Applied, thanks David.


Re: [PATCH net-next 1/1] ipv6: sr: fix BUG due to headroom too small after SRH push

2017-04-17 Thread David Miller
From: David Lebrun 
Date: Sun, 16 Apr 2017 12:27:14 +0200

> When a locally generated packet receives an SRH with two or more segments,
> the remaining headroom is too small to push an ethernet header. This patch
> ensures that the headroom is large enough after SRH push.
> 
> The BUG generated the following trace.
 ...
> Fixes: 19d5a26f5ef8de5dcb78799feaf404d717b1aac3 ("ipv6: sr: expand skb head only if necessary")
> Signed-off-by: David Lebrun 

Applied, thank you.


Re: [PATCH] gso: Validate assumption of frag_list segementation

2017-04-17 Thread David Miller
From: 
Date: Sun, 16 Apr 2017 11:00:07 +0300

> From: Ilan Tayari 
> 
> Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
> pointer") assumes that all SKBs in a frag_list (except maybe the last
> one) contain the same amount of GSO payload.
> 
> This assumption is not always correct, resulting in the following
> warning message in the log:
> skb_segment: too many frags
> 
> For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
> one frag, and some with 2 frags.
> After GRO, the frag_list SKBs end up having different amounts of payload.
> If this frag_list SKB is then forwarded, the aforementioned assumption
> is violated.
> 
> Validate the assumption, and fall back to software GSO if it is not true.
> 
> Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
> Signed-off-by: Ilan Tayari 
> Signed-off-by: Ilya Lesokhin 
> ---

Your commit message mentions a Fixes tag, which makes this change seem
relevant for 'net', but your patch depends upon things in 'net-next'
and therefore only applies there.

I've added this change to net-next, but I want an explanation of why
this change is not targeting 'net' if it seems to fix a problem
there.

Thanks.


Re: [PATCH net-next v2] Add uid and cookie bpf helper to cg_skb_func_proto

2017-04-17 Thread David Miller
From: Chenbo Feng 
Date: Fri, 14 Apr 2017 18:25:26 -0700

> From: Chenbo Feng 
> 
> BPF helper functions get_socket_cookie and get_socket_uid can be
> used for network traffic classifications, among others. Expose
> them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
> commit 8f917bba0042 ("bpf: pass sk to helper functions") the
> required skb->sk function is available at both cgroup bpf ingress
> and egress hooks. With these two new helpers, cg_skb_func_proto is
> effectively the same as sk_filter_func_proto.
> 
> Change since V1:
> Instead of adding the helpers to cg_skb_func_proto, redirect
> cg_skb_func_proto to sk_filter_func_proto since all helper functions
> in sk_filter_func_proto are applicable to cg_skb_func_proto now.
> 
> Signed-off-by: Chenbo Feng 

Applied, thanks.


Re: [PATCH net-next] drivers: net: xgene-v2: Extend ethtool statistics

2017-04-17 Thread David Miller
From: Iyappan Subramanian 
Date: Fri, 14 Apr 2017 16:48:18 -0700

> + XGE_EXTD_STAT(rx_byte_cntr, RBYT),
> + XGE_EXTD_STAT(rx_pkt_cntr, RPKT),
> + XGE_EXTD_STAT(rx_fcs_error_cntr, RFCS),
> + XGE_EXTD_STAT(rx_multicast_pkt_cntr, RMCA),
> + XGE_EXTD_STAT(rx_broadcast_pkt_cntr, RBCA),

Do not duplicate statistics already reported via xge_get_stats64().


[PATCH net-next v3] bonding: deliver link-local packets with skb->dev set to link that packets arrived on

2017-04-17 Thread Chonggang Li
Bonding driver changes the skb->dev to the bonding-master before
passing the packet to stack for further processing. This, however
does not make sense for the link-local packets and it loses "the
link info" once its skb->dev is changed to bonding-master.  This
patch changes this behavior for link-local packets by not changing
the skb->dev to the bonding-master and maintaining it as it is,
i.e. the link on which the packet arrived.

Signed-off-by: Chonggang Li 
Signed-off-by: Mahesh Bandewar 
Signed-off-by: Maciej Żenczykowski 
Signed-off-by: Jay Vosburgh 
---
Changes in v2:
  - Make the commit message more clearer.
Changes in v3:
  - Fix a typo in commit message.

 drivers/net/bonding/bond_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 01e4a69af421..6bd3b50faf48 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 		}
 	}
 
+	/* don't change skb->dev for link-local packets */
+	if (is_link_local_ether_addr(eth_hdr(skb)->h_dest))
+		return RX_HANDLER_PASS;
 	if (bond_should_deliver_exact_match(skb, slave, bond))
 		return RX_HANDLER_EXACT;
 
-- 
2.12.2.762.g0e3151a226-goog
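
A rough userspace sketch of the decision this patch adds, for readers without the kernel tree at hand. It assumes "link-local" means the reserved bridge group range 01:80:C2:00:00:00 through 01:80:C2:00:00:0F; `demo_is_link_local()` and `demo_rx_ifindex()` are hypothetical names, not kernel functions, and every other branch of bond_handle_frame() is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True for destination MACs in 01:80:C2:00:00:00 .. 01:80:C2:00:00:0F
 * (assumed definition of "link-local" for this sketch). */
bool demo_is_link_local(const uint8_t addr[6])
{
	static const uint8_t base[5] = { 0x01, 0x80, 0xc2, 0x00, 0x00 };

	return memcmp(addr, base, sizeof(base)) == 0 && (addr[5] & 0xf0) == 0;
}

/* Which interface index the packet would carry after the handler:
 * link-local frames keep the slave they arrived on, everything else
 * is re-parented to the bonding master (simplified model). */
int demo_rx_ifindex(const uint8_t dest[6], int slave_ifindex,
		    int master_ifindex)
{
	if (demo_is_link_local(dest))
		return slave_ifindex;
	return master_ifindex;
}
```

With this model an LLDP frame (destination 01:80:C2:00:00:0E) keeps the slave's index, while a broadcast frame is reported on the master.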



Re: [PATCH] net: natsemi: ns83820: add checks for dma mapping error

2017-04-17 Thread David Miller
From: Alexey Khoroshilov 
Date: Sat, 15 Apr 2017 01:50:50 +0300

> @@ -1136,6 +1141,10 @@ static netdev_tx_t ns83820_hard_start_xmit(struct sk_buff *skb,
>   if (nr_frags)
>   len -= skb->data_len;
>   buf = pci_map_single(dev->pci_dev, skb->data, len, PCI_DMA_TODEVICE);
> + if (pci_dma_mapping_error(dev->pci_dev, buf)) {
> + dev_kfree_skb_any(skb);
> + return NETDEV_TX_OK;
> + }
>  
>   first_desc = dev->tx_descs + (free_idx * DESC_SIZE);
>  

You need to also add this check for the skb_map_dma_frag() calls below
this line, and therefore you'll need to add unwind on such a failure.
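
The unwind being asked for usually has the shape sketched below. This is a userspace model under stated assumptions, not the ns83820 code: `struct demo_mapper`, `demo_map()`, `demo_unmap()` and `demo_map_skb()` are hypothetical stand-ins for the real DMA mapping calls, and the mapper simply fails at a chosen index so the error path can be exercised.

```c
#include <stdbool.h>

/* Hypothetical mapper: mapping index fail_at fails (-1 means never).
 * live_mappings counts mappings that have not been undone yet. */
struct demo_mapper {
	int fail_at;
	int live_mappings;
};

bool demo_map(struct demo_mapper *m, int idx)
{
	if (idx == m->fail_at)
		return false;
	m->live_mappings++;
	return true;
}

void demo_unmap(struct demo_mapper *m)
{
	m->live_mappings--;
}

/* Map the linear head (index 0) and nr_frags fragments (1..nr_frags).
 * Returns 0 on success; on failure every mapping made so far is undone
 * and -1 is returned, so the caller can free the packet safely. */
int demo_map_skb(struct demo_mapper *m, int nr_frags)
{
	int i;

	if (!demo_map(m, 0))
		return -1;

	for (i = 1; i <= nr_frags; i++) {
		if (!demo_map(m, i))
			goto unwind;
	}
	return 0;

unwind:
	/* indices 0..i-1 were mapped successfully: undo exactly i of them */
	while (i-- > 0)
		demo_unmap(m);
	return -1;
}
```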



Re: [PATCH net-next 1/2] netvsc: fix RCU warning in get_stats

2017-04-17 Thread David Miller

Both patches applied but you may want to use more consistent
Subject line subsystem prefixes in the future.


Re: [PATCH -next] net: phy: test the right variable in phy_write_mmd()

2017-04-17 Thread David Miller
From: Dan Carpenter 
Date: Fri, 14 Apr 2017 22:10:41 +0300

> This is a copy and paste buglet.  We meant to test for ->write_mmd but
> we test for ->read_mmd.
> 
> Fixes: 1ee6b9bc6206 ("net: phy: make phy_(read|write)_mmd() generic MMD accessors")
> Signed-off-by: Dan Carpenter 

Applied, thanks Dan.
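
To make the class of bug concrete, here is a small standalone sketch. It is not the phylib code; `struct demo_phy_ops`, `demo_generic_write()` and `demo_write_mmd()` are hypothetical. The point is only that the write path must test the write callback: with the read callback tested instead, a driver that implements only read_mmd would end up calling a NULL write_mmd.

```c
#include <stddef.h>

/* Hypothetical driver ops table with optional MMD accessors. */
struct demo_phy_ops {
	int (*read_mmd)(int devad, int regnum);
	int (*write_mmd)(int devad, int regnum, int val);
};

/* Hypothetical generic fallback; returns 1 so callers can tell it ran. */
int demo_generic_write(int devad, int regnum, int val)
{
	(void)devad;
	(void)regnum;
	(void)val;
	return 1;
}

int demo_write_mmd(const struct demo_phy_ops *ops, int devad, int regnum,
		   int val)
{
	/* Test the callback we are about to call. The copy-and-paste
	 * version of this check looked at ops->read_mmd here. */
	if (ops->write_mmd)
		return ops->write_mmd(devad, regnum, val);

	return demo_generic_write(devad, regnum, val);
}

/* Hypothetical driver callbacks used to exercise both paths. */
int demo_drv_read(int devad, int regnum)
{
	(void)devad;
	(void)regnum;
	return 0;
}

int demo_drv_write(int devad, int regnum, int val)
{
	(void)devad;
	(void)regnum;
	(void)val;
	return 2;
}
```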


Re: [PATCH net] ipv6: drop non loopback packets claiming to originate from ::1

2017-04-17 Thread David Miller
From: Florian Westphal 
Date: Fri, 14 Apr 2017 20:22:43 +0200

> We lack a saddr check for ::1. This causes security issues e.g. with acls
> permitting connections from ::1 because of assumption that these originate
> from local machine.
> 
> Assuming a source address of ::1 is local seems reasonable.
> RFC4291 doesn't allow such a source address either, so drop such packets.
> 
> Reported-by: Eric Dumazet 
> Signed-off-by: Florian Westphal 

Applied, thanks Florian.
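
The check described above can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel change itself: `demo_should_drop()` is a hypothetical name, and the boolean parameter stands in for whatever the receive path uses to tell whether the packet came in on a loopback device.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Drop packets that claim a source of ::1 but did not arrive on a
 * loopback device; anything else is left to the normal receive path. */
bool demo_should_drop(const struct in6_addr *saddr, bool rx_dev_is_loopback)
{
	return IN6_IS_ADDR_LOOPBACK(saddr) && !rx_dev_is_loopback;
}
```

A packet from ::1 seen on an Ethernet interface is dropped, the same source on lo is accepted, and an ordinary source such as 2001:db8::1 is unaffected.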


Re: [PATCH RFC] sparc64: eBPF JIT

2017-04-17 Thread David Miller
From: Daniel Borkmann 
Date: Mon, 17 Apr 2017 20:44:35 +0200

> On 04/17/2017 05:38 AM, David Miller wrote:
>> +/* Map BPF registers to SPARC registers */
>> +static const int bpf2sparc[] = {
>> + /* return value from in-kernel function, and exit value from eBPF */
>> +[BPF_REG_0] = I5,
>> +
>> +/* arguments from eBPF program to in-kernel function */
>> +[BPF_REG_1] = I0,
>> +[BPF_REG_2] = I1,
>> +[BPF_REG_3] = I2,
>> +[BPF_REG_4] = I3,
>> +[BPF_REG_5] = I4,
>> +
>> + /* callee saved registers that in-kernel function will preserve */
>> +[BPF_REG_6] = L0,
>> +[BPF_REG_7] = L1,
>> +[BPF_REG_8] = L2,
>> +[BPF_REG_9] = L3,
>> +
>> +/* read-only frame pointer to access stack */
>> +[BPF_REG_FP] = FP,
> 
> On a quick initial glance, you also need to map BPF_REG_AX. If
> I understand the convention correctly, you could use L7 for that.
> 
> You can test for it through tools/testing/selftests/bpf/test_kmod.sh
> which exercises the test_bpf.ko under various sysctl combinations as
> part of the BPF selftest suite.

Oh I see, it's used for constant blinding.  I can use a global register
for that since it's only used as a temporary right?


Re: pull request: bluetooth-next 2017-04-14

2017-04-17 Thread David Miller
From: Johan Hedberg 
Date: Fri, 14 Apr 2017 21:12:12 +0300

> Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12
> kernel.
> 
>  - Many fixes to 6LoWPAN, in particular for BLE
>  - New CA8210 IEEE 802.15.4 device driver (accounting for most of the
>lines of code added in this pull request)
>  - Added Nokia Bluetooth (UART) HCI driver
>  - Some serdev & TTY changes that are dependencies for the Nokia
>driver (with acks from relevant maintainers and an agreement that
>these come through the bluetooth tree)
>  - Support for new Intel Bluetooth device
>  - Various other minor cleanups/fixes here and there
> 
> Please let me know if there are any issues pulling. Thanks.

Applied, thanks.


Re: [PATCH RFC] sparc64: eBPF JIT

2017-04-17 Thread Daniel Borkmann

On 04/17/2017 05:38 AM, David Miller wrote:


> There are a bunch of things I want to do still, and I know that I have
> to attend to sparc32 more cleanly, but I wanted to post this now that
> I have it passing the BPF testsuite completely:
> 
> [24174.315421] test_bpf: Summary: 305 PASSED, 0 FAILED, [297/297 JIT'ed]


Awesome, thanks for working on it! :)


> Only major unimplemented feature is tail calls, which I am very sure I
> can do simply but until something easy to use like test_bpf can
> exercise it I probably won't do it.


There is samples/bpf/sockex3_kern.c, which exercises it. To
run it, it would be (clang/llvm needed due to BPF backend not
available in gcc):

# cd samples/bpf
# make
# ./sockex3
IP src.port -> dst.port   bytes  packets
127.0.0.1.12865 -> 127.0.0.1.49711  1482
127.0.0.1.49711 -> 127.0.0.1.12865  1082
[...]

Inside parse_eth_proto(), it will do tail calls based on the
eth protocol. Over time, we'll move such C based tests over to
tools/testing/selftests/bpf/.


> From my side, what I need to do to turn this into a non-RFC is to sort
> out sparc32.  My plan is to take the existing cBPF JIT, rip out all of
> the sparc64 specific bits, and have sparc32 use that.  And do it in
> such a way that git bisection is not broken.


Makes sense. That would follow the same model as ppc32/64.


> As a future optimization I'd like to add support for emitting cbcond
> instructions on newer chips.
> 
> This implementation grabs a register window all the time, and we could
> avoid that and use a leaf function in certain situations.  The
> register layout is also not optimal, and one side effect is that we
> have to move the argument registers over during function calls.
> 
> Signed-off-by: David S. Miller 

[...]

> +/* Map BPF registers to SPARC registers */
> +static const int bpf2sparc[] = {
> +   /* return value from in-kernel function, and exit value from eBPF */
> +   [BPF_REG_0] = I5,
> +
> +   /* arguments from eBPF program to in-kernel function */
> +   [BPF_REG_1] = I0,
> +   [BPF_REG_2] = I1,
> +   [BPF_REG_3] = I2,
> +   [BPF_REG_4] = I3,
> +   [BPF_REG_5] = I4,
> +
> +   /* callee saved registers that in-kernel function will preserve */
> +   [BPF_REG_6] = L0,
> +   [BPF_REG_7] = L1,
> +   [BPF_REG_8] = L2,
> +   [BPF_REG_9] = L3,
> +
> +   /* read-only frame pointer to access stack */
> +   [BPF_REG_FP] = FP,


On a quick initial glance, you also need to map BPF_REG_AX. If
I understand the convention correctly, you could use L7 for that.

You can test for it through tools/testing/selftests/bpf/test_kmod.sh
which exercises the test_bpf.ko under various sysctl combinations as
part of the BPF selftest suite.


> +   /* temporary register for internal BPF JIT */
> +   [TMP_REG_1] = G1,
> +   [TMP_REG_2] = G3,
> +   [TMP_REG_3] = L6,
> +
> +   [SKB_HLEN_REG] = L4,
> +   [SKB_DATA_REG] = L5,
> +};


Thanks a lot!
Daniel


[Patch net-next v3] net_sched: move the empty tp check from ->destroy() to ->delete()

2017-04-17 Thread Cong Wang
Roi reported we could have a race condition where in ->classify() path
we dereference tp->root and meanwhile a parallel ->destroy() makes it
a NULL.

This is possible because ->destroy() could be called when deleting
a filter to check if we are the last one in tp, this tp is still
linked and visible at that time.

Daniel fixed this in commit d936377414fa
("net, sched: respect rcu grace period on cls destruction"), but
the root cause of this problem is the semantic of ->destroy(), it
does two things (for non-force case):

1) check if tp is empty
2) if tp is empty we could really destroy it

and its caller, if it cares, needs to check the return value to see if
tp was really destroyed. Therefore we can't unlink tp unless we know
it is empty.

As suggested by Daniel, we could actually move the test logic to ->delete()
so that we can safely unlink tp after ->delete() tells us the last one is
just deleted and before ->destroy().

What's more, even if we unlink it before ->destroy(), it could still have
readers since we don't wait for a grace period here, so we should not modify
tp->root in ->destroy() either.

Reported-by: Roi Dayan 
Cc: Daniel Borkmann 
Cc: John Fastabend 
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 include/net/sch_generic.h |  4 +--
 net/sched/cls_api.c   | 27 +-
 net/sched/cls_basic.c | 10 +++
 net/sched/cls_bpf.c   | 11 
 net/sched/cls_cgroup.c|  8 ++
 net/sched/cls_flow.c  | 10 +++
 net/sched/cls_flower.c| 10 ++-
 net/sched/cls_fw.c| 30 +++-
 net/sched/cls_matchall.c  |  7 ++---
 net/sched/cls_route.c | 30 ++--
 net/sched/cls_rsvp.h  | 34 +++
 net/sched/cls_tcindex.c   | 14 +-
 net/sched/cls_u32.c   | 71 +++
 13 files changed, 134 insertions(+), 132 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 65d5026..22e5209 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -204,14 +204,14 @@ struct tcf_proto_ops {
const struct tcf_proto *,
struct tcf_result *);
int (*init)(struct tcf_proto*);
-   bool(*destroy)(struct tcf_proto*, bool);
+   void(*destroy)(struct tcf_proto*);
 
unsigned long   (*get)(struct tcf_proto*, u32 handle);
int (*change)(struct net *net, struct sk_buff *,
struct tcf_proto*, unsigned long,
u32 handle, struct nlattr **,
unsigned long *, bool);
-   int (*delete)(struct tcf_proto*, unsigned long);
+   int (*delete)(struct tcf_proto*, unsigned long, bool*);
void(*walk)(struct tcf_proto*, struct tcf_walker *arg);
 
/* rtnetlink specific */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e2c68c3..55d7b4e7 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -178,14 +178,11 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
return ERR_PTR(err);
 }
 
-static bool tcf_proto_destroy(struct tcf_proto *tp, bool force)
+static void tcf_proto_destroy(struct tcf_proto *tp)
 {
-   if (tp->ops->destroy(tp, force)) {
-   module_put(tp->ops->owner);
-   kfree_rcu(tp, rcu);
-   return true;
-   }
-   return false;
+   tp->ops->destroy(tp);
+   module_put(tp->ops->owner);
+   kfree_rcu(tp, rcu);
 }
 
 void tcf_destroy_chain(struct tcf_proto __rcu **fl)
@@ -194,7 +191,7 @@ void tcf_destroy_chain(struct tcf_proto __rcu **fl)
 
while ((tp = rtnl_dereference(*fl)) != NULL) {
RCU_INIT_POINTER(*fl, tp->next);
-   tcf_proto_destroy(tp, true);
+   tcf_proto_destroy(tp);
}
 }
 EXPORT_SYMBOL(tcf_destroy_chain);
@@ -360,7 +357,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
RCU_INIT_POINTER(*back, next);
tfilter_notify(net, skb, n, tp, fh,
   RTM_DELTFILTER, false);
-   tcf_proto_destroy(tp, true);
+   tcf_proto_destroy(tp);
err = 0;
goto errout;
}
@@ -371,24 +368,28 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n)
goto errout;
}
} else {
+   bool last;
+
switch (n->nlmsg_type) {
case RTM_NEWTFILTER:
if (n->nlmsg_flags & NLM_F_EXCL) {
if (tp_created)
-   tcf_proto_destroy(tp, t

Re: [PATCH net-next v2] bonding: deliver link-local packets with skb->dev set to link that packets arrived on

2017-04-17 Thread Jay Vosburgh
Chonggang Li  wrote:

>Bonding driver changes the skb->dev to the bonding-master before
>passing the packet to stack for further processing. This, however
>does not make sense for the link-local packets and it looses "the
>link info" once its skb->dev is changed to bonding-master.  This
>patch changes this behavior for link-local packets by not changing
>the skb->dev to the bonding-master and maintaining it as it is,
>i.e. the link on which the packet arrived.

Minor nit: "looses" should be "loses".  Other than that:

Signed-off-by: Jay Vosburgh 


>Signed-off-by: Chonggang Li 
>Signed-off-by: Mahesh Bandewar 
>Signed-off-by: Maciej Żenczykowski 
>---
>Changes in v2:
>  - Make the commit message more clearer.
>
> drivers/net/bonding/bond_main.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 01e4a69af421..6bd3b50faf48 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1176,6 +1176,9 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
>   }
>   }
> 
>+  /* don't change skb->dev for link-local packets */
>+  if (is_link_local_ether_addr(eth_hdr(skb)->h_dest))
>+  return RX_HANDLER_PASS;
>   if (bond_should_deliver_exact_match(skb, slave, bond))
>   return RX_HANDLER_EXACT;
> 
>-- 
>2.12.2.762.g0e3151a226-goog
>


Re: [PATCH linux 2/2] net sched actions: fix refcount decrement on error

2017-04-17 Thread Cong Wang
On Sat, Apr 15, 2017 at 11:48 AM, Wolfgang Bumiller
 wrote:
>
>> On April 15, 2017 at 8:20 PM Cong Wang  wrote:
>>
>>
>> On Fri, Apr 14, 2017 at 2:08 AM, Wolfgang Bumiller
>>  wrote:
>> > Before I do that - trying to wrap my head around the interdependencies
>> > here better to be thorough - I noticed that tcf_hash_release() can
>> > return ACT_P_DELETED. The ACT_P_CREATED case means tcf_hash_create()
>> > was used, in the other case the tc_action's ref & bind count is bumped
>> > by tcf_hash_check() and then also decremented by tcf_hash_release() if
>> > it existed, iow. kept at 1, but not always: It does always happen in
>> > act_police.c but in other files such as act_bpf.c or act_connmark.c if
>> > eg. bind is set they return without decrementing, so both ref&bind count
>> > are bumped when they return - the refcount logic isn't easy to follow
>> > for a newcomer. Now there are two uses of __tcf_hash_release() in
>> > act_api.c which check for a return value of ACT_P_DELETED, in which case
>> > they call module_put().
>>
>>
>> That's the nasty part... IIRC, Jamal has fixed two bugs on action refcnt'ing.
>> We really need to clean up the code.
>>
>> > So I'm not sure exactly how the module and tc_action counts are related
>> > (and I usually like to understand my own patches ;-) ).
>>
>>
>> Each action holds a refcnt to its module, each filter holds a refcnt to
>> its bound or referenced (unbound) action.
>>
>>
>> > Maybe I'm missing something obvious but I'm currently a bit confused as
>> > to whether the tcf_hash_release() call there is okay, or should have its
>> > return value checked or should depend on ->init()'s ACT_P_CREATED value
>> > as well?
>> >
>>
>> I think it's the same? If we have ACT_P_CREATED here, tcf_hash_release()
>> will return ACT_P_DELETED for sure because the newly created action has
>> refcnt==1?
>
> Makes sense on the one hand, but for ACT_P_DELETED both ref and bind
> count need to reach 0, so I'm still concerned that the different behaviors

Bind refcnt is only used when the action is bound to a filter, while refcnt
is always used, so bind refcnt is either 0 or the same as refcnt.

> I mentioned above might be problematic if we use ACT_P_CREATED only.
> (It also means my patches still leak a count - which is probably still
> better than the previous underflow, but ultimately doesn't satisfy me.)
> Should I still resend it this way for the record with the Acked-bys?
> (Since given the fact that with unprivileged containers it's possible to
> trigger this access and potentially crash the kernel I strongly feel that
> some version of this should end up in the 4.11 release.)

I think so.

Thanks.
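
A deliberately simplified model of the counting discussed in this thread, for orientation only. It is not the act_api.c implementation: `struct demo_action`, `demo_release()` and `DEMO_ACT_P_DELETED` are hypothetical, and details such as strict mode and module reference handling are left out. It only shows the rule stated above: a release reports "deleted" once both counts have dropped to zero.

```c
#include <stdbool.h>

#define DEMO_ACT_P_DELETED 1	/* hypothetical stand-in for ACT_P_DELETED */

/* Hypothetical action with the two counters discussed in the thread. */
struct demo_action {
	int refcnt;	/* always used */
	int bindcnt;	/* only used when bound to a filter */
};

/* Drop one reference (and one bind reference if bind is set).
 * Returns DEMO_ACT_P_DELETED when nothing holds the action anymore. */
int demo_release(struct demo_action *a, bool bind)
{
	if (bind)
		a->bindcnt--;
	a->refcnt--;

	if (a->bindcnt <= 0 && a->refcnt <= 0)
		return DEMO_ACT_P_DELETED;
	return 0;
}
```

In this model a freshly created action (refcnt 1, bindcnt 0 or 1) is reported as deleted by the first matching release, which is the case described for ACT_P_CREATED above; an action that was already referenced before is not.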


bluetooth 6lowpan interfaces are not virtual anymore

2017-04-17 Thread Alexander Aring
Hi,

bluetooth-next contains patches which introduce a queue for bluetooth
6LoWPAN interfaces. [0]

At first, the current behaviour is now that 802.15.4 6LoWPAN interfaces
are virtual and bluetooth 6LoWPAN interfaces are not virtual anymore.
To have a different handling in both subsystems is _definitely_ wrong.

What does the 6LoWPAN interface do?

It will do a protocol change (an adaptation, because 6LoWPAN should
provide the same functionality as IPv6) from IPv6 to 6LoWPAN (tx) and
vice versa for (rx). In my opinion this should be handled as a virtual
interface and not as an interface with a queue.

What does a queue on 6LoWPAN interfaces do?

It will queue IPv6 packets which wait for their adaptation (the protocol
change) with some additional qdisc handling.
When the xmit_do callback finally occurs, it will replace the IPv6 header
with 6LoWPAN, etc. After that the packet should be queued into some queue on
the link layer side, which should finally do the transmit.

Why do I think bluetooth introduced queue handling on 6LoWPAN interfaces?

Because I think they don't like their own *qdisc* handling on their link
layer interface. I write *qdisc* here because they have no net_devices
and use some kind of qdisc handling of their own.

My question: is this correct?

How to fix that (In my opinion):

So commit [0] says something about being "out of credits"; I think that is
the *qdisc* handling. If you cannot queue anymore -> you need to drop
it. If you don't like the current behaviour, you need to change
your *qdisc* handling on your link layer interface. Introducing a queue at
6LoWPAN interfaces will introduce "buffer bloating".

---

I don't care what bluetooth does with the 6LoWPAN interface. If bluetooth
people want such behaviour, then I am okay with that.

I will bookmark this mail for the time when somebody reverts it to a
virtual interface again. I think somebody will change it again, or maybe
somebody will argue why we need a queue here.

- Alex

[0] 
https://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git/commit/?id=814f1b243d2c63928f6aa72d66bec0e5fae2f4a9


Re: [PATCH net-next 0/6] bpf: LRU performance and test-program improvements

2017-04-17 Thread David Miller
From: Martin KaFai Lau 
Date: Fri, 14 Apr 2017 10:30:24 -0700

> The first 4 patches make a few improvements to the LRU tests.
> 
> Patch 5/6 is to improve the performance of BPF_F_NO_COMMON_LRU map.
> 
> Patch 6/6 adds an example in using LRU map with map-in-map.

Series applied, thank you.


Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-04-17 Thread Roman Mashak
Eric Dumazet  writes:

> On Mon, 2017-04-17 at 12:46 -0400, Jamal Hadi Salim wrote:
>
>> Of course it is trivial to add this as attributes and 32 bits
>> for this case is not a big deal because it is done once. I want to talk
>> about the pads instead ;-> What do you suggest we do with pads?
>
> We do nothing with pads. Just leave them.
>
> Or, you need something to make sure to not break old user applications.
>
> Something like a version.

Would a new NLM_F_* modifier for GET requests be an option for implementing
large dumps? Looks like there are extra bits, and Johannes' patches for
extended ACK do not use them.


Re: [PATCH net-next] ibmvnic: Report errors when failing to release sub-crqs

2017-04-17 Thread David Miller

Please do not submit a set of changes to the same driver like this.

Instead, submit a proper patch series which is numbered (so that the
dependencies between changes, if any, are explicit) and also with
a proper "[PATCH 0/N] ..." header posting which describes what the
patch series is doing, how it is doing it, and why it is doing it
that way.

Thanks.


Re: [PATCH net-next] net: mvneta: fix failed to suspend if WOL is enabled

2017-04-17 Thread David Miller
From: Jisheng Zhang 
Date: Fri, 14 Apr 2017 19:07:32 +0800

> Recently, suspend/resume and WOL support are added into mvneta driver.
> If we enable WOL, then we get some error as below on Marvell BG4CT
> platforms during suspend:
> 
> [  184.149723] dpm_run_callback(): mdio_bus_suspend+0x0/0x50 returns -16
> [  184.149727] PM: Device f7b62004.mdio-mi:00 failed to suspend: error -16
> 
> -16 means -EBUSY, phy_suspend() will return -EBUSY if it finds the
> device has WOL enabled.
> 
> We fix this issue by properly setting the netdev's power.can_wakeup
> and power.wakeup, i.e.
> 
> 1. in mvneta_mdio_probe(), call device_set_wakeup_capable() to set
> power.can_wakeup if the phy supports WOL.
> 
> 2. in mvneta_ethtool_set_wol(), call device_set_wakeup_enable() to
> set power.wakeup if WOL has been successfully enabled in phy.
> 
> Signed-off-by: Jisheng Zhang 

Applied.


Re: [PATCH net-next] net: bridge: notify on hw fdb takeover

2017-04-17 Thread David Miller
From: Nikolay Aleksandrov 
Date: Fri, 14 Apr 2017 13:49:34 +0300

> Recently we added support for SW fdbs to take over HW ones, but that
> results in changing a user-visible fdb flag thus we need to send a
> notification, also it's consistent with how HW takes over SW entries.
> 
> Signed-off-by: Nikolay Aleksandrov 

Applied.


Re: [PATCH] net/ncsi: fix checksum validation in response packet

2017-04-17 Thread David Miller
From: Cédric Le Goater 
Date: Fri, 14 Apr 2017 10:56:37 +0200

> htonl was used instead of ntohl. Surely a typo.
> 
> Signed-off-by: Cédric Le Goater 

I don't think so, "checksum" is of type "u32" thus is in host byte
order.  Therefore "htonl()" is correct.
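
A small userspace sketch of the direction of conversion being described. The names are hypothetical (`demo_checksum_ok()` is not an NCSI function): the checksum is computed in host byte order, so it is converted with htonl() before being compared against the raw big-endian field taken from the packet. Note that htonl() and ntohl() perform the same byte swap on a given host, so the two spellings produce identical results; the difference is which conversion the code documents.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* computed:   checksum calculated locally, in host byte order
 * wire_field: the 4 checksum bytes exactly as they appear in the packet */
bool demo_checksum_ok(uint32_t computed, const uint8_t wire_field[4])
{
	uint32_t on_wire;

	/* copy the raw network-order bytes without interpreting them */
	memcpy(&on_wire, wire_field, sizeof(on_wire));

	/* host value -> network order, then compare like with like */
	return on_wire == htonl(computed);
}
```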


Re: [PATCH v2 net 0/2] Fix crash caused by reporting inconsistent skb->len to BQL

2017-04-17 Thread David Miller
From: 
Date: Fri, 14 Apr 2017 11:19:10 +0800

> From: Sean Wang 
> 
> Changes since v1:
> - fix inconsistent enumeration which easily causes the potential bug

Series applied, thanks.


Re: [PATCH v3] net: phy: micrel: fix crash when statistic requested for KSZ9031 phy

2017-04-17 Thread David Miller
From: Grygorii Strashko 
Date: Thu, 13 Apr 2017 14:11:27 -0500

> Now the command:
>   ethtool --phy-statistics eth0
> will cause a system crash with the message "Unable to handle kernel NULL pointer
> dereference at virtual address 0010" from:
> 
>  (kszphy_get_stats) from [] (ethtool_get_phy_stats+0xd8/0x210)
>  (ethtool_get_phy_stats) from [] (dev_ethtool+0x5b8/0x228c)
>  (dev_ethtool) from [] (dev_ioctl+0x3fc/0x964)
>  (dev_ioctl) from [] (sock_ioctl+0x170/0x2c0)
>  (sock_ioctl) from [] (do_vfs_ioctl+0xa8/0x95c)
>  (do_vfs_ioctl) from [] (SyS_ioctl+0x3c/0x64)
>  (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x44)
> 
> The reason: phy_driver structure for KSZ9031 phy has no .probe() callback
> defined. As a result, struct phy_device *phydev->priv pointer will not be
> initialized (NULL).
> This issue will affect also following phys:
>  KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737
> 
> Fix it by:
> - adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
> phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
> settings.
> - removing statistic callbacks from other phys (KSZ8795, KSZ886X,
> KSZ8873MLL, KSZ8061, KS8737) as they don't have corresponding
> statistic counters.
> 
> Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters")
> Signed-off-by: Grygorii Strashko 
> Reviewed-by: Andrew Lunn 

Applied.


Re: [Patch net-next] kcm: remove a useless copy_from_user()

2017-04-17 Thread David Miller
From: Cong Wang 
Date: Thu, 13 Apr 2017 11:38:02 -0700

> struct kcm_clone only contains fd, and kcm_clone() only
> writes this struct, so there is no need to copy it from user.
> 
> Cc: Tom Herbert 
> Signed-off-by: Cong Wang 

Applied.


Re: [PATCH net] net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule

2017-04-17 Thread David Miller
From: David Ahern 
Date: Thu, 13 Apr 2017 10:57:15 -0600

> Only need 1 l3mdev FIB rule. Fix setting NLM_F_EXCL in the nlmsghdr.
> 
> Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")
> Signed-off-by: David Ahern 

Applied and queued up for -stable, thanks.


Re: [patch net-next] MAINTAINERS: rename TC entry and add couple of header files

2017-04-17 Thread David Miller
From: Jiri Pirko 
Date: Thu, 13 Apr 2017 18:13:51 +0200

> From: Jiri Pirko 
> 
> The section is not specific only to "TC classifiers", but applies to the
> whole TC subsystem. Also, add couple of forgotten headers.
> 
> Signed-off-by: Jiri Pirko 

Applied.


Re: [PATCH 2/2] net: phy: simplify phy_supported_speeds()

2017-04-17 Thread David Miller
From: Russell King 
Date: Thu, 13 Apr 2017 16:49:20 +0100

> Simplify the loop in phy_supported_speeds().
> 
> Signed-off-by: Russell King 

Also applied to net-next, thanks.


Re: [PATCH 1/2] net: phy: improve phylib correctness for non-autoneg settings

2017-04-17 Thread David Miller
From: Russell King 
Date: Thu, 13 Apr 2017 16:49:15 +0100

> phylib has some undesirable behaviour when forcing a link mode through
> ethtool.  phylib uses this code:
> 
>   idx = phy_find_valid(phy_find_setting(phydev->speed, phydev->duplex),
>   features);
> 
> to find an index in the settings table.  phy_find_setting() starts at
> index 0, and scans upwards looking for an exact speed and duplex match.
> When it doesn't find it, it returns MAX_NUM_SETTINGS - 1, which is
> 10baseT-Half duplex.
> 
> phy_find_valid() then scans from the point (and effectively only checks
> one entry) before bailing out, returning MAX_NUM_SETTINGS - 1.
> 
> phy_sanitize_settings() then sets ->speed to SPEED_10 and ->duplex to
> DUPLEX_HALF whether or not 10baseT-Half is supported.  This goes
> against all the comments on these functions, and 10baseT-Half may
> not even be supported by the hardware.
> 
> Rework these functions, introducing a new method of scanning the table.
> There are two modes of lookup that phylib wants: exact, and inexact.
> 
> - in exact mode, we return either an exact match or failure
> - in inexact mode, we return an exact match if it exists, a match at
>   the highest speed that is not greater than the requested speed
>   (ignoring duplex), or failing that, the lowest supported speed, or
>   failure.
> 
> The biggest difference is that we always check whether the entry is
> supported before further consideration, so all unsupported entries are
> not considered as candidates.
> 
> This results in arguably saner behaviour, better matches the comments,
> and is probably what users would expect.
> 
> This becomes important as ethernet speeds increase, PHYs exist which do
> not support the 10Mbit speeds, and half-duplex is likely to become
> obsolete - it's already not even an option on 10Gbit and faster links.
> 
> Signed-off-by: Russell King 

Applied to net-next
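
The two lookup modes described in the quoted commit message can be sketched in plain C. This is only an illustration under stated assumptions, not the phylib code: the table, the bit values, and the names lookup_setting(), struct setting, HALF and FULL are all hypothetical, and the table is assumed to be sorted by descending speed. Unsupported entries are skipped before any matching is attempted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from phylib. */
enum { HALF = 0, FULL = 1 };

enum {
	BIT_10000_FULL = 1u << 0,
	BIT_1000_FULL  = 1u << 1,
	BIT_1000_HALF  = 1u << 2,
	BIT_100_FULL   = 1u << 3,
	BIT_100_HALF   = 1u << 4,
	BIT_10_FULL    = 1u << 5,
	BIT_10_HALF    = 1u << 6,
};

struct setting {
	int speed;        /* Mbit/s */
	int duplex;       /* HALF or FULL */
	unsigned int bit; /* bit in the "supported" mask */
};

/* Assumption: entries are sorted by descending speed. */
static const struct setting table[] = {
	{ 10000, FULL, BIT_10000_FULL },
	{ 1000,  FULL, BIT_1000_FULL },
	{ 1000,  HALF, BIT_1000_HALF },
	{ 100,   FULL, BIT_100_FULL },
	{ 100,   HALF, BIT_100_HALF },
	{ 10,    FULL, BIT_10_FULL },
	{ 10,    HALF, BIT_10_HALF },
};

static const struct setting *lookup_setting(int speed, int duplex,
					    unsigned int supported, bool exact)
{
	const struct setting *match = NULL, *last = NULL;
	size_t i;

	assert(speed > 0);

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		const struct setting *p = &table[i];

		/* Unsupported entries are never candidates. */
		if (!(supported & p->bit))
			continue;

		last = p; /* ends up as the lowest supported speed */

		if (p->speed == speed && p->duplex == duplex)
			return p; /* exact match */

		if (exact)
			continue;

		/* Inexact: highest speed not above the request. */
		if (!match && p->speed <= speed)
			match = p;

		/* Everything further down the table is slower still. */
		if (p->speed < speed)
			break;
	}

	/* Inexact fallback: lowest supported speed, or NULL if none. */
	if (!match && !exact)
		match = last;

	return match;
}
```

With a mask of BIT_1000_FULL | BIT_100_FULL, an exact lookup for 10/HALF fails, while an inexact lookup for the same request falls back to 100/FULL, the lowest supported speed, rather than to an unsupported 10baseT-Half entry.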


Re: [PATCH net-next v2] Subject: net: allow configuring default qdisc

2017-04-17 Thread David Miller
From: Stephen Hemminger 
Date: Thu, 13 Apr 2017 08:40:53 -0700

> Since 3.12 it has been possible to configure the default queuing
> discipline via sysctl. This patch adds the ability to configure the
> default queue discipline in kernel configuration. This is useful for
> environments where configuring the value from userspace is difficult
> to manage.
> 
> The default is still the same as before (pfifo_fast) and it is
> possible to change after kernel init with sysctl. This is similar
> to how TCP congestion control works.
> 
> Signed-off-by: Stephen Hemminger 
> ---
> v2 -- add another level of indirection to make it easier for
>   users blindly doing make oldconfig

Applied, thanks.


Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-04-17 Thread Eric Dumazet
On Mon, 2017-04-17 at 12:46 -0400, Jamal Hadi Salim wrote:

> Of course it is trivial to add this as attributes and 32 bits
> for this case is not a big deal because it is done once. I want to talk
> about the pads instead ;-> What do you suggest we do with pads?

We do nothing with pads. Just leave them.

Or, you need something to make sure to not break old user applications.

Something like a version.




Re: [PATCH net-next 0/2] qed/qede: aRFS support

2017-04-17 Thread David Miller
From: Manish Chopra 
Date: Thu, 13 Apr 2017 04:54:43 -0700

> This series adds support for Accelerated Flow Steering
> in qede driver for TCP/UDP over IPv4/IPv6 protocols.
> 
> Please consider applying this series to "net-next"

Series applied, thanks.


Re: [PATCH v4] smsc95xx: Add comments to the registers definition

2017-04-17 Thread David Miller
From: Martin Wetterwald 
Date: Thu, 13 Apr 2017 10:08:44 +0200

> This chip is used by a lot of embedded devices and also by the Raspberry
> Pi 1, 2 & 3 which were created to promote the study of computer
> sciences. Students wanting to learn kernel / network device driver
> programming through those devices can only rely on the Linux kernel
> driver source to make their own.
> 
> This commit adds a lot of comments to the registers definition to expand
> the register names.
> 
> Cc: Steve Glendinning 
> Cc: Microchip Linux Driver Support 
> CC: David Miller 
> Signed-off-by: Martin Wetterwald 
> Reviewed-by: Andrew Lunn 
> Acked-by: Steve Glendinning 

Applied to net-next, thanks.


Re: [PATCH net-next 1/1] net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch

2017-04-17 Thread Jiri Pirko
Mon, Apr 17, 2017 at 06:46:17PM CEST, j...@mojatatu.com wrote:
>On 17-04-17 11:31 AM, Jiri Pirko wrote:
>> Mon, Apr 17, 2017 at 03:10:59PM CEST, eric.duma...@gmail.com wrote:
>> > On Mon, 2017-04-17 at 07:01 -0400, Jamal Hadi Salim wrote:
>
>> Agreed.
>> 
>> Plus the argument that attributes are "a big waste" sounds to me really
>> silly. What is a couple of bytes? Please do this properly, as it should
>> be done.
>
>Jiri - you wanted to have these uapi discussions, right? ;->
>
>Of course it is trivial to add this as attributes and 32 bits
>for this case is not a big deal because it is done once. I want to talk
>about the pads instead ;-> What do you suggest we do with pads?

I believe we should leave them as they are.


>
>cheers,
>jamal
>


Re: [PATCH] net: thunderx: Fix set_max_bgx_per_node for 81xx rgx

2017-04-17 Thread David Miller
From: George Cherian 
Date: Thu, 13 Apr 2017 07:25:01 +

> Add the PCI_SUBSYS_DEVID_81XX_RGX and use the same to set
> the max bgx per node count.
> 
> This fixes the issue introduced by the following commit:
> 78aacb6f6 net: thunderx: Fix invalid mac addresses for node1 interfaces
> With this commit the max_bgx_per_node for 81xx is set as 2 instead of 3
> because of which num_vfs is always calculated as zero.
> 
> Signed-off-by: George Cherian 

Applied.


Re: [PATCH net-next] l2tp: device MTU setup, tunnel socket needs a lock

2017-04-17 Thread David Miller
From: "R. Parameswaran" 
Date: Wed, 12 Apr 2017 18:31:04 -0700 (PDT)

> 
> The MTU overhead calculation in L2TP device set-up
> merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff
> needs to be adjusted to lock the tunnel socket while
> referencing the sub-data structures to derive the
> socket's IP overhead.
> 
> Reported-by: Guillaume Nault 
> Tested-by: Guillaume Nault 
> Signed-off-by: R. Parameswaran 

Applied, thanks.


Re: [PATCH net] net-timestamp: avoid use-after-free in ip_recv_error

2017-04-17 Thread David Miller
From: Willem de Bruijn 
Date: Wed, 12 Apr 2017 19:24:35 -0400

> From: Willem de Bruijn 
> 
> Syzkaller reported a use-after-free in ip_recv_error at line
> 
> info->ipi_ifindex = skb->dev->ifindex;
> 
> This function is called on dequeue from the error queue, at which
> point the device pointer may no longer be valid.
> 
> Save ifindex on enqueue in __skb_complete_tx_timestamp, when the
> pointer is valid or NULL. Store it in temporary storage skb->cb.
> 
> It is safe to reference skb->dev here, as called from device drivers
> or dev_queue_xmit. The exception is when called from tcp_ack_tstamp;
> in that case it is NULL and ifindex is set to 0 (invalid).
> 
> Do not return a pktinfo cmsg if ifindex is 0. This maintains the
> current behavior of not returning a cmsg if skb->dev was NULL.
> 
> On dequeue, the ipv4 path will cast from sock_exterr_skb to
> in_pktinfo. Both have ifindex as their first element, so no explicit
> conversion is needed. This is by design, introduced in commit
> 0b922b7a829c ("net: original ingress device index in PKTINFO"). For
> ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo.
> 
> Fixes: 829ae9d61165 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp")
> 
> Reported-by: Andrey Konovalov 
> Signed-off-by: Willem de Bruijn 

Applied and queued up for -stable, thanks.

In the future please don't insert empty lines between the Fixes: and
other tags.

Thanks.
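
The idea in the quoted commit message, saving the interface index at enqueue time instead of dereferencing the device pointer at dequeue time, can be illustrated with a small standalone C sketch. It is not the kernel code: struct device, struct packet, enqueue_error() and dequeue_pktinfo() are hypothetical names, and the saved_ifindex field merely stands in for scratch storage such as skb->cb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct device {
	int ifindex;
};

struct packet {
	struct device *dev; /* may be stale by the time of dequeue */
	int saved_ifindex;  /* stands in for scratch space like skb->cb */
};

/*
 * Called while the device pointer is still valid or NULL.
 * A NULL device is recorded as ifindex 0 (invalid).
 */
static void enqueue_error(struct packet *pkt)
{
	assert(pkt != NULL);
	pkt->saved_ifindex = pkt->dev ? pkt->dev->ifindex : 0;
}

/*
 * Called later, on dequeue. Only the saved copy is read; pkt->dev is
 * never dereferenced here. Returns false when there is no valid
 * ifindex, in which case no packet info should be reported.
 */
static bool dequeue_pktinfo(const struct packet *pkt, int *ifindex)
{
	assert(pkt != NULL && ifindex != NULL);

	if (pkt->saved_ifindex == 0)
		return false;

	*ifindex = pkt->saved_ifindex;
	return true;
}
```

The dequeue side depends only on data copied while the pointer was known to be safe, so the lifetime of the device no longer matters once the packet sits on the queue.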

