date:20170117

Re: [PATCH net-next v3 06/10] net: dsa: Migrate to device_find_class()

2017-01-17 Thread Greg KH

On Mon, Jan 16, 2017 at 12:01:02PM -0800, Florian Fainelli wrote:
> On 01/15/2017 11:16 AM, Andrew Lunn wrote:
> >>> What exactly is the relationship between these devices (a ascii-art tree
> >>> or sysfs tree output might be nice) so I can try to understand what is
> >>> going on here.
> > 
> > Hi Greg, Florian
> > 
> > A few diagrams and trees which might help understand what is going on.
> > 
> > The first diagram comes from the 2008 patch which added all this code:
> > 
> > +-----------+       +-----------+
> > |           | RGMII |           |
> > |           +-------+           +------ 1000baseT MDI ("WAN")
> > |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
> > |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
> > |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
> > |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
> > |           |       |           |
> > +-----------+       +-----------+
> > 
> > We have an ethernet switch and a host CPU. The switch is connected to
> > the CPU in two different ways. RGMII allows us to get Ethernet frames
> > from the CPU into the switch. MIImgmt is the management bus normally
> > used for Ethernet PHYs, but Marvell switches also use it for managing
> > switches.
> > 
> > The diagram above is the simplest setup. You can have multiple
> > Ethernet switches, connected together via switch ports. Each switch
> > has its own MIImgmt connection to the CPU, but there is only one RGMII
> > link.
> > 
> > When this code was designed back in 2008, it was decided to represent
> > this as a platform device, and it has a platform_data, which I have
> > slightly edited to keep it simple:
> > 
> > struct dsa_platform_data {
> > 	/*
> > 	 * Reference to a Linux network interface that connects
> > 	 * to the root switch chip of the tree.
> > 	 */
> > 	struct device	*netdev;
> > 
> > 	/*
> > 	 * Info structs describing each of the switch chips
> > 	 * connected via this network interface.
> > 	 */
> > 	int		nr_chips;
> > 	struct dsa_chip_data	*chip;
> > };
> > 
> > This netdev is the CPU side of the RGMII interface.
> > 
> > Each switch has a dsa_chip_data, again edited:
> > 
> > struct dsa_chip_data {
> > 	/*
> > 	 * How to access the switch configuration registers.
> > 	 */
> > 	struct device	*host_dev;
> > 	int		sw_addr;
> > 	...
> > };
> > 
> > The host_dev is the CPU side of the MIImgmt, and we have the address
> > the switch is using on the bus.
> > 
> > During probe of this platform device, we need to get from the
> > struct device *netdev to a struct net_device *dev.
> > 
> > So the code looks in the device net class to find the device
> > 
> > |   |   |   |-- f1074000.ethernet
> > |   |   |   |   |-- deferred_probe
> > |   |   |   |   |-- driver -> ../../../../../bus/platform/drivers/mvneta
> > |   |   |   |   |-- driver_override
> > |   |   |   |   |-- modalias
> > |   |   |   |   |-- net
> > |   |   |   |   |   `-- eth1
> > |   |   |   |   |       |-- addr_assign_type
> > |   |   |   |   |       |-- address
> > |   |   |   |   |       |-- addr_len
> > |   |   |   |   |       |-- broadcast
> > |   |   |   |   |       |-- carrier
> > |   |   |   |   |       |-- carrier_changes
> > |   |   |   |   |       |-- deferred_probe
> > |   |   |   |   |       |-- device -> ../../../f1074000.ethernet
> > 
> > and then use container_of() to get the net_device.
> > 
> > Similarly, the code needs to get from struct device *host_dev to a struct 
> > mii_bus *.
> > 
> > |   |   |   |-- f1072004.mdio
> > |   |   |   |   |-- deferred_probe
> > |   |   |   |   |-- driver -> ../../../../../bus/platform/drivers/orion-mdio
> > |   |   |   |   |-- driver_override
> > |   |   |   |   |-- mdio_bus
> > |   |   |   |   |   `-- f1072004.mdio-mi
> > |   |   |   |   |       |-- deferred_probe
> > |   |   |   |   |       |-- device -> ../../../f1072004.mdio
> > 
> 
> Thanks Andrew! Greg, does that make it clearer how these device
> references are used? Do you still think the way this is done is wrong,
> too cautious, or valid?

I'm still not sold on it, I think there is something odd here with your
use/assumptions of the driver model.  Give me a few days to catch up
with other stuff to respond back please...

thanks,

greg k-h

[PATCH net-next V5 3/3] tun: rx batching

2017-01-17 Thread Jason Wang

We can only process one packet at a time during sendmsg(). This often
leads to poor cache utilization under heavy load. This patch therefore
batches packets during rx before submitting them to the host network
stack. It does so by accepting MSG_MORE as a hint from the sendmsg()
caller: if the flag is set, the packet is queued temporarily on a
linked list, and the whole list is submitted once MSG_MORE is cleared.

Tests were done by pktgen (burst=128) in guest over mlx4(noqueue) on host:

                             Mpps  -+%
rx-frames = 0                0.91  +0%
rx-frames = 4                1.00  +9.8%
rx-frames = 8                1.00  +9.8%
rx-frames = 16               1.01  +10.9%
rx-frames = 32               1.07  +17.5%
rx-frames = 48               1.07  +17.5%
rx-frames = 64               1.08  +18.6%
rx-frames = 64 (no MSG_MORE) 0.91  +0%

Users can change the number of batched packets per device through
ethtool -C rx-frames. NAPI_POLL_WEIGHT is used as the upper limit
to prevent bh from being disabled for too long.

Signed-off-by: Jason Wang 
---
 drivers/net/tun.c | 76 ++-
 1 file changed, 70 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8c1d3bd..13890ac 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -218,6 +218,7 @@ struct tun_struct {
struct list_head disabled;
void *security;
u32 flow_count;
+   u32 rx_batched;
struct tun_pcpu_stats __percpu *pcpu_stats;
 };
 
@@ -522,6 +523,7 @@ static void tun_queue_purge(struct tun_file *tfile)
while ((skb = skb_array_consume(&tfile->tx_array)) != NULL)
kfree_skb(skb);
 
+   skb_queue_purge(&tfile->sk.sk_write_queue);
skb_queue_purge(&tfile->sk.sk_error_queue);
 }
 
@@ -1139,10 +1141,46 @@ static struct sk_buff *tun_alloc_skb(struct tun_file 
*tfile,
return skb;
 }
 
+static void tun_rx_batched(struct tun_struct *tun, struct tun_file *tfile,
+			   struct sk_buff *skb, int more)
+{
+	struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
+	struct sk_buff_head process_queue;
+	u32 rx_batched = tun->rx_batched;
+	bool rcv = false;
+
+	if (!rx_batched || (!more && skb_queue_empty(queue))) {
+		local_bh_disable();
+		netif_receive_skb(skb);
+		local_bh_enable();
+		return;
+	}
+
+	spin_lock(&queue->lock);
+	if (!more || skb_queue_len(queue) == rx_batched) {
+		__skb_queue_head_init(&process_queue);
+		skb_queue_splice_tail_init(queue, &process_queue);
+		rcv = true;
+	} else {
+		__skb_queue_tail(queue, skb);
+	}
+	spin_unlock(&queue->lock);
+
+	if (rcv) {
+		struct sk_buff *nskb;
+
+		local_bh_disable();
+		while ((nskb = __skb_dequeue(&process_queue)))
+			netif_receive_skb(nskb);
+		netif_receive_skb(skb);
+		local_bh_enable();
+	}
+}
+
 /* Get packet from user space buffer */
 static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
void *msg_control, struct iov_iter *from,
-   int noblock)
+   int noblock, bool more)
 {
struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
struct sk_buff *skb;
@@ -1283,9 +1321,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, 
struct tun_file *tfile,
 
rxhash = skb_get_hash(skb);
 #ifndef CONFIG_4KSTACKS
-   local_bh_disable();
-   netif_receive_skb(skb);
-   local_bh_enable();
+   tun_rx_batched(tun, tfile, skb, more);
 #else
netif_rx_ni(skb);
 #endif
@@ -1311,7 +1347,8 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
if (!tun)
return -EBADFD;
 
-   result = tun_get_user(tun, tfile, NULL, from, file->f_flags & 
O_NONBLOCK);
+   result = tun_get_user(tun, tfile, NULL, from,
+ file->f_flags & O_NONBLOCK, false);
 
tun_put(tun);
return result;
@@ -1569,7 +1606,8 @@ static int tun_sendmsg(struct socket *sock, struct msghdr 
*m, size_t total_len)
return -EBADFD;
 
ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
-  m->msg_flags & MSG_DONTWAIT);
+  m->msg_flags & MSG_DONTWAIT,
+  m->msg_flags & MSG_MORE);
tun_put(tun);
return ret;
 }
@@ -1770,6 +1808,7 @@ static int tun_set_iff(struct net *net, struct file 
*file, struct ifreq *ifr)
tun->align = NET_SKB_PAD;
tun->filter_attached = false;
tun->sndbuf = tfile->socket.sk->sk_sndbuf;
+   tun->rx_batched = 0;
 
tun->pcpu_stats = netdev_alloc_pcpu_stats(struct 
tun_

[PATCH net-next V5 2/3] vhost_net: tx batching

2017-01-17 Thread Jason Wang

This patch utilizes tuntap rx batching by peeking at the tx
virtqueue during transmission: if there are more available buffers in
the virtqueue, it sets the MSG_MORE flag as a hint for the backend
(e.g. tuntap) to batch the packets.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jason Wang 
---
 drivers/vhost/net.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc3465..c42e9c3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -351,6 +351,15 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
return r;
 }
 
+static bool vhost_exceeds_maxpend(struct vhost_net *net)
+{
+	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+	struct vhost_virtqueue *vq = &nvq->vq;
+
+	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
+		== nvq->done_idx;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -394,8 +403,7 @@ static void handle_tx(struct vhost_net *net)
/* If more outstanding DMAs, queue the work.
 * Handle upend_idx wrap around
 */
-   if (unlikely((nvq->upend_idx + vq->num - VHOST_MAX_PEND)
- % UIO_MAXIOV == nvq->done_idx))
+   if (unlikely(vhost_exceeds_maxpend(net)))
break;
 
head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
@@ -454,6 +462,16 @@ static void handle_tx(struct vhost_net *net)
msg.msg_control = NULL;
ubufs = NULL;
}
+
+   total_len += len;
+   if (total_len < VHOST_NET_WEIGHT &&
+   !vhost_vq_avail_empty(&net->dev, vq) &&
+   likely(!vhost_exceeds_maxpend(net))) {
+   msg.msg_flags |= MSG_MORE;
+   } else {
+   msg.msg_flags &= ~MSG_MORE;
+   }
+
/* TODO: Check specific error and bomb out unless ENOBUFS? */
err = sock->ops->sendmsg(sock, &msg, len);
if (unlikely(err < 0)) {
@@ -472,7 +490,6 @@ static void handle_tx(struct vhost_net *net)
vhost_add_used_and_signal(&net->dev, vq, head, 0);
else
vhost_zerocopy_signal_used(net, vq);
-   total_len += len;
vhost_net_tx_packet(net);
if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
vhost_poll_queue(&vq->poll);
-- 
2.7.4

[PATCH net-next V5 1/3] vhost: better detection of available buffers

2017-01-17 Thread Jason Wang

This patch makes several tweaks to vhost_vq_avail_empty() for
better performance:

- check the cached avail index first, which can avoid a userspace memory access
- use unlikely() for the failure of the userspace access
- check vq->last_avail_idx instead of the cached avail index as the last
  step

This patch is needed for batching support, which needs to peek at whether
there are still available buffers in the ring.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jason Wang 
---
 drivers/vhost/vhost.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d643260..9f11838 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2241,11 +2241,15 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct 
vhost_virtqueue *vq)
 	__virtio16 avail_idx;
 	int r;
 
+	if (vq->avail_idx != vq->last_avail_idx)
+		return false;
+
 	r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
-	if (r)
+	if (unlikely(r))
 		return false;
+	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
 
-	return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
+	return vq->avail_idx == vq->last_avail_idx;
 }
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 
-- 
2.7.4

Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()

2017-01-17 Thread Greg Kroah-Hartman

On Tue, Jan 17, 2017 at 03:21:47PM -0800, Florian Fainelli wrote:
> Add a helper function to lookup a device reference given a class name.
> This is a preliminary patch to remove adhoc code from net/dsa/dsa.c and
> make it more generic.
> 
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/base/core.c| 31 +++
>  include/linux/device.h |  2 ++
>  2 files changed, 33 insertions(+)

My NAK still stands here, please give me a day or so to respond to the
other thread about this...

thanks,

greg k-h

[PATCH net-next V5 0/3] vhost_net tx batching

2017-01-17 Thread Jason Wang

Hi:

This series implements tx batching support for vhost. This is
done by using MSG_MORE as a hint for the underlying socket. The backend
(e.g. tap) can then batch the packets temporarily in a list and
submit them all once the number of batched packets exceeds a limit.

Tests show an obvious improvement with guest pktgen over
mlx4(noqueue) on host:

                             Mpps  -+%
rx-frames = 0                0.91  +0%
rx-frames = 4                1.00  +9.8%
rx-frames = 8                1.00  +9.8%
rx-frames = 16               1.01  +10.9%
rx-frames = 32               1.07  +17.5%
rx-frames = 48               1.07  +17.5%
rx-frames = 64               1.08  +18.6%
rx-frames = 64 (no MSG_MORE) 0.91  +0%

Changes from V4:
- stick to NAPI_POLL_WEIGHT for rx-frames if the user specifies a value
  greater than it.
Changes from V3:
- use ethtool instead of module parameter to control the maximum
  number of batched packets
- avoid overhead when MSG_MORE is not set and no packet is queued
Changes from V2:
- remove useless queue limitation check (and we don't drop any packet now)
Changes from V1:
- drop NAPI handler since we don't use NAPI now
- fix the issues that may exceed max pending of zerocopy
- more improvement on available buffer detection
- move the limitation of batched packets from vhost to tuntap

Please review.

Thanks

Jason Wang (3):
  vhost: better detection of available buffers
  vhost_net: tx batching
  tun: rx batching

 drivers/net/tun.c | 76 +++
 drivers/vhost/net.c   | 23 ++--
 drivers/vhost/vhost.c |  8 --
 3 files changed, 96 insertions(+), 11 deletions(-)

-- 
2.7.4

[PATCH iproute2 net-next] iplink: bridge_slave: add support for IFLA_BRPORT_FLUSH

2017-01-17 Thread Hangbin Liu

This patch implements support for the IFLA_BRPORT_FLUSH attribute
in iproute2 so it can flush a bridge slave's dynamic fdb entries.

Signed-off-by: Hangbin Liu 
---
 ip/iplink_bridge_slave.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/ip/iplink_bridge_slave.c b/ip/iplink_bridge_slave.c
index fbb3f06..6353fc5 100644
--- a/ip/iplink_bridge_slave.c
+++ b/ip/iplink_bridge_slave.c
@@ -22,7 +22,10 @@
 static void print_explain(FILE *f)
 {
fprintf(f,
-   "Usage: ... bridge_slave [ state STATE ] [ priority PRIO ] 
[cost COST ]\n"
+   "Usage: ... bridge_slave [ fdb_flush ]\n"
+   "[ state STATE ]\n"
+   "[ priority PRIO ]\n"
+   "[ cost COST ]\n"
"[ guard {on | off} ]\n"
"[ hairpin {on | off} ]\n"
"[ fastleave {on | off} ]\n"
@@ -217,7 +220,9 @@ static int bridge_slave_parse_opt(struct link_util *lu, int 
argc, char **argv,
__u32 cost;
 
while (argc > 0) {
-   if (matches(*argv, "state") == 0) {
+   if (matches(*argv, "fdb_flush") == 0) {
+   addattr(n, 1024, IFLA_BRPORT_FLUSH);
+   } else if (matches(*argv, "state") == 0) {
NEXT_ARG();
if (get_u8(&state, *argv, 0))
invarg("state is invalid", *argv);
-- 
2.5.5

[PATCHv2 iproute2 net-next 1/5] iplink: bridge: add support for IFLA_BR_FDB_FLUSH

2017-01-17 Thread Hangbin Liu

This patch implements support for the IFLA_BR_FDB_FLUSH attribute
in iproute2 so it can flush the bridge's dynamic fdb entries.

Reviewed-by: Nikolay Aleksandrov 
Signed-off-by: Hangbin Liu 
---
 ip/iplink_bridge.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index d2d4202..85e6597 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -22,7 +22,8 @@
 static void print_explain(FILE *f)
 {
fprintf(f,
-   "Usage: ... bridge [ forward_delay FORWARD_DELAY ]\n"
+   "Usage: ... bridge [ fdb_flush ]\n"
+   "  [ forward_delay FORWARD_DELAY ]\n"
"  [ hello_time HELLO_TIME ]\n"
"  [ max_age MAX_AGE ]\n"
"  [ ageing_time AGEING_TIME ]\n"
@@ -145,6 +146,8 @@ static int bridge_parse_opt(struct link_util *lu, int argc, 
char **argv,
if (len < 0)
return -1;
addattr_l(n, 1024, IFLA_BR_GROUP_ADDR, llabuf, len);
+   } else if (matches(*argv, "fdb_flush") == 0) {
+   addattr(n, 1024, IFLA_BR_FDB_FLUSH);
} else if (matches(*argv, "vlan_default_pvid") == 0) {
__u16 default_pvid;
 
-- 
2.5.5

[PATCHv2 iproute2 net-next 4/5] iplink: bridge: add support for IFLA_BR_MCAST_IGMP_VERSION

2017-01-17 Thread Hangbin Liu

This patch implements support for the IFLA_BR_MCAST_IGMP_VERSION
attribute in iproute2 so it can change the mcast igmp version.

Reviewed-by: Nikolay Aleksandrov 
Signed-off-by: Hangbin Liu 
---
 ip/iplink_bridge.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 46bbbee..3e9143e 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -50,6 +50,7 @@ static void print_explain(FILE *f)
"  [ mcast_query_response_interval 
QUERY_RESPONSE_INTERVAL ]\n"
"  [ mcast_startup_query_interval 
STARTUP_QUERY_INTERVAL ]\n"
"  [ mcast_stats_enabled MCAST_STATS_ENABLED 
]\n"
+   "  [ mcast_igmp_version IGMP_VERSION ]\n"
"  [ nf_call_iptables NF_CALL_IPTABLES ]\n"
"  [ nf_call_ip6tables NF_CALL_IP6TABLES ]\n"
"  [ nf_call_arptables NF_CALL_ARPTABLES ]\n"
@@ -308,6 +309,14 @@ static int bridge_parse_opt(struct link_util *lu, int 
argc, char **argv,
invarg("invalid mcast_stats_enabled", *argv);
addattr8(n, 1024, IFLA_BR_MCAST_STATS_ENABLED,
  mcast_stats_enabled);
+   } else if (matches(*argv, "mcast_igmp_version") == 0) {
+   __u8 igmp_version;
+
+   NEXT_ARG();
+   if (get_u8(&igmp_version, *argv, 0))
+   invarg("invalid mcast_igmp_version", *argv);
+   addattr8(n, 1024, IFLA_BR_MCAST_IGMP_VERSION,
+ igmp_version);
} else if (matches(*argv, "nf_call_iptables") == 0) {
__u8 nf_call_ipt;
 
@@ -537,6 +546,10 @@ static void bridge_print_opt(struct link_util *lu, FILE 
*f, struct rtattr *tb[])
fprintf(f, "mcast_stats_enabled %u ",
rta_getattr_u8(tb[IFLA_BR_MCAST_STATS_ENABLED]));
 
+   if (tb[IFLA_BR_MCAST_IGMP_VERSION])
+   fprintf(f, "mcast_igmp_version %u ",
+   rta_getattr_u8(tb[IFLA_BR_MCAST_IGMP_VERSION]));
+
if (tb[IFLA_BR_NF_CALL_IPTABLES])
fprintf(f, "nf_call_iptables %u ",
rta_getattr_u8(tb[IFLA_BR_NF_CALL_IPTABLES]));
-- 
2.5.5

[PATCHv2 iproute2 net-next 0/5] add latest bridge netlink options

2017-01-17 Thread Hangbin Liu

Add the bridge netlink attributes recently added to the kernel.

v2: rename vlan/mcast_state to vlan/mcast_stats_enabled as suggested by
Nikolay. The previous name has a different meaning and would mislead people.
I will post a separate patch for IFLA_BRPORT_FLUSH support.

Hangbin Liu (5):
  iplink: bridge: add support for IFLA_BR_FDB_FLUSH
  iplink: bridge: add support for IFLA_BR_VLAN_STATS_ENABLED
  iplink: bridge: add support for IFLA_BR_MCAST_STATS_ENABLED
  iplink: bridge: add support for IFLA_BR_MCAST_IGMP_VERSION
  iplink: bridge: add support for IFLA_BR_MCAST_MLD_VERSION

 ip/iplink_bridge.c | 57 +-
 1 file changed, 56 insertions(+), 1 deletion(-)

-- 
2.5.5

[PATCHv2 iproute2 net-next 5/5] iplink: bridge: add support for IFLA_BR_MCAST_MLD_VERSION

2017-01-17 Thread Hangbin Liu

This patch implements support for the IFLA_BR_MCAST_MLD_VERSION
attribute in iproute2 so it can change the mcast mld version.

Reviewed-by: Nikolay Aleksandrov 
Signed-off-by: Hangbin Liu 
---
 ip/iplink_bridge.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 3e9143e..a17ff35 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -51,6 +51,7 @@ static void print_explain(FILE *f)
"  [ mcast_startup_query_interval 
STARTUP_QUERY_INTERVAL ]\n"
"  [ mcast_stats_enabled MCAST_STATS_ENABLED 
]\n"
"  [ mcast_igmp_version IGMP_VERSION ]\n"
+   "  [ mcast_mld_version MLD_VERSION ]\n"
"  [ nf_call_iptables NF_CALL_IPTABLES ]\n"
"  [ nf_call_ip6tables NF_CALL_IP6TABLES ]\n"
"  [ nf_call_arptables NF_CALL_ARPTABLES ]\n"
@@ -317,6 +318,14 @@ static int bridge_parse_opt(struct link_util *lu, int 
argc, char **argv,
invarg("invalid mcast_igmp_version", *argv);
addattr8(n, 1024, IFLA_BR_MCAST_IGMP_VERSION,
  igmp_version);
+   } else if (matches(*argv, "mcast_mld_version") == 0) {
+   __u8 mld_version;
+
+   NEXT_ARG();
+   if (get_u8(&mld_version, *argv, 0))
+   invarg("invalid mcast_mld_version", *argv);
+   addattr8(n, 1024, IFLA_BR_MCAST_MLD_VERSION,
+ mld_version);
} else if (matches(*argv, "nf_call_iptables") == 0) {
__u8 nf_call_ipt;
 
@@ -550,6 +559,10 @@ static void bridge_print_opt(struct link_util *lu, FILE 
*f, struct rtattr *tb[])
fprintf(f, "mcast_igmp_version %u ",
rta_getattr_u8(tb[IFLA_BR_MCAST_IGMP_VERSION]));
 
+   if (tb[IFLA_BR_MCAST_MLD_VERSION])
+   fprintf(f, "mcast_mld_version %u ",
+   rta_getattr_u8(tb[IFLA_BR_MCAST_MLD_VERSION]));
+
if (tb[IFLA_BR_NF_CALL_IPTABLES])
fprintf(f, "nf_call_iptables %u ",
rta_getattr_u8(tb[IFLA_BR_NF_CALL_IPTABLES]));
-- 
2.5.5

[PATCHv2 iproute2 net-next 3/5] iplink: bridge: add support for IFLA_BR_MCAST_STATS_ENABLED

2017-01-17 Thread Hangbin Liu

This patch implements support for the IFLA_BR_MCAST_STATS_ENABLED
attribute in iproute2 so it can enable/disable mcast stats accounting.

Signed-off-by: Hangbin Liu 
---
 ip/iplink_bridge.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index cd495b3..46bbbee 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -49,6 +49,7 @@ static void print_explain(FILE *f)
"  [ mcast_query_interval QUERY_INTERVAL ]\n"
"  [ mcast_query_response_interval 
QUERY_RESPONSE_INTERVAL ]\n"
"  [ mcast_startup_query_interval 
STARTUP_QUERY_INTERVAL ]\n"
+   "  [ mcast_stats_enabled MCAST_STATS_ENABLED 
]\n"
"  [ nf_call_iptables NF_CALL_IPTABLES ]\n"
"  [ nf_call_ip6tables NF_CALL_IP6TABLES ]\n"
"  [ nf_call_arptables NF_CALL_ARPTABLES ]\n"
@@ -299,6 +300,14 @@ static int bridge_parse_opt(struct link_util *lu, int 
argc, char **argv,
 
addattr64(n, 1024, IFLA_BR_MCAST_STARTUP_QUERY_INTVL,
  mcast_startup_query_intvl);
+   } else if (matches(*argv, "mcast_stats_enabled") == 0) {
+   __u8 mcast_stats_enabled;
+
+   NEXT_ARG();
+   if (get_u8(&mcast_stats_enabled, *argv, 0))
+   invarg("invalid mcast_stats_enabled", *argv);
+   addattr8(n, 1024, IFLA_BR_MCAST_STATS_ENABLED,
+ mcast_stats_enabled);
} else if (matches(*argv, "nf_call_iptables") == 0) {
__u8 nf_call_ipt;
 
@@ -524,6 +533,10 @@ static void bridge_print_opt(struct link_util *lu, FILE 
*f, struct rtattr *tb[])
fprintf(f, "mcast_startup_query_interval %llu ",
rta_getattr_u64(tb[IFLA_BR_MCAST_STARTUP_QUERY_INTVL]));
 
+   if (tb[IFLA_BR_MCAST_STATS_ENABLED])
+   fprintf(f, "mcast_stats_enabled %u ",
+   rta_getattr_u8(tb[IFLA_BR_MCAST_STATS_ENABLED]));
+
if (tb[IFLA_BR_NF_CALL_IPTABLES])
fprintf(f, "nf_call_iptables %u ",
rta_getattr_u8(tb[IFLA_BR_NF_CALL_IPTABLES]));
-- 
2.5.5

[PATCHv2 iproute2 net-next 2/5] iplink: bridge: add support for IFLA_BR_VLAN_STATS_ENABLED

2017-01-17 Thread Hangbin Liu

This patch implements support for the IFLA_BR_VLAN_STATS_ENABLED
attribute in iproute2 so it can enable/disable vlan stats accounting.

Signed-off-by: Hangbin Liu 
---
 ip/iplink_bridge.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 85e6597..cd495b3 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -34,6 +34,7 @@ static void print_explain(FILE *f)
"  [ vlan_filtering VLAN_FILTERING ]\n"
"  [ vlan_protocol VLAN_PROTOCOL ]\n"
"  [ vlan_default_pvid VLAN_DEFAULT_PVID ]\n"
+   "  [ vlan_stats_enabled VLAN_STATS_ENABLED ]\n"
"  [ mcast_snooping MULTICAST_SNOOPING ]\n"
"  [ mcast_router MULTICAST_ROUTER ]\n"
"  [ mcast_query_use_ifaddr 
MCAST_QUERY_USE_IFADDR ]\n"
@@ -157,6 +158,14 @@ static int bridge_parse_opt(struct link_util *lu, int 
argc, char **argv,
 
addattr16(n, 1024, IFLA_BR_VLAN_DEFAULT_PVID,
  default_pvid);
+   } else if (matches(*argv, "vlan_stats_enabled") == 0) {
+   __u8 vlan_stats_enabled;
+
+   NEXT_ARG();
+   if (get_u8(&vlan_stats_enabled, *argv, 0))
+   invarg("invalid vlan_stats_enabled", *argv);
+   addattr8(n, 1024, IFLA_BR_VLAN_STATS_ENABLED,
+ vlan_stats_enabled);
} else if (matches(*argv, "mcast_router") == 0) {
__u8 mcast_router;
 
@@ -442,6 +451,10 @@ static void bridge_print_opt(struct link_util *lu, FILE 
*f, struct rtattr *tb[])
fprintf(f, "vlan_default_pvid %u ",
rta_getattr_u16(tb[IFLA_BR_VLAN_DEFAULT_PVID]));
 
+   if (tb[IFLA_BR_VLAN_STATS_ENABLED])
+   fprintf(f, "vlan_stats_enabled %u ",
+   rta_getattr_u8(tb[IFLA_BR_VLAN_STATS_ENABLED]));
+
if (tb[IFLA_BR_GROUP_FWD_MASK])
fprintf(f, "group_fwd_mask %#x ",
rta_getattr_u16(tb[IFLA_BR_GROUP_FWD_MASK]));
-- 
2.5.5

[PATCH net-next v2] net/mlx5e: Support bpf_xdp_adjust_head()

2017-01-17 Thread Martin KaFai Lau

This patch adds bpf_xdp_adjust_head() support to mlx5e.

1. rx_headroom is added to struct mlx5e_rq.  It uses
   an existing 4 byte hole in the struct.
2. The adjusted data length is checked against
   MLX5E_XDP_MIN_INLINE and MLX5E_SW2HW_MTU(rq->netdev->mtu).
3. The macro MLX5E_SW2HW_MTU is moved from en_main.c to en.h.
   MLX5E_HW2SW_MTU is also moved to en.h for symmetry,
   although this is not strictly required.

v2:
- Keep the xdp specific logic in mlx5e_xdp_handle()
- Update dma_len after the sanity checks in mlx5e_xmit_xdp_frame()

Signed-off-by: Martin KaFai Lau 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  4 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 18 -
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   | 47 ++-
 3 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a473cea10c16..0d9dd860a295 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -51,6 +51,9 @@
 
 #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v)
 
+#define MLX5E_HW2SW_MTU(hwmtu) ((hwmtu) - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN))
+#define MLX5E_SW2HW_MTU(swmtu) ((swmtu) + (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN))
+
 #define MLX5E_MAX_NUM_TC   8
 
 #define MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE0x6
@@ -369,6 +372,7 @@ struct mlx5e_rq {
 
unsigned long  state;
intix;
+   u16rx_headroom;
 
struct mlx5e_rx_am am; /* Adaptive Moderation */
struct bpf_prog   *xdp_prog;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index f74ba73c55c7..aba3691e0919 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -343,9 +343,6 @@ static void mlx5e_disable_async_events(struct mlx5e_priv 
*priv)
synchronize_irq(mlx5_get_msix_vec(priv->mdev, MLX5_EQ_VEC_ASYNC));
 }
 
-#define MLX5E_HW2SW_MTU(hwmtu) (hwmtu - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN))
-#define MLX5E_SW2HW_MTU(swmtu) (swmtu + (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN))
-
 static inline int mlx5e_get_wqe_mtt_sz(void)
 {
/* UMR copies MTTs in units of MLX5_UMR_MTT_ALIGNMENT bytes.
@@ -534,9 +531,13 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
goto err_rq_wq_destroy;
}
 
-   rq->buff.map_dir = DMA_FROM_DEVICE;
-   if (rq->xdp_prog)
+   if (rq->xdp_prog) {
rq->buff.map_dir = DMA_BIDIRECTIONAL;
+   rq->rx_headroom = XDP_PACKET_HEADROOM;
+   } else {
+   rq->buff.map_dir = DMA_FROM_DEVICE;
+   rq->rx_headroom = MLX5_RX_HEADROOM;
+   }
 
switch (priv->params.rq_wq_type) {
case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
@@ -586,7 +587,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
byte_count = rq->buff.wqe_sz;
 
/* calc the required page order */
-   frag_sz = MLX5_RX_HEADROOM +
+   frag_sz = rq->rx_headroom +
  byte_count /* packet data */ +
  SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
frag_sz = SKB_DATA_ALIGN(frag_sz);
@@ -3153,11 +3154,6 @@ static int mlx5e_xdp_set(struct net_device *netdev, 
struct bpf_prog *prog)
bool reset, was_opened;
int i;
 
-   if (prog && prog->xdp_adjust_head) {
-   netdev_err(netdev, "Does not support bpf_xdp_adjust_head()\n");
-   return -EOPNOTSUPP;
-   }
-
mutex_lock(&priv->state_lock);
 
if ((netdev->features & NETIF_F_LRO) && prog) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 0e2fb3ed1790..20f116f8c457 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -264,7 +264,7 @@ int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct 
mlx5e_rx_wqe *wqe, u16 ix)
if (unlikely(mlx5e_page_alloc_mapped(rq, di)))
return -ENOMEM;
 
-   wqe->data.addr = cpu_to_be64(di->addr + MLX5_RX_HEADROOM);
+   wqe->data.addr = cpu_to_be64(di->addr + rq->rx_headroom);
return 0;
 }
 
@@ -646,8 +646,7 @@ static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_sq 
*sq)
 
 static inline void mlx5e_xmit_xdp_frame(struct mlx5e_rq *rq,
struct mlx5e_dma_info *di,
-   unsigned int data_offset,
-   int len)
+   const struct xdp_buff *xdp)
 {
struct mlx5e_sq  *sq   = &rq->channel->xdp_sq;
struct mlx5_wq_cyc   *wq   = &sq->wq;
@@ -659,9 +658,16 @@ static inline void mlx5e_xmit_xdp_fr

Re: [PATCH net-next v2] bridge: multicast to unicast

2017-01-17 Thread kbuild test robot

Hi Felix,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Linus-L-ssing/bridge-multicast-to-unicast/20170118-120345
config: x86_64-rhel-7.2 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   net/bridge/br_forward.c: In function 'br_multicast_flood':
>> net/bridge/br_forward.c:261:27: warning: 'port' may be used uninitialized in 
>> this function [-Wmaybe-uninitialized]
  struct net_bridge_port *port, *lport, *rport;
  ^~~~

vim +/port +261 net/bridge/br_forward.c

5cb5e947 Herbert Xu          2010-02-27  245  #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
5cb5e947 Herbert Xu          2010-02-27  246  /* called with rcu_read_lock */
37b090e6 Nikolay Aleksandrov 2016-07-14  247  void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
b35c5f63 Nikolay Aleksandrov 2016-07-14  248  			struct sk_buff *skb,
37b090e6 Nikolay Aleksandrov 2016-07-14  249  			bool local_rcv, bool local_orig)
5cb5e947 Herbert Xu          2010-02-27  250  {
5cb5e947 Herbert Xu          2010-02-27  251  	struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
1080ab95 Nikolay Aleksandrov 2016-06-28  252  	u8 igmp_type = br_multicast_igmp_type(skb);
5cb5e947 Herbert Xu          2010-02-27  253  	struct net_bridge *br = netdev_priv(dev);
afe0159d stephen hemminger   2010-04-27  254  	struct net_bridge_port *prev = NULL;
5cb5e947 Herbert Xu          2010-02-27  255  	struct net_bridge_port_group *p;
5cb5e947 Herbert Xu          2010-02-27  256  	struct hlist_node *rp;
5cb5e947 Herbert Xu          2010-02-27  257  
e8051688 Eric Dumazet        2010-11-15  258  	rp = rcu_dereference(hlist_first_rcu(&br->router_list));
83f6a740 stephen hemminger   2010-04-27  259  	p = mdst ? rcu_dereference(mdst->ports) : NULL;
5cb5e947 Herbert Xu          2010-02-27  260  	while (p || rp) {
afe0159d stephen hemminger   2010-04-27 @261  		struct net_bridge_port *port, *lport, *rport;
afe0159d stephen hemminger   2010-04-27  262  
5cb5e947 Herbert Xu          2010-02-27  263  		lport = p ? p->port : NULL;
5cb5e947 Herbert Xu          2010-02-27  264  		rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) :
5cb5e947 Herbert Xu          2010-02-27  265  			     NULL;
5cb5e947 Herbert Xu          2010-02-27  266  
507962cd Felix Fietkau       2017-01-17  267  		if ((unsigned long)lport > (unsigned long)rport) {
507962cd Felix Fietkau       2017-01-17  268  			if (p->flags & MDB_PG_FLAGS_MCAST_TO_UCAST) {
507962cd Felix Fietkau       2017-01-17  269  				maybe_deliver_addr(lport, skb, p->eth_addr,

:: The code at line 261 was first introduced by commit
:: afe0159d935ab731c682e811356914bb2be9470c bridge: multicast_flood cleanup

:: TO: stephen hemminger 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



Re: [RFC v2 00/10] HFI Virtual Network Interface Controller (VNIC)

2017-01-17 Thread Leon Romanovsky

On Tue, Jan 17, 2017 at 11:27:20AM -0800, Vishwanathapura, Niranjana wrote:
> Thanks Jason for the valuable inputs.
>
> Here is the new generic interface.
>
> Overview:
> Bottom driver defines net_device_ops. The upper driver can override it.
> For example, upper driver can implement ndo_open() which calls bottom
> driver's ndo_open() and also do some book keeping.
>
>
> include/rdma/ib_verbs.h:
>
> /* rdma netdev type - specifies protocol type */
> enum rdma_netdev_t {
>   RDMA_NETDEV_HFI_VNIC,
> };
>
> /* rdma netdev
>  * For usecases where netstack interfacing is required.
>  */
> struct rdma_netdev {
>   struct net_device *netdev;
>   u8 port_num;
>
>   /* client private data structure */
>   void *clnt_priv;
>
>   /* control functions */
>   void (*set_id)(struct rdma_netdev *rn, int id);
>   void (*set_state)(struct rdma_netdev *rn, int state);
> };
>
> struct ib_device {
>   ...
>   ...
>   /* rdma netdev operations */
>   struct net_device *(*alloc_rdma_netdev)(struct ib_device *device,
>   u8 port_num,
>   enum rdma_netdev_t type,
>   const char *name,
>   unsigned char name_assign_type,
>   void (*setup)(struct net_device *));
>   void (*free_rdma_netdev)(struct net_device *netdev);
> };
>
>
> hfi1 driver:
>
> /* rdma netdev's private data structure */
> struct hfi1_rdma_netdev {
>   struct rdma_netdev  rn; /* keep this first */
>   /* hfi1's vnic private data follows */
> };
>
>
> include/rdma/opa_hfi.h:
>
> /* Client's ndo operations use below function instead of netdev_priv() */
> static inline void *hfi_vnic_priv(const struct net_device *dev)
> {
>   struct rdma_netdev *rn = netdev_priv(dev);
>
>   return rn->clnt_priv;
> }
>
> /* Overrides rtnl_link_stats64 to include hfi_vnic stats.
>  * ndo_get_stats64() can be used to get the stats
>  */
> struct hfi_vnic_stats {
>   /* standard netdev statistics */
>   struct rtnl_link_stats64  netstat;
>
>   /* HFI VNIC statistics */
>   u64  tx_mcastbcast;
>   u64  tx_untagged;
>   u64  tx_vlan;
>   u64  tx_64_size;
>   u64  tx_65_127;
>   u64  tx_128_255;
>   u64  tx_256_511;
>   u64  tx_512_1023;
>   u64  tx_1024_1518;
>   u64  tx_1519_max;
>
>   u64  rx_untagged;
>   u64  rx_vlan;
>   u64  rx_64_size;
>   u64  rx_65_127;
>   u64  rx_128_255;
>   u64  rx_256_511;
>   u64  rx_512_1023;
>   u64  rx_1024_1518;
>   u64  rx_1519_max;
>
>   u64  rx_runt;
>   u64  rx_oversize;
> };
>
> I have started working on porting hfi_vnic as per this new interface.
> I will post RFC v3 later.
> Posting the interface definition early for comments.

I wonder how many people will comment on it without seeing a usage example.

>
> Thanks,
> Niranjana
>
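
A minimal sketch of the override pattern described in the overview above, where an upper driver wraps the bottom driver's ndo_open() and adds its own bookkeeping. All types and names here (struct netdev, upper_attach(), and so on) are hypothetical userspace stand-ins, not the proposed rdma_netdev interface itself.

```c
#include <stddef.h>

struct netdev;

/* Hypothetical, cut-down analogue of net_device_ops. */
struct netdev_ops {
	int (*open)(struct netdev *dev);
};

/* Hypothetical, cut-down analogue of a net_device. */
struct netdev {
	struct netdev_ops ops;
	int opened;
	void *priv;		/* upper driver's private data */
};

/* State kept by the upper driver. */
struct upper_priv {
	int (*lower_open)(struct netdev *dev);	/* saved bottom-driver op */
	int open_count;				/* the upper driver's bookkeeping */
};

/* Bottom driver's open: brings the device up. */
static int lower_open(struct netdev *dev)
{
	dev->opened = 1;
	return 0;
}

/* Upper driver's open: call the bottom driver first, then do bookkeeping. */
static int upper_open(struct netdev *dev)
{
	struct upper_priv *up = dev->priv;
	int ret = up->lower_open(dev);

	if (!ret)
		up->open_count++;
	return ret;
}

/* Save the bottom driver's op and install the upper driver's wrapper. */
static void upper_attach(struct netdev *dev, struct upper_priv *up)
{
	up->lower_open = dev->ops.open;
	up->open_count = 0;
	dev->priv = up;
	dev->ops.open = upper_open;
}
```

After upper_attach(), calling dev->ops.open(dev) runs both layers, so the bottom driver never needs to know that it has been wrapped.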



RE: [PATCH v2] net: fec: Fixed panic problem with non-tso

2017-01-17 Thread Andy Duan

From: Eric Dumazet  Sent: Wednesday, January 18, 2017 1:02 PM
>To: Yuusuke Ashiduka 
>Cc: Andy Duan ; netdev@vger.kernel.org
>Subject: Re: [PATCH v2] net: fec: Fixed panic problem with non-tso
>
>On Wed, 2017-01-18 at 13:11 +0900, Yuusuke Ashiduka wrote:
>> If highmem and 2GB or more of memory are valid, "this_frag-> page.p"
>> indicates the highmem area, so the result of page_address() is NULL
>> and panic occurs.
>>
>> This commit fixes this by using the skb_frag_dma_map() helper, which
>> takes care of mapping the skb fragment properly. Additionally, the
>> type of mapping is now tracked, so it can be unmapped using
>> dma_unmap_page or dma_unmap_single when appropriate.
>
>
>I would prefer we fix the root cause, instead of tweaking all legacy drivers
>out there :/
>
>
I agree with you.

The driver has never supported highmem. Fragments should not be allocated
from highmem unless there is a bug in the common code.
If the driver is required to support the NETIF_F_HIGHDMA feature, we also need
to add highmem support to the TSO path.

Andy

Re: [PATCH v2] net: fec: Fixed panic problem with non-tso

2017-01-17 Thread Eric Dumazet

On Wed, 2017-01-18 at 13:11 +0900, Yuusuke Ashiduka wrote:
> If highmem and 2GB or more of memory are valid,
> "this_frag-> page.p" indicates the highmem area,
> so the result of page_address() is NULL and panic occurs.
> 
> This commit fixes this by using the skb_frag_dma_map() helper,
> which takes care of mapping the skb fragment properly. Additionally,
> the type of mapping is now tracked, so it can be unmapped using
> dma_unmap_page or dma_unmap_single when appropriate.


I would prefer we fix the root cause, instead of tweaking all legacy
drivers out there :/

[PATCH net-next] mlx4: support __GFP_MEMALLOC for rx

2017-01-17 Thread Eric Dumazet

From: Eric Dumazet 

Commit 04aeb56a1732 ("net/mlx4_en: allocate non 0-order pages for RX
ring with __GFP_NOMEMALLOC") added code that appears not to have been needed
at that time, since mlx4 never used __GFP_MEMALLOC allocations anyway.

As using memory reserves is a must in some situations (swap over NFS or
iSCSI), this patch adds this flag.

Note that this driver does not reuse pages (yet) so we do not have to
add anything else.

Signed-off-by: Eric Dumazet 
Cc: Konstantin Khlebnikov 
Cc: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index eac527e25ec902c2a586e9952272b9e8e599e2c8..e362f99334d03c0df4d88320977670015870dd9c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -706,7 +706,8 @@ static bool mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
         do {
                 if (mlx4_en_prepare_rx_desc(priv, ring,
                                             ring->prod & ring->size_mask,
-                                            GFP_ATOMIC | __GFP_COLD))
+                                            GFP_ATOMIC | __GFP_COLD |
+                                            __GFP_MEMALLOC))
                         break;
                 ring->prod++;
         } while (--missing);

Re: [PATCH] net: fec: Fixed panic problem with non-tso

2017-01-17 Thread Eric Dumazet

On Tue, 2017-01-17 at 20:21 -0800, Eric Dumazet wrote:
> On Wed, 2017-01-18 at 03:12 +, Ashizuka, Yuusuke wrote:
> 
> > indeed.
> > 
> > In the case of TSO with i.MX6 system (highmem enabled) with 2GB memory,
> > "this_frag->page.p" did not become highmem area.
> > (We confirmed by transferring about 100MB of files)
> > 
> > However, in the case of non-tso on an i.MX6 system with 2GB of memory,
> > "this_frag->page.p" may become a highmem area.
> > (Occurred with approximately 2MB of file transfer)
> > 
> > For non-tso only, I do not know the reason why "this_frag-> page.p" 
> > in this driver shows highmem area.
> 
> This worries me, since this driver does not set NETIF_F_HIGHDMA in its
> features.
> 
> No packet should be given to this driver with a highmem fragment
> 
> Check is done in illegal_highdma() in net/core/dev.c

This used to work.

I suspect commit ec5f061564238892005257c83565a0b58ec79295
("net: Kill link between CSUM and SG features.")

added this bug.

Can you try this hot fix :

diff --git a/net/core/dev.c b/net/core/dev.c
index ad5959e561166f445bdd9d7260652a338f74cfea..073b832b945257dba9ed47f4bf875605225effc9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2773,9 +2773,9 @@ static netdev_features_t harmonize_features(struct sk_buff *skb,
         if (skb->ip_summed != CHECKSUM_NONE &&
             !can_checksum_protocol(features, type)) {
                 features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
-        } else if (illegal_highdma(skb->dev, skb)) {
-                features &= ~NETIF_F_SG;
         }
+        if (illegal_highdma(skb->dev, skb))
+                features &= ~NETIF_F_SG;
 
         return features;
 }
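
The effect of the hot fix can be seen in a small userspace model. This is only a sketch: the flag names and struct pkt below are hypothetical stand-ins for the kernel's NETIF_F_* bits and the skb, chosen to show why an else-if lets a highmem fragment slip through when the checksum branch is taken.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for the kernel's NETIF_F_* flags. */
enum {
	F_SG   = 1 << 0,
	F_CSUM = 1 << 1,
	F_GSO  = 1 << 2,
};

/* Simplified packet state: only the two properties the checks depend on. */
struct pkt {
	bool csum_unsupported;	/* checksum offload not possible for this protocol */
	bool has_highmem_frag;	/* at least one fragment lives in highmem */
};

/* Before the fix: the highmem check is skipped whenever the first branch runs. */
static uint32_t harmonize_old(const struct pkt *p, uint32_t features)
{
	if (p->csum_unsupported)
		features &= ~(uint32_t)(F_CSUM | F_GSO);
	else if (p->has_highmem_frag)
		features &= ~(uint32_t)F_SG;
	return features;
}

/* After the fix: the two checks are independent of each other. */
static uint32_t harmonize_new(const struct pkt *p, uint32_t features)
{
	if (p->csum_unsupported)
		features &= ~(uint32_t)(F_CSUM | F_GSO);
	if (p->has_highmem_frag)
		features &= ~(uint32_t)F_SG;
	return features;
}
```

For a packet with both properties set, harmonize_old() leaves F_SG enabled, which is the case where a driver without highmem support would be handed a highmem fragment; harmonize_new() clears it.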

Re: [PATCH] net: fec: Fixed panic problem with non-tso

2017-01-17 Thread Eric Dumazet

On Wed, 2017-01-18 at 03:12 +, Ashizuka, Yuusuke wrote:

> indeed.
> 
> In the case of TSO with i.MX6 system (highmem enabled) with 2GB memory,
> "this_frag->page.p" did not become highmem area.
> (We confirmed by transferring about 100MB of files)
> 
> However, in the case of non-tso on an i.MX6 system with 2GB of memory,
> "this_frag->page.p" may become a highmem area.
> (Occurred with approximately 2MB of file transfer)
> 
> For non-tso only, I do not know the reason why "this_frag-> page.p" 
> in this driver shows highmem area.

This worries me, since this driver does not set NETIF_F_HIGHDMA in its
features.

No packet should be given to this driver with a highmem fragment

Check is done in illegal_highdma() in net/core/dev.c
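
For readers unfamiliar with that check, here is a simplified userspace model of the idea: a packet is "illegal" for a device when the device does not advertise highmem DMA support and any fragment lives in highmem. The types and the F_HIGHDMA bit are hypothetical stand-ins; this is a sketch of the logic, not the kernel function.

```c
#include <stdbool.h>
#include <stddef.h>

#define F_HIGHDMA 0x1u	/* hypothetical stand-in for NETIF_F_HIGHDMA */

/* One packet fragment; only its memory zone matters here. */
struct frag {
	bool in_highmem;
};

/* A packet as a list of fragments. */
struct pkt {
	const struct frag *frags;
	size_t nr_frags;
};

/* True if the device cannot DMA from highmem but the packet needs it. */
static bool illegal_highdma_model(unsigned int dev_features,
				  const struct pkt *p)
{
	size_t i;

	if (dev_features & F_HIGHDMA)
		return false;	/* device copes with highmem fragments */

	for (i = 0; i < p->nr_frags; i++)
		if (p->frags[i].in_highmem)
			return true;

	return false;
}
```

When the model returns true, the stack is expected to linearize the packet rather than hand the fragments to the driver.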

[PATCH v2] net: fec: Fixed panic problem with non-tso

2017-01-17 Thread Yuusuke Ashiduka

If highmem is enabled and the system has 2GB or more of memory,
"this_frag->page.p" can point into the highmem area,
so page_address() returns NULL and a panic occurs.

This commit fixes this by using the skb_frag_dma_map() helper,
which takes care of mapping the skb fragment properly. Additionally,
the type of mapping is now tracked, so it can be unmapped using
dma_unmap_page or dma_unmap_single when appropriate.

Signed-off-by: Yuusuke Ashiduka 
---

Changes for v2:
 - Added signed-off
---
 drivers/net/ethernet/freescale/fec.h  |  1 +
 drivers/net/ethernet/freescale/fec_main.c | 48 +++
 2 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h 
b/drivers/net/ethernet/freescale/fec.h
index 5ea740b4cf14..5b187e8aacf0 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -463,6 +463,7 @@ struct bufdesc_prop {
 struct fec_enet_priv_tx_q {
struct bufdesc_prop bd;
unsigned char *tx_bounce[TX_RING_SIZE];
+   int tx_page_mapping[TX_RING_SIZE];
struct  sk_buff *tx_skbuff[TX_RING_SIZE];
 
unsigned short tx_stop_threshold;
diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 38160c2bebcb..b1562107e337 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -60,6 +60,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -377,20 +378,28 @@ fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q 
*txq,
ebdp->cbd_esc = cpu_to_fec32(estatus);
}
 
-   bufaddr = page_address(this_frag->page.p) + 
this_frag->page_offset;
-
index = fec_enet_get_bd_index(bdp, &txq->bd);
-   if (((unsigned long) bufaddr) & fep->tx_align ||
+   txq->tx_page_mapping[index] = 0;
+   if (this_frag->page_offset & fep->tx_align ||
fep->quirks & FEC_QUIRK_SWAP_FRAME) {
+   bufaddr = kmap_atomic(this_frag->page.p) +
+   this_frag->page_offset;
memcpy(txq->tx_bounce[index], bufaddr, frag_len);
+   kunmap_atomic(bufaddr);
bufaddr = txq->tx_bounce[index];
 
if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
swap_buffer(bufaddr, frag_len);
+   addr = dma_map_single(&fep->pdev->dev,
+ bufaddr,
+ frag_len,
+ DMA_TO_DEVICE);
+   } else {
+   txq->tx_page_mapping[index] = 1;
+   addr = skb_frag_dma_map(&fep->pdev->dev, this_frag, 0,
+   frag_len, DMA_TO_DEVICE);
}
 
-   addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
- DMA_TO_DEVICE);
if (dma_mapping_error(&fep->pdev->dev, addr)) {
if (net_ratelimit())
netdev_err(ndev, "Tx DMA memory map failed\n");
@@ -411,8 +420,16 @@ fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q 
*txq,
bdp = txq->bd.cur;
for (i = 0; i < frag; i++) {
bdp = fec_enet_get_nextdesc(bdp, &txq->bd);
-   dma_unmap_single(&fep->pdev->dev, 
fec32_to_cpu(bdp->cbd_bufaddr),
-fec16_to_cpu(bdp->cbd_datlen), DMA_TO_DEVICE);
+   if (txq->tx_page_mapping[index])
+   dma_unmap_page(&fep->pdev->dev,
+  fec32_to_cpu(bdp->cbd_bufaddr),
+  fec16_to_cpu(bdp->cbd_datlen),
+  DMA_TO_DEVICE);
+   else
+   dma_unmap_single(&fep->pdev->dev,
+fec32_to_cpu(bdp->cbd_bufaddr),
+fec16_to_cpu(bdp->cbd_datlen),
+DMA_TO_DEVICE);
}
return ERR_PTR(-ENOMEM);
 }
@@ -1201,11 +1218,18 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
 
skb = txq->tx_skbuff[index];
txq->tx_skbuff[index] = NULL;
-   if (!IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr)))
-   dma_unmap_single(&fep->pdev->dev,
-fec32_to_cpu(bdp->cbd_bufaddr),
-fec16_to_cpu(bdp->cbd_datlen),
-DMA_TO_DEVICE);
+   if (!IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr))) {
+   if (txq->tx_page_mapping[index])
+   dma_unmap_page(&fep->pdev->

Re: [PATCH] virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit

2017-01-17 Thread Jason Wang




On 2017-01-18 02:27, Michael S. Tsirkin wrote:

On Tue, Jan 17, 2017 at 06:13:51PM +, Rolf Neugebauer wrote:

This patch partially reverts fd2a0437dc33 and e858fae2b0b8, which introduced a
subtle change in how the virtio_net flags are derived from the SKB's
ip_summed field.

With the above commits, the flags are set to VIRTIO_NET_HDR_F_DATA_VALID
when ip_summed == CHECKSUM_UNNECESSARY, thus treating it differently to
ip_summed == CHECKSUM_NONE, which should be the same.

Further, the virtio spec 1.0 / CS04 explicitly says that
VIRTIO_NET_HDR_F_DATA_VALID must not be set by the driver.

Signed-off-by: Rolf Neugebauer 

Fixes: fd2a0437dc33 ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
Fixes: e858fae2b0b8 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
Acked-by: Michael S. Tsirkin 

Should be backported into stable as well.


It looks like a side effect is that we will never see this flag on the receive
path? We probably need a hint for virtio_net_hdr_from_skb().


Thanks





---
  include/linux/virtio_net.h | 2 --
  1 file changed, 2 deletions(-)

diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 66204007d7ac..56436472ccc7 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -91,8 +91,6 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
                                 skb_checksum_start_offset(skb));
                 hdr->csum_offset = __cpu_to_virtio16(little_endian,
                                 skb->csum_offset);
-        } else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
-                hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
         } /* else everything is zero */
 
         return 0;

--
2.11.0

Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head

2017-01-17 Thread Jason Wang




On 2017-01-18 06:22, John Fastabend wrote:
  
+static int virtnet_reset(struct virtnet_info *vi)

+{
+   struct virtio_device *dev = vi->vdev;
+   int ret;
+
+   virtio_config_disable(dev);
+   dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
+   virtnet_freeze_down(dev);
+   _remove_vq_common(vi);
+
+   dev->config->reset(dev);
+   virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+   virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+
+   ret = virtio_finalize_features(dev);
+   if (ret)
+   goto err;
+
+   ret = virtnet_restore_up(dev);
+   if (ret)
+   goto err;
+   ret = _virtnet_set_queues(vi, vi->curr_queue_pairs);
+   if (ret)
+   goto err;
+
+   virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+   virtio_config_enable(dev);
+   return 0;
+err:
+   virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
+   return ret;
+}
+


Hi John:

I would still prefer not to open-code (parts of) virtio_device_freeze() and
virtio_device_restore() here. How about:

1) introduce __virtio_device_freeze()/__virtio_device_restore(), which
accept a function pointer for freeze/restore
2) for virtio_device_freeze()/virtio_device_restore(), just pass
drv->freeze/drv->restore (the locked versions)
3) for virtnet_reset(), pass the unlocked versions of freeze and restore

Just my preference; if both you and Michael prefer to stick with the current
approach, I'm fine with that too.

Thanks
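
The three steps above could look roughly like the following userspace sketch. It is an illustration only: struct vdev, the lock_held flag, and every function name are hypothetical stand-ins, and only the freeze half is shown (restore would mirror it).

```c
/* Hypothetical stand-in for struct virtio_device; the lock is a plain flag. */
struct vdev {
	int lock_held;
	int frozen;
	int freeze_calls;
};

typedef int (*vdev_freeze_fn)(struct vdev *dev);

/* Driver freeze step that assumes the caller already holds the lock. */
static int drv_freeze_unlocked(struct vdev *dev)
{
	dev->freeze_calls++;
	return 0;
}

/* Driver freeze step that takes the lock itself. */
static int drv_freeze_locked(struct vdev *dev)
{
	int ret;

	if (dev->lock_held)
		return -1;	/* real code would deadlock here */
	dev->lock_held = 1;
	ret = drv_freeze_unlocked(dev);
	dev->lock_held = 0;
	return ret;
}

/* Step 1: shared body; the driver-specific step is passed in. */
static int vdev_freeze_common(struct vdev *dev, vdev_freeze_fn freeze)
{
	int ret = freeze ? freeze(dev) : 0;

	if (ret)
		return ret;
	dev->frozen = 1;
	return 0;
}

/* Step 2: suspend path, no lock held on entry, so use the locked variant. */
static int vdev_freeze(struct vdev *dev)
{
	return vdev_freeze_common(dev, drv_freeze_locked);
}

/* Step 3: reset path, caller holds the lock, so use the unlocked variant. */
static int vdev_reset_freeze(struct vdev *dev)
{
	return vdev_freeze_common(dev, drv_freeze_unlocked);
}
```

The point of the shape is that the common sequence exists once, and each caller only chooses which locking flavour of the driver callback to hand in.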

RE: [PATCH] net: fec: Fixed panic problem with non-tso

2017-01-17 Thread Ashizuka, Yuusuke

> -----Original Message-----
> From: Andy Duan [mailto:fugang.d...@nxp.com]
> Sent: Tuesday, January 17, 2017 8:02 PM
> To: Ashizuka, Yuusuke/芦塚 雄介
> Cc: netdev@vger.kernel.org
> Subject: RE: [PATCH] net: fec: Fixed panic problem with non-tso
> 
> From: Yuusuke Ashiduka  Sent: Tuesday, January
> 17, 2017 3:48 PM
> >To: Andy Duan 
> >Cc: netdev@vger.kernel.org; Yuusuke Ashiduka 
> >Subject: [PATCH] net: fec: Fixed panic problem with non-tso
> >
> >If highmem and 2GB or more of memory are valid, "this_frag-> page.p"
> >indicates the highmem area, so the result of page_address() is NULL and
> >panic occurs.
> >
> >This commit fixes this by using the skb_frag_dma_map() helper, which
> >takes care of mapping the skb fragment properly. Additionally, the type
> >of mapping is now tracked, so it can be unmapped using dma_unmap_page
> >or dma_unmap_single when appropriate.
> >---
> > drivers/net/ethernet/freescale/fec.h  |  1 +
> > drivers/net/ethernet/freescale/fec_main.c | 48
> >+++
> > 2 files changed, 37 insertions(+), 12 deletions(-)
> >
> The patch itself seems fine.
> The driver doesn't support skb from highmem, if to support highmem, it should
> add frag_skb (highmem) support for tso and non-tso.
> In driver net/core/tso.c, it also add highmem support, right ?

Indeed.

In the TSO case on an i.MX6 system (highmem enabled) with 2GB of memory,
"this_frag->page.p" never pointed into the highmem area.
(We confirmed this by transferring about 100MB of files.)

However, in the non-TSO case on an i.MX6 system with 2GB of memory,
"this_frag->page.p" may point into the highmem area.
(This occurred after transferring approximately 2MB of files.)

For the non-TSO case only, I do not know why "this_frag->page.p"
in this driver points into the highmem area.

Thanks.

> 
> Thanks.
> 
> >diff --git a/drivers/net/ethernet/freescale/fec.h
> >b/drivers/net/ethernet/freescale/fec.h
> >index 5ea740b4cf14..5b187e8aacf0 100644
> >--- a/drivers/net/ethernet/freescale/fec.h
> >+++ b/drivers/net/ethernet/freescale/fec.h
> >@@ -463,6 +463,7 @@ struct bufdesc_prop {  struct fec_enet_priv_tx_q {
> > struct bufdesc_prop bd;
> > unsigned char *tx_bounce[TX_RING_SIZE];
> >+int tx_page_mapping[TX_RING_SIZE];
> > struct  sk_buff *tx_skbuff[TX_RING_SIZE];
> >
> > unsigned short tx_stop_threshold;
> >diff --git a/drivers/net/ethernet/freescale/fec_main.c
> >b/drivers/net/ethernet/freescale/fec_main.c
> >index 38160c2bebcb..b1562107e337 100644
> >--- a/drivers/net/ethernet/freescale/fec_main.c
> >+++ b/drivers/net/ethernet/freescale/fec_main.c
> >@@ -60,6 +60,7 @@
> > #include 
> > #include 
> > #include 
> >+#include 
> > #include 
> >
> > #include 
> >@@ -377,20 +378,28 @@ fec_enet_txq_submit_frag_skb(struct
> >fec_enet_priv_tx_q *txq,
> > ebdp->cbd_esc = cpu_to_fec32(estatus);
> > }
> >
> >-bufaddr = page_address(this_frag->page.p) + this_frag-
> >>page_offset;
> >-
> > index = fec_enet_get_bd_index(bdp, &txq->bd);
> >-if (((unsigned long) bufaddr) & fep->tx_align ||
> >+txq->tx_page_mapping[index] = 0;
> >+if (this_frag->page_offset & fep->tx_align ||
> > fep->quirks & FEC_QUIRK_SWAP_FRAME) {
> >+bufaddr = kmap_atomic(this_frag->page.p) +
> >+
>   this_frag->page_offset;
> > memcpy(txq->tx_bounce[index], bufaddr,
> frag_len);
> >+kunmap_atomic(bufaddr);
> > bufaddr = txq->tx_bounce[index];
> >
> > if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> > swap_buffer(bufaddr, frag_len);
> >+addr = dma_map_single(&fep->pdev->dev,
> >+  bufaddr,
> >+  frag_len,
> >+  DMA_TO_DEVICE);
> >+} else {
> >+txq->tx_page_mapping[index] = 1;
> >+addr = skb_frag_dma_map(&fep->pdev->dev,
> >this_frag, 0,
> >+frag_len,
> DMA_TO_DEVICE);
> > }
> >
> >-addr = dma_map_single(&fep->pdev->dev, bufaddr,
> frag_len,
> >-  DMA_TO_DEVICE);
> > if (dma_mapping_error(&fep->pdev->dev, addr)) {
> > if (net_ratelimit())
> > netdev_err(ndev, "Tx DMA memory map
> failed\n"); @@ -411,8 +420,16
> >@@ fec_enet_txq_submit_frag_skb(struct
> >fec_enet_priv_tx_q *txq,
> > bdp = txq->bd.cur;
> > for (i = 0; i < frag; i++) {
> > bdp = fec_enet_get_nextdesc(bdp, &txq->bd);
> >-dma_unmap_single(&fep->pdev->dev, fec32_to_cpu(bdp-
> >>cbd_bufaddr),
> >- fec16_to_cpu(bdp->cbd_datlen),
> >DMA_TO_DEVICE);
> >+if (txq->tx_page_mapping[index])
> >+dma_unmap_page(&fep->pdev->dev,
> >+

RE: [PATCH] net: fec: Fixed panic problem with non-tso

2017-01-17 Thread Ashizuka, Yuusuke

> -----Original Message-----
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Wednesday, January 18, 2017 5:45 AM
> To: Ashizuka, Yuusuke/芦塚 雄介
> Cc: fugang.d...@nxp.com; netdev@vger.kernel.org
> Subject: Re: [PATCH] net: fec: Fixed panic problem with non-tso
> 
> From: Yuusuke Ashiduka 
> Date: Tue, 17 Jan 2017 16:48:20 +0900
> 
> > If highmem and 2GB or more of memory are valid, "this_frag-> page.p"
> > indicates the highmem area, so the result of page_address() is NULL
> > and panic occurs.
> >
> > This commit fixes this by using the skb_frag_dma_map() helper, which
> > takes care of mapping the skb fragment properly. Additionally, the
> > type of mapping is now tracked, so it can be unmapped using
> > dma_unmap_page or dma_unmap_single when appropriate.
> 
> This patch submission is lacking a proper signoff.

Thank you for pointing out my mistake.
I will submit the patch again.

[PATCH net] bnxt_en: Fix "uninitialized variable" bug in TPA code path.

2017-01-17 Thread Michael Chan

In the TPA GRO code path, initialize the tcp_opt_len variable to 0 so
that it will be correct for packets without TCP timestamps.  The bug
caused the SKB fields to be incorrectly set up for packets without
TCP timestamps, leading to these packets being rejected by the stack.

Reported-by: Andy Gospodarek 
Acked-by: Andy Gospodarek 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9608cb4..53e686f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1099,7 +1099,7 @@ static struct sk_buff *bnxt_gro_func_5730x(struct bnxt_tpa_info *tpa_info,
 {
 #ifdef CONFIG_INET
         struct tcphdr *th;
-        int len, nw_off, tcp_opt_len;
+        int len, nw_off, tcp_opt_len = 0;
 
         if (tcp_ts)
                 tcp_opt_len = 12;
-- 
1.8.3.1
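
A minimal sketch of the bug class this patch fixes, using a hypothetical helper rather than the driver function: a length that is only assigned on the timestamp path must start at 0, otherwise the no-timestamp path uses an indeterminate value.

```c
/*
 * Hypothetical helper: TCP header length depending on whether the
 * timestamp option is present. Not the driver code itself.
 */
static int tcp_hdr_len(int tcp_ts)
{
	int tcp_opt_len = 0;	/* without "= 0" the !tcp_ts path reads garbage */

	if (tcp_ts)
		tcp_opt_len = 12;	/* 10-byte timestamp option plus 2 bytes of padding */

	return 20 + tcp_opt_len;	/* 20 bytes is the base TCP header */
}
```

With the initializer in place, tcp_hdr_len(0) is 20 and tcp_hdr_len(1) is 32.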

Re: [PATCH] net: ethernet: stmmac: add ARP management

2017-01-17 Thread kbuild test robot

Hi Christophe,

[auto build test WARNING on net-next/master]
[also build test WARNING on next-20170117]
[cannot apply to v4.10-rc4]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Christophe-Roullier/net-ethernet-stmmac-add-ARP-management/20170118-084026
config: x86_64-kexec (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function 
'stmmac_dvr_probe':
>> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3296:11: warning: passing argument 3 of 'priv->hw->dma->set_arp_addr' makes integer from pointer without a cast [-Wint-conversion]
              priv->dev->dev_addr);
              ^~~~
   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3296:11: note: expected 'u32 {aka unsigned int}' but argument is of type 'unsigned char *'
vim +3296 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c

  3280  NETIF_F_RXCSUM;
  3281  
  3282  if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) {
  3283  ndev->hw_features |= NETIF_F_TSO;
  3284  priv->tso = true;
  3285  dev_info(priv->device, "TSO feature enabled\n");
  3286  }
  3287  
  3288  if ((priv->plat->arp_en) && (priv->dma_cap.arpoffsel)) {
  3289  ret = priv->hw->mac->arp_en(priv->hw);
  3290  if (!ret) {
  3291  pr_warn(" ARP feature disabled\n");
  3292  } else {
  3293  pr_info(" ARP feature enabled\n");
  3294  /* Copy MAC addr into MAC_ARP_ADDRESS register*/
  3295  priv->hw->dma->set_arp_addr(priv->ioaddr, 1,
> 3296  priv->dev->dev_addr);
  3297  }
  3298  }
  3299  
  3300  ndev->features |= ndev->hw_features | NETIF_F_HIGHDMA;
  3301  ndev->watchdog_timeo = msecs_to_jiffies(watchdog);
  3302  #ifdef STMMAC_VLAN_TAG_USED
  3303  /* Both mac100 and gmac support receive VLAN tag detection */
  3304  ndev->features |= NETIF_F_HW_VLAN_CTAG_RX;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation



[PATCH net-next 2/2] net: dsa: use cpu_switch instead of ds[0]

2017-01-17 Thread Vivien Didelot

Now that the DSA Ethernet switches are true Linux devices, the CPU
switch is not necessarily the first one. If its address is higher than
the second switch on the same MDIO bus, its index will be 1, not 0.

Avoid any confusion by using dst->cpu_switch instead of dst->ds[0].

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa.c | 2 +-
 net/dsa/dsa2.c| 8 
 net/dsa/slave.c   | 6 +++---
 net/dsa/tag_brcm.c| 2 +-
 net/dsa/tag_qca.c | 2 +-
 net/dsa/tag_trailer.c | 2 +-
 6 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index cb42655ba7da..87f2a9c9fa12 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -868,7 +868,7 @@ static void dsa_remove_dst(struct dsa_switch_tree *dst)
dsa_switch_destroy(ds);
}
 
-   dsa_cpu_port_ethtool_restore(dst->ds[0]);
+   dsa_cpu_port_ethtool_restore(dst->cpu_switch);
 
dev_put(dst->master_netdev);
 }
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index a9bf28d9f41f..634c6700a179 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -381,8 +381,8 @@ static int dsa_dst_apply(struct dsa_switch_tree *dst)
return err;
}
 
-   if (dst->ds[0]) {
-   err = dsa_cpu_port_ethtool_setup(dst->ds[0]);
+   if (dst->cpu_switch) {
+   err = dsa_cpu_port_ethtool_setup(dst->cpu_switch);
if (err)
return err;
}
@@ -426,8 +426,8 @@ static void dsa_dst_unapply(struct dsa_switch_tree *dst)
dsa_ds_unapply(dst, ds);
}
 
-   if (dst->ds[0])
-   dsa_cpu_port_ethtool_restore(dst->ds[0]);
+   if (dst->cpu_switch)
+   dsa_cpu_port_ethtool_restore(dst->cpu_switch);
 
pr_info("DSA: tree %d unapplied\n", dst->tree);
dst->applied = false;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0cdcaf526987..b8e58689a9a1 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -781,7 +781,7 @@ static void dsa_cpu_port_get_ethtool_stats(struct 
net_device *dev,
   uint64_t *data)
 {
struct dsa_switch_tree *dst = dev->dsa_ptr;
-   struct dsa_switch *ds = dst->ds[0];
+   struct dsa_switch *ds = dst->cpu_switch;
s8 cpu_port = dst->cpu_port;
int count = 0;
 
@@ -798,7 +798,7 @@ static void dsa_cpu_port_get_ethtool_stats(struct 
net_device *dev,
 static int dsa_cpu_port_get_sset_count(struct net_device *dev, int sset)
 {
struct dsa_switch_tree *dst = dev->dsa_ptr;
-   struct dsa_switch *ds = dst->ds[0];
+   struct dsa_switch *ds = dst->cpu_switch;
int count = 0;
 
if (dst->master_ethtool_ops.get_sset_count)
@@ -814,7 +814,7 @@ static void dsa_cpu_port_get_strings(struct net_device *dev,
 uint32_t stringset, uint8_t *data)
 {
struct dsa_switch_tree *dst = dev->dsa_ptr;
-   struct dsa_switch *ds = dst->ds[0];
+   struct dsa_switch *ds = dst->cpu_switch;
s8 cpu_port = dst->cpu_port;
int len = ETH_GSTRING_LEN;
int mcount = 0, count;
diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c
index 21bffde6e4bf..af82927674e0 100644
--- a/net/dsa/tag_brcm.c
+++ b/net/dsa/tag_brcm.c
@@ -102,7 +102,7 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct 
net_device *dev,
if (unlikely(dst == NULL))
goto out_drop;
 
-   ds = dst->ds[0];
+   ds = dst->cpu_switch;
 
skb = skb_unshare(skb, GFP_ATOMIC);
if (skb == NULL)
diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
index 0c90cacee7aa..736ca8e8c31e 100644
--- a/net/dsa/tag_qca.c
+++ b/net/dsa/tag_qca.c
@@ -104,7 +104,7 @@ static int qca_tag_rcv(struct sk_buff *skb, struct 
net_device *dev,
/* This protocol doesn't support cascading multiple switches so it's
 * safe to assume the switch is first in the tree
 */
-   ds = dst->ds[0];
+   ds = dst->cpu_switch;
if (!ds)
goto out_drop;
 
diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c
index 5e3903eb1afa..271128a2dc64 100644
--- a/net/dsa/tag_trailer.c
+++ b/net/dsa/tag_trailer.c
@@ -67,7 +67,7 @@ static int trailer_rcv(struct sk_buff *skb, struct net_device 
*dev,
 
if (unlikely(dst == NULL))
goto out_drop;
-   ds = dst->ds[0];
+   ds = dst->cpu_switch;
 
skb = skb_unshare(skb, GFP_ATOMIC);
if (skb == NULL)
-- 
2.11.0
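
A small userspace sketch of the situation this patch addresses: when the CPU-attached switch is registered at index 1, ds[0] is the wrong switch, while a dedicated cpu_switch pointer stays correct. The structs are cut-down hypothetical analogues of dsa_switch and dsa_switch_tree, and tree_add() is an invented helper.

```c
#include <stddef.h>

#define MAX_SWITCHES 4

struct sw_tree;

/* Cut-down analogue of a switch. */
struct sw {
	int index;		/* position in sw_tree.ds[] */
	struct sw_tree *dst;	/* tree this switch belongs to */
};

/* Cut-down analogue of a switch tree. */
struct sw_tree {
	struct sw *ds[MAX_SWITCHES];
	struct sw *cpu_switch;	/* switch the CPU is attached to */
	int cpu_port;
};

/* Register a switch at the given index (hypothetical helper). */
static void tree_add(struct sw_tree *t, struct sw *s, int index)
{
	s->index = index;
	s->dst = t;
	t->ds[index] = s;
}

/* Pointer comparison: correct whatever index the CPU switch has. */
static int is_cpu_port(const struct sw *s, int port)
{
	return s == s->dst->cpu_switch && port == s->dst->cpu_port;
}
```

With two switches where the second one carries the CPU port, t.ds[0] and t.cpu_switch differ, which is exactly the case the commit message describes.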

[PATCH net-next 1/2] net: dsa: store CPU switch structure in the tree

2017-01-17 Thread Vivien Didelot

Store a dsa_switch pointer to the CPU switch in the tree instead of only
its index. This avoids the need to initialize it to -1.

Signed-off-by: Vivien Didelot 
---
 include/net/dsa.h | 8 
 net/dsa/dsa.c | 7 +++
 net/dsa/dsa2.c| 5 ++---
 3 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 454667952d6d..82f7019f27f2 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -124,7 +124,7 @@ struct dsa_switch_tree {
/*
 * The switch and port to which the CPU is attached.
 */
-   s8  cpu_switch;
+   struct dsa_switch   *cpu_switch;
s8  cpu_port;
 
/*
@@ -211,7 +211,7 @@ struct dsa_switch {
 
 static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p)
 {
-   return !!(ds->index == ds->dst->cpu_switch && p == ds->dst->cpu_port);
+   return !!(ds == ds->dst->cpu_switch && p == ds->dst->cpu_port);
 }
 
 static inline bool dsa_is_dsa_port(struct dsa_switch *ds, int p)
@@ -234,10 +234,10 @@ static inline u8 dsa_upstream_port(struct dsa_switch *ds)
 * Else return the (DSA) port number that connects to the
 * switch that is one hop closer to the cpu.
 */
-   if (dst->cpu_switch == ds->index)
+   if (dst->cpu_switch == ds)
return dst->cpu_port;
else
-   return ds->rtable[dst->cpu_switch];
+   return ds->rtable[dst->cpu_switch->index];
 }
 
 struct switchdev_trans;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 96d1544df518..cb42655ba7da 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -225,12 +225,12 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)
continue;
 
if (!strcmp(name, "cpu")) {
-   if (dst->cpu_switch != -1) {
+   if (dst->cpu_switch) {
netdev_err(dst->master_netdev,
   "multiple cpu ports?!\n");
return -EINVAL;
}
-   dst->cpu_switch = index;
+   dst->cpu_switch = ds;
dst->cpu_port = i;
ds->cpu_port_mask |= 1 << i;
} else if (!strcmp(name, "dsa")) {
@@ -254,7 +254,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)
 * tagging protocol to the preferred tagging format of this
 * switch.
 */
-   if (dst->cpu_switch == index) {
+   if (dst->cpu_switch == ds) {
enum dsa_tag_protocol tag_protocol;
 
tag_protocol = ops->get_tag_protocol(ds);
@@ -757,7 +757,6 @@ static int dsa_setup_dst(struct dsa_switch_tree *dst, 
struct net_device *dev,
 
dst->pd = pd;
dst->master_netdev = dev;
-   dst->cpu_switch = -1;
dst->cpu_port = -1;
 
for (i = 0; i < pd->nr_chips; i++) {
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index a1f26fc0f585..a9bf28d9f41f 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -57,7 +57,6 @@ static struct dsa_switch_tree *dsa_add_dst(u32 tree)
if (!dst)
return NULL;
dst->tree = tree;
-   dst->cpu_switch = -1;
INIT_LIST_HEAD(&dst->list);
list_add_tail(&dsa_switch_trees, &dst->list);
kref_init(&dst->refcount);
@@ -456,8 +455,8 @@ static int dsa_cpu_parse(struct device_node *port, u32 
index,
if (!dst->master_netdev)
dst->master_netdev = ethernet_dev;
 
-   if (dst->cpu_switch == -1) {
-   dst->cpu_switch = ds->index;
+   if (!dst->cpu_switch) {
+   dst->cpu_switch = ds;
dst->cpu_port = index;
}
 
-- 
2.11.0
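
The benefit of storing a pointer instead of an index can be sketched in a few lines: a zero-initialized tree already means "no CPU switch yet", so no -1 sentinel has to be set up, and a second CPU port is detected with a plain non-NULL test. The types and tree_set_cpu() are hypothetical illustrations, not the DSA code.

```c
#include <stddef.h>

/* Cut-down analogue of a switch. */
struct sw {
	int index;
};

/* Cut-down analogue of a switch tree. */
struct tree {
	struct sw *cpu_switch;	/* NULL until a CPU port is found */
	int cpu_port;
};

/* Record the CPU switch and port; reject a second CPU port. */
static int tree_set_cpu(struct tree *t, struct sw *s, int port)
{
	if (t->cpu_switch)
		return -1;	/* a CPU port was already recorded */

	t->cpu_switch = s;
	t->cpu_port = port;
	return 0;
}
```

Note that the "already set" condition is a non-NULL check; inverting it would reject the first CPU port and silently accept later ones.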

Re: [PATCH 3/4] net: ethernet: ti: cpsw: don't duplicate ndev_running

2017-01-17 Thread Ivan Khoronzhuk

On Thu, Jan 12, 2017 at 11:34:47AM -0600, Grygorii Strashko wrote:

Hi Grygorii,
Sorry for the late reply.

> 
> 
> On 01/10/2017 07:56 PM, Ivan Khoronzhuk wrote:
> > On Mon, Jan 09, 2017 at 11:25:38AM -0600, Grygorii Strashko wrote:
> >>
> >>
> >> On 01/08/2017 10:41 AM, Ivan Khoronzhuk wrote:
> >>> No need to create additional vars to identify if interface is running.
> >>> So simplify code by removing redundant var and checking usage counter
> >>> instead.
> >>>
> >>> Signed-off-by: Ivan Khoronzhuk 
> >>> ---
> >>>  drivers/net/ethernet/ti/cpsw.c | 14 --
> >>>  1 file changed, 4 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/ti/cpsw.c 
> >>> b/drivers/net/ethernet/ti/cpsw.c
> >>> index 40d7fc9..daae87f 100644
> >>> --- a/drivers/net/ethernet/ti/cpsw.c
> >>> +++ b/drivers/net/ethernet/ti/cpsw.c
> >>> @@ -357,7 +357,6 @@ struct cpsw_slave {
> >>>   struct phy_device   *phy;
> >>>   struct net_device   *ndev;
> >>>   u32 port_vlan;
> >>> - u32 open_stat;
> >>>  };
> >>>  
> >>>  static inline u32 slave_read(struct cpsw_slave *slave, u32 offset)
> >>> @@ -1241,7 +1240,7 @@ static int cpsw_common_res_usage_state(struct 
> >>> cpsw_common *cpsw)
> >>>   u32 usage_count = 0;
> >>>  
> >>>   for (i = 0; i < cpsw->data.slaves; i++)
> >>> - if (cpsw->slaves[i].open_stat)
> >>> + if (netif_running(cpsw->slaves[i].ndev))
> >>>   usage_count++;
> >>
> >> Not sure this will work as you expected, but may be I've missed smth :(
> > I've changed conditions, will work.
> > 
> >>
> >> code in static int __dev_open(struct net_device *dev)
> >> ..
> >>set_bit(__LINK_STATE_START, &dev->state);
> >>
> >>if (ops->ndo_validate_addr)
> >>ret = ops->ndo_validate_addr(dev);
> >>
> >>if (!ret && ops->ndo_open)
> >>ret = ops->ndo_open(dev);
> >>
> >>netpoll_poll_enable(dev);
> >>
> >>if (ret)
> >>clear_bit(__LINK_STATE_START, &dev->state);
> >> ..
> >>
> >> so, netif_running(ndev) will start returning true before calling 
> >> ops->ndo_open(dev);
> > Yes, It's done bearing it in mind of course.
> > 
> >>
> >>>  
> >>>   return usage_count;
> >>> @@ -1502,7 +1501,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
> >>>CPSW_RTL_VERSION(reg));
> >>>  
> >>>   /* initialize host and slave ports */
> >>> - if (!cpsw_common_res_usage_state(cpsw))
> >>> + if (cpsw_common_res_usage_state(cpsw) < 2)
> >>
> >> Ah. You've changed the condition here.
> >>
> >> I think it might be reasonable to hide this inside 
> >> cpsw_common_res_usage_state()
> >> and seems it can be renamed to smth like cpsw_is_running().
> > It probably needs to be renamed to smth a little different,
> > like cpsw_get_usage_count ...or cpsw_get_open_ndev_count
> 
> cpsw_get_usage_count () sounds good
Like it more also. Will change it.

> 
> > 
> >>
> >>
> >>>   cpsw_init_host_port(priv);
> >>>   for_each_slave(priv, cpsw_slave_open, priv);
> >>>  
> >>> @@ -1513,7 +1512,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
> >>>   cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
> >>> ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
> >>>  
> >>> - if (!cpsw_common_res_usage_state(cpsw)) {
> >>> + if (cpsw_common_res_usage_state(cpsw) < 2) {
> >>>   /* disable priority elevation */
> >>>   __raw_writel(0, &cpsw->regs->ptype);
> >>>  
> >>> @@ -1556,9 +1555,6 @@ static int cpsw_ndo_open(struct net_device *ndev)
> >>>   cpdma_ctlr_start(cpsw->dma);
> >>>   cpsw_intr_enable(cpsw);
> >>>  
> >>> - if (cpsw->data.dual_emac)
> >>> - cpsw->slaves[priv->emac_port].open_stat = true;
> >>> -
> >>>   return 0;
> >>>  
> >>>  err_cleanup:
> >>> @@ -1578,7 +1574,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
> >>>   netif_tx_stop_all_queues(priv->ndev);
> >>>   netif_carrier_off(priv->ndev);
> >>>  
> >>> - if (cpsw_common_res_usage_state(cpsw) <= 1) {
> >>> + if (!cpsw_common_res_usage_state(cpsw)) {
> >>
> >> and here __LINK_STATE_START will be cleared before calling 
> >> ops->ndo_stop(dev);
> > Actually it's changed because of it.
> > 
> >> So, from one side netif_running(ndev) usage will simplify 
> >> cpsw_common_res_usage_state() internals,
> >> but from another side - it will make places where it's used even more 
> >> entangled :( as for me,
> >> because when cpsw_common_res_usage_state() will return 1 in 
> >> cpsw_ndo_open() it will mean
> >> "no interfaces is really running yet", but the same value 1 in 
> >> cpsw_ndo_stop()
> > why not? no interfaces running, except the one executing ndo_open now.
> > It's clearer than duplicating it and using two different ways in
> > different places for identifying running devices. The current way is
> > closer to some testing code, not a final version. Just to be consistent,
> > it's better to change it.
> > 
> > Yes, it returns different results when it's ca
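
The point debated above is that the "running" flag is already set when the driver's open hook runs and already cleared when its stop hook runs, so the thresholds on the usage count shift to "< 2" in open and "== 0" in stop. The following is a small user-space model of that ordering, not the cpsw driver: the toy_* names and the fixed two-slave layout are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NUM_SLAVES 2	/* dual EMAC: two net devices share one switch */

/* Hypothetical stand-in for the per-device "running" state bit. */
struct toy_cpsw {
	bool running[TOY_NUM_SLAVES];
	bool hw_up;	/* shared resources initialised */
};

/* Mirrors the counting loop: how many slave devices report "running". */
int toy_usage_count(const struct toy_cpsw *c)
{
	int i, n = 0;

	for (i = 0; i < TOY_NUM_SLAVES; i++)
		if (c->running[i])
			n++;
	return n;
}

/* The core sets the flag first, then calls the driver's open hook. */
void toy_dev_open(struct toy_cpsw *c, int slave)
{
	assert(slave >= 0 && slave < TOY_NUM_SLAVES);
	c->running[slave] = true;
	/* open hook: the caller is already counted, so "first user" is < 2 */
	if (toy_usage_count(c) < 2)
		c->hw_up = true;
}

/* The core clears the flag first, then calls the driver's stop hook. */
void toy_dev_close(struct toy_cpsw *c, int slave)
{
	assert(slave >= 0 && slave < TOY_NUM_SLAVES);
	c->running[slave] = false;
	/* stop hook: the caller is no longer counted, so "last user" is 0 */
	if (!toy_usage_count(c))
		c->hw_up = false;
}
```

In this model the shared state comes up with the first open and goes down only after the last close, which is the behaviour both thresholds are meant to preserve.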

Re: [PATCH v3 net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering

2017-01-17 Thread Alexander Duyck

On Tue, Jan 17, 2017 at 4:50 PM, Mao Wenan  wrote:
> Relaxed ordering (RO) is a feature of the 82599 NIC; enabling it can
> improve performance on some CPU architectures, such as SPARC.
> Currently the 82599 driver enables RO only for one specific CPU
> architecture (SPARC), which does not cover other CPU architectures
> that really need RO.
> This patch adds one common config, CONFIG_ARCH_WANT_RELAX_ORDER, to set
> the RO feature, and selects CONFIG_ARCH_WANT_RELAX_ORDER in the sparc
> Kconfig first.
>
> Signed-off-by: Mao Wenan 
> Reviewed-by: Alexander Duyck 

Reviewed-by: Alexander Duyck 

> ---
> v2 -> v3: add reviewed information.
> ---
>  arch/Kconfig| 3 +++
>  arch/sparc/Kconfig  | 1 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
>  3 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 99839c2..bd04eac 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -781,4 +781,7 @@ config VMAP_STACK
>   the stack to map directly to the KASAN shadow map using a formula
>   that is incorrect if the stack is in vmalloc space.
>
> +config ARCH_WANT_RELAX_ORDER
> +   bool
> +
>  source "kernel/gcov/Kconfig"
> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
> index cf4034c..68ac5c7 100644
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -44,6 +44,7 @@ config SPARC
> select CPU_NO_EFFICIENT_FFS
> select HAVE_ARCH_HARDENED_USERCOPY
> select PROVE_LOCKING_SMALL if PROVE_LOCKING
> +   select ARCH_WANT_RELAX_ORDER
>
>  config SPARC32
> def_bool !64BIT
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> index 094e1d6..c38d50c 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
> @@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
> }
> IXGBE_WRITE_FLUSH(hw);
>
> -#ifndef CONFIG_SPARC
> +#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
> /* Disable relaxed ordering */
> for (i = 0; i < hw->mac.max_tx_queues; i++) {
> u32 regval;
> --
> 2.7.0
>
>

Re: [PATCH RFC] net: dsa: remove unnecessary phy.h include

2017-01-17 Thread Florian Fainelli

On 01/17/2017 04:14 PM, Russell King - ARM Linux wrote:
> Including phy.h and phy_fixed.h into net/dsa.h causes phy*.h to be an
> unnecessary dependency for quite a large amount of the kernel.  There's
> very little which actually requires definitions from phy.h in net/dsa.h
> - the include itself only wants the declaration of a couple of
> structures and IFNAMSIZ.
> 
> Add linux/if.h for IFNAMSIZ, declarations for the structures, phy.h to
> mv88e6xxx.h as it needs it for phy_interface_t, and remove both phy.h
> and phy_fixed.h from net/dsa.h.
> 
> This patch reduces the rebuild from around 800 files to around 40 - even
> with ccache, the time difference is noticeable.
> 
> Signed-off-by: Russell King 

Reviewed-by: Florian Fainelli 

> ---
> I noticed when I touched linux/phy.h that a lot of the kernel ended up
> being unexpectedly rebuilt, as linux/netdevice.h includes net/dsa.h,
> which then includes linux/phy.h.  I've tested this change on both
> ARM and ARM64, but I'd suggest letting the 0-day builder have a bite
> at this, and then only taking it if everyone is confident that there's
> a slim chance of any problems.  Also, it may need some rework to apply
> to davem's tree.  All of the above makes this RFC only.
> 
>  drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 +
>  include/net/dsa.h | 6 --
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
> b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
> index a319c06d82e3..d247b0639ed4 100644
> --- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
> +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
> @@ -15,6 +15,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifndef UINT64_MAX
>  #define UINT64_MAX   (u64)(~((u64)0))
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index b122196d5a1f..887b2f98f9ea 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -11,15 +11,17 @@
>  #ifndef __LINUX_NET_DSA_H
>  #define __LINUX_NET_DSA_H
>  
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> -#include 
> -#include 
>  #include 
>  
> +struct phy_device;
> +struct fixed_phy_status;
> +
>  enum dsa_tag_protocol {
>   DSA_TAG_PROTO_NONE = 0,
>   DSA_TAG_PROTO_DSA,
> 


-- 
Florian

Re: [PATCH RFC] net: dsa: remove unnecessary phy.h include

2017-01-17 Thread Vivien Didelot

Hi Russell,

Russell King - ARM Linux  writes:

> Including phy.h and phy_fixed.h into net/dsa.h causes phy*.h to be an
> unnecessary dependency for quite a large amount of the kernel.  There's
> very little which actually requires definitions from phy.h in net/dsa.h
> - the include itself only wants the declaration of a couple of
> structures and IFNAMSIZ.
>
> Add linux/if.h for IFNAMSIZ, declarations for the structures, phy.h to
> mv88e6xxx.h as it needs it for phy_interface_t, and remove both phy.h
> and phy_fixed.h from net/dsa.h.
>
> This patch reduces the rebuild from around 800 files to around 40 - even
> with ccache, the time difference is noticeable.
>
> Signed-off-by: Russell King 

This patch applies cleanly on net-next and builds correctly after
touching include/linux/phy.h. My boards work fine with it.

Tested-by: Vivien Didelot 

Thanks,

Vivien

RE: [PATCH v2 net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering.

2017-01-17 Thread maowenan



> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.du...@gmail.com]
> Sent: Wednesday, January 18, 2017 3:28 AM
> To: David Miller
> Cc: maowenan; Netdev; Jeff Kirsher
> Subject: Re: [PATCH v2 net-next] net:add one common config
> ARCH_WANT_RELAX_ORDER to support relax ordering.
> 
> On Tue, Jan 17, 2017 at 11:15 AM, David Miller 
> wrote:
> > From: Mao Wenan 
> > Date: Mon, 9 Jan 2017 13:32:34 +0800
> >
> >> Relaxed ordering (RO) is a feature of the 82599 NIC; enabling it can
> >> improve performance on some CPU architectures, such as SPARC.
> >> Currently the 82599 driver enables RO only for one specific CPU
> >> architecture (SPARC), which does not cover other CPU architectures
> >> that really need RO.
> >> This patch adds one common config, CONFIG_ARCH_WANT_RELAX_ORDER, to
> >> set the RO feature, and selects CONFIG_ARCH_WANT_RELAX_ORDER in the
> >> sparc Kconfig first.
> >>
> >> Signed-off-by: Mao Wenan 
> >
> > Since no-one has reviewed this patch, and I do not feel comfortable
> > with applying it without such review, I am tossing this patch.
> >
> > If someone eventually reviews it, repost this patch.
> 
> Mao,
> 
> Go ahead and repost the patch and feel free to add my Reviewed-by.
> Sorry I didn't reply to this earlier but I have been getting over the flu for 
> the last
> week or so.
> 
> - Alex

Hi Alex,
I have reposted the patch (v3), thanks a lot.

Re: [PATCHv2 5/7] TAP: Extending tap device create/destroy APIs

2017-01-17 Thread Andy Shevchenko

On Wed, Jan 18, 2017 at 2:03 AM, Sainath Grandhi
 wrote:
> Extending tap APIs get/free_minor and create/destroy_cdev to handle more
> than one type of virtual interface.
>

Yes, looks better now.

FWIW:
Reviewed-by: Andy Shevchenko 

> Signed-off-by: Sainath Grandhi 
> ---
>  drivers/net/macvtap_main.c |  6 +--
>  drivers/net/tap.c  | 98 
> +++---
>  include/linux/if_tap.h |  4 +-
>  3 files changed, 80 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
> index 6326a82..3f047b4 100644
> --- a/drivers/net/macvtap_main.c
> +++ b/drivers/net/macvtap_main.c
> @@ -160,7 +160,7 @@ static int macvtap_device_event(struct notifier_block 
> *unused,
>  * been registered but before register_netdevice has
>  * finished running.
>  */
> -   err = tap_get_minor(&vlantap->tap);
> +   err = tap_get_minor(macvtap_major, &vlantap->tap);
> if (err)
> return notifier_from_errno(err);
>
> @@ -168,7 +168,7 @@ static int macvtap_device_event(struct notifier_block 
> *unused,
> classdev = device_create(&macvtap_class, &dev->dev, devt,
>  dev, tap_name);
> if (IS_ERR(classdev)) {
> -   tap_free_minor(&vlantap->tap);
> +   tap_free_minor(macvtap_major, &vlantap->tap);
> return notifier_from_errno(PTR_ERR(classdev));
> }
> err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj,
> @@ -183,7 +183,7 @@ static int macvtap_device_event(struct notifier_block 
> *unused,
> sysfs_remove_link(&dev->dev.kobj, tap_name);
> devt = MKDEV(MAJOR(macvtap_major), vlantap->tap.minor);
> device_destroy(&macvtap_class, devt);
> -   tap_free_minor(&vlantap->tap);
> +   tap_free_minor(macvtap_major, &vlantap->tap);
> break;
> case NETDEV_CHANGE_TX_QUEUE_LEN:
> if (tap_queue_resize(&vlantap->tap))
> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
> index 43d9d54..7f38dbe 100644
> --- a/drivers/net/tap.c
> +++ b/drivers/net/tap.c
> @@ -99,12 +99,16 @@ static struct proto tap_proto = {
>  };
>
>  #define TAP_NUM_DEVS (1U << MINORBITS)
> +
> +static LIST_HEAD(major_list);
> +
>  struct major_info {
> dev_t major;
> struct idr minor_idr;
> struct mutex minor_lock;
> const char *device_name;
> -} macvtap_major;
> +   struct list_head next;
> +};
>
>  #define GOODCOPY_LEN 128
>
> @@ -385,44 +389,73 @@ rx_handler_result_t tap_handle_frame(struct sk_buff 
> **pskb)
> return RX_HANDLER_CONSUMED;
>  }
>
> -int tap_get_minor(struct tap_dev *tap)
> +static struct major_info *tap_get_major(int major)
> +{
> +   struct major_info *tap_major, *tmp;
> +
> +   list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
> +   if (tap_major->major == major) {
> +   return tap_major;
> +   }
> +   }
> +
> +   return NULL;
> +}
> +
> +int tap_get_minor(dev_t major, struct tap_dev *tap)
>  {
> int retval = -ENOMEM;
> +   struct major_info *tap_major;
> +
> +   tap_major = tap_get_major(MAJOR(major));
> +   if (!tap_major)
> +   return -EINVAL;
>
> -   mutex_lock(&macvtap_major.minor_lock);
> -   retval = idr_alloc(&macvtap_major.minor_idr, tap, 1, TAP_NUM_DEVS, 
> GFP_KERNEL);
> +   mutex_lock(&tap_major->minor_lock);
> +   retval = idr_alloc(&tap_major->minor_idr, tap, 1, TAP_NUM_DEVS, 
> GFP_KERNEL);
> if (retval >= 0) {
> tap->minor = retval;
> } else if (retval == -ENOSPC) {
> netdev_err(tap->dev, "Too many tap devices\n");
> retval = -EINVAL;
> }
> -   mutex_unlock(&macvtap_major.minor_lock);
> +   mutex_unlock(&tap_major->minor_lock);
> return retval < 0 ? retval : 0;
>  }
>
> -void tap_free_minor(struct tap_dev *tap)
> +void tap_free_minor(dev_t major, struct tap_dev *tap)
>  {
> -   mutex_lock(&macvtap_major.minor_lock);
> +   struct major_info *tap_major;
> +
> +   tap_major = tap_get_major(MAJOR(major));
> +   if (!tap_major)
> +   return;
> +
> +   mutex_lock(&tap_major->minor_lock);
> if (tap->minor) {
> -   idr_remove(&macvtap_major.minor_idr, tap->minor);
> +   idr_remove(&tap_major->minor_idr, tap->minor);
> tap->minor = 0;
> }
> -   mutex_unlock(&macvtap_major.minor_lock);
> +   mutex_unlock(&tap_major->minor_lock);
>  }
>
> -static struct tap_dev *dev_get_by_tap_minor(int minor)
> +static struct tap_dev *dev_get_by_tap_file(int major, int minor)
>  {
> struct net_device *dev = NULL;
> struct tap_dev *tap;
> +   str
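
The core of the quoted change is a lookup that maps a character-device major number to its per-major bookkeeping before a minor is allocated or freed. A minimal user-space sketch of such a lookup is given below. It uses a plain singly linked list instead of the kernel list helpers, leaves out locking, and the toy_* names and the major numbers used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified registry: one entry per character-device major. */
struct toy_major_info {
	unsigned int major;
	const char *device_name;
	struct toy_major_info *next;
};

/* Push a caller-owned entry onto the front of the list. */
void toy_major_add(struct toy_major_info **head, struct toy_major_info *mi)
{
	assert(head && mi);
	mi->next = *head;
	*head = mi;
}

/* Return the entry registered for 'major', or NULL if there is none. */
struct toy_major_info *toy_get_major(struct toy_major_info *head,
				     unsigned int major)
{
	struct toy_major_info *mi;

	for (mi = head; mi; mi = mi->next)
		if (mi->major == major)
			return mi;
	return NULL;
}
```

Because the lookup only reads the list and never unlinks an entry while iterating, a plain forward walk is enough in this sketch.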

[PATCH v3 net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering

2017-01-17 Thread Mao Wenan

Relaxed ordering (RO) is a feature of the 82599 NIC; enabling it can
improve performance on some CPU architectures, such as SPARC.
Currently the 82599 driver enables RO only for one specific CPU
architecture (SPARC), which does not cover other CPU architectures
that really need RO.
This patch adds one common config, CONFIG_ARCH_WANT_RELAX_ORDER, to set
the RO feature, and selects CONFIG_ARCH_WANT_RELAX_ORDER in the sparc
Kconfig first.

Signed-off-by: Mao Wenan 
Reviewed-by: Alexander Duyck 
---
v2 -> v3: add reviewed information.
---
 arch/Kconfig| 3 +++
 arch/sparc/Kconfig  | 1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 99839c2..bd04eac 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -781,4 +781,7 @@ config VMAP_STACK
  the stack to map directly to the KASAN shadow map using a formula
  that is incorrect if the stack is in vmalloc space.
 
+config ARCH_WANT_RELAX_ORDER
+   bool
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index cf4034c..68ac5c7 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,6 +44,7 @@ config SPARC
select CPU_NO_EFFICIENT_FFS
select HAVE_ARCH_HARDENED_USERCOPY
select PROVE_LOCKING_SMALL if PROVE_LOCKING
+   select ARCH_WANT_RELAX_ORDER
 
 config SPARC32
def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 094e1d6..c38d50c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_SPARC
+#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
/* Disable relaxed ordering */
for (i = 0; i < hw->mac.max_tx_queues; i++) {
u32 regval;
-- 
2.7.0
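
The effect of the last hunk is that the "disable relaxed ordering" loop is compiled out whenever the architecture selects the new symbol. A self-contained sketch of that compile-time gate follows. The register array, the bit position and the toy_* names are assumptions made for illustration and are not taken from the ixgbe headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit and queue count, chosen only for this sketch. */
#define TOY_TXCTRL_RO_EN	(1u << 11)
#define TOY_NUM_QUEUES		4

/*
 * Build with -DCONFIG_ARCH_WANT_RELAX_ORDER to mimic an architecture that
 * selects the symbol; without it, relaxed ordering is switched off.
 */
void toy_start_hw(uint32_t txctrl[TOY_NUM_QUEUES])
{
#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
	int i;

	/* Disable relaxed ordering on every queue */
	for (i = 0; i < TOY_NUM_QUEUES; i++)
		txctrl[i] &= ~TOY_TXCTRL_RO_EN;
#else
	(void)txctrl;	/* leave the hardware default in place */
#endif
}

/* Report whether relaxed ordering is still enabled on queue q. */
int toy_ro_enabled(const uint32_t txctrl[TOY_NUM_QUEUES], int q)
{
	assert(q >= 0 && q < TOY_NUM_QUEUES);
	return (txctrl[q] & TOY_TXCTRL_RO_EN) != 0;
}
```

With this arrangement another architecture only needs a "select" line in its Kconfig, and the driver source stays unchanged.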

linux-next: build warnings after merge of the net-next tree

2017-01-17 Thread Stephen Rothwell

Hi all,

After merging the net-next tree, today's linux-next build (powerpc
ppc64_defconfig) produced these warnings:

drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function 'init_one':
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:9: warning: unused variable 'port_vec' [-Wunused-variable]
  u32 v, port_vec;
         ^
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:6: warning: unused variable 'v' [-Wunused-variable]
  u32 v, port_vec;
      ^
Introduced by commit

  96fe11f27b70 ("cxgb4: Implement ndo_get_phys_port_id for mgmt dev")

-- 
Cheers,
Stephen Rothwell

[PATCH] net: ethernet: ti: davinci_cpdma: correct check on NULL in set rate

2017-01-17 Thread Ivan Khoronzhuk

Check "ch" for NULL before dereferencing it to get ctlr.

Signed-off-by: Ivan Khoronzhuk 
---

Based on net-next/master

 drivers/net/ethernet/ti/davinci_cpdma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index d80bff1..7ecc6b7 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -835,8 +835,8 @@ EXPORT_SYMBOL_GPL(cpdma_chan_get_min_rate);
  */
 int cpdma_chan_set_rate(struct cpdma_chan *ch, u32 rate)
 {
-   struct cpdma_ctlr *ctlr = ch->ctlr;
unsigned long flags, ch_flags;
+   struct cpdma_ctlr *ctlr;
int ret, prio_mode;
u32 rmask;
 
@@ -846,6 +846,7 @@ int cpdma_chan_set_rate(struct cpdma_chan *ch, u32 rate)
if (ch->rate == rate)
return rate;
 
+   ctlr = ch->ctlr;
spin_lock_irqsave(&ctlr->lock, flags);
spin_lock_irqsave(&ch->lock, ch_flags);
 
-- 
2.7.4
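
The fix above is an ordering issue: a pointer has to be validated before any member is read through it, so the initialiser that dereferenced ch moves below the check. A minimal sketch of the corrected ordering is shown here. The toy_* types are hypothetical, the -EINVAL return for a NULL channel is an assumption about the surrounding function, and the sketch returns 0 where nothing needs to change.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the cpdma structures. */
struct toy_ctlr {
	int lock_taken;	/* counts how often the "lock" was taken */
};

struct toy_chan {
	struct toy_ctlr *ctlr;
	unsigned int rate;
};

/* Validate 'ch' before touching any of its members. */
int toy_chan_set_rate(struct toy_chan *ch, unsigned int rate)
{
	struct toy_ctlr *ctlr;	/* deliberately not initialised from ch here */

	if (!ch)
		return -EINVAL;

	if (ch->rate == rate)
		return 0;	/* nothing to do */

	ctlr = ch->ctlr;	/* safe: ch is known to be non-NULL */
	assert(ctlr);
	ctlr->lock_taken++;	/* stands in for taking the controller lock */
	ch->rate = rate;
	return 0;
}
```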

[PATCH net] net: phy: bcm63xx: Utilize correct config_intr function

2017-01-17 Thread Florian Fainelli

From: Daniel Gonzalez Cabanelas 

Commit a1cba5613edf ("net: phy: Add Broadcom phy library for common
interfaces") made the BCM63xx PHY driver use bcm_phy_config_intr(),
which would appear to do the right thing, except that it writes to
MII_BCM54XX_ECR rather than to the MII_BCM63XX_IR register, and the two
are different.

This would cause invalid link parameters and events to be generated by
the PHY interrupt.

Fixes: a1cba5613edf ("net: phy: Add Broadcom phy library for common interfaces")
Signed-off-by: Daniel Gonzalez Cabanelas 
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/bcm63xx.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/bcm63xx.c b/drivers/net/phy/bcm63xx.c
index e741bf614c4e..b0492ef2cdaa 100644
--- a/drivers/net/phy/bcm63xx.c
+++ b/drivers/net/phy/bcm63xx.c
@@ -21,6 +21,23 @@ MODULE_DESCRIPTION("Broadcom 63xx internal PHY driver");
 MODULE_AUTHOR("Maxime Bizon ");
 MODULE_LICENSE("GPL");
 
+static int bcm63xx_config_intr(struct phy_device *phydev)
+{
+   int reg, err;
+
+   reg = phy_read(phydev, MII_BCM63XX_IR);
+   if (reg < 0)
+   return reg;
+
+   if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
+   reg &= ~MII_BCM63XX_IR_GMASK;
+   else
+   reg |= MII_BCM63XX_IR_GMASK;
+
+   err = phy_write(phydev, MII_BCM63XX_IR, reg);
+   return err;
+}
+
 static int bcm63xx_config_init(struct phy_device *phydev)
 {
int reg, err;
@@ -55,7 +72,7 @@ static struct phy_driver bcm63xx_driver[] = {
.config_aneg= genphy_config_aneg,
.read_status= genphy_read_status,
.ack_interrupt  = bcm_phy_ack_intr,
-   .config_intr= bcm_phy_config_intr,
+   .config_intr= bcm63xx_config_intr,
 }, {
/* same phy as above, with just a different OUI */
.phy_id = 0x002bdc00,
@@ -67,7 +84,7 @@ static struct phy_driver bcm63xx_driver[] = {
.config_aneg= genphy_config_aneg,
.read_status= genphy_read_status,
.ack_interrupt  = bcm_phy_ack_intr,
-   .config_intr= bcm_phy_config_intr,
+   .config_intr= bcm63xx_config_intr,
 } };
 
 module_phy_driver(bcm63xx_driver);
-- 
2.9.3
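
The new bcm63xx_config_intr() is a read-modify-write of one mask bit: clear the bit to let interrupts through, set it to block them. The same pattern can be tried in user space against a simulated register file, as sketched below. The register number, the mask value and the toy_* accessors are hypothetical and only imitate the "negative return means error" convention of the phylib calls used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register number and mask bit, chosen only for the sketch. */
#define TOY_IR_REG	0x1a
#define TOY_IR_GMASK	0x0100
#define TOY_NUM_REGS	32

struct toy_phy {
	uint16_t regs[TOY_NUM_REGS];	/* simulated MII register file */
	int irq_enabled;
};

/* Simulated register read: a negative return value signals an error. */
int toy_phy_read(struct toy_phy *phy, int reg)
{
	if (reg < 0 || reg >= TOY_NUM_REGS)
		return -1;
	return phy->regs[reg];
}

/* Simulated register write: 0 on success, negative on error. */
int toy_phy_write(struct toy_phy *phy, int reg, uint16_t val)
{
	if (reg < 0 || reg >= TOY_NUM_REGS)
		return -1;
	phy->regs[reg] = val;
	return 0;
}

/* Clear the global mask bit to enable interrupts, set it to disable them. */
int toy_config_intr(struct toy_phy *phy)
{
	int reg;

	assert(phy);
	reg = toy_phy_read(phy, TOY_IR_REG);
	if (reg < 0)
		return reg;

	if (phy->irq_enabled)
		reg &= ~TOY_IR_GMASK;
	else
		reg |= TOY_IR_GMASK;

	return toy_phy_write(phy, TOY_IR_REG, (uint16_t)reg);
}
```

Reading the register first keeps every other bit intact, which matters when the same register carries per-event mask bits.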

[PATCH RFC] net: dsa: remove unnecessary phy.h include

2017-01-17 Thread Russell King - ARM Linux

Including phy.h and phy_fixed.h into net/dsa.h causes phy*.h to be an
unnecessary dependency for quite a large amount of the kernel.  There's
very little which actually requires definitions from phy.h in net/dsa.h
- the include itself only wants the declaration of a couple of
structures and IFNAMSIZ.

Add linux/if.h for IFNAMSIZ, declarations for the structures, phy.h to
mv88e6xxx.h as it needs it for phy_interface_t, and remove both phy.h
and phy_fixed.h from net/dsa.h.

This patch reduces the rebuild from around 800 files to around 40 - even
with ccache, the time difference is noticeable.

Signed-off-by: Russell King 
---
I noticed when I touched linux/phy.h that a lot of the kernel ended up
being unexpectedly rebuilt, as linux/netdevice.h includes net/dsa.h,
which then includes linux/phy.h.  I've tested this change on both
ARM and ARM64, but I'd suggest letting the 0-day builder have a bite
at this, and then only taking it if everyone is confident that there's
a slim chance of any problems.  Also, it may need some rework to apply
to davem's tree.  All of the above makes this RFC only.

 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 +
 include/net/dsa.h | 6 --
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index a319c06d82e3..d247b0639ed4 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef UINT64_MAX
 #define UINT64_MAX (u64)(~((u64)0))
diff --git a/include/net/dsa.h b/include/net/dsa.h
index b122196d5a1f..887b2f98f9ea 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -11,15 +11,17 @@
 #ifndef __LINUX_NET_DSA_H
 #define __LINUX_NET_DSA_H
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 
+struct phy_device;
+struct fixed_phy_status;
+
 enum dsa_tag_protocol {
DSA_TAG_PROTO_NONE = 0,
DSA_TAG_PROTO_DSA,

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
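
The patch works because a header that only stores or passes pointers to a structure does not need the structure's definition; a forward declaration is enough, and only code that looks inside the structure needs the full header. A single-file sketch of that split is shown below. The toy_* names are hypothetical, and the "header" and "source" parts are marked by comments rather than placed in separate files.

```c
#include <assert.h>
#include <stddef.h>

/*
 * "Header" part: only a pointer to the structure is needed here, so an
 * incomplete (forward) declaration is enough and no heavy include is
 * pulled in.
 */
struct toy_phy_device;

struct toy_port {
	struct toy_phy_device *phy;	/* pointer to incomplete type is fine */
};

int toy_port_has_phy(const struct toy_port *port);
int toy_port_phy_addr(const struct toy_port *port);

/*
 * "Source" part: the one file that looks inside the structure provides
 * (or includes) the full definition.
 */
struct toy_phy_device {
	int addr;
};

int toy_port_has_phy(const struct toy_port *port)
{
	assert(port);
	return port->phy != NULL;
}

/* Needs the full definition because it reads a member. */
int toy_port_phy_addr(const struct toy_port *port)
{
	assert(port);
	return port->phy ? port->phy->addr : -1;
}
```

Everything that includes only the "header" part stops depending on the layout of the structure, which is where the reduction in rebuilt files comes from.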

Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()

2017-01-17 Thread Florian Fainelli

On 01/17/2017 04:07 PM, Andy Shevchenko wrote:
> On Wed, Jan 18, 2017 at 2:04 AM, Florian Fainelli  
> wrote:
>> On 01/17/2017 04:00 PM, Andy Shevchenko wrote:
>>> On Wed, Jan 18, 2017 at 1:43 AM, Florian Fainelli  
>>> wrote:
>>>> On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
>>>>> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli  
>>>>> wrote:
> 
>>> But why not to use void *class_name to be consistent with callback and
>>> device_find_child()?
>>
>> The top-level function: device_find_in_class_name() should have a
>> stronger typing of its argument even if it internally uses
>> device_find_child() and a callback that takes a void * argument, that's
>> how I see it.
> 
> Fair enough.
> 
>>> Btw,
>>> return get_device(parent);
>>
>> Not sure I follow what that means here?
> 
> Missed remark. Instead of
> 
> get_device(parent);
> return parent;
> 
> you can use
> 
> return get_device(parent);

Seems reasonable, if I have to respin a v5, will add that, thanks!
-- 
Florian

[PATCHv2 1/7] TAP: Refactoring macvtap.c

2017-01-17 Thread Sainath Grandhi

The macvtap module has code for tap/queue management and link management.
This patch splits the code into macvtap_main.c for link management and
tap.c for tap/queue management. Functionality in tap.c can be re-used for
implementing tap on other virtual interfaces.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/Makefile |   2 +
 drivers/net/macvtap_main.c   | 218 +++
 drivers/net/{macvtap.c => tap.c} | 204 ++--
 include/linux/if_macvtap.h   |  10 ++
 4 files changed, 238 insertions(+), 196 deletions(-)
 create mode 100644 drivers/net/macvtap_main.c
 rename drivers/net/{macvtap.c => tap.c} (84%)
 create mode 100644 include/linux/if_macvtap.h

diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7336cbd..19b03a9 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -29,6 +29,8 @@ obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
 
+macvtap-objs := macvtap_main.o tap.o
+
 #
 # Networking Drivers
 #
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
new file mode 100644
index 000..96ffa60
--- /dev/null
+++ b/drivers/net/macvtap_main.c
@@ -0,0 +1,218 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * Variables for dealing with macvtaps device numbers.
+ */
+static dev_t macvtap_major;
+#define MACVTAP_NUM_DEVS (1U << MINORBITS)
+
+static const void *macvtap_net_namespace(struct device *d)
+{
+   struct net_device *dev = to_net_dev(d->parent);
+   return dev_net(dev);
+}
+
+static struct class macvtap_class = {
+   .name = "macvtap",
+   .owner = THIS_MODULE,
+   .ns_type = &net_ns_type_operations,
+   .namespace = macvtap_net_namespace,
+};
+static struct cdev macvtap_cdev;
+
+#define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
+ NETIF_F_TSO6 | NETIF_F_UFO)
+
+static int macvtap_newlink(struct net *src_net,
+  struct net_device *dev,
+  struct nlattr *tb[],
+  struct nlattr *data[])
+{
+   struct macvlan_dev *vlan = netdev_priv(dev);
+   int err;
+
+   INIT_LIST_HEAD(&vlan->queue_list);
+
+   /* Since macvlan supports all offloads by default, make
+* tap support all offloads also.
+*/
+   vlan->tap_features = TUN_OFFLOADS;
+
+   err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan);
+   if (err)
+   return err;
+
+   /* Don't put anything that may fail after macvlan_common_newlink
+* because we can't undo what it does.
+*/
+   err = macvlan_common_newlink(src_net, dev, tb, data);
+   if (err) {
+   netdev_rx_handler_unregister(dev);
+   return err;
+   }
+
+   return 0;
+}
+
+static void macvtap_dellink(struct net_device *dev,
+   struct list_head *head)
+{
+   netdev_rx_handler_unregister(dev);
+   macvtap_del_queues(dev);
+   macvlan_dellink(dev, head);
+}
+
+static void macvtap_setup(struct net_device *dev)
+{
+   macvlan_common_setup(dev);
+   dev->tx_queue_len = TUN_READQ_SIZE;
+}
+
+static struct rtnl_link_ops macvtap_link_ops __read_mostly = {
+   .kind   = "macvtap",
+   .setup  = macvtap_setup,
+   .newlink= macvtap_newlink,
+   .dellink= macvtap_dellink,
+};
+
+static int macvtap_device_event(struct notifier_block *unused,
+   unsigned long event, void *ptr)
+{
+   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct macvlan_dev *vlan;
+   struct device *classdev;
+   dev_t devt;
+   int err;
+   char tap_name[IFNAMSIZ];
+
+   if (dev->rtnl_link_ops != &macvtap_link_ops)
+   return NOTIFY_DONE;
+
+   snprintf(tap_name, IFNAMSIZ, "tap%d", dev->ifindex);
+   vlan = netdev_priv(dev);
+
+   switch (event) {
+   case NETDEV_REGISTER:
+   /* Create the device node here after the network device has
+* been registered but before register_netdevice has
+* finished running.
+*/
+   err = macvtap_get_minor(vlan);
+   if (err)
+   return notifier_from_errno(err);
+
+   devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
+   classdev = device_create(&macvtap_class, &dev->dev, devt,
+dev, tap_name);
+   if (IS_ERR(classdev)) {
+   macvtap_free_minor(vlan);
+   return notifier_from_errno(PTR_ERR(classdev));
+   }
+   err = sysfs_create_l

Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()

2017-01-17 Thread Andy Shevchenko

On Wed, Jan 18, 2017 at 2:04 AM, Florian Fainelli  wrote:
> On 01/17/2017 04:00 PM, Andy Shevchenko wrote:
>> On Wed, Jan 18, 2017 at 1:43 AM, Florian Fainelli  
>> wrote:
>>> On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
>>>> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli  
>>>> wrote:

>> But why not to use void *class_name to be consistent with callback and
>> device_find_child()?
>
> The top-level function: device_find_in_class_name() should have a
> stronger typing of its argument even if it internally uses
> device_find_child() and a callback that takes a void * argument, that's
> how I see it.

Fair enough.

>> Btw,
>> return get_device(parent);
>
> Not sure I follow what that means here?

Missed remark. Instead of

get_device(parent);
return parent;

you can use

return get_device(parent);

-- 
With Best Regards,
Andy Shevchenko
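
The shorter form suggested in this thread relies on the reference-taking helper returning the pointer it was given, so that taking the reference and returning the object collapse into one statement. A toy model of the idiom follows. The toy_* functions are hypothetical and use a bare integer counter instead of the kernel's reference counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object with a bare reference counter. */
struct toy_device {
	int refcount;
};

/* Take a reference and hand the same pointer back; NULL passes through. */
struct toy_device *toy_get_device(struct toy_device *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

/* Drop a reference previously taken with toy_get_device(). */
void toy_put_device(struct toy_device *dev)
{
	if (!dev)
		return;
	assert(dev->refcount > 0);
	dev->refcount--;
}

/* Two-statement form. */
struct toy_device *toy_find_long(struct toy_device *parent)
{
	toy_get_device(parent);
	return parent;
}

/* One-statement form suggested in the review; same effect. */
struct toy_device *toy_find_short(struct toy_device *parent)
{
	return toy_get_device(parent);
}
```

Both forms leave the caller holding one extra reference that it must drop when done.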

[PATCHv2 0/7] Refactor macvtap to re-use tap functionality by other virtual interfaces

2017-01-17 Thread Sainath Grandhi

Tap character devices can be implemented on other virtual interfaces like
ipvlan, similar to macvtap. Source code for tap functionality in macvtap
can be re-used for this purpose.

This patch series splits macvtap source into two modules, macvtap and tap.
This patch series also includes a patch for implementing tap character
device driver based on the IP-VLAN network interface, called ipvtap.

These patches were tested on an x86 platform.

Sainath Grandhi (7):
  TAP: Refactoring macvtap.c
  TAP: Renaming tap related APIs, data structures, macros
  TAP: Tap character device creation/destroy API
  TAP: Abstract type of virtual interface from tap  implementation
  TAP: Extending tap device create/destroy APIs
  TAP: tap as an independent module
  IPVTAP: IP-VLAN based tap driver

 drivers/net/Kconfig  |   28 +
 drivers/net/Makefile |2 +
 drivers/net/ipvlan/Makefile  |1 +
 drivers/net/ipvlan/ipvlan.h  |7 +
 drivers/net/ipvlan/ipvlan_core.c |5 +-
 drivers/net/ipvlan/ipvlan_main.c |   27 +-
 drivers/net/ipvlan/ipvtap.c  |  238 +++
 drivers/net/macvlan.c|2 +-
 drivers/net/macvtap.c| 1226 ++--
 drivers/net/tap.c| 1262 ++
 drivers/vhost/Kconfig|2 +-
 drivers/vhost/net.c  |3 +-
 include/linux/if_macvlan.h   |   17 +-
 include/linux/if_tap.h   |   75 +++
 14 files changed, 1686 insertions(+), 1209 deletions(-)
 create mode 100644 drivers/net/ipvlan/ipvtap.c
 create mode 100644 drivers/net/tap.c
 create mode 100644 include/linux/if_tap.h

-- 
2.7.4

[PATCHv2 4/7] TAP: Abstract type of virtual interface from tap implementation

2017-01-17 Thread Sainath Grandhi

The macvlan object is restructured to hold tap-related elements in a separate
entity, tap_dev. Upon a NETDEV_REGISTER device event, tap_dev is registered with
the idr and fetched again on tap_open. A few of the tap functions are modified to
accept tap_dev as an argument. The tap_dev object includes callbacks, supplied by
the underlying virtual interface, that take care of tx and rx accounting.
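
The embedding-plus-callback arrangement described above can be sketched in userspace roughly as follows. This is a simplified illustration, not the patch itself: the structures are cut down to one counter and one callback, container_of() is redefined locally, and tap_drop() is a hypothetical name for "generic tap code that only knows about tap_dev".

```c
#include <assert.h>
#include <stddef.h>

/* Local userspace version of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic tap state: knows nothing about the interface type behind it. */
struct tap_dev {
	void (*count_tx_dropped)(struct tap_dev *tap);
};

/* Cut-down stand-in for macvlan_dev (assumption: a single counter). */
struct macvlan_dev {
	unsigned long tx_dropped;
};

/* The driver's private data embeds both objects. */
struct macvtap_dev {
	struct macvlan_dev vlan;
	struct tap_dev tap;
};

/* Driver-side callback: recover the enclosing object from the tap_dev
 * pointer, then update the interface-specific statistics. */
void macvtap_count_tx_dropped(struct tap_dev *tap)
{
	struct macvtap_dev *vlantap =
		container_of(tap, struct macvtap_dev, tap);

	vlantap->vlan.tx_dropped++;
}

/* Hypothetical generic code path: it only calls through the callback. */
void tap_drop(struct tap_dev *tap)
{
	if (tap->count_tx_dropped)
		tap->count_tx_dropped(tap);
}
```

The sketch reaches the macvlan part through the named member (vlantap->vlan) instead of casting the container pointer, so it does not depend on vlan being the first member of the structure.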

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvlan.c  |   2 +-
 drivers/net/macvtap_main.c |  68 +---
 drivers/net/tap.c  | 264 -
 include/linux/if_tap.h |  57 +-
 4 files changed, 226 insertions(+), 165 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 20b3fdf2..79383f9 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1526,7 +1526,6 @@ static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
 int macvlan_link_register(struct rtnl_link_ops *ops)
 {
/* common fields */
-   ops->priv_size  = sizeof(struct macvlan_dev);
ops->validate   = macvlan_validate;
ops->maxtype= IFLA_MACVLAN_MAX;
ops->policy = macvlan_policy;
@@ -1549,6 +1548,7 @@ static struct rtnl_link_ops macvlan_link_ops = {
.newlink= macvlan_newlink,
.dellink= macvlan_dellink,
.get_link_net   = macvlan_get_link_net,
+   .priv_size  = sizeof(struct macvlan_dev),
 };
 
 static int macvlan_device_event(struct notifier_block *unused,
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 32ad560..6326a82 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -24,6 +24,11 @@
 #include 
 #include 
 
+struct macvtap_dev {
+   struct macvlan_dev vlan;
+   struct tap_dev tap;
+};
+
 /*
  * Variables for dealing with macvtaps device numbers.
  */
@@ -46,22 +51,52 @@ static struct cdev macvtap_cdev;
 #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
  NETIF_F_TSO6 | NETIF_F_UFO)
 
+static void macvtap_count_tx_dropped(struct tap_dev *tap)
+{
+   struct macvlan_dev *vlan = (struct macvlan_dev *)container_of(tap, struct macvtap_dev, tap);
+
+   this_cpu_inc(vlan->pcpu_stats->tx_dropped);
+}
+
+static void macvtap_count_rx_dropped(struct tap_dev *tap)
+{
+   struct macvlan_dev *vlan = (struct macvlan_dev *)container_of(tap, struct macvtap_dev, tap);
+
+   macvlan_count_rx(vlan, 0, 0, 0);
+}
+
+static void macvtap_update_features(struct tap_dev *tap,
+   netdev_features_t features)
+{
+   struct macvlan_dev *vlan = (struct macvlan_dev *)container_of(tap, struct macvtap_dev, tap);
+
+   vlan->set_features = features;
+   netdev_update_features(vlan->dev);
+}
+
 static int macvtap_newlink(struct net *src_net,
   struct net_device *dev,
   struct nlattr *tb[],
   struct nlattr *data[])
 {
-   struct macvlan_dev *vlan = netdev_priv(dev);
+   struct macvtap_dev *vlantap = netdev_priv(dev);
int err;
 
-   INIT_LIST_HEAD(&vlan->queue_list);
+   INIT_LIST_HEAD(&vlantap->tap.queue_list);
 
/* Since macvlan supports all offloads by default, make
 * tap support all offloads also.
 */
-   vlan->tap_features = TUN_OFFLOADS;
+   vlantap->tap.tap_features = TUN_OFFLOADS;
 
-   err = netdev_rx_handler_register(dev, tap_handle_frame, vlan);
+   /* Register callbacks for rx/tx drops accounting and updating
+* net_device features
+*/
+   vlantap->tap.count_tx_dropped = macvtap_count_tx_dropped;
+   vlantap->tap.count_rx_dropped = macvtap_count_rx_dropped;
+   vlantap->tap.update_features  = macvtap_update_features;
+
+   err = netdev_rx_handler_register(dev, tap_handle_frame, &vlantap->tap);
if (err)
return err;
 
@@ -74,14 +109,18 @@ static int macvtap_newlink(struct net *src_net,
return err;
}
 
+   vlantap->tap.dev = vlantap->vlan.dev;
+
return 0;
 }
 
 static void macvtap_dellink(struct net_device *dev,
struct list_head *head)
 {
+   struct macvtap_dev *vlantap = netdev_priv(dev);
+
netdev_rx_handler_unregister(dev);
-   tap_del_queues(dev);
+   tap_del_queues(&vlantap->tap);
macvlan_dellink(dev, head);
 }
 
@@ -96,13 +135,14 @@ static struct rtnl_link_ops macvtap_link_ops __read_mostly = {
.setup  = macvtap_setup,
.newlink= macvtap_newlink,
.dellink= macvtap_dellink,
+   .priv_size  = sizeof(struct macvtap_dev),
 };
 
 static int macvtap_device_event(struct notifier_block *unused,
unsigned long event, void *ptr)
 {
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
-   struct macvlan_dev *vlan;
+

[PATCHv2 7/7] IPVTAP: IP-VLAN based tap driver

2017-01-17 Thread Sainath Grandhi

This patch adds a tap character device driver that is based on the
IP-VLAN network interface, called ipvtap. An ipvtap device can be created
in the same way as an ipvlan device, using 'type ipvtap', and then accessed
using the tap user space interface.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/Kconfig  |  13 +++
 drivers/net/Makefile |   1 +
 drivers/net/ipvlan/Makefile  |   1 +
 drivers/net/ipvlan/ipvlan.h  |   7 ++
 drivers/net/ipvlan/ipvlan_core.c |   5 +-
 drivers/net/ipvlan/ipvlan_main.c |  27 +++--
 drivers/net/ipvlan/ipvtap.c  | 238 +++
 7 files changed, 278 insertions(+), 14 deletions(-)
 create mode 100644 drivers/net/ipvlan/ipvtap.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 1c88437..d07b5f5 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -166,6 +166,19 @@ config IPVLAN
   To compile this driver as a module, choose M here: the module
   will be called ipvlan.
 
+config IPVTAP
+   tristate "IP-VLAN based tap driver"
+   depends on IPVLAN
+   depends on INET
+   depends on TAP
+   ---help---
+ This adds a specialized tap character device driver that is based
+ on the IP-VLAN network interface, called ipvtap. An ipvtap device
+ can be added in the same way as a ipvlan device, using 'type
+ ipvtap', and then be accessed through the tap user space interface.
+
+ To compile this driver as a module, choose M here: the module
+ will be called ipvtap.
 
 config VXLAN
tristate "Virtual eXtensible Local Area Network (VXLAN)"
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 7dd86ca..98ed4d9 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -7,6 +7,7 @@
 #
 obj-$(CONFIG_BONDING) += bonding/
 obj-$(CONFIG_IPVLAN) += ipvlan/
+obj-$(CONFIG_IPVTAP) += ipvlan/
 obj-$(CONFIG_DUMMY) += dummy.o
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o
diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile
index df79910..8a2c64d 100644
--- a/drivers/net/ipvlan/Makefile
+++ b/drivers/net/ipvlan/Makefile
@@ -3,5 +3,6 @@
 #
 
 obj-$(CONFIG_IPVLAN) += ipvlan.o
+obj-$(CONFIG_IPVTAP) += ipvtap.o
 
 ipvlan-objs := ipvlan_core.o ipvlan_main.o
diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index dbfbb33..4362d88 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -133,4 +133,11 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb,
  u16 proto);
 unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
 const struct nf_hook_state *state);
+void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
+unsigned int len, bool success, bool mcast);
+int ipvlan_link_new(struct net *src_net, struct net_device *dev,
+   struct nlattr *tb[], struct nlattr *data[]);
+void ipvlan_link_delete(struct net_device *dev, struct list_head *head);
+void ipvlan_link_setup(struct net_device *dev);
+int ipvlan_link_register(struct rtnl_link_ops *ops);
 #endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 83ce74a..9af16ab 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -16,8 +16,8 @@ void ipvlan_init_secret(void)
net_get_random_once(&ipvlan_jhash_secret, sizeof(ipvlan_jhash_secret));
 }
 
-static void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
-   unsigned int len, bool success, bool mcast)
+void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
+unsigned int len, bool success, bool mcast)
 {
if (!ipvlan)
return;
@@ -36,6 +36,7 @@ static void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
this_cpu_inc(ipvlan->pcpu_stats->rx_errs);
}
 }
+EXPORT_SYMBOL_GPL(ipvlan_count_rx);
 
 static u8 ipvlan_get_v6_hash(const void *iaddr)
 {
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 8b0f993..ed750e2 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -494,8 +494,8 @@ static int ipvlan_nl_fillinfo(struct sk_buff *skb,
return ret;
 }
 
-static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
-  struct nlattr *tb[], struct nlattr *data[])
+int ipvlan_link_new(struct net *src_net, struct net_device *dev,
+   struct nlattr *tb[], struct nlattr *data[])
 {
struct ipvl_dev *ipvlan = netdev_priv(dev);
struct ipvl_port *port;
@@ -567,8 +567,9 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
ipvlan_port_destroy(phy_dev);
return err;
 }
+EXPORT_SYMBOL_GPL(ipvlan_link_new);
 
-static void ipvlan_link_delete(struct net_device *dev, struct list_head *head)
+void ipvlan_link_de

[PATCHv2 3/7] TAP: Tap character device creation/destroy API

2017-01-17 Thread Sainath Grandhi

This patch provides tap device create/destroy APIs in tap.c.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvtap_main.c | 29 +++--
 drivers/net/tap.c  | 63 ++
 include/linux/if_tap.h |  5 +++-
 3 files changed, 65 insertions(+), 32 deletions(-)

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 548f339..32ad560 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -28,7 +28,6 @@
  * Variables for dealing with macvtaps device numbers.
  */
 static dev_t macvtap_major;
-#define MACVTAP_NUM_DEVS (1U << MINORBITS)
 
 static const void *macvtap_net_namespace(struct device *d)
 {
@@ -159,43 +158,35 @@ static struct notifier_block macvtap_notifier_block __read_mostly = {
.notifier_call  = macvtap_device_event,
 };
 
-extern struct file_operations tap_fops;
 static int macvtap_init(void)
 {
int err;
 
-   err = alloc_chrdev_region(&macvtap_major, 0,
-   MACVTAP_NUM_DEVS, "macvtap");
-   if (err)
-   goto out1;
+   err = tap_create_cdev(&macvtap_cdev, &macvtap_major, "macvtap");
 
-   cdev_init(&macvtap_cdev, &tap_fops);
-   err = cdev_add(&macvtap_cdev, macvtap_major, MACVTAP_NUM_DEVS);
if (err)
-   goto out2;
+   goto out1;
 
err = class_register(&macvtap_class);
if (err)
-   goto out3;
+   goto out2;
 
err = register_netdevice_notifier(&macvtap_notifier_block);
if (err)
-   goto out4;
+   goto out3;
 
err = macvlan_link_register(&macvtap_link_ops);
if (err)
-   goto out5;
+   goto out4;
 
return 0;
 
-out5:
-   unregister_netdevice_notifier(&macvtap_notifier_block);
 out4:
-   class_unregister(&macvtap_class);
+   unregister_netdevice_notifier(&macvtap_notifier_block);
 out3:
-   cdev_del(&macvtap_cdev);
+   class_unregister(&macvtap_class);
 out2:
-   unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
+   cdev_del(&macvtap_cdev);
 out1:
return err;
 }
@@ -207,9 +198,7 @@ static void macvtap_exit(void)
rtnl_link_unregister(&macvtap_link_ops);
unregister_netdevice_notifier(&macvtap_notifier_block);
class_unregister(&macvtap_class);
-   cdev_del(&macvtap_cdev);
-   unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS);
-   idr_destroy(&minor_idr);
+   tap_destroy_cdev(macvtap_major, &macvtap_cdev);
 }
 module_exit(macvtap_exit);
 
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index d0807c2..774ef33 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -123,8 +123,12 @@ static struct proto tap_proto = {
 };
 
 #define TAP_NUM_DEVS (1U << MINORBITS)
-static DEFINE_MUTEX(minor_lock);
-DEFINE_IDR(minor_idr);
+struct major_info {
+   dev_t major;
+   struct idr minor_idr;
+   struct mutex minor_lock;
+   const char *device_name;
+} macvtap_major;
 
 #define GOODCOPY_LEN 128
 
@@ -413,26 +417,26 @@ int tap_get_minor(struct macvlan_dev *vlan)
 {
int retval = -ENOMEM;
 
-   mutex_lock(&minor_lock);
-   retval = idr_alloc(&minor_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL);
+   mutex_lock(&macvtap_major.minor_lock);
+   retval = idr_alloc(&macvtap_major.minor_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL);
if (retval >= 0) {
vlan->minor = retval;
} else if (retval == -ENOSPC) {
netdev_err(vlan->dev, "Too many tap devices\n");
retval = -EINVAL;
}
-   mutex_unlock(&minor_lock);
+   mutex_unlock(&macvtap_major.minor_lock);
return retval < 0 ? retval : 0;
 }
 
 void tap_free_minor(struct macvlan_dev *vlan)
 {
-   mutex_lock(&minor_lock);
+   mutex_lock(&macvtap_major.minor_lock);
if (vlan->minor) {
-   idr_remove(&minor_idr, vlan->minor);
+   idr_remove(&macvtap_major.minor_idr, vlan->minor);
vlan->minor = 0;
}
-   mutex_unlock(&minor_lock);
+   mutex_unlock(&macvtap_major.minor_lock);
 }
 
 static struct net_device *dev_get_by_tap_minor(int minor)
@@ -440,13 +444,13 @@ static struct net_device *dev_get_by_tap_minor(int minor)
struct net_device *dev = NULL;
struct macvlan_dev *vlan;
 
-   mutex_lock(&minor_lock);
-   vlan = idr_find(&minor_idr, minor);
+   mutex_lock(&macvtap_major.minor_lock);
+   vlan = idr_find(&macvtap_major.minor_idr, minor);
if (vlan) {
dev = vlan->dev;
dev_hold(dev);
}
-   mutex_unlock(&minor_lock);
+   mutex_unlock(&macvtap_major.minor_lock);
return dev;
 }
 
@@ -1184,3 +1188,40 @@ int tap_queue_resize(struct macvlan_dev *vlan)
kfree(arrays);
return ret;
 }
+
+int tap_create_cdev(struct cdev *tap_cdev,
+   dev_t *tap_major, const char *devic

[PATCHv2 2/7] TAP: Renaming tap related APIs, data structures, macros

2017-01-17 Thread Sainath Grandhi

Rename the tap-related APIs, data structures and macros in tap.c from
macvtap_.* to tap_.*.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvtap_main.c |  18 +--
 drivers/net/tap.c  | 332 ++---
 drivers/vhost/net.c|   3 +-
 include/linux/if_macvlan.h |  17 +--
 include/linux/if_macvtap.h |  10 --
 include/linux/if_tap.h |  23 
 6 files changed, 202 insertions(+), 201 deletions(-)
 delete mode 100644 include/linux/if_macvtap.h
 create mode 100644 include/linux/if_tap.h

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 96ffa60..548f339 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -1,6 +1,6 @@
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -62,7 +62,7 @@ static int macvtap_newlink(struct net *src_net,
 */
vlan->tap_features = TUN_OFFLOADS;
 
-   err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan);
+   err = netdev_rx_handler_register(dev, tap_handle_frame, vlan);
if (err)
return err;
 
@@ -82,7 +82,7 @@ static void macvtap_dellink(struct net_device *dev,
struct list_head *head)
 {
netdev_rx_handler_unregister(dev);
-   macvtap_del_queues(dev);
+   tap_del_queues(dev);
macvlan_dellink(dev, head);
 }
 
@@ -121,7 +121,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 * been registered but before register_netdevice has
 * finished running.
 */
-   err = macvtap_get_minor(vlan);
+   err = tap_get_minor(vlan);
if (err)
return notifier_from_errno(err);
 
@@ -129,7 +129,7 @@ static int macvtap_device_event(struct notifier_block *unused,
classdev = device_create(&macvtap_class, &dev->dev, devt,
 dev, tap_name);
if (IS_ERR(classdev)) {
-   macvtap_free_minor(vlan);
+   tap_free_minor(vlan);
return notifier_from_errno(PTR_ERR(classdev));
}
err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj,
@@ -144,10 +144,10 @@ static int macvtap_device_event(struct notifier_block *unused,
sysfs_remove_link(&dev->dev.kobj, tap_name);
devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
device_destroy(&macvtap_class, devt);
-   macvtap_free_minor(vlan);
+   tap_free_minor(vlan);
break;
case NETDEV_CHANGE_TX_QUEUE_LEN:
-   if (macvtap_queue_resize(vlan))
+   if (tap_queue_resize(vlan))
return NOTIFY_BAD;
break;
}
@@ -159,7 +159,7 @@ static struct notifier_block macvtap_notifier_block __read_mostly = {
.notifier_call  = macvtap_device_event,
 };
 
-extern struct file_operations macvtap_fops;
+extern struct file_operations tap_fops;
 static int macvtap_init(void)
 {
int err;
@@ -169,7 +169,7 @@ static int macvtap_init(void)
if (err)
goto out1;
 
-   cdev_init(&macvtap_cdev, &macvtap_fops);
+   cdev_init(&macvtap_cdev, &tap_fops);
err = cdev_add(&macvtap_cdev, macvtap_major, MACVTAP_NUM_DEVS);
if (err)
goto out2;
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 8f12a39..d0807c2 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -24,16 +24,16 @@
 #include 
 
 /*
- * A macvtap queue is the central object of this driver, it connects
+ * A tap queue is the central object of this driver, it connects
  * an open character device to a macvlan interface. There can be
  * multiple queues on one interface, which map back to queues
  * implemented in hardware on the underlying device.
  *
- * macvtap_proto is used to allocate queues through the sock allocation
+ * tap_proto is used to allocate queues through the sock allocation
  * mechanism.
  *
  */
-struct macvtap_queue {
+struct tap_queue {
struct sock sk;
struct socket sock;
struct socket_wq wq;
@@ -47,21 +47,21 @@ struct macvtap_queue {
struct skb_array skb_array;
 };
 
-#define MACVTAP_FEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE)
+#define TAP_IFFEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE)
 
-#define MACVTAP_VNET_LE 0x8000
-#define MACVTAP_VNET_BE 0x4000
+#define TAP_VNET_LE 0x8000
+#define TAP_VNET_BE 0x4000
 
 #ifdef CONFIG_TUN_VNET_CROSS_LE
-static inline bool macvtap_legacy_is_little_endian(struct macvtap_queue *q)
+static inline bool tap_legacy_is_little_endian(struct tap_queue *q)
 {
-   return q->flags & MACVTAP_VNET_BE ? false :
+   return q->flags & TAP_VNET_BE ? false :
virtio_legacy_is_little_endian();
 }
 
-static long macvtap_get_vnet_be(struct macvtap_queue *q, int __user *sp)
+static long tap_g

[PATCHv2 6/7] TAP: tap as an independent module

2017-01-17 Thread Sainath Grandhi

This patch makes tap a separate module so that other types of virtual
interfaces, for example ipvlan, can use it.

Signed-off-by: Sainath Grandhi 
---
 drivers/net/Kconfig   | 15 +++
 drivers/net/Makefile  |  3 +--
 drivers/net/{macvtap_main.c => macvtap.c} |  1 -
 drivers/net/tap.c | 11 +++
 drivers/vhost/Kconfig |  2 +-
 include/linux/if_tap.h|  4 ++--
 6 files changed, 30 insertions(+), 6 deletions(-)
 rename drivers/net/{macvtap_main.c => macvtap.c} (99%)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 95c32f2..1c88437 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -135,6 +135,7 @@ config MACVTAP
tristate "MAC-VLAN based tap driver"
depends on MACVLAN
depends on INET
+   depends on TAP
help
  This adds a specialized tap character device driver that is based
  on the MAC-VLAN network interface, called macvtap. A macvtap device
@@ -284,6 +285,20 @@ config TUN
 
  If you don't know what to use this for, you don't need it.
 
+config TAP
+tristate "TAP module support for virtual interfaces"
+---help---
+  TAP module serves two purposes. This can be used as library of functions
+  for virtual interfaces to implement tap functionality.
+
+  This module also includes character device file and socket operations
+  that can be used by virtual interface implementing tap.
+
+  To compile this driver as a module, choose M here: the module
+  will be called tap.
+
+  If you don't know what to use this for, you don't need it.
+
 config TUN_VNET_CROSS_LE
bool "Support for cross-endian vnet headers on little-endian kernels"
default n
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 19b03a9..7dd86ca 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_PHYLIB) += phy/
 obj-$(CONFIG_RIONET) += rionet.o
 obj-$(CONFIG_NET_TEAM) += team/
 obj-$(CONFIG_TUN) += tun.o
+obj-$(CONFIG_TAP) += tap.o
 obj-$(CONFIG_VETH) += veth.o
 obj-$(CONFIG_VIRTIO_NET) += virtio_net.o
 obj-$(CONFIG_VXLAN) += vxlan.o
@@ -29,8 +30,6 @@ obj-$(CONFIG_GTP) += gtp.o
 obj-$(CONFIG_NLMON) += nlmon.o
 obj-$(CONFIG_NET_VRF) += vrf.o
 
-macvtap-objs := macvtap_main.o tap.o
-
 #
 # Networking Drivers
 #
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap.c
similarity index 99%
rename from drivers/net/macvtap_main.c
rename to drivers/net/macvtap.c
index 3f047b4..3efed94 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap.c
@@ -232,7 +232,6 @@ static int macvtap_init(void)
 }
 module_init(macvtap_init);
 
-extern struct idr minor_idr;
 static void macvtap_exit(void)
 {
rtnl_link_unregister(&macvtap_link_ops);
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 7f38dbe..32066dd 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -311,6 +311,7 @@ void tap_del_queues(struct tap_dev *tap)
/* guarantee that any future tap_set_queue will fail */
tap->numvtaps = MAX_TAP_QUEUES;
 }
+EXPORT_SYMBOL_GPL(tap_del_queues);
 
 rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
 {
@@ -388,6 +389,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
kfree_skb(skb);
return RX_HANDLER_CONSUMED;
 }
+EXPORT_SYMBOL_GPL(tap_handle_frame);
 
 static struct major_info *tap_get_major(int major)
 {
@@ -422,6 +424,7 @@ int tap_get_minor(dev_t major, struct tap_dev *tap)
mutex_unlock(&tap_major->minor_lock);
return retval < 0 ? retval : 0;
 }
+EXPORT_SYMBOL_GPL(tap_get_minor);
 
 void tap_free_minor(dev_t major, struct tap_dev *tap)
 {
@@ -438,6 +441,7 @@ void tap_free_minor(dev_t major, struct tap_dev *tap)
}
mutex_unlock(&tap_major->minor_lock);
 }
+EXPORT_SYMBOL_GPL(tap_free_minor);
 
 static struct tap_dev *dev_get_by_tap_file(int major, int minor)
 {
@@ -1193,6 +1197,7 @@ int tap_queue_resize(struct tap_dev *tap)
kfree(arrays);
return ret;
 }
+EXPORT_SYMBOL_GPL(tap_queue_resize);
 
 static int tap_list_add(dev_t major, const char *device_name)
 {
@@ -1236,6 +1241,7 @@ int tap_create_cdev(struct cdev *tap_cdev,
 out1:
return err;
 }
+EXPORT_SYMBOL_GPL(tap_create_cdev);
 
 void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev)
 {
@@ -1249,3 +1255,8 @@ void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev)
unregister_chrdev_region(major, TAP_NUM_DEVS);
idr_destroy(&tap_major->minor_idr);
 }
+EXPORT_SYMBOL_GPL(tap_destroy_cdev);
+
+MODULE_AUTHOR("Arnd Bergmann ");
+MODULE_AUTHOR("Sainath Grandhi ");
+MODULE_LICENSE("GPL");
diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 40764ec..cfdecea 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -1,6 +1,6 @@
 config VHOST_NET
tristate "Host kernel accelerator for virtio net"
-   depends on NE

[PATCHv2 5/7] TAP: Extending tap device create/destroy APIs

2017-01-17 Thread Sainath Grandhi

Extend the tap APIs get/free_minor and create/destroy_cdev to handle more than
one type of virtual interface.
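
As a rough illustration of the idea, the sketch below keeps one registry entry per device major, each with its own minor number space, and looks the entry up before allocating. It is a userspace approximation under stated assumptions: a small fixed array stands in for the kernel idr, there is no locking, and major_register(), major_lookup(), minor_alloc() and minor_find() are hypothetical names, not the functions added by the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MINORS 8	/* tiny stand-in for TAP_NUM_DEVS */

/* One entry per registered major, each with a private minor table. */
struct major_info {
	unsigned int major;
	void *minors[MAX_MINORS];	/* stand-in for the idr: minor -> device */
	struct major_info *next;
};

static struct major_info *major_list;

/* Add a caller-provided entry to the global list. */
void major_register(struct major_info *info, unsigned int major)
{
	int i;

	info->major = major;
	for (i = 0; i < MAX_MINORS; i++)
		info->minors[i] = NULL;
	info->next = major_list;
	major_list = info;
}

/* Walk the list and return the entry for this major, or NULL. */
struct major_info *major_lookup(unsigned int major)
{
	struct major_info *info;

	for (info = major_list; info; info = info->next)
		if (info->major == major)
			return info;
	return NULL;
}

/* Allocate the lowest free minor (starting at 1) under the given major.
 * Returns the minor, or -1 if the major is unknown or the table is full. */
int minor_alloc(unsigned int major, void *dev)
{
	struct major_info *info = major_lookup(major);
	int i;

	if (!info)
		return -1;
	for (i = 1; i < MAX_MINORS; i++) {
		if (!info->minors[i]) {
			info->minors[i] = dev;
			return i;
		}
	}
	return -1;
}

/* Map (major, minor) back to the device registered there, or NULL. */
void *minor_find(unsigned int major, int minor)
{
	struct major_info *info = major_lookup(major);

	if (!info || minor < 1 || minor >= MAX_MINORS)
		return NULL;
	return info->minors[minor];
}
```

Because every major owns its own table, two interface types can both hand out minor 1 without colliding, which is the property the extended API needs.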

Signed-off-by: Sainath Grandhi 
---
 drivers/net/macvtap_main.c |  6 +--
 drivers/net/tap.c  | 98 +++---
 include/linux/if_tap.h |  4 +-
 3 files changed, 80 insertions(+), 28 deletions(-)

diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 6326a82..3f047b4 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -160,7 +160,7 @@ static int macvtap_device_event(struct notifier_block *unused,
 * been registered but before register_netdevice has
 * finished running.
 */
-   err = tap_get_minor(&vlantap->tap);
+   err = tap_get_minor(macvtap_major, &vlantap->tap);
if (err)
return notifier_from_errno(err);
 
@@ -168,7 +168,7 @@ static int macvtap_device_event(struct notifier_block *unused,
classdev = device_create(&macvtap_class, &dev->dev, devt,
 dev, tap_name);
if (IS_ERR(classdev)) {
-   tap_free_minor(&vlantap->tap);
+   tap_free_minor(macvtap_major, &vlantap->tap);
return notifier_from_errno(PTR_ERR(classdev));
}
err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj,
@@ -183,7 +183,7 @@ static int macvtap_device_event(struct notifier_block *unused,
sysfs_remove_link(&dev->dev.kobj, tap_name);
devt = MKDEV(MAJOR(macvtap_major), vlantap->tap.minor);
device_destroy(&macvtap_class, devt);
-   tap_free_minor(&vlantap->tap);
+   tap_free_minor(macvtap_major, &vlantap->tap);
break;
case NETDEV_CHANGE_TX_QUEUE_LEN:
if (tap_queue_resize(&vlantap->tap))
diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 43d9d54..7f38dbe 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -99,12 +99,16 @@ static struct proto tap_proto = {
 };
 
 #define TAP_NUM_DEVS (1U << MINORBITS)
+
+static LIST_HEAD(major_list);
+
 struct major_info {
dev_t major;
struct idr minor_idr;
struct mutex minor_lock;
const char *device_name;
-} macvtap_major;
+   struct list_head next;
+};
 
 #define GOODCOPY_LEN 128
 
@@ -385,44 +389,73 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb)
return RX_HANDLER_CONSUMED;
 }
 
-int tap_get_minor(struct tap_dev *tap)
+static struct major_info *tap_get_major(int major)
+{
+   struct major_info *tap_major, *tmp;
+
+   list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
+   if (tap_major->major == major) {
+   return tap_major;
+   }
+   }
+
+   return NULL;
+}
+
+int tap_get_minor(dev_t major, struct tap_dev *tap)
 {
int retval = -ENOMEM;
+   struct major_info *tap_major;
+
+   tap_major = tap_get_major(MAJOR(major));
+   if (!tap_major)
+   return -EINVAL;
 
-   mutex_lock(&macvtap_major.minor_lock);
-   retval = idr_alloc(&macvtap_major.minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL);
+   mutex_lock(&tap_major->minor_lock);
+   retval = idr_alloc(&tap_major->minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL);
if (retval >= 0) {
tap->minor = retval;
} else if (retval == -ENOSPC) {
netdev_err(tap->dev, "Too many tap devices\n");
retval = -EINVAL;
}
-   mutex_unlock(&macvtap_major.minor_lock);
+   mutex_unlock(&tap_major->minor_lock);
return retval < 0 ? retval : 0;
 }
 
-void tap_free_minor(struct tap_dev *tap)
+void tap_free_minor(dev_t major, struct tap_dev *tap)
 {
-   mutex_lock(&macvtap_major.minor_lock);
+   struct major_info *tap_major;
+
+   tap_major = tap_get_major(MAJOR(major));
+   if (!tap_major)
+   return;
+
+   mutex_lock(&tap_major->minor_lock);
if (tap->minor) {
-   idr_remove(&macvtap_major.minor_idr, tap->minor);
+   idr_remove(&tap_major->minor_idr, tap->minor);
tap->minor = 0;
}
-   mutex_unlock(&macvtap_major.minor_lock);
+   mutex_unlock(&tap_major->minor_lock);
 }
 
-static struct tap_dev *dev_get_by_tap_minor(int minor)
+static struct tap_dev *dev_get_by_tap_file(int major, int minor)
 {
struct net_device *dev = NULL;
struct tap_dev *tap;
+   struct major_info *tap_major;
+
+   tap_major = tap_get_major(major);
+   if (!tap_major)
+   return NULL;
 
-   mutex_lock(&macvtap_major.minor_lock);
-   tap = idr_find(&macvtap_major.minor_idr, minor);
+   mutex_lock(&tap_major->minor_lock);
+   tap = idr_find(&tap_major->minor_idr, minor);
if (tap) {
dev = tap->dev;

Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()

2017-01-17 Thread Florian Fainelli

On 01/17/2017 04:00 PM, Andy Shevchenko wrote:
> On Wed, Jan 18, 2017 at 1:43 AM, Florian Fainelli wrote:
>> On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
>>> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli wrote:
> 
>>>> +static int device_class_name_match(struct device *dev, void *class)
>>>
>>> And why not const char *class?
>>
>> This was raised back in v2, and the same response applies:
>>
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg147559.html
>>
>> Changing the signature of a callback is out of the scope of this patch
>> series.
> 
> Ah, right.
> 
> But why not to use void *class_name to be consistent with callback and
> device_find_child()?

The top-level function: device_find_in_class_name() should have a
stronger typing of its argument even if it internally uses
device_find_child() and a callback that takes a void * argument, that's
how I see it.
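
A small userspace sketch of that layering, under stated assumptions: struct node, find_child() and find_in_class_name() are hypothetical stand-ins that mimic the shape of device_find_child() and the proposed helper, not the kernel functions themselves. The generic iterator keeps its void * cookie, while the top-level entry point exposes a const char * argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a child device that belongs to a named class. */
struct node {
	const char *class_name;
};

/* Generic callback type: the cookie is untyped, as in the real iterator. */
typedef int (*match_fn)(struct node *n, void *data);

/* Generic search: return the first child for which match() is non-zero. */
struct node *find_child(struct node *children, size_t count,
			void *data, match_fn match)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (match(&children[i], data))
			return &children[i];
	return NULL;
}

/* Callback with the generic signature; it converts the cookie back. */
int class_name_match(struct node *n, void *data)
{
	const char *name = data;

	return strcmp(n->class_name, name) == 0;
}

/* Typed top-level helper: callers cannot pass an arbitrary pointer here.
 * The cast drops const only to fit the generic cookie; the callback
 * never writes through it. */
struct node *find_in_class_name(struct node *children, size_t count,
				const char *class_name)
{
	return find_child(children, count, (void *)class_name,
			  class_name_match);
}
```

The untyped cookie is confined to the internal call, so a mistake such as passing a struct pointer where a name is expected is caught by the compiler at the public boundary.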

> 
> Btw,
> 
> return get_device(parent);

Not sure I follow what that means here?
-- 
Florian

Re: Getting a handle on all these new NIC features

2017-01-17 Thread Florian Fainelli

On 01/17/2017 02:05 PM, Tom Herbert wrote:
> I realize that backports of a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sort of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.

backports (formerly known as compat-wireless) dealt with that problem by
pulling in all dependencies from the networking stack (and beyond),
this allowed people with a need to stay on a particular kernel version
to get the newest and latest networking bits and drivers with minor
disruption to other parts of the kernel. The project now seems to be
largely dead, but could be revived I presume:

https://backports.wiki.kernel.org/index.php/Main_Page

> 
> In short, I would like to ask driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need, can we allow
> config options to #ifdef those out somehow?

Multiplying the number of #ifdefs means that every config option is going
to be turned on by Linux distributions, and most likely just a subset
will be turned on by specific kernel configurations (like yours), but all
in all, this multiplies the number of build combinations to a point
where this may not be manageable for an upstream driver, and some
combinations won't be tested properly except by whoever diverges from
these. I understand the concern of modularizing and having clean
independent features/modules, but I am unsure that more configuration
options are necessarily the right approach.

Slightly tangential: once a series of patches lands in a given
maintainer's tree, it is very hard to match a given commit with its
original submission and, say, locate the 11 other patches out of this 12
patch series adding feature XYZ of interest. David does a great job at
putting submissions in a branch, which helps a lot, but in general,
there is not enough information in git to associate a given patch with
its companion patches within a series, hence making backporting harder IMHO.
-- 
Florian

Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()

2017-01-17 Thread Andy Shevchenko

On Wed, Jan 18, 2017 at 1:43 AM, Florian Fainelli  wrote:
> On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
>> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli  
>> wrote:

>>> +static int device_class_name_match(struct device *dev, void *class)
>>
>> And why not const char *class?
>
> This was raised back in v2, and the same response applies:
>
> https://www.mail-archive.com/netdev@vger.kernel.org/msg147559.html
>
> Changing the signature of a callback is out of the scope of this patch
> series.

Ah, right.

But why not use void *class_name, to be consistent with the callback and
device_find_child()?

Btw,

return get_device(parent);
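
A userspace sketch of the two suggestions above, for illustration only.
The struct definitions and get_device() here are mocks standing in for the
kernel versions (the kernel's get_device() likewise returns the pointer it
was given), and find_parent_in_class() is a made-up name that covers only
the parent check, not the device_find_child() walk:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock stand-ins for the kernel types; only the fields used here. */
struct class {
	const char *name;
};

struct device {
	const struct class *class;
	int refcount;
};

/* Mock get_device(): takes a reference and returns its argument. */
static struct device *get_device(struct device *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

/* The callback keeps its void * type; only the parameter is renamed. */
static int device_class_name_match(struct device *dev, void *class_name)
{
	assert(class_name != NULL);
	return dev->class && !strcmp(dev->class->name, class_name);
}

/* Hypothetical helper: the parent-only part of the lookup. */
static struct device *find_parent_in_class(struct device *parent,
					   char *class_name)
{
	if (device_class_name_match(parent, class_name))
		return get_device(parent);	/* reference taken and returned */

	return NULL;
}
```

Because get_device() hands back its argument, the separate
"get_device(parent); return parent;" pair collapses into a single return.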

-- 
With Best Regards,
Andy Shevchenko

RE: [PATCHv1 5/7] TAP: Extending tap device create/destroy APIs

2017-01-17 Thread Grandhi, Sainath

Please find reply inline.

> -Original Message-
> From: Andy Shevchenko [mailto:andy.shevche...@gmail.com]
> Sent: Friday, January 06, 2017 3:21 PM
> To: Grandhi, Sainath 
> Cc: netdev ; David S. Miller
> ; mah...@bandewar.net; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCHv1 5/7] TAP: Extending tap device create/destroy APIs
> 
> On Sat, Jan 7, 2017 at 12:33 AM, Sainath Grandhi
>  wrote:
> > Extending tap APIs get/free_minor and create/destroy_cdev to handle
> > more than one type of virtual interface.
> >
> > Signed-off-by: Sainath Grandhi 
> > Tested-by: Sainath Grandhi 
> 
> Usually it implies that the committer has tested the stuff.
> 
> > --- a/drivers/net/tap.c
> > +++ b/drivers/net/tap.c
> > @@ -99,12 +99,16 @@ static struct proto tap_proto = {  };
> >
> >  #define TAP_NUM_DEVS (1U << MINORBITS)
> 
> > +
> > +LIST_HEAD(major_list);
> > +
> 
> static ?
Makes sense. Will take care of it.
> 
> > -int tap_get_minor(struct tap_dev *tap)
> > +int tap_get_minor(dev_t major, struct tap_dev *tap)
> >  {
> > int retval = -ENOMEM;
> > +   struct major_info *tap_major, *tmp;
> > +   bool found = false;
> >
> > -   mutex_lock(&macvtap_major.minor_lock);
> > -   retval = idr_alloc(&macvtap_major.minor_idr, tap, 1, TAP_NUM_DEVS,
> GFP_KERNEL);
> 
> > +   list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
> > +   if (tap_major->major == MAJOR(major)) {
> > +   found = true;
> > +   break;
> > +   }
> > +   }
> > +
> > +   if (!found)
> > +   return -EINVAL;
> 
> This is candidate to be a separate helper function. See also below.
Will define a helper function.
> 
> 
> > -void tap_free_minor(struct tap_dev *tap)
> > +void tap_free_minor(dev_t major, struct tap_dev *tap)
> >  {
> > -   mutex_lock(&macvtap_major.minor_lock);
> > +   struct major_info *tap_major, *tmp;
> 
> > +   bool found = false;
> > +
> > +   list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
> > +   if (tap_major->major == MAJOR(major)) {
> > +   found = true;
> > +   break;
> > +   }
> > +   }
> > +
> > +   if (!found)
> > +   return;
> 
> Here is quite the same code (as above).
> 
> > -static struct tap_dev *dev_get_by_tap_minor(int minor)
> > +static struct tap_dev *dev_get_by_tap_file(int major, int minor)
> >  {
> > struct net_device *dev = NULL;
> > struct tap_dev *tap;
> > +   struct major_info *tap_major, *tmp;
> > +   bool found = false;
> >
> > -   mutex_lock(&macvtap_major.minor_lock);
> > -   tap = idr_find(&macvtap_major.minor_idr, minor);
> 
> > +   list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
> > +   if (tap_major->major == major) {
> > +   found = true;
> > +   break;
> > +   }
> > +   }
> > +
> > +   if (!found)
> > +   return NULL;
> 
> And here.
> 
> > +static int tap_list_add(dev_t major, const char *device_name) {
> 
> > +   int err = 0;
> > +   struct major_info *tap_major;
> 
> Perhaps
> +   struct major_info *tap_major;
> +   int err = 0;
> 
> > +
> > +   tap_major = kzalloc(sizeof(*tap_major), GFP_ATOMIC);
> > +
> > +   tap_major->major = MAJOR(major);
> > +
> > +   idr_init(&tap_major->minor_idr);
> > +   mutex_init(&tap_major->minor_lock);
> > +
> > +   tap_major->device_name = device_name;
> > +
> > +   list_add_tail(&tap_major->next, &major_list);
> > +   return err;
> 
> 
> > +   err = tap_list_add(*tap_major, device_name);
> >
> > return err;
> 
> return tap_list_add();
> 
> >  void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev)  {
> > +   struct major_info *tap_major, *tmp;
> > +   bool found = false;
> > +
> > +   list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
> > +   if (tap_major->major == MAJOR(major)) {
> > +   found = true;
> > +   break;
> > +   }
> > +   }
> > +
> > +   if (!found)
> > +   return;
> 
> And here.
> 
> --
> With Best Regards,
> Andy Shevchenko
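
The repeated major lookup flagged in the review above could be factored
into one helper roughly like the following. This is a userspace sketch
under simplifying assumptions: a plain singly linked list replaces the
kernel list_head, the idr and mutex are omitted, and the names
tap_find_major() and tap_find_major_demo() are hypothetical, not taken
from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the patch's struct major_info. */
struct major_info {
	unsigned int major;
	struct major_info *next;
};

/* Hypothetical helper: return the entry for @major, or NULL if absent. */
static struct major_info *tap_find_major(struct major_info *head,
					 unsigned int major)
{
	struct major_info *m;

	for (m = head; m; m = m->next)
		if (m->major == major)
			return m;

	return NULL;
}

/* Small self-check showing how callers would use the helper. */
static void tap_find_major_demo(void)
{
	struct major_info b = { 241, NULL };
	struct major_info a = { 240, &b };

	assert(tap_find_major(&a, 241) == &b);
	assert(tap_find_major(&a, 7) == NULL);
}
```

Each caller then replaces its open-coded loop and "found" flag with a
single NULL check on the helper's return value.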

[PATCH net-next 1/2] net: ipv6: remove nowait arg to rt6_fill_node

2017-01-17 Thread David Ahern

All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and
simplify rt6_fill_node accordingly.

rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the
nowait arg from it as well.

Signed-off-by: David Ahern 
---
 include/linux/mroute6.h |  2 +-
 net/ipv6/ip6mr.c|  9 ++---
 net/ipv6/route.c| 27 ++-
 3 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/include/linux/mroute6.h b/include/linux/mroute6.h
index 19a1c0c2993b..ce44e3e96d27 100644
--- a/include/linux/mroute6.h
+++ b/include/linux/mroute6.h
@@ -116,7 +116,7 @@ struct mfc6_cache {
 
 struct rtmsg;
 extern int ip6mr_get_route(struct net *net, struct sk_buff *skb,
-  struct rtmsg *rtm, int nowait, u32 portid);
+  struct rtmsg *rtm, u32 portid);
 
 #ifdef CONFIG_IPV6_MROUTE
 extern struct sock *mroute6_socket(struct net *net, struct sk_buff *skb);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index e275077e8af2..babaf3ec2742 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2288,7 +2288,7 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, 
struct sk_buff *skb,
 }
 
 int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm,
-   int nowait, u32 portid)
+   u32 portid)
 {
int err;
struct mr6_table *mrt;
@@ -2315,11 +2315,6 @@ int ip6mr_get_route(struct net *net, struct sk_buff 
*skb, struct rtmsg *rtm,
struct net_device *dev;
int vif;
 
-   if (nowait) {
-   read_unlock(&mrt_lock);
-   return -EAGAIN;
-   }
-
dev = skb->dev;
if (!dev || (vif = ip6mr_find_vif(mrt, dev)) < 0) {
read_unlock(&mrt_lock);
@@ -2357,7 +2352,7 @@ int ip6mr_get_route(struct net *net, struct sk_buff *skb, 
struct rtmsg *rtm,
return err;
}
 
-   if (!nowait && (rtm->rtm_flags&RTM_F_NOTIFY))
+   if (rtm->rtm_flags & RTM_F_NOTIFY)
cache->mfc_flags |= MFC_NOTIFY;
 
err = __ip6mr_fill_mroute(mrt, skb, cache, rtm);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4f6b067c8753..b2044dd71724 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3169,7 +3169,7 @@ static int rt6_fill_node(struct net *net,
 struct sk_buff *skb, struct rt6_info *rt,
 struct in6_addr *dst, struct in6_addr *src,
 int iif, int type, u32 portid, u32 seq,
-int prefix, int nowait, unsigned int flags)
+int prefix, unsigned int flags)
 {
u32 metrics[RTAX_MAX];
struct rtmsg *rtm;
@@ -3261,19 +3261,12 @@ static int rt6_fill_node(struct net *net,
if (iif) {
 #ifdef CONFIG_IPV6_MROUTE
if (ipv6_addr_is_multicast(&rt->rt6i_dst.addr)) {
-   int err = ip6mr_get_route(net, skb, rtm, nowait,
- portid);
-
-   if (err <= 0) {
-   if (!nowait) {
-   if (err == 0)
-   return 0;
-   goto nla_put_failure;
-   } else {
-   if (err == -EMSGSIZE)
-   goto nla_put_failure;
-   }
-   }
+   int err = ip6mr_get_route(net, skb, rtm, portid);
+
+   if (err == 0)
+   return 0;
+   if (err < 0)
+   goto nla_put_failure;
} else
 #endif
if (nla_put_u32(skb, RTA_IIF, iif))
@@ -3342,7 +3335,7 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg)
return rt6_fill_node(arg->net,
 arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
 NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq,
-prefix, 0, NLM_F_MULTI);
+prefix, NLM_F_MULTI);
 }
 
 static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
@@ -3433,7 +3426,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh)
 
err = rt6_fill_node(net, skb, rt, &fl6.daddr, &fl6.saddr, iif,
RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
-   nlh->nlmsg_seq, 0, 0, 0);
+   nlh->nlmsg_seq, 0, 0);
if (err < 0) {
kfree_skb(skb);
goto errout;
@@ -3460,7 +3453,7 @@ void inet6_rt_notify(int event, struct rt6_info *rt, 
struct nl_info *info,
goto errout;
 
err = rt6_fill_node(net, skb, rt, NULL, NULL, 0,
-   event, inf

[PATCH net-next 2/2] net: ipv6: remove prefix arg to rt6_fill_node

2017-01-17 Thread David Ahern

The prefix arg to rt6_fill_node is non-0 in only 1 path - rt6_dump_route
where a user is requesting a prefix only dump. Simplify rt6_fill_node
by removing the prefix arg and moving the prefix check to rt6_dump_route.

Signed-off-by: David Ahern 
---
 net/ipv6/route.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b2044dd71724..5585c501a540 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3169,7 +3169,7 @@ static int rt6_fill_node(struct net *net,
 struct sk_buff *skb, struct rt6_info *rt,
 struct in6_addr *dst, struct in6_addr *src,
 int iif, int type, u32 portid, u32 seq,
-int prefix, unsigned int flags)
+unsigned int flags)
 {
u32 metrics[RTAX_MAX];
struct rtmsg *rtm;
@@ -3177,13 +3177,6 @@ static int rt6_fill_node(struct net *net,
long expires;
u32 table;
 
-   if (prefix) {   /* user wants prefix routes only */
-   if (!(rt->rt6i_flags & RTF_PREFIX_RT)) {
-   /* success since this is not a prefix route */
-   return 1;
-   }
-   }
-
nlh = nlmsg_put(skb, portid, seq, type, sizeof(*rtm), flags);
if (!nlh)
return -EMSGSIZE;
@@ -3324,18 +3317,22 @@ static int rt6_fill_node(struct net *net,
 int rt6_dump_route(struct rt6_info *rt, void *p_arg)
 {
struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg;
-   int prefix;
 
if (nlmsg_len(arg->cb->nlh) >= sizeof(struct rtmsg)) {
struct rtmsg *rtm = nlmsg_data(arg->cb->nlh);
-   prefix = (rtm->rtm_flags & RTM_F_PREFIX) != 0;
-   } else
-   prefix = 0;
+
+   /* user wants prefix routes only */
+   if (rtm->rtm_flags & RTM_F_PREFIX &&
+   !(rt->rt6i_flags & RTF_PREFIX_RT)) {
+   /* success since this is not a prefix route */
+   return 1;
+   }
+   }
 
return rt6_fill_node(arg->net,
 arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE,
 NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq,
-prefix, NLM_F_MULTI);
+NLM_F_MULTI);
 }
 
 static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)
@@ -3426,7 +3423,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh)
 
err = rt6_fill_node(net, skb, rt, &fl6.daddr, &fl6.saddr, iif,
RTM_NEWROUTE, NETLINK_CB(in_skb).portid,
-   nlh->nlmsg_seq, 0, 0);
+   nlh->nlmsg_seq, 0);
if (err < 0) {
kfree_skb(skb);
goto errout;
@@ -3453,7 +3450,7 @@ void inet6_rt_notify(int event, struct rt6_info *rt, 
struct nl_info *info,
goto errout;
 
err = rt6_fill_node(net, skb, rt, NULL, NULL, 0,
-   event, info->portid, seq, 0, nlm_flags);
+   event, info->portid, seq, nlm_flags);
if (err < 0) {
/* -EMSGSIZE implies BUG in rt6_nlmsg_size() */
WARN_ON(err == -EMSGSIZE);
-- 
2.1.4

[PATCH net-next 0/2] net: ipv6: simplify rt6_fill_node

2017-01-17 Thread David Ahern

Remove a couple of unnecessary input arguments to rt6_fill_node.

David Ahern (2):
  net: ipv6: remove nowait arg to rt6_fill_node
  net: ipv6: remove prefix arg to rt6_fill_node

 include/linux/mroute6.h |  2 +-
 net/ipv6/ip6mr.c|  9 ++---
 net/ipv6/route.c| 46 ++
 3 files changed, 21 insertions(+), 36 deletions(-)

-- 
2.1.4
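
The control-flow simplification in the first patch of this series can be
sanity-checked in userspace. The sketch below is not kernel code: the
enum and function names are invented for the check, with "return 0" and
"goto nla_put_failure" modelled as enum values. It compares the old
branch with nowait fixed at 0 against the new one:

```c
#include <assert.h>
#include <errno.h>

/* What the caller does after ip6mr_get_route() returns err. */
enum action { ACT_CONTINUE, ACT_RETURN_0, ACT_FAIL };

/* Old handling, as it was with the nowait argument. */
static enum action old_handling(int err, int nowait)
{
	if (err <= 0) {
		if (!nowait) {
			if (err == 0)
				return ACT_RETURN_0;
			return ACT_FAIL;	/* goto nla_put_failure */
		} else {
			if (err == -EMSGSIZE)
				return ACT_FAIL;
		}
	}
	return ACT_CONTINUE;
}

/* New handling, after the argument is removed. */
static enum action new_handling(int err)
{
	if (err == 0)
		return ACT_RETURN_0;
	if (err < 0)
		return ACT_FAIL;
	return ACT_CONTINUE;
}

/* With nowait == 0, both versions agree for every err in the range. */
static void check_equivalence(void)
{
	int err;

	for (err = -200; err <= 200; err++)
		assert(old_handling(err, 0) == new_handling(err));
}
```

Since all callers pass 0 for nowait, the else branch of the old code was
never taken, which is why the two versions agree.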

Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()

2017-01-17 Thread Florian Fainelli

On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli  
> wrote:
>> Add a helper function to lookup a device reference given a class name.
>> This is a preliminary patch to remove adhoc code from net/dsa/dsa.c and
>> make it more generic.
> 
> 
>> +static int device_class_name_match(struct device *dev, void *class)
> 
> And why not const char *class?

This was raised back in v2, and the same response applies:

https://www.mail-archive.com/netdev@vger.kernel.org/msg147559.html

Changing the signature of a callback is out of the scope of this patch
series.
-- 
Florian

RE: [PATCHv1 7/7] IPVTAP: IP-VLAN based tap driver

2017-01-17 Thread Grandhi, Sainath



> -Original Message-
> From: Mahesh Bandewar (महेश बंडेवार)
> [mailto:mahe...@google.com]
> Sent: Friday, January 06, 2017 3:47 PM
> To: Grandhi, Sainath 
> Cc: linux-netdev ; David Miller
> ; mah...@bandewar.net; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCHv1 7/7] IPVTAP: IP-VLAN based tap driver
> 
> few superficial comments inline.
> 
> On Fri, Jan 6, 2017 at 2:33 PM, Sainath Grandhi 
> wrote:
> > This patch adds a tap character device driver that is based on the
> > IP-VLAN network interface, called ipvtap. An ipvtap device can be
> > created in the same way as an ipvlan device, using 'type ipvtap', and
> > then accessed using the tap user space interface.
> >
> > Signed-off-by: Sainath Grandhi 
> > Tested-by: Sainath Grandhi 
> > ---
> >  drivers/net/Kconfig  |  12 ++
> >  drivers/net/Makefile |   1 +
> >  drivers/net/ipvlan/Makefile  |   1 +
> >  drivers/net/ipvlan/ipvlan.h  |   7 ++
> >  drivers/net/ipvlan/ipvlan_core.c |   5 +-
> >  drivers/net/ipvlan/ipvlan_main.c |  37 +++---
> >  drivers/net/ipvlan/ipvtap.c  | 238
> +++
> >  7 files changed, 282 insertions(+), 19 deletions(-)  create mode
> > 100644 drivers/net/ipvlan/ipvtap.c
> >
> > diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index
> > 280380d..ddfb30a 100644
> > --- a/drivers/net/Kconfig
> > +++ b/drivers/net/Kconfig
> > @@ -165,6 +165,18 @@ config IPVLAN
> >To compile this driver as a module, choose M here: the module
> >will be called ipvlan.
> >
> > +config IPVTAP
> > +tristate "IP-VLAN based tap driver"
> > +depends on IPVLAN
> > +depends on INET
> > +help
> > +  This adds a specialized tap character device driver that is based
> > +  on the IP-VLAN network interface, called ipvtap. An ipvtap device
> > +  can be added in the same way as a ipvlan device, using 'type
> > +  ipvtap', and then be accessed through the tap user space 
> > interface.
> > +
> > +  To compile this driver as a module, choose M here: the module
> > +  will be called macvtap.
> >
> >  config VXLAN
> > tristate "Virtual eXtensible Local Area Network (VXLAN)"
> > diff --git a/drivers/net/Makefile b/drivers/net/Makefile index
> > 7dd86ca..98ed4d9 100644
> > --- a/drivers/net/Makefile
> > +++ b/drivers/net/Makefile
> > @@ -7,6 +7,7 @@
> >  #
> >  obj-$(CONFIG_BONDING) += bonding/
> >  obj-$(CONFIG_IPVLAN) += ipvlan/
> > +obj-$(CONFIG_IPVTAP) += ipvlan/
> >  obj-$(CONFIG_DUMMY) += dummy.o
> >  obj-$(CONFIG_EQUALIZER) += eql.o
> >  obj-$(CONFIG_IFB) += ifb.o
> > diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile
> > index df79910..8a2c64d 100644
> > --- a/drivers/net/ipvlan/Makefile
> > +++ b/drivers/net/ipvlan/Makefile
> > @@ -3,5 +3,6 @@
> >  #
> >
> >  obj-$(CONFIG_IPVLAN) += ipvlan.o
> > +obj-$(CONFIG_IPVTAP) += ipvtap.o
> >
> >  ipvlan-objs := ipvlan_core.o ipvlan_main.o diff --git
> > a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h index
> > dbfbb33..4362d88 100644
> > --- a/drivers/net/ipvlan/ipvlan.h
> > +++ b/drivers/net/ipvlan/ipvlan.h
> > @@ -133,4 +133,11 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device
> *dev, struct sk_buff *skb,
> >   u16 proto);  unsigned int
> > ipvlan_nf_input(void *priv, struct sk_buff *skb,
> >  const struct nf_hook_state *state);
> > +void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
> > +unsigned int len, bool success, bool mcast); int
> > +ipvlan_link_new(struct net *src_net, struct net_device *dev,
> > +   struct nlattr *tb[], struct nlattr *data[]); void
> > +ipvlan_link_delete(struct net_device *dev, struct list_head *head);
> > +void ipvlan_link_setup(struct net_device *dev); int
> > +ipvlan_link_register(struct rtnl_link_ops *ops);
> >  #endif /* __IPVLAN_H */
> > diff --git a/drivers/net/ipvlan/ipvlan_core.c
> > b/drivers/net/ipvlan/ipvlan_core.c
> > index 83ce74a..9af16ab 100644
> > --- a/drivers/net/ipvlan/ipvlan_core.c
> > +++ b/drivers/net/ipvlan/ipvlan_core.c
> > @@ -16,8 +16,8 @@ void ipvlan_init_secret(void)
> > net_get_random_once(&ipvlan_jhash_secret,
> > sizeof(ipvlan_jhash_secret));  }
> >
> > -static void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
> > -   unsigned int len, bool success, bool mcast)
> > +void ipvlan_count_rx(const struct ipvl_dev *ipvlan,
> > +unsigned int len, bool success, bool mcast)
> >  {
> > if (!ipvlan)
> > return;
> > @@ -36,6 +36,7 @@ static void ipvlan_count_rx(const struct ipvl_dev
> *ipvlan,
> > this_cpu_inc(ipvlan->pcpu_stats->rx_errs);
> > }
> >  }
> > +EXPORT_SYMBOL_GPL(ipvlan_count_rx);
> Why export, isn't just removing 'static' enough?
This function becomes part of the "ipvlan" module.
The "ipvtap" module depends on this function being exported by the "ipvlan" module.
>

RE: [PATCHv1 7/7] IPVTAP: IP-VLAN based tap driver

2017-01-17 Thread Grandhi, Sainath



> -Original Message-
> From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> Sent: Friday, January 06, 2017 3:14 PM
> To: Grandhi, Sainath 
> Cc: netdev@vger.kernel.org; da...@davemloft.net;
> mah...@bandewar.net; linux-ker...@vger.kernel.org
> Subject: Re: [PATCHv1 7/7] IPVTAP: IP-VLAN based tap driver
> 
> On Fri, 2017-01-06 at 14:33 -0800, Sainath Grandhi wrote:
> > This patch adds a tap character device driver that is based on the
> > IP-VLAN network interface, called ipvtap. An ipvtap device can be
> > created in the same way as an ipvlan device, using 'type ipvtap', and
> > then accessed using the tap user space interface.
> >
> > Signed-off-by: Sainath Grandhi 
> > Tested-by: Sainath Grandhi 
> > ---
> 
> 
> > +module_exit(ipvtap_exit);
> > +MODULE_ALIAS_RTNL_LINK("ipvtap");
> > +MODULE_AUTHOR("Arnd Bergmann ");
> > +MODULE_LICENSE("GPL");
> 
> Who wrote this driver exactly ???
> 
> 
Sending out next version, modifying this.

Re: Getting a handle on all these new NIC features

2017-01-17 Thread Saeed Mahameed

On Wed, Jan 18, 2017 at 12:05 AM, Tom Herbert  wrote:
> There was some discussion about the problems of dealing with the
> explosion of NIC features in the mlx directory restructuring proposal,
> but I think there is a deeper issue here that should be discussed.
>
> It's hard not to notice that there has been quite a proliferation of
> NIC features in several drivers. This trend has resulted in very
> complex driver code that may or may not segment individual features.
> One visible manifestation of this is the number of ndo functions, which is
> somewhere around seventy-five now.
>
> I suspect the vast majority of these advanced NIC features (e.g.
> bridging, UDP offloads, tc offload, etc.) are only relevant to some of
> the people some of the time. The problem we have, in this case those
> of us that are attempting to deploy and maintain NICs at scale, is
> when we have to deal with the ramifications of these features being
> intertwined with core driver functionality that is relevant to
> everyone. This becomes very obvious when we need to backport drivers
> from later versions of kernel.
>
> I realize that backports of a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sort of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.
>

I think backporting is not the only concern here. The other main issue
is purely one of software design, and it cannot just be ignored: device
drivers are getting smarter and are doing lots of offloads and logic.
They are not as thin as they used to be, which is also a justification
for why we should take a second (stop coding for a while :-) ) and give
this issue some attention.

> Currently we (FB) need to backport two NIC drivers. I've already given
> details of backporting mlx5 on the thread to restructure the driver
> directories. The other driver being backported seems to suffer from
> the same type of feature complexity.
>

Can you share some more about the most complex stuff you faced while
backporting?
What would have made it simpler had we designed the driver differently?

> In short, I would like to ask driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need can we allow
> config options to #ifdef those out somehow? Furthermore can the file
> and directory structure of drivers reflect that; our lives would be
> _so_ much simpler to maintain drivers in production if we have such
> modularity and the ability to build drivers with the features of our
> choosing.
>

Before we do this or define the plan, there are some questions to be asked:
1. Can we allow ourselves to have a kconfig option or even an internal
compilation flag per device driver feature?
2. What about previous features? I mean, in order to have a clean and
clear way to achieve this isolation for new features, some kind of
restructuring or core reorganization is required; it is ugly to have a
driver with a hybrid structure.
3. If we decide to do a restructuring phase as we suggested in
the mlx5 patch, what is the plan for older kernels that still backport
fixes to the previous structure?
4. What is the concrete plan? Is there a design reference or
guidelines known to someone that everyone can follow?

Anyway, I would like to contribute some thoughts and design techniques
to achieve this modularization and feature isolation by design (at
least for new features):

Device initialization and netdev registration:
 - Most of the device drivers have a main.c which handles driver
initialization and netdev registration.
 - But today this file provides much more than the above.
 - I suggest keeping it as thin as possible and dedicated to what
it should do.
 - Keep the HAL (Hardware Abstraction Layer) separated from main.c;
main should call entry points exposed by the HAL.
 - Basic netdev features (RX/TX and the most basic ndos for basic
Ethernet functionality) can still be in main.c.
 - Advanced features (eswitch, TC offloads, vxlan and tunneling
offloads, XDP, etc.) can go into separate file(s) with the
full logic implementation and clear code locality, wrapped by an #ifdef
compilation or kconfig flag. This gives easy control over them and gives
the reviewer/developer a chance to logically understand the code and
distinguish between the different features by looking at the Makefile
or the C file including those features (just keep the feature logic
out of main.c).
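
One common way to get the isolation described in the list above without
scattering #ifdefs through main.c is a header that provides either the
real prototype or a no-op inline stub. The sketch below is illustrative
only: the driver name "foo", the option CONFIG_FOO_TC_OFFLOAD, and all
function names are invented for the example and do not refer to any
existing driver:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device state. */
struct foo_dev {
	int tc_offload_enabled;
};

/* Would live in a feature header, e.g. a hypothetical foo_tc.h. */
#ifdef CONFIG_FOO_TC_OFFLOAD
int foo_tc_init(struct foo_dev *dev);	/* implemented in its own C file */
#else
static inline int foo_tc_init(struct foo_dev *dev)
{
	(void)dev;
	return 0;	/* feature compiled out: nothing to set up */
}
#endif

/* main.c stays thin: it calls the entry point and carries no #ifdef. */
static int foo_probe(struct foo_dev *dev)
{
	assert(dev != NULL);
	dev->tc_offload_enabled = 0;
	return foo_tc_init(dev);
}
```

With the option disabled, foo_probe() still compiles and returns 0, and
the feature's C file is simply left out of the Makefile.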

I've been partially followi

[PATCH net-next] lwtunnel: remove device arg to lwtunnel_build_state

2017-01-17 Thread David Ahern

Nothing about lwt state requires a device reference, so remove the
input argument.

Signed-off-by: David Ahern 
---
 include/net/lwtunnel.h|  6 +++---
 net/core/lwt_bpf.c|  2 +-
 net/core/lwtunnel.c   |  4 ++--
 net/ipv4/fib_semantics.c  | 22 ++
 net/ipv4/ip_tunnel_core.c |  4 ++--
 net/ipv6/ila/ila_lwt.c|  2 +-
 net/ipv6/route.c  |  2 +-
 net/ipv6/seg6_iptunnel.c  |  2 +-
 net/mpls/mpls_iptunnel.c  |  2 +-
 9 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index d4c1c75b8862..671d5a766dd9 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -33,7 +33,7 @@ struct lwtunnel_state {
 };
 
 struct lwtunnel_encap_ops {
-   int (*build_state)(struct net_device *dev, struct nlattr *encap,
+   int (*build_state)(struct nlattr *encap,
   unsigned int family, const void *cfg,
   struct lwtunnel_state **ts);
void (*destroy_state)(struct lwtunnel_state *lws);
@@ -105,7 +105,7 @@ int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops 
*op,
   unsigned int num);
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
   unsigned int num);
-int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap,
 unsigned int family, const void *cfg,
 struct lwtunnel_state **lws);
@@ -168,7 +168,7 @@ static inline int lwtunnel_encap_del_ops(const struct 
lwtunnel_encap_ops *op,
return -EOPNOTSUPP;
 }
 
-static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+static inline int lwtunnel_build_state(u16 encap_type,
   struct nlattr *encap,
   unsigned int family, const void *cfg,
   struct lwtunnel_state **lws)
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index 71bb3e2eca08..4b737a2e5457 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -237,7 +237,7 @@ static const struct nla_policy bpf_nl_policy[LWT_BPF_MAX + 
1] = {
[LWT_BPF_XMIT_HEADROOM] = { .type = NLA_U32 },
 };
 
-static int bpf_build_state(struct net_device *dev, struct nlattr *nla,
+static int bpf_build_state(struct nlattr *nla,
   unsigned int family, const void *cfg,
   struct lwtunnel_state **ts)
 {
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index a5d4e866ce88..0f30398e0bdd 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -100,7 +100,7 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops 
*ops,
 }
 EXPORT_SYMBOL(lwtunnel_encap_del_ops);
 
-int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+int lwtunnel_build_state(u16 encap_type,
 struct nlattr *encap, unsigned int family,
 const void *cfg, struct lwtunnel_state **lws)
 {
@@ -127,7 +127,7 @@ int lwtunnel_build_state(struct net_device *dev, u16 
encap_type,
}
 #endif
if (likely(ops && ops->build_state))
-   ret = ops->build_state(dev, encap, family, cfg, lws);
+   ret = ops->build_state(encap, family, cfg, lws);
rcu_read_unlock();
 
return ret;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 9a375b908d01..f57efe73b84f 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -471,7 +471,6 @@ static int fib_count_nexthops(struct rtnexthop *rtnh, int 
remaining)
 static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh,
   int remaining, struct fib_config *cfg)
 {
-   struct net *net = cfg->fc_nlinfo.nl_net;
int ret;
 
change_nexthops(fi) {
@@ -503,16 +502,14 @@ static int fib_get_nhs(struct fib_info *fi, struct 
rtnexthop *rtnh,
nla = nla_find(attrs, attrlen, RTA_ENCAP);
if (nla) {
struct lwtunnel_state *lwtstate;
-   struct net_device *dev = NULL;
struct nlattr *nla_entype;
 
nla_entype = nla_find(attrs, attrlen,
  RTA_ENCAP_TYPE);
if (!nla_entype)
goto err_inval;
-   if (cfg->fc_oif)
-   dev = __dev_get_by_index(net, 
cfg->fc_oif);
-   ret = lwtunnel_build_state(dev, nla_get_u16(
+
+   ret = lwtunnel_build_state(nla_get_u16(
   nla_entype),
   nla,  AF_INET, cfg,

[PATCH net-next v4 06/10] net: dsa: Migrate to device_find_in_class_name()

2017-01-17 Thread Florian Fainelli

Now that the base device driver code provides an identical
implementation of dev_find_class(), utilize device_find_in_class_name()
instead of our own version of it.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa.c | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 2306d1b87c83..d9db63910887 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -455,29 +455,11 @@ EXPORT_SYMBOL_GPL(dsa_switch_resume);
 #endif
 
 /* platform driver init and cleanup */
-static int dev_is_class(struct device *dev, void *class)
-{
-   if (dev->class != NULL && !strcmp(dev->class->name, class))
-   return 1;
-
-   return 0;
-}
-
-static struct device *dev_find_class(struct device *parent, char *class)
-{
-   if (dev_is_class(parent, class)) {
-   get_device(parent);
-   return parent;
-   }
-
-   return device_find_child(parent, class, dev_is_class);
-}
-
 struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev)
 {
struct device *d;
 
-   d = dev_find_class(dev, "mdio_bus");
+   d = device_find_in_class_name(dev, "mdio_bus");
if (d != NULL) {
struct mii_bus *bus;
 
@@ -495,7 +477,7 @@ static struct net_device *dev_to_net_device(struct device 
*dev)
 {
struct device *d;
 
-   d = dev_find_class(dev, "net");
+   d = device_find_in_class_name(dev, "net");
if (d != NULL) {
struct net_device *nd;
 
-- 
2.9.3

[PATCH net-next v4 02/10] net: dsa: Make most functions take a dsa_port argument

2017-01-17 Thread Florian Fainelli

In preparation for allowing platform data, and therefore no valid
device_node pointer, make most DSA functions take a pointer to a
dsa_port structure whenever possible. While at it, introduce a
dsa_port_is_valid() helper function which checks whether port->dn is
NULL or not at the moment.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa.c  | 15 --
 net/dsa/dsa2.c | 61 +-
 net/dsa/dsa_priv.h |  4 ++--
 3 files changed, 44 insertions(+), 36 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index fd532487dfdf..2306d1b87c83 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -110,8 +110,9 @@ dsa_switch_probe(struct device *parent, struct device 
*host_dev, int sw_addr,
 
 /* basic switch operations **/
 int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
- struct device_node *port_dn, int port)
+ struct dsa_port *dport, int port)
 {
+   struct device_node *port_dn = dport->dn;
struct phy_device *phydev;
int ret, mode;
 
@@ -141,15 +142,15 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct 
device *dev,
 
 static int dsa_cpu_dsa_setups(struct dsa_switch *ds, struct device *dev)
 {
-   struct device_node *port_dn;
+   struct dsa_port *dport;
int ret, port;
 
for (port = 0; port < DSA_MAX_PORTS; port++) {
if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
continue;
 
-   port_dn = ds->ports[port].dn;
-   ret = dsa_cpu_dsa_setup(ds, dev, port_dn, port);
+   dport = &ds->ports[port];
+   ret = dsa_cpu_dsa_setup(ds, dev, dport, port);
if (ret)
return ret;
}
@@ -366,8 +367,10 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
return ds;
 }
 
-void dsa_cpu_dsa_destroy(struct device_node *port_dn)
+void dsa_cpu_dsa_destroy(struct dsa_port *port)
 {
+   struct device_node *port_dn = port->dn;
+
if (of_phy_is_fixed_link(port_dn))
of_phy_deregister_fixed_link(port_dn);
 }
@@ -393,7 +396,7 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
for (port = 0; port < DSA_MAX_PORTS; port++) {
if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
continue;
-   dsa_cpu_dsa_destroy(ds->ports[port].dn);
+   dsa_cpu_dsa_destroy(&ds->ports[port]);
 
/* Clearing a bit which is not set does no harm */
ds->cpu_port_mask |= ~(1 << port);
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 4170f7ea8e28..6e3675220fef 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -79,14 +79,19 @@ static void dsa_dst_del_ds(struct dsa_switch_tree *dst,
kref_put(&dst->refcount, dsa_free_dst);
 }
 
-static bool dsa_port_is_dsa(struct device_node *port)
+static bool dsa_port_is_valid(struct dsa_port *port)
 {
-   return !!of_parse_phandle(port, "link", 0);
+   return !!port->dn;
 }
 
-static bool dsa_port_is_cpu(struct device_node *port)
+static bool dsa_port_is_dsa(struct dsa_port *port)
 {
-   return !!of_parse_phandle(port, "ethernet", 0);
+   return !!of_parse_phandle(port->dn, "link", 0);
+}
+
+static bool dsa_port_is_cpu(struct dsa_port *port)
+{
+   return !!of_parse_phandle(port->dn, "ethernet", 0);
 }
 
 static bool dsa_ds_find_port(struct dsa_switch *ds,
@@ -120,7 +125,7 @@ static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst,
 
 static int dsa_port_complete(struct dsa_switch_tree *dst,
 struct dsa_switch *src_ds,
-struct device_node *port,
+struct dsa_port *port,
 u32 src_port)
 {
struct device_node *link;
@@ -128,7 +133,7 @@ static int dsa_port_complete(struct dsa_switch_tree *dst,
struct dsa_switch *dst_ds;
 
for (index = 0;; index++) {
-   link = of_parse_phandle(port, "link", index);
+   link = of_parse_phandle(port->dn, "link", index);
if (!link)
break;
 
@@ -151,13 +156,13 @@ static int dsa_port_complete(struct dsa_switch_tree *dst,
  */
 static int dsa_ds_complete(struct dsa_switch_tree *dst, struct dsa_switch *ds)
 {
-   struct device_node *port;
+   struct dsa_port *port;
u32 index;
int err;
 
for (index = 0; index < DSA_MAX_PORTS; index++) {
-   port = ds->ports[index].dn;
-   if (!port)
+   port = &ds->ports[index];
+   if (!dsa_port_is_valid(port))
continue;
 
if (!dsa_port_is_dsa(port))
@@ -197,7 +202,7 @@ static int dsa_dst_complete(struct dsa_switch_tree *dst)
return 0;
 }
 
-static int dsa_dsa_port_apply(struct device_node *port, u3

Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()

2017-01-17 Thread Andy Shevchenko

On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli  wrote:
> Add a helper function to lookup a device reference given a class name.
> This is a preliminary patch to remove adhoc code from net/dsa/dsa.c and
> make it more generic.


> +static int device_class_name_match(struct device *dev, void *class)

And why not const char *class?

> +{
> +   if (dev->class != NULL && !strcmp(dev->class->name, class))

if (dev->class && ...)

> +   return 1;
> +
> +   return 0;

Perhaps even one line:

return dev->class && ...;

> +}
> +
> +/**
> + * device_find_in_class_name - device iterator for locating a particular device
> + * within the specified class name
> + * @parent: parent struct device
> + * @class_name: Class name to match against
> + *
> + * This function returns 1 if the device (specified by @parent), or one of its child
> + * is in the class whose name is specified by @class_name. Returns 0 otherwise.
> + *
> + * NOTE: you will need to drop the reference with put_device() after use.
> + */
> +struct device *device_find_in_class_name(struct device *parent,
> +char *class_name)

const char *class_name

> +{
> +   if (device_class_name_match(parent, class_name)) {
> +   get_device(parent);
> +   return parent;
> +   }
> +
> +   return device_find_child(parent, class_name, device_class_name_match);
> +}
> +EXPORT_SYMBOL_GPL(device_find_in_class_name);

> +extern struct device *device_find_in_class_name(struct device *parent,
> +   char *class_name);

Ditto.
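
Taken together, the review comments above suggest a match callback along the following lines. This is only a userspace sketch under stated assumptions: `struct mock_class` and `struct mock_device` are hypothetical cut-down stand-ins for the kernel's `struct class` and `struct device`, and the `void *` parameter is kept because the `device_find_child()` prototype quoted in the patch takes an `int (*match)(struct device *dev, void *data)` callback.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's struct class / struct device. */
struct mock_class {
	const char *name;
};

struct mock_device {
	const struct mock_class *class;
};

/*
 * Match callback in the shape the review suggests: no explicit NULL
 * comparison and a single return statement. The callback keeps a void *
 * argument to fit a device_find_child()-style prototype; the const-qualified
 * local documents that the name is only read.
 */
static int mock_class_name_match(struct mock_device *dev, void *data)
{
	const char *class_name = data;

	return dev->class && !strcmp(dev->class->name, class_name);
}
```

With this shape, a device without a class never reaches `strcmp()`, and the public helper can still take a `const char *class_name` as requested, casting only at the point where it hands the name to the child iterator.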

-- 
With Best Regards,
Andy Shevchenko

[PATCH net-next v4 07/10] net: Relocate dev_to_net_device() into net/core/dev.c

2017-01-17 Thread Florian Fainelli

dev_to_net_device() is moved from net/dsa/dsa.c to net/core/dev.c since
it is going to be used by net/dsa/dsa2.c and the namespace of the function
justifies making it available to other potential users. We also rename
it to device_to_net_device() to better illustrate what it does since it
is not just a container_of() wrapper.
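
The reference handling is the part that makes this more than a container_of() wrapper, and it can be modelled in userspace roughly as below. Everything prefixed `mock_` is hypothetical: plain integer counters stand in for get_device()/put_device() and dev_hold()/dev_put(), and the lookup only checks the device itself rather than walking its children.

```c
#include <stddef.h>

/* Hypothetical stand-ins with visible reference counters. */
struct mock_device {
	int refs;	/* models get_device()/put_device() */
	int is_net;	/* non-zero if the device is in the "net" class */
};

struct mock_net_device {
	int holds;	/* models dev_hold()/dev_put() */
	struct mock_device dev;
};

/* container_of()-style conversion from the embedded device. */
#define mock_to_net_dev(d) \
	((struct mock_net_device *)((char *)(d) - \
				    offsetof(struct mock_net_device, dev)))

/* Models the class lookup: returns a referenced device or NULL. */
static struct mock_device *mock_find_net(struct mock_device *d)
{
	if (!d || !d->is_net)
		return NULL;
	d->refs++;
	return d;
}

/* Same order as the patch: hold the netdev, then drop the device ref. */
static struct mock_net_device *mock_device_to_net_device(struct mock_device *dev)
{
	struct mock_device *d = mock_find_net(dev);
	struct mock_net_device *nd;

	if (!d)
		return NULL;

	nd = mock_to_net_dev(d);
	nd->holds++;	/* dev_hold(nd) */
	d->refs--;	/* put_device(d) */
	return nd;
}

/* What the caller owes once it is done with the result: dev_put(). */
static void mock_dev_put(struct mock_net_device *nd)
{
	nd->holds--;
}
```

The point of the ordering is that the device reference taken by the lookup is transient, while the net_device hold is handed to the caller, which is why the kerneldoc asks the caller for dev_put() rather than put_device().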

Signed-off-by: Florian Fainelli 
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c| 30 ++
 net/dsa/dsa.c | 20 +---
 3 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97ae0ac513ee..f8cc9833107c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4390,4 +4390,6 @@ do { \
 #define PTYPE_HASH_SIZE(16)
 #define PTYPE_HASH_MASK(PTYPE_HASH_SIZE - 1)
 
+struct net_device *device_to_net_device(struct device *dev);
+
 #endif /* _LINUX_NETDEVICE_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index ad5959e56116..f6897906f229 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8128,6 +8128,36 @@ const char *netdev_drivername(const struct net_device *dev)
return empty;
 }
 
+/**
+ * device_to_net_device - return the net_device from device
+ * @dev: device reference
+ *
+ * Returns the net_device associated with this device reference
+ * NULL if the device is not a network device, or could not be
+ * found.
+ *
+ * Note: caller must call dev_put() to release the net_device
+ * once done with it.
+ */
+struct net_device *device_to_net_device(struct device *dev)
+{
+   struct device *d;
+
+   d = device_find_in_class_name(dev, "net");
+   if (d) {
+   struct net_device *nd;
+
+   nd = to_net_dev(d);
+   dev_hold(nd);
+   put_device(d);
+
+   return nd;
+   }
+
+   return NULL;
+}
+EXPORT_SYMBOL_GPL(device_to_net_device);
+
 static void __netdev_printk(const char *level, const struct net_device *dev,
struct va_format *vaf)
 {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index d9db63910887..88b56f7e3dd2 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -473,24 +473,6 @@ struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dsa_host_dev_to_mii_bus);
 
-static struct net_device *dev_to_net_device(struct device *dev)
-{
-   struct device *d;
-
-   d = device_find_in_class_name(dev, "net");
-   if (d != NULL) {
-   struct net_device *nd;
-
-   nd = to_net_dev(d);
-   dev_hold(nd);
-   put_device(d);
-
-   return nd;
-   }
-
-   return NULL;
-}
-
 #ifdef CONFIG_OF
 static int dsa_of_setup_routing_table(struct dsa_platform_data *pd,
struct dsa_chip_data *cd,
@@ -799,7 +781,7 @@ static int dsa_probe(struct platform_device *pdev)
dev = pd->of_netdev;
dev_hold(dev);
} else {
-   dev = dev_to_net_device(pd->netdev);
+   dev = device_to_net_device(pd->netdev);
}
if (dev == NULL) {
ret = -EPROBE_DEFER;
-- 
2.9.3

[PATCH net-next v4 04/10] net: dsa: Move ports assignment closer to error checking

2017-01-17 Thread Florian Fainelli

Move the assignment of ports in _dsa_register_switch() closer to where
it is checked, no functional change. Re-order declarations to
preserve the inverted christmas tree style.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 04ab62251fe3..cd91070b5467 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -587,8 +587,8 @@ static struct device_node *dsa_get_ports(struct dsa_switch *ds,
 static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
struct device_node *np = dev->of_node;
-   struct device_node *ports = dsa_get_ports(ds, np);
struct dsa_switch_tree *dst;
+   struct device_node *ports;
u32 tree, index;
int i, err;
 
@@ -596,6 +596,7 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
if (err)
return err;
 
+   ports = dsa_get_ports(ds, np);
if (IS_ERR(ports))
return PTR_ERR(ports);
 
-- 
2.9.3

[PATCH net-next v4 03/10] net: dsa: Suffix function manipulating device_node with _dn

2017-01-17 Thread Florian Fainelli

Make it clear that these functions take a device_node structure pointer.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa2.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 6e3675220fef..04ab62251fe3 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -94,8 +94,8 @@ static bool dsa_port_is_cpu(struct dsa_port *port)
return !!of_parse_phandle(port->dn, "ethernet", 0);
 }
 
-static bool dsa_ds_find_port(struct dsa_switch *ds,
-struct device_node *port)
+static bool dsa_ds_find_port_dn(struct dsa_switch *ds,
+   struct device_node *port)
 {
u32 index;
 
@@ -105,8 +105,8 @@ static bool dsa_ds_find_port(struct dsa_switch *ds,
return false;
 }
 
-static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst,
-   struct device_node *port)
+static struct dsa_switch *dsa_dst_find_port_dn(struct dsa_switch_tree *dst,
+  struct device_node *port)
 {
struct dsa_switch *ds;
u32 index;
@@ -116,7 +116,7 @@ static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst,
if (!ds)
continue;
 
-   if (dsa_ds_find_port(ds, port))
+   if (dsa_ds_find_port_dn(ds, port))
return ds;
}
 
@@ -137,7 +137,7 @@ static int dsa_port_complete(struct dsa_switch_tree *dst,
if (!link)
break;
 
-   dst_ds = dsa_dst_find_port(dst, link);
+   dst_ds = dsa_dst_find_port_dn(dst, link);
of_node_put(link);
 
if (!dst_ds)
@@ -546,7 +546,7 @@ static int dsa_parse_ports_dn(struct device_node *ports, struct dsa_switch *ds)
return 0;
 }
 
-static int dsa_parse_member(struct device_node *np, u32 *tree, u32 *index)
+static int dsa_parse_member_dn(struct device_node *np, u32 *tree, u32 *index)
 {
int err;
 
@@ -592,7 +592,7 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
u32 tree, index;
int i, err;
 
-   err = dsa_parse_member(np, &tree, &index);
+   err = dsa_parse_member_dn(np, &tree, &index);
if (err)
return err;
 
-- 
2.9.3

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-01-17 Thread Timur Tabi


On 10/24/2016 05:40 AM, Zefir Kurtisi wrote:

As a result, if you ever see a warning
'803x_aneg_done: SGMII link is not ok' you will
end up having an Ethernet link up but won't get
any data through. This should not happen, if it
does, please contact the module maintainer.


I am now seeing this:

ubuntu@ubuntu:~$ ifup eth1
ubuntu@ubuntu:~$ [  588.687689] 803x_aneg_done: SGMII link is not ok
[  588.694909] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[  588.703985] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control rx/tx


ubuntu@ubuntu:~$ ping 192.168.3.1
PING 192.168.3.1 (192.168.3.1) 56(84) bytes of data.
64 bytes from 192.168.3.1: icmp_seq=1 ttl=64 time=0.502 ms
64 bytes from 192.168.3.1: icmp_seq=2 ttl=64 time=0.244 ms
64 bytes from 192.168.3.1: icmp_seq=3 ttl=64 time=0.220 ms
^C
--- 192.168.3.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2107ms
rtt min/avg/max/mdev = 0.220/0.322/0.502/0.127 ms

So I do get the "SGMII link is not ok" message, but my connection is fine.  I 
don't know why the link-up message is displayed twice.  It's only displayed 
once if I use the genphy driver instead of the at803x driver.


I'm going to debug the at803x to see what it does that causes the double 
link-up message.


--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()

2017-01-17 Thread Florian Fainelli

Add a helper function to lookup a device reference given a class name.
This is a preliminary patch to remove adhoc code from net/dsa/dsa.c and
make it more generic.

Signed-off-by: Florian Fainelli 
---
 drivers/base/core.c| 31 +++
 include/linux/device.h |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 8c25e68e67d7..fb9fced38634 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2058,6 +2058,37 @@ struct device *device_find_child(struct device *parent, void *data,
 }
 EXPORT_SYMBOL_GPL(device_find_child);
 
+static int device_class_name_match(struct device *dev, void *class)
+{
+   if (dev->class != NULL && !strcmp(dev->class->name, class))
+   return 1;
+
+   return 0;
+}
+
+/**
+ * device_find_in_class_name - device iterator for locating a particular device
+ * within the specified class name
+ * @parent: parent struct device
+ * @class_name: Class name to match against
+ *
+ * This function returns 1 if the device (specified by @parent), or one of its child
+ * is in the class whose name is specified by @class_name. Returns 0 otherwise.
+ *
+ * NOTE: you will need to drop the reference with put_device() after use.
+ */
+struct device *device_find_in_class_name(struct device *parent,
+char *class_name)
+{
+   if (device_class_name_match(parent, class_name)) {
+   get_device(parent);
+   return parent;
+   }
+
+   return device_find_child(parent, class_name, device_class_name_match);
+}
+EXPORT_SYMBOL_GPL(device_find_in_class_name);
+
 int __init devices_init(void)
 {
devices_kset = kset_create_and_add("devices", &device_uevent_ops, NULL);
diff --git a/include/linux/device.h b/include/linux/device.h
index 491b4c0ca633..fbc2a255f92e 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1120,6 +1120,8 @@ extern int device_for_each_child_reverse(struct device *dev, void *data,
 int (*fn)(struct device *dev, void *data));
 extern struct device *device_find_child(struct device *dev, void *data,
int (*match)(struct device *dev, void *data));
+extern struct device *device_find_in_class_name(struct device *parent,
+   char *class_name);
 extern int device_rename(struct device *dev, const char *new_name);
 extern int device_move(struct device *dev, struct device *new_parent,
   enum dpm_order dpm_order);
-- 
2.9.3

[PATCH net-next v4 08/10] net: dsa: Add support for platform data

2017-01-17 Thread Florian Fainelli

Allow drivers to use the new DSA API with platform data. Most of the
code in net/dsa/dsa2.c does not rely so much on device_nodes and can get
the same information from platform_data instead.

We purposely do not support distributed configurations with platform
data, so drivers should be providing a pointer to a 'struct
dsa_chip_data' structure if they wish to communicate per-port layout.

Multiple CPU ports could potentially be supported and dsa_chip_data is
extended to receive up to one reference to an upstream network device
per port described by a dsa_chip_data structure.
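
As an illustration of the name-based convention used for platform data, the port classification can be sketched in userspace as follows. The `mock_` names and the enum are hypothetical; only the "cpu" and "dsa" strings, the rule that an unnamed port counts as disabled, and the "every named port except the CPU port goes into the enabled mask" rule are taken from the diff in this patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a port from its platform-data name. */
enum mock_port_type {
	MOCK_PORT_UNUSED,	/* no name: port is disabled */
	MOCK_PORT_CPU,		/* name is "cpu" */
	MOCK_PORT_DSA,		/* name is "dsa": link to another switch */
	MOCK_PORT_USER,		/* any other name: user-facing port */
};

static enum mock_port_type mock_classify_port(const char *name)
{
	if (!name)
		return MOCK_PORT_UNUSED;
	if (!strcmp(name, "cpu"))
		return MOCK_PORT_CPU;
	if (!strcmp(name, "dsa"))
		return MOCK_PORT_DSA;
	return MOCK_PORT_USER;
}

/*
 * Build an enabled-port bitmask the way the platform-data parsing loop in
 * this patch does: skip unnamed ports, and set a bit for every named port
 * that is not the CPU port.
 */
static unsigned int mock_enabled_port_mask(const char *const names[], size_t n)
{
	unsigned int mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!names[i])
			continue;
		if (mock_classify_port(names[i]) != MOCK_PORT_CPU)
			mask |= 1U << i;
	}

	return mask;
}
```

For a hypothetical layout of { "lan1", "lan2", NULL, "dsa", NULL, "cpu" }, this sets bits 0, 1 and 3 and leaves the CPU port at index 5 out of the mask.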

Signed-off-by: Florian Fainelli 
---
 include/net/dsa.h |   6 
 net/dsa/dsa2.c| 102 --
 2 files changed, 90 insertions(+), 18 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 16a502a6c26a..491008792e4d 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -42,6 +42,11 @@ struct dsa_chip_data {
struct device   *host_dev;
int sw_addr;
 
+   /*
+* Reference to network devices
+*/
+   struct device   *netdev[DSA_MAX_PORTS];
+
/* set to size of eeprom if supported by the switch */
int eeprom_len;
 
@@ -140,6 +145,7 @@ struct dsa_switch_tree {
 };
 
 struct dsa_port {
+   const char  *name;
struct net_device   *netdev;
struct device_node  *dn;
unsigned intageing_time;
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index cd91070b5467..761e8724423f 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -79,19 +79,28 @@ static void dsa_dst_del_ds(struct dsa_switch_tree *dst,
kref_put(&dst->refcount, dsa_free_dst);
 }
 
+/* For platform data configurations, we need to have a valid name argument to
+ * differentiate a disabled port from an enabled one
+ */
 static bool dsa_port_is_valid(struct dsa_port *port)
 {
-   return !!port->dn;
+   return !!(port->dn || port->name);
 }
 
 static bool dsa_port_is_dsa(struct dsa_port *port)
 {
-   return !!of_parse_phandle(port->dn, "link", 0);
+   if (port->name && !strcmp(port->name, "dsa"))
+   return true;
+   else
+   return !!of_parse_phandle(port->dn, "link", 0);
 }
 
 static bool dsa_port_is_cpu(struct dsa_port *port)
 {
-   return !!of_parse_phandle(port->dn, "ethernet", 0);
+   if (port->name && !strcmp(port->name, "cpu"))
+   return true;
+   else
+   return !!of_parse_phandle(port->dn, "ethernet", 0);
 }
 
 static bool dsa_ds_find_port_dn(struct dsa_switch *ds,
@@ -251,10 +260,11 @@ static void dsa_cpu_port_unapply(struct dsa_port *port, u32 index,
 static int dsa_user_port_apply(struct dsa_port *port, u32 index,
   struct dsa_switch *ds)
 {
-   const char *name;
+   const char *name = port->name;
int err;
 
-   name = of_get_property(port->dn, "label", NULL);
+   if (port->dn)
+   name = of_get_property(port->dn, "label", NULL);
if (!name)
name = "eth%d";
 
@@ -439,11 +449,15 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index,
struct net_device *ethernet_dev;
struct device_node *ethernet;
 
-   ethernet = of_parse_phandle(port->dn, "ethernet", 0);
-   if (!ethernet)
-   return -EINVAL;
+   if (port->dn) {
+   ethernet = of_parse_phandle(port->dn, "ethernet", 0);
+   if (!ethernet)
+   return -EINVAL;
+   ethernet_dev = of_find_net_device_by_node(ethernet);
+   } else {
+   ethernet_dev = device_to_net_device(ds->cd->netdev[index]);
+   }
 
-   ethernet_dev = of_find_net_device_by_node(ethernet);
if (!ethernet_dev)
return -EPROBE_DEFER;
 
@@ -462,6 +476,7 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index,
dst->tag_ops = dsa_resolve_tag_protocol(tag_protocol);
if (IS_ERR(dst->tag_ops)) {
dev_warn(ds->dev, "No tagger for this switch\n");
+   dev_put(ethernet_dev);
return PTR_ERR(dst->tag_ops);
}
 
@@ -546,6 +561,33 @@ static int dsa_parse_ports_dn(struct device_node *ports, struct dsa_switch *ds)
return 0;
 }
 
+static int dsa_parse_ports(struct dsa_chip_data *cd, struct dsa_switch *ds)
+{
+   bool valid_name_found = false;
+   unsigned int i;
+
+   for (i = 0; i < DSA_MAX_PORTS; i++) {
+   if (!cd->port_names[i])
+   continue;
+
+   ds->ports[i].name = cd->port_names[i];
+
+   /* Initialize enabled_port_mask now for drv->setup()
+* to have access to a correct value, just like what
+* net/dsa/dsa.c::dsa_switch_setup_one does.
+*/
+   if (!dsa_port_is_cpu(&ds->ports[i]))
+   ds->enabled_port_mask |= 1 << i;
+
+

[PATCH net-next v4 09/10] net: phy: Allow pre-declaration of MDIO devices

2017-01-17 Thread Florian Fainelli

Allow board support code to collect pre-declarations for MDIO devices by
registering them with mdiobus_register_board_info(). SPI and I2C buses
have a similar feature; we were missing this for MDIO devices, but this
is particularly useful for e.g. MDIO-connected switches which need to
provide their port layout (often board-specific) to an MDIO Ethernet
switch driver.
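
The pre-declaration mechanism amounts to a small registry that board code fills early and that the bus consults when it registers. A minimal userspace sketch, assuming a fixed-size table instead of the kernel's list and mutex, could look like this. The field names follow the ones the patch dereferences (bus_id, modalias, mdio_addr, platform_data); the field types, the `mock_` names and the callback are assumptions for illustration only.

```c
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_BOARD_INFO 8

/* Hypothetical descriptor; field names mirror the ones used in the patch. */
struct mock_mdio_board_info {
	const char *bus_id;		/* which MDIO bus the device sits on */
	const char *modalias;		/* driver to bind */
	int mdio_addr;			/* address on that bus */
	const void *platform_data;	/* copied as a pointer, not deep-copied */
};

static struct mock_mdio_board_info mock_board_list[MOCK_MAX_BOARD_INFO];
static size_t mock_board_count;

/* Board code calls this early; descriptors are copied into the registry. */
static int mock_register_board_info(const struct mock_mdio_board_info *info,
				    size_t n)
{
	if (n > MOCK_MAX_BOARD_INFO - mock_board_count)
		return -1;

	memcpy(&mock_board_list[mock_board_count], info, n * sizeof(*info));
	mock_board_count += n;
	return 0;
}

/*
 * Called when a bus registers: visit every entry declared for that bus and
 * let the caller create a device for it. Returns the number of matches.
 */
static size_t mock_setup_from_board_info(const char *bus_id,
		void (*create)(const struct mock_mdio_board_info *bi))
{
	size_t i, matched = 0;

	for (i = 0; i < mock_board_count; i++) {
		if (strcmp(bus_id, mock_board_list[i].bus_id))
			continue;
		if (create)
			create(&mock_board_list[i]);
		matched++;
	}

	return matched;
}
```

Because only the descriptor is copied, whatever platform_data points at has to outlive the registration, which is the caveat the kerneldoc in this patch makes about __initdata.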

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/Makefile |  3 +-
 drivers/net/phy/mdio-boardinfo.c | 86 
 drivers/net/phy/mdio-boardinfo.h | 19 +
 drivers/net/phy/mdio_bus.c   |  4 ++
 drivers/net/phy/mdio_device.c| 11 +
 include/linux/mdio.h |  3 ++
 include/linux/mod_devicetable.h  |  1 +
 include/linux/phy.h  | 19 +
 8 files changed, 145 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/phy/mdio-boardinfo.c
 create mode 100644 drivers/net/phy/mdio-boardinfo.h

diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 356859ac7c18..407b0b601ea8 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -1,6 +1,7 @@
 # Makefile for Linux PHY drivers and MDIO bus drivers
 
-libphy-y   := phy.o phy_device.o mdio_bus.o mdio_device.o
+libphy-y   := phy.o phy_device.o mdio_bus.o mdio_device.o \
+  mdio-boardinfo.o
 libphy-$(CONFIG_SWPHY) += swphy.o
 libphy-$(CONFIG_LED_TRIGGER_PHY)   += phy_led_triggers.o
 
diff --git a/drivers/net/phy/mdio-boardinfo.c b/drivers/net/phy/mdio-boardinfo.c
new file mode 100644
index ..6b988f77da08
--- /dev/null
+++ b/drivers/net/phy/mdio-boardinfo.c
@@ -0,0 +1,86 @@
+/*
+ * mdio-boardinfo - Collect pre-declarations for MDIO devices
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "mdio-boardinfo.h"
+
+static LIST_HEAD(mdio_board_list);
+static DEFINE_MUTEX(mdio_board_lock);
+
+/**
+ * mdiobus_setup_mdiodev_from_board_info - create and setup MDIO devices
+ * from pre-collected board specific MDIO information
+ * @bus: MDIO bus pointer
+ * Context: can sleep
+ */
+void mdiobus_setup_mdiodev_from_board_info(struct mii_bus *bus)
+{
+   struct mdio_board_entry *be;
+   struct mdio_device *mdiodev;
+   struct mdio_board_info *bi;
+   int ret;
+
+   mutex_lock(&mdio_board_lock);
+   list_for_each_entry(be, &mdio_board_list, list) {
+   bi = &be->board_info;
+
+   if (strcmp(bus->id, bi->bus_id))
+   continue;
+
+   mdiodev = mdio_device_create(bus, bi->mdio_addr);
+   if (IS_ERR(mdiodev))
+   continue;
+
+   strncpy(mdiodev->modalias, bi->modalias,
+   sizeof(mdiodev->modalias));
+   mdiodev->bus_match = mdio_device_bus_match;
+   mdiodev->dev.platform_data = (void *)bi->platform_data;
+
+   ret = mdio_device_register(mdiodev);
+   if (ret) {
+   mdio_device_free(mdiodev);
+   continue;
+   }
+   }
+   mutex_unlock(&mdio_board_lock);
+}
+
+/**
+ * mdiobus_register_board_info - register MDIO devices for a given board
+ * @info: array of devices descriptors
+ * @n: number of descriptors provided
+ * Context: can sleep
+ *
+ * The board info passed can be marked with __initdata but pointers
+ * such as platform_data etc. are copied as-is
+ */
+int mdiobus_register_board_info(const struct mdio_board_info *info,
+   unsigned int n)
+{
+   struct mdio_board_entry *be;
+   unsigned int i;
+
+   be = kcalloc(n, sizeof(*be), GFP_KERNEL);
+   if (!be)
+   return -ENOMEM;
+
+   for (i = 0; i < n; i++, be++, info++) {
+   memcpy(&be->board_info, info, sizeof(*info));
+   mutex_lock(&mdio_board_lock);
+   list_add_tail(&be->list, &mdio_board_list);
+   mutex_unlock(&mdio_board_lock);
+   }
+
+   return 0;
+}
diff --git a/drivers/net/phy/mdio-boardinfo.h b/drivers/net/phy/mdio-boardinfo.h
new file mode 100644
index ..00f98163e90e
--- /dev/null
+++ b/drivers/net/phy/mdio-boardinfo.h
@@ -0,0 +1,19 @@
+/*
+ * mdio-boardinfo.h - board info interface internal to the mdio_bus
+ * component
+ */
+
+#ifndef __MDIO_BOARD_INFO_H
+#define __MDIO_BOARD_INFO_H
+
+#include 
+#include 
+
+struct mdio_board_entry {
+   struct list_headlist;
+   struct mdio_board_info  board_info;
+};
+
+void mdiobus_setup_mdiodev_from_board_info(struct mii_bus *bus);
+
+#endif /* __MDIO_BOARD_INFO_H */
diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net

[PATCH net-next v4 10/10] ARM: orion: Register DSA switch as a MDIO device

2017-01-17 Thread Florian Fainelli

Utilize the ability to pass board specific MDIO bus information towards a
particular MDIO device thus allowing us to provide the per-port switch layout
to the Marvell 88E6XXX switch driver.

Since we would end-up with conflicting registration paths, do not register the
"dsa" platform device anymore.

Note that the MDIO devices registered by code in net/dsa/dsa2.c do not
parse a dsa_platform_data, but directly take a dsa_chip_data (specific
to a single switch chip), so we update the different call sites to pass
this structure down to orion_ge00_switch_init().

Signed-off-by: Florian Fainelli 
---
 arch/arm/mach-orion5x/common.c   |  2 +-
 arch/arm/mach-orion5x/common.h   |  4 ++--
 arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c |  7 +--
 arch/arm/mach-orion5x/rd88f5181l-ge-setup.c  |  7 +--
 arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c |  7 +--
 arch/arm/mach-orion5x/wnr854t-setup.c|  2 +-
 arch/arm/mach-orion5x/wrt350n-v2-setup.c |  7 +--
 arch/arm/plat-orion/common.c | 25 +++--
 arch/arm/plat-orion/include/plat/common.h|  4 ++--
 9 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/arch/arm/mach-orion5x/common.c b/arch/arm/mach-orion5x/common.c
index 04910764c385..83a7ec4c16d0 100644
--- a/arch/arm/mach-orion5x/common.c
+++ b/arch/arm/mach-orion5x/common.c
@@ -105,7 +105,7 @@ void __init orion5x_eth_init(struct mv643xx_eth_platform_data *eth_data)
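 /*****************************************************************************
  * Ethernet switch
  ****************************************************************************/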
-void __init orion5x_eth_switch_init(struct dsa_platform_data *d)
+void __init orion5x_eth_switch_init(struct dsa_chip_data *d)
 {
orion_ge00_switch_init(d);
 }
diff --git a/arch/arm/mach-orion5x/common.h b/arch/arm/mach-orion5x/common.h
index 8a4115bd441d..efeffc6b4ebb 100644
--- a/arch/arm/mach-orion5x/common.h
+++ b/arch/arm/mach-orion5x/common.h
@@ -3,7 +3,7 @@
 
 #include 
 
-struct dsa_platform_data;
+struct dsa_chip_data;
 struct mv643xx_eth_platform_data;
 struct mv_sata_platform_data;
 
@@ -41,7 +41,7 @@ void orion5x_setup_wins(void);
 void orion5x_ehci0_init(void);
 void orion5x_ehci1_init(void);
 void orion5x_eth_init(struct mv643xx_eth_platform_data *eth_data);
-void orion5x_eth_switch_init(struct dsa_platform_data *d);
+void orion5x_eth_switch_init(struct dsa_chip_data *d);
 void orion5x_i2c_init(void);
 void orion5x_sata_init(struct mv_sata_platform_data *sata_data);
 void orion5x_spi_init(void);
diff --git a/arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c b/arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c
index dccadf68ea2b..a3c1336d30c9 100644
--- a/arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c
+++ b/arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c
@@ -101,11 +101,6 @@ static struct dsa_chip_data rd88f5181l_fxo_switch_chip_data = {
.port_names[7]  = "lan3",
 };
 
-static struct dsa_platform_data __initdata rd88f5181l_fxo_switch_plat_data = {
-   .nr_chips   = 1,
-   .chip   = &rd88f5181l_fxo_switch_chip_data,
-};
-
 static void __init rd88f5181l_fxo_init(void)
 {
/*
@@ -120,7 +115,7 @@ static void __init rd88f5181l_fxo_init(void)
 */
orion5x_ehci0_init();
orion5x_eth_init(&rd88f5181l_fxo_eth_data);
-   orion5x_eth_switch_init(&rd88f5181l_fxo_switch_plat_data);
+   orion5x_eth_switch_init(&rd88f5181l_fxo_switch_chip_data);
orion5x_uart0_init();
 
mvebu_mbus_add_window_by_id(ORION_MBUS_DEVBUS_BOOT_TARGET,
diff --git a/arch/arm/mach-orion5x/rd88f5181l-ge-setup.c b/arch/arm/mach-orion5x/rd88f5181l-ge-setup.c
index affe5ec825de..252efe29bd1a 100644
--- a/arch/arm/mach-orion5x/rd88f5181l-ge-setup.c
+++ b/arch/arm/mach-orion5x/rd88f5181l-ge-setup.c
@@ -102,11 +102,6 @@ static struct dsa_chip_data rd88f5181l_ge_switch_chip_data = {
.port_names[7]  = "lan3",
 };
 
-static struct dsa_platform_data __initdata rd88f5181l_ge_switch_plat_data = {
-   .nr_chips   = 1,
-   .chip   = &rd88f5181l_ge_switch_chip_data,
-};
-
 static struct i2c_board_info __initdata rd88f5181l_ge_i2c_rtc = {
I2C_BOARD_INFO("ds1338", 0x68),
 };
@@ -125,7 +120,7 @@ static void __init rd88f5181l_ge_init(void)
 */
orion5x_ehci0_init();
orion5x_eth_init(&rd88f5181l_ge_eth_data);
-   orion5x_eth_switch_init(&rd88f5181l_ge_switch_plat_data);
+   orion5x_eth_switch_init(&rd88f5181l_ge_switch_chip_data);
orion5x_i2c_init();
orion5x_uart0_init();
 
diff --git a/arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c b/arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c
index 67ee8571b03c..f4f1dbe1d91d 100644
--- a/arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c
+++ b/arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c
@@ -40,11 +40,6 @@ static struct dsa_chip_data rd88f6183ap_ge_switch_chip_data = {
.port_names[5]  = "cpu",
 };
 
-static struct dsa_platform_data __in

[PATCH net-next v4 00/10] net: dsa: Support for pdata in dsa2

2017-01-17 Thread Florian Fainelli

Hi all,

This is not exactly new, and was sent before, although back then, I did not
have a user of the pre-declared MDIO board information, but now we do. Note
that I have additional changes queued up to have b53 register platform data for
MIPS bcm47xx and bcm63xx.

Yes I know that we should have the Orion platforms eventually be converted to
Device Tree, but until that happens, I don't want any remaining users of the
old "dsa" platform device (hence the previous DTS submissions for ARM/mvebu)
and there will be platforms out there that will most likely never see DT
coming their way (BCM47xx is almost 100% sure, BCM63xx maybe not in a distant
future).

We would probably want the whole series to be merged via David Miller's tree
to simplify things.

Greg, can you Ack/Nack patch 5 since it touched the core LDD?

Vivien, since some patches did change, I did not carry your Tested-by tag
to all patches.

Thanks!

Changes in v4:

- Changed device_find_class() to device_find_in_class_name()
- Added kerneldoc above device_find_in_class_name() to explain what it does
  and the calling convention regarding device reference counts
- Changed dev_to_net_device() to device_to_net_device() and added comments
  about what it does and the caller conventions regarding reference counts

Changes in v3:

- Tested EPROBE_DEFER from a mockup MDIO/DSA switch driver and everything
  is fine, once the driver finally probes we have access to platform data
  as expected

- added comment above dsa_port_is_valid() that port->name is mandatory
  for platform data cases

- added an extra check in dsa_parse_member() for a NULL pdata pointer

- fixed a bunch of checkpatch errors and warnings

Changes in v2:

- Rebased against latest net-next/master

- Moved dev_find_class() to device_find_class() into drivers/base/core.c

- Moved dev_to_net_device into net/core/dev.c

- Utilize dsa_chip_data directly instead of dsa_platform_data

- Augmented dsa_chip_data to be multi-CPU port ready

Changes from last submission (few months back):

- rebased against latest net-next

- do not introduce dsa2_platform_data which was overkill and was meant to
  allow us to do exactly the same things with platform data and Device Tree;
  we use the existing dsa_platform_data instead

- properly register MDIO devices when the MDIO bus is registered and associate
  platform_data with them

- add a change to the Orion platform code to demonstrate how this can be used

Thank you

Florian Fainelli (10):
  net: dsa: Pass device pointer to dsa_register_switch
  net: dsa: Make most functions take a dsa_port argument
  net: dsa: Suffix function manipulating device_node with _dn
  net: dsa: Move ports assignment closer to error checking
  drivers: base: Add device_find_in_class_name()
  net: dsa: Migrate to device_find_in_class_name()
  net: Relocate dev_to_net_device() into net/core/dev.c
  net: dsa: Add support for platform data
  net: phy: Allow pre-declaration of MDIO devices
  ARM: orion: Register DSA switch as a MDIO device

 arch/arm/mach-orion5x/common.c   |   2 +-
 arch/arm/mach-orion5x/common.h   |   4 +-
 arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c |   7 +-
 arch/arm/mach-orion5x/rd88f5181l-ge-setup.c  |   7 +-
 arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c |   7 +-
 arch/arm/mach-orion5x/wnr854t-setup.c|   2 +-
 arch/arm/mach-orion5x/wrt350n-v2-setup.c |   7 +-
 arch/arm/plat-orion/common.c |  25 +++-
 arch/arm/plat-orion/include/plat/common.h|   4 +-
 drivers/base/core.c  |  31 +
 drivers/net/dsa/b53/b53_common.c |   2 +-
 drivers/net/dsa/mv88e6xxx/chip.c |  11 +-
 drivers/net/dsa/qca8k.c  |   2 +-
 drivers/net/phy/Makefile |   3 +-
 drivers/net/phy/mdio-boardinfo.c |  86 +
 drivers/net/phy/mdio-boardinfo.h |  19 +++
 drivers/net/phy/mdio_bus.c   |   4 +
 drivers/net/phy/mdio_device.c|  11 ++
 include/linux/device.h   |   2 +
 include/linux/mdio.h |   3 +
 include/linux/mod_devicetable.h  |   1 +
 include/linux/netdevice.h|   2 +
 include/linux/phy.h  |  19 +++
 include/net/dsa.h|   8 +-
 net/core/dev.c   |  30 +
 net/dsa/dsa.c|  55 ++---
 net/dsa/dsa2.c   | 175 +++
 net/dsa/dsa_priv.h   |   4 +-
 28 files changed, 391 insertions(+), 142 deletions(-)
 create mode 100644 drivers/net/phy/mdio-boardinfo.c
 create mode 100644 drivers/net/phy/mdio-boardinfo.h

-- 
2.9.3

[PATCH net-next v4 01/10] net: dsa: Pass device pointer to dsa_register_switch

2017-01-17 Thread Florian Fainelli

In preparation for allowing dsa_register_switch() to be supplied with
device/platform data, pass down a struct device pointer instead of a
struct device_node.
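
The motivation for the wider argument can be sketched in userspace as below: with a device pointer in hand, the core can look at either the Device Tree node or the platform data, whereas a bare device_node only covers the first case. All `mock_` types are hypothetical stand-ins, and preferring the Device Tree node over platform data when both are present is an assumption of this sketch, not something this patch states.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct device_node and struct device. */
struct mock_device_node {
	const char *full_name;
};

struct mock_device {
	struct mock_device_node *of_node;	/* set on Device Tree systems */
	void *platform_data;			/* set by board support code */
};

enum mock_config_source {
	MOCK_CONFIG_NONE,
	MOCK_CONFIG_OF,
	MOCK_CONFIG_PDATA,
};

/*
 * Decide where the switch description comes from. Passing the device makes
 * both sources reachable from a single argument.
 */
static enum mock_config_source mock_pick_config(const struct mock_device *dev)
{
	if (dev->of_node)
		return MOCK_CONFIG_OF;
	if (dev->platform_data)
		return MOCK_CONFIG_PDATA;
	return MOCK_CONFIG_NONE;
}
```

Drivers then only need to pass the device they already hold, as the b53, mv88e6xxx and qca8k hunks in this patch do.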

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c |  2 +-
 drivers/net/dsa/mv88e6xxx/chip.c | 11 ++-
 drivers/net/dsa/qca8k.c  |  2 +-
 include/net/dsa.h|  2 +-
 net/dsa/dsa2.c   |  7 ---
 5 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 5102a3701a1a..7179eed9ee6d 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1882,7 +1882,7 @@ int b53_switch_register(struct b53_device *dev)
 
pr_info("found switch: %s, rev %i\n", dev->name, dev->core_rev);
 
-   return dsa_register_switch(dev->ds, dev->ds->dev->of_node);
+   return dsa_register_switch(dev->ds, dev->ds->dev);
 }
 EXPORT_SYMBOL(b53_switch_register);
 
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 987b2dbbd35a..3238a4752b98 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4421,8 +4421,7 @@ static struct dsa_switch_driver mv88e6xxx_switch_drv = {
.ops= &mv88e6xxx_switch_ops,
 };
 
-static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip,
-struct device_node *np)
+static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip)
 {
struct device *dev = chip->dev;
struct dsa_switch *ds;
@@ -4437,7 +4436,7 @@ static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip,
 
dev_set_drvdata(dev, ds);
 
-   return dsa_register_switch(ds, np);
+   return dsa_register_switch(ds, dev);
 }
 
 static void mv88e6xxx_unregister_switch(struct mv88e6xxx_chip *chip)
@@ -4521,9 +4520,11 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (err)
goto out_g2_irq;
 
-   err = mv88e6xxx_register_switch(chip, np);
-   if (err)
+   err = mv88e6xxx_register_switch(chip);
+   if (err) {
+   mv88e6xxx_mdio_unregister(chip);
goto out_mdio;
+   }
 
return 0;
 
diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 54d270d59eb0..c084aa484d2b 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -964,7 +964,7 @@ qca8k_sw_probe(struct mdio_device *mdiodev)
mutex_init(&priv->reg_mutex);
dev_set_drvdata(&mdiodev->dev, priv);
 
-   return dsa_register_switch(priv->ds, priv->ds->dev->of_node);
+   return dsa_register_switch(priv->ds, &mdiodev->dev);
 }
 
 static void
diff --git a/include/net/dsa.h b/include/net/dsa.h
index b94d1f2ef912..16a502a6c26a 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -403,7 +403,7 @@ static inline bool dsa_uses_tagged_protocol(struct dsa_switch_tree *dst)
 }
 
 void dsa_unregister_switch(struct dsa_switch *ds);
-int dsa_register_switch(struct dsa_switch *ds, struct device_node *np);
+int dsa_register_switch(struct dsa_switch *ds, struct device *dev);
 #ifdef CONFIG_PM_SLEEP
 int dsa_switch_suspend(struct dsa_switch *ds);
 int dsa_switch_resume(struct dsa_switch *ds);
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 42a41d84053c..4170f7ea8e28 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -579,8 +579,9 @@ static struct device_node *dsa_get_ports(struct dsa_switch *ds,
return ports;
 }
 
-static int _dsa_register_switch(struct dsa_switch *ds, struct device_node *np)
+static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
+   struct device_node *np = dev->of_node;
struct device_node *ports = dsa_get_ports(ds, np);
struct dsa_switch_tree *dst;
u32 tree, index;
@@ -660,12 +661,12 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device_node *np)
return err;
 }
 
-int dsa_register_switch(struct dsa_switch *ds, struct device_node *np)
+int dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
int err;
 
mutex_lock(&dsa2_mutex);
-   err = _dsa_register_switch(ds, np);
+   err = _dsa_register_switch(ds, dev);
mutex_unlock(&dsa2_mutex);
 
return err;
-- 
2.9.3
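
The signature change in the patch above can be illustrated with a small userspace sketch. It is not kernel code: the struct definitions are hypothetical stand-ins, and the platform-data branch is an assumption about what later patches in the series might do with the extra information. The patch itself only derives `np` from `dev->of_node`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the real kernel definitions. */
struct device_node { const char *name; };

struct device {
	struct device_node *of_node;	/* set when described by device tree */
	void *platform_data;		/* set when described by board code */
};

enum cfg_source { CFG_NONE, CFG_OF, CFG_PDATA };

/*
 * Model of a registration function that receives a struct device.
 * Because the whole device is passed down, the callee can pick the
 * configuration source itself instead of being handed only an OF node.
 */
static enum cfg_source register_switch_model(const struct device *dev)
{
	if (dev->of_node)
		return CFG_OF;
	if (dev->platform_data)
		return CFG_PDATA;	/* assumed follow-up behaviour */
	return CFG_NONE;
}
```

With the old `struct device_node *` parameter, the second branch could not exist, because the caller had already discarded everything except the OF node.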

Re: 52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb fixing crashes? -> 4.4 stable?

2017-01-17 Thread Greg

On Tue, 2017-01-17 at 22:48 +0100, Nikola Ciprich wrote:
> Dear netdev developers,
> 
> I'd like to ask for a consultation regarding 4.4 kernel crashes.
> we're using intel X540-AT2 10g controllers (onboard ones, on supermicro
> boards) and we've noticed that when using openvswitch, the system very quickly
> crashes on the 4.4.x kernels we're using. 4.5 is fine though.
> 
> here's backtrace gathered from system pstore:

Adding the openvswitch maintainer, Pravin. Hopefully you'll get a
quicker response.

- Greg

> 
> <1>[ 1084.114586] BUG: unable to handle kernel paging request at 
> 8840c365b5c4
> <1>[ 1084.114918] IP: [] __netdev_pick_tx+0x92/0x140
> <4>[ 1084.115101] PGD 2018067 PUD 0
> <4>[ 1084.115270] Oops:  [#1] SMP
> <4>[ 1084.115439] Modules linked in: bonding(E) openvswitch(E) 
> nf_defrag_ipv6(E) nf_conntrack(E) crc32_pclmul(E) aesni_intel(E) lrw(E) 
> gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) kvm
> _intel(E) kvm(E) irqbypass(E) coretemp(E) crct10dif_pclmul(E) 
> intel_powerclamp(E) x86_pkg_temp_thermal(E) ses(E) enclosure(E) iTCO_wdt(E) 
> iTCO_vendor_support(E) mxm_wmi(E) i2c_i801(E) lpc_ic
> h(E) mei_me(E) mfd_core(E) i2c_core(E) sb_edac(E) sg(E) mei(E) pcspkr(E) 
> edac_core(E) ipmi_devintf(E) ioatdma(E) shpchp(E) wmi(E) ipmi_si(E) 
> ipmi_msghandler(E) 8250_fintek(E) acpi_power_mete
> r(E) acpi_pad(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) 
> sunrpc(E) ip_tables(E) ext4(E) jbd2(E) mbcache(E) raid1(E) sd_mod(E) ahci(E) 
> libahci(E) bnx2x(E) libcrc32c(E) ixgbe(E) cr
> c32c_intel(E) libata(E) mdio(E) ptp(E) dca(E) megaraid_sas(E) pps_core(E) 
> dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> <4>[ 1084.117683] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GE   
> 4.4.33lb7.01 #1
> <4>[ 1084.118012] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 
> 09/13/2016
> <4>[ 1084.118181] task: 819f14c0 ti: 819e task.ti: 
> 819e
> <4>[ 1084.118501] RIP: 0010:[]  [] 
> __netdev_pick_tx+0x92/0x140
> <4>[ 1084.118828] RSP: 0018:883f7f003638  EFLAGS: 00010a02
> <4>[ 1084.118994] RAX: aef55a76 RBX:  RCX: 
> 9d6e7dcd
> <4>[ 1084.119164] RDX: ba9f4f5f RSI: 883f63f14d00 RDI: 
> 883f7f0035ec
> <4>[ 1084.119333] RBP: 883f7f003668 R08: 0003 R09: 
> c8cfdbe1
> <4>[ 1084.119506] R10: 883f61206042 R11: 883f7f0035c0 R12: 
> 
> <4>[ 1084.119679] R13: 883f657b00c0 R14: 883f5d92 R15: 
> f012
> <4>[ 1084.119850] FS:  () GS:883f7f00() 
> knlGS:
> <4>[ 1084.120171] CS:  0010 DS:  ES:  CR0: 80050033
> <4>[ 1084.120338] CR2: 8840c365b5c4 CR3: 019ea000 CR4: 
> 003406f0
> <4>[ 1084.120509] DR0:  DR1:  DR2: 
> 
> <4>[ 1084.120678] DR3:  DR6: fffe0ff0 DR7: 
> 0400
> <4>[ 1084.120847] Stack:
> <4>[ 1084.121006]  883f63f14d00 883f63f14d00 000e 
> 
> <4>[ 1084.121339]  883f5d92 883f60a7f840 883f7f0036a0 
> a00fbed4
> <4>[ 1084.121672]  883f603612ac 883f5d92 883f63f14d00 
> 
> <4>[ 1084.122006] Call Trace:
> <4>[ 1084.122168]  
> <4>[ 1084.122193]  [] ixgbe_select_queue+0xc4/0x150 [ixgbe]
> <4>[ 1084.122519]  [] netdev_pick_tx+0x5e/0xf0
> <4>[ 1084.122687]  [] __dev_queue_xmit+0xa2/0x560
> <4>[ 1084.122856]  [] dev_queue_xmit+0x10/0x20
> <4>[ 1084.123034]  [] bond_dev_queue_xmit+0x32/0x80 
> [bonding]
> <4>[ 1084.123207]  [] bond_start_xmit+0x1a6/0x3f0 [bonding]
> <4>[ 1084.123382]  [] ? ep_poll_callback+0xb5/0x160
> <4>[ 1084.123551]  [] dev_hard_start_xmit+0x238/0x3f0
> <4>[ 1084.123721]  [] ? netif_skb_features+0xff/0x200
> <4>[ 1084.123890]  [] __dev_queue_xmit+0x442/0x560
> <4>[ 1084.124059]  [] dev_queue_xmit+0x10/0x20
> <4>[ 1084.124232]  [] ovs_vport_send+0x4a/0xc0 [openvswitch]
> <4>[ 1084.124404]  [] do_output.isra.30+0x43/0x160 
> [openvswitch]
> <4>[ 1084.124575]  [] ? __skb_clone+0x2e/0x140
> <4>[ 1084.124744]  [] do_execute_actions+0x684/0x7e0 
> [openvswitch]
> <4>[ 1084.125067]  [] ovs_execute_actions+0x32/0xd0 
> [openvswitch]
> <4>[ 1084.125240]  [] ovs_dp_process_packet+0x84/0x110 
> [openvswitch]
> <4>[ 1084.125565]  [] ovs_vport_receive+0x6c/0xd0 
> [openvswitch]
> <4>[ 1084.125740]  [] ? check_preempt_curr+0x75/0x90
> <4>[ 1084.125912]  [] ? ttwu_do_wakeup+0x19/0xe0
> <4>[ 1084.126081]  [] ? 
> ttwu_do_activate.constprop.95+0x5d/0x70
> <4>[ 1084.126252]  [] ? try_to_wake_up+0x47/0x340
> <4>[ 1084.126427]  [] ? default_wake_function+0x12/0x20
> <4>[ 1084.126600]  [] ? autoremove_wake_function+0x2b/0x40
> <4>[ 1084.126773]  [] netdev_frame_hook+0xe7/0x150 
> [openvswitch]
> <4>[ 1084.126945]  [] __netif_receive_skb_core+0x1e0/0x9e0
> <4>[ 1084.127115]  [] ? ipv6_gro_receive+0x246/0x360
> <4>[ 1084.127284]  [] __netif_receive_skb+0x18/0x60
> <4>[ 1084.127453]  [] netif_r

Re: [PATCH net] lwtunnel: fix autoload of lwt modules

2017-01-17 Thread David Ahern

On 1/17/17 1:54 PM, David Miller wrote:
> From: David Ahern 
> Date: Tue, 17 Jan 2017 13:46:22 -0700
> 
>> In short seems like removing the dev + the current patch dropping
>> the lock fixes the current deadlock problem and should be fine.
> 
> What about the state recorded by fib_get_nhs() and similar?  There is
> a mapping from ifindex to ->nh_dev which would be invalidated if the
> RTNL semaphore is dropped.

As far as I can see through the call to build_state all device indices came 
from the user and have not been validated yet (once the dev arg to build_state 
is removed; sent that patch for net-next). The device index validation happens 
later in fib_create_info with the call to fib_check_nh (or dev_get_by_index for 
host scope).

I sent an alternative approach that pulls the module loading into a separate 
function that is called while creating the fib_config. Performance heavy for 
multipath but solves the autoload without delving into the restart problem.
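
A rough userspace model of the ordering problem discussed above, offered only as a sketch. The flag `rtnl_held` stands in for the RTNL mutex and `encap_ops_loaded` for a registered encap ops entry; both names and the `-1` return values are invented for illustration. The point is that the module load has to happen at a stage where the lock can be released safely.

```c
#include <assert.h>
#include <stdbool.h>

static bool rtnl_held;		/* models the RTNL mutex being held */
static bool encap_ops_loaded;	/* models a registered encap ops entry */

/*
 * Models a module init routine that needs RTNL. Returns -1 in the
 * situation where the real kernel would block forever.
 */
static int load_encap_module(void)
{
	if (rtnl_held)
		return -1;
	encap_ops_loaded = true;
	return 0;
}

/*
 * Models an early validation step: it runs before any device or fib
 * references are taken, so releasing the lock around the load is safe.
 */
static int valid_encap_type_model(void)
{
	if (!encap_ops_loaded) {
		rtnl_held = false;
		load_encap_module();
		rtnl_held = true;
	}
	return encap_ops_loaded ? 0 : -1;
}
```

Calling `load_encap_module()` directly with the lock held fails in this model, which mirrors the hang; routing the load through the early validation step succeeds and leaves the lock held for the rest of the request.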

Re: [PATCH] net: ethernet: stmmac: add ARP management

2017-01-17 Thread Andy Shevchenko

On Tue, Jan 17, 2017 at 6:56 PM, Christophe Roullier
 wrote:

> +static int dwmac4_arp_enable(struct mac_device_info *hw)
> +{
> +   void __iomem *ioaddr = hw->pcsr;

__iomem *config = hw->pcsr + GMAC_CONFIG;

> +   u32 value = readl(ioaddr + GMAC_CONFIG);
> +
> +   value |= GMAC_CONFIG_ARPEN;
> +
> +   writel(value, ioaddr + GMAC_CONFIG);

u32 value;

value = readl();
writel(value | ...);

?

> +
> +   value = readl(ioaddr + GMAC_CONFIG);
> +
> +   return !!(value & GMAC_CONFIG_ARPEN);
> +}

> +/* Set ARP Address */
> +static void dwmac4_set_arp_addr(void __iomem *ioaddr, bool set, u32 addr)
> +{

__iomem *arp_addr = ioaddr + GMAC_ARP_ADDR;

> +   u32 value;
> +
> +   value = readl(ioaddr + GMAC_ARP_ADDR);

Care to explain why you need a dummy readl() here?

> +
> +   if (set) {
> +   /* set arp address */
> +   value = addr;
> +   } else {
> +   /* unset arp address */
> +   value = 0;
> +   }

value = set ? addr : 0;


> +
> +   writel(value, ioaddr + GMAC_ARP_ADDR);
> +   value = readl(ioaddr + GMAC_ARP_ADDR);
> +}


> +   if ((priv->plat->arp_en) && (priv->dma_cap.arpoffsel)) {
> +   ret = priv->hw->mac->arp_en(priv->hw);
> +   if (!ret) {

Hmm... Most would expect

if (ret) {
 doing something
} else {
 doing something else
}

> +   pr_warn(" ARP feature disabled\n");
> +   } else {

> +   pr_info(" ARP feature enabled\n");

Wouldn't it be too noisy?

pr_* -> dev_*

> +   /* Copy MAC addr into MAC_ARP_ADDRESS register*/
> +   priv->hw->dma->set_arp_addr(priv->ioaddr, 1,
> +   priv->dev->dev_addr);
> +   }
> +   }

-- 
With Best Regards,
Andy Shevchenko
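
The compaction suggested in the review above can be tried out in userspace. This is only a sketch: `reg_read()` and `reg_write()` are plain-memory stand-ins for `readl()` and `writel()`, and the `CFG_ARPEN` bit position is an arbitrary assumption, not the real GMAC_CONFIG layout.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_ARPEN (1u << 31)	/* hypothetical bit position */

/* Plain-memory stand-ins for readl()/writel(); not MMIO accessors. */
static uint32_t reg_read(const volatile uint32_t *reg)
{
	return *reg;
}

static void reg_write(uint32_t value, volatile uint32_t *reg)
{
	*reg = value;
}

/* Read-modify-write in one statement, then read back to confirm. */
static int arp_enable_model(volatile uint32_t *config)
{
	reg_write(reg_read(config) | CFG_ARPEN, config);
	return !!(reg_read(config) & CFG_ARPEN);
}

/* The if/else from the patch collapses into a conditional expression. */
static void set_arp_addr_model(volatile uint32_t *arp_addr, int set,
			       uint32_t addr)
{
	reg_write(set ? addr : 0, arp_addr);
}
```

In this model the leading read in the address setter is gone entirely, since its result was overwritten in both branches of the original code.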

RE: [PATCHv1 5/7] TAP: Extending tap device create/destroy APIs

2017-01-17 Thread Grandhi, Sainath



> -Original Message-
> From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> Sent: Friday, January 06, 2017 3:16 PM
> To: Grandhi, Sainath 
> Cc: netdev@vger.kernel.org; da...@davemloft.net;
> mah...@bandewar.net; linux-ker...@vger.kernel.org
> Subject: Re: [PATCHv1 5/7] TAP: Extending tap device create/destroy APIs
> 
> On Fri, 2017-01-06 at 14:33 -0800, Sainath Grandhi wrote:
> 
> > +static int tap_list_add(dev_t major, const char *device_name) {
> > +   int err = 0;
> > +   struct major_info *tap_major;
> > +
> > +   tap_major = kzalloc(sizeof(*tap_major), GFP_ATOMIC);
> > +
> > +   tap_major->major = MAJOR(major);
> > +
> 
> 
> kzalloc() can perfectly return NULL.
> 
> You do not want to crash if that happens.
> 
Thanks for pointing that out. I will send out the next version, which takes
care of the NULL pointer.
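
For reference, the missing check looks roughly like this in a userspace sketch. `calloc()` stands in for `kzalloc()`, the struct is a cut-down hypothetical version of `major_info`, and the allocator is passed in as a parameter only so that the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Cut-down, hypothetical stand-in for the patch's struct major_info. */
struct major_info_model {
	unsigned int major;
};

/*
 * Allocate and fill one entry. The allocator is injectable so a test
 * can force the failure path; calloc() plays the role of kzalloc().
 */
static int tap_list_add_model(struct major_info_model **out,
			      unsigned int major,
			      void *(*alloc)(size_t, size_t))
{
	struct major_info_model *m = alloc(1, sizeof(*m));

	if (!m)
		return -ENOMEM;	/* bail out instead of dereferencing NULL */

	m->major = major;
	*out = m;
	return 0;
}

/* Allocator that always fails, for exercising the error path. */
static void *failing_alloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}
```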

[PATCH net v2] lwtunnel: fix autoload of lwt modules

2017-01-17 Thread David Ahern

Trying to add an mpls encap route when the MPLS modules are not loaded
hangs. For example:

CONFIG_MPLS=y
CONFIG_NET_MPLS_GSO=m
CONFIG_MPLS_ROUTING=m
CONFIG_MPLS_IPTUNNEL=m

$ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

The ip command hangs:
root       880   826  0 21:25 pts/0    00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

$ cat /proc/880/stack
[] call_usermodehelper_exec+0xd6/0x134
[] __request_module+0x27b/0x30a
[] lwtunnel_build_state+0xe4/0x178
[] fib_create_info+0x47f/0xdd4
[] fib_table_insert+0x90/0x41f
[] inet_rtm_newroute+0x4b/0x52
...

modprobe is trying to load rtnl-lwt-MPLS:

root       881     5  0 21:25 ?        00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS

and it hangs after loading mpls_router:

$ cat /proc/881/stack
[] rtnl_lock+0x12/0x14
[] register_netdevice_notifier+0x16/0x179
[] mpls_init+0x25/0x1000 [mpls_router]
[] do_one_initcall+0x8e/0x13f
[] do_init_module+0x5a/0x1e5
[] load_module+0x13bd/0x17d6
...

The problem is that lwtunnel_build_state is called with the rtnl lock
held, preventing mpls_init from registering.

Given the potential references held by the time lwtunnel_build_state is
called, it cannot drop the rtnl lock to load the module. So, extract the module
loading code from lwtunnel_build_state into a new function to validate
the encap type. The new function is called while converting the user
request into a fib_config which is well before any table, device or
fib entries are examined.

Fixes: 745041e2aaf1 ("lwtunnel: autoload of lwt modules")
Signed-off-by: David Ahern 
---
v2
- extract the module load attempt into a separate function that is
  called early in the newroute code paths

 include/net/lwtunnel.h  | 11 +
 net/core/lwtunnel.c | 62 -
 net/ipv4/fib_frontend.c |  8 +++
 net/ipv6/route.c| 12 +-
 4 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index d4c1c75b8862..0b585f1fd340 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -105,6 +105,8 @@ int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
   unsigned int num);
 int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
   unsigned int num);
+int lwtunnel_valid_encap_type(u16 encap_type);
+int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len);
 int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
 struct nlattr *encap,
 unsigned int family, const void *cfg,
@@ -168,6 +170,15 @@ static inline int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
return -EOPNOTSUPP;
 }
 
+static inline int lwtunnel_valid_encap_type(u16 encap_type)
+{
+   return -EOPNOTSUPP;
+}
+static inline int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len)
+{
+   return -EOPNOTSUPP;
+}
+
 static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
   struct nlattr *encap,
   unsigned int family, const void *cfg,
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index a5d4e866ce88..47b1dd65947b 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_MODULES
 
@@ -114,25 +115,74 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
ret = -EOPNOTSUPP;
rcu_read_lock();
ops = rcu_dereference(lwtun_encaps[encap_type]);
+   if (likely(ops && ops->build_state))
+   ret = ops->build_state(dev, encap, family, cfg, lws);
+   rcu_read_unlock();
+
+   return ret;
+}
+EXPORT_SYMBOL(lwtunnel_build_state);
+
+int lwtunnel_valid_encap_type(u16 encap_type)
+{
+   const struct lwtunnel_encap_ops *ops;
+   int ret = -EINVAL;
+
+   if (encap_type == LWTUNNEL_ENCAP_NONE ||
+   encap_type > LWTUNNEL_ENCAP_MAX)
+   return ret;
+
+   rcu_read_lock();
+   ops = rcu_dereference(lwtun_encaps[encap_type]);
+   rcu_read_unlock();
 #ifdef CONFIG_MODULES
if (!ops) {
const char *encap_type_str = lwtunnel_encap_str(encap_type);
 
if (encap_type_str) {
-   rcu_read_unlock();
+   __rtnl_unlock();
request_module("rtnl-lwt-%s", encap_type_str);
+   rtnl_lock();
+
rcu_read_lock();
ops = rcu_dereference(lwtun_encaps[encap_type]);
+   rcu_read_unlock();
}
}
 #endif
-   if (likely(ops && ops->build_state))
-   ret = ops->build_state(dev, encap, family, cfg, lws);
-   rcu_read_unlock();
+   return ops ? 0 : -EOPNOTSUPP;
+}
+E

[net PATCH v5 5/6] virtio_net: refactor freeze/restore logic into virtnet reset logic

2017-01-17 Thread John Fastabend

For XDP we will need to reset the queues to allow for buffer headroom
to be configured. In order to do this we need to essentially run the
freeze()/restore() code path. Unfortunately, the locking requirements
of the freeze/restore and reset paths differ, so we cannot simply
reuse the code.

This patch refactors the code path and adds a reset helper routine.

Signed-off-by: John Fastabend 
---
 drivers/net/virtio_net.c |   75 --
 drivers/virtio/virtio.c  |   42 ++
 include/linux/virtio.h   |4 ++
 3 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 922ca66..62dbf4b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1684,6 +1684,49 @@ static void virtnet_init_settings(struct net_device *dev)
.set_settings = virtnet_set_settings,
 };
 
+static void virtnet_freeze_down(struct virtio_device *vdev)
+{
+   struct virtnet_info *vi = vdev->priv;
+   int i;
+
+   /* Make sure no work handler is accessing the device */
+   flush_work(&vi->config_work);
+
+   netif_device_detach(vi->dev);
+   cancel_delayed_work_sync(&vi->refill);
+
+   if (netif_running(vi->dev)) {
+   for (i = 0; i < vi->max_queue_pairs; i++)
+   napi_disable(&vi->rq[i].napi);
+   }
+}
+
+static int init_vqs(struct virtnet_info *vi);
+
+static int virtnet_restore_up(struct virtio_device *vdev)
+{
+   struct virtnet_info *vi = vdev->priv;
+   int err, i;
+
+   err = init_vqs(vi);
+   if (err)
+   return err;
+
+   virtio_device_ready(vdev);
+
+   if (netif_running(vi->dev)) {
+   for (i = 0; i < vi->curr_queue_pairs; i++)
+   if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
+   schedule_delayed_work(&vi->refill, 0);
+
+   for (i = 0; i < vi->max_queue_pairs; i++)
+   virtnet_napi_enable(&vi->rq[i]);
+   }
+
+   netif_device_attach(vi->dev);
+   return err;
+}
+
 static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
@@ -2374,21 +2417,9 @@ static void virtnet_remove(struct virtio_device *vdev)
 static int virtnet_freeze(struct virtio_device *vdev)
 {
struct virtnet_info *vi = vdev->priv;
-   int i;
 
virtnet_cpu_notif_remove(vi);
-
-   /* Make sure no work handler is accessing the device */
-   flush_work(&vi->config_work);
-
-   netif_device_detach(vi->dev);
-   cancel_delayed_work_sync(&vi->refill);
-
-   if (netif_running(vi->dev)) {
-   for (i = 0; i < vi->max_queue_pairs; i++)
-   napi_disable(&vi->rq[i].napi);
-   }
-
+   virtnet_freeze_down(vdev);
remove_vq_common(vi);
 
return 0;
@@ -2397,25 +2428,11 @@ static int virtnet_freeze(struct virtio_device *vdev)
 static int virtnet_restore(struct virtio_device *vdev)
 {
struct virtnet_info *vi = vdev->priv;
-   int err, i;
+   int err;
 
-   err = init_vqs(vi);
+   err = virtnet_restore_up(vdev);
if (err)
return err;
-
-   virtio_device_ready(vdev);
-
-   if (netif_running(vi->dev)) {
-   for (i = 0; i < vi->curr_queue_pairs; i++)
-   if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-   schedule_delayed_work(&vi->refill, 0);
-
-   for (i = 0; i < vi->max_queue_pairs; i++)
-   virtnet_napi_enable(&vi->rq[i]);
-   }
-
-   netif_device_attach(vi->dev);
-
virtnet_set_queues(vi, vi->curr_queue_pairs);
 
err = virtnet_cpu_notif_add(vi);
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 7062bb0..400d70b 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -100,11 +100,6 @@ static int virtio_uevent(struct device *_dv, struct kobj_uevent_env *env)
  dev->id.device, dev->id.vendor);
 }
 
-static void add_status(struct virtio_device *dev, unsigned status)
-{
-   dev->config->set_status(dev, dev->config->get_status(dev) | status);
-}
-
 void virtio_check_driver_offered_feature(const struct virtio_device *vdev,
 unsigned int fbit)
 {
@@ -145,14 +140,15 @@ void virtio_config_changed(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_config_changed);
 
-static void virtio_config_disable(struct virtio_device *dev)
+void virtio_config_disable(struct virtio_device *dev)
 {
spin_lock_irq(&dev->config_lock);
dev->config_enabled = false;
spin_unlock_irq(&dev->config_lock);
 }
+EXPORT_SYMBOL_GPL(virtio_config_disable);
 
-static void virtio_config_enable(struct virtio_device *dev)
+void virtio_config_enable(struct virtio_device *dev)
 {

[PATCH] Revert "net: qcom/emac: configure the external phy to allow pause frames"

2017-01-17 Thread Timur Tabi

This reverts commit 3e884493448131179a5b7cae1ddca1028ffaecc8.

With commit 529ed1275263 ("net: phy: phy drivers should not set
SUPPORTED_[Asym_]Pause"), phylib now handles automatically enabling
pause frame support in the PHY, and the MAC driver should follow suit.

Since the EMAC driver does this, we no longer need to force
pause frame support.

Signed-off-by: Timur Tabi 
---
 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 0b4deb3..384e1be 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -1004,12 +1004,6 @@ int emac_mac_up(struct emac_adapter *adpt)
writel((u32)~DIS_INT, adpt->base + EMAC_INT_STATUS);
writel(adpt->irq.mask, adpt->base + EMAC_INT_MASK);
 
-   /* Enable pause frames.  Without this feature, the EMAC has been shown
-* to receive (and drop) frames with FCS errors at gigabit connections.
-*/
-   adpt->phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
-   adpt->phydev->advertising |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
-
adpt->phydev->irq = PHY_IGNORE_INTERRUPT;
phy_start(adpt->phydev);
 
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

[net PATCH v5 6/6] virtio_net: XDP support for adjust_head

2017-01-17 Thread John Fastabend

Add support for XDP adjust head by allocating a 256B header region
that XDP programs can grow into. This is only enabled when a XDP
program is loaded.

To ensure that we do not have to unwind queue headroom, push queue
setup below bpf_prog_add. It reads better to do a prog ref unwind
than another queue setup call.

At the moment this code must do a full reset to ensure that old
buffers without headroom on program add, or with headroom on program
removal, are not used incorrectly in the datapath. Ideally we would
only have to disable/enable the RX queues being updated, but there is
no API to do this at the moment in virtio, so use the big hammer. In
practice it is likely not that big of a problem, as this will only
happen when XDP is enabled/disabled; changing programs does not
require the reset. There is some risk that the driver may either
have an allocation failure or for some reason fail to correctly
negotiate with the underlying backend; in this case the driver will
be left uninitialized. I have not seen this ever happen on my test
systems and, for what it's worth, this same failure case can occur
from probe and other contexts in the virtio framework.

Signed-off-by: John Fastabend 
---
 drivers/net/virtio_net.c |  149 +++---
 1 file changed, 125 insertions(+), 24 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 62dbf4b..3b129b4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -41,6 +41,9 @@
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN  128
 
+/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
+#define VIRTIO_XDP_HEADROOM 256
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -359,6 +362,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
}
 
if (vi->mergeable_rx_bufs) {
+   xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
/* Zero header and leave csum up to XDP layers */
hdr = xdp->data;
memset(hdr, 0, vi->hdr_len);
@@ -375,7 +379,9 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
num_sg = 2;
sg_init_table(sq->sg, 2);
sg_set_buf(sq->sg, hdr, vi->hdr_len);
-   skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
+   skb_to_sgvec(skb, sq->sg + 1,
+xdp->data - xdp->data_hard_start,
+xdp->data_end - xdp->data);
}
err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
   data, GFP_ATOMIC);
@@ -401,7 +407,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
struct bpf_prog *xdp_prog;
 
len -= vi->hdr_len;
-   skb_trim(skb, len);
 
rcu_read_lock();
xdp_prog = rcu_dereference(rq->xdp_prog);
@@ -413,11 +418,15 @@ static struct sk_buff *receive_small(struct net_device *dev,
if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
goto err_xdp;
 
-   xdp.data = skb->data;
+   xdp.data_hard_start = skb->data;
+   xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
xdp.data_end = xdp.data + len;
act = bpf_prog_run_xdp(xdp_prog, &xdp);
switch (act) {
case XDP_PASS:
+   /* Recalculate length in case bpf program changed it */
+   __skb_pull(skb, xdp.data - xdp.data_hard_start);
+   len = xdp.data_end - xdp.data;
break;
case XDP_TX:
virtnet_xdp_xmit(vi, rq, &xdp, skb);
@@ -432,6 +441,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
}
rcu_read_unlock();
 
+   skb_trim(skb, len);
return skb;
 
 err_xdp:
@@ -480,7 +490,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
   unsigned int *len)
 {
struct page *page = alloc_page(GFP_ATOMIC);
-   unsigned int page_off = 0;
+   unsigned int page_off = VIRTIO_XDP_HEADROOM;
 
if (!page)
return NULL;
@@ -516,7 +526,8 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
put_page(p);
}
 
-   *len = page_off;
+   /* Headroom does not contribute to packet length */
+   *len = page_off - VIRTIO_XDP_HEADROOM;
return page;
 err_buf:
__free_pages(page, 0);
@@ -555,7 +566,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
  page, offset, &len);
if (!xdp_page)
goto err_
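
The headroom bookkeeping in the (truncated) patch above can be modelled in userspace. This is a sketch under assumptions: `struct xdp_buff_model` mirrors only the three pointers the patch touches, and `xdp_adjust_head_model()` is a simplified stand-in for what an adjust-head helper does, not the kernel helper itself.

```c
#include <assert.h>
#include <stddef.h>

#define XDP_HEADROOM_MODEL 256	/* same value as VIRTIO_XDP_HEADROOM */

/* Only the three pointers the patch manipulates; not the kernel struct. */
struct xdp_buff_model {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
};

/* Packet data starts after the reserved headroom. */
static void xdp_init_model(struct xdp_buff_model *x, unsigned char *buf,
			   size_t len)
{
	x->data_hard_start = buf;
	x->data = buf + XDP_HEADROOM_MODEL;
	x->data_end = x->data + len;
}

/* A negative delta grows the packet into the headroom. */
static int xdp_adjust_head_model(struct xdp_buff_model *x, int delta)
{
	ptrdiff_t off = (x->data - x->data_hard_start) + delta;

	if (off < 0 || off >= x->data_end - x->data_hard_start)
		return -1;	/* would leave the buffer */
	x->data = x->data_hard_start + off;
	return 0;
}

/* Length must be recomputed after the program runs, as in XDP_PASS. */
static size_t xdp_len_model(const struct xdp_buff_model *x)
{
	return (size_t)(x->data_end - x->data);
}
```

This is why the patch recalculates `len` from `xdp.data_end - xdp.data` on XDP_PASS and subtracts the headroom from `page_off` when linearizing: the headroom is part of the buffer but never part of the packet length.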

[net PATCH v5 4/6] virtio_net: remove duplicate queue pair binding in XDP

2017-01-17 Thread John Fastabend

Factor out qp assignment.

Signed-off-by: John Fastabend 
---
 drivers/net/virtio_net.c |   18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6de0cbe..922ca66 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -332,15 +332,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
 static void virtnet_xdp_xmit(struct virtnet_info *vi,
 struct receive_queue *rq,
-struct send_queue *sq,
 struct xdp_buff *xdp,
 void *data)
 {
struct virtio_net_hdr_mrg_rxbuf *hdr;
unsigned int num_sg, len;
+   struct send_queue *sq;
+   unsigned int qp;
void *xdp_sent;
int err;
 
+   qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+   sq = &vi->sq[qp];
+
/* Free up any pending old buffers before queueing new ones. */
while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
if (vi->mergeable_rx_bufs) {
@@ -404,7 +408,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
if (xdp_prog) {
struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
struct xdp_buff xdp;
-   unsigned int qp;
u32 act;
 
if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
@@ -417,10 +420,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
case XDP_PASS:
break;
case XDP_TX:
-   qp = vi->curr_queue_pairs -
-   vi->xdp_queue_pairs +
-   smp_processor_id();
-   virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb);
+   virtnet_xdp_xmit(vi, rq, &xdp, skb);
rcu_read_unlock();
goto xdp_xmit;
default:
@@ -545,7 +545,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
if (xdp_prog) {
struct page *xdp_page;
struct xdp_buff xdp;
-   unsigned int qp;
void *data;
u32 act;
 
@@ -586,10 +585,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
}
break;
case XDP_TX:
-   qp = vi->curr_queue_pairs -
-   vi->xdp_queue_pairs +
-   smp_processor_id();
-   virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
+   virtnet_xdp_xmit(vi, rq, &xdp, data);
ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
if (unlikely(xdp_page != page))
goto err_xdp;
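
The queue selection that the patch above moves into virtnet_xdp_xmit() is plain index arithmetic. A minimal sketch, assuming that the XDP transmit queues occupy the last `xdp_queue_pairs` slots of the active range and that one is reserved per CPU; the function name is invented for illustration.

```c
#include <assert.h>

/*
 * Index of the per-CPU XDP transmit queue: skip the regular queue
 * pairs, then offset by the current CPU id. Mirrors the expression
 *   curr_queue_pairs - xdp_queue_pairs + smp_processor_id()
 * with the CPU id passed in explicitly.
 */
static unsigned int xdp_tx_queue_model(unsigned int curr_queue_pairs,
				       unsigned int xdp_queue_pairs,
				       unsigned int cpu)
{
	return curr_queue_pairs - xdp_queue_pairs + cpu;
}
```

For example, with 8 active queue pairs of which 4 are XDP queues, CPU 2 transmits on queue 6. Computing this in one place removes the duplicated expression from both receive paths.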

[net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support

2017-01-17 Thread John Fastabend

This has a fix to handle small buffer free logic correctly and then
also adds adjust head support.

I pushed adjust head at net (even though it's rc3) to avoid having
to push another exception case into virtio_net to catch if the
program uses adjust_head and then block it. If there are any strong
objections to this we can push it at net-next and use a patch from
Jakub to add the exception handling but then user space has to deal
with it either via try/fail logic or via kernel version checks. Granted
we already have some cases that need to be configured to enable XDP
but I don't see any reason to have yet another one when we can fix it
now vs delaying a kernel version.


v2: fix spelling error, convert unsigned -> unsigned int
v3: v2 git crashed during send so retrying sorry for the noise
v4: changed layout of rtnl_lock fixes (Stephen)
moved reset logic into virtio core with new patch (MST)
fixed up linearize and some code cleanup (Jason)

Otherwise did some generic code cleanup, so it might be a bit
cleaner this time; at least that is the hope.
v5: fixed rtnl_lock issue (DaveM)

In order to fix rtnl_lock issue and also to address Jason's
comment questioning the need for a generic virtio_device_reset
routine I exported some virtio core routines and then wrote
virtio_net reset routine. This is the cleanest solution I
came up with today and I do not at this time have any need
for a more generic reset. If folks don't like this I could
revert back to v3 variant but Stephen pointed out that the
pattern used there is also not ideal.

Thanks for the review.

---

John Fastabend (6):
  virtio_net: use dev_kfree_skb for small buffer XDP receive
  virtio_net: wrap rtnl_lock in test for calling with lock already held
  virtio_net: factor out xdp handler for readability
  virtio_net: remove duplicate queue pair binding in XDP
  virtio_net: refactor freeze/restore logic into virtnet reset logic
  virtio_net: XDP support for adjust_head


 drivers/net/virtio_net.c |  332 ++
 drivers/virtio/virtio.c  |   42 +++---
 include/linux/virtio.h   |4 +
 3 files changed, 247 insertions(+), 131 deletions(-)

--
Signature

[net PATCH v5 3/6] virtio_net: factor out xdp handler for readability

2017-01-17 Thread John Fastabend

At this point the do_xdp_prog is mostly if/else branches handling
the different modes of virtio_net. So remove it and handle running
the program in the per mode handlers.

Signed-off-by: John Fastabend 
---
 drivers/net/virtio_net.c |   75 +-
 1 file changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ba0efee..6de0cbe 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -388,49 +388,6 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
virtqueue_kick(sq->vq);
 }
 
-static u32 do_xdp_prog(struct virtnet_info *vi,
-  struct receive_queue *rq,
-  struct bpf_prog *xdp_prog,
-  void *data, int len)
-{
-   int hdr_padded_len;
-   struct xdp_buff xdp;
-   void *buf;
-   unsigned int qp;
-   u32 act;
-
-   if (vi->mergeable_rx_bufs) {
-   hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
-   xdp.data = data + hdr_padded_len;
-   xdp.data_end = xdp.data + (len - vi->hdr_len);
-   buf = data;
-   } else { /* small buffers */
-   struct sk_buff *skb = data;
-
-   xdp.data = skb->data;
-   xdp.data_end = xdp.data + len;
-   buf = skb->data;
-   }
-
-   act = bpf_prog_run_xdp(xdp_prog, &xdp);
-   switch (act) {
-   case XDP_PASS:
-   return XDP_PASS;
-   case XDP_TX:
-   qp = vi->curr_queue_pairs -
-   vi->xdp_queue_pairs +
-   smp_processor_id();
-   xdp.data = buf;
-   virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
-   return XDP_TX;
-   default:
-   bpf_warn_invalid_xdp_action(act);
-   case XDP_ABORTED:
-   case XDP_DROP:
-   return XDP_DROP;
-   }
-}
-
 static struct sk_buff *receive_small(struct net_device *dev,
 struct virtnet_info *vi,
 struct receive_queue *rq,
@@ -446,19 +403,30 @@ static struct sk_buff *receive_small(struct net_device *dev,
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+   struct xdp_buff xdp;
+   unsigned int qp;
u32 act;
 
if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
goto err_xdp;
-   act = do_xdp_prog(vi, rq, xdp_prog, skb, len);
+
+   xdp.data = skb->data;
+   xdp.data_end = xdp.data + len;
+   act = bpf_prog_run_xdp(xdp_prog, &xdp);
switch (act) {
case XDP_PASS:
break;
case XDP_TX:
+   qp = vi->curr_queue_pairs -
+   vi->xdp_queue_pairs +
+   smp_processor_id();
+   virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb);
rcu_read_unlock();
goto xdp_xmit;
-   case XDP_DROP:
default:
+   bpf_warn_invalid_xdp_action(act);
+   case XDP_ABORTED:
+   case XDP_DROP:
goto err_xdp;
}
}
@@ -576,6 +544,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
xdp_prog = rcu_dereference(rq->xdp_prog);
if (xdp_prog) {
struct page *xdp_page;
+   struct xdp_buff xdp;
+   unsigned int qp;
+   void *data;
u32 act;
 
/* This happens when rx buffer size is underestimated */
@@ -598,8 +569,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
if (unlikely(hdr->hdr.gso_type))
goto err_xdp;
 
-   act = do_xdp_prog(vi, rq, xdp_prog,
- page_address(xdp_page) + offset, len);
+   data = page_address(xdp_page) + offset;
+   xdp.data = data + vi->hdr_len;
+   xdp.data_end = xdp.data + (len - vi->hdr_len);
+   act = bpf_prog_run_xdp(xdp_prog, &xdp);
switch (act) {
case XDP_PASS:
/* We can only create skb based on xdp_page. */
@@ -613,13 +586,19 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
}
break;
case XDP_TX:
+   qp = vi->curr_queue_pairs -
+   vi->xdp_queue_pairs +
+   smp_processor_id();
+   virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);

[net PATCH v5 2/6] virtio_net: wrap rtnl_lock in test for calling with lock already held

2017-01-17 Thread John Fastabend

For XDP use case and to allow ethtool reset tests it is useful to be
able to use reset paths from contexts where rtnl lock is already
held.

This requires updating virtnet_set_queues and free_receive_bufs, the
two places where rtnl_lock is taken in virtio_net. To do this we
use the following pattern,

_foo(...) { do stuff }
foo(...) { rtnl_lock(); _foo(...); rtnl_unlock(); }

this allows us to use freeze()/restore() flow from both contexts.

Signed-off-by: John Fastabend 
---
 drivers/net/virtio_net.c |   31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d97bb71..ba0efee 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1331,7 +1331,7 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
rtnl_unlock();
 }
 
-static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
+static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 {
struct scatterlist sg;
struct net_device *dev = vi->dev;
@@ -1357,6 +1357,16 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
return 0;
 }
 
+static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
+{
+   int err;
+
+   rtnl_lock();
+   err = _virtnet_set_queues(vi, queue_pairs);
+   rtnl_unlock();
+   return err;
+}
+
 static int virtnet_close(struct net_device *dev)
 {
struct virtnet_info *vi = netdev_priv(dev);
@@ -1609,7 +1619,7 @@ static int virtnet_set_channels(struct net_device *dev,
return -EINVAL;
 
get_online_cpus();
-   err = virtnet_set_queues(vi, queue_pairs);
+   err = _virtnet_set_queues(vi, queue_pairs);
if (!err) {
netif_set_real_num_tx_queues(dev, queue_pairs);
netif_set_real_num_rx_queues(dev, queue_pairs);
@@ -1736,7 +1746,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
return -ENOMEM;
}
 
-   err = virtnet_set_queues(vi, curr_qp + xdp_qp);
+   err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
if (err) {
dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
return err;
@@ -1745,7 +1755,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
if (prog) {
prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
if (IS_ERR(prog)) {
-   virtnet_set_queues(vi, curr_qp);
+   _virtnet_set_queues(vi, curr_qp);
return PTR_ERR(prog);
}
}
@@ -1864,12 +1874,11 @@ static void virtnet_free_queues(struct virtnet_info *vi)
kfree(vi->sq);
 }
 
-static void free_receive_bufs(struct virtnet_info *vi)
+static void _free_receive_bufs(struct virtnet_info *vi)
 {
struct bpf_prog *old_prog;
int i;
 
-   rtnl_lock();
for (i = 0; i < vi->max_queue_pairs; i++) {
while (vi->rq[i].pages)
__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
@@ -1879,6 +1888,12 @@ static void free_receive_bufs(struct virtnet_info *vi)
if (old_prog)
bpf_prog_put(old_prog);
}
+}
+
+static void free_receive_bufs(struct virtnet_info *vi)
+{
+   rtnl_lock();
+   _free_receive_bufs(vi);
rtnl_unlock();
 }
 
@@ -2317,9 +2332,7 @@ static int virtnet_probe(struct virtio_device *vdev)
goto free_unregister_netdev;
}
 
-   rtnl_lock();
virtnet_set_queues(vi, vi->curr_queue_pairs);
-   rtnl_unlock();
 
/* Assume link up if device can't report link status,
   otherwise get link status from config. */
@@ -2428,9 +2441,7 @@ static int virtnet_restore(struct virtio_device *vdev)
 
netif_device_attach(vi->dev);
 
-   rtnl_lock();
virtnet_set_queues(vi, vi->curr_queue_pairs);
-   rtnl_unlock();
 
err = virtnet_cpu_notif_add(vi);
if (err)
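
The locked/unlocked split described in the commit message of this patch can be sketched in user space as follows. This is only an illustration under stated assumptions: a pthread mutex stands in for the RTNL lock, and `struct demo_dev`, `demo_set_queues_locked()`, `demo_set_queues()` and `demo_reset()` are hypothetical names that do not exist in virtio_net. The `_locked` variant plays the role of `_foo()` in the commit message.

```c
#include <assert.h>
#include <pthread.h>

/* Stand-in for the RTNL mutex; this is not the kernel's rtnl_lock(). */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical device state, for illustration only. */
struct demo_dev {
	int max_queue_pairs;
	int curr_queue_pairs;
};

/* Worker: the caller must already hold cfg_lock. */
static int demo_set_queues_locked(struct demo_dev *dev, int queue_pairs)
{
	if (queue_pairs < 1 || queue_pairs > dev->max_queue_pairs)
		return -1;
	dev->curr_queue_pairs = queue_pairs;
	return 0;
}

/* Wrapper: for callers that do not hold cfg_lock. */
static int demo_set_queues(struct demo_dev *dev, int queue_pairs)
{
	int err;

	pthread_mutex_lock(&cfg_lock);
	err = demo_set_queues_locked(dev, queue_pairs);
	pthread_mutex_unlock(&cfg_lock);
	return err;
}

/* A path that takes the lock itself, so it must call the worker directly. */
static int demo_reset(struct demo_dev *dev)
{
	int err;

	pthread_mutex_lock(&cfg_lock);
	err = demo_set_queues_locked(dev, 1);
	pthread_mutex_unlock(&cfg_lock);
	return err;
}
```

Calling `demo_set_queues()` from inside `demo_reset()` would self-deadlock on a non-recursive mutex, which is the situation the split is meant to avoid.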

[net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive

2017-01-17 Thread John Fastabend

In the small buffer case during driver unload we currently use
put_page instead of dev_kfree_skb. Resolve this by adding a check
for virtnet mode when checking XDP queue type. Also rename the
function so that the code reads correctly to match the additional
check.

Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
Signed-off-by: John Fastabend 
Acked-by: Jason Wang 
---
 drivers/net/virtio_net.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4a10500..d97bb71 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1890,8 +1890,12 @@ static void free_receive_page_frags(struct virtnet_info *vi)
put_page(vi->rq[i].alloc_frag.page);
 }
 
-static bool is_xdp_queue(struct virtnet_info *vi, int q)
+static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q)
 {
+   /* For small receive mode always use kfree_skb variants */
+   if (!vi->mergeable_rx_bufs)
+   return false;
+
if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
return false;
else if (q < vi->curr_queue_pairs)
@@ -1908,7 +1912,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
for (i = 0; i < vi->max_queue_pairs; i++) {
struct virtqueue *vq = vi->sq[i].vq;
while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-   if (!is_xdp_queue(vi, i))
+   if (!is_xdp_raw_buffer_queue(vi, i))
dev_kfree_skb(buf);
else
put_page(virt_to_head_page(buf));
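
The queue classification this patch changes can be modelled standalone as below. It is a sketch, not the driver code: `struct demo_vi` is a hypothetical subset of the driver state, and because the tail of the function is outside the diff context shown above, the final range check is an assumption about how the remaining branches behave.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the driver state: only the fields the check reads. */
struct demo_vi {
	bool mergeable_rx_bufs;
	int curr_queue_pairs;
	int xdp_queue_pairs;
};

/* Returns true when unused buffers on queue q are raw pages (put_page),
 * false when they are skbs (dev_kfree_skb). */
static bool demo_is_xdp_raw_buffer_queue(const struct demo_vi *vi, int q)
{
	/* Small receive mode: buffers are always skbs. */
	if (!vi->mergeable_rx_bufs)
		return false;

	/* Queues below the XDP range carry skbs. */
	if (q < vi->curr_queue_pairs - vi->xdp_queue_pairs)
		return false;

	/* Assumption: queues inside the XDP range carry raw buffers, and
	 * anything at or beyond curr_queue_pairs does not. */
	return q < vi->curr_queue_pairs;
}
```

With six queue pairs of which the last two are reserved for XDP, queues 4 and 5 are reported as raw-buffer queues in mergeable mode and none are in small-buffer mode.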

Getting a handle on all these new NIC features

2017-01-17 Thread Tom Herbert

There was some discussion about the problems of dealing with the
explosion of NIC features in the mlx directory restructuring proposal,
but I think there is a deeper issue here that should be discussed.

It's hard not to notice that there has been quite a proliferation of
NIC features in several drivers. This trend has resulted in very
complex driver code that may or may not segment individual features.
One visible manifestation of this is the number of ndo functions, which
is somewhere around seventy-five now.

I suspect the vast majority of these advanced NIC features (e.g.
bridging, UDP offloads, tc offload, etc.) are only relevant to some of
the people some of the time. The problem we have, in this case those
of us that are attempting to deploy and maintain NICs at scale, is
when we have to deal with the ramifications of these features being
intertwined with core driver functionality that is relevant to
everyone. This becomes very obvious when we need to backport drivers
from later versions of the kernel.

I realize that backporting a driver is not a specific concern of the
Linux kernel, but nevertheless this is a real problem and a fact of
life for many users. Rebasing the full kernel is still a major effort
and it seems the best we could ever do is one rebase per year. In the
interim we need to occasionally backport drivers. Backporting drivers
is difficult precisely because of new features or API changes to
existing ones. These sorts of changes tend to have a spiderweb of
dependencies in other parts of the stack, so the number of patches
we need to cherry-pick goes way beyond those that touch the driver we
are interested in.

Currently we (FB) need to backport two NIC drivers. I've already given
details of backporting mlx5 on the thread to restructure the driver
directories. The other driver being backported seems to suffer from
the same type of feature complexity.

In short, I would like to ask driver maintainers to start modularizing
driver features. If something being added is obviously a narrow feature
that only a subset of users will need, can we allow config options to
#ifdef those out somehow? Furthermore, can the file and directory
structure of drivers reflect that? Our lives would be _so_ much simpler
maintaining drivers in production if we had such modularity and the
ability to build drivers with the features of our choosing.
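
One way the request above could look in code is sketched below. Everything here is hypothetical: `DEMO_NIC_TC_OFFLOAD` is not a real Kconfig symbol, and `struct demo_nic` and the `demo_nic_*` functions are invented for illustration. The point is only that the core path calls one function and the feature body is swapped for a stub when the option is off, so no #ifdef is needed at the call site.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical driver state. */
struct demo_nic {
	int tc_rules;
};

/* Hypothetical config symbol; build with -DDEMO_NIC_TC_OFFLOAD to enable. */
#ifdef DEMO_NIC_TC_OFFLOAD
/* Feature body. In a real driver this would sit in its own file that is
 * only built when the option is selected. */
static int demo_nic_tc_add_rule(struct demo_nic *nic)
{
	nic->tc_rules++;
	return 0;
}
#else
/* Stub used when the feature is compiled out. */
static inline int demo_nic_tc_add_rule(struct demo_nic *nic)
{
	(void)nic;
	return -EOPNOTSUPP;
}
#endif

/* Core driver path: identical whether or not the feature is built. */
static int demo_nic_setup_tc(struct demo_nic *nic)
{
	return demo_nic_tc_add_rule(nic);
}
```

A backport that does not need the feature would then leave the option off and skip the feature file together with its dependencies.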

Thanks,
Tom

52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb fixing crashes? -> 4.4 stable?

2017-01-17 Thread Nikola Ciprich

Dear netdev developers,

I'd like to ask for a consultation regarding 4.4 kernel crashes.
We're using Intel X540-AT2 10G controllers (onboard ones, on Supermicro
boards) and we've noticed that, when using openvswitch, the system crashes
very quickly on the 4.4.x kernels we're using. 4.5 is fine, though.

Here's the backtrace gathered from the system pstore:

<1>[ 1084.114586] BUG: unable to handle kernel paging request at 
8840c365b5c4
<1>[ 1084.114918] IP: [] __netdev_pick_tx+0x92/0x140
<4>[ 1084.115101] PGD 2018067 PUD 0
<4>[ 1084.115270] Oops:  [#1] SMP
<4>[ 1084.115439] Modules linked in: bonding(E) openvswitch(E) 
nf_defrag_ipv6(E) nf_conntrack(E) crc32_pclmul(E) aesni_intel(E) lrw(E) 
gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) kvm_intel(E) kvm(E) 
irqbypass(E) coretemp(E) crct10dif_pclmul(E) intel_powerclamp(E) 
x86_pkg_temp_thermal(E) ses(E) enclosure(E) iTCO_wdt(E) 
iTCO_vendor_support(E) mxm_wmi(E) i2c_i801(E) lpc_ich(E) mei_me(E) 
mfd_core(E) i2c_core(E) sb_edac(E) sg(E) mei(E) pcspkr(E) edac_core(E) 
ipmi_devintf(E) ioatdma(E) shpchp(E) wmi(E) ipmi_si(E) ipmi_msghandler(E) 
8250_fintek(E) acpi_power_meter(E) acpi_pad(E) nfsd(E) auth_rpcgss(E) 
nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) jbd2(E) 
mbcache(E) raid1(E) sd_mod(E) ahci(E) libahci(E) bnx2x(E) libcrc32c(E) 
ixgbe(E) crc32c_intel(E) libata(E) mdio(E) ptp(E) dca(E) megaraid_sas(E) 
pps_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
<4>[ 1084.117683] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GE   
4.4.33lb7.01 #1
<4>[ 1084.118012] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 09/13/2016
<4>[ 1084.118181] task: 819f14c0 ti: 819e task.ti: 
819e
<4>[ 1084.118501] RIP: 0010:[]  [] 
__netdev_pick_tx+0x92/0x140
<4>[ 1084.118828] RSP: 0018:883f7f003638  EFLAGS: 00010a02
<4>[ 1084.118994] RAX: aef55a76 RBX:  RCX: 
9d6e7dcd
<4>[ 1084.119164] RDX: ba9f4f5f RSI: 883f63f14d00 RDI: 
883f7f0035ec
<4>[ 1084.119333] RBP: 883f7f003668 R08: 0003 R09: 
c8cfdbe1
<4>[ 1084.119506] R10: 883f61206042 R11: 883f7f0035c0 R12: 

<4>[ 1084.119679] R13: 883f657b00c0 R14: 883f5d92 R15: 
f012
<4>[ 1084.119850] FS:  () GS:883f7f00() 
knlGS:
<4>[ 1084.120171] CS:  0010 DS:  ES:  CR0: 80050033
<4>[ 1084.120338] CR2: 8840c365b5c4 CR3: 019ea000 CR4: 
003406f0
<4>[ 1084.120509] DR0:  DR1:  DR2: 

<4>[ 1084.120678] DR3:  DR6: fffe0ff0 DR7: 
0400
<4>[ 1084.120847] Stack:
<4>[ 1084.121006]  883f63f14d00 883f63f14d00 000e 

<4>[ 1084.121339]  883f5d92 883f60a7f840 883f7f0036a0 
a00fbed4
<4>[ 1084.121672]  883f603612ac 883f5d92 883f63f14d00 

<4>[ 1084.122006] Call Trace:
<4>[ 1084.122168]  
<4>[ 1084.122193]  [] ixgbe_select_queue+0xc4/0x150 [ixgbe]
<4>[ 1084.122519]  [] netdev_pick_tx+0x5e/0xf0
<4>[ 1084.122687]  [] __dev_queue_xmit+0xa2/0x560
<4>[ 1084.122856]  [] dev_queue_xmit+0x10/0x20
<4>[ 1084.123034]  [] bond_dev_queue_xmit+0x32/0x80 [bonding]
<4>[ 1084.123207]  [] bond_start_xmit+0x1a6/0x3f0 [bonding]
<4>[ 1084.123382]  [] ? ep_poll_callback+0xb5/0x160
<4>[ 1084.123551]  [] dev_hard_start_xmit+0x238/0x3f0
<4>[ 1084.123721]  [] ? netif_skb_features+0xff/0x200
<4>[ 1084.123890]  [] __dev_queue_xmit+0x442/0x560
<4>[ 1084.124059]  [] dev_queue_xmit+0x10/0x20
<4>[ 1084.124232]  [] ovs_vport_send+0x4a/0xc0 [openvswitch]
<4>[ 1084.124404]  [] do_output.isra.30+0x43/0x160 
[openvswitch]
<4>[ 1084.124575]  [] ? __skb_clone+0x2e/0x140
<4>[ 1084.124744]  [] do_execute_actions+0x684/0x7e0 
[openvswitch]
<4>[ 1084.125067]  [] ovs_execute_actions+0x32/0xd0 
[openvswitch]
<4>[ 1084.125240]  [] ovs_dp_process_packet+0x84/0x110 
[openvswitch]
<4>[ 1084.125565]  [] ovs_vport_receive+0x6c/0xd0 
[openvswitch]
<4>[ 1084.125740]  [] ? check_preempt_curr+0x75/0x90
<4>[ 1084.125912]  [] ? ttwu_do_wakeup+0x19/0xe0
<4>[ 1084.126081]  [] ? 
ttwu_do_activate.constprop.95+0x5d/0x70
<4>[ 1084.126252]  [] ? try_to_wake_up+0x47/0x340
<4>[ 1084.126427]  [] ? default_wake_function+0x12/0x20
<4>[ 1084.126600]  [] ? autoremove_wake_function+0x2b/0x40
<4>[ 1084.126773]  [] netdev_frame_hook+0xe7/0x150 
[openvswitch]
<4>[ 1084.126945]  [] __netif_receive_skb_core+0x1e0/0x9e0
<4>[ 1084.127115]  [] ? ipv6_gro_receive+0x246/0x360
<4>[ 1084.127284]  [] __netif_receive_skb+0x18/0x60
<4>[ 1084.127453]  [] netif_receive_skb_internal+0x40/0xb0
<4>[ 1084.127623]  [] napi_gro_receive+0xc3/0x110
<4>[ 1084.127813]  [] bnx2x_rx_int+0x101c/0x19d0 [bnx2x]
<4>[ 1084.127984]  [] ? load_balance+0x163/0x8d0
<4>[ 1084.128166]  [] bnx2x_poll+0x284/0x340 [bnx2x]
<4>[ 1084.128334]  [] net_rx_action+0x16b/0x370
<4>[ 1084.128503]  [] __do_softirq+0xe2/0x2e0
<4>[ 1084.128671]  [] ir

Re: fs, net: deadlock between bind/splice on af_unix

2017-01-17 Thread Cong Wang

On Mon, Jan 16, 2017 at 1:32 AM, Dmitry Vyukov  wrote:
> On Fri, Dec 9, 2016 at 7:41 AM, Al Viro  wrote:
>> On Thu, Dec 08, 2016 at 10:32:00PM -0800, Cong Wang wrote:
>>
>>> > Why do we do autobind there, anyway, and why is it conditional on
>>> > SOCK_PASSCRED?  Note that e.g. for SOCK_STREAM we can bloody well get
>>> > to sending stuff without autobind ever done - just use socketpair()
>>> > to create that sucker and we won't be going through the connect()
>>> > at all.
>>>
>>> In the case Dmitry reported, unix_dgram_sendmsg() calls unix_autobind(),
>>> not SOCK_STREAM.
>>
>> Yes, I've noticed.  What I'm asking is what in there needs autobind triggered
>> on sendmsg and why doesn't the same need affect the SOCK_STREAM case?
>>
>>> I guess some lock, perhaps the u->bindlock could be dropped before
>>> acquiring the next one (sb_writer), but I need to double check.
>>
>> Bad idea, IMO - do you *want* autobind being able to come through while
>> bind(2) is busy with mknod?
>
>
> Ping. This is still happening on HEAD.
>

Thanks for the reminder. Would you mind giving the attached patch
(compile-tested only) a try? I took another approach to fix this
deadlock, which moves unix_mknod() out of u->bindlock. I am not sure
whether this approach has any unexpected impact.

Thanks.
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 127656e..5d4b4d1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -995,6 +995,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
unsigned int hash;
struct unix_address *addr;
struct hlist_head *list;
+   struct path path;
 
err = -EINVAL;
if (sunaddr->sun_family != AF_UNIX)
@@ -1010,9 +1011,20 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
goto out;
addr_len = err;
 
+   if (sun_path[0]) {
+   umode_t mode = S_IFSOCK |
+  (SOCK_INODE(sock)->i_mode & ~current_umask());
+   err = unix_mknod(sun_path, mode, &path);
+   if (err) {
+   if (err == -EEXIST)
+   err = -EADDRINUSE;
+   goto out;
+   }
+   }
+
err = mutex_lock_interruptible(&u->bindlock);
if (err)
-   goto out;
+   goto out_put;
 
err = -EINVAL;
if (u->addr)
@@ -1029,16 +1041,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
atomic_set(&addr->refcnt, 1);
 
if (sun_path[0]) {
-   struct path path;
-   umode_t mode = S_IFSOCK |
-  (SOCK_INODE(sock)->i_mode & ~current_umask());
-   err = unix_mknod(sun_path, mode, &path);
-   if (err) {
-   if (err == -EEXIST)
-   err = -EADDRINUSE;
-   unix_release_addr(addr);
-   goto out_up;
-   }
addr->hash = UNIX_HASH_SIZE;
hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
spin_lock(&unix_table_lock);
@@ -1065,6 +1067,9 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
spin_unlock(&unix_table_lock);
 out_up:
mutex_unlock(&u->bindlock);
+out_put:
+   if (err)
+   path_put(&path);
 out:
return err;
 }

Re: [PATCH net] lwtunnel: fix autoload of lwt modules

2017-01-17 Thread David Miller

From: David Ahern 
Date: Tue, 17 Jan 2017 13:46:22 -0700

> In short seems like removing the dev + the current patch dropping
> the lock fixes the current deadlock problem and should be fine.

What about the state recorded by fib_get_nhs() and similar?  There is
a mapping from ifindex to ->nh_dev which would be invalidated if the
RTNL semaphore is dropped.

It won't get updated by device events, which is what normally happens,
because the fib_info is not in any of the fib_trie tables yet.

So I think you still have a huge problem without doing proper restarts.

Re: [PATCH net-next] tcp: accept RST for rcv_nxt - 1 after receiving a FIN

2017-01-17 Thread David Miller

From: Jason Baron 
Date: Tue, 17 Jan 2017 13:37:19 -0500

> From: Jason Baron 
> 
> Using a Mac OSX box as a client connecting to a Linux server, we have found
> that when certain applications (such as 'ab'), are abruptly terminated
> (via ^C), a FIN is sent followed by a RST packet on tcp connections. The
> FIN is accepted by the Linux stack but the RST is sent with the same
> sequence number as the FIN, and Linux responds with a challenge ACK per
> RFC 5961. The OSX client then sometimes (they are rate-limited) does not
> reply with any RST as would be expected on a closed socket.
> 
> This results in sockets accumulating on the Linux server left mostly in
> the CLOSE_WAIT state, although LAST_ACK and CLOSING are also possible.
> This sequence of events can tie up a lot of resources on the Linux server
> since there may be a lot of data in write buffers at the time of the RST.
> Accepting a RST equal to rcv_nxt - 1, after we have already successfully
> processed a FIN, has made a significant difference for us in practice, by
> freeing up unneeded resources in a more expedient fashion.
> 
> A packetdrill test demonstrating the behavior:
 ...
> Signed-off-by: Jason Baron 

Applied, thanks Jason.
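
The acceptance rule described in the quoted commit message can be sketched as a standalone predicate. This is a simplified model, not the patch itself: the `demo_` names and the reduced state enum are hypothetical, and a RST that fails the check is simply reported as "not accepted" here rather than modelling the challenge ACK path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced set of TCP states for illustration. */
enum demo_tcp_state {
	DEMO_ESTABLISHED,
	DEMO_CLOSE_WAIT,
	DEMO_LAST_ACK,
	DEMO_CLOSING,
};

/* True for the states named in the commit message, i.e. those reached
 * after the peer's FIN has been processed. */
static bool demo_fin_received(enum demo_tcp_state st)
{
	return st == DEMO_CLOSE_WAIT || st == DEMO_LAST_ACK ||
	       st == DEMO_CLOSING;
}

static bool demo_rst_acceptable(enum demo_tcp_state st,
				uint32_t rcv_nxt, uint32_t rst_seq)
{
	/* A RST whose sequence number matches rcv_nxt exactly is accepted. */
	if (rst_seq == rcv_nxt)
		return true;

	/* After a FIN, also accept a RST that reuses the FIN's sequence
	 * number, which is rcv_nxt - 1 (modulo 2^32). */
	return demo_fin_received(st) && rst_seq == (uint32_t)(rcv_nxt - 1);
}
```

The cast keeps the comparison correct when rcv_nxt is 0 and the FIN's sequence number was 0xffffffff.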

Re: Potential issues (security and otherwise) with the current cgroup-bpf API

2017-01-17 Thread Andy Lutomirski

On Tue, Jan 17, 2017 at 5:58 AM, Michal Hocko  wrote:
> On Tue 17-01-17 14:32:04, Peter Zijlstra wrote:
>> On Tue, Jan 17, 2017 at 02:03:03PM +0100, Michal Hocko wrote:
>> > On Sun 15-01-17 20:19:01, Tejun Heo wrote:
>> > [...]
>> > > So, what's proposed is a proper part of bpf.  In terms of
>> > > implementation, cgroup helps by hosting the pointers but that doesn't
>> > > necessarily affect the conceptual structure of it.  Given that, I
>> > > don't think it'd be a good idea to add anything to cgroup interface
>> > > for this feature.  Introspection is great to have but this should be
>> > > introspectable together with other bpf programs using the same
>> > > mechanism.  That's where it belongs.
>> >
>> > If BPF only piggy backs on top of cgroup to iterate tasks shouldn't we
>> > at least enforce that the cgroup has to be a leaf one and no further
>> > children groups can be created once there is BPF program attached?
>>
>> Why (again) this stupid constraint?
>>
>> If you want to use cgroups for tagging (like perf does), _any_ parent
>> cgroup will also tag you.
>>
>> So creating child cgroups, and placing tasks in it, should not be a
>> problem, the BPF thing should apply to all of them.
>
> This would require using hierarchical cgroup iterators to iterate over
> tasks. As per Andy's testing this doesn't seem to be the case. I haven't
> checked the implementation closely but my understanding was that using
> only cgroup specific tasks was intentional.

The current semantics are AFAIK that only the innermost cgroup that
has a hook installed is in effect.  I think this is the wrong design.

I think that the right semantics are probably to support both
innermost-to-outermost and outermost-to-innermost and to select which
is appropriate for each hook.  Suppose we have a cgroup /a/b where a
and b both have hooks installed.  If the hook is a socket creation or
egress hook, I think that b's hook should run first.  If b's hook
rejects, then a's hook is not run.  If b's hook accepts, then a's hook
is run.  This way a gets the last word on any changes to the socket
settings and a sees exactly what would happen if it were to accept.

Conversely, for ingress hooks, I think that a's hook should run first.
This way a sees the packet as it originally came in and can modify or
reject it, and then b only sees whatever a chooses to let through.

The guiding principle here is that, for actions that originate outside
the machine, the outer hooks should IMO run first and, for actions
that originate from a task in a cgroup, the innermost hooks should run
first.
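
To make the proposed ordering concrete, here is a small user-space model. It is only a sketch of the semantics suggested above, not of the current cgroup-bpf implementation: `struct demo_cgroup`, the hook signature, the trace buffer and the depth limit are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_DEPTH 16

/* Hypothetical cgroup node with an optional hook. */
struct demo_cgroup {
	const char *name;
	struct demo_cgroup *parent;	/* NULL for the root */
	bool (*hook)(const struct demo_cgroup *cg);	/* true means accept */
};

/* Records the first letter of each cgroup whose hook ran, in order. */
static char demo_trace[DEMO_MAX_DEPTH + 1];
static size_t demo_trace_len;

static void demo_record(const struct demo_cgroup *cg)
{
	if (demo_trace_len < DEMO_MAX_DEPTH) {
		demo_trace[demo_trace_len++] = cg->name[0];
		demo_trace[demo_trace_len] = '\0';
	}
}

static bool demo_accept(const struct demo_cgroup *cg)
{
	demo_record(cg);
	return true;
}

static bool demo_reject(const struct demo_cgroup *cg)
{
	demo_record(cg);
	return false;
}

/* Egress or socket creation: innermost hook first, stop at first reject. */
static bool demo_run_egress(const struct demo_cgroup *leaf)
{
	const struct demo_cgroup *cg;

	for (cg = leaf; cg; cg = cg->parent)
		if (cg->hook && !cg->hook(cg))
			return false;
	return true;
}

/* Ingress: outermost hook first, stop at first reject. */
static bool demo_run_ingress(const struct demo_cgroup *leaf)
{
	const struct demo_cgroup *chain[DEMO_MAX_DEPTH];
	const struct demo_cgroup *cg;
	size_t depth = 0;

	for (cg = leaf; cg; cg = cg->parent) {
		if (depth == DEMO_MAX_DEPTH)
			return false;	/* deeper than this sketch supports */
		chain[depth++] = cg;
	}
	while (depth > 0) {
		cg = chain[--depth];
		if (cg->hook && !cg->hook(cg))
			return false;
	}
	return true;
}
```

For /a/b with accepting hooks on both, the egress walk runs b then a, and the ingress walk runs a then b. If b rejects on egress, a's hook never runs.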

--Andy

Re: [PATCH] net: ethoc: Make needlessly global struct ethtool_ops static

2017-01-17 Thread David Miller

From: Tobias Klauser 
Date: Tue, 17 Jan 2017 15:01:08 +0100

> Make the needlessly global struct ethtool_ops ethoc_ethtool_ops static
> to fix a sparse warning.
> 
> Signed-off-by: Tobias Klauser 

Applied, thanks.
