Re: [PATCH net-next v3 06/10] net: dsa: Migrate to device_find_class()
On Mon, Jan 16, 2017 at 12:01:02PM -0800, Florian Fainelli wrote:
> On 01/15/2017 11:16 AM, Andrew Lunn wrote:
> >>> What exactly is the relationship between these devices (a ascii-art tree
> >>> or sysfs tree output might be nice) so I can try to understand what is
> >>> going on here.
> >
> > Hi Greg, Florian
> >
> > A few diagrams and trees which might help understand what is going on.
> >
> > The first diagram comes from the 2008 patch which added all this code:
> >
> >     +-----------+       +-----------+
> >     |           | RGMII |           |
> >     |           +-------+           +------ 1000baseT MDI ("WAN")
> >     |           |       |  6-port   +------ 1000baseT MDI ("LAN1")
> >     |    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
> >     |           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
> >     |           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
> >     |           |       |           |
> >     +-----------+       +-----------+
> >
> > We have an ethernet switch and a host CPU. The switch is connected to
> > the CPU in two different ways. RGMII allows us to get Ethernet frames
> > from the CPU into the switch. MIImgmt, is the management bus normally
> > used for Ethernet PHYs, but Marvell switches also use it for Managing
> > switches.
> >
> > The diagram above is the simplest setup. You can have multiple
> > Ethernet switches, connected together via switch ports. Each switch
> > has its own MIImgmt connect to the CPU, but there is only one RGMII
> > link.
> >
> > When this code was designed back in 2008, it was decided to represent
> > this is a platform device, and it has a platform_data, which i have
> > slightly edited to keep it simple:
> >
> > struct dsa_platform_data {
> >	/*
> >	 * Reference to a Linux network interface that connects
> >	 * to the root switch chip of the tree.
> >	 */
> >	struct device *netdev;
> >
> >	/*
> >	 * Info structs describing each of the switch chips
> >	 * connected via this network interface.
> >	 */
> >	int nr_chips;
> >	struct dsa_chip_data *chip;
> > };
> >
> > This netdev is the CPU side of the RGMII interface.
> >
> > Each switch has a dsa_chip_data, again edited:
> >
> > struct dsa_chip_data {
> >	/*
> >	 * How to access the switch configuration registers.
> >	 */
> >	struct device *host_dev;
> >	int sw_addr;
> >	...
> > }
> >
> > The host_dev is the CPU side of the MIImgmt, and we have the address
> > the switch is using on the bus.
> >
> > During probe of this platform device, we need to get from the
> > struct device *netdev to a struct net_device *dev.
> >
> > So the code looks in the device net class to find the device
> >
> > |   |   |   |-- f1074000.ethernet
> > |   |   |   |   |-- deferred_probe
> > |   |   |   |   |-- driver -> ../../../../../bus/platform/drivers/mvneta
> > |   |   |   |   |-- driver_override
> > |   |   |   |   |-- modalias
> > |   |   |   |   |-- net
> > |   |   |   |   |   `-- eth1
> > |   |   |   |   |       |-- addr_assign_type
> > |   |   |   |   |       |-- address
> > |   |   |   |   |       |-- addr_len
> > |   |   |   |   |       |-- broadcast
> > |   |   |   |   |       |-- carrier
> > |   |   |   |   |       |-- carrier_changes
> > |   |   |   |   |       |-- deferred_probe
> > |   |   |   |   |       |-- device -> ../../../f1074000.ethernet
> >
> > and then use container_of() to get the net_device.
> >
> > Similarly, the code needs to get from struct device *host_dev to a struct
> > mii_bus *.
> >
> > |   |   |   |-- f1072004.mdio
> > |   |   |   |   |-- deferred_probe
> > |   |   |   |   |-- driver -> ../../../../../bus/platform/drivers/orion-mdio
> > |   |   |   |   |-- driver_override
> > |   |   |   |   |-- mdio_bus
> > |   |   |   |   |   `-- f1072004.mdio-mi
> > |   |   |   |   |       |-- deferred_probe
> > |   |   |   |   |       |-- device -> ../../../f1072004.mdio
> >
>
> Thanks Andrew! Greg, does that make it clearer how these devices
> references are used, do you still think the way this is done is wrong,
> too cautious, or valid?

I'm still not sold on it, I think there is something odd here with your
use/assumptions of the driver model. Give me a few days to catch up
with other stuff to respond back please...

thanks,

greg k-h
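
For readers following the container_of() step in the quoted explanation,
here is a minimal userspace sketch of the idea. The two structs are
heavily simplified stand-ins for the kernel types, and the macro and the
helper name to_net_dev_sketch() are illustrative assumptions, not the
code under review.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(): recover the
 * enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct device {
	const char *name;
};

struct net_device {
	int ifindex;
	struct device dev;	/* the embedded device found via the net class */
};

/* Map the embedded class device back to its enclosing net_device. */
static struct net_device *to_net_dev_sketch(struct device *d)
{
	assert(d != NULL);
	return container_of(d, struct net_device, dev);
}
```

The lookup only works if the struct device really is embedded in a
net_device, so it relies on the device having been found in the net
class in the first place.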
[PATCH net-next V5 3/3] tun: rx batching
We can only process 1 packet at one time during sendmsg(). This often
leads to bad cache utilization under heavy load. So this patch tries to
do some batching during rx before submitting the packets to the host
network stack. This is done by accepting MSG_MORE as a hint from the
sendmsg() caller: if it is set, batch the packet temporarily in a
linked list, and submit them all once MSG_MORE is cleared.

Tests were done by pktgen (burst=128) in guest over mlx4(noqueue) on host:

                                 Mpps  -+%
    rx-frames = 0                0.91  +0%
    rx-frames = 4                1.00  +9.8%
    rx-frames = 8                1.00  +9.8%
    rx-frames = 16               1.01  +10.9%
    rx-frames = 32               1.07  +17.5%
    rx-frames = 48               1.07  +17.5%
    rx-frames = 64               1.08  +18.6%
    rx-frames = 64 (no MSG_MORE) 0.91  +0%

Users are allowed to change the per-device number of batched packets
through ethtool -C rx-frames. NAPI_POLL_WEIGHT is used as the upper
limit to prevent bh from being disabled too long.

Signed-off-by: Jason Wang
---
 drivers/net/tun.c | 76 ++-
 1 file changed, 70 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8c1d3bd..13890ac 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -218,6 +218,7 @@ struct tun_struct {
 	struct list_head disabled;
 	void *security;
 	u32 flow_count;
+	u32 rx_batched;
 	struct tun_pcpu_stats __percpu *pcpu_stats;
 };

@@ -522,6 +523,7 @@ static void tun_queue_purge(struct tun_file *tfile)
 	while ((skb = skb_array_consume(&tfile->tx_array)) != NULL)
 		kfree_skb(skb);

+	skb_queue_purge(&tfile->sk.sk_write_queue);
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }

@@ -1139,10 +1141,46 @@ static struct sk_buff *tun_alloc_skb(struct tun_file *tfile,
 	return skb;
 }

+static void tun_rx_batched(struct tun_struct *tun, struct tun_file *tfile,
+			   struct sk_buff *skb, int more)
+{
+	struct sk_buff_head *queue = &tfile->sk.sk_write_queue;
+	struct sk_buff_head process_queue;
+	u32 rx_batched = tun->rx_batched;
+	bool rcv = false;
+
+	if (!rx_batched || (!more && skb_queue_empty(queue))) {
+		local_bh_disable();
+		netif_receive_skb(skb);
+		local_bh_enable();
+		return;
+	}
+
+	spin_lock(&queue->lock);
+	if (!more || skb_queue_len(queue) == rx_batched) {
+		__skb_queue_head_init(&process_queue);
+		skb_queue_splice_tail_init(queue, &process_queue);
+		rcv = true;
+	} else {
+		__skb_queue_tail(queue, skb);
+	}
+	spin_unlock(&queue->lock);
+
+	if (rcv) {
+		struct sk_buff *nskb;
+
+		local_bh_disable();
+		while ((nskb = __skb_dequeue(&process_queue)))
+			netif_receive_skb(nskb);
+		netif_receive_skb(skb);
+		local_bh_enable();
+	}
+}
+
 /* Get packet from user space buffer */
 static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 			    void *msg_control, struct iov_iter *from,
-			    int noblock)
+			    int noblock, bool more)
 {
 	struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
 	struct sk_buff *skb;
@@ -1283,9 +1321,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	rxhash = skb_get_hash(skb);

 #ifndef CONFIG_4KSTACKS
-	local_bh_disable();
-	netif_receive_skb(skb);
-	local_bh_enable();
+	tun_rx_batched(tun, tfile, skb, more);
 #else
 	netif_rx_ni(skb);
 #endif
@@ -1311,7 +1347,8 @@ static ssize_t tun_chr_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (!tun)
 		return -EBADFD;

-	result = tun_get_user(tun, tfile, NULL, from, file->f_flags & O_NONBLOCK);
+	result = tun_get_user(tun, tfile, NULL, from,
+			      file->f_flags & O_NONBLOCK, false);

 	tun_put(tun);
 	return result;
@@ -1569,7 +1606,8 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 		return -EBADFD;

 	ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
-			   m->msg_flags & MSG_DONTWAIT);
+			   m->msg_flags & MSG_DONTWAIT,
+			   m->msg_flags & MSG_MORE);
 	tun_put(tun);
 	return ret;
 }
@@ -1770,6 +1808,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		tun->align = NET_SKB_PAD;
 		tun->filter_attached = false;
 		tun->sndbuf = tfile->socket.sk->sk_sndbuf;
+		tun->rx_batched = 0;

 		tun->pcpu_stats = netdev_alloc_pcpu_stats(struct tun_
[PATCH net-next V5 2/3] vhost_net: tx batching
This patch tries to utilize tuntap rx batching by peeking at the tx
virtqueue during transmission. If there are more available buffers in
the virtqueue, set the MSG_MORE flag as a hint for the backend (e.g.
tuntap) to batch the packets.

Reviewed-by: Stefan Hajnoczi
Signed-off-by: Jason Wang
---
 drivers/vhost/net.c | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 5dc3465..c42e9c3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -351,6 +351,15 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 	return r;
 }

+static bool vhost_exceeds_maxpend(struct vhost_net *net)
+{
+	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
+	struct vhost_virtqueue *vq = &nvq->vq;
+
+	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
+		== nvq->done_idx;
+}
+
 /* Expects to be always run from workqueue - which acts as
  * read-size critical section for our kind of RCU. */
 static void handle_tx(struct vhost_net *net)
@@ -394,8 +403,7 @@ static void handle_tx(struct vhost_net *net)
 		/* If more outstanding DMAs, queue the work.
 		 * Handle upend_idx wrap around
 		 */
-		if (unlikely((nvq->upend_idx + vq->num - VHOST_MAX_PEND)
-			      % UIO_MAXIOV == nvq->done_idx))
+		if (unlikely(vhost_exceeds_maxpend(net)))
 			break;

 		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
@@ -454,6 +462,16 @@ static void handle_tx(struct vhost_net *net)
 			msg.msg_control = NULL;
 			ubufs = NULL;
 		}
+
+		total_len += len;
+		if (total_len < VHOST_NET_WEIGHT &&
+		    !vhost_vq_avail_empty(&net->dev, vq) &&
+		    likely(!vhost_exceeds_maxpend(net))) {
+			msg.msg_flags |= MSG_MORE;
+		} else {
+			msg.msg_flags &= ~MSG_MORE;
+		}
+
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
@@ -472,7 +490,6 @@ static void handle_tx(struct vhost_net *net)
 			vhost_add_used_and_signal(&net->dev, vq, head, 0);
 		else
 			vhost_zerocopy_signal_used(net, vq);
-		total_len += len;
 		vhost_net_tx_packet(net);
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
--
2.7.4
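
The two conditions this patch combines can be modelled outside the
kernel. The sketch below is only an illustration: the SKETCH_* constants
are assumed to mirror UIO_MAXIOV, VHOST_MAX_PEND and VHOST_NET_WEIGHT,
and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAXIOV   1024u     /* assumed to mirror UIO_MAXIOV */
#define SKETCH_MAX_PEND 128u      /* assumed to mirror VHOST_MAX_PEND */
#define SKETCH_WEIGHT   0x80000u  /* assumed to mirror VHOST_NET_WEIGHT */

/* True when the pending zerocopy window is full; the modulo handles
 * upend_idx wrapping around the ring of SKETCH_MAXIOV slots. */
static bool exceeds_maxpend(unsigned int upend_idx, unsigned int done_idx,
			    unsigned int vq_num)
{
	assert(done_idx < SKETCH_MAXIOV);
	return (upend_idx + vq_num - SKETCH_MAX_PEND) % SKETCH_MAXIOV
		== done_idx;
}

/* Hint "more is coming" only while under the byte budget, with buffers
 * still available, and with room left in the pending window. */
static bool want_msg_more(size_t total_len, bool ring_empty, bool maxpend_hit)
{
	return total_len < SKETCH_WEIGHT && !ring_empty && !maxpend_hit;
}
```

Any one of the three conditions failing clears the hint, which is what
makes the backend flush whatever it has queued so far.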
[PATCH net-next V5 1/3] vhost: better detection of available buffers
This patch makes several tweaks to vhost_vq_avail_empty() for better
performance:

- check the cached avail index first, which can avoid a userspace memory
  access.
- use unlikely() for the failure of the userspace access
- check vq->last_avail_idx instead of the cached avail index as the last
  step.

This patch is needed for batching support, which needs to peek whether
or not there are still available buffers in the ring.

Reviewed-by: Stefan Hajnoczi
Signed-off-by: Jason Wang
---
 drivers/vhost/vhost.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d643260..9f11838 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2241,11 +2241,15 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	__virtio16 avail_idx;
 	int r;

+	if (vq->avail_idx != vq->last_avail_idx)
+		return false;
+
 	r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
-	if (r)
+	if (unlikely(r))
 		return false;
+	vq->avail_idx = vhost16_to_cpu(vq, avail_idx);

-	return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
+	return vq->avail_idx == vq->last_avail_idx;
 }
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

--
2.7.4
Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()
On Tue, Jan 17, 2017 at 03:21:47PM -0800, Florian Fainelli wrote:
> Add a helper function to lookup a device reference given a class name.
> This is a preliminary patch to remove adhoc code from net/dsa/dsa.c and
> make it more generic.
>
> Signed-off-by: Florian Fainelli
> ---
>  drivers/base/core.c    | 31 +++
>  include/linux/device.h |  2 ++
>  2 files changed, 33 insertions(+)

My NAK still stands here, please give me a day or so to respond to the
other thread about this...

thanks,

greg k-h
[PATCH net-next V5 0/3] vhost_net tx batching
Hi:

This series tries to implement tx batching support for vhost. This is
done by using MSG_MORE as a hint for the underlying socket. The backend
(e.g. tap) can then batch the packets temporarily in a list and submit
them all once the number of batched packets exceeds a limit.

Tests show an obvious improvement on guest pktgen over mlx4(noqueue)
on host:

                                 Mpps  -+%
    rx-frames = 0                0.91  +0%
    rx-frames = 4                1.00  +9.8%
    rx-frames = 8                1.00  +9.8%
    rx-frames = 16               1.01  +10.9%
    rx-frames = 32               1.07  +17.5%
    rx-frames = 48               1.07  +17.5%
    rx-frames = 64               1.08  +18.6%
    rx-frames = 64 (no MSG_MORE) 0.91  +0%

Changes from V4:
- stick to NAPI_POLL_WEIGHT for rx-frames if the user specifies a value
  greater than it.
Changes from V3:
- use ethtool instead of module parameter to control the maximum
  number of batched packets
- avoid overhead when MSG_MORE is not set and no packet is queued
Changes from V2:
- remove useless queue limitation check (and we don't drop any packet
  now)
Changes from V1:
- drop NAPI handler since we don't use NAPI now
- fix the issues that may exceed max pending of zerocopy
- more improvement on available buffer detection
- move the limitation of batched packets from vhost to tuntap

Please review.

Thanks

Jason Wang (3):
  vhost: better detection of available buffers
  vhost_net: tx batching
  tun: rx batching

 drivers/net/tun.c     | 76 +++
 drivers/vhost/net.c   | 23 ++--
 drivers/vhost/vhost.c |  8 --
 3 files changed, 96 insertions(+), 11 deletions(-)

--
2.7.4
[PATCH iproute2 net-next] iplink: bridge_slave: add support for IFLA_BRPORT_FLUSH
This patch implements support for the IFLA_BRPORT_FLUSH attribute in iproute2 so it can flush bridge slave's fdb dynamic entries. Signed-off-by: Hangbin Liu --- ip/iplink_bridge_slave.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/ip/iplink_bridge_slave.c b/ip/iplink_bridge_slave.c index fbb3f06..6353fc5 100644 --- a/ip/iplink_bridge_slave.c +++ b/ip/iplink_bridge_slave.c @@ -22,7 +22,10 @@ static void print_explain(FILE *f) { fprintf(f, - "Usage: ... bridge_slave [ state STATE ] [ priority PRIO ] [cost COST ]\n" + "Usage: ... bridge_slave [ fdb_flush ]\n" + "[ state STATE ]\n" + "[ priority PRIO ]\n" + "[ cost COST ]\n" "[ guard {on | off} ]\n" "[ hairpin {on | off} ]\n" "[ fastleave {on | off} ]\n" @@ -217,7 +220,9 @@ static int bridge_slave_parse_opt(struct link_util *lu, int argc, char **argv, __u32 cost; while (argc > 0) { - if (matches(*argv, "state") == 0) { + if (matches(*argv, "fdb_flush") == 0) { + addattr(n, 1024, IFLA_BRPORT_FLUSH); + } else if (matches(*argv, "state") == 0) { NEXT_ARG(); if (get_u8(&state, *argv, 0)) invarg("state is invalid", *argv); -- 2.5.5
[PATCHv2 iproute2 net-next 1/5] iplink: bridge: add support for IFLA_BR_FDB_FLUSH
This patch implements support for the IFLA_BR_FDB_FLUSH attribute in iproute2 so it can flush bridge fdb dynamic entries. Reviewed-by: Nikolay Aleksandrov Signed-off-by: Hangbin Liu --- ip/iplink_bridge.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c index d2d4202..85e6597 100644 --- a/ip/iplink_bridge.c +++ b/ip/iplink_bridge.c @@ -22,7 +22,8 @@ static void print_explain(FILE *f) { fprintf(f, - "Usage: ... bridge [ forward_delay FORWARD_DELAY ]\n" + "Usage: ... bridge [ fdb_flush ]\n" + " [ forward_delay FORWARD_DELAY ]\n" " [ hello_time HELLO_TIME ]\n" " [ max_age MAX_AGE ]\n" " [ ageing_time AGEING_TIME ]\n" @@ -145,6 +146,8 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv, if (len < 0) return -1; addattr_l(n, 1024, IFLA_BR_GROUP_ADDR, llabuf, len); + } else if (matches(*argv, "fdb_flush") == 0) { + addattr(n, 1024, IFLA_BR_FDB_FLUSH); } else if (matches(*argv, "vlan_default_pvid") == 0) { __u16 default_pvid; -- 2.5.5
[PATCHv2 iproute2 net-next 4/5] iplink: bridge: add support for IFLA_BR_MCAST_IGMP_VERSION
This patch implements support for the IFLA_BR_MCAST_IGMP_VERSION attribute in iproute2 so it can change the mcast igmp version. Reviewed-by: Nikolay Aleksandrov Signed-off-by: Hangbin Liu --- ip/iplink_bridge.c | 13 + 1 file changed, 13 insertions(+) diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c index 46bbbee..3e9143e 100644 --- a/ip/iplink_bridge.c +++ b/ip/iplink_bridge.c @@ -50,6 +50,7 @@ static void print_explain(FILE *f) " [ mcast_query_response_interval QUERY_RESPONSE_INTERVAL ]\n" " [ mcast_startup_query_interval STARTUP_QUERY_INTERVAL ]\n" " [ mcast_stats_enabled MCAST_STATS_ENABLED ]\n" + " [ mcast_igmp_version IGMP_VERSION ]\n" " [ nf_call_iptables NF_CALL_IPTABLES ]\n" " [ nf_call_ip6tables NF_CALL_IP6TABLES ]\n" " [ nf_call_arptables NF_CALL_ARPTABLES ]\n" @@ -308,6 +309,14 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv, invarg("invalid mcast_stats_enabled", *argv); addattr8(n, 1024, IFLA_BR_MCAST_STATS_ENABLED, mcast_stats_enabled); + } else if (matches(*argv, "mcast_igmp_version") == 0) { + __u8 igmp_version; + + NEXT_ARG(); + if (get_u8(&igmp_version, *argv, 0)) + invarg("invalid mcast_igmp_version", *argv); + addattr8(n, 1024, IFLA_BR_MCAST_IGMP_VERSION, + igmp_version); } else if (matches(*argv, "nf_call_iptables") == 0) { __u8 nf_call_ipt; @@ -537,6 +546,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]) fprintf(f, "mcast_stats_enabled %u ", rta_getattr_u8(tb[IFLA_BR_MCAST_STATS_ENABLED])); + if (tb[IFLA_BR_MCAST_IGMP_VERSION]) + fprintf(f, "mcast_igmp_version %u ", + rta_getattr_u8(tb[IFLA_BR_MCAST_IGMP_VERSION])); + if (tb[IFLA_BR_NF_CALL_IPTABLES]) fprintf(f, "nf_call_iptables %u ", rta_getattr_u8(tb[IFLA_BR_NF_CALL_IPTABLES])); -- 2.5.5
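
The option-parsing pattern used in this patch (match a keyword, step to
the next argument, parse it as a u8) can be sketched on its own. The
helpers below are hypothetical stand-ins written for this example; they
are not iproute2's matches(), NEXT_ARG() or get_u8(), and they only
approximate their behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical keyword matcher: returns 0 when arg is a non-empty
 * (possibly abbreviated) prefix of keyword. */
static int keyword_matches(const char *arg, const char *keyword)
{
	size_t len = strlen(arg);

	if (len == 0 || len > strlen(keyword))
		return -1;
	return memcmp(keyword, arg, len) == 0 ? 0 : -1;
}

/* Hypothetical u8 parser: returns 0 on success, -1 on bad input. */
static int parse_u8(uint8_t *val, const char *arg)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(arg, &end, 0);
	if (errno || end == arg || *end != '\0' || v > UINT8_MAX)
		return -1;
	*val = (uint8_t)v;
	return 0;
}

/* Find "mcast_igmp_version N" in argv; returns 0 on success. */
static int parse_igmp_version(int argc, char **argv, uint8_t *version)
{
	assert(version != NULL);

	while (argc > 0) {
		if (keyword_matches(*argv, "mcast_igmp_version") == 0) {
			/* Step to the value, as NEXT_ARG() would. */
			argc--;
			argv++;
			if (argc <= 0 || parse_u8(version, *argv))
				return -1;
			return 0;
		}
		argc--;
		argv++;
	}
	return -1;
}
```

Because the matcher in this sketch accepts abbreviations, the order of
the else-if branches decides which option an ambiguous prefix selects.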
[PATCHv2 iproute2 net-next 0/5] add latest bridge netlink options
Add the bridge netlink attributes recently added to the kernel.

v2: rename vlan/mcast_state to vlan/mcast_stats_enabled as suggested by
Nikolay. The previous name has a different meaning and would mislead
people.

I will post a separate patch for IFLA_BRPORT_FLUSH support.

Hangbin Liu (5):
  iplink: bridge: add support for IFLA_BR_FDB_FLUSH
  iplink: bridge: add support for IFLA_BR_VLAN_STATS_ENABLED
  iplink: bridge: add support for IFLA_BR_MCAST_STATS_ENABLED
  iplink: bridge: add support for IFLA_BR_MCAST_IGMP_VERSION
  iplink: bridge: add support for IFLA_BR_MCAST_MLD_VERSION

 ip/iplink_bridge.c | 57 +-
 1 file changed, 56 insertions(+), 1 deletion(-)

--
2.5.5
[PATCHv2 iproute2 net-next 5/5] iplink: bridge: add support for IFLA_BR_MCAST_MLD_VERSION
This patch implements support for the IFLA_BR_MCAST_MLD_VERSION attribute in iproute2 so it can change the mcast mld version. Reviewed-by: Nikolay Aleksandrov Signed-off-by: Hangbin Liu --- ip/iplink_bridge.c | 13 + 1 file changed, 13 insertions(+) diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c index 3e9143e..a17ff35 100644 --- a/ip/iplink_bridge.c +++ b/ip/iplink_bridge.c @@ -51,6 +51,7 @@ static void print_explain(FILE *f) " [ mcast_startup_query_interval STARTUP_QUERY_INTERVAL ]\n" " [ mcast_stats_enabled MCAST_STATS_ENABLED ]\n" " [ mcast_igmp_version IGMP_VERSION ]\n" + " [ mcast_mld_version MLD_VERSION ]\n" " [ nf_call_iptables NF_CALL_IPTABLES ]\n" " [ nf_call_ip6tables NF_CALL_IP6TABLES ]\n" " [ nf_call_arptables NF_CALL_ARPTABLES ]\n" @@ -317,6 +318,14 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv, invarg("invalid mcast_igmp_version", *argv); addattr8(n, 1024, IFLA_BR_MCAST_IGMP_VERSION, igmp_version); + } else if (matches(*argv, "mcast_mld_version") == 0) { + __u8 mld_version; + + NEXT_ARG(); + if (get_u8(&mld_version, *argv, 0)) + invarg("invalid mcast_mld_version", *argv); + addattr8(n, 1024, IFLA_BR_MCAST_MLD_VERSION, + mld_version); } else if (matches(*argv, "nf_call_iptables") == 0) { __u8 nf_call_ipt; @@ -550,6 +559,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]) fprintf(f, "mcast_igmp_version %u ", rta_getattr_u8(tb[IFLA_BR_MCAST_IGMP_VERSION])); + if (tb[IFLA_BR_MCAST_MLD_VERSION]) + fprintf(f, "mcast_mld_version %u ", + rta_getattr_u8(tb[IFLA_BR_MCAST_MLD_VERSION])); + if (tb[IFLA_BR_NF_CALL_IPTABLES]) fprintf(f, "nf_call_iptables %u ", rta_getattr_u8(tb[IFLA_BR_NF_CALL_IPTABLES])); -- 2.5.5
[PATCHv2 iproute2 net-next 3/5] iplink: bridge: add support for IFLA_BR_MCAST_STATS_ENABLED
This patch implements support for the IFLA_BR_MCAST_STATS_ENABLED attribute in iproute2 so it can enable/disable mcast stats accounting. Signed-off-by: Hangbin Liu --- ip/iplink_bridge.c | 13 + 1 file changed, 13 insertions(+) diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c index cd495b3..46bbbee 100644 --- a/ip/iplink_bridge.c +++ b/ip/iplink_bridge.c @@ -49,6 +49,7 @@ static void print_explain(FILE *f) " [ mcast_query_interval QUERY_INTERVAL ]\n" " [ mcast_query_response_interval QUERY_RESPONSE_INTERVAL ]\n" " [ mcast_startup_query_interval STARTUP_QUERY_INTERVAL ]\n" + " [ mcast_stats_enabled MCAST_STATS_ENABLED ]\n" " [ nf_call_iptables NF_CALL_IPTABLES ]\n" " [ nf_call_ip6tables NF_CALL_IP6TABLES ]\n" " [ nf_call_arptables NF_CALL_ARPTABLES ]\n" @@ -299,6 +300,14 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv, addattr64(n, 1024, IFLA_BR_MCAST_STARTUP_QUERY_INTVL, mcast_startup_query_intvl); + } else if (matches(*argv, "mcast_stats_enabled") == 0) { + __u8 mcast_stats_enabled; + + NEXT_ARG(); + if (get_u8(&mcast_stats_enabled, *argv, 0)) + invarg("invalid mcast_stats_enabled", *argv); + addattr8(n, 1024, IFLA_BR_MCAST_STATS_ENABLED, + mcast_stats_enabled); } else if (matches(*argv, "nf_call_iptables") == 0) { __u8 nf_call_ipt; @@ -524,6 +533,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]) fprintf(f, "mcast_startup_query_interval %llu ", rta_getattr_u64(tb[IFLA_BR_MCAST_STARTUP_QUERY_INTVL])); + if (tb[IFLA_BR_MCAST_STATS_ENABLED]) + fprintf(f, "mcast_stats_enabled %u ", + rta_getattr_u8(tb[IFLA_BR_MCAST_STATS_ENABLED])); + if (tb[IFLA_BR_NF_CALL_IPTABLES]) fprintf(f, "nf_call_iptables %u ", rta_getattr_u8(tb[IFLA_BR_NF_CALL_IPTABLES])); -- 2.5.5
[PATCHv2 iproute2 net-next 2/5] iplink: bridge: add support for IFLA_BR_VLAN_STATS_ENABLED
This patch implements support for the IFLA_BR_VLAN_STATS_ENABLED attribute in iproute2 so it can enable/disable vlan stats accounting. Signed-off-by: Hangbin Liu --- ip/iplink_bridge.c | 13 + 1 file changed, 13 insertions(+) diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c index 85e6597..cd495b3 100644 --- a/ip/iplink_bridge.c +++ b/ip/iplink_bridge.c @@ -34,6 +34,7 @@ static void print_explain(FILE *f) " [ vlan_filtering VLAN_FILTERING ]\n" " [ vlan_protocol VLAN_PROTOCOL ]\n" " [ vlan_default_pvid VLAN_DEFAULT_PVID ]\n" + " [ vlan_stats_enabled VLAN_STATS_ENABLED ]\n" " [ mcast_snooping MULTICAST_SNOOPING ]\n" " [ mcast_router MULTICAST_ROUTER ]\n" " [ mcast_query_use_ifaddr MCAST_QUERY_USE_IFADDR ]\n" @@ -157,6 +158,14 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv, addattr16(n, 1024, IFLA_BR_VLAN_DEFAULT_PVID, default_pvid); + } else if (matches(*argv, "vlan_stats_enabled") == 0) { + __u8 vlan_stats_enabled; + + NEXT_ARG(); + if (get_u8(&vlan_stats_enabled, *argv, 0)) + invarg("invalid vlan_stats_enabled", *argv); + addattr8(n, 1024, IFLA_BR_VLAN_STATS_ENABLED, + vlan_stats_enabled); } else if (matches(*argv, "mcast_router") == 0) { __u8 mcast_router; @@ -442,6 +451,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]) fprintf(f, "vlan_default_pvid %u ", rta_getattr_u16(tb[IFLA_BR_VLAN_DEFAULT_PVID])); + if (tb[IFLA_BR_VLAN_STATS_ENABLED]) + fprintf(f, "vlan_stats_enabled %u ", + rta_getattr_u8(tb[IFLA_BR_VLAN_STATS_ENABLED])); + if (tb[IFLA_BR_GROUP_FWD_MASK]) fprintf(f, "group_fwd_mask %#x ", rta_getattr_u16(tb[IFLA_BR_GROUP_FWD_MASK])); -- 2.5.5
[PATCH net-next v2] net/mlx5e: Support bpf_xdp_adjust_head()
This patch adds bpf_xdp_adjust_head() support to mlx5e. 1. rx_headroom is added to struct mlx5e_rq. It uses an existing 4 byte hole in the struct. 2. The adjusted data length is checked against MLX5E_XDP_MIN_INLINE and MLX5E_SW2HW_MTU(rq->netdev->mtu). 3. The macro MLX5E_SW2HW_MTU is moved from en_main.c to en.h. MLX5E_HW2SW_MTU is also moved to en.h for symmetric reason but it is not a must. v2: - Keep the xdp specific logic in mlx5e_xdp_handle() - Update dma_len after the sanity checks in mlx5e_xmit_xdp_frame() Signed-off-by: Martin KaFai Lau --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 4 ++ drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 18 - drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 47 ++- 3 files changed, 40 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index a473cea10c16..0d9dd860a295 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -51,6 +51,9 @@ #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v) +#define MLX5E_HW2SW_MTU(hwmtu) ((hwmtu) - (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)) +#define MLX5E_SW2HW_MTU(swmtu) ((swmtu) + (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)) + #define MLX5E_MAX_NUM_TC 8 #define MLX5E_PARAMS_MINIMUM_LOG_SQ_SIZE0x6 @@ -369,6 +372,7 @@ struct mlx5e_rq { unsigned long state; intix; + u16rx_headroom; struct mlx5e_rx_am am; /* Adaptive Moderation */ struct bpf_prog *xdp_prog; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index f74ba73c55c7..aba3691e0919 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -343,9 +343,6 @@ static void mlx5e_disable_async_events(struct mlx5e_priv *priv) synchronize_irq(mlx5_get_msix_vec(priv->mdev, MLX5_EQ_VEC_ASYNC)); } -#define MLX5E_HW2SW_MTU(hwmtu) (hwmtu - (ETH_HLEN + VLAN_HLEN + 
ETH_FCS_LEN)) -#define MLX5E_SW2HW_MTU(swmtu) (swmtu + (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN)) - static inline int mlx5e_get_wqe_mtt_sz(void) { /* UMR copies MTTs in units of MLX5_UMR_MTT_ALIGNMENT bytes. @@ -534,9 +531,13 @@ static int mlx5e_create_rq(struct mlx5e_channel *c, goto err_rq_wq_destroy; } - rq->buff.map_dir = DMA_FROM_DEVICE; - if (rq->xdp_prog) + if (rq->xdp_prog) { rq->buff.map_dir = DMA_BIDIRECTIONAL; + rq->rx_headroom = XDP_PACKET_HEADROOM; + } else { + rq->buff.map_dir = DMA_FROM_DEVICE; + rq->rx_headroom = MLX5_RX_HEADROOM; + } switch (priv->params.rq_wq_type) { case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ: @@ -586,7 +587,7 @@ static int mlx5e_create_rq(struct mlx5e_channel *c, byte_count = rq->buff.wqe_sz; /* calc the required page order */ - frag_sz = MLX5_RX_HEADROOM + + frag_sz = rq->rx_headroom + byte_count /* packet data */ + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); frag_sz = SKB_DATA_ALIGN(frag_sz); @@ -3153,11 +3154,6 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog) bool reset, was_opened; int i; - if (prog && prog->xdp_adjust_head) { - netdev_err(netdev, "Does not support bpf_xdp_adjust_head()\n"); - return -EOPNOTSUPP; - } - mutex_lock(&priv->state_lock); if ((netdev->features & NETIF_F_LRO) && prog) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c index 0e2fb3ed1790..20f116f8c457 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c @@ -264,7 +264,7 @@ int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe *wqe, u16 ix) if (unlikely(mlx5e_page_alloc_mapped(rq, di))) return -ENOMEM; - wqe->data.addr = cpu_to_be64(di->addr + MLX5_RX_HEADROOM); + wqe->data.addr = cpu_to_be64(di->addr + rq->rx_headroom); return 0; } @@ -646,8 +646,7 @@ static inline void mlx5e_xmit_xdp_doorbell(struct mlx5e_sq *sq) static inline void mlx5e_xmit_xdp_frame(struct mlx5e_rq *rq, struct 
mlx5e_dma_info *di, - unsigned int data_offset, - int len) + const struct xdp_buff *xdp) { struct mlx5e_sq *sq = &rq->channel->xdp_sq; struct mlx5_wq_cyc *wq = &sq->wq; @@ -659,9 +658,16 @@ static inline void mlx5e_xmit_xdp_fr
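
The MTU macros moved into en.h by this patch also gain parentheses
around their argument. The patch description does not say why; the
sketch below shows one plausible benefit, using SKETCH_-prefixed copies
of the macros and the usual Ethernet header sizes (14-byte header,
4-byte VLAN tag, 4-byte FCS) as local constants.

```c
#include <assert.h>

#define SKETCH_ETH_HLEN    14  /* Ethernet header */
#define SKETCH_VLAN_HLEN    4  /* one 802.1Q tag */
#define SKETCH_ETH_FCS_LEN  4  /* frame check sequence */
#define SKETCH_OVERHEAD \
	(SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN + SKETCH_ETH_FCS_LEN)

/* Old shape: the argument is pasted in without parentheses. */
#define SKETCH_HW2SW_MTU_OLD(hwmtu) (hwmtu - SKETCH_OVERHEAD)

/* New shape: the argument is parenthesized before use. */
#define SKETCH_HW2SW_MTU(hwmtu) ((hwmtu) - SKETCH_OVERHEAD)
#define SKETCH_SW2HW_MTU(swmtu) ((swmtu) + SKETCH_OVERHEAD)

/* A 1500-byte software MTU corresponds to a 1522-byte hardware MTU. */
static_assert(SKETCH_SW2HW_MTU(1500) == 1522,
	      "1500-byte MTU maps to a 1522-byte frame");
```

With a conditional expression as the argument, the old shape subtracts
the overhead from only one arm of the conditional, while the new shape
applies it to the whole expression.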
Re: [PATCH net-next v2] bridge: multicast to unicast
Hi Felix, [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Linus-L-ssing/bridge-multicast-to-unicast/20170118-120345 config: x86_64-rhel-7.2 (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 Note: it may well be a FALSE warning. FWIW you are at least aware of it now. http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings All warnings (new ones prefixed by >>): net/bridge/br_forward.c: In function 'br_multicast_flood': >> net/bridge/br_forward.c:261:27: warning: 'port' may be used uninitialized in >> this function [-Wmaybe-uninitialized] struct net_bridge_port *port, *lport, *rport; ^~~~ vim +/port +261 net/bridge/br_forward.c 5cb5e947 Herbert Xu 2010-02-27 245 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING 5cb5e947 Herbert Xu 2010-02-27 246 /* called with rcu_read_lock */ 37b090e6 Nikolay Aleksandrov 2016-07-14 247 void br_multicast_flood(struct net_bridge_mdb_entry *mdst, b35c5f63 Nikolay Aleksandrov 2016-07-14 248struct sk_buff *skb, 37b090e6 Nikolay Aleksandrov 2016-07-14 249bool local_rcv, bool local_orig) 5cb5e947 Herbert Xu 2010-02-27 250 { 5cb5e947 Herbert Xu 2010-02-27 251struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev; 1080ab95 Nikolay Aleksandrov 2016-06-28 252u8 igmp_type = br_multicast_igmp_type(skb); 5cb5e947 Herbert Xu 2010-02-27 253struct net_bridge *br = netdev_priv(dev); afe0159d stephen hemminger 2010-04-27 254struct net_bridge_port *prev = NULL; 5cb5e947 Herbert Xu 2010-02-27 255struct net_bridge_port_group *p; 5cb5e947 Herbert Xu 2010-02-27 256struct hlist_node *rp; 5cb5e947 Herbert Xu 2010-02-27 257 e8051688 Eric Dumazet2010-11-15 258rp = rcu_dereference(hlist_first_rcu(&br->router_list)); 83f6a740 stephen hemminger 2010-04-27 259p = mdst ? 
rcu_dereference(mdst->ports) : NULL; 5cb5e947 Herbert Xu 2010-02-27 260while (p || rp) { afe0159d stephen hemminger 2010-04-27 @261struct net_bridge_port *port, *lport, *rport; afe0159d stephen hemminger 2010-04-27 262 5cb5e947 Herbert Xu 2010-02-27 263lport = p ? p->port : NULL; 5cb5e947 Herbert Xu 2010-02-27 264rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) : 5cb5e947 Herbert Xu 2010-02-27 265 NULL; 5cb5e947 Herbert Xu 2010-02-27 266 507962cd Felix Fietkau 2017-01-17 267if ((unsigned long)lport > (unsigned long)rport) { 507962cd Felix Fietkau 2017-01-17 268if (p->flags & MDB_PG_FLAGS_MCAST_TO_UCAST) { 507962cd Felix Fietkau 2017-01-17 269 maybe_deliver_addr(lport, skb, p->eth_addr, :: The code at line 261 was first introduced by commit :: afe0159d935ab731c682e811356914bb2be9470c bridge: multicast_flood cleanup :: TO: stephen hemminger :: CC: David S. Miller --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [RFC v2 00/10] HFI Virtual Network Interface Controller (VNIC)
On Tue, Jan 17, 2017 at 11:27:20AM -0800, Vishwanathapura, Niranjana wrote:
> Thanks Jason for the valuable inputs.
>
> Here is the new generic interface.
>
> Overview:
> Bottom driver defines net_device_ops. The upper driver can override it.
> For example, upper driver can implement ndo_open() which calls bottom
> driver's ndo_open() and also do some book keeping.
>
>
> include/rdma/ib_verbs.h:
>
> /* rdma netdev type - specifies protocol type */
> enum rdma_netdev_t {
>         RDMA_NETDEV_HFI_VNIC,
> };
>
> /* rdma netdev
>  * For usecases where netstack interfacing is required.
>  */
> struct rdma_netdev {
>         struct net_device *netdev;
>         u8 port_num;
>
>         /* client private data structure */
>         void *clnt_priv;
>
>         /* control functions */
>         void (*set_id)(struct rdma_netdev *rn, int id);
>         void (*set_state)(struct rdma_netdev *rn, int state);
> };
>
> struct ib_device {
>         ...
>         ...
>         /* rdma netdev operations */
>         struct net_device *(*alloc_rdma_netdev)(struct ib_device *device,
>                                         u8 port_num,
>                                         enum rdma_netdev_t type,
>                                         const char *name,
>                                         unsigned char name_assign_type,
>                                         void (*setup)(struct net_device *));
>         void (*free_rdma_netdev)(struct net_device *netdev);
> };
>
>
> hfi1 driver:
>
> /* rdma netdev's private data structure */
> struct hfi1_rdma_netdev {
>         struct rdma_netdev rn; /* keep this first */
>         /* hfi1's vnic private data follows */
> };
>
>
> include/rdma/opa_hfi.h:
>
> /* Client's ndo operations use below function instead of netdev_priv() */
> static inline void *hfi_vnic_priv(const struct net_device *dev)
> {
>         struct rdma_netdev *rn = netdev_priv(dev);
>
>         return rn->clnt_priv;
> }
>
> /* Overrides rtnl_link_stats64 to include hfi_vnic stats.
>  * ndo_get_stats64() can be used to get the stats
>  */
> struct hfi_vnic_stats {
>         /* standard netdev statistics */
>         struct rtnl_link_stats64 netstat;
>
>         /* HFI VNIC statistics */
>         u64 tx_mcastbcast;
>         u64 tx_untagged;
>         u64 tx_vlan;
>         u64 tx_64_size;
>         u64 tx_65_127;
>         u64 tx_128_255;
>         u64 tx_256_511;
>         u64 tx_512_1023;
>         u64 tx_1024_1518;
>         u64 tx_1519_max;
>
>         u64 rx_untagged;
>         u64 rx_vlan;
>         u64 rx_64_size;
>         u64 rx_65_127;
>         u64 rx_128_255;
>         u64 rx_256_511;
>         u64 rx_512_1023;
>         u64 rx_1024_1518;
>         u64 rx_1519_max;
>
>         u64 rx_runt;
>         u64 rx_oversize;
> };
>
> I have started working on porting hfi_vnic as per this new interface.
> I will post RFC v3 later.
> Posting the interface definition early for comments.

I wonder how many people will comment on it without seeing a usage example.

>
> Thanks,
> Niranjana
>
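The layering described in the quoted proposal (the bottom driver keeps struct rdma_netdev first in its private area, the client reaches its own data through clnt_priv, and the upper driver wraps the bottom driver's ndo_open()) can be sketched in user-space C. This is only an illustration under stated assumptions: the mock struct net_device, the mock netdev_priv(), struct vnic_client, bottom_open() and upper_open() are hypothetical stand-ins, not kernel or hfi1 code, and only a subset of the proposed rdma_netdev fields is mirrored.

```c
#include <stddef.h>

/* User-space stand-ins for kernel types; layouts are illustrative only. */
struct net_device;

struct net_device_ops {
	int (*ndo_open)(struct net_device *dev);
};

struct net_device {
	const struct net_device_ops *netdev_ops;
	void *priv;		/* mock: the kernel stores priv right after the struct */
};

/* Mock of netdev_priv(): returns the driver private area. */
static void *netdev_priv(const struct net_device *dev)
{
	return dev->priv;
}

/* Subset of the proposed struct rdma_netdev (fields from the posting). */
struct rdma_netdev {
	struct net_device *netdev;
	unsigned char port_num;
	void *clnt_priv;	/* client private data structure */
};

/* Bottom driver private data: rdma_netdev must stay first. */
struct hfi1_rdma_netdev {
	struct rdma_netdev rn;
	int hw_opened;		/* hypothetical bottom-driver state */
};

/* Client accessor, shaped like the proposed hfi_vnic_priv(). */
static void *hfi_vnic_priv(const struct net_device *dev)
{
	struct rdma_netdev *rn = netdev_priv(dev);

	return rn->clnt_priv;
}

/* Hypothetical upper-driver (client) state. */
struct vnic_client {
	int open_count;
	const struct net_device_ops *bottom_ops;
};

/* Hypothetical bottom-driver open: touches only its own private data. */
static int bottom_open(struct net_device *dev)
{
	struct hfi1_rdma_netdev *priv = netdev_priv(dev);

	priv->hw_opened = 1;
	return 0;
}

/* Hypothetical upper-driver open: calls the bottom open, then does bookkeeping. */
static int upper_open(struct net_device *dev)
{
	struct vnic_client *clnt = hfi_vnic_priv(dev);
	int ret = clnt->bottom_ops->ndo_open(dev);

	if (!ret)
		clnt->open_count++;
	return ret;
}
```

Because rn is the first member, the same private pointer can be read either as struct hfi1_rdma_netdev (bottom driver) or as struct rdma_netdev (client accessor), which is presumably why the posting asks to "keep this first".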
RE: [PATCH v2] net: fec: Fixed panic problem with non-tso
From: Eric Dumazet Sent: Wednesday, January 18, 2017 1:02 PM
>To: Yuusuke Ashiduka
>Cc: Andy Duan; netdev@vger.kernel.org
>Subject: Re: [PATCH v2] net: fec: Fixed panic problem with non-tso
>
>On Wed, 2017-01-18 at 13:11 +0900, Yuusuke Ashiduka wrote:
>> If highmem and 2GB or more of memory are valid, "this_frag->page.p"
>> indicates the highmem area, so the result of page_address() is NULL
>> and panic occurs.
>>
>> This commit fixes this by using the skb_frag_dma_map() helper, which
>> takes care of mapping the skb fragment properly. Additionally, the
>> type of mapping is now tracked, so it can be unmapped using
>> dma_unmap_page or dma_unmap_single when appropriate.
>
>I would prefer we fix the root cause, instead of tweaking all legacy
>drivers out there :/

I agree with you. The driver has never supported highmem, so fragments should not be allocated from highmem unless there is a bug in the common code. If the driver is required to support the NETIF_F_HIGHDMA feature, we would also need to add highmem support to the TSO path.

Andy
Re: [PATCH v2] net: fec: Fixed panic problem with non-tso
On Wed, 2017-01-18 at 13:11 +0900, Yuusuke Ashiduka wrote:
> If highmem and 2GB or more of memory are valid,
> "this_frag->page.p" indicates the highmem area,
> so the result of page_address() is NULL and panic occurs.
>
> This commit fixes this by using the skb_frag_dma_map() helper,
> which takes care of mapping the skb fragment properly. Additionally,
> the type of mapping is now tracked, so it can be unmapped using
> dma_unmap_page or dma_unmap_single when appropriate.

I would prefer we fix the root cause, instead of tweaking all legacy drivers out there :/
[PATCH net-next] mlx4: support __GFP_MEMALLOC for rx
From: Eric Dumazet

Commit 04aeb56a1732 ("net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC") added code that appears to be not needed at that time, since mlx4 never used __GFP_MEMALLOC allocations anyway.

As using memory reserves is a must in some situations (swap over NFS or iSCSI), this patch adds this flag.

Note that this driver does not reuse pages (yet) so we do not have to add anything else.

Signed-off-by: Eric Dumazet
Cc: Konstantin Khlebnikov
Cc: Tariq Toukan
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index eac527e25ec902c2a586e9952272b9e8e599e2c8..e362f99334d03c0df4d88320977670015870dd9c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -706,7 +706,8 @@ static bool mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
 	do {
 		if (mlx4_en_prepare_rx_desc(priv, ring,
 					    ring->prod & ring->size_mask,
-					    GFP_ATOMIC | __GFP_COLD))
+					    GFP_ATOMIC | __GFP_COLD |
+					    __GFP_MEMALLOC))
 			break;
 		ring->prod++;
 	} while (--missing);
Re: [PATCH] net: fec: Fixed panic problem with non-tso
On Tue, 2017-01-17 at 20:21 -0800, Eric Dumazet wrote: > On Wed, 2017-01-18 at 03:12 +, Ashizuka, Yuusuke wrote: > > > indeed. > > > > In the case of TSO with i.MX6 system (highmem enabled) with 2GB memory, > > "this_frag->page.p" did not become highmem area. > > (We confirmed by transferring about 100MB of files) > > > > However, in the case of non-tso on an i.MX6 system with 2GB of memory, > > "this_frag->page.p" may become a highmem area. > > (Occurred with approximately 2MB of file transfer) > > > > For non-tso only, I do not know the reason why "this_frag-> page.p" > > in this driver shows highmem area. > > This worries me, since this driver does not set NETIF_F_HIGHDMA in its > features. > > No packet should be given to this driver with a highmem fragment > > Check is done in illegal_highdma() in net/core/dev.c This used to work. I suspect commit ec5f061564238892005257c83565a0b58ec79295 ("net: Kill link between CSUM and SG features.") added this bug. Can you try this hot fix : diff --git a/net/core/dev.c b/net/core/dev.c index ad5959e561166f445bdd9d7260652a338f74cfea..073b832b945257dba9ed47f4bf875605225effc9 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2773,9 +2773,9 @@ static netdev_features_t harmonize_features(struct sk_buff *skb, if (skb->ip_summed != CHECKSUM_NONE && !can_checksum_protocol(features, type)) { features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK); - } else if (illegal_highdma(skb->dev, skb)) { - features &= ~NETIF_F_SG; } + if (illegal_highdma(skb->dev, skb)) + features &= ~NETIF_F_SG; return features; }
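To make the control-flow change in the hot fix easier to see, here is a small user-space model of the two variants of harmonize_features(). It is a sketch under assumptions: struct pkt, the F_* flag values and the function names harmonize_old()/harmonize_new() are hypothetical stand-ins for the skb state, the NETIF_F_* bits and the kernel function; only the if/else-if versus two independent ifs is taken from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; the real NETIF_F_* bits differ. */
#define F_SG		(1u << 0)
#define F_CSUM_MASK	(1u << 1)
#define F_GSO_MASK	(1u << 2)

/* Hypothetical summary of the skb properties the function looks at. */
struct pkt {
	bool needs_csum;	/* models skb->ip_summed != CHECKSUM_NONE */
	bool csum_supported;	/* models can_checksum_protocol() */
	bool highmem_frag;	/* models illegal_highdma() returning true */
};

/* Before the hot fix: the highmem check is skipped whenever the
 * checksum branch is taken. */
static uint32_t harmonize_old(const struct pkt *p, uint32_t features)
{
	if (p->needs_csum && !p->csum_supported)
		features &= ~(F_CSUM_MASK | F_GSO_MASK);
	else if (p->highmem_frag)
		features &= ~F_SG;
	return features;
}

/* After the hot fix: the highmem check always runs. */
static uint32_t harmonize_new(const struct pkt *p, uint32_t features)
{
	if (p->needs_csum && !p->csum_supported)
		features &= ~(F_CSUM_MASK | F_GSO_MASK);
	if (p->highmem_frag)
		features &= ~F_SG;
	return features;
}
```

In this model, a packet that both fails the checksum capability test and carries a highmem fragment keeps F_SG in the old variant and loses it in the new one, which would match the symptom of a highmem fragment reaching a driver that does not advertise NETIF_F_HIGHDMA.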
Re: [PATCH] net: fec: Fixed panic problem with non-tso
On Wed, 2017-01-18 at 03:12 +, Ashizuka, Yuusuke wrote:
> indeed.
>
> In the case of TSO with i.MX6 system (highmem enabled) with 2GB memory,
> "this_frag->page.p" did not become highmem area.
> (We confirmed by transferring about 100MB of files)
>
> However, in the case of non-tso on an i.MX6 system with 2GB of memory,
> "this_frag->page.p" may become a highmem area.
> (Occurred with approximately 2MB of file transfer)
>
> For non-tso only, I do not know the reason why "this_frag->page.p"
> in this driver shows highmem area.

This worries me, since this driver does not set NETIF_F_HIGHDMA in its features.

No packet with a highmem fragment should be given to this driver.

The check is done in illegal_highdma() in net/core/dev.c
[PATCH v2] net: fec: Fixed panic problem with non-tso
If highmem and 2GB or more of memory are valid, "this_frag-> page.p" indicates the highmem area, so the result of page_address() is NULL and panic occurs. This commit fixes this by using the skb_frag_dma_map() helper, which takes care of mapping the skb fragment properly. Additionally, the type of mapping is now tracked, so it can be unmapped using dma_unmap_page or dma_unmap_single when appropriate. Signed-off-by: Yuusuke Ashiduka --- Changes for v2: - Added signed-off --- drivers/net/ethernet/freescale/fec.h | 1 + drivers/net/ethernet/freescale/fec_main.c | 48 +++ 2 files changed, 37 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h index 5ea740b4cf14..5b187e8aacf0 100644 --- a/drivers/net/ethernet/freescale/fec.h +++ b/drivers/net/ethernet/freescale/fec.h @@ -463,6 +463,7 @@ struct bufdesc_prop { struct fec_enet_priv_tx_q { struct bufdesc_prop bd; unsigned char *tx_bounce[TX_RING_SIZE]; + int tx_page_mapping[TX_RING_SIZE]; struct sk_buff *tx_skbuff[TX_RING_SIZE]; unsigned short tx_stop_threshold; diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 38160c2bebcb..b1562107e337 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -60,6 +60,7 @@ #include #include #include +#include #include #include @@ -377,20 +378,28 @@ fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, ebdp->cbd_esc = cpu_to_fec32(estatus); } - bufaddr = page_address(this_frag->page.p) + this_frag->page_offset; - index = fec_enet_get_bd_index(bdp, &txq->bd); - if (((unsigned long) bufaddr) & fep->tx_align || + txq->tx_page_mapping[index] = 0; + if (this_frag->page_offset & fep->tx_align || fep->quirks & FEC_QUIRK_SWAP_FRAME) { + bufaddr = kmap_atomic(this_frag->page.p) + + this_frag->page_offset; memcpy(txq->tx_bounce[index], bufaddr, frag_len); + kunmap_atomic(bufaddr); bufaddr = txq->tx_bounce[index]; if 
(fep->quirks & FEC_QUIRK_SWAP_FRAME) swap_buffer(bufaddr, frag_len); + addr = dma_map_single(&fep->pdev->dev, + bufaddr, + frag_len, + DMA_TO_DEVICE); + } else { + txq->tx_page_mapping[index] = 1; + addr = skb_frag_dma_map(&fep->pdev->dev, this_frag, 0, + frag_len, DMA_TO_DEVICE); } - addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len, - DMA_TO_DEVICE); if (dma_mapping_error(&fep->pdev->dev, addr)) { if (net_ratelimit()) netdev_err(ndev, "Tx DMA memory map failed\n"); @@ -411,8 +420,16 @@ fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, bdp = txq->bd.cur; for (i = 0; i < frag; i++) { bdp = fec_enet_get_nextdesc(bdp, &txq->bd); - dma_unmap_single(&fep->pdev->dev, fec32_to_cpu(bdp->cbd_bufaddr), -fec16_to_cpu(bdp->cbd_datlen), DMA_TO_DEVICE); + if (txq->tx_page_mapping[index]) + dma_unmap_page(&fep->pdev->dev, + fec32_to_cpu(bdp->cbd_bufaddr), + fec16_to_cpu(bdp->cbd_datlen), + DMA_TO_DEVICE); + else + dma_unmap_single(&fep->pdev->dev, +fec32_to_cpu(bdp->cbd_bufaddr), +fec16_to_cpu(bdp->cbd_datlen), +DMA_TO_DEVICE); } return ERR_PTR(-ENOMEM); } @@ -1201,11 +1218,18 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) skb = txq->tx_skbuff[index]; txq->tx_skbuff[index] = NULL; - if (!IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr))) - dma_unmap_single(&fep->pdev->dev, -fec32_to_cpu(bdp->cbd_bufaddr), -fec16_to_cpu(bdp->cbd_datlen), -DMA_TO_DEVICE); + if (!IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr))) { + if (txq->tx_page_mapping[index]) + dma_unmap_page(&fep->pdev->
Re: [PATCH] virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit
On 2017-01-18 02:27, Michael S. Tsirkin wrote:
> On Tue, Jan 17, 2017 at 06:13:51PM +, Rolf Neugebauer wrote:
>> This patch part reverts fd2a0437dc33 and e858fae2b0b8 which introduced a
>> subtle change in how the virtio_net flags are derived from the SKBs
>> ip_summed field.
>>
>> With the above commits, the flags are set to VIRTIO_NET_HDR_F_DATA_VALID
>> when ip_summed == CHECKSUM_UNNECESSARY, thus treating it differently to
>> ip_summed == CHECKSUM_NONE, which should be the same.
>>
>> Further, the virtio spec 1.0 / CS04 explicitly says that
>> VIRTIO_NET_HDR_F_DATA_VALID must not be set by the driver.
>>
>> Signed-off-by: Rolf Neugebauer
>> Fixes: fd2a0437dc33 ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
>> Fixes: e858fae2b0b8 (" virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
>
> Acked-by: Michael S. Tsirkin
>
> Should be backported into stable as well.

Looks like a side effect is that we will never see this on receive path? We probably need a hint for virtio_net_hdr_from_skb().

Thanks

>> ---
>>  include/linux/virtio_net.h | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
>> index 66204007d7ac..56436472ccc7 100644
>> --- a/include/linux/virtio_net.h
>> +++ b/include/linux/virtio_net.h
>> @@ -91,8 +91,6 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
>>  				skb_checksum_start_offset(skb));
>>  		hdr->csum_offset = __cpu_to_virtio16(little_endian,
>>  				skb->csum_offset);
>> -	} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
>> -		hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
>>  	} /* else everything is zero */
>>
>>  	return 0;
>> --
>> 2.11.0
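As an illustration of the behaviour the patch restores, the flag derivation can be modelled in a few lines of user-space C. This is a sketch, not the kernel helper: the enum csum_state and hdr_flags_from_ip_summed() are hypothetical names standing in for skb->ip_summed and the flag logic of virtio_net_hdr_from_skb(), and the sketch ignores the csum_start/csum_offset fields.

```c
#include <stdint.h>

/* Header flag bits as defined by the virtio network device header. */
#define VIRTIO_NET_HDR_F_NEEDS_CSUM	1
#define VIRTIO_NET_HDR_F_DATA_VALID	2

/* Hypothetical stand-in for the skb->ip_summed states. */
enum csum_state {
	CSUM_NONE,
	CSUM_UNNECESSARY,
	CSUM_COMPLETE,
	CSUM_PARTIAL,
};

/* Transmit-side flag derivation after the patch: only a partial
 * checksum sets a flag; every other state leaves the flags at zero,
 * so DATA_VALID is never produced here. */
static uint8_t hdr_flags_from_ip_summed(enum csum_state ip_summed)
{
	if (ip_summed == CSUM_PARTIAL)
		return VIRTIO_NET_HDR_F_NEEDS_CSUM;
	return 0;
}
```

With this shape, CSUM_UNNECESSARY and CSUM_NONE are indistinguishable in the resulting header, which is what the commit message asks for. It also shows the side effect raised in the reply: a caller that wants DATA_VALID would need some extra hint, since the function can no longer produce it.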
Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
On 2017-01-18 06:22, John Fastabend wrote:
> +static int virtnet_reset(struct virtnet_info *vi)
> +{
> +	struct virtio_device *dev = vi->vdev;
> +	int ret;
> +
> +	virtio_config_disable(dev);
> +	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> +	virtnet_freeze_down(dev);
> +	_remove_vq_common(vi);
> +
> +	dev->config->reset(dev);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> +
> +	ret = virtio_finalize_features(dev);
> +	if (ret)
> +		goto err;
> +
> +	ret = virtnet_restore_up(dev);
> +	if (ret)
> +		goto err;
> +	ret = _virtnet_set_queues(vi, vi->curr_queue_pairs);
> +	if (ret)
> +		goto err;
> +
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +	virtio_config_enable(dev);
> +	return 0;
> +err:
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +	return ret;
> +}
> +

Hi John:

I would still prefer not to open-code (part of) virtio_device_freeze() and virtio_device_restore() here. How about:

1) introduce __virtio_device_freeze/__virtio_device_restore, which accept a function pointer for freeze/restore
2) for virtio_device_freeze/virtio_device_restore, just pass drv->freeze/drv->restore (the locked versions)
3) for virtnet_reset(), pass the unlocked versions of freeze and restore

This is just my preference; if both Michael and you prefer to stick with the current approach, I'm also fine.

Thanks
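The refactoring suggested in points 1) to 3) could look roughly like the following user-space sketch. Everything here is an assumption made for illustration: struct vdev, struct vdrv, device_freeze_common() and the two freeze callbacks are hypothetical names, and the real virtio core does more than this model (status handling, config work, and so on). The sketch only shows the shape: one common helper that takes the callback, and two thin callers that pick the locked or unlocked variant.

```c
#include <stddef.h>

struct vdev;

/* Callback type for the driver-specific part of freezing a device. */
typedef int (*freeze_fn)(struct vdev *dev);

/* Hypothetical driver with a regular (locked) freeze hook. */
struct vdrv {
	freeze_fn freeze;
};

/* Hypothetical device state, reduced to what the sketch needs. */
struct vdev {
	const struct vdrv *drv;
	int config_enabled;
	int last_freeze;	/* records which variant ran: 1 locked, 2 unlocked */
};

/* Common helper (the "__virtio_device_freeze" of the suggestion):
 * does the shared work once, then runs whichever callback it is given. */
static int device_freeze_common(struct vdev *dev, freeze_fn fn)
{
	dev->config_enabled = 0;	/* models virtio_config_disable() */
	return fn ? fn(dev) : 0;
}

/* Regular entry point: uses the driver's own (locked) callback. */
static int device_freeze(struct vdev *dev)
{
	return device_freeze_common(dev, dev->drv->freeze);
}

/* Hypothetical locked variant: would take the lock itself in real code. */
static int net_freeze_locked(struct vdev *dev)
{
	dev->last_freeze = 1;
	return 0;
}

/* Hypothetical unlocked variant for callers that already hold the lock. */
static int net_freeze_unlocked(struct vdev *dev)
{
	dev->last_freeze = 2;
	return 0;
}

/* Reset path: reuses the common helper with the unlocked callback. */
static int net_reset(struct vdev *dev)
{
	return device_freeze_common(dev, net_freeze_unlocked);
}
```

The design point is that the reset path no longer duplicates the body of the freeze helper; it only chooses a different callback.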
RE: [PATCH] net: fec: Fixed panic problem with non-tso
> -Original Message- > From: Andy Duan [mailto:fugang.d...@nxp.com] > Sent: Tuesday, January 17, 2017 8:02 PM > To: Ashizuka, Yuusuke/芦塚 雄介 > Cc: netdev@vger.kernel.org > Subject: RE: [PATCH] net: fec: Fixed panic problem with non-tso > > From: Yuusuke Ashiduka Sent: Tuesday, January > 17, 2017 3:48 PM > >To: Andy Duan > >Cc: netdev@vger.kernel.org; Yuusuke Ashiduka > >Subject: [PATCH] net: fec: Fixed panic problem with non-tso > > > >If highmem and 2GB or more of memory are valid, "this_frag-> page.p" > >indicates the highmem area, so the result of page_address() is NULL and > >panic occurs. > > > >This commit fixes this by using the skb_frag_dma_map() helper, which > >takes care of mapping the skb fragment properly. Additionally, the type > >of mapping is now tracked, so it can be unmapped using dma_unmap_page > >or dma_unmap_single when appropriate. > >--- > > drivers/net/ethernet/freescale/fec.h | 1 + > > drivers/net/ethernet/freescale/fec_main.c | 48 > >+++ > > 2 files changed, 37 insertions(+), 12 deletions(-) > > > The patch itself seems fine. > The driver doesn't support skb from highmem, if to support highmem, it should > add frag_skb (highmem) support for tso and non-tso. > In driver net/core/tso.c, it also add highmem support, right ? indeed. In the case of TSO with i.MX6 system (highmem enabled) with 2GB memory, "this_frag->page.p" did not become highmem area. (We confirmed by transferring about 100MB of files) However, in the case of non-tso on an i.MX6 system with 2GB of memory, "this_frag->page.p" may become a highmem area. (Occurred with approximately 2MB of file transfer) For non-tso only, I do not know the reason why "this_frag-> page.p" in this driver shows highmem area. Thanks. > > Thanks. 
> > >diff --git a/drivers/net/ethernet/freescale/fec.h > >b/drivers/net/ethernet/freescale/fec.h > >index 5ea740b4cf14..5b187e8aacf0 100644 > >--- a/drivers/net/ethernet/freescale/fec.h > >+++ b/drivers/net/ethernet/freescale/fec.h > >@@ -463,6 +463,7 @@ struct bufdesc_prop { struct fec_enet_priv_tx_q { > > struct bufdesc_prop bd; > > unsigned char *tx_bounce[TX_RING_SIZE]; > >+int tx_page_mapping[TX_RING_SIZE]; > > struct sk_buff *tx_skbuff[TX_RING_SIZE]; > > > > unsigned short tx_stop_threshold; > >diff --git a/drivers/net/ethernet/freescale/fec_main.c > >b/drivers/net/ethernet/freescale/fec_main.c > >index 38160c2bebcb..b1562107e337 100644 > >--- a/drivers/net/ethernet/freescale/fec_main.c > >+++ b/drivers/net/ethernet/freescale/fec_main.c > >@@ -60,6 +60,7 @@ > > #include > > #include > > #include > >+#include > > #include > > > > #include > >@@ -377,20 +378,28 @@ fec_enet_txq_submit_frag_skb(struct > >fec_enet_priv_tx_q *txq, > > ebdp->cbd_esc = cpu_to_fec32(estatus); > > } > > > >-bufaddr = page_address(this_frag->page.p) + this_frag- > >>page_offset; > >- > > index = fec_enet_get_bd_index(bdp, &txq->bd); > >-if (((unsigned long) bufaddr) & fep->tx_align || > >+txq->tx_page_mapping[index] = 0; > >+if (this_frag->page_offset & fep->tx_align || > > fep->quirks & FEC_QUIRK_SWAP_FRAME) { > >+bufaddr = kmap_atomic(this_frag->page.p) + > >+ > this_frag->page_offset; > > memcpy(txq->tx_bounce[index], bufaddr, > frag_len); > >+kunmap_atomic(bufaddr); > > bufaddr = txq->tx_bounce[index]; > > > > if (fep->quirks & FEC_QUIRK_SWAP_FRAME) > > swap_buffer(bufaddr, frag_len); > >+addr = dma_map_single(&fep->pdev->dev, > >+ bufaddr, > >+ frag_len, > >+ DMA_TO_DEVICE); > >+} else { > >+txq->tx_page_mapping[index] = 1; > >+addr = skb_frag_dma_map(&fep->pdev->dev, > >this_frag, 0, > >+frag_len, > DMA_TO_DEVICE); > > } > > > >-addr = dma_map_single(&fep->pdev->dev, bufaddr, > frag_len, > >- DMA_TO_DEVICE); > > if (dma_mapping_error(&fep->pdev->dev, addr)) { > > if 
(net_ratelimit()) > > netdev_err(ndev, "Tx DMA memory map > failed\n"); @@ -411,8 +420,16 > >@@ fec_enet_txq_submit_frag_skb(struct > >fec_enet_priv_tx_q *txq, > > bdp = txq->bd.cur; > > for (i = 0; i < frag; i++) { > > bdp = fec_enet_get_nextdesc(bdp, &txq->bd); > >-dma_unmap_single(&fep->pdev->dev, fec32_to_cpu(bdp- > >>cbd_bufaddr), > >- fec16_to_cpu(bdp->cbd_datlen), > >DMA_TO_DEVICE); > >+if (txq->tx_page_mapping[index]) > >+dma_unmap_page(&fep->pdev->dev, > >+
RE: [PATCH] net: fec: Fixed panic problem with non-tso
> -Original Message- > From: David Miller [mailto:da...@davemloft.net] > Sent: Wednesday, January 18, 2017 5:45 AM > To: Ashizuka, Yuusuke/芦塚 雄介 > Cc: fugang.d...@nxp.com; netdev@vger.kernel.org > Subject: Re: [PATCH] net: fec: Fixed panic problem with non-tso > > From: Yuusuke Ashiduka > Date: Tue, 17 Jan 2017 16:48:20 +0900 > > > If highmem and 2GB or more of memory are valid, "this_frag-> page.p" > > indicates the highmem area, so the result of page_address() is NULL > > and panic occurs. > > > > This commit fixes this by using the skb_frag_dma_map() helper, which > > takes care of mapping the skb fragment properly. Additionally, the > > type of mapping is now tracked, so it can be unmapped using > > dma_unmap_page or dma_unmap_single when appropriate. > > This patch submission is lacking a proper signoff. Thank you for pointing out my mistake. I will submit the patch again.
[PATCH net] bnxt_en: Fix "uninitialized variable" bug in TPA code path.
In the TPA GRO code path, initialize the tcp_opt_len variable to 0 so that it will be correct for packets without TCP timestamps. The bug caused the SKB fields to be incorrectly set up for packets without TCP timestamps, leading to these packets being rejected by the stack.

Reported-by: Andy Gospodarek
Acked-by: Andy Gospodarek
Signed-off-by: Michael Chan
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 9608cb4..53e686f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1099,7 +1099,7 @@ static struct sk_buff *bnxt_gro_func_5730x(struct bnxt_tpa_info *tpa_info,
 {
 #ifdef CONFIG_INET
 	struct tcphdr *th;
-	int len, nw_off, tcp_opt_len;
+	int len, nw_off, tcp_opt_len = 0;
 
 	if (tcp_ts)
 		tcp_opt_len = 12;
-- 
1.8.3.1
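The class of bug fixed here is easy to reproduce in isolation. The sketch below is an assumption-laden model, not the driver code: tcp_hdr_total_len() and the TCP_HDR_LEN constant are hypothetical, and how bnxt actually consumes tcp_opt_len is not shown in the posting. Only the pattern "variable assigned on one branch, read on both" and the value 12 come from the patch.

```c
#include <stdbool.h>

#define TCP_HDR_LEN	20	/* base TCP header without options */
#define TCP_TS_OPT_LEN	12	/* timestamp option length used in the patch */

/* Hypothetical helper: total TCP header length for a packet with or
 * without the timestamp option. */
static int tcp_hdr_total_len(bool tcp_ts)
{
	/* Without "= 0" the !tcp_ts path would read an indeterminate value. */
	int tcp_opt_len = 0;

	if (tcp_ts)
		tcp_opt_len = TCP_TS_OPT_LEN;

	return TCP_HDR_LEN + tcp_opt_len;
}
```

With the initializer in place, a packet without timestamps yields the plain 20-byte header length in this model, and a packet with timestamps yields 32.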
Re: [PATCH] net: ethernet: stmmac: add ARP management
Hi Christophe, [auto build test WARNING on net-next/master] [also build test WARNING on next-20170117] [cannot apply to v4.10-rc4] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Christophe-Roullier/net-ethernet-stmmac-add-ARP-management/20170118-084026 config: x86_64-kexec (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_dvr_probe': >> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3296:11: warning: passing >> argument 3 of 'priv->hw->dma->set_arp_addr' makes integer from pointer >> without a cast [-Wint-conversion] priv->dev->dev_addr); ^~~~ drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3296:11: note: expected 'u32 {aka unsigned int}' but argument is of type 'unsigned char *' vim +3296 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 3280 NETIF_F_RXCSUM; 3281 3282 if ((priv->plat->tso_en) && (priv->dma_cap.tsoen)) { 3283 ndev->hw_features |= NETIF_F_TSO; 3284 priv->tso = true; 3285 dev_info(priv->device, "TSO feature enabled\n"); 3286 } 3287 3288 if ((priv->plat->arp_en) && (priv->dma_cap.arpoffsel)) { 3289 ret = priv->hw->mac->arp_en(priv->hw); 3290 if (!ret) { 3291 pr_warn(" ARP feature disabled\n"); 3292 } else { 3293 pr_info(" ARP feature enabled\n"); 3294 /* Copy MAC addr into MAC_ARP_ADDRESS register*/ 3295 priv->hw->dma->set_arp_addr(priv->ioaddr, 1, > 3296 > priv->dev->dev_addr); 3297 } 3298 } 3299 3300 ndev->features |= ndev->hw_features | NETIF_F_HIGHDMA; 3301 ndev->watchdog_timeo = msecs_to_jiffies(watchdog); 3302 #ifdef STMMAC_VLAN_TAG_USED 3303 /* Both mac100 and gmac support receive VLAN tag detection */ 3304 ndev->features |= NETIF_F_HW_VLAN_CTAG_RX; --- 0-DAY kernel test infrastructureOpen Source Technology Center 
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
[PATCH net-next 2/2] net: dsa: use cpu_switch instead of ds[0]
Now that the DSA Ethernet switches are true Linux devices, the CPU switch is not necessarily the first one. If its address is higher than the second switch on the same MDIO bus, its index will be 1, not 0. Avoid any confusion by using dst->cpu_switch instead of dst->ds[0]. Signed-off-by: Vivien Didelot --- net/dsa/dsa.c | 2 +- net/dsa/dsa2.c| 8 net/dsa/slave.c | 6 +++--- net/dsa/tag_brcm.c| 2 +- net/dsa/tag_qca.c | 2 +- net/dsa/tag_trailer.c | 2 +- 6 files changed, 11 insertions(+), 11 deletions(-) diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index cb42655ba7da..87f2a9c9fa12 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -868,7 +868,7 @@ static void dsa_remove_dst(struct dsa_switch_tree *dst) dsa_switch_destroy(ds); } - dsa_cpu_port_ethtool_restore(dst->ds[0]); + dsa_cpu_port_ethtool_restore(dst->cpu_switch); dev_put(dst->master_netdev); } diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index a9bf28d9f41f..634c6700a179 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -381,8 +381,8 @@ static int dsa_dst_apply(struct dsa_switch_tree *dst) return err; } - if (dst->ds[0]) { - err = dsa_cpu_port_ethtool_setup(dst->ds[0]); + if (dst->cpu_switch) { + err = dsa_cpu_port_ethtool_setup(dst->cpu_switch); if (err) return err; } @@ -426,8 +426,8 @@ static void dsa_dst_unapply(struct dsa_switch_tree *dst) dsa_ds_unapply(dst, ds); } - if (dst->ds[0]) - dsa_cpu_port_ethtool_restore(dst->ds[0]); + if (dst->cpu_switch) + dsa_cpu_port_ethtool_restore(dst->cpu_switch); pr_info("DSA: tree %d unapplied\n", dst->tree); dst->applied = false; diff --git a/net/dsa/slave.c b/net/dsa/slave.c index 0cdcaf526987..b8e58689a9a1 100644 --- a/net/dsa/slave.c +++ b/net/dsa/slave.c @@ -781,7 +781,7 @@ static void dsa_cpu_port_get_ethtool_stats(struct net_device *dev, uint64_t *data) { struct dsa_switch_tree *dst = dev->dsa_ptr; - struct dsa_switch *ds = dst->ds[0]; + struct dsa_switch *ds = dst->cpu_switch; s8 cpu_port = dst->cpu_port; int count = 0; @@ -798,7 +798,7 @@ static void 
dsa_cpu_port_get_ethtool_stats(struct net_device *dev, static int dsa_cpu_port_get_sset_count(struct net_device *dev, int sset) { struct dsa_switch_tree *dst = dev->dsa_ptr; - struct dsa_switch *ds = dst->ds[0]; + struct dsa_switch *ds = dst->cpu_switch; int count = 0; if (dst->master_ethtool_ops.get_sset_count) @@ -814,7 +814,7 @@ static void dsa_cpu_port_get_strings(struct net_device *dev, uint32_t stringset, uint8_t *data) { struct dsa_switch_tree *dst = dev->dsa_ptr; - struct dsa_switch *ds = dst->ds[0]; + struct dsa_switch *ds = dst->cpu_switch; s8 cpu_port = dst->cpu_port; int len = ETH_GSTRING_LEN; int mcount = 0, count; diff --git a/net/dsa/tag_brcm.c b/net/dsa/tag_brcm.c index 21bffde6e4bf..af82927674e0 100644 --- a/net/dsa/tag_brcm.c +++ b/net/dsa/tag_brcm.c @@ -102,7 +102,7 @@ static int brcm_tag_rcv(struct sk_buff *skb, struct net_device *dev, if (unlikely(dst == NULL)) goto out_drop; - ds = dst->ds[0]; + ds = dst->cpu_switch; skb = skb_unshare(skb, GFP_ATOMIC); if (skb == NULL) diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c index 0c90cacee7aa..736ca8e8c31e 100644 --- a/net/dsa/tag_qca.c +++ b/net/dsa/tag_qca.c @@ -104,7 +104,7 @@ static int qca_tag_rcv(struct sk_buff *skb, struct net_device *dev, /* This protocol doesn't support cascading multiple switches so it's * safe to assume the switch is first in the tree */ - ds = dst->ds[0]; + ds = dst->cpu_switch; if (!ds) goto out_drop; diff --git a/net/dsa/tag_trailer.c b/net/dsa/tag_trailer.c index 5e3903eb1afa..271128a2dc64 100644 --- a/net/dsa/tag_trailer.c +++ b/net/dsa/tag_trailer.c @@ -67,7 +67,7 @@ static int trailer_rcv(struct sk_buff *skb, struct net_device *dev, if (unlikely(dst == NULL)) goto out_drop; - ds = dst->ds[0]; + ds = dst->cpu_switch; skb = skb_unshare(skb, GFP_ATOMIC); if (skb == NULL) -- 2.11.0
[PATCH net-next 1/2] net: dsa: store CPU switch structure in the tree
Store a dsa_switch pointer to the CPU switch in the tree instead of only its index. This avoids the need to initialize it to -1. Signed-off-by: Vivien Didelot --- include/net/dsa.h | 8 net/dsa/dsa.c | 7 +++ net/dsa/dsa2.c| 5 ++--- 3 files changed, 9 insertions(+), 11 deletions(-) diff --git a/include/net/dsa.h b/include/net/dsa.h index 454667952d6d..82f7019f27f2 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -124,7 +124,7 @@ struct dsa_switch_tree { /* * The switch and port to which the CPU is attached. */ - s8 cpu_switch; + struct dsa_switch *cpu_switch; s8 cpu_port; /* @@ -211,7 +211,7 @@ struct dsa_switch { static inline bool dsa_is_cpu_port(struct dsa_switch *ds, int p) { - return !!(ds->index == ds->dst->cpu_switch && p == ds->dst->cpu_port); + return !!(ds == ds->dst->cpu_switch && p == ds->dst->cpu_port); } static inline bool dsa_is_dsa_port(struct dsa_switch *ds, int p) @@ -234,10 +234,10 @@ static inline u8 dsa_upstream_port(struct dsa_switch *ds) * Else return the (DSA) port number that connects to the * switch that is one hop closer to the cpu. */ - if (dst->cpu_switch == ds->index) + if (dst->cpu_switch == ds) return dst->cpu_port; else - return ds->rtable[dst->cpu_switch]; + return ds->rtable[dst->cpu_switch->index]; } struct switchdev_trans; diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 96d1544df518..cb42655ba7da 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -225,12 +225,12 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) continue; if (!strcmp(name, "cpu")) { - if (dst->cpu_switch != -1) { + if (!dst->cpu_switch) { netdev_err(dst->master_netdev, "multiple cpu ports?!\n"); return -EINVAL; } - dst->cpu_switch = index; + dst->cpu_switch = ds; dst->cpu_port = i; ds->cpu_port_mask |= 1 << i; } else if (!strcmp(name, "dsa")) { @@ -254,7 +254,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) * tagging protocol to the preferred tagging format of this * switch. 
*/ - if (dst->cpu_switch == index) { + if (dst->cpu_switch == ds) { enum dsa_tag_protocol tag_protocol; tag_protocol = ops->get_tag_protocol(ds); @@ -757,7 +757,6 @@ static int dsa_setup_dst(struct dsa_switch_tree *dst, struct net_device *dev, dst->pd = pd; dst->master_netdev = dev; - dst->cpu_switch = -1; dst->cpu_port = -1; for (i = 0; i < pd->nr_chips; i++) { diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index a1f26fc0f585..a9bf28d9f41f 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -57,7 +57,6 @@ static struct dsa_switch_tree *dsa_add_dst(u32 tree) if (!dst) return NULL; dst->tree = tree; - dst->cpu_switch = -1; INIT_LIST_HEAD(&dst->list); list_add_tail(&dsa_switch_trees, &dst->list); kref_init(&dst->refcount); @@ -456,8 +455,8 @@ static int dsa_cpu_parse(struct device_node *port, u32 index, if (!dst->master_netdev) dst->master_netdev = ethernet_dev; - if (dst->cpu_switch == -1) { - dst->cpu_switch = ds->index; + if (!dst->cpu_switch) { + dst->cpu_switch = ds; dst->cpu_port = index; } -- 2.11.0
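To illustrate what storing a pointer instead of an index means for the helpers touched by this patch, here is a user-space model. It is a sketch under assumptions: the structs are reduced to the fields the helpers use, the rtable size of 4 is arbitrary, and tree_set_cpu() is a hypothetical helper that does not exist in net/dsa. Note that the sketch rejects a second CPU port when the pointer is already set, which is the NULL-pointer analogue of the old "!= -1" test.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

struct dsa_switch_tree;

/* Reduced model of struct dsa_switch. */
struct dsa_switch {
	struct dsa_switch_tree *dst;
	int index;
	signed char rtable[4];	/* port leading towards switch [i]; size is arbitrary */
};

/* Reduced model of struct dsa_switch_tree after the patch. */
struct dsa_switch_tree {
	struct dsa_switch *cpu_switch;	/* NULL until a CPU port is found */
	signed char cpu_port;
};

/* Pointer comparison replaces the old index comparison. */
static bool dsa_is_cpu_port(const struct dsa_switch *ds, int p)
{
	return ds == ds->dst->cpu_switch && p == ds->dst->cpu_port;
}

/* The routing table is still indexed by switch index, so the index is
 * now read through the pointer. */
static int dsa_upstream_port(const struct dsa_switch *ds)
{
	const struct dsa_switch_tree *dst = ds->dst;

	if (dst->cpu_switch == ds)
		return dst->cpu_port;
	return ds->rtable[dst->cpu_switch->index];
}

/* Hypothetical helper: record the CPU switch once, refuse a second one. */
static int tree_set_cpu(struct dsa_switch_tree *dst, struct dsa_switch *ds,
			int port)
{
	if (dst->cpu_switch)
		return -EINVAL;	/* a CPU port was already recorded */

	dst->cpu_switch = ds;
	dst->cpu_port = (signed char)port;
	return 0;
}
```

A zero-initialized tree already has cpu_switch == NULL, which is why the explicit "= -1" initialization can be dropped once the field is a pointer.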
Re: [PATCH 3/4] net: ethernet: ti: cpsw: don't duplicate ndev_running
On Thu, Jan 12, 2017 at 11:34:47AM -0600, Grygorii Strashko wrote: Hi Grygorii, Sorry for late reply. > > > On 01/10/2017 07:56 PM, Ivan Khoronzhuk wrote: > > On Mon, Jan 09, 2017 at 11:25:38AM -0600, Grygorii Strashko wrote: > >> > >> > >> On 01/08/2017 10:41 AM, Ivan Khoronzhuk wrote: > >>> No need to create additional vars to identify if interface is running. > >>> So simplify code by removing redundant var and checking usage counter > >>> instead. > >>> > >>> Signed-off-by: Ivan Khoronzhuk > >>> --- > >>> drivers/net/ethernet/ti/cpsw.c | 14 -- > >>> 1 file changed, 4 insertions(+), 10 deletions(-) > >>> > >>> diff --git a/drivers/net/ethernet/ti/cpsw.c > >>> b/drivers/net/ethernet/ti/cpsw.c > >>> index 40d7fc9..daae87f 100644 > >>> --- a/drivers/net/ethernet/ti/cpsw.c > >>> +++ b/drivers/net/ethernet/ti/cpsw.c > >>> @@ -357,7 +357,6 @@ struct cpsw_slave { > >>> struct phy_device *phy; > >>> struct net_device *ndev; > >>> u32 port_vlan; > >>> - u32 open_stat; > >>> }; > >>> > >>> static inline u32 slave_read(struct cpsw_slave *slave, u32 offset) > >>> @@ -1241,7 +1240,7 @@ static int cpsw_common_res_usage_state(struct > >>> cpsw_common *cpsw) > >>> u32 usage_count = 0; > >>> > >>> for (i = 0; i < cpsw->data.slaves; i++) > >>> - if (cpsw->slaves[i].open_stat) > >>> + if (netif_running(cpsw->slaves[i].ndev)) > >>> usage_count++; > >> > >> Not sure this will work as you expected, but may be I've missed smth :( > > I've changed conditions, will work. > > > >> > >> code in static int __dev_open(struct net_device *dev) > >> .. > >>set_bit(__LINK_STATE_START, &dev->state); > >> > >>if (ops->ndo_validate_addr) > >>ret = ops->ndo_validate_addr(dev); > >> > >>if (!ret && ops->ndo_open) > >>ret = ops->ndo_open(dev); > >> > >>netpoll_poll_enable(dev); > >> > >>if (ret) > >>clear_bit(__LINK_STATE_START, &dev->state); > >> .. > >> > >> so, netif_running(ndev) will start returning true before calling > >> ops->ndo_open(dev); > > Yes, It's done bearing it in mind of course. 
> > > >> > >>> > >>> return usage_count; > >>> @@ -1502,7 +1501,7 @@ static int cpsw_ndo_open(struct net_device *ndev) > >>>CPSW_RTL_VERSION(reg)); > >>> > >>> /* initialize host and slave ports */ > >>> - if (!cpsw_common_res_usage_state(cpsw)) > >>> + if (cpsw_common_res_usage_state(cpsw) < 2) > >> > >> Ah. You've changed the condition here. > >> > >> I think it might be reasonable to hide this inside > >> cpsw_common_res_usage_state() > >> and seems it can be renamed to smth like cpsw_is_running(). > > It probably needs to be renamed to smth a little different, > > like cpsw_get_usage_count ...or cpsw_get_open_ndev_count > > cpsw_get_usage_count () sounds good Like it more also. Will change it. > > > > >> > >> > >>> cpsw_init_host_port(priv); > >>> for_each_slave(priv, cpsw_slave_open, priv); > >>> > >>> @@ -1513,7 +1512,7 @@ static int cpsw_ndo_open(struct net_device *ndev) > >>> cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan, > >>> ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0); > >>> > >>> - if (!cpsw_common_res_usage_state(cpsw)) { > >>> + if (cpsw_common_res_usage_state(cpsw) < 2) { > >>> /* disable priority elevation */ > >>> __raw_writel(0, &cpsw->regs->ptype); > >>> > >>> @@ -1556,9 +1555,6 @@ static int cpsw_ndo_open(struct net_device *ndev) > >>> cpdma_ctlr_start(cpsw->dma); > >>> cpsw_intr_enable(cpsw); > >>> > >>> - if (cpsw->data.dual_emac) > >>> - cpsw->slaves[priv->emac_port].open_stat = true; > >>> - > >>> return 0; > >>> > >>> err_cleanup: > >>> @@ -1578,7 +1574,7 @@ static int cpsw_ndo_stop(struct net_device *ndev) > >>> netif_tx_stop_all_queues(priv->ndev); > >>> netif_carrier_off(priv->ndev); > >>> > >>> - if (cpsw_common_res_usage_state(cpsw) <= 1) { > >>> + if (!cpsw_common_res_usage_state(cpsw)) { > >> > >> and here __LINK_STATE_START will be cleared before calling > >> ops->ndo_stop(dev); > > Actually it's changed because of it. 
> > > >> So, from one side netif_running(ndev) usage will simplify > >> cpsw_common_res_usage_state() internals, > >> but from another side - it will make places where it's used even more > >> entangled :( as for me, > >> because when cpsw_common_res_usage_state() will return 1 in > >> cpsw_ndo_open() it will mean > >> "no interfaces is really running yet", but the same value 1 in > >> cpsw_ndo_stop() > > why not? no interfaces running, except the one excuting ndo_open now. > > It's more clear then duplicating it and using two different ways in > > different places for identifing running devices. Current way more > > close to some testing code, not final version. Just to be consistent > > better to change it. > > > > Yes, it returns different results when it's ca
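The threshold changes discussed in this thread ("< 2" in open, "== 0" in stop) follow from when the running bit flips relative to the driver callbacks. The user-space model below is a sketch under assumptions: struct ndev, struct cpsw, dev_open() and dev_close() are hypothetical stand-ins, and the hw_inits/hw_stops counters merely mark where the shared hardware setup and teardown would happen. The ordering (bit set before ndo_open, cleared before ndo_stop) is taken from the discussion above.

```c
#include <stdbool.h>

#define NSLAVES 2

/* Hypothetical stand-in for a net_device; "running" models __LINK_STATE_START. */
struct ndev {
	bool running;
};

/* Hypothetical shared state for a dual-interface switch. */
struct cpsw {
	struct ndev slaves[NSLAVES];
	int hw_inits;	/* times the shared hardware was initialized */
	int hw_stops;	/* times the shared hardware was shut down */
};

/* Counts interfaces whose running bit is set. */
static int cpsw_get_usage_count(const struct cpsw *c)
{
	int i, n = 0;

	for (i = 0; i < NSLAVES; i++)
		if (c->slaves[i].running)
			n++;
	return n;
}

/* Models the open path: the bit is set before the driver callback runs,
 * so the first opener already counts itself and sees 1. */
static void dev_open(struct cpsw *c, int i)
{
	c->slaves[i].running = true;
	if (cpsw_get_usage_count(c) < 2)
		c->hw_inits++;
}

/* Models the stop path: the bit is cleared before the driver callback
 * runs, so the last closer sees 0. */
static void dev_close(struct cpsw *c, int i)
{
	c->slaves[i].running = false;
	if (!cpsw_get_usage_count(c))
		c->hw_stops++;
}
```

In this model the shared hardware is initialized once by the first opener and stopped once by the last closer, even though the same count is compared against different thresholds in the two paths.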
Re: [PATCH v3 net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering
On Tue, Jan 17, 2017 at 4:50 PM, Mao Wenan wrote: > Relax ordering(RO) is one feature of 82599 NIC, to enable this feature can > enhance the performance for some cpu architecure, such as SPARC and so on. > Currently it only supports one special cpu architecture(SPARC) in 82599 > driver to enable RO feature, this is not very common for other cpu > architecture > which really needs RO feature. > This patch add one common config CONFIG_ARCH_WANT_RELAX_ORDER to set RO > feature, > and should define CONFIG_ARCH_WANT_RELAX_ORDER in sparc Kconfig firstly. > > Signed-off-by: Mao Wenan > Reviewed-by: Alexander Duyck Reviewed-by: Alexander Duyck > --- > v2 -> v3: add reviewed information. > --- > arch/Kconfig| 3 +++ > arch/sparc/Kconfig | 1 + > drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +- > 3 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/Kconfig b/arch/Kconfig > index 99839c2..bd04eac 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -781,4 +781,7 @@ config VMAP_STACK > the stack to map directly to the KASAN shadow map using a formula > that is incorrect if the stack is in vmalloc space. 
> > +config ARCH_WANT_RELAX_ORDER > + bool > + > source "kernel/gcov/Kconfig" > diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig > index cf4034c..68ac5c7 100644 > --- a/arch/sparc/Kconfig > +++ b/arch/sparc/Kconfig > @@ -44,6 +44,7 @@ config SPARC > select CPU_NO_EFFICIENT_FFS > select HAVE_ARCH_HARDENED_USERCOPY > select PROVE_LOCKING_SMALL if PROVE_LOCKING > + select ARCH_WANT_RELAX_ORDER > > config SPARC32 > def_bool !64BIT > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c > b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c > index 094e1d6..c38d50c 100644 > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c > @@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw) > } > IXGBE_WRITE_FLUSH(hw); > > -#ifndef CONFIG_SPARC > +#ifndef CONFIG_ARCH_WANT_RELAX_ORDER > /* Disable relaxed ordering */ > for (i = 0; i < hw->mac.max_tx_queues; i++) { > u32 regval; > -- > 2.7.0 > >
Re: [PATCH RFC] net: dsa: remove unnecessary phy.h include
On 01/17/2017 04:14 PM, Russell King - ARM Linux wrote: > Including phy.h and phy_fixed.h into net/dsa.h causes phy*.h to be an > unnecessary dependency for quite a large amount of the kernel. There's > very little which actually requires definitions from phy.h in net/dsa.h > - the include itself only wants the declaration of a couple of > structures and IFNAMSIZ. > > Add linux/if.h for IFNAMSIZ, declarations for the structures, phy.h to > mv88e6xxx.h as it needs it for phy_interface_t, and remove both phy.h > and phy_fixed.h from net/dsa.h. > > This patch reduces from around 800 files rebuilt to around 40 - even > with ccache, the time difference is noticable. > > Signed-off-by: Russell King Reviewed-by: Florian Fainelli > --- > I noticed when I touched linux/phy.h that a lot of the kernel ended up > being unexpectedly rebuilt, as linux/netdevice.h includes net/dsa.h, > which then then includes linux/phy.h. I've tested this change on both > ARM and ARM64, but I'd suggest letting the 0-day builder have a bite > at this, and then only taking it if everyone is confident that there's > a slim chance of any problems. Also, it may need some rework to apply > to davem's tree. All of the above makes this RFC only. 
> > drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 + > include/net/dsa.h | 6 -- > 2 files changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h > b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h > index a319c06d82e3..d247b0639ed4 100644 > --- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h > +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h > @@ -15,6 +15,7 @@ > #include > #include > #include > +#include > > #ifndef UINT64_MAX > #define UINT64_MAX (u64)(~((u64)0)) > diff --git a/include/net/dsa.h b/include/net/dsa.h > index b122196d5a1f..887b2f98f9ea 100644 > --- a/include/net/dsa.h > +++ b/include/net/dsa.h > @@ -11,15 +11,17 @@ > #ifndef __LINUX_NET_DSA_H > #define __LINUX_NET_DSA_H > > +#include > #include > #include > #include > #include > #include > -#include > -#include > #include > > +struct phy_device; > +struct fixed_phy_status; > + > enum dsa_tag_protocol { > DSA_TAG_PROTO_NONE = 0, > DSA_TAG_PROTO_DSA, > -- Florian
Re: [PATCH RFC] net: dsa: remove unnecessary phy.h include
Hi Russell,

Russell King - ARM Linux writes:

> Including phy.h and phy_fixed.h into net/dsa.h causes phy*.h to be an
> unnecessary dependency for quite a large amount of the kernel. There's
> very little which actually requires definitions from phy.h in net/dsa.h
> - the include itself only wants the declaration of a couple of
> structures and IFNAMSIZ.
>
> Add linux/if.h for IFNAMSIZ, declarations for the structures, phy.h to
> mv88e6xxx.h as it needs it for phy_interface_t, and remove both phy.h
> and phy_fixed.h from net/dsa.h.
>
> This patch reduces from around 800 files rebuilt to around 40 - even
> with ccache, the time difference is noticeable.
>
> Signed-off-by: Russell King

This patch applies cleanly on net-next and builds correctly after
touching include/linux/phy.h. My boards work fine with it.

Tested-by: Vivien Didelot

Thanks,

Vivien
RE: [PATCH v2 net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering.
> -Original Message- > From: Alexander Duyck [mailto:alexander.du...@gmail.com] > Sent: Wednesday, January 18, 2017 3:28 AM > To: David Miller > Cc: maowenan; Netdev; Jeff Kirsher > Subject: Re: [PATCH v2 net-next] net:add one common config > ARCH_WANT_RELAX_ORDER to support relax ordering. > > On Tue, Jan 17, 2017 at 11:15 AM, David Miller > wrote: > > From: Mao Wenan > > Date: Mon, 9 Jan 2017 13:32:34 +0800 > > > >> Relax ordering(RO) is one feature of 82599 NIC, to enable this > >> feature can enhance the performance for some cpu architecure, such as > SPARC and so on. > >> Currently it only supports one special cpu architecture(SPARC) in > >> 82599 driver to enable RO feature, this is not very common for other > >> cpu architecture which really needs RO feature. > >> This patch add one common config CONFIG_ARCH_WANT_RELAX_ORDER to > set > >> RO feature, and should define CONFIG_ARCH_WANT_RELAX_ORDER in > sparc Kconfig firstly. > >> > >> Signed-off-by: Mao Wenan > > > > Since no-one has reviewed this patch, and I do not feel comfortable > > with applying it without such review, I am tossing this patch. > > > > If someone eventually reviews it, repost this patch. > > Mao, > > Go ahead and repost the patch and feel free to add my Reviewed-by. > Sorry I didn't reply to this earlier but I have been getting over the flu for > the last > week or so. > > - Alex Hi Alex, I have reposted the patch(V3), thanks a lot.
Re: [PATCHv2 5/7] TAP: Extending tap device create/destroy APIs
On Wed, Jan 18, 2017 at 2:03 AM, Sainath Grandhi wrote: > Extending tap APIs get/free_minor and create/destroy_cdev to handle more than > one > type of virtual interface. > Yes, looks better now. FWIW: Reviewed-by: Andy Shevchenko > Signed-off-by: Sainath Grandhi > --- > drivers/net/macvtap_main.c | 6 +-- > drivers/net/tap.c | 98 > +++--- > include/linux/if_tap.h | 4 +- > 3 files changed, 80 insertions(+), 28 deletions(-) > > diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c > index 6326a82..3f047b4 100644 > --- a/drivers/net/macvtap_main.c > +++ b/drivers/net/macvtap_main.c > @@ -160,7 +160,7 @@ static int macvtap_device_event(struct notifier_block > *unused, > * been registered but before register_netdevice has > * finished running. > */ > - err = tap_get_minor(&vlantap->tap); > + err = tap_get_minor(macvtap_major, &vlantap->tap); > if (err) > return notifier_from_errno(err); > > @@ -168,7 +168,7 @@ static int macvtap_device_event(struct notifier_block > *unused, > classdev = device_create(&macvtap_class, &dev->dev, devt, > dev, tap_name); > if (IS_ERR(classdev)) { > - tap_free_minor(&vlantap->tap); > + tap_free_minor(macvtap_major, &vlantap->tap); > return notifier_from_errno(PTR_ERR(classdev)); > } > err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj, > @@ -183,7 +183,7 @@ static int macvtap_device_event(struct notifier_block > *unused, > sysfs_remove_link(&dev->dev.kobj, tap_name); > devt = MKDEV(MAJOR(macvtap_major), vlantap->tap.minor); > device_destroy(&macvtap_class, devt); > - tap_free_minor(&vlantap->tap); > + tap_free_minor(macvtap_major, &vlantap->tap); > break; > case NETDEV_CHANGE_TX_QUEUE_LEN: > if (tap_queue_resize(&vlantap->tap)) > diff --git a/drivers/net/tap.c b/drivers/net/tap.c > index 43d9d54..7f38dbe 100644 > --- a/drivers/net/tap.c > +++ b/drivers/net/tap.c > @@ -99,12 +99,16 @@ static struct proto tap_proto = { > }; > > #define TAP_NUM_DEVS (1U << MINORBITS) > + > +static LIST_HEAD(major_list); > + > struct 
major_info { > dev_t major; > struct idr minor_idr; > struct mutex minor_lock; > const char *device_name; > -} macvtap_major; > + struct list_head next; > +}; > > #define GOODCOPY_LEN 128 > > @@ -385,44 +389,73 @@ rx_handler_result_t tap_handle_frame(struct sk_buff > **pskb) > return RX_HANDLER_CONSUMED; > } > > -int tap_get_minor(struct tap_dev *tap) > +static struct major_info *tap_get_major(int major) > +{ > + struct major_info *tap_major, *tmp; > + > + list_for_each_entry_safe(tap_major, tmp, &major_list, next) { > + if (tap_major->major == major) { > + return tap_major; > + } > + } > + > + return NULL; > +} > + > +int tap_get_minor(dev_t major, struct tap_dev *tap) > { > int retval = -ENOMEM; > + struct major_info *tap_major; > + > + tap_major = tap_get_major(MAJOR(major)); > + if (!tap_major) > + return -EINVAL; > > - mutex_lock(&macvtap_major.minor_lock); > - retval = idr_alloc(&macvtap_major.minor_idr, tap, 1, TAP_NUM_DEVS, > GFP_KERNEL); > + mutex_lock(&tap_major->minor_lock); > + retval = idr_alloc(&tap_major->minor_idr, tap, 1, TAP_NUM_DEVS, > GFP_KERNEL); > if (retval >= 0) { > tap->minor = retval; > } else if (retval == -ENOSPC) { > netdev_err(tap->dev, "Too many tap devices\n"); > retval = -EINVAL; > } > - mutex_unlock(&macvtap_major.minor_lock); > + mutex_unlock(&tap_major->minor_lock); > return retval < 0 ? 
retval : 0; > } > > -void tap_free_minor(struct tap_dev *tap) > +void tap_free_minor(dev_t major, struct tap_dev *tap) > { > - mutex_lock(&macvtap_major.minor_lock); > + struct major_info *tap_major; > + > + tap_major = tap_get_major(MAJOR(major)); > + if (!tap_major) > + return; > + > + mutex_lock(&tap_major->minor_lock); > if (tap->minor) { > - idr_remove(&macvtap_major.minor_idr, tap->minor); > + idr_remove(&tap_major->minor_idr, tap->minor); > tap->minor = 0; > } > - mutex_unlock(&macvtap_major.minor_lock); > + mutex_unlock(&tap_major->minor_lock); > } > > -static struct tap_dev *dev_get_by_tap_minor(int minor) > +static struct tap_dev *dev_get_by_tap_file(int major, int minor) > { > struct net_device *dev = NULL; > struct tap_dev *tap; > + str
[PATCH v3 net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering
Relaxed ordering (RO) is a feature of the 82599 NIC; enabling it can
improve performance on some CPU architectures, such as SPARC.
Currently the 82599 driver keeps RO enabled for only one architecture
(SPARC), which does not help other CPU architectures that need RO.
This patch adds a common config option, CONFIG_ARCH_WANT_RELAX_ORDER, to
control the RO feature, and selects it in the sparc Kconfig first.

Signed-off-by: Mao Wenan
Reviewed-by: Alexander Duyck
---
v2 -> v3: add reviewed information.
---
 arch/Kconfig                                    | 3 +++
 arch/sparc/Kconfig                              | 1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 99839c2..bd04eac 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -781,4 +781,7 @@ config VMAP_STACK
 	  the stack to map directly to the KASAN shadow map using a formula
 	  that is incorrect if the stack is in vmalloc space.
 
+config ARCH_WANT_RELAX_ORDER
+	bool
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index cf4034c..68ac5c7 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,6 +44,7 @@ config SPARC
 	select CPU_NO_EFFICIENT_FFS
 	select HAVE_ARCH_HARDENED_USERCOPY
 	select PROVE_LOCKING_SMALL if PROVE_LOCKING
+	select ARCH_WANT_RELAX_ORDER
 
 config SPARC32
 	def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 094e1d6..c38d50c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
 	}
 	IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_SPARC
+#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
 	/* Disable relaxed ordering */
 	for (i = 0; i < hw->mac.max_tx_queues; i++) {
 		u32 regval;
-- 
2.7.0
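
For readers less familiar with the pattern, here is a minimal userspace
sketch of how a Kconfig-style symbol can gate the "disable relaxed
ordering" path at compile time. It is not ixgbe code: start_hw_sketch(),
FAKE_RO_EN and the queue_ctrl array are hypothetical stand-ins, and the
bit position is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue "relaxed ordering enable" bit; not the real
 * ixgbe register layout. */
#define FAKE_RO_EN (1u << 11)

/*
 * Clears the (hypothetical) RO enable bit on every queue unless the
 * build defines CONFIG_ARCH_WANT_RELAX_ORDER, mirroring the #ifndef
 * guard in the patch. Returns the number of queues that were modified.
 */
static size_t start_hw_sketch(uint32_t *queue_ctrl, size_t nr_queues)
{
	size_t touched = 0;

#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
	/* Disable relaxed ordering */
	for (size_t i = 0; i < nr_queues; i++) {
		queue_ctrl[i] &= ~FAKE_RO_EN;
		touched++;
	}
#else
	/* Architecture opted in: leave the hardware default alone. */
	(void)queue_ctrl;
	(void)nr_queues;
#endif
	return touched;
}
```

Building the sketch with -DCONFIG_ARCH_WANT_RELAX_ORDER compiles the loop
out entirely, which is the effect "select ARCH_WANT_RELAX_ORDER" is meant
to have for an architecture in the real tree.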
linux-next: build warnings after merge of the net-next tree
Hi all,

After merging the net-next tree, today's linux-next build (powerpc
ppc64_defconfig) produced these warnings:

drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function 'init_one':
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:9: warning: unused variable 'port_vec' [-Wunused-variable]
  u32 v, port_vec;
         ^
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4646:6: warning: unused variable 'v' [-Wunused-variable]
  u32 v, port_vec;
      ^

Introduced by commit

  96fe11f27b70 ("cxgb4: Implement ndo_get_phys_port_id for mgmt dev")

-- 
Cheers,
Stephen Rothwell
[PATCH] net: ethernet: ti: davinci_cpdma: correct check on NULL in set rate
Check "ch" for NULL first, then get ctlr.

Signed-off-by: Ivan Khoronzhuk
---
Based on net-next/master

 drivers/net/ethernet/ti/davinci_cpdma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index d80bff1..7ecc6b7 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -835,8 +835,8 @@ EXPORT_SYMBOL_GPL(cpdma_chan_get_min_rate);
  */
 int cpdma_chan_set_rate(struct cpdma_chan *ch, u32 rate)
 {
-	struct cpdma_ctlr *ctlr = ch->ctlr;
 	unsigned long flags, ch_flags;
+	struct cpdma_ctlr *ctlr;
 	int ret, prio_mode;
 	u32 rmask;
 
@@ -846,6 +846,7 @@ int cpdma_chan_set_rate(struct cpdma_chan *ch, u32 rate)
 	if (ch->rate == rate)
 		return rate;
 
+	ctlr = ch->ctlr;
 	spin_lock_irqsave(&ctlr->lock, flags);
 	spin_lock_irqsave(&ch->lock, ch_flags);
 
-- 
2.7.4
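
The ordering issue this patch fixes can be illustrated with a small
standalone sketch. The structures and chan_set_rate_sketch() below are
hypothetical, trimmed-down stand-ins for the cpdma types, and the NULL
check itself is assumed (it sits between the two hunks and is not shown
in the diff).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct cpdma_ctlr / struct cpdma_chan. */
struct ctlr_sketch {
	int lock_count;		/* stands in for ctlr->lock */
};

struct chan_sketch {
	struct ctlr_sketch *ctlr;
	unsigned int rate;
};

/*
 * Validate the pointer first and only then read through it.
 * Initializing "ctlr" from ch->ctlr in the declaration would
 * dereference ch before the NULL check has a chance to run.
 */
static int chan_set_rate_sketch(struct chan_sketch *ch, unsigned int rate)
{
	struct ctlr_sketch *ctlr;

	if (!ch)		/* assumed check, not visible in the diff */
		return -EINVAL;

	if (ch->rate == rate)
		return (int)rate;

	ctlr = ch->ctlr;	/* safe: ch is known to be non-NULL here */
	ctlr->lock_count++;	/* placeholder for taking ctlr->lock */
	ch->rate = rate;
	return 0;
}
```

With the assignment moved below the check, passing a NULL channel returns
an error instead of faulting on the initializer.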
[PATCH net] net: phy: bcm63xx: Utilize correct config_intr function
From: Daniel Gonzalez Cabanelas

Commit a1cba5613edf ("net: phy: Add Broadcom phy library for common
interfaces") made the BCM63xx PHY driver utilize bcm_phy_config_intr(),
which would appear to do the right thing, except that it does not write
to the MII_BCM63XX_IR register but to MII_BCM54XX_ECR, which is
different.

This causes invalid link parameters and events to be generated by the
PHY interrupt.

Fixes: a1cba5613edf ("net: phy: Add Broadcom phy library for common interfaces")
Signed-off-by: Daniel Gonzalez Cabanelas
Signed-off-by: Florian Fainelli
---
 drivers/net/phy/bcm63xx.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/bcm63xx.c b/drivers/net/phy/bcm63xx.c
index e741bf614c4e..b0492ef2cdaa 100644
--- a/drivers/net/phy/bcm63xx.c
+++ b/drivers/net/phy/bcm63xx.c
@@ -21,6 +21,23 @@ MODULE_DESCRIPTION("Broadcom 63xx internal PHY driver");
 MODULE_AUTHOR("Maxime Bizon ");
 MODULE_LICENSE("GPL");
 
+static int bcm63xx_config_intr(struct phy_device *phydev)
+{
+	int reg, err;
+
+	reg = phy_read(phydev, MII_BCM63XX_IR);
+	if (reg < 0)
+		return reg;
+
+	if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
+		reg &= ~MII_BCM63XX_IR_GMASK;
+	else
+		reg |= MII_BCM63XX_IR_GMASK;
+
+	err = phy_write(phydev, MII_BCM63XX_IR, reg);
+	return err;
+}
+
 static int bcm63xx_config_init(struct phy_device *phydev)
 {
 	int reg, err;
@@ -55,7 +72,7 @@ static struct phy_driver bcm63xx_driver[] = {
 	.config_aneg	= genphy_config_aneg,
 	.read_status	= genphy_read_status,
 	.ack_interrupt	= bcm_phy_ack_intr,
-	.config_intr	= bcm_phy_config_intr,
+	.config_intr	= bcm63xx_config_intr,
 }, {
 	/* same phy as above, with just a different OUI */
 	.phy_id		= 0x002bdc00,
@@ -67,7 +84,7 @@ static struct phy_driver bcm63xx_driver[] = {
 	.config_aneg	= genphy_config_aneg,
 	.read_status	= genphy_read_status,
 	.ack_interrupt	= bcm_phy_ack_intr,
-	.config_intr	= bcm_phy_config_intr,
+	.config_intr	= bcm63xx_config_intr,
 } };
 
 module_phy_driver(bcm63xx_driver);
-- 
2.9.3
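
The read-modify-write of the global interrupt mask bit can be tried out
in userspace with a fake register file. Everything below is a sketch:
struct fake_phy, fake_phy_read()/fake_phy_write(), FAKE_IR_REG and
FAKE_IR_GMASK are hypothetical names and values, not the kernel's phylib
API or the real BCM63xx register map.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register index and mask bit, chosen for illustration. */
#define FAKE_IR_REG   0x1a
#define FAKE_IR_GMASK 0x0100
#define FAKE_NR_REGS  32

struct fake_phy {
	uint16_t regs[FAKE_NR_REGS];
	bool interrupts_enabled;
};

/* Returns the register value, or a negative number on a bad index. */
static int fake_phy_read(struct fake_phy *phy, unsigned int reg)
{
	return reg < FAKE_NR_REGS ? phy->regs[reg] : -1;
}

static int fake_phy_write(struct fake_phy *phy, unsigned int reg, uint16_t val)
{
	if (reg >= FAKE_NR_REGS)
		return -1;
	phy->regs[reg] = val;
	return 0;
}

/*
 * Same shape as the new bcm63xx_config_intr(): read the interrupt
 * register, clear the global mask bit to enable interrupts or set it
 * to disable them, and write back without disturbing the other bits.
 */
static int fake_config_intr(struct fake_phy *phy)
{
	int reg = fake_phy_read(phy, FAKE_IR_REG);

	if (reg < 0)
		return reg;

	if (phy->interrupts_enabled)
		reg &= ~FAKE_IR_GMASK;
	else
		reg |= FAKE_IR_GMASK;

	return fake_phy_write(phy, FAKE_IR_REG, (uint16_t)reg);
}
```

The point of the fix is which register this sequence targets; the
sequence itself is the usual mask-bit toggle.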
[PATCH RFC] net: dsa: remove unnecessary phy.h include
Including phy.h and phy_fixed.h into net/dsa.h causes phy*.h to be an
unnecessary dependency for quite a large amount of the kernel. There's
very little which actually requires definitions from phy.h in net/dsa.h
- the include itself only wants the declaration of a couple of
structures and IFNAMSIZ.

Add linux/if.h for IFNAMSIZ, declarations for the structures, phy.h to
mv88e6xxx.h as it needs it for phy_interface_t, and remove both phy.h
and phy_fixed.h from net/dsa.h.

This patch reduces from around 800 files rebuilt to around 40 - even
with ccache, the time difference is noticeable.

Signed-off-by: Russell King
---
I noticed when I touched linux/phy.h that a lot of the kernel ended up
being unexpectedly rebuilt, as linux/netdevice.h includes net/dsa.h,
which then includes linux/phy.h. I've tested this change on both
ARM and ARM64, but I'd suggest letting the 0-day builder have a bite
at this, and then only taking it if everyone is confident that there's
a slim chance of any problems. Also, it may need some rework to apply
to davem's tree. All of the above makes this RFC only.
drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 1 + include/net/dsa.h | 6 -- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h index a319c06d82e3..d247b0639ed4 100644 --- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h +++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h @@ -15,6 +15,7 @@ #include #include #include +#include #ifndef UINT64_MAX #define UINT64_MAX (u64)(~((u64)0)) diff --git a/include/net/dsa.h b/include/net/dsa.h index b122196d5a1f..887b2f98f9ea 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -11,15 +11,17 @@ #ifndef __LINUX_NET_DSA_H #define __LINUX_NET_DSA_H +#include #include #include #include #include #include -#include -#include #include +struct phy_device; +struct fixed_phy_status; + enum dsa_tag_protocol { DSA_TAG_PROTO_NONE = 0, DSA_TAG_PROTO_DSA, -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
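
The technique the patch relies on, forward-declaring a structure instead
of including its header, can be shown in a single file. This is only a
sketch: struct switch_sketch and the two *_sketch types are hypothetical
names, not the contents of net/dsa.h.

```c
#include <stddef.h>

/*
 * Forward declarations: the compiler only needs to know these are
 * struct types in order to handle pointers to them. The full
 * definitions (and the headers that carry them) are not required.
 */
struct phy_device_sketch;
struct fixed_status_sketch;

/* A header-style structure that only stores or passes pointers. */
struct switch_sketch {
	struct phy_device_sketch *phy;
	int (*fixed_link_update)(struct switch_sketch *sw, int port,
				 struct fixed_status_sketch *status);
};

/* Works without ever seeing the definition of the pointed-to type. */
static int switch_has_phy(const struct switch_sketch *sw)
{
	return sw->phy != NULL;
}
```

Only code that looks inside the structures (or needs a typedef such as
phy_interface_t, as mv88e6xxx.h does) has to include the real header,
which is why the rebuild set shrinks.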
Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()
On 01/17/2017 04:07 PM, Andy Shevchenko wrote:
> On Wed, Jan 18, 2017 at 2:04 AM, Florian Fainelli wrote:
>> On 01/17/2017 04:00 PM, Andy Shevchenko wrote:
>>> On Wed, Jan 18, 2017 at 1:43 AM, Florian Fainelli wrote:
>>>> On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
>>>>> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli wrote:
>
>>> But why not to use void *class_name to be consistent with callback and
>>> device_find_child()?
>>
>> The top-level function: device_find_in_class_name() should have a
>> stronger typing of its argument even if it internally uses
>> device_find_child() and a callback that takes a void * argument, that's
>> how I see it.
>
> Fair enough.
>
>>> Btw,
>>> return get_device(parent);
>>
>> Not sure I follow what that means here?
>
> Missed remark. Instead of
>
> get_device(parent);
> return parent;
>
> you can use
>
> return get_device(parent);

Seems reasonable, if I have to respin a v5, will add that, thanks!
-- 
Florian
[PATCHv2 1/7] TAP: Refactoring macvtap.c
macvtap module has code for tap/queue management and link management. This patch splits the code into macvtap_main.c for link management and tap.c for tap/queue management. Functionality in tap.c can be re-used for implementing tap on other virtual interfaces. Signed-off-by: Sainath Grandhi --- drivers/net/Makefile | 2 + drivers/net/macvtap_main.c | 218 +++ drivers/net/{macvtap.c => tap.c} | 204 ++-- include/linux/if_macvtap.h | 10 ++ 4 files changed, 238 insertions(+), 196 deletions(-) create mode 100644 drivers/net/macvtap_main.c rename drivers/net/{macvtap.c => tap.c} (84%) create mode 100644 include/linux/if_macvtap.h diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 7336cbd..19b03a9 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -29,6 +29,8 @@ obj-$(CONFIG_GTP) += gtp.o obj-$(CONFIG_NLMON) += nlmon.o obj-$(CONFIG_NET_VRF) += vrf.o +macvtap-objs := macvtap_main.o tap.o + # # Networking Drivers # diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c new file mode 100644 index 000..96ffa60 --- /dev/null +++ b/drivers/net/macvtap_main.c @@ -0,0 +1,218 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +/* + * Variables for dealing with macvtaps device numbers. 
+ */ +static dev_t macvtap_major; +#define MACVTAP_NUM_DEVS (1U << MINORBITS) + +static const void *macvtap_net_namespace(struct device *d) +{ + struct net_device *dev = to_net_dev(d->parent); + return dev_net(dev); +} + +static struct class macvtap_class = { + .name = "macvtap", + .owner = THIS_MODULE, + .ns_type = &net_ns_type_operations, + .namespace = macvtap_net_namespace, +}; +static struct cdev macvtap_cdev; + +#define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \ + NETIF_F_TSO6 | NETIF_F_UFO) + +static int macvtap_newlink(struct net *src_net, + struct net_device *dev, + struct nlattr *tb[], + struct nlattr *data[]) +{ + struct macvlan_dev *vlan = netdev_priv(dev); + int err; + + INIT_LIST_HEAD(&vlan->queue_list); + + /* Since macvlan supports all offloads by default, make +* tap support all offloads also. +*/ + vlan->tap_features = TUN_OFFLOADS; + + err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan); + if (err) + return err; + + /* Don't put anything that may fail after macvlan_common_newlink +* because we can't undo what it does. 
+*/ + err = macvlan_common_newlink(src_net, dev, tb, data); + if (err) { + netdev_rx_handler_unregister(dev); + return err; + } + + return 0; +} + +static void macvtap_dellink(struct net_device *dev, + struct list_head *head) +{ + netdev_rx_handler_unregister(dev); + macvtap_del_queues(dev); + macvlan_dellink(dev, head); +} + +static void macvtap_setup(struct net_device *dev) +{ + macvlan_common_setup(dev); + dev->tx_queue_len = TUN_READQ_SIZE; +} + +static struct rtnl_link_ops macvtap_link_ops __read_mostly = { + .kind = "macvtap", + .setup = macvtap_setup, + .newlink= macvtap_newlink, + .dellink= macvtap_dellink, +}; + +static int macvtap_device_event(struct notifier_block *unused, + unsigned long event, void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct macvlan_dev *vlan; + struct device *classdev; + dev_t devt; + int err; + char tap_name[IFNAMSIZ]; + + if (dev->rtnl_link_ops != &macvtap_link_ops) + return NOTIFY_DONE; + + snprintf(tap_name, IFNAMSIZ, "tap%d", dev->ifindex); + vlan = netdev_priv(dev); + + switch (event) { + case NETDEV_REGISTER: + /* Create the device node here after the network device has +* been registered but before register_netdevice has +* finished running. +*/ + err = macvtap_get_minor(vlan); + if (err) + return notifier_from_errno(err); + + devt = MKDEV(MAJOR(macvtap_major), vlan->minor); + classdev = device_create(&macvtap_class, &dev->dev, devt, +dev, tap_name); + if (IS_ERR(classdev)) { + macvtap_free_minor(vlan); + return notifier_from_errno(PTR_ERR(classdev)); + } + err = sysfs_create_l
Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()
On Wed, Jan 18, 2017 at 2:04 AM, Florian Fainelli wrote:
> On 01/17/2017 04:00 PM, Andy Shevchenko wrote:
>> On Wed, Jan 18, 2017 at 1:43 AM, Florian Fainelli wrote:
>>> On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
>>>> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli wrote:

>> But why not to use void *class_name to be consistent with callback and
>> device_find_child()?
>
> The top-level function: device_find_in_class_name() should have a
> stronger typing of its argument even if it internally uses
> device_find_child() and a callback that takes a void * argument, that's
> how I see it.

Fair enough.

>> Btw,
>> return get_device(parent);
>
> Not sure I follow what that means here?

Missed remark. Instead of

get_device(parent);
return parent;

you can use

return get_device(parent);

-- 
With Best Regards,
Andy Shevchenko
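
The two forms discussed above are equivalent because get_device() returns
the pointer it was handed after taking a reference. A userspace sketch of
that convention, with a hypothetical struct dev_sketch and
get_dev_sketch() standing in for struct device and get_device():

```c
#include <stddef.h>

struct dev_sketch {
	int refcount;
};

/*
 * Mimics the convention under discussion: take a reference and return
 * the same pointer; NULL is passed straight through.
 */
static struct dev_sketch *get_dev_sketch(struct dev_sketch *dev)
{
	if (dev)
		dev->refcount++;
	return dev;
}

/* Two-statement form from the patch. */
static struct dev_sketch *find_parent_long(struct dev_sketch *parent)
{
	get_dev_sketch(parent);
	return parent;
}

/* Suggested one-line form; same result, same reference taken. */
static struct dev_sketch *find_parent_short(struct dev_sketch *parent)
{
	return get_dev_sketch(parent);
}
```

Both helpers hand back the same pointer with one extra reference held, so
the shorter form is purely a readability change.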
[PATCHv2 0/7] Refactor macvtap to re-use tap functionality by other virtual interfaces
Tap character devices can be implemented on other virtual interfaces like
ipvlan, similar to macvtap. Source code for tap functionality in macvtap
can be re-used for this purpose.

This patch series splits macvtap source into two modules, macvtap and tap.
This patch series also includes a patch for implementing a tap character
device driver based on the IP-VLAN network interface, called ipvtap.

These patches are tested on an x86 platform.

Sainath Grandhi (7):
  TAP: Refactoring macvtap.c
  TAP: Renaming tap related APIs, data structures, macros
  TAP: Tap character device creation/destroy API
  TAP: Abstract type of virtual interface from tap implementation
  TAP: Extending tap device create/destroy APIs
  TAP: tap as an independent module
  IPVTAP: IP-VLAN based tap driver

 drivers/net/Kconfig              |   28 +
 drivers/net/Makefile             |    2 +
 drivers/net/ipvlan/Makefile      |    1 +
 drivers/net/ipvlan/ipvlan.h      |    7 +
 drivers/net/ipvlan/ipvlan_core.c |    5 +-
 drivers/net/ipvlan/ipvlan_main.c |   27 +-
 drivers/net/ipvlan/ipvtap.c      |  238 +++
 drivers/net/macvlan.c            |    2 +-
 drivers/net/macvtap.c            | 1226 ++--
 drivers/net/tap.c                | 1262 ++
 drivers/vhost/Kconfig            |    2 +-
 drivers/vhost/net.c              |    3 +-
 include/linux/if_macvlan.h       |   17 +-
 include/linux/if_tap.h           |   75 +++
 14 files changed, 1686 insertions(+), 1209 deletions(-)
 create mode 100644 drivers/net/ipvlan/ipvtap.c
 create mode 100644 drivers/net/tap.c
 create mode 100644 include/linux/if_tap.h

-- 
2.7.4
[PATCHv2 4/7] TAP: Abstract type of virtual interface from tap implementation
The macvlan object is restructured to hold tap related elements in a
separate entity, tap_dev. Upon a NETDEV_REGISTER device_event, tap_dev is
registered with idr and fetched again on tap_open. A few of the tap
functions are modified to accept tap_dev as an argument. The tap_dev
object includes callbacks to be used by the underlying virtual interface
to take care of tx and rx accounting.

Signed-off-by: Sainath Grandhi
---
 drivers/net/macvlan.c      |   2 +-
 drivers/net/macvtap_main.c |  68 +---
 drivers/net/tap.c          | 264 -
 include/linux/if_tap.h     |  57 +-
 4 files changed, 226 insertions(+), 165 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 20b3fdf2..79383f9 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -1526,7 +1526,6 @@ static const struct nla_policy macvlan_policy[IFLA_MACVLAN_MAX + 1] = {
 int macvlan_link_register(struct rtnl_link_ops *ops)
 {
 	/* common fields */
-	ops->priv_size		= sizeof(struct macvlan_dev);
 	ops->validate		= macvlan_validate;
 	ops->maxtype		= IFLA_MACVLAN_MAX;
 	ops->policy		= macvlan_policy;
@@ -1549,6 +1548,7 @@ static struct rtnl_link_ops macvlan_link_ops = {
 	.newlink	= macvlan_newlink,
 	.dellink	= macvlan_dellink,
 	.get_link_net	= macvlan_get_link_net,
+	.priv_size	= sizeof(struct macvlan_dev),
 };
 
 static int macvlan_device_event(struct notifier_block *unused,
diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c
index 32ad560..6326a82 100644
--- a/drivers/net/macvtap_main.c
+++ b/drivers/net/macvtap_main.c
@@ -24,6 +24,11 @@
 #include
 #include
 
+struct macvtap_dev {
+	struct macvlan_dev vlan;
+	struct tap_dev	  tap;
+};
+
 /*
  * Variables for dealing with macvtaps device numbers.
*/ @@ -46,22 +51,52 @@ static struct cdev macvtap_cdev; #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \ NETIF_F_TSO6 | NETIF_F_UFO) +static void macvtap_count_tx_dropped(struct tap_dev *tap) +{ + struct macvlan_dev *vlan = (struct macvlan_dev *)container_of(tap, struct macvtap_dev, tap); + + this_cpu_inc(vlan->pcpu_stats->tx_dropped); +} + +static void macvtap_count_rx_dropped(struct tap_dev *tap) +{ + struct macvlan_dev *vlan = (struct macvlan_dev *)container_of(tap, struct macvtap_dev, tap); + + macvlan_count_rx(vlan, 0, 0, 0); +} + +static void macvtap_update_features(struct tap_dev *tap, + netdev_features_t features) +{ + struct macvlan_dev *vlan = (struct macvlan_dev *)container_of(tap, struct macvtap_dev, tap); + + vlan->set_features = features; + netdev_update_features(vlan->dev); +} + static int macvtap_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[]) { - struct macvlan_dev *vlan = netdev_priv(dev); + struct macvtap_dev *vlantap = netdev_priv(dev); int err; - INIT_LIST_HEAD(&vlan->queue_list); + INIT_LIST_HEAD(&vlantap->tap.queue_list); /* Since macvlan supports all offloads by default, make * tap support all offloads also. 
*/ - vlan->tap_features = TUN_OFFLOADS; + vlantap->tap.tap_features = TUN_OFFLOADS; - err = netdev_rx_handler_register(dev, tap_handle_frame, vlan); + /* Register callbacks for rx/tx drops accounting and updating +* net_device features +*/ + vlantap->tap.count_tx_dropped = macvtap_count_tx_dropped; + vlantap->tap.count_rx_dropped = macvtap_count_rx_dropped; + vlantap->tap.update_features = macvtap_update_features; + + err = netdev_rx_handler_register(dev, tap_handle_frame, &vlantap->tap); if (err) return err; @@ -74,14 +109,18 @@ static int macvtap_newlink(struct net *src_net, return err; } + vlantap->tap.dev = vlantap->vlan.dev; + return 0; } static void macvtap_dellink(struct net_device *dev, struct list_head *head) { + struct macvtap_dev *vlantap = netdev_priv(dev); + netdev_rx_handler_unregister(dev); - tap_del_queues(dev); + tap_del_queues(&vlantap->tap); macvlan_dellink(dev, head); } @@ -96,13 +135,14 @@ static struct rtnl_link_ops macvtap_link_ops __read_mostly = { .setup = macvtap_setup, .newlink= macvtap_newlink, .dellink= macvtap_dellink, + .priv_size = sizeof(struct macvtap_dev), }; static int macvtap_device_event(struct notifier_block *unused, unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); - struct macvlan_dev *vlan; +
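
The embedding pattern this patch introduces (a generic tap object inside
a driver-specific wrapper, with callbacks that recover the wrapper) can
be sketched in userspace as follows. All names ending in _sketch are
hypothetical; container_of_sketch() is a simplified version of the
kernel's container_of() without its type checking.

```c
#include <stddef.h>

/* Simplified container_of(): step back from a member to its parent. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic tap object: knows nothing about the owning interface type. */
struct tap_sketch {
	void (*count_tx_dropped)(struct tap_sketch *tap);
};

/* Stand-in for the macvlan private data and its drop counter. */
struct vlan_sketch {
	unsigned long tx_dropped;
};

/* Driver-specific wrapper, shaped like struct macvtap_dev. */
struct macvtap_sketch {
	struct vlan_sketch vlan;
	struct tap_sketch tap;
};

/* Callback supplied by the wrapper: recover it from the tap pointer. */
static void macvtap_count_tx_dropped_sketch(struct tap_sketch *tap)
{
	struct macvtap_sketch *vt =
		container_of_sketch(tap, struct macvtap_sketch, tap);

	vt->vlan.tx_dropped++;
}

/* Generic tap code path: only ever touches struct tap_sketch. */
static void tap_drop_sketch(struct tap_sketch *tap)
{
	if (tap->count_tx_dropped)
		tap->count_tx_dropped(tap);
}
```

The sketch reaches the vlan member by name. The patch instead casts the
container_of() result to struct macvlan_dev *, which appears to rely on
vlan being the first member of struct macvtap_dev.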
[PATCHv2 7/7] IPVTAP: IP-VLAN based tap driver
This patch adds a tap character device driver that is based on the IP-VLAN network interface, called ipvtap. An ipvtap device can be created in the same way as an ipvlan device, using 'type ipvtap', and then accessed using the tap user space interface. Signed-off-by: Sainath Grandhi --- drivers/net/Kconfig | 13 +++ drivers/net/Makefile | 1 + drivers/net/ipvlan/Makefile | 1 + drivers/net/ipvlan/ipvlan.h | 7 ++ drivers/net/ipvlan/ipvlan_core.c | 5 +- drivers/net/ipvlan/ipvlan_main.c | 27 +++-- drivers/net/ipvlan/ipvtap.c | 238 +++ 7 files changed, 278 insertions(+), 14 deletions(-) create mode 100644 drivers/net/ipvlan/ipvtap.c diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 1c88437..d07b5f5 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -166,6 +166,19 @@ config IPVLAN To compile this driver as a module, choose M here: the module will be called ipvlan. +config IPVTAP + tristate "IP-VLAN based tap driver" + depends on IPVLAN + depends on INET + depends on TAP + ---help--- + This adds a specialized tap character device driver that is based + on the IP-VLAN network interface, called ipvtap. An ipvtap device + can be added in the same way as a ipvlan device, using 'type + ipvtap', and then be accessed through the tap user space interface. + + To compile this driver as a module, choose M here: the module + will be called ipvtap. 
config VXLAN tristate "Virtual eXtensible Local Area Network (VXLAN)" diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 7dd86ca..98ed4d9 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -7,6 +7,7 @@ # obj-$(CONFIG_BONDING) += bonding/ obj-$(CONFIG_IPVLAN) += ipvlan/ +obj-$(CONFIG_IPVTAP) += ipvlan/ obj-$(CONFIG_DUMMY) += dummy.o obj-$(CONFIG_EQUALIZER) += eql.o obj-$(CONFIG_IFB) += ifb.o diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile index df79910..8a2c64d 100644 --- a/drivers/net/ipvlan/Makefile +++ b/drivers/net/ipvlan/Makefile @@ -3,5 +3,6 @@ # obj-$(CONFIG_IPVLAN) += ipvlan.o +obj-$(CONFIG_IPVTAP) += ipvtap.o ipvlan-objs := ipvlan_core.o ipvlan_main.o diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h index dbfbb33..4362d88 100644 --- a/drivers/net/ipvlan/ipvlan.h +++ b/drivers/net/ipvlan/ipvlan.h @@ -133,4 +133,11 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device *dev, struct sk_buff *skb, u16 proto); unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb, const struct nf_hook_state *state); +void ipvlan_count_rx(const struct ipvl_dev *ipvlan, +unsigned int len, bool success, bool mcast); +int ipvlan_link_new(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]); +void ipvlan_link_delete(struct net_device *dev, struct list_head *head); +void ipvlan_link_setup(struct net_device *dev); +int ipvlan_link_register(struct rtnl_link_ops *ops); #endif /* __IPVLAN_H */ diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index 83ce74a..9af16ab 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -16,8 +16,8 @@ void ipvlan_init_secret(void) net_get_random_once(&ipvlan_jhash_secret, sizeof(ipvlan_jhash_secret)); } -static void ipvlan_count_rx(const struct ipvl_dev *ipvlan, - unsigned int len, bool success, bool mcast) +void ipvlan_count_rx(const struct ipvl_dev *ipvlan, +unsigned int len, bool 
success, bool mcast) { if (!ipvlan) return; @@ -36,6 +36,7 @@ static void ipvlan_count_rx(const struct ipvl_dev *ipvlan, this_cpu_inc(ipvlan->pcpu_stats->rx_errs); } } +EXPORT_SYMBOL_GPL(ipvlan_count_rx); static u8 ipvlan_get_v6_hash(const void *iaddr) { diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index 8b0f993..ed750e2 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -494,8 +494,8 @@ static int ipvlan_nl_fillinfo(struct sk_buff *skb, return ret; } -static int ipvlan_link_new(struct net *src_net, struct net_device *dev, - struct nlattr *tb[], struct nlattr *data[]) +int ipvlan_link_new(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) { struct ipvl_dev *ipvlan = netdev_priv(dev); struct ipvl_port *port; @@ -567,8 +567,9 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev, ipvlan_port_destroy(phy_dev); return err; } +EXPORT_SYMBOL_GPL(ipvlan_link_new); -static void ipvlan_link_delete(struct net_device *dev, struct list_head *head) +void ipvlan_link_de
[PATCHv2 3/7] TAP: Tap character device creation/destroy API
This patch provides tap device create/destroy APIs in tap.c. Signed-off-by: Sainath Grandhi --- drivers/net/macvtap_main.c | 29 +++-- drivers/net/tap.c | 63 ++ include/linux/if_tap.h | 5 +++- 3 files changed, 65 insertions(+), 32 deletions(-) diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c index 548f339..32ad560 100644 --- a/drivers/net/macvtap_main.c +++ b/drivers/net/macvtap_main.c @@ -28,7 +28,6 @@ * Variables for dealing with macvtaps device numbers. */ static dev_t macvtap_major; -#define MACVTAP_NUM_DEVS (1U << MINORBITS) static const void *macvtap_net_namespace(struct device *d) { @@ -159,43 +158,35 @@ static struct notifier_block macvtap_notifier_block __read_mostly = { .notifier_call = macvtap_device_event, }; -extern struct file_operations tap_fops; static int macvtap_init(void) { int err; - err = alloc_chrdev_region(&macvtap_major, 0, - MACVTAP_NUM_DEVS, "macvtap"); - if (err) - goto out1; + err = tap_create_cdev(&macvtap_cdev, &macvtap_major, "macvtap"); - cdev_init(&macvtap_cdev, &tap_fops); - err = cdev_add(&macvtap_cdev, macvtap_major, MACVTAP_NUM_DEVS); if (err) - goto out2; + goto out1; err = class_register(&macvtap_class); if (err) - goto out3; + goto out2; err = register_netdevice_notifier(&macvtap_notifier_block); if (err) - goto out4; + goto out3; err = macvlan_link_register(&macvtap_link_ops); if (err) - goto out5; + goto out4; return 0; -out5: - unregister_netdevice_notifier(&macvtap_notifier_block); out4: - class_unregister(&macvtap_class); + unregister_netdevice_notifier(&macvtap_notifier_block); out3: - cdev_del(&macvtap_cdev); + class_unregister(&macvtap_class); out2: - unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS); + cdev_del(&macvtap_cdev); out1: return err; } @@ -207,9 +198,7 @@ static void macvtap_exit(void) rtnl_link_unregister(&macvtap_link_ops); unregister_netdevice_notifier(&macvtap_notifier_block); class_unregister(&macvtap_class); - cdev_del(&macvtap_cdev); - 
unregister_chrdev_region(macvtap_major, MACVTAP_NUM_DEVS); - idr_destroy(&minor_idr); + tap_destroy_cdev(macvtap_major, &macvtap_cdev); } module_exit(macvtap_exit); diff --git a/drivers/net/tap.c b/drivers/net/tap.c index d0807c2..774ef33 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -123,8 +123,12 @@ static struct proto tap_proto = { }; #define TAP_NUM_DEVS (1U << MINORBITS) -static DEFINE_MUTEX(minor_lock); -DEFINE_IDR(minor_idr); +struct major_info { + dev_t major; + struct idr minor_idr; + struct mutex minor_lock; + const char *device_name; +} macvtap_major; #define GOODCOPY_LEN 128 @@ -413,26 +417,26 @@ int tap_get_minor(struct macvlan_dev *vlan) { int retval = -ENOMEM; - mutex_lock(&minor_lock); - retval = idr_alloc(&minor_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL); + mutex_lock(&macvtap_major.minor_lock); + retval = idr_alloc(&macvtap_major.minor_idr, vlan, 1, TAP_NUM_DEVS, GFP_KERNEL); if (retval >= 0) { vlan->minor = retval; } else if (retval == -ENOSPC) { netdev_err(vlan->dev, "Too many tap devices\n"); retval = -EINVAL; } - mutex_unlock(&minor_lock); + mutex_unlock(&macvtap_major.minor_lock); return retval < 0 ? 
retval : 0; } void tap_free_minor(struct macvlan_dev *vlan) { - mutex_lock(&minor_lock); + mutex_lock(&macvtap_major.minor_lock); if (vlan->minor) { - idr_remove(&minor_idr, vlan->minor); + idr_remove(&macvtap_major.minor_idr, vlan->minor); vlan->minor = 0; } - mutex_unlock(&minor_lock); + mutex_unlock(&macvtap_major.minor_lock); } static struct net_device *dev_get_by_tap_minor(int minor) @@ -440,13 +444,13 @@ static struct net_device *dev_get_by_tap_minor(int minor) struct net_device *dev = NULL; struct macvlan_dev *vlan; - mutex_lock(&minor_lock); - vlan = idr_find(&minor_idr, minor); + mutex_lock(&macvtap_major.minor_lock); + vlan = idr_find(&macvtap_major.minor_idr, minor); if (vlan) { dev = vlan->dev; dev_hold(dev); } - mutex_unlock(&minor_lock); + mutex_unlock(&macvtap_major.minor_lock); return dev; } @@ -1184,3 +1188,40 @@ int tap_queue_resize(struct macvlan_dev *vlan) kfree(arrays); return ret; } + +int tap_create_cdev(struct cdev *tap_cdev, + dev_t *tap_major, const char *devic
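The renumbered out1..out4 labels in macvtap_init() follow the usual goto-based unwinding: each label undoes the steps that had already succeeded, in reverse order. Below is a small userspace sketch of that shape. The step tracker (step_up, step_down, active[]) is entirely hypothetical and only stands in for the cdev, class, notifier and link_ops registrations.

```c
#include <stdbool.h>

/* Hypothetical steps standing in for cdev/class/notifier/link_ops setup. */
enum { STEP_CDEV, STEP_CLASS, STEP_NOTIFIER, STEP_LINK, NR_STEPS };

static bool active[NR_STEPS];

/* Pretend to register one step; fail if it is the step chosen to fail. */
static int step_up(int step, int fail_at)
{
	if (step == fail_at)
		return -1;
	active[step] = true;
	return 0;
}

static void step_down(int step)
{
	active[step] = false;
}

/* Each failure jumps to the label that undoes the earlier steps only. */
static int example_init(int fail_at)
{
	int err;

	err = step_up(STEP_CDEV, fail_at);
	if (err)
		goto out1;

	err = step_up(STEP_CLASS, fail_at);
	if (err)
		goto out2;

	err = step_up(STEP_NOTIFIER, fail_at);
	if (err)
		goto out3;

	err = step_up(STEP_LINK, fail_at);
	if (err)
		goto out4;

	return 0;

out4:
	step_down(STEP_NOTIFIER);
out3:
	step_down(STEP_CLASS);
out2:
	step_down(STEP_CDEV);
out1:
	return err;
}

/* Helper for checking that nothing is left registered after a failure. */
static int active_count(void)
{
	int i, n = 0;

	for (i = 0; i < NR_STEPS; i++)
		n += active[i];
	return n;
}
```

Folding two steps into one helper, as tap_create_cdev() does, removes one label and shifts the remaining ones down by one, which is what the hunk shows.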
[PATCHv2 2/7] TAP: Renaming tap related APIs, data structures, macros
Renaming tap related APIs, data structures and macros in tap.c from macvtap_.* to tap_.* Signed-off-by: Sainath Grandhi --- drivers/net/macvtap_main.c | 18 +-- drivers/net/tap.c | 332 ++--- drivers/vhost/net.c| 3 +- include/linux/if_macvlan.h | 17 +-- include/linux/if_macvtap.h | 10 -- include/linux/if_tap.h | 23 6 files changed, 202 insertions(+), 201 deletions(-) delete mode 100644 include/linux/if_macvtap.h create mode 100644 include/linux/if_tap.h diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c index 96ffa60..548f339 100644 --- a/drivers/net/macvtap_main.c +++ b/drivers/net/macvtap_main.c @@ -1,6 +1,6 @@ #include #include -#include +#include #include #include #include @@ -62,7 +62,7 @@ static int macvtap_newlink(struct net *src_net, */ vlan->tap_features = TUN_OFFLOADS; - err = netdev_rx_handler_register(dev, macvtap_handle_frame, vlan); + err = netdev_rx_handler_register(dev, tap_handle_frame, vlan); if (err) return err; @@ -82,7 +82,7 @@ static void macvtap_dellink(struct net_device *dev, struct list_head *head) { netdev_rx_handler_unregister(dev); - macvtap_del_queues(dev); + tap_del_queues(dev); macvlan_dellink(dev, head); } @@ -121,7 +121,7 @@ static int macvtap_device_event(struct notifier_block *unused, * been registered but before register_netdevice has * finished running. 
*/ - err = macvtap_get_minor(vlan); + err = tap_get_minor(vlan); if (err) return notifier_from_errno(err); @@ -129,7 +129,7 @@ static int macvtap_device_event(struct notifier_block *unused, classdev = device_create(&macvtap_class, &dev->dev, devt, dev, tap_name); if (IS_ERR(classdev)) { - macvtap_free_minor(vlan); + tap_free_minor(vlan); return notifier_from_errno(PTR_ERR(classdev)); } err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj, @@ -144,10 +144,10 @@ static int macvtap_device_event(struct notifier_block *unused, sysfs_remove_link(&dev->dev.kobj, tap_name); devt = MKDEV(MAJOR(macvtap_major), vlan->minor); device_destroy(&macvtap_class, devt); - macvtap_free_minor(vlan); + tap_free_minor(vlan); break; case NETDEV_CHANGE_TX_QUEUE_LEN: - if (macvtap_queue_resize(vlan)) + if (tap_queue_resize(vlan)) return NOTIFY_BAD; break; } @@ -159,7 +159,7 @@ static struct notifier_block macvtap_notifier_block __read_mostly = { .notifier_call = macvtap_device_event, }; -extern struct file_operations macvtap_fops; +extern struct file_operations tap_fops; static int macvtap_init(void) { int err; @@ -169,7 +169,7 @@ static int macvtap_init(void) if (err) goto out1; - cdev_init(&macvtap_cdev, &macvtap_fops); + cdev_init(&macvtap_cdev, &tap_fops); err = cdev_add(&macvtap_cdev, macvtap_major, MACVTAP_NUM_DEVS); if (err) goto out2; diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 8f12a39..d0807c2 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -24,16 +24,16 @@ #include /* - * A macvtap queue is the central object of this driver, it connects + * A tap queue is the central object of this driver, it connects * an open character device to a macvlan interface. There can be * multiple queues on one interface, which map back to queues * implemented in hardware on the underlying device. * - * macvtap_proto is used to allocate queues through the sock allocation + * tap_proto is used to allocate queues through the sock allocation * mechanism. 
* */ -struct macvtap_queue { +struct tap_queue { struct sock sk; struct socket sock; struct socket_wq wq; @@ -47,21 +47,21 @@ struct macvtap_queue { struct skb_array skb_array; }; -#define MACVTAP_FEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE) +#define TAP_IFFEATURES (IFF_VNET_HDR | IFF_MULTI_QUEUE) -#define MACVTAP_VNET_LE 0x8000 -#define MACVTAP_VNET_BE 0x4000 +#define TAP_VNET_LE 0x8000 +#define TAP_VNET_BE 0x4000 #ifdef CONFIG_TUN_VNET_CROSS_LE -static inline bool macvtap_legacy_is_little_endian(struct macvtap_queue *q) +static inline bool tap_legacy_is_little_endian(struct tap_queue *q) { - return q->flags & MACVTAP_VNET_BE ? false : + return q->flags & TAP_VNET_BE ? false : virtio_legacy_is_little_endian(); } -static long macvtap_get_vnet_be(struct macvtap_queue *q, int __user *sp) +static long tap_g
[PATCHv2 6/7] TAP: tap as an independent module
This patch makes tap a separate module for other types of virtual interfaces, for example, ipvlan to use. Signed-off-by: Sainath Grandhi --- drivers/net/Kconfig | 15 +++ drivers/net/Makefile | 3 +-- drivers/net/{macvtap_main.c => macvtap.c} | 1 - drivers/net/tap.c | 11 +++ drivers/vhost/Kconfig | 2 +- include/linux/if_tap.h| 4 ++-- 6 files changed, 30 insertions(+), 6 deletions(-) rename drivers/net/{macvtap_main.c => macvtap.c} (99%) diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 95c32f2..1c88437 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -135,6 +135,7 @@ config MACVTAP tristate "MAC-VLAN based tap driver" depends on MACVLAN depends on INET + depends on TAP help This adds a specialized tap character device driver that is based on the MAC-VLAN network interface, called macvtap. A macvtap device @@ -284,6 +285,20 @@ config TUN If you don't know what to use this for, you don't need it. +config TAP +tristate "TAP module support for virtual interfaces" +---help--- + TAP module serves two purposes. This can be used as library of functions + for virtual interfaces to implement tap functionality. + + This module also includes character device file and socket operations + that can be used by virtual interface implementing tap. + + To compile this driver as a module, choose M here: the module + will be called tap. + + If you don't know what to use this for, you don't need it. 
+ config TUN_VNET_CROSS_LE bool "Support for cross-endian vnet headers on little-endian kernels" default n diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 19b03a9..7dd86ca 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -21,6 +21,7 @@ obj-$(CONFIG_PHYLIB) += phy/ obj-$(CONFIG_RIONET) += rionet.o obj-$(CONFIG_NET_TEAM) += team/ obj-$(CONFIG_TUN) += tun.o +obj-$(CONFIG_TAP) += tap.o obj-$(CONFIG_VETH) += veth.o obj-$(CONFIG_VIRTIO_NET) += virtio_net.o obj-$(CONFIG_VXLAN) += vxlan.o @@ -29,8 +30,6 @@ obj-$(CONFIG_GTP) += gtp.o obj-$(CONFIG_NLMON) += nlmon.o obj-$(CONFIG_NET_VRF) += vrf.o -macvtap-objs := macvtap_main.o tap.o - # # Networking Drivers # diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap.c similarity index 99% rename from drivers/net/macvtap_main.c rename to drivers/net/macvtap.c index 3f047b4..3efed94 100644 --- a/drivers/net/macvtap_main.c +++ b/drivers/net/macvtap.c @@ -232,7 +232,6 @@ static int macvtap_init(void) } module_init(macvtap_init); -extern struct idr minor_idr; static void macvtap_exit(void) { rtnl_link_unregister(&macvtap_link_ops); diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 7f38dbe..32066dd 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -311,6 +311,7 @@ void tap_del_queues(struct tap_dev *tap) /* guarantee that any future tap_set_queue will fail */ tap->numvtaps = MAX_TAP_QUEUES; } +EXPORT_SYMBOL_GPL(tap_del_queues); rx_handler_result_t tap_handle_frame(struct sk_buff **pskb) { @@ -388,6 +389,7 @@ rx_handler_result_t tap_handle_frame(struct sk_buff **pskb) kfree_skb(skb); return RX_HANDLER_CONSUMED; } +EXPORT_SYMBOL_GPL(tap_handle_frame); static struct major_info *tap_get_major(int major) { @@ -422,6 +424,7 @@ int tap_get_minor(dev_t major, struct tap_dev *tap) mutex_unlock(&tap_major->minor_lock); return retval < 0 ? 
retval : 0; } +EXPORT_SYMBOL_GPL(tap_get_minor); void tap_free_minor(dev_t major, struct tap_dev *tap) { @@ -438,6 +441,7 @@ void tap_free_minor(dev_t major, struct tap_dev *tap) } mutex_unlock(&tap_major->minor_lock); } +EXPORT_SYMBOL_GPL(tap_free_minor); static struct tap_dev *dev_get_by_tap_file(int major, int minor) { @@ -1193,6 +1197,7 @@ int tap_queue_resize(struct tap_dev *tap) kfree(arrays); return ret; } +EXPORT_SYMBOL_GPL(tap_queue_resize); static int tap_list_add(dev_t major, const char *device_name) { @@ -1236,6 +1241,7 @@ int tap_create_cdev(struct cdev *tap_cdev, out1: return err; } +EXPORT_SYMBOL_GPL(tap_create_cdev); void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev) { @@ -1249,3 +1255,8 @@ void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev) unregister_chrdev_region(major, TAP_NUM_DEVS); idr_destroy(&tap_major->minor_idr); } +EXPORT_SYMBOL_GPL(tap_destroy_cdev); + +MODULE_AUTHOR("Arnd Bergmann "); +MODULE_AUTHOR("Sainath Grandhi "); +MODULE_LICENSE("GPL"); diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig index 40764ec..cfdecea 100644 --- a/drivers/vhost/Kconfig +++ b/drivers/vhost/Kconfig @@ -1,6 +1,6 @@ config VHOST_NET tristate "Host kernel accelerator for virtio net" - depends on NE
[PATCHv2 5/7] TAP: Extending tap device create/destroy APIs
Extending tap APIs get/free_minor and create/destroy_cdev to handle more than one type of virtual interface. Signed-off-by: Sainath Grandhi --- drivers/net/macvtap_main.c | 6 +-- drivers/net/tap.c | 98 +++--- include/linux/if_tap.h | 4 +- 3 files changed, 80 insertions(+), 28 deletions(-) diff --git a/drivers/net/macvtap_main.c b/drivers/net/macvtap_main.c index 6326a82..3f047b4 100644 --- a/drivers/net/macvtap_main.c +++ b/drivers/net/macvtap_main.c @@ -160,7 +160,7 @@ static int macvtap_device_event(struct notifier_block *unused, * been registered but before register_netdevice has * finished running. */ - err = tap_get_minor(&vlantap->tap); + err = tap_get_minor(macvtap_major, &vlantap->tap); if (err) return notifier_from_errno(err); @@ -168,7 +168,7 @@ static int macvtap_device_event(struct notifier_block *unused, classdev = device_create(&macvtap_class, &dev->dev, devt, dev, tap_name); if (IS_ERR(classdev)) { - tap_free_minor(&vlantap->tap); + tap_free_minor(macvtap_major, &vlantap->tap); return notifier_from_errno(PTR_ERR(classdev)); } err = sysfs_create_link(&dev->dev.kobj, &classdev->kobj, @@ -183,7 +183,7 @@ static int macvtap_device_event(struct notifier_block *unused, sysfs_remove_link(&dev->dev.kobj, tap_name); devt = MKDEV(MAJOR(macvtap_major), vlantap->tap.minor); device_destroy(&macvtap_class, devt); - tap_free_minor(&vlantap->tap); + tap_free_minor(macvtap_major, &vlantap->tap); break; case NETDEV_CHANGE_TX_QUEUE_LEN: if (tap_queue_resize(&vlantap->tap)) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 43d9d54..7f38dbe 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -99,12 +99,16 @@ static struct proto tap_proto = { }; #define TAP_NUM_DEVS (1U << MINORBITS) + +static LIST_HEAD(major_list); + struct major_info { dev_t major; struct idr minor_idr; struct mutex minor_lock; const char *device_name; -} macvtap_major; + struct list_head next; +}; #define GOODCOPY_LEN 128 @@ -385,44 +389,73 @@ rx_handler_result_t tap_handle_frame(struct 
sk_buff **pskb) return RX_HANDLER_CONSUMED; } -int tap_get_minor(struct tap_dev *tap) +static struct major_info *tap_get_major(int major) +{ + struct major_info *tap_major, *tmp; + + list_for_each_entry_safe(tap_major, tmp, &major_list, next) { + if (tap_major->major == major) { + return tap_major; + } + } + + return NULL; +} + +int tap_get_minor(dev_t major, struct tap_dev *tap) { int retval = -ENOMEM; + struct major_info *tap_major; + + tap_major = tap_get_major(MAJOR(major)); + if (!tap_major) + return -EINVAL; - mutex_lock(&macvtap_major.minor_lock); - retval = idr_alloc(&macvtap_major.minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL); + mutex_lock(&tap_major->minor_lock); + retval = idr_alloc(&tap_major->minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL); if (retval >= 0) { tap->minor = retval; } else if (retval == -ENOSPC) { netdev_err(tap->dev, "Too many tap devices\n"); retval = -EINVAL; } - mutex_unlock(&macvtap_major.minor_lock); + mutex_unlock(&tap_major->minor_lock); return retval < 0 ? 
retval : 0; } -void tap_free_minor(struct tap_dev *tap) +void tap_free_minor(dev_t major, struct tap_dev *tap) { - mutex_lock(&macvtap_major.minor_lock); + struct major_info *tap_major; + + tap_major = tap_get_major(MAJOR(major)); + if (!tap_major) + return; + + mutex_lock(&tap_major->minor_lock); if (tap->minor) { - idr_remove(&macvtap_major.minor_idr, tap->minor); + idr_remove(&tap_major->minor_idr, tap->minor); tap->minor = 0; } - mutex_unlock(&macvtap_major.minor_lock); + mutex_unlock(&tap_major->minor_lock); } -static struct tap_dev *dev_get_by_tap_minor(int minor) +static struct tap_dev *dev_get_by_tap_file(int major, int minor) { struct net_device *dev = NULL; struct tap_dev *tap; + struct major_info *tap_major; + + tap_major = tap_get_major(major); + if (!tap_major) + return NULL; - mutex_lock(&macvtap_major.minor_lock); - tap = idr_find(&macvtap_major.minor_idr, minor); + mutex_lock(&tap_major->minor_lock); + tap = idr_find(&tap_major->minor_idr, minor); if (tap) { dev = tap->dev;
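The tap_get_major() helper introduced here is a lookup over a list of per-major records. A userspace sketch of the same idea is given below, using a plain singly linked list instead of the kernel's struct list_head. The names major_list_add and major_lookup are made up for the example, and the allocation check is an addition that the posted tap_list_add() hunk in the v1 review does not show.

```c
#include <stdlib.h>

/* Simplified per-major record; the idr and mutex are left out. */
struct major_info {
	unsigned int major;
	const char *device_name;
	struct major_info *next;
};

static struct major_info *major_list;

/* Register a new major; returns 0 on success, -1 if allocation fails. */
static int major_list_add(unsigned int major, const char *device_name)
{
	struct major_info *info = calloc(1, sizeof(*info));

	if (!info)
		return -1;

	info->major = major;
	info->device_name = device_name;
	info->next = major_list;
	major_list = info;
	return 0;
}

/* One shared lookup helper instead of an open-coded loop per caller. */
static struct major_info *major_lookup(unsigned int major)
{
	struct major_info *info;

	for (info = major_list; info; info = info->next)
		if (info->major == major)
			return info;

	return NULL;
}
```

Since the lookup never removes entries while walking, a plain iteration is enough in this sketch; the _safe list iterators in the kernel are only needed when entries may be deleted during the walk.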
Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()
On 01/17/2017 04:00 PM, Andy Shevchenko wrote:
> On Wed, Jan 18, 2017 at 1:43 AM, Florian Fainelli wrote:
>> On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
>>> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli wrote:
>>>> +static int device_class_name_match(struct device *dev, void *class)
>>>
>>> And why not const char *class?
>>
>> This was raised back in v2, and the same response applies:
>>
>> https://www.mail-archive.com/netdev@vger.kernel.org/msg147559.html
>>
>> Changing the signature of a callback is out of the scope of this patch
>> series.
>
> Ah, right.
>
> But why not to use void *class_name to be consistent with callback and
> device_find_child()?

The top-level function, device_find_in_class_name(), should have stronger
typing of its argument, even if it internally uses device_find_child() and a
callback that takes a void * argument. That's how I see it.

>
> Btw,
>
> return get_device(parent);

Not sure I follow what that means here?
--
Florian
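To make the typing argument concrete, here is a small userspace sketch of a typed wrapper over a generic, void *-based child search. Everything in it (the struct device layout, find_child, class_name_match, find_in_class_name) is a simplified stand-in made up for illustration and is not the driver core implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified device node. */
struct device {
	const char *class_name;
	struct device *children;	/* first child */
	struct device *sibling;		/* next child of the same parent */
};

typedef int (*match_fn)(struct device *dev, void *data);

/* Generic walker: knows nothing about what "data" means. */
static struct device *find_child(struct device *parent, void *data,
				 match_fn match)
{
	struct device *child;

	for (child = parent->children; child; child = child->sibling)
		if (match(child, data))
			return child;

	return NULL;
}

/* The callback has to keep the generic void * signature. */
static int class_name_match(struct device *dev, void *class_name)
{
	return dev->class_name && strcmp(dev->class_name, class_name) == 0;
}

/* The public entry point can still be strongly typed. */
static struct device *find_in_class_name(struct device *parent,
					 const char *class_name)
{
	/* The cast only adapts to the generic callback type. */
	return find_child(parent, (void *)class_name, class_name_match);
}
```

Callers of the wrapper get compile-time checking of the argument type, while the void * stays confined to the callback.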
[PATCHv2 0/7] Refactor macvtap to re-use tap functionality by other virtual interfaces
Tap character devices can be implemented on other virtual interfaces like
ipvlan, similar to macvtap. The source code for tap functionality in macvtap
can be re-used for this purpose.

This patch series splits the macvtap source into two modules, macvtap and
tap. It also includes a patch implementing a tap character device driver
based on the IP-VLAN network interface, called ipvtap.

These patches were tested on an x86 platform.

Sainath Grandhi (7):
  TAP: Refactoring macvtap.c
  TAP: Renaming tap related APIs, data structures, macros
  TAP: Tap character device creation/destroy API
  TAP: Abstract type of virtual interface from tap implementation
  TAP: Extending tap device create/destroy APIs
  TAP: tap as an independent module
  IPVTAP: IP-VLAN based tap driver

 drivers/net/Kconfig              |   28 +
 drivers/net/Makefile             |    2 +
 drivers/net/ipvlan/Makefile      |    1 +
 drivers/net/ipvlan/ipvlan.h      |    7 +
 drivers/net/ipvlan/ipvlan_core.c |    5 +-
 drivers/net/ipvlan/ipvlan_main.c |   27 +-
 drivers/net/ipvlan/ipvtap.c      |  238 +++
 drivers/net/macvlan.c            |    2 +-
 drivers/net/macvtap.c            | 1226 ++--
 drivers/net/tap.c                | 1262 ++
 drivers/vhost/Kconfig            |    2 +-
 drivers/vhost/net.c              |    3 +-
 include/linux/if_macvlan.h       |   17 +-
 include/linux/if_tap.h           |   75 +++
 14 files changed, 1686 insertions(+), 1209 deletions(-)
 create mode 100644 drivers/net/ipvlan/ipvtap.c
 create mode 100644 drivers/net/tap.c
 create mode 100644 include/linux/if_tap.h

--
2.7.4
Re: Getting a handle on all these new NIC features
On 01/17/2017 02:05 PM, Tom Herbert wrote:
> I realize that backports of a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sort of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.

backports (formerly known as compat-wireless) dealt with that problem by
pulling in all dependencies from the networking stack (and beyond). This
allowed people who needed to stay on a particular kernel version to get the
latest networking bits and drivers with minor disruption to other parts of
the kernel. The project now seems to be largely dead, but could be revived I
presume:

https://backports.wiki.kernel.org/index.php/Main_Page

>
> In short, I would like to ask if driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need can we allow
> config options to #ifdef those out somehow?

Multiplying the number of #ifdefs means that every config option is going to
be turned on by Linux distributions, and most likely just a subset will be
turned on by specific kernel configurations (like yours). All in all, this
multiplies the number of build combinations to a point where it may not be
manageable for an upstream driver, and some combinations won't be tested
properly except by whoever diverges from these. I understand the concern
about modularizing and having clean, independent features/modules, but I am
unsure that more configuration options are necessarily the right approach.

Slightly tangential: once a series of patches lands in a given maintainer's
tree, it is very hard to match a given commit with its original submission
and, say, locate the 11 other patches out of this 12 patch series adding
feature XYZ of interest. David does a great job at putting submissions in a
branch, which helps a lot, but in general there is not enough information in
git to associate a given patch with its companion patches within a series,
which makes backporting harder IMHO.
--
Florian
Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()
On Wed, Jan 18, 2017 at 1:43 AM, Florian Fainelli wrote:
> On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
>> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli wrote:
>>> +static int device_class_name_match(struct device *dev, void *class)
>>
>> And why not const char *class?
>
> This was raised back in v2, and the same response applies:
>
> https://www.mail-archive.com/netdev@vger.kernel.org/msg147559.html
>
> Changing the signature of a callback is out of the scope of this patch
> series.

Ah, right.

But why not to use void *class_name to be consistent with callback and
device_find_child()?

Btw,

return get_device(parent);

--
With Best Regards,
Andy Shevchenko
RE: [PATCHv1 5/7] TAP: Extending tap device create/destroy APIs
Please find reply inline.

> -Original Message-
> From: Andy Shevchenko [mailto:andy.shevche...@gmail.com]
> Sent: Friday, January 06, 2017 3:21 PM
> To: Grandhi, Sainath
> Cc: netdev ; David S. Miller ; mah...@bandewar.net;
> linux-ker...@vger.kernel.org
> Subject: Re: [PATCHv1 5/7] TAP: Extending tap device create/destroy APIs
>
> On Sat, Jan 7, 2017 at 12:33 AM, Sainath Grandhi wrote:
> > Extending tap APIs get/free_minor and create/destroy_cdev to handle
> > more than one type of virtual interface.
> >
> > Signed-off-by: Sainath Grandhi
> > Tested-by: Sainath Grandhi
>
> Usually it implies that committer has tested the stuff.
>
> > --- a/drivers/net/tap.c
> > +++ b/drivers/net/tap.c
> > @@ -99,12 +99,16 @@ static struct proto tap_proto = { };
> >
> > #define TAP_NUM_DEVS (1U << MINORBITS)
>
> > +
> > +LIST_HEAD(major_list);
> > +
>
> static ?

Makes sense. Would take care of it.

>
> > -int tap_get_minor(struct tap_dev *tap)
> > +int tap_get_minor(dev_t major, struct tap_dev *tap)
> > {
> > 	int retval = -ENOMEM;
> > +	struct major_info *tap_major, *tmp;
> > +	bool found = false;
> >
> > -	mutex_lock(&macvtap_major.minor_lock);
> > -	retval = idr_alloc(&macvtap_major.minor_idr, tap, 1, TAP_NUM_DEVS, GFP_KERNEL);
>
> > +	list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
> > +		if (tap_major->major == MAJOR(major)) {
> > +			found = true;
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (!found)
> > +		return -EINVAL;
>
> This is candidate to be a separate helper function. See also below.

Would define a helper function.

>
>
> > -void tap_free_minor(struct tap_dev *tap)
> > +void tap_free_minor(dev_t major, struct tap_dev *tap)
> > {
> > -	mutex_lock(&macvtap_major.minor_lock);
> > +	struct major_info *tap_major, *tmp;
>
> > +	bool found = false;
> > +
> > +	list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
> > +		if (tap_major->major == MAJOR(major)) {
> > +			found = true;
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (!found)
> > +		return;
>
> Here is quite the same code (as above).
>
> > -static struct tap_dev *dev_get_by_tap_minor(int minor)
> > +static struct tap_dev *dev_get_by_tap_file(int major, int minor)
> > {
> > 	struct net_device *dev = NULL;
> > 	struct tap_dev *tap;
> > +	struct major_info *tap_major, *tmp;
> > +	bool found = false;
> >
> > -	mutex_lock(&macvtap_major.minor_lock);
> > -	tap = idr_find(&macvtap_major.minor_idr, minor);
>
> > +	list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
> > +		if (tap_major->major == major) {
> > +			found = true;
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (!found)
> > +		return NULL;
>
> And here.
>
> > +static int tap_list_add(dev_t major, const char *device_name) {
>
> > +	int err = 0;
> > +	struct major_info *tap_major;
>
> Perhaps
> +	struct major_info *tap_major;
> +	int err = 0;
>
> > +
> > +	tap_major = kzalloc(sizeof(*tap_major), GFP_ATOMIC);
> > +
> > +	tap_major->major = MAJOR(major);
> > +
> > +	idr_init(&tap_major->minor_idr);
> > +	mutex_init(&tap_major->minor_lock);
> > +
> > +	tap_major->device_name = device_name;
> > +
> > +	list_add_tail(&tap_major->next, &major_list);
> > +	return err;
>
>
> > +	err = tap_list_add(*tap_major, device_name);
> >
> > 	return err;
>
> return tap_list_add();
>
> > void tap_destroy_cdev(dev_t major, struct cdev *tap_cdev) {
> > +	struct major_info *tap_major, *tmp;
> > +	bool found = false;
> > +
> > +	list_for_each_entry_safe(tap_major, tmp, &major_list, next) {
> > +		if (tap_major->major == MAJOR(major)) {
> > +			found = true;
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (!found)
> > +		return;
>
> And here.
>
> --
> With Best Regards,
> Andy Shevchenko
[PATCH net-next 1/2] net: ipv6: remove nowait arg to rt6_fill_node
All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and simplify rt6_fill_node accordingly. rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the nowait arg from it as well. Signed-off-by: David Ahern --- include/linux/mroute6.h | 2 +- net/ipv6/ip6mr.c| 9 ++--- net/ipv6/route.c| 27 ++- 3 files changed, 13 insertions(+), 25 deletions(-) diff --git a/include/linux/mroute6.h b/include/linux/mroute6.h index 19a1c0c2993b..ce44e3e96d27 100644 --- a/include/linux/mroute6.h +++ b/include/linux/mroute6.h @@ -116,7 +116,7 @@ struct mfc6_cache { struct rtmsg; extern int ip6mr_get_route(struct net *net, struct sk_buff *skb, - struct rtmsg *rtm, int nowait, u32 portid); + struct rtmsg *rtm, u32 portid); #ifdef CONFIG_IPV6_MROUTE extern struct sock *mroute6_socket(struct net *net, struct sk_buff *skb); diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index e275077e8af2..babaf3ec2742 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -2288,7 +2288,7 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb, } int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm, - int nowait, u32 portid) + u32 portid) { int err; struct mr6_table *mrt; @@ -2315,11 +2315,6 @@ int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm, struct net_device *dev; int vif; - if (nowait) { - read_unlock(&mrt_lock); - return -EAGAIN; - } - dev = skb->dev; if (!dev || (vif = ip6mr_find_vif(mrt, dev)) < 0) { read_unlock(&mrt_lock); @@ -2357,7 +2352,7 @@ int ip6mr_get_route(struct net *net, struct sk_buff *skb, struct rtmsg *rtm, return err; } - if (!nowait && (rtm->rtm_flags&RTM_F_NOTIFY)) + if (rtm->rtm_flags & RTM_F_NOTIFY) cache->mfc_flags |= MFC_NOTIFY; err = __ip6mr_fill_mroute(mrt, skb, cache, rtm); diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 4f6b067c8753..b2044dd71724 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3169,7 +3169,7 @@ static int rt6_fill_node(struct net *net, struct sk_buff 
*skb, struct rt6_info *rt, struct in6_addr *dst, struct in6_addr *src, int iif, int type, u32 portid, u32 seq, -int prefix, int nowait, unsigned int flags) +int prefix, unsigned int flags) { u32 metrics[RTAX_MAX]; struct rtmsg *rtm; @@ -3261,19 +3261,12 @@ static int rt6_fill_node(struct net *net, if (iif) { #ifdef CONFIG_IPV6_MROUTE if (ipv6_addr_is_multicast(&rt->rt6i_dst.addr)) { - int err = ip6mr_get_route(net, skb, rtm, nowait, - portid); - - if (err <= 0) { - if (!nowait) { - if (err == 0) - return 0; - goto nla_put_failure; - } else { - if (err == -EMSGSIZE) - goto nla_put_failure; - } - } + int err = ip6mr_get_route(net, skb, rtm, portid); + + if (err == 0) + return 0; + if (err < 0) + goto nla_put_failure; } else #endif if (nla_put_u32(skb, RTA_IIF, iif)) @@ -3342,7 +3335,7 @@ int rt6_dump_route(struct rt6_info *rt, void *p_arg) return rt6_fill_node(arg->net, arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE, NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq, -prefix, 0, NLM_F_MULTI); +prefix, NLM_F_MULTI); } static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh) @@ -3433,7 +3426,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh) err = rt6_fill_node(net, skb, rt, &fl6.daddr, &fl6.saddr, iif, RTM_NEWROUTE, NETLINK_CB(in_skb).portid, - nlh->nlmsg_seq, 0, 0, 0); + nlh->nlmsg_seq, 0, 0); if (err < 0) { kfree_skb(skb); goto errout; @@ -3460,7 +3453,7 @@ void inet6_rt_notify(int event, struct rt6_info *rt, struct nl_info *info, goto errout; err = rt6_fill_node(net, skb, rt, NULL, NULL, 0, - event, inf
[PATCH net-next 2/2] net: ipv6: remove prefix arg to rt6_fill_node
The prefix arg to rt6_fill_node is non-0 in only 1 path - rt6_dump_route where a user is requesting a prefix only dump. Simplify rt6_fill_node by removing the prefix arg and moving the prefix check to rt6_dump_route. Signed-off-by: David Ahern --- net/ipv6/route.c | 27 --- 1 file changed, 12 insertions(+), 15 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index b2044dd71724..5585c501a540 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3169,7 +3169,7 @@ static int rt6_fill_node(struct net *net, struct sk_buff *skb, struct rt6_info *rt, struct in6_addr *dst, struct in6_addr *src, int iif, int type, u32 portid, u32 seq, -int prefix, unsigned int flags) +unsigned int flags) { u32 metrics[RTAX_MAX]; struct rtmsg *rtm; @@ -3177,13 +3177,6 @@ static int rt6_fill_node(struct net *net, long expires; u32 table; - if (prefix) { /* user wants prefix routes only */ - if (!(rt->rt6i_flags & RTF_PREFIX_RT)) { - /* success since this is not a prefix route */ - return 1; - } - } - nlh = nlmsg_put(skb, portid, seq, type, sizeof(*rtm), flags); if (!nlh) return -EMSGSIZE; @@ -3324,18 +3317,22 @@ static int rt6_fill_node(struct net *net, int rt6_dump_route(struct rt6_info *rt, void *p_arg) { struct rt6_rtnl_dump_arg *arg = (struct rt6_rtnl_dump_arg *) p_arg; - int prefix; if (nlmsg_len(arg->cb->nlh) >= sizeof(struct rtmsg)) { struct rtmsg *rtm = nlmsg_data(arg->cb->nlh); - prefix = (rtm->rtm_flags & RTM_F_PREFIX) != 0; - } else - prefix = 0; + + /* user wants prefix routes only */ + if (rtm->rtm_flags & RTM_F_PREFIX && + !(rt->rt6i_flags & RTF_PREFIX_RT)) { + /* success since this is not a prefix route */ + return 1; + } + } return rt6_fill_node(arg->net, arg->skb, rt, NULL, NULL, 0, RTM_NEWROUTE, NETLINK_CB(arg->cb->skb).portid, arg->cb->nlh->nlmsg_seq, -prefix, NLM_F_MULTI); +NLM_F_MULTI); } static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh) @@ -3426,7 +3423,7 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr 
*nlh) err = rt6_fill_node(net, skb, rt, &fl6.daddr, &fl6.saddr, iif, RTM_NEWROUTE, NETLINK_CB(in_skb).portid, - nlh->nlmsg_seq, 0, 0); + nlh->nlmsg_seq, 0); if (err < 0) { kfree_skb(skb); goto errout; @@ -3453,7 +3450,7 @@ void inet6_rt_notify(int event, struct rt6_info *rt, struct nl_info *info, goto errout; err = rt6_fill_node(net, skb, rt, NULL, NULL, 0, - event, info->portid, seq, 0, nlm_flags); + event, info->portid, seq, nlm_flags); if (err < 0) { /* -EMSGSIZE implies BUG in rt6_nlmsg_size() */ WARN_ON(err == -EMSGSIZE); -- 2.1.4
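The refactoring in this patch (filter in the caller, keep the fill routine unaware of prefix-only dumps) can be sketched in plain userspace C. This is only an illustration under stated assumptions: every sketch_* name and both flag values are hypothetical stand-ins, not the kernel definitions.

```c
/* Illustrative flag values for this sketch only; not taken from kernel headers. */
#define SKETCH_RTM_F_PREFIX   0x800u
#define SKETCH_RTF_PREFIX_RT  0x00080000u

/* Hypothetical stand-in for struct rt6_info: only the flags matter here. */
struct sketch_route {
	unsigned int flags;
};

/* Counts how often the fill routine actually runs. */
static int sketch_fill_calls;

/* Stand-in for rt6_fill_node(): it no longer takes a prefix argument. */
static int sketch_fill_node(const struct sketch_route *rt)
{
	(void)rt;
	sketch_fill_calls++;
	return 0;
}

/* Stand-in for rt6_dump_route(): the prefix-only check now lives here. */
static int sketch_dump_route(const struct sketch_route *rt,
			     unsigned int rtm_flags)
{
	/* user wants prefix routes only */
	if ((rtm_flags & SKETCH_RTM_F_PREFIX) &&
	    !(rt->flags & SKETCH_RTF_PREFIX_RT))
		return 1;	/* success since this is not a prefix route */

	return sketch_fill_node(rt);
}
```

The other callers never asked for prefix filtering, so in this arrangement they simply call the fill routine directly.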
[PATCH net-next 0/2] net: ipv6: simplify rt6_fill_node
Remove a couple of unnecessary input arguments to rt6_fill_node.

David Ahern (2):
  net: ipv6: remove nowait arg to rt6_fill_node
  net: ipv6: remove prefix arg to rt6_fill_node

 include/linux/mroute6.h |  2 +-
 net/ipv6/ip6mr.c        |  9 ++---
 net/ipv6/route.c        | 46 ++
 3 files changed, 21 insertions(+), 36 deletions(-)

--
2.1.4
Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()
On 01/17/2017 03:34 PM, Andy Shevchenko wrote:
> On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli wrote:
>> Add a helper function to lookup a device reference given a class name.
>> This is a preliminary patch to remove adhoc code from net/dsa/dsa.c and
>> make it more generic.
>
>
>> +static int device_class_name_match(struct device *dev, void *class)
>
> And why not const char *class?

This was raised back in v2, and the same response applies:

https://www.mail-archive.com/netdev@vger.kernel.org/msg147559.html

Changing the signature of a callback is out of the scope of this patch
series.
--
Florian
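For readers following the thread, the constraint can be illustrated with a small userspace sketch. It assumes a simplified model: the mock_* types and mock_find_child() are hypothetical stand-ins, and only the callback shape, int (*match)(struct device *dev, void *data), mirrors what device_find_child() takes in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct mock_class {
	const char *name;
};

struct mock_device {
	const struct mock_class *cls;
	struct mock_device **children;
	size_t nr_children;
};

/* Same shape as the match callback taken by device_find_child(). */
typedef int (*mock_match_fn)(struct mock_device *dev, void *data);

/* Simplified child walk: returns the first child the callback accepts. */
static struct mock_device *mock_find_child(struct mock_device *parent,
					   void *data, mock_match_fn match)
{
	size_t i;

	for (i = 0; i < parent->nr_children; i++)
		if (match(parent->children[i], data))
			return parent->children[i];

	return NULL;
}

/*
 * The iterator hands the callback an opaque void *, so the class name
 * arrives untyped and is converted inside the callback.
 */
static int mock_class_name_match(struct mock_device *dev, void *data)
{
	const char *name = data;

	return dev->cls && !strcmp(dev->cls->name, name);
}
```

Because the iterator is generic over its data argument, a callback that declared const char * directly would no longer match the function-pointer type; that is the signature change described above as out of scope.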
RE: [PATCHv1 7/7] IPVTAP: IP-VLAN based tap driver
> -Original Message- > From: Mahesh Bandewar (महेश बंडेवार) > [mailto:mahe...@google.com] > Sent: Friday, January 06, 2017 3:47 PM > To: Grandhi, Sainath > Cc: linux-netdev ; David Miller > ; mah...@bandewar.net; linux- > ker...@vger.kernel.org > Subject: Re: [PATCHv1 7/7] IPVTAP: IP-VLAN based tap driver > > few superficial comments inline. > > On Fri, Jan 6, 2017 at 2:33 PM, Sainath Grandhi > wrote: > > This patch adds a tap character device driver that is based on the > > IP-VLAN network interface, called ipvtap. An ipvtap device can be > > created in the same way as an ipvlan device, using 'type ipvtap', and > > then accessed using the tap user space interface. > > > > Signed-off-by: Sainath Grandhi > > Tested-by: Sainath Grandhi > > --- > > drivers/net/Kconfig | 12 ++ > > drivers/net/Makefile | 1 + > > drivers/net/ipvlan/Makefile | 1 + > > drivers/net/ipvlan/ipvlan.h | 7 ++ > > drivers/net/ipvlan/ipvlan_core.c | 5 +- > > drivers/net/ipvlan/ipvlan_main.c | 37 +++--- > > drivers/net/ipvlan/ipvtap.c | 238 > +++ > > 7 files changed, 282 insertions(+), 19 deletions(-) create mode > > 100644 drivers/net/ipvlan/ipvtap.c > > > > diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index > > 280380d..ddfb30a 100644 > > --- a/drivers/net/Kconfig > > +++ b/drivers/net/Kconfig > > @@ -165,6 +165,18 @@ config IPVLAN > >To compile this driver as a module, choose M here: the module > >will be called ipvlan. > > > > +config IPVTAP > > +tristate "IP-VLAN based tap driver" > > +depends on IPVLAN > > +depends on INET > > +help > > + This adds a specialized tap character device driver that is based > > + on the IP-VLAN network interface, called ipvtap. An ipvtap device > > + can be added in the same way as a ipvlan device, using 'type > > + ipvtap', and then be accessed through the tap user space > > interface. > > + > > + To compile this driver as a module, choose M here: the module > > + will be called macvtap. 
> > > > config VXLAN > > tristate "Virtual eXtensible Local Area Network (VXLAN)" > > diff --git a/drivers/net/Makefile b/drivers/net/Makefile index > > 7dd86ca..98ed4d9 100644 > > --- a/drivers/net/Makefile > > +++ b/drivers/net/Makefile > > @@ -7,6 +7,7 @@ > > # > > obj-$(CONFIG_BONDING) += bonding/ > > obj-$(CONFIG_IPVLAN) += ipvlan/ > > +obj-$(CONFIG_IPVTAP) += ipvlan/ > > obj-$(CONFIG_DUMMY) += dummy.o > > obj-$(CONFIG_EQUALIZER) += eql.o > > obj-$(CONFIG_IFB) += ifb.o > > diff --git a/drivers/net/ipvlan/Makefile b/drivers/net/ipvlan/Makefile > > index df79910..8a2c64d 100644 > > --- a/drivers/net/ipvlan/Makefile > > +++ b/drivers/net/ipvlan/Makefile > > @@ -3,5 +3,6 @@ > > # > > > > obj-$(CONFIG_IPVLAN) += ipvlan.o > > +obj-$(CONFIG_IPVTAP) += ipvtap.o > > > > ipvlan-objs := ipvlan_core.o ipvlan_main.o diff --git > > a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h index > > dbfbb33..4362d88 100644 > > --- a/drivers/net/ipvlan/ipvlan.h > > +++ b/drivers/net/ipvlan/ipvlan.h > > @@ -133,4 +133,11 @@ struct sk_buff *ipvlan_l3_rcv(struct net_device > *dev, struct sk_buff *skb, > > u16 proto); unsigned int > > ipvlan_nf_input(void *priv, struct sk_buff *skb, > > const struct nf_hook_state *state); > > +void ipvlan_count_rx(const struct ipvl_dev *ipvlan, > > +unsigned int len, bool success, bool mcast); int > > +ipvlan_link_new(struct net *src_net, struct net_device *dev, > > + struct nlattr *tb[], struct nlattr *data[]); void > > +ipvlan_link_delete(struct net_device *dev, struct list_head *head); > > +void ipvlan_link_setup(struct net_device *dev); int > > +ipvlan_link_register(struct rtnl_link_ops *ops); > > #endif /* __IPVLAN_H */ > > diff --git a/drivers/net/ipvlan/ipvlan_core.c > > b/drivers/net/ipvlan/ipvlan_core.c > > index 83ce74a..9af16ab 100644 > > --- a/drivers/net/ipvlan/ipvlan_core.c > > +++ b/drivers/net/ipvlan/ipvlan_core.c > > @@ -16,8 +16,8 @@ void ipvlan_init_secret(void) > > net_get_random_once(&ipvlan_jhash_secret, > > 
sizeof(ipvlan_jhash_secret)); } > > > > -static void ipvlan_count_rx(const struct ipvl_dev *ipvlan, > > - unsigned int len, bool success, bool mcast) > > +void ipvlan_count_rx(const struct ipvl_dev *ipvlan, > > +unsigned int len, bool success, bool mcast) > > { > > if (!ipvlan) > > return; > > @@ -36,6 +36,7 @@ static void ipvlan_count_rx(const struct ipvl_dev > *ipvlan, > > this_cpu_inc(ipvlan->pcpu_stats->rx_errs); > > } > > } > > +EXPORT_SYMBOL_GPL(ipvlan_count_rx); > Why export, isn't just removing 'static' enough? This function becomes part of "ipvlan" module. "ipvtap" module depends on this function exported by "ipvlan" module. >
RE: [PATCHv1 7/7] IPVTAP: IP-VLAN based tap driver
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> Sent: Friday, January 06, 2017 3:14 PM
> To: Grandhi, Sainath
> Cc: netdev@vger.kernel.org; da...@davemloft.net;
> mah...@bandewar.net; linux-ker...@vger.kernel.org
> Subject: Re: [PATCHv1 7/7] IPVTAP: IP-VLAN based tap driver
>
> On Fri, 2017-01-06 at 14:33 -0800, Sainath Grandhi wrote:
> > This patch adds a tap character device driver that is based on the
> > IP-VLAN network interface, called ipvtap. An ipvtap device can be
> > created in the same way as an ipvlan device, using 'type ipvtap', and
> > then accessed using the tap user space interface.
> >
> > Signed-off-by: Sainath Grandhi
> > Tested-by: Sainath Grandhi
> > ---
> >
> > +module_exit(ipvtap_exit);
> > +MODULE_ALIAS_RTNL_LINK("ipvtap");
> > +MODULE_AUTHOR("Arnd Bergmann ");
> > +MODULE_LICENSE("GPL");
>
> Who wrote this driver exactly ???
>

Sending out next version, modifying this.
Re: Getting a handle on all these new NIC features
On Wed, Jan 18, 2017 at 12:05 AM, Tom Herbert wrote:
> There was some discussion about the problems of dealing with the
> explosion of NIC features in the mlx directory restructuring proposal,
> but I think the is a deeper issue here that should be discussed.
>
> It's hard not to notice that there has been quite a proliferation of
> NIC features in several drivers. This trend had resulted in very
> complex driver code that may or may not segment individual features.
> One visible manifestation of this is number of ndo functions which is
> somewhere around seventy-five now.
>
> I suspect the vast majority of these advances NIC features (e.g.
> bridging, UDP offloads, tc offload, etc.) are only relevant to some of
> the people some of the time. The problem we have, in this case those
> of us that are attempting to deploy and maintain NICs at scale, is
> when we have to deal with the ramifications of these features being
> intertwined with core driver functionality that is relevant to
> everyone. This becomes very obvious when we need to backport drivers
> from later versions of kernel.
>
> I realize that backports of a driver is not a specific concern of the
> Linux kernel, but nevertheless this is a real problem and a fact of
> life for many users. Rebasing the full kernel is still a major effort
> and it seems the best we could ever do is one rebase per year. In the
> interim we need to occasionally backport drivers. Backporting drivers
> is difficult precisely because of new features or API changes to
> existing ones. These sort of changes tend to have a spiderweb of
> dependencies in other parts of the stack so that the number of patches
> we need to cherry-pick goes way beyond those that touch the driver we
> are interested in.
>

I think backporting is not the only concern here. The other main issue
is purely one of software design, and it cannot just be ignored: device
drivers are getting smarter and are doing lots of offloads and logic.
They are not as thin as they used to be, which is also a justification
for why we should take a second (stop coding for a while :-) ) and give
this issue some attention.

> Currently we (FB) need to backport two NIC drivers. I've already gave
> details of backporting mlx5 on the thread to restructure the driver
> directories. The other driver being backporting seems to suffer from
> the same type of feature complexity.
>

Can you share some more about the most complex issues you faced while
backporting? What would have made it simpler had we designed the driver
differently?

> In short, I would like to ask if driver maintainers to start to
> modularize driver features. If something being added is obviously a
> narrow feature that only a subset of users will need can we allow
> config options to #ifdef those out somehow? Furthermore can the file
> and directory structure of drivers reflect that; our lives would be
> _so_ much simpler to maintain drivers in production if we have such
> modularity and the ability to build drivers with the features of our
> choosing.
>

Before we do this or define the plan, there are some questions to be
asked:

1. Can we allow ourselves to have a kconfig or even an internal
   compilation flag per device driver feature?
2. What about previous features? I mean, in order to have a clean and
   clear way to isolate new features, some kind of restructuring or
   core reorganization is required; it is ugly to have a driver with a
   hybrid structure.
3. If we decide to do a restructuring phase as we suggested in the mlx5
   patch, what is the plan for older kernels that still backport fixes
   to the previous structure?
4. What is the concrete plan? Is there a design reference or set of
   guidelines known to someone that everyone can follow?

Anyway, I would like to contribute some thoughts and design techniques
to achieve this modularization and feature isolation by design (at
least for new features):

Device initialization and netdev registration:
- Most device drivers have a main.c which handles driver
  initialization and netdev registration.
- But today this file provides much more than the above.
- I suggest keeping it as thin as possible and dedicated to what it
  should do.
- Keep the HAL (Hardware Abstraction Layer) separated from main.c;
  main should call entry points exposed by the HAL layer.
- Basic netdev features (RX/TX and the most basic ndos for basic
  Ethernet functionality) can still be in main.c.
- Advanced features (eswitch, TC offloads, vxlan and tunneling
  offloads, XDP, etc.) can go to separate file(s) with the full logic
  implementation and clear code locality, wrapped by an #ifdef or
  kconfig flag so they are easy to control, and so that a
  reviewer/developer can logically understand the code and distinguish
  between the different features by looking at the Makefile or the C
  file including those features (just keep the feature logic out of
  main.c).

I've been partially followi
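The feature-isolation idea in the list above could look roughly like the sketch below. It is a minimal userspace illustration, not code from any driver: DEMO_FEATURE_TC_OFFLOAD, struct demo_nic and the demo_* functions are all hypothetical names, and the macro stands in for what would be a Kconfig symbol in a real driver.

```c
/*
 * Hypothetical build-time switch. In a real driver this would be a
 * Kconfig symbol selected per feature; here it defaults to "off".
 */
#ifndef DEMO_FEATURE_TC_OFFLOAD
#define DEMO_FEATURE_TC_OFFLOAD 0
#endif

/* Hypothetical device state; tc_rules is only touched by the feature. */
struct demo_nic {
	int tc_rules;
};

#if DEMO_FEATURE_TC_OFFLOAD
/* The full feature logic would live in its own file, outside main.c. */
static int demo_tc_offload_init(struct demo_nic *nic)
{
	nic->tc_rules = 0;	/* start with an empty rule table */
	return 0;
}
#else
/* Feature compiled out: a no-op stub keeps the core code free of #ifdefs. */
static inline int demo_tc_offload_init(struct demo_nic *nic)
{
	(void)nic;
	return 0;
}
#endif

/* The thin "main.c" part: it only calls the entry point a feature exposes. */
static int demo_nic_probe(struct demo_nic *nic)
{
	nic->tc_rules = -1;	/* -1 means the feature was never initialised */
	return demo_tc_offload_init(nic);
}
```

With the switch off, the probe path compiles to the same call sequence but the feature body is absent, which is the property that would make a feature easy to leave out of a backport.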
[PATCH net-next] lwtunnel: remove device arg to lwtunnel_build_state
Nothing about lwt state requires a device reference, so remove the input argument. Signed-off-by: David Ahern --- include/net/lwtunnel.h| 6 +++--- net/core/lwt_bpf.c| 2 +- net/core/lwtunnel.c | 4 ++-- net/ipv4/fib_semantics.c | 22 ++ net/ipv4/ip_tunnel_core.c | 4 ++-- net/ipv6/ila/ila_lwt.c| 2 +- net/ipv6/route.c | 2 +- net/ipv6/seg6_iptunnel.c | 2 +- net/mpls/mpls_iptunnel.c | 2 +- 9 files changed, 18 insertions(+), 28 deletions(-) diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h index d4c1c75b8862..671d5a766dd9 100644 --- a/include/net/lwtunnel.h +++ b/include/net/lwtunnel.h @@ -33,7 +33,7 @@ struct lwtunnel_state { }; struct lwtunnel_encap_ops { - int (*build_state)(struct net_device *dev, struct nlattr *encap, + int (*build_state)(struct nlattr *encap, unsigned int family, const void *cfg, struct lwtunnel_state **ts); void (*destroy_state)(struct lwtunnel_state *lws); @@ -105,7 +105,7 @@ int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op, unsigned int num); int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op, unsigned int num); -int lwtunnel_build_state(struct net_device *dev, u16 encap_type, +int lwtunnel_build_state(u16 encap_type, struct nlattr *encap, unsigned int family, const void *cfg, struct lwtunnel_state **lws); @@ -168,7 +168,7 @@ static inline int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op, return -EOPNOTSUPP; } -static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type, +static inline int lwtunnel_build_state(u16 encap_type, struct nlattr *encap, unsigned int family, const void *cfg, struct lwtunnel_state **lws) diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c index 71bb3e2eca08..4b737a2e5457 100644 --- a/net/core/lwt_bpf.c +++ b/net/core/lwt_bpf.c @@ -237,7 +237,7 @@ static const struct nla_policy bpf_nl_policy[LWT_BPF_MAX + 1] = { [LWT_BPF_XMIT_HEADROOM] = { .type = NLA_U32 }, }; -static int bpf_build_state(struct net_device *dev, struct nlattr *nla, +static int 
bpf_build_state(struct nlattr *nla, unsigned int family, const void *cfg, struct lwtunnel_state **ts) { diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c index a5d4e866ce88..0f30398e0bdd 100644 --- a/net/core/lwtunnel.c +++ b/net/core/lwtunnel.c @@ -100,7 +100,7 @@ int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *ops, } EXPORT_SYMBOL(lwtunnel_encap_del_ops); -int lwtunnel_build_state(struct net_device *dev, u16 encap_type, +int lwtunnel_build_state(u16 encap_type, struct nlattr *encap, unsigned int family, const void *cfg, struct lwtunnel_state **lws) { @@ -127,7 +127,7 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type, } #endif if (likely(ops && ops->build_state)) - ret = ops->build_state(dev, encap, family, cfg, lws); + ret = ops->build_state(encap, family, cfg, lws); rcu_read_unlock(); return ret; diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 9a375b908d01..f57efe73b84f 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -471,7 +471,6 @@ static int fib_count_nexthops(struct rtnexthop *rtnh, int remaining) static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh, int remaining, struct fib_config *cfg) { - struct net *net = cfg->fc_nlinfo.nl_net; int ret; change_nexthops(fi) { @@ -503,16 +502,14 @@ static int fib_get_nhs(struct fib_info *fi, struct rtnexthop *rtnh, nla = nla_find(attrs, attrlen, RTA_ENCAP); if (nla) { struct lwtunnel_state *lwtstate; - struct net_device *dev = NULL; struct nlattr *nla_entype; nla_entype = nla_find(attrs, attrlen, RTA_ENCAP_TYPE); if (!nla_entype) goto err_inval; - if (cfg->fc_oif) - dev = __dev_get_by_index(net, cfg->fc_oif); - ret = lwtunnel_build_state(dev, nla_get_u16( + + ret = lwtunnel_build_state(nla_get_u16( nla_entype), nla, AF_INET, cfg,
[PATCH net-next v4 06/10] net: dsa: Migrate to device_find_in_class_name()
Now that the base device driver code provides an identical
implementation of dev_find_class() utilize device_find_in_class_name()
instead of our own version of it.

Signed-off-by: Florian Fainelli
---
 net/dsa/dsa.c | 22 ++
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 2306d1b87c83..d9db63910887 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -455,29 +455,11 @@ EXPORT_SYMBOL_GPL(dsa_switch_resume);
 #endif
 
 /* platform driver init and cleanup */
-static int dev_is_class(struct device *dev, void *class)
-{
-	if (dev->class != NULL && !strcmp(dev->class->name, class))
-		return 1;
-
-	return 0;
-}
-
-static struct device *dev_find_class(struct device *parent, char *class)
-{
-	if (dev_is_class(parent, class)) {
-		get_device(parent);
-		return parent;
-	}
-
-	return device_find_child(parent, class, dev_is_class);
-}
-
 struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev)
 {
 	struct device *d;
 
-	d = dev_find_class(dev, "mdio_bus");
+	d = device_find_in_class_name(dev, "mdio_bus");
 	if (d != NULL) {
 		struct mii_bus *bus;
 
@@ -495,7 +477,7 @@ static struct net_device *dev_to_net_device(struct device *dev)
 {
 	struct device *d;
 
-	d = dev_find_class(dev, "net");
+	d = device_find_in_class_name(dev, "net");
 	if (d != NULL) {
 		struct net_device *nd;
 
--
2.9.3
[PATCH net-next v4 02/10] net: dsa: Make most functions take a dsa_port argument
In preparation for allowing platform data, and therefore no valid device_node pointer, make most DSA functions takes a pointer to a dsa_port structure whenever possible. While at it, introduce a dsa_port_is_valid() helper function which checks whether port->dn is NULL or not at the moment. Signed-off-by: Florian Fainelli --- net/dsa/dsa.c | 15 -- net/dsa/dsa2.c | 61 +- net/dsa/dsa_priv.h | 4 ++-- 3 files changed, 44 insertions(+), 36 deletions(-) diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index fd532487dfdf..2306d1b87c83 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -110,8 +110,9 @@ dsa_switch_probe(struct device *parent, struct device *host_dev, int sw_addr, /* basic switch operations **/ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev, - struct device_node *port_dn, int port) + struct dsa_port *dport, int port) { + struct device_node *port_dn = dport->dn; struct phy_device *phydev; int ret, mode; @@ -141,15 +142,15 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev, static int dsa_cpu_dsa_setups(struct dsa_switch *ds, struct device *dev) { - struct device_node *port_dn; + struct dsa_port *dport; int ret, port; for (port = 0; port < DSA_MAX_PORTS; port++) { if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))) continue; - port_dn = ds->ports[port].dn; - ret = dsa_cpu_dsa_setup(ds, dev, port_dn, port); + dport = &ds->ports[port]; + ret = dsa_cpu_dsa_setup(ds, dev, dport, port); if (ret) return ret; } @@ -366,8 +367,10 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index, return ds; } -void dsa_cpu_dsa_destroy(struct device_node *port_dn) +void dsa_cpu_dsa_destroy(struct dsa_port *port) { + struct device_node *port_dn = port->dn; + if (of_phy_is_fixed_link(port_dn)) of_phy_deregister_fixed_link(port_dn); } @@ -393,7 +396,7 @@ static void dsa_switch_destroy(struct dsa_switch *ds) for (port = 0; port < DSA_MAX_PORTS; port++) { if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))) continue; - 
dsa_cpu_dsa_destroy(ds->ports[port].dn); + dsa_cpu_dsa_destroy(&ds->ports[port]); /* Clearing a bit which is not set does no harm */ ds->cpu_port_mask |= ~(1 << port); diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index 4170f7ea8e28..6e3675220fef 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -79,14 +79,19 @@ static void dsa_dst_del_ds(struct dsa_switch_tree *dst, kref_put(&dst->refcount, dsa_free_dst); } -static bool dsa_port_is_dsa(struct device_node *port) +static bool dsa_port_is_valid(struct dsa_port *port) { - return !!of_parse_phandle(port, "link", 0); + return !!port->dn; } -static bool dsa_port_is_cpu(struct device_node *port) +static bool dsa_port_is_dsa(struct dsa_port *port) { - return !!of_parse_phandle(port, "ethernet", 0); + return !!of_parse_phandle(port->dn, "link", 0); +} + +static bool dsa_port_is_cpu(struct dsa_port *port) +{ + return !!of_parse_phandle(port->dn, "ethernet", 0); } static bool dsa_ds_find_port(struct dsa_switch *ds, @@ -120,7 +125,7 @@ static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst, static int dsa_port_complete(struct dsa_switch_tree *dst, struct dsa_switch *src_ds, -struct device_node *port, +struct dsa_port *port, u32 src_port) { struct device_node *link; @@ -128,7 +133,7 @@ static int dsa_port_complete(struct dsa_switch_tree *dst, struct dsa_switch *dst_ds; for (index = 0;; index++) { - link = of_parse_phandle(port, "link", index); + link = of_parse_phandle(port->dn, "link", index); if (!link) break; @@ -151,13 +156,13 @@ static int dsa_port_complete(struct dsa_switch_tree *dst, */ static int dsa_ds_complete(struct dsa_switch_tree *dst, struct dsa_switch *ds) { - struct device_node *port; + struct dsa_port *port; u32 index; int err; for (index = 0; index < DSA_MAX_PORTS; index++) { - port = ds->ports[index].dn; - if (!port) + port = &ds->ports[index]; + if (!dsa_port_is_valid(port)) continue; if (!dsa_port_is_dsa(port)) @@ -197,7 +202,7 @@ static int dsa_dst_complete(struct dsa_switch_tree *dst) 
return 0; } -static int dsa_dsa_port_apply(struct device_node *port, u3
Re: [PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()
On Wed, Jan 18, 2017 at 1:21 AM, Florian Fainelli wrote:
> Add a helper function to lookup a device reference given a class name.
> This is a preliminary patch to remove adhoc code from net/dsa/dsa.c and
> make it more generic.

> +static int device_class_name_match(struct device *dev, void *class)

And why not const char *class?

> +{
> +	if (dev->class != NULL && !strcmp(dev->class->name, class))

if (dev->class && ...)

> +		return 1;
> +
> +	return 0;

Perhaps even one line:

return dev->class && ...;

> +}
> +
> +/**
> + * device_find_in_class_name - device iterator for locating a particular device
> + * within the specified class name
> + * @parent: parent struct device
> + * @class_name: Class name to match against
> + *
> + * This function returns 1 if the device (specified by @parent), or one of its child
> + * is in the class whose name is specified by @class_name. Returns 0 otherwise.
> + *
> + * NOTE: you will need to drop the reference with put_device() after use.
> + */
> +struct device *device_find_in_class_name(struct device *parent,
> +					 char *class_name)

const char *class_name

> +{
> +	if (device_class_name_match(parent, class_name)) {
> +		get_device(parent);
> +		return parent;
> +	}
> +
> +	return device_find_child(parent, class_name, device_class_name_match);
> +}
> +EXPORT_SYMBOL_GPL(device_find_in_class_name);

> +extern struct device *device_find_in_class_name(struct device *parent,
> +						char *class_name);

Ditto.

--
With Best Regards,
Andy Shevchenko
[PATCH net-next v4 07/10] net: Relocate dev_to_net_device() into net/core/dev.c
dev_to_net_device() is moved from net/dsa/dsa.c to net/core/dev.c since it going to be used by net/dsa/dsa2.c and the namespace of the function justifies making it available to other users potentially. We also rename it to device_to_net_device() to better illustrate what it does since it is not just a container_of() wrapper. Signed-off-by: Florian Fainelli --- include/linux/netdevice.h | 2 ++ net/core/dev.c| 30 ++ net/dsa/dsa.c | 20 +--- 3 files changed, 33 insertions(+), 19 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 97ae0ac513ee..f8cc9833107c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4390,4 +4390,6 @@ do { \ #define PTYPE_HASH_SIZE(16) #define PTYPE_HASH_MASK(PTYPE_HASH_SIZE - 1) +struct net_device *device_to_net_device(struct device *dev); + #endif /* _LINUX_NETDEVICE_H */ diff --git a/net/core/dev.c b/net/core/dev.c index ad5959e56116..f6897906f229 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -8128,6 +8128,36 @@ const char *netdev_drivername(const struct net_device *dev) return empty; } +/** + * device_to_net_device - return the net_device from device + * @dev: device reference + * + * Returns the net_device associated with this device reference + * NULL if the device is not a network device, or could not be + * found. + * + * Note: caller must call dev_put() to release the net_device + * once done with it. 
+ */ +struct net_device *device_to_net_device(struct device *dev) +{ + struct device *d; + + d = device_find_in_class_name(dev, "net"); + if (d) { + struct net_device *nd; + + nd = to_net_dev(d); + dev_hold(nd); + put_device(d); + + return nd; + } + + return NULL; +} +EXPORT_SYMBOL_GPL(device_to_net_device); + static void __netdev_printk(const char *level, const struct net_device *dev, struct va_format *vaf) { diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index d9db63910887..88b56f7e3dd2 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -473,24 +473,6 @@ struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev) } EXPORT_SYMBOL_GPL(dsa_host_dev_to_mii_bus); -static struct net_device *dev_to_net_device(struct device *dev) -{ - struct device *d; - - d = device_find_in_class_name(dev, "net"); - if (d != NULL) { - struct net_device *nd; - - nd = to_net_dev(d); - dev_hold(nd); - put_device(d); - - return nd; - } - - return NULL; -} - #ifdef CONFIG_OF static int dsa_of_setup_routing_table(struct dsa_platform_data *pd, struct dsa_chip_data *cd, @@ -799,7 +781,7 @@ static int dsa_probe(struct platform_device *pdev) dev = pd->of_netdev; dev_hold(dev); } else { - dev = dev_to_net_device(pd->netdev); + dev = device_to_net_device(pd->netdev); } if (dev == NULL) { ret = -EPROBE_DEFER; -- 2.9.3
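The reference handoff in device_to_net_device() (take the net_device reference first, then drop the temporary struct device reference from the lookup) can be followed in a small userspace model. Everything here is a hypothetical stand-in: the mock_* types use plain integer counters, and the lookup is reduced to "the candidate itself matches".

```c
#include <stddef.h>

/* Hypothetical stand-ins with plain counters instead of kernel refcounts. */
struct mock_device {
	int refcount;
};

struct mock_net_device {
	struct mock_device dev;	/* embedded, as in struct net_device */
	int refcount;
};

static void mock_get_device(struct mock_device *d) { d->refcount++; }
static void mock_put_device(struct mock_device *d) { d->refcount--; }
static void mock_dev_hold(struct mock_net_device *nd) { nd->refcount++; }
static void mock_dev_put(struct mock_net_device *nd) { nd->refcount--; }

/* Simplified lookup: like the real helper, it returns with a reference held. */
static struct mock_device *mock_find_in_class(struct mock_net_device *candidate)
{
	if (!candidate)
		return NULL;

	mock_get_device(&candidate->dev);
	return &candidate->dev;
}

/* Mirrors the order of operations in device_to_net_device(). */
static struct mock_net_device *
mock_device_to_net_device(struct mock_net_device *candidate)
{
	struct mock_device *d = mock_find_in_class(candidate);
	struct mock_net_device *nd;

	if (!d)
		return NULL;

	/* container_of() equivalent: recover the enclosing structure. */
	nd = (struct mock_net_device *)((char *)d -
			offsetof(struct mock_net_device, dev));
	mock_dev_hold(nd);	/* reference handed to the caller */
	mock_put_device(d);	/* drop the temporary lookup reference */

	return nd;
}
```

In this model the caller ends up owning exactly one net_device reference, released with mock_dev_put(), and the embedded device counter is back to its starting value.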
[PATCH net-next v4 04/10] net: dsa: Move ports assignment closer to error checking
Move the assignment of ports in _dsa_register_switch() closer to where
it is checked, no functional change. Re-order declarations to preserve
the inverted christmas tree style.

Signed-off-by: Florian Fainelli
---
 net/dsa/dsa2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 04ab62251fe3..cd91070b5467 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -587,8 +587,8 @@ static struct device_node *dsa_get_ports(struct dsa_switch *ds,
 static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
 	struct device_node *np = dev->of_node;
-	struct device_node *ports = dsa_get_ports(ds, np);
 	struct dsa_switch_tree *dst;
+	struct device_node *ports;
 	u32 tree, index;
 	int i, err;
 
@@ -596,6 +596,7 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 	if (err)
 		return err;
 
+	ports = dsa_get_ports(ds, np);
 	if (IS_ERR(ports))
 		return PTR_ERR(ports);
 
--
2.9.3
[PATCH net-next v4 03/10] net: dsa: Suffix function manipulating device_node with _dn
Make it clear that these functions take a device_node structure pointer Signed-off-by: Florian Fainelli --- net/dsa/dsa2.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index 6e3675220fef..04ab62251fe3 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -94,8 +94,8 @@ static bool dsa_port_is_cpu(struct dsa_port *port) return !!of_parse_phandle(port->dn, "ethernet", 0); } -static bool dsa_ds_find_port(struct dsa_switch *ds, -struct device_node *port) +static bool dsa_ds_find_port_dn(struct dsa_switch *ds, + struct device_node *port) { u32 index; @@ -105,8 +105,8 @@ static bool dsa_ds_find_port(struct dsa_switch *ds, return false; } -static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst, - struct device_node *port) +static struct dsa_switch *dsa_dst_find_port_dn(struct dsa_switch_tree *dst, + struct device_node *port) { struct dsa_switch *ds; u32 index; @@ -116,7 +116,7 @@ static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst, if (!ds) continue; - if (dsa_ds_find_port(ds, port)) + if (dsa_ds_find_port_dn(ds, port)) return ds; } @@ -137,7 +137,7 @@ static int dsa_port_complete(struct dsa_switch_tree *dst, if (!link) break; - dst_ds = dsa_dst_find_port(dst, link); + dst_ds = dsa_dst_find_port_dn(dst, link); of_node_put(link); if (!dst_ds) @@ -546,7 +546,7 @@ static int dsa_parse_ports_dn(struct device_node *ports, struct dsa_switch *ds) return 0; } -static int dsa_parse_member(struct device_node *np, u32 *tree, u32 *index) +static int dsa_parse_member_dn(struct device_node *np, u32 *tree, u32 *index) { int err; @@ -592,7 +592,7 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev) u32 tree, index; int i, err; - err = dsa_parse_member(np, &tree, &index); + err = dsa_parse_member_dn(np, &tree, &index); if (err) return err; -- 2.9.3
Re: [PATCH 2/2] at803x: double check SGMII side autoneg
On 10/24/2016 05:40 AM, Zefir Kurtisi wrote:

As a result, if you ever see a warning '803x_aneg_done: SGMII link is not
ok' you will end up having an Ethernet link up but won't get any data
through. This should not happen, if it does, please contact the module
maintainer.

I am now seeing this:

ubuntu@ubuntu:~$ ifup eth1
ubuntu@ubuntu:~$ [ 588.687689] 803x_aneg_done: SGMII link is not ok
[ 588.694909] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 588.703985] qcom-emac QCOM8070:00 eth1: Link is Up - 1Gbps/Full - flow control rx/tx

ubuntu@ubuntu:~$ ping 192.168.3.1
PING 192.168.3.1 (192.168.3.1) 56(84) bytes of data.
64 bytes from 192.168.3.1: icmp_seq=1 ttl=64 time=0.502 ms
64 bytes from 192.168.3.1: icmp_seq=2 ttl=64 time=0.244 ms
64 bytes from 192.168.3.1: icmp_seq=3 ttl=64 time=0.220 ms
^C
--- 192.168.3.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2107ms
rtt min/avg/max/mdev = 0.220/0.322/0.502/0.127 ms

So I do get the "SGMII link is not ok" message, but my connection is fine.

I don't know why the link-up message is displayed twice. It's only
displayed once if I use the genphy driver instead of the at803x driver.
I'm going to debug the at803x to see what it does that causes the double
link-up message.

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH net-next v4 05/10] drivers: base: Add device_find_in_class_name()
Add a helper function to lookup a device reference given a class name.
This is a preliminary patch to remove adhoc code from net/dsa/dsa.c and
make it more generic.

Signed-off-by: Florian Fainelli
---
 drivers/base/core.c    | 31 +++++++++++++++++++++++++++++++
 include/linux/device.h |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 8c25e68e67d7..fb9fced38634 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2058,6 +2058,37 @@ struct device *device_find_child(struct device *parent, void *data,
 }
 EXPORT_SYMBOL_GPL(device_find_child);
 
+static int device_class_name_match(struct device *dev, void *class)
+{
+	if (dev->class != NULL && !strcmp(dev->class->name, class))
+		return 1;
+
+	return 0;
+}
+
+/**
+ * device_find_in_class_name - device iterator for locating a particular device
+ * within the specified class name
+ * @parent: parent struct device
+ * @class_name: Class name to match against
+ *
+ * This function returns 1 if the device (specified by @parent), or one of its child
+ * is in the class whose name is specified by @class_name. Returns 0 otherwise.
+ *
+ * NOTE: you will need to drop the reference with put_device() after use.
+ */
+struct device *device_find_in_class_name(struct device *parent,
+					 char *class_name)
+{
+	if (device_class_name_match(parent, class_name)) {
+		get_device(parent);
+		return parent;
+	}
+
+	return device_find_child(parent, class_name, device_class_name_match);
+}
+EXPORT_SYMBOL_GPL(device_find_in_class_name);
+
 int __init devices_init(void)
 {
 	devices_kset = kset_create_and_add("devices", &device_uevent_ops, NULL);
diff --git a/include/linux/device.h b/include/linux/device.h
index 491b4c0ca633..fbc2a255f92e 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1120,6 +1120,8 @@ extern int device_for_each_child_reverse(struct device *dev, void *data,
 		int (*fn)(struct device *dev, void *data));
 extern struct device *device_find_child(struct device *dev, void *data,
 		int (*match)(struct device *dev, void *data));
+extern struct device *device_find_in_class_name(struct device *parent,
+						char *class_name);
 extern int device_rename(struct device *dev, const char *new_name);
 extern int device_move(struct device *dev, struct device *new_parent,
 		       enum dpm_order dpm_order);
-- 
2.9.3
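For readers following the calling convention, here is a minimal userspace
sketch of the lookup in the patch above. It is not kernel code: the mock_*
types and functions are hypothetical stand-ins for struct device, struct
class, get_device() and put_device(), and the child walk assumes that only
direct children are searched, as device_find_child() does. Note that the
function as written in the patch returns a device pointer (or NULL), and the
sketch follows the code rather than the "returns 1 / returns 0" wording of
the kerneldoc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for struct class and struct device. */
struct mock_class {
	const char *name;
};

struct mock_device {
	const struct mock_class *cls;	/* NULL when the device has no class */
	struct mock_device **children;	/* direct children, may be NULL */
	size_t nr_children;
	int refcount;			/* stand-in for the device reference count */
};

/* Same test as device_class_name_match() in the patch. */
static int mock_class_name_match(const struct mock_device *dev,
				 const char *class_name)
{
	return dev->cls != NULL && strcmp(dev->cls->name, class_name) == 0;
}

/*
 * Return @parent if it belongs to @class_name, otherwise the first direct
 * child that does, otherwise NULL. A returned device has its reference
 * count raised; the caller drops it with mock_put_device().
 */
static struct mock_device *mock_find_in_class_name(struct mock_device *parent,
						   const char *class_name)
{
	size_t i;

	assert(parent != NULL && class_name != NULL);

	if (mock_class_name_match(parent, class_name)) {
		parent->refcount++;
		return parent;
	}

	for (i = 0; i < parent->nr_children; i++) {
		struct mock_device *child = parent->children[i];

		if (mock_class_name_match(child, class_name)) {
			child->refcount++;
			return child;
		}
	}

	return NULL;
}

/* Drop a reference taken by mock_find_in_class_name(). */
static void mock_put_device(struct mock_device *dev)
{
	if (dev)
		dev->refcount--;
}
```

In the DSA case described earlier in the thread, the parent would be the
platform device (for example f1074000.ethernet) and the class name "net" or
"mdio_bus"; the caller is expected to balance every successful lookup with a
put.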
[PATCH net-next v4 08/10] net: dsa: Add support for platform data
Allow drivers to use the new DSA API with platform data. Most of the code in net/dsa/dsa2.c does not rely so much on device_nodes and can get the same information from platform_data instead. We purposely do not support distributed configurations with platform data, so drivers should be providing a pointer to a 'struct dsa_chip_data' structure if they wish to communicate per-port layout. Multiple CPUs port could potentially be supported and dsa_chip_data is extended to receive up to one reference to an upstream network device per port described by a dsa_chip_data structure. Signed-off-by: Florian Fainelli --- include/net/dsa.h | 6 net/dsa/dsa2.c| 102 -- 2 files changed, 90 insertions(+), 18 deletions(-) diff --git a/include/net/dsa.h b/include/net/dsa.h index 16a502a6c26a..491008792e4d 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -42,6 +42,11 @@ struct dsa_chip_data { struct device *host_dev; int sw_addr; + /* +* Reference to network devices +*/ + struct device *netdev[DSA_MAX_PORTS]; + /* set to size of eeprom if supported by the switch */ int eeprom_len; @@ -140,6 +145,7 @@ struct dsa_switch_tree { }; struct dsa_port { + const char *name; struct net_device *netdev; struct device_node *dn; unsigned intageing_time; diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index cd91070b5467..761e8724423f 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -79,19 +79,28 @@ static void dsa_dst_del_ds(struct dsa_switch_tree *dst, kref_put(&dst->refcount, dsa_free_dst); } +/* For platform data configurations, we need to have a valid name argument to + * differentiate a disabled port from an enabled one + */ static bool dsa_port_is_valid(struct dsa_port *port) { - return !!port->dn; + return !!(port->dn || port->name); } static bool dsa_port_is_dsa(struct dsa_port *port) { - return !!of_parse_phandle(port->dn, "link", 0); + if (port->name && !strcmp(port->name, "dsa")) + return true; + else + return !!of_parse_phandle(port->dn, "link", 0); } static bool 
dsa_port_is_cpu(struct dsa_port *port) { - return !!of_parse_phandle(port->dn, "ethernet", 0); + if (port->name && !strcmp(port->name, "cpu")) + return true; + else + return !!of_parse_phandle(port->dn, "ethernet", 0); } static bool dsa_ds_find_port_dn(struct dsa_switch *ds, @@ -251,10 +260,11 @@ static void dsa_cpu_port_unapply(struct dsa_port *port, u32 index, static int dsa_user_port_apply(struct dsa_port *port, u32 index, struct dsa_switch *ds) { - const char *name; + const char *name = port->name; int err; - name = of_get_property(port->dn, "label", NULL); + if (port->dn) + name = of_get_property(port->dn, "label", NULL); if (!name) name = "eth%d"; @@ -439,11 +449,15 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index, struct net_device *ethernet_dev; struct device_node *ethernet; - ethernet = of_parse_phandle(port->dn, "ethernet", 0); - if (!ethernet) - return -EINVAL; + if (port->dn) { + ethernet = of_parse_phandle(port->dn, "ethernet", 0); + if (!ethernet) + return -EINVAL; + ethernet_dev = of_find_net_device_by_node(ethernet); + } else { + ethernet_dev = device_to_net_device(ds->cd->netdev[index]); + } - ethernet_dev = of_find_net_device_by_node(ethernet); if (!ethernet_dev) return -EPROBE_DEFER; @@ -462,6 +476,7 @@ static int dsa_cpu_parse(struct dsa_port *port, u32 index, dst->tag_ops = dsa_resolve_tag_protocol(tag_protocol); if (IS_ERR(dst->tag_ops)) { dev_warn(ds->dev, "No tagger for this switch\n"); + dev_put(ethernet_dev); return PTR_ERR(dst->tag_ops); } @@ -546,6 +561,33 @@ static int dsa_parse_ports_dn(struct device_node *ports, struct dsa_switch *ds) return 0; } +static int dsa_parse_ports(struct dsa_chip_data *cd, struct dsa_switch *ds) +{ + bool valid_name_found = false; + unsigned int i; + + for (i = 0; i < DSA_MAX_PORTS; i++) { + if (!cd->port_names[i]) + continue; + + ds->ports[i].name = cd->port_names[i]; + + /* Initialize enabled_port_mask now for drv->setup() +* to have access to a correct value, just like what +* 
net/dsa/dsa.c::dsa_switch_setup_one does. +*/ + if (!dsa_port_is_cpu(&ds->ports[i])) + ds->enabled_port_mask |= 1 << i; + +
[PATCH net-next v4 09/10] net: phy: Allow pre-declaration of MDIO devices
Allow board support code to collect pre-declarations for MDIO devices by registering them with mdiobus_register_board_info(). SPI and I2C buses have a similar feature, we were missing this for MDIO devices, but this is particularly useful for e.g: MDIO-connected switches which need to provide their port layout (often board-specific) to a MDIO Ethernet switch driver. Signed-off-by: Florian Fainelli --- drivers/net/phy/Makefile | 3 +- drivers/net/phy/mdio-boardinfo.c | 86 drivers/net/phy/mdio-boardinfo.h | 19 + drivers/net/phy/mdio_bus.c | 4 ++ drivers/net/phy/mdio_device.c| 11 + include/linux/mdio.h | 3 ++ include/linux/mod_devicetable.h | 1 + include/linux/phy.h | 19 + 8 files changed, 145 insertions(+), 1 deletion(-) create mode 100644 drivers/net/phy/mdio-boardinfo.c create mode 100644 drivers/net/phy/mdio-boardinfo.h diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index 356859ac7c18..407b0b601ea8 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -1,6 +1,7 @@ # Makefile for Linux PHY drivers and MDIO bus drivers -libphy-y := phy.o phy_device.o mdio_bus.o mdio_device.o +libphy-y := phy.o phy_device.o mdio_bus.o mdio_device.o \ + mdio-boardinfo.o libphy-$(CONFIG_SWPHY) += swphy.o libphy-$(CONFIG_LED_TRIGGER_PHY) += phy_led_triggers.o diff --git a/drivers/net/phy/mdio-boardinfo.c b/drivers/net/phy/mdio-boardinfo.c new file mode 100644 index ..6b988f77da08 --- /dev/null +++ b/drivers/net/phy/mdio-boardinfo.c @@ -0,0 +1,86 @@ +/* + * mdio-boardinfo - Collect pre-declarations for MDIO devices + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. 
+ */ + +#include +#include +#include +#include +#include + +#include "mdio-boardinfo.h" + +static LIST_HEAD(mdio_board_list); +static DEFINE_MUTEX(mdio_board_lock); + +/** + * mdiobus_setup_mdiodev_from_board_info - create and setup MDIO devices + * from pre-collected board specific MDIO information + * @mdiodev: MDIO device pointer + * Context: can sleep + */ +void mdiobus_setup_mdiodev_from_board_info(struct mii_bus *bus) +{ + struct mdio_board_entry *be; + struct mdio_device *mdiodev; + struct mdio_board_info *bi; + int ret; + + mutex_lock(&mdio_board_lock); + list_for_each_entry(be, &mdio_board_list, list) { + bi = &be->board_info; + + if (strcmp(bus->id, bi->bus_id)) + continue; + + mdiodev = mdio_device_create(bus, bi->mdio_addr); + if (IS_ERR(mdiodev)) + continue; + + strncpy(mdiodev->modalias, bi->modalias, + sizeof(mdiodev->modalias)); + mdiodev->bus_match = mdio_device_bus_match; + mdiodev->dev.platform_data = (void *)bi->platform_data; + + ret = mdio_device_register(mdiodev); + if (ret) { + mdio_device_free(mdiodev); + continue; + } + } + mutex_unlock(&mdio_board_lock); +} + +/** + * mdio_register_board_info - register MDIO devices for a given board + * @info: array of devices descriptors + * @n: number of descriptors provided + * Context: can sleep + * + * The board info passed can be marked with __initdata but be pointers + * such as platform_data etc. 
are copied as-is + */ +int mdiobus_register_board_info(const struct mdio_board_info *info, + unsigned int n) +{ + struct mdio_board_entry *be; + unsigned int i; + + be = kcalloc(n, sizeof(*be), GFP_KERNEL); + if (!be) + return -ENOMEM; + + for (i = 0; i < n; i++, be++, info++) { + memcpy(&be->board_info, info, sizeof(*info)); + mutex_lock(&mdio_board_lock); + list_add_tail(&be->list, &mdio_board_list); + mutex_unlock(&mdio_board_lock); + } + + return 0; +} diff --git a/drivers/net/phy/mdio-boardinfo.h b/drivers/net/phy/mdio-boardinfo.h new file mode 100644 index ..00f98163e90e --- /dev/null +++ b/drivers/net/phy/mdio-boardinfo.h @@ -0,0 +1,19 @@ +/* + * mdio-boardinfo.h - board info interface internal to the mdio_bus + * component + */ + +#ifndef __MDIO_BOARD_INFO_H +#define __MDIO_BOARD_INFO_H + +#include +#include + +struct mdio_board_entry { + struct list_headlist; + struct mdio_board_info board_info; +}; + +void mdiobus_setup_mdiodev_from_board_info(struct mii_bus *bus); + +#endif /* __MDIO_BOARD_INFO_H */ diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net
[PATCH net-next v4 10/10] ARM: orion: Register DSA switch as a MDIO device
Utilize the ability to pass board specific MDIO bus information towards a particular MDIO device thus allowing us to provide the per-port switch layout to the Marvell 88E6XXX switch driver. Since we would end-up with conflicting registration paths, do not register the "dsa" platform device anymore. Note that the MDIO devices registered by code in net/dsa/dsa2.c does not parse a dsa_platform_data, but directly take a dsa_chip_data (specific to a single switch chip), so we update the different call sites to pass this structure down to orion_ge00_switch_init(). Signed-off-by: Florian Fainelli --- arch/arm/mach-orion5x/common.c | 2 +- arch/arm/mach-orion5x/common.h | 4 ++-- arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c | 7 +-- arch/arm/mach-orion5x/rd88f5181l-ge-setup.c | 7 +-- arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c | 7 +-- arch/arm/mach-orion5x/wnr854t-setup.c| 2 +- arch/arm/mach-orion5x/wrt350n-v2-setup.c | 7 +-- arch/arm/plat-orion/common.c | 25 +++-- arch/arm/plat-orion/include/plat/common.h| 4 ++-- 9 files changed, 29 insertions(+), 36 deletions(-) diff --git a/arch/arm/mach-orion5x/common.c b/arch/arm/mach-orion5x/common.c index 04910764c385..83a7ec4c16d0 100644 --- a/arch/arm/mach-orion5x/common.c +++ b/arch/arm/mach-orion5x/common.c @@ -105,7 +105,7 @@ void __init orion5x_eth_init(struct mv643xx_eth_platform_data *eth_data) /* * Ethernet switch / -void __init orion5x_eth_switch_init(struct dsa_platform_data *d) +void __init orion5x_eth_switch_init(struct dsa_chip_data *d) { orion_ge00_switch_init(d); } diff --git a/arch/arm/mach-orion5x/common.h b/arch/arm/mach-orion5x/common.h index 8a4115bd441d..efeffc6b4ebb 100644 --- a/arch/arm/mach-orion5x/common.h +++ b/arch/arm/mach-orion5x/common.h @@ -3,7 +3,7 @@ #include -struct dsa_platform_data; +struct dsa_chip_data; struct mv643xx_eth_platform_data; struct mv_sata_platform_data; @@ -41,7 +41,7 @@ void orion5x_setup_wins(void); void orion5x_ehci0_init(void); void orion5x_ehci1_init(void); void 
orion5x_eth_init(struct mv643xx_eth_platform_data *eth_data); -void orion5x_eth_switch_init(struct dsa_platform_data *d); +void orion5x_eth_switch_init(struct dsa_chip_data *d); void orion5x_i2c_init(void); void orion5x_sata_init(struct mv_sata_platform_data *sata_data); void orion5x_spi_init(void); diff --git a/arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c b/arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c index dccadf68ea2b..a3c1336d30c9 100644 --- a/arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c +++ b/arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c @@ -101,11 +101,6 @@ static struct dsa_chip_data rd88f5181l_fxo_switch_chip_data = { .port_names[7] = "lan3", }; -static struct dsa_platform_data __initdata rd88f5181l_fxo_switch_plat_data = { - .nr_chips = 1, - .chip = &rd88f5181l_fxo_switch_chip_data, -}; - static void __init rd88f5181l_fxo_init(void) { /* @@ -120,7 +115,7 @@ static void __init rd88f5181l_fxo_init(void) */ orion5x_ehci0_init(); orion5x_eth_init(&rd88f5181l_fxo_eth_data); - orion5x_eth_switch_init(&rd88f5181l_fxo_switch_plat_data); + orion5x_eth_switch_init(&rd88f5181l_fxo_switch_chip_data); orion5x_uart0_init(); mvebu_mbus_add_window_by_id(ORION_MBUS_DEVBUS_BOOT_TARGET, diff --git a/arch/arm/mach-orion5x/rd88f5181l-ge-setup.c b/arch/arm/mach-orion5x/rd88f5181l-ge-setup.c index affe5ec825de..252efe29bd1a 100644 --- a/arch/arm/mach-orion5x/rd88f5181l-ge-setup.c +++ b/arch/arm/mach-orion5x/rd88f5181l-ge-setup.c @@ -102,11 +102,6 @@ static struct dsa_chip_data rd88f5181l_ge_switch_chip_data = { .port_names[7] = "lan3", }; -static struct dsa_platform_data __initdata rd88f5181l_ge_switch_plat_data = { - .nr_chips = 1, - .chip = &rd88f5181l_ge_switch_chip_data, -}; - static struct i2c_board_info __initdata rd88f5181l_ge_i2c_rtc = { I2C_BOARD_INFO("ds1338", 0x68), }; @@ -125,7 +120,7 @@ static void __init rd88f5181l_ge_init(void) */ orion5x_ehci0_init(); orion5x_eth_init(&rd88f5181l_ge_eth_data); - orion5x_eth_switch_init(&rd88f5181l_ge_switch_plat_data); + 
orion5x_eth_switch_init(&rd88f5181l_ge_switch_chip_data); orion5x_i2c_init(); orion5x_uart0_init(); diff --git a/arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c b/arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c index 67ee8571b03c..f4f1dbe1d91d 100644 --- a/arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c +++ b/arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c @@ -40,11 +40,6 @@ static struct dsa_chip_data rd88f6183ap_ge_switch_chip_data = { .port_names[5] = "cpu", }; -static struct dsa_platform_data __in
[PATCH net-next v4 00/10] net: dsa: Support for pdata in dsa2
Hi all,

This is not exactly new, and was sent before, although back then, I did
not have a user of the pre-declared MDIO board information, but now we
do. Note that I have additional changes queued up to have b53 register
platform data for MIPS bcm47xx and bcm63xx.

Yes I know that we should have the Orion platforms eventually be
converted to Device Tree, but until that happens, I don't want any
remaining users of the old "dsa" platform device (hence the previous DTS
submissions for ARM/mvebu) and, there will be platforms out there that
most likely will never see DT coming their way (BCM47xx is almost 100%
sure, BCM63xx maybe not in a distant future).

We would probably want the whole series to be merged via David Miller's
tree to simplify things.

Greg, can you Ack/Nack patch 5 since it touched the core LDD?

Vivien, since some patches did change, I did not carry your Tested-by
tag to all patches.

Thanks!

Changes in v4:

- Changed device_find_class() to device_find_in_class_name()
- Added kerneldoc above device_find_in_class_name() to explain what it
  does and the calling convention regarding device reference counts
- Changed dev_to_net_device() to device_to_net_device(), added comments
  about what it does and the caller conventions regarding reference
  counts

Changes in v3:

- Tested EPROBE_DEFER from a mockup MDIO/DSA switch driver and
  everything is fine, once the driver finally probes we have access to
  platform data as expected
- added comment above dsa_port_is_valid() that port->name is mandatory
  for platform data cases
- added an extra check in dsa_parse_member() for a NULL pdata pointer
- fixed a bunch of checkpatch errors and warnings

Changes in v2:

- Rebased against latest net-next/master
- Moved dev_find_class() to device_find_class() into drivers/base/core.c
- Moved dev_to_net_device into net/core/dev.c
- Utilize dsa_chip_data directly instead of dsa_platform_data
- Augmented dsa_chip_data to be multi-CPU port ready

Changes from last submission (few months back):

- rebased against latest net-next
- do not introduce dsa2_platform_data, which was overkill and was meant
  to allow us to do exactly the same things with platform data and
  Device Tree; we use the existing dsa_platform_data instead
- properly register MDIO devices when the MDIO bus is registered and
  associate platform_data with them
- add a change to the Orion platform code to demonstrate how this can be
  used

Thank you

Florian Fainelli (10):
  net: dsa: Pass device pointer to dsa_register_switch
  net: dsa: Make most functions take a dsa_port argument
  net: dsa: Suffix function manipulating device_node with _dn
  net: dsa: Move ports assignment closer to error checking
  drivers: base: Add device_find_in_class_name()
  net: dsa: Migrate to device_find_in_class_name()
  net: Relocate dev_to_net_device() into net/core/dev.c
  net: dsa: Add support for platform data
  net: phy: Allow pre-declaration of MDIO devices
  ARM: orion: Register DSA switch as a MDIO device

 arch/arm/mach-orion5x/common.c               |   2 +-
 arch/arm/mach-orion5x/common.h               |   4 +-
 arch/arm/mach-orion5x/rd88f5181l-fxo-setup.c |   7 +-
 arch/arm/mach-orion5x/rd88f5181l-ge-setup.c  |   7 +-
 arch/arm/mach-orion5x/rd88f6183ap-ge-setup.c |   7 +-
 arch/arm/mach-orion5x/wnr854t-setup.c        |   2 +-
 arch/arm/mach-orion5x/wrt350n-v2-setup.c     |   7 +-
 arch/arm/plat-orion/common.c                 |  25 +++-
 arch/arm/plat-orion/include/plat/common.h    |   4 +-
 drivers/base/core.c                          |  31 +
 drivers/net/dsa/b53/b53_common.c             |   2 +-
 drivers/net/dsa/mv88e6xxx/chip.c             |  11 +-
 drivers/net/dsa/qca8k.c                      |   2 +-
 drivers/net/phy/Makefile                     |   3 +-
 drivers/net/phy/mdio-boardinfo.c             |  86 +
 drivers/net/phy/mdio-boardinfo.h             |  19 +++
 drivers/net/phy/mdio_bus.c                   |   4 +
 drivers/net/phy/mdio_device.c                |  11 ++
 include/linux/device.h                       |   2 +
 include/linux/mdio.h                         |   3 +
 include/linux/mod_devicetable.h              |   1 +
 include/linux/netdevice.h                    |   2 +
 include/linux/phy.h                          |  19 +++
 include/net/dsa.h                            |   8 +-
 net/core/dev.c                               |  30 +
 net/dsa/dsa.c                                |  55 ++---
 net/dsa/dsa2.c                               | 175 +++
 net/dsa/dsa_priv.h                           |   4 +-
 28 files changed, 391 insertions(+), 142 deletions(-)
 create mode 100644 drivers/net/phy/mdio-boardinfo.c
 create mode 100644 drivers/net/phy/mdio-boardinfo.h

-- 
2.9.3
[PATCH net-next v4 01/10] net: dsa: Pass device pointer to dsa_register_switch
In preparation for allowing dsa_register_switch() to be supplied with device/platform data, pass down a struct device pointer instead of a struct device_node. Signed-off-by: Florian Fainelli --- drivers/net/dsa/b53/b53_common.c | 2 +- drivers/net/dsa/mv88e6xxx/chip.c | 11 ++- drivers/net/dsa/qca8k.c | 2 +- include/net/dsa.h| 2 +- net/dsa/dsa2.c | 7 --- 5 files changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index 5102a3701a1a..7179eed9ee6d 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -1882,7 +1882,7 @@ int b53_switch_register(struct b53_device *dev) pr_info("found switch: %s, rev %i\n", dev->name, dev->core_rev); - return dsa_register_switch(dev->ds, dev->ds->dev->of_node); + return dsa_register_switch(dev->ds, dev->ds->dev); } EXPORT_SYMBOL(b53_switch_register); diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 987b2dbbd35a..3238a4752b98 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -4421,8 +4421,7 @@ static struct dsa_switch_driver mv88e6xxx_switch_drv = { .ops= &mv88e6xxx_switch_ops, }; -static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip, -struct device_node *np) +static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip) { struct device *dev = chip->dev; struct dsa_switch *ds; @@ -4437,7 +4436,7 @@ static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip, dev_set_drvdata(dev, ds); - return dsa_register_switch(ds, np); + return dsa_register_switch(ds, dev); } static void mv88e6xxx_unregister_switch(struct mv88e6xxx_chip *chip) @@ -4521,9 +4520,11 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev) if (err) goto out_g2_irq; - err = mv88e6xxx_register_switch(chip, np); - if (err) + err = mv88e6xxx_register_switch(chip); + if (err) { + mv88e6xxx_mdio_unregister(chip); goto out_mdio; + } return 0; diff --git a/drivers/net/dsa/qca8k.c 
b/drivers/net/dsa/qca8k.c index 54d270d59eb0..c084aa484d2b 100644 --- a/drivers/net/dsa/qca8k.c +++ b/drivers/net/dsa/qca8k.c @@ -964,7 +964,7 @@ qca8k_sw_probe(struct mdio_device *mdiodev) mutex_init(&priv->reg_mutex); dev_set_drvdata(&mdiodev->dev, priv); - return dsa_register_switch(priv->ds, priv->ds->dev->of_node); + return dsa_register_switch(priv->ds, &mdiodev->dev); } static void diff --git a/include/net/dsa.h b/include/net/dsa.h index b94d1f2ef912..16a502a6c26a 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -403,7 +403,7 @@ static inline bool dsa_uses_tagged_protocol(struct dsa_switch_tree *dst) } void dsa_unregister_switch(struct dsa_switch *ds); -int dsa_register_switch(struct dsa_switch *ds, struct device_node *np); +int dsa_register_switch(struct dsa_switch *ds, struct device *dev); #ifdef CONFIG_PM_SLEEP int dsa_switch_suspend(struct dsa_switch *ds); int dsa_switch_resume(struct dsa_switch *ds); diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index 42a41d84053c..4170f7ea8e28 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -579,8 +579,9 @@ static struct device_node *dsa_get_ports(struct dsa_switch *ds, return ports; } -static int _dsa_register_switch(struct dsa_switch *ds, struct device_node *np) +static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev) { + struct device_node *np = dev->of_node; struct device_node *ports = dsa_get_ports(ds, np); struct dsa_switch_tree *dst; u32 tree, index; @@ -660,12 +661,12 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device_node *np) return err; } -int dsa_register_switch(struct dsa_switch *ds, struct device_node *np) +int dsa_register_switch(struct dsa_switch *ds, struct device *dev) { int err; mutex_lock(&dsa2_mutex); - err = _dsa_register_switch(ds, np); + err = _dsa_register_switch(ds, dev); mutex_unlock(&dsa2_mutex); return err; -- 2.9.3
Re: 52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb fixing crashes? -> 4.4 stable?
On Tue, 2017-01-17 at 22:48 +0100, Nikola Ciprich wrote: > Dear netdev developers, > > I'd like to ask for a consultation regarding 4.4 kernel crashes. > we're using intel X540-AT2 10g controllers (onboard ones, on supermicro > boards) and we've noticed, then when using openvswitch, system very quickly > crashes on 4.4.x kernels we're usign. 4.5 is fine though. > > here's backtrace gathered from system pstore: Adding the openvswitch maintainer, Pravin. Hopefully you'll get a quicker response. - Greg > > <1>[ 1084.114586] BUG: unable to handle kernel paging request at > 8840c365b5c4 > <1>[ 1084.114918] IP: [] __netdev_pick_tx+0x92/0x140 > <4>[ 1084.115101] PGD 2018067 PUD 0 > <4>[ 1084.115270] Oops: [#1] SMP > <4>[ 1084.115439] Modules linked in: bonding(E) openvswitch(E) > nf_defrag_ipv6(E) nf_conntrack(E) crc32_pclmul(E) aesni_intel(E) lrw(E) > gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) kvm > _intel(E) kvm(E) irqbypass(E) coretemp(E) crct10dif_pclmul(E) > intel_powerclamp(E) x86_pkg_temp_thermal(E) ses(E) enclosure(E) iTCO_wdt(E) > iTCO_vendor_support(E) mxm_wmi(E) i2c_i801(E) lpc_ic > h(E) mei_me(E) mfd_core(E) i2c_core(E) sb_edac(E) sg(E) mei(E) pcspkr(E) > edac_core(E) ipmi_devintf(E) ioatdma(E) shpchp(E) wmi(E) ipmi_si(E) > ipmi_msghandler(E) 8250_fintek(E) acpi_power_mete > r(E) acpi_pad(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) > sunrpc(E) ip_tables(E) ext4(E) jbd2(E) mbcache(E) raid1(E) sd_mod(E) ahci(E) > libahci(E) bnx2x(E) libcrc32c(E) ixgbe(E) cr > c32c_intel(E) libata(E) mdio(E) ptp(E) dca(E) megaraid_sas(E) pps_core(E) > dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) > <4>[ 1084.117683] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GE > 4.4.33lb7.01 #1 > <4>[ 1084.118012] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 > 09/13/2016 > <4>[ 1084.118181] task: 819f14c0 ti: 819e task.ti: > 819e > <4>[ 1084.118501] RIP: 0010:[] [] > __netdev_pick_tx+0x92/0x140 > <4>[ 1084.118828] RSP: 0018:883f7f003638 EFLAGS: 00010a02 > <4>[ 
1084.118994] RAX: aef55a76 RBX: RCX: > 9d6e7dcd > <4>[ 1084.119164] RDX: ba9f4f5f RSI: 883f63f14d00 RDI: > 883f7f0035ec > <4>[ 1084.119333] RBP: 883f7f003668 R08: 0003 R09: > c8cfdbe1 > <4>[ 1084.119506] R10: 883f61206042 R11: 883f7f0035c0 R12: > > <4>[ 1084.119679] R13: 883f657b00c0 R14: 883f5d92 R15: > f012 > <4>[ 1084.119850] FS: () GS:883f7f00() > knlGS: > <4>[ 1084.120171] CS: 0010 DS: ES: CR0: 80050033 > <4>[ 1084.120338] CR2: 8840c365b5c4 CR3: 019ea000 CR4: > 003406f0 > <4>[ 1084.120509] DR0: DR1: DR2: > > <4>[ 1084.120678] DR3: DR6: fffe0ff0 DR7: > 0400 > <4>[ 1084.120847] Stack: > <4>[ 1084.121006] 883f63f14d00 883f63f14d00 000e > > <4>[ 1084.121339] 883f5d92 883f60a7f840 883f7f0036a0 > a00fbed4 > <4>[ 1084.121672] 883f603612ac 883f5d92 883f63f14d00 > > <4>[ 1084.122006] Call Trace: > <4>[ 1084.122168] > <4>[ 1084.122193] [] ixgbe_select_queue+0xc4/0x150 [ixgbe] > <4>[ 1084.122519] [] netdev_pick_tx+0x5e/0xf0 > <4>[ 1084.122687] [] __dev_queue_xmit+0xa2/0x560 > <4>[ 1084.122856] [] dev_queue_xmit+0x10/0x20 > <4>[ 1084.123034] [] bond_dev_queue_xmit+0x32/0x80 > [bonding] > <4>[ 1084.123207] [] bond_start_xmit+0x1a6/0x3f0 [bonding] > <4>[ 1084.123382] [] ? ep_poll_callback+0xb5/0x160 > <4>[ 1084.123551] [] dev_hard_start_xmit+0x238/0x3f0 > <4>[ 1084.123721] [] ? netif_skb_features+0xff/0x200 > <4>[ 1084.123890] [] __dev_queue_xmit+0x442/0x560 > <4>[ 1084.124059] [] dev_queue_xmit+0x10/0x20 > <4>[ 1084.124232] [] ovs_vport_send+0x4a/0xc0 [openvswitch] > <4>[ 1084.124404] [] do_output.isra.30+0x43/0x160 > [openvswitch] > <4>[ 1084.124575] [] ? __skb_clone+0x2e/0x140 > <4>[ 1084.124744] [] do_execute_actions+0x684/0x7e0 > [openvswitch] > <4>[ 1084.125067] [] ovs_execute_actions+0x32/0xd0 > [openvswitch] > <4>[ 1084.125240] [] ovs_dp_process_packet+0x84/0x110 > [openvswitch] > <4>[ 1084.125565] [] ovs_vport_receive+0x6c/0xd0 > [openvswitch] > <4>[ 1084.125740] [] ? check_preempt_curr+0x75/0x90 > <4>[ 1084.125912] [] ? 
ttwu_do_wakeup+0x19/0xe0 > <4>[ 1084.126081] [] ? > ttwu_do_activate.constprop.95+0x5d/0x70 > <4>[ 1084.126252] [] ? try_to_wake_up+0x47/0x340 > <4>[ 1084.126427] [] ? default_wake_function+0x12/0x20 > <4>[ 1084.126600] [] ? autoremove_wake_function+0x2b/0x40 > <4>[ 1084.126773] [] netdev_frame_hook+0xe7/0x150 > [openvswitch] > <4>[ 1084.126945] [] __netif_receive_skb_core+0x1e0/0x9e0 > <4>[ 1084.127115] [] ? ipv6_gro_receive+0x246/0x360 > <4>[ 1084.127284] [] __netif_receive_skb+0x18/0x60 > <4>[ 1084.127453] [] netif_r
Re: [PATCH net] lwtunnel: fix autoload of lwt modules
On 1/17/17 1:54 PM, David Miller wrote:
> From: David Ahern
> Date: Tue, 17 Jan 2017 13:46:22 -0700
>
>> In short seems like removing the dev + the current patch dropping
>> the lock fixes the current deadlock problem and should be fine.
>
> What about the state recorded by fib_get_nhs() and similar? There is
> a mapping from ifindex to ->nh_dev which would be invalidated if the
> RTNL semaphore is dropped.

As far as I can see through the call to build_state all device indices
came from the user and have not been validated yet (once the dev arg to
build_state is removed; sent that patch for net-next). The device index
validation happens later in fib_create_info with the call to
fib_check_nh (or dev_get_by_index for host scope).

I sent an alternative approach that pulls the module loading into a
separate function that is called while creating the fib_config.
Performance heavy for multipath but solves the autoload without delving
into the restart problem.
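
A rough userspace sketch of that alternative may help: the module load is
attempted in an early validation step, before any device or table state has
been looked up, so the later build step never has to load anything. This is
only an illustration under assumptions. All mock_* names are hypothetical,
the lock is modelled as a plain flag, and the sketch assumes (as the v2 patch
does) that the lock is released only around the load itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_ENCAP_MAX 4	/* hypothetical number of encap types */

static bool mock_rtnl_held;	/* stand-in for the RTNL lock state */
static bool mock_ops_registered[MOCK_ENCAP_MAX + 1];

/*
 * Stand-in for request_module(): the module's init routine needs the
 * lock itself, so the load cannot complete while the caller holds it.
 */
static int mock_request_module(unsigned int encap_type)
{
	assert(encap_type <= MOCK_ENCAP_MAX);

	if (mock_rtnl_held)
		return -EDEADLK;

	mock_ops_registered[encap_type] = true;
	return 0;
}

/*
 * Early validation: called while the user request is still being parsed,
 * so nothing cached so far can go stale when the lock is dropped.
 */
static int mock_valid_encap_type(unsigned int encap_type)
{
	if (encap_type == 0 || encap_type > MOCK_ENCAP_MAX)
		return -EINVAL;

	if (!mock_ops_registered[encap_type]) {
		mock_rtnl_held = false;		/* drop the lock around the load */
		mock_request_module(encap_type);
		mock_rtnl_held = true;		/* retake it before continuing */
	}

	return mock_ops_registered[encap_type] ? 0 : -EOPNOTSUPP;
}

/* Later step: uses only what is already registered, never loads. */
static int mock_build_state(unsigned int encap_type)
{
	if (encap_type == 0 || encap_type > MOCK_ENCAP_MAX)
		return -EINVAL;

	return mock_ops_registered[encap_type] ? 0 : -EOPNOTSUPP;
}
```

The cost mentioned above shows up here as well: with multipath routes the
validation would run once per nexthop encap type before the route is built.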
Re: [PATCH] net: ethernet: stmmac: add ARP management
On Tue, Jan 17, 2017 at 6:56 PM, Christophe Roullier wrote:

> +static int dwmac4_arp_enable(struct mac_device_info *hw)
> +{
> +       void __iomem *ioaddr = hw->pcsr;

__iomem *config = hw->pcsr + GMAC_CONFIG;

> +       u32 value = readl(ioaddr + GMAC_CONFIG);
> +
> +       value |= GMAC_CONFIG_ARPEN;
> +
> +       writel(value, ioaddr + GMAC_CONFIG);

u32 value;

value = readl();
writel(value | ...);

?

> +
> +       value = readl(ioaddr + GMAC_CONFIG);
> +
> +       return !!(value & GMAC_CONFIG_ARPEN);
> +}

> +/* Set ARP Address */
> +static void dwmac4_set_arp_addr(void __iomem *ioaddr, bool set, u32 addr)
> +{

__iomem *arp_addr = ioaddr + GMAC_ARP_ADDR;

> +       u32 value;
> +
> +       value = readl(ioaddr + GMAC_ARP_ADDR);

Care to explain why do you need dummy readl() here?

> +
> +       if (set) {
> +               /* set arp address */
> +               value = addr;
> +       } else {
> +               /* unset arp address */
> +               value = 0;
> +       }

value = set ? addr : 0;

> +
> +       writel(value, ioaddr + GMAC_ARP_ADDR);
> +       value = readl(ioaddr + GMAC_ARP_ADDR);
> +}

> +       if ((priv->plat->arp_en) && (priv->dma_cap.arpoffsel)) {
> +               ret = priv->hw->mac->arp_en(priv->hw);
> +               if (!ret) {

Hmm... Most would expect

if (ret) {
 doing something
} else {
 doing something else
}

> +                       pr_warn(" ARP feature disabled\n");
> +               } else {
> +                       pr_info(" ARP feature enabled\n");

Wouldn't be too noisy?

pr_* -> dev_*

> +                       /* Copy MAC addr into MAC_ARP_ADDRESS register*/
> +                       priv->hw->dma->set_arp_addr(priv->ioaddr, 1,
> +                                               priv->dev->dev_addr);
> +               }
> +       }

-- 
With Best Regards,
Andy Shevchenko
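
The two shape suggestions in the review (a single read-modify-write through
one register pointer, and value = set ? addr : 0) can be sketched in
userspace as follows. This is an illustration, not the driver: the registers
are a plain array, mock_readl()/mock_writel() stand in for readl()/writel(),
and the register indices and the position of the ARPEN bit are assumptions
made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout; the real offsets and bit are not shown here. */
#define MOCK_GMAC_CONFIG	0
#define MOCK_GMAC_ARP_ADDR	1
#define MOCK_GMAC_CONFIG_ARPEN	(1u << 31)	/* assumed bit position */

/* Userspace stand-ins for readl()/writel(). */
static uint32_t mock_readl(const volatile uint32_t *addr)
{
	return *addr;
}

static void mock_writel(uint32_t value, volatile uint32_t *addr)
{
	*addr = value;
}

/* Set the ARP enable bit and report whether it reads back as set. */
static bool mock_arp_enable(volatile uint32_t *regs)
{
	volatile uint32_t *config;

	assert(regs != NULL);
	config = &regs[MOCK_GMAC_CONFIG];

	/* One read-modify-write; the other bits are preserved. */
	mock_writel(mock_readl(config) | MOCK_GMAC_CONFIG_ARPEN, config);

	return (mock_readl(config) & MOCK_GMAC_CONFIG_ARPEN) != 0;
}

/* Program or clear the ARP address; no read is needed before the write. */
static void mock_set_arp_addr(volatile uint32_t *regs, bool set, uint32_t addr)
{
	assert(regs != NULL);
	mock_writel(set ? addr : 0, &regs[MOCK_GMAC_ARP_ADDR]);
}
```

Whether a read-back after the write is needed on the real hardware (for
example to flush a posted write) is a separate question that the sketch does
not answer.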
RE: [PATCHv1 5/7] TAP: Extending tap device create/destroy APIs
> -----Original Message-----
> From: Eric Dumazet [mailto:eric.duma...@gmail.com]
> Sent: Friday, January 06, 2017 3:16 PM
> To: Grandhi, Sainath
> Cc: netdev@vger.kernel.org; da...@davemloft.net;
> mah...@bandewar.net; linux-ker...@vger.kernel.org
> Subject: Re: [PATCHv1 5/7] TAP: Extending tap device create/destroy APIs
>
> On Fri, 2017-01-06 at 14:33 -0800, Sainath Grandhi wrote:
>
> > +static int tap_list_add(dev_t major, const char *device_name) {
> > +	int err = 0;
> > +	struct major_info *tap_major;
> > +
> > +	tap_major = kzalloc(sizeof(*tap_major), GFP_ATOMIC);
> > +
> > +	tap_major->major = MAJOR(major);
> > +
>
>
> kzalloc() can perfectly return NULL.
>
> You do not want to crash if that happens.
>

Thanks for pointing that out. I will send out the next version, which
takes care of the NULL pointer.
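
The fix being promised is the usual check-before-use pattern. A minimal
userspace sketch, assuming a hypothetical mock_major_info list and with the
allocator passed in so that the failure path can be exercised (the kernel
code would call kzalloc() directly and return -ENOMEM on failure):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the struct major_info used in the patch. */
struct mock_major_info {
	unsigned int major;
	const char *device_name;
	struct mock_major_info *next;
};

static struct mock_major_info *mock_major_list;

/* Allocator type with calloc()'s signature; injectable for testing. */
typedef void *(*mock_alloc_fn)(size_t nmemb, size_t size);

/* Add an entry to the list; returns 0 on success or -ENOMEM. */
static int mock_tap_list_add(unsigned int major, const char *device_name,
			     mock_alloc_fn alloc)
{
	struct mock_major_info *tap_major;

	assert(alloc != NULL);

	tap_major = alloc(1, sizeof(*tap_major));
	if (!tap_major)
		return -ENOMEM;	/* bail out instead of dereferencing NULL */

	tap_major->major = major;
	tap_major->device_name = device_name;
	tap_major->next = mock_major_list;
	mock_major_list = tap_major;

	return 0;
}

/* Allocator that always fails, to simulate memory pressure. */
static void *mock_failing_alloc(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}
```

Passing calloc as the allocator gives the normal path; passing
mock_failing_alloc shows that the list is left untouched when the allocation
fails.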
[PATCH net v2] lwtunnel: fix autoload of lwt modules
Trying to add an mpls encap route when the MPLS modules are not loaded
hangs. For example:

    CONFIG_MPLS=y
    CONFIG_NET_MPLS_GSO=m
    CONFIG_MPLS_ROUTING=m
    CONFIG_MPLS_IPTUNNEL=m

    $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

The ip command hangs:

    root 880 826 0 21:25 pts/0 00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2

    $ cat /proc/880/stack
    [] call_usermodehelper_exec+0xd6/0x134
    [] __request_module+0x27b/0x30a
    [] lwtunnel_build_state+0xe4/0x178
    [] fib_create_info+0x47f/0xdd4
    [] fib_table_insert+0x90/0x41f
    [] inet_rtm_newroute+0x4b/0x52
    ...

modprobe is trying to load rtnl-lwt-MPLS:

    root 881 5 0 21:25 ? 00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS

and it hangs after loading mpls_router:

    $ cat /proc/881/stack
    [] rtnl_lock+0x12/0x14
    [] register_netdevice_notifier+0x16/0x179
    [] mpls_init+0x25/0x1000 [mpls_router]
    [] do_one_initcall+0x8e/0x13f
    [] do_init_module+0x5a/0x1e5
    [] load_module+0x13bd/0x17d6
    ...

The problem is that lwtunnel_build_state is called with the rtnl lock
held, preventing mpls_init from registering. Given the potential
references held by the time lwtunnel_build_state is called, it cannot
drop the rtnl lock to load the module. So, extract the module loading
code from lwtunnel_build_state into a new function to validate the
encap type. The new function is called while converting the user
request into a fib_config, which is well before any table, device or
fib entries are examined.

Fixes: 745041e2aaf1 ("lwtunnel: autoload of lwt modules") Signed-off-by: David Ahern --- v2 - extract the module load attempt into a separate function that is called early in the newroute code paths include/net/lwtunnel.h | 11 + net/core/lwtunnel.c | 62 - net/ipv4/fib_frontend.c | 8 +++ net/ipv6/route.c| 12 +- 4 files changed, 86 insertions(+), 7 deletions(-) diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h index d4c1c75b8862..0b585f1fd340 100644 --- a/include/net/lwtunnel.h +++ b/include/net/lwtunnel.h @@ -105,6 +105,8 @@ int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op, unsigned int num); int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op, unsigned int num); +int lwtunnel_valid_encap_type(u16 encap_type); +int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len); int lwtunnel_build_state(struct net_device *dev, u16 encap_type, struct nlattr *encap, unsigned int family, const void *cfg, @@ -168,6 +170,15 @@ static inline int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op, return -EOPNOTSUPP; } +static inline int lwtunnel_valid_encap_type(u16 encap_type) +{ + return -EOPNOTSUPP; +} +static inline int lwtunnel_valid_encap_type_attr(struct nlattr *attr, int len) +{ + return -EOPNOTSUPP; +} + static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type, struct nlattr *encap, unsigned int family, const void *cfg, diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c index a5d4e866ce88..47b1dd65947b 100644 --- a/net/core/lwtunnel.c +++ b/net/core/lwtunnel.c @@ -26,6 +26,7 @@ #include #include #include +#include #ifdef CONFIG_MODULES @@ -114,25 +115,74 @@ int lwtunnel_build_state(struct net_device *dev, u16 encap_type, ret = -EOPNOTSUPP; rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[encap_type]); + if (likely(ops && ops->build_state)) + ret = ops->build_state(dev, encap, family, cfg, lws); + rcu_read_unlock(); + + return ret; +} +EXPORT_SYMBOL(lwtunnel_build_state); + +int 
lwtunnel_valid_encap_type(u16 encap_type) +{ + const struct lwtunnel_encap_ops *ops; + int ret = -EINVAL; + + if (encap_type == LWTUNNEL_ENCAP_NONE || + encap_type > LWTUNNEL_ENCAP_MAX) + return ret; + + rcu_read_lock(); + ops = rcu_dereference(lwtun_encaps[encap_type]); + rcu_read_unlock(); #ifdef CONFIG_MODULES if (!ops) { const char *encap_type_str = lwtunnel_encap_str(encap_type); if (encap_type_str) { - rcu_read_unlock(); + __rtnl_unlock(); request_module("rtnl-lwt-%s", encap_type_str); + rtnl_lock(); + rcu_read_lock(); ops = rcu_dereference(lwtun_encaps[encap_type]); + rcu_read_unlock(); } } #endif - if (likely(ops && ops->build_state)) - ret = ops->build_state(dev, encap, family, cfg, lws); - rcu_read_unlock(); + return ops ? 0 : -EOPNOTSUPP; +} +E
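The ordering the patch relies on can be sketched in a single-threaded userspace simulation. Everything here is an assumption made for illustration: the "lock" is just a flag, sim_request_module() stands in for request_module() plus the module's init routine, and none of the sim_* names exist in the kernel. The point shown is only that the validation step may release the lock around the module load because nothing else is referenced yet.

```c
#include <errno.h>
#include <stddef.h>

#define SIM_ENCAP_MAX 8

static int sim_rtnl_held;                        /* 1 while the simulated lock is held */
static const void *sim_encaps[SIM_ENCAP_MAX + 1]; /* registered ops, NULL if absent */
static const int sim_dummy_ops = 1;

static void sim_rtnl_lock(void)   { sim_rtnl_held = 1; }
static void sim_rtnl_unlock(void) { sim_rtnl_held = 0; }

/*
 * Stand-in for request_module(): the module's init needs the lock itself,
 * so it can only register when the caller is not holding it.
 */
static int sim_request_module(unsigned int type)
{
	if (sim_rtnl_held)
		return -EDEADLK; /* the real code would block forever here */

	sim_rtnl_lock();
	sim_encaps[type] = &sim_dummy_ops;
	sim_rtnl_unlock();
	return 0;
}

/* Called with the simulated lock held, early in request parsing. */
static int sim_valid_encap_type(unsigned int type)
{
	if (type == 0 || type > SIM_ENCAP_MAX)
		return -EINVAL;

	if (!sim_encaps[type]) {
		sim_rtnl_unlock();        /* safe: no table/device refs held yet */
		sim_request_module(type);
		sim_rtnl_lock();
	}

	return sim_encaps[type] ? 0 : -EOPNOTSUPP;
}
```

Calling sim_request_module() directly with the flag set returns the simulated deadlock error, while going through sim_valid_encap_type() succeeds and leaves the lock held on return.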
[net PATCH v5 5/6] virtio_net: refactor freeze/restore logic into virtnet reset logic
For XDP we will need to reset the queues to allow for buffer headroom to be configured. In order to do this we need to essentially run the freeze()/restore() code path. Unfortunately, the locking requirements of the freeze/restore and reset paths differ, so we cannot simply reuse the code. This patch refactors the code path and adds a reset helper routine.

Signed-off-by: John Fastabend --- drivers/net/virtio_net.c | 75 -- drivers/virtio/virtio.c | 42 ++ include/linux/virtio.h |4 ++ 3 files changed, 73 insertions(+), 48 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 922ca66..62dbf4b 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1684,6 +1684,49 @@ static void virtnet_init_settings(struct net_device *dev) .set_settings = virtnet_set_settings, }; +static void virtnet_freeze_down(struct virtio_device *vdev) +{ + struct virtnet_info *vi = vdev->priv; + int i; + + /* Make sure no work handler is accessing the device */ + flush_work(&vi->config_work); + + netif_device_detach(vi->dev); + cancel_delayed_work_sync(&vi->refill); + + if (netif_running(vi->dev)) { + for (i = 0; i < vi->max_queue_pairs; i++) + napi_disable(&vi->rq[i].napi); + } +} + +static int init_vqs(struct virtnet_info *vi); + +static int virtnet_restore_up(struct virtio_device *vdev) +{ + struct virtnet_info *vi = vdev->priv; + int err, i; + + err = init_vqs(vi); + if (err) + return err; + + virtio_device_ready(vdev); + + if (netif_running(vi->dev)) { + for (i = 0; i < vi->curr_queue_pairs; i++) + if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL)) + schedule_delayed_work(&vi->refill, 0); + + for (i = 0; i < vi->max_queue_pairs; i++) + virtnet_napi_enable(&vi->rq[i]); + } + + netif_device_attach(vi->dev); + return err; +} + static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog) { unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr); @@ -2374,21 +2417,9 @@ static void virtnet_remove(struct
virtio_device *vdev) static int virtnet_freeze(struct virtio_device *vdev) { struct virtnet_info *vi = vdev->priv; - int i; virtnet_cpu_notif_remove(vi); - - /* Make sure no work handler is accessing the device */ - flush_work(&vi->config_work); - - netif_device_detach(vi->dev); - cancel_delayed_work_sync(&vi->refill); - - if (netif_running(vi->dev)) { - for (i = 0; i < vi->max_queue_pairs; i++) - napi_disable(&vi->rq[i].napi); - } - + virtnet_freeze_down(vdev); remove_vq_common(vi); return 0; @@ -2397,25 +2428,11 @@ static int virtnet_freeze(struct virtio_device *vdev) static int virtnet_restore(struct virtio_device *vdev) { struct virtnet_info *vi = vdev->priv; - int err, i; + int err; - err = init_vqs(vi); + err = virtnet_restore_up(vdev); if (err) return err; - - virtio_device_ready(vdev); - - if (netif_running(vi->dev)) { - for (i = 0; i < vi->curr_queue_pairs; i++) - if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL)) - schedule_delayed_work(&vi->refill, 0); - - for (i = 0; i < vi->max_queue_pairs; i++) - virtnet_napi_enable(&vi->rq[i]); - } - - netif_device_attach(vi->dev); - virtnet_set_queues(vi, vi->curr_queue_pairs); err = virtnet_cpu_notif_add(vi); diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 7062bb0..400d70b 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -100,11 +100,6 @@ static int virtio_uevent(struct device *_dv, struct kobj_uevent_env *env) dev->id.device, dev->id.vendor); } -static void add_status(struct virtio_device *dev, unsigned status) -{ - dev->config->set_status(dev, dev->config->get_status(dev) | status); -} - void virtio_check_driver_offered_feature(const struct virtio_device *vdev, unsigned int fbit) { @@ -145,14 +140,15 @@ void virtio_config_changed(struct virtio_device *dev) } EXPORT_SYMBOL_GPL(virtio_config_changed); -static void virtio_config_disable(struct virtio_device *dev) +void virtio_config_disable(struct virtio_device *dev) { spin_lock_irq(&dev->config_lock); dev->config_enabled = 
false; spin_unlock_irq(&dev->config_lock); } +EXPORT_SYMBOL_GPL(virtio_config_disable); -static void virtio_config_enable(struct virtio_device *dev) +void virtio_config_enable(struct virtio_device *dev) {
[PATCH] Revert "net: qcom/emac: configure the external phy to allow pause frames"
This reverts commit 3e884493448131179a5b7cae1ddca1028ffaecc8.

With commit 529ed1275263 ("net: phy: phy drivers should not set
SUPPORTED_[Asym_]Pause"), phylib now handles automatically enabling
pause frame support in the PHY, and the MAC driver should follow suit.
Since the EMAC driver does this, we no longer need to force pause
frame support.

Signed-off-by: Timur Tabi
---
 drivers/net/ethernet/qualcomm/emac/emac-mac.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/emac/emac-mac.c b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
index 0b4deb3..384e1be 100644
--- a/drivers/net/ethernet/qualcomm/emac/emac-mac.c
+++ b/drivers/net/ethernet/qualcomm/emac/emac-mac.c
@@ -1004,12 +1004,6 @@ int emac_mac_up(struct emac_adapter *adpt)
 	writel((u32)~DIS_INT, adpt->base + EMAC_INT_STATUS);
 	writel(adpt->irq.mask, adpt->base + EMAC_INT_MASK);
 
-	/* Enable pause frames. Without this feature, the EMAC has been shown
-	 * to receive (and drop) frames with FCS errors at gigabit connections.
-	 */
-	adpt->phydev->supported |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
-	adpt->phydev->advertising |= SUPPORTED_Pause | SUPPORTED_Asym_Pause;
-
 	adpt->phydev->irq = PHY_IGNORE_INTERRUPT;
 	phy_start(adpt->phydev);
 
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
[net PATCH v5 6/6] virtio_net: XDP support for adjust_head
Add support for XDP adjust head by allocating a 256B header region that XDP programs can grow into. This is only enabled when an XDP program is loaded.

To ensure that we do not have to unwind queue headroom, push queue setup below bpf_prog_add. It reads better to do a prog ref unwind vs another queue setup call.

At the moment this code must do a full reset to ensure that old buffers without headroom (on program add) or with headroom (on program removal) are not used incorrectly in the datapath. Ideally we would only have to disable/enable the RX queues being updated, but there is no API to do this at the moment in virtio, so use the big hammer. In practice it is likely not that big of a problem, as this will only happen when XDP is enabled/disabled; changing programs does not require the reset. There is some risk that the driver may either have an allocation failure or for some reason fail to correctly negotiate with the underlying backend; in this case the driver will be left uninitialized. I have not seen this ever happen on my test systems, and for what it's worth this same failure case can occur from probe and other contexts in the virtio framework.

Signed-off-by: John Fastabend --- drivers/net/virtio_net.c | 149 +++--- 1 file changed, 125 insertions(+), 24 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 62dbf4b..3b129b4 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -41,6 +41,9 @@ #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN) #define GOOD_COPY_LEN 128 +/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */ +#define VIRTIO_XDP_HEADROOM 256 + /* RX packet size EWMA. The average packet size is used to determine the packet * buffer size when refilling RX rings.
As the entire RX ring may be refilled * at once, the weight is chosen so that the EWMA will be insensitive to short- @@ -359,6 +362,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi, } if (vi->mergeable_rx_bufs) { + xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf); /* Zero header and leave csum up to XDP layers */ hdr = xdp->data; memset(hdr, 0, vi->hdr_len); @@ -375,7 +379,9 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi, num_sg = 2; sg_init_table(sq->sg, 2); sg_set_buf(sq->sg, hdr, vi->hdr_len); - skb_to_sgvec(skb, sq->sg + 1, 0, skb->len); + skb_to_sgvec(skb, sq->sg + 1, +xdp->data - xdp->data_hard_start, +xdp->data_end - xdp->data); } err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, data, GFP_ATOMIC); @@ -401,7 +407,6 @@ static struct sk_buff *receive_small(struct net_device *dev, struct bpf_prog *xdp_prog; len -= vi->hdr_len; - skb_trim(skb, len); rcu_read_lock(); xdp_prog = rcu_dereference(rq->xdp_prog); @@ -413,11 +418,15 @@ static struct sk_buff *receive_small(struct net_device *dev, if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags)) goto err_xdp; - xdp.data = skb->data; + xdp.data_hard_start = skb->data; + xdp.data = skb->data + VIRTIO_XDP_HEADROOM; xdp.data_end = xdp.data + len; act = bpf_prog_run_xdp(xdp_prog, &xdp); switch (act) { case XDP_PASS: + /* Recalculate length in case bpf program changed it */ + __skb_pull(skb, xdp.data - xdp.data_hard_start); + len = xdp.data_end - xdp.data; break; case XDP_TX: virtnet_xdp_xmit(vi, rq, &xdp, skb); @@ -432,6 +441,7 @@ static struct sk_buff *receive_small(struct net_device *dev, } rcu_read_unlock(); + skb_trim(skb, len); return skb; err_xdp: @@ -480,7 +490,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq, unsigned int *len) { struct page *page = alloc_page(GFP_ATOMIC); - unsigned int page_off = 0; + unsigned int page_off = VIRTIO_XDP_HEADROOM; if (!page) return NULL; @@ -516,7 +526,8 @@ static struct page *xdp_linearize_page(struct receive_queue *rq, put_page(p); } - 
*len = page_off; + /* Headroom does not contribute to packet length */ + *len = page_off - VIRTIO_XDP_HEADROOM; return page; err_buf: __free_pages(page, 0); @@ -555,7 +566,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, page, offset, &len); if (!xdp_page) goto err_
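The pointer arithmetic in the receive_small() hunk can be worked through in a small standalone sketch. The struct below is a reduced stand-in for struct xdp_buff, and xdp_adjust_head_sketch() is a hypothetical substitute for whatever an XDP program does to move xdp.data; only the 256-byte VIRTIO_XDP_HEADROOM constant and the three pointer names come from the patch.

```c
#include <stddef.h>

#define VIRTIO_XDP_HEADROOM 256

/* Reduced xdp_buff: only the three pointers the patch touches. */
struct xdp_buff_sketch {
	unsigned char *data_hard_start;
	unsigned char *data;
	unsigned char *data_end;
};

/* Lay out a buffer the way the receive_small() hunk does. */
static void xdp_init_sketch(struct xdp_buff_sketch *xdp,
			    unsigned char *buf, size_t len)
{
	xdp->data_hard_start = buf;
	xdp->data = buf + VIRTIO_XDP_HEADROOM;
	xdp->data_end = xdp->data + len;
}

/*
 * Hypothetical stand-in for an adjust_head call: a negative delta grows
 * the packet into the headroom, a positive delta shrinks it.
 */
static int xdp_adjust_head_sketch(struct xdp_buff_sketch *xdp, int delta)
{
	ptrdiff_t off = (xdp->data - xdp->data_hard_start) + delta;

	if (off < 0 || off >= xdp->data_end - xdp->data_hard_start)
		return -1;

	xdp->data = xdp->data_hard_start + off;
	return 0;
}

/* Offset the XDP_PASS case passes to __skb_pull(). */
static size_t xdp_pull_offset(const struct xdp_buff_sketch *xdp)
{
	return (size_t)(xdp->data - xdp->data_hard_start);
}

/* Length recomputed after the program ran. */
static size_t xdp_len(const struct xdp_buff_sketch *xdp)
{
	return (size_t)(xdp->data_end - xdp->data);
}
```

With a 100-byte packet, growing the head by 14 bytes moves the pull offset from 256 to 242 and the length from 100 to 114, which is why the XDP_PASS case recalculates len.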
[net PATCH v5 4/6] virtio_net: remove duplicate queue pair binding in XDP
Factor out qp assignment. Signed-off-by: John Fastabend --- drivers/net/virtio_net.c | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 6de0cbe..922ca66 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -332,15 +332,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, static void virtnet_xdp_xmit(struct virtnet_info *vi, struct receive_queue *rq, -struct send_queue *sq, struct xdp_buff *xdp, void *data) { struct virtio_net_hdr_mrg_rxbuf *hdr; unsigned int num_sg, len; + struct send_queue *sq; + unsigned int qp; void *xdp_sent; int err; + qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id(); + sq = &vi->sq[qp]; + /* Free up any pending old buffers before queueing new ones. */ while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) { if (vi->mergeable_rx_bufs) { @@ -404,7 +408,6 @@ static struct sk_buff *receive_small(struct net_device *dev, if (xdp_prog) { struct virtio_net_hdr_mrg_rxbuf *hdr = buf; struct xdp_buff xdp; - unsigned int qp; u32 act; if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags)) @@ -417,10 +420,7 @@ static struct sk_buff *receive_small(struct net_device *dev, case XDP_PASS: break; case XDP_TX: - qp = vi->curr_queue_pairs - - vi->xdp_queue_pairs + - smp_processor_id(); - virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb); + virtnet_xdp_xmit(vi, rq, &xdp, skb); rcu_read_unlock(); goto xdp_xmit; default: @@ -545,7 +545,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, if (xdp_prog) { struct page *xdp_page; struct xdp_buff xdp; - unsigned int qp; void *data; u32 act; @@ -586,10 +585,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, } break; case XDP_TX: - qp = vi->curr_queue_pairs - - vi->xdp_queue_pairs + - smp_processor_id(); - virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data); + virtnet_xdp_xmit(vi, rq, &xdp, data); ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len); if 
(unlikely(xdp_page != page)) goto err_xdp;
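The queue pair selection that this patch moves into virtnet_xdp_xmit() is plain index arithmetic. A minimal sketch, assuming a reduced struct with only the two counters involved and a cpu argument in place of smp_processor_id():

```c
/* Reduced view of the virtnet_info fields used by the computation. */
struct vi_queue_sketch {
	unsigned int curr_queue_pairs; /* all active queue pairs */
	unsigned int xdp_queue_pairs;  /* the last N of them are reserved for XDP_TX */
};

/* Same expression as in the patch, with the CPU id passed in explicitly. */
static unsigned int xdp_tx_queue_sketch(const struct vi_queue_sketch *vi,
					unsigned int cpu)
{
	return vi->curr_queue_pairs - vi->xdp_queue_pairs + cpu;
}
```

With 8 queue pairs of which 4 are reserved for XDP, CPU 0 maps to send queue 4 and CPU 3 to send queue 7, so each CPU gets its own XDP transmit queue behind the regular ones.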
[net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support
This has a fix to handle small buffer free logic correctly and then
also adds adjust head support.

I pushed adjust head at net (even though it's rc3) to avoid having to
push another exception case into virtio_net to catch if the program
uses adjust_head and then block it. If there are any strong objections
to this we can push it at net-next and use a patch from Jakub to add
the exception handling, but then user space has to deal with it either
via try/fail logic or via kernel version checks. Granted, we already
have some cases that need to be configured to enable XDP, but I don't
see any reason to have yet another one when we can fix it now vs
delaying a kernel version.

v2: fix spelling error, convert unsigned -> unsigned int

v3: v2 git crashed during send so retrying, sorry for the noise

v4: changed layout of rtnl_lock fixes (Stephen)
    moved reset logic into virtio core with new patch (MST)
    fixed up linearize and some code cleanup (Jason)

    Otherwise did some generic code cleanup so it might be a bit
    cleaner this time, at least that is the hope.

v5: fixed rtnl_lock issue (DaveM)

    In order to fix the rtnl_lock issue and also to address Jason's
    comment questioning the need for a generic virtio_device_reset
    routine, I exported some virtio core routines and then wrote a
    virtio_net reset routine. This is the cleanest solution I came up
    with today and I do not at this time have any need for a more
    generic reset. If folks don't like this I could revert back to the
    v3 variant, but Stephen pointed out that the pattern used there is
    also not ideal.

Thanks for the review.
---

John Fastabend (6):
      virtio_net: use dev_kfree_skb for small buffer XDP receive
      virtio_net: wrap rtnl_lock in test for calling with lock already held
      virtio_net: factor out xdp handler for readability
      virtio_net: remove duplicate queue pair binding in XDP
      virtio_net: refactor freeze/restore logic into virtnet reset logic
      virtio_net: XDP support for adjust_head

 drivers/net/virtio_net.c |  332 ++
 drivers/virtio/virtio.c  |   42 +++---
 include/linux/virtio.h   |    4 +
 3 files changed, 247 insertions(+), 131 deletions(-)

--
Signature
[net PATCH v5 3/6] virtio_net: factor out xdp handler for readability
At this point the do_xdp_prog is mostly if/else branches handling the different modes of virtio_net. So remove it and handle running the program in the per mode handlers. Signed-off-by: John Fastabend --- drivers/net/virtio_net.c | 75 +- 1 file changed, 27 insertions(+), 48 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index ba0efee..6de0cbe 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -388,49 +388,6 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi, virtqueue_kick(sq->vq); } -static u32 do_xdp_prog(struct virtnet_info *vi, - struct receive_queue *rq, - struct bpf_prog *xdp_prog, - void *data, int len) -{ - int hdr_padded_len; - struct xdp_buff xdp; - void *buf; - unsigned int qp; - u32 act; - - if (vi->mergeable_rx_bufs) { - hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf); - xdp.data = data + hdr_padded_len; - xdp.data_end = xdp.data + (len - vi->hdr_len); - buf = data; - } else { /* small buffers */ - struct sk_buff *skb = data; - - xdp.data = skb->data; - xdp.data_end = xdp.data + len; - buf = skb->data; - } - - act = bpf_prog_run_xdp(xdp_prog, &xdp); - switch (act) { - case XDP_PASS: - return XDP_PASS; - case XDP_TX: - qp = vi->curr_queue_pairs - - vi->xdp_queue_pairs + - smp_processor_id(); - xdp.data = buf; - virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data); - return XDP_TX; - default: - bpf_warn_invalid_xdp_action(act); - case XDP_ABORTED: - case XDP_DROP: - return XDP_DROP; - } -} - static struct sk_buff *receive_small(struct net_device *dev, struct virtnet_info *vi, struct receive_queue *rq, @@ -446,19 +403,30 @@ static struct sk_buff *receive_small(struct net_device *dev, xdp_prog = rcu_dereference(rq->xdp_prog); if (xdp_prog) { struct virtio_net_hdr_mrg_rxbuf *hdr = buf; + struct xdp_buff xdp; + unsigned int qp; u32 act; if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags)) goto err_xdp; - act = do_xdp_prog(vi, rq, xdp_prog, skb, len); + + xdp.data = skb->data; + xdp.data_end = 
xdp.data + len; + act = bpf_prog_run_xdp(xdp_prog, &xdp); switch (act) { case XDP_PASS: break; case XDP_TX: + qp = vi->curr_queue_pairs - + vi->xdp_queue_pairs + + smp_processor_id(); + virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb); rcu_read_unlock(); goto xdp_xmit; - case XDP_DROP: default: + bpf_warn_invalid_xdp_action(act); + case XDP_ABORTED: + case XDP_DROP: goto err_xdp; } } @@ -576,6 +544,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, xdp_prog = rcu_dereference(rq->xdp_prog); if (xdp_prog) { struct page *xdp_page; + struct xdp_buff xdp; + unsigned int qp; + void *data; u32 act; /* This happens when rx buffer size is underestimated */ @@ -598,8 +569,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, if (unlikely(hdr->hdr.gso_type)) goto err_xdp; - act = do_xdp_prog(vi, rq, xdp_prog, - page_address(xdp_page) + offset, len); + data = page_address(xdp_page) + offset; + xdp.data = data + vi->hdr_len; + xdp.data_end = xdp.data + (len - vi->hdr_len); + act = bpf_prog_run_xdp(xdp_prog, &xdp); switch (act) { case XDP_PASS: /* We can only create skb based on xdp_page. */ @@ -613,13 +586,19 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, } break; case XDP_TX: + qp = vi->curr_queue_pairs - + vi->xdp_queue_pairs + + smp_processor_id(); + virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data); ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
[net PATCH v5 2/6] virtio_net: wrap rtnl_lock in test for calling with lock already held
For the XDP use case, and to allow ethtool reset tests, it is useful to be able to use the reset paths from contexts where the rtnl lock is already held.

This requires updating virtnet_set_queues and free_receive_bufs, the two places where rtnl_lock is taken in virtio_net. To do this we use the following pattern,

	_foo(...) { do stuff }
	foo(...) { rtnl_lock(); _foo(...); rtnl_unlock()};

this allows us to use the freeze()/restore() flow from both contexts.

Signed-off-by: John Fastabend --- drivers/net/virtio_net.c | 31 +-- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index d97bb71..ba0efee 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1331,7 +1331,7 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi) rtnl_unlock(); } -static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) +static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) { struct scatterlist sg; struct net_device *dev = vi->dev; @@ -1357,6 +1357,16 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) return 0; } +static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs) +{ + int err; + + rtnl_lock(); + err = _virtnet_set_queues(vi, queue_pairs); + rtnl_unlock(); + return err; +} + static int virtnet_close(struct net_device *dev) { struct virtnet_info *vi = netdev_priv(dev); @@ -1609,7 +1619,7 @@ static int virtnet_set_channels(struct net_device *dev, return -EINVAL; get_online_cpus(); - err = virtnet_set_queues(vi, queue_pairs); + err = _virtnet_set_queues(vi, queue_pairs); if (!err) { netif_set_real_num_tx_queues(dev, queue_pairs); netif_set_real_num_rx_queues(dev, queue_pairs); @@ -1736,7 +1746,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog) return -ENOMEM; } - err = virtnet_set_queues(vi, curr_qp + xdp_qp); + err = _virtnet_set_queues(vi, curr_qp + xdp_qp); if (err) { dev_warn(&dev->dev, "XDP Device queue allocation
failure.\n"); return err; @@ -1745,7 +1755,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog) if (prog) { prog = bpf_prog_add(prog, vi->max_queue_pairs - 1); if (IS_ERR(prog)) { - virtnet_set_queues(vi, curr_qp); + _virtnet_set_queues(vi, curr_qp); return PTR_ERR(prog); } } @@ -1864,12 +1874,11 @@ static void virtnet_free_queues(struct virtnet_info *vi) kfree(vi->sq); } -static void free_receive_bufs(struct virtnet_info *vi) +static void _free_receive_bufs(struct virtnet_info *vi) { struct bpf_prog *old_prog; int i; - rtnl_lock(); for (i = 0; i < vi->max_queue_pairs; i++) { while (vi->rq[i].pages) __free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0); @@ -1879,6 +1888,12 @@ static void free_receive_bufs(struct virtnet_info *vi) if (old_prog) bpf_prog_put(old_prog); } +} + +static void free_receive_bufs(struct virtnet_info *vi) +{ + rtnl_lock(); + _free_receive_bufs(vi); rtnl_unlock(); } @@ -2317,9 +2332,7 @@ static int virtnet_probe(struct virtio_device *vdev) goto free_unregister_netdev; } - rtnl_lock(); virtnet_set_queues(vi, vi->curr_queue_pairs); - rtnl_unlock(); /* Assume link up if device can't report link status, otherwise get link status from config. */ @@ -2428,9 +2441,7 @@ static int virtnet_restore(struct virtio_device *vdev) netif_device_attach(vi->dev); - rtnl_lock(); virtnet_set_queues(vi, vi->curr_queue_pairs); - rtnl_unlock(); err = virtnet_cpu_notif_add(vi); if (err)
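The _foo()/foo() split described in the changelog can be sketched in userspace with a counter standing in for the rtnl lock. The sim_* names and the depth check are assumptions for illustration only; a real mutex would block on the second acquisition rather than return an error.

```c
static int sim_lock_depth; /* simulated rtnl lock: 1 when held once */
static int sim_nr_queues;

static void sim_lock(void)   { sim_lock_depth++; }
static void sim_unlock(void) { sim_lock_depth--; }

/* Does the work; the caller must already hold the lock exactly once. */
static int _sim_set_queues(int n)
{
	if (sim_lock_depth != 1)
		return -1; /* unlocked call, or the lock was taken twice */

	sim_nr_queues = n;
	return 0;
}

/* Convenience wrapper for callers that do not hold the lock. */
static int sim_set_queues(int n)
{
	int err;

	sim_lock();
	err = _sim_set_queues(n);
	sim_unlock();
	return err;
}
```

Paths that run with the lock already held, such as the ethtool and XDP setup callbacks in the diff, call the underscore variant directly; calling the wrapper from such a context is the double acquisition the patch is avoiding.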
[net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
In the small buffer case during driver unload we currently use
put_page instead of dev_kfree_skb. Resolve this by adding a check
for virtnet mode when checking XDP queue type. Also name the
function so that the code reads correctly to match the additional
check.

Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
Signed-off-by: John Fastabend
Acked-by: Jason Wang
---
 drivers/net/virtio_net.c |    8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4a10500..d97bb71 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1890,8 +1890,12 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 			put_page(vi->rq[i].alloc_frag.page);
 }
 
-static bool is_xdp_queue(struct virtnet_info *vi, int q)
+static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q)
 {
+	/* For small receive mode always use kfree_skb variants */
+	if (!vi->mergeable_rx_bufs)
+		return false;
+
 	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
 		return false;
 	else if (q < vi->curr_queue_pairs)
@@ -1908,7 +1912,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-			if (!is_xdp_queue(vi, i))
+			if (!is_xdp_raw_buffer_queue(vi, i))
 				dev_kfree_skb(buf);
 			else
 				put_page(virt_to_head_page(buf));
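A standalone sketch of the renamed predicate, using a reduced struct in place of struct virtnet_info. The first two branches are taken from the hunk above; the return values of the last two branches are not visible in the diff context and are assumed here (true for queues in the XDP range, false beyond it).

```c
#include <stdbool.h>

/* Reduced view of the virtnet_info fields the check reads. */
struct vi_mode_sketch {
	bool mergeable_rx_bufs;
	int curr_queue_pairs;
	int xdp_queue_pairs;
};

static bool is_xdp_raw_buffer_queue_sketch(const struct vi_mode_sketch *vi,
					   int q)
{
	/* Small receive mode: buffers are skbs, never raw pages. */
	if (!vi->mergeable_rx_bufs)
		return false;

	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
		return false;
	else if (q < vi->curr_queue_pairs)
		return true;  /* assumed: not shown in the diff context */
	else
		return false; /* assumed: not shown in the diff context */
}
```

Under these assumptions, with 8 queue pairs of which 4 belong to XDP, queue 5 holds raw buffers only in mergeable mode; in small buffer mode every queue is freed with dev_kfree_skb().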
Getting a handle on all these new NIC features
There was some discussion about the problems of dealing with the
explosion of NIC features in the mlx directory restructuring proposal,
but I think there is a deeper issue here that should be discussed.

It's hard not to notice that there has been quite a proliferation of
NIC features in several drivers. This trend has resulted in very
complex driver code that may or may not segment individual features.
One visible manifestation of this is the number of ndo functions, which
is somewhere around seventy-five now.

I suspect the vast majority of these advanced NIC features (e.g.
bridging, UDP offloads, tc offload, etc.) are only relevant to some of
the people some of the time. The problem we have, in this case those
of us that are attempting to deploy and maintain NICs at scale, is
when we have to deal with the ramifications of these features being
intertwined with core driver functionality that is relevant to
everyone. This becomes very obvious when we need to backport drivers
from later versions of the kernel.

I realize that backporting a driver is not a specific concern of the
Linux kernel, but nevertheless this is a real problem and a fact of
life for many users. Rebasing the full kernel is still a major effort
and it seems the best we could ever do is one rebase per year. In the
interim we need to occasionally backport drivers. Backporting drivers
is difficult precisely because of new features or API changes to
existing ones. These sorts of changes tend to have a spiderweb of
dependencies in other parts of the stack, so that the number of
patches we need to cherry-pick goes way beyond those that touch the
driver we are interested in.

Currently we (FB) need to backport two NIC drivers. I've already given
details of backporting mlx5 on the thread to restructure the driver
directories. The other driver being backported seems to suffer from
the same type of feature complexity.

In short, I would like to ask driver maintainers to start to
modularize driver features. If something being added is obviously a
narrow feature that only a subset of users will need, can we allow
config options to #ifdef those out somehow? Furthermore, can the file
and directory structure of drivers reflect that? Our lives would be
_so_ much simpler maintaining drivers in production if we had such
modularity and the ability to build drivers with the features of our
choosing.

Thanks,
Tom
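One way the #ifdef request could look in practice is the usual header-stub pattern, sketched below. The option name CONFIG_NIC_SKETCH_TC_OFFLOAD, struct nic_sketch and both functions are hypothetical and belong to no real driver; the sketch only shows how core code can call into an optional feature without its own #ifdefs.

```c
#include <errno.h>

struct nic_sketch {
	int offload_enabled;
};

/* Hypothetical Kconfig symbol; left undefined here, so the stub is used. */
#ifdef CONFIG_NIC_SKETCH_TC_OFFLOAD
static int nic_tc_offload_setup(struct nic_sketch *nic)
{
	nic->offload_enabled = 1;
	return 0;
}
#else
/* Feature compiled out: the call site still builds, the feature reports "unsupported". */
static inline int nic_tc_offload_setup(struct nic_sketch *nic)
{
	(void)nic;
	return -EOPNOTSUPP;
}
#endif

/* Core path: a missing optional feature is not an error. */
static int nic_open_sketch(struct nic_sketch *nic)
{
	int err = nic_tc_offload_setup(nic);

	if (err && err != -EOPNOTSUPP)
		return err;
	return 0;
}
```

If the feature's real implementation also lives in its own source file, a backport can leave that file and its dependencies out entirely.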
52bd2d62ce6758d811edcbd2256eb9ea7f6a56cb fixing crashes? -> 4.4 stable?
Dear netdev developers,

I'd like to ask for a consultation regarding 4.4 kernel crashes.

We're using Intel X540-AT2 10G controllers (onboard ones, on Supermicro boards) and we've noticed that, when using openvswitch, the system very quickly crashes on the 4.4.x kernels we're using. 4.5 is fine though.

Here's the backtrace gathered from system pstore:

<1>[ 1084.114586] BUG: unable to handle kernel paging request at 8840c365b5c4
<1>[ 1084.114918] IP: [] __netdev_pick_tx+0x92/0x140
<4>[ 1084.115101] PGD 2018067 PUD 0
<4>[ 1084.115270] Oops: [#1] SMP
<4>[ 1084.115439] Modules linked in: bonding(E) openvswitch(E) nf_defrag_ipv6(E) nf_conntrack(E) crc32_pclmul(E) aesni_intel(E) lrw(E) gf128mul(E) glue_helper(E) ablk_helper(E) cryptd(E) kvm_intel(E) kvm(E) irqbypass(E) coretemp(E) crct10dif_pclmul(E) intel_powerclamp(E) x86_pkg_temp_thermal(E) ses(E) enclosure(E) iTCO_wdt(E) iTCO_vendor_support(E) mxm_wmi(E) i2c_i801(E) lpc_ich(E) mei_me(E) mfd_core(E) i2c_core(E) sb_edac(E) sg(E) mei(E) pcspkr(E) edac_core(E) ipmi_devintf(E) ioatdma(E) shpchp(E) wmi(E) ipmi_si(E) ipmi_msghandler(E) 8250_fintek(E) acpi_power_meter(E) acpi_pad(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ip_tables(E) ext4(E) jbd2(E) mbcache(E) raid1(E) sd_mod(E) ahci(E) libahci(E) bnx2x(E) libcrc32c(E) ixgbe(E) crc32c_intel(E) libata(E) mdio(E) ptp(E) dca(E) megaraid_sas(E) pps_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
<4>[ 1084.117683] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GE 4.4.33lb7.01 #1
<4>[ 1084.118012] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 2.1 09/13/2016
<4>[ 1084.118181] task: 819f14c0 ti: 819e task.ti: 819e
<4>[ 1084.118501] RIP: 0010:[] [] __netdev_pick_tx+0x92/0x140
<4>[ 1084.118828] RSP: 0018:883f7f003638 EFLAGS: 00010a02
<4>[ 1084.118994] RAX: aef55a76 RBX: RCX: 9d6e7dcd
<4>[ 1084.119164] RDX: ba9f4f5f RSI: 883f63f14d00 RDI: 883f7f0035ec
<4>[ 1084.119333] RBP: 883f7f003668 R08: 0003 R09: c8cfdbe1
<4>[ 1084.119506] R10: 883f61206042 R11: 883f7f0035c0
R12: <4>[ 1084.119679] R13: 883f657b00c0 R14: 883f5d92 R15: f012 <4>[ 1084.119850] FS: () GS:883f7f00() knlGS: <4>[ 1084.120171] CS: 0010 DS: ES: CR0: 80050033 <4>[ 1084.120338] CR2: 8840c365b5c4 CR3: 019ea000 CR4: 003406f0 <4>[ 1084.120509] DR0: DR1: DR2: <4>[ 1084.120678] DR3: DR6: fffe0ff0 DR7: 0400 <4>[ 1084.120847] Stack: <4>[ 1084.121006] 883f63f14d00 883f63f14d00 000e <4>[ 1084.121339] 883f5d92 883f60a7f840 883f7f0036a0 a00fbed4 <4>[ 1084.121672] 883f603612ac 883f5d92 883f63f14d00 <4>[ 1084.122006] Call Trace: <4>[ 1084.122168] <4>[ 1084.122193] [] ixgbe_select_queue+0xc4/0x150 [ixgbe] <4>[ 1084.122519] [] netdev_pick_tx+0x5e/0xf0 <4>[ 1084.122687] [] __dev_queue_xmit+0xa2/0x560 <4>[ 1084.122856] [] dev_queue_xmit+0x10/0x20 <4>[ 1084.123034] [] bond_dev_queue_xmit+0x32/0x80 [bonding] <4>[ 1084.123207] [] bond_start_xmit+0x1a6/0x3f0 [bonding] <4>[ 1084.123382] [] ? ep_poll_callback+0xb5/0x160 <4>[ 1084.123551] [] dev_hard_start_xmit+0x238/0x3f0 <4>[ 1084.123721] [] ? netif_skb_features+0xff/0x200 <4>[ 1084.123890] [] __dev_queue_xmit+0x442/0x560 <4>[ 1084.124059] [] dev_queue_xmit+0x10/0x20 <4>[ 1084.124232] [] ovs_vport_send+0x4a/0xc0 [openvswitch] <4>[ 1084.124404] [] do_output.isra.30+0x43/0x160 [openvswitch] <4>[ 1084.124575] [] ? __skb_clone+0x2e/0x140 <4>[ 1084.124744] [] do_execute_actions+0x684/0x7e0 [openvswitch] <4>[ 1084.125067] [] ovs_execute_actions+0x32/0xd0 [openvswitch] <4>[ 1084.125240] [] ovs_dp_process_packet+0x84/0x110 [openvswitch] <4>[ 1084.125565] [] ovs_vport_receive+0x6c/0xd0 [openvswitch] <4>[ 1084.125740] [] ? check_preempt_curr+0x75/0x90 <4>[ 1084.125912] [] ? ttwu_do_wakeup+0x19/0xe0 <4>[ 1084.126081] [] ? ttwu_do_activate.constprop.95+0x5d/0x70 <4>[ 1084.126252] [] ? try_to_wake_up+0x47/0x340 <4>[ 1084.126427] [] ? default_wake_function+0x12/0x20 <4>[ 1084.126600] [] ? 
autoremove_wake_function+0x2b/0x40 <4>[ 1084.126773] [] netdev_frame_hook+0xe7/0x150 [openvswitch] <4>[ 1084.126945] [] __netif_receive_skb_core+0x1e0/0x9e0 <4>[ 1084.127115] [] ? ipv6_gro_receive+0x246/0x360 <4>[ 1084.127284] [] __netif_receive_skb+0x18/0x60 <4>[ 1084.127453] [] netif_receive_skb_internal+0x40/0xb0 <4>[ 1084.127623] [] napi_gro_receive+0xc3/0x110 <4>[ 1084.127813] [] bnx2x_rx_int+0x101c/0x19d0 [bnx2x] <4>[ 1084.127984] [] ? load_balance+0x163/0x8d0 <4>[ 1084.128166] [] bnx2x_poll+0x284/0x340 [bnx2x] <4>[ 1084.128334] [] net_rx_action+0x16b/0x370 <4>[ 1084.128503] [] __do_softirq+0xe2/0x2e0 <4>[ 1084.128671] [] ir
Re: fs, net: deadlock between bind/splice on af_unix
On Mon, Jan 16, 2017 at 1:32 AM, Dmitry Vyukov wrote:
> On Fri, Dec 9, 2016 at 7:41 AM, Al Viro wrote:
>> On Thu, Dec 08, 2016 at 10:32:00PM -0800, Cong Wang wrote:
>>
>>> > Why do we do autobind there, anyway, and why is it conditional on
>>> > SOCK_PASSCRED? Note that e.g. for SOCK_STREAM we can bloody well get
>>> > to sending stuff without autobind ever done - just use socketpair()
>>> > to create that sucker and we won't be going through the connect()
>>> > at all.
>>>
>>> In the case Dmitry reported, unix_dgram_sendmsg() calls unix_autobind(),
>>> not SOCK_STREAM.
>>
>> Yes, I've noticed. What I'm asking is what in there needs autobind triggered
>> on sendmsg and why doesn't the same need affect the SOCK_STREAM case?
>>
>>> I guess some lock, perhaps the u->bindlock could be dropped before
>>> acquiring the next one (sb_writer), but I need to double check.
>>
>> Bad idea, IMO - do you *want* autobind being able to come through while
>> bind(2) is busy with mknod?
>
>
> Ping. This is still happening on HEAD.
>

Thanks for your reminder.

Mind to give the attached patch (compile only) a try? I take another
approach to fix this deadlock, which moves the unix_mknod() out of
unix->bindlock. Not sure if there is any unexpected impact with this way.

Thanks.
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 127656e..5d4b4d1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -995,6 +995,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	unsigned int hash;
 	struct unix_address *addr;
 	struct hlist_head *list;
+	struct path path;
 
 	err = -EINVAL;
 	if (sunaddr->sun_family != AF_UNIX)
@@ -1010,9 +1011,20 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		goto out;
 	addr_len = err;
 
+	if (sun_path[0]) {
+		umode_t mode = S_IFSOCK |
+		       (SOCK_INODE(sock)->i_mode & ~current_umask());
+		err = unix_mknod(sun_path, mode, &path);
+		if (err) {
+			if (err == -EEXIST)
+				err = -EADDRINUSE;
+			goto out;
+		}
+	}
+
 	err = mutex_lock_interruptible(&u->bindlock);
 	if (err)
-		goto out;
+		goto out_put;
 
 	err = -EINVAL;
 	if (u->addr)
@@ -1029,16 +1041,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	atomic_set(&addr->refcnt, 1);
 
 	if (sun_path[0]) {
-		struct path path;
-		umode_t mode = S_IFSOCK |
-		       (SOCK_INODE(sock)->i_mode & ~current_umask());
-		err = unix_mknod(sun_path, mode, &path);
-		if (err) {
-			if (err == -EEXIST)
-				err = -EADDRINUSE;
-			unix_release_addr(addr);
-			goto out_up;
-		}
 		addr->hash = UNIX_HASH_SIZE;
 		hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
 		spin_lock(&unix_table_lock);
@@ -1065,6 +1067,9 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 	spin_unlock(&unix_table_lock);
 out_up:
 	mutex_unlock(&u->bindlock);
+out_put:
+	if (err)
+		path_put(&path);
 out:
 	return err;
 }
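
The lock-ordering argument behind the patch can be modelled in a few lines of userspace C. This is only a sketch, not kernel code: the lock identifiers, the graph structure, and every function name below are made up for illustration. It also assumes that the splice side reaches unix_autobind() (and so u->bindlock) while already holding sb_writers, which is how the thread describes the deadlock but is not shown in this message. The model records "held A, then took B" edges and reports an ABBA inversion when both directions exist.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock identifiers for this sketch; not kernel symbols. */
enum lock_id { LOCK_BINDLOCK, LOCK_SB_WRITERS, LOCK_COUNT };

/* edge[a][b] is true when some code path took b while still holding a. */
struct lock_graph {
	bool edge[LOCK_COUNT][LOCK_COUNT];
};

/* Record one code path that takes the given locks nested, in order. */
static void record_nested(struct lock_graph *g, const enum lock_id *order,
			  size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			g->edge[order[i]][order[j]] = true;
}

/* An ABBA inversion exists when both a->b and b->a have been recorded. */
static bool has_inversion(const struct lock_graph *g)
{
	for (int a = 0; a < LOCK_COUNT; a++)
		for (int b = a + 1; b < LOCK_COUNT; b++)
			if (g->edge[a][b] && g->edge[b][a])
				return true;
	return false;
}

/* Assumed splice/sendmsg path: sb_writers held, autobind takes bindlock. */
static void record_splice_path(struct lock_graph *g)
{
	const enum lock_id order[] = { LOCK_SB_WRITERS, LOCK_BINDLOCK };

	record_nested(g, order, 2);
}

/* bind() before the patch: mknod (sb_writers) runs under bindlock. */
static void record_old_bind(struct lock_graph *g)
{
	const enum lock_id order[] = { LOCK_BINDLOCK, LOCK_SB_WRITERS };

	record_nested(g, order, 2);
}

/* bind() after the patch: mknod finishes first, then bindlock is taken. */
static void record_patched_bind(struct lock_graph *g)
{
	const enum lock_id mknod_step[] = { LOCK_SB_WRITERS };
	const enum lock_id bind_step[] = { LOCK_BINDLOCK };

	record_nested(g, mknod_step, 1);
	record_nested(g, bind_step, 1);
}
```

With the old bind path and the splice path recorded together, has_inversion() reports a cycle; with the patched bind path it does not, because the two locks are never held at the same time in bind(). The model says nothing about the patch's error handling, which is a separate question from the ordering.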
Re: [PATCH net] lwtunnel: fix autoload of lwt modules
From: David Ahern
Date: Tue, 17 Jan 2017 13:46:22 -0700

> In short seems like removing the dev + the current patch dropping
> the lock fixes the current deadlock problem and should be fine.

What about the state recorded by fib_get_nhs() and similar?

There is a mapping from ifindex to ->nh_dev which would be invalidated
if the RTNL semaphore is dropped. It won't get updated by device
events, which is what normally happens, because the fib_info is not in
any of the fib_trie tables yet.

So I think you still have a huge problem without doing proper
restarts.
Re: [PATCH net-next] tcp: accept RST for rcv_nxt - 1 after receiving a FIN
From: Jason Baron
Date: Tue, 17 Jan 2017 13:37:19 -0500

> From: Jason Baron
>
> Using a Mac OSX box as a client connecting to a Linux server, we have found
> that when certain applications (such as 'ab'), are abruptly terminated
> (via ^C), a FIN is sent followed by a RST packet on tcp connections. The
> FIN is accepted by the Linux stack but the RST is sent with the same
> sequence number as the FIN, and Linux responds with a challenge ACK per
> RFC 5961. The OSX client then sometimes (they are rate-limited) does not
> reply with any RST as would be expected on a closed socket.
>
> This results in sockets accumulating on the Linux server left mostly in
> the CLOSE_WAIT state, although LAST_ACK and CLOSING are also possible.
> This sequence of events can tie up a lot of resources on the Linux server
> since there may be a lot of data in write buffers at the time of the RST.
> Accepting a RST equal to rcv_nxt - 1, after we have already successfully
> processed a FIN, has made a significant difference for us in practice, by
> freeing up unneeded resources in a more expedient fashion.
>
> A packetdrill test demonstrating the behavior:
 ...
> Signed-off-by: Jason Baron

Applied, thanks Jason.
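
The acceptance rule described in the quoted commit message can be sketched as a standalone C predicate. This is an illustration under stated assumptions, not the applied patch: the struct, the enum, and the function names are hypothetical, and the set of "FIN already processed" states is taken from the three states the message names (CLOSE_WAIT, LAST_ACK, CLOSING). A RST that matches neither case would, per the message, be answered with a challenge ACK as in RFC 5961; that part is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection states for this sketch. */
enum sketch_tcp_state {
	SK_ESTABLISHED,
	SK_CLOSE_WAIT,
	SK_LAST_ACK,
	SK_CLOSING,
};

/* Hypothetical stand-in for the receive-side state of a connection. */
struct sketch_conn {
	enum sketch_tcp_state state;
	uint32_t rcv_nxt;	/* next sequence number expected from peer */
};

/* Assumption: these are the states in which the peer's FIN was consumed. */
static bool fin_already_processed(const struct sketch_conn *c)
{
	return c->state == SK_CLOSE_WAIT ||
	       c->state == SK_LAST_ACK ||
	       c->state == SK_CLOSING;
}

/*
 * Return true when a RST with sequence number rst_seq should reset the
 * connection outright: either an exact match on rcv_nxt, or, once a FIN
 * has been processed, a match on rcv_nxt - 1 (the FIN's own sequence
 * number). The subtraction wraps modulo 2^32 like TCP sequence space.
 */
static bool rst_seq_acceptable(const struct sketch_conn *c, uint32_t rst_seq)
{
	if (rst_seq == c->rcv_nxt)
		return true;

	return fin_already_processed(c) &&
	       rst_seq == (uint32_t)(c->rcv_nxt - 1u);
}
```

The relaxation is deliberately narrow: it applies only after a FIN has been consumed, so an off-by-one RST on a connection that is still ESTABLISHED is not accepted by this check.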
Re: Potential issues (security and otherwise) with the current cgroup-bpf API
On Tue, Jan 17, 2017 at 5:58 AM, Michal Hocko wrote:
> On Tue 17-01-17 14:32:04, Peter Zijlstra wrote:
>> On Tue, Jan 17, 2017 at 02:03:03PM +0100, Michal Hocko wrote:
>> > On Sun 15-01-17 20:19:01, Tejun Heo wrote:
>> > [...]
>> > > So, what's proposed is a proper part of bpf. In terms of
>> > > implementation, cgroup helps by hosting the pointers but that doesn't
>> > > necessarily affect the conceptual structure of it. Given that, I
>> > > don't think it'd be a good idea to add anything to cgroup interface
>> > > for this feature. Introspection is great to have but this should be
>> > > introspectable together with other bpf programs using the same
>> > > mechanism. That's where it belongs.
>> >
>> > If BPF only piggy backs on top of cgroup to iterate tasks shouldn't we
>> > at least enforce that the cgroup has to be a leaf one and no further
>> > children groups can be created once there is BPF program attached?
>>
>> Why (again) this stupid constraint?
>>
>> If you want to use cgroups for tagging (like perf does), _any_ parent
>> cgroup will also tag you.
>>
>> So creating child cgroups, and placing tasks in it, should not be a
>> problem, the BPF thing should apply to all of them.
>
> This would require using hierarchical cgroup iterators to iterate over
> tasks. As per Andy's testing this doesn't seem to be the case. I haven't
> checked the implementation closely but my understanding was that using
> only cgroup specific tasks was intentional.

The current semantics are AFAIK that only the innermost cgroup that
has a hook installed is in effect. I think this is the wrong design.

I think that the right semantics are probably to support both
innermost-to-outermost and outermost-to-innermost and to select which
is appropriate for each hook.

Suppose we have a cgroup /a/b where a and b both have hooks installed.

If the hook is a socket creation or egress hook, I think that b's hook
should run first. If b's hook rejects, then a's hook is not run. If
b's hook accepts, then a's hook is run. This way a gets the last word
on any changes to the socket settings and a sees exactly what would
happen if it were to accept.

Conversely, for ingress hooks, I think that a's hook should run first.
This way a sees the packet as it originally came in and can modify or
reject it, and then b only sees whatever a chooses to let through.

The guiding principle here is that, for actions that originate outside
the machine, the outer hooks should IMO run first and, for actions
that originate from a task in a cgroup, the innermost hooks should run
first.

--Andy
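
The ordering proposed above can be written down as a small userspace C sketch. Everything in it is hypothetical (struct sketch_cgroup, run_hooks(), the sample hooks, the MAX_DEPTH limit); it does not reflect the existing cgroup-bpf implementation, which per the message only runs the innermost installed hook. The sketch walks from the task's cgroup up to the root, then runs hooks innermost-first for task-originated events (socket creation, egress) and outermost-first for ingress, stopping at the first rejection.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 16	/* arbitrary hierarchy depth limit for the sketch */

enum hook_dir {
	HOOK_INGRESS,	/* originates outside the machine: outermost first */
	HOOK_EGRESS,	/* originates from a task (also socket creation):
			 * innermost first */
};

/* Event passed through the hooks; the trace records who ran, in order. */
struct sketch_event {
	char trace[MAX_DEPTH + 1];
	size_t len;
};

/* A hook returns true to accept the event and false to reject it. */
typedef bool (*hook_fn)(struct sketch_event *ev);

struct sketch_cgroup {
	struct sketch_cgroup *parent;	/* NULL at the root */
	hook_fn hook;			/* NULL when no hook is installed */
};

/* Run every installed hook between leaf and root in the proposed order. */
static bool run_hooks(struct sketch_cgroup *leaf, enum hook_dir dir,
		      struct sketch_event *ev)
{
	struct sketch_cgroup *chain[MAX_DEPTH];
	size_t n = 0;

	/* chain[0] is the innermost cgroup, chain[n - 1] the outermost. */
	for (struct sketch_cgroup *cg = leaf; cg && n < MAX_DEPTH;
	     cg = cg->parent)
		chain[n++] = cg;

	for (size_t i = 0; i < n; i++) {
		size_t idx = (dir == HOOK_EGRESS) ? i : n - 1 - i;
		hook_fn h = chain[idx]->hook;

		if (h && !h(ev))
			return false;	/* first rejection ends the walk */
	}
	return true;
}

/* Sample hooks for the /a/b example: each notes that it ran. */
static void note(struct sketch_event *ev, char who)
{
	if (ev->len < MAX_DEPTH)
		ev->trace[ev->len++] = who;
}

static bool hook_a(struct sketch_event *ev) { note(ev, 'a'); return true; }
static bool hook_b(struct sketch_event *ev) { note(ev, 'b'); return true; }
static bool hook_b_reject(struct sketch_event *ev) { note(ev, 'b'); return false; }
```

For a task in /a/b, an egress event runs b and then a, so a has the last word; an ingress event runs a and then b, so b only sees what a let through. If b rejects an egress event, a's hook never runs.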
Re: [PATCH] net: ethoc: Make needlessly global struct ethtool_ops static
From: Tobias Klauser
Date: Tue, 17 Jan 2017 15:01:08 +0100

> Make the needlessly global struct ethtool_ops ethoc_ethtool_ops static
> to fix a sparse warning.
>
> Signed-off-by: Tobias Klauser

Applied, thanks.