RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > Sent: Sunday, May 8, 2016 1:41 > To: Dexuan Cui > Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY > Srinivasan ; Haiyang Zhang ; > j...@perches.com; vkuzn...@redhat.com > Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets > > From: Dexuan Cui > Date: Sat, 7 May 2016 10:49:25 + > > > I should be able to make 'send', 'recv' here to pointers and use vmalloc() > > to allocate the memory for them. I will do this. > > That's still unswappable kernel memory. Hi David, My understanding is: kernel pages are not swappable in Linux, so it looks I can't avoid unswappable kernel memory here? > People can open N sockets, where N is something on the order of the FD > limit the process has, per process. This allows someone to quickly > eat up a lot of memory and hold onto it nearly indefinitely. Thanks for pointing this out! I understand, so I think I should add a module parameter, e.g., "hv_sock.max_socket_number" with a default value, say, 1024? 1 established hv_sock connection takes less than 20 pages, including 10 pages for VMBus ringbuffer, 6 pages for send/recv buffers(I'll use vmalloc() for this), etc. Here the recv buf needs a size of 5 pages because potentially the host can send the guest a VMBus packet with an up-to-5-page payload, i..e, the VMBus inbound ringbuffer size. 1024 hv_sock connections take less than 20*4KB * 1K = 80MB memory. A user who needs more connections can change the module parameter without reboot. hv_sock connection is designed to work only between the host and the guest. I think 1024 connections seem pretty enough. BTW, a user can't create hv_sock connections without enough privilege. Please see +static int hvsock_create(struct net *net, struct socket *sock, +int protocol, int kern) +{ + if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN)) + return -EPERM; David, does this make sense to you? Thanks, -- Dexuan
Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6
Hi, On Sat, 7 May 2016 20:25:40 -0600 David Ahern wrote: > > - On which circumstances we end up entering > > l3mdev_ip_rcv/l3mdev_ip6_rcv where skb->dev is the master? > > If I got it right, we enter 'ip_rcv_finish' on a slave device, > > the callback is invoked and eventually sets skb->dev and skb->skb_iif > > to the VRF device; then ip_rcv_finish continues processing the > > altered skb (with the changed skb->dev). > > So on which cicumstances do we enter 'ip_rcv_finish' where the > > skb->dev is ALREADY a master device? > > If you look at the full patchset I posted on 5/4 the patch after PKTINFO > allows local traffic. That change needs the netif_is_l3_master(). I see. Thanks David.
Re: [PATCH v2 net-next] ifb: support more features
On Fri, 2016-05-06 at 18:19 -0700, Eric Dumazet wrote: > From: Eric Dumazet > > When using ifb+netem on ingress on SIT/IPIP/GRE traffic, > GRO packets are not properly processed. > > Segmentation should not be forced, since ifb is already adding > quite a performance hit. > > Signed-off-by: Eric Dumazet > --- > drivers/net/ifb.c |3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c > index cc56fac3c3f8..66c0eeafcb5d 100644 > --- a/drivers/net/ifb.c > +++ b/drivers/net/ifb.c > @@ -196,6 +196,7 @@ static const struct net_device_ops ifb_netdev_ops = { > > #define IFB_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG | NETIF_F_FRAGLIST > | \ > NETIF_F_TSO_ECN | NETIF_F_TSO | NETIF_F_TSO6 | \ > + NETIF_F_GSO_ENCAP_ALL | \ > NETIF_F_HIGHDMA | NETIF_F_HW_VLAN_CTAG_TX | \ > NETIF_F_HW_VLAN_STAG_TX) > > @@ -224,6 +225,8 @@ static void ifb_setup(struct net_device *dev) > dev->tx_queue_len = TX_Q_LIMIT; > > dev->features |= IFB_FEATURES; > + dev->hw_features |= dev->features; > + dev->hw_enc_features |= dev->features; > dev->vlan_features |= IFB_FEATURES & ~(NETIF_F_HW_VLAN_CTAG_TX | > NETIF_F_HW_VLAN_STAG_TX); > > BTW, encapsulated GRO traffic going through mirred+ifb is dropped because segments get an incorrect skb->mac_len (If TSO/GSO is disabled on ifb, as before the above patch) SIT traffic for example : segments get mac_len set to 34 instead of 14 What do you think of : diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e561f9f07d6d..bec5c32b2fe9 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3176,7 +3176,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, __copy_skb_header(nskb, head_skb); skb_headers_offset_update(nskb, skb_headroom(nskb) - headroom); - skb_reset_mac_len(nskb); + nskb->mac_len = head_skb->mac_len; skb_copy_from_linear_data_offset(head_skb, -tnl_hlen, nskb->data - tnl_hlen,
Re: [PATCH] Add support for configuring Infiniband GUIDs
On Fri, May 06, 2016 at 10:43:25AM -0500, Eli Cohen wrote: > Add two NLA's that allow configuration of Infiniband node or port GUIDs > by referencing the IPoIB net device set over then physical function. The > format to be used is as follows: > > ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70 > ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78 > > Issue: 702759 > Change-Id: I5ffb54d6de7bfa8650bf5818f484279914991d6e These two lines are not needed to be part of commit message. signature.asc Description: Digital signature
[PATCH net-next 2/2] net: original ingress device index in PKTINFO
Applications such as OSPF and BFD need the original ingress device not the VRF device; the latter can be derived from the former. To that end add the skb_iif to inet_skb_parm and set it in ipv4 code after clearing the skb control buffer similar to IPv6. From there the pktinfo can just pull it from cb with the PKTINFO_SKB_CB cast. The previous patch moving the skb->dev change to L3 means nothing else is needed for IPv6; it just works. Signed-off-by: David Ahern --- include/net/ip.h | 1 + net/ipv4/ip_input.c| 1 + net/ipv4/ip_sockglue.c | 7 ++- 3 files changed, 8 insertions(+), 1 deletion(-) diff --git a/include/net/ip.h b/include/net/ip.h index 247ac82e9cf2..37165fba3741 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -36,6 +36,7 @@ struct sock; struct inet_skb_parm { + int iif; struct ip_options opt;/* Compiled IP options */ unsigned char flags; diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index 37375eedeef9..4b351af3e67b 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -478,6 +478,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, /* Remove any debris in the socket control block */ memset(IPCB(skb), 0, sizeof(struct inet_skb_parm)); + IPCB(skb)->iif = skb->skb_iif; /* Must drop socket now because of tproxy. */ skb_orphan(skb); diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index bdb222c0c6a2..5805762d7fc7 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -1193,7 +1193,12 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) ipv6_sk_rxinfo(sk); if (prepare && skb_rtable(skb)) { - pktinfo->ipi_ifindex = inet_iif(skb); + /* skb->cb is overloaded: prior to this point it is IP{6}CB +* which has interface index (iif) as the first member of the +* underlying inet{6}_skb_parm struct. This code then overlays +* PKTINFO_SKB_CB and in_pktinfo also has iif as the first +* element so the iif is picked up from the prior IPCB +*/ pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb); } else { pktinfo->ipi_ifindex = 0; -- 2.1.4
[PATCH net-next v3 1/2] net: l3mdev: Add hook in ip and ipv6
Currently the VRF driver uses the rx_handler to switch the skb device to the VRF device. Switching the dev prior to the ip / ipv6 layer means the VRF driver has to duplicate IP/IPv6 processing which adds overhead and makes features such as retaining the ingress device index more complicated than necessary. This patch moves the hook to the L3 layer just after the first NF_HOOK for PRE_ROUTING. This location makes exposing the original ingress device trivial (next patch) and allows adding other NF_HOOKs to the VRF driver in the future. dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb with the switched device through the packet taps to maintain current behavior (tcpdump can be used on either the vrf device or the enslaved devices). Signed-off-by: David Ahern --- v3 - reverted direct skb->data accesses caused by inadvertent drop of 65c38aa653c1 with forward port of this patch v2 - add skb_l3mdev_slave helper for inet6_iif and tcp_v6_iif rather than open coding the if case. Added benefit that the change compiles out if CONFIG_NET_L3_MASTER_DEV is not enabled. drivers/net/vrf.c | 189 ++ include/linux/ipv6.h | 17 - include/linux/netdevice.h | 2 + include/net/l3mdev.h | 43 +++ include/net/tcp.h | 4 +- net/core/dev.c| 3 +- net/ipv4/ip_input.c | 7 ++ net/ipv6/ip6_input.c | 7 ++ 8 files changed, 171 insertions(+), 101 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 4b2461ae5d3b..309efbf8da0e 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -42,9 +42,6 @@ #define DRV_NAME "vrf" #define DRV_VERSION"1.0" -#define vrf_master_get_rcu(dev) \ - ((struct net_device *)rcu_dereference(dev->rx_handler_data)) - struct net_vrf { struct rtable *rth; struct rt6_info *rt6; @@ -60,90 +57,12 @@ struct pcpu_dstats { struct u64_stats_sync syncp; }; -/* neighbor handling is done with actual device; do not want - * to flip skb->dev for those ndisc packets. This really fails - * for multiple next protocols (e.g., NEXTHDR_HOP). But it is - * a start. - */ -#if IS_ENABLED(CONFIG_IPV6) -static bool check_ipv6_frame(const struct sk_buff *skb) -{ - const struct ipv6hdr *ipv6h; - struct ipv6hdr _ipv6h; - bool rc = true; - - ipv6h = skb_header_pointer(skb, 0, sizeof(_ipv6h), &_ipv6h); - if (!ipv6h) - goto out; - - if (ipv6h->nexthdr == NEXTHDR_ICMP) { - const struct icmp6hdr *icmph; - struct icmp6hdr _icmph; - - icmph = skb_header_pointer(skb, sizeof(_ipv6h), - sizeof(_icmph), &_icmph); - if (!icmph) - goto out; - - switch (icmph->icmp6_type) { - case NDISC_ROUTER_SOLICITATION: - case NDISC_ROUTER_ADVERTISEMENT: - case NDISC_NEIGHBOUR_SOLICITATION: - case NDISC_NEIGHBOUR_ADVERTISEMENT: - case NDISC_REDIRECT: - rc = false; - break; - } - } - -out: - return rc; -} -#else -static bool check_ipv6_frame(const struct sk_buff *skb) -{ - return false; -} -#endif - -static bool is_ip_rx_frame(struct sk_buff *skb) -{ - switch (skb->protocol) { - case htons(ETH_P_IP): - return true; - case htons(ETH_P_IPV6): - return check_ipv6_frame(skb); - } - return false; -} - static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb) { vrf_dev->stats.tx_errors++; kfree_skb(skb); } -/* note: already called with rcu_read_lock */ -static rx_handler_result_t vrf_handle_frame(struct sk_buff **pskb) -{ - struct sk_buff *skb = *pskb; - - if (is_ip_rx_frame(skb)) { - struct net_device *dev = vrf_master_get_rcu(skb->dev); - struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats); - - u64_stats_update_begin(&dstats->syncp); - dstats->rx_pkts++; - dstats->rx_bytes += skb->len; - u64_stats_update_end(&dstats->syncp); - - skb->dev = dev; - - return RX_HANDLER_ANOTHER; - } - return RX_HANDLER_PASS; -} - static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) { @@ -506,28 +425,14 @@ static int do_vrf_add_slave(struct net_device *dev, struct net_device *port_dev) { int ret; - /* register the packet handler for slave ports */ - ret = netdev_rx_handler_register(port_dev, vrf_handle_frame, dev); - if (ret) { - netdev_err(port_dev, - "Device %s failed to register rx_handler\n", - port_dev->nam
[PATCH net-next v3 0/2] net: vrf: Fixup PKTINFO to return enslaved device index
Applications such as OSPF and BFD need the original ingress device not the VRF device; the latter can be derived from the former. To that end move the packet intercept from an rx handler that is invoked by __netif_receive_skb_core to the ipv4 and ipv6 receive processing. IPv6 already saves the skb_iif to the control buffer in ipv6_rcv. Since the skb->dev has not been switched the cb has the enslaved device. Make the same happen for IPv4 by adding the skb_iif to inet_skb_parm and set it in ipv4 code after clearing the skb control buffer similar to IPv6. >From there the pktinfo can just pull it from cb with the PKTINFO_SKB_CB cast. David Ahern (2): net: l3mdev: Add hook in ip and ipv6 net: original ingress device index in PKTINFO drivers/net/vrf.c | 189 ++ include/linux/ipv6.h | 17 - include/linux/netdevice.h | 2 + include/net/ip.h | 1 + include/net/l3mdev.h | 43 +++ include/net/tcp.h | 4 +- net/core/dev.c| 3 +- net/ipv4/ip_input.c | 8 ++ net/ipv4/ip_sockglue.c| 7 +- net/ipv6/ip6_input.c | 7 ++ 10 files changed, 179 insertions(+), 102 deletions(-) -- 2.1.4
Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6
On 5/7/16 12:32 PM, Shmulik Ladkani wrote: Hi David, On Sat, 7 May 2016 08:50:49 -0600 David Ahern wrote: +static inline +struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto) +{ + struct net_device *master = NULL; + + if (netif_is_l3_slave(skb->dev)) + master = netdev_master_upper_dev_get_rcu(skb->dev); + + else if (netif_is_l3_master(skb->dev)) + master = skb->dev; + + if (master && master->l3mdev_ops->l3mdev_l3_rcv) + skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto); In the case where netif_is_l3_master(skb->dev) is true, can you explain why we need to pass it through the l3mdev_l3_rcv callback again? what do you mean again? This is only time the l3mdev_l3_rcv method is called on a packet. You have the following: if (netif_is_l3_slave(skb->dev)) master = netdev_master_upper_dev_get_rcu(skb->dev); else if (netif_is_l3_master(skb->dev)) master = skb->dev; if (master && master->l3mdev_ops->l3mdev_l3_rcv) skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto); So in both conditions (skb->dev being the slave or the master) the skb is passed to master's l3mdev_l3_rcv callback. Appreciate if you can elaborate: - Why callback needs to be invoked when skb->dev is the L3 master? Every l3mdev_ops has converged on that ordering -- if l3 slave else if l3 master. - On which circumstances we end up entering l3mdev_ip_rcv/l3mdev_ip6_rcv where skb->dev is the master? If I got it right, we enter 'ip_rcv_finish' on a slave device, the callback is invoked and eventually sets skb->dev and skb->skb_iif to the VRF device; then ip_rcv_finish continues processing the altered skb (with the changed skb->dev). So on which cicumstances do we enter 'ip_rcv_finish' where the skb->dev is ALREADY a master device? If you look at the full patchset I posted on 5/4 the patch after PKTINFO allows local traffic. That change needs the netif_is_l3_master().
Re: [PATCH 1/2] net: phy: add ethtool_phy_{get|set}_link_ksettings
On Sun, 2016-05-08 at 00:56 +0200, Philippe Reynes wrote: > On 07/05/16 13:59, Ben Hutchings wrote: > > > > On Sat, 2016-05-07 at 01:18 +0200, Philippe Reynes wrote: > > > > > > The callback {get|set}_link_ksettings are often defined > > > in a very close way. There are mainly two differences in > > > those callback: > > > - the name of the netdev private structure > > > - the name of the struct phydev in the private structure > > > > > > We add two defines ethtool_phy_{get|set}_link_ksettings > > > to avoid writing severals times almost the same function. > > [...] > > > > I don't think there's no need to access a private structure, as there's > > a phydev pointer in struct net_device. If some drivers don't maintain > > that pointer, they should be changed to do so. Then they can > > use generic implementations of {get,set}_link_ksettings provided by > > phylib. > If we could use the phydev in the struct net_device, we could write a > generic function for {get|set}_link_ksettings. It's a good idea. > > But I've quickly looked and a lot of ethernet driver use the private > structure to store the phydev. If the ethernet driver may use the > struct net_device for phydev, do you know why so many drivers use > the private structure ? Maybe just because no-one bothered to update them after it was added to net_device. Ben. > If everybody agree, I can send a new version with a generic > {get|set}_link_ksettings > and a update of fec to use the phydev store in the structure net_device. -- Ben Hutchings I haven't lost my mind; it's backed up on tape somewhere. signature.asc Description: This is a digitally signed message part
[PATCH net-next v2 0/2] net: l3mdev: Allow send on enslaved interface
First patch preps for the second. The second is required for several use cases such as ping on an interface and BFD that need to send packets on a specific interface, including ones enslaved to a VRF device. v2 - fixed brackets on both patches per comment from DaveM David Ahern (2): net: l3mdev: Move get_saddr and rt6_dst net: l3mdev: Allow send on enslaved interface drivers/net/vrf.c| 2 ++ include/net/l3mdev.h | 56 +++--- net/ipv4/route.c | 4 net/ipv6/route.c | 2 +- net/l3mdev/l3mdev.c | 63 5 files changed, 73 insertions(+), 54 deletions(-) -- 2.1.4
[PATCH net-next 2/2] net: l3mdev: Allow send on enslaved interface
Allow udp and raw sockets to send by oif that is an enslaved interface versus the l3mdev/VRF device. For example, this allows BFD to use ifindex from IP_PKTINFO on a receive to send a response without the need to convert to the VRF index. It also allows ping and ping6 to work when specifying an enslaved interface (e.g., ping -I swp1 ) which is a natural use case. Signed-off-by: David Ahern --- drivers/net/vrf.c | 2 ++ net/ipv4/route.c| 4 net/l3mdev/l3mdev.c | 17 + 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c index 4b2461ae5d3b..c8db55aa8280 100644 --- a/drivers/net/vrf.c +++ b/drivers/net/vrf.c @@ -648,6 +648,8 @@ static int vrf_get_saddr(struct net_device *dev, struct flowi4 *fl4) fl4->flowi4_flags |= FLOWI_FLAG_SKIP_NH_OIF; fl4->flowi4_iif = LOOPBACK_IFINDEX; + /* make sure oif is set to VRF device for lookup */ + fl4->flowi4_oif = dev->ifindex; fl4->flowi4_tos = tos & IPTOS_RT_MASK; fl4->flowi4_scope = ((tos & RTO_ONLINK) ? RT_SCOPE_LINK : RT_SCOPE_UNIVERSE); diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 8c8c655bb2c4..a1f2830d8110 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2146,6 +2146,7 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4, unsigned int flags = 0; struct fib_result res; struct rtable *rth; + int master_idx; int orig_oif; int err = -ENETUNREACH; @@ -2155,6 +2156,9 @@ struct rtable *__ip_route_output_key_hash(struct net *net, struct flowi4 *fl4, orig_oif = fl4->flowi4_oif; + master_idx = l3mdev_master_ifindex_by_index(net, fl4->flowi4_oif); + if (master_idx) + fl4->flowi4_oif = master_idx; fl4->flowi4_iif = LOOPBACK_IFINDEX; fl4->flowi4_tos = tos & IPTOS_RT_MASK; fl4->flowi4_scope = ((tos & RTO_ONLINK) ? diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c index 898d01e0f87b..6651a78e100c 100644 --- a/net/l3mdev/l3mdev.c +++ b/net/l3mdev/l3mdev.c @@ -112,12 +112,18 @@ struct dst_entry *l3mdev_get_rt6_dst(struct net *net, struct dst_entry *dst = NULL; struct net_device *dev; - dev = dev_get_by_index(net, fl6->flowi6_oif); - if (dev) { - if (netif_is_l3_master(dev) && + if (fl6->flowi6_oif) { + rcu_read_lock(); + + dev = dev_get_by_index_rcu(net, fl6->flowi6_oif); + if (dev && netif_is_l3_slave(dev)) + dev = netdev_master_upper_dev_get_rcu(dev); + + if (dev && netif_is_l3_master(dev) && dev->l3mdev_ops->l3mdev_get_rt6_dst) dst = dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6); - dev_put(dev); + + rcu_read_unlock(); } return dst; @@ -141,6 +147,9 @@ int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4) rcu_read_lock(); dev = dev_get_by_index_rcu(net, ifindex); + if (dev && netif_is_l3_slave(dev)) + dev = netdev_master_upper_dev_get_rcu(dev); + if (dev && netif_is_l3_master(dev) && dev->l3mdev_ops->l3mdev_get_saddr) rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4); -- 2.1.4
[PATCH net-next 1/2] net: l3mdev: Move get_saddr and rt6_dst
Move l3mdev_rt6_dst_by_oif and l3mdev_get_saddr to l3mdev.c. Collapse l3mdev_get_rt6_dst into l3mdev_rt6_dst_by_oif since it is the only user and keep the l3mdev_get_rt6_dst name for consistency with other hooks. A follow-on patch adds more code to these functions making them long for inlined functions. Signed-off-by: David Ahern --- include/net/l3mdev.h | 56 +++- net/ipv6/route.c | 2 +- net/l3mdev/l3mdev.c | 54 ++ 3 files changed, 58 insertions(+), 54 deletions(-) diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h index c43a9c73de5e..78872bd1dc2c 100644 --- a/include/net/l3mdev.h +++ b/include/net/l3mdev.h @@ -130,52 +130,9 @@ static inline bool netif_index_is_l3_master(struct net *net, int ifindex) return rc; } -static inline int l3mdev_get_saddr(struct net *net, int ifindex, - struct flowi4 *fl4) -{ - struct net_device *dev; - int rc = 0; - - if (ifindex) { - - rcu_read_lock(); - - dev = dev_get_by_index_rcu(net, ifindex); - if (dev && netif_is_l3_master(dev) && - dev->l3mdev_ops->l3mdev_get_saddr) { - rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4); - } - - rcu_read_unlock(); - } - - return rc; -} +int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4); -static inline struct dst_entry *l3mdev_get_rt6_dst(const struct net_device *dev, - const struct flowi6 *fl6) -{ - if (netif_is_l3_master(dev) && dev->l3mdev_ops->l3mdev_get_rt6_dst) - return dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6); - - return NULL; -} - -static inline -struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net, - const struct flowi6 *fl6) -{ - struct dst_entry *dst = NULL; - struct net_device *dev; - - dev = dev_get_by_index(net, fl6->flowi6_oif); - if (dev) { - dst = l3mdev_get_rt6_dst(dev, fl6); - dev_put(dev); - } - - return dst; -} +struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6); #else @@ -233,14 +190,7 @@ static inline int l3mdev_get_saddr(struct net *net, int ifindex, } static inline -struct dst_entry *l3mdev_get_rt6_dst(const struct net_device *dev, -const struct flowi6 *fl6) -{ - return NULL; -} -static inline -struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net, - const struct flowi6 *fl6) +struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6) { return NULL; } diff --git a/net/ipv6/route.c b/net/ipv6/route.c index af46e19205f5..c42fa1deb152 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1190,7 +1190,7 @@ struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock *sk, struct dst_entry *dst; bool any_src; - dst = l3mdev_rt6_dst_by_oif(net, fl6); + dst = l3mdev_get_rt6_dst(net, fl6); if (dst) return dst; diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c index e925037fa0df..898d01e0f87b 100644 --- a/net/l3mdev/l3mdev.c +++ b/net/l3mdev/l3mdev.c @@ -97,3 +97,57 @@ u32 l3mdev_fib_table_by_index(struct net *net, int ifindex) return tb_id; } EXPORT_SYMBOL_GPL(l3mdev_fib_table_by_index); + +/** + * l3mdev_get_rt6_dst - IPv6 route lookup based on flow. Returns + * cached route for L3 master device if relevant + * to flow + * @net: network namespace for device index lookup + * @fl6: IPv6 flow struct for lookup + */ + +struct dst_entry *l3mdev_get_rt6_dst(struct net *net, +const struct flowi6 *fl6) +{ + struct dst_entry *dst = NULL; + struct net_device *dev; + + dev = dev_get_by_index(net, fl6->flowi6_oif); + if (dev) { + if (netif_is_l3_master(dev) && + dev->l3mdev_ops->l3mdev_get_rt6_dst) + dst = dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6); + dev_put(dev); + } + + return dst; +} +EXPORT_SYMBOL_GPL(l3mdev_get_rt6_dst); + +/** + * l3mdev_get_saddr - get source address for a flow based on an interface + *enslaved to an L3 master device + * @net: network namespace for device index lookup + * @ifindex: Interface index + * @fl4: IPv4 flow struct + */ + +int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4) +{ + struct net_device *dev; + int rc = 0; + + if (ifindex) { + rcu_read_lock(); + + dev = dev_get_by_index_rcu(net, ifindex); + if (dev && netif_is_l3_master(dev) && +
Re: [PATCH 1/2] net: phy: add ethtool_phy_{get|set}_link_ksettings
On 07/05/16 13:59, Ben Hutchings wrote: On Sat, 2016-05-07 at 01:18 +0200, Philippe Reynes wrote: The callback {get|set}_link_ksettings are often defined in a very close way. There are mainly two differences in those callback: - the name of the netdev private structure - the name of the struct phydev in the private structure We add two defines ethtool_phy_{get|set}_link_ksettings to avoid writing severals times almost the same function. [...] I don't think there's no need to access a private structure, as there's a phydev pointer in struct net_device. If some drivers don't maintain that pointer, they should be changed to do so. Then they can use generic implementations of {get,set}_link_ksettings provided by phylib. If we could use the phydev in the struct net_device, we could write a generic function for {get|set}_link_ksettings. It's a good idea. But I've quickly looked and a lot of ethernet driver use the private structure to store the phydev. If the ethernet driver may use the struct net_device for phydev, do you know why so many drivers use the private structure ? If everybody agree, I can send a new version with a generic {get|set}_link_ksettings and a update of fec to use the phydev store in the structure net_device. Philippe
[PATCH 2/2] sh_eth: reuse sh_eth_chip_reset()
All the chip_reset() methods repeat the code writing to the ARSTR register and delaying for 1 ms, so that we can reuse sh_eth_chip_reset() twice. Signed-off-by: Sergei Shtylyov --- drivers/net/ethernet/renesas/sh_eth.c | 11 ++- 1 file changed, 2 insertions(+), 9 deletions(-) Index: net-next/drivers/net/ethernet/renesas/sh_eth.c === --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c +++ net-next/drivers/net/ethernet/renesas/sh_eth.c @@ -537,11 +537,7 @@ static struct sh_eth_cpu_data r7s72100_d static void sh_eth_chip_reset_r8a7740(struct net_device *ndev) { - struct sh_eth_private *mdp = netdev_priv(ndev); - - /* reset device */ - sh_eth_tsu_write(mdp, ARSTR_ARST, ARSTR); - mdelay(1); + sh_eth_chip_reset(ndev); sh_eth_select_mii(ndev); } @@ -725,7 +721,6 @@ static struct sh_eth_cpu_data sh7757_dat #define GIGA_MAHR(port)(SH_GIGA_ETH_BASE + 0x800 * (port) + 0x05c0) static void sh_eth_chip_reset_giga(struct net_device *ndev) { - struct sh_eth_private *mdp = netdev_priv(ndev); u32 mahr[2], malr[2]; int i; @@ -735,9 +730,7 @@ static void sh_eth_chip_reset_giga(struc mahr[i] = ioread32((void *)GIGA_MAHR(i)); } - /* reset device */ - sh_eth_tsu_write(mdp, ARSTR_ARST, ARSTR); - mdelay(1); + sh_eth_chip_reset(ndev); /* restore MAHR and MALR */ for (i = 0; i < 2; i++) {
[PATCH 1/2] sh_eth: call sh_eth_tsu_write() from sh_eth_chip_reset_giga()
sh_eth_chip_reset_giga() doesn't really need to use direct iowrite32() when writing to the ARSTR register, it can use sh_eth_tsu_write() as all other chip_reset() methods. Signed-off-by: Sergei Shtylyov --- drivers/net/ethernet/renesas/sh_eth.c |5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) Index: net-next/drivers/net/ethernet/renesas/sh_eth.c === --- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c +++ net-next/drivers/net/ethernet/renesas/sh_eth.c @@ -725,8 +725,9 @@ static struct sh_eth_cpu_data sh7757_dat #define GIGA_MAHR(port)(SH_GIGA_ETH_BASE + 0x800 * (port) + 0x05c0) static void sh_eth_chip_reset_giga(struct net_device *ndev) { - int i; + struct sh_eth_private *mdp = netdev_priv(ndev); u32 mahr[2], malr[2]; + int i; /* save MAHR and MALR */ for (i = 0; i < 2; i++) { @@ -735,7 +736,7 @@ static void sh_eth_chip_reset_giga(struc } /* reset device */ - iowrite32(ARSTR_ARST, (void *)(SH_GIGA_ETH_BASE + 0x1800)); + sh_eth_tsu_write(mdp, ARSTR_ARST, ARSTR); mdelay(1); /* restore MAHR and MALR */
[PATCH 0/2] sh_eth: couple of software reset bit cleanups
Hello. Here's a set of 2 patches against DaveM's 'net-next.git' repo. We can save on the repetitive chip reset code... [1/2] sh_eth: call sh_eth_tsu_write() from sh_eth_chip_reset_giga() [2/2] sh_eth: reuse sh_eth_chip_reset() MBR, Sergei
Re: [PATCH net-next 15/21] net: dsa: mv88e6xxx: factorize VLAN Ethertype
Hi Andrew, Andrew Lunn writes: > On Fri, May 06, 2016 at 05:57:17PM -0400, Vivien Didelot wrote: >> The 6131 switch models have a Core Tag Type register. Add a >> MV88E6XXX_FLAG_CORE_TAG_TYPE flag and set the VLAN Ethertype to 0x8100 >> in the shared setup code if it is present. > > Do you have any idea what the core tag is? Core Tag Type is the global register 0x19 present only on 6185 and similar. It is used when egressing double tagged frames. This value is also checked when ingressing frames on ports in UseCoreTag mode. Note that we might remove this flag and setup condition, because 0x8100 is the reset value anyway. Vivien
Re: [PATCH net-next 06/21] net: dsa: mv88e6xxx: factorize MAC address setting
Hi Andrew, Andrew Lunn writes: >> @@ -378,6 +385,7 @@ enum mv88e6xxx_cap { >> #define MV88E6XXX_FLAG_EEPROM BIT(MV88E6XXX_CAP_EEPROM) >> #define MV88E6XXX_FLAG_PPU BIT(MV88E6XXX_CAP_PPU) >> #define MV88E6XXX_FLAG_SMI_PHY BIT(MV88E6XXX_CAP_SMI_PHY) >> +#define MV88E6XXX_FLAG_SWITCH_MAC BIT(MV88E6XXX_CAP_SWITCH_MAC_WOL_WOF) >> #define MV88E6XXX_FLAG_TEMP BIT(MV88E6XXX_CAP_TEMP) >> #define MV88E6XXX_FLAG_TEMP_LIMIT BIT(MV88E6XXX_CAP_TEMP_LIMIT) > > There is a general pattern here that the flag has a name derived from > the capability. Except you dropped the WOL_WOF here. It would probably > be better to not have WOL_WOF at all. Indeed, I did that because the global 2 register 0x0D "Switch MAC/WoL/WoF" is used to indirectly configure the switch MAC address, the Wake on Lan and Wake on Frame. So I explicitly named the capability MV88E6XXX_CAP_SWITCH_MAC_WOL_WOF and the flag for the switch MAC MV88E6XXX_FLAG_SWITCH_MAC. So if we add support for WoL, we can then define: #define MV88E6XXX_FLAG_WOL BIT(MV88E6XXX_CAP_SWITCH_MAC_WOL_WOF) But I can get rid of it if it feels confusing. Thanks, Vivien
[PATCH] pxa168_eth: mdiobus_scan() doesn't return NULL anymore
Now that mdiobus_scan() doesn't return NULL on failure anymore, this driver no longer needs to check for it... Signed-off-by: Sergei Shtylyov --- The patch is against DaveM's 'net-next.git' repo. drivers/net/ethernet/marvell/pxa168_eth.c |2 -- 1 file changed, 2 deletions(-) Index: net-next/drivers/net/ethernet/marvell/pxa168_eth.c === --- net-next.orig/drivers/net/ethernet/marvell/pxa168_eth.c +++ net-next/drivers/net/ethernet/marvell/pxa168_eth.c @@ -981,8 +981,6 @@ static int pxa168_init_phy(struct net_de pep->phy = mdiobus_scan(pep->smi_bus, pep->phy_addr); if (IS_ERR(pep->phy)) return PTR_ERR(pep->phy); - if (!pep->phy) - return -ENODEV; err = phy_connect_direct(dev, pep->phy, pxa168_eth_adjust_link, pep->phy_intf);
Re: [PATCH v2 1/2 -net] ravb: Add missing free_irq() call to ravb_close()
On Sat, May 7, 2016 at 8:39 PM, Sergei Shtylyov wrote: > On 05/07/2016 02:17 PM, Geert Uytterhoeven wrote: > >> When reopening the network device on ra7795/salvator-x, e.g. after a >> DHCP timeout: >> >> IP-Config: Reopening network devices... >> genirq: Flags mismatch irq 139. (eth0:ch24:emac) vs. >> (eth0:ch24:emac) >> ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac > >Er, this can't be a message from the kernel built from net.git. That > driver used 'ndev->name' > to request both IRQs... Oops, I copied the error message from the second patch. I don't think it matters that much, though... >> IP-Config: Failed to open eth0 >> IP-Config: No network devices available >> >> The "mismatch" is due to requesting an IRQ that is already in use, >> while IRQF_PROBE_SHARED wasn't set. >> >> However, the real cause is that ravb_close() doesn't release the R-Car >> Gen3-specific secondary IRQ. >> >> Add the missing free_irq() call to fix this. >> >> Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC") >> Signed-off-by: Geert Uytterhoeven > > > Acked-by: Sergei Shtylyov Thanks! Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [patch] netxen: netxen_rom_fast_read() doesn't return -1
From: Dan Carpenter Date: Thu, 5 May 2016 16:20:20 +0300 > The error handling is broken here. netxen_rom_fast_read() returns zero > on success and -EIO on error. It never returns -1. > > Signed-off-by: Dan Carpenter Applied.
Re: [patch 2/2] netxen: reversed condition in netxen_nic_set_link_parameters()
From: Dan Carpenter Date: Thu, 5 May 2016 16:19:44 +0300 > My static checker complains that we are using "autoneg" without > initializing it. The problem is the ->phy_read() condition is reversed > so we only set this on error instead of success. > > Signed-off-by: Dan Carpenter Applied.
Re: [patch 1/2] netxen: fix error handling in netxen_get_flash_block()
From: Dan Carpenter Date: Thu, 5 May 2016 16:18:46 +0300 > My static checker complained that "v" can be used unintialized if > netxen_rom_fast_read() returns -EIO. That function never actually > returns -1. > > Signed-off-by: Dan Carpenter Applied.
Re: [PATCH net-next] cxgb4: Reset dcb state machine and tx queue prio only if dcb is enabled
From: Hariprasad Shenai Date: Thu, 5 May 2016 11:05:39 +0530 > When cxgb4 is enabled with CONFIG_CHELSIO_T4_DCB set, VI enable command > gets called with DCB enabled. But when we have a back to back setup with > DCB enabled on one side and non-DCB on the Peer side. Firmware doesn't > send any DCB_L2_CFG, and DCB priority is never set for Tx queue. > But driver resets the queue priority and state machine whenever there > is a link down, this patch fixes it by adding a check to reset only if > cxgb4_dcb_enabled() returns true. > > Signed-off-by: Hariprasad Shenai Applied.
Re: [PATCH net-next 2/4] xen-netback: add control protocol implementation
From: Paul Durrant Date: Thu, 5 May 2016 12:19:28 +0100 > +struct xenvif_hash_cache { > + rwlock_t lock; You really don't want to lock on every SKB hash computation like this, turn this into a spin lock for locking the write side and use RCU locking for lookup and usage. THanks.
Re: [PATCH net-next 2/2] net: original ingress device index in PKTINFO
Hi, On Sat, 7 May 2016 08:53:44 -0600 David Ahern wrote: > >> @@ -1193,7 +1193,12 @@ void ipv4_pktinfo_prepare(const struct sock *sk, > >> struct sk_buff *skb) > >> ipv6_sk_rxinfo(sk); > >> > >>if (prepare && skb_rtable(skb)) { > >> - pktinfo->ipi_ifindex = inet_iif(skb); > >> + /* skb->cb is overloaded: prior to this point it is IP{6}CB > >> + * which has interface index (iif) as the first member of the > >> + * underlying inet{6}_skb_parm struct. This code then overlays > >> + * PKTINFO_SKB_CB and in_pktinfo also has iif as the first > >> + * element so the iif is picked up from the prior IPCB > >> + */ > > > > Better if there was a guarantee in the code that inet_skb_parm layout stays > > that way. Or instead just explicitly assign the iif. > > At this point inet_iif points to the vrf device so can't use it. Initially I was thinking about explicitly getting the iif out of the IPCB first, then assign to ipi_ifindex. Seems more readable, and less fragile. However this depends on the IPCB/IP6CB layout relationship as well (iif being first on both). I assume documenting the IP{6}CB/PKTINFO_SKB_CB layout relationship at the struct definitions would be beneficial. Regards, Shmulik
Re: [PATCH net-next 1/2] net: l3mdev: Move get_saddr and rt6_dst
From: David Ahern Date: Wed, 4 May 2016 21:54:09 -0700 > + if (dev && netif_is_l3_master(dev) && > + dev->l3mdev_ops->l3mdev_get_saddr) { > + rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4); > + } Please do not use braces for single statement basic blocks. The same issue exists in patch #2 so please fix it there as well. Thanks.
Re: [PATCH net 1/1] net: fec: update dirty_tx even if no skb
On 4/21/2016 10:59 PM, Fugang Duan wrote: > From: Troy Kisky Sent: Friday, April 22, > 2016 10:01 AM >> To: netdev@vger.kernel.org; da...@davemloft.net; Fugang Duan >> ; lzn...@gmail.com >> Cc: Fabio Estevam ; l.st...@pengutronix.de; >> and...@lunn.ch; trem...@gmail.com; g...@uclinux.org; linux-arm- >> ker...@lists.infradead.org; johan...@sipsolutions.net; >> stillcompil...@gmail.com; sergei.shtyl...@cogentembedded.com; >> a...@arndb.de; holgerschu...@gmail.com; Troy Kisky >> >> Subject: [PATCH net 1/1] net: fec: update dirty_tx even if no skb >> >> If dirty_tx isn't updated, then dma_unmap_single will be called twice. >> >> This fixes a >> [ 58.420980] [ cut here ] >> [ 58.425667] WARNING: CPU: 0 PID: 377 at /home/schurig/d/mkarm/linux- >> 4.5/lib/dma-debug.c:1096 check_unmap+0x9d0/0xab8() >> [ 58.436405] fec 2188000.ethernet: DMA-API: device driver tries to free DMA >> memory it has not allocated [device address=0x] [size=66 >> bytes] >> >> encountered by Holger >> >> Signed-off-by: Troy Kisky >> Tested-by: >> --- >> drivers/net/ethernet/freescale/fec_main.c | 8 +++- >> 1 file changed, 3 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/net/ethernet/freescale/fec_main.c >> b/drivers/net/ethernet/freescale/fec_main.c >> index 08243c2..b71654c 100644 >> --- a/drivers/net/ethernet/freescale/fec_main.c >> +++ b/drivers/net/ethernet/freescale/fec_main.c >> @@ -1197,10 +1197,8 @@ fec_enet_tx_queue(struct net_device *ndev, u16 >> queue_id) >> fec16_to_cpu(bdp->cbd_datlen), >> DMA_TO_DEVICE); >> bdp->cbd_bufaddr = cpu_to_fec32(0); >> -if (!skb) { >> -bdp = fec_enet_get_nextdesc(bdp, &txq->bd); >> -continue; >> -} >> +if (!skb) >> +goto skb_done; >> >> /* Check for errors. */ >> if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC | @@ -1239,7 >> +1237,7 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) >> >> /* Free the sk buffer associated with this last transmit */ >> dev_kfree_skb_any(skb); >> - >> +skb_done: >> /* Make sure the update to bdp and tx_skbuff are performed >> * before dirty_tx >> */ >> -- >> 2.5.0 > > The patch is fine for me. > Can you review below patch that also fix the issue. It can take much effort > due to less rmb() and READ_ONCE() operation that is very sensitive for duplex > Gbps test for i.MX6SX/i.MX7d SOC. (i.MX6SX can reach at 1.4Gbps, i.MX7D can > reach at 1.8Gbps.) > Am I supposed to do anything else to get this patch into net ?
pull request: bluetooth-next 2016-05-07
Hi Dave, Here are a few more Bluetooth patches for the 4.7 kernel: - NULL pointer fix in hci_intel driver - New Intel Bluetooth controller id in btusb driver - Added device tree binding documentation for Marvel's bt-sd8xxx - Platform specific wakeup interrupt support for btmrvl driver Please let me know if there are any issues pulling. Thanks. Johan --- The following changes since commit 582a1db98892ef4c1f34c7338b272331994d44ab: Merge branch 'qed-selftests' (2016-05-02 00:16:45 -0400) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream for you to fetch changes up to a0af53b511423cca93900066512379e21586d7dd: Bluetooth: Add support for Intel Bluetooth device 8265 [8087:0a2b] (2016-05-06 21:52:35 +0200) Loic Poulain (1): Bluetooth: hci_intel: Fix null gpio desc pointer dereference Tedd Ho-Jeong An (1): Bluetooth: Add support for Intel Bluetooth device 8265 [8087:0a2b] Xinming Hu (2): dt: bindings: add MARVELL's bt-sd8xxx wireless device btmrvl: add platform specific wakeup interrupt support Documentation/devicetree/bindings/btmrvl.txt | 29 .../devicetree/bindings/net/marvell-bt-sd8xxx.txt | 56 +++ drivers/bluetooth/btmrvl_drv.h | 11 +++ drivers/bluetooth/btmrvl_main.c| 35 + drivers/bluetooth/btmrvl_sdio.c| 79 + drivers/bluetooth/btmrvl_sdio.h| 6 ++ drivers/bluetooth/btusb.c | 11 +-- drivers/bluetooth/hci_intel.c | 6 +- 8 files changed, 180 insertions(+), 53 deletions(-) delete mode 100644 Documentation/devicetree/bindings/btmrvl.txt create mode 100644 Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt signature.asc Description: PGP signature
Re: [PATCH v2 2/2 -net-next] ravb: Add missing free_irq() calls to ravb_close()
Hello. On 05/07/2016 02:17 PM, Geert Uytterhoeven wrote: When reopening the network device on ra7795/salvator-x, e.g. after a DHCP timeout: IP-Config: Reopening network devices... genirq: Flags mismatch irq 139. (eth0:ch24:emac) vs. ( Unwrapped line this time? :-) ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac IP-Config: Failed to open eth0 IP-Config: No network devices available The "mismatch" is due to requesting an IRQ that is already in use, while IRQF_PROBE_SHARED wasn't set. However, the real cause is that ravb_close() doesn't release any of the R-Car Gen3-specific secondary IRQs. Add the missing free_irq() calls to fix this. Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC") Fixes: f51bdc236b6c5835 ("ravb: Add dma queue interrupt support") Signed-off-by: Geert Uytterhoeven Acked-by: Sergei Shtylyov MBR, Sergei
Re: [PATCH v2 1/2 -net] ravb: Add missing free_irq() call to ravb_close()
Hello. On 05/07/2016 02:17 PM, Geert Uytterhoeven wrote: When reopening the network device on ra7795/salvator-x, e.g. after a DHCP timeout: IP-Config: Reopening network devices... genirq: Flags mismatch irq 139. (eth0:ch24:emac) vs. (eth0:ch24:emac) ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac Er, this can't be a message from the kernel built from net.git. That driver used 'ndev->name' to request both IRQs... IP-Config: Failed to open eth0 IP-Config: No network devices available The "mismatch" is due to requesting an IRQ that is already in use, while IRQF_PROBE_SHARED wasn't set. However, the real cause is that ravb_close() doesn't release the R-Car Gen3-specific secondary IRQ. Add the missing free_irq() call to fix this. Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC") Signed-off-by: Geert Uytterhoeven Acked-by: Sergei Shtylyov MBR, Sergei
Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6
Hi David, On Sat, 7 May 2016 08:50:49 -0600 David Ahern wrote: > >> +static inline > >> +struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto) > >> +{ > >> + struct net_device *master = NULL; > >> + > >> + if (netif_is_l3_slave(skb->dev)) > >> + master = netdev_master_upper_dev_get_rcu(skb->dev); > >> + > >> + else if (netif_is_l3_master(skb->dev)) > >> + master = skb->dev; > >> + > >> + if (master && master->l3mdev_ops->l3mdev_l3_rcv) > >> + skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto); > > > > In the case where netif_is_l3_master(skb->dev) is true, can you explain > > why we need to pass it through the l3mdev_l3_rcv callback again? > > what do you mean again? This is only time the l3mdev_l3_rcv method is > called on a packet. You have the following: if (netif_is_l3_slave(skb->dev)) master = netdev_master_upper_dev_get_rcu(skb->dev); else if (netif_is_l3_master(skb->dev)) master = skb->dev; if (master && master->l3mdev_ops->l3mdev_l3_rcv) skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto); So in both conditions (skb->dev being the slave or the master) the skb is passed to master's l3mdev_l3_rcv callback. Appreciate if you can elaborate: - Why callback needs to be invoked when skb->dev is the L3 master? - On which circumstances we end up entering l3mdev_ip_rcv/l3mdev_ip6_rcv where skb->dev is the master? If I got it right, we enter 'ip_rcv_finish' on a slave device, the callback is invoked and eventually sets skb->dev and skb->skb_iif to the VRF device; then ip_rcv_finish continues processing the altered skb (with the changed skb->dev). So on which cicumstances do we enter 'ip_rcv_finish' where the skb->dev is ALREADY a master device? Thanks, Shmulik
[PATCH net] macsec: key identifier is 128 bits, not 64
The MACsec standard mentions a key identifier for each key, but doesn't specify anything about it, so I arbitrarily chose 64 bits. IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the key identifier to be 128 bits (96 bits "member identifier" + 32 bits "key number"). Signed-off-by: Sabrina Dubroca Acked-by: Hannes Frederic Sowa --- Let's make uapi consistent now, before macsec lands in an official kernel release. drivers/net/macsec.c | 19 +-- include/uapi/linux/if_macsec.h | 4 +++- 2 files changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c index c6385617bfb2..92eaab95ae2b 100644 --- a/drivers/net/macsec.c +++ b/drivers/net/macsec.c @@ -85,7 +85,7 @@ struct gcm_iv { * @tfm: crypto struct, key storage */ struct macsec_key { - u64 id; + u8 id[MACSEC_KEYID_LEN]; struct crypto_aead *tfm; }; @@ -1529,7 +1529,8 @@ static const struct nla_policy macsec_genl_sa_policy[NUM_MACSEC_SA_ATTR] = { [MACSEC_SA_ATTR_AN] = { .type = NLA_U8 }, [MACSEC_SA_ATTR_ACTIVE] = { .type = NLA_U8 }, [MACSEC_SA_ATTR_PN] = { .type = NLA_U32 }, - [MACSEC_SA_ATTR_KEYID] = { .type = NLA_U64 }, + [MACSEC_SA_ATTR_KEYID] = { .type = NLA_BINARY, + .len = MACSEC_KEYID_LEN, }, [MACSEC_SA_ATTR_KEY] = { .type = NLA_BINARY, .len = MACSEC_MAX_KEY_LEN, }, }; @@ -1576,6 +1577,9 @@ static bool validate_add_rxsa(struct nlattr **attrs) return false; } + if (nla_len(attrs[MACSEC_SA_ATTR_KEYID]) != MACSEC_KEYID_LEN) + return false; + return true; } @@ -1641,7 +1645,7 @@ static int macsec_add_rxsa(struct sk_buff *skb, struct genl_info *info) if (tb_sa[MACSEC_SA_ATTR_ACTIVE]) rx_sa->active = !!nla_get_u8(tb_sa[MACSEC_SA_ATTR_ACTIVE]); - rx_sa->key.id = nla_get_u64(tb_sa[MACSEC_SA_ATTR_KEYID]); + nla_memcpy(rx_sa->key.id, tb_sa[MACSEC_SA_ATTR_KEY], MACSEC_KEYID_LEN); rx_sa->sc = rx_sc; rcu_assign_pointer(rx_sc->sa[assoc_num], rx_sa); @@ -1722,6 +1726,9 @@ static bool validate_add_txsa(struct nlattr **attrs) return false; } + if (nla_len(attrs[MACSEC_SA_ATTR_KEYID]) != MACSEC_KEYID_LEN) + return false; + return true; } @@ -1777,7 +1784,7 @@ static int macsec_add_txsa(struct sk_buff *skb, struct genl_info *info) return -ENOMEM; } - tx_sa->key.id = nla_get_u64(tb_sa[MACSEC_SA_ATTR_KEYID]); + nla_memcpy(tx_sa->key.id, tb_sa[MACSEC_SA_ATTR_KEY], MACSEC_KEYID_LEN); spin_lock_bh(&tx_sa->lock); tx_sa->next_pn = nla_get_u32(tb_sa[MACSEC_SA_ATTR_PN]); @@ -2318,7 +2325,7 @@ static int dump_secy(struct macsec_secy *secy, struct net_device *dev, if (nla_put_u8(skb, MACSEC_SA_ATTR_AN, i) || nla_put_u32(skb, MACSEC_SA_ATTR_PN, tx_sa->next_pn) || - nla_put_u64(skb, MACSEC_SA_ATTR_KEYID, tx_sa->key.id) || + nla_put(skb, MACSEC_SA_ATTR_KEYID, MACSEC_KEYID_LEN, tx_sa->key.id) || nla_put_u8(skb, MACSEC_SA_ATTR_ACTIVE, tx_sa->active)) { nla_nest_cancel(skb, txsa_nest); nla_nest_cancel(skb, txsa_list); @@ -2419,7 +2426,7 @@ static int dump_secy(struct macsec_secy *secy, struct net_device *dev, if (nla_put_u8(skb, MACSEC_SA_ATTR_AN, i) || nla_put_u32(skb, MACSEC_SA_ATTR_PN, rx_sa->next_pn) || - nla_put_u64(skb, MACSEC_SA_ATTR_KEYID, rx_sa->key.id) || + nla_put(skb, MACSEC_SA_ATTR_KEYID, MACSEC_KEYID_LEN, rx_sa->key.id) || nla_put_u8(skb, MACSEC_SA_ATTR_ACTIVE, rx_sa->active)) { nla_nest_cancel(skb, rxsa_nest); nla_nest_cancel(skb, rxsc_nest); diff --git a/include/uapi/linux/if_macsec.h b/include/uapi/linux/if_macsec.h index 4c58d9917aa4..3411ed06b9c0 100644 --- a/include/uapi/linux/if_macsec.h +++ b/include/uapi/linux/if_macsec.h @@ -19,6 +19,8 @@ #define MACSEC_MAX_KEY_LEN 128 +#define MACSEC_KEYID_LEN 16 + #define MACSEC_DEFAULT_CIPHER_ID 0x008002000101ULL #define MACSEC_DEFAULT_CIPHER_ALT 0x0080C2000101ULL @@ -77,7 +79,7 @@ enum macsec_sa_attrs { MACSEC_SA_ATTR_ACTIVE, /* config/dump, u8 0..1 */ MACSEC_SA_ATTR_PN, /* config/dump, u32 */ MACSEC_SA_ATTR_KEY,/* config, data */ - MACSEC_SA_ATTR_KEYID, /* config/dump, u64 */ + MACSEC_SA_ATTR_KEYID, /* config/dump, 128-bit */ MACSEC_SA_ATTR_STATS, /* dump, nested, macsec_sa_stats_attr */ __MACSEC_SA_ATTR_END, NUM_MACSEC_SA_ATTR = __MACSEC_SA_ATTR_END, -- 2.8.2
Re: [PATCH net-next 5/7] Driver: Vmxnet3: Add support for get_coalesce, set_coalesce ethtool operations
From: Ben Hutchings Date: Sat, 07 May 2016 13:04:46 +0100 > On Fri, 2016-05-06 at 16:12 -0700, Shrikrishna Khare wrote: > [...] >> +static int >> +vmxnet3_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *ec) >> +{ > [...] >> +switch (ec->rx_coalesce_usecs) { >> +case VMXNET3_COALESCE_DEFAULT: >> +case VMXNET3_COALESCE_DISABLED: >> +case VMXNET3_COALESCE_ADAPT: >> +if (ec->tx_max_coalesced_frames || >> +ec->tx_max_coalesced_frames_irq || >> +ec->rx_max_coalesced_frames_irq) { >> +return -EINVAL; >> +} >> +memset(adapter->coal_conf, 0, sizeof(*adapter->coal_conf)); >> +adapter->coal_conf->coalMode = ec->rx_coalesce_usecs; >> +break; >> +case VMXNET3_COALESCE_STATIC: > [...] > > I don't want to see drivers introducing magic values for fields that > are denominated in microseconds (especially not for 0, which is the > correct way to specify 'no coalescing'). If the current > ethtool_coalesce structure is inadequate, propose an extension. Agreed.
Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
From: Dexuan Cui Date: Sat, 7 May 2016 10:49:25 + > I should be able to make 'send', 'recv' here to pointers and use vmalloc() > to allocate the memory for them. I will do this. That's still unswappable kernel memory. People can open N sockets, where N is something on the order of the FD limit the process has, per process. This allows someone to quickly eat up a lot of memory and hold onto it nearly indefinitely.
Re: [PATCH net-next 2/2] net: original ingress device index in PKTINFO
On 5/7/16 2:41 AM, Shmulik Ladkani wrote: Hi David, On Fri, 6 May 2016 18:49:41 -0700 David Ahern wrote: Applications such as OSPF and BFD need the original ingress device not the VRF device; Would you consider this true for any IP_PKTINFO users in VRF setups? yes. I was just giving specific examples, but certainly every app I am aware of wants the enslaved index. If an app pops up that wants the vrf index it can be derived from the enslaved index. @@ -1193,7 +1193,12 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb) ipv6_sk_rxinfo(sk); if (prepare && skb_rtable(skb)) { - pktinfo->ipi_ifindex = inet_iif(skb); + /* skb->cb is overloaded: prior to this point it is IP{6}CB +* which has interface index (iif) as the first member of the +* underlying inet{6}_skb_parm struct. This code then overlays +* PKTINFO_SKB_CB and in_pktinfo also has iif as the first +* element so the iif is picked up from the prior IPCB +*/ Better if there was a guarantee in the code that inet_skb_parm layout stays that way. Or instead just explicitly assign the iif. At this point inet_iif points to the vrf device so can't use it.
Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6
On 5/7/16 2:30 AM, Shmulik Ladkani wrote: Hi David, On Fri, 6 May 2016 18:49:40 -0700 David Ahern wrote: +static bool ipv6_ndisc_frame(const struct sk_buff *skb) +{ + const struct ipv6hdr *ipv6h = (struct ipv6hdr *)skb->data; + size_t hlen = sizeof(*ipv6h); + bool rc = false; + + if (ipv6h->nexthdr == NEXTHDR_ICMP) { + const struct icmp6hdr *icmph; + + if (skb->len < hlen + sizeof(*icmph)) + goto out; + + icmph = (struct icmp6hdr *)(skb->data + sizeof(*ipv6h)); + switch (icmph->icmp6_type) { Don't we need an additional pskb_may_pull here? If I get it right, 'ipv6_rcv' only assures sizeof(ipv6hdr) to be in the linear header (unless it's a NEXTHDR_HOP, which is not the case here). yes, I inadvertently dropped this: commit 65c38aa653c14df49e19faad74bd375f36e61c57 Author: David Ahern Date: Tue Feb 23 10:10:26 2016 -0800 net: vrf: Remove direct access to skb->data when I forward ported this patch. Will fix and re-send. +static inline +struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto) +{ + struct net_device *master = NULL; + + if (netif_is_l3_slave(skb->dev)) + master = netdev_master_upper_dev_get_rcu(skb->dev); + + else if (netif_is_l3_master(skb->dev)) + master = skb->dev; + + if (master && master->l3mdev_ops->l3mdev_l3_rcv) + skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto); In the case where netif_is_l3_master(skb->dev) is true, can you explain why we need to pass it through the l3mdev_l3_rcv callback again? what do you mean again? This is only time the l3mdev_l3_rcv method is called on a packet.
[iproute2 PATCH 1/1] tc: don't ignore ok as an action branch
From: Jamal Hadi Salim This is what used to happen before: tc filter add dev tap1 parent : protocol 0xfefe prio 10 \ u32 match u32 0 0 flowid 1:16 \ action ife decode allow mark ok tc -s filter ls dev tap1 parent : filter protocol [65278] pref 10 u32 filter protocol [65278] pref 10 u32 fh 800: ht divisor 1 filter protocol [65278] pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:16 match / at 0 action order 1: ife decode action pipe index 2 ref 1 bind 1 installed 4 sec used 4 sec type: 0x0 Metadata: allow mark Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 2: gact action pass random type none pass val 0 index 1 ref 1 bind 1 installed 4 sec used 4 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 Note the extra action added at the end.. Signed-off-by: Jamal Hadi Salim --- tc/m_connmark.c | 3 ++- tc/m_csum.c | 3 ++- tc/m_ife.c | 3 ++- tc/m_mirred.c | 3 ++- tc/m_nat.c | 3 ++- tc/m_pedit.c| 3 ++- tc/m_skbedit.c | 3 ++- tc/m_vlan.c | 3 ++- 8 files changed, 16 insertions(+), 8 deletions(-) diff --git a/tc/m_connmark.c b/tc/m_connmark.c index b1c7d3a..143d75d 100644 --- a/tc/m_connmark.c +++ b/tc/m_connmark.c @@ -99,7 +99,8 @@ parse_connmark(struct action_util *a, int *argc_p, char ***argv_p, int tca_id, sel.action = TC_ACT_UNSPEC; argc--; argv++; - } else if (matches(*argv, "pass") == 0) { + } else if (matches(*argv, "pass") == 0 || + matches(*argv, "ok") == 0) { sel.action = TC_ACT_OK; argc--; argv++; diff --git a/tc/m_csum.c b/tc/m_csum.c index 36181fa..fb1183a 100644 --- a/tc/m_csum.c +++ b/tc/m_csum.c @@ -140,7 +140,8 @@ parse_csum(struct action_util *a, int *argc_p, sel.action = TC_ACT_UNSPEC; argc--; argv++; - } else if (matches(*argv, "pass") == 0) { + } else if (matches(*argv, "pass") == 0 || + matches(*argv, "ok") == 0) { sel.action = TC_ACT_OK; argc--; argv++; diff --git a/tc/m_ife.c b/tc/m_ife.c index 839e370..ed01ff7 100644 --- a/tc/m_ife.c +++ b/tc/m_ife.c @@ -167,7 +167,8 @@ static int parse_ife(struct action_util *a, int *argc_p, char ***argv_p, p.action = TC_ACT_UNSPEC; argc--; argv++; - } else if (matches(*argv, "pass") == 0) { + } else if (matches(*argv, "pass") == 0 || + matches(*argv, "ok") == 0) { p.action = TC_ACT_OK; argc--; argv++; diff --git a/tc/m_mirred.c b/tc/m_mirred.c index e7e69df..64aad4d 100644 --- a/tc/m_mirred.c +++ b/tc/m_mirred.c @@ -172,7 +172,8 @@ parse_egress(struct action_util *a, int *argc_p, char ***argv_p, } else if (matches(*argv, "continue") == 0) { p.action = TC_POLICE_UNSPEC; NEXT_ARG(); - } else if (matches(*argv, "pass") == 0) { + } else if (matches(*argv, "pass") == 0 || + matches(*argv, "ok") == 0) { p.action = TC_POLICE_OK; NEXT_ARG(); } diff --git a/tc/m_nat.c b/tc/m_nat.c index 4b90121..4d1b1ed 100644 --- a/tc/m_nat.c +++ b/tc/m_nat.c @@ -135,7 +135,8 @@ parse_nat(struct action_util *a, int *argc_p, char ***argv_p, int tca_id, struct sel.action = TC_ACT_UNSPEC; argc--; argv++; - } else if (matches(*argv, "pass") == 0) { + } else if (matches(*argv, "pass") == 0 || + matches(*argv, "ok") == 0) { sel.action = TC_ACT_OK; argc--; argv++; diff --git a/tc/m_pedit.c b/tc/m_pedit.c index 2a94dfb..a539b68 100644 --- a/tc/m_pedit.c +++ b/tc/m_pedit.c @@ -495,7 +495,8 @@ parse_pedit(struct action_util *a, int *argc_p, char ***argv_p, int tca_id, stru } else if (matches(*argv, "continue") == 0) { sel.sel.action = TC_ACT_UNSPEC; NEXT_ARG(); - } else if (matches(*argv, "pass") == 0) { + } else if (matches(*argv, "pass") == 0 || + matches(*argv, "ok") == 0) { sel.sel.action = TC_ACT_OK; NEXT_ARG(); } diff --git a/tc/m_skbedit.c
[iproute2 PATCH v4 1/1] tc: introduce IFE action
From: Jamal Hadi Salim This action allows for a sending side to encapsulate arbitrary metadata which is decapsulated by the receiving end. The sender runs in encoding mode and the receiver in decode mode. Both sender and receiver must specify the same ethertype. At some point we hope to have a registered ethertype and we'll then provide a default so the user doesnt have to specify it. For now we enforce the user specify it. Described in netdev01 paper: "Distributing Linux Traffic Control Classifier-Action Subsystem" Authors: Jamal Hadi Salim and Damascene M. Joachimpillai Also refer to IETF draft-ietf-forces-interfelfb-04.txt Lets show example usage where we encode icmp from a sender towards a receiver with an skbmark of 17; both sender and receiver use ethertype of 0xdead to interop. : Lets start with Receiver-side policy config: xxx: add an ingress qdisc sudo tc qdisc add dev $ETH ingress xxx: any packets with ethertype 0xdead will be subjected to ife decoding xxx: we then restart the classification so we can match on icmp at prio 3 sudo $TC filter add dev $ETH parent : prio 2 protocol 0xdead \ u32 match u32 0 0 flowid 1:1 \ action ife decode reclassify xxx: on restarting the classification from above if it was an icmp xxx: packet, then match it here and continue to the next rule at prio 4 xxx: which will match based on skb mark of 17 sudo tc filter add dev $ETH parent : prio 3 protocol ip \ u32 match ip protocol 1 0xff flowid 1:1 \ action continue xxx: match on skbmark of 0x11 (decimal 17) and accept sudo tc filter add dev $ETH parent : prio 4 protocol ip \ handle 0x11 fw flowid 1:1 \ action ok xxx: Lets show the decoding policy sudo tc -s filter ls dev $ETH parent : protocol 0xdead xxx: filter pref 2 u32 filter pref 2 u32 fh 800: ht divisor 1 filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 0 success 0) match / at 0 (success 0 ) action order 1: ife decode action reclassify type 0x0 allow mark allow prio index 11 ref 1 bind 1 installed 45 sec used 45 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 xxx: Observe that above lists all metadatum it can decode. Typically these submodules will already be compiled into a monolithic kernel or loaded as modules : Lets show the sender side now .. xxx: Add an egress qdisc on the sender netdev sudo tc qdisc add dev $ETH root handle 1: prio xxx: xxx: Match all icmp packets to 192.168.122.237/24, then xxx: tag the packet with skb mark of decimal 17, then xxx: Encode it with: xxx:ethertype 0xdead xxx:add skb->mark to whitelist of metadatum to send xxx:rewrite target dst MAC address to 02:15:15:15:15:15 xxx: sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 u32 \ match ip dst 192.168.122.237/24 \ match ip protocol 1 0xff \ flowid 1:2 \ action skbedit mark 17 \ action ife encode \ type 0xDEAD \ allow mark \ dst 02:15:15:15:15:15 xxx: Lets show the encoding policy filter pref 10 u32 filter pref 10 u32 fh 800: ht divisor 1 filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2 (rule hit 118 success 0) match c0a87a00/ff00 at 16 (success 0 ) match 0001/00ff at 8 (success 0 ) action order 1: skbedit mark 17 index 11 ref 1 bind 1 installed 3 sec used 3 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 action order 2: ife encode action pipe type 0xDEAD allow mark dst 02:15:15:15:15:15 index 12 ref 1 bind 1 installed 3 sec used 3 sec Action statistics: Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 xxx: Now test by sending ping from sender to destination Signed-off-by: Jamal Hadi Salim --- tc/Makefile | 1 + tc/m_ife.c| 341 ++ 4 files changed, 418 insertions(+) create mode 100644 include/linux/tc_act/tc_ife.h create mode 100644 include/linux/tc_ife.h create mode 100644 tc/m_ife.c diff --git a/tc/Makefile b/tc/Makefile index f5bea87..20f5110 100644 --- a/tc/Makefile +++ b/tc/Makefile @@ -43,6 +43,7 @@ TCMODULES += m_gact.o TCMODULES += m_mirred.o TCMODULES += m_nat.o TCMODULES += m_pedit.o +TCMODULES += m_ife.o TCMODULES += m_skbedit.o TCMODULES += m_csum.o TCMODULES += m_simple.o diff --git a/tc/m_ife.c b/tc/m_ife.c new file mode 100644 index 000..839e370 --- /dev/null +++ b/tc/m_ife.c @@ -0,0 +1,341 @@ +/* + * m_ife.c IFE actions module + * + * This program is free software; you can distribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Auth
[net-next PATCH 1/1] export tc ife uapi header
From: Jamal Hadi Salim Signed-off-by: Jamal Hadi Salim --- include/uapi/linux/tc_act/Kbuild | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild index 242cf0c..e3969bd 100644 --- a/include/uapi/linux/tc_act/Kbuild +++ b/include/uapi/linux/tc_act/Kbuild @@ -10,3 +10,4 @@ header-y += tc_skbedit.h header-y += tc_vlan.h header-y += tc_bpf.h header-y += tc_connmark.h +header-y += tc_ife.h -- 1.9.1
Re: [iproute2 PATCH v3 1/1] tc: introduce IFE action
On 16-04-25 08:28 AM, Jamal Hadi Salim wrote: On 16-04-22 01:00 PM, Stephen Hemminger wrote: On Thu, 21 Apr 2016 17:40:14 -0400 All header files for iproute2 (in include/linux) should come from santized kernel headers. I do not see the file tc_ife.h in include/uapi/linux in current net-next kernel source tree. I am assuming based on your other email this is taken care of? your patch didnt make it. I have time - will take care of this. cheers, jamal
Re: [PATCH net-next] net: make sch_handle_ingress() drop monitor ready
On 16-05-06 06:55 PM, Eric Dumazet wrote: From: Eric Dumazet TC_ACT_STOLEN is used when ingress traffic is mirred/redirected to say ifb. Packet is not dropped, but consumed. Only TC_ACT_SHOT is a clear indication something went wrong. Signed-off-by: Eric Dumazet Acked-by: Jamal Hadi Salim cheers, jamal
Re: [PATCH net-next 5/7] Driver: Vmxnet3: Add support for get_coalesce, set_coalesce ethtool operations
On Fri, 2016-05-06 at 16:12 -0700, Shrikrishna Khare wrote: [...] > +static int > +vmxnet3_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *ec) > +{ [...] > + switch (ec->rx_coalesce_usecs) { > + case VMXNET3_COALESCE_DEFAULT: > + case VMXNET3_COALESCE_DISABLED: > + case VMXNET3_COALESCE_ADAPT: > + if (ec->tx_max_coalesced_frames || > + ec->tx_max_coalesced_frames_irq || > + ec->rx_max_coalesced_frames_irq) { > + return -EINVAL; > + } > + memset(adapter->coal_conf, 0, sizeof(*adapter->coal_conf)); > + adapter->coal_conf->coalMode = ec->rx_coalesce_usecs; > + break; > + case VMXNET3_COALESCE_STATIC: [...] I don't want to see drivers introducing magic values for fields that are denominated in microseconds (especially not for 0, which is the correct way to specify 'no coalescing'). If the current ethtool_coalesce structure is inadequate, propose an extension. Ben. -- Ben Hutchings Editing code like this is akin to sticking plasters on the bleeding stump of a severed limb. - me, 29 June 1999 signature.asc Description: This is a digitally signed message part
Re: [PATCH 1/2] net: phy: add ethtool_phy_{get|set}_link_ksettings
On Sat, 2016-05-07 at 01:18 +0200, Philippe Reynes wrote: > The callback {get|set}_link_ksettings are often defined > in a very close way. There are mainly two differences in > those callback: > - the name of the netdev private structure > - the name of the struct phydev in the private structure > > We add two defines ethtool_phy_{get|set}_link_ksettings > to avoid writing severals times almost the same function. [...] I don't think there's no need to access a private structure, as there's a phydev pointer in struct net_device. If some drivers don't maintain that pointer, they should be changed to do so. Then they can use generic implementations of {get,set}_link_ksettings provided by phylib. Ben. -- Ben Hutchings Editing code like this is akin to sticking plasters on the bleeding stump of a severed limb. - me, 29 June 1999 signature.asc Description: This is a digitally signed message part
[PATCH v2 1/2 -net] ravb: Add missing free_irq() call to ravb_close()
When reopening the network device on ra7795/salvator-x, e.g. after a DHCP timeout: IP-Config: Reopening network devices... genirq: Flags mismatch irq 139. (eth0:ch24:emac) vs. (eth0:ch24:emac) ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac IP-Config: Failed to open eth0 IP-Config: No network devices available The "mismatch" is due to requesting an IRQ that is already in use, while IRQF_PROBE_SHARED wasn't set. However, the real cause is that ravb_close() doesn't release the R-Car Gen3-specific secondary IRQ. Add the missing free_irq() call to fix this. Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC") Signed-off-by: Geert Uytterhoeven --- This version is against net.git. v2: - New. --- drivers/net/ethernet/renesas/ravb_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 9e2a0bd8f5a88803..4277d0c12101fef7 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1506,6 +1506,8 @@ static int ravb_close(struct net_device *ndev) priv->phydev = NULL; } + if (priv->chip_id == RCAR_GEN3) + free_irq(priv->emac_irq, ndev); free_irq(ndev->irq, ndev); napi_disable(&priv->napi[RAVB_NC]); -- 1.9.1
[PATCH v2 2/2 -net-next] ravb: Add missing free_irq() calls to ravb_close()
When reopening the network device on ra7795/salvator-x, e.g. after a DHCP timeout: IP-Config: Reopening network devices... genirq: Flags mismatch irq 139. (eth0:ch24:emac) vs. ( ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac IP-Config: Failed to open eth0 IP-Config: No network devices available The "mismatch" is due to requesting an IRQ that is already in use, while IRQF_PROBE_SHARED wasn't set. However, the real cause is that ravb_close() doesn't release any of the R-Car Gen3-specific secondary IRQs. Add the missing free_irq() calls to fix this. Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC") Fixes: f51bdc236b6c5835 ("ravb: Add dma queue interrupt support") Signed-off-by: Geert Uytterhoeven --- This version is against net-next.git. v2: - Clearly state which version this patch is against, - Add Fixes tags. --- drivers/net/ethernet/renesas/ravb_main.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 34066e0649f5c673..867caf6e7a5a65ad 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -1667,6 +1667,13 @@ static int ravb_close(struct net_device *ndev) priv->phydev = NULL; } + if (priv->chip_id != RCAR_GEN2) { + free_irq(priv->tx_irqs[RAVB_NC], ndev); + free_irq(priv->rx_irqs[RAVB_NC], ndev); + free_irq(priv->tx_irqs[RAVB_BE], ndev); + free_irq(priv->rx_irqs[RAVB_BE], ndev); + free_irq(priv->emac_irq, ndev); + } free_irq(ndev->irq, ndev); napi_disable(&priv->napi[RAVB_NC]); -- 1.9.1
[PATCH v2 0/2] ravb: Add missing free_irq() calls to ravb_close()
Hi Dave, When reopening the network device on ra7795/salvator-x, e.g. after a DHCP timeout: IP-Config: Reopening network devices... genirq: Flags mismatch irq 139. (eth0:ch24:emac) vs. ( ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac IP-Config: Failed to open eth0 IP-Config: No network devices available The "mismatch" is due to requesting an IRQ that is already in use, while IRQF_PROBE_SHARED wasn't set. However, the real cause is that ravb_close() doesn't release any of the R-Car Gen3-specific secondary IRQs. Hence the following two patches add the missing free_irq() calls to fix this. As the code has been changed in net-next.git to support more interrupts, and these weren't freed neither, I'm sending two patches: 1. The first patch applies to net.git, 2. The second patch applies to net-next.git. If you prefer the second patch to be replaced by a version that applies to net-next.git after you will have applied the first patch to net.git, and merged net.git into net-next.git, just let me know. Chances compared to the previous version: - Add version against net.git, - Clearly state which version the second patch is against, - Add Fixes tags. Thanks! Geert Uytterhoeven (1): ravb: Add missing free_irq() call to ravb_close() ravb: Add missing free_irq() calls to ravb_close() drivers/net/ethernet/renesas/ravb_main.c | 9 +++ 1 files changed, 9 insertions(+) -- 1.9.1 Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > Sent: Saturday, May 7, 2016 1:04 > To: Dexuan Cui > Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY > Srinivasan ; Haiyang Zhang ; > j...@perches.com; vkuzn...@redhat.com > Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets > > From: Dexuan Cui > Date: Wed, 4 May 2016 09:56:57 -0700 > > > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) > > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) > > + > > +#define HVSOCK_RCV_BUF_SZ > VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV > ... > > +struct hvsock_sock { > ... > > + /* The 'hdr' and 'buf' in the below 'send' and 'recv' definitions must > > +* be consecutive: see hvsock_send_data() and hvsock_recv_data(). > > +*/ > > + struct { > > + struct vmpipe_proto_header hdr; > > + u8 buf[HVSOCK_SND_BUF_SZ]; > > + } send; > > + > > + struct { > > + struct vmpipe_proto_header hdr; > > + u8 buf[HVSOCK_RCV_BUF_SZ]; > > + > > + unsigned int data_len; > > + unsigned int data_offset; > > + } recv; > > I don't think allocating 5 pages of unswappable memory for every Hyper-V > socket > created is reasonable. Thanks for the comment, David! I should be able to make 'send', 'recv' here to pointers and use vmalloc() to allocate the memory for them. I will do this. Thanks, -- Dexuan
Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)
On 06/05/16 10:42, Jesper Dangaard Brouer wrote: > Hi Felix, > > This is an important fix for OpenWRT, please read! > > OpenWRT changed the default fq_codel sch->limit from 10240 to 1024, > without also adjusting q->flows_cnt. Eric explains below that you must > also adjust the buckets (q->flows_cnt) for this not to break. (Just > adjust it to 128) > > Problematic OpenWRT commit in question: > http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e > 12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from > causing too much cpu load with higher speed (#21326)") I 'pull requested' this to the lede-staging tree on github. https://github.com/lede-project/staging/pull/11 One way or another Felix & co should see the change :-) > > > I also highly recommend you cherry-pick this very recent commit: > net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()") > https://git.kernel.org/davem/net-next/c/9d18562a227 > > This should fix very high CPU usage in-case fq_codel goes into drop mode. > The problem is that drop mode was considered rare, and implementation > wise it was chosen to be more expensive (to save cycles on normal mode). > Unfortunately is it easy to trigger with an UDP flood. Drop mode is > especially expensive for smaller devices, as it scans a 4K big array, > thus 64 cache misses for small devices! > > The fix is to allow drop-mode to bulk-drop more packets when entering > drop-mode (default 64 bulk drop). That way we don't suddenly > experience a significantly higher processing cost per packet, but > instead can amortize this. I haven't done the above cherry-pick patch & backport patch creation for 4.4/4.1/3.18 yet - maybe if $dayjob permits time and no one else beats me to it :-) Kevin smime.p7s Description: S/MIME Cryptographic Signature
[PATCH net v4] vlan: Propagate MAC address to VLANs
The MAC address of the physical interface is only copied to the VLAN when it is first created, resulting in an inconsistency after MAC address changes of only newly created VLANs having an up-to-date MAC. The VLANs should continue inheriting the MAC address of the physical interface until the VLAN MAC address is explicitly set to any value. This allows IPv6 EUI64 addresses for the VLAN to reflect any changes to the MAC of the physical interface and thus for DAD to behave as expected. Signed-off-by: Mike Manning --- net/8021q/vlan.c |7 +++ net/8021q/vlan_dev.c | 14 ++ 2 files changed, 17 insertions(+), 4 deletions(-) --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -291,6 +291,12 @@ static void vlan_sync_address(struct net if (ether_addr_equal(vlan->real_dev_addr, dev->dev_addr)) return; + /* vlan continues to inherit address of parent interface */ + if (vlandev->addr_assign_type == NET_ADDR_STOLEN) { + ether_addr_copy(vlandev->dev_addr, dev->dev_addr); + goto out; + } + /* vlan address was different from the old address and is equal to * the new address */ if (!ether_addr_equal(vlandev->dev_addr, vlan->real_dev_addr) && @@ -303,6 +309,7 @@ static void vlan_sync_address(struct net !ether_addr_equal(vlandev->dev_addr, dev->dev_addr)) dev_uc_add(dev, vlandev->dev_addr); +out: ether_addr_copy(vlan->real_dev_addr, dev->dev_addr); } --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -255,9 +255,13 @@ static int vlan_dev_open(struct net_devi return -ENETDOWN; if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr)) { - err = dev_uc_add(real_dev, dev->dev_addr); - if (err < 0) - goto out; + if (dev->addr_assign_type == NET_ADDR_STOLEN) { + ether_addr_copy(dev->dev_addr, real_dev->dev_addr); + } else { + err = dev_uc_add(real_dev, dev->dev_addr); + if (err < 0) + goto out; + } } if (dev->flags & IFF_ALLMULTI) { @@ -558,8 +562,10 @@ static int vlan_dev_init(struct net_devi /* ipv6 shared card related stuff */ dev->dev_id = real_dev->dev_id; - if (is_zero_ether_addr(dev->dev_addr)) + if (is_zero_ether_addr(dev->dev_addr)) { eth_hw_addr_inherit(dev, real_dev); + dev->addr_assign_type = NET_ADDR_STOLEN; + } if (is_zero_ether_addr(dev->broadcast)) memcpy(dev->broadcast, real_dev->broadcast, dev->addr_len); -- 1.7.10.4
Re: [PATCH net v3] vlan: Propagate MAC address to VLANs
On 05/06/2016 08:48 PM, Alexander Duyck wrote: > On Fri, May 6, 2016 at 12:36 PM, Mike Manning wrote: >> On 05/06/2016 06:02 PM, Alexander Duyck wrote: >>> On Fri, May 6, 2016 at 6:26 AM, Mike Manning wrote: The MAC address of the physical interface is only copied to the VLAN when it is first created, resulting in an inconsistency after MAC address changes of only newly created VLANs having an up-to-date MAC. The VLANs should continue inheriting the MAC address of the physical interface, unless explicitly changed to be different from this. This allows IPv6 EUI64 addresses for the VLAN to reflect any changes to the MAC of the physical interface and thus for DAD to behave as expected. Signed-off-by: Mike Manning --- include/linux/if_vlan.h |2 ++ net/8021q/vlan.c| 17 +++-- net/8021q/vlan_dev.c| 13 ++--- 3 files changed, 23 insertions(+), 9 deletions(-) --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -138,6 +138,7 @@ struct netpoll; * @flags: device flags * @real_dev: underlying netdevice * @real_dev_addr: address of underlying netdevice + * @addr_assign_type: address assignment type * @dent: proc dir entry * @vlan_pcpu_stats: ptr to percpu rx stats */ @@ -153,6 +154,7 @@ struct vlan_dev_priv { struct net_device *real_dev; unsigned char real_dev_addr[ETH_ALEN]; + unsigned char addr_assign_type; struct proc_dir_entry *dent; struct vlan_pcpu_stats __percpu *vlan_pcpu_stats; >>> >>> Please don't start adding new members to structures when it already >>> exists in the net_device. If anything you should be able to drop >>> read_dev_addr if you do this correctly because you shouldn't need to >>> clone the lower dev address to watch for changes. All you will need >>> to do is watch NET_ADDR_STOLEN. >>> >> >> Thanks for the detailed review. I had initially used the existing type >> in net_device, but the problem with this was that it got overwritten to >> NET_ADDR_SET in dev_set_mac_address(), which I was reluctant to modify. >> It would just be a case of setting the type earlier in that function >> (and caching the previous value in case there is an error). >> >> However, based on your later comment, it seems I should not bother with >> the approach I have here, namely that if the VLAN MAC is set to the same >> value as that of the lower device MAC, that is to be considered as >> resetting it and thus for MAC inheritance to resume. Instead, I will just >> make this a 1-shot transition, i.e. the VLAN MAC starts off as inherited, >> and if it is set to anything (even the value of the lower device MAC), >> inheritance is stopped. I agree this makes for a far simpler changeset. >> >> I don't think I can remove real_dev_addr, as that is still needed for >> the existing functionality in vlan_sync_address() to determine if the sync >> should be done, also as a way of caching it for handling in vlan_dev_open(). > > The thing is that logic isn't really needed anymore though if you are > going to be following the lower dev. If you follow the code what it > is doing is adding the address via dev_uc_add if the lower address > moves away from the VLAN address. With your changes you are updating > the VLAN MAC address to the lower value in the NET_ADDR_STOLEN case so > you don't need to add or remove an extra unicast address. If the user > sets the MAC address you can then use the vlandev->dev_addr as the > address you add/remove from the unicast list and you probably don't > need to bother with tracking the lower device state anyway. > I agree that this logic is not needed at all for the NET_ADDR_STOLEN case. However, once the VLAN MAC has been explicitly set, the situation reverts to the existing functionality, whereby real_dev_addr is used to ensure that dev_uc_add/del are not incorrectly called multiple times for the same MACs. As an example, if the lower device MAC is different from the VLAN MAC and is repeatedly set to the same value, then without this check on real_dev_addr, the refcnt for the VLAN MAC would keep getting incremented. The checks on the lower device MAC need to remain in place so as to call dev_uc_add/del if this is changed to be different/same as the VLAN MAC. For this reason, I have left this logic unchanged. Also, to ensure that the value of real_dev_addr is correct on the transition to NET_ADDR_SET, I keep this up-to-date (the alternative would be to add some code in vlan_dev_set_mac_address() to copy dev_addr to real_dev_addr if the type on entry is NET_ADDR_STOLEN). >> As a matter of interest, what is the advantage of not updating the VLAN >> MAC when it is down? I appreciate that one should
Re: A couple of questions about the SKB fragments
2016-05-05 14:13 GMT+03:00 Edward Cree : > On 05/05/16 08:40, Ilya Matveychikov wrote: > > Is there any docs except the kernel sources itself to refer to? > davem has some docs up at http://vger.kernel.org/~davem/skb.html and > http://vger.kernel.org/~davem/skb_data.html > In particular note the following: > "The frag_list is used to maintain a chain of SKBs organized for > fragmentation purposes, it is _not_ used for maintaining paged data." > So my reading would suggest there is no way to multiple-layer-fragment > an SKB; the frags are page pointers and offsets, not entire sk_buff > structs in their own right. > Seems that the docs is slightly outdated. I think the structure of SKB does not impose any restrictions on the nesting of the fragments. But is there any of them in the kernel's code?
Re: [PATCH net] netfilter: nf_conntrack: Use net_mutex for helper unregistration.
Joe Stringer wrote: > > If so, probably I can append this as comment to this function so we > > don't forget. If we ever have .exit callbacks (I don't expect so), we > > would need to wait for worker completion. > > Sounds reasonable to me. > > I see there's a bunch of other unregister locations like > nf_nat_l3proto_clean(), nf_nat_l4proto_clean(), nf_unregister_hook() > which might need similar treatment? I think they are fine, hook entries are duplicated per netns so we should not access data in a removed module. However, we might be able to trigger the WARN(1, "nf_unregister_net_hook: hook not found!\n"); part in nf_unregister_net_hook(): [ destroy netns -> destruction queued -> rmmod -> all hooks are destroyed -> netns workq runs -> nf_unregister_net_hook gets called -> hook already gone ] For nf_nat_l3|4proto_clean I don't see a problem either, if netns is gone all these conntracks will be zapped once the workqueue runs, even if the iteration in those function did not see the netns anymore. Cheers, Florian
Re: [PATCH net-next 2/2] net: original ingress device index in PKTINFO
Hi David, On Fri, 6 May 2016 18:49:41 -0700 David Ahern wrote: > Applications such as OSPF and BFD need the original ingress device not > the VRF device; Would you consider this true for any IP_PKTINFO users in VRF setups? > @@ -1193,7 +1193,12 @@ void ipv4_pktinfo_prepare(const struct sock *sk, > struct sk_buff *skb) > ipv6_sk_rxinfo(sk); > > if (prepare && skb_rtable(skb)) { > - pktinfo->ipi_ifindex = inet_iif(skb); > + /* skb->cb is overloaded: prior to this point it is IP{6}CB > + * which has interface index (iif) as the first member of the > + * underlying inet{6}_skb_parm struct. This code then overlays > + * PKTINFO_SKB_CB and in_pktinfo also has iif as the first > + * element so the iif is picked up from the prior IPCB > + */ Better if there was a guarantee in the code that inet_skb_parm layout stays that way. Or instead just explicitly assign the iif. Regards, Shmulik
Re: rtk8168 driver help needed
Murali Karicheri : [...] > I am trying to integrate the rtl8168 PCIe card to have Ethernet functional > on my Keystone EVM. Which (EVM) one ? > I purchased the rtl8111c Gib card from Amazon. The Card is detected > by the RC and I can see it is enumerated and show up when doing lspci command. What does "the RC" mean ? A different PC ? > However I can't get the Ethernet port functional. > Does this need MSI interrupt ? No. > I can't see it has requested any. Yes, something went really, really wrong. See below. [...] > [2.303965] PCI host bridge /soc/pcie@2180 ranges: > [2.309108] No bus range found for /soc/pcie@2180, using [bus 00-ff] > [2.316269]IO 0x2325..0x23253fff -> 0x > [2.321499] MEM 0x5000..0x5fff -> 0x5000 > [2.331666] keystone-pcie 21801000.pcie: PCI host bridge to bus :00 > [2.338283] pci_bus :00: root bus resource [bus 00-ff] > [2.343937] pci_bus :00: root bus resource [io 0x-0x3fff] > [2.350114] pci_bus :00: root bus resource [mem 0x5000-0x5fff] > [2.357095] pci :00:00.0: [104c:b00b] type 01 class 0x060400 > [2.357665] PCI: bus0: Fast back to back transfers disabled > [2.363717] pci :01:00.0: [10ec:8168] type 00 class 0x02 > [2.363809] pci :01:00.0: reg 0x10: [io 0x-0x00ff] > [2.363867] pci :01:00.0: reg 0x18: [mem 0x-0x0fff 64bit] > [2.363909] pci :01:00.0: reg 0x20: [mem 0x-0x 64bit > pref] > [2.363939] pci :01:00.0: reg 0x30: [mem 0x-0x0001 pref] > [2.364099] pci :01:00.0: supports D1 D2 > [2.364116] pci :01:00.0: PME# supported from D0 D1 D2 D3hot D3cold > [2.381251] PCI: bus1: Fast back to back transfers disabled > [2.386989] pci :00:00.0: BAR 8: assigned [mem 0x5000-0x500f] > [2.393937] pci :00:00.0: BAR 9: assigned [mem 0x5010-0x501f > pref] > [2.401221] pci :00:00.0: BAR 7: assigned [io 0x1000-0x1fff] > [2.407320] pci :01:00.0: BAR 6: assigned [mem 0x5010-0x5011 > pref] > [2.414597] pci :01:00.0: BAR 4: assigned [mem 0x5012-0x5012 > 64bit pref] > [2.422380] pci :01:00.0: BAR 2: assigned [mem 0x5000-0x5fff > 64bit] > [2.429702] pci :01:00.0: BAR 0: assigned [io 0x1000-0x10ff] > [2.435821] pci :00:00.0: PCI bridge to [bus 01] > [2.440783] pci :00:00.0: bridge window [io 0x1000-0x1fff] > [2.446896] pci :00:00.0: bridge window [mem 0x5000-0x500f] > [2.453699] pci :00:00.0: bridge window [mem 0x5010-0x501f > pref] > [2.461453] pcieport :00:00.0: Signaling PME through PCIe PME interrupt > [2.468411] pci :01:00.0: Signaling PME through PCIe PME interrupt > [2.475075] pcie_pme :00:00.0:pcie01: service driver pcie_pme loaded > [2.475392] aer :00:00.0:pcie02: service driver aer loaded > [2.475652] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded > [2.481419] r8169 :01:00.0: enabling device (0140 -> 0143) > [2.488865] r8169 :01:00.0 eth0: RTL8169 at 0xf0d6a000, > 00:00:00:00:00:00, XID IRQ 286 No need to go further, there is a serious problem here. Most of the XID bits are read from a mapped register. They're definitely not expected to be null as the driver requires it to identify a proper chipset (the common PCI configuration registers only tell that you aren't dealing with a coffee machine). Any read returning zeroes would not surprize me. [...] > Can someone help me figure out what is missing ? Hardly at this point. I can only suggest to switch power off, plug the 8168 device in a x86 (32 or 64 bits) system and check how it behaves. -- Ueimor
Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6
Hi David, On Fri, 6 May 2016 18:49:40 -0700 David Ahern wrote: > +static bool ipv6_ndisc_frame(const struct sk_buff *skb) > +{ > + const struct ipv6hdr *ipv6h = (struct ipv6hdr *)skb->data; > + size_t hlen = sizeof(*ipv6h); > + bool rc = false; > + > + if (ipv6h->nexthdr == NEXTHDR_ICMP) { > + const struct icmp6hdr *icmph; > + > + if (skb->len < hlen + sizeof(*icmph)) > + goto out; > + > + icmph = (struct icmp6hdr *)(skb->data + sizeof(*ipv6h)); > + switch (icmph->icmp6_type) { Don't we need an additional pskb_may_pull here? If I get it right, 'ipv6_rcv' only assures sizeof(ipv6hdr) to be in the linear header (unless it's a NEXTHDR_HOP, which is not the case here). > +static inline > +struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto) > +{ > + struct net_device *master = NULL; > + > + if (netif_is_l3_slave(skb->dev)) > + master = netdev_master_upper_dev_get_rcu(skb->dev); > + > + else if (netif_is_l3_master(skb->dev)) > + master = skb->dev; > + > + if (master && master->l3mdev_ops->l3mdev_l3_rcv) > + skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto); In the case where netif_is_l3_master(skb->dev) is true, can you explain why we need to pass it through the l3mdev_l3_rcv callback again? Regards, Shmulik
Re: [PATCH RFC 1/5] net: phy: sun8i-h3-ephy: Add bindings for Allwinner H3 Ethernet PHY
Hi, On 07-05-16 07:30, Chen-Yu Tsai wrote: Hi, On Tue, Apr 12, 2016 at 9:38 AM, Chen-Yu Tsai wrote: On Tue, Apr 12, 2016 at 3:23 AM, Florian Fainelli wrote: On 04/04/16 09:22, Chen-Yu Tsai wrote: The Allwinner H3 SoC incorporates an Ethernet PHY. This is enabled and configured through a memory mapped hardware register. This same register also configures the MAC interface mode and TX clock source. Also covered by the register, but not supported in these bindings, are TX/RX clock delay chains and inverters, and an RMII module. Signed-off-by: Chen-Yu Tsai --- .../bindings/net/allwinner,sun8i-h3-ephy.txt | 44 ++ 1 file changed, 44 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt diff --git a/Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt b/Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt new file mode 100644 index ..146f227e6d88 --- /dev/null +++ b/Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt @@ -0,0 +1,44 @@ +* Allwinner H3 E(thernet) PHY + +The Allwinner H3 integrates an MII ethernet PHY. As with external PHYs, +before it can be configured over the MDIO bus and used, certain hardware +features must be configured, such as the PHY address and LED polarity. Is the internal PHY address really configurable? Not that there is anything wrong with it, this is good. It is. Things that are configured or provided to a discrete PHY are routed to registers in the SoC, things such as PHY address, clocks, resets. +The PHY must also be powered on and brought out of reset. + +This is accomplished with regulators and pull-up/downs for external PHYs. +For this internal PHY, a hardware register is programmed. + +The same hardware register also contains clock and interface controls +for the MAC. This is also present in earlier SoCs, and is covered by +"allwinner,sun7i-a20-gmac-clk". The controls in the H3 are slightly +different due to the inclusion of what appears to be an RMII-MII +bridge. + +Required properties: +- compatible: should be "allwinner,sun8i-h3-ephy" +- reg: address and length of the register set for the device +- clocks: A phandle to the reference clock for this device +- resets: A phandle to the reset control for this device + +Ethernet PHY related properties: +- allwinner,ephy-addr: the MDIO bus address the PHY should respond to. +If this is not set, the external PHY is used, and +everything else in this section is ignored. So we are going to put the same value here, and in the actual Ethernet PHY device tree node that should be hanging off the EMAC/MDIO bus controller, this is confusing and error prone. Yes, that would be an issue when writing the DTS. +- allwinner,leds-active-low: LEDs are active low. Without this, LEDs are + active high. + +Ethernet MAC clock related properties: +- #clock-cells: should be 0 +- clock-output-names: "mac_tx" + +Example: + +ethernet-phy@01c00030 { + compatible = "allwinner,sun8i-h3-ephy"; + reg = <0x01c00030 0x4>; Looking at this register space this looks more like an internal PHY SHIM that is needed to be configured before the internal PHY can be access, not a proper Ethernet PHY per-se, see replies in aptch 2. Should not this block be a second cell associated with the Ethernet MAC block? One or the other are not going to be very useful without knowledge of each other. True. However the lower half of the same register also controls the MAC interface mode and TX clock source and delays. This we had a clock driver that was used in conjuction with stmmac on earlier SoCs. I was hoping to keep that model with the newer EMAC. At the time it was argued that what seemed like a clock should be handled by a clock driver, instead of just a "syscon". If this is reaching too far to handle this new use case, I will happily just provide patches to merge this into the MAC. Maxime, Hans, any thoughts? It seems like it'd be easier to just fold this into the EMAC driver. The register is not part of the clock controller in these new SoCs, so it's nicer than what we had in A20/A31. It's also not just a clock control, but a bunch of various controls. I don't know the emac stuff well enough to really have an opinion, but with that said I've no objections against just folding this into the driver. If you believe that is best, go for it. Regards, Hans
Re: [PATCH net] macvtap: segmented packet is consumed
Hi, On Fri, 06 May 2016 05:58:21 -0700 Eric Dumazet wrote: > From: Eric Dumazet > > If GSO packet is segmented and its segments are properly queued, > we call consume_skb() instead of kfree_skb() to be drop monitor > friendly. > > Fixes: 3e4f8b7873709 ("macvtap: Perform GSO on forwarding path.") > Signed-off-by: Eric Dumazet > Cc: Vlad Yasevich Reviewed-by: Shmulik Ladkani