Re: stmmac/RTL8211F/Meson GXBB: TX throughput problems
Hello Martin

On 11/7/2016 6:37 PM, Martin Blumenstingl wrote:
Hi Peppe,

On Mon, Nov 7, 2016 at 11:59 AM, Giuseppe CAVALLARO wrote:
In the meantime, I will read again the thread just to see if there is something I am missing.

if you are re-reading this thread: please note that there are two devices in discussion here!

many thx for the sum :-)

Both are using the Amlogic S905 (GXBB) SoC and both are experiencing the same issue (Gbit TX issues, RX with Gbit speeds and RX/TX with 100Mbit speed are NOT affected):
- Odroid-C2 (used by Jerome and André Roth)
- Tronsmart Vega S95 Meta (my device)

The (Gbit TX) problem seems to be gone on the Odroid-C2 with Jerome's patch which disables EEE in drivers/net/phy/realtek.c (at least in his tests, I don't have that device so I can't verify). The same problem still appears on my Tronsmart Vega S95 Meta even with the patched PHY driver.

just a doubt, maybe useful: in the past, on a Gigabit setup, I saw similar problems and they were due to retiming, so maybe 2ns could be necessary (or better granularity via PAD logic if available).

Regards
Peppe

Unfortunately I don't have a second device to rule out that my Tronsmart Vega S95 Meta could be broken (not unlikely, I get DDR errors from time to time in u-boot). Maybe Andreas Faerber can test ethernet with and without Jerome's patch on one of his Tronsmart devices.

Regards,
Martin
Re: [PATCH net] net: stmmac: Fix lack of link transition for fixed PHYs
On 11/14/2016 2:50 AM, Florian Fainelli wrote:

Commit 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch is attached") added some logic to avoid polling the fixed PHY and therefore invoking the adjust_link callback more than once, since this is a fixed PHY and link events won't be generated.

This works fine the first time, because we start with phydev->irq = PHY_POLL, so we call adjust_link, then we set phydev->irq = PHY_IGNORE_INTERRUPT and we stop polling the PHY.

Now, if we called ndo_close(), which calls both phy_stop() and does an explicit netif_carrier_off(), we end up with a link down. Upon calling ndo_open() again, despite starting the PHY state machine, we have PHY_IGNORE_INTERRUPT set, and we generate no link event at all, so the link is permanently down.

52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch is attached")
Signed-off-by: Florian Fainelli
---
Alexandre, Peppe,

The original patch is already a hack, but since this is a bugfix, I took the same approach that you did here to backport this to -stable kernels.

Acked-by: Giuseppe Cavallaro

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 10909c9c0033..03dbf8e89c4c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -882,6 +882,13 @@ static int stmmac_init_phy(struct net_device *dev)
 		return -ENODEV;
 	}
 
+	/* stmmac_adjust_link will change this to PHY_IGNORE_INTERRUPT to avoid
+	 * subsequent PHY polling, make sure we force a link transition if
+	 * we have a UP/DOWN/UP transition
+	 */
+	if (phydev->is_pseudo_fixed_link)
+		phydev->irq = PHY_POLL;
+
 	pr_debug("stmmac_init_phy: %s: attached to PHY (UID 0x%x)"
 		 " Link = %d\n", dev->name, phydev->phy_id, phydev->link);
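For readers following the open/close/open sequence in the commit message, here is a toy userspace model of it. It is only a sketch and not the stmmac driver: every toy_* name and TOY_* constant is a hypothetical stand-in (for phydev->irq, PHY_POLL, PHY_IGNORE_INTERRUPT, ndo_open() and ndo_close()), and the real PHY state machine is reduced to "call adjust_link only while polling".

```c
#include <stdbool.h>

/* Hypothetical stand-ins for PHY_POLL and PHY_IGNORE_INTERRUPT. */
enum toy_irq_mode { TOY_PHY_POLL, TOY_PHY_IGNORE_INTERRUPT };

struct toy_phy {
	enum toy_irq_mode irq;
	bool is_pseudo_fixed_link;
};

struct toy_netdev {
	struct toy_phy phy;
	bool carrier;   /* what the stack believes about the link */
	bool apply_fix; /* true: force polling again on every open */
};

/* Models adjust_link for a fixed PHY: report link up once, then stop polling. */
void toy_adjust_link(struct toy_netdev *dev)
{
	dev->carrier = true;
	if (dev->phy.is_pseudo_fixed_link)
		dev->phy.irq = TOY_PHY_IGNORE_INTERRUPT;
}

/* Models ndo_open(): adjust_link is only reached while the PHY is polled. */
void toy_open(struct toy_netdev *dev)
{
	if (dev->apply_fix && dev->phy.is_pseudo_fixed_link)
		dev->phy.irq = TOY_PHY_POLL;
	if (dev->phy.irq == TOY_PHY_POLL)
		toy_adjust_link(dev);
}

/* Models ndo_close(): stop the PHY and force the carrier off. */
void toy_close(struct toy_netdev *dev)
{
	dev->carrier = false;
}

/* Runs open, close, open and returns the carrier state after the second open. */
bool toy_link_after_reopen(bool apply_fix)
{
	struct toy_netdev dev = {
		.phy = { .irq = TOY_PHY_POLL, .is_pseudo_fixed_link = true },
		.carrier = false,
		.apply_fix = apply_fix,
	};

	toy_open(&dev);
	toy_close(&dev);
	toy_open(&dev);
	return dev.carrier;
}
```

In this model the link stays down after the second open unless the irq mode is reset first, which is the behaviour the patch is meant to correct.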
RE: [PATCH net 2/2] r8152: rx descriptor check
Mark Lord [mailto:ml...@pobox.com]
> Sent: Monday, November 14, 2016 4:34 AM
[...]
> Perhaps the driver
> is somehow accessing the buffer space again after doing usb_submit_urb()?
> That would certainly produce this kind of behaviour.

I don't think so. First, the driver only reads the received buffer. That is, the driver does not change (or write) the data. Second, the driver loses the pointer to the received buffer after submitting the urb to the USB host controller, until the transfer is completed by the USB host controller. That is, the driver doesn't know how to access the buffer after calling usb_submit_urb().

Best Regards,
Hayes
RE: [PATCH net 2/2] r8152: rx descriptor check
David Miller [mailto:da...@davemloft.net]
> Sent: Monday, November 14, 2016 1:40 AM
[...]
> If you add this patch now, there is a much smaller likelihood that you
> will work with a high priority to figure out _why_ this is happening.
>
> For all we know this could be a platform bug in the DMA API for the
> systems in question.
>
> It could also be a bug elsewhere in the driver, either in setting up
> the descriptor DMA mappings or how the chip is programmed.
>
> Either way the true cause must be found before we start throwing
> changes like this into the driver.

Our hw engineer could check our device, and I could check the driver. However, for the other parts, such as the USB host controller or memory, it is difficult for me to verify whether they are correct. I can only promise that our devices and driver work fine.

Best Regards,
Hayes
Re: Long delays creating a netns after deleting one (possibly RCU related)
On Fri, Nov 11, 2016 at 4:55 PM, Cong Wang wrote:
> On Fri, Nov 11, 2016 at 4:23 PM, Paul E. McKenney
> wrote:
>>
>> Ah! This net_mutex is different than RTNL. Should synchronize_net() be
>> modified to check for net_mutex being held in addition to the current
>> checks for RTNL being held?
>>
>
> Good point!
>
> Like commit be3fc413da9eb17cce0991f214ab0, checking
> for net_mutex for this case seems to be an optimization, I assume
> synchronize_rcu_expedited() and synchronize_rcu() have the same
> behavior...

Thinking a bit more, I think commit be3fc413da9eb17cce0991f gets rtnl_is_locked() wrong: the lock could be held by another process, not by the current one, so it should be lockdep_rtnl_is_held(), which, however, is defined only when LOCKDEP is enabled... Sigh.

I don't see any better way than letting callers decide if they want the expedited version or not, but this requires changes of all callers of synchronize_net(). Hm.
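The distinction between "somebody holds the lock" and "the current task holds the lock" can be shown with a small single-threaded toy. This is a sketch under stated assumptions, not kernel code: the toy_* names are hypothetical, task ids are plain integers, and toy_is_locked() / toy_is_held_by() only mimic what the message says about rtnl_is_locked() and lockdep_rtnl_is_held().

```c
#include <stdbool.h>

#define TOY_NO_OWNER (-1)

/* Toy lock that remembers which task took it; task ids are plain ints. */
struct toy_lock {
	int owner;
};

void toy_lock_init(struct toy_lock *l)
{
	l->owner = TOY_NO_OWNER;
}

bool toy_trylock(struct toy_lock *l, int task)
{
	if (l->owner != TOY_NO_OWNER)
		return false;
	l->owner = task;
	return true;
}

void toy_unlock(struct toy_lock *l)
{
	l->owner = TOY_NO_OWNER;
}

/* "Is locked" style check: true if any task holds the lock. */
bool toy_is_locked(const struct toy_lock *l)
{
	return l->owner != TOY_NO_OWNER;
}

/* "Is held" style check: true only if the asking task holds the lock. */
bool toy_is_held_by(const struct toy_lock *l, int task)
{
	return l->owner == task;
}

/*
 * Scenario: task 1 holds the lock while task 2, which holds nothing,
 * asks whether it should take the expedited path.
 */
bool toy_bystander_gets_expedited(bool strict)
{
	struct toy_lock l;
	bool expedited;

	toy_lock_init(&l);
	toy_trylock(&l, 1);
	expedited = strict ? toy_is_held_by(&l, 2) : toy_is_locked(&l);
	toy_unlock(&l);
	return expedited;
}
```

With the "is locked" check the bystander task is sent down the expedited path although it holds nothing, which is the concern raised above.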
RE: [PATCH net 2/2] r8152: rx descriptor check
Francois Romieu [mailto:rom...@fr.zoreil.com]
> Sent: Friday, November 11, 2016 8:13 PM
[...]
> Invalid packet size corrupted receive descriptors in Realtek's device
> reminds of CVE-2009-4537.

Do you mean that the driver would get a packet exceeding the size set in RxMaxSize? I checked with our hw engineers. They don't see any issue with RxMaxSize, and their test of the RxMaxSize register is fine.

> Is the silicium of both devices different enough to prevent the same
> exploit to happen ?

For this case, I don't think the device provides an invalid value for the receive descriptors. However, the driver sees a different value. That is why I say the memory cannot be trusted.

Best Regards,
Hayes
Re: [PATCH] genetlink: fix unsigned int comparison with less than zero
On Sun, Nov 13, 2016 at 9:15 AM, David Miller wrote:
> I've committed the following to net-next:
>
> [PATCH] genetlink: Make family a signed integer.
>
> The idr_alloc(), idr_remove(), et al. routines all expect IDs to be
> signed integers. Therefore make the genl_family member 'id' signed
> too.

This is exactly what I replied to Johannes. Thanks for the fix!
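Why an unsigned id hides allocation errors can be seen in a few lines of plain C. This is an illustrative sketch only: toy_id_alloc() is a hypothetical allocator that returns a negative errno-style value on failure, and the widening to long long merely makes visible the value that a direct "id < 0" test on an unsigned variable would be looking at (compilers typically flag that direct comparison as always false).

```c
#include <stdbool.h>

/* Hypothetical allocator: a small positive id, or a negative error value. */
int toy_id_alloc(bool fail)
{
	return fail ? -28 /* stand-in for a negative errno */ : 16;
}

/* Unsigned storage: the negative error wraps to a large positive number. */
bool toy_error_visible_unsigned(bool fail)
{
	unsigned int id = (unsigned int)toy_id_alloc(fail);
	long long seen = id; /* the value a "< 0" test would actually see */

	return seen < 0; /* never true */
}

/* Signed storage: the error value survives and can be tested. */
bool toy_error_visible_signed(bool fail)
{
	int id = toy_id_alloc(fail);

	return id < 0;
}
```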
Re: [LKP] [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!
On 11/14, Fengguang Wu wrote:
>>>Hi guys.
>>>
>>>I took a look at the commit again and I do not see how this can happen.
>>>
>>>Are you sure patch was properly applied ?
>>>
>>>In particular, the following extract is obscure for me :
>>>
>>> https://github.com/0day-ci/linux Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
>>> commit 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb ("net: __skb_flow_dissect() must cap its return value")
>>
>>Hi,
>>
>>The above two lines means 0day repo setup a new branch
>>"Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839"
>>which is based on net/master, then applied you patch on top of it,
>>commit id is 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb.
>
>Xiaolong, it may be more helpful to show the base tree where we apply
>the patch to. And the final url:
>
>https://github.com/0day-ci/linux/tree/Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
>

Ok, I'll improve the appearance to make it more clear.

Thanks,
Xiaolong

>Thanks,
>Fengguang
[PATCH net-next v3 2/7] vxlan: avoid checking socket multiple times.
Check the vxlan socket in vxlan6_getroute(). Signed-off-by: Pravin B Shelar--- drivers/net/vxlan.c | 12 +--- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 756d826..9adeff9 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1830,6 +1830,7 @@ static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan, #if IS_ENABLED(CONFIG_IPV6) static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan, + struct vxlan_sock *sock6, struct sk_buff *skb, int oif, u8 tos, __be32 label, const struct in6_addr *daddr, @@ -1837,7 +1838,6 @@ static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan, struct dst_cache *dst_cache, const struct ip_tunnel_info *info) { - struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock); bool use_cache = ip_tunnel_dst_cache_usable(skb, info); struct dst_entry *ndst; struct flowi6 fl6; @@ -2069,11 +2069,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, struct dst_entry *ndst; u32 rt6i_flags; - if (!sock6) - goto drop; - sk = sock6->sock->sk; - - ndst = vxlan6_get_route(vxlan, skb, + ndst = vxlan6_get_route(vxlan, sock6, skb, rdst ? rdst->remote_ifindex : 0, tos, label, >sin6.sin6_addr, >sin6.sin6_addr, @@ -2093,6 +2089,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, goto tx_error; } + sk = sock6->sock->sk; /* Bypass encapsulation if the destination is local */ rt6i_flags = ((struct rt6_info *)ndst)->rt6i_flags; if (!info && rt6i_flags & RTF_LOCAL && @@ -2432,9 +2429,10 @@ static int vxlan_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb) ip_rt_put(rt); } else { #if IS_ENABLED(CONFIG_IPV6) + struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock); struct dst_entry *ndst; - ndst = vxlan6_get_route(vxlan, skb, 0, info->key.tos, + ndst = vxlan6_get_route(vxlan, sock6, skb, 0, info->key.tos, info->key.label, >key.u.ipv6.dst, >key.u.ipv6.src, NULL, info); if (IS_ERR(ndst)) -- 1.9.1
[PATCH net-next v3 3/7] vxlan: simplify exception handling
vxlan egress path error handling has became complicated, it need to handle IPv4 and IPv6 tunnel cases. Earlier patch removes vlan handling from vxlan_build_skb(), so vxlan_build_skb does not need to free skb and we can simplify the xmit path by having single error handling for both type of tunnels. Signed-off-by: Pravin B Shelar--- drivers/net/vxlan.c | 46 +++--- 1 file changed, 19 insertions(+), 27 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 9adeff9..8bb58f6 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1753,11 +1753,11 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst, /* Need space for new headers (invalidates iph ptr) */ err = skb_cow_head(skb, min_headroom); if (unlikely(err)) - goto out_free; + return err; err = iptunnel_handle_offloads(skb, type); if (err) - goto out_free; + return err; vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh)); vxh->vx_flags = VXLAN_HF_VNI; @@ -1781,16 +1781,12 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst, if (vxflags & VXLAN_F_GPE) { err = vxlan_build_gpe_hdr(vxh, vxflags, skb->protocol); if (err < 0) - goto out_free; + return err; inner_protocol = skb->protocol; } skb_set_inner_protocol(skb, inner_protocol); return 0; - -out_free: - kfree_skb(skb); - return err; } static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan, @@ -1927,13 +1923,13 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, struct ip_tunnel_info *info; struct vxlan_dev *vxlan = netdev_priv(dev); struct sock *sk; - struct rtable *rt = NULL; const struct iphdr *old_iph; union vxlan_addr *dst; union vxlan_addr remote_ip, local_ip; union vxlan_addr *src; struct vxlan_metadata _md; struct vxlan_metadata *md = &_md; + struct dst_entry *ndst = NULL; __be16 src_port = 0, dst_port; __be32 vni, label; __be16 df = 0; @@ -2009,6 +2005,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, if (dst->sa.sa_family == AF_INET) 
{ struct vxlan_sock *sock4 = rcu_dereference(vxlan->vn4_sock); + struct rtable *rt; if (!sock4) goto drop; @@ -2030,7 +2027,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, netdev_dbg(dev, "circular route to %pI4\n", >sin.sin_addr.s_addr); dev->stats.collisions++; - goto rt_tx_error; + ip_rt_put(rt); + goto tx_error; } /* Bypass encapsulation if the destination is local */ @@ -2053,12 +2051,13 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) df = htons(IP_DF); + ndst = >dst; tos = ip_tunnel_ecn_encap(tos, old_iph, skb); ttl = ttl ? : ip4_dst_hoplimit(>dst); - err = vxlan_build_skb(skb, >dst, sizeof(struct iphdr), + err = vxlan_build_skb(skb, ndst, sizeof(struct iphdr), vni, md, flags, udp_sum); if (err < 0) - goto xmit_tx_error; + goto tx_error; udp_tunnel_xmit_skb(rt, sk, skb, src->sin.sin_addr.s_addr, dst->sin.sin_addr.s_addr, tos, ttl, df, @@ -2066,7 +2065,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, #if IS_ENABLED(CONFIG_IPV6) } else { struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock); - struct dst_entry *ndst; u32 rt6i_flags; ndst = vxlan6_get_route(vxlan, sock6, skb, @@ -2078,13 +2076,13 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, netdev_dbg(dev, "no route to %pI6\n", >sin6.sin6_addr); dev->stats.tx_carrier_errors++; + ndst = NULL; goto tx_error; } if (ndst->dev == dev) { netdev_dbg(dev, "circular route to %pI6\n", >sin6.sin6_addr); - dst_release(ndst); dev->stats.collisions++; goto tx_error; } @@ -2096,12 +2094,12 @@ static void
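The ownership change described in this patch (the build helper stops freeing the buffer, the caller frees it at one label) follows a common C pattern. Below is a userspace sketch of that pattern only; the toy_* names, the headroom check and the -1 error value are all hypothetical and do not reproduce vxlan_build_skb() or vxlan_xmit_one().

```c
#include <stdbool.h>
#include <stdlib.h>

struct toy_skb {
	int headroom;
};

/* Counts frees so a caller can check that each buffer is freed exactly once. */
int toy_free_count;

void toy_kfree_skb(struct toy_skb *skb)
{
	free(skb);
	toy_free_count++;
}

/* The builder only reports failure; it no longer frees the buffer itself. */
int toy_build_skb(struct toy_skb *skb, int needed_headroom)
{
	if (skb->headroom < needed_headroom)
		return -1; /* hypothetical error value */
	skb->headroom -= needed_headroom;
	return 0;
}

/* The caller owns the buffer and releases it on every failure at one label. */
int toy_xmit_one(int headroom, int needed_headroom, bool have_route)
{
	struct toy_skb *skb = malloc(sizeof(*skb));
	int err = -1;

	if (!skb)
		return -1;
	skb->headroom = headroom;

	if (!have_route)
		goto tx_error;

	err = toy_build_skb(skb, needed_headroom);
	if (err < 0)
		goto tx_error;

	/* A real transmit path would hand the buffer on; the toy frees it here. */
	toy_kfree_skb(skb);
	return 0;

tx_error:
	toy_kfree_skb(skb);
	return err;
}

/* Helper for checking: number of frees performed by one transmit attempt. */
int toy_frees_during_xmit(int headroom, int needed_headroom, bool have_route)
{
	toy_free_count = 0;
	toy_xmit_one(headroom, needed_headroom, have_route);
	return toy_free_count;
}
```

Because only one function ever frees the buffer, a double free or a leak on a new error path is easier to rule out by inspection.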
[PATCH net-next v3 1/7] vxlan: avoid vlan processing in vxlan device.
VxLan device does not have special handling for vlan tagging on egress. Therefore it does not make sense to expose vlan offloading feature. This patch does not change vxlan functionality.

Signed-off-by: Pravin B Shelar
Acked-by: Jiri Benc
---
 drivers/net/vxlan.c     |  9 +
 include/linux/if_vlan.h | 16
 2 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cb5cc7c..756d826 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1748,18 +1748,13 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
 	}
 
 	min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
-			+ VXLAN_HLEN + iphdr_len
-			+ (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
+			+ VXLAN_HLEN + iphdr_len;
 
 	/* Need space for new headers (invalidates iph ptr) */
 	err = skb_cow_head(skb, min_headroom);
 	if (unlikely(err))
 		goto out_free;
 
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb))
-		return -ENOMEM;
-
 	err = iptunnel_handle_offloads(skb, type);
 	if (err)
 		goto out_free;
@@ -2527,10 +2522,8 @@ static void vxlan_setup(struct net_device *dev)
 	dev->features |= NETIF_F_GSO_SOFTWARE;
 
 	dev->vlan_features = dev->features;
-	dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 	dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
 	dev->hw_features |= NETIF_F_GSO_SOFTWARE;
-	dev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 	netif_keep_dst(dev);
 	dev->priv_flags |= IFF_NO_QUEUE;
 
diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 3319d97..8d5fcd6 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -399,22 +399,6 @@ static inline struct sk_buff *__vlan_hwaccel_push_inside(struct sk_buff *skb)
 		skb->vlan_tci = 0;
 	return skb;
 }
-/*
- * vlan_hwaccel_push_inside - pushes vlan tag to the payload
- * @skb: skbuff to tag
- *
- * Checks is tag is present in @skb->vlan_tci and if it is, it pushes the
- * VLAN tag from @skb->vlan_tci inside to the payload.
- *
- * Following the skb_unshare() example, in case of error, the calling function
- * doesn't have to worry about freeing the original skb.
- */
-static inline struct sk_buff *vlan_hwaccel_push_inside(struct sk_buff *skb)
-{
-	if (skb_vlan_tag_present(skb))
-		skb = __vlan_hwaccel_push_inside(skb);
-	return skb;
-}
 
 /**
  * __vlan_hwaccel_put_tag - hardware accelerated VLAN inserting
-- 
1.9.1
[PATCH net-next v3 7/7] vxlan: remove unsed vxlan_dev_dst_port()
Signed-off-by: Pravin B Shelar
---
 include/net/vxlan.h | 10 --
 1 file changed, 10 deletions(-)

diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 308adc4..49a5920 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -281,16 +281,6 @@ struct vxlan_dev {
 struct net_device *vxlan_dev_create(struct net *net, const char *name,
 				    u8 name_assign_type, struct vxlan_config *conf);
 
-static inline __be16 vxlan_dev_dst_port(struct vxlan_dev *vxlan,
-					unsigned short family)
-{
-#if IS_ENABLED(CONFIG_IPV6)
-	if (family == AF_INET6)
-		return inet_sk(vxlan->vn6_sock->sock->sk)->inet_sport;
-#endif
-	return inet_sk(vxlan->vn4_sock->sock->sk)->inet_sport;
-}
-
 static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 						     netdev_features_t features)
 {
-- 
1.9.1
[PATCH net-next v3 4/7] vxlan: improve vxlan route lookup checks.
Move route sanity check to respective vxlan[4/6]_get_route functions. This allows us to perform all sanity checks before caching the dst so that we can avoid these checks on subsequent packets. This give move accurate metadata information for packet from fill_metadata_dst(). Signed-off-by: Pravin B Shelar--- drivers/net/vxlan.c | 77 ++--- 1 file changed, 38 insertions(+), 39 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 8bb58f6..aabb918 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1789,7 +1789,8 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst, return 0; } -static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan, +static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan, struct net_device *dev, + struct vxlan_sock *sock4, struct sk_buff *skb, int oif, u8 tos, __be32 daddr, __be32 *saddr, struct dst_cache *dst_cache, @@ -1799,6 +1800,9 @@ static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan, struct rtable *rt = NULL; struct flowi4 fl4; + if (!sock4) + return ERR_PTR(-EIO); + if (tos && !info) use_cache = false; if (use_cache) { @@ -1816,16 +1820,26 @@ static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan, fl4.saddr = *saddr; rt = ip_route_output_key(vxlan->net, ); - if (!IS_ERR(rt)) { + if (likely(!IS_ERR(rt))) { + if (rt->dst.dev == dev) { + netdev_dbg(dev, "circular route to %pI4\n", ); + ip_rt_put(rt); + return ERR_PTR(-ELOOP); + } + *saddr = fl4.saddr; if (use_cache) dst_cache_set_ip4(dst_cache, >dst, fl4.saddr); + } else { + netdev_dbg(dev, "no route to %pI4\n", ); + return ERR_PTR(-ENETUNREACH); } return rt; } #if IS_ENABLED(CONFIG_IPV6) static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan, + struct net_device *dev, struct vxlan_sock *sock6, struct sk_buff *skb, int oif, u8 tos, __be32 label, @@ -1861,8 +1875,16 @@ static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan, err = ipv6_stub->ipv6_dst_lookup(vxlan->net, sock6->sock->sk, , ); - if 
(err < 0) - return ERR_PTR(err); + if (unlikely(err < 0)) { + netdev_dbg(dev, "no route to %pI6\n", daddr); + return ERR_PTR(-ENETUNREACH); + } + + if (unlikely(ndst->dev == dev)) { + netdev_dbg(dev, "circular route to %pI6\n", daddr); + dst_release(ndst); + return ERR_PTR(-ELOOP); + } *saddr = fl6.saddr; if (use_cache) @@ -1929,8 +1951,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, union vxlan_addr *src; struct vxlan_metadata _md; struct vxlan_metadata *md = &_md; - struct dst_entry *ndst = NULL; __be16 src_port = 0, dst_port; + struct dst_entry *ndst = NULL; __be32 vni, label; __be16 df = 0; __u8 tos, ttl; @@ -2007,29 +2029,14 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, struct vxlan_sock *sock4 = rcu_dereference(vxlan->vn4_sock); struct rtable *rt; - if (!sock4) - goto drop; - sk = sock4->sock->sk; - - rt = vxlan_get_route(vxlan, skb, + rt = vxlan_get_route(vxlan, dev, sock4, skb, rdst ? rdst->remote_ifindex : 0, tos, dst->sin.sin_addr.s_addr, >sin.sin_addr.s_addr, dst_cache, info); - if (IS_ERR(rt)) { - netdev_dbg(dev, "no route to %pI4\n", - >sin.sin_addr.s_addr); - dev->stats.tx_carrier_errors++; - goto tx_error; - } - - if (rt->dst.dev == dev) { - netdev_dbg(dev, "circular route to %pI4\n", - >sin.sin_addr.s_addr); - dev->stats.collisions++; - ip_rt_put(rt); + if (IS_ERR(rt)) goto tx_error; - } + sk = sock4->sock->sk; /*
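The route helpers in this patch report failures through the returned pointer (ERR_PTR(-EIO), ERR_PTR(-ELOOP), ERR_PTR(-ENETUNREACH)). For readers unfamiliar with that convention, here is a userspace approximation. It is a sketch, not the kernel's implementation: toy_err_ptr(), toy_ptr_err() and toy_is_err() are hypothetical re-creations of the idea, TOY_MAX_ERRNO is an assumed bound, and toy_get_route() only mirrors the order of checks described in the commit message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095 /* assumed upper bound for encoded error codes */

/* Encode a negative errno in a pointer value. */
void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno from an encoded pointer. */
long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers live in the last TOY_MAX_ERRNO values of the address space. */
bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_route {
	int out_ifindex;
};

/*
 * All sanity checks happen inside the lookup helper, so callers only need
 * a single toy_is_err() test: no socket, no route, or a route that loops
 * back to the tunnel device itself.
 */
struct toy_route *toy_get_route(bool have_sock, struct toy_route *found,
				int self_ifindex)
{
	if (!have_sock)
		return toy_err_ptr(-EIO);
	if (!found)
		return toy_err_ptr(-ENETUNREACH);
	if (found->out_ifindex == self_ifindex)
		return toy_err_ptr(-ELOOP);
	return found;
}

/* Convenience wrapper: 0 on success, negative errno otherwise. */
long toy_lookup_result(bool have_sock, bool have_route, int route_ifindex,
		       int self_ifindex)
{
	struct toy_route r = { .out_ifindex = route_ifindex };
	struct toy_route *rt;

	rt = toy_get_route(have_sock, have_route ? &r : NULL, self_ifindex);
	return toy_is_err(rt) ? toy_ptr_err(rt) : 0;
}
```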
[PATCH net-next v3 5/7] vxlan: simplify RTF_LOCAL handling.
Avoid code duplicate code for handling RTF_LOCAL routes. Signed-off-by: Pravin B Shelar--- drivers/net/vxlan.c | 85 - 1 file changed, 51 insertions(+), 34 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index aabb918..0b188d6 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1938,6 +1938,40 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan, } } +static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev, +struct vxlan_dev *vxlan, union vxlan_addr *daddr, +__be32 dst_port, __be32 vni, struct dst_entry *dst, +u32 rt_flags) +{ +#if IS_ENABLED(CONFIG_IPV6) + /* IPv6 rt-flags are checked against RTF_LOCAL, but the value of +* RTF_LOCAL is equal to RTCF_LOCAL. So to keep code simple +* we can use RTCF_LOCAL which works for ipv4 and ipv6 route entry. +*/ + BUILD_BUG_ON(RTCF_LOCAL != RTF_LOCAL); +#endif + /* Bypass encapsulation if the destination is local */ + if (rt_flags & RTCF_LOCAL && + !(rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) { + struct vxlan_dev *dst_vxlan; + + dst_release(dst); + dst_vxlan = vxlan_find_vni(vxlan->net, vni, + daddr->sa.sa_family, dst_port, + vxlan->flags); + if (!dst_vxlan) { + dev->stats.tx_errors++; + kfree_skb(skb); + + return -ENOENT; + } + vxlan_encap_bypass(skb, vxlan, dst_vxlan); + return 1; + } + + return 0; +} + static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, struct vxlan_rdst *rdst, bool did_rsc) { @@ -2036,27 +2070,19 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, dst_cache, info); if (IS_ERR(rt)) goto tx_error; - sk = sock4->sock->sk; + sk = sock4->sock->sk; /* Bypass encapsulation if the destination is local */ - if (!info && rt->rt_flags & RTCF_LOCAL && - !(rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) { - struct vxlan_dev *dst_vxlan; - - ip_rt_put(rt); - dst_vxlan = vxlan_find_vni(vxlan->net, vni, - dst->sa.sa_family, dst_port, - vxlan->flags); - if (!dst_vxlan) - goto tx_error; - 
vxlan_encap_bypass(skb, vxlan, dst_vxlan); - return; - } - - if (!info) + if (!info) { + err = encap_bypass_if_local(skb, dev, vxlan, dst, + dst_port, vni, >dst, + rt->rt_flags); + if (err) + return; udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM_TX); - else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) + } else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) { df = htons(IP_DF); + } ndst = >dst; tos = ip_tunnel_ecn_encap(tos, old_iph, skb); @@ -2072,7 +2098,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, #if IS_ENABLED(CONFIG_IPV6) } else { struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock); - u32 rt6i_flags; ndst = vxlan6_get_route(vxlan, dev, sock6, skb, rdst ? rdst->remote_ifindex : 0, tos, @@ -2085,24 +2110,16 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, } sk = sock6->sock->sk; - /* Bypass encapsulation if the destination is local */ - rt6i_flags = ((struct rt6_info *)ndst)->rt6i_flags; - if (!info && rt6i_flags & RTF_LOCAL && - !(rt6i_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) { - struct vxlan_dev *dst_vxlan; - - dst_vxlan = vxlan_find_vni(vxlan->net, vni, - dst->sa.sa_family, dst_port, - vxlan->flags); - if (!dst_vxlan) - goto tx_error; - dst_release(ndst); - vxlan_encap_bypass(skb, vxlan,
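The BUILD_BUG_ON(RTCF_LOCAL != RTF_LOCAL) in this patch is a compile-time guard that lets one flag test serve both address families. A userspace sketch of the same idea, using C11 _Static_assert, could look like the following. The TOY_* flag values are made up for illustration and are not taken from the kernel headers; toy_should_bypass() only mirrors the "local, but neither broadcast nor multicast" condition from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen for illustration only. */
#define TOY_RTCF_LOCAL     0x80000000u
#define TOY_RTCF_BROADCAST 0x10000000u
#define TOY_RTCF_MULTICAST 0x20000000u
#define TOY_RTF_LOCAL      0x80000000u

/* Fails the build if the IPv4 and IPv6 "local" flags ever diverge. */
_Static_assert(TOY_RTCF_LOCAL == TOY_RTF_LOCAL,
	       "shared helper relies on both LOCAL flags having the same value");

/* One check for both families: local destination, not broadcast/multicast. */
bool toy_should_bypass(uint32_t rt_flags)
{
	return (rt_flags & TOY_RTCF_LOCAL) &&
	       !(rt_flags & (TOY_RTCF_BROADCAST | TOY_RTCF_MULTICAST));
}
```

The assertion costs nothing at run time; it simply turns a silent assumption into a build failure if the two constants ever differ.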
[PATCH net-next v3 0/7] vxlan: xmit improvements.
Following patch series improves vxlan fast path, removes duplicate code and simplifies vxlan xmit code path.

v2-v3:
Removed unrelated warning fix from patch 2.
rearranged error handling from patch 3
Fixed stats updates in vxlan route lookup in patch 4

v1-v2:
Fix compilation error when IPv6 support is not enabled.

Pravin B Shelar (7):
  vxlan: avoid vlan processing in vxlan device.
  vxlan: avoid checking socket multiple times.
  vxlan: simplify exception handling
  vxlan: improve vxlan route lookup checks.
  vxlan: simplify RTF_LOCAL handling.
  vxlan: simplify vxlan xmit
  vxlan: remove unsed vxlan_dev_dst_port()

 drivers/net/vxlan.c     | 285 +++-
 include/linux/if_vlan.h |  16 ---
 include/net/vxlan.h     |  10 --
 3 files changed, 137 insertions(+), 174 deletions(-)

-- 
1.9.1
[PATCH net-next v3 6/7] vxlan: simplify vxlan xmit
Existing vxlan xmit function handles two distinct cases. 1. vxlan net device 2. vxlan lwt device. By seperating initialization these two cases the egress path looks better. Signed-off-by: Pravin B ShelarAcked-by: Jiri Benc --- drivers/net/vxlan.c | 78 +++-- 1 file changed, 34 insertions(+), 44 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 0b188d6..411534c 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1978,8 +1978,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, struct dst_cache *dst_cache; struct ip_tunnel_info *info; struct vxlan_dev *vxlan = netdev_priv(dev); - struct sock *sk; - const struct iphdr *old_iph; + const struct iphdr *old_iph = ip_hdr(skb); union vxlan_addr *dst; union vxlan_addr remote_ip, local_ip; union vxlan_addr *src; @@ -1988,7 +1987,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, __be16 src_port = 0, dst_port; struct dst_entry *ndst = NULL; __be32 vni, label; - __be16 df = 0; __u8 tos, ttl; int err; u32 flags = vxlan->flags; @@ -1998,19 +1996,40 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, info = skb_tunnel_info(skb); if (rdst) { + dst = >remote_ip; + if (vxlan_addr_any(dst)) { + if (did_rsc) { + /* short-circuited back to local bridge */ + vxlan_encap_bypass(skb, vxlan, vxlan); + return; + } + goto drop; + } + dst_port = rdst->remote_port ? 
rdst->remote_port : vxlan->cfg.dst_port; vni = rdst->remote_vni; - dst = >remote_ip; src = >cfg.saddr; dst_cache = >dst_cache; + md->gbp = skb->mark; + ttl = vxlan->cfg.ttl; + if (!ttl && vxlan_addr_multicast(dst)) + ttl = 1; + + tos = vxlan->cfg.tos; + if (tos == 1) + tos = ip_tunnel_get_dsfield(old_iph, skb); + + if (dst->sa.sa_family == AF_INET) + udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM_TX); + else + udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM6_TX); + label = vxlan->cfg.label; } else { if (!info) { WARN_ONCE(1, "%s: Missing encapsulation instructions\n", dev->name); goto drop; } - dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port; - vni = tunnel_id_to_key32(info->key.tun_id); remote_ip.sa.sa_family = ip_tunnel_info_af(info); if (remote_ip.sa.sa_family == AF_INET) { remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst; @@ -2020,48 +2039,24 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, local_ip.sin6.sin6_addr = info->key.u.ipv6.src; } dst = _ip; + dst_port = info->key.tp_dst ? 
: vxlan->cfg.dst_port; + vni = tunnel_id_to_key32(info->key.tun_id); src = _ip; dst_cache = >dst_cache; - } - - if (vxlan_addr_any(dst)) { - if (did_rsc) { - /* short-circuited back to local bridge */ - vxlan_encap_bypass(skb, vxlan, vxlan); - return; - } - goto drop; - } - - old_iph = ip_hdr(skb); - - ttl = vxlan->cfg.ttl; - if (!ttl && vxlan_addr_multicast(dst)) - ttl = 1; - - tos = vxlan->cfg.tos; - if (tos == 1) - tos = ip_tunnel_get_dsfield(old_iph, skb); - - label = vxlan->cfg.label; - src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min, -vxlan->cfg.port_max, true); - - if (info) { + if (info->options_len) + md = ip_tunnel_info_opts(info); ttl = info->key.ttl; tos = info->key.tos; label = info->key.label; udp_sum = !!(info->key.tun_flags & TUNNEL_CSUM); - - if (info->options_len) - md = ip_tunnel_info_opts(info); - } else { - md->gbp = skb->mark; } + src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min, +vxlan->cfg.port_max, true); if (dst->sa.sa_family == AF_INET) { struct vxlan_sock *sock4 =
Re: [PATCH 00/39] Netfilter updates for net-next
From: Pablo Neira Ayuso
Date: Sun, 13 Nov 2016 23:24:54 +0100

> The following patchset contains a second batch of Netfilter updates
> for your net-next tree. This includes a rework of the core hook
> infrastructure that improves Netfilter performance by ~15% according
> to synthetic benchmarks. Then, a large batch with ipset updates,
> including a new hash:ipmac set type, via Jozsef Kadlecsik. This also
> includes a couple of assorted updates.

Looks great, pulled, thanks!
Re: [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type
On 10/31/16 11:49 AM, Thomas Graf wrote: > On 10/31/16 at 06:16pm, Daniel Mack wrote: >> On 10/31/2016 06:05 PM, David Ahern wrote: >>> On 10/31/16 11:00 AM, Daniel Mack wrote: Yeah, I'm confused too. I changed that name in my v7 from BPF_PROG_TYPE_CGROUP_SOCK to BPF_PROG_TYPE_CGROUP_SKB on David's (Ahern) request. Why is it now renamed again? >>> >>> Thomas pushed back on adding another program type in favor of using >>> subtypes. So this makes the program type generic to CGROUP and patch >>> 2 in this v2 set added Mickaël's subtype patch with the socket >>> mangling done that way in patch 3. >>> >> >> Fine for me. I can change it around again. > > I would like to hear from Daniel B and Alexei as well. We need to > decide whether to use subtypes consistently and treat prog types as > something more high level or whether to bluntly introduce a new prog > type for every distinct set of verifier limits. I will change lwt_bpf > as well accordingly. > Alexei / Daniel - any comments/preferences on subtypes vs program types?
Re: [PATCH net-next] mdio: Demote print from info to debug in mdio_driver_register
On Sun, Nov 13, 2016 at 07:01:17PM -0800, Florian Fainelli wrote:
> While it is useful to know which MDIO driver is being registered, demote
> the pr_info() to a pr_debug().
>
> Signed-off-by: Florian Fainelli

Reviewed-by: Andrew Lunn

Andrew
Re: [PATCH net-next 00/11] Start adding support for mv88e6390 family
From: Andrew Lunn
Date: Sun, 13 Nov 2016 21:24:03 +0100

> What seems to be the issue is you said you have accepted:
>
> [PATCH net-next 0/2] Fixes for port refactoring
> https://marc.info/?l=linux-netdev=147880114928996=1
>
> Yet i don't see these in net-next. And i based this patchset on a tree
> which included the fixes. Hence they are not applying.
>
> Have the fixes really been accepted?

Accepted but not pushed out properly, sorry.

This should be sorted out now.
Re: [LKP] [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!
Hi guys.

I took a look at the commit again and I do not see how this can happen.

Are you sure patch was properly applied ?

In particular, the following extract is obscure for me :

https://github.com/0day-ci/linux Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
commit 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb ("net: __skb_flow_dissect() must cap its return value")

Hi,

The above two lines means 0day repo setup a new branch "Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839" which is based on net/master, then applied you patch on top of it, commit id is 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb.

Xiaolong, it may be more helpful to show the base tree where we apply the patch to. And the final url:

https://github.com/0day-ci/linux/tree/Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839

Thanks,
Fengguang
[PATCH net-next] mdio: Demote print from info to debug in mdio_driver_register
While it is useful to know which MDIO driver is being registered, demote
the pr_info() to a pr_debug().

Signed-off-by: Florian Fainelli
---
 drivers/net/phy/mdio_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/mdio_device.c b/drivers/net/phy/mdio_device.c
index 9c88e6749b9a..43c8fd46504b 100644
--- a/drivers/net/phy/mdio_device.c
+++ b/drivers/net/phy/mdio_device.c
@@ -144,7 +144,7 @@ int mdio_driver_register(struct mdio_driver *drv)
 	struct mdio_driver_common *mdiodrv = &drv->mdiodrv;
 	int retval;
 
-	pr_info("mdio_driver_register: %s\n", mdiodrv->driver.name);
+	pr_debug("mdio_driver_register: %s\n", mdiodrv->driver.name);
 
 	mdiodrv->driver.bus = &mdio_bus_type;
 	mdiodrv->driver.probe = mdio_probe;
-- 
2.9.3
Re: [PATCH net 2/3] bpf, mlx5: fix various refcount/prog issues in mlx5e_xdp_set
On Mon, Nov 14, 2016 at 01:43:41AM +0100, Daniel Borkmann wrote:
> There are multiple issues in mlx5e_xdp_set():
>
> 1) prog can be NULL, so calling unconditionally into bpf_prog_add(prog,
>    priv->params.num_channels) can end badly.
>
> 2) The batched bpf_prog_add() should be done at an earlier point in
>    time. This makes sure that we cannot fail anymore at the time we
>    want to set the program for each channel. This only means that we
>    have to undo the bpf_prog_add() in case we return early due to
>    reset or device not in MLX5E_STATE_OPENED yet. Note, err is 0 here.
>
> 3) When swapping the priv->xdp_prog, then no extra reference count must
>    be taken since we got that from call path via dev_change_xdp_fd()
>    already. Otherwise, we'd never be able to free the program. Also,
>    bpf_prog_add() without checking the return code could fail.
>
> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
> Signed-off-by: Daniel Borkmann
...
> +static inline void bpf_prog_sub(struct bpf_prog *prog, int i)
> +{
> +}
> +
>  static inline void bpf_prog_put(struct bpf_prog *prog)
>  {
>  }
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 751e806..a0fca9f 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -682,6 +682,17 @@ struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
>  }
>  EXPORT_SYMBOL_GPL(bpf_prog_add);
>
> +void bpf_prog_sub(struct bpf_prog *prog, int i)
> +{
> +	/* Only to be used for undoing previous bpf_prog_add() in some
> +	 * error path. We still know that another entity in our call
> +	 * path holds a reference to the program, thus atomic_sub() can
> +	 * be safely used in such cases!
> +	 */
> +	WARN_ON(atomic_sub_return(i, &prog->aux->refcnt) == 0);
> +}
> +EXPORT_SYMBOL_GPL(bpf_prog_sub);

The patches look good. I'm only worried about a net/net-next merge
conflict here. (I would have to deal with it as well.)

So instead of copying the above helper, can we apply net-next's
'bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path'
patch to net without the mlx4_xdp_set hunk and then apply the rest of
this patch?

Or, even better, send this patch 2/3 to net-next? Yes, it's an issue,
but a very small one. There is no security concern here, so I would
prefer to avoid the merge conflict.

Did you do a test merge of net/net-next by any chance? Maybe I'm
overreacting.
Re: [PATCH net-next 05/11] net: dsa: mv88e6xxx: Add comment about family a device belongs to
On Mon, Nov 14, 2016 at 01:08:13PM +1100, Vivien Didelot wrote:
> Hi Andrew,
>
> Andrew Lunn writes:
>
> > Knowing the family of device belongs to helps with picking the ops
> > implementation which is appropriate to the device. So add a comment to
> > each structure of ops.
>
> This commit is not necessary. mv88e6xxx_ops structure must be per-chip,
> and the family information is already described in patch 03/11.

I disagree. I made a lot of errors adding the right per family handler
to these structures, simply because it is not obvious what family a
device belongs to when looking at the structure.

    Andrew
Re: [PATCH net-next 08/11] net: dsa: mv88e6xxx: Add stats_get_sset_count to ops structure
Hi Andrew,

Andrew Lunn writes:

> Different families have different sets of statistics. Abstract this
> using a stats_get_sset_count op. Each stat has a bitmap, and the ops
> implementer uses a bit map mask to count the statistics which apply
> for the family.

> -static int mv88e6xxx_get_sset_count(struct dsa_switch *ds)
> +static int _mv88e6xxx_get_sset_count(struct mv88e6xxx_chip *chip, int types)

Looks good overall. But please don't re-introduce underscore-prefixed
helpers.

If I'm not mistaken, stats are a Global 1 feature, so ordered explicit
helpers in global1.c will be perfect. If the stats code is huge, don't
hesitate to move it into a global1_stats.c file, as you wish. But we have
to keep it self-documented and easy to follow for new developers.

Thanks,

        Vivien
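
The counting scheme described in the quoted commit message (each stat carries a type bitmap, and a family passes a mask) can be sketched roughly as below. This is only an illustration under assumptions: the `SIM_STATS_TYPE_*` bits, the `sim_hw_stats` table and `sim_stats_count()` are hypothetical names, not the driver's actual definitions.

```c
#include <stddef.h>

/* Hypothetical type bits: each statistic is tagged with the sets it is in. */
#define SIM_STATS_TYPE_BANK0 (1u << 0)
#define SIM_STATS_TYPE_BANK1 (1u << 1)
#define SIM_STATS_TYPE_PORT  (1u << 2)

struct sim_hw_stat {
	const char *name;
	unsigned int type;	/* bitmap of SIM_STATS_TYPE_* */
};

/* Illustrative table only; the entries are made up for this sketch. */
static const struct sim_hw_stat sim_hw_stats[] = {
	{ "stat_a", SIM_STATS_TYPE_BANK0 },
	{ "stat_b", SIM_STATS_TYPE_BANK0 },
	{ "stat_c", SIM_STATS_TYPE_BANK1 },
	{ "stat_d", SIM_STATS_TYPE_BANK1 },
	{ "stat_e", SIM_STATS_TYPE_PORT },
};

/* Count the statistics whose type bitmap intersects the family's mask. */
static int sim_stats_count(unsigned int types)
{
	size_t i;
	int n = 0;

	for (i = 0; i < sizeof(sim_hw_stats) / sizeof(sim_hw_stats[0]); i++)
		if (sim_hw_stats[i].type & types)
			n++;

	return n;
}
```

A family that only implements bank 0 and the port counters would pass `SIM_STATS_TYPE_BANK0 | SIM_STATS_TYPE_PORT` as its mask, so a per-family op can stay a one-line wrapper around the shared counting loop.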
Re: [PATCH net-next v1] bpf: Use u64_to_user_ptr()
On Sun, Nov 13, 2016 at 07:44:03PM +0100, Mickaël Salaün wrote:
> Replace the custom u64_to_ptr() function with the u64_to_user_ptr()
> macro.
>
> Signed-off-by: Mickaël Salaün

Thanks for following up on this one.

Acked-by: Alexei Starovoitov
Re: [PATCH net-next 07/11] net: dsa: mv88e6xxx: Add mv88e6390 statistics unit init
Hi Andrew,

Andrew Lunn writes:

> The statistics unit on the mv88e6390 needs to the configured in a
> different register to the others as to what histogram statistics is
> should return.

Can you re-phrase the above please?

> +static int mv88e6390_stats_init(struct mv88e6xxx_chip *chip)
> +{
> +	u16 val;
> +	int err;
> +
> +	err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL_2, &val);
> +	if (err)
> +		return err;
> +
> +	val |= GLOBAL_CONTROL_2_HIST_RX_TX;
> +
> +	err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL_2, val);
> +
> +	return err;
> +}

Can you please move this Global 1 specific helper into global1.c under an
ordered snippet such as:

    /* Offset 0x1C: Global Control 2 */

    int mv88e6xxx_g1_set_foo(struct mv88e6xxx_chip *chip)
    {
        ...
    }

I'd like internal SMI devices to be self-documented in their specific
files and easy to hack for new developers. Ordered helpers will help.

Also, the helper should reflect what it really does. It is used to set
the Histogram Counters Mode. So please name it accordingly, something
like mv88e6xxx_g1_set_hist_count_mode().

Thanks,

        Vivien
Re: [PATCH net-next 03/11] net: dsa: mv88e6xxx: Add the mv88e6390 family
Hi Andrew,

Andrew Lunn writes:

> -- compatible : Should be one of "marvell,mv88e6085",
> +- compatible: Should be one of "marvell,mv88e6085" or
> +  "marvell,mv88e6390"

Just curious here: mv88e6085 was chosen because it was the smallest
product ID supported. Following that logic, shouldn't mv88e6190 be
chosen here instead of mv88e6390?

> +static const struct mv88e6xxx_ops mv88e6390_ops = {
> +	.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
> +	.phy_read = mv88e6xxx_g2_smi_phy_read,
> +	.phy_write = mv88e6xxx_g2_smi_phy_write,
> +	.port_set_link = mv88e6xxx_port_set_link,
> +	.port_set_duplex = mv88e6xxx_port_set_duplex,
> +	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
> +	.port_set_speed = mv88e6390_port_set_speed,
> +};
> +
> +static const struct mv88e6xxx_ops mv88e6390x_ops = {
> +	.set_switch_mac = mv88e6xxx_g2_set_switch_mac,
> +	.phy_read = mv88e6xxx_g2_smi_phy_read,
> +	.phy_write = mv88e6xxx_g2_smi_phy_write,
> +	.port_set_link = mv88e6xxx_port_set_link,
> +	.port_set_duplex = mv88e6xxx_port_set_duplex,
> +	.port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
> +	.port_set_speed = mv88e6390x_port_set_speed,
> +};

Even if it is a bit more verbose, I'd intentionally keep one
mv88e6xxx_ops structure per chip. Using a per-family structure is
error-prone, and simpler is better here.

Thanks,

        Vivien
Re: [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!
On 11/13, Eric Dumazet wrote: >On Mon, 2016-11-14 at 07:49 +0800, kernel test robot wrote: >> FYI, we noticed the following commit: > > >> in testcase: kbuild >> with following parameters: >> >> runtime: 300s >> nr_task: 50% >> cpufreq_governor: performance >> >> >> >> >> on test machine: 8 threads Intel(R) Atom(TM) CPU C2750 @ 2.40GHz with 16G >> memory >> >> caused below changes: >> >> >> +---+++ >> | | cdb26d3387 | >> 2ab9fb18c4 | >> +---+++ >> | boot_successes| 10 | 3 >> | >> | boot_failures | 0 | 9 >> | >> | kernel_BUG_at_include/linux/skbuff.h | 0 | 8 >> | >> | invalid_opcode:#[##]SMP | 0 | 8 >> | >> | RIP:eth_type_trans| 0 | 8 >> | >> | Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0 | 5 >> | >> | WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup | 0 | 1 >> | >> | calltrace:parport_pc_init | 0 | 1 >> | >> | calltrace:SyS_finit_module| 0 | 1 >> | >> | WARNING:at_lib/kobject.c:#kobject_add_internal| 0 | 1 >> | >> +---+++ >> >> >> >> [ 20.491020] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready >> [ 20.502988] Sending DHCP requests . >> [ 20.506729] [ cut here ] >> [ 20.511369] kernel BUG at include/linux/skbuff.h:1935! 
>> [ 20.517893] invalid opcode: [#1] SMP >> [ 20.521902] Modules linked in: >> [ 20.524979] CPU: 4 PID: 0 Comm: swapper/4 Not tainted >> 4.9.0-rc3-00286-g2ab9fb1 #1 >> [ 20.532463] Hardware name: Supermicro SYS-5018A-TN4/A1SAi, BIOS 1.1a >> 08/27/2015 >> [ 20.539768] task: 8804456c2480 task.stack: c9000192 >> [ 20.545684] RIP: 0010:[] [] >> eth_type_trans+0xe8/0x140 >> [ 20.553972] RSP: 0018:88047fd03db8 EFLAGS: 00010297 >> [ 20.559283] RAX: 0158 RBX: 88047d8ae600 RCX: >> 1073 >> [ 20.566415] RDX: 88047bf07dc0 RSI: 88047d8a4000 RDI: >> 88047dac0f00 >> [ 20.573546] RBP: 88047fd03e20 R08: 88047d8a4000 R09: >> 0800 >> [ 20.580678] R10: 88047bf07ec0 R11: ea0011f6e400 R12: >> 88047dac0f00 >> [ 20.587810] R13: 880457413000 R14: c90002129000 R15: >> 015e >> [ 20.594946] FS: () GS:88047fd0() >> knlGS: >> [ 20.603032] CS: 0010 DS: ES: CR0: 80050033 >> [ 20.608775] CR2: 7fffadfb4ef0 CR3: 00047ee07000 CR4: >> 001006e0 >> [ 20.615906] Stack: >> [ 20.617927] 816905a7 ea0011f6e400 ea08 >> 88047d8ae450 >> [ 20.625403] 88047d8ae400 00400166 ea0011f6e400 >> >> [ 20.632873] 0040 88047d8ae450 >> 88047d8b1140 >> [ 20.640352] Call Trace: >> [ 20.642805] >> [ 20.644740] [] ? igb_clean_rx_irq+0x6a7/0x7d0 >> [ 20.650760] [] igb_poll+0x382/0x700 >> [ 20.655904] [] ? timerqueue_add+0x59/0xb0 >> [ 20.661564] [] net_rx_action+0x217/0x360 >> [ 20.667137] [] __do_softirq+0x104/0x2ab >> [ 20.672624] [] irq_exit+0xf1/0x100 >> [ 20.677673] [] do_IRQ+0x54/0xd0 >> [ 20.682466] [] common_interrupt+0x8c/0x8c >> [ 20.688123] >> [ 20.690054] [] ? 
cpuidle_enter_state+0x122/0x2e0 >> [ 20.696333] [] cpuidle_enter+0x17/0x20 >> [ 20.701733] [] call_cpuidle+0x23/0x40 >> [ 20.707045] [] cpu_startup_entry+0x114/0x200 >> [ 20.712964] [] start_secondary+0x107/0x130 >> [ 20.718708] Code: 00 04 00 00 c9 c3 48 33 86 70 03 00 00 48 c1 e0 10 48 >> 85 c0 0f b6 87 90 00 00 00 75 28 83 e0 f8 83 c8 01 88 87 90 00 00 00 eb 82 >> <0f> 0b 0f b6 87 90 00 00 00 83 e0 f8 83 c8 03 88 87 90 00 00 00 >> [ 20.738722] RIP [] eth_type_trans+0xe8/0x140 >> [ 20.744662] RSP >> [ 20.748160] ---[ end trace 153440bf1ca2e6fc ]--- >> [ 20.748165] [ cut here ] >> >> >> To reproduce: >> >> git clone >> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git >> cd lkp-tests >> bin/lkp install job.yaml # job file is attached in this email >> bin/lkp run job.yaml >> >> >> >> Thanks, >> Kernel Test Robot > > >Hi guys. > >I took a look at the commit again and I do not see how this can happen. > >Are you sure patch was properly applied ? > >In particular, the following extract is obscure for me : > > >>
Re: [PATCH net-next 05/11] net: dsa: mv88e6xxx: Add comment about family a device belongs to
Hi Andrew,

Andrew Lunn writes:

> Knowing the family of device belongs to helps with picking the ops
> implementation which is appropriate to the device. So add a comment to
> each structure of ops.

This commit is not necessary. mv88e6xxx_ops structure must be per-chip,
and the family information is already described in patch 03/11.

Thanks,

        Vivien
[PATCH net] net: stmmac: Fix lack of link transition for fixed PHYs
Commit 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch
is attached") added some logic to avoid polling the fixed PHY and
therefore invoking the adjust_link callback more than once, since this
is a fixed PHY and link events won't be generated.

This works fine the first time, because we start with phydev->irq =
PHY_POLL, so we call adjust_link, then we set phydev->irq =
PHY_IGNORE_INTERRUPT and we stop polling the PHY.

Now, if we called ndo_close(), which calls both phy_stop() and does an
explicit netif_carrier_off(), we end up with a link down. Upon calling
ndo_open() again, despite starting the PHY state machine, we have
PHY_IGNORE_INTERRUPT set, and we generate no link event at all, so the
link is permanently down.

52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch is attached")
Signed-off-by: Florian Fainelli
---
Alexandre, Peppe,

The original patch is already a hack, but since this is a bugfix, I took
the same approach that you did here to backport this to -stable kernels.

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 10909c9c0033..03dbf8e89c4c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -882,6 +882,13 @@ static int stmmac_init_phy(struct net_device *dev)
 		return -ENODEV;
 	}
 
+	/* stmmac_adjust_link will change this to PHY_IGNORE_INTERRUPT to avoid
+	 * subsequent PHY polling, make sure we force a link transition if
+	 * we have a UP/DOWN/UP transition
+	 */
+	if (phydev->is_pseudo_fixed_link)
+		phydev->irq = PHY_POLL;
+
 	pr_debug("stmmac_init_phy: %s: attached to PHY (UID 0x%x)"
 		 " Link = %d\n", dev->name, phydev->phy_id, phydev->link);
 
-- 
2.9.3
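
The UP/DOWN/UP failure described in this patch can be modelled in a few lines of userspace C. This is a toy state machine under assumptions, not driver code: every `sim_*` name and the `SIM_PHY_*` constants are hypothetical stand-ins for the kernel's PHY state, and polling is reduced to a single call.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for PHY_POLL and PHY_IGNORE_INTERRUPT. */
enum { SIM_PHY_POLL = -1, SIM_PHY_IGNORE_INTERRUPT = -2 };

struct sim_phy {
	int irq;
	bool is_pseudo_fixed_link;
};

struct sim_netdev {
	struct sim_phy phy;
	bool carrier;
	bool apply_fix;	/* true models the behaviour with the patch applied */
};

/* Models adjust_link: report link up, then stop polling a fixed PHY. */
static void sim_adjust_link(struct sim_netdev *dev)
{
	dev->carrier = true;
	if (dev->phy.is_pseudo_fixed_link)
		dev->phy.irq = SIM_PHY_IGNORE_INTERRUPT;
}

/* Models ndo_open: init the PHY, then poll it only if polling is enabled. */
static void sim_open(struct sim_netdev *dev)
{
	/* The patch re-arms polling for fixed links on every PHY init. */
	if (dev->apply_fix && dev->phy.is_pseudo_fixed_link)
		dev->phy.irq = SIM_PHY_POLL;

	if (dev->phy.irq == SIM_PHY_POLL)
		sim_adjust_link(dev);
}

/* Models ndo_close: phy_stop() plus an explicit netif_carrier_off(). */
static void sim_close(struct sim_netdev *dev)
{
	dev->carrier = false;
}

/* Returns the carrier state after an open/close/open cycle. */
static bool sim_up_down_up(bool apply_fix)
{
	struct sim_netdev dev = {
		.phy = { .irq = SIM_PHY_POLL, .is_pseudo_fixed_link = true },
		.carrier = false,
		.apply_fix = apply_fix,
	};

	sim_open(&dev);
	sim_close(&dev);
	sim_open(&dev);

	return dev.carrier;
}
```

In this model `sim_up_down_up(false)` leaves the carrier off after the second open, because nothing ever resets `irq` away from the ignore value, while `sim_up_down_up(true)` brings it back up.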
Re: [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!
On Mon, 2016-11-14 at 07:49 +0800, kernel test robot wrote:
> FYI, we noticed the following commit:

> in testcase: kbuild
> with following parameters:
>
>	runtime: 300s
>	nr_task: 50%
>	cpufreq_governor: performance
>
> on test machine: 8 threads Intel(R) Atom(TM) CPU C2750 @ 2.40GHz with 16G memory
>
> caused below changes:
>
> +-------------------------------------------------------+------------+------------+
> |                                                       | cdb26d3387 | 2ab9fb18c4 |
> +-------------------------------------------------------+------------+------------+
> | boot_successes                                        | 10         | 3          |
> | boot_failures                                         | 0          | 9          |
> | kernel_BUG_at_include/linux/skbuff.h                  | 0          | 8          |
> | invalid_opcode:#[##]SMP                               | 0          | 8          |
> | RIP:eth_type_trans                                    | 0          | 8          |
> | Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0          | 5          |
> | WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup             | 0          | 1          |
> | calltrace:parport_pc_init                             | 0          | 1          |
> | calltrace:SyS_finit_module                            | 0          | 1          |
> | WARNING:at_lib/kobject.c:#kobject_add_internal        | 0          | 1          |
> +-------------------------------------------------------+------------+------------+
>
> [ 20.491020] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [ 20.502988] Sending DHCP requests .
> [ 20.506729] ------------[ cut here ]------------
> [ 20.511369] kernel BUG at include/linux/skbuff.h:1935!
> [ 20.517893] invalid opcode: [#1] SMP
> [ 20.521902] Modules linked in:
> [ 20.524979] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.9.0-rc3-00286-g2ab9fb1 #1
> [ 20.532463] Hardware name: Supermicro SYS-5018A-TN4/A1SAi, BIOS 1.1a 08/27/2015
> [ 20.539768] task: 8804456c2480 task.stack: c9000192
> [ 20.545684] RIP: 0010:[] [] eth_type_trans+0xe8/0x140
> [ 20.553972] RSP: 0018:88047fd03db8 EFLAGS: 00010297
> [ 20.559283] RAX: 0158 RBX: 88047d8ae600 RCX: 1073
> [ 20.566415] RDX: 88047bf07dc0 RSI: 88047d8a4000 RDI: 88047dac0f00
> [ 20.573546] RBP: 88047fd03e20 R08: 88047d8a4000 R09: 0800
> [ 20.580678] R10: 88047bf07ec0 R11: ea0011f6e400 R12: 88047dac0f00
> [ 20.587810] R13: 880457413000 R14: c90002129000 R15: 015e
> [ 20.594946] FS: () GS:88047fd0() knlGS:
> [ 20.603032] CS: 0010 DS: ES: CR0: 80050033
> [ 20.608775] CR2: 7fffadfb4ef0 CR3: 00047ee07000 CR4: 001006e0
> [ 20.615906] Stack:
> [ 20.617927] 816905a7 ea0011f6e400 ea08 88047d8ae450
> [ 20.625403] 88047d8ae400 00400166 ea0011f6e400
> [ 20.632873] 0040 88047d8ae450 88047d8b1140
> [ 20.640352] Call Trace:
> [ 20.642805]
> [ 20.644740] [] ? igb_clean_rx_irq+0x6a7/0x7d0
> [ 20.650760] [] igb_poll+0x382/0x700
> [ 20.655904] [] ? timerqueue_add+0x59/0xb0
> [ 20.661564] [] net_rx_action+0x217/0x360
> [ 20.667137] [] __do_softirq+0x104/0x2ab
> [ 20.672624] [] irq_exit+0xf1/0x100
> [ 20.677673] [] do_IRQ+0x54/0xd0
> [ 20.682466] [] common_interrupt+0x8c/0x8c
> [ 20.688123]
> [ 20.690054] [] ? cpuidle_enter_state+0x122/0x2e0
> [ 20.696333] [] cpuidle_enter+0x17/0x20
> [ 20.701733] [] call_cpuidle+0x23/0x40
> [ 20.707045] [] cpu_startup_entry+0x114/0x200
> [ 20.712964] [] start_secondary+0x107/0x130
> [ 20.718708] Code: 00 04 00 00 c9 c3 48 33 86 70 03 00 00 48 c1 e0 10 48 85
> c0 0f b6 87 90 00 00 00 75 28 83 e0 f8 83 c8 01 88 87 90 00 00 00 eb 82 <0f>
> 0b 0f b6 87 90 00 00 00 83 e0 f8 83 c8 03 88 87 90 00 00 00
> [ 20.738722] RIP [] eth_type_trans+0xe8/0x140
> [ 20.744662] RSP
> [ 20.748160] ---[ end trace 153440bf1ca2e6fc ]---
> [ 20.748165] ------------[ cut here ]------------
>
> To reproduce:
>
>	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>	cd lkp-tests
>	bin/lkp install job.yaml # job file is attached in this email
>	bin/lkp run job.yaml
>
> Thanks,
> Kernel Test Robot

Hi guys.

I took a look at the commit again and I do not see how this can happen.

Are you sure patch was properly applied ?

In particular, the following extract is obscure for me :

> https://github.com/0day-ci/linux
> Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
> commit 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb ("net:
Re: [PATCH net-next 02/11] net: dsa: mv88e6xxx: Fix unused variable warning by using variable
Hi Andrew,

Andrew Lunn writes:

> _mv88e6xxx_stats_wait() did not check the return value from
> mv88e6xxx_g1_read(), so the compiler complained about set but unused
> err.
>
> Signed-off-by: Andrew Lunn

Reviewed-by: Vivien Didelot

Thanks,

        Vivien
Re: [PATCH net-next 01/11] net: dsa: mv88e6xxx: Take switch out of reset before probe
Hi Andrew,

Andrew Lunn writes:

> The switch needs to be taken out of reset before we can read its ID
> register on the MDIO bus.
>
> Signed-off-by: Andrew Lunn

Reviewed-by: Vivien Didelot

Thanks,

        Vivien
[PATCH net 0/3] Couple of BPF refcount fixes for mlx5
Various mlx5 bugs on eBPF program and refcount handling I found during
review. Since this kind of bug has happened multiple times here, I'll add
a __must_check to the bpf_prog_inc()/bpf_prog_add()/etc functions for
net-next, so that the compiler (and thus the kbuild bot) barks early
enough.

Note, it turned out I had to take the hunk from c540594f864b ("bpf, mlx4:
fix prog refcount in mlx4_en_try_alloc_resources error path") to get the
bpf_prog_sub() function for net as well, but the merge into net-next
should add no conflicts.

Rana, please review.

Thanks a lot!

Daniel Borkmann (3):
  bpf, mlx5: fix mlx5e_create_rq taking reference on prog
  bpf, mlx5: fix various refcount/prog issues in mlx5e_xdp_set
  bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 42 ++-
 include/linux/bpf.h                               |  5 +++
 kernel/bpf/syscall.c                              | 12 +++
 3 files changed, 51 insertions(+), 8 deletions(-)

-- 
1.9.3
[PATCH net 2/3] bpf, mlx5: fix various refcount/prog issues in mlx5e_xdp_set
There are multiple issues in mlx5e_xdp_set():

1) prog can be NULL, so calling unconditionally into bpf_prog_add(prog,
   priv->params.num_channels) can end badly.

2) The batched bpf_prog_add() should be done at an earlier point in
   time. This makes sure that we cannot fail anymore at the time we
   want to set the program for each channel. This only means that we
   have to undo the bpf_prog_add() in case we return early due to
   reset or device not in MLX5E_STATE_OPENED yet. Note, err is 0 here.

3) When swapping the priv->xdp_prog, then no extra reference count must
   be taken since we got that from call path via dev_change_xdp_fd()
   already. Otherwise, we'd never be able to free the program. Also,
   bpf_prog_add() without checking the return code could fail.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25 ++-
 include/linux/bpf.h                               |  5 +
 kernel/bpf/syscall.c                              | 11 ++
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2b83667..c90610a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3125,6 +3125,17 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 		goto unlock;
 	}
 
+	if (prog) {
+		/* num_channels is invariant here, so we can take the
+		 * batched reference right upfront.
+		 */
+		prog = bpf_prog_add(prog, priv->params.num_channels);
+		if (IS_ERR(prog)) {
+			err = PTR_ERR(prog);
+			goto unlock;
+		}
+	}
+
 	was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
 	/* no need for full reset when exchanging programs */
 	reset = (!priv->xdp_prog || !prog);
@@ -3132,10 +3143,10 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 	if (was_opened && reset)
 		mlx5e_close_locked(netdev);
 
-	/* exchange programs */
+	/* exchange programs, extra prog reference we got from caller
+	 * as long as we don't fail from this point onwards.
+	 */
 	old_prog = xchg(&priv->xdp_prog, prog);
-	if (prog)
-		bpf_prog_add(prog, 1);
 	if (old_prog)
 		bpf_prog_put(old_prog);
 
@@ -3146,12 +3157,11 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 		mlx5e_open_locked(netdev);
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state) || reset)
-		goto unlock;
+		goto unlock_put;
 
 	/* exchanging programs w/o reset, we update ref counts on behalf
 	 * of the channels RQs here.
 	 */
-	bpf_prog_add(prog, priv->params.num_channels);
 	for (i = 0; i < priv->params.num_channels; i++) {
 		struct mlx5e_channel *c = priv->channel[i];
 
@@ -3173,6 +3183,11 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 unlock:
 	mutex_unlock(&priv->state_lock);
 	return err;
+unlock_put:
+	/* reference on priv->xdp_prog is still held at this point */
+	if (prog)
+		bpf_prog_sub(prog, priv->params.num_channels);
+	goto unlock;
 }
 
 static bool mlx5e_xdp_attached(struct net_device *dev)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c201017..ca495fd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -234,6 +234,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 struct bpf_prog *bpf_prog_get(u32 ufd);
 struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
 struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i);
+void bpf_prog_sub(struct bpf_prog *prog, int i);
 struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
 
@@ -303,6 +304,10 @@ static inline struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
 	return ERR_PTR(-EOPNOTSUPP);
 }
 
+static inline void bpf_prog_sub(struct bpf_prog *prog, int i)
+{
+}
+
 static inline void bpf_prog_put(struct bpf_prog *prog)
 {
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 751e806..a0fca9f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -682,6 +682,17 @@ struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_add);
 
+void bpf_prog_sub(struct bpf_prog *prog, int i)
+{
+	/* Only to be used for undoing previous bpf_prog_add() in some
+	 * error path. We still know that another entity in our call
+	 * path holds a reference to the program, thus atomic_sub() can
+	 * be safely used in
[PATCH net 1/3] bpf, mlx5: fix mlx5e_create_rq taking reference on prog
In mlx5e_create_rq(), when creating a new queue, we call bpf_prog_add() but
without checking the return value. bpf_prog_add() can fail, so we really
must check it. Take the reference right when we assign it to the rq from
priv->xdp_prog, and just drop the reference on error path. Destruction in
mlx5e_destroy_rq() looks good, though.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 14 +++---
 kernel/bpf/syscall.c                              |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 84e8b25..2b83667 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -489,7 +489,16 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 	rq->channel = c;
 	rq->ix = c->ix;
 	rq->priv = c->priv;
+
 	rq->xdp_prog = priv->xdp_prog;
+	if (rq->xdp_prog) {
+		rq->xdp_prog = bpf_prog_inc(rq->xdp_prog);
+		if (IS_ERR(rq->xdp_prog)) {
+			err = PTR_ERR(rq->xdp_prog);
+			rq->xdp_prog = NULL;
+			goto err_rq_wq_destroy;
+		}
+	}
 
 	rq->buff.map_dir = DMA_FROM_DEVICE;
 	if (rq->xdp_prog)
@@ -566,12 +575,11 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
 	rq->page_cache.head = 0;
 	rq->page_cache.tail = 0;
 
-	if (rq->xdp_prog)
-		bpf_prog_add(rq->xdp_prog, 1);
-
 	return 0;
 
 err_rq_wq_destroy:
+	if (rq->xdp_prog)
+		bpf_prog_put(rq->xdp_prog);
 	mlx5_wq_destroy(&rq->wq_ctrl);
 
 	return err;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 237f3d6..751e806 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -686,6 +686,7 @@ struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
 {
 	return bpf_prog_add(prog, 1);
 }
+EXPORT_SYMBOL_GPL(bpf_prog_inc);
 
 static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
 {
-- 
1.9.3
[PATCH net 3/3] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup
mlx5e_xdp_set() is currently the only place where we drop reference on the
prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
need to make sure that we eventually release that reference, for example,
in case the netdev is dismantled.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index c90610a..930aa6f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3697,6 +3697,9 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 
 	if (MLX5_CAP_GEN(mdev, vport_group_manager))
 		mlx5_eswitch_unregister_vport_rep(esw, 0);
+
+	if (priv->xdp_prog)
+		bpf_prog_put(priv->xdp_prog);
 }
 
 static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
-- 
1.9.3
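
The refcount discipline this series argues for (take a batched reference up front, check that it succeeded, and give it back on any early return) can be sketched in userspace C with C11 atomics. This is a simplified model under assumptions: the `sim_*` names and `SIM_MAX_REFS` are hypothetical, and failure is reported with NULL here rather than the kernel's ERR_PTR() convention.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SIM_MAX_REFS 32768	/* hypothetical cap, for illustration only */

struct sim_prog {
	atomic_int refcnt;
};

/* Take i references; on overflow of the cap, undo the add and return NULL. */
static struct sim_prog *sim_prog_add(struct sim_prog *prog, int i)
{
	if (atomic_fetch_add(&prog->refcnt, i) + i > SIM_MAX_REFS) {
		atomic_fetch_sub(&prog->refcnt, i);
		return NULL;
	}
	return prog;
}

/* Undo an earlier sim_prog_add() on an error path; caller still holds a ref. */
static void sim_prog_sub(struct sim_prog *prog, int i)
{
	atomic_fetch_sub(&prog->refcnt, i);
}

/*
 * Attach prog to nch channels. Returns 0 on success and -1 if the batched
 * reference could not be taken; the refcount is left balanced either way.
 */
static int sim_attach(struct sim_prog *prog, int nch, int device_opened)
{
	/* prog may be NULL (detach), so only take references when it is set. */
	if (prog && !sim_prog_add(prog, nch))
		return -1;

	if (!device_opened) {
		/* Early return: no channel will consume the batched references. */
		if (prog)
			sim_prog_sub(prog, nch);
		return 0;
	}

	/* From here on nothing can fail; each channel owns one reference. */
	return 0;
}
```

Taking the batched reference before touching any channel is what makes the later per-channel loop infallible in this model; the only cost is the explicit undo on the early-return path.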
[PATCH net-next 1/1] driver: macvlan: Replace integer number with bool value
From: Gao Feng

The return value of function macvlan_addr_busy is used as bool value,
so use bool value instead of integer number "1" and "0".

Signed-off-by: Gao Feng
---
 drivers/net/macvlan.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index a064415..d0361f3 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -179,20 +179,20 @@ static void macvlan_hash_change_addr(struct macvlan_dev *vlan,
 	macvlan_hash_add(vlan);
 }
 
-static int macvlan_addr_busy(const struct macvlan_port *port,
-			     const unsigned char *addr)
+static bool macvlan_addr_busy(const struct macvlan_port *port,
+			      const unsigned char *addr)
 {
 	/* Test to see if the specified multicast address is
 	 * currently in use by the underlying device or
 	 * another macvlan.
 	 */
 	if (ether_addr_equal_64bits(port->dev->dev_addr, addr))
-		return 1;
+		return true;
 
 	if (macvlan_hash_lookup(port, addr))
-		return 1;
+		return true;
 
-	return 0;
+	return false;
 }
 
-- 
1.9.1
Re: [PATCH] Fixup packets with incorrect ethertype sent by ZTE MF821D
So here's another stab.

The comments and the current implementation are not in sync: any
non-multicast address starting with a null octet gets rewritten, while
the comment specifically mentions 00:a0:c6:00:00:00. It is certainly not
elegant but re-writing all unicast destinations with our address does
come to mind instead of special cases. This patch fails to handle the
invalid destinations in either way so I will send another one if you
think it's worthwhile to go on.

And it seems I forgot htons but I need this device for work now so a
better patch must wait :)

commit 35d3a46b7f1ece70e24386acbdd16af4507cb5f3
Author: Jussi Peltola
Date:   Mon Nov 14 01:45:32 2016 +0200

    Attempt to fix up packets with a broken ethernet header

    Signed-off-by: Jussi Peltola

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 3ff76c6..7308d6b 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -153,25 +153,57 @@ static const u8 default_modem_addr[ETH_ALEN] = {0x02, 0x50, 0xf3};
 
 static const u8 buggy_fw_addr[ETH_ALEN] = {0x00, 0xa0, 0xc6, 0x00, 0x00, 0x00};
 
-/* Make up an ethernet header if the packet doesn't have one.
+/* Check if the ethernet header has an unknown ethertype, and return a
+ * guess of the correct one based on the L3 header, or zero if the type was
+ * known or detection failed.
+ */
+static __be16 detect_bogus_header(struct sk_buff *skb) {
+	struct ethhdr *eth_hdr = (struct ethhdr*) skb->data;
+
+	switch (eth_hdr->h_proto) {
+	case ETH_P_IP:
+	case ETH_P_IPV6:
+	case ETH_P_ARP:
+		return 0;
+	default:
+		switch (skb->data[14] & 0xf0) {
+		case 0x40:
+			return htons(ETH_P_IP);
+		case 0x60:
+			return htons(ETH_P_IPV6);
+		default:
+			/* pass on undetectable packets */
+			return 0;
+		}
+	}
+	/*NOTREACHED*/
+	return 0;
+}
+
+/* Make up an ethernet header if the packet doesn't have a correct one.
  *
  * A firmware bug common among several devices cause them to send raw
  * IP packets under some circumstances.  There is no way for the
  * driver/host to know when this will happen.  And even when the bug
  * hits, some packets will still arrive with an intact header.
  *
- * The supported devices are only capably of sending IPv4, IPv6 and
+ * The supported devices are only capable of sending IPv4, IPv6 and
  * ARP packets on a point-to-point link. Any packet with an ethernet
  * header will have either our address or a broadcast/multicast
- * address as destination.  ARP packets will always have a header.
+ * address as destination. ARP packets will always have a header.
  *
  * This means that this function will reliably add the appropriate
- * header iff necessary, provided our hardware address does not start
+ * header if necessary, provided our hardware address does not start
  * with 4 or 6.
  *
  * Another common firmware bug results in all packets being addressed
  * to 00:a0:c6:00:00:00 despite the host address being different.
- * This function will also fixup such packets.
+ *
+ * Some devices will send packets with garbage source/destination MACs and
+ * ethertypes.
+ *
+ * This function will try to fix up all such packets.
+ *
  */
 static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 {
@@ -179,8 +211,8 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 	bool rawip = info->flags & QMI_WWAN_FLAG_RAWIP;
 	__be16 proto;
 
-	/* This check is no longer done by usbnet */
-	if (skb->len < dev->net->hard_header_len)
+	/* Shorter is definitely invalid and breaks subsequent tests */
+	if (skb->len < 15)
 		return 0;
 
 	switch (skb->data[0] & 0xf0) {
@@ -190,17 +222,17 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 	case 0x60:
 		proto = htons(ETH_P_IPV6);
 		break;
-	case 0x00:
+	default:
 		if (rawip)
 			return 0;
 		if (is_multicast_ether_addr(skb->data))
 			return 1;
-		/* possibly bogus destination - rewrite just in case */
-		skb_reset_mac_header(skb);
-		goto fix_dest;
-	default:
-		if (rawip)
-			return 0;
+		proto = detect_bogus_header(skb);
+		if (proto) {
+			/* remove terminally broken header */
+			skb_pull(skb, ETH_HLEN);
+			break;
+		}
 		/* pass along other packets without modifications */
 		return 1;
 	}
@@ -208,17 +240,17 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 	skb->dev = dev->net; /* normally set
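
Since the author notes that the htons conversion was forgotten in the ethertype comparison, here is a rough userspace sketch of the same detection idea with the byte order handled explicitly. It is an illustration under assumptions, not a replacement patch: the `sim_*` function and `SIM_ETH_*` constants are hypothetical names, and the frame is a plain byte buffer rather than an sk_buff.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_ETH_HLEN   14
#define SIM_ETH_P_IP   0x0800
#define SIM_ETH_P_ARP  0x0806
#define SIM_ETH_P_IPV6 0x86DD

/*
 * Returns a guessed ethertype in network byte order, or 0 if the header's
 * ethertype is already a known one or no guess is possible.
 */
static uint16_t sim_detect_bogus_header(const uint8_t *frame, size_t len)
{
	uint16_t h_proto;

	/* Need the full header plus the first L3 byte. */
	if (len < SIM_ETH_HLEN + 1)
		return 0;

	/* The ethertype is the last two bytes of the 14-byte header. */
	memcpy(&h_proto, frame + 12, sizeof(h_proto));

	/* h_proto is big-endian on the wire: compare against htons() values. */
	if (h_proto == htons(SIM_ETH_P_IP) ||
	    h_proto == htons(SIM_ETH_P_IPV6) ||
	    h_proto == htons(SIM_ETH_P_ARP))
		return 0;

	/* The IP version is the high nibble of the first L3 byte. */
	switch (frame[SIM_ETH_HLEN] & 0xf0) {
	case 0x40:
		return htons(SIM_ETH_P_IP);
	case 0x60:
		return htons(SIM_ETH_P_IPV6);
	default:
		/* pass on undetectable packets */
		return 0;
	}
}
```

Comparing against converted constants keeps the check correct on both little-endian and big-endian hosts; on a little-endian machine a comparison against the raw host-order constants would never match a valid header.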
Re: [PATCH net-next 04/11] net: dsa: mv88e6xxx: Abstract stats_snapshot into ops structure
Hi Andrew, Andrew Lunn writes: > +static int mv88e6320_stats_snapshot(struct mv88e6xxx_chip *chip, int port) > +{ > + port = (port + 1) << 5; > + > + return _mv88e6xxx_stats_snapshot(chip, port); > +} Please move the above helper into its internal SMI file (port, global1 or whatever) and keep the wrapper below in chip.c. The correct prefix will avoid having a _ prefix. > +static int mv88e6xxx_stats_snapshot(struct mv88e6xxx_chip *chip, int port) > +{ > + if (!chip->info->ops->stats_snapshot) > + return -EOPNOTSUPP; > + > + return chip->info->ops->stats_snapshot(chip, port); > +} [...] > static const struct mv88e6xxx_ops mv88e6175_ops = { > @@ -3223,6 +3243,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = { > .port_set_duplex = mv88e6xxx_port_set_duplex, > .port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay, > .port_set_speed = mv88e6185_port_set_speed, > + .stats_snapshot = mv88e6xxx_stats_snapshot, > }; Is this expected? It doesn't look correct to me to use mv88e6xxx_stats_snapshot here. Thanks, Vivien
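The wrapper-plus-ops-table pattern under review can be sketched as a small standalone program. The types and names below (`struct chip`, `struct chip_ops`, `generic_stats_snapshot`, `shifted_stats_snapshot`) are hypothetical stand-ins, not the mv88e6xxx definitions; only the (port + 1) << 5 encoding and the -EOPNOTSUPP fallback are taken from the quoted patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct chip;

/* Hypothetical, cut-down ops table; the real one has many more hooks. */
struct chip_ops {
	int (*stats_snapshot)(struct chip *chip, int port);
};

struct chip {
	const struct chip_ops *ops;
	int last_snapshot_arg;	/* records what the backend was asked for */
};

/* Generic backend: uses the port number unchanged. */
static int generic_stats_snapshot(struct chip *chip, int port)
{
	chip->last_snapshot_arg = port;
	return 0;
}

/* 6320-style backend: encodes the port as (port + 1) << 5 first. */
static int shifted_stats_snapshot(struct chip *chip, int port)
{
	return generic_stats_snapshot(chip, (port + 1) << 5);
}

/* Wrapper: chips that do not provide the hook report -EOPNOTSUPP. */
static int chip_stats_snapshot(struct chip *chip, int port)
{
	assert(chip != NULL && chip->ops != NULL);

	if (!chip->ops->stats_snapshot)
		return -EOPNOTSUPP;

	return chip->ops->stats_snapshot(chip, port);
}
```

In this sketch the ops entries always point at a backend, never at `chip_stats_snapshot` itself. If an entry did point at the wrapper, the wrapper would call itself without end, which may be the concern behind the question about mv88e6175_ops.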
Re: [PATCH net-next v1] bpf: Use u64_to_user_ptr()
On 11/13/2016 07:44 PM, Mickaël Salaün wrote: Replace the custom u64_to_ptr() function with the u64_to_user_ptr() macro. Signed-off-by: Mickaël Salaün Cc: Alexei Starovoitov Cc: Arnd Bergmann Cc: Daniel Borkmann Looks good to me, thanks! Acked-by: Daniel Borkmann
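The conversion that such a helper performs can be illustrated in userspace. This is a sketch only: `u64_to_ptr_sketch` and `ptr_to_u64_sketch` are hypothetical names, and the kernel macro additionally type-checks its argument and yields a __user-annotated pointer, neither of which is modelled here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store a pointer in a fixed-width 64-bit field, as a syscall ABI shared
 * between 32-bit and 64-bit userspace typically does.
 */
static inline uint64_t ptr_to_u64_sketch(const void *p)
{
	/* A pointer must fit in the 64-bit field for this to be lossless. */
	assert(sizeof(uintptr_t) <= sizeof(uint64_t));
	return (uint64_t)(uintptr_t)p;
}

/*
 * Convert back. Going through uintptr_t avoids a direct cast between a
 * pointer and an integer of a different size on 32-bit targets.
 */
static inline void *u64_to_ptr_sketch(uint64_t x)
{
	return (void *)(uintptr_t)x;
}
```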
[PATCH v3] ip6_output: ensure flow saddr actually belongs to device
This puts the IPv6 routing functions in parity with the IPv4 routing functions. Namely, we now check in v6 that if a flowi6 requests an saddr, the returned dst actually corresponds to a net device that has that saddr. This mirrors the v4 logic with __ip_dev_find in __ip_route_output_key_hash. In the event that the returned dst is not for a dst with a dev that has the saddr, we return -EINVAL, just like v4; this makes it easy to use the same error handlers for both cases. Signed-off-by: Jason A. Donenfeld Cc: David Ahern --- Changes from v2: It turns out ipv6_chk_addr already has the device enumeration logic that we need by simply passing NULL. net/ipv6/ip6_output.c | 4 1 file changed, 4 insertions(+) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 6001e78..b3b5cb6 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -926,6 +926,10 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk, int err; int flags = 0; + if (!ipv6_addr_any(&fl6->saddr) && + !ipv6_chk_addr(net, &fl6->saddr, NULL, 1)) + return -EINVAL; + /* The correct way to handle this would be to do * ip6_route_get_saddr, and then ip6_route_output; however, * the route-specific preferred source forces the -- 2.10.2
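The check added by this patch can be sketched outside the kernel as follows. Everything here is an assumption made for illustration: `addr_is_local` is a hypothetical stand-in for ipv6_chk_addr with a NULL device (a plain scan over a caller-supplied address list), and `check_flow_saddr` is not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical stand-in for an address lookup across all devices. */
static int addr_is_local(const struct in6_addr *addr,
			 const struct in6_addr *local, size_t n_local)
{
	for (size_t i = 0; i < n_local; i++)
		if (memcmp(addr, &local[i], sizeof(*addr)) == 0)
			return 1;
	return 0;
}

/*
 * Returns 0 if the route lookup may proceed, or -EINVAL if a source
 * address was requested that no local device owns. The unspecified
 * address (::) means "no preference" and is always accepted.
 */
static int check_flow_saddr(const struct in6_addr *saddr,
			    const struct in6_addr *local, size_t n_local)
{
	assert(saddr != NULL);

	if (!IN6_IS_ADDR_UNSPECIFIED(saddr) &&
	    !addr_is_local(saddr, local, n_local))
		return -EINVAL;

	return 0;
}
```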
[PATCH] net: bnx2: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated. We move this driver to new api {get|set}_link_ksettings. Signed-off-by: Philippe Reynes--- drivers/net/ethernet/broadcom/bnx2.c | 74 +++--- 1 files changed, 41 insertions(+), 33 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c index eab49ff..09d5b61 100644 --- a/drivers/net/ethernet/broadcom/bnx2.c +++ b/drivers/net/ethernet/broadcom/bnx2.c @@ -6882,12 +6882,14 @@ static u32 bnx2_find_max_ring(u32 ring_size, u32 max_size) /* All ethtool functions called with rtnl_lock */ static int -bnx2_get_settings(struct net_device *dev, struct ethtool_cmd *cmd) +bnx2_get_link_ksettings(struct net_device *dev, + struct ethtool_link_ksettings *cmd) { struct bnx2 *bp = netdev_priv(dev); int support_serdes = 0, support_copper = 0; + u32 supported, advertising; - cmd->supported = SUPPORTED_Autoneg; + supported = SUPPORTED_Autoneg; if (bp->phy_flags & BNX2_PHY_FLAG_REMOTE_PHY_CAP) { support_serdes = 1; support_copper = 1; @@ -6897,56 +6899,59 @@ static u32 bnx2_find_max_ring(u32 ring_size, u32 max_size) support_copper = 1; if (support_serdes) { - cmd->supported |= SUPPORTED_1000baseT_Full | + supported |= SUPPORTED_1000baseT_Full | SUPPORTED_FIBRE; if (bp->phy_flags & BNX2_PHY_FLAG_2_5G_CAPABLE) - cmd->supported |= SUPPORTED_2500baseX_Full; - + supported |= SUPPORTED_2500baseX_Full; } if (support_copper) { - cmd->supported |= SUPPORTED_10baseT_Half | + supported |= SUPPORTED_10baseT_Half | SUPPORTED_10baseT_Full | SUPPORTED_100baseT_Half | SUPPORTED_100baseT_Full | SUPPORTED_1000baseT_Full | SUPPORTED_TP; - } spin_lock_bh(>phy_lock); - cmd->port = bp->phy_port; - cmd->advertising = bp->advertising; + cmd->base.port = bp->phy_port; + advertising = bp->advertising; if (bp->autoneg & AUTONEG_SPEED) { - cmd->autoneg = AUTONEG_ENABLE; + cmd->base.autoneg = AUTONEG_ENABLE; } else { - cmd->autoneg = AUTONEG_DISABLE; + cmd->base.autoneg = AUTONEG_DISABLE; } if (netif_carrier_ok(dev)) { - 
ethtool_cmd_speed_set(cmd, bp->line_speed); - cmd->duplex = bp->duplex; + cmd->base.speed = bp->line_speed; + cmd->base.duplex = bp->duplex; if (!(bp->phy_flags & BNX2_PHY_FLAG_SERDES)) { if (bp->phy_flags & BNX2_PHY_FLAG_MDIX) - cmd->eth_tp_mdix = ETH_TP_MDI_X; + cmd->base.eth_tp_mdix = ETH_TP_MDI_X; else - cmd->eth_tp_mdix = ETH_TP_MDI; + cmd->base.eth_tp_mdix = ETH_TP_MDI; } } else { - ethtool_cmd_speed_set(cmd, SPEED_UNKNOWN); - cmd->duplex = DUPLEX_UNKNOWN; + cmd->base.speed = SPEED_UNKNOWN; + cmd->base.duplex = DUPLEX_UNKNOWN; } spin_unlock_bh(>phy_lock); - cmd->transceiver = XCVR_INTERNAL; - cmd->phy_address = bp->phy_addr; + cmd->base.phy_address = bp->phy_addr; + + ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported, + supported); + ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising, + advertising); return 0; } static int -bnx2_set_settings(struct net_device *dev, struct ethtool_cmd *cmd) +bnx2_set_link_ksettings(struct net_device *dev, + const struct ethtool_link_ksettings *cmd) { struct bnx2 *bp = netdev_priv(dev); u8 autoneg = bp->autoneg; @@ -6957,24 +6962,26 @@ static u32 bnx2_find_max_ring(u32 ring_size, u32 max_size) spin_lock_bh(>phy_lock); - if (cmd->port != PORT_TP && cmd->port != PORT_FIBRE) + if (cmd->base.port != PORT_TP && cmd->base.port != PORT_FIBRE) goto err_out_unlock; - if (cmd->port != bp->phy_port && + if (cmd->base.port != bp->phy_port && !(bp->phy_flags & BNX2_PHY_FLAG_REMOTE_PHY_CAP)) goto err_out_unlock; /* If device is down, we can store the settings only if the user * is setting the currently active port. */ - if (!netif_running(dev) && cmd->port != bp->phy_port) + if (!netif_running(dev) && cmd->base.port != bp->phy_port) goto err_out_unlock; -
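The conversion step at the end of the new get function, where a legacy 32-bit mask is turned into a link-mode bitmap, can be sketched in userspace. The names prefixed with `SK_` or `sk_` are hypothetical; the three bit positions mirror the legacy SUPPORTED_* values, and the 64-bit bitmap width is an arbitrary choice for this sketch rather than the kernel's actual size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit positions mirroring a few legacy SUPPORTED_* flags. */
#define SK_SUPPORTED_10baseT_Half	(1u << 0)
#define SK_SUPPORTED_Autoneg		(1u << 6)
#define SK_SUPPORTED_TP			(1u << 7)

/* Assumed bitmap width for this sketch only. */
#define SK_LINK_MODE_NBITS	64u
#define SK_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SK_LINK_MODE_LONGS \
	((SK_LINK_MODE_NBITS + SK_BITS_PER_LONG - 1) / SK_BITS_PER_LONG)

/* Clear the whole bitmap, then place the legacy mask in the low 32 bits. */
static void sk_legacy_u32_to_link_mode(unsigned long *dst, uint32_t legacy)
{
	assert(dst != NULL);
	memset(dst, 0, SK_LINK_MODE_LONGS * sizeof(*dst));
	dst[0] = legacy;
}

/* Test a single bit of the bitmap. */
static int sk_link_mode_test_bit(const unsigned long *map, unsigned int bit)
{
	assert(bit < SK_LINK_MODE_NBITS);
	return (int)((map[bit / SK_BITS_PER_LONG] >>
		      (bit % SK_BITS_PER_LONG)) & 1ul);
}
```

Building the mask in a local u32 and converting once at the end, as the patch does with `supported` and `advertising`, keeps the body of the function close to the old code while the wider bitmap leaves room for link modes that do not fit in 32 bits.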
[PATCH 13/39] netfilter: conntrack: simplify init/uninit of L4 protocol trackers
From: Davide Caratti Modify registration and deregistration of layer-4 protocol trackers to facilitate inclusion of new elements into the current list of builtin protocols. Both builtin (TCP, UDP, ICMP) and non-builtin (DCCP, GRE, SCTP, UDPlite) layer-4 protocol trackers usually register/deregister themselves using consecutive calls to nf_ct_l4proto_{,pernet}_{,un}register(...). This sequence is interrupted and rolled back in case of error; in order to simplify addition of builtin protocols, the input of the above functions has been modified to allow registering/unregistering multiple protocols. Signed-off-by: Davide Caratti Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_conntrack_l4proto.h | 18 -- net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 76 +++ net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 78 --- net/netfilter/nf_conntrack_proto.c | 85 ++ net/netfilter/nf_conntrack_proto_dccp.c| 48 --- net/netfilter/nf_conntrack_proto_gre.c | 11 ++-- net/netfilter/nf_conntrack_proto_sctp.c| 50 --- net/netfilter/nf_conntrack_proto_udplite.c | 50 --- 8 files changed, 179 insertions(+), 237 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_l4proto.h b/include/net/netfilter/nf_conntrack_l4proto.h index de629f1520df..2152b70626d5 100644 --- a/include/net/netfilter/nf_conntrack_l4proto.h +++ b/include/net/netfilter/nf_conntrack_l4proto.h @@ -125,14 +125,24 @@ struct nf_conntrack_l4proto *nf_ct_l4proto_find_get(u_int16_t l3proto, void nf_ct_l4proto_put(struct nf_conntrack_l4proto *p); /* Protocol pernet registration. 
*/ +int nf_ct_l4proto_pernet_register_one(struct net *net, + struct nf_conntrack_l4proto *proto); +void nf_ct_l4proto_pernet_unregister_one(struct net *net, +struct nf_conntrack_l4proto *proto); int nf_ct_l4proto_pernet_register(struct net *net, - struct nf_conntrack_l4proto *proto); + struct nf_conntrack_l4proto *proto[], + unsigned int num_proto); void nf_ct_l4proto_pernet_unregister(struct net *net, -struct nf_conntrack_l4proto *proto); +struct nf_conntrack_l4proto *proto[], +unsigned int num_proto); /* Protocol global registration. */ -int nf_ct_l4proto_register(struct nf_conntrack_l4proto *proto); -void nf_ct_l4proto_unregister(struct nf_conntrack_l4proto *proto); +int nf_ct_l4proto_register_one(struct nf_conntrack_l4proto *proto); +void nf_ct_l4proto_unregister_one(struct nf_conntrack_l4proto *proto); +int nf_ct_l4proto_register(struct nf_conntrack_l4proto *proto[], + unsigned int num_proto); +void nf_ct_l4proto_unregister(struct nf_conntrack_l4proto *proto[], + unsigned int num_proto); /* Generic netlink helpers */ int nf_ct_port_tuple_to_nlattr(struct sk_buff *skb, diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c index 713c09a74b90..7130ed5dc1fa 100644 --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c @@ -336,47 +336,34 @@ MODULE_ALIAS("nf_conntrack-" __stringify(AF_INET)); MODULE_ALIAS("ip_conntrack"); MODULE_LICENSE("GPL"); +static struct nf_conntrack_l4proto *builtin_l4proto4[] = { + _conntrack_l4proto_tcp4, + _conntrack_l4proto_udp4, + _conntrack_l4proto_icmp, +}; + static int ipv4_net_init(struct net *net) { int ret = 0; - ret = nf_ct_l4proto_pernet_register(net, _conntrack_l4proto_tcp4); - if (ret < 0) { - pr_err("nf_conntrack_tcp4: pernet registration failed\n"); - goto out_tcp; - } - ret = nf_ct_l4proto_pernet_register(net, _conntrack_l4proto_udp4); - if (ret < 0) { - pr_err("nf_conntrack_udp4: pernet registration failed\n"); - goto 
out_udp; - } - ret = nf_ct_l4proto_pernet_register(net, _conntrack_l4proto_icmp); - if (ret < 0) { - pr_err("nf_conntrack_icmp4: pernet registration failed\n"); - goto out_icmp; - } + ret = nf_ct_l4proto_pernet_register(net, builtin_l4proto4, + ARRAY_SIZE(builtin_l4proto4)); + if (ret < 0) + return ret; ret = nf_ct_l3proto_pernet_register(net, _conntrack_l3proto_ipv4); if (ret < 0) { pr_err("nf_conntrack_ipv4: pernet registration failed\n"); - goto out_ipv4; + nf_ct_l4proto_pernet_unregister(net, builtin_l4proto4, +
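The idea of registering an array of protocols in one call, with rollback on failure, can be sketched as below. This is a simplified model under stated assumptions: `struct proto`, `register_one`, `register_many` and friends are hypothetical, and the `fail` field exists only so that an error can be provoked in a test.

```c
#include <assert.h>
#include <stddef.h>

struct proto {
	const char *name;
	int fail;	/* test knob: make registration of this entry fail */
	int registered;
};

static int register_one(struct proto *p)
{
	if (p->fail)
		return -1;
	p->registered = 1;
	return 0;
}

static void unregister_one(struct proto *p)
{
	p->registered = 0;
}

/* Unregister the first n entries, in reverse order of registration. */
static void unregister_many(struct proto *protos[], size_t n)
{
	while (n-- != 0)
		unregister_one(protos[n]);
}

/* Register all n entries, or none: a failure undoes the earlier ones. */
static int register_many(struct proto *protos[], size_t n)
{
	size_t i;
	int ret = 0;

	assert(protos != NULL || n == 0);

	for (i = 0; i < n; i++) {
		ret = register_one(protos[i]);
		if (ret < 0)
			break;
	}
	if (ret < 0 && i > 0)
		unregister_many(protos, i);	/* only what succeeded */

	return ret;
}
```

With the rollback inside the array helper, a caller such as ipv4_net_init in the patch no longer needs one goto label per protocol; a single error check after the call is enough.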
[PATCH 06/39] netfilter: nf_tables: use hook state from xt_action_param structure
Don't copy relevant fields from hook state structure, instead use the one that is already available in struct xt_action_param. This patch also adds a set of new wrapper functions to fetch relevant hook state structure fields. Signed-off-by: Pablo Neira Ayuso--- include/net/netfilter/nf_tables.h| 35 +++- net/bridge/netfilter/nft_meta_bridge.c | 2 +- net/bridge/netfilter/nft_reject_bridge.c | 30 --- net/ipv4/netfilter/nft_dup_ipv4.c| 2 +- net/ipv4/netfilter/nft_fib_ipv4.c| 14 ++--- net/ipv4/netfilter/nft_masq_ipv4.c | 4 ++-- net/ipv4/netfilter/nft_redir_ipv4.c | 3 +-- net/ipv4/netfilter/nft_reject_ipv4.c | 4 ++-- net/ipv6/netfilter/nft_dup_ipv6.c| 2 +- net/ipv6/netfilter/nft_fib_ipv6.c| 16 +++ net/ipv6/netfilter/nft_masq_ipv6.c | 3 ++- net/ipv6/netfilter/nft_redir_ipv6.c | 3 ++- net/ipv6/netfilter/nft_reject_ipv6.c | 6 +++--- net/netfilter/nf_dup_netdev.c| 2 +- net/netfilter/nf_tables_core.c | 10 - net/netfilter/nf_tables_trace.c | 8 net/netfilter/nft_fib.c | 2 +- net/netfilter/nft_fib_inet.c | 2 +- net/netfilter/nft_log.c | 5 +++-- net/netfilter/nft_lookup.c | 5 ++--- net/netfilter/nft_meta.c | 6 +++--- net/netfilter/nft_queue.c| 2 +- net/netfilter/nft_reject_inet.c | 18 net/netfilter/nft_rt.c | 4 ++-- 24 files changed, 105 insertions(+), 83 deletions(-) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 44060344f958..3295fb85bff6 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -14,27 +14,42 @@ struct nft_pktinfo { struct sk_buff *skb; - struct net *net; - const struct net_device *in; - const struct net_device *out; - u8 pf; - u8 hook; booltprot_set; u8 tprot; /* for x_tables compatibility */ struct xt_action_param xt; }; +static inline struct net *nft_net(const struct nft_pktinfo *pkt) +{ + return pkt->xt.state->net; +} + +static inline unsigned int nft_hook(const struct nft_pktinfo *pkt) +{ + return pkt->xt.state->hook; +} + +static inline u8 nft_pf(const struct nft_pktinfo *pkt) +{ + 
return pkt->xt.state->pf; +} + +static inline const struct net_device *nft_in(const struct nft_pktinfo *pkt) +{ + return pkt->xt.state->in; +} + +static inline const struct net_device *nft_out(const struct nft_pktinfo *pkt) +{ + return pkt->xt.state->out; +} + static inline void nft_set_pktinfo(struct nft_pktinfo *pkt, struct sk_buff *skb, const struct nf_hook_state *state) { pkt->skb = skb; - pkt->net = state->net; - pkt->in = state->in; - pkt->out = state->out; - pkt->hook = state->hook; - pkt->pf = state->pf; pkt->xt.state = state; } diff --git a/net/bridge/netfilter/nft_meta_bridge.c b/net/bridge/netfilter/nft_meta_bridge.c index ad47a921b701..5974dbc1ea24 100644 --- a/net/bridge/netfilter/nft_meta_bridge.c +++ b/net/bridge/netfilter/nft_meta_bridge.c @@ -23,7 +23,7 @@ static void nft_meta_bridge_get_eval(const struct nft_expr *expr, const struct nft_pktinfo *pkt) { const struct nft_meta *priv = nft_expr_priv(expr); - const struct net_device *in = pkt->in, *out = pkt->out; + const struct net_device *in = nft_in(pkt), *out = nft_out(pkt); u32 *dest = >data[priv->dreg]; const struct net_bridge_port *p; diff --git a/net/bridge/netfilter/nft_reject_bridge.c b/net/bridge/netfilter/nft_reject_bridge.c index 4b3df6b0e3b9..206dc266ecd2 100644 --- a/net/bridge/netfilter/nft_reject_bridge.c +++ b/net/bridge/netfilter/nft_reject_bridge.c @@ -315,17 +315,20 @@ static void nft_reject_bridge_eval(const struct nft_expr *expr, case htons(ETH_P_IP): switch (priv->type) { case NFT_REJECT_ICMP_UNREACH: - nft_reject_br_send_v4_unreach(pkt->net, pkt->skb, - pkt->in, pkt->hook, + nft_reject_br_send_v4_unreach(nft_net(pkt), pkt->skb, + nft_in(pkt), + nft_hook(pkt), priv->icmp_code); break; case NFT_REJECT_TCP_RST: -
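The accessor approach in this patch, where fields are read through a pointer to the shared hook state instead of being copied into every packet-info structure, can be sketched with reduced types. The structures and function names below (`struct hook_state`, `struct pktinfo`, `pkt_hook`, `pkt_pf`, `pkt_set`) are hypothetical miniatures, not the nf_tables definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-ins for the hook state and the x_tables parameter block. */
struct hook_state {
	unsigned int hook;
	unsigned char pf;
};

struct action_param {
	const struct hook_state *state;
};

struct pktinfo {
	struct action_param xt;	/* holds the only reference to the state */
};

/* Accessors: read through the shared state rather than a private copy. */
static inline unsigned int pkt_hook(const struct pktinfo *pkt)
{
	return pkt->xt.state->hook;
}

static inline unsigned char pkt_pf(const struct pktinfo *pkt)
{
	return pkt->xt.state->pf;
}

/* Setup now stores one pointer instead of copying each field. */
static inline void pkt_set(struct pktinfo *pkt, const struct hook_state *state)
{
	assert(pkt != NULL && state != NULL);
	pkt->xt.state = state;
}
```

The trade-off in this sketch is one extra pointer dereference per read in exchange for a smaller per-packet structure and a cheaper setup path.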
[PATCH 02/39] netfilter: remove comments that predate rcu days
We cannot block/sleep on nf_iterate because netfilter runs under rcu read lock these days, where blocking is well-known to be illegal. So let's remove these old comments. Signed-off-by: Pablo Neira Ayuso --- net/netfilter/core.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 3d4aa96cb219..76014ad72ec5 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -308,18 +308,11 @@ unsigned int nf_iterate(struct sk_buff *skb, { unsigned int verdict; - /* -* The caller must not block between calls to this -* function because of risk of continuing from deleted element. -*/ while (*entryp) { if (state->thresh > (*entryp)->ops.priority) { *entryp = rcu_dereference((*entryp)->next); continue; } - - /* Optimization: we don't need to hold module - reference here, since function can't sleep. --RR */ repeat: verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state); if (verdict != NF_ACCEPT) { -- 2.1.4
[PATCH 36/39] netfilter: ipset: use setup_timer() and mod_timer().
From: Jozsef Kadlecsik Use setup_timer() instead of init_timer(), being the preferred way of setting up a timer. Also, quoting the mod_timer() function comment: -> mod_timer() is a more efficient way to update the expire field of an active timer (if the timer is inactive it will be activated). Use setup_timer() and mod_timer() to set up and arm a timer, making the code compact and easier to read. Signed-off-by: Muhammad Falak R Wani Signed-off-by: Jozsef Kadlecsik --- net/netfilter/ipset/ip_set_bitmap_gen.h | 7 ++- net/netfilter/ipset/ip_set_hash_gen.h | 7 ++- net/netfilter/ipset/ip_set_list_set.c | 7 ++- 3 files changed, 6 insertions(+), 15 deletions(-) diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h index f8ea26cafa30..6f09a99298cd 100644 --- a/net/netfilter/ipset/ip_set_bitmap_gen.h +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h @@ -41,11 +41,8 @@ mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set)) { struct mtype *map = set->data; - init_timer(&map->gc); - map->gc.data = (unsigned long)set; - map->gc.function = gc; - map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; - add_timer(&map->gc); + setup_timer(&map->gc, gc, (unsigned long)set); + mod_timer(&map->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ); } static void diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index 88b70fcc5ac5..1b05d4a7d5a1 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -433,11 +433,8 @@ mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set)) { struct htype *h = set->data; - init_timer(&h->gc); - h->gc.data = (unsigned long)set; - h->gc.function = gc; - h->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; - add_timer(&h->gc); + setup_timer(&h->gc, gc, (unsigned long)set); + mod_timer(&h->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ); pr_debug("gc initialized, run in every %u\n", IPSET_GC_PERIOD(set->timeout)); } diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c index dede343a662b..51077c53d76b 100644 --- a/net/netfilter/ipset/ip_set_list_set.c +++ b/net/netfilter/ipset/ip_set_list_set.c @@ -586,11 +586,8 @@ list_set_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set)) { struct list_set *map = set->data; - init_timer(&map->gc); - map->gc.data = (unsigned long)set; - map->gc.function = gc; - map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; - add_timer(&map->gc); + setup_timer(&map->gc, gc, (unsigned long)set); + mod_timer(&map->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ); } /* Create list:set type of sets */ -- 2.1.4
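The difference between filling in the timer fields by hand and the two-call form used by this patch can be illustrated with a toy model. Nothing below is the kernel timer implementation: `struct toy_timer` and the `toy_*` functions are hypothetical, time is a plain counter passed in by the caller, and jiffies wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Toy timer: one callback, one argument, one expiry time. */
struct toy_timer {
	void (*function)(unsigned long);
	unsigned long data;
	unsigned long expires;
	int pending;
};

/* One call initialises the timer and binds callback and argument. */
static void toy_setup_timer(struct toy_timer *t,
			    void (*fn)(unsigned long), unsigned long data)
{
	assert(t != NULL && fn != NULL);
	t->function = fn;
	t->data = data;
	t->expires = 0;
	t->pending = 0;
}

/*
 * Set the expiry and arm the timer whether or not it was armed before.
 * Returns 1 if the timer was already pending, 0 otherwise.
 */
static int toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	int was_pending = t->pending;

	t->expires = expires;
	t->pending = 1;
	return was_pending;
}

/* Fire the callback once if the timer is armed and due. */
static void toy_run_timer(struct toy_timer *t, unsigned long now)
{
	if (t->pending && now >= t->expires) {
		t->pending = 0;
		t->function(t->data);
	}
}

/* Example callback: the argument carries a pointer to a run counter. */
static void toy_gc(unsigned long data)
{
	int *runs = (int *)data;

	(*runs)++;
}
```

In the toy model, as in the patch, the five statements of the old form collapse into one setup call and one arming call, and the arming call is the same one a periodic callback would use to re-arm itself.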
[PATCH 18/39] netfilter: ipset: Headers file cleanup
From: Jozsef Kadlecsik Group counter helper functions together. Ported from a patch proposed by Sergey Popovich. Suggested-by: Sergey Popovich Signed-off-by: Jozsef Kadlecsik --- include/linux/netfilter/ipset/ip_set.h | 42 +- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h index 524467f933bf..1ea28e30a6dd 100644 --- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -334,6 +334,27 @@ ip_set_update_counter(struct ip_set_counter *counter, } } +static inline bool +ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter) +{ + return nla_put_net64(skb, IPSET_ATTR_BYTES, +cpu_to_be64(ip_set_get_bytes(counter)), +IPSET_ATTR_PAD) || + nla_put_net64(skb, IPSET_ATTR_PACKETS, +cpu_to_be64(ip_set_get_packets(counter)), +IPSET_ATTR_PAD); +} + +static inline void +ip_set_init_counter(struct ip_set_counter *counter, + const struct ip_set_ext *ext) +{ + if (ext->bytes != ULLONG_MAX) + atomic64_set(&(counter)->bytes, (long long)(ext->bytes)); + if (ext->packets != ULLONG_MAX) + atomic64_set(&(counter)->packets, (long long)(ext->packets)); +} + static inline void ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo, const struct ip_set_ext *ext, @@ -372,27 +393,6 @@ ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo, skbinfo->skbqueue = ext->skbqueue; } -static inline bool -ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter) -{ - return nla_put_net64(skb, IPSET_ATTR_BYTES, -cpu_to_be64(ip_set_get_bytes(counter)), -IPSET_ATTR_PAD) || - nla_put_net64(skb, IPSET_ATTR_PACKETS, -cpu_to_be64(ip_set_get_packets(counter)), -IPSET_ATTR_PAD); -} - -static inline void -ip_set_init_counter(struct ip_set_counter *counter, - const struct ip_set_ext *ext) -{ - if (ext->bytes != ULLONG_MAX) - atomic64_set(&(counter)->bytes, (long long)(ext->bytes)); - if (ext->packets != ULLONG_MAX) - atomic64_set(&(counter)->packets, (long 
long)(ext->packets)); -} - /* Netlink CB args */ enum { IPSET_CB_NET = 0, /* net namespace */ -- 2.1.4
[PATCH 03/39] netfilter: kill NF_HOOK_THRESH() and state->tresh
Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh") introduced br_nf_hook_thresh(). Replace NF_HOOK_THRESH() by br_nf_hook_thresh from br_nf_forward_finish(), so we have no more callers for this macro. As a result, state->thresh and the explicit thresh parameter in the hook state structure are not required anymore. And we can get rid of the skip-hook-under-thresh loop in nf_iterate() in the core path that is only used by br_netfilter to search for the filter hook. Suggested-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter.h | 50 +-- include/linux/netfilter_ingress.h | 2 +- net/bridge/br_netfilter_hooks.c | 8 +++--- net/bridge/netfilter/ebtable_broute.c | 2 +- net/netfilter/core.c | 4 --- net/netfilter/nf_queue.c | 2 -- 6 files changed, 19 insertions(+), 49 deletions(-) diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index abc7fdcb9eb1..e0d000f6c9bf 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -49,7 +49,6 @@ struct sock; struct nf_hook_state { unsigned int hook; - int thresh; u_int8_t pf; struct net_device *in; struct net_device *out; @@ -84,7 +83,7 @@ struct nf_hook_entry { static inline void nf_hook_state_init(struct nf_hook_state *p, struct nf_hook_entry *hook_entry, unsigned int hook, - int thresh, u_int8_t pf, + u_int8_t pf, struct net_device *indev, struct net_device *outdev, struct sock *sk, @@ -92,7 +91,6 @@ static inline void nf_hook_state_init(struct nf_hook_state *p, int (*okfn)(struct net *, struct sock *, struct sk_buff *)) { p->hook = hook; - p->thresh = thresh; p->pf = pf; p->in = indev; p->out = outdev; @@ -155,20 +153,16 @@ extern struct static_key nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS]; int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state); /** - * nf_hook_thresh - call a netfilter hook + * nf_hook - call a netfilter hook * * Returns 1 if the hook has allowed the packet to pass. 
The function * okfn must be invoked by the caller in this case. Any other return * value indicates the packet has been consumed by the hook. */ -static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook, -struct net *net, -struct sock *sk, -struct sk_buff *skb, -struct net_device *indev, -struct net_device *outdev, -int (*okfn)(struct net *, struct sock *, struct sk_buff *), -int thresh) +static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net, + struct sock *sk, struct sk_buff *skb, + struct net_device *indev, struct net_device *outdev, + int (*okfn)(struct net *, struct sock *, struct sk_buff *)) { struct nf_hook_entry *hook_head; int ret = 1; @@ -185,8 +179,8 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook, if (hook_head) { struct nf_hook_state state; - nf_hook_state_init(, hook_head, hook, thresh, - pf, indev, outdev, sk, net, okfn); + nf_hook_state_init(, hook_head, hook, pf, indev, outdev, + sk, net, okfn); ret = nf_hook_slow(skb, ); } @@ -195,14 +189,6 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook, return ret; } -static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net, - struct sock *sk, struct sk_buff *skb, - struct net_device *indev, struct net_device *outdev, - int (*okfn)(struct net *, struct sock *, struct sk_buff *)) -{ - return nf_hook_thresh(pf, hook, net, sk, skb, indev, outdev, okfn, INT_MIN); -} - /* Activate hook; either okfn or kfree_skb called, unless a hook returns NF_STOLEN (in which case, it's up to the hook to deal with the consequences). @@ -221,19 +207,6 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net, */ static inline int -NF_HOOK_THRESH(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk, - struct sk_buff *skb, struct net_device *in, - struct net_device *out, - int (*okfn)(struct net *,
[PATCH 22/39] netfilter: ipset: Separate memsize calculation code into dedicated function
From: Jozsef Kadlecsik Hash types already have their memsize calculation code in separate functions. Clean up and do the same for *bitmap* and *list* sets. Ported from a patch proposed by Sergey Popovich. Suggested-by: Sergey Popovich Signed-off-by: Jozsef Kadlecsik --- net/netfilter/ipset/ip_set_bitmap_gen.h | 11 ++- net/netfilter/ipset/ip_set_list_set.c | 23 +-- 2 files changed, 27 insertions(+), 7 deletions(-) diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h index 2e8e7e5fb4a6..4f07b90f8ef4 100644 --- a/net/netfilter/ipset/ip_set_bitmap_gen.h +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h @@ -22,6 +22,7 @@ #define mtype_kadt IPSET_TOKEN(MTYPE, _kadt) #define mtype_uadt IPSET_TOKEN(MTYPE, _uadt) #define mtype_destroy IPSET_TOKEN(MTYPE, _destroy) +#define mtype_memsize IPSET_TOKEN(MTYPE, _memsize) #define mtype_flushIPSET_TOKEN(MTYPE, _flush) #define mtype_head IPSET_TOKEN(MTYPE, _head) #define mtype_same_set IPSET_TOKEN(MTYPE, _same_set) @@ -84,12 +85,20 @@ mtype_flush(struct ip_set *set) memset(map->members, 0, map->memsize); } +/* Calculate the actual memory size of the set data */ +static size_t +mtype_memsize(const struct mtype *map, size_t dsize) +{ + return sizeof(*map) + map->memsize + + map->elements * dsize; +} + static int mtype_head(struct ip_set *set, struct sk_buff *skb) { const struct mtype *map = set->data; struct nlattr *nested; - size_t memsize = sizeof(*map) + map->memsize; + size_t memsize = mtype_memsize(map, set->dsize); nested = ipset_nest_start(skb, IPSET_ATTR_DATA); if (!nested) diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c index a2a89e4e0a14..462b0b1870e2 100644 --- a/net/netfilter/ipset/ip_set_list_set.c +++ b/net/netfilter/ipset/ip_set_list_set.c @@ -441,12 +441,12 @@ list_set_destroy(struct ip_set *set) set->data = NULL; } -static int -list_set_head(struct ip_set *set, struct sk_buff *skb) +/* Calculate the actual memory size of the set data */ 
+static size_t +list_set_memsize(const struct list_set *map, size_t dsize) { - const struct list_set *map = set->data; - struct nlattr *nested; struct set_elem *e; + size_t memsize; u32 n = 0; rcu_read_lock(); @@ -454,13 +454,24 @@ list_set_head(struct ip_set *set, struct sk_buff *skb) n++; rcu_read_unlock(); + memsize = sizeof(*map) + n * dsize; + + return memsize; +} + +static int +list_set_head(struct ip_set *set, struct sk_buff *skb) +{ + const struct list_set *map = set->data; + struct nlattr *nested; + size_t memsize = list_set_memsize(map, set->dsize); + nested = ipset_nest_start(skb, IPSET_ATTR_DATA); if (!nested) goto nla_put_failure; if (nla_put_net32(skb, IPSET_ATTR_SIZE, htonl(map->size)) || nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) || - nla_put_net32(skb, IPSET_ATTR_MEMSIZE, - htonl(sizeof(*map) + n * set->dsize))) + nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize))) goto nla_put_failure; if (unlikely(ip_set_put_flags(skb, set))) goto nla_put_failure; -- 2.1.4
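The list variant of the memsize helper, which counts the elements and multiplies by the per-element data size, can be sketched on a plain singly linked list. The types here (`struct sk_elem`, `struct sk_list_set`) are hypothetical, and the RCU read-side locking that the kernel code takes around the walk is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a list-type set. */
struct sk_elem {
	struct sk_elem *next;
};

struct sk_list_set {
	size_t size;		/* configured maximum, unused in the sum */
	struct sk_elem *head;
};

/* Header structure plus one dsize-sized record per linked element. */
static size_t sk_list_set_memsize(const struct sk_list_set *map, size_t dsize)
{
	const struct sk_elem *e;
	size_t n = 0;

	assert(map != NULL);

	for (e = map->head; e != NULL; e = e->next)
		n++;

	return sizeof(*map) + n * dsize;
}
```

Keeping the calculation in one function means the header dump and any later caller report the same number, which is the point of the cleanup.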
[PATCH 39/39] netfilter: x_tables: simplify IS_ERR_OR_NULL to NULL test
From: Julia Lawall Since commit 7926dbfa4bc1 ("netfilter: don't use mutex_lock_interruptible()"), the function xt_find_table_lock can only return NULL on an error. Simplify the call sites and update the comment before the function. The semantic patch that changes the code is as follows: (http://coccinelle.lip6.fr/) // @@ expression t,e; @@ t = \(xt_find_table_lock(...)\| try_then_request_module(xt_find_table_lock(...),...)\) ... when != t=e - ! IS_ERR_OR_NULL(t) + t @@ expression t,e; @@ t = \(xt_find_table_lock(...)\| try_then_request_module(xt_find_table_lock(...),...)\) ... when != t=e - IS_ERR_OR_NULL(t) + !t @@ expression t,e,e1; @@ t = \(xt_find_table_lock(...)\| try_then_request_module(xt_find_table_lock(...),...)\) ... when != t=e ?- t ? PTR_ERR(t) : e1 + e1 ... when any // Signed-off-by: Julia Lawall Signed-off-by: Pablo Neira Ayuso --- net/ipv4/netfilter/arp_tables.c | 20 ++-- net/ipv4/netfilter/ip_tables.c | 20 ++-- net/ipv6/netfilter/ip6_tables.c | 20 ++-- net/netfilter/x_tables.c| 2 +- 4 files changed, 31 insertions(+), 31 deletions(-) diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index e76ab23a2deb..39004da318e2 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -805,7 +805,7 @@ static int get_info(struct net *net, void __user *user, #endif t = try_then_request_module(xt_find_table_lock(net, NFPROTO_ARP, name), "arptable_%s", name); - if (!IS_ERR_OR_NULL(t)) { + if (t) { struct arpt_getinfo info; const struct xt_table_info *private = t->private; #ifdef CONFIG_COMPAT @@ -834,7 +834,7 @@ static int get_info(struct net *net, void __user *user, xt_table_unlock(t); module_put(t->me); } else - ret = t ? 
PTR_ERR(t) : -ENOENT; + ret = -ENOENT; #ifdef CONFIG_COMPAT if (compat) xt_compat_unlock(NFPROTO_ARP); @@ -859,7 +859,7 @@ static int get_entries(struct net *net, struct arpt_get_entries __user *uptr, get.name[sizeof(get.name) - 1] = '\0'; t = xt_find_table_lock(net, NFPROTO_ARP, get.name); - if (!IS_ERR_OR_NULL(t)) { + if (t) { const struct xt_table_info *private = t->private; if (get.size == private->size) @@ -871,7 +871,7 @@ static int get_entries(struct net *net, struct arpt_get_entries __user *uptr, module_put(t->me); xt_table_unlock(t); } else - ret = t ? PTR_ERR(t) : -ENOENT; + ret = -ENOENT; return ret; } @@ -898,8 +898,8 @@ static int __do_replace(struct net *net, const char *name, t = try_then_request_module(xt_find_table_lock(net, NFPROTO_ARP, name), "arptable_%s", name); - if (IS_ERR_OR_NULL(t)) { - ret = t ? PTR_ERR(t) : -ENOENT; + if (!t) { + ret = -ENOENT; goto free_newinfo_counters_untrans; } @@ -1014,8 +1014,8 @@ static int do_add_counters(struct net *net, const void __user *user, return PTR_ERR(paddc); t = xt_find_table_lock(net, NFPROTO_ARP, tmp.name); - if (IS_ERR_OR_NULL(t)) { - ret = t ? PTR_ERR(t) : -ENOENT; + if (!t) { + ret = -ENOENT; goto free; } @@ -1404,7 +1404,7 @@ static int compat_get_entries(struct net *net, xt_compat_lock(NFPROTO_ARP); t = xt_find_table_lock(net, NFPROTO_ARP, get.name); - if (!IS_ERR_OR_NULL(t)) { + if (t) { const struct xt_table_info *private = t->private; struct xt_table_info info; @@ -1419,7 +1419,7 @@ static int compat_get_entries(struct net *net, module_put(t->me); xt_table_unlock(t); } else - ret = t ? 
PTR_ERR(t) : -ENOENT; + ret = -ENOENT; xt_compat_unlock(NFPROTO_ARP); return ret; diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index de4fa03f46f3..46815c8a60d7 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -973,7 +973,7 @@ static int get_info(struct net *net, void __user *user, #endif t = try_then_request_module(xt_find_table_lock(net, AF_INET, name), "iptable_%s", name); - if (!IS_ERR_OR_NULL(t)) { + if (t) { struct ipt_getinfo info; const struct xt_table_info *private = t->private; #ifdef CONFIG_COMPAT @@ -1003,7 +1003,7 @@ static int get_info(struct net *net, void __user *user, xt_table_unlock(t); module_put(t->me); } else - ret = t ?
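Why the simplification is safe can be shown with a userspace imitation of the error-pointer convention. The `sk_*` helpers below are hypothetical re-creations written for this sketch, and `find_table` stands in for a lookup that, like xt_find_table_lock after the commit cited above, returns either a valid pointer or NULL and never an encoded error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *sk_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long sk_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int sk_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

static inline int sk_is_err_or_null(const void *p)
{
	return p == NULL || sk_is_err(p);
}

struct table {
	int id;
};

/* Hypothetical lookup: a valid pointer or NULL, never an error pointer. */
static struct table *find_table(struct table *tables, size_t n, int id)
{
	assert(tables != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (tables[i].id == id)
			return &tables[i];
	return NULL;
}

/* With such a lookup, a plain NULL test covers every failure case. */
static int get_info(struct table *tables, size_t n, int id)
{
	struct table *t = find_table(tables, n, id);

	if (!t)
		return -ENOENT;
	return t->id;
}
```

Once no caller can see an encoded error, the `t ? PTR_ERR(t) : -ENOENT` expression always takes its second branch, so replacing it with -ENOENT changes no behaviour.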
[PATCH 00/39] Netfilter updates for net-next
Hi David,

The following patchset contains a second batch of Netfilter updates for your net-next tree. This includes a rework of the core hook infrastructure that improves Netfilter performance by ~15% according to synthetic benchmarks. Then, a large batch with ipset updates, including a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a couple of assorted updates.

Regarding the core hook infrastructure rework to improve performance, using this simple drop-all packets ruleset from ingress:

  nft add table netdev x
  nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
  nft add rule netdev x y drop

And generating traffic through Jesper Brouer's samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script using the -i option, perf report shows nf_tables calls in its top 10:

  17.30%  kpktgend_0  [nf_tables]         [k] nft_do_chain
  15.75%  kpktgend_0  [kernel.vmlinux]    [k] __netif_receive_skb_core
  10.39%  kpktgend_0  [nf_tables_netdev]  [k] nft_do_chain_netdev

I'm measuring here an improvement of ~15% in performance with this patchset, so we got +2.5Mpps more. I have used my old laptop Intel(R) Core(TM) i5-3320M CPU @ 2.60GHz 4-cores.

This rework contains more specifically, in strict order, these patches:

1) Remove compile-time debugging from core.

2) Remove obsolete comments that predate the rcu era. These days it is well known that a Netfilter hook always runs under rcu_read_lock().

3) Remove threshold handling, as this is only used by br_netfilter. We already have specific code to handle this from br_netfilter, so remove this code from the core path.

4) Deprecate NF_STOP, as this is only used by br_netfilter.

5) Place nf_state_hook pointer into xt_action_param structure, so this structure fits into one single cacheline according to pahole. This also implicitly affects nftables since it also relies on the xt_action_param structure.

6) Move state->hook_entries into nf_queue entry. The hook_entries pointer is only required by nf_queue(), so we can store this in the queue entry instead.

7) Use switch() statement to handle verdict cases.

8) Remove hook_entries field from nf_hook_state structure, this is only required by nf_queue, so store it in nf_queue_entry structure.

9) Merge nf_iterate() into nf_hook_slow(), which results in a much simpler and more readable function.

10) Handle NF_REPEAT away from the core, so far the only client is nf_conntrack_in() and we can restart the packet processing using a simple goto to jump back there when TCP requires it. This update required a second pass to fix fallout, fix from Arnd Bergmann.

11) Set random seed from nft_hash when no seed is specified from userspace.

12) Simplify nf_tables expression registration, in a much smarter way to save lots of boilerplate code, by Liping Zhang.

13) Simplify layer 4 protocol conntrack tracker registration, from Davide Caratti.

14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due to recent generalization of the socket infrastructure, from Arnd Bergmann.

15) Then, the ipset batch from Jozsef, he describes it as follows:

* Cleanup: Remove extra whitespaces in ip_set.h
* Cleanup: Mark some of the helpers arguments as const in ip_set.h
* Cleanup: Group counter helper functions together in ip_set.h
* struct ip_set_skbinfo is introduced instead of open coded fields in skbinfo get/init helper functions.
* Use kmalloc() in comment extension helper instead of kzalloc() because it is unnecessary to zero out the area just before explicit initialization.
* Cleanup: Split extensions into separate files.
* Cleanup: Separate memsize calculation code into dedicated function.
* Cleanup: group ip_set_put_extensions() and ip_set_get_extensions() together.
* Add element count to hash headers by Eric B Munson.
* Add element count to all set types header for uniform output across all set types.
* Count non-static extension memory into memsize calculation for userspace.
* Cleanup: Remove redundant mtype_expire() arguments, because they can be derived from other parameters.
* Cleanup: Simplify mtype_expire() for hash types by removing one level of indentation.
* Make NLEN compile time constant for hash types.
* Make sure element data size is a multiple of u32 for the hash set types.
* Optimize hash creation routine, exit as early as possible.
* Make struct htype per ipset family so nets array becomes fixed size and thus simplifies the struct htype allocation.
* Collapse same condition body into a single one.
* Fix reported memory size for hash:* types, base hash bucket structure was not taken into account.
* hash:ipmac type support added to ipset by Tomasz Chilinski.
* Use setup_timer() and mod_timer() instead of init_timer() by Muhammad Falak R Wani, individually for the set type families.

16) Remove useless connlabel field in struct netns_ct,
[PATCH 09/39] netfilter: merge nf_iterate() into nf_hook_slow()
nf_iterate() has become rather simple, so we can integrate this code into nf_hook_slow() to reduce the amount of LOC in the core path.

However, we still need nf_iterate() around for nf_queue packet handling, so move this function there, the only place where we need it. I think it should be possible to refactor the nf_queue code to get rid of it entirely, but given this is slow path anyway, let's have a look at this later.

Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/core.c | 73 +--- net/netfilter/nf_internals.h | 5 --- net/netfilter/nf_queue.c | 20 3 files changed, 48 insertions(+), 50 deletions(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index ebece48b8392..bd9272eeccb5 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -302,26 +302,6 @@ void _nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n) } EXPORT_SYMBOL(_nf_unregister_hooks); -unsigned int nf_iterate(struct sk_buff *skb, - struct nf_hook_state *state, - struct nf_hook_entry **entryp) -{ - unsigned int verdict; - - do { -repeat: - verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state); - if (verdict != NF_ACCEPT) { - if (verdict != NF_REPEAT) - return verdict; - goto repeat; - } - *entryp = rcu_dereference((*entryp)->next); - } while (*entryp); - return NF_ACCEPT; -} - - /* Returns 1 if okfn() needs to be executed by the caller, * -EPERM for NF_DROP, 0 otherwise. Caller must hold rcu_read_lock. */ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state, @@ -330,31 +310,34 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state, unsigned int verdict; int ret; -next_hook: - verdict = nf_iterate(skb, state, ); - switch (verdict & NF_VERDICT_MASK) { - case NF_ACCEPT: - ret = 1; - break; - case NF_DROP: - kfree_skb(skb); - ret = NF_DROP_GETERR(verdict); - if (ret == 0) - ret = -EPERM; - break; - case NF_QUEUE: - ret = nf_queue(skb, state, , verdict); - if (ret == 1 && entry) - goto next_hook; - /* Fall through. 
*/ - default: - /* Implicit handling for NF_STOLEN, as well as any other non -* conventional verdicts. -*/ - ret = 0; - break; - } - return ret; + do { + verdict = entry->ops.hook(entry->ops.priv, skb, state); + switch (verdict & NF_VERDICT_MASK) { + case NF_ACCEPT: + entry = rcu_dereference(entry->next); + break; + case NF_DROP: + kfree_skb(skb); + ret = NF_DROP_GETERR(verdict); + if (ret == 0) + ret = -EPERM; + return ret; + case NF_REPEAT: + continue; + case NF_QUEUE: + ret = nf_queue(skb, state, , verdict); + if (ret == 1 && entry) + continue; + return ret; + default: + /* Implicit handling for NF_STOLEN, as well as any other +* non conventional verdicts. +*/ + return 0; + } + } while (entry); + + return 1; } EXPORT_SYMBOL(nf_hook_slow); diff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h index 9fdb655f85bc..c46d214d5323 100644 --- a/net/netfilter/nf_internals.h +++ b/net/netfilter/nf_internals.h @@ -11,11 +11,6 @@ #define NFDEBUG(format, args...) #endif - -/* core.c */ -unsigned int nf_iterate(struct sk_buff *skb, struct nf_hook_state *state, - struct nf_hook_entry **entryp); - /* nf_queue.c */ int nf_queue(struct sk_buff *skb, struct nf_hook_state *state, struct nf_hook_entry **entryp, unsigned int verdict); diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c index 2e39e38ae1c7..77cba9f6ccb6 100644 --- a/net/netfilter/nf_queue.c +++ b/net/netfilter/nf_queue.c @@ -177,6 +177,26 @@ int nf_queue(struct sk_buff *skb, struct nf_hook_state *state, return 0; } +static unsigned int nf_iterate(struct sk_buff *skb, + struct nf_hook_state *state, + struct nf_hook_entry **entryp) +{ + unsigned int verdict; + + do { +repeat: + verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state); + if (verdict != NF_ACCEPT) {
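Editor's note: the control flow that this patch moves into nf_hook_slow() can be illustrated outside the kernel. The following is only a user-space sketch under stated assumptions, not the kernel code: the names run_hooks, struct hook_entry, the V_* verdict values and the three sample hooks are all hypothetical, and queueing (NF_QUEUE) is left out entirely. It shows how a single do/while loop with a switch can advance on accept, return on drop, and re-run the same hook on repeat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdict codes for this sketch; not the kernel's values. */
enum verdict { V_DROP, V_ACCEPT, V_STOLEN, V_REPEAT };

/* Hypothetical hook list entry, loosely modelled on a singly linked chain. */
struct hook_entry {
	enum verdict (*hook)(void *priv, int *pkt);
	void *priv;
	struct hook_entry *next;
};

/* Returns 1 if every hook accepted, -1 on drop, 0 for any other verdict. */
int run_hooks(struct hook_entry *entry, int *pkt)
{
	/* The caller guarantees a valid first entry, so do/while is safe. */
	assert(entry != NULL);

	do {
		switch (entry->hook(entry->priv, pkt)) {
		case V_ACCEPT:
			entry = entry->next;	/* advance to the next hook */
			break;
		case V_DROP:
			return -1;
		case V_REPEAT:
			continue;		/* entry unchanged: same hook runs again */
		default:
			return 0;		/* e.g. stolen: nothing left to do */
		}
	} while (entry);

	return 1;
}

/* Sample hooks; each one counts its invocations in *pkt. */
enum verdict accept_hook(void *priv, int *pkt)
{
	(void)priv;
	(*pkt)++;
	return V_ACCEPT;
}

enum verdict drop_hook(void *priv, int *pkt)
{
	(void)priv;
	(*pkt)++;
	return V_DROP;
}

/* Asks for one repeat, then accepts; priv points to an int flag. */
enum verdict repeat_once_hook(void *priv, int *pkt)
{
	int *seen = priv;

	(*pkt)++;
	if (!*seen) {
		*seen = 1;
		return V_REPEAT;
	}
	return V_ACCEPT;
}
```

Note that `continue` inside the switch jumps to the loop condition, which is still true because entry was not advanced, so the same hook is invoked again.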
[PATCH 08/39] netfilter: remove hook_entries field from nf_hook_state
This field is only useful for nf_queue, so store it in the nf_queue_entry structure instead, away from the core path. Pass hook_head to nf_hook_slow(). Since we always have a valid entry on the first iteration in nf_iterate(), we can use 'do { ... } while (entry)' loop instead. Signed-off-by: Pablo Neira Ayuso--- include/linux/netfilter.h | 10 -- include/linux/netfilter_ingress.h | 4 ++-- include/net/netfilter/nf_queue.h | 1 + net/bridge/br_netfilter_hooks.c | 4 ++-- net/bridge/netfilter/ebtable_broute.c | 2 +- net/netfilter/core.c | 9 - net/netfilter/nf_queue.c | 13 + net/netfilter/nfnetlink_queue.c | 2 +- 8 files changed, 20 insertions(+), 25 deletions(-) diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index e0d000f6c9bf..69230140215b 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -54,7 +54,6 @@ struct nf_hook_state { struct net_device *out; struct sock *sk; struct net *net; - struct nf_hook_entry __rcu *hook_entries; int (*okfn)(struct net *, struct sock *, struct sk_buff *); }; @@ -81,7 +80,6 @@ struct nf_hook_entry { }; static inline void nf_hook_state_init(struct nf_hook_state *p, - struct nf_hook_entry *hook_entry, unsigned int hook, u_int8_t pf, struct net_device *indev, @@ -96,7 +94,6 @@ static inline void nf_hook_state_init(struct nf_hook_state *p, p->out = outdev; p->sk = sk; p->net = net; - RCU_INIT_POINTER(p->hook_entries, hook_entry); p->okfn = okfn; } @@ -150,7 +147,8 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg); extern struct static_key nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS]; #endif -int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state); +int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state, +struct nf_hook_entry *entry); /** * nf_hook - call a netfilter hook @@ -179,10 +177,10 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net, if (hook_head) { struct nf_hook_state state; - nf_hook_state_init(, hook_head, hook, pf, indev, outdev, 
+ nf_hook_state_init(, hook, pf, indev, outdev, sk, net, okfn); - ret = nf_hook_slow(skb, ); + ret = nf_hook_slow(skb, , hook_head); } rcu_read_unlock(); diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h index fd44e4131710..2dc3b49b804a 100644 --- a/include/linux/netfilter_ingress.h +++ b/include/linux/netfilter_ingress.h @@ -26,10 +26,10 @@ static inline int nf_hook_ingress(struct sk_buff *skb) if (unlikely(!e)) return 0; - nf_hook_state_init(, e, NF_NETDEV_INGRESS, + nf_hook_state_init(, NF_NETDEV_INGRESS, NFPROTO_NETDEV, skb->dev, NULL, NULL, dev_net(skb->dev), NULL); - return nf_hook_slow(skb, ); + return nf_hook_slow(skb, , e); } static inline void nf_hook_ingress_init(struct net_device *dev) diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h index 2280cfe86c56..09948d10e38e 100644 --- a/include/net/netfilter/nf_queue.h +++ b/include/net/netfilter/nf_queue.h @@ -12,6 +12,7 @@ struct nf_queue_entry { unsigned intid; struct nf_hook_statestate; + struct nf_hook_entry*hook; u16 size; /* sizeof(entry) + saved route keys */ /* extra space to store route keys */ diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index 7e3645fa6339..8155bd2a5138 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -1018,10 +1018,10 @@ int br_nf_hook_thresh(unsigned int hook, struct net *net, /* We may already have this, but read-locks nest anyway */ rcu_read_lock(); - nf_hook_state_init(, elem, hook, NFPROTO_BRIDGE, indev, outdev, + nf_hook_state_init(, hook, NFPROTO_BRIDGE, indev, outdev, sk, net, okfn); - ret = nf_hook_slow(skb, ); + ret = nf_hook_slow(skb, , elem); rcu_read_unlock(); if (ret == 1) ret = okfn(net, sk, skb); diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c index 599679e3498d..8fe36dc3aab2 100644 --- a/net/bridge/netfilter/ebtable_broute.c +++ b/net/bridge/netfilter/ebtable_broute.c @@ -53,7 +53,7 @@ 
static int ebt_broute(struct sk_buff *skb) struct nf_hook_state state; int ret; - nf_hook_state_init(, NULL, NF_BR_BROUTING, +
[PATCH 30/39] netfilter: ipset: Make sure element data size is a multiple of u32
From: Jozsef Kadlecsik

Data for hashing is required to be an array of u32. Make sure that element data is always a multiple of u32.

Ported from a patch proposed by Sergey Popovich.

Signed-off-by: Jozsef Kadlecsik
---
 net/netfilter/ipset/ip_set_hash_gen.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 6c88c20ae1d4..34f115f874ab 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -260,8 +260,14 @@ htable_bits(u32 hashsize)
 #endif
 
 #define HKEY(data, initval, htable_bits)			\
-(jhash2((u32 *)(data), HKEY_DATALEN / sizeof(u32), initval)	\
-	& jhash_mask(htable_bits))
+({								\
+	const u32 *__k = (const u32 *)data;			\
+	u32 __l = HKEY_DATALEN / sizeof(u32);			\
+								\
+	BUILD_BUG_ON(HKEY_DATALEN % sizeof(u32) != 0);		\
+								\
+	jhash2(__k, __l, initval) & jhash_mask(htable_bits);	\
+})
 
 #ifndef htype
 #ifndef HTYPE
-- 
2.1.4
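Editor's note: the idea behind the BUILD_BUG_ON() in this patch, rejecting at compile time any key type whose size is not a whole number of 32-bit words, can be sketched in plain C11. Everything below is an assumption made for illustration: struct elem, KEY_DATALEN, hash_words and elem_key are hypothetical names, and hash_words is a simple FNV-1a-style stand-in, not the kernel's jhash2().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical element type; the explicit pad keeps the size at 8 bytes. */
struct elem {
	uint32_t ip;
	uint16_t port;
	uint8_t proto;
	uint8_t pad;
};

#define KEY_DATALEN sizeof(struct elem)

/* Compile-time check, playing the role BUILD_BUG_ON() has in the patch. */
_Static_assert(KEY_DATALEN % sizeof(uint32_t) == 0,
	       "element size must be a multiple of u32");

/* Stand-in word-wise hash (FNV-1a style); NOT jhash2(). */
uint32_t hash_words(const uint32_t *k, size_t nwords, uint32_t initval)
{
	uint32_t h = initval ^ 2166136261u;

	for (size_t i = 0; i < nwords; i++) {
		h ^= k[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash an element and mask the result down to a table of 2^bits buckets. */
uint32_t elem_key(const struct elem *e, uint32_t initval, unsigned int bits)
{
	uint32_t words[KEY_DATALEN / sizeof(uint32_t)];

	assert(bits < 32);
	/* Copy into a u32 array so the hash never reads past the element. */
	memcpy(words, e, sizeof(words));
	return hash_words(words, KEY_DATALEN / sizeof(uint32_t), initval) &
	       ((1u << bits) - 1);
}
```

If a field were added to struct elem that made its size, say, 10 bytes, the build would fail at the _Static_assert instead of silently hashing a truncated key.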
[PATCH 33/39] netfilter: ipset: Collapse same condition body to a single one
From: Jozsef Kadlecsik

The set full case (with net_ratelimit()-ed pr_warn()) is already handled, simply jump there.

Signed-off-by: Jozsef Kadlecsik
---
 net/netfilter/ipset/ip_set_hash_gen.h | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index c600f6d9f15e..1c9b84e53dcc 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -719,14 +719,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 	key = HKEY(value, h->initval, t->htable_bits);
 	n = __ipset_dereference_protected(hbucket(t, key), 1);
 	if (!n) {
-		if (forceadd) {
-			if (net_ratelimit())
-				pr_warn("Set %s is full, maxelem %u reached\n",
-					set->name, h->maxelem);
-			return -IPSET_ERR_HASH_FULL;
-		} else if (set->elements >= h->maxelem) {
+		if (forceadd || set->elements >= h->maxelem)
 			goto set_full;
-		}
 		old = NULL;
 		n = kzalloc(sizeof(*n) + AHASH_INIT_SIZE * set->dsize,
 			    GFP_ATOMIC);
-- 
2.1.4
[PATCH 26/39] netfilter: ipset: Count non-static extension memory for userspace
From: Jozsef Kadlecsik

Non-static (i.e. comment) extension was not counted into the memory size. A new internal counter is introduced for this. In the case of the hash types the sizes of the arrays are counted there as well, so that we can avoid scanning the whole set when just the header data is requested.

Signed-off-by: Jozsef Kadlecsik
---
 include/linux/netfilter/ipset/ip_set.h | 8 ++-- include/linux/netfilter/ipset/ip_set_comment.h | 7 +-- net/netfilter/ipset/ip_set_bitmap_gen.h| 5 +++-- net/netfilter/ipset/ip_set_core.c | 2 +- net/netfilter/ipset/ip_set_hash_gen.h | 26 ++ net/netfilter/ipset/ip_set_list_set.c | 5 +++-- 6 files changed, 32 insertions(+), 21 deletions(-) diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h index 4671d740610f..8e42253e5d4d 100644 --- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -79,10 +79,12 @@ enum ip_set_ext_id { IPSET_EXT_ID_MAX, }; +struct ip_set; + /* Extension type */ struct ip_set_ext_type { /* Destroy extension private data (can be NULL) */ - void (*destroy)(void *ext); + void (*destroy)(struct ip_set *set, void *ext); enum ip_set_extension type; enum ipset_cadt_flags flag; /* Size and minimal alignment */ @@ -252,6 +254,8 @@ struct ip_set { u32 timeout; /* Number of elements (vs timeout) */ u32 elements; + /* Size of the dynamic extensions (vs timeout) */ + size_t ext_size; /* Element data size */ size_t dsize; /* Offsets to extensions in elements */ @@ -268,7 +272,7 @@ ip_set_ext_destroy(struct ip_set *set, void *data) */ if (SET_WITH_COMMENT(set)) ip_set_extensions[IPSET_EXT_ID_COMMENT].destroy( - ext_comment(data, set)); + set, ext_comment(data, set)); } static inline int diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h index 5444b1bbe656..8e2bab1e8e90 100644 --- a/include/linux/netfilter/ipset/ip_set_comment.h +++ b/include/linux/netfilter/ipset/ip_set_comment.h @@ -20,13 +20,14 
@@ ip_set_comment_uget(struct nlattr *tb) * The kadt functions don't use the comment extensions in any way. */ static inline void -ip_set_init_comment(struct ip_set_comment *comment, +ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment, const struct ip_set_ext *ext) { struct ip_set_comment_rcu *c = rcu_dereference_protected(comment->c, 1); size_t len = ext->comment ? strlen(ext->comment) : 0; if (unlikely(c)) { + set->ext_size -= sizeof(*c) + strlen(c->str) + 1; kfree_rcu(c, rcu); rcu_assign_pointer(comment->c, NULL); } @@ -38,6 +39,7 @@ ip_set_init_comment(struct ip_set_comment *comment, if (unlikely(!c)) return; strlcpy(c->str, ext->comment, len + 1); + set->ext_size += sizeof(*c) + strlen(c->str) + 1; rcu_assign_pointer(comment->c, c); } @@ -58,13 +60,14 @@ ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment) * of the set data anymore. */ static inline void -ip_set_comment_free(struct ip_set_comment *comment) +ip_set_comment_free(struct ip_set *set, struct ip_set_comment *comment) { struct ip_set_comment_rcu *c; c = rcu_dereference_protected(comment->c, 1); if (unlikely(!c)) return; + set->ext_size -= sizeof(*c) + strlen(c->str) + 1; kfree_rcu(c, rcu); rcu_assign_pointer(comment->c, NULL); } diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h index 1810d1c06e3d..f8ea26cafa30 100644 --- a/net/netfilter/ipset/ip_set_bitmap_gen.h +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h @@ -84,6 +84,7 @@ mtype_flush(struct ip_set *set) mtype_ext_cleanup(set); memset(map->members, 0, map->memsize); set->elements = 0; + set->ext_size = 0; } /* Calculate the actual memory size of the set data */ @@ -99,7 +100,7 @@ mtype_head(struct ip_set *set, struct sk_buff *skb) { const struct mtype *map = set->data; struct nlattr *nested; - size_t memsize = mtype_memsize(map, set->dsize); + size_t memsize = mtype_memsize(map, set->dsize) + set->ext_size; nested = ipset_nest_start(skb, IPSET_ATTR_DATA); if 
(!nested) @@ -173,7 +174,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, if (SET_WITH_COUNTER(set)) ip_set_init_counter(ext_counter(x, set), ext); if (SET_WITH_COMMENT(set)) - ip_set_init_comment(ext_comment(x, set), ext); +
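Editor's note: the accounting scheme described in this patch, a running ext_size counter that is adjusted whenever a dynamically sized extension is attached or released so the header can report memory use without walking every element, can be sketched in user space as follows. This is a simplified illustration under stated assumptions: struct set, struct comment, comment_init, comment_free and set_memsize are hypothetical names, plain malloc()/free() replace the kernel's RCU-managed allocation, and only the string bytes are counted (the patch also counts the size of the wrapping structure).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical set: only the running size of dynamic extensions is kept. */
struct set {
	size_t ext_size;
};

/* Hypothetical per-element comment extension. */
struct comment {
	char *str;
};

/* Release a comment and subtract its bytes from the set's counter. */
void comment_free(struct set *set, struct comment *c)
{
	size_t bytes;

	if (!c->str)
		return;
	bytes = strlen(c->str) + 1;
	assert(set->ext_size >= bytes);	/* counter must never underflow */
	set->ext_size -= bytes;
	free(c->str);
	c->str = NULL;
}

/* Replace any existing comment; returns 0 on success, -1 on allocation failure. */
int comment_init(struct set *set, struct comment *c, const char *text)
{
	size_t len;

	comment_free(set, c);		/* old comment no longer counts */
	if (!text || !*text)
		return 0;
	len = strlen(text);
	c->str = malloc(len + 1);
	if (!c->str)
		return -1;
	memcpy(c->str, text, len + 1);
	set->ext_size += len + 1;	/* new comment is accounted immediately */
	return 0;
}

/* Header-style query: static size plus dynamic extensions, no element scan. */
size_t set_memsize(const struct set *set, size_t static_size)
{
	return static_size + set->ext_size;
}
```

The design choice is the usual trade: a few extra arithmetic operations on every add, replace, and delete buy a constant-time memory report.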
[PATCH 23/39] netfilter: ipset: Regroup ip_set_put_extensions and add extern
From: Jozsef Kadlecsik

Cleanup: group ip_set_put_extensions and ip_set_get_extensions together and add missing extern.

Signed-off-by: Jozsef Kadlecsik
---
 include/linux/netfilter/ipset/ip_set.h | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index b5bd0fb3d07b..7a218eb74887 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -331,6 +331,8 @@ extern size_t ip_set_elem_len(struct ip_set *set, struct nlattr *tb[],
 			      size_t len, size_t align);
 extern int ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[],
 				 struct ip_set_ext *ext);
+extern int ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
+				 const void *e, bool active);
 
 static inline int
 ip_set_get_hostipaddr4(struct nlattr *nla, u32 *ipaddr)
@@ -449,10 +451,6 @@ bitmap_bytes(u32 a, u32 b)
 #include
 #include
 
-int
-ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
-		      const void *e, bool active);
-
 #define IP_SET_INIT_KEXT(skb, opt, set)			\
 	{ .bytes = (skb)->len, .packets = 1,		\
 	  .timeout = ip_set_adt_opt_timeout(opt, set) }
-- 
2.1.4
[PATCH 38/39] netfilter: conntrack: remove unused netns_ct member
From: Florian Westphal

Since 23014011ba420 ('netfilter: conntrack: support a fixed size of 128 distinct labels') this isn't needed anymore.

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
---
 include/net/netns/conntrack.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index e469e85de3f9..3d06d94d2e52 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -91,7 +91,6 @@ struct netns_ct {
 	struct nf_ip_net	nf_ct_proto;
 #if defined(CONFIG_NF_CONNTRACK_LABELS)
 	unsigned int		labels_used;
-	u8			label_words;
 #endif
 };
 #endif
-- 
2.1.4
[PATCH 17/39] netfilter: ipset: Mark some helper args as const.
From: Jozsef Kadlecsik

Mark some of the helpers arguments as const.

Ported from a patch proposed by Sergey Popovich.

Suggested-by: Sergey Popovich
Signed-off-by: Jozsef Kadlecsik
---
 include/linux/netfilter/ipset/ip_set.h         | 4 ++--
 include/linux/netfilter/ipset/ip_set_comment.h | 2 +-
 include/linux/netfilter/ipset/ip_set_timeout.h | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index 5b1fd090f34b..524467f933bf 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -346,7 +346,7 @@ ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
 }
 
 static inline bool
-ip_set_put_skbinfo(struct sk_buff *skb, struct ip_set_skbinfo *skbinfo)
+ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo)
 {
 	/* Send nonzero parameters only */
 	return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
@@ -373,7 +373,7 @@ ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
 }
 
 static inline bool
-ip_set_put_counter(struct sk_buff *skb, struct ip_set_counter *counter)
+ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
 {
 	return nla_put_net64(skb, IPSET_ATTR_BYTES,
 			     cpu_to_be64(ip_set_get_bytes(counter)),
diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h
index 8d0248525957..bae5c7609be2 100644
--- a/include/linux/netfilter/ipset/ip_set_comment.h
+++ b/include/linux/netfilter/ipset/ip_set_comment.h
@@ -43,7 +43,7 @@ ip_set_init_comment(struct ip_set_comment *comment,
 
 /* Used only when dumping a set, protected by rcu_read_lock_bh() */
 static inline int
-ip_set_put_comment(struct sk_buff *skb, struct ip_set_comment *comment)
+ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
 {
 	struct ip_set_comment_rcu *c = rcu_dereference_bh(comment->c);
 
diff --git a/include/linux/netfilter/ipset/ip_set_timeout.h b/include/linux/netfilter/ipset/ip_set_timeout.h
index 1d6a935c1ac5..bfb3531fd88a 100644
--- a/include/linux/netfilter/ipset/ip_set_timeout.h
+++ b/include/linux/netfilter/ipset/ip_set_timeout.h
@@ -40,7 +40,7 @@ ip_set_timeout_uget(struct nlattr *tb)
 }
 
 static inline bool
-ip_set_timeout_expired(unsigned long *t)
+ip_set_timeout_expired(const unsigned long *t)
 {
 	return *t != IPSET_ELEM_PERMANENT && time_is_before_jiffies(*t);
 }
@@ -63,7 +63,7 @@ ip_set_timeout_set(unsigned long *timeout, u32 value)
 }
 
 static inline u32
-ip_set_timeout_get(unsigned long *timeout)
+ip_set_timeout_get(const unsigned long *timeout)
 {
 	return *timeout == IPSET_ELEM_PERMANENT ? 0 :
 		jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC;
-- 
2.1.4
[PATCH 14/39] udp: provide udp{4,6}_lib_lookup for nf_socket_ipv{4,6}
From: Arnd Bergmann

Since commit ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU") the udp6_lib_lookup and udp4_lib_lookup functions are only provided when it is actually possible to call them.

However, moving the callers now caused a link error:

net/built-in.o: In function `nf_sk_lookup_slow_v6':
(.text+0x131a39): undefined reference to `udp6_lib_lookup'
net/ipv4/netfilter/nf_socket_ipv4.o: In function `nf_sk_lookup_slow_v4':
nf_socket_ipv4.c:(.text.nf_sk_lookup_slow_v4+0x114): undefined reference to `udp4_lib_lookup'

This extends the #ifdef so we also provide the functions when CONFIG_NF_SOCKET_IPV4 or CONFIG_NF_SOCKET_IPV6, respectively, are set.

Fixes: 8db4c5be88f6 ("netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c")
Signed-off-by: Arnd Bergmann
Signed-off-by: Pablo Neira Ayuso
---
 net/ipv4/udp.c | 3 ++-
 net/ipv6/udp.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 195992e0440d..395361b1398e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -580,7 +580,8 @@ EXPORT_SYMBOL_GPL(udp4_lib_lookup_skb);
  * Does increment socket refcount.
  */
 #if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_SOCKET) || \
-    IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY)
+    IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY) || \
+    IS_ENABLED(CONFIG_NF_SOCKET_IPV4)
 struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 			     __be32 daddr, __be16 dport, int dif)
 {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index a7700bbf6788..3e232585b0ff 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -302,7 +302,8 @@ EXPORT_SYMBOL_GPL(udp6_lib_lookup_skb);
  * Does increment socket refcount.
  */
 #if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_SOCKET) || \
-    IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY)
+    IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY) || \
+    IS_ENABLED(CONFIG_NF_SOCKET_IPV6)
 struct sock *udp6_lib_lookup(struct net *net, const struct in6_addr *saddr, __be16 sport,
 			     const struct in6_addr *daddr, __be16 dport, int dif)
 {
-- 
2.1.4
[PATCH 32/39] netfilter: ipset: Make struct htype per ipset family
From: Jozsef Kadlecsik

Before this patch struct htype was defined on the first inclusion of ip_set_hash_gen.h and was common to both the IPv4 and IPv6 set variants.

Make struct htype per ipset family and use NLEN to make the nets array fixed size, which simplifies the struct htype allocation.

Ported from a patch proposed by Sergey Popovich.

Signed-off-by: Jozsef Kadlecsik
---
 net/netfilter/ipset/ip_set_hash_gen.h| 51 +++- net/netfilter/ipset/ip_set_hash_ip.c | 10 +++--- net/netfilter/ipset/ip_set_hash_ipmark.c | 10 +++--- net/netfilter/ipset/ip_set_hash_ipport.c | 6 ++-- net/netfilter/ipset/ip_set_hash_ipportip.c | 6 ++-- net/netfilter/ipset/ip_set_hash_ipportnet.c | 10 +++--- net/netfilter/ipset/ip_set_hash_net.c| 8 ++--- net/netfilter/ipset/ip_set_hash_netiface.c | 8 ++--- net/netfilter/ipset/ip_set_hash_netnet.c | 8 ++--- net/netfilter/ipset/ip_set_hash_netport.c| 10 +++--- net/netfilter/ipset/ip_set_hash_netportnet.c | 10 +++--- 11 files changed, 63 insertions(+), 74 deletions(-) diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index de1d16fd4121..c600f6d9f15e 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -166,6 +166,18 @@ htable_bits(u32 hashsize) #endif /* _IP_SET_HASH_GEN_H */ +#ifndef MTYPE +#error "MTYPE is not defined!" +#endif + +#ifndef HTYPE +#error "HTYPE is not defined!" +#endif + +#ifndef HOST_MASK +#error "HOST_MASK is not defined!" 
+#endif + /* Family dependent templates */ #undef ahash_data @@ -189,7 +201,6 @@ htable_bits(u32 hashsize) #undef mtype_same_set #undef mtype_kadt #undef mtype_uadt -#undef mtype #undef mtype_add #undef mtype_del @@ -205,6 +216,7 @@ htable_bits(u32 hashsize) #undef mtype_variant #undef mtype_data_match +#undef htype #undef HKEY #define mtype_data_equal IPSET_TOKEN(MTYPE, _data_equal) @@ -231,7 +243,6 @@ htable_bits(u32 hashsize) #define mtype_same_set IPSET_TOKEN(MTYPE, _same_set) #define mtype_kadt IPSET_TOKEN(MTYPE, _kadt) #define mtype_uadt IPSET_TOKEN(MTYPE, _uadt) -#define mtype MTYPE #define mtype_add IPSET_TOKEN(MTYPE, _add) #define mtype_del IPSET_TOKEN(MTYPE, _del) @@ -247,18 +258,12 @@ htable_bits(u32 hashsize) #define mtype_variant IPSET_TOKEN(MTYPE, _variant) #define mtype_data_match IPSET_TOKEN(MTYPE, _data_match) -#ifndef MTYPE -#error "MTYPE is not defined!" -#endif - -#ifndef HOST_MASK -#error "HOST_MASK is not defined!" -#endif - #ifndef HKEY_DATALEN #define HKEY_DATALEN sizeof(struct mtype_elem) #endif +#define htype MTYPE + #define HKEY(data, initval, htable_bits) \ ({ \ const u32 *__k = (const u32 *)data; \ @@ -269,33 +274,26 @@ htable_bits(u32 hashsize) jhash2(__k, __l, initval) & jhash_mask(htable_bits);\ }) -#ifndef htype -#ifndef HTYPE -#error "HTYPE is not defined!" 
-#endif /* HTYPE */ -#define htype HTYPE - /* The generic hash structure */ struct htype { struct htable __rcu *table; /* the hash table */ + struct timer_list gc; /* garbage collection when timeout enabled */ u32 maxelem;/* max elements in the hash */ u32 initval;/* random jhash init value */ #ifdef IP_SET_HASH_WITH_MARKMASK u32 markmask; /* markmask value for mark mask to store */ #endif - struct timer_list gc; /* garbage collection when timeout enabled */ - struct mtype_elem next; /* temporary storage for uadd */ #ifdef IP_SET_HASH_WITH_MULTI u8 ahash_max; /* max elements in an array block */ #endif #ifdef IP_SET_HASH_WITH_NETMASK u8 netmask; /* netmask value for subnets to store */ #endif + struct mtype_elem next; /* temporary storage for uadd */ #ifdef IP_SET_HASH_WITH_NETS - struct net_prefixes nets[0]; /* book-keeping of prefixes */ + struct net_prefixes nets[NLEN]; /* book-keeping of prefixes */ #endif }; -#endif /* htype */ #ifdef IP_SET_HASH_WITH_NETS /* Network cidr size book keeping when the hash stores different @@ -348,13 +346,7 @@ mtype_del_cidr(struct htype *h, u8 cidr, u8 n) static size_t mtype_ahash_memsize(const struct htype *h, const struct htable *t) { - size_t memsize = sizeof(*h) + sizeof(*t); - -#ifdef IP_SET_HASH_WITH_NETS - memsize += sizeof(struct net_prefixes) * NLEN; -#endif - - return memsize; + return sizeof(*h) + sizeof(*t); } /* Get the ith element from the array block n */ @@ -392,7 +384,7 @@ mtype_flush(struct ip_set *set) kfree_rcu(n, rcu); } #ifdef IP_SET_HASH_WITH_NETS -
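Editor's note: the simplification this patch describes, replacing a zero-length trailing array with one of compile-time length NLEN so that sizeof(*h) already covers the whole object, can be sketched in standard C. The names below (struct htype_flex, struct htype_fixed, flex_alloc_size, fixed_alloc_size) and the NLEN value are hypothetical and chosen only for the illustration; the real structures carry many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefix count; in the patch NLEN depends on the set family. */
#define NLEN 4

static_assert(NLEN > 0, "NLEN must be a positive compile-time constant");

/* Hypothetical per-prefix bookkeeping record. */
struct net_prefixes {
	uint32_t nets;
	uint8_t cidr;
};

/* Before: trailing array of unknown length, sized by hand at allocation. */
struct htype_flex {
	uint32_t maxelem;
	struct net_prefixes nets[];
};

/* After: fixed-size array, so the struct knows its own full size. */
struct htype_fixed {
	uint32_t maxelem;
	struct net_prefixes nets[NLEN];
};

/* Allocation size the caller must compute for the flexible variant. */
size_t flex_alloc_size(size_t nlen)
{
	return sizeof(struct htype_flex) + nlen * sizeof(struct net_prefixes);
}

/* Allocation size for the fixed variant: a plain sizeof is enough. */
size_t fixed_alloc_size(void)
{
	return sizeof(struct htype_fixed);
}
```

With the fixed-size layout, both the allocation and the memory-size report reduce to sizeof(*h), which is what lets the patch drop the extra NLEN arithmetic from mtype_ahash_memsize().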
[PATCH 31/39] netfilter: ipset: Optimize hash creation routine
From: Jozsef Kadlecsik

Exit as early as possible on error and use RCU_INIT_POINTER(), as the set is not yet visible to others at creation time.

Signed-off-by: Jozsef Kadlecsik
---
 net/netfilter/ipset/ip_set_hash_gen.h | 63 --- 1 file changed, 29 insertions(+), 34 deletions(-) diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index 34f115f874ab..de1d16fd4121 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -1241,41 +1241,35 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set, struct htype *h; struct htable *t; + pr_debug("Create set %s with family %s\n", +set->name, set->family == NFPROTO_IPV4 ? "inet" : "inet6"); + #ifndef IP_SET_PROTO_UNDEF if (!(set->family == NFPROTO_IPV4 || set->family == NFPROTO_IPV6)) return -IPSET_ERR_INVALID_FAMILY; #endif -#ifdef IP_SET_HASH_WITH_MARKMASK - markmask = 0x; -#endif -#ifdef IP_SET_HASH_WITH_NETMASK - netmask = set->family == NFPROTO_IPV4 ? 32 : 128; - pr_debug("Create set %s with family %s\n", -set->name, set->family == NFPROTO_IPV4 ? 
"inet" : "inet6"); -#endif - if (unlikely(!ip_set_optattr_netorder(tb, IPSET_ATTR_HASHSIZE) || !ip_set_optattr_netorder(tb, IPSET_ATTR_MAXELEM) || !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) || !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS))) return -IPSET_ERR_PROTOCOL; + #ifdef IP_SET_HASH_WITH_MARKMASK /* Separated condition in order to avoid directive in argument list */ if (unlikely(!ip_set_optattr_netorder(tb, IPSET_ATTR_MARKMASK))) return -IPSET_ERR_PROTOCOL; -#endif - if (tb[IPSET_ATTR_HASHSIZE]) { - hashsize = ip_set_get_h32(tb[IPSET_ATTR_HASHSIZE]); - if (hashsize < IPSET_MIMINAL_HASHSIZE) - hashsize = IPSET_MIMINAL_HASHSIZE; + markmask = 0x; + if (tb[IPSET_ATTR_MARKMASK]) { + markmask = ntohl(nla_get_be32(tb[IPSET_ATTR_MARKMASK])); + if (markmask == 0) + return -IPSET_ERR_INVALID_MARKMASK; } - - if (tb[IPSET_ATTR_MAXELEM]) - maxelem = ip_set_get_h32(tb[IPSET_ATTR_MAXELEM]); +#endif #ifdef IP_SET_HASH_WITH_NETMASK + netmask = set->family == NFPROTO_IPV4 ? 32 : 128; if (tb[IPSET_ATTR_NETMASK]) { netmask = nla_get_u8(tb[IPSET_ATTR_NETMASK]); @@ -1285,14 +1279,15 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set, return -IPSET_ERR_INVALID_NETMASK; } #endif -#ifdef IP_SET_HASH_WITH_MARKMASK - if (tb[IPSET_ATTR_MARKMASK]) { - markmask = ntohl(nla_get_be32(tb[IPSET_ATTR_MARKMASK])); - if (markmask == 0) - return -IPSET_ERR_INVALID_MARKMASK; + if (tb[IPSET_ATTR_HASHSIZE]) { + hashsize = ip_set_get_h32(tb[IPSET_ATTR_HASHSIZE]); + if (hashsize < IPSET_MIMINAL_HASHSIZE) + hashsize = IPSET_MIMINAL_HASHSIZE; } -#endif + + if (tb[IPSET_ATTR_MAXELEM]) + maxelem = ip_set_get_h32(tb[IPSET_ATTR_MAXELEM]); hsize = sizeof(*h); #ifdef IP_SET_HASH_WITH_NETS @@ -1302,16 +1297,6 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set, if (!h) return -ENOMEM; - h->maxelem = maxelem; -#ifdef IP_SET_HASH_WITH_NETMASK - h->netmask = netmask; -#endif -#ifdef IP_SET_HASH_WITH_MARKMASK - h->markmask = markmask; -#endif - get_random_bytes(>initval, 
sizeof(h->initval)); - set->timeout = IPSET_NO_TIMEOUT; - hbits = htable_bits(hashsize); hsize = htable_size(hbits); if (hsize == 0) { @@ -1323,8 +1308,17 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set, kfree(h); return -ENOMEM; } + h->maxelem = maxelem; +#ifdef IP_SET_HASH_WITH_NETMASK + h->netmask = netmask; +#endif +#ifdef IP_SET_HASH_WITH_MARKMASK + h->markmask = markmask; +#endif + get_random_bytes(>initval, sizeof(h->initval)); + t->htable_bits = hbits; - rcu_assign_pointer(h->table, t); + RCU_INIT_POINTER(h->table, t); set->data = h; #ifndef IP_SET_PROTO_UNDEF @@ -1342,6 +1336,7 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set, __alignof__(struct IPSET_TOKEN(HTYPE, 6_elem))); } #endif + set->timeout = IPSET_NO_TIMEOUT; if (tb[IPSET_ATTR_TIMEOUT]) { set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]); #ifndef IP_SET_PROTO_UNDEF -- 2.1.4
[PATCH 35/39] netfilter: ipset: hash:ipmac type support added to ipset
From: Tomasz Chilinski

Introduce the hash:ipmac type.

Signed-off-by: Tomasz Chiliński
Signed-off-by: Jozsef Kadlecsik
---
 net/netfilter/ipset/Kconfig             |   9 +
 net/netfilter/ipset/Makefile            |   1 +
 net/netfilter/ipset/ip_set_hash_ipmac.c | 315
 3 files changed, 325 insertions(+)
 create mode 100644 net/netfilter/ipset/ip_set_hash_ipmac.c

diff --git a/net/netfilter/ipset/Kconfig b/net/netfilter/ipset/Kconfig
index 234a8ec82076..4083a8051f0f 100644
--- a/net/netfilter/ipset/Kconfig
+++ b/net/netfilter/ipset/Kconfig
@@ -99,6 +99,15 @@ config IP_SET_HASH_IPPORTNET
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_SET_HASH_IPMAC
+	tristate "hash:ip,mac set support"
+	depends on IP_SET
+	help
+	  This option adds the hash:ip,mac set type support, by which
+	  one can store IPv4/IPv6 address and MAC (ethernet address) pairs in a set.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config IP_SET_HASH_MAC
 	tristate "hash:mac set support"
 	depends on IP_SET
diff --git a/net/netfilter/ipset/Makefile b/net/netfilter/ipset/Makefile
index 3dbd5e958489..28ec148df02d 100644
--- a/net/netfilter/ipset/Makefile
+++ b/net/netfilter/ipset/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_IP_SET_BITMAP_PORT) += ip_set_bitmap_port.o
 
 # hash types
 obj-$(CONFIG_IP_SET_HASH_IP) += ip_set_hash_ip.o
+obj-$(CONFIG_IP_SET_HASH_IPMAC) += ip_set_hash_ipmac.o
 obj-$(CONFIG_IP_SET_HASH_IPMARK) += ip_set_hash_ipmark.o
 obj-$(CONFIG_IP_SET_HASH_IPPORT) += ip_set_hash_ipport.o
 obj-$(CONFIG_IP_SET_HASH_IPPORTIP) += ip_set_hash_ipportip.o
diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c b/net/netfilter/ipset/ip_set_hash_ipmac.c
new file mode 100644
index 000000000000..d9eb144b01d6
--- /dev/null
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -0,0 +1,315 @@
+/* Copyright (C) 2016 Tomasz Chilinski
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */ + +/* Kernel module implementing an IP set type: the hash:ip,mac type */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#define IPSET_TYPE_REV_MIN 0 +#define IPSET_TYPE_REV_MAX 0 + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Tomasz Chilinski "); +IP_SET_MODULE_DESC("hash:ip,mac", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX); +MODULE_ALIAS("ip_set_hash:ip,mac"); + +/* Type specific function prefix */ +#define HTYPE hash_ipmac + +/* Zero valued element is not supported */ +static const unsigned char invalid_ether[ETH_ALEN] = { 0 }; + +/* IPv4 variant */ + +/* Member elements */ +struct hash_ipmac4_elem { + /* Zero valued IP addresses cannot be stored */ + __be32 ip; + union { + unsigned char ether[ETH_ALEN]; + __be32 foo[2]; + }; +}; + +/* Common functions */ + +static inline bool +hash_ipmac4_data_equal(const struct hash_ipmac4_elem *e1, + const struct hash_ipmac4_elem *e2, + u32 *multi) +{ + return e1->ip == e2->ip && ether_addr_equal(e1->ether, e2->ether); +} + +static bool +hash_ipmac4_data_list(struct sk_buff *skb, const struct hash_ipmac4_elem *e) +{ + if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, e->ip) || + nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether)) + goto nla_put_failure; + return 0; + +nla_put_failure: + return 1; +} + +static inline void +hash_ipmac4_data_next(struct hash_ipmac4_elem *next, + const struct hash_ipmac4_elem *e) +{ + next->ip = e->ip; +} + +#define MTYPE hash_ipmac4 +#define PF 4 +#define HOST_MASK 32 +#define HKEY_DATALEN sizeof(struct hash_ipmac4_elem) +#include "ip_set_hash_gen.h" + +static int +hash_ipmac4_kadt(struct ip_set *set, const struct sk_buff *skb, +const struct xt_action_param *par, +enum ipset_adt adt, struct ip_set_adt_opt *opt) +{ + ipset_adtfn adtfn = set->variant->adt[adt]; + struct hash_ipmac4_elem e = { .ip = 0, { .foo[0] = 0, .foo[1] = 0 } }; + struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set); + 
+/* MAC can be src only */ + if (!(opt->flags & IPSET_DIM_TWO_SRC)) + return 0; + + if (skb_mac_header(skb) < skb->head || + (skb_mac_header(skb) + ETH_HLEN) > skb->data) + return -EINVAL; + + memcpy(e.ether, eth_hdr(skb)->h_source, ETH_ALEN); + if (ether_addr_equal(e.ether, invalid_ether)) + return -EINVAL; + + ip4addrptr(skb,
[PATCH 12/39] netfilter: nf_tables: simplify the basic expressions' init routine
From: Liping Zhang

Some basic expressions are built into nf_tables.ko, such as nft_cmp,
nft_lookup, nft_range and so on. But these basic expressions' init
routine is a little ugly, with too many goto errX labels, and we forget
to call nft_range_module_exit in the exit routine, although it is
harmless.

Actually, the init and exit routines of these basic expressions are
the same, i.e. do nft_register_expr in the init routine and do
nft_unregister_expr in the exit routine.

So it's better to arrange them into an array and deal with them
together.

Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
---
 include/net/netfilter/nf_tables_core.h | 33 --
 net/netfilter/nf_tables_core.c         | 80 +++---
 net/netfilter/nft_bitwise.c            | 13 +-
 net/netfilter/nft_byteorder.c          | 13 +-
 net/netfilter/nft_cmp.c                | 13 +-
 net/netfilter/nft_dynset.c             | 13 +-
 net/netfilter/nft_immediate.c          | 13 +-
 net/netfilter/nft_lookup.c             | 13 +-
 net/netfilter/nft_payload.c            | 13 +-
 net/netfilter/nft_range.c              | 13 +-
 10 files changed, 43 insertions(+), 174 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 00f4f6b1b1ba..862373d4ea9d 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -1,12 +1,18 @@
 #ifndef _NET_NF_TABLES_CORE_H
 #define _NET_NF_TABLES_CORE_H
 
+extern struct nft_expr_type nft_imm_type;
+extern struct nft_expr_type nft_cmp_type;
+extern struct nft_expr_type nft_lookup_type;
+extern struct nft_expr_type nft_bitwise_type;
+extern struct nft_expr_type nft_byteorder_type;
+extern struct nft_expr_type nft_payload_type;
+extern struct nft_expr_type nft_dynset_type;
+extern struct nft_expr_type nft_range_type;
+
 int nf_tables_core_module_init(void);
 void nf_tables_core_module_exit(void);
 
-int nft_immediate_module_init(void);
-void nft_immediate_module_exit(void);
-
 struct nft_cmp_fast_expr {
 	u32 data;
 	enum nft_registers sreg:8;
@@ -25,24 +31,6 @@ static inline u32 nft_cmp_fast_mask(unsigned int len)
 
 extern const struct nft_expr_ops nft_cmp_fast_ops;
 
-int nft_cmp_module_init(void);
-void nft_cmp_module_exit(void);
-
-int nft_range_module_init(void);
-void nft_range_module_exit(void);
-
-int nft_lookup_module_init(void);
-void nft_lookup_module_exit(void);
-
-int nft_dynset_module_init(void);
-void nft_dynset_module_exit(void);
-
-int nft_bitwise_module_init(void);
-void nft_bitwise_module_exit(void);
-
-int nft_byteorder_module_init(void);
-void nft_byteorder_module_exit(void);
-
 struct nft_payload {
 	enum nft_payload_bases base:8;
 	u8 offset;
@@ -62,7 +50,4 @@ struct nft_payload_set {
 extern const struct nft_expr_ops nft_payload_fast_ops;
 extern struct static_key_false nft_trace_enabled;
 
-int nft_payload_module_init(void);
-void nft_payload_module_exit(void);
-
 #endif /* _NET_NF_TABLES_CORE_H */
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index b63b1edb76a6..65dbeadcb118 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -232,68 +232,40 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
 }
 EXPORT_SYMBOL_GPL(nft_do_chain);
 
+static struct nft_expr_type *nft_basic_types[] = {
+	&nft_imm_type,
+	&nft_cmp_type,
+	&nft_lookup_type,
+	&nft_bitwise_type,
+	&nft_byteorder_type,
+	&nft_payload_type,
+	&nft_dynset_type,
+	&nft_range_type,
+};
+
 int __init nf_tables_core_module_init(void)
 {
-	int err;
-
-	err = nft_immediate_module_init();
-	if (err < 0)
-		goto err1;
-
-	err = nft_cmp_module_init();
-	if (err < 0)
-		goto err2;
-
-	err = nft_lookup_module_init();
-	if (err < 0)
-		goto err3;
-
-	err = nft_bitwise_module_init();
-	if (err < 0)
-		goto err4;
+	int err, i;
 
-	err = nft_byteorder_module_init();
-	if (err < 0)
-		goto err5;
-
-	err = nft_payload_module_init();
-	if (err < 0)
-		goto err6;
-
-	err = nft_dynset_module_init();
-	if (err < 0)
-		goto err7;
-
-	err = nft_range_module_init();
-	if (err < 0)
-		goto err8;
+	for (i = 0; i < ARRAY_SIZE(nft_basic_types); i++) {
+		err = nft_register_expr(nft_basic_types[i]);
+		if (err)
+			goto err;
+	}
 
 	return 0;
-err8:
-	nft_dynset_module_exit();
-err7:
-	nft_payload_module_exit();
-err6:
-	nft_byteorder_module_exit();
-err5:
-	nft_bitwise_module_exit();
-err4:
-	nft_lookup_module_exit();
-err3:
-	nft_cmp_module_exit();
-err2:
-
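The idea in the commit message (put the types in an array, register them in a loop) can be sketched in plain userspace C. This is only an illustration: struct expr_type, register_expr() and unregister_expr() are hypothetical stand-ins for the nf_tables API, and the reverse-order rollback under the err label is an assumption, since the quoted diff is cut off before the new error path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression type; the real struct nft_expr_type is richer. */
struct expr_type {
	const char *name;
	int registered;
	int fail;		/* illustration hook: force registration to fail */
};

/* Stand-in for a registration call that may fail. */
static int register_expr(struct expr_type *t)
{
	if (t->fail)
		return -1;
	t->registered = 1;
	return 0;
}

/* Stand-in for the matching unregistration call. */
static void unregister_expr(struct expr_type *t)
{
	t->registered = 0;
}

/* Register every entry; on failure undo the ones already registered. */
static int register_all(struct expr_type **types, size_t n)
{
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		err = register_expr(types[i]);
		if (err)
			goto err;
	}
	return 0;

err:
	/* Assumed rollback: walk back over entries 0..i-1 in reverse order. */
	while (i-- > 0)
		unregister_expr(types[i]);
	return err;
}

/* Exit path: unregister everything, last registered first. */
static void unregister_all(struct expr_type **types, size_t n)
{
	while (n-- > 0)
		unregister_expr(types[n]);
}
```

With this shape, adding a new built-in type means adding one array entry instead of a new errN label and a new init/exit pair.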
[PATCH 21/39] netfilter: ipset: Split extensions into separate files
From: Jozsef KadlecsikCleanup to separate all extensions into individual files. Ported from a patch proposed by Sergey Popovich . Suggested-by: Sergey Popovich Signed-off-by: Jozsef Kadlecsik --- include/linux/netfilter/ipset/ip_set.h | 95 +- include/linux/netfilter/ipset/ip_set_counter.h | 75 include/linux/netfilter/ipset/ip_set_skbinfo.h | 46 + 3 files changed, 123 insertions(+), 93 deletions(-) create mode 100644 include/linux/netfilter/ipset/ip_set_counter.h create mode 100644 include/linux/netfilter/ipset/ip_set_skbinfo.h diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h index 780262124632..b5bd0fb3d07b 100644 --- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -292,99 +292,6 @@ ip_set_put_flags(struct sk_buff *skb, struct ip_set *set) return nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(cadt_flags)); } -static inline void -ip_set_add_bytes(u64 bytes, struct ip_set_counter *counter) -{ - atomic64_add((long long)bytes, &(counter)->bytes); -} - -static inline void -ip_set_add_packets(u64 packets, struct ip_set_counter *counter) -{ - atomic64_add((long long)packets, &(counter)->packets); -} - -static inline u64 -ip_set_get_bytes(const struct ip_set_counter *counter) -{ - return (u64)atomic64_read(&(counter)->bytes); -} - -static inline u64 -ip_set_get_packets(const struct ip_set_counter *counter) -{ - return (u64)atomic64_read(&(counter)->packets); -} - -static inline void -ip_set_update_counter(struct ip_set_counter *counter, - const struct ip_set_ext *ext, - struct ip_set_ext *mext, u32 flags) -{ - if (ext->packets != ULLONG_MAX && - !(flags & IPSET_FLAG_SKIP_COUNTER_UPDATE)) { - ip_set_add_bytes(ext->bytes, counter); - ip_set_add_packets(ext->packets, counter); - } - if (flags & IPSET_FLAG_MATCH_COUNTERS) { - mext->packets = ip_set_get_packets(counter); - mext->bytes = ip_set_get_bytes(counter); - } -} - -static inline bool -ip_set_put_counter(struct sk_buff *skb, const 
struct ip_set_counter *counter) -{ - return nla_put_net64(skb, IPSET_ATTR_BYTES, -cpu_to_be64(ip_set_get_bytes(counter)), -IPSET_ATTR_PAD) || - nla_put_net64(skb, IPSET_ATTR_PACKETS, -cpu_to_be64(ip_set_get_packets(counter)), -IPSET_ATTR_PAD); -} - -static inline void -ip_set_init_counter(struct ip_set_counter *counter, - const struct ip_set_ext *ext) -{ - if (ext->bytes != ULLONG_MAX) - atomic64_set(&(counter)->bytes, (long long)(ext->bytes)); - if (ext->packets != ULLONG_MAX) - atomic64_set(&(counter)->packets, (long long)(ext->packets)); -} - -static inline void -ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo, - const struct ip_set_ext *ext, - struct ip_set_ext *mext, u32 flags) -{ - mext->skbinfo = *skbinfo; -} - -static inline bool -ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo) -{ - /* Send nonzero parameters only */ - return ((skbinfo->skbmark || skbinfo->skbmarkmask) && - nla_put_net64(skb, IPSET_ATTR_SKBMARK, - cpu_to_be64((u64)skbinfo->skbmark << 32 | - skbinfo->skbmarkmask), - IPSET_ATTR_PAD)) || - (skbinfo->skbprio && - nla_put_net32(skb, IPSET_ATTR_SKBPRIO, - cpu_to_be32(skbinfo->skbprio))) || - (skbinfo->skbqueue && - nla_put_net16(skb, IPSET_ATTR_SKBQUEUE, -cpu_to_be16(skbinfo->skbqueue))); -} - -static inline void -ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo, - const struct ip_set_ext *ext) -{ - *skbinfo = ext->skbinfo; -} - /* Netlink CB args */ enum { IPSET_CB_NET = 0, /* net namespace */ @@ -539,6 +446,8 @@ bitmap_bytes(u32 a, u32 b) #include #include +#include +#include int ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set, diff --git a/include/linux/netfilter/ipset/ip_set_counter.h b/include/linux/netfilter/ipset/ip_set_counter.h new file mode 100644 index ..bb6fba480118 --- /dev/null +++ b/include/linux/netfilter/ipset/ip_set_counter.h @@ -0,0 +1,75 @@ +#ifndef _IP_SET_COUNTER_H +#define _IP_SET_COUNTER_H + +/* Copyright (C) 2015 Jozsef Kadlecsik + * + * This program is free software; 
you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2
[PATCH 28/39] netfilter: ipset: Simplify mtype_expire() for hash types
From: Jozsef Kadlecsik

Remove one level of indentation by using continue while iterating over
elements in bucket.

Ported from a patch proposed by Sergey Popovich.

Signed-off-by: Jozsef Kadlecsik
---
 net/netfilter/ipset/ip_set_hash_gen.h | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index c4877b6de74f..7999e4c556a5 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -487,21 +487,20 @@ mtype_expire(struct ip_set *set, struct htype *h)
 				continue;
 			}
 			data = ahash_data(n, j, dsize);
-			if (ip_set_timeout_expired(ext_timeout(data, set))) {
-				pr_debug("expired %u/%u\n", i, j);
-				clear_bit(j, n->used);
-				smp_mb__after_atomic();
+			if (!ip_set_timeout_expired(ext_timeout(data, set)))
+				continue;
+			pr_debug("expired %u/%u\n", i, j);
+			clear_bit(j, n->used);
+			smp_mb__after_atomic();
 #ifdef IP_SET_HASH_WITH_NETS
-				for (k = 0; k < IPSET_NET_COUNT; k++)
-					mtype_del_cidr(h,
-						NCIDR_PUT(DCIDR_GET(data->cidr,
-								    k)),
-						nets_length, k);
+			for (k = 0; k < IPSET_NET_COUNT; k++)
+				mtype_del_cidr(h,
+					NCIDR_PUT(DCIDR_GET(data->cidr, k)),
+					nets_length, k);
 #endif
-				ip_set_ext_destroy(set, data);
-				set->elements--;
-				d++;
-			}
+			ip_set_ext_destroy(set, data);
+			set->elements--;
+			d++;
 		}
 		if (d >= AHASH_INIT_SIZE) {
 			if (d >= n->size) {
-- 
2.1.4
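As a small illustration of the same refactoring outside the kernel, the sketch below expires entries from a toy array twice: once with the work nested inside the positive branch, once with the test inverted and an early continue. struct slot and both function names are made up for this example and are not ipset structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bucket slot; not an ipset structure. */
struct slot {
	bool used;
	unsigned long expires;	/* absolute expiry time */
};

/* Before: the expiry work sits inside the positive branch. */
static size_t expire_nested(struct slot *s, size_t n, unsigned long now)
{
	size_t i, d = 0;

	for (i = 0; i < n; i++) {
		if (!s[i].used)
			continue;
		if (s[i].expires <= now) {
			s[i].used = false;
			d++;
		}
	}
	return d;
}

/* After: invert the test and continue; behaviour is unchanged. */
static size_t expire_flat(struct slot *s, size_t n, unsigned long now)
{
	size_t i, d = 0;

	for (i = 0; i < n; i++) {
		if (!s[i].used)
			continue;
		if (s[i].expires > now)
			continue;
		s[i].used = false;
		d++;
	}
	return d;
}
```

Both functions return the number of slots they cleared, so they can be compared directly on identical inputs.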
[PATCH 29/39] netfilter: ipset: Make NLEN compile time constant for hash types
From: Jozsef KadlecsikHash types define HOST_MASK before inclusion of ip_set_hash_gen.h and the only place where NLEN needed to be calculated at runtime is *_create() method. Ported from a patch proposed by Sergey Popovich . Signed-off-by: Jozsef Kadlecsik --- net/netfilter/ipset/ip_set_hash_gen.h | 51 --- 1 file changed, 23 insertions(+), 28 deletions(-) diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index 7999e4c556a5..6c88c20ae1d4 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -150,20 +150,18 @@ htable_bits(u32 hashsize) #define INIT_CIDR(cidr, host_mask) \ DCIDR_PUT(((cidr) ? NCIDR_GET(cidr) : host_mask)) -#define SET_HOST_MASK(family) (family == AF_INET ? 32 : 128) - #ifdef IP_SET_HASH_WITH_NET0 -/* cidr from 0 to SET_HOST_MASK() value and c = cidr + 1 */ -#define NLEN(family) (SET_HOST_MASK(family) + 1) +/* cidr from 0 to HOST_MASK value and c = cidr + 1 */ +#define NLEN (HOST_MASK + 1) #define CIDR_POS(c)((c) - 1) #else -/* cidr from 1 to SET_HOST_MASK() value and c = cidr + 1 */ -#define NLEN(family) SET_HOST_MASK(family) +/* cidr from 1 to HOST_MASK value and c = cidr + 1 */ +#define NLEN HOST_MASK #define CIDR_POS(c)((c) - 2) #endif #else -#define NLEN(family) 0 +#define NLEN 0 #endif /* IP_SET_HASH_WITH_NETS */ #endif /* _IP_SET_HASH_GEN_H */ @@ -298,12 +296,12 @@ struct htype { * sized networks. cidr == real cidr + 1 to support /0. 
*/ static void -mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n) +mtype_add_cidr(struct htype *h, u8 cidr, u8 n) { int i, j; /* Add in increasing prefix order, so larger cidr first */ - for (i = 0, j = -1; i < nets_length && h->nets[i].cidr[n]; i++) { + for (i = 0, j = -1; i < NLEN && h->nets[i].cidr[n]; i++) { if (j != -1) { continue; } else if (h->nets[i].cidr[n] < cidr) { @@ -322,11 +320,11 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n) } static void -mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n) +mtype_del_cidr(struct htype *h, u8 cidr, u8 n) { - u8 i, j, net_end = nets_length - 1; + u8 i, j, net_end = NLEN - 1; - for (i = 0; i < nets_length; i++) { + for (i = 0; i < NLEN; i++) { if (h->nets[i].cidr[n] != cidr) continue; h->nets[CIDR_POS(cidr)].nets[n]--; @@ -342,13 +340,12 @@ mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n) /* Calculate the actual memory size of the set data */ static size_t -mtype_ahash_memsize(const struct htype *h, const struct htable *t, - u8 nets_length) +mtype_ahash_memsize(const struct htype *h, const struct htable *t) { size_t memsize = sizeof(*h) + sizeof(*t); #ifdef IP_SET_HASH_WITH_NETS - memsize += sizeof(struct net_prefixes) * nets_length; + memsize += sizeof(struct net_prefixes) * NLEN; #endif return memsize; @@ -389,7 +386,7 @@ mtype_flush(struct ip_set *set) kfree_rcu(n, rcu); } #ifdef IP_SET_HASH_WITH_NETS - memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN(set->family)); + memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN); #endif set->elements = 0; set->ext_size = 0; @@ -473,7 +470,7 @@ mtype_expire(struct ip_set *set, struct htype *h) u32 i, j, d; size_t dsize = set->dsize; #ifdef IP_SET_HASH_WITH_NETS - u8 k, nets_length = NLEN(set->family); + u8 k; #endif t = ipset_dereference_protected(h->table, set); @@ -496,7 +493,7 @@ mtype_expire(struct ip_set *set, struct htype *h) for (k = 0; k < IPSET_NET_COUNT; k++) mtype_del_cidr(h, 
NCIDR_PUT(DCIDR_GET(data->cidr, k)), - nets_length, k); + k); #endif ip_set_ext_destroy(set, data); set->elements--; @@ -776,7 +773,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, for (i = 0; i < IPSET_NET_COUNT; i++) mtype_del_cidr(h, NCIDR_PUT(DCIDR_GET(data->cidr, i)), - NLEN(set->family), i); + i); #endif ip_set_ext_destroy(set, data); set->elements--; @@ -812,8 +809,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, set->elements++; #ifdef
[PATCH 05/39] netfilter: x_tables: move hook state into xt_action_param structure
Place pointer to hook state in xt_action_param structure instead of copying the fields that we need. After this change xt_action_param fits into one cacheline. This patch also adds a set of new wrapper functions to fetch relevant hook state structure fields. Signed-off-by: Pablo Neira Ayuso--- include/linux/netfilter/x_tables.h | 48 +++--- include/net/netfilter/nf_tables.h | 11 +++ net/bridge/netfilter/ebt_arpreply.c| 3 +- net/bridge/netfilter/ebt_log.c | 11 +++ net/bridge/netfilter/ebt_nflog.c | 6 ++-- net/bridge/netfilter/ebt_redirect.c| 6 ++-- net/bridge/netfilter/ebtables.c| 6 +--- net/ipv4/netfilter/arp_tables.c| 6 +--- net/ipv4/netfilter/ip_tables.c | 6 +--- net/ipv4/netfilter/ipt_MASQUERADE.c| 3 +- net/ipv4/netfilter/ipt_REJECT.c| 4 +-- net/ipv4/netfilter/ipt_SYNPROXY.c | 4 +-- net/ipv4/netfilter/ipt_rpfilter.c | 2 +- net/ipv6/netfilter/ip6_tables.c| 6 +--- net/ipv6/netfilter/ip6t_MASQUERADE.c | 2 +- net/ipv6/netfilter/ip6t_REJECT.c | 23 -- net/ipv6/netfilter/ip6t_SYNPROXY.c | 4 +-- net/ipv6/netfilter/ip6t_rpfilter.c | 3 +- net/netfilter/ipset/ip_set_core.c | 6 ++-- net/netfilter/ipset/ip_set_hash_netiface.c | 2 +- net/netfilter/xt_AUDIT.c | 10 +++ net/netfilter/xt_LOG.c | 6 ++-- net/netfilter/xt_NETMAP.c | 20 ++--- net/netfilter/xt_NFLOG.c | 6 ++-- net/netfilter/xt_NFQUEUE.c | 4 +-- net/netfilter/xt_REDIRECT.c| 4 +-- net/netfilter/xt_TCPMSS.c | 4 +-- net/netfilter/xt_TEE.c | 4 +-- net/netfilter/xt_TPROXY.c | 16 +- net/netfilter/xt_addrtype.c| 10 +++ net/netfilter/xt_cluster.c | 2 +- net/netfilter/xt_connlimit.c | 8 ++--- net/netfilter/xt_conntrack.c | 8 ++--- net/netfilter/xt_devgroup.c| 4 +-- net/netfilter/xt_dscp.c| 2 +- net/netfilter/xt_ipvs.c| 4 +-- net/netfilter/xt_nfacct.c | 2 +- net/netfilter/xt_osf.c | 10 +++ net/netfilter/xt_owner.c | 2 +- net/netfilter/xt_pkttype.c | 4 +-- net/netfilter/xt_policy.c | 4 +-- net/netfilter/xt_recent.c | 10 +++ net/netfilter/xt_set.c | 26 net/netfilter/xt_socket.c | 4 +-- net/sched/act_ipt.c| 12 net/sched/em_ipset.c | 
17 ++- 46 files changed, 196 insertions(+), 169 deletions(-) diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index 2ad1a2b289b5..cd4eaf8df445 100644 --- a/include/linux/netfilter/x_tables.h +++ b/include/linux/netfilter/x_tables.h @@ -4,6 +4,7 @@ #include #include +#include #include /* Test a struct->invflags and a boolean for inequality */ @@ -17,14 +18,9 @@ * @target:the target extension * @matchinfo: per-match data * @targetinfo:per-target data - * @netnetwork namespace through which the action was invoked - * @in:input netdevice - * @out: output netdevice + * @state: pointer to hook state this packet came from * @fragoff: packet is a fragment, this is the data offset * @thoff: position of transport header relative to skb->data - * @hook: hook number given packet came from - * @family:Actual NFPROTO_* through which the function is invoked - * (helpful when match->family == NFPROTO_UNSPEC) * * Fields written to by extensions: * @@ -38,15 +34,47 @@ struct xt_action_param { union { const void *matchinfo, *targinfo; }; - struct net *net; - const struct net_device *in, *out; + const struct nf_hook_state *state; int fragoff; unsigned int thoff; - unsigned int hooknum; - u_int8_t family; bool hotdrop; }; +static inline struct net *xt_net(const struct xt_action_param *par) +{ + return par->state->net; +} + +static inline struct net_device *xt_in(const struct xt_action_param *par) +{ + return par->state->in; +} + +static inline const char *xt_inname(const struct xt_action_param *par) +{ + return par->state->in->name; +} + +static inline struct net_device *xt_out(const struct xt_action_param *par) +{ + return par->state->out; +} + +static inline const char *xt_outname(const struct xt_action_param *par) +{ + return par->state->out->name; +} + +static inline unsigned int xt_hooknum(const struct xt_action_param *par) +{ + return
[PATCH 16/39] netfilter: ipset: Remove extra whitespaces in ip_set.h
From: Jozsef Kadlecsik

Remove unnecessary whitespaces.

Ported from a patch proposed by Sergey Popovich.

Suggested-by: Sergey Popovich
Signed-off-by: Jozsef Kadlecsik
---
 include/linux/netfilter/ipset/ip_set.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index 83b9a2e0d8d4..5b1fd090f34b 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -336,14 +336,15 @@ ip_set_update_counter(struct ip_set_counter *counter,
 
 static inline void
 ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
-		   const struct ip_set_ext *ext,
-		   struct ip_set_ext *mext, u32 flags)
+		   const struct ip_set_ext *ext,
+		   struct ip_set_ext *mext, u32 flags)
 {
-	mext->skbmark = skbinfo->skbmark;
-	mext->skbmarkmask = skbinfo->skbmarkmask;
-	mext->skbprio = skbinfo->skbprio;
-	mext->skbqueue = skbinfo->skbqueue;
+	mext->skbmark = skbinfo->skbmark;
+	mext->skbmarkmask = skbinfo->skbmarkmask;
+	mext->skbprio = skbinfo->skbprio;
+	mext->skbqueue = skbinfo->skbqueue;
 }
+
 static inline bool
 ip_set_put_skbinfo(struct sk_buff *skb, struct ip_set_skbinfo *skbinfo)
 {
-- 
2.1.4
[PATCH 15/39] netfilter: conntrack: fix NF_REPEAT handling
From: Arnd Bergmann

gcc correctly identified a theoretical uninitialized variable use:

net/netfilter/nf_conntrack_core.c: In function 'nf_conntrack_in':
net/netfilter/nf_conntrack_core.c:1125:14: error: 'l4proto' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This could only happen when we 'goto out' before looking up l4proto,
and then enter the retry, implying that l3proto->get_l4proto()
returned NF_REPEAT. This does not currently get returned in any
code path and probably won't ever happen, but is not good to
rely on.

Moving the repeat handling up a little should have the same
behavior as today but avoids the warning by making that case
impossible to enter.

[ I have mangled this original patch to remove the check for tmpl, we
  should unconditionally jump back to the repeat label in case we hit
  NF_REPEAT instead. I have also moved the comment that explains this
  where it belongs. --pablo ]

Fixes: 08733a0cb7de ("netfilter: handle NF_REPEAT from nf_conntrack_in()")
Signed-off-by: Arnd Bergmann
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nf_conntrack_core.c | 18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index de4b8a75f30b..e9ffe33dc0ca 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1337,6 +1337,12 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
 		NF_CT_STAT_INC_ATOMIC(net, invalid);
 		if (ret == -NF_DROP)
 			NF_CT_STAT_INC_ATOMIC(net, drop);
+		/* Special case: TCP tracker reports an attempt to reopen a
+		 * closed/aborted connection. We have to go back and create a
+		 * fresh conntrack.
+		 */
+		if (ret == -NF_REPEAT)
+			goto repeat;
 		ret = -ret;
 		goto out;
 	}
@@ -1344,16 +1350,8 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
 	if (set_reply && !test_and_set_bit(IPS_SEEN_REPLY_BIT, &ct->status))
 		nf_conntrack_event_cache(IPCT_REPLY, ct);
 out:
-	if (tmpl) {
-		/* Special case: TCP tracker reports an attempt to reopen a
-		 * closed/aborted connection. We have to go back and create a
-		 * fresh conntrack.
-		 */
-		if (ret == NF_REPEAT)
-			goto repeat;
-		else
-			nf_ct_put(tmpl);
-	}
+	if (tmpl)
+		nf_ct_put(tmpl);
 
 	return ret;
 }
-- 
2.1.4
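The control-flow change can be shown with a toy retry loop. This is a rough sketch, not conntrack code: the verdict values, handle() and run() are invented for the example. The point is that the repeat check sits where the negative verdict is first seen, so the shared out label is never entered with a pending retry.

```c
#include <assert.h>

/* Made-up verdict codes for this sketch; not the netfilter values. */
enum { V_DROP = 0, V_ACCEPT = 1, V_REPEAT = 4 };

/* Hypothetical handler: asks for one retry, then accepts. */
static int handle(int attempt)
{
	return attempt == 0 ? -V_REPEAT : V_ACCEPT;
}

/* Runs the handler, retrying when it reports -V_REPEAT. */
static int run(int *attempts)
{
	int ret, attempt = 0;

repeat:
	ret = handle(attempt++);
	if (ret <= 0) {
		/* Retry is decided here, before the common exit path. */
		if (ret == -V_REPEAT)
			goto repeat;
		ret = -ret;
		goto out;
	}
out:
	*attempts = attempt;
	return ret;
}
```

Because the jump back happens before any state tied to the exit path is touched, the exit code under out only has to do cleanup.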
[PATCH 34/39] netfilter: ipset: Fix reported memory size for hash:* types
From: Jozsef Kadlecsik

The calculation of the full allocated memory did not take into account
the size of the base hash bucket structure at some places.

Signed-off-by: Jozsef Kadlecsik
---
 net/netfilter/ipset/ip_set_hash_gen.h | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 1c9b84e53dcc..88b70fcc5ac5 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -85,6 +85,8 @@ struct htable {
 };
 
 #define hbucket(h, i)		((h)->bucket[i])
+#define ext_size(n, dsize)	\
+	(sizeof(struct hbucket) + (n) * (dsize))
 
 #ifndef IPSET_NET_COUNT
 #define IPSET_NET_COUNT		1
@@ -519,7 +521,7 @@ mtype_expire(struct ip_set *set, struct htype *h)
 				d++;
 			}
 			tmp->pos = d;
-			set->ext_size -= AHASH_INIT_SIZE * dsize;
+			set->ext_size -= ext_size(AHASH_INIT_SIZE, dsize);
 			rcu_assign_pointer(hbucket(t, i), tmp);
 			kfree_rcu(n, rcu);
 		}
@@ -625,7 +627,7 @@ mtype_resize(struct ip_set *set, bool retried)
 					goto cleanup;
 				}
 				m->size = AHASH_INIT_SIZE;
-				extsize = sizeof(*m) + AHASH_INIT_SIZE * dsize;
+				extsize = ext_size(AHASH_INIT_SIZE, dsize);
 				RCU_INIT_POINTER(hbucket(t, key), m);
 			} else if (m->pos >= m->size) {
 				struct hbucket *ht;
@@ -645,7 +647,7 @@ mtype_resize(struct ip_set *set, bool retried)
 				memcpy(ht, m, sizeof(struct hbucket) +
 					      m->size * dsize);
 				ht->size = m->size + AHASH_INIT_SIZE;
-				extsize += AHASH_INIT_SIZE * dsize;
+				extsize += ext_size(AHASH_INIT_SIZE, dsize);
 				kfree(m);
 				m = ht;
 				RCU_INIT_POINTER(hbucket(t, key), ht);
@@ -727,7 +729,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 		if (!n)
 			return -ENOMEM;
 		n->size = AHASH_INIT_SIZE;
-		set->ext_size += sizeof(*n) + AHASH_INIT_SIZE * set->dsize;
+		set->ext_size += ext_size(AHASH_INIT_SIZE, set->dsize);
 		goto copy_elem;
 	}
 	for (i = 0; i < n->pos; i++) {
@@ -791,7 +793,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 		memcpy(n, old, sizeof(struct hbucket) +
 		       old->size * set->dsize);
 		n->size = old->size + AHASH_INIT_SIZE;
-		set->ext_size += AHASH_INIT_SIZE * set->dsize;
+		set->ext_size += ext_size(AHASH_INIT_SIZE, set->dsize);
 	}
 
 copy_elem:
@@ -883,7 +885,7 @@ mtype_del(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 				k++;
 		}
 		if (n->pos == 0 && k == 0) {
-			set->ext_size -= sizeof(*n) + n->size * dsize;
+			set->ext_size -= ext_size(n->size, dsize);
 			rcu_assign_pointer(hbucket(t, key), NULL);
 			kfree_rcu(n, rcu);
 		} else if (k >= AHASH_INIT_SIZE) {
@@ -902,7 +904,7 @@ mtype_del(struct ip_set *set, void *value, const struct ip_set_ext *ext,
 				k++;
 			}
 			tmp->pos = k;
-			set->ext_size -= AHASH_INIT_SIZE * dsize;
+			set->ext_size -= ext_size(AHASH_INIT_SIZE, dsize);
 			rcu_assign_pointer(hbucket(t, key), tmp);
 			kfree_rcu(n, rcu);
 		}
-- 
2.1.4
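To make the accounting concrete, here is a minimal userspace sketch of the ext_size() macro from the patch. The struct hbucket below is a simplified stand-in (an assumption, the kernel structure has more fields), and account_new_bucket() and account_free_bucket() are hypothetical helpers that only show the header being counted on both the add and the remove side.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct hbucket (assumption). */
struct hbucket {
	unsigned char size;	/* allocated slots */
	unsigned char pos;	/* first free slot */
	unsigned char value[];	/* element storage follows the header */
};

#define AHASH_INIT_SIZE 4

/* Same shape as the macro in the patch: bucket header plus n elements. */
#define ext_size(n, dsize) \
	(sizeof(struct hbucket) + (n) * (dsize))

/* Account for a freshly allocated bucket of AHASH_INIT_SIZE slots. */
static size_t account_new_bucket(size_t total, size_t dsize)
{
	return total + ext_size(AHASH_INIT_SIZE, dsize);
}

/* Account for freeing a bucket that holds nslots slots. */
static size_t account_free_bucket(size_t total, size_t nslots, size_t dsize)
{
	return total - ext_size(nslots, dsize);
}
```

Using one macro on both sides keeps the running total symmetric: a bucket that is allocated and then freed at the same size leaves the counter where it started.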
[PATCH 24/39] netfilter: ipset: Add element count to hash headers
From: Eric B Munson

It would be useful for userspace to query the size of an ipset hash,
however, this data is not exposed to userspace outside of counting the
number of member entries. This patch uses the attribute
IPSET_ATTR_ELEMENTS to indicate the size in the header that is
exported to userspace. This field is then printed by the userspace
tool for hashes.

Signed-off-by: Eric B Munson
Cc: Pablo Neira Ayuso
Cc: Josh Hunt
Cc: netfilter-de...@vger.kernel.org
Signed-off-by: Jozsef Kadlecsik
---
 net/netfilter/ipset/ip_set_hash_gen.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index d32fd6b036bf..f5acfb9709c9 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -1083,7 +1083,8 @@ mtype_head(struct ip_set *set, struct sk_buff *skb)
 		goto nla_put_failure;
 #endif
 	if (nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
-	    nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)))
+	    nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)) ||
+	    nla_put_net32(skb, IPSET_ATTR_ELEMENTS, htonl(h->elements)))
 		goto nla_put_failure;
 	if (unlikely(ip_set_put_flags(skb, set)))
 		goto nla_put_failure;
-- 
2.1.4
[PATCH 25/39] netfilter: ipset: Add element count to all set types header
From: Jozsef Kadlecsik It is better to list the set elements for all set types, thus the header information is uniform. Element counts are therefore added to the bitmap and list types. Signed-off-by: Jozsef Kadlecsik --- include/linux/netfilter/ipset/ip_set.h | 2 ++ include/linux/netfilter/ipset/ip_set_bitmap.h | 2 +- net/netfilter/ipset/ip_set_bitmap_gen.h | 10 +- net/netfilter/ipset/ip_set_hash_gen.h | 21 ++--- net/netfilter/ipset/ip_set_list_set.c | 6 +- 5 files changed, 27 insertions(+), 14 deletions(-) diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h index 7a218eb74887..4671d740610f 100644 --- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -250,6 +250,8 @@ struct ip_set { u8 flags; /* Default timeout value, if enabled */ u32 timeout; + /* Number of elements (vs timeout) */ + u32 elements; /* Element data size */ size_t dsize; /* Offsets to extensions in elements */ diff --git a/include/linux/netfilter/ipset/ip_set_bitmap.h b/include/linux/netfilter/ipset/ip_set_bitmap.h index 5e4662a71e01..366d6c0ea04f 100644 --- a/include/linux/netfilter/ipset/ip_set_bitmap.h +++ b/include/linux/netfilter/ipset/ip_set_bitmap.h @@ -6,8 +6,8 @@ #define IPSET_BITMAP_MAX_RANGE 0x enum { + IPSET_ADD_STORE_PLAIN_TIMEOUT = -1, IPSET_ADD_FAILED = 1, - IPSET_ADD_STORE_PLAIN_TIMEOUT, IPSET_ADD_START_STORED_TIMEOUT, }; diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h index 4f07b90f8ef4..1810d1c06e3d 100644 --- a/net/netfilter/ipset/ip_set_bitmap_gen.h +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h @@ -83,6 +83,7 @@ mtype_flush(struct ip_set *set) if (set->extensions & IPSET_EXT_DESTROY) mtype_ext_cleanup(set); memset(map->members, 0, map->memsize); + set->elements = 0; } /* Calculate the actual memory size of the set data */ @@ -105,7 +106,8 @@ mtype_head(struct ip_set *set, struct sk_buff *skb) goto nla_put_failure; if (mtype_do_head(skb, map) ||
nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) || - nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize))) + nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)) || + nla_put_net32(skb, IPSET_ATTR_ELEMENTS, htonl(set->elements))) goto nla_put_failure; if (unlikely(ip_set_put_flags(skb, set))) goto nla_put_failure; @@ -149,6 +151,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, if (ret == IPSET_ADD_FAILED) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(x, set))) { + set->elements--; ret = 0; } else if (!(flags & IPSET_FLAG_EXIST)) { set_bit(e->id, map->members); @@ -157,6 +160,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, /* Element is re-added, cleanup extensions */ ip_set_ext_destroy(set, x); } + if (ret > 0) + set->elements--; if (SET_WITH_TIMEOUT(set)) #ifdef IP_SET_BITMAP_STORED_TIMEOUT @@ -174,6 +179,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, /* Activate element */ set_bit(e->id, map->members); + set->elements++; return 0; } @@ -190,6 +196,7 @@ mtype_del(struct ip_set *set, void *value, const struct ip_set_ext *ext, return -IPSET_ERR_EXIST; ip_set_ext_destroy(set, x); + set->elements--; if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(x, set))) return -IPSET_ERR_EXIST; @@ -285,6 +292,7 @@ mtype_gc(unsigned long ul_set) if (ip_set_timeout_expired(ext_timeout(x, set))) { clear_bit(id, map->members); ip_set_ext_destroy(set, x); + set->elements--; } } spin_unlock_bh(&set->lock); diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index f5acfb9709c9..6e967f198d1e 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -275,7 +275,6 @@ htable_bits(u32 hashsize) struct htype { struct htable __rcu *table; /* the hash table */ u32 maxelem;/* max elements in the hash */ - u32 elements; /* current element (vs timeout) */ u32 initval;/* random jhash init
value */ #ifdef IP_SET_HASH_WITH_MARKMASK u32 markmask; /* markmask value
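The bookkeeping this patch adds to the bitmap type can be illustrated in user space. What follows is only a minimal sketch, not the kernel code: toy_bitmap, toy_add, toy_del and toy_flush are hypothetical names, the 64-entry range is an arbitrary assumption, and timeouts and extensions are left out. It shows the counter being kept in step with the bitmap on add, delete and flush, so a header dump could report it without walking the bits.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_IDS 64u   /* hypothetical fixed range, for illustration only */

struct toy_bitmap {
    uint64_t members;   /* one bit per element id */
    uint32_t elements;  /* number of bits currently set */
};

/* Add id; returns false if it is out of range or already present. */
static bool toy_add(struct toy_bitmap *set, unsigned int id)
{
    if (id >= TOY_MAX_IDS || (set->members & (UINT64_C(1) << id)))
        return false;
    set->members |= UINT64_C(1) << id;
    set->elements++;
    return true;
}

/* Delete id; returns false if it is out of range or not present. */
static bool toy_del(struct toy_bitmap *set, unsigned int id)
{
    if (id >= TOY_MAX_IDS || !(set->members & (UINT64_C(1) << id)))
        return false;
    set->members &= ~(UINT64_C(1) << id);
    set->elements--;
    return true;
}

/* Flush clears the bitmap and resets the counter together. */
static void toy_flush(struct toy_bitmap *set)
{
    set->members = 0;
    set->elements = 0;
}
```

The counter only changes on the paths that actually flip a bit, which is the same invariant the patch has to maintain across mtype_add(), mtype_del(), mtype_gc() and mtype_flush().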
[PATCH 37/39] netfilter: ipset: hash: fix boolreturn.cocci warnings
From: kbuild test robot net/netfilter/ipset/ip_set_hash_ipmac.c:70:8-9: WARNING: return of 0/1 in function 'hash_ipmac4_data_list' with return type bool net/netfilter/ipset/ip_set_hash_ipmac.c:178:8-9: WARNING: return of 0/1 in function 'hash_ipmac6_data_list' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Generated by: scripts/coccinelle/misc/boolreturn.cocci CC: Tomasz Chilinski Signed-off-by: Fengguang Wu Signed-off-by: Jozsef Kadlecsik --- net/netfilter/ipset/ip_set_hash_ipmac.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c b/net/netfilter/ipset/ip_set_hash_ipmac.c index d9eb144b01d6..1ab5ed2f6839 100644 --- a/net/netfilter/ipset/ip_set_hash_ipmac.c +++ b/net/netfilter/ipset/ip_set_hash_ipmac.c @@ -67,10 +67,10 @@ hash_ipmac4_data_list(struct sk_buff *skb, const struct hash_ipmac4_elem *e) if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, e->ip) || nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether)) goto nla_put_failure; - return 0; + return false; nla_put_failure: - return 1; + return true; } static inline void @@ -175,10 +175,10 @@ hash_ipmac6_data_list(struct sk_buff *skb, const struct hash_ipmac6_elem *e) if (nla_put_ipaddr6(skb, IPSET_ATTR_IP, &e->ip.in6) || nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether)) goto nla_put_failure; - return 0; + return false; nla_put_failure: - return 1; + return true; } static inline void -- 2.1.4
[PATCH 11/39] netfilter: nft_hash: get random bytes if seed is not specified
If the user doesn't specify a seed, generate one at configuration time. Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_hash.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c index baf694de3935..97ad8e30e4b4 100644 --- a/net/netfilter/nft_hash.c +++ b/net/netfilter/nft_hash.c @@ -57,7 +57,6 @@ static int nft_hash_init(const struct nft_ctx *ctx, if (!tb[NFTA_HASH_SREG] || !tb[NFTA_HASH_DREG] || !tb[NFTA_HASH_LEN] || - !tb[NFTA_HASH_SEED] || !tb[NFTA_HASH_MODULUS]) return -EINVAL; @@ -80,7 +79,10 @@ static int nft_hash_init(const struct nft_ctx *ctx, if (priv->offset + priv->modulus - 1 < priv->offset) return -EOVERFLOW; - priv->seed = ntohl(nla_get_be32(tb[NFTA_HASH_SEED])); + if (tb[NFTA_HASH_SEED]) + priv->seed = ntohl(nla_get_be32(tb[NFTA_HASH_SEED])); + else + get_random_bytes(&priv->seed, sizeof(priv->seed)); return nft_validate_register_load(priv->sreg, len) && nft_validate_register_store(ctx, priv->dreg, NULL, -- 2.1.4
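The "optional attribute with a random fallback" pattern in this patch can be sketched outside the kernel. This is only an illustration under stated assumptions: toy_hash_cfg, toy_random_u32 and toy_pick_seed are hypothetical names, and rand() merely stands in for a real entropy source such as the kernel's get_random_bytes().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_hash_cfg {
    bool     has_seed;  /* did the caller supply a seed? */
    uint32_t seed;      /* only meaningful when has_seed is true */
};

/* Stand-in for a real entropy source; rand() is not suitable for real use. */
static uint32_t toy_random_u32(void)
{
    return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

/* Use the configured seed if present, otherwise generate one once. */
static uint32_t toy_pick_seed(const struct toy_hash_cfg *cfg)
{
    return cfg->has_seed ? cfg->seed : toy_random_u32();
}
```

The point of choosing the seed at configuration time rather than per packet is that the hash stays stable for the lifetime of the expression.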
[PATCH 10/39] netfilter: handle NF_REPEAT from nf_conntrack_in()
NF_REPEAT is only needed from nf_conntrack_in() under a very specific case required by the TCP protocol tracker; we can handle this case without returning to the core hook path. Handling of NF_REPEAT from nf_reinject() is left untouched. Signed-off-by: Pablo Neira Ayuso --- net/netfilter/core.c | 2 -- net/netfilter/nf_conntrack_core.c | 11 ++- net/openvswitch/conntrack.c | 8 ++-- 3 files changed, 8 insertions(+), 13 deletions(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index bd9272eeccb5..de30e08d58f2 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -322,8 +322,6 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state, if (ret == 0) ret = -EPERM; return ret; - case NF_REPEAT: - continue; case NF_QUEUE: ret = nf_queue(skb, state, &entry, verdict); if (ret == 1 && entry) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index df2f5a3901df..de4b8a75f30b 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1305,7 +1305,7 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum, if (skb->nfct) goto out; } - +repeat: ct = resolve_normal_ct(net, tmpl, skb, dataoff, pf, protonum, l3proto, l4proto, &set_reply, &ctinfo); if (!ct) { @@ -1345,11 +1345,12 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum, nf_conntrack_event_cache(IPCT_REPLY, ct); out: if (tmpl) { - /* Special case: we have to repeat this hook, assign the -* template again to this packet. We assume that this packet -* has no conntrack assigned. This is used by nf_ct_tcp. */ + /* Special case: TCP tracker reports an attempt to reopen a +* closed/aborted connection. We have to go back and create a +* fresh conntrack.
+*/ if (ret == NF_REPEAT) - skb->nfct = (struct nf_conntrack *)tmpl; + goto repeat; else nf_ct_put(tmpl); } diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c index 31045ef44a82..9b8a028b7dad 100644 --- a/net/openvswitch/conntrack.c +++ b/net/openvswitch/conntrack.c @@ -725,12 +725,8 @@ static int __ovs_ct_lookup(struct net *net, struct sw_flow_key *key, skb->nfctinfo = IP_CT_NEW; } - /* Repeat if requested, see nf_iterate(). */ - do { - err = nf_conntrack_in(net, info->family, - NF_INET_PRE_ROUTING, skb); - } while (err == NF_REPEAT); - + err = nf_conntrack_in(net, info->family, + NF_INET_PRE_ROUTING, skb); if (err != NF_ACCEPT) return -ENOENT; -- 2.1.4
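The control-flow change in this patch, retrying inside the function instead of handing a "repeat" verdict back to the caller, can be sketched in isolation. Everything below is hypothetical scaffolding (toy_verdict, toy_track, toy_conntrack_in), not the conntrack code; it only shows that callers no longer need their own retry loop.

```c
enum toy_verdict { TOY_ACCEPT, TOY_REPEAT };

/* Hypothetical tracker: asks for one retry while *stale is non-zero. */
static enum toy_verdict toy_track(int *stale)
{
    if (*stale) {
        *stale = 0;          /* drop the stale state */
        return TOY_REPEAT;   /* ask for a fresh lookup */
    }
    return TOY_ACCEPT;
}

/* Retries internally, so callers never see TOY_REPEAT. */
static enum toy_verdict toy_conntrack_in(int *stale, int *lookups)
{
    enum toy_verdict ret;

repeat:
    (*lookups)++;            /* stands in for the conntrack lookup */
    ret = toy_track(stale);
    if (ret == TOY_REPEAT)
        goto repeat;
    return ret;
}
```

With this shape, a caller such as the openvswitch path in the diff can make a single call and check the result, which is what the removed do/while loop used to compensate for.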
[PATCH 27/39] netfilter: ipset: Remove redundant mtype_expire() arguments
From: Jozsef Kadlecsik Remove redundant parameters nets_length and dsize, because they can be derived from other parameters. Ported from a patch proposed by Sergey Popovich. Signed-off-by: Jozsef Kadlecsik --- net/netfilter/ipset/ip_set_hash_gen.h | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h index 0746405a1d14..c4877b6de74f 100644 --- a/net/netfilter/ipset/ip_set_hash_gen.h +++ b/net/netfilter/ipset/ip_set_hash_gen.h @@ -465,14 +465,15 @@ mtype_same_set(const struct ip_set *a, const struct ip_set *b) /* Delete expired elements from the hashtable */ static void -mtype_expire(struct ip_set *set, struct htype *h, u8 nets_length, size_t dsize) +mtype_expire(struct ip_set *set, struct htype *h) { struct htable *t; struct hbucket *n, *tmp; struct mtype_elem *data; u32 i, j, d; + size_t dsize = set->dsize; #ifdef IP_SET_HASH_WITH_NETS - u8 k; + u8 k, nets_length = NLEN(set->family); #endif t = ipset_dereference_protected(h->table, set); @@ -539,7 +540,7 @@ mtype_gc(unsigned long ul_set) pr_debug("called\n"); spin_lock_bh(&set->lock); - mtype_expire(set, h, NLEN(set->family), set->dsize); + mtype_expire(set, h); spin_unlock_bh(&set->lock); h->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ; @@ -715,7 +716,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext, if (set->elements >= h->maxelem) { if (SET_WITH_TIMEOUT(set)) /* FIXME: when set is full, we slow down here */ - mtype_expire(set, h, NLEN(set->family), set->dsize); + mtype_expire(set, h); if (set->elements >= h->maxelem && SET_WITH_FORCEADD(set)) forceadd = true; } -- 2.1.4
[PATCH 19/39] netfilter: ipset: Improve skbinfo get/init helpers
From: Jozsef Kadlecsik Use struct ip_set_skbinfo in struct ip_set_ext instead of open coded fields and assign structure members in get/init helpers instead of copying members one by one. Explicitly note that struct ip_set_skbinfo must be padded to prevent non-aligned access in the extension blob. Ported from a patch proposed by Sergey Popovich. Suggested-by: Sergey Popovich Signed-off-by: Jozsef Kadlecsik --- include/linux/netfilter/ipset/ip_set.h | 30 +++--- net/netfilter/ipset/ip_set_core.c | 12 ++-- net/netfilter/xt_set.c | 12 +++- 3 files changed, 24 insertions(+), 30 deletions(-) diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h index 1ea28e30a6dd..780262124632 100644 --- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -92,17 +92,6 @@ struct ip_set_ext_type { extern const struct ip_set_ext_type ip_set_extensions[]; -struct ip_set_ext { - u64 packets; - u64 bytes; - u32 timeout; - u32 skbmark; - u32 skbmarkmask; - u32 skbprio; - u16 skbqueue; - char *comment; -}; - struct ip_set_counter { atomic64_t bytes; atomic64_t packets; @@ -122,6 +111,15 @@ struct ip_set_skbinfo { u32 skbmarkmask; u32 skbprio; u16 skbqueue; + u16 __pad; +}; + +struct ip_set_ext { + struct ip_set_skbinfo skbinfo; + u64 packets; + u64 bytes; + char *comment; + u32 timeout; }; struct ip_set; @@ -360,10 +358,7 @@ ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo, const struct ip_set_ext *ext, struct ip_set_ext *mext, u32 flags) { - mext->skbmark = skbinfo->skbmark; - mext->skbmarkmask = skbinfo->skbmarkmask; - mext->skbprio = skbinfo->skbprio; - mext->skbqueue = skbinfo->skbqueue; + mext->skbinfo = *skbinfo; } static inline bool @@ -387,10 +382,7 @@ static inline void ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo, const struct ip_set_ext *ext) { - skbinfo->skbmark = ext->skbmark; - skbinfo->skbmarkmask = ext->skbmarkmask; - skbinfo->skbprio = ext->skbprio; - skbinfo->skbqueue = ext->skbqueue; + *skbinfo =
ext->skbinfo; } /* Netlink CB args */ diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index 3f1b945a24d5..bfacccff7196 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -426,20 +426,20 @@ ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[], if (!SET_WITH_SKBINFO(set)) return -IPSET_ERR_SKBINFO; fullmark = be64_to_cpu(nla_get_be64(tb[IPSET_ATTR_SKBMARK])); - ext->skbmark = fullmark >> 32; - ext->skbmarkmask = fullmark & 0x; + ext->skbinfo.skbmark = fullmark >> 32; + ext->skbinfo.skbmarkmask = fullmark & 0x; } if (tb[IPSET_ATTR_SKBPRIO]) { if (!SET_WITH_SKBINFO(set)) return -IPSET_ERR_SKBINFO; - ext->skbprio = be32_to_cpu(nla_get_be32( - tb[IPSET_ATTR_SKBPRIO])); + ext->skbinfo.skbprio = + be32_to_cpu(nla_get_be32(tb[IPSET_ATTR_SKBPRIO])); } if (tb[IPSET_ATTR_SKBQUEUE]) { if (!SET_WITH_SKBINFO(set)) return -IPSET_ERR_SKBINFO; - ext->skbqueue = be16_to_cpu(nla_get_be16( - tb[IPSET_ATTR_SKBQUEUE])); + ext->skbinfo.skbqueue = + be16_to_cpu(nla_get_be16(tb[IPSET_ATTR_SKBQUEUE])); } return 0; } diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c index 1bfede7be418..64285702afd5 100644 --- a/net/netfilter/xt_set.c +++ b/net/netfilter/xt_set.c @@ -423,6 +423,8 @@ set_target_v2(struct sk_buff *skb, const struct xt_action_param *par) /* Revision 3 target */ +#define MOPT(opt, member) ((opt).ext.skbinfo.member) + static unsigned int set_target_v3(struct sk_buff *skb, const struct xt_action_param *par) { @@ -453,14 +455,14 @@ set_target_v3(struct sk_buff *skb, const struct xt_action_param *par) if (!ret) return XT_CONTINUE; if (map_opt.cmdflags & IPSET_FLAG_MAP_SKBMARK) - skb->mark = (skb->mark & ~(map_opt.ext.skbmarkmask)) - ^ (map_opt.ext.skbmark); + skb->mark = (skb->mark & ~MOPT(map_opt,skbmarkmask)) + ^ MOPT(map_opt, skbmark); if (map_opt.cmdflags & IPSET_FLAG_MAP_SKBPRIO) -
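The three ideas in this patch (an explicitly padded skbinfo structure, whole-structure assignment in the helpers, and the mark/mask arithmetic in the target) can be sketched in user space. The toy_ names are hypothetical, and because the mask constant is truncated in the archived diff, the 0xffffffff used below is an assumption that the lower 32 bits of the 64-bit attribute carry the mask.

```c
#include <stdint.h>

struct toy_skbinfo {
    uint32_t skbmark;
    uint32_t skbmarkmask;
    uint32_t skbprio;
    uint16_t skbqueue;
    uint16_t pad;       /* explicit padding, mirroring the patch's __pad */
};

struct toy_ext {
    struct toy_skbinfo skbinfo;
    uint32_t timeout;
};

/* Split a 64-bit attribute: upper half is the mark, lower half the mask
 * (the lower-half constant is an assumption, see the note above). */
static void toy_set_fullmark(struct toy_ext *ext, uint64_t fullmark)
{
    ext->skbinfo.skbmark = (uint32_t)(fullmark >> 32);
    ext->skbinfo.skbmarkmask = (uint32_t)(fullmark & UINT32_C(0xffffffff));
}

/* One structure assignment replaces four member-by-member copies. */
static void toy_get_skbinfo(const struct toy_skbinfo *src, struct toy_ext *mext)
{
    mext->skbinfo = *src;
}

/* Same shape as the target's expression: clear masked bits, XOR in the mark. */
static uint32_t toy_apply_mark(uint32_t mark, const struct toy_skbinfo *info)
{
    return (mark & ~info->skbmarkmask) ^ info->skbmark;
}
```

For example, with a mask of 0xff and a mark of 0x12, an existing packet mark of 0xabcd keeps its upper bits and becomes 0xab12.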
[PATCH 20/39] netfilter: ipset: Use kmalloc() in comment extension helper
From: Jozsef Kadlecsik Allocate memory with kmalloc() rather than kzalloc(): the string is immediately initialized so it is unnecessary to zero out the allocated memory area. Ported from a patch proposed by Sergey Popovich. Suggested-by: Sergey Popovich Signed-off-by: Jozsef Kadlecsik --- include/linux/netfilter/ipset/ip_set_comment.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h index bae5c7609be2..5444b1bbe656 100644 --- a/include/linux/netfilter/ipset/ip_set_comment.h +++ b/include/linux/netfilter/ipset/ip_set_comment.h @@ -34,7 +34,7 @@ ip_set_init_comment(struct ip_set_comment *comment, return; if (unlikely(len > IPSET_MAX_COMMENT_SIZE)) len = IPSET_MAX_COMMENT_SIZE; - c = kzalloc(sizeof(*c) + len + 1, GFP_ATOMIC); + c = kmalloc(sizeof(*c) + len + 1, GFP_ATOMIC); if (unlikely(!c)) return; strlcpy(c->str, ext->comment, len + 1); -- 2.1.4
[PATCH 04/39] netfilter: deprecate NF_STOP
NF_STOP is only used by br_netfilter these days, and it can be emulated with a combination of NF_STOLEN plus an explicit call to the ->okfn() function, as Florian suggests. To retain binary compatibility with userspace nf_queue applications, we have to keep NF_STOP around, so libnetfilter_queue userspace applications still work if they use NF_STOP for some exotic reason. Out of tree modules using NF_STOP would break, but we don't care about those. Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter.h | 2 +- net/bridge/br_netfilter_hooks.c | 6 -- net/netfilter/core.c | 2 +- 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h index d93f949d1d9a..7550e9176a54 100644 --- a/include/uapi/linux/netfilter.h +++ b/include/uapi/linux/netfilter.h @@ -13,7 +13,7 @@ #define NF_STOLEN 2 #define NF_QUEUE 3 #define NF_REPEAT 4 -#define NF_STOP 5 +#define NF_STOP 5 /* Deprecated, for userspace nf_queue compatibility.
*/ #define NF_MAX_VERDICT NF_STOP /* we overload the higher bits for encoding auxiliary data such as the queue diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index d0d66faebe90..7e3645fa6339 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -845,8 +845,10 @@ static unsigned int ip_sabotage_in(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { - if (skb->nf_bridge && !skb->nf_bridge->in_prerouting) - return NF_STOP; + if (skb->nf_bridge && !skb->nf_bridge->in_prerouting) { + state->okfn(state->net, state->sk, skb); + return NF_STOLEN; + } return NF_ACCEPT; } diff --git a/net/netfilter/core.c b/net/netfilter/core.c index cb0232c11bc8..14f97b624f98 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -333,7 +333,7 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state) entry = rcu_dereference(state->hook_entries); next_hook: verdict = nf_iterate(skb, state, &entry); - if (verdict == NF_ACCEPT || verdict == NF_STOP) { + if (verdict == NF_ACCEPT) { ret = 1; } else if ((verdict & NF_VERDICT_MASK) == NF_DROP) { kfree_skb(skb); -- 2.1.4
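The emulation described in this patch, calling the continuation directly and then reporting the packet as stolen, can be sketched with toy types. The names toy_pkt, toy_state, toy_deliver and toy_sabotage_in are hypothetical; the value 2 for the stolen verdict comes from the header shown in the diff, while 1 for accept is an assumption based on the same uapi header.

```c
enum { TOY_NF_ACCEPT = 1, TOY_NF_STOLEN = 2 };

struct toy_pkt { int delivered; };

struct toy_state {
    int (*okfn)(struct toy_pkt *pkt);  /* continuation after the hook chain */
};

/* Hypothetical continuation: marks the packet as passed on. */
static int toy_deliver(struct toy_pkt *pkt)
{
    pkt->delivered = 1;
    return 0;
}

/* Hook that wants to skip the remaining hooks without a "stop" verdict. */
static unsigned int toy_sabotage_in(struct toy_pkt *pkt,
                                    const struct toy_state *state, int bypass)
{
    if (bypass) {
        state->okfn(pkt);       /* hand the packet on ourselves ... */
        return TOY_NF_STOLEN;   /* ... and tell the core we took ownership */
    }
    return TOY_NF_ACCEPT;
}
```

Because the hook reports the packet as stolen, the core must not touch it again; delivery has already happened through the continuation.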
[PATCH 07/39] netfilter: use switch() to handle verdict cases from nf_hook_slow()
Use switch() for verdict handling and add explicit handling for NF_STOLEN and other non-conventional verdicts. Signed-off-by: Pablo Neira Ayuso --- net/netfilter/core.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 14f97b624f98..64623374bc5f 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -328,22 +328,32 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state) { struct nf_hook_entry *entry; unsigned int verdict; - int ret = 0; + int ret; entry = rcu_dereference(state->hook_entries); next_hook: verdict = nf_iterate(skb, state, &entry); - if (verdict == NF_ACCEPT) { + switch (verdict & NF_VERDICT_MASK) { + case NF_ACCEPT: ret = 1; - } else if ((verdict & NF_VERDICT_MASK) == NF_DROP) { + break; + case NF_DROP: kfree_skb(skb); ret = NF_DROP_GETERR(verdict); if (ret == 0) ret = -EPERM; - } else if ((verdict & NF_VERDICT_MASK) == NF_QUEUE) { + break; + case NF_QUEUE: ret = nf_queue(skb, state, &entry, verdict); if (ret == 1 && entry) goto next_hook; + /* Fall through. */ + default: + /* Implicit handling for NF_STOLEN, as well as any other non +* conventional verdicts. +*/ + ret = 0; + break; } return ret; } -- 2.1.4
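The verdict decoding that this patch turns into a switch can be tried out in user space. This is a sketch under assumptions, not the kernel function: the TOY_ constants are meant to mirror the uapi verdict encoding as I understand it (low byte holds the verdict, bits from 16 upward carry an errno for drops), the queue case is left out, and toy_verdict_to_ret is a hypothetical name.

```c
#include <errno.h>

#define TOY_NF_DROP       0u
#define TOY_NF_ACCEPT     1u
#define TOY_NF_STOLEN     2u
#define TOY_VERDICT_MASK  0x000000ffu   /* assumed: verdict lives in the low byte */
#define TOY_VERDICT_QBITS 16            /* assumed: errno is stored above bit 16 */

/* Returns 1 to continue, a negative errno for drops, 0 for anything else. */
static int toy_verdict_to_ret(unsigned int verdict)
{
    int ret;

    switch (verdict & TOY_VERDICT_MASK) {
    case TOY_NF_ACCEPT:
        ret = 1;
        break;
    case TOY_NF_DROP:
        ret = -(int)(verdict >> TOY_VERDICT_QBITS);
        if (ret == 0)
            ret = -EPERM;   /* a plain drop defaults to -EPERM */
        break;
    default:
        /* Stolen packets and any unconventional verdict end up here. */
        ret = 0;
        break;
    }
    return ret;
}
```

Masking before the switch matters: a drop verdict that carries an errno in its upper bits would not compare equal to the bare drop value.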
[PATCH 01/39] netfilter: get rid of useless debugging from core
This patch removes compile-time code to catch unconventional verdicts. We have better ways to handle this case these days, e.g. pr_debug(), but even so I don't think this is useful at all, so let's remove it. Signed-off-by: Pablo Neira Ayuso --- net/netfilter/core.c | 9 - 1 file changed, 9 deletions(-) diff --git a/net/netfilter/core.c b/net/netfilter/core.c index 004af030ef1a..3d4aa96cb219 100644 --- a/net/netfilter/core.c +++ b/net/netfilter/core.c @@ -323,15 +323,6 @@ unsigned int nf_iterate(struct sk_buff *skb, repeat: verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state); if (verdict != NF_ACCEPT) { -#ifdef CONFIG_NETFILTER_DEBUG - if (unlikely((verdict & NF_VERDICT_MASK) - > NF_MAX_VERDICT)) { - NFDEBUG("Evil return from %p(%u).\n", - (*entryp)->ops.hook, state->hook); - *entryp = rcu_dereference((*entryp)->next); - continue; - } -#endif if (verdict != NF_REPEAT) return verdict; goto repeat; -- 2.1.4
Re: [PATCH] net: stmmac: Add support for ethtool::nway_reset
Hi Florian, [auto build test WARNING on net-next/master] [also build test WARNING on next-2016] [cannot apply to v4.9-rc5] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-stmmac-Add-support-for-ethtool-nway_reset/20161114-053015 config: x86_64-kexec (attached as .config) compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c: In function 'stmmac_nway_reset': >> drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c:867:22: warning: unused variable 'priv' [-Wunused-variable] struct stmmac_priv *priv = netdev_priv(dev); ^~~~ vim +/priv +867 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c 851 int ret = 0; 852 853 switch (tuna->id) { 854 case ETHTOOL_RX_COPYBREAK: 855 priv->rx_copybreak = *(u32 *)data; 856 break; 857 default: 858 ret = -EINVAL; 859 break; 860 } 861 862 return ret; 863 } 864 865 static int stmmac_nway_reset(struct net_device *dev) 866 { > 867 struct stmmac_priv *priv = netdev_priv(dev); 868 869 if (!dev->phydev) 870 return -ENODEV; 871 872 return genphy_restart_aneg(dev->phydev); 873 } 874 875 static const struct ethtool_ops stmmac_ethtool_ops = { --- 0-DAY kernel test infrastructure, Open Source Technology Center, Intel Corporation https://lists.01.org/pipermail/kbuild-all
Re: [PATCH] net: stmmac: Add support for ethtool::nway_reset
On 13/11/2016 at 13:24, Florian Fainelli wrote: > If we have a PHY device, just invoke genphy_restart_aneg() to restart > auto-negotiation. > > Signed-off-by: Florian Fainelli David, please drop this patch for now: I have another one pending which is going to touch the net_device/phydev interaction, and this one also causes a build warning since priv is not used. Thank you! -- Florian
Re: [PATCH] netfilter: x_tables: simplify IS_ERR_OR_NULL to NULL test
On Fri, Nov 11, 2016 at 01:32:38PM +0100, Julia Lawall wrote: > Since commit 7926dbfa4bc1 ("netfilter: don't use > mutex_lock_interruptible()"), the function xt_find_table_lock can only > return NULL on an error. Simplify the call sites and update the > comment before the function. Applied, thanks Julia!
[PATCH] net: stmmac: Add support for ethtool::nway_reset
If we have a PHY device, just invoke genphy_restart_aneg() to restart auto-negotiation. Signed-off-by: Florian Fainelli --- drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c index 3fe9340b748f..7a487c9ccdea 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c @@ -862,6 +862,16 @@ static int stmmac_set_tunable(struct net_device *dev, return ret; } +static int stmmac_nway_reset(struct net_device *dev) +{ + struct stmmac_priv *priv = netdev_priv(dev); + + if (!dev->phydev) + return -ENODEV; + + return genphy_restart_aneg(dev->phydev); +} + static const struct ethtool_ops stmmac_ethtool_ops = { .begin = stmmac_check_if_running, .get_drvinfo = stmmac_ethtool_getdrvinfo, @@ -886,6 +896,7 @@ static const struct ethtool_ops stmmac_ethtool_ops = { .set_tunable = stmmac_set_tunable, .get_link_ksettings = stmmac_ethtool_get_link_ksettings, .set_link_ksettings = stmmac_ethtool_set_link_ksettings, + .nway_reset = stmmac_nway_reset, }; void stmmac_set_ethtool_ops(struct net_device *netdev) -- 2.9.3
Re: [PATCH v2] ip6_output: ensure flow saddr actually belongs to device
On 11/13/16 12:02 PM, Jason A. Donenfeld wrote: > This puts the IPv6 routing functions in parity with the IPv4 routing > functions. Namely, we now check in v6 that if a flowi6 requests an > saddr, the returned dst actually corresponds to a net device that has > that saddr. This mirrors the v4 logic with __ip_dev_find in > __ip_route_output_key_hash. In the event that the returned dst is not > for a dst with a dev that has the saddr, we return -EINVAL, just like > v4; this makes it easy to use the same error handlers for both cases. > > Signed-off-by: Jason A. Donenfeld > Cc: David Ahern > --- > Changes from v1: >This moves the check to the top and now sees if it's a valid address >on _any_ device, not just the one in dst. > > include/net/ipv6.h| 2 ++ > net/ipv6/ip6_output.c | 28 > 2 files changed, 30 insertions(+) > > diff --git a/include/net/ipv6.h b/include/net/ipv6.h > index 8fed1cd..e5dc14f 100644 > --- a/include/net/ipv6.h > +++ b/include/net/ipv6.h > @@ -914,6 +914,8 @@ struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, > struct flowi6 *fl6, >const struct in6_addr *final_dst); > struct dst_entry *ip6_blackhole_route(struct net *net, > struct dst_entry *orig_dst); > +struct net_device *__ip6_dev_find(struct net *net, struct in6_addr *addr, > + bool devref); > > /* > * skb processing functions > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c > index 6001e78..371170b 100644 > --- a/net/ipv6/ip6_output.c > +++ b/net/ipv6/ip6_output.c > @@ -916,6 +916,30 @@ static struct dst_entry *ip6_sk_dst_check(struct sock > *sk, > return dst; > } > > +/** > + * __ip6_dev_find - find the first device with a given source address.
> + * @net: the net namespace > + * @addr: the source address > + * @devref: if true, take a reference on the found device > + * > + * If a caller uses devref=false, it should be protected by RCU, or RTNL > + */ > +struct net_device *__ip6_dev_find(struct net *net, struct in6_addr *addr, > bool devref) > +{ > + struct net_device *result; > + > + rcu_read_lock(); > + for_each_netdev_rcu(net, result) { > + if (ipv6_chk_addr(net, addr, result, 1)) > + break; > + } > + if (result && devref) > + dev_hold(result); > + rcu_read_unlock(); > + return result; > +} > +EXPORT_SYMBOL(__ip6_dev_find); You don't need a new function to walk all interfaces; just use ipv6_chk_addr with a dev arg of NULL. IPv6 has a hash table with all unicast addresses -- inet6_addr_lst. ipv6_chk_addr is checking that list for the address in question. The actual device is not relevant for verifying the address is a valid local one (though the device can be returned from ifp->idev->dev if ever needed). So drop the above ... > + > static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk, > struct dst_entry **dst, struct flowi6 *fl6) > { > @@ -926,6 +950,10 @@ static int ip6_dst_lookup_tail(struct net *net, const > struct sock *sk, > int err; > int flags = 0; > > + if (!ipv6_addr_any(&fl6->saddr) && > + !__ip6_dev_find(net, &fl6->saddr, false)) ... and just use ipv6_chk_addr here. > + return -EINVAL; > + > /* The correct way to handle this would be to do >* ip6_route_get_saddr, and then ip6_route_output; however, >* the route-specific preferred source forces the >
Re: [PATCH] ip6_output: ensure flow saddr actually belongs to device
On 11/13/16 1:19 PM, Jason A. Donenfeld wrote: > I gave v2 my best shot. Hopefully it's adequate, but I have a feeling > it might be best for you to just code up what you have in mind. nah, you are doing fine. one more comment on v2.
Re: [PATCH net 2/2] r8152: rx descriptor check
On 16-11-13 03:34 PM, Mark Lord wrote: > > The system I use it with is a 32-bit ppc476, with non-coherent RAM, > and using 16KB page sizes. > > The dongle instantly becomes a lot more reliable when r8152.c is updated > to use usb_alloc_coherent() for URB buffers, rather than kmalloc(). > > Not sure why that would be though, as the USB stack normally would handle > kmalloc'd buffers just fine. It is calling the appropriate routines, > which boil down to invalidating the dcache lines (for inbound bulk xfers) > as part of usb_submit_urb(), and yet the problem there persists. > > It could be caused by cache-line sharing with other allocations, but that > seems > unlikely as the kmalloc() size is 16384 bytes per buffer. Perhaps the driver > is somehow accessing the buffer space again after doing usb_submit_urb()? > That would certainly produce this kind of behaviour. > > Or maybe there's just a memory barrier missing somewhere in path. > > The really weird thing is that ASIX-based dongles (which use a different > driver) > don't have this problem, and yet they also use kmalloc'd buffers. > > I have access to the test system only for a day or two a week, > and it takes a few hours to do a good test as to whether something helps or > not. > I'll continue to poke at it as time and New Ideas permit. Oh, and the problems did not exist with the 3.14.xx kernels and earlier. They began to show up when we tried 3.16.xx and all newer kernels. The difference there is that RX checksums were enabled in hardware as of 3.16.xx, and thus the network stack began accepting bad packets from the r8152 driver. I don't know if the ASIX driver uses hardware checksums or just software checksums. That might explain why it is more reliable here. -- Mark Lord Real-Time Remedies Inc. ml...@pobox.com
Re: [PATCH net 2/2] r8152: rx descriptor check
On 16-11-13 12:39 PM, David Miller wrote: > From: Hayes Wang > Date: Fri, 11 Nov 2016 15:15:41 +0800 > >> For some platforms, the data in memory is not the same with the one >> from the device. That is, the data of memory is unbelievable. The >> check is used to find out this situation. >> >> Signed-off-by: Hayes Wang > > I'm all for adding consistency checks, but I disagree with proceeding > in this manner for this. > > If you add this patch now, there is a much smaller likelihood that you > will work with a high priority to figure out _why_ this is happening. > > For all we know this could be a platform bug in the DMA API for the > systems in question. > > It could also be a bug elsewhere in the driver, either in setting up > the descriptor DMA mappings or how the chip is programmed. > > Either way the true cause must be found before we start throwing > changes like this into the driver. I agree. The system I use it with is a 32-bit ppc476, with non-coherent RAM, and using 16KB page sizes. The dongle instantly becomes a lot more reliable when r8152.c is updated to use usb_alloc_coherent() for URB buffers, rather than kmalloc(). Not sure why that would be though, as the USB stack normally would handle kmalloc'd buffers just fine. It is calling the appropriate routines, which boil down to invalidating the dcache lines (for inbound bulk xfers) as part of usb_submit_urb(), and yet the problem there persists. It could be caused by cache-line sharing with other allocations, but that seems unlikely as the kmalloc() size is 16384 bytes per buffer. Perhaps the driver is somehow accessing the buffer space again after doing usb_submit_urb()? That would certainly produce this kind of behaviour. Or maybe there's just a memory barrier missing somewhere in path. The really weird thing is that ASIX-based dongles (which use a different driver) don't have this problem, and yet they also use kmalloc'd buffers.
I have access to the test system only for a day or two a week, and it takes a few hours to do a good test as to whether something helps or not. I'll continue to poke at it as time and New Ideas permit. New Ideas welcome! -- Mark Lord Real-Time Remedies Inc. ml...@pobox.com
Re: [PATCH net-next 00/11] Start adding support for mv88e6390 family
On Sun, Nov 13, 2016 at 12:48:59AM -0500, David Miller wrote: > From: Andrew Lunn > Date: Fri, 11 Nov 2016 03:53:32 +0100 > > > This is the first patchset implementing support for the mv88e6390 > > family. This is a new generation of switch devices and has numerous > > incompatible changes to the registers. These patches allow the switch > > to be detected during probe, and make the statistics unit work. > > > > These patches are insufficient to make the mv88e6390 functional. More > > patches will follow. > > Andrew, this series doesn't apply cleanly to net-next, so you'll > need to respin. Hi David I'm happy to respin, but I'm wondering why they don't apply. What seems to be the issue is you said you have accepted: [PATCH net-next 0/2] Fixes for port refactoring https://marc.info/?l=linux-netdev=147880114928996=1 Yet I don't see these in net-next. And I based this patchset on a tree which included the fixes. Hence they are not applying. Have the fixes really been accepted? Thanks Andrew
Re: [PATCH] ip6_output: ensure flow saddr actually belongs to device
Hi David, On Sun, Nov 13, 2016 at 5:30 PM, David Ahern wrote: > You can't require the address to be on the dst device. e.g., it can be an > address from the loopback/vrf device. > > This block needs to be done at function entry, and pass dev as NULL to mean > is the address assigned to any interface. That gets you the equivalency of > the IPv4 check. I gave v2 my best shot. Hopefully it's adequate, but I have a feeling it might be best for you to just code up what you have in mind. Regards, Jason
Re: [patch net v2 0/2] mlxsw: Couple of fixes
Sun, Nov 13, 2016 at 06:51:33PM CET, da...@davemloft.net wrote: >From: Jiri Pirko >Date: Fri, 11 Nov 2016 16:34:24 +0100 > >> From: Jiri Pirko >> >> Please, queue-up both for stable. Thanks! > >Just to be clear I did make sure to take v2 rather than >v1. Good. Thanks!
Re: [PATCH net-next v2] ipv6: sr: fix IPv6 initialization failure without lwtunnels
On 11/13/2016 06:23 AM, David Miller wrote: > This seems like such a huge mess, quite frankly. > > IPV6-SR has so many strange dependencies, a weird Kconfig option that is > simply controlling what a responsible sysadmin should be allowed to do if > he chooses anyways. > > Every distribution is going to say "¯\_(ツ)_/¯" and just turn the thing > on in their builds. Indeed, the issue is that seg6_iptunnel.o was included in obj-y instead of ipv6-y, triggering the bug when CONFIG_IPV6=m. Fixed with the following modification to the patch (tested with allyesconfig and allmodconfig): diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile index 8979d53..a233136 100644 --- a/net/ipv6/Makefile +++ b/net/ipv6/Makefile @@ -53,6 +53,6 @@ obj-$(subst m,y,$(CONFIG_IPV6)) += inet6_hashtables.o ifneq ($(CONFIG_IPV6),) obj-$(CONFIG_NET_UDP_TUNNEL) += ip6_udp_tunnel.o -obj-$(CONFIG_LWTUNNEL) += seg6_iptunnel.o +ipv6-$(CONFIG_LWTUNNEL) += seg6_iptunnel.o obj-y += mcast_snoop.o endif I agree with you that the way to combine the dependencies is strange, even if they are very few. The part of the IPv6-SR patch that is enabled by default depends on two things: IPV6 and LWTUNNEL. The problem is that LWTUNNEL does not depend on IPV6 and is not necessarily enabled. To fix the bug reported by Lorenzo, I propose to select one of the three following solutions: 1. Make LWTUNNEL always enabled (removing the option). Pros: removes an option. Cons: adds always-enabled code. 2. Create an option IPV6_SEG6_LWTUNNEL, which would select LWTUNNEL and enable the compilation of seg6_iptunnel.o. Pros: logically dissociates the part of IPv6-SR that depends on LWTUNNEL from the core patch and simplifies compilation. Cons: adds an option. 3. Apply the proposed patch with the fix. Pros: does not modify options. Cons: weird conditional compilation. What do you think? David
Re: [net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver
> +static const char slic_stats_strings[][ETH_GSTRING_LEN] = {
> +	"rx_packets ",
> +	"rx_bytes ",
> +	"rx_multicasts ",
> +	"rx_errors ",
> +	"rx_buff_miss ",
> +	"rx_tp_csum ",
> +	"rx_tp_oflow",
> +	"rx_tp_hlen ",
> +	"rx_ip_csum ",
> +	"rx_ip_len ",

Are there any other drivers which pad the statistics strings?

> +static void slic_set_link_autoneg(struct slic_device *sdev)
> +{
> +	unsigned int subid = sdev->pdev->subsystem_device;
> +	u32 val;
> +
> +	if (sdev->is_fiber) {
> +		/* We've got a fiber gigabit interface, and register 4 is
> +		 * different in fiber mode than in copper mode.
> +		 */
> +		/* advertise FD only @1000 Mb */
> +		val = MII_ADVERTISE << 16 | SLIC_PAR_ADV1000XFD |
> +		      SLIC_PAR_ASYMPAUSE_FIBER;
> +		/* enable PAUSE frames */
> +		slic_write(sdev, SLIC_REG_WPHY, val);
> +		/* reset phy, enable auto-neg */
> +		val = MII_BMCR << 16 | SLIC_PCR_RESET | SLIC_PCR_AUTONEG |
> +		      SLIC_PCR_AUTONEG_RST;
> +		slic_write(sdev, SLIC_REG_WPHY, val);
> +	} else {/* copper gigabit */
> +		/* We've got a copper gigabit interface, and register 4 is
> +		 * different in copper mode than in fiber mode.
> +		 */
> +		/* advertise 10/100 Mb modes */
> +		val = MII_ADVERTISE << 16 | SLIC_PAR_ADV100FD |
> +		      SLIC_PAR_ADV100HD | SLIC_PAR_ADV10FD | SLIC_PAR_ADV10HD;
> +		/* enable PAUSE frames */
> +		val |= SLIC_PAR_ASYMPAUSE;
> +		/* required by the Cicada PHY */
> +		val |= SLIC_PAR_802_3;
> +		slic_write(sdev, SLIC_REG_WPHY, val);
> +
> +		/* advertise FD only @1000 Mb */
> +		val = MII_CTRL1000 << 16 | SLIC_PGC_ADV1000FD;
> +		slic_write(sdev, SLIC_REG_WPHY, val);
> +
> +		if (subid != PCI_SUBDEVICE_ID_ALACRITECH_CICADA) {
> +			/* if a Marvell PHY enable auto crossover */
> +			val = SLIC_MIICR_REG_16 | SLIC_MRV_REG16_XOVERON;
> +			slic_write(sdev, SLIC_REG_WPHY, val);
> +
> +			/* reset phy, enable auto-neg */
> +			val = MII_BMCR << 16 | SLIC_PCR_RESET |
> +			      SLIC_PCR_AUTONEG | SLIC_PCR_AUTONEG_RST;
> +			slic_write(sdev, SLIC_REG_WPHY, val);
> +		} else {
> +			/* enable and restart auto-neg (don't reset) */
> +			val = MII_BMCR << 16 | SLIC_PCR_AUTONEG |
> +			      SLIC_PCR_AUTONEG_RST;
> +			slic_write(sdev, SLIC_REG_WPHY, val);
> +		}
> +	}
> +	sdev->autoneg = true;
> +}

Could this be pulled out into a standard PHY driver? All the SLIC_PCR_
defines seem to be the same as those in mii.h. This could be a standard
PHY hidden behind a single register.

Andrew
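To make the mii.h comparison concrete, here is a minimal userspace sketch of how the same WPHY command words could be composed from the generic MII constants. It assumes the layout suggested by the quoted code (PHY register number in the upper 16 bits, register value in the lower 16). The helper names wphy_cmd(), wphy_reset_autoneg(), wphy_adv_10_100() and wphy_adv_1000fd() are hypothetical, and the mapping of SLIC_PAR_802_3 to ADVERTISE_CSMA and SLIC_PAR_ASYMPAUSE to ADVERTISE_PAUSE_ASYM is a guess that would have to be checked against the driver's own defines. The constants are copied locally from include/uapi/linux/mii.h so that the snippet builds outside the kernel.

```c
#include <stdint.h>

/* Values copied from include/uapi/linux/mii.h */
#define MII_BMCR		0x00	/* Basic mode control register */
#define MII_ADVERTISE		0x04	/* Advertisement control reg   */
#define MII_CTRL1000		0x09	/* 1000BASE-T control          */

#define BMCR_ANRESTART		0x0200	/* Auto negotiation restart    */
#define BMCR_ANENABLE		0x1000	/* Enable auto negotiation     */
#define BMCR_RESET		0x8000	/* Reset to default state      */

#define ADVERTISE_CSMA		0x0001	/* IEEE 802.3 selector         */
#define ADVERTISE_10HALF	0x0020
#define ADVERTISE_10FULL	0x0040
#define ADVERTISE_100HALF	0x0080
#define ADVERTISE_100FULL	0x0100
#define ADVERTISE_PAUSE_ASYM	0x0800

#define ADVERTISE_1000FULL	0x0200	/* Bit in MII_CTRL1000         */

/* Hypothetical helper: pack a PHY register number and a 16-bit value
 * into one command word, assuming the reg << 16 | value layout seen in
 * the quoted patch.
 */
static inline uint32_t wphy_cmd(unsigned int reg, uint16_t val)
{
	return ((uint32_t)reg << 16) | val;
}

/* Reset the PHY, then enable and restart auto-negotiation. */
static inline uint32_t wphy_reset_autoneg(void)
{
	return wphy_cmd(MII_BMCR,
			BMCR_RESET | BMCR_ANENABLE | BMCR_ANRESTART);
}

/* Advertise all 10/100 modes plus asymmetric pause (copper path). */
static inline uint32_t wphy_adv_10_100(void)
{
	return wphy_cmd(MII_ADVERTISE,
			ADVERTISE_CSMA | ADVERTISE_10HALF | ADVERTISE_10FULL |
			ADVERTISE_100HALF | ADVERTISE_100FULL |
			ADVERTISE_PAUSE_ASYM);
}

/* Advertise 1000 Mb full duplex only. */
static inline uint32_t wphy_adv_1000fd(void)
{
	return wphy_cmd(MII_CTRL1000, ADVERTISE_1000FULL);
}
```

If the SLIC_* values do turn out to match bit for bit, the sequence in slic_set_link_autoneg() reduces to ordinary MII register writes, which is what a generic PHY driver behind an MDIO bus wrapper around SLIC_REG_WPHY would issue anyway.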
Re: Debugging Ethernet issues
On 13/11/2016 at 11:51, Mason wrote:
> On 13/11/2016 04:09, Andrew Lunn wrote:
>
>> Mason wrote:
>>
>>> When connected to a Gigabit switch
>>> 3.4 negotiates a LAN DHCP setup instantly
>>> 4.7 requires over 5 seconds to do so
>>
>> When you run tcpdump on the DHCP server, are you noticing the first
>> request is missing?
>>
>> What can happen is the dhclient gets started immediately and sends out
>> its first request before auto-negotiation has finished. So this first packet
>> gets lost. The retransmit after a few seconds is then successful.
>
> I will run tcpdump on the server as I run udhcpc on the client
> for Linux 3.4 vs 4.7
>
> Do you know what would make auto-negotiation fail at 100 Mbps
> on 4.7? (whereas it succeeds on 3.4)
>
> (Thinking out loud) If the problem were in auto-negotiation,
> then it should work if I hard-code speed and duplex using
> ethtool, right? (IIRC, hard-coding doesn't help.)

I would start with checking basic things:

- does your Ethernet driver get a link UP being reported correctly
  (netif_carrier_ok returns 1)?
- if you let the bootloader configure the PHY and utilize the Generic PHY
  driver instead of the Atheros PHY driver, does the problem appear as well?
- what do transmit/receive counters on the Ethernet driver/MAC return?

--
Florian
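For the first and last checks in the list above, the carrier state and the basic counters can also be read from userspace through sysfs, without instrumenting the driver. The following is only a sketch: the helper names read_long_file() and read_netdev_attr() are made up for illustration, and it relies on the standard /sys/class/net/<ifname>/ attributes (carrier, statistics/tx_packets, statistics/rx_packets and so on). Note that reading carrier fails while the interface is administratively down, which the helper reports as an error.

```c
#include <stdio.h>

/* Read a single decimal integer from a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_long_file(const char *path, long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lld", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Read /sys/class/net/<ifname>/<attr>, e.g. attr = "carrier" or
 * attr = "statistics/tx_packets".
 * Returns 0 on success, -1 on failure.
 */
static int read_netdev_attr(const char *ifname, const char *attr,
			    long long *out)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "/sys/class/net/%s/%s",
			 ifname, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return read_long_file(path, out);
}
```

Calling read_netdev_attr("eth0", "carrier", &v) (with "eth0" standing in for the real interface name) should yield 1 once the driver has called netif_carrier_on(). Sampling statistics/tx_packets and statistics/rx_packets before and after a DHCP attempt shows whether frames are leaving and arriving at the MAC at all.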
Re: Debugging Ethernet issues
On 13/11/2016 04:09, Andrew Lunn wrote:

> Mason wrote:
>
>> When connected to a Gigabit switch
>> 3.4 negotiates a LAN DHCP setup instantly
>> 4.7 requires over 5 seconds to do so
>
> When you run tcpdump on the DHCP server, are you noticing the first
> request is missing?
>
> What can happen is the dhclient gets started immediately and sends out
> its first request before auto-negotiation has finished. So this first packet
> gets lost. The retransmit after a few seconds is then successful.

I will run tcpdump on the server as I run udhcpc on the client
for Linux 3.4 vs 4.7

Do you know what would make auto-negotiation fail at 100 Mbps
on 4.7? (whereas it succeeds on 3.4)

(Thinking out loud) If the problem were in auto-negotiation,
then it should work if I hard-code speed and duplex using
ethtool, right? (IIRC, hard-coding doesn't help.)

Regards.
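One way to test Andrew's hypothesis from userspace is to delay the DHCP client until the kernel reports carrier, and see whether the 5 second delay on 4.7 goes away. The sketch below is a diagnostic aid under stated assumptions, not a fix: carrier_is_up() and wait_for_carrier() are hypothetical helpers, the 100 ms poll interval is arbitrary, and the path passed in is expected to be the interface's sysfs carrier file (for example /sys/class/net/eth0/carrier, with eth0 assumed).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Return 1 if the file at carrier_path contains "1", 0 otherwise.
 * Open or read errors (e.g. interface down) count as "no carrier".
 */
static int carrier_is_up(const char *carrier_path)
{
	FILE *f = fopen(carrier_path, "r");
	int val = 0;

	if (!f)
		return 0;
	if (fscanf(f, "%d", &val) != 1)
		val = 0;
	fclose(f);
	return val == 1;
}

/* Poll the carrier file every 100 ms until the link is up or
 * timeout_ms has elapsed. Returns 0 if the link came up, -1 on timeout.
 */
static int wait_for_carrier(const char *carrier_path, int timeout_ms)
{
	const struct timespec step = { .tv_sec = 0,
				       .tv_nsec = 100 * 1000 * 1000 };
	int waited_ms = 0;

	for (;;) {
		if (carrier_is_up(carrier_path))
			return 0;
		if (waited_ms >= timeout_ms)
			return -1;
		nanosleep(&step, NULL);
		waited_ms += 100;
	}
}
```

If udhcpc obtains a lease immediately when it is started only after wait_for_carrier() returns 0, the lost first request is the likely cause. If the delay persists even with carrier already up, the problem lies elsewhere, which would be consistent with hard-coding speed and duplex not helping.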