date:20161113

Re: stmmac/RTL8211F/Meson GXBB: TX throughput problems

2016-11-13 Thread Giuseppe CAVALLARO


Hello Martin

On 11/7/2016 6:37 PM, Martin Blumenstingl wrote:

Hi Peppe,

On Mon, Nov 7, 2016 at 11:59 AM, Giuseppe CAVALLARO
 wrote:

In the meantime, I will read again the thread just to see if
there is something I am missing.

if you are re-reading this thread: please note that there are two
devices in discussion here!


many thx for the sum :-)


Both are using the Amlogic S905 (GXBB) SoC and both are experiencing
the same issue (Gbit TX issues, RX with Gbit speeds and RX/TX with
100Mbit speed are NOT affected):
- Odroid-C2 (used by Jerome and André Roth)
- Tronsmart Vega S95 Meta (my device)

The (Gbit TX) problem seems to be gone on the Odroid-C2 with Jerome's
patch which disables EEE in drivers/net/phy/realtek.c (at least in his
tests, I don't have that device so I can't verify).
The same problem still appears on my Tronsmart Vega S95 Meta even with
the patched PHY driver.


just a doubt, maybe useful: in the past, on a Gigabit setup, I saw similar
problems and they were due to retiming, so maybe a 2ns delay could be necessary
(or better granularity via PAD logic if available).

Regards
Peppe


Unfortunately I don't have a second device to rule out that my
Tronsmart Vega S95 Meta could be broken (not unlikely, I get DDR
errors from time to time in u-boot). Maybe Andreas Faerber can test
ethernet with and without Jerome's patch on one of his Tronsmart
devices.


Regards,
Martin

Re: [PATCH net] net: stmmac: Fix lack of link transition for fixed PHYs

2016-11-13 Thread Giuseppe CAVALLARO


On 11/14/2016 2:50 AM, Florian Fainelli wrote:

Commit 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch
is attached") added some logic to avoid polling the fixed PHY and
therefore invoking the adjust_link callback more than once, since this
is a fixed PHY and link events won't be generated.

This works fine the first time, because we start with phydev->irq =
PHY_POLL, so we call adjust_link, then we set phydev->irq =
PHY_IGNORE_INTERRUPT and we stop polling the PHY.

Now, if we called ndo_close(), which calls both phy_stop() and does an
explicit netif_carrier_off(), we end up with a link down. Upon calling
ndo_open() again, despite starting the PHY state machine, we have
PHY_IGNORE_INTERRUPT set, and we generate no link event at all, so the
link is permanently down.
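
The open/close/open sequence described above can be modelled in user space. This is only a sketch, not driver code: struct fake_phy, fake_open(), fake_close() and the force_poll_on_init flag are hypothetical names, and the numeric values of the two constants are assumed to match the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel constants; the numeric values are an assumption. */
#define PHY_POLL              (-1)
#define PHY_IGNORE_INTERRUPT  (-2)

struct fake_phy {
	int irq;        /* PHY_POLL or PHY_IGNORE_INTERRUPT */
	bool carrier;   /* what the MAC believes about the link */
	bool fixed;     /* models phydev->is_pseudo_fixed_link */
};

/* Models adjust_link for a fixed PHY: report link up once, then stop polling. */
static void fake_adjust_link(struct fake_phy *phy)
{
	phy->carrier = true;
	if (phy->fixed)
		phy->irq = PHY_IGNORE_INTERRUPT;
}

/* Models ndo_open(): adjust_link is only reached while the PHY is polled. */
static void fake_open(struct fake_phy *phy, bool force_poll_on_init)
{
	if (force_poll_on_init && phy->fixed)
		phy->irq = PHY_POLL;    /* the behaviour the patch adds */
	if (phy->irq == PHY_POLL)
		fake_adjust_link(phy);
}

/* Models ndo_close(): phy_stop() plus an explicit netif_carrier_off(). */
static void fake_close(struct fake_phy *phy)
{
	phy->carrier = false;
}

/* Runs open/close/open and returns the carrier state after the second open. */
static bool link_after_reopen(bool force_poll_on_init)
{
	struct fake_phy phy = { .irq = PHY_POLL, .carrier = false, .fixed = true };

	fake_open(&phy, force_poll_on_init);
	assert(phy.carrier);            /* the first open always brings the link up */
	fake_close(&phy);
	fake_open(&phy, force_poll_on_init);
	return phy.carrier;
}
```

In this model, link_after_reopen(false) leaves the carrier down after the second open, while link_after_reopen(true) brings it back up.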

Fixes: 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch is attached")
Signed-off-by: Florian Fainelli 
---
Alexandre, Peppe,

The original patch is already a hack, but since this is a bugfix, I took the
same approach that you did here to backport this to -stable kernels.



Acked-by: Giuseppe Cavallaro 



 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 10909c9c0033..03dbf8e89c4c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -882,6 +882,13 @@ static int stmmac_init_phy(struct net_device *dev)
return -ENODEV;
}

+   /* stmmac_adjust_link will change this to PHY_IGNORE_INTERRUPT to avoid
+    * subsequent PHY polling, make sure we force a link transition if
+    * we have a UP/DOWN/UP transition
+    */
+   if (phydev->is_pseudo_fixed_link)
+       phydev->irq = PHY_POLL;
+
pr_debug("stmmac_init_phy:  %s: attached to PHY (UID 0x%x)"
 " Link = %d\n", dev->name, phydev->phy_id, phydev->link);

RE: [PATCH net 2/2] r8152: rx descriptor check

2016-11-13 Thread Hayes Wang

Mark Lord [mailto:ml...@pobox.com]
> Sent: Monday, November 14, 2016 4:34 AM
[...]
> Perhaps the driver
> is somehow accessing the buffer space again after doing usb_submit_urb()?
> That would certainly produce this kind of behaviour.

I don't think so. First, the driver only reads the received buffer.
That is, the driver does not change (or write) the data. Second,
the driver loses the pointer to the received buffer
after submitting the urb to the USB host controller, until the
transfer is completed by the USB host controller. That is, the
driver doesn't know how to access the buffer after calling usb_submit_urb().

Best Regards,
Hayes

RE: [PATCH net 2/2] r8152: rx descriptor check

2016-11-13 Thread Hayes Wang

David Miller [mailto:da...@davemloft.net]
> Sent: Monday, November 14, 2016 1:40 AM
[...]
> If you add this patch now, there is a much smaller likelihood that you
> will work with a high priority to figure out _why_ this is happening.
> 
> For all we know this could be a platform bug in the DMA API for the
> systems in question.
> 
> It could also be a bug elsewhere in the driver, either in setting up
> the descriptor DMA mappings or how the chip is programmed.
> 
> Either way the true cause must be found before we start throwing
> changes like this into the driver.

Our hw engineer could check our device, and I could check the
driver. However, for the other parts, such as the USB host
controller or memory, it is difficult for me to make sure whether
they are correct or not. I could only promise our devices and
driver work fine.

Best Regards,
Hayes

Re: Long delays creating a netns after deleting one (possibly RCU related)

2016-11-13 Thread Cong Wang

On Fri, Nov 11, 2016 at 4:55 PM, Cong Wang  wrote:
> On Fri, Nov 11, 2016 at 4:23 PM, Paul E. McKenney
>  wrote:
>>
>> Ah!  This net_mutex is different than RTNL.  Should synchronize_net() be
>> modified to check for net_mutex being held in addition to the current
>> checks for RTNL being held?
>>
>
> Good point!
>
> Like commit be3fc413da9eb17cce0991f214ab0, checking
> for net_mutex for this case seems to be an optimization, I assume
> synchronize_rcu_expedited() and synchronize_rcu() have the same
> behavior...

Thinking a bit more, I think commit be3fc413da9eb17cce0991f
gets rtnl_is_locked() wrong: the lock could be held by another
process, not by the current one, therefore it should be
lockdep_rtnl_is_held() which, however, is defined only when LOCKDEP
is enabled... Sigh.

I don't see any better way than letting callers decide if they want the
expedited version or not, but this requires changes of all callers of
synchronize_net(). Hm.
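
The difference between "somebody holds the lock" and "the current task holds the lock" can be shown with a small user-space model. This is only a sketch: struct owned_lock and all the function names are hypothetical, and the two predicates are loose analogues of rtnl_is_locked() and lockdep_rtnl_is_held(), not their implementations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock that remembers which task took it (0 means unlocked). */
struct owned_lock {
	int owner;
};

static void lock_by(struct owned_lock *l, int task)
{
	assert(l->owner == 0);  /* the model has no contention handling */
	l->owner = task;
}

static void unlock_by(struct owned_lock *l, int task)
{
	assert(l->owner == task);
	l->owner = 0;
}

/* Analogue of rtnl_is_locked(): true if anyone holds the lock. */
static bool is_locked(const struct owned_lock *l)
{
	return l->owner != 0;
}

/* Analogue of lockdep_rtnl_is_held(): true only if the asking task holds it. */
static bool is_held_by(const struct owned_lock *l, int task)
{
	return l->owner == task;
}

/* Naive choice: go expedited whenever the lock is taken by anybody. */
static bool want_expedited_naive(const struct owned_lock *l)
{
	return is_locked(l);
}

/* Stricter choice: go expedited only when the caller itself holds the lock. */
static bool want_expedited_strict(const struct owned_lock *l, int task)
{
	return is_held_by(l, task);
}
```

With task 1 holding the lock, the naive check also answers "yes" for task 2, which is the false positive described above.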

RE: [PATCH net 2/2] r8152: rx descriptor check

2016-11-13 Thread Hayes Wang

Francois Romieu [mailto:rom...@fr.zoreil.com]
> Sent: Friday, November 11, 2016 8:13 PM
[...]
> Invalid packet size corrupted receive descriptors in Realtek's device
> reminds of CVE-2009-4537.

Do you mean that the driver would get a packet exceeding the size
which is set in RxMaxSize? I checked it with our hw engineers.
They don't see any issue with RxMaxSize, and their test of the
RxMaxSize register is fine.

> Is the silicon of both devices different enough to prevent the same
> exploit to happen ?

For this case, I don't think the device provides an invalid value
for the receive descriptors. However, the driver sees a different
value. That is why I say the memory cannot be trusted.

Best Regards,
Hayes

Re: [PATCH] genetlink: fix unsigned int comparison with less than zero

2016-11-13 Thread Cong Wang

On Sun, Nov 13, 2016 at 9:15 AM, David Miller  wrote:
> I've committed the following to net-next:
>
> 
> [PATCH] genetlink: Make family a signed integer.
>
> The idr_alloc(), idr_remove(), et al. routines all expect IDs to be
> signed integers.  Therefore make the genl_family member 'id' signed
> too.

This is exactly what I replied to Johannes.

Thanks for the fix!
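
Why the signedness matters can be seen in a small user-space sketch. Everything here is hypothetical (fake_alloc_id() and the two structs are illustrations, not the genetlink code); the only assumption carried over from the thread is that the allocator reports failure as a negative errno.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical allocator that fails the way idr_alloc() reports errors. */
static int fake_alloc_id(void)
{
	return -ENOSPC;
}

struct family_unsigned { unsigned int id; };
struct family_signed   { int id; };

/* With a signed member the usual "id < 0" error check works. */
static bool failed_signed(void)
{
	struct family_signed f;

	f.id = fake_alloc_id();
	return f.id < 0;
}

/* With an unsigned member the negative errno wraps to a huge positive
 * value, so a "< 0" test on it could never be true.
 */
static bool unsigned_id_is_huge(void)
{
	struct family_unsigned f;

	assert(fake_alloc_id() < 0);    /* the allocator really did fail */
	f.id = (unsigned int)fake_alloc_id();
	return f.id > (unsigned int)INT_MAX;
}
```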

Re: [LKP] [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!

2016-11-13 Thread Ye Xiaolong

On 11/14, Fengguang Wu wrote:
>>>Hi guys.
>>>
>>>I took a look at the commit again and I do not see how this can happen.
>>>
>>>Are you sure patch was properly applied ?
>>>
>>>In particular, the following extract is obscure for me :
>>>
>>>
>>>>https://github.com/0day-ci/linux Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
>>>>commit 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb ("net: __skb_flow_dissect() must cap its return value")

>>
>>Hi,
>>
>>The above two lines mean that the 0day repo set up a new branch
>>"Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839"
>>which is based on net/master, then applied your patch on top of it;
>>the commit id is 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb.
>
>Xiaolong, it may be more helpful to show the base tree where we apply
>the patch to. And the final url:
>
>https://github.com/0day-ci/linux/tree/Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
>

Ok, I'll improve the appearance to make it clearer.

Thanks,
Xiaolong
>Thanks,
>Fengguang

[PATCH net-next v3 2/7] vxlan: avoid checking socket multiple times.

2016-11-13 Thread Pravin B Shelar

Check the vxlan socket in vxlan6_get_route().

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 756d826..9adeff9 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1830,6 +1830,7 @@ static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan,
 
 #if IS_ENABLED(CONFIG_IPV6)
 static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan,
+ struct vxlan_sock *sock6,
  struct sk_buff *skb, int oif, u8 tos,
  __be32 label,
  const struct in6_addr *daddr,
@@ -1837,7 +1838,6 @@ static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan,
  struct dst_cache *dst_cache,
  const struct ip_tunnel_info *info)
 {
-   struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
struct dst_entry *ndst;
struct flowi6 fl6;
@@ -2069,11 +2069,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
struct dst_entry *ndst;
u32 rt6i_flags;
 
-   if (!sock6)
-   goto drop;
-   sk = sock6->sock->sk;
-
-   ndst = vxlan6_get_route(vxlan, skb,
+   ndst = vxlan6_get_route(vxlan, sock6, skb,
rdst ? rdst->remote_ifindex : 0, tos,
label, &dst->sin6.sin6_addr,
&src->sin6.sin6_addr,
@@ -2093,6 +2089,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
goto tx_error;
}
 
+   sk = sock6->sock->sk;
/* Bypass encapsulation if the destination is local */
rt6i_flags = ((struct rt6_info *)ndst)->rt6i_flags;
if (!info && rt6i_flags & RTF_LOCAL &&
@@ -2432,9 +2429,10 @@ static int vxlan_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb)
ip_rt_put(rt);
} else {
 #if IS_ENABLED(CONFIG_IPV6)
+   struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
struct dst_entry *ndst;
 
-   ndst = vxlan6_get_route(vxlan, skb, 0, info->key.tos,
+   ndst = vxlan6_get_route(vxlan, sock6, skb, 0, info->key.tos,
info->key.label, &info->key.u.ipv6.dst,
&info->key.u.ipv6.src, NULL, info);
if (IS_ERR(ndst))
-- 
1.9.1

[PATCH net-next v3 3/7] vxlan: simplify exception handling

2016-11-13 Thread Pravin B Shelar

The vxlan egress path error handling has become complicated; it
needs to handle both IPv4 and IPv6 tunnel cases.
An earlier patch removed vlan handling from vxlan_build_skb(), so
vxlan_build_skb() does not need to free the skb and we can simplify
the xmit path by having a single error handler for both types of
tunnels.

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c | 46 +++---
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9adeff9..8bb58f6 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1753,11 +1753,11 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
/* Need space for new headers (invalidates iph ptr) */
err = skb_cow_head(skb, min_headroom);
if (unlikely(err))
-   goto out_free;
+   return err;
 
err = iptunnel_handle_offloads(skb, type);
if (err)
-   goto out_free;
+   return err;
 
vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
vxh->vx_flags = VXLAN_HF_VNI;
@@ -1781,16 +1781,12 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
if (vxflags & VXLAN_F_GPE) {
err = vxlan_build_gpe_hdr(vxh, vxflags, skb->protocol);
if (err < 0)
-   goto out_free;
+   return err;
inner_protocol = skb->protocol;
}
 
skb_set_inner_protocol(skb, inner_protocol);
return 0;
-
-out_free:
-   kfree_skb(skb);
-   return err;
 }
 
 static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan,
@@ -1927,13 +1923,13 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
struct ip_tunnel_info *info;
struct vxlan_dev *vxlan = netdev_priv(dev);
struct sock *sk;
-   struct rtable *rt = NULL;
const struct iphdr *old_iph;
union vxlan_addr *dst;
union vxlan_addr remote_ip, local_ip;
union vxlan_addr *src;
struct vxlan_metadata _md;
struct vxlan_metadata *md = &_md;
+   struct dst_entry *ndst = NULL;
__be16 src_port = 0, dst_port;
__be32 vni, label;
__be16 df = 0;
@@ -2009,6 +2005,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
if (dst->sa.sa_family == AF_INET) {
struct vxlan_sock *sock4 = rcu_dereference(vxlan->vn4_sock);
+   struct rtable *rt;
 
if (!sock4)
goto drop;
@@ -2030,7 +2027,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
netdev_dbg(dev, "circular route to %pI4\n",
   &dst->sin.sin_addr.s_addr);
dev->stats.collisions++;
-   goto rt_tx_error;
+   ip_rt_put(rt);
+   goto tx_error;
}
 
/* Bypass encapsulation if the destination is local */
@@ -2053,12 +2051,13 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT)
df = htons(IP_DF);
 
+   ndst = &rt->dst;
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
-   err = vxlan_build_skb(skb, &rt->dst, sizeof(struct iphdr),
+   err = vxlan_build_skb(skb, ndst, sizeof(struct iphdr),
  vni, md, flags, udp_sum);
if (err < 0)
-   goto xmit_tx_error;
+   goto tx_error;
 
udp_tunnel_xmit_skb(rt, sk, skb, src->sin.sin_addr.s_addr,
dst->sin.sin_addr.s_addr, tos, ttl, df,
@@ -2066,7 +2065,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 #if IS_ENABLED(CONFIG_IPV6)
} else {
struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
-   struct dst_entry *ndst;
u32 rt6i_flags;
 
ndst = vxlan6_get_route(vxlan, sock6, skb,
@@ -2078,13 +2076,13 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
netdev_dbg(dev, "no route to %pI6\n",
   &dst->sin6.sin6_addr);
dev->stats.tx_carrier_errors++;
+   ndst = NULL;
goto tx_error;
}
 
if (ndst->dev == dev) {
netdev_dbg(dev, "circular route to %pI6\n",
   &dst->sin6.sin6_addr);
-   dst_release(ndst);
dev->stats.collisions++;
goto tx_error;
}
@@ -2096,12 +2094,12 @@ static void
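
The ownership change in this patch (the builder no longer frees the skb, the caller frees it at one error label) can be illustrated with a user-space sketch. All names below are hypothetical stand-ins; nothing here is the vxlan code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fake_skb {
	int headroom;
};

static int error_frees;  /* buffers released on the error path */
static int sent;         /* buffers handed on successfully */

/* Builder: reports failure but leaves ownership with the caller. */
static int fake_build_skb(struct fake_skb *skb, int needed)
{
	if (skb->headroom < needed)
		return -ENOMEM;
	skb->headroom -= needed;
	return 0;
}

/* Caller: every failure funnels into one label that frees exactly once. */
static int fake_xmit_one(int headroom, int needed)
{
	struct fake_skb *skb = malloc(sizeof(*skb));
	int err;

	if (!skb)
		return -ENOMEM;
	skb->headroom = headroom;

	err = fake_build_skb(skb, needed);
	if (err < 0)
		goto tx_error;

	free(skb);      /* stands in for passing the buffer down the stack */
	sent++;
	return 0;

tx_error:
	free(skb);
	error_frees++;
	return err;
}
```

Because the builder never frees, there is no path on which the buffer could be released twice, and the caller does not need separate labels per failure site.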

[PATCH net-next v3 1/7] vxlan: avoid vlan processing in vxlan device.

2016-11-13 Thread Pravin B Shelar

The VXLAN device does not have special handling for vlan tagging on egress.
Therefore it does not make sense to expose the vlan offloading feature.
This patch does not change vxlan functionality.

Signed-off-by: Pravin B Shelar 
Acked-by: Jiri Benc 
---
 drivers/net/vxlan.c |  9 +
 include/linux/if_vlan.h | 16 
 2 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cb5cc7c..756d826 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1748,18 +1748,13 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
}
 
min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
-   + VXLAN_HLEN + iphdr_len
-   + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
+   + VXLAN_HLEN + iphdr_len;
 
/* Need space for new headers (invalidates iph ptr) */
err = skb_cow_head(skb, min_headroom);
if (unlikely(err))
goto out_free;
 
-   skb = vlan_hwaccel_push_inside(skb);
-   if (WARN_ON(!skb))
-   return -ENOMEM;
-
err = iptunnel_handle_offloads(skb, type);
if (err)
goto out_free;
@@ -2527,10 +2522,8 @@ static void vxlan_setup(struct net_device *dev)
dev->features   |= NETIF_F_GSO_SOFTWARE;
 
dev->vlan_features = dev->features;
-   dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
dev->hw_features |= NETIF_F_GSO_SOFTWARE;
-   dev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
netif_keep_dst(dev);
dev->priv_flags |= IFF_NO_QUEUE;
 
diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 3319d97..8d5fcd6 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -399,22 +399,6 @@ static inline struct sk_buff *__vlan_hwaccel_push_inside(struct sk_buff *skb)
skb->vlan_tci = 0;
return skb;
 }
-/*
- * vlan_hwaccel_push_inside - pushes vlan tag to the payload
- * @skb: skbuff to tag
- *
- * Checks is tag is present in @skb->vlan_tci and if it is, it pushes the
- * VLAN tag from @skb->vlan_tci inside to the payload.
- *
- * Following the skb_unshare() example, in case of error, the calling function
- * doesn't have to worry about freeing the original skb.
- */
-static inline struct sk_buff *vlan_hwaccel_push_inside(struct sk_buff *skb)
-{
-   if (skb_vlan_tag_present(skb))
-   skb = __vlan_hwaccel_push_inside(skb);
-   return skb;
-}
 
 /**
  * __vlan_hwaccel_put_tag - hardware accelerated VLAN inserting
-- 
1.9.1

[PATCH net-next v3 7/7] vxlan: remove unsed vxlan_dev_dst_port()

2016-11-13 Thread Pravin B Shelar

Signed-off-by: Pravin B Shelar 
---
 include/net/vxlan.h | 10 --
 1 file changed, 10 deletions(-)

diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 308adc4..49a5920 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -281,16 +281,6 @@ struct vxlan_dev {
 struct net_device *vxlan_dev_create(struct net *net, const char *name,
u8 name_assign_type, struct vxlan_config *conf);
 
-static inline __be16 vxlan_dev_dst_port(struct vxlan_dev *vxlan,
-   unsigned short family)
-{
-#if IS_ENABLED(CONFIG_IPV6)
-   if (family == AF_INET6)
-   return inet_sk(vxlan->vn6_sock->sock->sk)->inet_sport;
-#endif
-   return inet_sk(vxlan->vn4_sock->sock->sk)->inet_sport;
-}
-
 static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 netdev_features_t features)
 {
-- 
1.9.1

[PATCH net-next v3 4/7] vxlan: improve vxlan route lookup checks.

2016-11-13 Thread Pravin B Shelar

Move the route sanity checks to the respective vxlan[4/6]_get_route functions.
This allows us to perform all sanity checks before caching the dst so
that we can avoid these checks on subsequent packets.
This gives more accurate metadata information for packets from
fill_metadata_dst().

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c | 77 ++---
 1 file changed, 38 insertions(+), 39 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 8bb58f6..aabb918 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1789,7 +1789,8 @@ static int vxlan_build_skb(struct sk_buff *skb, struct dst_entry *dst,
return 0;
 }
 
-static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan,
+static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan, struct net_device *dev,
+ struct vxlan_sock *sock4,
  struct sk_buff *skb, int oif, u8 tos,
  __be32 daddr, __be32 *saddr,
  struct dst_cache *dst_cache,
@@ -1799,6 +1800,9 @@ static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan,
struct rtable *rt = NULL;
struct flowi4 fl4;
 
+   if (!sock4)
+   return ERR_PTR(-EIO);
+
if (tos && !info)
use_cache = false;
if (use_cache) {
@@ -1816,16 +1820,26 @@ static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan,
fl4.saddr = *saddr;
 
rt = ip_route_output_key(vxlan->net, &fl4);
-   if (!IS_ERR(rt)) {
+   if (likely(!IS_ERR(rt))) {
+   if (rt->dst.dev == dev) {
+   netdev_dbg(dev, "circular route to %pI4\n", &daddr);
+   ip_rt_put(rt);
+   return ERR_PTR(-ELOOP);
+   }
+
*saddr = fl4.saddr;
if (use_cache)
dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr);
+   } else {
+   netdev_dbg(dev, "no route to %pI4\n", &daddr);
+   return ERR_PTR(-ENETUNREACH);
}
return rt;
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
 static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan,
+ struct net_device *dev,
  struct vxlan_sock *sock6,
  struct sk_buff *skb, int oif, u8 tos,
  __be32 label,
@@ -1861,8 +1875,16 @@ static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan,
err = ipv6_stub->ipv6_dst_lookup(vxlan->net,
 sock6->sock->sk,
 &ndst, &fl6);
-   if (err < 0)
-   return ERR_PTR(err);
+   if (unlikely(err < 0)) {
+   netdev_dbg(dev, "no route to %pI6\n", daddr);
+   return ERR_PTR(-ENETUNREACH);
+   }
+
+   if (unlikely(ndst->dev == dev)) {
+   netdev_dbg(dev, "circular route to %pI6\n", daddr);
+   dst_release(ndst);
+   return ERR_PTR(-ELOOP);
+   }
 
*saddr = fl6.saddr;
if (use_cache)
@@ -1929,8 +1951,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
union vxlan_addr *src;
struct vxlan_metadata _md;
struct vxlan_metadata *md = &_md;
-   struct dst_entry *ndst = NULL;
__be16 src_port = 0, dst_port;
+   struct dst_entry *ndst = NULL;
__be32 vni, label;
__be16 df = 0;
__u8 tos, ttl;
@@ -2007,29 +2029,14 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
struct vxlan_sock *sock4 = rcu_dereference(vxlan->vn4_sock);
struct rtable *rt;
 
-   if (!sock4)
-   goto drop;
-   sk = sock4->sock->sk;
-
-   rt = vxlan_get_route(vxlan, skb,
+   rt = vxlan_get_route(vxlan, dev, sock4, skb,
 rdst ? rdst->remote_ifindex : 0, tos,
 dst->sin.sin_addr.s_addr,
 &src->sin.sin_addr.s_addr,
 dst_cache, info);
-   if (IS_ERR(rt)) {
-   netdev_dbg(dev, "no route to %pI4\n",
-  &dst->sin.sin_addr.s_addr);
-   dev->stats.tx_carrier_errors++;
-   goto tx_error;
-   }
-
-   if (rt->dst.dev == dev) {
-   netdev_dbg(dev, "circular route to %pI4\n",
-  &dst->sin.sin_addr.s_addr);
-   dev->stats.collisions++;
-   ip_rt_put(rt);
+   if (IS_ERR(rt))
goto tx_error;
-   }
+   sk = sock4->sock->sk;
 
/*
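
The lookup functions in this patch report distinct failures through error pointers (-EIO, -ENETUNREACH, -ELOOP). The sketch below imitates that convention in user space. The lower-case helpers are simplified imitations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and struct fake_route and fake_get_route() are hypothetical; this is not the vxlan implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno again. */
static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers live in the top FAKE_MAX_ERRNO addresses. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-FAKE_MAX_ERRNO);
}

struct fake_route {
	int out_dev;    /* device the route would send packets out of */
};

/* Hypothetical lookup that does all sanity checks before returning a route. */
static struct fake_route *fake_get_route(struct fake_route *hit, int self_dev,
					 int have_sock)
{
	assert(!is_err(hit));           /* callers pass a real entry or NULL */
	if (!have_sock)
		return err_ptr(-EIO);
	if (!hit)
		return err_ptr(-ENETUNREACH);
	if (hit->out_dev == self_dev)
		return err_ptr(-ELOOP);  /* circular route */
	return hit;
}

/* Caller side: one is_err() test covers every failure reason. */
static long fake_route_status(struct fake_route *hit, int self_dev,
			      int have_sock)
{
	struct fake_route *rt = fake_get_route(hit, self_dev, have_sock);

	return is_err(rt) ? ptr_err(rt) : 0;
}
```

The caller keeps a single error branch, while the specific errno still says whether the socket was missing, the destination was unreachable, or the route looped back to the tunnel device.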

[PATCH net-next v3 5/7] vxlan: simplify RTF_LOCAL handling.

2016-11-13 Thread Pravin B Shelar

Avoid duplicate code for handling RTF_LOCAL routes.

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c | 85 -
 1 file changed, 51 insertions(+), 34 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index aabb918..0b188d6 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1938,6 +1938,40 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
}
 }
 
+static int encap_bypass_if_local(struct sk_buff *skb, struct net_device *dev,
+struct vxlan_dev *vxlan, union vxlan_addr *daddr,
+__be32 dst_port, __be32 vni, struct dst_entry *dst,
+u32 rt_flags)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+   /* IPv6 rt-flags are checked against RTF_LOCAL, but the value of
+* RTF_LOCAL is equal to RTCF_LOCAL. So to keep code simple
+* we can use RTCF_LOCAL which works for ipv4 and ipv6 route entry.
+*/
+   BUILD_BUG_ON(RTCF_LOCAL != RTF_LOCAL);
+#endif
+   /* Bypass encapsulation if the destination is local */
+   if (rt_flags & RTCF_LOCAL &&
+   !(rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
+   struct vxlan_dev *dst_vxlan;
+
+   dst_release(dst);
+   dst_vxlan = vxlan_find_vni(vxlan->net, vni,
+  daddr->sa.sa_family, dst_port,
+  vxlan->flags);
+   if (!dst_vxlan) {
+   dev->stats.tx_errors++;
+   kfree_skb(skb);
+
+   return -ENOENT;
+   }
+   vxlan_encap_bypass(skb, vxlan, dst_vxlan);
+   return 1;
+   }
+
+   return 0;
+}
+
 static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
   struct vxlan_rdst *rdst, bool did_rsc)
 {
@@ -2036,27 +2070,19 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 dst_cache, info);
if (IS_ERR(rt))
goto tx_error;
-   sk = sock4->sock->sk;
 
+   sk = sock4->sock->sk;
/* Bypass encapsulation if the destination is local */
-   if (!info && rt->rt_flags & RTCF_LOCAL &&
-   !(rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
-   struct vxlan_dev *dst_vxlan;
-
-   ip_rt_put(rt);
-   dst_vxlan = vxlan_find_vni(vxlan->net, vni,
-  dst->sa.sa_family, dst_port,
-  vxlan->flags);
-   if (!dst_vxlan)
-   goto tx_error;
-   vxlan_encap_bypass(skb, vxlan, dst_vxlan);
-   return;
-   }
-
-   if (!info)
+   if (!info) {
+   err = encap_bypass_if_local(skb, dev, vxlan, dst,
+   dst_port, vni, &rt->dst,
+   rt->rt_flags);
+   if (err)
+   return;
udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM_TX);
-   else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT)
+   } else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) {
df = htons(IP_DF);
+   }
 
ndst = &rt->dst;
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
@@ -2072,7 +2098,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 #if IS_ENABLED(CONFIG_IPV6)
} else {
struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
-   u32 rt6i_flags;
 
ndst = vxlan6_get_route(vxlan, dev, sock6, skb,
rdst ? rdst->remote_ifindex : 0, tos,
@@ -2085,24 +2110,16 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
}
sk = sock6->sock->sk;
 
-   /* Bypass encapsulation if the destination is local */
-   rt6i_flags = ((struct rt6_info *)ndst)->rt6i_flags;
-   if (!info && rt6i_flags & RTF_LOCAL &&
-   !(rt6i_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
-   struct vxlan_dev *dst_vxlan;
-
-   dst_vxlan = vxlan_find_vni(vxlan->net, vni,
-  dst->sa.sa_family, dst_port,
-  vxlan->flags);
-   if (!dst_vxlan)
-   goto tx_error;
-   dst_release(ndst);
-   vxlan_encap_bypass(skb, vxlan,
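
The return convention of encap_bypass_if_local() in this patch (negative errno when the local lookup fails, 1 when the packet was bypassed, 0 when normal encapsulation should continue) can be sketched in user space. The function below is a hypothetical model, and the flag values are assumed to match the kernel UAPI headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values assumed to match the kernel UAPI headers. */
#define RTCF_BROADCAST 0x10000000u
#define RTCF_MULTICAST 0x20000000u
#define RTCF_LOCAL     0x80000000u
#define RTF_LOCAL      0x80000000u

/* Same idea as the BUILD_BUG_ON() in the patch: one flag test can serve
 * IPv4 and IPv6 only while both constants stay equal.
 */
_Static_assert(RTCF_LOCAL == RTF_LOCAL, "local-route flags must match");

/* Returns 1 if the packet was short-circuited to a local device,
 * -ENOENT if the route is local but no matching device exists,
 * and 0 if the caller should carry on with encapsulation.
 */
static int fake_bypass_if_local(uint32_t rt_flags, int have_local_vni)
{
	assert(have_local_vni == 0 || have_local_vni == 1);

	if ((rt_flags & RTCF_LOCAL) &&
	    !(rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
		if (!have_local_vni)
			return -ENOENT;
		return 1;
	}
	return 0;
}
```

Any non-zero result means the skb has already been consumed, which is why the caller in the patch simply returns when err is set.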

[PATCH net-next v3 0/7] vxlan: xmit improvements.

2016-11-13 Thread Pravin B Shelar

Following patch series improves vxlan fast path, removes
duplicate code and simplifies vxlan xmit code path.

v2-v3:
Removed unrelated warning fix from patch 2.
rearranged error handling from patch 3
Fixed stats updates in vxlan route lookup in patch 4

v1-v2:
Fix compilation error when IPv6 support is not enabled.


Pravin B Shelar (7):
  vxlan: avoid vlan processing in vxlan device.
  vxlan: avoid checking socket multiple times.
  vxlan: simplify exception handling
  vxlan: improve vxlan route lookup checks.
  vxlan: simplify RTF_LOCAL handling.
  vxlan: simplify vxlan xmit
  vxlan: remove unsed vxlan_dev_dst_port()

 drivers/net/vxlan.c | 285 +++-
 include/linux/if_vlan.h |  16 ---
 include/net/vxlan.h |  10 --
 3 files changed, 137 insertions(+), 174 deletions(-)

-- 
1.9.1

[PATCH net-next v3 6/7] vxlan: simplify vxlan xmit

2016-11-13 Thread Pravin B Shelar

The existing vxlan xmit function handles two distinct cases:
1. vxlan net device
2. vxlan lwt device.
By separating the initialization of these two cases, the egress path
looks better.

Signed-off-by: Pravin B Shelar 
Acked-by: Jiri Benc 
---
 drivers/net/vxlan.c | 78 +++--
 1 file changed, 34 insertions(+), 44 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 0b188d6..411534c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1978,8 +1978,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
struct dst_cache *dst_cache;
struct ip_tunnel_info *info;
struct vxlan_dev *vxlan = netdev_priv(dev);
-   struct sock *sk;
-   const struct iphdr *old_iph;
+   const struct iphdr *old_iph = ip_hdr(skb);
union vxlan_addr *dst;
union vxlan_addr remote_ip, local_ip;
union vxlan_addr *src;
@@ -1988,7 +1987,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
__be16 src_port = 0, dst_port;
struct dst_entry *ndst = NULL;
__be32 vni, label;
-   __be16 df = 0;
__u8 tos, ttl;
int err;
u32 flags = vxlan->flags;
@@ -1998,19 +1996,40 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
info = skb_tunnel_info(skb);
 
if (rdst) {
+   dst = &rdst->remote_ip;
+   if (vxlan_addr_any(dst)) {
+   if (did_rsc) {
+   /* short-circuited back to local bridge */
+   vxlan_encap_bypass(skb, vxlan, vxlan);
+   return;
+   }
+   goto drop;
+   }
+
dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port;
vni = rdst->remote_vni;
-   dst = &rdst->remote_ip;
src = &vxlan->cfg.saddr;
dst_cache = &rdst->dst_cache;
+   md->gbp = skb->mark;
+   ttl = vxlan->cfg.ttl;
+   if (!ttl && vxlan_addr_multicast(dst))
+   ttl = 1;
+
+   tos = vxlan->cfg.tos;
+   if (tos == 1)
+   tos = ip_tunnel_get_dsfield(old_iph, skb);
+
+   if (dst->sa.sa_family == AF_INET)
+   udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM_TX);
+   else
+   udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM6_TX);
+   label = vxlan->cfg.label;
} else {
if (!info) {
WARN_ONCE(1, "%s: Missing encapsulation instructions\n",
  dev->name);
goto drop;
}
-   dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
-   vni = tunnel_id_to_key32(info->key.tun_id);
remote_ip.sa.sa_family = ip_tunnel_info_af(info);
if (remote_ip.sa.sa_family == AF_INET) {
remote_ip.sin.sin_addr.s_addr = info->key.u.ipv4.dst;
@@ -2020,48 +2039,24 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
local_ip.sin6.sin6_addr = info->key.u.ipv6.src;
}
dst = &remote_ip;
+   dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
+   vni = tunnel_id_to_key32(info->key.tun_id);
src = &local_ip;
dst_cache = &info->dst_cache;
-   }
-
-   if (vxlan_addr_any(dst)) {
-   if (did_rsc) {
-   /* short-circuited back to local bridge */
-   vxlan_encap_bypass(skb, vxlan, vxlan);
-   return;
-   }
-   goto drop;
-   }
-
-   old_iph = ip_hdr(skb);
-
-   ttl = vxlan->cfg.ttl;
-   if (!ttl && vxlan_addr_multicast(dst))
-   ttl = 1;
-
-   tos = vxlan->cfg.tos;
-   if (tos == 1)
-   tos = ip_tunnel_get_dsfield(old_iph, skb);
-
-   label = vxlan->cfg.label;
-   src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min,
-vxlan->cfg.port_max, true);
-
-   if (info) {
+   if (info->options_len)
+   md = ip_tunnel_info_opts(info);
ttl = info->key.ttl;
tos = info->key.tos;
label = info->key.label;
udp_sum = !!(info->key.tun_flags & TUNNEL_CSUM);
-
-   if (info->options_len)
-   md = ip_tunnel_info_opts(info);
-   } else {
-   md->gbp = skb->mark;
}
+   src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min,
+vxlan->cfg.port_max, true);
 
if (dst->sa.sa_family == AF_INET) {
struct vxlan_sock *sock4 =

Re: [PATCH 00/39] Netfilter updates for net-next

2016-11-13 Thread David Miller

From: Pablo Neira Ayuso 
Date: Sun, 13 Nov 2016 23:24:54 +0100

> The following patchset contains a second batch of Netfilter updates
> for your net-next tree. This includes a rework of the core hook
> infrastructure that improves Netfilter performance by ~15% according
> to synthetic benchmarks. Then, a large batch with ipset updates,
> including a new hash:ipmac set type, via Jozsef Kadlecsik. This also
> includes a couple of assorted updates.

Looks great, pulled, thanks!

Re: [PATCH v2 net-next 1/5] bpf: Refactor cgroups code in prep for new type

2016-11-13 Thread David Ahern

On 10/31/16 11:49 AM, Thomas Graf wrote:
> On 10/31/16 at 06:16pm, Daniel Mack wrote:
>> On 10/31/2016 06:05 PM, David Ahern wrote:
>>> On 10/31/16 11:00 AM, Daniel Mack wrote:
>>>> Yeah, I'm confused too. I changed that name in my v7 from
>>>> BPF_PROG_TYPE_CGROUP_SOCK to BPF_PROG_TYPE_CGROUP_SKB on David's
>>>> (Ahern) request. Why is it now renamed again?
>>>
>>> Thomas pushed back on adding another program type in favor of using
>>> subtypes. So this makes the program type generic to CGROUP and patch
>>> 2 in this v2 set added Mickaël's subtype patch with the socket
>>> mangling done that way in patch 3.
>>>
>>
>> Fine for me. I can change it around again.
> 
> I would like to hear from Daniel B and Alexei as well. We need to
> decide whether to use subtypes consistently and treat prog types as
> something more high level or whether to bluntly introduce a new prog
> type for every distinct set of verifier limits. I will change lwt_bpf
> as well accordingly.
> 

Alexei / Daniel - any comments/preferences on subtypes vs program types?

Re: [PATCH net-next] mdio: Demote print from info to debug in mdio_driver_register

2016-11-13 Thread Andrew Lunn

On Sun, Nov 13, 2016 at 07:01:17PM -0800, Florian Fainelli wrote:
> While it is useful to know which MDIO driver is being registered, demote
> the pr_info() to a pr_debug().
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next 00/11] Start adding support for mv88e6390 family

2016-11-13 Thread David Miller

From: Andrew Lunn 
Date: Sun, 13 Nov 2016 21:24:03 +0100

> What seems to be the issue is you said you have accepted:
> 
> [PATCH net-next 0/2] Fixes for port refactoring
> https://marc.info/?l=linux-netdev=147880114928996=1
> 
> Yet i don't see these in net-next. And i based this patchset on a tree
> which included the fixes. Hence they are not applying.
> 
> Have the fixes really been accepted?

Accepted but not pushed out properly, sorry.

This should be sorted out now.

Re: [LKP] [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!

2016-11-13 Thread Fengguang Wu


Hi guys.

I took a look at the commit again and I do not see how this can happen.

Are you sure patch was properly applied ?

In particular, the following extract is obscure for me :



https://github.com/0day-ci/linux 
Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
commit 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb ("net: __skb_flow_dissect() must cap 
its return value")



Hi,

The above two lines mean that the 0day repo set up a new branch
"Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839"
which is based on net/master, then applied your patch on top of it;
the commit id is 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb.


Xiaolong, it may be more helpful to show the base tree where we apply
the patch to. And the final url:

https://github.com/0day-ci/linux/tree/Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839

Thanks,
Fengguang

[PATCH net-next] mdio: Demote print from info to debug in mdio_driver_register

2016-11-13 Thread Florian Fainelli

While it is useful to know which MDIO driver is being registered, demote
the pr_info() to a pr_debug().

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/mdio_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/mdio_device.c b/drivers/net/phy/mdio_device.c
index 9c88e6749b9a..43c8fd46504b 100644
--- a/drivers/net/phy/mdio_device.c
+++ b/drivers/net/phy/mdio_device.c
@@ -144,7 +144,7 @@ int mdio_driver_register(struct mdio_driver *drv)
struct mdio_driver_common *mdiodrv = &drv->mdiodrv;
int retval;
 
-   pr_info("mdio_driver_register: %s\n", mdiodrv->driver.name);
+   pr_debug("mdio_driver_register: %s\n", mdiodrv->driver.name);
 
mdiodrv->driver.bus = &mdio_bus_type;
mdiodrv->driver.probe = mdio_probe;
-- 
2.9.3

Re: [PATCH net 2/3] bpf, mlx5: fix various refcount/prog issues in mlx5e_xdp_set

2016-11-13 Thread Alexei Starovoitov

On Mon, Nov 14, 2016 at 01:43:41AM +0100, Daniel Borkmann wrote:
> There are multiple issues in mlx5e_xdp_set():
> 
> 1) prog can be NULL, so calling unconditionally into bpf_prog_add(prog,
>priv->params.num_channels) can end badly.
> 
> 2) The batched bpf_prog_add() should be done at an earlier point in
>time. This makes sure that we cannot fail anymore at the time we
>want to set the program for each channel. This only means that we
>have to undo the bpf_prog_add() in case we return early due to
>reset or device not in MLX5E_STATE_OPENED yet. Note, err is 0 here.
> 
> 3) When swapping the priv->xdp_prog, then no extra reference count must
>be taken since we got that from call path via dev_change_xdp_fd()
>already. Otherwise, we'd never be able to free the program. Also,
>bpf_prog_add() without checking the return code could fail.
> 
> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
> Signed-off-by: Daniel Borkmann 
...
> +static inline void bpf_prog_sub(struct bpf_prog *prog, int i)
> +{
> +}
> +
>  static inline void bpf_prog_put(struct bpf_prog *prog)
>  {
>  }
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 751e806..a0fca9f 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -682,6 +682,17 @@ struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int 
> i)
>  }
>  EXPORT_SYMBOL_GPL(bpf_prog_add);
>  
> +void bpf_prog_sub(struct bpf_prog *prog, int i)
> +{
> + /* Only to be used for undoing previous bpf_prog_add() in some
> +  * error path. We still know that another entity in our call
> +  * path holds a reference to the program, thus atomic_sub() can
> +  * be safely used in such cases!
> +  */
> + WARN_ON(atomic_sub_return(i, &prog->aux->refcnt) == 0);
> +}
> +EXPORT_SYMBOL_GPL(bpf_prog_sub);

the patches look good. I'm only worried about net/net-next merge
conflict here. (I would have to deal with it as well).
So instead of copying the above helper can we apply net-next's
'bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path'
patch to net without mlx4_xdp_set hunk and then apply
the rest of this patch?
Even better is to send this patch 2/3 to net-next?
Yes, it's an issue, but a very small one. There is no security
concern here, so I would prefer to avoid a merge conflict.
Did you do a test merge of net/net-next by any chance?
Maybe I'm overreacting.

Re: [PATCH net-next 05/11] net: dsa: mv88e6xxx: Add comment about family a device belongs to

2016-11-13 Thread Andrew Lunn

On Mon, Nov 14, 2016 at 01:08:13PM +1100, Vivien Didelot wrote:
> Hi Andrew,
> 
> Andrew Lunn  writes:
> 
> > Knowing the family a device belongs to helps with picking the ops
> > implementation which is appropriate to the device. So add a comment to
> > each structure of ops.
> 
> This commit is not necessary. mv88e6xxx_ops structure must be per-chip,
> and the family information is already described in patch 03/11.

I disagree. I made a lot of errors adding the right per family handler
to these structures, simply because it is not obvious what family a
device belongs to when looking at the structure.

   Andrew

Re: [PATCH net-next 08/11] net: dsa: mv88e6xxx: Add stats_get_sset_count to ops structure

2016-11-13 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> Different families have different sets of statistics. Abstract this
> using a stats_get_sset_count op. Each stat has a bitmap, and the ops
> implementer uses a bit map mask to count the statistics which apply
> for the family.

> -static int mv88e6xxx_get_sset_count(struct dsa_switch *ds)
> +static int _mv88e6xxx_get_sset_count(struct mv88e6xxx_chip *chip, int types)

Looks good overall. But please don't re-introduce underscore-prefixed
helpers. If I'm not mistaken, stats are a Global 1 feature, so ordered
explicit helpers in global1.c will be perfect.

If the stats code is huge, don't hesitate to move them in a
global1_stats.c file, as you wish. But we have to keep it
self-documented and easy to follow for new developers.

Thanks,

Vivien
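
The bitmap-mask counting described in the quoted commit message can be
sketched in user space roughly as follows. This is only an illustration:
the STATS_TYPE_* names, their values and the table entries are
assumptions made for the example, not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stat classes; the real driver's names and values may differ. */
enum {
	STATS_TYPE_BANK0 = 1 << 0,
	STATS_TYPE_BANK1 = 1 << 1,
	STATS_TYPE_PORT  = 1 << 2,
	STATS_TYPE_ALL   = (1 << 3) - 1,
};

struct hw_stat {
	const char *name;
	int type;	/* bitmap of STATS_TYPE_* classes the counter belongs to */
};

/* Illustrative table, not the driver's actual counter list. */
static const struct hw_stat hw_stats[] = {
	{ "in_good_octets", STATS_TYPE_BANK0 },
	{ "in_bad_octets",  STATS_TYPE_BANK0 },
	{ "in_discards",    STATS_TYPE_BANK1 },
	{ "sw_in_filtered", STATS_TYPE_PORT  },
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Count the counters whose type bitmap intersects the family's mask. */
static int stats_get_sset_count(int types)
{
	size_t i;
	int count = 0;

	/* only known class bits may be requested */
	assert((types & ~STATS_TYPE_ALL) == 0);

	for (i = 0; i < ARRAY_SIZE(hw_stats); i++)
		if (hw_stats[i].type & types)
			count++;

	return count;
}
```

A family that only implements the first bank would pass
STATS_TYPE_BANK0, while a family with every counter would pass
STATS_TYPE_ALL; the per-family op then reduces to a one-line wrapper
around the shared counting loop.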

Re: [PATCH net-next v1] bpf: Use u64_to_user_ptr()

2016-11-13 Thread Alexei Starovoitov

On Sun, Nov 13, 2016 at 07:44:03PM +0100, Mickaël Salaün wrote:
> Replace the custom u64_to_ptr() function with the u64_to_user_ptr()
> macro.
> 
> Signed-off-by: Mickaël Salaün 

Thanks for following up on this one.
Acked-by: Alexei Starovoitov

Re: [PATCH net-next 07/11] net: dsa: mv88e6xxx: Add mv88e6390 statistics unit init

2016-11-13 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> The statistics unit on the mv88e6390 needs to the configured in a
> different register to the others as to what histogram statistics is
> should return.

Can you re-phrase the above please?

> +static int mv88e6390_stats_init(struct mv88e6xxx_chip *chip)
> +{
> + u16 val;
> + int err;
> +
> + err = mv88e6xxx_g1_read(chip, GLOBAL_CONTROL_2, &val);
> + if (err)
> + return err;
> +
> + val |= GLOBAL_CONTROL_2_HIST_RX_TX;
> +
> + err = mv88e6xxx_g1_write(chip, GLOBAL_CONTROL_2, val);
> +
> + return err;
> +}

Can you please move this Global 1 specific helper in global1.c under an
ordered snippet such as:

/* Offset 0x1C: Global Control 2 */

int mv88e6xxx_g1_set_foo(struct mv88e6xxx_chip *chip)
{
...
}

I'd like internal SMI devices to be self documented in their specific
files and easy to hack for new developers. Ordered helpers will help.

Also, the helper should reflect what it really does. It is used to set
the Histogram Counters Mode. So please name it accordingly, something
like mv88e6xxx_g1_set_hist_count_mode().

Thanks,

Vivien
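
The read-modify-write pattern under review can be tried out in user
space with a sketch like the one below. It assumes a fake register
array in place of the real SMI accessors; struct fake_chip, g1_read(),
g1_write() and the bit value of GLOBAL_CONTROL_2_HIST_RX_TX are
stand-ins invented for the example. Only the 0x1C offset comes from the
review comment.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 32

/* Offset 0x1C: Global Control 2 (offset taken from the review comment) */
#define GLOBAL_CONTROL_2		0x1c
/* Placeholder bit value, not the datasheet value */
#define GLOBAL_CONTROL_2_HIST_RX_TX	(3 << 6)

/* Stand-in for the switch: an array instead of real SMI accesses. */
struct fake_chip {
	uint16_t g1_regs[NUM_REGS];
};

static int g1_read(struct fake_chip *chip, int reg, uint16_t *val)
{
	if (reg < 0 || reg >= NUM_REGS)
		return -1;
	*val = chip->g1_regs[reg];
	return 0;
}

static int g1_write(struct fake_chip *chip, int reg, uint16_t val)
{
	if (reg < 0 || reg >= NUM_REGS)
		return -1;
	chip->g1_regs[reg] = val;
	return 0;
}

/* Read-modify-write: set the histogram mode bits, keep all other bits. */
static int g1_set_hist_count_mode(struct fake_chip *chip)
{
	uint16_t val;
	int err;

	assert(chip);

	err = g1_read(chip, GLOBAL_CONTROL_2, &val);
	if (err)
		return err;

	val |= GLOBAL_CONTROL_2_HIST_RX_TX;

	return g1_write(chip, GLOBAL_CONTROL_2, val);
}
```

Naming the helper after the field it programs, as requested in the
review, keeps the call site readable without needing the register map
at hand.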

Re: [PATCH net-next 03/11] net: dsa: mv88e6xxx: Add the mv88e6390 family

2016-11-13 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> -- compatible   : Should be one of "marvell,mv88e6085",
> +- compatible: Should be one of "marvell,mv88e6085" or
> +  "marvell,mv88e6390"

Just curious here, mv88e6085 was chosen because it was the smallest
product ID supported. Following that logic, shouldn't mv88e6190 be
chosen here instead of mv88e6390?

> +static const struct mv88e6xxx_ops mv88e6390_ops = {
> + .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
> + .phy_read = mv88e6xxx_g2_smi_phy_read,
> + .phy_write = mv88e6xxx_g2_smi_phy_write,
> + .port_set_link = mv88e6xxx_port_set_link,
> + .port_set_duplex = mv88e6xxx_port_set_duplex,
> + .port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
> + .port_set_speed = mv88e6390_port_set_speed,
> +};
> +
> +static const struct mv88e6xxx_ops mv88e6390x_ops = {
> + .set_switch_mac = mv88e6xxx_g2_set_switch_mac,
> + .phy_read = mv88e6xxx_g2_smi_phy_read,
> + .phy_write = mv88e6xxx_g2_smi_phy_write,
> + .port_set_link = mv88e6xxx_port_set_link,
> + .port_set_duplex = mv88e6xxx_port_set_duplex,
> + .port_set_rgmii_delay = mv88e6390_port_set_rgmii_delay,
> + .port_set_speed = mv88e6390x_port_set_speed,
> +};

Even if it is a bit more verbose, I'd intentionally keep one
mv88e6xxx_ops structure per chip. Using per-family structure is
error-prone and simpler is better here.

Thanks,

Vivien

Re: [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!

2016-11-13 Thread Ye Xiaolong

On 11/13, Eric Dumazet wrote:
>On Mon, 2016-11-14 at 07:49 +0800, kernel test robot wrote:
>> FYI, we noticed the following commit:
>
>
>> in testcase: kbuild
>> with following parameters:
>> 
>>  runtime: 300s
>>  nr_task: 50%
>>  cpufreq_governor: performance
>> 
>> 
>> 
>> 
>> on test machine: 8 threads Intel(R) Atom(TM) CPU  C2750  @ 2.40GHz with 16G 
>> memory
>> 
>> caused below changes:
>> 
>> 
>> +-------------------------------------------------------+------------+------------+
>> |                                                       | cdb26d3387 | 2ab9fb18c4 |
>> +-------------------------------------------------------+------------+------------+
>> | boot_successes                                        | 10         | 3          |
>> | boot_failures                                         | 0          | 9          |
>> | kernel_BUG_at_include/linux/skbuff.h                  | 0          | 8          |
>> | invalid_opcode:#[##]SMP                               | 0          | 8          |
>> | RIP:eth_type_trans                                    | 0          | 8          |
>> | Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0          | 5          |
>> | WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup             | 0          | 1          |
>> | calltrace:parport_pc_init                             | 0          | 1          |
>> | calltrace:SyS_finit_module                            | 0          | 1          |
>> | WARNING:at_lib/kobject.c:#kobject_add_internal        | 0          | 1          |
>> +-------------------------------------------------------+------------+------------+
>> 
>> 
>> 
>> [   20.491020] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> [   20.502988] Sending DHCP requests .
>> [   20.506729] [ cut here ]
>> [   20.511369] kernel BUG at include/linux/skbuff.h:1935!
>> [   20.517893] invalid opcode:  [#1] SMP
>> [   20.521902] Modules linked in:
>> [   20.524979] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 
>> 4.9.0-rc3-00286-g2ab9fb1 #1
>> [   20.532463] Hardware name: Supermicro SYS-5018A-TN4/A1SAi, BIOS 1.1a 
>> 08/27/2015
>> [   20.539768] task: 8804456c2480 task.stack: c9000192
>> [   20.545684] RIP: 0010:[]  [] 
>> eth_type_trans+0xe8/0x140
>> [   20.553972] RSP: 0018:88047fd03db8  EFLAGS: 00010297
>> [   20.559283] RAX: 0158 RBX: 88047d8ae600 RCX: 
>> 1073
>> [   20.566415] RDX: 88047bf07dc0 RSI: 88047d8a4000 RDI: 
>> 88047dac0f00
>> [   20.573546] RBP: 88047fd03e20 R08: 88047d8a4000 R09: 
>> 0800
>> [   20.580678] R10: 88047bf07ec0 R11: ea0011f6e400 R12: 
>> 88047dac0f00
>> [   20.587810] R13: 880457413000 R14: c90002129000 R15: 
>> 015e
>> [   20.594946] FS:  () GS:88047fd0() 
>> knlGS:
>> [   20.603032] CS:  0010 DS:  ES:  CR0: 80050033
>> [   20.608775] CR2: 7fffadfb4ef0 CR3: 00047ee07000 CR4: 
>> 001006e0
>> [   20.615906] Stack:
>> [   20.617927]  816905a7 ea0011f6e400 ea08 
>> 88047d8ae450
>> [   20.625403]  88047d8ae400 00400166 ea0011f6e400 
>> 
>> [   20.632873]  0040  88047d8ae450 
>> 88047d8b1140
>> [   20.640352] Call Trace:
>> [   20.642805]   
>> [   20.644740]  [] ? igb_clean_rx_irq+0x6a7/0x7d0
>> [   20.650760]  [] igb_poll+0x382/0x700
>> [   20.655904]  [] ? timerqueue_add+0x59/0xb0
>> [   20.661564]  [] net_rx_action+0x217/0x360
>> [   20.667137]  [] __do_softirq+0x104/0x2ab
>> [   20.672624]  [] irq_exit+0xf1/0x100
>> [   20.677673]  [] do_IRQ+0x54/0xd0
>> [   20.682466]  [] common_interrupt+0x8c/0x8c
>> [   20.688123]   
>> [   20.690054]  [] ? cpuidle_enter_state+0x122/0x2e0
>> [   20.696333]  [] cpuidle_enter+0x17/0x20
>> [   20.701733]  [] call_cpuidle+0x23/0x40
>> [   20.707045]  [] cpu_startup_entry+0x114/0x200
>> [   20.712964]  [] start_secondary+0x107/0x130
>> [   20.718708] Code: 00 04 00 00 c9 c3 48 33 86 70 03 00 00 48 c1 e0 10 48 
>> 85 c0 0f b6 87 90 00 00 00 75 28 83 e0 f8 83 c8 01 88 87 90 00 00 00 eb 82 
>> <0f> 0b 0f b6 87 90 00 00 00 83 e0 f8 83 c8 03 88 87 90 00 00 00 
>> [   20.738722] RIP  [] eth_type_trans+0xe8/0x140
>> [   20.744662]  RSP 
>> [   20.748160] ---[ end trace 153440bf1ca2e6fc ]---
>> [   20.748165] [ cut here ]
>> 
>> 
>> To reproduce:
>> 
>> git clone 
>> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
>> cd lkp-tests
>> bin/lkp install job.yaml  # job file is attached in this email
>> bin/lkp run job.yaml
>> 
>> 
>> 
>> Thanks,
>> Kernel Test Robot
>
>
>Hi guys.
>
>I took a look at the commit again and I do not see how this can happen.
>
>Are you sure patch was properly applied ?
>
>In particular, the following extract is obscure for me :
>
>
>>

Re: [PATCH net-next 05/11] net: dsa: mv88e6xxx: Add comment about family a device belongs to

2016-11-13 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> Knowing the family a device belongs to helps with picking the ops
> implementation which is appropriate to the device. So add a comment to
> each structure of ops.

This commit is not necessary. mv88e6xxx_ops structure must be per-chip,
and the family information is already described in patch 03/11.

Thanks,

Vivien

[PATCH net] net: stmmac: Fix lack of link transition for fixed PHYs

2016-11-13 Thread Florian Fainelli

Commit 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch
is attached") added some logic to avoid polling the fixed PHY and
therefore invoking the adjust_link callback more than once, since this
is a fixed PHY and link events won't be generated.

This works fine the first time, because we start with phydev->irq =
PHY_POLL, so we call adjust_link, then we set phydev->irq =
PHY_IGNORE_INTERRUPT and we stop polling the PHY.

Now, if we called ndo_close(), which calls both phy_stop() and does an
explicit netif_carrier_off(), we end up with a link down. Upon calling
ndo_open() again, despite starting the PHY state machine, we have
PHY_IGNORE_INTERRUPT set, and we generate no link event at all, so the
link is permanently down.

Fixes: 52f95bbfcf72 ("stmmac: fix adjust link call in case of a switch is attached")
Signed-off-by: Florian Fainelli 
---
Alexandre, Peppe,

The original patch is already a hack, but since this is a bugfix, I took the
same approach that you did here to backport this to -stable kernels.

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 10909c9c0033..03dbf8e89c4c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -882,6 +882,13 @@ static int stmmac_init_phy(struct net_device *dev)
return -ENODEV;
}
 
+   /* stmmac_adjust_link will change this to PHY_IGNORE_INTERRUPT to avoid
+* subsequent PHY polling, make sure we force a link transition if
+* we have a UP/DOWN/UP transition
+*/
+   if (phydev->is_pseudo_fixed_link)
+   phydev->irq = PHY_POLL;
+
pr_debug("stmmac_init_phy:  %s: attached to PHY (UID 0x%x)"
 " Link = %d\n", dev->name, phydev->phy_id, phydev->link);
 
-- 
2.9.3
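
The up/down/up sequence described in the commit message can be modeled
in user space with a small sketch. Everything here is a simplified
stand-in: struct fake_phy, struct fake_netdev and the apply_fix switch
are invented for the illustration, and the real PHY state machine does
considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel constants; treat the values as illustrative. */
enum { PHY_POLL = -1, PHY_IGNORE_INTERRUPT = -2 };

struct fake_phy {
	int irq;
	bool is_pseudo_fixed_link;
};

struct fake_netdev {
	struct fake_phy phy;
	bool carrier;
	bool apply_fix;		/* hypothetical switch: model the patch or not */
};

/* Models adjust_link: report link up, then stop polling a fixed PHY. */
static void adjust_link(struct fake_netdev *dev)
{
	dev->carrier = true;
	if (dev->phy.is_pseudo_fixed_link)
		dev->phy.irq = PHY_IGNORE_INTERRUPT;
}

/* Models ndo_open: adjust_link only runs while the PHY is being polled. */
static void dev_open(struct fake_netdev *dev)
{
	if (dev->apply_fix && dev->phy.is_pseudo_fixed_link)
		dev->phy.irq = PHY_POLL;	/* the reset added by the patch */

	if (dev->phy.irq == PHY_POLL)
		adjust_link(dev);
}

/* Models ndo_close: phy_stop() plus an explicit netif_carrier_off(). */
static void dev_close(struct fake_netdev *dev)
{
	dev->carrier = false;
}

/* Returns whether the carrier comes back after an up/down/up cycle. */
static bool link_up_after_reopen(bool apply_fix)
{
	struct fake_netdev dev = {
		.phy = { .irq = PHY_POLL, .is_pseudo_fixed_link = true },
		.carrier = false,
		.apply_fix = apply_fix,
	};

	dev_open(&dev);
	assert(dev.carrier);	/* the first open works in both cases */
	dev_close(&dev);
	dev_open(&dev);

	return dev.carrier;
}
```

In this model, link_up_after_reopen(false) leaves the carrier down
after the second open, while link_up_after_reopen(true) brings it back,
which matches the behaviour the commit message describes.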

Re: [net] 2ab9fb18c4: kernel BUG at include/linux/skbuff.h:1935!

2016-11-13 Thread Eric Dumazet

On Mon, 2016-11-14 at 07:49 +0800, kernel test robot wrote:
> FYI, we noticed the following commit:


> in testcase: kbuild
> with following parameters:
> 
>   runtime: 300s
>   nr_task: 50%
>   cpufreq_governor: performance
> 
> 
> 
> 
> on test machine: 8 threads Intel(R) Atom(TM) CPU  C2750  @ 2.40GHz with 16G 
> memory
> 
> caused below changes:
> 
> 
> +-------------------------------------------------------+------------+------------+
> |                                                       | cdb26d3387 | 2ab9fb18c4 |
> +-------------------------------------------------------+------------+------------+
> | boot_successes                                        | 10         | 3          |
> | boot_failures                                         | 0          | 9          |
> | kernel_BUG_at_include/linux/skbuff.h                  | 0          | 8          |
> | invalid_opcode:#[##]SMP                               | 0          | 8          |
> | RIP:eth_type_trans                                    | 0          | 8          |
> | Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0          | 5          |
> | WARNING:at_fs/sysfs/dir.c:#sysfs_warn_dup             | 0          | 1          |
> | calltrace:parport_pc_init                             | 0          | 1          |
> | calltrace:SyS_finit_module                            | 0          | 1          |
> | WARNING:at_lib/kobject.c:#kobject_add_internal        | 0          | 1          |
> +-------------------------------------------------------+------------+------------+
> 
> 
> 
> [   20.491020] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [   20.502988] Sending DHCP requests .
> [   20.506729] [ cut here ]
> [   20.511369] kernel BUG at include/linux/skbuff.h:1935!
> [   20.517893] invalid opcode:  [#1] SMP
> [   20.521902] Modules linked in:
> [   20.524979] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 
> 4.9.0-rc3-00286-g2ab9fb1 #1
> [   20.532463] Hardware name: Supermicro SYS-5018A-TN4/A1SAi, BIOS 1.1a 
> 08/27/2015
> [   20.539768] task: 8804456c2480 task.stack: c9000192
> [   20.545684] RIP: 0010:[]  [] 
> eth_type_trans+0xe8/0x140
> [   20.553972] RSP: 0018:88047fd03db8  EFLAGS: 00010297
> [   20.559283] RAX: 0158 RBX: 88047d8ae600 RCX: 
> 1073
> [   20.566415] RDX: 88047bf07dc0 RSI: 88047d8a4000 RDI: 
> 88047dac0f00
> [   20.573546] RBP: 88047fd03e20 R08: 88047d8a4000 R09: 
> 0800
> [   20.580678] R10: 88047bf07ec0 R11: ea0011f6e400 R12: 
> 88047dac0f00
> [   20.587810] R13: 880457413000 R14: c90002129000 R15: 
> 015e
> [   20.594946] FS:  () GS:88047fd0() 
> knlGS:
> [   20.603032] CS:  0010 DS:  ES:  CR0: 80050033
> [   20.608775] CR2: 7fffadfb4ef0 CR3: 00047ee07000 CR4: 
> 001006e0
> [   20.615906] Stack:
> [   20.617927]  816905a7 ea0011f6e400 ea08 
> 88047d8ae450
> [   20.625403]  88047d8ae400 00400166 ea0011f6e400 
> 
> [   20.632873]  0040  88047d8ae450 
> 88047d8b1140
> [   20.640352] Call Trace:
> [   20.642805]   
> [   20.644740]  [] ? igb_clean_rx_irq+0x6a7/0x7d0
> [   20.650760]  [] igb_poll+0x382/0x700
> [   20.655904]  [] ? timerqueue_add+0x59/0xb0
> [   20.661564]  [] net_rx_action+0x217/0x360
> [   20.667137]  [] __do_softirq+0x104/0x2ab
> [   20.672624]  [] irq_exit+0xf1/0x100
> [   20.677673]  [] do_IRQ+0x54/0xd0
> [   20.682466]  [] common_interrupt+0x8c/0x8c
> [   20.688123]   
> [   20.690054]  [] ? cpuidle_enter_state+0x122/0x2e0
> [   20.696333]  [] cpuidle_enter+0x17/0x20
> [   20.701733]  [] call_cpuidle+0x23/0x40
> [   20.707045]  [] cpu_startup_entry+0x114/0x200
> [   20.712964]  [] start_secondary+0x107/0x130
> [   20.718708] Code: 00 04 00 00 c9 c3 48 33 86 70 03 00 00 48 c1 e0 10 48 85 
> c0 0f b6 87 90 00 00 00 75 28 83 e0 f8 83 c8 01 88 87 90 00 00 00 eb 82 <0f> 
> 0b 0f b6 87 90 00 00 00 83 e0 f8 83 c8 03 88 87 90 00 00 00 
> [   20.738722] RIP  [] eth_type_trans+0xe8/0x140
> [   20.744662]  RSP 
> [   20.748160] ---[ end trace 153440bf1ca2e6fc ]---
> [   20.748165] [ cut here ]
> 
> 
> To reproduce:
> 
> git clone 
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml  # job file is attached in this email
> bin/lkp run job.yaml
> 
> 
> 
> Thanks,
> Kernel Test Robot


Hi guys.

I took a look at the commit again and I do not see how this can happen.

Are you sure patch was properly applied ?

In particular, the following extract is obscure for me :


> https://github.com/0day-ci/linux 
> Eric-Dumazet/net-__skb_flow_dissect-must-cap-its-return-value/20161110-080839
> commit 2ab9fb18c46b91b16a0f0f329336d3be9fc32deb ("net:

Re: [PATCH net-next 02/11] net: dsa: mv88e6xxx: Fix unused variable warning by using variable

2016-11-13 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>  _mv88e6xxx_stats_wait() did not check the return value from
>  mv88e6xxx_g1_read(), so the compiler complained about set but unused
>  err.
>
> Signed-off-by: Andrew Lunn 

Reviewed-by: Vivien Didelot 

Thanks,

Vivien

Re: [PATCH net-next 01/11] net: dsa: mv88e6xxx: Take switch out of reset before probe

2016-11-13 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> The switch needs to be taken out of reset before we can read its ID
> register on the MDIO bus.
>
> Signed-off-by: Andrew Lunn 

Reviewed-by: Vivien Didelot 

Thanks,

Vivien

[PATCH net 0/3] Couple of BPF refcount fixes for mlx5

2016-11-13 Thread Daniel Borkmann

Various mlx5 bugs in eBPF program and refcount handling that I found during
review. Since this kind of bug has happened multiple times here, I'll add a
__must_check to the bpf_prog_inc()/bpf_prog_add()/etc functions for net-next,
so that the compiler (and thus the kbuild bot) barks early enough. Note, it
turned out that I had to take the hunk from c540594f864b ("bpf, mlx4: fix prog
refcount in mlx4_en_try_alloc_resources error path") to get the bpf_prog_sub()
function for net as well, but the merge into net-next should add no conflicts.
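
The __must_check idea can be tried outside the kernel with the GCC/Clang
warn_unused_result attribute, which is what the kernel macro expands to
on those compilers. The sketch below is a user-space illustration only:
struct prog, prog_add() and MAX_REFCNT are hypothetical names, and the
error is returned as an int rather than through ERR_PTR().

```c
#include <assert.h>
#include <errno.h>

/* User-space equivalent of the kernel's __must_check annotation. */
#define must_check __attribute__((warn_unused_result))

/* Hypothetical limit, kept small on purpose for the example. */
#define MAX_REFCNT 4

struct prog {
	int refcnt;
};

/* Take n references; fails with -EBUSY instead of exceeding the limit. */
static must_check int prog_add(struct prog *p, int n)
{
	assert(n > 0);

	if (p->refcnt + n > MAX_REFCNT)
		return -EBUSY;

	p->refcnt += n;
	return 0;
}
```

With the attribute in place, a bare statement such as
"prog_add(&p, 1);" makes GCC and Clang emit an unused-result warning,
so an unchecked failure is reported at build time rather than found
during review.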

Rana, please review.

Thanks a lot!

Daniel Borkmann (3):
  bpf, mlx5: fix mlx5e_create_rq taking reference on prog
  bpf, mlx5: fix various refcount/prog issues in mlx5e_xdp_set
  bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup

 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 42 ++-
 include/linux/bpf.h   |  5 +++
 kernel/bpf/syscall.c  | 12 +++
 3 files changed, 51 insertions(+), 8 deletions(-)

-- 
1.9.3

[PATCH net 2/3] bpf, mlx5: fix various refcount/prog issues in mlx5e_xdp_set

2016-11-13 Thread Daniel Borkmann

There are multiple issues in mlx5e_xdp_set():

1) prog can be NULL, so calling unconditionally into bpf_prog_add(prog,
   priv->params.num_channels) can end badly.

2) The batched bpf_prog_add() should be done at an earlier point in
   time. This makes sure that we cannot fail anymore at the time we
   want to set the program for each channel. This only means that we
   have to undo the bpf_prog_add() in case we return early due to
   reset or device not in MLX5E_STATE_OPENED yet. Note, err is 0 here.

3) When swapping the priv->xdp_prog, then no extra reference count must
   be taken since we got that from call path via dev_change_xdp_fd()
   already. Otherwise, we'd never be able to free the program. Also,
   bpf_prog_add() without checking the return code could fail.
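
The reference flow from the three points above can be modeled in user
space as a rough sketch. It uses plain integers instead of atomics and
omits locking and the failure case of the batched add; struct prog,
struct dev, channels_close() and channels_open() are hypothetical
stand-ins for the driver's structures and its close/open paths.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHANNELS 4

struct prog {
	int refcnt;
};

struct dev {
	struct prog *xdp_prog;			/* holds one reference */
	struct prog *chan_prog[MAX_CHANNELS];	/* one reference each */
	int num_channels;
	int opened;
};

static void prog_add(struct prog *p, int n)
{
	p->refcnt += n;
}

/* Undo a previous prog_add(); the caller's own reference must remain. */
static void prog_sub(struct prog *p, int n)
{
	p->refcnt -= n;
	assert(p->refcnt > 0);
}

static void prog_put(struct prog *p)
{
	p->refcnt--;
}

/* Models closing the device: every channel drops its program. */
static void channels_close(struct dev *d)
{
	int i;

	for (i = 0; i < d->num_channels; i++) {
		if (d->chan_prog[i])
			prog_put(d->chan_prog[i]);
		d->chan_prog[i] = NULL;
	}
}

/* Models reopening: every channel takes its own reference. */
static void channels_open(struct dev *d)
{
	int i;

	for (i = 0; i < d->num_channels; i++) {
		d->chan_prog[i] = d->xdp_prog;
		if (d->chan_prog[i])
			prog_add(d->chan_prog[i], 1);
	}
}

/* new_prog, if not NULL, arrives with one reference owned by the caller. */
static void xdp_set(struct dev *d, struct prog *new_prog)
{
	struct prog *old;
	int reset = (!d->xdp_prog || !new_prog);
	int i;

	if (new_prog)
		prog_add(new_prog, d->num_channels);	/* batched, upfront */

	if (d->opened && reset)
		channels_close(d);

	old = d->xdp_prog;
	d->xdp_prog = new_prog;		/* takes over the caller's reference */
	if (old)
		prog_put(old);

	if (d->opened && reset)
		channels_open(d);

	if (!d->opened || reset) {
		/* the loop below is skipped, so give the batch back */
		if (new_prog)
			prog_sub(new_prog, d->num_channels);
		return;
	}

	for (i = 0; i < d->num_channels; i++) {
		old = d->chan_prog[i];
		d->chan_prog[i] = new_prog;
		if (old)
			prog_put(old);
	}
}
```

On an opened device with two channels, the first attach ends with three
references on the program (one for the device, one per channel), and a
later swap moves all three to the new program while the old one drops
to zero.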

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25 ++-
 include/linux/bpf.h   |  5 +
 kernel/bpf/syscall.c  | 11 ++
 3 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2b83667..c90610a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3125,6 +3125,17 @@ static int mlx5e_xdp_set(struct net_device *netdev, 
struct bpf_prog *prog)
goto unlock;
}
 
+   if (prog) {
+   /* num_channels is invariant here, so we can take the
+* batched reference right upfront.
+*/
+   prog = bpf_prog_add(prog, priv->params.num_channels);
+   if (IS_ERR(prog)) {
+   err = PTR_ERR(prog);
+   goto unlock;
+   }
+   }
+
was_opened = test_bit(MLX5E_STATE_OPENED, &priv->state);
/* no need for full reset when exchanging programs */
reset = (!priv->xdp_prog || !prog);
@@ -3132,10 +3143,10 @@ static int mlx5e_xdp_set(struct net_device *netdev, 
struct bpf_prog *prog)
if (was_opened && reset)
mlx5e_close_locked(netdev);
 
-   /* exchange programs */
+   /* exchange programs, extra prog reference we got from caller
+* as long as we don't fail from this point onwards.
+*/
old_prog = xchg(&priv->xdp_prog, prog);
-   if (prog)
-   bpf_prog_add(prog, 1);
if (old_prog)
bpf_prog_put(old_prog);
 
@@ -3146,12 +3157,11 @@ static int mlx5e_xdp_set(struct net_device *netdev, 
struct bpf_prog *prog)
mlx5e_open_locked(netdev);
 
if (!test_bit(MLX5E_STATE_OPENED, &priv->state) || reset)
-   goto unlock;
+   goto unlock_put;
 
/* exchanging programs w/o reset, we update ref counts on behalf
 * of the channels RQs here.
 */
-   bpf_prog_add(prog, priv->params.num_channels);
for (i = 0; i < priv->params.num_channels; i++) {
struct mlx5e_channel *c = priv->channel[i];
 
@@ -3173,6 +3183,11 @@ static int mlx5e_xdp_set(struct net_device *netdev, 
struct bpf_prog *prog)
 unlock:
mutex_unlock(&priv->state_lock);
return err;
+unlock_put:
+   /* reference on priv->xdp_prog is still held at this point */
+   if (prog)
+   bpf_prog_sub(prog, priv->params.num_channels);
+   goto unlock;
 }
 
 static bool mlx5e_xdp_attached(struct net_device *dev)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c201017..ca495fd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -234,6 +234,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void 
*meta, u64 meta_size,
 struct bpf_prog *bpf_prog_get(u32 ufd);
 struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type);
 struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i);
+void bpf_prog_sub(struct bpf_prog *prog, int i);
 struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
 
@@ -303,6 +304,10 @@ static inline struct bpf_prog *bpf_prog_add(struct 
bpf_prog *prog, int i)
return ERR_PTR(-EOPNOTSUPP);
 }
 
+static inline void bpf_prog_sub(struct bpf_prog *prog, int i)
+{
+}
+
 static inline void bpf_prog_put(struct bpf_prog *prog)
 {
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 751e806..a0fca9f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -682,6 +682,17 @@ struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_add);
 
+void bpf_prog_sub(struct bpf_prog *prog, int i)
+{
+   /* Only to be used for undoing previous bpf_prog_add() in some
+* error path. We still know that another entity in our call
+* path holds a reference to the program, thus atomic_sub() can
+* be safely used in

[PATCH net 1/3] bpf, mlx5: fix mlx5e_create_rq taking reference on prog

2016-11-13 Thread Daniel Borkmann

In mlx5e_create_rq(), when creating a new queue, we call bpf_prog_add() but
without checking the return value. bpf_prog_add() can fail, so we really
must check it. Take the reference right when we assign it to the rq from
priv->xdp_prog, and just drop the reference on error path. Destruction in
mlx5e_destroy_rq() looks good, though.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 14 +++---
 kernel/bpf/syscall.c  |  1 +
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 84e8b25..2b83667 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -489,7 +489,16 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
rq->channel = c;
rq->ix  = c->ix;
rq->priv= c->priv;
+
rq->xdp_prog = priv->xdp_prog;
+   if (rq->xdp_prog) {
+   rq->xdp_prog = bpf_prog_inc(rq->xdp_prog);
+   if (IS_ERR(rq->xdp_prog)) {
+   err = PTR_ERR(rq->xdp_prog);
+   rq->xdp_prog = NULL;
+   goto err_rq_wq_destroy;
+   }
+   }
 
rq->buff.map_dir = DMA_FROM_DEVICE;
if (rq->xdp_prog)
@@ -566,12 +575,11 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
rq->page_cache.head = 0;
rq->page_cache.tail = 0;
 
-   if (rq->xdp_prog)
-   bpf_prog_add(rq->xdp_prog, 1);
-
return 0;
 
 err_rq_wq_destroy:
+   if (rq->xdp_prog)
+   bpf_prog_put(rq->xdp_prog);
mlx5_wq_destroy(&rq->wq_ctrl);
 
return err;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 237f3d6..751e806 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -686,6 +686,7 @@ struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
 {
return bpf_prog_add(prog, 1);
 }
+EXPORT_SYMBOL_GPL(bpf_prog_inc);
 
 static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
 {
-- 
1.9.3

[PATCH net 3/3] bpf, mlx5: drop priv->xdp_prog reference on netdev cleanup

2016-11-13 Thread Daniel Borkmann

mlx5e_xdp_set() is currently the only place where we drop reference on the
prog sitting in priv->xdp_prog when it's exchanged by a new one. We also
need to make sure that we eventually release that reference, for example,
in case the netdev is dismantled.

Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Daniel Borkmann 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index c90610a..930aa6f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3697,6 +3697,9 @@ static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 
if (MLX5_CAP_GEN(mdev, vport_group_manager))
mlx5_eswitch_unregister_vport_rep(esw, 0);
+
+   if (priv->xdp_prog)
+   bpf_prog_put(priv->xdp_prog);
 }
 
 static int mlx5e_init_nic_rx(struct mlx5e_priv *priv)
-- 
1.9.3

[PATCH net-next 1/1] driver: macvlan: Replace integer number with bool value

2016-11-13 Thread fgao

From: Gao Feng 

The return value of function macvlan_addr_busy is used as bool value,
so use bool value instead of integer number "1" and "0".

Signed-off-by: Gao Feng 
---
 drivers/net/macvlan.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index a064415..d0361f3 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -179,20 +179,20 @@ static void macvlan_hash_change_addr(struct macvlan_dev 
*vlan,
macvlan_hash_add(vlan);
 }
 
-static int macvlan_addr_busy(const struct macvlan_port *port,
-   const unsigned char *addr)
+static bool macvlan_addr_busy(const struct macvlan_port *port,
+ const unsigned char *addr)
 {
/* Test to see if the specified multicast address is
 * currently in use by the underlying device or
 * another macvlan.
 */
if (ether_addr_equal_64bits(port->dev->dev_addr, addr))
-   return 1;
+   return true;
 
if (macvlan_hash_lookup(port, addr))
-   return 1;
+   return true;
 
-   return 0;
+   return false;
 }
 
 
-- 
1.9.1

Re: [PATCH] Fixup packets with incorrect ethertype sent by ZTE MF821D

2016-11-13 Thread Jussi Peltola

So here's another stab. The comments and the current implementation are
not in sync: any non-multicast address starting with a null octet gets
rewritten, while the comment specifically mentions 00:a0:c6:00:00:00. It
is certainly not elegant but re-writing all unicast destinations with
our address does come to mind instead of special cases.

This patch fails to handle the invalid destinations in either way so I
will send another one if you think it's worthwhile to go on. And it
seems I forgot htons but I need this device for work now so a better
patch must wait :)
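
For reference, the byte-order point (the forgotten htons) can be shown
in a user-space sketch. It is not the driver code: guess_ethertype()
and the EX_* constants are names made up for the example, it works on a
raw byte buffer rather than an skb, and it adds a length check that the
patch below does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Local copies of the well-known values, prefixed to avoid header clashes. */
#define EX_ETH_HLEN	14
#define EX_ETH_P_IP	0x0800
#define EX_ETH_P_ARP	0x0806
#define EX_ETH_P_IPV6	0x86DD

/*
 * Returns a guessed ethertype in network byte order, or 0 if the
 * header's type is already known or no guess is possible.
 */
static uint16_t guess_ethertype(const uint8_t *frame, size_t len)
{
	uint16_t proto;

	assert(frame);

	if (len <= EX_ETH_HLEN)
		return 0;

	/* the type field sits at offset 12 and is big-endian on the wire */
	memcpy(&proto, frame + 12, sizeof(proto));

	switch (ntohs(proto)) {
	case EX_ETH_P_IP:
	case EX_ETH_P_IPV6:
	case EX_ETH_P_ARP:
		return 0;
	}

	/* the high nibble of the first L3 byte is the IP version */
	switch (frame[EX_ETH_HLEN] & 0xf0) {
	case 0x40:
		return htons(EX_ETH_P_IP);
	case 0x60:
		return htons(EX_ETH_P_IPV6);
	default:
		/* pass on undetectable packets */
		return 0;
	}
}
```

Converting the wire value with ntohs() before the switch keeps the case
labels in host order; comparing the raw field against the host-order
constants would only match on big-endian machines.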

commit 35d3a46b7f1ece70e24386acbdd16af4507cb5f3
Author: Jussi Peltola 
Date:   Mon Nov 14 01:45:32 2016 +0200

Attempt to fix up packets with a broken ethernet header

Signed-off-by: Jussi Peltola 

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 3ff76c6..7308d6b 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -153,25 +153,57 @@ static const u8 default_modem_addr[ETH_ALEN] = {0x02, 
0x50, 0xf3};
 
 static const u8 buggy_fw_addr[ETH_ALEN] = {0x00, 0xa0, 0xc6, 0x00, 0x00, 0x00};
 
-/* Make up an ethernet header if the packet doesn't have one.
+/* Check if the ethernet header has an unknown ethertype, and return a
+ * guess of the correct one based on the L3 header, or zero if the type was
+ * known or detection failed.
+ */
+static __be16 detect_bogus_header(struct sk_buff *skb) {
+   struct ethhdr *eth_hdr = (struct ethhdr*) skb->data;
+
+   switch (eth_hdr->h_proto) {
+   case ETH_P_IP:
+   case ETH_P_IPV6:
+   case ETH_P_ARP:
+   return 0;
+   default:
+   switch (skb->data[14] & 0xf0) {
+   case 0x40:
+   return htons(ETH_P_IP);
+   case 0x60:
+   return htons(ETH_P_IPV6);
+   default:
+   /* pass on undetectable packets */
+   return 0;
+   }
+   }
+   /*NOTREACHED*/
+   return 0;
+}
+
+/* Make up an ethernet header if the packet doesn't have a correct one.
  *
  * A firmware bug common among several devices cause them to send raw
  * IP packets under some circumstances.  There is no way for the
  * driver/host to know when this will happen.  And even when the bug
  * hits, some packets will still arrive with an intact header.
  *
- * The supported devices are only capably of sending IPv4, IPv6 and
+ * The supported devices are only capable of sending IPv4, IPv6 and
  * ARP packets on a point-to-point link. Any packet with an ethernet
  * header will have either our address or a broadcast/multicast
- * address as destination.  ARP packets will always have a header.
+ * address as destination. ARP packets will always have a header.
  *
  * This means that this function will reliably add the appropriate
- * header iff necessary, provided our hardware address does not start
+ * header if necessary, provided our hardware address does not start
  * with 4 or 6.
  *
  * Another common firmware bug results in all packets being addressed
  * to 00:a0:c6:00:00:00 despite the host address being different.
- * This function will also fixup such packets.
+ *
+ * Some devices will send packets with garbage source/destination MACs and
+ * ethertypes.
+ *
+ * This function will try to fix up all such packets.
+ *
  */
 static int qmi_wwan_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 {
@@ -179,8 +211,8 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct 
sk_buff *skb)
bool rawip = info->flags & QMI_WWAN_FLAG_RAWIP;
__be16 proto;
 
-   /* This check is no longer done by usbnet */
-   if (skb->len < dev->net->hard_header_len)
+   /* Shorter is definitely invalid and breaks subsequent tests */
+   if (skb->len < 15)
return 0;
 
switch (skb->data[0] & 0xf0) {
@@ -190,17 +222,17 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct 
sk_buff *skb)
case 0x60:
proto = htons(ETH_P_IPV6);
break;
-   case 0x00:
+   default:
if (rawip)
return 0;
if (is_multicast_ether_addr(skb->data))
return 1;
-   /* possibly bogus destination - rewrite just in case */
-   skb_reset_mac_header(skb);
-   goto fix_dest;
-   default:
-   if (rawip)
-   return 0;
+   proto = detect_bogus_header(skb);
+   if (proto) {
+   /* remove terminally broken header */
+   skb_pull(skb, ETH_HLEN);
+   break;
+   }
/* pass along other packets without modifications */
return 1;
}
@@ -208,17 +240,17 @@ static int qmi_wwan_rx_fixup(struct usbnet *dev, struct 
sk_buff *skb)
skb->dev = dev->net; /* normally set

Re: [PATCH net-next 04/11] net: dsa: mv88e6xxx: Abstract stats_snapshot into ops structure

2016-11-13 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> +static int mv88e6320_stats_snapshot(struct mv88e6xxx_chip *chip, int port)
> +{
> + port = (port + 1) << 5;
> +
> + return _mv88e6xxx_stats_snapshot(chip, port);
> +}

Please move the above helper into its internal SMI file (port, global1 or
whatever) and keep the below wrapper in chip.c. Using the correct prefix
avoids the need for a leading _.

> +static int mv88e6xxx_stats_snapshot(struct mv88e6xxx_chip *chip, int port)
> +{
> + if (!chip->info->ops->stats_snapshot)
> + return -EOPNOTSUPP;
> +
> + return chip->info->ops->stats_snapshot(chip, port);
> +}

[...]

>  static const struct mv88e6xxx_ops mv88e6175_ops = {
> @@ -3223,6 +3243,7 @@ static const struct mv88e6xxx_ops mv88e6175_ops = {
>   .port_set_duplex = mv88e6xxx_port_set_duplex,
>   .port_set_rgmii_delay = mv88e6352_port_set_rgmii_delay,
>   .port_set_speed = mv88e6185_port_set_speed,
> + .stats_snapshot = mv88e6xxx_stats_snapshot,
>  };

Is this expected? Doesn't look correct to me to use
mv88e6xxx_stats_snapshot here.

Thanks,

Vivien

Re: [PATCH net-next v1] bpf: Use u64_to_user_ptr()

2016-11-13 Thread Daniel Borkmann


On 11/13/2016 07:44 PM, Mickaël Salaün wrote:

Replace the custom u64_to_ptr() function with the u64_to_user_ptr()
macro.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Arnd Bergmann 
Cc: Daniel Borkmann 


Looks good to me, thanks!

Acked-by: Daniel Borkmann

[PATCH v3] ip6_output: ensure flow saddr actually belongs to device

2016-11-13 Thread Jason A. Donenfeld

This puts the IPv6 routing functions in parity with the IPv4 routing
functions. Namely, we now check in v6 that if a flowi6 requests an
saddr, the returned dst actually corresponds to a net device that has
that saddr. This mirrors the v4 logic with __ip_dev_find in
__ip_route_output_key_hash. In the event that the returned dst is not
for a dst with a dev that has the saddr, we return -EINVAL, just like
v4; this makes it easy to use the same error handlers for both cases.
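
A minimal userspace sketch of the check described above, assuming a flat array of local addresses stands in for the kernel's per-device address lookup. The names addr_is_local and check_flow_saddr are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "does any device own this address?". */
static int addr_is_local(const struct in6_addr *addr,
			 const struct in6_addr *local, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (memcmp(addr, &local[i], sizeof(*addr)) == 0)
			return 1;
	return 0;
}

/*
 * Reject a flow whose requested source address is set (not ::) but
 * is not configured on any local device; an unset saddr is accepted.
 */
static int check_flow_saddr(const struct in6_addr *saddr,
			    const struct in6_addr *local, size_t n)
{
	if (!IN6_IS_ADDR_UNSPECIFIED(saddr) &&
	    !addr_is_local(saddr, local, n))
		return -EINVAL;
	return 0;
}
```

An unspecified source address passes through untouched, so callers that let the stack pick the source are not affected by the check.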

Signed-off-by: Jason A. Donenfeld 
Cc: David Ahern 
---
Changes from v2:
It turns out ipv6_chk_addr already has the device enumeration
logic that we need by simply passing NULL.

 net/ipv6/ip6_output.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6001e78..b3b5cb6 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -926,6 +926,10 @@ static int ip6_dst_lookup_tail(struct net *net, const 
struct sock *sk,
int err;
int flags = 0;
 
+   if (!ipv6_addr_any(&fl6->saddr) &&
+   !ipv6_chk_addr(net, &fl6->saddr, NULL, 1))
+   return -EINVAL;
+
/* The correct way to handle this would be to do
 * ip6_route_get_saddr, and then ip6_route_output; however,
 * the route-specific preferred source forces the
-- 
2.10.2

[PATCH] net: bnx2: use new api ethtool_{get|set}_link_ksettings

2016-11-13 Thread Philippe Reynes

The ethtool API {get|set}_settings is deprecated.
Move this driver to the new API {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/broadcom/bnx2.c |   74 +++---
 1 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c 
b/drivers/net/ethernet/broadcom/bnx2.c
index eab49ff..09d5b61 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -6882,12 +6882,14 @@ static u32 bnx2_find_max_ring(u32 ring_size, u32 
max_size)
 /* All ethtool functions called with rtnl_lock */
 
 static int
-bnx2_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+bnx2_get_link_ksettings(struct net_device *dev,
+   struct ethtool_link_ksettings *cmd)
 {
struct bnx2 *bp = netdev_priv(dev);
int support_serdes = 0, support_copper = 0;
+   u32 supported, advertising;
 
-   cmd->supported = SUPPORTED_Autoneg;
+   supported = SUPPORTED_Autoneg;
if (bp->phy_flags & BNX2_PHY_FLAG_REMOTE_PHY_CAP) {
support_serdes = 1;
support_copper = 1;
@@ -6897,56 +6899,59 @@ static u32 bnx2_find_max_ring(u32 ring_size, u32 
max_size)
support_copper = 1;
 
if (support_serdes) {
-   cmd->supported |= SUPPORTED_1000baseT_Full |
+   supported |= SUPPORTED_1000baseT_Full |
SUPPORTED_FIBRE;
if (bp->phy_flags & BNX2_PHY_FLAG_2_5G_CAPABLE)
-   cmd->supported |= SUPPORTED_2500baseX_Full;
-
+   supported |= SUPPORTED_2500baseX_Full;
}
if (support_copper) {
-   cmd->supported |= SUPPORTED_10baseT_Half |
+   supported |= SUPPORTED_10baseT_Half |
SUPPORTED_10baseT_Full |
SUPPORTED_100baseT_Half |
SUPPORTED_100baseT_Full |
SUPPORTED_1000baseT_Full |
SUPPORTED_TP;
-
}
 
spin_lock_bh(&bp->phy_lock);
-   cmd->port = bp->phy_port;
-   cmd->advertising = bp->advertising;
+   cmd->base.port = bp->phy_port;
+   advertising = bp->advertising;
 
if (bp->autoneg & AUTONEG_SPEED) {
-   cmd->autoneg = AUTONEG_ENABLE;
+   cmd->base.autoneg = AUTONEG_ENABLE;
} else {
-   cmd->autoneg = AUTONEG_DISABLE;
+   cmd->base.autoneg = AUTONEG_DISABLE;
}
 
if (netif_carrier_ok(dev)) {
-   ethtool_cmd_speed_set(cmd, bp->line_speed);
-   cmd->duplex = bp->duplex;
+   cmd->base.speed = bp->line_speed;
+   cmd->base.duplex = bp->duplex;
if (!(bp->phy_flags & BNX2_PHY_FLAG_SERDES)) {
if (bp->phy_flags & BNX2_PHY_FLAG_MDIX)
-   cmd->eth_tp_mdix = ETH_TP_MDI_X;
+   cmd->base.eth_tp_mdix = ETH_TP_MDI_X;
else
-   cmd->eth_tp_mdix = ETH_TP_MDI;
+   cmd->base.eth_tp_mdix = ETH_TP_MDI;
}
}
else {
-   ethtool_cmd_speed_set(cmd, SPEED_UNKNOWN);
-   cmd->duplex = DUPLEX_UNKNOWN;
+   cmd->base.speed = SPEED_UNKNOWN;
+   cmd->base.duplex = DUPLEX_UNKNOWN;
}
spin_unlock_bh(&bp->phy_lock);
 
-   cmd->transceiver = XCVR_INTERNAL;
-   cmd->phy_address = bp->phy_addr;
+   cmd->base.phy_address = bp->phy_addr;
+
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+   supported);
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising,
+   advertising);
 
return 0;
 }
 
 static int
-bnx2_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+bnx2_set_link_ksettings(struct net_device *dev,
+   const struct ethtool_link_ksettings *cmd)
 {
struct bnx2 *bp = netdev_priv(dev);
u8 autoneg = bp->autoneg;
@@ -6957,24 +6962,26 @@ static u32 bnx2_find_max_ring(u32 ring_size, u32 
max_size)
 
spin_lock_bh(&bp->phy_lock);
 
-   if (cmd->port != PORT_TP && cmd->port != PORT_FIBRE)
+   if (cmd->base.port != PORT_TP && cmd->base.port != PORT_FIBRE)
goto err_out_unlock;
 
-   if (cmd->port != bp->phy_port &&
+   if (cmd->base.port != bp->phy_port &&
!(bp->phy_flags & BNX2_PHY_FLAG_REMOTE_PHY_CAP))
goto err_out_unlock;
 
/* If device is down, we can store the settings only if the user
 * is setting the currently active port.
 */
-   if (!netif_running(dev) && cmd->port != bp->phy_port)
+   if (!netif_running(dev) && cmd->base.port != bp->phy_port)
goto err_out_unlock;
 
-

[PATCH 13/39] netfilter: conntrack: simplify init/uninit of L4 protocol trackers

2016-11-13 Thread Pablo Neira Ayuso

From: Davide Caratti 

Modify registration and deregistration of layer-4 protocol trackers to
facilitate inclusion of new elements into the current list of builtin
protocols. Both builtin (TCP, UDP, ICMP) and non-builtin (DCCP, GRE, SCTP,
UDPlite) layer-4 protocol trackers usually register/deregister themselves
using consecutive calls to nf_ct_l4proto_{,pernet}_{,un}register(...).
This sequence is interrupted and rolled back in case of error; in order to
simplify addition of builtin protocols, the input of the above functions
has been modified to allow registering/unregistering multiple protocols.
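
The register-many-with-rollback pattern described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the netfilter implementation: struct proto, register_one and the other names are hypothetical, and a flag simulates a registration failure.

```c
#include <stddef.h>

/* Hypothetical protocol descriptor used only for this sketch. */
struct proto {
	const char *name;
	int fail;        /* set to simulate a registration error */
	int registered;
};

static int register_one(struct proto *p)
{
	if (p->fail)
		return -1;
	p->registered = 1;
	return 0;
}

static void unregister_one(struct proto *p)
{
	p->registered = 0;
}

/* Unregister the first n entries, in reverse order. */
static void unregister_many(struct proto *protos[], size_t n)
{
	while (n-- > 0)
		unregister_one(protos[n]);
}

/*
 * Register every entry; on the first failure, roll back the entries
 * that were already registered and return the error.
 */
static int register_many(struct proto *protos[], size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = register_one(protos[i]);
		if (ret < 0)
			break;
	}
	if (ret < 0)
		unregister_many(protos, i);
	return ret;
}
```

With this shape, adding a builtin protocol means appending one array element rather than adding another register call, error message and goto label.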

Signed-off-by: Davide Caratti 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_conntrack_l4proto.h   | 18 --
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 76 +++
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c | 78 ---
 net/netfilter/nf_conntrack_proto.c             | 85 ++
 net/netfilter/nf_conntrack_proto_dccp.c        | 48 ---
 net/netfilter/nf_conntrack_proto_gre.c         | 11 ++--
 net/netfilter/nf_conntrack_proto_sctp.c        | 50 ---
 net/netfilter/nf_conntrack_proto_udplite.c     | 50 ---
 8 files changed, 179 insertions(+), 237 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_l4proto.h 
b/include/net/netfilter/nf_conntrack_l4proto.h
index de629f1520df..2152b70626d5 100644
--- a/include/net/netfilter/nf_conntrack_l4proto.h
+++ b/include/net/netfilter/nf_conntrack_l4proto.h
@@ -125,14 +125,24 @@ struct nf_conntrack_l4proto 
*nf_ct_l4proto_find_get(u_int16_t l3proto,
 void nf_ct_l4proto_put(struct nf_conntrack_l4proto *p);
 
 /* Protocol pernet registration. */
+int nf_ct_l4proto_pernet_register_one(struct net *net,
+ struct nf_conntrack_l4proto *proto);
+void nf_ct_l4proto_pernet_unregister_one(struct net *net,
+struct nf_conntrack_l4proto *proto);
 int nf_ct_l4proto_pernet_register(struct net *net,
- struct nf_conntrack_l4proto *proto);
+ struct nf_conntrack_l4proto *proto[],
+ unsigned int num_proto);
 void nf_ct_l4proto_pernet_unregister(struct net *net,
-struct nf_conntrack_l4proto *proto);
+struct nf_conntrack_l4proto *proto[],
+unsigned int num_proto);
 
 /* Protocol global registration. */
-int nf_ct_l4proto_register(struct nf_conntrack_l4proto *proto);
-void nf_ct_l4proto_unregister(struct nf_conntrack_l4proto *proto);
+int nf_ct_l4proto_register_one(struct nf_conntrack_l4proto *proto);
+void nf_ct_l4proto_unregister_one(struct nf_conntrack_l4proto *proto);
+int nf_ct_l4proto_register(struct nf_conntrack_l4proto *proto[],
+  unsigned int num_proto);
+void nf_ct_l4proto_unregister(struct nf_conntrack_l4proto *proto[],
+ unsigned int num_proto);
 
 /* Generic netlink helpers */
 int nf_ct_port_tuple_to_nlattr(struct sk_buff *skb,
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c 
b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 713c09a74b90..7130ed5dc1fa 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -336,47 +336,34 @@ MODULE_ALIAS("nf_conntrack-" __stringify(AF_INET));
 MODULE_ALIAS("ip_conntrack");
 MODULE_LICENSE("GPL");
 
+static struct nf_conntrack_l4proto *builtin_l4proto4[] = {
+   &nf_conntrack_l4proto_tcp4,
+   &nf_conntrack_l4proto_udp4,
+   &nf_conntrack_l4proto_icmp,
+};
+
 static int ipv4_net_init(struct net *net)
 {
int ret = 0;
 
-   ret = nf_ct_l4proto_pernet_register(net, &nf_conntrack_l4proto_tcp4);
-   if (ret < 0) {
-   pr_err("nf_conntrack_tcp4: pernet registration failed\n");
-   goto out_tcp;
-   }
-   ret = nf_ct_l4proto_pernet_register(net, &nf_conntrack_l4proto_udp4);
-   if (ret < 0) {
-   pr_err("nf_conntrack_udp4: pernet registration failed\n");
-   goto out_udp;
-   }
-   ret = nf_ct_l4proto_pernet_register(net, &nf_conntrack_l4proto_icmp);
-   if (ret < 0) {
-   pr_err("nf_conntrack_icmp4: pernet registration failed\n");
-   goto out_icmp;
-   }
+   ret = nf_ct_l4proto_pernet_register(net, builtin_l4proto4,
+   ARRAY_SIZE(builtin_l4proto4));
+   if (ret < 0)
+   return ret;
ret = nf_ct_l3proto_pernet_register(net, &nf_conntrack_l3proto_ipv4);
if (ret < 0) {
pr_err("nf_conntrack_ipv4: pernet registration failed\n");
-   goto out_ipv4;
+   nf_ct_l4proto_pernet_unregister(net, builtin_l4proto4,
+

[PATCH 06/39] netfilter: nf_tables: use hook state from xt_action_param structure

2016-11-13 Thread Pablo Neira Ayuso

Don't copy relevant fields from hook state structure, instead use the
one that is already available in struct xt_action_param.

This patch also adds a set of new wrapper functions to fetch relevant
hook state structure fields.

Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_tables.h        | 35 +++-
 net/bridge/netfilter/nft_meta_bridge.c   |  2 +-
 net/bridge/netfilter/nft_reject_bridge.c | 30 ---
 net/ipv4/netfilter/nft_dup_ipv4.c        |  2 +-
 net/ipv4/netfilter/nft_fib_ipv4.c        | 14 ++---
 net/ipv4/netfilter/nft_masq_ipv4.c       |  4 ++--
 net/ipv4/netfilter/nft_redir_ipv4.c      |  3 +--
 net/ipv4/netfilter/nft_reject_ipv4.c     |  4 ++--
 net/ipv6/netfilter/nft_dup_ipv6.c        |  2 +-
 net/ipv6/netfilter/nft_fib_ipv6.c        | 16 +++
 net/ipv6/netfilter/nft_masq_ipv6.c       |  3 ++-
 net/ipv6/netfilter/nft_redir_ipv6.c      |  3 ++-
 net/ipv6/netfilter/nft_reject_ipv6.c     |  6 +++---
 net/netfilter/nf_dup_netdev.c            |  2 +-
 net/netfilter/nf_tables_core.c           | 10 -
 net/netfilter/nf_tables_trace.c          |  8
 net/netfilter/nft_fib.c                  |  2 +-
 net/netfilter/nft_fib_inet.c             |  2 +-
 net/netfilter/nft_log.c                  |  5 +++--
 net/netfilter/nft_lookup.c               |  5 ++---
 net/netfilter/nft_meta.c                 |  6 +++---
 net/netfilter/nft_queue.c                |  2 +-
 net/netfilter/nft_reject_inet.c          | 18
 net/netfilter/nft_rt.c                   |  4 ++--
 24 files changed, 105 insertions(+), 83 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h 
b/include/net/netfilter/nf_tables.h
index 44060344f958..3295fb85bff6 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -14,27 +14,42 @@
 
 struct nft_pktinfo {
struct sk_buff  *skb;
-   struct net  *net;
-   const struct net_device *in;
-   const struct net_device *out;
-   u8  pf;
-   u8  hook;
bool  tprot_set;
u8  tprot;
/* for x_tables compatibility */
struct xt_action_param  xt;
 };
 
+static inline struct net *nft_net(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->net;
+}
+
+static inline unsigned int nft_hook(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->hook;
+}
+
+static inline u8 nft_pf(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->pf;
+}
+
+static inline const struct net_device *nft_in(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->in;
+}
+
+static inline const struct net_device *nft_out(const struct nft_pktinfo *pkt)
+{
+   return pkt->xt.state->out;
+}
+
 static inline void nft_set_pktinfo(struct nft_pktinfo *pkt,
   struct sk_buff *skb,
   const struct nf_hook_state *state)
 {
pkt->skb = skb;
-   pkt->net = state->net;
-   pkt->in = state->in;
-   pkt->out = state->out;
-   pkt->hook = state->hook;
-   pkt->pf = state->pf;
pkt->xt.state = state;
 }
 
diff --git a/net/bridge/netfilter/nft_meta_bridge.c 
b/net/bridge/netfilter/nft_meta_bridge.c
index ad47a921b701..5974dbc1ea24 100644
--- a/net/bridge/netfilter/nft_meta_bridge.c
+++ b/net/bridge/netfilter/nft_meta_bridge.c
@@ -23,7 +23,7 @@ static void nft_meta_bridge_get_eval(const struct nft_expr 
*expr,
 const struct nft_pktinfo *pkt)
 {
const struct nft_meta *priv = nft_expr_priv(expr);
-   const struct net_device *in = pkt->in, *out = pkt->out;
+   const struct net_device *in = nft_in(pkt), *out = nft_out(pkt);
u32 *dest = &regs->data[priv->dreg];
const struct net_bridge_port *p;
 
diff --git a/net/bridge/netfilter/nft_reject_bridge.c 
b/net/bridge/netfilter/nft_reject_bridge.c
index 4b3df6b0e3b9..206dc266ecd2 100644
--- a/net/bridge/netfilter/nft_reject_bridge.c
+++ b/net/bridge/netfilter/nft_reject_bridge.c
@@ -315,17 +315,20 @@ static void nft_reject_bridge_eval(const struct nft_expr 
*expr,
case htons(ETH_P_IP):
switch (priv->type) {
case NFT_REJECT_ICMP_UNREACH:
-   nft_reject_br_send_v4_unreach(pkt->net, pkt->skb,
- pkt->in, pkt->hook,
+   nft_reject_br_send_v4_unreach(nft_net(pkt), pkt->skb,
+ nft_in(pkt),
+ nft_hook(pkt),
  priv->icmp_code);
break;
case NFT_REJECT_TCP_RST:
-

[PATCH 02/39] netfilter: remove comments that predate rcu days

2016-11-13 Thread Pablo Neira Ayuso

We cannot block/sleep on nf_iterate because netfilter runs under rcu
read lock these days, where blocking is well-known to be illegal. So
let's remove these old comments.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 3d4aa96cb219..76014ad72ec5 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -308,18 +308,11 @@ unsigned int nf_iterate(struct sk_buff *skb,
 {
unsigned int verdict;
 
-   /*
-* The caller must not block between calls to this
-* function because of risk of continuing from deleted element.
-*/
while (*entryp) {
if (state->thresh > (*entryp)->ops.priority) {
*entryp = rcu_dereference((*entryp)->next);
continue;
}
-
-   /* Optimization: we don't need to hold module
-  reference here, since function can't sleep. --RR */
 repeat:
verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
if (verdict != NF_ACCEPT) {
-- 
2.1.4

[PATCH 36/39] netfilter: ipset: use setup_timer() and mod_timer().

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Use setup_timer() instead of init_timer(), as it is the preferred way
of setting up a timer.

Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
   active timer (if the timer is inactive it will be activated).

Use setup_timer() and mod_timer() to setup and arm a timer, making the
code compact and easier to read.

Signed-off-by: Muhammad Falak R Wani 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_bitmap_gen.h | 7 ++-
 net/netfilter/ipset/ip_set_hash_gen.h   | 7 ++-
 net/netfilter/ipset/ip_set_list_set.c   | 7 ++-
 3 files changed, 6 insertions(+), 15 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h 
b/net/netfilter/ipset/ip_set_bitmap_gen.h
index f8ea26cafa30..6f09a99298cd 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -41,11 +41,8 @@ mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long 
ul_set))
 {
struct mtype *map = set->data;
 
-   init_timer(&map->gc);
-   map->gc.data = (unsigned long)set;
-   map->gc.function = gc;
-   map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ;
-   add_timer(&map->gc);
+   setup_timer(&map->gc, gc, (unsigned long)set);
+   mod_timer(&map->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
 }
 
 static void
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 88b70fcc5ac5..1b05d4a7d5a1 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -433,11 +433,8 @@ mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long 
ul_set))
 {
struct htype *h = set->data;
 
-   init_timer(&h->gc);
-   h->gc.data = (unsigned long)set;
-   h->gc.function = gc;
-   h->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ;
-   add_timer(&h->gc);
+   setup_timer(&h->gc, gc, (unsigned long)set);
+   mod_timer(&h->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
pr_debug("gc initialized, run in every %u\n",
 IPSET_GC_PERIOD(set->timeout));
 }
diff --git a/net/netfilter/ipset/ip_set_list_set.c 
b/net/netfilter/ipset/ip_set_list_set.c
index dede343a662b..51077c53d76b 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -586,11 +586,8 @@ list_set_gc_init(struct ip_set *set, void (*gc)(unsigned 
long ul_set))
 {
struct list_set *map = set->data;
 
-   init_timer(&map->gc);
-   map->gc.data = (unsigned long)set;
-   map->gc.function = gc;
-   map->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ;
-   add_timer(&map->gc);
+   setup_timer(&map->gc, gc, (unsigned long)set);
+   mod_timer(&map->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
 }
 
 /* Create list:set type of sets */
-- 
2.1.4

[PATCH 18/39] netfilter: ipset: Headers file cleanup

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Group counter helper functions together.

Ported from a patch proposed by Sergey Popovich.

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h 
b/include/linux/netfilter/ipset/ip_set.h
index 524467f933bf..1ea28e30a6dd 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -334,6 +334,27 @@ ip_set_update_counter(struct ip_set_counter *counter,
}
 }
 
+static inline bool
+ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
+{
+   return nla_put_net64(skb, IPSET_ATTR_BYTES,
+cpu_to_be64(ip_set_get_bytes(counter)),
+IPSET_ATTR_PAD) ||
+  nla_put_net64(skb, IPSET_ATTR_PACKETS,
+cpu_to_be64(ip_set_get_packets(counter)),
+IPSET_ATTR_PAD);
+}
+
+static inline void
+ip_set_init_counter(struct ip_set_counter *counter,
+   const struct ip_set_ext *ext)
+{
+   if (ext->bytes != ULLONG_MAX)
+   atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
+   if (ext->packets != ULLONG_MAX)
+   atomic64_set(&(counter)->packets, (long long)(ext->packets));
+}
+
 static inline void
 ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
   const struct ip_set_ext *ext,
@@ -372,27 +393,6 @@ ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
skbinfo->skbqueue = ext->skbqueue;
 }
 
-static inline bool
-ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
-{
-   return nla_put_net64(skb, IPSET_ATTR_BYTES,
-cpu_to_be64(ip_set_get_bytes(counter)),
-IPSET_ATTR_PAD) ||
-  nla_put_net64(skb, IPSET_ATTR_PACKETS,
-cpu_to_be64(ip_set_get_packets(counter)),
-IPSET_ATTR_PAD);
-}
-
-static inline void
-ip_set_init_counter(struct ip_set_counter *counter,
-   const struct ip_set_ext *ext)
-{
-   if (ext->bytes != ULLONG_MAX)
-   atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
-   if (ext->packets != ULLONG_MAX)
-   atomic64_set(&(counter)->packets, (long long)(ext->packets));
-}
-
 /* Netlink CB args */
 enum {
IPSET_CB_NET = 0,   /* net namespace */
-- 
2.1.4

[PATCH 03/39] netfilter: kill NF_HOOK_THRESH() and state->thresh

2016-11-13 Thread Pablo Neira Ayuso

Patch c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
introduced br_nf_hook_thresh().

Replace NF_HOOK_THRESH() by br_nf_hook_thresh from
br_nf_forward_finish(), so we have no more callers for this macro.

As a result, state->thresh and the explicit thresh parameter in the hook
state structure are not required anymore. We can also get rid of the
skip-hook-under-thresh loop in nf_iterate() in the core path, which is
only used by br_netfilter to search for the filter hook.

Suggested-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter.h | 50 +--
 include/linux/netfilter_ingress.h |  2 +-
 net/bridge/br_netfilter_hooks.c   |  8 +++---
 net/bridge/netfilter/ebtable_broute.c |  2 +-
 net/netfilter/core.c                  |  4 ---
 net/netfilter/nf_queue.c              |  2 --
 6 files changed, 19 insertions(+), 49 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index abc7fdcb9eb1..e0d000f6c9bf 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -49,7 +49,6 @@ struct sock;
 
 struct nf_hook_state {
unsigned int hook;
-   int thresh;
u_int8_t pf;
struct net_device *in;
struct net_device *out;
@@ -84,7 +83,7 @@ struct nf_hook_entry {
 static inline void nf_hook_state_init(struct nf_hook_state *p,
  struct nf_hook_entry *hook_entry,
  unsigned int hook,
- int thresh, u_int8_t pf,
+ u_int8_t pf,
  struct net_device *indev,
  struct net_device *outdev,
  struct sock *sk,
@@ -92,7 +91,6 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
  int (*okfn)(struct net *, struct sock *, 
struct sk_buff *))
 {
p->hook = hook;
-   p->thresh = thresh;
p->pf = pf;
p->in = indev;
p->out = outdev;
@@ -155,20 +153,16 @@ extern struct static_key 
nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS];
 int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state);
 
 /**
- * nf_hook_thresh - call a netfilter hook
+ * nf_hook - call a netfilter hook
  *
  * Returns 1 if the hook has allowed the packet to pass.  The function
  * okfn must be invoked by the caller in this case.  Any other return
  * value indicates the packet has been consumed by the hook.
  */
-static inline int nf_hook_thresh(u_int8_t pf, unsigned int hook,
-struct net *net,
-struct sock *sk,
-struct sk_buff *skb,
-struct net_device *indev,
-struct net_device *outdev,
-int (*okfn)(struct net *, struct sock *, 
struct sk_buff *),
-int thresh)
+static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
+ struct sock *sk, struct sk_buff *skb,
+ struct net_device *indev, struct net_device *outdev,
+ int (*okfn)(struct net *, struct sock *, struct 
sk_buff *))
 {
struct nf_hook_entry *hook_head;
int ret = 1;
@@ -185,8 +179,8 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int 
hook,
if (hook_head) {
struct nf_hook_state state;
 
-   nf_hook_state_init(&state, hook_head, hook, thresh,
-  pf, indev, outdev, sk, net, okfn);
+   nf_hook_state_init(&state, hook_head, hook, pf, indev, outdev,
+  sk, net, okfn);
 
ret = nf_hook_slow(skb, &state);
}
@@ -195,14 +189,6 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned int 
hook,
return ret;
 }
 
-static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
- struct sock *sk, struct sk_buff *skb,
- struct net_device *indev, struct net_device *outdev,
- int (*okfn)(struct net *, struct sock *, struct 
sk_buff *))
-{
-   return nf_hook_thresh(pf, hook, net, sk, skb, indev, outdev, okfn, 
INT_MIN);
-}
-   
 /* Activate hook; either okfn or kfree_skb called, unless a hook
returns NF_STOLEN (in which case, it's up to the hook to deal with
the consequences).
@@ -221,19 +207,6 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, 
struct net *net,
 */
 
 static inline int
-NF_HOOK_THRESH(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
-  struct sk_buff *skb, struct net_device *in,
-  struct net_device *out,
-  int (*okfn)(struct net *,

[PATCH 22/39] netfilter: ipset: Separate memsize calculation code into dedicated function

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Hash types already have their memsize calculation code in separate
functions. Clean up and do the same for *bitmap* and *list* sets.

Ported from a patch proposed by Sergey Popovich.

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_bitmap_gen.h | 11 ++-
 net/netfilter/ipset/ip_set_list_set.c   | 23 +--
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h 
b/net/netfilter/ipset/ip_set_bitmap_gen.h
index 2e8e7e5fb4a6..4f07b90f8ef4 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -22,6 +22,7 @@
 #define mtype_kadt IPSET_TOKEN(MTYPE, _kadt)
 #define mtype_uadt IPSET_TOKEN(MTYPE, _uadt)
 #define mtype_destroy  IPSET_TOKEN(MTYPE, _destroy)
+#define mtype_memsize  IPSET_TOKEN(MTYPE, _memsize)
 #define mtype_flush IPSET_TOKEN(MTYPE, _flush)
 #define mtype_head IPSET_TOKEN(MTYPE, _head)
 #define mtype_same_set IPSET_TOKEN(MTYPE, _same_set)
@@ -84,12 +85,20 @@ mtype_flush(struct ip_set *set)
memset(map->members, 0, map->memsize);
 }
 
+/* Calculate the actual memory size of the set data */
+static size_t
+mtype_memsize(const struct mtype *map, size_t dsize)
+{
+   return sizeof(*map) + map->memsize +
+  map->elements * dsize;
+}
+
 static int
 mtype_head(struct ip_set *set, struct sk_buff *skb)
 {
const struct mtype *map = set->data;
struct nlattr *nested;
-   size_t memsize = sizeof(*map) + map->memsize;
+   size_t memsize = mtype_memsize(map, set->dsize);
 
nested = ipset_nest_start(skb, IPSET_ATTR_DATA);
if (!nested)
diff --git a/net/netfilter/ipset/ip_set_list_set.c 
b/net/netfilter/ipset/ip_set_list_set.c
index a2a89e4e0a14..462b0b1870e2 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -441,12 +441,12 @@ list_set_destroy(struct ip_set *set)
set->data = NULL;
 }
 
-static int
-list_set_head(struct ip_set *set, struct sk_buff *skb)
+/* Calculate the actual memory size of the set data */
+static size_t
+list_set_memsize(const struct list_set *map, size_t dsize)
 {
-   const struct list_set *map = set->data;
-   struct nlattr *nested;
struct set_elem *e;
+   size_t memsize;
u32 n = 0;
 
rcu_read_lock();
@@ -454,13 +454,24 @@ list_set_head(struct ip_set *set, struct sk_buff *skb)
n++;
rcu_read_unlock();
 
+   memsize = sizeof(*map) + n * dsize;
+
+   return memsize;
+}
+
+static int
+list_set_head(struct ip_set *set, struct sk_buff *skb)
+{
+   const struct list_set *map = set->data;
+   struct nlattr *nested;
+   size_t memsize = list_set_memsize(map, set->dsize);
+
nested = ipset_nest_start(skb, IPSET_ATTR_DATA);
if (!nested)
goto nla_put_failure;
if (nla_put_net32(skb, IPSET_ATTR_SIZE, htonl(map->size)) ||
nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
-   nla_put_net32(skb, IPSET_ATTR_MEMSIZE,
- htonl(sizeof(*map) + n * set->dsize)))
+   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)))
goto nla_put_failure;
if (unlikely(ip_set_put_flags(skb, set)))
goto nla_put_failure;
-- 
2.1.4

[PATCH 39/39] netfilter: x_tables: simplify IS_ERR_OR_NULL to NULL test

2016-11-13 Thread Pablo Neira Ayuso

From: Julia Lawall 

Since commit 7926dbfa4bc1 ("netfilter: don't use
mutex_lock_interruptible()"), the function xt_find_table_lock can only
return NULL on an error.  Simplify the call sites and update the
comment before the function.
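
To illustrate why the simplification is safe, here is a minimal userspace
sketch. The macros are simplified stand-ins for the kernel's err.h helpers,
and find_table() is a hypothetical lookup (not part of this patch) that, like
xt_find_table_lock after the cited commit, only ever returns NULL on failure:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified userspace stand-ins for the kernel's err.h helpers. */
#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x)	((uintptr_t)(x) >= (uintptr_t)-MAX_ERRNO)
#define IS_ERR_OR_NULL(p) (!(p) || IS_ERR_VALUE(p))

struct table {
	const char *name;
};

static struct table filter = { "filter" };

/* Hypothetical lookup: returns the table or NULL, never an ERR_PTR value. */
static struct table *find_table(const char *name)
{
	return strcmp(name, filter.name) == 0 ? &filter : NULL;
}

/* Old-style call site: tests for both NULL and encoded errors. */
static int lookup_old(const char *name)
{
	struct table *t = find_table(name);

	return IS_ERR_OR_NULL(t) ? -ENOENT : 0;
}

/* Simplified call site: a plain NULL test is equivalent here. */
static int lookup_new(const char *name)
{
	struct table *t = find_table(name);

	return !t ? -ENOENT : 0;
}
```

Since the lookup can no longer produce an encoded error pointer, both call
sites behave identically for every input.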

The semantic patch that changes the code is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression t,e;
@@

t = \(xt_find_table_lock(...)\|
  try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
- ! IS_ERR_OR_NULL(t)
+ t

@@
expression t,e;
@@

t = \(xt_find_table_lock(...)\|
  try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
- IS_ERR_OR_NULL(t)
+ !t

@@
expression t,e,e1;
@@

t = \(xt_find_table_lock(...)\|
  try_then_request_module(xt_find_table_lock(...),...)\)
... when != t=e
?- t ? PTR_ERR(t) : e1
+ e1
... when any

// </smpl>

Signed-off-by: Julia Lawall 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/arp_tables.c | 20 ++--
 net/ipv4/netfilter/ip_tables.c  | 20 ++--
 net/ipv6/netfilter/ip6_tables.c | 20 ++--
 net/netfilter/x_tables.c|  2 +-
 4 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index e76ab23a2deb..39004da318e2 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -805,7 +805,7 @@ static int get_info(struct net *net, void __user *user,
 #endif
t = try_then_request_module(xt_find_table_lock(net, NFPROTO_ARP, name),
"arptable_%s", name);
-   if (!IS_ERR_OR_NULL(t)) {
+   if (t) {
struct arpt_getinfo info;
const struct xt_table_info *private = t->private;
 #ifdef CONFIG_COMPAT
@@ -834,7 +834,7 @@ static int get_info(struct net *net, void __user *user,
xt_table_unlock(t);
module_put(t->me);
} else
-   ret = t ? PTR_ERR(t) : -ENOENT;
+   ret = -ENOENT;
 #ifdef CONFIG_COMPAT
if (compat)
xt_compat_unlock(NFPROTO_ARP);
@@ -859,7 +859,7 @@ static int get_entries(struct net *net, struct arpt_get_entries __user *uptr,
get.name[sizeof(get.name) - 1] = '\0';
 
t = xt_find_table_lock(net, NFPROTO_ARP, get.name);
-   if (!IS_ERR_OR_NULL(t)) {
+   if (t) {
const struct xt_table_info *private = t->private;
 
if (get.size == private->size)
@@ -871,7 +871,7 @@ static int get_entries(struct net *net, struct arpt_get_entries __user *uptr,
module_put(t->me);
xt_table_unlock(t);
} else
-   ret = t ? PTR_ERR(t) : -ENOENT;
+   ret = -ENOENT;
 
return ret;
 }
@@ -898,8 +898,8 @@ static int __do_replace(struct net *net, const char *name,
 
t = try_then_request_module(xt_find_table_lock(net, NFPROTO_ARP, name),
"arptable_%s", name);
-   if (IS_ERR_OR_NULL(t)) {
-   ret = t ? PTR_ERR(t) : -ENOENT;
+   if (!t) {
+   ret = -ENOENT;
goto free_newinfo_counters_untrans;
}
 
@@ -1014,8 +1014,8 @@ static int do_add_counters(struct net *net, const void __user *user,
return PTR_ERR(paddc);
 
t = xt_find_table_lock(net, NFPROTO_ARP, tmp.name);
-   if (IS_ERR_OR_NULL(t)) {
-   ret = t ? PTR_ERR(t) : -ENOENT;
+   if (!t) {
+   ret = -ENOENT;
goto free;
}
 
@@ -1404,7 +1404,7 @@ static int compat_get_entries(struct net *net,
 
xt_compat_lock(NFPROTO_ARP);
t = xt_find_table_lock(net, NFPROTO_ARP, get.name);
-   if (!IS_ERR_OR_NULL(t)) {
+   if (t) {
const struct xt_table_info *private = t->private;
struct xt_table_info info;
 
@@ -1419,7 +1419,7 @@ static int compat_get_entries(struct net *net,
module_put(t->me);
xt_table_unlock(t);
} else
-   ret = t ? PTR_ERR(t) : -ENOENT;
+   ret = -ENOENT;
 
xt_compat_unlock(NFPROTO_ARP);
return ret;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index de4fa03f46f3..46815c8a60d7 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -973,7 +973,7 @@ static int get_info(struct net *net, void __user *user,
 #endif
t = try_then_request_module(xt_find_table_lock(net, AF_INET, name),
"iptable_%s", name);
-   if (!IS_ERR_OR_NULL(t)) {
+   if (t) {
struct ipt_getinfo info;
const struct xt_table_info *private = t->private;
 #ifdef CONFIG_COMPAT
@@ -1003,7 +1003,7 @@ static int get_info(struct net *net, void __user *user,
xt_table_unlock(t);
module_put(t->me);
} else
-   ret = t ?

[PATCH 00/39] Netfilter updates for net-next

2016-11-13 Thread Pablo Neira Ayuso

Hi David,

The following patchset contains a second batch of Netfilter updates for
your net-next tree. This includes a rework of the core hook
infrastructure that improves Netfilter performance by ~15% according to
synthetic benchmarks. Then, a large batch with ipset updates, including
a new hash:ipmac set type, via Jozsef Kadlecsik. This also includes a
couple of assorted updates.

Regarding the core hook infrastructure rework to improve performance,
using this simple drop-all packets ruleset from ingress:

nft add table netdev x
nft add chain netdev x y { type filter hook ingress device eth0 priority 0\; }
nft add rule netdev x y drop

And generating traffic through Jesper Brouer's
samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh script with the -i
option, perf report shows nf_tables calls in its top 10:

17.30%  kpktgend_0   [nf_tables][k] nft_do_chain
15.75%  kpktgend_0   [kernel.vmlinux]   [k] __netif_receive_skb_core
10.39%  kpktgend_0   [nf_tables_netdev] [k] nft_do_chain_netdev

I measure an improvement of ~15% in performance with this patchset,
i.e. about 2.5Mpps more. I used my old laptop with an Intel(R)
Core(TM) i5-3320M CPU @ 2.60GHz (4 cores).

This rework contains more specifically, in strict order, these patches:

1) Remove compile-time debugging from core.

2) Remove obsolete comments that predate the rcu era. These days it is
   well known that a Netfilter hook always runs under rcu_read_lock().

3) Remove threshold handling, since this is only used by br_netfilter.
   We already have specific code to handle this from br_netfilter,
   so remove this code from the core path.

4) Deprecate NF_STOP, as this is only used by br_netfilter.

5) Place nf_state_hook pointer into xt_action_param structure, so
   this structure fits into one single cacheline according to pahole.
   This also implicitly affects nftables since it also relies on the
   xt_action_param structure.

6) Move state->hook_entries into nf_queue entry. The hook_entries
   pointer is only required by nf_queue(), so we can store this in the
   queue entry instead.

7) Use switch() statement to handle verdict cases.

8) Remove hook_entries field from nf_hook_state structure, this is only
   required by nf_queue, so store it in nf_queue_entry structure.

9) Merge nf_iterate() into nf_hook_slow() that results in a much
   simpler and more readable function.

10) Handle NF_REPEAT away from the core. So far the only client is
nf_conntrack_in(), and we can restart the packet processing using a
simple goto to jump back there when TCP requires it.
This update required a second pass to fix fallout, with a fix from
Arnd Bergmann.

11) Set random seed from nft_hash when no seed is specified from
userspace.

12) Simplify nf_tables expression registration, in a much smarter way
to save lots of boiler plate code, by Liping Zhang.

13) Simplify layer 4 protocol conntrack tracker registration, from
Davide Caratti.

14) Missing CONFIG_NF_SOCKET_IPV4 dependency for udp4_lib_lookup, due
to recent generalization of the socket infrastructure, from Arnd
Bergmann.

15) Then, the ipset batch from Jozsef, he describes it as follows:

* Cleanup: Remove extra whitespaces in ip_set.h
* Cleanup: Mark some of the helpers arguments as const in ip_set.h
* Cleanup: Group counter helper functions together in ip_set.h
* struct ip_set_skbinfo is introduced instead of open coded fields
  in skbinfo get/init helper functions.
* Use kmalloc() in comment extension helper instead of kzalloc()
  because it is unnecessary to zero out the area just before
  explicit initialization.
* Cleanup: Split extensions into separate files.
* Cleanup: Separate memsize calculation code into dedicated function.
* Cleanup: group ip_set_put_extensions() and ip_set_get_extensions()
  together.
* Add element count to hash headers by Eric B Munson.
* Add element count to all set types header for uniform output
  across all set types.
* Count non-static extension memory into memsize calculation for
  userspace.
* Cleanup: Remove redundant mtype_expire() arguments, because
  they can be obtained from other parameters.
* Cleanup: Simplify mtype_expire() for hash types by removing
  one level of indentation.
* Make NLEN compile time constant for hash types.
* Make sure element data size is a multiple of u32 for the hash set
  types.
* Optimize hash creation routine, exit as early as possible.
* Make struct htype per ipset family so nets array becomes fixed size
  and thus simplifies the struct htype allocation.
* Collapse same condition body into a single one.
* Fix reported memory size for hash:* types, base hash bucket structure
  was not taken into account.
* hash:ipmac type support added to ipset by Tomasz Chilinski.
* Use setup_timer() and mod_timer() instead of init_timer()
  by Muhammad Falak R Wani, individually for the set type families.

16) Remove useless connlabel field in struct netns_ct,

[PATCH 09/39] netfilter: merge nf_iterate() into nf_hook_slow()

2016-11-13 Thread Pablo Neira Ayuso

nf_iterate() has become rather simple, we can integrate this code into
nf_hook_slow() to reduce the amount of LOC in the core path.

However, we still need nf_iterate() around for nf_queue packet handling,
so move this function there, the only place where we need it. I think it
should be possible to refactor the nf_queue code to get rid of it entirely,
but given this is the slow path anyway, let's have a look at this later.
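
The shape of the merged loop can be sketched in userspace as follows. This is
only an illustration under simplified assumptions: the verdict enum, struct
hook_entry, the sample hooks and run_demo() are hypothetical stand-ins, and
NF_QUEUE handling is left out entirely.

```c
#include <stddef.h>

/* Hypothetical verdicts, loosely modelled on NF_DROP/NF_ACCEPT/... */
enum verdict { V_DROP, V_ACCEPT, V_STOLEN, V_REPEAT };

struct hook_entry {
	enum verdict (*hook)(void *priv, int *pkt);
	void *priv;
	struct hook_entry *next;
};

/* Returns 1 if every hook accepted, -1 on drop, 0 for any other verdict. */
static int run_hooks(struct hook_entry *entry, int *pkt)
{
	do {
		switch (entry->hook(entry->priv, pkt)) {
		case V_ACCEPT:
			entry = entry->next;	/* move on to the next hook */
			break;
		case V_DROP:
			return -1;
		case V_REPEAT:
			continue;		/* entry unchanged: same hook runs again */
		default:
			return 0;		/* e.g. stolen packet */
		}
	} while (entry);

	return 1;
}

/* Sample hooks; *pkt is used as a plain call counter here. */
static enum verdict accept_hook(void *priv, int *pkt)
{
	(void)priv;
	(*pkt)++;
	return V_ACCEPT;
}

static enum verdict drop_hook(void *priv, int *pkt)
{
	(void)priv;
	(*pkt)++;
	return V_DROP;
}

/* Requests one repeat, then accepts; priv points to an int flag. */
static enum verdict repeat_once_hook(void *priv, int *pkt)
{
	int *seen = priv;

	(*pkt)++;
	if (!*seen) {
		*seen = 1;
		return V_REPEAT;
	}
	return V_ACCEPT;
}

/* Runs a two-hook chain (accept_hook, then 'second'); *calls counts hook calls. */
static int run_demo(enum verdict (*second)(void *, int *), int *calls)
{
	int seen = 0;
	struct hook_entry b = { second, &seen, NULL };
	struct hook_entry a = { accept_hook, NULL, &b };

	*calls = 0;
	return run_hooks(&a, calls);
}
```

Note that 'continue' inside a do/while jumps to the loop condition, which is
still true because the entry pointer has not been advanced, so the same hook
is invoked again.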

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 73 +---
 net/netfilter/nf_internals.h |  5 ---
 net/netfilter/nf_queue.c | 20 
 3 files changed, 48 insertions(+), 50 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index ebece48b8392..bd9272eeccb5 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -302,26 +302,6 @@ void _nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n)
 }
 EXPORT_SYMBOL(_nf_unregister_hooks);
 
-unsigned int nf_iterate(struct sk_buff *skb,
-   struct nf_hook_state *state,
-   struct nf_hook_entry **entryp)
-{
-   unsigned int verdict;
-
-   do {
-repeat:
-   verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
-   if (verdict != NF_ACCEPT) {
-   if (verdict != NF_REPEAT)
-   return verdict;
-   goto repeat;
-   }
-   *entryp = rcu_dereference((*entryp)->next);
-   } while (*entryp);
-   return NF_ACCEPT;
-}
-
-
 /* Returns 1 if okfn() needs to be executed by the caller,
  * -EPERM for NF_DROP, 0 otherwise.  Caller must hold rcu_read_lock. */
 int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
@@ -330,31 +310,34 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
unsigned int verdict;
int ret;
 
-next_hook:
-   verdict = nf_iterate(skb, state, &entry);
-   switch (verdict & NF_VERDICT_MASK) {
-   case NF_ACCEPT:
-   ret = 1;
-   break;
-   case NF_DROP:
-   kfree_skb(skb);
-   ret = NF_DROP_GETERR(verdict);
-   if (ret == 0)
-   ret = -EPERM;
-   break;
-   case NF_QUEUE:
-   ret = nf_queue(skb, state, &entry, verdict);
-   if (ret == 1 && entry)
-   goto next_hook;
-   /* Fall through. */
-   default:
-   /* Implicit handling for NF_STOLEN, as well as any other non
-* conventional verdicts.
-*/
-   ret = 0;
-   break;
-   }
-   return ret;
+   do {
+   verdict = entry->ops.hook(entry->ops.priv, skb, state);
+   switch (verdict & NF_VERDICT_MASK) {
+   case NF_ACCEPT:
+   entry = rcu_dereference(entry->next);
+   break;
+   case NF_DROP:
+   kfree_skb(skb);
+   ret = NF_DROP_GETERR(verdict);
+   if (ret == 0)
+   ret = -EPERM;
+   return ret;
+   case NF_REPEAT:
+   continue;
+   case NF_QUEUE:
+   ret = nf_queue(skb, state, &entry, verdict);
+   if (ret == 1 && entry)
+   continue;
+   return ret;
+   default:
+   /* Implicit handling for NF_STOLEN, as well as any other
+* non conventional verdicts.
+*/
+   return 0;
+   }
+   } while (entry);
+
+   return 1;
 }
 EXPORT_SYMBOL(nf_hook_slow);
 
diff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h
index 9fdb655f85bc..c46d214d5323 100644
--- a/net/netfilter/nf_internals.h
+++ b/net/netfilter/nf_internals.h
@@ -11,11 +11,6 @@
 #define NFDEBUG(format, args...)
 #endif
 
-
-/* core.c */
-unsigned int nf_iterate(struct sk_buff *skb, struct nf_hook_state *state,
-   struct nf_hook_entry **entryp);
-
 /* nf_queue.c */
 int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
 struct nf_hook_entry **entryp, unsigned int verdict);
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 2e39e38ae1c7..77cba9f6ccb6 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -177,6 +177,26 @@ int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
return 0;
 }
 
+static unsigned int nf_iterate(struct sk_buff *skb,
+  struct nf_hook_state *state,
+  struct nf_hook_entry **entryp)
+{
+   unsigned int verdict;
+
+   do {
+repeat:
+   verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
+   if (verdict != NF_ACCEPT) {

[PATCH 08/39] netfilter: remove hook_entries field from nf_hook_state

2016-11-13 Thread Pablo Neira Ayuso

This field is only useful for nf_queue, so store it in the
nf_queue_entry structure instead, away from the core path. Pass
hook_head to nf_hook_slow().

Since we always have a valid entry on the first iteration in
nf_iterate(), we can use 'do { ... } while (entry)' loop instead.

Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter.h | 10 --
 include/linux/netfilter_ingress.h |  4 ++--
 include/net/netfilter/nf_queue.h  |  1 +
 net/bridge/br_netfilter_hooks.c   |  4 ++--
 net/bridge/netfilter/ebtable_broute.c |  2 +-
 net/netfilter/core.c  |  9 -
 net/netfilter/nf_queue.c  | 13 +
 net/netfilter/nfnetlink_queue.c   |  2 +-
 8 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index e0d000f6c9bf..69230140215b 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -54,7 +54,6 @@ struct nf_hook_state {
struct net_device *out;
struct sock *sk;
struct net *net;
-   struct nf_hook_entry __rcu *hook_entries;
int (*okfn)(struct net *, struct sock *, struct sk_buff *);
 };
 
@@ -81,7 +80,6 @@ struct nf_hook_entry {
 };
 
 static inline void nf_hook_state_init(struct nf_hook_state *p,
- struct nf_hook_entry *hook_entry,
  unsigned int hook,
  u_int8_t pf,
  struct net_device *indev,
@@ -96,7 +94,6 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
p->out = outdev;
p->sk = sk;
p->net = net;
-   RCU_INIT_POINTER(p->hook_entries, hook_entry);
p->okfn = okfn;
 }
 
@@ -150,7 +147,8 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg);
 extern struct static_key nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS];
 #endif
 
-int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state);
+int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
+struct nf_hook_entry *entry);
 
 /**
  * nf_hook - call a netfilter hook
@@ -179,10 +177,10 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
if (hook_head) {
struct nf_hook_state state;
 
-   nf_hook_state_init(&state, hook_head, hook, pf, indev, outdev,
+   nf_hook_state_init(&state, hook, pf, indev, outdev,
   sk, net, okfn);
 
-   ret = nf_hook_slow(skb, &state);
+   ret = nf_hook_slow(skb, &state, hook_head);
}
rcu_read_unlock();
 
diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h
index fd44e4131710..2dc3b49b804a 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -26,10 +26,10 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
if (unlikely(!e))
return 0;
 
-   nf_hook_state_init(&state, e, NF_NETDEV_INGRESS,
+   nf_hook_state_init(&state, NF_NETDEV_INGRESS,
   NFPROTO_NETDEV, skb->dev, NULL, NULL,
   dev_net(skb->dev), NULL);
-   return nf_hook_slow(skb, &state);
+   return nf_hook_slow(skb, &state, e);
 }
 
 static inline void nf_hook_ingress_init(struct net_device *dev)
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 2280cfe86c56..09948d10e38e 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -12,6 +12,7 @@ struct nf_queue_entry {
unsigned intid;
 
struct nf_hook_statestate;
+   struct nf_hook_entry*hook;
u16 size; /* sizeof(entry) + saved route keys */
 
/* extra space to store route keys */
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 7e3645fa6339..8155bd2a5138 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -1018,10 +1018,10 @@ int br_nf_hook_thresh(unsigned int hook, struct net *net,
 
/* We may already have this, but read-locks nest anyway */
rcu_read_lock();
-   nf_hook_state_init(&state, elem, hook, NFPROTO_BRIDGE, indev, outdev,
+   nf_hook_state_init(&state, hook, NFPROTO_BRIDGE, indev, outdev,
   sk, net, okfn);
 
-   ret = nf_hook_slow(skb, &state);
+   ret = nf_hook_slow(skb, &state, elem);
rcu_read_unlock();
if (ret == 1)
ret = okfn(net, sk, skb);
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index 599679e3498d..8fe36dc3aab2 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -53,7 +53,7 @@ static int ebt_broute(struct sk_buff *skb)
struct nf_hook_state state;
int ret;
 
-   nf_hook_state_init(&state, NULL, NF_BR_BROUTING,
+

[PATCH 30/39] netfilter: ipset: Make sure element data size is a multiple of u32

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Data for hashing is required to be an array of u32. Make sure that
the element data size is always a multiple of u32.
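
As a rough userspace illustration of the same idea: _Static_assert plays the
role of BUILD_BUG_ON() below, struct elem is a hypothetical element layout,
and hash_words() is a toy word-wise hash standing in for jhash2(), not the
real function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical element type; explicit padding keeps it a multiple of 4 bytes. */
struct elem {
	uint32_t ip;
	uint16_t port;
	uint8_t proto;
	uint8_t padding;
};

#define ELEM_DATALEN	sizeof(struct elem)
#define ELEM_WORDS	(ELEM_DATALEN / sizeof(uint32_t))

/* Compile-time check: the build fails if the element is not u32-sized. */
_Static_assert(ELEM_DATALEN % sizeof(uint32_t) == 0,
	       "element size must be a multiple of u32");

/* Toy hash (NOT jhash2): consumes the key strictly as 32-bit words. */
static uint32_t hash_words(const uint32_t *k, size_t nwords, uint32_t initval)
{
	uint32_t h = initval;
	size_t i;

	for (i = 0; i < nwords; i++)
		h = (h ^ k[i]) * 16777619u;
	return h;
}

/* Bucket index for an element; 'bits' is assumed to be below 32. */
static uint32_t elem_key(const struct elem *e, uint32_t initval, unsigned int bits)
{
	uint32_t words[ELEM_WORDS];

	/* Copy into a u32 array so that no trailing bytes are left unhashed. */
	memcpy(words, e, sizeof(words));
	return hash_words(words, ELEM_WORDS, initval) & ((1u << bits) - 1);
}
```

If the element size were, say, 7 bytes, the integer division would silently
drop the last 3 bytes from the hash input; the static assertion turns that
into a build error instead.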

Ported from a patch proposed by Sergey Popovich.

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 6c88c20ae1d4..34f115f874ab 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -260,8 +260,14 @@ htable_bits(u32 hashsize)
 #endif
 
 #define HKEY(data, initval, htable_bits)   \
-(jhash2((u32 *)(data), HKEY_DATALEN / sizeof(u32), initval)\
-   & jhash_mask(htable_bits))
+({ \
+   const u32 *__k = (const u32 *)data; \
+   u32 __l = HKEY_DATALEN / sizeof(u32);   \
+   \
+   BUILD_BUG_ON(HKEY_DATALEN % sizeof(u32) != 0);  \
+   \
+   jhash2(__k, __l, initval) & jhash_mask(htable_bits);\
+})
 
 #ifndef htype
 #ifndef HTYPE
-- 
2.1.4

[PATCH 33/39] netfilter: ipset: Collapse same condition body to a single one

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

The set full case (with net_ratelimit()-ed pr_warn()) is already
handled, simply jump there.
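
The pattern is roughly the following, shown as a standalone sketch. struct
toy_set, toy_add() and ERR_SET_FULL are made-up names for illustration only;
the real mtype_add() does considerably more.

```c
#include <stdbool.h>
#include <stdio.h>

#define ERR_SET_FULL	1	/* hypothetical error code */

struct toy_set {
	unsigned int elements;
	unsigned int maxelem;
};

static int toy_add(struct toy_set *s, bool bucket_missing, bool forceadd)
{
	if (bucket_missing) {
		/* Both conditions share the same "set full" handling below. */
		if (forceadd || s->elements >= s->maxelem)
			goto set_full;
	}
	s->elements++;
	return 0;

set_full:
	/* Single place for the warning and the error return. */
	fprintf(stderr, "Set is full, maxelem %u reached\n", s->maxelem);
	return -ERR_SET_FULL;
}
```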

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index c600f6d9f15e..1c9b84e53dcc 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -719,14 +719,8 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
key = HKEY(value, h->initval, t->htable_bits);
n = __ipset_dereference_protected(hbucket(t, key), 1);
if (!n) {
-   if (forceadd) {
-   if (net_ratelimit())
-   pr_warn("Set %s is full, maxelem %u reached\n",
-   set->name, h->maxelem);
-   return -IPSET_ERR_HASH_FULL;
-   } else if (set->elements >= h->maxelem) {
+   if (forceadd || set->elements >= h->maxelem)
goto set_full;
-   }
old = NULL;
n = kzalloc(sizeof(*n) + AHASH_INIT_SIZE * set->dsize,
GFP_ATOMIC);
-- 
2.1.4

[PATCH 26/39] netfilter: ipset: Count non-static extension memory for userspace

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

A non-static extension (i.e. comment) was not counted in the memory
size. A new internal counter is introduced for this. For the hash
types, the sizes of the arrays are counted there as well, so that we
can avoid scanning the whole set when just the header data is
requested.
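
A minimal userspace sketch of the bookkeeping, assuming a single counter that
is adjusted whenever a dynamically sized comment is attached or released. All
names here (toy_set, toy_comment, comment_init, ...) are hypothetical, and RCU
is ignored.

```c
#include <stdlib.h>
#include <string.h>

struct toy_set {
	size_t ext_size;	/* running total of dynamic extension memory */
};

struct toy_comment {
	char *str;
};

/* Release a comment and subtract its size from the running total. */
static void comment_free(struct toy_set *set, struct toy_comment *c)
{
	if (!c->str)
		return;
	set->ext_size -= strlen(c->str) + 1;
	free(c->str);
	c->str = NULL;
}

/* Attach a new comment, replacing (and un-counting) any previous one. */
static int comment_init(struct toy_set *set, struct toy_comment *c,
			const char *text)
{
	size_t len;

	comment_free(set, c);
	if (!text)
		return 0;
	len = strlen(text) + 1;
	c->str = malloc(len);
	if (!c->str)
		return -1;
	memcpy(c->str, text, len);
	set->ext_size += len;
	return 0;
}

/* Header query: no walk over the elements is needed. */
static size_t header_memsize(const struct toy_set *set, size_t static_size)
{
	return static_size + set->ext_size;
}
```

The point of keeping the counter up to date on every add and free is that the
header request becomes a constant-time addition.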

Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h |  8 ++--
 include/linux/netfilter/ipset/ip_set_comment.h |  7 +--
 net/netfilter/ipset/ip_set_bitmap_gen.h|  5 +++--
 net/netfilter/ipset/ip_set_core.c  |  2 +-
 net/netfilter/ipset/ip_set_hash_gen.h  | 26 ++
 net/netfilter/ipset/ip_set_list_set.c  |  5 +++--
 6 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index 4671d740610f..8e42253e5d4d 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -79,10 +79,12 @@ enum ip_set_ext_id {
IPSET_EXT_ID_MAX,
 };
 
+struct ip_set;
+
 /* Extension type */
 struct ip_set_ext_type {
/* Destroy extension private data (can be NULL) */
-   void (*destroy)(void *ext);
+   void (*destroy)(struct ip_set *set, void *ext);
enum ip_set_extension type;
enum ipset_cadt_flags flag;
/* Size and minimal alignment */
@@ -252,6 +254,8 @@ struct ip_set {
u32 timeout;
/* Number of elements (vs timeout) */
u32 elements;
+   /* Size of the dynamic extensions (vs timeout) */
+   size_t ext_size;
/* Element data size */
size_t dsize;
/* Offsets to extensions in elements */
@@ -268,7 +272,7 @@ ip_set_ext_destroy(struct ip_set *set, void *data)
 */
if (SET_WITH_COMMENT(set))
ip_set_extensions[IPSET_EXT_ID_COMMENT].destroy(
-   ext_comment(data, set));
+   set, ext_comment(data, set));
 }
 
 static inline int
diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h
index 5444b1bbe656..8e2bab1e8e90 100644
--- a/include/linux/netfilter/ipset/ip_set_comment.h
+++ b/include/linux/netfilter/ipset/ip_set_comment.h
@@ -20,13 +20,14 @@ ip_set_comment_uget(struct nlattr *tb)
  * The kadt functions don't use the comment extensions in any way.
  */
 static inline void
-ip_set_init_comment(struct ip_set_comment *comment,
+ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
const struct ip_set_ext *ext)
 {
struct ip_set_comment_rcu *c = rcu_dereference_protected(comment->c, 1);
size_t len = ext->comment ? strlen(ext->comment) : 0;
 
if (unlikely(c)) {
+   set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
kfree_rcu(c, rcu);
rcu_assign_pointer(comment->c, NULL);
}
@@ -38,6 +39,7 @@ ip_set_init_comment(struct ip_set_comment *comment,
if (unlikely(!c))
return;
strlcpy(c->str, ext->comment, len + 1);
+   set->ext_size += sizeof(*c) + strlen(c->str) + 1;
rcu_assign_pointer(comment->c, c);
 }
 
@@ -58,13 +60,14 @@ ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
  * of the set data anymore.
  */
 static inline void
-ip_set_comment_free(struct ip_set_comment *comment)
+ip_set_comment_free(struct ip_set *set, struct ip_set_comment *comment)
 {
struct ip_set_comment_rcu *c;
 
c = rcu_dereference_protected(comment->c, 1);
if (unlikely(!c))
return;
+   set->ext_size -= sizeof(*c) + strlen(c->str) + 1;
kfree_rcu(c, rcu);
rcu_assign_pointer(comment->c, NULL);
 }
diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h b/net/netfilter/ipset/ip_set_bitmap_gen.h
index 1810d1c06e3d..f8ea26cafa30 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -84,6 +84,7 @@ mtype_flush(struct ip_set *set)
mtype_ext_cleanup(set);
memset(map->members, 0, map->memsize);
set->elements = 0;
+   set->ext_size = 0;
 }
 
 /* Calculate the actual memory size of the set data */
@@ -99,7 +100,7 @@ mtype_head(struct ip_set *set, struct sk_buff *skb)
 {
const struct mtype *map = set->data;
struct nlattr *nested;
-   size_t memsize = mtype_memsize(map, set->dsize);
+   size_t memsize = mtype_memsize(map, set->dsize) + set->ext_size;
 
nested = ipset_nest_start(skb, IPSET_ATTR_DATA);
if (!nested)
@@ -173,7 +174,7 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
if (SET_WITH_COUNTER(set))
ip_set_init_counter(ext_counter(x, set), ext);
if (SET_WITH_COMMENT(set))
-   ip_set_init_comment(ext_comment(x, set), ext);
+

[PATCH 23/39] netfilter: ipset: Regroup ip_set_put_extensions and add extern

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Cleanup: group ip_set_put_extensions and ip_set_get_extensions
together and add missing extern.

Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index b5bd0fb3d07b..7a218eb74887 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -331,6 +331,8 @@ extern size_t ip_set_elem_len(struct ip_set *set, struct nlattr *tb[],
  size_t len, size_t align);
 extern int ip_set_get_extensions(struct ip_set *set, struct nlattr *tb[],
 struct ip_set_ext *ext);
+extern int ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
+const void *e, bool active);
 
 static inline int
 ip_set_get_hostipaddr4(struct nlattr *nla, u32 *ipaddr)
@@ -449,10 +451,6 @@ bitmap_bytes(u32 a, u32 b)
 #include 
 #include 
 
-int
-ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
- const void *e, bool active);
-
 #define IP_SET_INIT_KEXT(skb, opt, set)\
{ .bytes = (skb)->len, .packets = 1,\
  .timeout = ip_set_adt_opt_timeout(opt, set) }
-- 
2.1.4

[PATCH 38/39] netfilter: conntrack: remove unused netns_ct member

2016-11-13 Thread Pablo Neira Ayuso

From: Florian Westphal 

Since commit 23014011ba420 ('netfilter: conntrack: support a fixed size of 128
distinct labels'), this member isn't needed anymore.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netns/conntrack.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index e469e85de3f9..3d06d94d2e52 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -91,7 +91,6 @@ struct netns_ct {
struct nf_ip_netnf_ct_proto;
 #if defined(CONFIG_NF_CONNTRACK_LABELS)
unsigned intlabels_used;
-   u8  label_words;
 #endif
 };
 #endif
-- 
2.1.4

[PATCH 17/39] netfilter: ipset: Mark some helper args as const.

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Mark some of the helpers' arguments as const.

Ported from a patch proposed by Sergey Popovich.

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 4 ++--
 include/linux/netfilter/ipset/ip_set_comment.h | 2 +-
 include/linux/netfilter/ipset/ip_set_timeout.h | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h
index 5b1fd090f34b..524467f933bf 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -346,7 +346,7 @@ ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
 }
 
 static inline bool
-ip_set_put_skbinfo(struct sk_buff *skb, struct ip_set_skbinfo *skbinfo)
+ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo)
 {
/* Send nonzero parameters only */
return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
@@ -373,7 +373,7 @@ ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
 }
 
 static inline bool
-ip_set_put_counter(struct sk_buff *skb, struct ip_set_counter *counter)
+ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
 {
return nla_put_net64(skb, IPSET_ATTR_BYTES,
 cpu_to_be64(ip_set_get_bytes(counter)),
diff --git a/include/linux/netfilter/ipset/ip_set_comment.h b/include/linux/netfilter/ipset/ip_set_comment.h
index 8d0248525957..bae5c7609be2 100644
--- a/include/linux/netfilter/ipset/ip_set_comment.h
+++ b/include/linux/netfilter/ipset/ip_set_comment.h
@@ -43,7 +43,7 @@ ip_set_init_comment(struct ip_set_comment *comment,
 
 /* Used only when dumping a set, protected by rcu_read_lock_bh() */
 static inline int
-ip_set_put_comment(struct sk_buff *skb, struct ip_set_comment *comment)
+ip_set_put_comment(struct sk_buff *skb, const struct ip_set_comment *comment)
 {
struct ip_set_comment_rcu *c = rcu_dereference_bh(comment->c);
 
diff --git a/include/linux/netfilter/ipset/ip_set_timeout.h b/include/linux/netfilter/ipset/ip_set_timeout.h
index 1d6a935c1ac5..bfb3531fd88a 100644
--- a/include/linux/netfilter/ipset/ip_set_timeout.h
+++ b/include/linux/netfilter/ipset/ip_set_timeout.h
@@ -40,7 +40,7 @@ ip_set_timeout_uget(struct nlattr *tb)
 }
 
 static inline bool
-ip_set_timeout_expired(unsigned long *t)
+ip_set_timeout_expired(const unsigned long *t)
 {
return *t != IPSET_ELEM_PERMANENT && time_is_before_jiffies(*t);
 }
@@ -63,7 +63,7 @@ ip_set_timeout_set(unsigned long *timeout, u32 value)
 }
 
 static inline u32
-ip_set_timeout_get(unsigned long *timeout)
+ip_set_timeout_get(const unsigned long *timeout)
 {
return *timeout == IPSET_ELEM_PERMANENT ? 0 :
jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC;
-- 
2.1.4

[PATCH 14/39] udp: provide udp{4,6}_lib_lookup for nf_socket_ipv{4,6}

2016-11-13 Thread Pablo Neira Ayuso

From: Arnd Bergmann 

Since commit ca065d0cf80f ("udp: no longer use SLAB_DESTROY_BY_RCU")
the udp6_lib_lookup and udp4_lib_lookup functions are only
provided when it is actually possible to call them.

However, moving the callers now caused a link error:

net/built-in.o: In function `nf_sk_lookup_slow_v6':
(.text+0x131a39): undefined reference to `udp6_lib_lookup'
net/ipv4/netfilter/nf_socket_ipv4.o: In function `nf_sk_lookup_slow_v4':
nf_socket_ipv4.c:(.text.nf_sk_lookup_slow_v4+0x114): undefined reference to `udp4_lib_lookup'

This extends the #ifdef so we also provide the functions when
CONFIG_NF_SOCKET_IPV4 or CONFIG_NF_SOCKET_IPV6, respectively,
is set.

Fixes: 8db4c5be88f6 ("netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c")
Signed-off-by: Arnd Bergmann 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/udp.c | 3 ++-
 net/ipv6/udp.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 195992e0440d..395361b1398e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -580,7 +580,8 @@ EXPORT_SYMBOL_GPL(udp4_lib_lookup_skb);
  * Does increment socket refcount.
  */
 #if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_SOCKET) || \
-IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY)
+IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY) || \
+IS_ENABLED(CONFIG_NF_SOCKET_IPV4)
 struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 __be32 daddr, __be16 dport, int dif)
 {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index a7700bbf6788..3e232585b0ff 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -302,7 +302,8 @@ EXPORT_SYMBOL_GPL(udp6_lib_lookup_skb);
  * Does increment socket refcount.
  */
 #if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_SOCKET) || \
-IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY)
+IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TPROXY) || \
+IS_ENABLED(CONFIG_NF_SOCKET_IPV6)
 struct sock *udp6_lib_lookup(struct net *net, const struct in6_addr *saddr, __be16 sport,
 const struct in6_addr *daddr, __be16 dport, int dif)
 {
-- 
2.1.4

[PATCH 32/39] netfilter: ipset: Make struct htype per ipset family

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Before this patch, struct htype was created on the first inclusion
of ip_set_hash_gen.h and was common to both the IPv4 and IPv6
set variants.

Make struct htype per ipset family and use NLEN to make
nets array fixed size to simplify struct htype allocation.
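
To show why a fixed-size array simplifies allocation, here is a small
userspace sketch. The struct names, the fields and the NLEN_V4/NLEN_V6 values
are assumptions for illustration, not the definitions from
ip_set_hash_gen.h.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed per-family array lengths (one slot per possible prefix length). */
#define NLEN_V4	32
#define NLEN_V6	128

/* Hypothetical per-prefix bookkeeping entry. */
struct net_prefixes {
	uint32_t nets[2];
	uint8_t cidr[2];
};

/* One struct per family, each with a fixed-size array member. */
struct htype_v4 {
	uint32_t maxelem;
	struct net_prefixes nets[NLEN_V4];
};

struct htype_v6 {
	uint32_t maxelem;
	struct net_prefixes nets[NLEN_V6];
};

/* Allocation is a plain sizeof(*h); no trailing-array arithmetic is needed. */
static struct htype_v4 *htype_v4_alloc(uint32_t maxelem)
{
	struct htype_v4 *h = calloc(1, sizeof(*h));

	if (h)
		h->maxelem = maxelem;
	return h;
}
```

With a shared struct and a variable-length tail, the caller would instead have
to compute the header size plus the per-family array size at every allocation
site.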

Ported from a patch proposed by Sergey Popovich.

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h| 51 +++-
 net/netfilter/ipset/ip_set_hash_ip.c | 10 +++---
 net/netfilter/ipset/ip_set_hash_ipmark.c | 10 +++---
 net/netfilter/ipset/ip_set_hash_ipport.c |  6 ++--
 net/netfilter/ipset/ip_set_hash_ipportip.c   |  6 ++--
 net/netfilter/ipset/ip_set_hash_ipportnet.c  | 10 +++---
 net/netfilter/ipset/ip_set_hash_net.c|  8 ++---
 net/netfilter/ipset/ip_set_hash_netiface.c   |  8 ++---
 net/netfilter/ipset/ip_set_hash_netnet.c |  8 ++---
 net/netfilter/ipset/ip_set_hash_netport.c| 10 +++---
 net/netfilter/ipset/ip_set_hash_netportnet.c | 10 +++---
 11 files changed, 63 insertions(+), 74 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index de1d16fd4121..c600f6d9f15e 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -166,6 +166,18 @@ htable_bits(u32 hashsize)
 
 #endif /* _IP_SET_HASH_GEN_H */
 
+#ifndef MTYPE
+#error "MTYPE is not defined!"
+#endif
+
+#ifndef HTYPE
+#error "HTYPE is not defined!"
+#endif
+
+#ifndef HOST_MASK
+#error "HOST_MASK is not defined!"
+#endif
+
 /* Family dependent templates */
 
 #undef ahash_data
@@ -189,7 +201,6 @@ htable_bits(u32 hashsize)
 #undef mtype_same_set
 #undef mtype_kadt
 #undef mtype_uadt
-#undef mtype
 
 #undef mtype_add
 #undef mtype_del
@@ -205,6 +216,7 @@ htable_bits(u32 hashsize)
 #undef mtype_variant
 #undef mtype_data_match
 
+#undef htype
 #undef HKEY
 
 #define mtype_data_equal   IPSET_TOKEN(MTYPE, _data_equal)
@@ -231,7 +243,6 @@ htable_bits(u32 hashsize)
 #define mtype_same_set IPSET_TOKEN(MTYPE, _same_set)
 #define mtype_kadt IPSET_TOKEN(MTYPE, _kadt)
 #define mtype_uadt IPSET_TOKEN(MTYPE, _uadt)
-#define mtype  MTYPE
 
 #define mtype_add  IPSET_TOKEN(MTYPE, _add)
 #define mtype_del  IPSET_TOKEN(MTYPE, _del)
@@ -247,18 +258,12 @@ htable_bits(u32 hashsize)
 #define mtype_variant  IPSET_TOKEN(MTYPE, _variant)
 #define mtype_data_match   IPSET_TOKEN(MTYPE, _data_match)
 
-#ifndef MTYPE
-#error "MTYPE is not defined!"
-#endif
-
-#ifndef HOST_MASK
-#error "HOST_MASK is not defined!"
-#endif
-
 #ifndef HKEY_DATALEN
 #define HKEY_DATALEN   sizeof(struct mtype_elem)
 #endif
 
+#define htype  MTYPE
+
 #define HKEY(data, initval, htable_bits)   \
 ({ \
const u32 *__k = (const u32 *)data; \
@@ -269,33 +274,26 @@ htable_bits(u32 hashsize)
jhash2(__k, __l, initval) & jhash_mask(htable_bits);\
 })
 
-#ifndef htype
-#ifndef HTYPE
-#error "HTYPE is not defined!"
-#endif /* HTYPE */
-#define htype  HTYPE
-
 /* The generic hash structure */
 struct htype {
struct htable __rcu *table; /* the hash table */
+   struct timer_list gc;   /* garbage collection when timeout enabled */
u32 maxelem;/* max elements in the hash */
u32 initval;/* random jhash init value */
 #ifdef IP_SET_HASH_WITH_MARKMASK
u32 markmask;   /* markmask value for mark mask to store */
 #endif
-   struct timer_list gc;   /* garbage collection when timeout enabled */
-   struct mtype_elem next; /* temporary storage for uadd */
 #ifdef IP_SET_HASH_WITH_MULTI
u8 ahash_max;   /* max elements in an array block */
 #endif
 #ifdef IP_SET_HASH_WITH_NETMASK
u8 netmask; /* netmask value for subnets to store */
 #endif
+   struct mtype_elem next; /* temporary storage for uadd */
 #ifdef IP_SET_HASH_WITH_NETS
-   struct net_prefixes nets[0]; /* book-keeping of prefixes */
+   struct net_prefixes nets[NLEN]; /* book-keeping of prefixes */
 #endif
 };
-#endif /* htype */
 
 #ifdef IP_SET_HASH_WITH_NETS
 /* Network cidr size book keeping when the hash stores different
@@ -348,13 +346,7 @@ mtype_del_cidr(struct htype *h, u8 cidr, u8 n)
 static size_t
 mtype_ahash_memsize(const struct htype *h, const struct htable *t)
 {
-   size_t memsize = sizeof(*h) + sizeof(*t);
-
-#ifdef IP_SET_HASH_WITH_NETS
-   memsize += sizeof(struct net_prefixes) * NLEN;
-#endif
-
-   return memsize;
+   return sizeof(*h) + sizeof(*t);
 }
 
 /* Get the ith element from the array block n */
@@ -392,7 +384,7 @@ mtype_flush(struct ip_set *set)
kfree_rcu(n, rcu);
}
 #ifdef IP_SET_HASH_WITH_NETS
-

[PATCH 31/39] netfilter: ipset: Optimize hash creation routine

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Exit as early as possible on error and use RCU_INIT_POINTER()
as the set is not yet visible at creation time.

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 63 ---
 1 file changed, 29 insertions(+), 34 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 34f115f874ab..de1d16fd4121 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -1241,41 +1241,35 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct 
ip_set *set,
struct htype *h;
struct htable *t;
 
+   pr_debug("Create set %s with family %s\n",
+set->name, set->family == NFPROTO_IPV4 ? "inet" : "inet6");
+
 #ifndef IP_SET_PROTO_UNDEF
if (!(set->family == NFPROTO_IPV4 || set->family == NFPROTO_IPV6))
return -IPSET_ERR_INVALID_FAMILY;
 #endif
 
-#ifdef IP_SET_HASH_WITH_MARKMASK
-   markmask = 0x;
-#endif
-#ifdef IP_SET_HASH_WITH_NETMASK
-   netmask = set->family == NFPROTO_IPV4 ? 32 : 128;
-   pr_debug("Create set %s with family %s\n",
-set->name, set->family == NFPROTO_IPV4 ? "inet" : "inet6");
-#endif
-
if (unlikely(!ip_set_optattr_netorder(tb, IPSET_ATTR_HASHSIZE) ||
 !ip_set_optattr_netorder(tb, IPSET_ATTR_MAXELEM) ||
 !ip_set_optattr_netorder(tb, IPSET_ATTR_TIMEOUT) ||
 !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
return -IPSET_ERR_PROTOCOL;
+
 #ifdef IP_SET_HASH_WITH_MARKMASK
/* Separated condition in order to avoid directive in argument list */
if (unlikely(!ip_set_optattr_netorder(tb, IPSET_ATTR_MARKMASK)))
return -IPSET_ERR_PROTOCOL;
-#endif
 
-   if (tb[IPSET_ATTR_HASHSIZE]) {
-   hashsize = ip_set_get_h32(tb[IPSET_ATTR_HASHSIZE]);
-   if (hashsize < IPSET_MIMINAL_HASHSIZE)
-   hashsize = IPSET_MIMINAL_HASHSIZE;
+   markmask = 0x;
+   if (tb[IPSET_ATTR_MARKMASK]) {
+   markmask = ntohl(nla_get_be32(tb[IPSET_ATTR_MARKMASK]));
+   if (markmask == 0)
+   return -IPSET_ERR_INVALID_MARKMASK;
}
-
-   if (tb[IPSET_ATTR_MAXELEM])
-   maxelem = ip_set_get_h32(tb[IPSET_ATTR_MAXELEM]);
+#endif
 
 #ifdef IP_SET_HASH_WITH_NETMASK
+   netmask = set->family == NFPROTO_IPV4 ? 32 : 128;
if (tb[IPSET_ATTR_NETMASK]) {
netmask = nla_get_u8(tb[IPSET_ATTR_NETMASK]);
 
@@ -1285,14 +1279,15 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct 
ip_set *set,
return -IPSET_ERR_INVALID_NETMASK;
}
 #endif
-#ifdef IP_SET_HASH_WITH_MARKMASK
-   if (tb[IPSET_ATTR_MARKMASK]) {
-   markmask = ntohl(nla_get_be32(tb[IPSET_ATTR_MARKMASK]));
 
-   if (markmask == 0)
-   return -IPSET_ERR_INVALID_MARKMASK;
+   if (tb[IPSET_ATTR_HASHSIZE]) {
+   hashsize = ip_set_get_h32(tb[IPSET_ATTR_HASHSIZE]);
+   if (hashsize < IPSET_MIMINAL_HASHSIZE)
+   hashsize = IPSET_MIMINAL_HASHSIZE;
}
-#endif
+
+   if (tb[IPSET_ATTR_MAXELEM])
+   maxelem = ip_set_get_h32(tb[IPSET_ATTR_MAXELEM]);
 
hsize = sizeof(*h);
 #ifdef IP_SET_HASH_WITH_NETS
@@ -1302,16 +1297,6 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct 
ip_set *set,
if (!h)
return -ENOMEM;
 
-   h->maxelem = maxelem;
-#ifdef IP_SET_HASH_WITH_NETMASK
-   h->netmask = netmask;
-#endif
-#ifdef IP_SET_HASH_WITH_MARKMASK
-   h->markmask = markmask;
-#endif
-   get_random_bytes(&h->initval, sizeof(h->initval));
-   set->timeout = IPSET_NO_TIMEOUT;
-
hbits = htable_bits(hashsize);
hsize = htable_size(hbits);
if (hsize == 0) {
@@ -1323,8 +1308,17 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct 
ip_set *set,
kfree(h);
return -ENOMEM;
}
+   h->maxelem = maxelem;
+#ifdef IP_SET_HASH_WITH_NETMASK
+   h->netmask = netmask;
+#endif
+#ifdef IP_SET_HASH_WITH_MARKMASK
+   h->markmask = markmask;
+#endif
+   get_random_bytes(&h->initval, sizeof(h->initval));
+
t->htable_bits = hbits;
-   rcu_assign_pointer(h->table, t);
+   RCU_INIT_POINTER(h->table, t);
 
set->data = h;
 #ifndef IP_SET_PROTO_UNDEF
@@ -1342,6 +1336,7 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct 
ip_set *set,
__alignof__(struct IPSET_TOKEN(HTYPE, 6_elem)));
}
 #endif
+   set->timeout = IPSET_NO_TIMEOUT;
if (tb[IPSET_ATTR_TIMEOUT]) {
set->timeout = ip_set_timeout_uget(tb[IPSET_ATTR_TIMEOUT]);
 #ifndef IP_SET_PROTO_UNDEF
-- 
2.1.4

[PATCH 35/39] netfilter: ipset: hash:ipmac type support added to ipset

2016-11-13 Thread Pablo Neira Ayuso

From: Tomasz Chilinski 

Introduce the hash:ipmac type.

Signed-off-by: Tomasz Chiliński
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/Kconfig |   9 +
 net/netfilter/ipset/Makefile|   1 +
 net/netfilter/ipset/ip_set_hash_ipmac.c | 315 
 3 files changed, 325 insertions(+)
 create mode 100644 net/netfilter/ipset/ip_set_hash_ipmac.c

diff --git a/net/netfilter/ipset/Kconfig b/net/netfilter/ipset/Kconfig
index 234a8ec82076..4083a8051f0f 100644
--- a/net/netfilter/ipset/Kconfig
+++ b/net/netfilter/ipset/Kconfig
@@ -99,6 +99,15 @@ config IP_SET_HASH_IPPORTNET
 
  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_SET_HASH_IPMAC
+   tristate "hash:ip,mac set support"
+   depends on IP_SET
+   help
+ This option adds the hash:ip,mac set type support, by which
+ one can store IPv4/IPv6 address and MAC (ethernet address) pairs in a 
set.
+
+ To compile it as a module, choose M here.  If unsure, say N.
+
 config IP_SET_HASH_MAC
tristate "hash:mac set support"
depends on IP_SET
diff --git a/net/netfilter/ipset/Makefile b/net/netfilter/ipset/Makefile
index 3dbd5e958489..28ec148df02d 100644
--- a/net/netfilter/ipset/Makefile
+++ b/net/netfilter/ipset/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_IP_SET_BITMAP_PORT) += ip_set_bitmap_port.o
 
 # hash types
 obj-$(CONFIG_IP_SET_HASH_IP) += ip_set_hash_ip.o
+obj-$(CONFIG_IP_SET_HASH_IPMAC) += ip_set_hash_ipmac.o
 obj-$(CONFIG_IP_SET_HASH_IPMARK) += ip_set_hash_ipmark.o
 obj-$(CONFIG_IP_SET_HASH_IPPORT) += ip_set_hash_ipport.o
 obj-$(CONFIG_IP_SET_HASH_IPPORTIP) += ip_set_hash_ipportip.o
diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c 
b/net/netfilter/ipset/ip_set_hash_ipmac.c
new file mode 100644
index ..d9eb144b01d6
--- /dev/null
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -0,0 +1,315 @@
+/* Copyright (C) 2016 Tomasz Chilinski 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/* Kernel module implementing an IP set type: the hash:ip,mac type */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define IPSET_TYPE_REV_MIN 0
+#define IPSET_TYPE_REV_MAX 0
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Tomasz Chilinski ");
+IP_SET_MODULE_DESC("hash:ip,mac", IPSET_TYPE_REV_MIN, IPSET_TYPE_REV_MAX);
+MODULE_ALIAS("ip_set_hash:ip,mac");
+
+/* Type specific function prefix */
+#define HTYPE  hash_ipmac
+
+/* Zero valued element is not supported */
+static const unsigned char invalid_ether[ETH_ALEN] = { 0 };
+
+/* IPv4 variant */
+
+/* Member elements */
+struct hash_ipmac4_elem {
+   /* Zero valued IP addresses cannot be stored */
+   __be32 ip;
+   union {
+   unsigned char ether[ETH_ALEN];
+   __be32 foo[2];
+   };
+};
+
+/* Common functions */
+
+static inline bool
+hash_ipmac4_data_equal(const struct hash_ipmac4_elem *e1,
+  const struct hash_ipmac4_elem *e2,
+  u32 *multi)
+{
+   return e1->ip == e2->ip && ether_addr_equal(e1->ether, e2->ether);
+}
+
+static bool
+hash_ipmac4_data_list(struct sk_buff *skb, const struct hash_ipmac4_elem *e)
+{
+   if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, e->ip) ||
+   nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether))
+   goto nla_put_failure;
+   return 0;
+
+nla_put_failure:
+   return 1;
+}
+
+static inline void
+hash_ipmac4_data_next(struct hash_ipmac4_elem *next,
+ const struct hash_ipmac4_elem *e)
+{
+   next->ip = e->ip;
+}
+
+#define MTYPE  hash_ipmac4
+#define PF 4
+#define HOST_MASK  32
+#define HKEY_DATALEN   sizeof(struct hash_ipmac4_elem)
+#include "ip_set_hash_gen.h"
+
+static int
+hash_ipmac4_kadt(struct ip_set *set, const struct sk_buff *skb,
+const struct xt_action_param *par,
+enum ipset_adt adt, struct ip_set_adt_opt *opt)
+{
+   ipset_adtfn adtfn = set->variant->adt[adt];
+   struct hash_ipmac4_elem e = { .ip = 0, { .foo[0] = 0, .foo[1] = 0 } };
+   struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
+
+/* MAC can be src only */
+   if (!(opt->flags & IPSET_DIM_TWO_SRC))
+   return 0;
+
+   if (skb_mac_header(skb) < skb->head ||
+   (skb_mac_header(skb) + ETH_HLEN) > skb->data)
+   return -EINVAL;
+
+   memcpy(e.ether, eth_hdr(skb)->h_source, ETH_ALEN);
+   if (ether_addr_equal(e.ether, invalid_ether))
+   return -EINVAL;
+
+   ip4addrptr(skb,

[PATCH 12/39] netfilter: nf_tables: simplify the basic expressions' init routine

2016-11-13 Thread Pablo Neira Ayuso

From: Liping Zhang 

Some basic expressions are built into nf_tables.ko, such as nft_cmp,
nft_lookup, nft_range and so on. But the init routine for these basic
expressions is a little ugly, with too many goto errX labels, and we
forget to call nft_range_module_exit in the exit routine, although it
is harmless.

Actually, the init and exit routines of these basic expressions
are the same, i.e. do nft_register_expr in the init routine and do
nft_unregister_expr in the exit routine.

So it's better to arrange them into an array and deal with them
together.
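
A rough user-space sketch of the array-based pattern described above.
The names expr_type, register_expr, unregister_expr, basic_types,
core_module_init and core_module_exit are hypothetical stand-ins, and
the reverse-order rollback on error is an assumption about how such a
loop is usually unwound, not a copy of the patch:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

struct expr_type {
	const char *name;
	int registered;
	int fail;               /* set to simulate a registration error */
};

int register_expr(struct expr_type *t)
{
	if (t->fail)
		return -1;
	t->registered = 1;
	return 0;
}

void unregister_expr(struct expr_type *t)
{
	t->registered = 0;
}

struct expr_type imm_type = { "immediate", 0, 0 };
struct expr_type cmp_type = { "cmp", 0, 0 };
struct expr_type range_type = { "range", 0, 0 };

/* One table replaces a chain of per-expression init/exit calls. */
struct expr_type *basic_types[] = {
	&imm_type,
	&cmp_type,
	&range_type,
};

int core_module_init(void)
{
	size_t i;
	int err;

	for (i = 0; i < ARRAY_SIZE(basic_types); i++) {
		err = register_expr(basic_types[i]);
		if (err)
			goto err;
	}
	return 0;

err:
	/* Undo only the registrations that succeeded, in reverse order. */
	while (i-- > 0)
		unregister_expr(basic_types[i]);
	return err;
}

void core_module_exit(void)
{
	size_t i = ARRAY_SIZE(basic_types);

	/* Every entry is handled, so none can be forgotten. */
	while (i-- > 0)
		unregister_expr(basic_types[i]);
}
```

A single loop over the table replaces the errX label chain, and the
exit path cannot skip an entry the way the old code skipped
nft_range_module_exit.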

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_tables_core.h | 33 --
 net/netfilter/nf_tables_core.c | 80 +++---
 net/netfilter/nft_bitwise.c| 13 +-
 net/netfilter/nft_byteorder.c  | 13 +-
 net/netfilter/nft_cmp.c| 13 +-
 net/netfilter/nft_dynset.c | 13 +-
 net/netfilter/nft_immediate.c  | 13 +-
 net/netfilter/nft_lookup.c | 13 +-
 net/netfilter/nft_payload.c| 13 +-
 net/netfilter/nft_range.c  | 13 +-
 10 files changed, 43 insertions(+), 174 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h 
b/include/net/netfilter/nf_tables_core.h
index 00f4f6b1b1ba..862373d4ea9d 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -1,12 +1,18 @@
 #ifndef _NET_NF_TABLES_CORE_H
 #define _NET_NF_TABLES_CORE_H
 
+extern struct nft_expr_type nft_imm_type;
+extern struct nft_expr_type nft_cmp_type;
+extern struct nft_expr_type nft_lookup_type;
+extern struct nft_expr_type nft_bitwise_type;
+extern struct nft_expr_type nft_byteorder_type;
+extern struct nft_expr_type nft_payload_type;
+extern struct nft_expr_type nft_dynset_type;
+extern struct nft_expr_type nft_range_type;
+
 int nf_tables_core_module_init(void);
 void nf_tables_core_module_exit(void);
 
-int nft_immediate_module_init(void);
-void nft_immediate_module_exit(void);
-
 struct nft_cmp_fast_expr {
u32 data;
enum nft_registers  sreg:8;
@@ -25,24 +31,6 @@ static inline u32 nft_cmp_fast_mask(unsigned int len)
 
 extern const struct nft_expr_ops nft_cmp_fast_ops;
 
-int nft_cmp_module_init(void);
-void nft_cmp_module_exit(void);
-
-int nft_range_module_init(void);
-void nft_range_module_exit(void);
-
-int nft_lookup_module_init(void);
-void nft_lookup_module_exit(void);
-
-int nft_dynset_module_init(void);
-void nft_dynset_module_exit(void);
-
-int nft_bitwise_module_init(void);
-void nft_bitwise_module_exit(void);
-
-int nft_byteorder_module_init(void);
-void nft_byteorder_module_exit(void);
-
 struct nft_payload {
enum nft_payload_bases  base:8;
u8  offset;
@@ -62,7 +50,4 @@ struct nft_payload_set {
 extern const struct nft_expr_ops nft_payload_fast_ops;
 extern struct static_key_false nft_trace_enabled;
 
-int nft_payload_module_init(void);
-void nft_payload_module_exit(void);
-
 #endif /* _NET_NF_TABLES_CORE_H */
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index b63b1edb76a6..65dbeadcb118 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -232,68 +232,40 @@ nft_do_chain(struct nft_pktinfo *pkt, void *priv)
 }
 EXPORT_SYMBOL_GPL(nft_do_chain);
 
+static struct nft_expr_type *nft_basic_types[] = {
+   &nft_imm_type,
+   &nft_cmp_type,
+   &nft_lookup_type,
+   &nft_bitwise_type,
+   &nft_byteorder_type,
+   &nft_payload_type,
+   &nft_dynset_type,
+   &nft_range_type,
+};
+
 int __init nf_tables_core_module_init(void)
 {
-   int err;
-
-   err = nft_immediate_module_init();
-   if (err < 0)
-   goto err1;
-
-   err = nft_cmp_module_init();
-   if (err < 0)
-   goto err2;
-
-   err = nft_lookup_module_init();
-   if (err < 0)
-   goto err3;
-
-   err = nft_bitwise_module_init();
-   if (err < 0)
-   goto err4;
+   int err, i;
 
-   err = nft_byteorder_module_init();
-   if (err < 0)
-   goto err5;
-
-   err = nft_payload_module_init();
-   if (err < 0)
-   goto err6;
-
-   err = nft_dynset_module_init();
-   if (err < 0)
-   goto err7;
-
-   err = nft_range_module_init();
-   if (err < 0)
-   goto err8;
+   for (i = 0; i < ARRAY_SIZE(nft_basic_types); i++) {
+   err = nft_register_expr(nft_basic_types[i]);
+   if (err)
+   goto err;
+   }
 
return 0;
-err8:
-   nft_dynset_module_exit();
-err7:
-   nft_payload_module_exit();
-err6:
-   nft_byteorder_module_exit();
-err5:
-   nft_bitwise_module_exit();
-err4:
-   nft_lookup_module_exit();
-err3:
-   nft_cmp_module_exit();
-err2:
-

[PATCH 21/39] netfilter: ipset: Split extensions into separate files

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Cleanup to separate all extensions into individual files.

Ported from a patch proposed by Sergey Popovich.

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 95 +-
 include/linux/netfilter/ipset/ip_set_counter.h | 75 
 include/linux/netfilter/ipset/ip_set_skbinfo.h | 46 +
 3 files changed, 123 insertions(+), 93 deletions(-)
 create mode 100644 include/linux/netfilter/ipset/ip_set_counter.h
 create mode 100644 include/linux/netfilter/ipset/ip_set_skbinfo.h

diff --git a/include/linux/netfilter/ipset/ip_set.h 
b/include/linux/netfilter/ipset/ip_set.h
index 780262124632..b5bd0fb3d07b 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -292,99 +292,6 @@ ip_set_put_flags(struct sk_buff *skb, struct ip_set *set)
return nla_put_net32(skb, IPSET_ATTR_CADT_FLAGS, htonl(cadt_flags));
 }
 
-static inline void
-ip_set_add_bytes(u64 bytes, struct ip_set_counter *counter)
-{
-   atomic64_add((long long)bytes, &(counter)->bytes);
-}
-
-static inline void
-ip_set_add_packets(u64 packets, struct ip_set_counter *counter)
-{
-   atomic64_add((long long)packets, &(counter)->packets);
-}
-
-static inline u64
-ip_set_get_bytes(const struct ip_set_counter *counter)
-{
-   return (u64)atomic64_read(&(counter)->bytes);
-}
-
-static inline u64
-ip_set_get_packets(const struct ip_set_counter *counter)
-{
-   return (u64)atomic64_read(&(counter)->packets);
-}
-
-static inline void
-ip_set_update_counter(struct ip_set_counter *counter,
- const struct ip_set_ext *ext,
- struct ip_set_ext *mext, u32 flags)
-{
-   if (ext->packets != ULLONG_MAX &&
-   !(flags & IPSET_FLAG_SKIP_COUNTER_UPDATE)) {
-   ip_set_add_bytes(ext->bytes, counter);
-   ip_set_add_packets(ext->packets, counter);
-   }
-   if (flags & IPSET_FLAG_MATCH_COUNTERS) {
-   mext->packets = ip_set_get_packets(counter);
-   mext->bytes = ip_set_get_bytes(counter);
-   }
-}
-
-static inline bool
-ip_set_put_counter(struct sk_buff *skb, const struct ip_set_counter *counter)
-{
-   return nla_put_net64(skb, IPSET_ATTR_BYTES,
-cpu_to_be64(ip_set_get_bytes(counter)),
-IPSET_ATTR_PAD) ||
-  nla_put_net64(skb, IPSET_ATTR_PACKETS,
-cpu_to_be64(ip_set_get_packets(counter)),
-IPSET_ATTR_PAD);
-}
-
-static inline void
-ip_set_init_counter(struct ip_set_counter *counter,
-   const struct ip_set_ext *ext)
-{
-   if (ext->bytes != ULLONG_MAX)
-   atomic64_set(&(counter)->bytes, (long long)(ext->bytes));
-   if (ext->packets != ULLONG_MAX)
-   atomic64_set(&(counter)->packets, (long long)(ext->packets));
-}
-
-static inline void
-ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
-  const struct ip_set_ext *ext,
-  struct ip_set_ext *mext, u32 flags)
-{
-   mext->skbinfo = *skbinfo;
-}
-
-static inline bool
-ip_set_put_skbinfo(struct sk_buff *skb, const struct ip_set_skbinfo *skbinfo)
-{
-   /* Send nonzero parameters only */
-   return ((skbinfo->skbmark || skbinfo->skbmarkmask) &&
-   nla_put_net64(skb, IPSET_ATTR_SKBMARK,
- cpu_to_be64((u64)skbinfo->skbmark << 32 |
- skbinfo->skbmarkmask),
- IPSET_ATTR_PAD)) ||
-  (skbinfo->skbprio &&
-   nla_put_net32(skb, IPSET_ATTR_SKBPRIO,
- cpu_to_be32(skbinfo->skbprio))) ||
-  (skbinfo->skbqueue &&
-   nla_put_net16(skb, IPSET_ATTR_SKBQUEUE,
-cpu_to_be16(skbinfo->skbqueue)));
-}
-
-static inline void
-ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
-   const struct ip_set_ext *ext)
-{
-   *skbinfo = ext->skbinfo;
-}
-
 /* Netlink CB args */
 enum {
IPSET_CB_NET = 0,   /* net namespace */
@@ -539,6 +446,8 @@ bitmap_bytes(u32 a, u32 b)
 
 #include 
 #include 
+#include 
+#include 
 
 int
 ip_set_put_extensions(struct sk_buff *skb, const struct ip_set *set,
diff --git a/include/linux/netfilter/ipset/ip_set_counter.h 
b/include/linux/netfilter/ipset/ip_set_counter.h
new file mode 100644
index ..bb6fba480118
--- /dev/null
+++ b/include/linux/netfilter/ipset/ip_set_counter.h
@@ -0,0 +1,75 @@
+#ifndef _IP_SET_COUNTER_H
+#define _IP_SET_COUNTER_H
+
+/* Copyright (C) 2015 Jozsef Kadlecsik 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2

[PATCH 28/39] netfilter: ipset: Simplify mtype_expire() for hash types

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Remove one level of indentation by using continue while
iterating over the elements in a bucket.
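
A small stand-alone illustration of the transformation, under the
assumption of a toy element type (struct elem, expire_nested and
expire_flat are made-up names, not ipset code). Both functions do the
same work; the second inverts the test and uses continue so the body
sits one level shallower:

```c
#include <stddef.h>

struct elem {
	int used;
	int expired;
};

/* Nested form: the expiry work sits inside the if block. */
size_t expire_nested(struct elem *e, size_t n)
{
	size_t i, d = 0;

	for (i = 0; i < n; i++) {
		if (!e[i].used)
			continue;
		if (e[i].expired) {
			e[i].used = 0;
			d++;
		}
	}
	return d;
}

/* Flattened form: skip non-expired entries early, then do the work. */
size_t expire_flat(struct elem *e, size_t n)
{
	size_t i, d = 0;

	for (i = 0; i < n; i++) {
		if (!e[i].used)
			continue;
		if (!e[i].expired)
			continue;
		e[i].used = 0;
		d++;
	}
	return d;
}
```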

Ported from a patch proposed by Sergey Popovich.

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index c4877b6de74f..7999e4c556a5 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -487,21 +487,20 @@ mtype_expire(struct ip_set *set, struct htype *h)
continue;
}
data = ahash_data(n, j, dsize);
-   if (ip_set_timeout_expired(ext_timeout(data, set))) {
-   pr_debug("expired %u/%u\n", i, j);
-   clear_bit(j, n->used);
-   smp_mb__after_atomic();
+   if (!ip_set_timeout_expired(ext_timeout(data, set)))
+   continue;
+   pr_debug("expired %u/%u\n", i, j);
+   clear_bit(j, n->used);
+   smp_mb__after_atomic();
 #ifdef IP_SET_HASH_WITH_NETS
-   for (k = 0; k < IPSET_NET_COUNT; k++)
-   mtype_del_cidr(h,
-   NCIDR_PUT(DCIDR_GET(data->cidr,
-   k)),
-   nets_length, k);
+   for (k = 0; k < IPSET_NET_COUNT; k++)
+   mtype_del_cidr(h,
+   NCIDR_PUT(DCIDR_GET(data->cidr, k)),
+   nets_length, k);
 #endif
-   ip_set_ext_destroy(set, data);
-   set->elements--;
-   d++;
-   }
+   ip_set_ext_destroy(set, data);
+   set->elements--;
+   d++;
}
if (d >= AHASH_INIT_SIZE) {
if (d >= n->size) {
-- 
2.1.4

[PATCH 29/39] netfilter: ipset: Make NLEN compile time constant for hash types

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Hash types define HOST_MASK before inclusion of ip_set_hash_gen.h
and the only place where NLEN needed to be calculated at runtime
is the *_create() method.
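
As a simplified sketch of the idea (user-space C, with a cut-down
struct htype and the hypothetical helpers count_cidrs and flush_cidrs;
it assumes the variant without /0 support, where NLEN equals
HOST_MASK): once HOST_MASK is a per-type constant, NLEN is a constant
too, and helpers no longer need a nets_length argument:

```c
#include <string.h>

/* A hash type defines HOST_MASK before pulling in the generic code. */
#define HOST_MASK 32

/* Assumed here: the variant that stores /0 is not enabled. */
#define NLEN HOST_MASK

/* Illustrative stand-in; the kernel structure has a different layout. */
struct net_prefixes {
	unsigned int nets;
	unsigned char cidr;
};

struct htype {
	struct net_prefixes nets[NLEN];
};

/* Count the used cidr slots; the loop bound is a compile-time constant. */
int count_cidrs(const struct htype *h)
{
	int i, n = 0;

	for (i = 0; i < NLEN && h->nets[i].cidr; i++)
		n++;
	return n;
}

/* Clear the book-keeping without looking at the set's family. */
void flush_cidrs(struct htype *h)
{
	memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN);
}
```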

Ported from a patch proposed by Sergey Popovich.

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 51 ---
 1 file changed, 23 insertions(+), 28 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 7999e4c556a5..6c88c20ae1d4 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -150,20 +150,18 @@ htable_bits(u32 hashsize)
 #define INIT_CIDR(cidr, host_mask) \
DCIDR_PUT(((cidr) ? NCIDR_GET(cidr) : host_mask))
 
-#define SET_HOST_MASK(family)  (family == AF_INET ? 32 : 128)
-
 #ifdef IP_SET_HASH_WITH_NET0
-/* cidr from 0 to SET_HOST_MASK() value and c = cidr + 1 */
-#define NLEN(family)   (SET_HOST_MASK(family) + 1)
+/* cidr from 0 to HOST_MASK value and c = cidr + 1 */
+#define NLEN   (HOST_MASK + 1)
 #define CIDR_POS(c)((c) - 1)
 #else
-/* cidr from 1 to SET_HOST_MASK() value and c = cidr + 1 */
-#define NLEN(family)   SET_HOST_MASK(family)
+/* cidr from 1 to HOST_MASK value and c = cidr + 1 */
+#define NLEN   HOST_MASK
 #define CIDR_POS(c)((c) - 2)
 #endif
 
 #else
-#define NLEN(family)   0
+#define NLEN   0
 #endif /* IP_SET_HASH_WITH_NETS */
 
 #endif /* _IP_SET_HASH_GEN_H */
@@ -298,12 +296,12 @@ struct htype {
  * sized networks. cidr == real cidr + 1 to support /0.
  */
 static void
-mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+mtype_add_cidr(struct htype *h, u8 cidr, u8 n)
 {
int i, j;
 
/* Add in increasing prefix order, so larger cidr first */
-   for (i = 0, j = -1; i < nets_length && h->nets[i].cidr[n]; i++) {
+   for (i = 0, j = -1; i < NLEN && h->nets[i].cidr[n]; i++) {
if (j != -1) {
continue;
} else if (h->nets[i].cidr[n] < cidr) {
@@ -322,11 +320,11 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, 
u8 n)
 }
 
 static void
-mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+mtype_del_cidr(struct htype *h, u8 cidr, u8 n)
 {
-   u8 i, j, net_end = nets_length - 1;
+   u8 i, j, net_end = NLEN - 1;
 
-   for (i = 0; i < nets_length; i++) {
+   for (i = 0; i < NLEN; i++) {
if (h->nets[i].cidr[n] != cidr)
continue;
h->nets[CIDR_POS(cidr)].nets[n]--;
@@ -342,13 +340,12 @@ mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, 
u8 n)
 
 /* Calculate the actual memory size of the set data */
 static size_t
-mtype_ahash_memsize(const struct htype *h, const struct htable *t,
-   u8 nets_length)
+mtype_ahash_memsize(const struct htype *h, const struct htable *t)
 {
size_t memsize = sizeof(*h) + sizeof(*t);
 
 #ifdef IP_SET_HASH_WITH_NETS
-   memsize += sizeof(struct net_prefixes) * nets_length;
+   memsize += sizeof(struct net_prefixes) * NLEN;
 #endif
 
return memsize;
@@ -389,7 +386,7 @@ mtype_flush(struct ip_set *set)
kfree_rcu(n, rcu);
}
 #ifdef IP_SET_HASH_WITH_NETS
-   memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN(set->family));
+   memset(h->nets, 0, sizeof(struct net_prefixes) * NLEN);
 #endif
set->elements = 0;
set->ext_size = 0;
@@ -473,7 +470,7 @@ mtype_expire(struct ip_set *set, struct htype *h)
u32 i, j, d;
size_t dsize = set->dsize;
 #ifdef IP_SET_HASH_WITH_NETS
-   u8 k, nets_length = NLEN(set->family);
+   u8 k;
 #endif
 
t = ipset_dereference_protected(h->table, set);
@@ -496,7 +493,7 @@ mtype_expire(struct ip_set *set, struct htype *h)
for (k = 0; k < IPSET_NET_COUNT; k++)
mtype_del_cidr(h,
NCIDR_PUT(DCIDR_GET(data->cidr, k)),
-   nets_length, k);
+   k);
 #endif
ip_set_ext_destroy(set, data);
set->elements--;
@@ -776,7 +773,7 @@ mtype_add(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
for (i = 0; i < IPSET_NET_COUNT; i++)
mtype_del_cidr(h,
NCIDR_PUT(DCIDR_GET(data->cidr, i)),
-   NLEN(set->family), i);
+   i);
 #endif
ip_set_ext_destroy(set, data);
set->elements--;
@@ -812,8 +809,7 @@ mtype_add(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
set->elements++;
 #ifdef

[PATCH 05/39] netfilter: x_tables: move hook state into xt_action_param structure

2016-11-13 Thread Pablo Neira Ayuso

Place pointer to hook state in xt_action_param structure instead of
copying the fields that we need. After this change xt_action_param fits
into one cacheline.

This patch also adds a set of new wrapper functions to fetch relevant
hook state structure fields.
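
The shape of the change can be sketched in plain C as follows. The
types hook_state and action_param and the accessors par_net,
par_inname, par_hooknum and par_family are simplified, hypothetical
stand-ins for the kernel structures and the xt_*() wrappers, not their
real definitions:

```c
struct net { int id; };
struct net_device { char name[16]; };

/* Stand-in for the hook state that the hook caller already fills in. */
struct hook_state {
	unsigned int hook;
	unsigned char pf;
	struct net_device *in;
	struct net_device *out;
	struct net *net;
};

/* The parameter block keeps one pointer instead of copies of the fields. */
struct action_param {
	const struct hook_state *state;
	int fragoff;
	unsigned int thoff;
	int hotdrop;
};

/* Accessors hide the extra indirection from match/target code. */
struct net *par_net(const struct action_param *par)
{
	return par->state->net;
}

const char *par_inname(const struct action_param *par)
{
	return par->state->in->name;
}

unsigned int par_hooknum(const struct action_param *par)
{
	return par->state->hook;
}

unsigned char par_family(const struct action_param *par)
{
	return par->state->pf;
}
```

Callers that previously read a copied field would call the matching
accessor instead, which keeps the parameter structure small.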

Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter/x_tables.h | 48 +++---
 include/net/netfilter/nf_tables.h  | 11 +++
 net/bridge/netfilter/ebt_arpreply.c|  3 +-
 net/bridge/netfilter/ebt_log.c | 11 +++
 net/bridge/netfilter/ebt_nflog.c   |  6 ++--
 net/bridge/netfilter/ebt_redirect.c|  6 ++--
 net/bridge/netfilter/ebtables.c|  6 +---
 net/ipv4/netfilter/arp_tables.c|  6 +---
 net/ipv4/netfilter/ip_tables.c |  6 +---
 net/ipv4/netfilter/ipt_MASQUERADE.c|  3 +-
 net/ipv4/netfilter/ipt_REJECT.c|  4 +--
 net/ipv4/netfilter/ipt_SYNPROXY.c  |  4 +--
 net/ipv4/netfilter/ipt_rpfilter.c  |  2 +-
 net/ipv6/netfilter/ip6_tables.c|  6 +---
 net/ipv6/netfilter/ip6t_MASQUERADE.c   |  2 +-
 net/ipv6/netfilter/ip6t_REJECT.c   | 23 --
 net/ipv6/netfilter/ip6t_SYNPROXY.c |  4 +--
 net/ipv6/netfilter/ip6t_rpfilter.c |  3 +-
 net/netfilter/ipset/ip_set_core.c  |  6 ++--
 net/netfilter/ipset/ip_set_hash_netiface.c |  2 +-
 net/netfilter/xt_AUDIT.c   | 10 +++
 net/netfilter/xt_LOG.c |  6 ++--
 net/netfilter/xt_NETMAP.c  | 20 ++---
 net/netfilter/xt_NFLOG.c   |  6 ++--
 net/netfilter/xt_NFQUEUE.c |  4 +--
 net/netfilter/xt_REDIRECT.c|  4 +--
 net/netfilter/xt_TCPMSS.c  |  4 +--
 net/netfilter/xt_TEE.c |  4 +--
 net/netfilter/xt_TPROXY.c  | 16 +-
 net/netfilter/xt_addrtype.c| 10 +++
 net/netfilter/xt_cluster.c |  2 +-
 net/netfilter/xt_connlimit.c   |  8 ++---
 net/netfilter/xt_conntrack.c   |  8 ++---
 net/netfilter/xt_devgroup.c|  4 +--
 net/netfilter/xt_dscp.c|  2 +-
 net/netfilter/xt_ipvs.c|  4 +--
 net/netfilter/xt_nfacct.c  |  2 +-
 net/netfilter/xt_osf.c | 10 +++
 net/netfilter/xt_owner.c   |  2 +-
 net/netfilter/xt_pkttype.c |  4 +--
 net/netfilter/xt_policy.c  |  4 +--
 net/netfilter/xt_recent.c  | 10 +++
 net/netfilter/xt_set.c | 26 
 net/netfilter/xt_socket.c  |  4 +--
 net/sched/act_ipt.c| 12 
 net/sched/em_ipset.c   | 17 ++-
 46 files changed, 196 insertions(+), 169 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h 
b/include/linux/netfilter/x_tables.h
index 2ad1a2b289b5..cd4eaf8df445 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 /* Test a struct->invflags and a boolean for inequality */
@@ -17,14 +18,9 @@
  * @target:the target extension
  * @matchinfo: per-match data
  * @targetinfo:per-target data
- * @netnetwork namespace through which the action was invoked
- * @in:input netdevice
- * @out:   output netdevice
+ * @state: pointer to hook state this packet came from
  * @fragoff:   packet is a fragment, this is the data offset
  * @thoff: position of transport header relative to skb->data
- * @hook:  hook number given packet came from
- * @family:Actual NFPROTO_* through which the function is invoked
- * (helpful when match->family == NFPROTO_UNSPEC)
  *
  * Fields written to by extensions:
  *
@@ -38,15 +34,47 @@ struct xt_action_param {
union {
const void *matchinfo, *targinfo;
};
-   struct net *net;
-   const struct net_device *in, *out;
+   const struct nf_hook_state *state;
int fragoff;
unsigned int thoff;
-   unsigned int hooknum;
-   u_int8_t family;
bool hotdrop;
 };
 
+static inline struct net *xt_net(const struct xt_action_param *par)
+{
+   return par->state->net;
+}
+
+static inline struct net_device *xt_in(const struct xt_action_param *par)
+{
+   return par->state->in;
+}
+
+static inline const char *xt_inname(const struct xt_action_param *par)
+{
+   return par->state->in->name;
+}
+
+static inline struct net_device *xt_out(const struct xt_action_param *par)
+{
+   return par->state->out;
+}
+
+static inline const char *xt_outname(const struct xt_action_param *par)
+{
+   return par->state->out->name;
+}
+
+static inline unsigned int xt_hooknum(const struct xt_action_param *par)
+{
+   return

[PATCH 16/39] netfilter: ipset: Remove extra whitespaces in ip_set.h

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Remove unnecessary whitespaces.

Ported from a patch proposed by Sergey Popovich.

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h 
b/include/linux/netfilter/ipset/ip_set.h
index 83b9a2e0d8d4..5b1fd090f34b 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -336,14 +336,15 @@ ip_set_update_counter(struct ip_set_counter *counter,
 
 static inline void
 ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
- const struct ip_set_ext *ext,
- struct ip_set_ext *mext, u32 flags)
+  const struct ip_set_ext *ext,
+  struct ip_set_ext *mext, u32 flags)
 {
-   mext->skbmark = skbinfo->skbmark;
-   mext->skbmarkmask = skbinfo->skbmarkmask;
-   mext->skbprio = skbinfo->skbprio;
-   mext->skbqueue = skbinfo->skbqueue;
+   mext->skbmark = skbinfo->skbmark;
+   mext->skbmarkmask = skbinfo->skbmarkmask;
+   mext->skbprio = skbinfo->skbprio;
+   mext->skbqueue = skbinfo->skbqueue;
 }
+
 static inline bool
 ip_set_put_skbinfo(struct sk_buff *skb, struct ip_set_skbinfo *skbinfo)
 {
-- 
2.1.4

[PATCH 15/39] netfilter: conntrack: fix NF_REPEAT handling

2016-11-13 Thread Pablo Neira Ayuso

From: Arnd Bergmann 

gcc correctly identified a theoretical uninitialized variable use:

net/netfilter/nf_conntrack_core.c: In function 'nf_conntrack_in':
net/netfilter/nf_conntrack_core.c:1125:14: error: 'l4proto' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]

This could only happen when we 'goto out' before looking up l4proto,
and then enter the retry, implying that l3proto->get_l4proto()
returned NF_REPEAT. This does not currently get returned in any
code path and probably won't ever happen, but it is not good to
rely on that.

Moving the repeat handling up a little should have the same
behavior as today but avoids the warning by making that case
impossible to enter.

[ I have mangled this original patch to remove the check for tmpl, we
  should unconditionally jump back to the repeat label in case we hit
  NF_REPEAT instead. I have also moved the comment that explains this
  where it belongs. --pablo ]

Fixes: 08733a0cb7de ("netfilter: handle NF_REPEAT from nf_conntrack_in()")
Signed-off-by: Arnd Bergmann 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_conntrack_core.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index de4b8a75f30b..e9ffe33dc0ca 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1337,6 +1337,12 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned 
int hooknum,
NF_CT_STAT_INC_ATOMIC(net, invalid);
if (ret == -NF_DROP)
NF_CT_STAT_INC_ATOMIC(net, drop);
+   /* Special case: TCP tracker reports an attempt to reopen a
+* closed/aborted connection. We have to go back and create a
+* fresh conntrack.
+*/
+   if (ret == -NF_REPEAT)
+   goto repeat;
ret = -ret;
goto out;
}
@@ -1344,16 +1350,8 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned 
int hooknum,
if (set_reply && !test_and_set_bit(IPS_SEEN_REPLY_BIT, &ct->status))
nf_conntrack_event_cache(IPCT_REPLY, ct);
 out:
-   if (tmpl) {
-   /* Special case: TCP tracker reports an attempt to reopen a
-* closed/aborted connection. We have to go back and create a
-* fresh conntrack.
-*/
-   if (ret == NF_REPEAT)
-   goto repeat;
-   else
-   nf_ct_put(tmpl);
-   }
+   if (tmpl)
+   nf_ct_put(tmpl);
 
return ret;
 }
-- 
2.1.4

[PATCH 34/39] netfilter: ipset: Fix reported memory size for hash:* types

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

The calculation of the full allocated memory did not take
into account the size of the base hash bucket structure at some
places.

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 1c9b84e53dcc..88b70fcc5ac5 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -85,6 +85,8 @@ struct htable {
 };
 
 #define hbucket(h, i)  ((h)->bucket[i])
+#define ext_size(n, dsize) \
+   (sizeof(struct hbucket) + (n) * (dsize))
 
 #ifndef IPSET_NET_COUNT
 #define IPSET_NET_COUNT1
@@ -519,7 +521,7 @@ mtype_expire(struct ip_set *set, struct htype *h)
d++;
}
tmp->pos = d;
-   set->ext_size -= AHASH_INIT_SIZE * dsize;
+   set->ext_size -= ext_size(AHASH_INIT_SIZE, dsize);
rcu_assign_pointer(hbucket(t, i), tmp);
kfree_rcu(n, rcu);
}
@@ -625,7 +627,7 @@ mtype_resize(struct ip_set *set, bool retried)
goto cleanup;
}
m->size = AHASH_INIT_SIZE;
-   extsize = sizeof(*m) + AHASH_INIT_SIZE * dsize;
+   extsize = ext_size(AHASH_INIT_SIZE, dsize);
RCU_INIT_POINTER(hbucket(t, key), m);
} else if (m->pos >= m->size) {
struct hbucket *ht;
@@ -645,7 +647,7 @@ mtype_resize(struct ip_set *set, bool retried)
memcpy(ht, m, sizeof(struct hbucket) +
  m->size * dsize);
ht->size = m->size + AHASH_INIT_SIZE;
-   extsize += AHASH_INIT_SIZE * dsize;
+   extsize += ext_size(AHASH_INIT_SIZE, dsize);
kfree(m);
m = ht;
RCU_INIT_POINTER(hbucket(t, key), ht);
@@ -727,7 +729,7 @@ mtype_add(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
if (!n)
return -ENOMEM;
n->size = AHASH_INIT_SIZE;
-   set->ext_size += sizeof(*n) + AHASH_INIT_SIZE * set->dsize;
+   set->ext_size += ext_size(AHASH_INIT_SIZE, set->dsize);
goto copy_elem;
}
for (i = 0; i < n->pos; i++) {
@@ -791,7 +793,7 @@ mtype_add(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
memcpy(n, old, sizeof(struct hbucket) +
   old->size * set->dsize);
n->size = old->size + AHASH_INIT_SIZE;
-   set->ext_size += AHASH_INIT_SIZE * set->dsize;
+   set->ext_size += ext_size(AHASH_INIT_SIZE, set->dsize);
}
 
 copy_elem:
@@ -883,7 +885,7 @@ mtype_del(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
k++;
}
if (n->pos == 0 && k == 0) {
-   set->ext_size -= sizeof(*n) + n->size * dsize;
+   set->ext_size -= ext_size(n->size, dsize);
rcu_assign_pointer(hbucket(t, key), NULL);
kfree_rcu(n, rcu);
} else if (k >= AHASH_INIT_SIZE) {
@@ -902,7 +904,7 @@ mtype_del(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
k++;
}
tmp->pos = k;
-   set->ext_size -= AHASH_INIT_SIZE * dsize;
+   set->ext_size -= ext_size(AHASH_INIT_SIZE, dsize);
rcu_assign_pointer(hbucket(t, key), tmp);
kfree_rcu(n, rcu);
}
-- 
2.1.4
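
For readers following the accounting change above, here is a minimal user-space sketch of what the ext_size() helper computes. The struct layout and the names hbucket_model, EXT_SIZE_MODEL, bucket_bytes and bucket_bytes_without_header are hypothetical stand-ins, not the kernel definitions; only the shape of the macro is taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct hbucket: a small fixed
 * header followed by a flexible array of element storage. */
struct hbucket_model {
	uint8_t size;          /* number of allocated element slots */
	uint8_t pos;           /* first free slot */
	unsigned char value[]; /* element storage, n * dsize bytes */
};

/* Same shape as ext_size() in the patch: header plus n elements. */
#define EXT_SIZE_MODEL(n, dsize) \
	(sizeof(struct hbucket_model) + (n) * (dsize))

/* Bytes accounted when the bucket header is included. */
size_t bucket_bytes(size_t slots, size_t dsize)
{
	return EXT_SIZE_MODEL(slots, dsize);
}

/* Bytes accounted when only the element storage is counted. */
size_t bucket_bytes_without_header(size_t slots, size_t dsize)
{
	return slots * dsize;
}
```

The two results differ by exactly sizeof(struct hbucket_model), which is the per-bucket amount the commit message says was missed in some places.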

[PATCH 24/39] netfilter: ipset: Add element count to hash headers

2016-11-13 Thread Pablo Neira Ayuso

From: Eric B Munson 

It would be useful for userspace to query the size of an ipset hash,
but this data is not exposed to userspace other than by counting the
number of member entries.  This patch uses the attribute
IPSET_ATTR_ELEMENTS to indicate the size in the header that is
exported to userspace.  This field is then printed by the userspace
tool for hashes.

Signed-off-by: Eric B Munson 
Cc: Pablo Neira Ayuso 
Cc: Josh Hunt 
Cc: netfilter-de...@vger.kernel.org
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index d32fd6b036bf..f5acfb9709c9 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -1083,7 +1083,8 @@ mtype_head(struct ip_set *set, struct sk_buff *skb)
goto nla_put_failure;
 #endif
if (nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
-   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)))
+   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)) ||
+   nla_put_net32(skb, IPSET_ATTR_ELEMENTS, htonl(h->elements)))
goto nla_put_failure;
if (unlikely(ip_set_put_flags(skb, set)))
goto nla_put_failure;
-- 
2.1.4

[PATCH 25/39] netfilter: ipset: Add element count to all set types header

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

It is better to list the set elements for all set types, thus the
header information is uniform. Element counts are therefore added
to the bitmap and list types.

Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h|  2 ++
 include/linux/netfilter/ipset/ip_set_bitmap.h |  2 +-
 net/netfilter/ipset/ip_set_bitmap_gen.h   | 10 +-
 net/netfilter/ipset/ip_set_hash_gen.h | 21 ++---
 net/netfilter/ipset/ip_set_list_set.c |  6 +-
 5 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h 
b/include/linux/netfilter/ipset/ip_set.h
index 7a218eb74887..4671d740610f 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -250,6 +250,8 @@ struct ip_set {
u8 flags;
/* Default timeout value, if enabled */
u32 timeout;
+   /* Number of elements (vs timeout) */
+   u32 elements;
/* Element data size */
size_t dsize;
/* Offsets to extensions in elements */
diff --git a/include/linux/netfilter/ipset/ip_set_bitmap.h 
b/include/linux/netfilter/ipset/ip_set_bitmap.h
index 5e4662a71e01..366d6c0ea04f 100644
--- a/include/linux/netfilter/ipset/ip_set_bitmap.h
+++ b/include/linux/netfilter/ipset/ip_set_bitmap.h
@@ -6,8 +6,8 @@
 #define IPSET_BITMAP_MAX_RANGE 0x0000FFFF
 
 enum {
+   IPSET_ADD_STORE_PLAIN_TIMEOUT = -1,
IPSET_ADD_FAILED = 1,
-   IPSET_ADD_STORE_PLAIN_TIMEOUT,
IPSET_ADD_START_STORED_TIMEOUT,
 };
 
diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h 
b/net/netfilter/ipset/ip_set_bitmap_gen.h
index 4f07b90f8ef4..1810d1c06e3d 100644
--- a/net/netfilter/ipset/ip_set_bitmap_gen.h
+++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
@@ -83,6 +83,7 @@ mtype_flush(struct ip_set *set)
if (set->extensions & IPSET_EXT_DESTROY)
mtype_ext_cleanup(set);
memset(map->members, 0, map->memsize);
+   set->elements = 0;
 }
 
 /* Calculate the actual memory size of the set data */
@@ -105,7 +106,8 @@ mtype_head(struct ip_set *set, struct sk_buff *skb)
goto nla_put_failure;
if (mtype_do_head(skb, map) ||
nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
-   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)))
+   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)) ||
+   nla_put_net32(skb, IPSET_ATTR_ELEMENTS, htonl(set->elements)))
goto nla_put_failure;
if (unlikely(ip_set_put_flags(skb, set)))
goto nla_put_failure;
@@ -149,6 +151,7 @@ mtype_add(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
if (ret == IPSET_ADD_FAILED) {
if (SET_WITH_TIMEOUT(set) &&
ip_set_timeout_expired(ext_timeout(x, set))) {
+   set->elements--;
ret = 0;
} else if (!(flags & IPSET_FLAG_EXIST)) {
set_bit(e->id, map->members);
@@ -157,6 +160,8 @@ mtype_add(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
/* Element is re-added, cleanup extensions */
ip_set_ext_destroy(set, x);
}
+   if (ret > 0)
+   set->elements--;
 
if (SET_WITH_TIMEOUT(set))
 #ifdef IP_SET_BITMAP_STORED_TIMEOUT
@@ -174,6 +179,7 @@ mtype_add(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
 
/* Activate element */
set_bit(e->id, map->members);
+   set->elements++;
 
return 0;
 }
@@ -190,6 +196,7 @@ mtype_del(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
return -IPSET_ERR_EXIST;
 
ip_set_ext_destroy(set, x);
+   set->elements--;
if (SET_WITH_TIMEOUT(set) &&
ip_set_timeout_expired(ext_timeout(x, set)))
return -IPSET_ERR_EXIST;
@@ -285,6 +292,7 @@ mtype_gc(unsigned long ul_set)
if (ip_set_timeout_expired(ext_timeout(x, set))) {
clear_bit(id, map->members);
ip_set_ext_destroy(set, x);
+   set->elements--;
}
}
spin_unlock_bh(&set->lock);
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index f5acfb9709c9..6e967f198d1e 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -275,7 +275,6 @@ htable_bits(u32 hashsize)
 struct htype {
struct htable __rcu *table; /* the hash table */
u32 maxelem;/* max elements in the hash */
-   u32 elements;   /* current element (vs timeout) */
u32 initval;/* random jhash init value */
 #ifdef IP_SET_HASH_WITH_MARKMASK
u32 markmask;   /* markmask value

[PATCH 37/39] netfilter: ipset: hash: fix boolreturn.cocci warnings

2016-11-13 Thread Pablo Neira Ayuso

From: kbuild test robot 

net/netfilter/ipset/ip_set_hash_ipmac.c:70:8-9: WARNING: return of 0/1 in 
function 'hash_ipmac4_data_list' with return type bool
net/netfilter/ipset/ip_set_hash_ipmac.c:178:8-9: WARNING: return of 0/1 in 
function 'hash_ipmac6_data_list' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

CC: Tomasz Chilinski 
Signed-off-by: Fengguang Wu 
Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_ipmac.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_ipmac.c 
b/net/netfilter/ipset/ip_set_hash_ipmac.c
index d9eb144b01d6..1ab5ed2f6839 100644
--- a/net/netfilter/ipset/ip_set_hash_ipmac.c
+++ b/net/netfilter/ipset/ip_set_hash_ipmac.c
@@ -67,10 +67,10 @@ hash_ipmac4_data_list(struct sk_buff *skb, const struct 
hash_ipmac4_elem *e)
if (nla_put_ipaddr4(skb, IPSET_ATTR_IP, e->ip) ||
nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether))
goto nla_put_failure;
-   return 0;
+   return false;
 
 nla_put_failure:
-   return 1;
+   return true;
 }
 
 static inline void
@@ -175,10 +175,10 @@ hash_ipmac6_data_list(struct sk_buff *skb, const struct 
hash_ipmac6_elem *e)
if (nla_put_ipaddr6(skb, IPSET_ATTR_IP, &e->ip.in6) ||
nla_put(skb, IPSET_ATTR_ETHER, ETH_ALEN, e->ether))
goto nla_put_failure;
-   return 0;
+   return false;
 
 nla_put_failure:
-   return 1;
+   return true;
 }
 
 static inline void
-- 
2.1.4

[PATCH 11/39] netfilter: nft_hash: get random bytes if seed is not specified

2016-11-13 Thread Pablo Neira Ayuso

If the user doesn't specify a seed, generate one at configuration time.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_hash.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index baf694de3935..97ad8e30e4b4 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -57,7 +57,6 @@ static int nft_hash_init(const struct nft_ctx *ctx,
if (!tb[NFTA_HASH_SREG] ||
!tb[NFTA_HASH_DREG] ||
!tb[NFTA_HASH_LEN]  ||
-   !tb[NFTA_HASH_SEED] ||
!tb[NFTA_HASH_MODULUS])
return -EINVAL;
 
@@ -80,7 +79,10 @@ static int nft_hash_init(const struct nft_ctx *ctx,
if (priv->offset + priv->modulus - 1 < priv->offset)
return -EOVERFLOW;
 
-   priv->seed = ntohl(nla_get_be32(tb[NFTA_HASH_SEED]));
+   if (tb[NFTA_HASH_SEED])
+   priv->seed = ntohl(nla_get_be32(tb[NFTA_HASH_SEED]));
+   else
+   get_random_bytes(&priv->seed, sizeof(priv->seed));
 
return nft_validate_register_load(priv->sreg, len) &&
   nft_validate_register_store(ctx, priv->dreg, NULL,
-- 
2.1.4
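
The optional-seed logic in this patch can be sketched in user space as follows. This is only an illustration under stated assumptions: struct hash_cfg, pick_seed, sketch_random_u32 and fixed_rng_for_tests are hypothetical names, and rand() merely stands in for the kernel's get_random_bytes(); it is not suitable for real seeding.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical configuration: the seed attribute may be absent. */
struct hash_cfg {
	bool has_seed;   /* true if the user supplied a seed */
	uint32_t seed;   /* only meaningful when has_seed is true */
};

/* Placeholder entropy source for the sketch (assumption: the real code
 * uses a proper kernel RNG; rand() is only a stand-in here). */
uint32_t sketch_random_u32(void)
{
	return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
}

/* Deterministic test double so the fallback path can be checked. */
uint32_t fixed_rng_for_tests(void)
{
	return 7u;
}

/* Use the caller's seed when given, otherwise generate one. */
uint32_t pick_seed(const struct hash_cfg *cfg, uint32_t (*rng)(void))
{
	if (cfg->has_seed)
		return cfg->seed;
	return rng();
}
```

Passing the random source as a function pointer is a choice made for the sketch so the fallback can be exercised deterministically.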

[PATCH 10/39] netfilter: handle NF_REPEAT from nf_conntrack_in()

2016-11-13 Thread Pablo Neira Ayuso

NF_REPEAT is only needed from nf_conntrack_in() under a very specific
case required by the TCP protocol tracker, we can handle this case
without returning to the core hook path. Handling of NF_REPEAT from the
nf_reinject() is left untouched.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c  |  2 --
 net/netfilter/nf_conntrack_core.c | 11 ++-
 net/openvswitch/conntrack.c   |  8 ++--
 3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index bd9272eeccb5..de30e08d58f2 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -322,8 +322,6 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state 
*state,
if (ret == 0)
ret = -EPERM;
return ret;
-   case NF_REPEAT:
-   continue;
case NF_QUEUE:
ret = nf_queue(skb, state, &entry, verdict);
if (ret == 1 && entry)
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index df2f5a3901df..de4b8a75f30b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1305,7 +1305,7 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned 
int hooknum,
if (skb->nfct)
goto out;
}
-
+repeat:
ct = resolve_normal_ct(net, tmpl, skb, dataoff, pf, protonum,
   l3proto, l4proto, &set_reply, &ctinfo);
if (!ct) {
@@ -1345,11 +1345,12 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned 
int hooknum,
nf_conntrack_event_cache(IPCT_REPLY, ct);
 out:
if (tmpl) {
-   /* Special case: we have to repeat this hook, assign the
-* template again to this packet. We assume that this packet
-* has no conntrack assigned. This is used by nf_ct_tcp. */
+   /* Special case: TCP tracker reports an attempt to reopen a
+* closed/aborted connection. We have to go back and create a
+* fresh conntrack.
+*/
if (ret == NF_REPEAT)
-   skb->nfct = (struct nf_conntrack *)tmpl;
+   goto repeat;
else
nf_ct_put(tmpl);
}
diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 31045ef44a82..9b8a028b7dad 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -725,12 +725,8 @@ static int __ovs_ct_lookup(struct net *net, struct 
sw_flow_key *key,
skb->nfctinfo = IP_CT_NEW;
}
 
-   /* Repeat if requested, see nf_iterate(). */
-   do {
-   err = nf_conntrack_in(net, info->family,
- NF_INET_PRE_ROUTING, skb);
-   } while (err == NF_REPEAT);
-
+   err = nf_conntrack_in(net, info->family,
+ NF_INET_PRE_ROUTING, skb);
if (err != NF_ACCEPT)
return -ENOENT;
 
-- 
2.1.4
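
The idea of handling the repeat inside the conntrack entry point, so that callers no longer loop on it, can be modeled in plain C. This is a sketch only: struct tracker, track_once and conntrack_in_model are hypothetical, and the verdict numbers simply follow the usual NF_* numbering (NF_DROP 0 and NF_ACCEPT 1 are assumed; NF_REPEAT 4 appears elsewhere in this series).

```c
#include <stdbool.h>

enum verdict { V_DROP = 0, V_ACCEPT = 1, V_REPEAT = 4 };

/* Hypothetical tracker state: the first lookup hits a stale (closed)
 * connection and asks for a retry. */
struct tracker {
	bool stale_entry; /* true until the stale entry has been discarded */
	int lookups;      /* how many times the lookup step ran */
};

/* One pass of the protocol tracker. */
static enum verdict track_once(struct tracker *t)
{
	t->lookups++;
	if (t->stale_entry) {
		t->stale_entry = false; /* discard the old entry */
		return V_REPEAT;        /* request a fresh lookup */
	}
	return V_ACCEPT;
}

/* The retry is handled here, so callers never see V_REPEAT. */
enum verdict conntrack_in_model(struct tracker *t)
{
	enum verdict ret;

repeat:
	ret = track_once(t);
	if (ret == V_REPEAT)
		goto repeat;
	return ret;
}
```

With the retry internal, a caller such as the openvswitch path in the diff can make a single call and only compare the result against the accept verdict.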

[PATCH 27/39] netfilter: ipset: Remove redundant mtype_expire() arguments

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Remove redundant parameters nets_length and dsize, because
they can be derived from other parameters.

Ported from a patch proposed by Sergey Popovich.

Signed-off-by: Jozsef Kadlecsik 
---
 net/netfilter/ipset/ip_set_hash_gen.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 0746405a1d14..c4877b6de74f 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -465,14 +465,15 @@ mtype_same_set(const struct ip_set *a, const struct 
ip_set *b)
 
 /* Delete expired elements from the hashtable */
 static void
-mtype_expire(struct ip_set *set, struct htype *h, u8 nets_length, size_t dsize)
+mtype_expire(struct ip_set *set, struct htype *h)
 {
struct htable *t;
struct hbucket *n, *tmp;
struct mtype_elem *data;
u32 i, j, d;
+   size_t dsize = set->dsize;
 #ifdef IP_SET_HASH_WITH_NETS
-   u8 k;
+   u8 k, nets_length = NLEN(set->family);
 #endif
 
t = ipset_dereference_protected(h->table, set);
@@ -539,7 +540,7 @@ mtype_gc(unsigned long ul_set)
 
pr_debug("called\n");
spin_lock_bh(&set->lock);
-   mtype_expire(set, h, NLEN(set->family), set->dsize);
+   mtype_expire(set, h);
spin_unlock_bh(&set->lock);
 
h->gc.expires = jiffies + IPSET_GC_PERIOD(set->timeout) * HZ;
@@ -715,7 +716,7 @@ mtype_add(struct ip_set *set, void *value, const struct 
ip_set_ext *ext,
if (set->elements >= h->maxelem) {
if (SET_WITH_TIMEOUT(set))
/* FIXME: when set is full, we slow down here */
-   mtype_expire(set, h, NLEN(set->family), set->dsize);
+   mtype_expire(set, h);
if (set->elements >= h->maxelem && SET_WITH_FORCEADD(set))
forceadd = true;
}
-- 
2.1.4

[PATCH 19/39] netfilter: ipset: Improve skbinfo get/init helpers

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Use struct ip_set_skbinfo in struct ip_set_ext instead of open
coded fields and assign structure members in get/init helpers
instead of copying members one by one. Explicitly note that
struct ip_set_skbinfo must be padded to prevent non-aligned
access in the extension blob.

Ported from a patch proposed by Sergey Popovich.

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set.h | 30 +++---
 net/netfilter/ipset/ip_set_core.c  | 12 ++--
 net/netfilter/xt_set.c | 12 +++-
 3 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/include/linux/netfilter/ipset/ip_set.h 
b/include/linux/netfilter/ipset/ip_set.h
index 1ea28e30a6dd..780262124632 100644
--- a/include/linux/netfilter/ipset/ip_set.h
+++ b/include/linux/netfilter/ipset/ip_set.h
@@ -92,17 +92,6 @@ struct ip_set_ext_type {
 
 extern const struct ip_set_ext_type ip_set_extensions[];
 
-struct ip_set_ext {
-   u64 packets;
-   u64 bytes;
-   u32 timeout;
-   u32 skbmark;
-   u32 skbmarkmask;
-   u32 skbprio;
-   u16 skbqueue;
-   char *comment;
-};
-
 struct ip_set_counter {
atomic64_t bytes;
atomic64_t packets;
@@ -122,6 +111,15 @@ struct ip_set_skbinfo {
u32 skbmarkmask;
u32 skbprio;
u16 skbqueue;
+   u16 __pad;
+};
+
+struct ip_set_ext {
+   struct ip_set_skbinfo skbinfo;
+   u64 packets;
+   u64 bytes;
+   char *comment;
+   u32 timeout;
 };
 
 struct ip_set;
@@ -360,10 +358,7 @@ ip_set_get_skbinfo(struct ip_set_skbinfo *skbinfo,
   const struct ip_set_ext *ext,
   struct ip_set_ext *mext, u32 flags)
 {
-   mext->skbmark = skbinfo->skbmark;
-   mext->skbmarkmask = skbinfo->skbmarkmask;
-   mext->skbprio = skbinfo->skbprio;
-   mext->skbqueue = skbinfo->skbqueue;
+   mext->skbinfo = *skbinfo;
 }
 
 static inline bool
@@ -387,10 +382,7 @@ static inline void
 ip_set_init_skbinfo(struct ip_set_skbinfo *skbinfo,
const struct ip_set_ext *ext)
 {
-   skbinfo->skbmark = ext->skbmark;
-   skbinfo->skbmarkmask = ext->skbmarkmask;
-   skbinfo->skbprio = ext->skbprio;
-   skbinfo->skbqueue = ext->skbqueue;
+   *skbinfo = ext->skbinfo;
 }
 
 /* Netlink CB args */
diff --git a/net/netfilter/ipset/ip_set_core.c 
b/net/netfilter/ipset/ip_set_core.c
index 3f1b945a24d5..bfacccff7196 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -426,20 +426,20 @@ ip_set_get_extensions(struct ip_set *set, struct nlattr 
*tb[],
if (!SET_WITH_SKBINFO(set))
return -IPSET_ERR_SKBINFO;
fullmark = be64_to_cpu(nla_get_be64(tb[IPSET_ATTR_SKBMARK]));
-   ext->skbmark = fullmark >> 32;
-   ext->skbmarkmask = fullmark & 0xffffffff;
+   ext->skbinfo.skbmark = fullmark >> 32;
+   ext->skbinfo.skbmarkmask = fullmark & 0xffffffff;
}
if (tb[IPSET_ATTR_SKBPRIO]) {
if (!SET_WITH_SKBINFO(set))
return -IPSET_ERR_SKBINFO;
-   ext->skbprio = be32_to_cpu(nla_get_be32(
-   tb[IPSET_ATTR_SKBPRIO]));
+   ext->skbinfo.skbprio =
+   be32_to_cpu(nla_get_be32(tb[IPSET_ATTR_SKBPRIO]));
}
if (tb[IPSET_ATTR_SKBQUEUE]) {
if (!SET_WITH_SKBINFO(set))
return -IPSET_ERR_SKBINFO;
-   ext->skbqueue = be16_to_cpu(nla_get_be16(
-   tb[IPSET_ATTR_SKBQUEUE]));
+   ext->skbinfo.skbqueue =
+   be16_to_cpu(nla_get_be16(tb[IPSET_ATTR_SKBQUEUE]));
}
return 0;
 }
diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c
index 1bfede7be418..64285702afd5 100644
--- a/net/netfilter/xt_set.c
+++ b/net/netfilter/xt_set.c
@@ -423,6 +423,8 @@ set_target_v2(struct sk_buff *skb, const struct 
xt_action_param *par)
 
 /* Revision 3 target */
 
+#define MOPT(opt, member)  ((opt).ext.skbinfo.member)
+
 static unsigned int
 set_target_v3(struct sk_buff *skb, const struct xt_action_param *par)
 {
@@ -453,14 +455,14 @@ set_target_v3(struct sk_buff *skb, const struct 
xt_action_param *par)
if (!ret)
return XT_CONTINUE;
if (map_opt.cmdflags & IPSET_FLAG_MAP_SKBMARK)
-   skb->mark = (skb->mark & ~(map_opt.ext.skbmarkmask))
-   ^ (map_opt.ext.skbmark);
+   skb->mark = (skb->mark & ~MOPT(map_opt,skbmarkmask))
+   ^ MOPT(map_opt, skbmark);
if (map_opt.cmdflags & IPSET_FLAG_MAP_SKBPRIO)
-
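
As a rough illustration of the two ideas in this patch (whole-struct assignment with explicit padding, and the mark mapping used in set_target_v3()), here is a small user-space sketch. The types skbinfo_model and ext_model and the functions get_skbinfo and map_mark are hypothetical; only the field names and the mask/XOR expression are taken from the diff.

```c
#include <stdint.h>

/* Modeled on struct ip_set_skbinfo after the patch, with explicit padding. */
struct skbinfo_model {
	uint32_t skbmark;
	uint32_t skbmarkmask;
	uint32_t skbprio;
	uint16_t skbqueue;
	uint16_t pad; /* explicit padding, no implicit tail bytes */
};

/* Hypothetical extension container embedding the struct. */
struct ext_model {
	struct skbinfo_model skbinfo;
	uint32_t timeout;
};

/* One struct assignment replaces four member-by-member copies. */
void get_skbinfo(const struct skbinfo_model *src, struct ext_model *dst)
{
	dst->skbinfo = *src;
}

/* Mark mapping as written in the diff: clear the masked bits, then XOR. */
uint32_t map_mark(uint32_t mark, const struct skbinfo_model *info)
{
	return (mark & ~info->skbmarkmask) ^ info->skbmark;
}
```

For example, with a mask of 0xff and a mark value of 0xab, map_mark() replaces the low byte of the packet mark and leaves the upper bits alone.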

[PATCH 20/39] netfilter: ipset: Use kmalloc() in comment extension helper

2016-11-13 Thread Pablo Neira Ayuso

From: Jozsef Kadlecsik 

Allocate memory with kmalloc() rather than kzalloc(): the string
is immediately initialized so it is unnecessary to zero out
the allocated memory area.

Ported from a patch proposed by Sergey Popovich.

Suggested-by: Sergey Popovich 
Signed-off-by: Jozsef Kadlecsik 
---
 include/linux/netfilter/ipset/ip_set_comment.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netfilter/ipset/ip_set_comment.h 
b/include/linux/netfilter/ipset/ip_set_comment.h
index bae5c7609be2..5444b1bbe656 100644
--- a/include/linux/netfilter/ipset/ip_set_comment.h
+++ b/include/linux/netfilter/ipset/ip_set_comment.h
@@ -34,7 +34,7 @@ ip_set_init_comment(struct ip_set_comment *comment,
return;
if (unlikely(len > IPSET_MAX_COMMENT_SIZE))
len = IPSET_MAX_COMMENT_SIZE;
-   c = kzalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
+   c = kmalloc(sizeof(*c) + len + 1, GFP_ATOMIC);
if (unlikely(!c))
return;
strlcpy(c->str, ext->comment, len + 1);
-- 
2.1.4
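
The reasoning behind this change has a direct user-space analogue: when every byte that is later read gets written right after allocation, a zeroing allocator does redundant work. The sketch below assumes a hypothetical dup_comment() helper and a made-up MAX_COMMENT cap; it uses malloc() and memcpy() rather than the kernel's kmalloc() and strlcpy().

```c
#include <stdlib.h>
#include <string.h>

#define MAX_COMMENT 255 /* assumed cap, for the sketch only */

/* Duplicate a comment, truncated to MAX_COMMENT bytes. A non-zeroing
 * allocation is enough because the copy and the terminator below
 * initialize every byte that is read afterwards. */
char *dup_comment(const char *src)
{
	size_t len = strlen(src);
	char *c;

	if (len > MAX_COMMENT)
		len = MAX_COMMENT;
	c = malloc(len + 1);  /* no zeroing needed */
	if (!c)
		return NULL;
	memcpy(c, src, len);  /* fill the payload */
	c[len] = '\0';        /* write the terminator explicitly */
	return c;
}
```

The caller owns the returned buffer and releases it with free().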

[PATCH 04/39] netfilter: deprecate NF_STOP

2016-11-13 Thread Pablo Neira Ayuso

NF_STOP is only used by br_netfilter these days, and it can be emulated
with a combination of NF_STOLEN plus explicit call to the ->okfn()
function as Florian suggests.

To retain binary compatibility with userspace nf_queue applications, we
have to keep NF_STOP around, so libnetfilter_queue userspace
applications still work if they use NF_STOP for some exotic reason.

Out of tree modules using NF_STOP would break, but we don't care about
those.

Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter.h  | 2 +-
 net/bridge/br_netfilter_hooks.c | 6 --
 net/netfilter/core.c| 2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h
index d93f949d1d9a..7550e9176a54 100644
--- a/include/uapi/linux/netfilter.h
+++ b/include/uapi/linux/netfilter.h
@@ -13,7 +13,7 @@
 #define NF_STOLEN 2
 #define NF_QUEUE 3
 #define NF_REPEAT 4
-#define NF_STOP 5
+#define NF_STOP 5  /* Deprecated, for userspace nf_queue compatibility. */
 #define NF_MAX_VERDICT NF_STOP
 
 /* we overload the higher bits for encoding auxiliary data such as the queue
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index d0d66faebe90..7e3645fa6339 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -845,8 +845,10 @@ static unsigned int ip_sabotage_in(void *priv,
   struct sk_buff *skb,
   const struct nf_hook_state *state)
 {
-   if (skb->nf_bridge && !skb->nf_bridge->in_prerouting)
-   return NF_STOP;
+   if (skb->nf_bridge && !skb->nf_bridge->in_prerouting) {
+   state->okfn(state->net, state->sk, skb);
+   return NF_STOLEN;
+   }
 
return NF_ACCEPT;
 }
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index cb0232c11bc8..14f97b624f98 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -333,7 +333,7 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state 
*state)
entry = rcu_dereference(state->hook_entries);
 next_hook:
verdict = nf_iterate(skb, state, &entry);
-   if (verdict == NF_ACCEPT || verdict == NF_STOP) {
+   if (verdict == NF_ACCEPT) {
ret = 1;
} else if ((verdict & NF_VERDICT_MASK) == NF_DROP) {
kfree_skb(skb);
-- 
2.1.4
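
The emulation described in the commit message (call the continuation yourself, then report the packet as stolen) can be sketched outside the kernel like this. Everything here is hypothetical scaffolding: struct packet, struct hook_state, deliver and sabotage_in_model only mimic the shape of the br_netfilter hunk, and the verdict numbers are taken as assumptions.

```c
#include <stddef.h>

enum { V_ACCEPT = 1, V_STOLEN = 2 };

/* Hypothetical packet: just counts how often it was delivered. */
struct packet {
	int delivered;
};

/* Hypothetical hook state carrying the continuation, like state->okfn. */
struct hook_state {
	int (*okfn)(struct packet *pkt);
};

/* Continuation that would normally run after all hooks accepted. */
int deliver(struct packet *pkt)
{
	pkt->delivered++;
	return 0;
}

/* Instead of a "stop" verdict: finish the packet here, then tell the
 * core that ownership was taken so no later hook touches it. */
int sabotage_in_model(struct packet *pkt, const struct hook_state *state,
		      int bypass)
{
	if (bypass) {
		state->okfn(pkt);
		return V_STOLEN;
	}
	return V_ACCEPT;
}
```

The packet is delivered exactly once on the bypass path, and the ordinary path still returns the accept verdict and leaves delivery to the core.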

[PATCH 07/39] netfilter: use switch() to handle verdict cases from nf_hook_slow()

2016-11-13 Thread Pablo Neira Ayuso

Use switch() for verdict handling and add explicit handling for
NF_STOLEN and other non-conventional verdicts.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 14f97b624f98..64623374bc5f 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -328,22 +328,32 @@ int nf_hook_slow(struct sk_buff *skb, struct 
nf_hook_state *state)
 {
struct nf_hook_entry *entry;
unsigned int verdict;
-   int ret = 0;
+   int ret;
 
entry = rcu_dereference(state->hook_entries);
 next_hook:
verdict = nf_iterate(skb, state, &entry);
-   if (verdict == NF_ACCEPT) {
+   switch (verdict & NF_VERDICT_MASK) {
+   case NF_ACCEPT:
ret = 1;
-   } else if ((verdict & NF_VERDICT_MASK) == NF_DROP) {
+   break;
+   case NF_DROP:
kfree_skb(skb);
ret = NF_DROP_GETERR(verdict);
if (ret == 0)
ret = -EPERM;
-   } else if ((verdict & NF_VERDICT_MASK) == NF_QUEUE) {
+   break;
+   case NF_QUEUE:
ret = nf_queue(skb, state, &entry, verdict);
if (ret == 1 && entry)
goto next_hook;
+   /* Fall through. */
+   default:
+   /* Implicit handling for NF_STOLEN, as well as any other non
+* conventional verdicts.
+*/
+   ret = 0;
+   break;
}
return ret;
 }
-- 
2.1.4
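
A stripped-down model of the switch-based verdict handling may help when reading the diff above. It is a sketch under assumptions: the M_* names are made up, NF_DROP 0 and NF_ACCEPT 1 are assumed from the usual numbering, the 0xff mask and the 16-bit shift for the encoded errno are assumptions about the verdict layout, and the queue case is left out entirely.

```c
#include <errno.h>

#define M_DROP          0u
#define M_ACCEPT        1u
#define M_STOLEN        2u
#define M_VERDICT_MASK  0x000000ffu /* assumed: low byte holds the verdict */
#define M_ERR_SHIFT     16          /* assumed: errno lives in the high bits */

/* Extract the (negated) errno that a drop verdict may carry. */
int drop_geterr(unsigned int verdict)
{
	return -(int)(verdict >> M_ERR_SHIFT);
}

/* Map a verdict to the return value: 1 = continue, 0 = consumed,
 * negative = dropped with that error. */
int hook_slow_model(unsigned int verdict)
{
	int ret;

	switch (verdict & M_VERDICT_MASK) {
	case M_ACCEPT:
		ret = 1;
		break;
	case M_DROP:
		ret = drop_geterr(verdict);
		if (ret == 0)
			ret = -EPERM; /* default error for a plain drop */
		break;
	default:
		/* Stolen and any other unconventional verdicts. */
		ret = 0;
		break;
	}
	return ret;
}
```

The default branch is what gives stolen packets an explicit, defined result instead of relying on an initial value of ret.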

[PATCH 01/39] netfilter: get rid of useless debugging from core

2016-11-13 Thread Pablo Neira Ayuso

This patch removes compile-time code to catch unconventional verdicts.
We have better ways to handle this case these days, e.g. pr_debug(), but
even so I don't think this is useful at all, so let's remove it.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 004af030ef1a..3d4aa96cb219 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -323,15 +323,6 @@ unsigned int nf_iterate(struct sk_buff *skb,
 repeat:
verdict = (*entryp)->ops.hook((*entryp)->ops.priv, skb, state);
if (verdict != NF_ACCEPT) {
-#ifdef CONFIG_NETFILTER_DEBUG
-   if (unlikely((verdict & NF_VERDICT_MASK)
-   > NF_MAX_VERDICT)) {
-   NFDEBUG("Evil return from %p(%u).\n",
-   (*entryp)->ops.hook, state->hook);
-   *entryp = rcu_dereference((*entryp)->next);
-   continue;
-   }
-#endif
if (verdict != NF_REPEAT)
return verdict;
goto repeat;
-- 
2.1.4

Re: [PATCH] net: stmmac: Add support for ethtool::nway_reset

2016-11-13 Thread kbuild test robot

Hi Florian,

[auto build test WARNING on net-next/master]
[also build test WARNING on next-2016]
[cannot apply to v4.9-rc5]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-stmmac-Add-support-for-ethtool-nway_reset/20161114-053015
config: x86_64-kexec (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c: In function 
'stmmac_nway_reset':
>> drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c:867:22: warning: unused 
>> variable 'priv' [-Wunused-variable]
 struct stmmac_priv *priv = netdev_priv(dev);
 ^~~~

vim +/priv +867 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c

   851  int ret = 0;
   852  
   853  switch (tuna->id) {
   854  case ETHTOOL_RX_COPYBREAK:
   855  priv->rx_copybreak = *(u32 *)data;
   856  break;
   857  default:
   858  ret = -EINVAL;
   859  break;
   860  }
   861  
   862  return ret;
   863  }
   864  
   865  static int stmmac_nway_reset(struct net_device *dev)
   866  {
 > 867  struct stmmac_priv *priv = netdev_priv(dev);
   868  
   869  if (!dev->phydev)
   870  return -ENODEV;
   871  
   872  return genphy_restart_aneg(dev->phydev);
   873  }
   874  
   875  static const struct ethtool_ops stmmac_ethtool_ops = {

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

Re: [PATCH] net: stmmac: Add support for ethtool::nway_reset

2016-11-13 Thread Florian Fainelli

On 13/11/2016 at 13:24, Florian Fainelli wrote:
> If we have a PHY device, just invoke genphy_restart_aneg() to restart
> auto-negotiation.
> 
> Signed-off-by: Florian Fainelli 

David, please drop this patch for now, since I have another one pending
which is going to touch the net_device/phydev interaction, this one also
causes a build warning since priv is not used.

Thank you!
-- 
Florian

Re: [PATCH] netfilter: x_tables: simplify IS_ERR_OR_NULL to NULL test

2016-11-13 Thread Pablo Neira Ayuso

On Fri, Nov 11, 2016 at 01:32:38PM +0100, Julia Lawall wrote:
> Since commit 7926dbfa4bc1 ("netfilter: don't use
> mutex_lock_interruptible()"), the function xt_find_table_lock can only
> return NULL on an error.  Simplify the call sites and update the
> comment before the function.

Applied, thanks Julia!

[PATCH] net: stmmac: Add support for ethtool::nway_reset

2016-11-13 Thread Florian Fainelli

If we have a PHY device, just invoke genphy_restart_aneg() to restart
auto-negotiation.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 3fe9340b748f..7a487c9ccdea 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -862,6 +862,16 @@ static int stmmac_set_tunable(struct net_device *dev,
 	return ret;
 }
 
+static int stmmac_nway_reset(struct net_device *dev)
+{
+	struct stmmac_priv *priv = netdev_priv(dev);
+
+	if (!dev->phydev)
+		return -ENODEV;
+
+	return genphy_restart_aneg(dev->phydev);
+}
+
 static const struct ethtool_ops stmmac_ethtool_ops = {
 	.begin = stmmac_check_if_running,
 	.get_drvinfo = stmmac_ethtool_getdrvinfo,
@@ -886,6 +896,7 @@ static const struct ethtool_ops stmmac_ethtool_ops = {
 	.set_tunable = stmmac_set_tunable,
 	.get_link_ksettings = stmmac_ethtool_get_link_ksettings,
 	.set_link_ksettings = stmmac_ethtool_set_link_ksettings,
+	.nway_reset = stmmac_nway_reset,
 };
 
 void stmmac_set_ethtool_ops(struct net_device *netdev)
-- 
2.9.3

Re: [PATCH v2] ip6_output: ensure flow saddr actually belongs to device

2016-11-13 Thread David Ahern

On 11/13/16 12:02 PM, Jason A. Donenfeld wrote:
> This puts the IPv6 routing functions in parity with the IPv4 routing
> functions. Namely, we now check in v6 that if a flowi6 requests an
> saddr, the returned dst actually corresponds to a net device that has
> that saddr. This mirrors the v4 logic with __ip_dev_find in
> __ip_route_output_key_hash. In the event that the returned dst is not
> for a dst with a dev that has the saddr, we return -EINVAL, just like
> v4; this makes it easy to use the same error handlers for both cases.
> 
> Signed-off-by: Jason A. Donenfeld 
> Cc: David Ahern 
> ---
> Changes from v1:
>This moves the check to the top and now sees if it's a valid address
>on _any_ device, not just the one in dst.
> 
>  include/net/ipv6.h|  2 ++
>  net/ipv6/ip6_output.c | 28 
>  2 files changed, 30 insertions(+)
> 
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 8fed1cd..e5dc14f 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -914,6 +914,8 @@ struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
>const struct in6_addr *final_dst);
>  struct dst_entry *ip6_blackhole_route(struct net *net,
> struct dst_entry *orig_dst);
> +struct net_device *__ip6_dev_find(struct net *net, struct in6_addr *addr,
> +   bool devref);
>  
>  /*
>   *   skb processing functions
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 6001e78..371170b 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -916,6 +916,30 @@ static struct dst_entry *ip6_sk_dst_check(struct sock *sk,
>   return dst;
>  }
>  
> +/**
> + * __ip6_dev_find - find the first device with a given source address.
> + * @net: the net namespace
> + * @addr: the source address
> + * @devref: if true, take a reference on the found device
> + *
> + * If a caller uses devref=false, it should be protected by RCU, or RTNL
> + */
> +struct net_device *__ip6_dev_find(struct net *net, struct in6_addr *addr, bool devref)
> +{
> +	struct net_device *result;
> +
> +	rcu_read_lock();
> +	for_each_netdev_rcu(net, result) {
> +		if (ipv6_chk_addr(net, addr, result, 1))
> +			break;
> +	}
> +	if (result && devref)
> +		dev_hold(result);
> +	rcu_read_unlock();
> +	return result;
> +}
> +EXPORT_SYMBOL(__ip6_dev_find);

You don't need a new function to walk all interfaces; just use ipv6_chk_addr 
with a dev arg of NULL. IPv6 has a hash table with all unicast addresses -- 
inet6_addr_lst. ipv6_chk_addr is checking that list for the address in 
question. The actual device is not relevant for verifying the address is a 
valid local one (though the device can be returned from ifp->idev->dev if ever 
needed).

So drop the above ...

> +
>  static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
>  struct dst_entry **dst, struct flowi6 *fl6)
>  {
> @@ -926,6 +950,10 @@ static int ip6_dst_lookup_tail(struct net *net, const struct sock *sk,
>  	int err;
>  	int flags = 0;
>  
> +	if (!ipv6_addr_any(&fl6->saddr) &&
> +	    !__ip6_dev_find(net, &fl6->saddr, false))

... and just use ipv6_chk_addr here.

> +		return -EINVAL;
> +
>  	/* The correct way to handle this would be to do
>  	 * ip6_route_get_saddr, and then ip6_route_output; however,
>  	 * the route-specific preferred source forces the
>
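
A rough userspace model of the suggestion above, for illustration only. As described, the kernel keeps unicast addresses in a hash table (inet6_addr_lst) that ipv6_chk_addr() consults; here a flat array stands in for that table. The names `struct addr_entry`, `chk_addr()` and `validate_saddr()` are hypothetical and are not kernel APIs. Passing ifindex 0 plays the role of a NULL dev argument, meaning "any interface".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the kernel's address table. */
struct addr_entry {
	unsigned char addr[16];	/* IPv6 address, network byte order */
	int ifindex;		/* interface the address is assigned to */
};

/* Return 1 if addr is assigned to ifindex, or to any interface when ifindex is 0. */
static int chk_addr(const struct addr_entry *tbl, size_t n,
		    const unsigned char addr[16], int ifindex)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (memcmp(tbl[i].addr, addr, 16) != 0)
			continue;
		if (ifindex == 0 || tbl[i].ifindex == ifindex)
			return 1;
	}
	return 0;
}

/* An all-zero (unspecified) source address is accepted; anything else must be local. */
static int validate_saddr(const struct addr_entry *tbl, size_t n,
			  const unsigned char saddr[16])
{
	static const unsigned char any[16];

	if (memcmp(saddr, any, 16) != 0 && !chk_addr(tbl, n, saddr, 0))
		return -EINVAL;
	return 0;
}
```

The point of the sketch is that the address check needs no per-device walk: one lookup with "any interface" semantics answers whether the source address is local at all.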

Re: [PATCH] ip6_output: ensure flow saddr actually belongs to device

2016-11-13 Thread David Ahern

On 11/13/16 1:19 PM, Jason A. Donenfeld wrote:
> I gave v2 my best shot. Hopefully it's adequate, but I have a feeling
> it might be best for you to just code up what you have in mind.

nah, you are doing fine. one more comment on v2.

Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-13 Thread Mark Lord

On 16-11-13 03:34 PM, Mark Lord wrote:
>
> The system I use it with is a 32-bit ppc476, with non-coherent RAM,
> and using 16KB page sizes.
> 
> The dongle instantly becomes a lot more reliable when r8152.c is updated
> to use usb_alloc_coherent() for URB buffers, rather than kmalloc().
> 
> Not sure why that would be though, as the USB stack normally would handle
> kmalloc'd buffers just fine.  It is calling the appropriate routines,
> which boil down to invalidating the dcache lines (for inbound bulk xfers)
> as part of usb_submit_urb(), and yet the problem there persists.
> 
> It could be caused by cache-line sharing with other allocations, but that seems
> unlikely as the kmalloc() size is 16384 bytes per buffer.  Perhaps the driver
> is somehow accessing the buffer space again after doing usb_submit_urb()?
> That would certainly produce this kind of behaviour.
> 
> Or maybe there's just a memory barrier missing somewhere in the path.
> 
> The really weird thing is that ASIX-based dongles (which use a different driver)
> don't have this problem, and yet they also use kmalloc'd buffers.
> 
> I have access to the test system only for a day or two a week,
> and it takes a few hours to do a good test as to whether something helps or not.
> I'll continue to poke at it as time and New Ideas permit.

Oh, and the problems did not exist with the 3.14.xx kernels and earlier.
They began to show up when we tried 3.16.xx and all newer kernels.

The difference there is that RX checksums were enabled in hardware as of 3.16.xx,
and thus the network stack began accepting bad packets from the r8152 driver.

I don't know if the ASIX driver uses hardware checksums or just software checksums.
That might explain why it is more reliable here.
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com

Re: [PATCH net 2/2] r8152: rx descriptor check

2016-11-13 Thread Mark Lord

On 16-11-13 12:39 PM, David Miller wrote:
> From: Hayes Wang 
> Date: Fri, 11 Nov 2016 15:15:41 +0800
> 
>> On some platforms, the data in memory does not match what the device
>> sent. That is, the data in memory cannot be trusted. This check is
>> used to detect that situation.
>>
>> Signed-off-by: Hayes Wang 
> 
> I'm all for adding consistency checks, but I disagree with proceeding
> in this manner for this.
> 
> If you add this patch now, there is a much smaller likelihood that you
> will work with a high priority to figure out _why_ this is happening.
> 
> For all we know this could be a platform bug in the DMA API for the
> systems in question.
> 
> It could also be a bug elsewhere in the driver, either in setting up
> the descriptor DMA mappings or how the chip is programmed.
> 
> Either way the true cause must be found before we start throwing
> changes like this into the driver.

I agree.

The system I use it with is a 32-bit ppc476, with non-coherent RAM,
and using 16KB page sizes.

The dongle instantly becomes a lot more reliable when r8152.c is updated
to use usb_alloc_coherent() for URB buffers, rather than kmalloc().

Not sure why that would be though, as the USB stack normally would handle
kmalloc'd buffers just fine.  It is calling the appropriate routines,
which boil down to invalidating the dcache lines (for inbound bulk xfers)
as part of usb_submit_urb(), and yet the problem there persists.

It could be caused by cache-line sharing with other allocations, but that seems
unlikely as the kmalloc() size is 16384 bytes per buffer.  Perhaps the driver
is somehow accessing the buffer space again after doing usb_submit_urb()?
That would certainly produce this kind of behaviour.

Or maybe there's just a memory barrier missing somewhere in the path.

The really weird thing is that ASIX-based dongles (which use a different driver)
don't have this problem, and yet they also use kmalloc'd buffers.

I have access to the test system only for a day or two a week,
and it takes a few hours to do a good test as to whether something helps or not.
I'll continue to poke at it as time and New Ideas permit.

New Ideas welcome!
-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com
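
To make the cache-line-sharing point above concrete, here is a small userspace sketch (not driver code). It checks whether a buffer begins or ends in the middle of a cache line, in which case a neighbouring allocation could live in the same line. The helper name `may_share_cache_line()` is hypothetical, and the line size is a parameter because the value for any given platform is an assumption that should be checked against its documentation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if [start, start + len) begins or ends in the middle of a cache
 * line, i.e. a neighbouring allocation could live in the same line.
 * 'line' must be a power of two.
 */
static int may_share_cache_line(uintptr_t start, size_t len, size_t line)
{
	uintptr_t mask = (uintptr_t)line - 1;

	if (len == 0)
		return 0;
	return (start & mask) != 0 || ((start + len) & mask) != 0;
}
```

A 16384-byte buffer that starts on a line boundary also ends on one for any power-of-two line size up to 16384, which fits the "seems unlikely" assessment, provided the allocator returns suitably aligned memory (an assumption here).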

Re: [PATCH net-next 00/11] Start adding support for mv88e6390 family

2016-11-13 Thread Andrew Lunn

On Sun, Nov 13, 2016 at 12:48:59AM -0500, David Miller wrote:
> From: Andrew Lunn 
> Date: Fri, 11 Nov 2016 03:53:32 +0100
> 
> > This is the first patchset implementing support for the mv88e6390
> > family.  This is a new generation of switch devices and has numerous
> > incompatible changes to the registers. These patches allow the switch
> > to be detected during probe, and make the statistics unit work.
> > 
> > These patches are insufficient to make the mv88e6390 functional. More
> > patches will follow.
> 
> Andrew, this series doesn't apply cleanly to net-next, so you'll
> need to respin.

Hi David

I'm happy to respin, but I'm wondering why they don't apply.

What seems to be the issue is you said you have accepted:

[PATCH net-next 0/2] Fixes for port refactoring
https://marc.info/?l=linux-netdev&m=147880114928996&w=1

Yet I don't see these in net-next, and I based this patchset on a tree
which included the fixes. Hence they are not applying.

Have the fixes really been accepted?

Thanks
Andrew

Re: [PATCH] ip6_output: ensure flow saddr actually belongs to device

2016-11-13 Thread Jason A. Donenfeld

Hi David,

On Sun, Nov 13, 2016 at 5:30 PM, David Ahern  wrote:
> You can't require the address to be on the dst device. e.g., it can be an 
> address from the loopback/vrf device.
>
> This block needs to be done at function entry, and pass dev as NULL to mean 
> is the address assigned to any interface. That gets you the equivalency of 
> the IPv4 check.

I gave v2 my best shot. Hopefully it's adequate, but I have a feeling
it might be best for you to just code up what you have in mind.

Regards,
Jason

Re: [patch net v2 0/2] mlxsw: Couple of fixes

2016-11-13 Thread Jiri Pirko

Sun, Nov 13, 2016 at 06:51:33PM CET, da...@davemloft.net wrote:
>From: Jiri Pirko 
>Date: Fri, 11 Nov 2016 16:34:24 +0100
>
>> From: Jiri Pirko 
>> 
>> Please, queue-up both for stable. Thanks!
>
>Just to be clear I did make sure to take v2 rather than
>v1.

Good. Thanks!

Re: [PATCH net-next v2] ipv6: sr: fix IPv6 initialization failure without lwtunnels

2016-11-13 Thread David Lebrun

On 11/13/2016 06:23 AM, David Miller wrote:
> This seems like such a huge mess, quite frankly.
> 
> IPV6-SR has so many strange dependencies, a weird Kconfig option that is
> simply controlling what a responsible sysadmin should be allowed to do if
> he chooses anyways.
> 
> Every distribution is going to say "¯\_(ツ)_/¯" and just turn the thing
> on in their builds.

Indeed, the issue is that seg6_iptunnel.o was included in obj-y instead
of ipv6-y, triggering the bug when CONFIG_IPV6=m. Fixed with the
following modification to the patch (tested with allyesconfig and
allmodconfig):

diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 8979d53..a233136 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -53,6 +53,6 @@ obj-$(subst m,y,$(CONFIG_IPV6)) += inet6_hashtables.o

 ifneq ($(CONFIG_IPV6),)
 obj-$(CONFIG_NET_UDP_TUNNEL) += ip6_udp_tunnel.o
-obj-$(CONFIG_LWTUNNEL) += seg6_iptunnel.o
+ipv6-$(CONFIG_LWTUNNEL) += seg6_iptunnel.o
 obj-y += mcast_snoop.o
 endif

I agree with you that the way to combine the dependencies is strange,
even if they are very few. The part of the IPv6-SR patch that is enabled
by default depends on two things: IPV6 and LWTUNNEL. The problem is that
LWTUNNEL does not depend on IPV6 and is not necessarily enabled. To fix
the bug reported by Lorenzo, I propose to select one of the following
three solutions:

1. Make LWTUNNEL always enabled (removing the option).
   Pros: remove an option
   Cons: add always-enabled code

2. Create an option IPV6_SEG6_LWTUNNEL, which would select LWTUNNEL and
enable the compilation of seg6_iptunnel.o.
   Pros: logically dissociate the part of IPv6-SR that depends on
LWTUNNEL from the core patch and simplifies compilation
   Cons: add an option

3. Apply the proposed patch with the fix
   Pros: do not modify options
   Cons: weird conditional compilation

What do you think?

David


Re: [net-next 1/2] net: ethernet: slicoss: add slicoss gigabit ethernet driver

2016-11-13 Thread Andrew Lunn

> +static const char slic_stats_strings[][ETH_GSTRING_LEN] = {
> + "rx_packets ",
> + "rx_bytes   ",
> + "rx_multicasts  ",
> + "rx_errors  ",
> + "rx_buff_miss   ",
> + "rx_tp_csum ",
> + "rx_tp_oflow",
> + "rx_tp_hlen ",
> + "rx_ip_csum ",
> + "rx_ip_len  ",

Are there any other drivers which pad the statistics strings?

> +static void slic_set_link_autoneg(struct slic_device *sdev)
> +{
> +	unsigned int subid = sdev->pdev->subsystem_device;
> +	u32 val;
> +
> +	if (sdev->is_fiber) {
> +		/* We've got a fiber gigabit interface, and register 4 is
> +		 * different in fiber mode than in copper mode.
> +		 */
> +		/* advertise FD only @1000 Mb */
> +		val = MII_ADVERTISE << 16 | SLIC_PAR_ADV1000XFD |
> +		      SLIC_PAR_ASYMPAUSE_FIBER;
> +		/* enable PAUSE frames */
> +		slic_write(sdev, SLIC_REG_WPHY, val);
> +		/* reset phy, enable auto-neg  */
> +		val = MII_BMCR << 16 | SLIC_PCR_RESET | SLIC_PCR_AUTONEG |
> +		      SLIC_PCR_AUTONEG_RST;
> +		slic_write(sdev, SLIC_REG_WPHY, val);
> +	} else {	/* copper gigabit */
> +		/* We've got a copper gigabit interface, and register 4 is
> +		 * different in copper mode than in fiber mode.
> +		 */
> +		/* advertise 10/100 Mb modes   */
> +		val = MII_ADVERTISE << 16 | SLIC_PAR_ADV100FD |
> +		      SLIC_PAR_ADV100HD | SLIC_PAR_ADV10FD | SLIC_PAR_ADV10HD;
> +		/* enable PAUSE frames  */
> +		val |= SLIC_PAR_ASYMPAUSE;
> +		/* required by the Cicada PHY  */
> +		val |= SLIC_PAR_802_3;
> +		slic_write(sdev, SLIC_REG_WPHY, val);
> +
> +		/* advertise FD only @1000 Mb  */
> +		val = MII_CTRL1000 << 16 | SLIC_PGC_ADV1000FD;
> +		slic_write(sdev, SLIC_REG_WPHY, val);
> +
> +		if (subid != PCI_SUBDEVICE_ID_ALACRITECH_CICADA) {
> +			/* if a Marvell PHY enable auto crossover */
> +			val = SLIC_MIICR_REG_16 | SLIC_MRV_REG16_XOVERON;
> +			slic_write(sdev, SLIC_REG_WPHY, val);
> +
> +			/* reset phy, enable auto-neg  */
> +			val = MII_BMCR << 16 | SLIC_PCR_RESET |
> +			      SLIC_PCR_AUTONEG | SLIC_PCR_AUTONEG_RST;
> +			slic_write(sdev, SLIC_REG_WPHY, val);
> +		} else {
> +			/* enable and restart auto-neg (don't reset)  */
> +			val = MII_BMCR << 16 | SLIC_PCR_AUTONEG |
> +			      SLIC_PCR_AUTONEG_RST;
> +			slic_write(sdev, SLIC_REG_WPHY, val);
> +		}
> +	}
> +	sdev->autoneg = true;
> +}

Could this be pulled out into a standard PHY driver? All the
SLIC_PCR_ defines seem to be the same as those in mii.h. This could
be a standard PHY hidden behind a single register.

   Andrew
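
To illustrate the observation above: the quoted code builds each PHY write as a register number in the upper 16 bits and a value in the lower 16 bits. If the SLIC_PCR_* bits really do match the generic BMCR bits (an assumption based on the comparison with mii.h, not verified against the SLIC hardware), the same words could be produced from the standard constants. The constant values below are those of the generic MII definitions; `slic_wphy_word()` is a hypothetical helper name used only in this sketch.

```c
#include <stdint.h>

/* Generic MII register numbers and bits, as defined in linux/mii.h. */
#define MII_BMCR		0x00
#define MII_ADVERTISE		0x04
#define MII_CTRL1000		0x09

#define BMCR_RESET		0x8000
#define BMCR_ANENABLE		0x1000
#define BMCR_ANRESTART		0x0200

#define ADVERTISE_10HALF	0x0020
#define ADVERTISE_10FULL	0x0040
#define ADVERTISE_100HALF	0x0080
#define ADVERTISE_100FULL	0x0100
#define ADVERTISE_1000FULL	0x0200	/* bit in MII_CTRL1000 */

/* Pack a PHY register number and a 16-bit value into one write word. */
static uint32_t slic_wphy_word(uint32_t reg, uint16_t val)
{
	return (reg << 16) | val;
}
```

For example, `slic_wphy_word(MII_BMCR, BMCR_RESET | BMCR_ANENABLE | BMCR_ANRESTART)` corresponds to the "reset phy, enable auto-neg" write in the quoted function, under the assumption stated above.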

Re: Debugging Ethernet issues

2016-11-13 Thread Florian Fainelli

On 13/11/2016 at 11:51, Mason wrote:
> On 13/11/2016 04:09, Andrew Lunn wrote:
> 
>> Mason wrote:
>>
>>> When connected to a Gigabit switch
>>> 3.4 negotiates a LAN DHCP setup instantly
>>> 4.7 requires over 5 seconds to do so
>>
>> When you run tcpdump on the DHCP server, are you noticing the first
>> request is missing?
>>
>> What can happen is the dhclient gets started immediately and sends out
>> its first request before auto-negotiation has finished. So this first packet
>> gets lost. The retransmit after a few seconds is then successful.
> 
> I will run tcpdump on the server as I run udhcpc on the client
> for Linux 3.4 vs 4.7
> 
> Do you know what would make auto-negotiation fail at 100 Mbps
> on 4.7? (whereas it succeeds on 3.4)
> 
> (Thinking out loud) If the problem were in auto-negotiation,
> then it should work if I hard-code speed and duplex using
> ethtool, right? (IIRC, hard-coding doesn't help.)

I would start with checking basic things:

- does your Ethernet driver get a link UP being reported correctly
(netif_carrier_ok returns 1)?
- if you let the bootloader configure the PHY and utilize the Generic
PHY driver instead of the Atheros PHY driver, does the problem appear as
well?
- what do transmit/receive counters on the Ethernet driver/MAC return?
-- 
Florian
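
For the first check, one way to observe link state from userspace is the sysfs `carrier` attribute under /sys/class/net/<interface>/, which reads as 1 when the carrier is up and 0 when it is down. A minimal sketch follows; `read_carrier()` is a hypothetical helper, and the path is supplied by the caller, so the interface name is whatever applies on the board in question.

```c
#include <stdio.h>

/*
 * Read a sysfs-style carrier file (e.g. /sys/class/net/eth0/carrier).
 * Returns 1 if the link is up, 0 if it is down, -1 if the file cannot be
 * opened or read (reading fails while the interface is administratively down).
 */
static int read_carrier(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);

	if (c == '1')
		return 1;
	if (c == '0')
		return 0;
	return -1;
}
```

Polling this until it returns 1 before starting the DHCP client would be one way to test the theory that the first request is sent before auto-negotiation has finished.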

Re: Debugging Ethernet issues

2016-11-13 Thread Mason

On 13/11/2016 04:09, Andrew Lunn wrote:

> Mason wrote:
>
>> When connected to a Gigabit switch
>> 3.4 negotiates a LAN DHCP setup instantly
>> 4.7 requires over 5 seconds to do so
> 
> When you run tcpdump on the DHCP server, are you noticing the first
> request is missing?
> 
> What can happen is the dhclient gets started immediately and sends out
> its first request before auto-negotiation has finished. So this first packet
> gets lost. The retransmit after a few seconds is then successful.

I will run tcpdump on the server as I run udhcpc on the client
for Linux 3.4 vs 4.7

Do you know what would make auto-negotiation fail at 100 Mbps
on 4.7? (whereas it succeeds on 3.4)

(Thinking out loud) If the problem were in auto-negotiation,
then it should work if I hard-code speed and duplex using
ethtool, right? (IIRC, hard-coding doesn't help.)

Regards.
