[Patch net] ipt_CLUSTERIP: fix a race condition of proc file creation

2018-02-07 Thread Cong Wang
There is a race condition between clusterip_config_entry_put()
and clusterip_config_init(), after we release the spinlock in
clusterip_config_entry_put(), a new proc file with a same IP could
be created immediately since it is already removed from the configs
list, therefore it triggers this warning:

[ cut here ]
proc_dir_entry 'ipt_CLUSTERIP/172.20.0.170' already registered
WARNING: CPU: 1 PID: 4152 at fs/proc/generic.c:330 proc_register+0x2a4/0x370 
fs/proc/generic.c:329
Kernel panic - not syncing: panic_on_warn set ...

As a quick fix, just move the proc_remove() inside the spinlock.

Reported-by: 
Fixes: 6c5d5cfbe3c5 ("netfilter: ipt_CLUSTERIP: check duplicate config when 
initializing")
Tested-by: Paolo Abeni 
Cc: Xin Long 
Cc: Pablo Neira Ayuso 
Signed-off-by: Cong Wang 
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 3a84a60f6b39..1ff72b87a066 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -107,12 +107,6 @@ clusterip_config_entry_put(struct net *net, struct 
clusterip_config *c)
 
local_bh_disable();
if (refcount_dec_and_lock(>entries, >lock)) {
-   list_del_rcu(>list);
-   spin_unlock(>lock);
-   local_bh_enable();
-
-   unregister_netdevice_notifier(>notifier);
-
/* In case anyone still accesses the file, the open/close
 * functions are also incrementing the refcount on their own,
 * so it's safe to remove the entry even if it's in use. */
@@ -120,6 +114,12 @@ clusterip_config_entry_put(struct net *net, struct 
clusterip_config *c)
if (cn->procdir)
proc_remove(c->pde);
 #endif
+   list_del_rcu(>list);
+   spin_unlock(>lock);
+   local_bh_enable();
+
+   unregister_netdevice_notifier(>notifier);
+
return;
}
local_bh_enable();
-- 
2.13.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/4] netlink: add generic object description infrastructure

2018-02-07 Thread Randy Dunlap
On 02/06/2018 05:37 PM, Pablo Neira Ayuso wrote:
> This patch allows netlink busses to provide object descriptions to
> userspace, in terms of supported attributes and its corresponding
> datatypes.
> 
> Userspace sends a requests that looks like:
> 
>   netlink header
>   NLA_DESC_REQ_BUS
>   NLA_DESC_REQ_DATA
> 
> Where NLA_DESC_REQ_BUS is the netlink bus/protocol number, eg.
> NETLINK_NETFILTER, and NLA_DESC_REQ_DATA is an attribute layout is
> specific to the bus that you are inspecting, this is useful for both
> nfnetlink and genetlink since they need to what subsystem in the bus
> specifically you're targeting to.
> 
> Then, the netlink description subsystem response via netlink dump looks
> like this:
> 
>   netlink header
>   NLA_DESC_NUM_OBJS
>   NLA_DESC_OBJS (nest)
>   NLA_DESC_LIST_ITEM (nest)
>   NLA_DESC_OBJ_ID
>   NLA_DESC_OBJ_ATTRS_MAX
>   NLA_DESC_OBJ_ATTRS (nest)
>   NLA_DESC_LIST_ITEM (nest)
>   NLA_DESC_ATTR_NUM
>   NLA_DESC_ATTR_TYPE
>   NLA_DESC_ATTR_LEN
>   NLA_DESC_ATTR_MAXVAL
>   NLA_DESC_ATTR_NEST_ID
>   NLA_DESC_LIST_ITEM (nest)
>   ...
> 
> Each object definition is composed of an unique ID, the number of
> attributes and the list of attribute definitions.
> 
> The NETLINK_DESC bus provides a generic interface to retrieve the list
> of existing objects and its attributes via netlink dump. This new
> description family autoloads module dependencies based on what userspace
> requests.
> 
> Each bus needs to register a struct nl_desc_subsys definition, that
> provides the lookup and parse callbacks. These route the description
> requests to the corresponding backend subsystem for genetlink and
> nfnetlink. The lookup callback returns struct nl_desc_objs that provides
> the array of object descriptions.
> 
> Signed-off-by: Pablo Neira Ayuso 
> ---
>  include/net/net_namespace.h  |   1 +
>  include/net/nldesc.h | 160 ++
>  include/uapi/linux/netlink.h |  67 ++
>  net/netlink/Makefile |   2 +-
>  net/netlink/desc.c   | 499 
> +++
>  5 files changed, 728 insertions(+), 1 deletion(-)
>  create mode 100644 include/net/nldesc.h
>  create mode 100644 net/netlink/desc.c
> 

> diff --git a/include/net/nldesc.h b/include/net/nldesc.h
> new file mode 100644
> index ..19306a648f10
> --- /dev/null
> +++ b/include/net/nldesc.h
> @@ -0,0 +1,160 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __NET_NLDESC_H
> +#define __NET_NLDESC_H
> +
> +#include 
> +
> +struct nl_desc_cmd;
> +struct nl_desc_obj;
> +
> +struct nl_desc_cmds {
> + int max;
> + const struct nl_desc_cmd*table;
> +};
> +
> +struct nl_desc_objs {
> + int max;
> + const struct nl_desc_obj**table;
> +};
> +
> +struct nl_desc_req {
> + u32 bus;
> +};
> +
> +struct net;
> +struct sk_buff;
> +struct nlmsghdr;
> +struct nlattr;
> +

> +
> +/**
> + * struct nl_desc_obj - netlink object description
> + * @id: unique ID to identify this netlink object
> + * @max: number of attributes to describe this object

  @attr_max:

> + * @attrs: array of attribute descriptions
> + */
> +struct nl_desc_obj {
> + u16 id;
> + u16 attr_max;
> + const struct nl_desc_attr   *attrs;
> +};


Is there a test program for this?
Maybe add it to tools/testing/ ?

thanks,
-- 
~Randy
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


linux-next: Signed-off-by missing for commit in the netfilter tree

2018-02-07 Thread Stephen Rothwell
Hi all,

Commit

  d8ed9600581d ("netfilter: remove useless prototype")

is missing a Signed-off-by from its committer.

-- 
Cheers,
Stephen Rothwell
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf 7/7] netfilter: x_tables: use pr ratelimiting in all remaining spots

2018-02-07 Thread Pablo Neira Ayuso
On Wed, Feb 07, 2018 at 08:23:23PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso  wrote:
> > > --- a/net/bridge/netfilter/ebt_among.c
> > > +++ b/net/bridge/netfilter/ebt_among.c
> > > @@ -187,17 +187,17 @@ static int ebt_among_mt_check(const struct 
> > > xt_mtchk_param *par)
> > >   expected_length += ebt_mac_wormhash_size(wh_src);
> > >  
> > >   if (em->match_size != EBT_ALIGN(expected_length)) {
> > > - pr_info("wrong size: %d against expected %d, rounded to %zd\n",
> > > - em->match_size, expected_length,
> > > - EBT_ALIGN(expected_length));
> > > + pr_info_ratelimited("wrong size: %d against expected %d, 
> > > rounded to %zd\n",
> > 
> > Shouldn't all these be pr_err_ratelimited instead?
> 
> Don't know.
> 
> This could even be pr_debug actually since this message is
> useless unless you're doing ebtables development work.

I see, I'm telling this because iptables says 'look at dmesg' when we
hit EINVAL, but there will be nothing.

[...]
> > >   if (index == IPSET_INVALID_ID) {
> > > - pr_warn("Cannot find set identified by id %u to match\n",
> > > - info->match_set.index);
> > > + pr_warn_ratelimited("Cannot find set identified by id %u to 
> > > match\n",
> > > + info->match_set.index);
> > 
> > Use pr_err_ratelimited instead?
> 
> I think we should settle on a single pr_foo, i suggest
> pr_info(_ratelimited).

OK.

> This is not an error condition, we only have these
> printks because we can't return a proper error to userspace.
> 
> If this was netlink, it would be converted to extack instead...

Indeed, we have this primitive error reporting in iptables, we can do
better in nftables.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf 7/7] netfilter: x_tables: use pr ratelimiting in all remaining spots

2018-02-07 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> > --- a/net/bridge/netfilter/ebt_among.c
> > +++ b/net/bridge/netfilter/ebt_among.c
> > @@ -187,17 +187,17 @@ static int ebt_among_mt_check(const struct 
> > xt_mtchk_param *par)
> > expected_length += ebt_mac_wormhash_size(wh_src);
> >  
> > if (em->match_size != EBT_ALIGN(expected_length)) {
> > -   pr_info("wrong size: %d against expected %d, rounded to %zd\n",
> > -   em->match_size, expected_length,
> > -   EBT_ALIGN(expected_length));
> > +   pr_info_ratelimited("wrong size: %d against expected %d, 
> > rounded to %zd\n",
> 
> Shouldn't all these be pr_err_ratelimited instead?

Don't know.

This could even be pr_debug actually since this message is
useless unless you're doing ebtables development work.

> Probably this is a good chance to homogeneize all error reporting in
> xtables.

Yes.

> > if (wh_dst && (err = ebt_mac_wormhash_check_integrity(wh_dst))) {
> > -   pr_info("dst integrity fail: %x\n", -err);
> > +   pr_info_ratelimited("dst integrity fail: %x\n", -err);
> > return -EINVAL;
> > }
> > if (wh_src && (err = ebt_mac_wormhash_check_integrity(wh_src))) {
> > -   pr_info("src integrity fail: %x\n", -err);
> > +   pr_info_ratelimited("src integrity fail: %x\n", -err);
> > return -EINVAL;

Same for these two, I'll convert all to pr_debug instead.

> > if (info->queues_total == 0) {
> > -   pr_err("NFQUEUE: number of total queues is 0\n");
> 
> 
> We can probably add this all over the place in the same go?
 
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

Yes.

> > if (index == IPSET_INVALID_ID) {
> > -   pr_warn("Cannot find set identified by id %u to match\n",
> > -   info->match_set.index);
> > +   pr_warn_ratelimited("Cannot find set identified by id %u to 
> > match\n",
> > +   info->match_set.index);
> 
> Use pr_err_ratelimited instead?

I think we should settle on a single pr_foo, i suggest
pr_info(_ratelimited).

This is not an error condition, we only have these
printks because we can't return a proper error to userspace.

If this was netlink, it would be converted to extack instead...
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf 2/7] netfilter: x_tables: prefer pr_debug where possible

2018-02-07 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> On Wed, Feb 07, 2018 at 02:48:23PM +0100, Florian Westphal wrote:
> > prefer pr_debug for cases where error is usually not seen by users.
> > checkpatch complains due to lines > 80 but adding a newline doesn't
> > make things any more readable.
> > 
> > Signed-off-by: Florian Westphal 
> > ---
> >  net/ipv4/netfilter/ipt_rpfilter.c  | 2 +-
> >  net/ipv6/netfilter/ip6t_rpfilter.c | 2 +-
> >  net/netfilter/xt_SECMARK.c | 2 +-
> >  net/netfilter/xt_bpf.c | 2 +-
> >  net/netfilter/xt_connlabel.c   | 2 +-
> >  net/netfilter/xt_ipcomp.c  | 2 +-
> >  net/netfilter/xt_ipvs.c| 2 +-
> >  net/netfilter/xt_l2tp.c| 2 +-
> >  net/netfilter/xt_recent.c  | 4 ++--
> >  net/netfilter/xt_socket.c  | 8 
> >  net/netfilter/xt_time.c| 2 +-
> >  11 files changed, 15 insertions(+), 15 deletions(-)
> > 
> > diff --git a/net/ipv4/netfilter/ipt_rpfilter.c 
> > b/net/ipv4/netfilter/ipt_rpfilter.c
> > index 37fb9552e858..ffd1cf65af3a 100644
> > --- a/net/ipv4/netfilter/ipt_rpfilter.c
> > +++ b/net/ipv4/netfilter/ipt_rpfilter.c
> > @@ -105,7 +105,7 @@ static int rpfilter_check(const struct xt_mtchk_param 
> > *par)
> > const struct xt_rpfilter_info *info = par->matchinfo;
> > unsigned int options = ~XT_RPFILTER_OPTION_MASK;
> > if (info->flags & options) {
> > -   pr_info("unknown options encountered");
> > +   pr_debug("unknown options");
> 
> OK, so the idea is to use pr_debug() when it is unlikely to hit an
> error via iptables, right?

Yes, alternatively this pr_* could be removed.

Theoretically we could have some new version of iptables hat support
--rpfilter-foobar flag which would then trigger this -EINVAL.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf 1/7] netfilter: x_tables: remove pr_info where possible

2018-02-07 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> On Wed, Feb 07, 2018 at 02:48:22PM +0100, Florian Westphal wrote:
> > remove several pr_info messages that cannot be triggered with iptables.
> > 
> > Signed-off-by: Florian Westphal 
> > ---
> >  net/ipv4/netfilter/ipt_ECN.c | 10 --
> >  net/netfilter/xt_HL.c| 13 +++--
> >  net/netfilter/xt_LED.c   |  4 +---
> >  net/netfilter/xt_cgroup.c|  4 +---
> >  4 files changed, 9 insertions(+), 22 deletions(-)
> > 
> > diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c
> > index 270765236f5e..39ff167e6d86 100644
> > --- a/net/ipv4/netfilter/ipt_ECN.c
> > +++ b/net/ipv4/netfilter/ipt_ECN.c
> > @@ -98,14 +98,12 @@ static int ecn_tg_check(const struct xt_tgchk_param 
> > *par)
> > const struct ipt_ECN_info *einfo = par->targinfo;
> > const struct ipt_entry *e = par->entryinfo;
> >  
> > -   if (einfo->operation & IPT_ECN_OP_MASK) {
> > -   pr_info("unsupported ECN operation %x\n", einfo->operation);
> > +   if (einfo->operation & IPT_ECN_OP_MASK)
> 
> According to patch 2/7, these should be pr_debug(), or probably I'm
> misunderstanding something :-).

Right, there is no consistency in the tree currently.

I don't think we'll see any new options added to ipt_ECN, so I don't
think its worth having a pr_foo() for this.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-02-07 Thread Andrew Morton
On Wed, 7 Feb 2018 18:44:39 +0100 Pablo Neira Ayuso  wrote:

> Hi,
> 
> On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote:
> [...]
> > Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range.
> > My excavation tools pointed me to "VM: Rework vmalloc code to support 
> > mapping of arbitray pages"
> > by Christoph back in 2002. So yes, we can safely remove it finally. Se
> > below.
> > 
> > 
> > From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001
> > From: Michal Hocko 
> > Date: Wed, 31 Jan 2018 09:16:56 +0100
> > Subject: [PATCH] net/netfilter/x_tables.c: remove size check
> > 
> > Back in 2002 vmalloc used to BUG on too large sizes. We are much better
> > behaved these days and vmalloc simply returns NULL for those. Remove
> > the check as it simply not needed and the comment even misleading.
> > 
> > Suggested-by: Andrew Morton 
> > Signed-off-by: Michal Hocko 
> > ---
> >  net/netfilter/x_tables.c | 4 
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> > index b55ec5aa51a6..48a6ff620493 100644
> > --- a/net/netfilter/x_tables.c
> > +++ b/net/netfilter/x_tables.c
> > @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int 
> > size)
> > if (sz < sizeof(*info))
> > return NULL;
> >  
> > -   /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */
> > -   if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
> > -   return NULL;
> > -
> > /* __GFP_NORETRY is not fully supported by kvmalloc but it should
> >  * work reasonably well if sz is too large and bail out rather
> >  * than shoot all processes down before realizing there is nothing
> 
> Patchwork didn't catch this patch for some reason, would you mind to
> resend?

From: Michal Hocko 
Subject: net/netfilter/x_tables.c: remove size check

Back in 2002 vmalloc used to BUG on too large sizes.  We are much better
behaved these days and vmalloc simply returns NULL for those.  Remove the
check as it simply not needed and the comment is even misleading.

Link: http://lkml.kernel.org/r/20180131081916.go21...@dhcp22.suse.cz
Suggested-by: Andrew Morton 
Signed-off-by: Michal Hocko 
Reviewed-by: Andrew Morton 
Cc: Florian Westphal 
Cc: David S. Miller 
Signed-off-by: Andrew Morton 
---

 net/netfilter/x_tables.c |4 
 1 file changed, 4 deletions(-)

diff -puN net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check 
net/netfilter/x_tables.c
--- a/net/netfilter/x_tables.c~net-netfilter-x_tablesc-remove-size-check
+++ a/net/netfilter/x_tables.c
@@ -1004,10 +1004,6 @@ struct xt_table_info *xt_alloc_table_inf
if (sz < sizeof(*info))
return NULL;
 
-   /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */
-   if ((size >> PAGE_SHIFT) + 2 > totalram_pages)
-   return NULL;
-
/* __GFP_NORETRY is not fully supported by kvmalloc but it should
 * work reasonably well if sz is too large and bail out rather
 * than shoot all processes down before realizing there is nothing
_

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/11] Netfilter fixes for net

2018-02-07 Thread David Miller
From: Pablo Neira Ayuso 
Date: Wed,  7 Feb 2018 18:42:18 +0100

> The following patchset contains Netfilter fixes for you net tree, they
> are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.

> P.S: Again more fixes cooking on netfilter-devel@vger.kernel.org, so
>  another round is likely coming up soon.

Ok, no problem.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [netfilter-core] kernel panic: Out of memory and no killable processes... (2)

2018-02-07 Thread Pablo Neira Ayuso
Hi,

On Wed, Jan 31, 2018 at 09:19:16AM +0100, Michal Hocko wrote:
[...]
> Yeah, we do not BUG but rather fail instead. See __vmalloc_node_range.
> My excavation tools pointed me to "VM: Rework vmalloc code to support mapping 
> of arbitray pages"
> by Christoph back in 2002. So yes, we can safely remove it finally. Se
> below.
> 
> 
> From 8d52e1d939d101b0dafed6ae5c3c1376183e65bb Mon Sep 17 00:00:00 2001
> From: Michal Hocko 
> Date: Wed, 31 Jan 2018 09:16:56 +0100
> Subject: [PATCH] net/netfilter/x_tables.c: remove size check
> 
> Back in 2002 vmalloc used to BUG on too large sizes. We are much better
> behaved these days and vmalloc simply returns NULL for those. Remove
> the check as it simply not needed and the comment even misleading.
> 
> Suggested-by: Andrew Morton 
> Signed-off-by: Michal Hocko 
> ---
>  net/netfilter/x_tables.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index b55ec5aa51a6..48a6ff620493 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -999,10 +999,6 @@ struct xt_table_info *xt_alloc_table_info(unsigned int 
> size)
>   if (sz < sizeof(*info))
>   return NULL;
>  
> - /* Pedantry: prevent them from hitting BUG() in vmalloc.c --RR */
> - if ((SMP_ALIGN(size) >> PAGE_SHIFT) + 2 > totalram_pages)
> - return NULL;
> -
>   /* __GFP_NORETRY is not fully supported by kvmalloc but it should
>* work reasonably well if sz is too large and bail out rather
>* than shoot all processes down before realizing there is nothing

Patchwork didn't catch this patch for some reason, would you mind to
resend?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/11] netfilter: nft_flow_offload: no need to flush entries on module removal

2018-02-07 Thread Pablo Neira Ayuso
nft_flow_offload module removal does not require to flush existing
flowtables, it is valid to remove this module while keeping flowtables
around.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_flow_offload.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 1739ff8ca21f..e5c45c7ac02a 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -247,14 +247,8 @@ static int __init nft_flow_offload_module_init(void)
 
 static void __exit nft_flow_offload_module_exit(void)
 {
-   struct net *net;
-
nft_unregister_expr(_flow_offload_type);
unregister_netdevice_notifier(_offload_netdev_notifier);
-   rtnl_lock();
-   for_each_net(net)
-   nft_flow_table_iterate(net, nft_flow_offload_iterate_cleanup, 
NULL);
-   rtnl_unlock();
 }
 
 module_init(nft_flow_offload_module_init);
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/11] netfilter: x_tables: make allocation less aggressive

2018-02-07 Thread Pablo Neira Ayuso
From: Michal Hocko 

syzbot has noticed that xt_alloc_table_info can allocate a lot of memory.
This is an admin only interface but an admin in a namespace is sufficient
as well.  eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in
xt_alloc_table_info()") has changed the opencoded kmalloc->vmalloc
fallback into kvmalloc.  It has dropped __GFP_NORETRY on the way because
vmalloc has simply never fully supported __GFP_NORETRY semantic.  This is
still the case because e.g.  page tables backing the vmalloc area are
hardcoded GFP_KERNEL.

Revert back to __GFP_NORETRY as a poors man defence against excessively
large allocation request here.  We will not rule out the OOM killer
completely but __GFP_NORETRY should at least stop the large request in
most cases.

[a...@linux-foundation.org: coding-style fixes]
Fixes: eacd86ca3b03 ("net/netfilter/x_tables.c: use kvmalloc() in 
xt_alloc_tableLink: 
http://lkml.kernel.org/r/20180130140104.ge21...@dhcp22.suse.cz
Signed-off-by: Michal Hocko 
Acked-by: Florian Westphal 
Reviewed-by: Andrew Morton 
Cc: David S. Miller 
Signed-off-by: Andrew Morton 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/x_tables.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8fa4d37141a7..2f685ee1f9c8 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1008,7 +1008,12 @@ struct xt_table_info *xt_alloc_table_info(unsigned int 
size)
if ((size >> PAGE_SHIFT) + 2 > totalram_pages)
return NULL;
 
-   info = kvmalloc(sz, GFP_KERNEL);
+   /* __GFP_NORETRY is not fully supported by kvmalloc but it should
+* work reasonably well if sz is too large and bail out rather
+* than shoot all processes down before realizing there is nothing
+* more to reclaim.
+*/
+   info = kvmalloc(sz, GFP_KERNEL | __GFP_NORETRY);
if (!info)
return NULL;
 
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 07/11] netfilter: xt_RATEEST: acquire xt_rateest_mutex for hash insert

2018-02-07 Thread Pablo Neira Ayuso
From: Cong Wang 

rateest_hash is supposed to be protected by xt_rateest_mutex,
and, as suggested by Eric, lookup and insert should be atomic,
so we should acquire the xt_rateest_mutex once for both.

So introduce a non-locking helper for internal use and keep the
locking one for external.

Reported-by: 
Fixes: 5859034d7eb8 ("[NETFILTER]: x_tables: add RATEEST target")
Signed-off-by: Cong Wang 
Reviewed-by: Florian Westphal 
Reviewed-by: Eric Dumazet 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/xt_RATEEST.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/xt_RATEEST.c b/net/netfilter/xt_RATEEST.c
index 498b54fd04d7..141c295191f6 100644
--- a/net/netfilter/xt_RATEEST.c
+++ b/net/netfilter/xt_RATEEST.c
@@ -39,23 +39,31 @@ static void xt_rateest_hash_insert(struct xt_rateest *est)
hlist_add_head(>list, _hash[h]);
 }
 
-struct xt_rateest *xt_rateest_lookup(const char *name)
+static struct xt_rateest *__xt_rateest_lookup(const char *name)
 {
struct xt_rateest *est;
unsigned int h;
 
h = xt_rateest_hash(name);
-   mutex_lock(_rateest_mutex);
hlist_for_each_entry(est, _hash[h], list) {
if (strcmp(est->name, name) == 0) {
est->refcnt++;
-   mutex_unlock(_rateest_mutex);
return est;
}
}
-   mutex_unlock(_rateest_mutex);
+
return NULL;
 }
+
+struct xt_rateest *xt_rateest_lookup(const char *name)
+{
+   struct xt_rateest *est;
+
+   mutex_lock(_rateest_mutex);
+   est = __xt_rateest_lookup(name);
+   mutex_unlock(_rateest_mutex);
+   return est;
+}
 EXPORT_SYMBOL_GPL(xt_rateest_lookup);
 
 void xt_rateest_put(struct xt_rateest *est)
@@ -100,8 +108,10 @@ static int xt_rateest_tg_checkentry(const struct 
xt_tgchk_param *par)
 
net_get_random_once(_rnd, sizeof(jhash_rnd));
 
-   est = xt_rateest_lookup(info->name);
+   mutex_lock(_rateest_mutex);
+   est = __xt_rateest_lookup(info->name);
if (est) {
+   mutex_unlock(_rateest_mutex);
/*
 * If estimator parameters are specified, they must match the
 * existing estimator.
@@ -139,11 +149,13 @@ static int xt_rateest_tg_checkentry(const struct 
xt_tgchk_param *par)
 
info->est = est;
xt_rateest_hash_insert(est);
+   mutex_unlock(_rateest_mutex);
return 0;
 
 err2:
kfree(est);
 err1:
+   mutex_unlock(_rateest_mutex);
return ret;
 }
 
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 00/11] Netfilter fixes for net

2018-02-07 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains Netfilter fixes for you net tree, they
are:

1) Restore __GFP_NORETRY in xt_table allocations to mitigate effects of
   large memory allocation requests, from Michal Hocko.

2) Release IPv6 fragment queue in case of error in fragmentation header,
   this is a follow up to amend patch 83f1999caeb1, from Subash Abhinov
   Kasiviswanathan.

3) Flowtable infrastructure depends on NETFILTER_INGRESS as it registers
   a hook for each flowtable, reported by John Crispin.

4) Missing initialization of info->priv in xt_cgroup version 1, from
   Cong Wang.

5) Give a chance to garbage collector to run after scheduling flowtable
   cleanup.

6) Releasing flowtable content on nft_flow_offload module removal is
   not required at all, there is not dependencies between this module
   and flowtables, remove it.

7) Fix missing xt_rateest_mutex grabbing for hash insertions, also from
   Cong Wang.

8) Move nf_flow_table_cleanup() routine to flowtable core, this patch is
   a dependency for the next patch in this list.

9) Flowtable resources are not properly released on removal from the
   control plane. Fix this resource leak by scheduling removal of all
   entries and explicit call to the garbage collector.

10) nf_ct_nat_offset() declaration is dead code, this function prototype
is not used anywhere, remove it. From Taehee Yoo.

11) Fix another flowtable resource leak on entry insertion failures,
this patch also fixes a possible use-after-free. Patch from Felix
Fietkau.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!

P.S: Again more fixes cooking on netfilter-devel@vger.kernel.org, so
 another round is likely coming up soon.



The following changes since commit 743efac1c670c6618742c923f6275d819604:

  net: pxa168_eth: add netconsole support (2018-02-01 14:58:37 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 0ff90b6c20340e57616a51ae1a1bf18156d6638a:

  netfilter: nf_flow_offload: fix use-after-free and a resource leak 
(2018-02-07 11:55:52 +0100)


Cong Wang (2):
  netfilter: xt_cgroup: initialize info->priv in cgroup_mt_check_v1()
  netfilter: xt_RATEEST: acquire xt_rateest_mutex for hash insert

Felix Fietkau (1):
  netfilter: nf_flow_offload: fix use-after-free and a resource leak

Michal Hocko (1):
  netfilter: x_tables: make allocation less aggressive

Pablo Neira Ayuso (5):
  netfilter: flowtable infrastructure depends on NETFILTER_INGRESS
  netfilter: nft_flow_offload: wait for garbage collector to run after 
cleanup
  netfilter: nft_flow_offload: no need to flush entries on module removal
  netfilter: nft_flow_offload: move flowtable cleanup routines to 
nf_flow_table
  netfilter: nf_tables: fix flowtable free

Subash Abhinov Kasiviswanathan (1):
  netfilter: ipv6: nf_defrag: Kill frag queue on RFC2460 failure

Taehee Yoo (1):
  netfilter: remove useless prototype

 include/net/netfilter/nf_conntrack.h|  5 ---
 include/net/netfilter/nf_flow_table.h   |  6 ++-
 net/ipv4/netfilter/Kconfig  |  3 +-
 net/ipv4/netfilter/nf_flow_table_ipv4.c |  1 +
 net/ipv6/netfilter/Kconfig  |  3 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c |  1 +
 net/ipv6/netfilter/nf_flow_table_ipv6.c |  1 +
 net/netfilter/Kconfig   |  8 ++--
 net/netfilter/nf_flow_table.c   | 76 ++---
 net/netfilter/nf_flow_table_inet.c  |  1 +
 net/netfilter/nf_tables_api.c   | 17 +++-
 net/netfilter/nft_flow_offload.c| 24 +--
 net/netfilter/x_tables.c|  7 ++-
 net/netfilter/xt_RATEEST.c  | 22 +++---
 net/netfilter/xt_cgroup.c   |  1 +
 15 files changed, 97 insertions(+), 79 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/11] netfilter: xt_cgroup: initialize info->priv in cgroup_mt_check_v1()

2018-02-07 Thread Pablo Neira Ayuso
From: Cong Wang 

xt_cgroup_info_v1->priv is an internal pointer only used for kernel,
we should not trust what user-space provides.

Reported-by: 
Fixes: c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match")
Cc: Pablo Neira Ayuso 
Signed-off-by: Cong Wang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/xt_cgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
index 1db1ce59079f..891f4e7e8ea7 100644
--- a/net/netfilter/xt_cgroup.c
+++ b/net/netfilter/xt_cgroup.c
@@ -52,6 +52,7 @@ static int cgroup_mt_check_v1(const struct xt_mtchk_param 
*par)
return -EINVAL;
}
 
+   info->priv = NULL;
if (info->has_path) {
cgrp = cgroup_get_from_path(info->path);
if (IS_ERR(cgrp)) {
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/11] netfilter: nft_flow_offload: wait for garbage collector to run after cleanup

2018-02-07 Thread Pablo Neira Ayuso
If netdevice goes down, then flowtable entries are scheduled to be
removed. Wait for garbage collector to have a chance to run so it can
delete them from the hashtable.

The flush call might sleep, so hold the nfnl mutex from
nft_flow_table_iterate() instead of rcu read side lock. The use of the
nfnl mutex is also implicitly fixing races between updates via nfnetlink
and netdevice event.

Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_tables_api.c| 8 
 net/netfilter/nft_flow_offload.c | 1 +
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 0791813a1e7d..07dd1fac78a8 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5006,13 +5006,13 @@ void nft_flow_table_iterate(struct net *net,
struct nft_flowtable *flowtable;
const struct nft_table *table;
 
-   rcu_read_lock();
-   list_for_each_entry_rcu(table, >nft.tables, list) {
-   list_for_each_entry_rcu(flowtable, >flowtables, list) {
+   nfnl_lock(NFNL_SUBSYS_NFTABLES);
+   list_for_each_entry(table, >nft.tables, list) {
+   list_for_each_entry(flowtable, >flowtables, list) {
iter(>data, data);
}
}
-   rcu_read_unlock();
+   nfnl_unlock(NFNL_SUBSYS_NFTABLES);
 }
 EXPORT_SYMBOL_GPL(nft_flow_table_iterate);
 
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 4503b8dcf9c0..1739ff8ca21f 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -208,6 +208,7 @@ static void nft_flow_offload_iterate_cleanup(struct 
nf_flowtable *flowtable,
 void *data)
 {
nf_flow_table_iterate(flowtable, flow_offload_iterate_cleanup, data);
+   flush_delayed_work(>gc_work);
 }
 
 static int flow_offload_netdev_event(struct notifier_block *this,
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/11] netfilter: nft_flow_offload: move flowtable cleanup routines to nf_flow_table

2018-02-07 Thread Pablo Neira Ayuso
Move the flowtable cleanup routines to nf_flow_table and expose the
nf_flow_table_cleanup() helper function.

Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_flow_table.h |  3 +++
 net/netfilter/nf_flow_table.c | 24 
 net/netfilter/nft_flow_offload.c  | 19 +--
 3 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h 
b/include/net/netfilter/nf_flow_table.h
index b22b22082733..ed49cd169ecf 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -95,6 +95,9 @@ struct flow_offload_tuple_rhash *flow_offload_lookup(struct 
nf_flowtable *flow_t
 int nf_flow_table_iterate(struct nf_flowtable *flow_table,
  void (*iter)(struct flow_offload *flow, void *data),
  void *data);
+
+void nf_flow_table_cleanup(struct net *net, struct net_device *dev);
+
 void nf_flow_offload_work_gc(struct work_struct *work);
 extern const struct rhashtable_params nf_flow_offload_rhash_params;
 
diff --git a/net/netfilter/nf_flow_table.c b/net/netfilter/nf_flow_table.c
index 2f5099cb85b8..04c08f6b9015 100644
--- a/net/netfilter/nf_flow_table.c
+++ b/net/netfilter/nf_flow_table.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -425,5 +426,28 @@ int nf_flow_dnat_port(const struct flow_offload *flow,
 }
 EXPORT_SYMBOL_GPL(nf_flow_dnat_port);
 
+static void nf_flow_table_do_cleanup(struct flow_offload *flow, void *data)
+{
+   struct net_device *dev = data;
+
+   if (dev && flow->tuplehash[0].tuple.iifidx != dev->ifindex)
+   return;
+
+   flow_offload_dead(flow);
+}
+
+static void nf_flow_table_iterate_cleanup(struct nf_flowtable *flowtable,
+ void *data)
+{
+   nf_flow_table_iterate(flowtable, nf_flow_table_do_cleanup, data);
+   flush_delayed_work(>gc_work);
+}
+
+void nf_flow_table_cleanup(struct net *net, struct net_device *dev)
+{
+   nft_flow_table_iterate(net, nf_flow_table_iterate_cleanup, dev);
+}
+EXPORT_SYMBOL_GPL(nf_flow_table_cleanup);
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Pablo Neira Ayuso ");
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index e5c45c7ac02a..b65829b2be22 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -194,23 +194,6 @@ static struct nft_expr_type nft_flow_offload_type 
__read_mostly = {
.owner  = THIS_MODULE,
 };
 
-static void flow_offload_iterate_cleanup(struct flow_offload *flow, void *data)
-{
-   struct net_device *dev = data;
-
-   if (dev && flow->tuplehash[0].tuple.iifidx != dev->ifindex)
-   return;
-
-   flow_offload_dead(flow);
-}
-
-static void nft_flow_offload_iterate_cleanup(struct nf_flowtable *flowtable,
-void *data)
-{
-   nf_flow_table_iterate(flowtable, flow_offload_iterate_cleanup, data);
-   flush_delayed_work(>gc_work);
-}
-
 static int flow_offload_netdev_event(struct notifier_block *this,
 unsigned long event, void *ptr)
 {
@@ -219,7 +202,7 @@ static int flow_offload_netdev_event(struct notifier_block 
*this,
if (event != NETDEV_DOWN)
return NOTIFY_DONE;
 
-   nft_flow_table_iterate(dev_net(dev), nft_flow_offload_iterate_cleanup, 
dev);
+   nf_flow_table_cleanup(dev_net(dev), dev);
 
return NOTIFY_DONE;
 }
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/11] netfilter: nf_flow_offload: fix use-after-free and a resource leak

2018-02-07 Thread Pablo Neira Ayuso
From: Felix Fietkau 

flow_offload_del frees the flow, so all associated resource must be
freed before.

Since the ct entry in struct flow_offload_entry was allocated by
flow_offload_alloc, it should be freed by flow_offload_free to take care
of the error handling path when flow_offload_add fails.

While at it, make flow_offload_del static, since it should never be
called directly, only from the gc step

Signed-off-by: Felix Fietkau 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_flow_table.h |  1 -
 net/netfilter/nf_flow_table.c | 27 +++
 2 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h 
b/include/net/netfilter/nf_flow_table.h
index 020ae903066f..833752dd0c58 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -90,7 +90,6 @@ struct flow_offload *flow_offload_alloc(struct nf_conn *ct,
 void flow_offload_free(struct flow_offload *flow);
 
 int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload 
*flow);
-void flow_offload_del(struct nf_flowtable *flow_table, struct flow_offload 
*flow);
 struct flow_offload_tuple_rhash *flow_offload_lookup(struct nf_flowtable 
*flow_table,
 struct flow_offload_tuple 
*tuple);
 int nf_flow_table_iterate(struct nf_flowtable *flow_table,
diff --git a/net/netfilter/nf_flow_table.c b/net/netfilter/nf_flow_table.c
index c17f1af42daa..ec410cae9307 100644
--- a/net/netfilter/nf_flow_table.c
+++ b/net/netfilter/nf_flow_table.c
@@ -125,7 +125,9 @@ void flow_offload_free(struct flow_offload *flow)
dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_cache);
dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_cache);
e = container_of(flow, struct flow_offload_entry, flow);
-   kfree(e);
+   nf_ct_delete(e->ct, 0, 0);
+   nf_ct_put(e->ct);
+   kfree_rcu(e, rcu_head);
 }
 EXPORT_SYMBOL_GPL(flow_offload_free);
 
@@ -149,11 +151,9 @@ int flow_offload_add(struct nf_flowtable *flow_table, 
struct flow_offload *flow)
 }
 EXPORT_SYMBOL_GPL(flow_offload_add);
 
-void flow_offload_del(struct nf_flowtable *flow_table,
- struct flow_offload *flow)
+static void flow_offload_del(struct nf_flowtable *flow_table,
+struct flow_offload *flow)
 {
-   struct flow_offload_entry *e;
-
rhashtable_remove_fast(_table->rhashtable,
   >tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
   *flow_table->type->params);
@@ -161,10 +161,8 @@ void flow_offload_del(struct nf_flowtable *flow_table,
   >tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
   *flow_table->type->params);
 
-   e = container_of(flow, struct flow_offload_entry, flow);
-   kfree_rcu(e, rcu_head);
+   flow_offload_free(flow);
 }
-EXPORT_SYMBOL_GPL(flow_offload_del);
 
 struct flow_offload_tuple_rhash *
 flow_offload_lookup(struct nf_flowtable *flow_table,
@@ -175,15 +173,6 @@ flow_offload_lookup(struct nf_flowtable *flow_table,
 }
 EXPORT_SYMBOL_GPL(flow_offload_lookup);
 
-static void nf_flow_release_ct(const struct flow_offload *flow)
-{
-   struct flow_offload_entry *e;
-
-   e = container_of(flow, struct flow_offload_entry, flow);
-   nf_ct_delete(e->ct, 0, 0);
-   nf_ct_put(e->ct);
-}
-
 int nf_flow_table_iterate(struct nf_flowtable *flow_table,
  void (*iter)(struct flow_offload *flow, void *data),
  void *data)
@@ -259,10 +248,8 @@ static int nf_flow_offload_gc_step(struct nf_flowtable 
*flow_table)
flow = container_of(tuplehash, struct flow_offload, 
tuplehash[0]);
 
if (nf_flow_has_expired(flow) ||
-   nf_flow_is_dying(flow)) {
+   nf_flow_is_dying(flow))
flow_offload_del(flow_table, flow);
-   nf_flow_release_ct(flow);
-   }
}
 out:
rhashtable_walk_stop();
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/11] netfilter: nf_tables: fix flowtable free

2018-02-07 Thread Pablo Neira Ayuso
Every flow_offload entry is added into the table twice. Because of this,
rhashtable_free_and_destroy can't be used, since it would call kfree for
each flow_offload object twice.

This patch cleans up the flowtable via nf_flow_table_iterate() to
schedule removal of entries by setting on the dying bit, then there is
an explicitly invocation of the garbage collector to release resources.

Based on patch from Felix Fietkau.

Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_flow_table.h   |  2 ++
 net/ipv4/netfilter/nf_flow_table_ipv4.c |  1 +
 net/ipv6/netfilter/nf_flow_table_ipv6.c |  1 +
 net/netfilter/nf_flow_table.c   | 25 +++--
 net/netfilter/nf_flow_table_inet.c  |  1 +
 net/netfilter/nf_tables_api.c   |  9 ++---
 6 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h 
b/include/net/netfilter/nf_flow_table.h
index ed49cd169ecf..020ae903066f 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -14,6 +14,7 @@ struct nf_flowtable_type {
struct list_headlist;
int family;
void(*gc)(struct work_struct *work);
+   void(*free)(struct nf_flowtable *ft);
const struct rhashtable_params  *params;
nf_hookfn   *hook;
struct module   *owner;
@@ -98,6 +99,7 @@ int nf_flow_table_iterate(struct nf_flowtable *flow_table,
 
 void nf_flow_table_cleanup(struct net *net, struct net_device *dev);
 
+void nf_flow_table_free(struct nf_flowtable *flow_table);
 void nf_flow_offload_work_gc(struct work_struct *work);
 extern const struct rhashtable_params nf_flow_offload_rhash_params;
 
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c 
b/net/ipv4/netfilter/nf_flow_table_ipv4.c
index b2d01eb25f2c..25d2975da156 100644
--- a/net/ipv4/netfilter/nf_flow_table_ipv4.c
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -260,6 +260,7 @@ static struct nf_flowtable_type flowtable_ipv4 = {
.family = NFPROTO_IPV4,
.params = _flow_offload_rhash_params,
.gc = nf_flow_offload_work_gc,
+   .free   = nf_flow_table_free,
.hook   = nf_flow_offload_ip_hook,
.owner  = THIS_MODULE,
 };
diff --git a/net/ipv6/netfilter/nf_flow_table_ipv6.c 
b/net/ipv6/netfilter/nf_flow_table_ipv6.c
index fff21602875a..d346705d6ee6 100644
--- a/net/ipv6/netfilter/nf_flow_table_ipv6.c
+++ b/net/ipv6/netfilter/nf_flow_table_ipv6.c
@@ -253,6 +253,7 @@ static struct nf_flowtable_type flowtable_ipv6 = {
.family = NFPROTO_IPV6,
.params = _flow_offload_rhash_params,
.gc = nf_flow_offload_work_gc,
+   .free   = nf_flow_table_free,
.hook   = nf_flow_offload_ipv6_hook,
.owner  = THIS_MODULE,
 };
diff --git a/net/netfilter/nf_flow_table.c b/net/netfilter/nf_flow_table.c
index 04c08f6b9015..c17f1af42daa 100644
--- a/net/netfilter/nf_flow_table.c
+++ b/net/netfilter/nf_flow_table.c
@@ -232,19 +232,16 @@ static inline bool nf_flow_is_dying(const struct 
flow_offload *flow)
return flow->flags & FLOW_OFFLOAD_DYING;
 }
 
-void nf_flow_offload_work_gc(struct work_struct *work)
+static int nf_flow_offload_gc_step(struct nf_flowtable *flow_table)
 {
struct flow_offload_tuple_rhash *tuplehash;
-   struct nf_flowtable *flow_table;
struct rhashtable_iter hti;
struct flow_offload *flow;
int err;
 
-   flow_table = container_of(work, struct nf_flowtable, gc_work.work);
-
err = rhashtable_walk_init(_table->rhashtable, , GFP_KERNEL);
if (err)
-   goto schedule;
+   return 0;
 
rhashtable_walk_start();
 
@@ -270,7 +267,16 @@ void nf_flow_offload_work_gc(struct work_struct *work)
 out:
rhashtable_walk_stop();
rhashtable_walk_exit();
-schedule:
+
+   return 1;
+}
+
+void nf_flow_offload_work_gc(struct work_struct *work)
+{
+   struct nf_flowtable *flow_table;
+
+   flow_table = container_of(work, struct nf_flowtable, gc_work.work);
+   nf_flow_offload_gc_step(flow_table);
queue_delayed_work(system_power_efficient_wq, _table->gc_work, HZ);
 }
 EXPORT_SYMBOL_GPL(nf_flow_offload_work_gc);
@@ -449,5 +455,12 @@ void nf_flow_table_cleanup(struct net *net, struct 
net_device *dev)
 }
 EXPORT_SYMBOL_GPL(nf_flow_table_cleanup);
 
+void nf_flow_table_free(struct nf_flowtable *flow_table)
+{
+   nf_flow_table_iterate(flow_table, nf_flow_table_do_cleanup, NULL);
+   WARN_ON(!nf_flow_offload_gc_step(flow_table));
+}
+EXPORT_SYMBOL_GPL(nf_flow_table_free);
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Pablo Neira Ayuso ");
diff --git a/net/netfilter/nf_flow_table_inet.c 
b/net/netfilter/nf_flow_table_inet.c
index 

[PATCH 03/11] netfilter: flowtable infrastructure depends on NETFILTER_INGRESS

2018-02-07 Thread Pablo Neira Ayuso
config NF_FLOW_TABLE depends on NETFILTER_INGRESS. If users forget to
enable this toggle, flowtable registration fails with EOPNOTSUPP.

Moreover, turn 'select NF_FLOW_TABLE' in every flowtable family flavour
into dependency instead, otherwise this new dependency on
NETFILTER_INGRESS causes a warning. This also allows us to remove the
explicit dependency between family flowtables <-> NF_TABLES and
NF_CONNTRACK, given they depend on the NF_FLOW_TABLE core that already
expresses the general dependencies for this new infrastructure.

Moreover, NF_FLOW_TABLE_INET depends on NF_FLOW_TABLE_IPV4 and
NF_FLOWTABLE_IPV6, which already depends on NF_FLOW_TABLE. So we can get
rid of direct dependency with NF_FLOW_TABLE.

In general, let's avoid 'select', it just makes things more complicated.

Reported-by: John Crispin 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/Kconfig | 3 +--
 net/ipv6/netfilter/Kconfig | 3 +--
 net/netfilter/Kconfig  | 8 +---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 5f52236780b4..dfe6fa4ea554 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -80,8 +80,7 @@ endif # NF_TABLES
 
 config NF_FLOW_TABLE_IPV4
tristate "Netfilter flow table IPv4 module"
-   depends on NF_CONNTRACK && NF_TABLES
-   select NF_FLOW_TABLE
+   depends on NF_FLOW_TABLE
help
  This option adds the flow table IPv4 support.
 
diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig
index 4a634b7a2c80..d395d1590699 100644
--- a/net/ipv6/netfilter/Kconfig
+++ b/net/ipv6/netfilter/Kconfig
@@ -73,8 +73,7 @@ endif # NF_TABLES
 
 config NF_FLOW_TABLE_IPV6
tristate "Netfilter flow table IPv6 module"
-   depends on NF_CONNTRACK && NF_TABLES
-   select NF_FLOW_TABLE
+   depends on NF_FLOW_TABLE
help
  This option adds the flow table IPv6 support.
 
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 9019fa98003d..d3220b43c832 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -666,8 +666,8 @@ endif # NF_TABLES
 
 config NF_FLOW_TABLE_INET
tristate "Netfilter flow table mixed IPv4/IPv6 module"
-   depends on NF_FLOW_TABLE_IPV4 && NF_FLOW_TABLE_IPV6
-   select NF_FLOW_TABLE
+   depends on NF_FLOW_TABLE_IPV4
+   depends on NF_FLOW_TABLE_IPV6
help
   This option adds the flow table mixed IPv4/IPv6 support.
 
@@ -675,7 +675,9 @@ config NF_FLOW_TABLE_INET
 
 config NF_FLOW_TABLE
tristate "Netfilter flow table module"
-   depends on NF_CONNTRACK && NF_TABLES
+   depends on NETFILTER_INGRESS
+   depends on NF_CONNTRACK
+   depends on NF_TABLES
help
  This option adds the flow table core infrastructure.
 
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/11] netfilter: ipv6: nf_defrag: Kill frag queue on RFC2460 failure

2018-02-07 Thread Pablo Neira Ayuso
From: Subash Abhinov Kasiviswanathan 

Failures were seen in ICMPv6 fragmentation timeout tests if they were
run after the RFC2460 failure tests. Kernel was not sending out the
ICMPv6 fragment reassembly time exceeded packet after the fragmentation
reassembly timeout of 1 minute had elapsed.

This happened because the frag queue was not released if an error in
IPv6 fragmentation header was detected by RFC2460.

Fixes: 83f1999caeb1 ("netfilter: ipv6: nf_defrag: Pass on packets to stack per 
RFC2460")
Signed-off-by: Subash Abhinov Kasiviswanathan 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv6/netfilter/nf_conntrack_reasm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c 
b/net/ipv6/netfilter/nf_conntrack_reasm.c
index ce53dcfda88a..b84ce3e6d728 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -264,6 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct 
sk_buff *skb,
 * this case. -DaveM
 */
pr_debug("end of fragment not rounded to 8 bytes.\n");
+   inet_frag_kill(>q, _frags);
return -EPROTO;
}
if (end > fq->q.len) {
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf 1/7] netfilter: x_tables: remove pr_info where possible

2018-02-07 Thread Pablo Neira Ayuso
On Wed, Feb 07, 2018 at 02:48:22PM +0100, Florian Westphal wrote:
> remove several pr_info messages that cannot be triggered with iptables.
> 
> Signed-off-by: Florian Westphal 
> ---
>  net/ipv4/netfilter/ipt_ECN.c | 10 --
>  net/netfilter/xt_HL.c| 13 +++--
>  net/netfilter/xt_LED.c   |  4 +---
>  net/netfilter/xt_cgroup.c|  4 +---
>  4 files changed, 9 insertions(+), 22 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c
> index 270765236f5e..39ff167e6d86 100644
> --- a/net/ipv4/netfilter/ipt_ECN.c
> +++ b/net/ipv4/netfilter/ipt_ECN.c
> @@ -98,14 +98,12 @@ static int ecn_tg_check(const struct xt_tgchk_param *par)
>   const struct ipt_ECN_info *einfo = par->targinfo;
>   const struct ipt_entry *e = par->entryinfo;
>  
> - if (einfo->operation & IPT_ECN_OP_MASK) {
> - pr_info("unsupported ECN operation %x\n", einfo->operation);
> + if (einfo->operation & IPT_ECN_OP_MASK)

According to patch 2/7, these should be pr_debug(), or probably I'm
misunderstanding something :-).
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf 2/7] netfilter: x_tables: prefer pr_debug where possible

2018-02-07 Thread Pablo Neira Ayuso
On Wed, Feb 07, 2018 at 02:48:23PM +0100, Florian Westphal wrote:
> prefer pr_debug for cases where error is usually not seen by users.
> checkpatch complains due to lines > 80 but adding a newline doesn't
> make things any more readable.
> 
> Signed-off-by: Florian Westphal 
> ---
>  net/ipv4/netfilter/ipt_rpfilter.c  | 2 +-
>  net/ipv6/netfilter/ip6t_rpfilter.c | 2 +-
>  net/netfilter/xt_SECMARK.c | 2 +-
>  net/netfilter/xt_bpf.c | 2 +-
>  net/netfilter/xt_connlabel.c   | 2 +-
>  net/netfilter/xt_ipcomp.c  | 2 +-
>  net/netfilter/xt_ipvs.c| 2 +-
>  net/netfilter/xt_l2tp.c| 2 +-
>  net/netfilter/xt_recent.c  | 4 ++--
>  net/netfilter/xt_socket.c  | 8 
>  net/netfilter/xt_time.c| 2 +-
>  11 files changed, 15 insertions(+), 15 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/ipt_rpfilter.c 
> b/net/ipv4/netfilter/ipt_rpfilter.c
> index 37fb9552e858..ffd1cf65af3a 100644
> --- a/net/ipv4/netfilter/ipt_rpfilter.c
> +++ b/net/ipv4/netfilter/ipt_rpfilter.c
> @@ -105,7 +105,7 @@ static int rpfilter_check(const struct xt_mtchk_param 
> *par)
>   const struct xt_rpfilter_info *info = par->matchinfo;
>   unsigned int options = ~XT_RPFILTER_OPTION_MASK;
>   if (info->flags & options) {
> - pr_info("unknown options encountered");
> + pr_debug("unknown options");

OK, so the idea is to use pr_debug() when it is unlikely to hit an
error via iptables, right?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf 7/7] netfilter: x_tables: use pr ratelimiting in all remaining spots

2018-02-07 Thread Pablo Neira Ayuso
Hi Florian,

Thanks for looking into this, comments below.

On Wed, Feb 07, 2018 at 02:48:28PM +0100, Florian Westphal wrote:
> Signed-off-by: Florian Westphal 
> ---
>  net/bridge/netfilter/ebt_among.c | 10 
>  net/bridge/netfilter/ebt_limit.c |  4 ++--
>  net/ipv4/netfilter/ipt_ECN.c |  2 +-
>  net/ipv4/netfilter/ipt_REJECT.c  |  4 ++--
>  net/ipv6/netfilter/ip6t_REJECT.c |  4 ++--
>  net/ipv6/netfilter/ip6t_srh.c|  6 +++--
>  net/netfilter/xt_AUDIT.c |  4 ++--
>  net/netfilter/xt_CHECKSUM.c  |  5 ++--
>  net/netfilter/xt_CONNSECMARK.c   |  6 ++---
>  net/netfilter/xt_DSCP.c  |  2 +-
>  net/netfilter/xt_LED.c   |  2 +-
>  net/netfilter/xt_NFQUEUE.c   |  6 ++---
>  net/netfilter/xt_SECMARK.c   | 12 ++
>  net/netfilter/xt_TCPMSS.c| 10 
>  net/netfilter/xt_TPROXY.c|  6 ++---
>  net/netfilter/xt_cgroup.c|  8 ---
>  net/netfilter/xt_cluster.c   |  8 +++
>  net/netfilter/xt_connbytes.c |  4 ++--
>  net/netfilter/xt_connlabel.c |  4 ++--
>  net/netfilter/xt_connmark.c  |  8 +++
>  net/netfilter/xt_conntrack.c |  4 ++--
>  net/netfilter/xt_dscp.c  |  2 +-
>  net/netfilter/xt_ecn.c   |  4 ++--
>  net/netfilter/xt_hashlimit.c | 24 ++-
>  net/netfilter/xt_helper.c|  4 ++--
>  net/netfilter/xt_l2tp.c  | 20 +---
>  net/netfilter/xt_limit.c |  4 ++--
>  net/netfilter/xt_nat.c   |  5 ++--
>  net/netfilter/xt_nfacct.c|  6 +++--
>  net/netfilter/xt_physdev.c   |  4 +---
>  net/netfilter/xt_recent.c| 10 
>  net/netfilter/xt_set.c   | 50 
> 
>  net/netfilter/xt_state.c |  4 ++--
>  net/netfilter/xt_time.c  |  3 +--
>  34 files changed, 132 insertions(+), 127 deletions(-)
> 
> diff --git a/net/bridge/netfilter/ebt_among.c 
> b/net/bridge/netfilter/ebt_among.c
> index 279527f8b1fe..12d850a3ea68 100644
> --- a/net/bridge/netfilter/ebt_among.c
> +++ b/net/bridge/netfilter/ebt_among.c
> @@ -187,17 +187,17 @@ static int ebt_among_mt_check(const struct 
> xt_mtchk_param *par)
>   expected_length += ebt_mac_wormhash_size(wh_src);
>  
>   if (em->match_size != EBT_ALIGN(expected_length)) {
> - pr_info("wrong size: %d against expected %d, rounded to %zd\n",
> - em->match_size, expected_length,
> - EBT_ALIGN(expected_length));
> + pr_info_ratelimited("wrong size: %d against expected %d, 
> rounded to %zd\n",

Shouldn't all these be pr_err_ratelimited instead?

Probably this is a good chance to homogeneize all error reporting in
xtables.

> + em->match_size, expected_length,
> + EBT_ALIGN(expected_length));
>   return -EINVAL;
>   }
>   if (wh_dst && (err = ebt_mac_wormhash_check_integrity(wh_dst))) {
> - pr_info("dst integrity fail: %x\n", -err);
> + pr_info_ratelimited("dst integrity fail: %x\n", -err);
>   return -EINVAL;
>   }
>   if (wh_src && (err = ebt_mac_wormhash_check_integrity(wh_src))) {
> - pr_info("src integrity fail: %x\n", -err);
> + pr_info_ratelimited("src integrity fail: %x\n", -err);
>   return -EINVAL;
>   }
>   return 0;
[...]
> diff --git a/net/netfilter/xt_NFQUEUE.c b/net/netfilter/xt_NFQUEUE.c
> index a360b99a958a..9fac4710f7cf 100644
> --- a/net/netfilter/xt_NFQUEUE.c
> +++ b/net/netfilter/xt_NFQUEUE.c
> @@ -67,13 +67,13 @@ static int nfqueue_tg_check(const struct xt_tgchk_param 
> *par)
>   init_hashrandom(_initval);
>  
>   if (info->queues_total == 0) {
> - pr_err("NFQUEUE: number of total queues is 0\n");


We can probably add this all over the place in the same go?

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

> + pr_err_ratelimited("NFQUEUE: number of total queues is 0\n");
>   return -EINVAL;
>   }
>   maxid = info->queues_total - 1 + info->queuenum;
>   if (maxid > 0x) {
> - pr_err("NFQUEUE: number of queues (%u) out of range (got %u)\n",
> -info->queues_total, maxid);
> + pr_err_ratelimited("NFQUEUE: number of queues (%u) out of range 
> (got %u)\n",
> +info->queues_total, maxid);
>   return -ERANGE;
>   }
>   if (par->target->revision == 2 && info->flags > 1)
[...]
> diff --git a/net/netfilter/xt_set.c b/net/netfilter/xt_set.c
> index 16b6b11ee83f..ba94286f25aa 100644
> --- a/net/netfilter/xt_set.c
> +++ b/net/netfilter/xt_set.c
> @@ -92,12 +92,12 @@ set_match_v0_checkentry(const struct xt_mtchk_param *par)
>   index = ip_set_nfnl_get_byindex(par->net, info->match_set.index);
>  
>   if (index == IPSET_INVALID_ID) {
> - pr_warn("Cannot find 

Re: CPU load on queued_spin_lock_slowpath

2018-02-07 Thread Tugrul Erdogan
Thanks for your advices. I will try to create the erroneous situation
by triggering icmp error for existing connection and try non-tcp patch
and kernel upgrade respectively. I will report the results at mail
list.

> On Tue, Feb 6, 2018, 7:10 AM Pablo Neira Ayuso  wrote:
>>
>> On Tue, Feb 06, 2018 at 10:56:20AM +0300, Tugrul Erdogan wrote:
>> > Hi All,
>> >
>> > My server had a locking problem with the logs located below. I can not
>> > reproduce this erroneous situation again but I think that there is an
>> > active vulnerability at my server because of this error.
>> >
>> > My server's kernel version is v4.6.4.
>>
>> Probably this helps you?
>>
>> commit 49f817d793d1bcc11d721881aac037b996feef5c
>> Author: Lin Zhang 
>> Date:   Fri Oct 6 00:44:03 2017 +0800
>>
>> netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook
>>
>> 4.6.4 is rather old, BTW.
>> --
>> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: proc registration bug in clusterip_tg_check

2018-02-07 Thread Paolo Abeni
On Wed, 2018-02-07 at 09:43 +0100, Paolo Abeni wrote:
> On Tue, 2018-02-06 at 22:42 -0800, Cong Wang wrote:
> > On Tue, Feb 6, 2018 at 6:27 AM, syzbot
> >  wrote:
> > > Hello,
> > > 
> > > syzbot hit the following crash on net-next commit
> > > 617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +)
> > > Merge tag 'usercopy-v4.16-rc1' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
> > > 
> > > So far this crash happened 5 times on net-next, upstream.
> > > C reproducer is attached.
> > > syzkaller reproducer is attached.
> > > Raw console output is attached.
> > > compiler: gcc (GCC) 7.1.1 20170620
> > > .config is attached.
> > > 
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+03218bcdba6aa7644...@syzkaller.appspotmail.com
> > > It will help syzbot understand when the bug is fixed. See footer for
> > > details.
> > > If you forward the report, please keep this part and the footer.
> > > 
> > > x_tables: ip_tables: osf match: only valid for protocol 6
> > > x_tables: ip_tables: osf match: only valid for protocol 6
> > > x_tables: ip_tables: osf match: only valid for protocol 6
> > > [ cut here ]
> > > proc_dir_entry 'ipt_CLUSTERIP/172.20.0.170' already registered
> > > WARNING: CPU: 1 PID: 4152 at fs/proc/generic.c:330 
> > > proc_register+0x2a4/0x370
> > > fs/proc/generic.c:329
> > > Kernel panic - not syncing: panic_on_warn set ...
> > > 
> > > CPU: 1 PID: 4152 Comm: syzkaller851476 Not tainted 4.15.0+ #221
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > Call Trace:
> > >  __dump_stack lib/dump_stack.c:17 [inline]
> > >  dump_stack+0x194/0x257 lib/dump_stack.c:53
> > >  panic+0x1e4/0x41c kernel/panic.c:183
> > >  __warn+0x1dc/0x200 kernel/panic.c:547
> > >  report_bug+0x211/0x2d0 lib/bug.c:184
> > >  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
> > >  fixup_bug arch/x86/kernel/traps.c:247 [inline]
> > >  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
> > >  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
> > >  invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1097
> > > RIP: 0010:proc_register+0x2a4/0x370 fs/proc/generic.c:329
> > > RSP: 0018:8801cbd6ee20 EFLAGS: 00010286
> > > RAX: dc08 RBX: 8801d2181038 RCX: 815a57ae
> > > RDX:  RSI: 1100397add74 RDI: 1100397add49
> > > RBP: 8801cbd6ee70 R08: 1100397add0b R09: 
> > > R10: 8801cbd6ecd8 R11:  R12: 8801b2bb1cc0
> > > R13: dc00 R14: 8801b0d8dbc8 R15: 8801b2bb1d81
> > >  proc_create_data+0xf8/0x180 fs/proc/generic.c:494
> > >  clusterip_config_init net/ipv4/netfilter/ipt_CLUSTERIP.c:250 [inline]
> > 
> > I think there is probably a race condition between 
> > clusterip_config_entry_put()
> > and clusterip_config_init(), after we release the spinlock, a new proc
> > with the same IP could be created therefore triggers this warning
> > 
> > I am not sure if it is enough to just move the proc_remove() under
> > spinlock...
> 
> I *think* we should change the order on proc fs entry creation, 
> because clusterip_config_init() can race with itself,
> clusterip_config_init() returns NULL if the clusterip_config_init has
> no pte, and currently such entry is inserted into the list with NULL
> pte and the list lock itself is released before creating the PTE.

I was wrong. My suggested fix does not work at all.

I tried your code and it fixes the issue here.

Feel free to submit with:

Tested-by: Paolo Abeni  

Thank you,

Paolo
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf 4/7] netfilter: x_tables: rate limit pr_err warnings

2018-02-07 Thread Florian Westphal
Signed-off-by: Florian Westphal 
---
 net/netfilter/x_tables.c | 70 +++-
 1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 2f685ee1f9c8..0f81294dea7b 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -434,36 +434,35 @@ int xt_check_match(struct xt_mtchk_param *par,
 * ebt_among is exempt from centralized matchsize checking
 * because it uses a dynamic-size data set.
 */
-   pr_err("%s_tables: %s.%u match: invalid size "
-  "%u (kernel) != (user) %u\n",
-  xt_prefix[par->family], par->match->name,
-  par->match->revision,
-  XT_ALIGN(par->match->matchsize), size);
+   pr_err_ratelimited("%s_tables: %s.%u match: invalid size %u 
(kernel) != (user) %u\n",
+  xt_prefix[par->family], par->match->name,
+  par->match->revision,
+  XT_ALIGN(par->match->matchsize), size);
return -EINVAL;
}
if (par->match->table != NULL &&
strcmp(par->match->table, par->table) != 0) {
-   pr_err("%s_tables: %s match: only valid in %s table, not %s\n",
-  xt_prefix[par->family], par->match->name,
-  par->match->table, par->table);
+   pr_err_ratelimited("%s_tables: %s match: only valid in %s 
table, not %s\n",
+  xt_prefix[par->family], par->match->name,
+  par->match->table, par->table);
return -EINVAL;
}
if (par->match->hooks && (par->hook_mask & ~par->match->hooks) != 0) {
char used[64], allow[64];
 
-   pr_err("%s_tables: %s match: used from hooks %s, but only "
-  "valid from %s\n",
-  xt_prefix[par->family], par->match->name,
-  textify_hooks(used, sizeof(used), par->hook_mask,
-par->family),
-  textify_hooks(allow, sizeof(allow), par->match->hooks,
-par->family));
+   pr_err_ratelimited("%s_tables: %s match: used from hooks %s, 
but only valid from %s\n",
+  xt_prefix[par->family], par->match->name,
+  textify_hooks(used, sizeof(used),
+par->hook_mask, par->family),
+  textify_hooks(allow, sizeof(allow),
+par->match->hooks,
+par->family));
return -EINVAL;
}
if (par->match->proto && (par->match->proto != proto || inv_proto)) {
-   pr_err("%s_tables: %s match: only valid for protocol %u\n",
-  xt_prefix[par->family], par->match->name,
-  par->match->proto);
+   pr_err_ratelimited("%s_tables: %s match: only valid for 
protocol %u\n",
+  xt_prefix[par->family], par->match->name,
+  par->match->proto);
return -EINVAL;
}
if (par->match->checkentry != NULL) {
@@ -814,36 +813,35 @@ int xt_check_target(struct xt_tgchk_param *par,
int ret;
 
if (XT_ALIGN(par->target->targetsize) != size) {
-   pr_err("%s_tables: %s.%u target: invalid size "
-  "%u (kernel) != (user) %u\n",
-  xt_prefix[par->family], par->target->name,
-  par->target->revision,
-  XT_ALIGN(par->target->targetsize), size);
+   pr_err_ratelimited("%s_tables: %s.%u target: invalid size %u 
(kernel) != (user) %u\n",
+  xt_prefix[par->family], par->target->name,
+  par->target->revision,
+  XT_ALIGN(par->target->targetsize), size);
return -EINVAL;
}
if (par->target->table != NULL &&
strcmp(par->target->table, par->table) != 0) {
-   pr_err("%s_tables: %s target: only valid in %s table, not %s\n",
-  xt_prefix[par->family], par->target->name,
-  par->target->table, par->table);
+   pr_err_ratelimited("%s_tables: %s target: only valid in %s 
table, not %s\n",
+  xt_prefix[par->family], par->target->name,
+  par->target->table, par->table);
return -EINVAL;
}
if (par->target->hooks && (par->hook_mask & ~par->target->hooks) != 0) {
char 

[PATCH nf 6/7] netfilter: x_tables: use pr ratelimiting

2018-02-07 Thread Florian Westphal
all of these print simple error message - use single pr_ratelimit call.
checkpatch complains about lines > 80 but this would require splitting
several "literals" over multiple lines which is worse.

Signed-off-by: Florian Westphal 
---
 net/netfilter/xt_HMARK.c| 24 ++--
 net/netfilter/xt_addrtype.c | 33 -
 net/netfilter/xt_policy.c   | 23 +--
 3 files changed, 43 insertions(+), 37 deletions(-)

diff --git a/net/netfilter/xt_HMARK.c b/net/netfilter/xt_HMARK.c
index 60e6dbe12460..3345bed8b200 100644
--- a/net/netfilter/xt_HMARK.c
+++ b/net/netfilter/xt_HMARK.c
@@ -9,6 +9,8 @@
  * the Free Software Foundation.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -312,29 +314,31 @@ hmark_tg_v4(struct sk_buff *skb, const struct 
xt_action_param *par)
 static int hmark_tg_check(const struct xt_tgchk_param *par)
 {
const struct xt_hmark_info *info = par->targinfo;
+   const char *errmsg = "hash modulus can't be zero";
 
-   if (!info->hmodulus) {
-   pr_info("xt_HMARK: hash modulus can't be zero\n");
-   return -EINVAL;
-   }
+   if (!info->hmodulus)
+   goto err;
if (info->proto_mask &&
(info->flags & XT_HMARK_FLAG(XT_HMARK_METHOD_L3))) {
-   pr_info("xt_HMARK: proto mask must be zero with L3 mode\n");
-   return -EINVAL;
+   errmsg = "proto mask must be zero with L3 mode";
+   goto err;
}
if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI_MASK) &&
(info->flags & (XT_HMARK_FLAG(XT_HMARK_SPORT_MASK) |
 XT_HMARK_FLAG(XT_HMARK_DPORT_MASK {
-   pr_info("xt_HMARK: spi-mask and port-mask can't be combined\n");
-   return -EINVAL;
+   errmsg = "spi-mask and port-mask can't be combined";
+   goto err;
}
if (info->flags & XT_HMARK_FLAG(XT_HMARK_SPI) &&
(info->flags & (XT_HMARK_FLAG(XT_HMARK_SPORT) |
 XT_HMARK_FLAG(XT_HMARK_DPORT {
-   pr_info("xt_HMARK: spi-set and port-set can't be combined\n");
-   return -EINVAL;
+   errmsg = "spi-set and port-set can't be combined";
+   goto err;
}
return 0;
+err:
+   pr_info_ratelimited("%s\n", errmsg);
+   return -EINVAL;
 }
 
 static struct xt_target hmark_tg_reg[] __read_mostly = {
diff --git a/net/netfilter/xt_addrtype.c b/net/netfilter/xt_addrtype.c
index 911a7c0da504..89e281b3bfc2 100644
--- a/net/netfilter/xt_addrtype.c
+++ b/net/netfilter/xt_addrtype.c
@@ -164,48 +164,47 @@ addrtype_mt_v1(const struct sk_buff *skb, struct 
xt_action_param *par)
 
 static int addrtype_mt_checkentry_v1(const struct xt_mtchk_param *par)
 {
+   const char *errmsg = "both incoming and outgoing interface limitation 
cannot be selected";
struct xt_addrtype_info_v1 *info = par->matchinfo;
 
if (info->flags & XT_ADDRTYPE_LIMIT_IFACE_IN &&
-   info->flags & XT_ADDRTYPE_LIMIT_IFACE_OUT) {
-   pr_info("both incoming and outgoing "
-   "interface limitation cannot be selected\n");
-   return -EINVAL;
-   }
+   info->flags & XT_ADDRTYPE_LIMIT_IFACE_OUT)
+   goto err;
 
if (par->hook_mask & ((1 << NF_INET_PRE_ROUTING) |
(1 << NF_INET_LOCAL_IN)) &&
info->flags & XT_ADDRTYPE_LIMIT_IFACE_OUT) {
-   pr_info("output interface limitation "
-   "not valid in PREROUTING and INPUT\n");
-   return -EINVAL;
+   errmsg = "output interface limitation not valid in PREROUTING 
and INPUT";
+   goto err;
}
 
if (par->hook_mask & ((1 << NF_INET_POST_ROUTING) |
(1 << NF_INET_LOCAL_OUT)) &&
info->flags & XT_ADDRTYPE_LIMIT_IFACE_IN) {
-   pr_info("input interface limitation "
-   "not valid in POSTROUTING and OUTPUT\n");
-   return -EINVAL;
+   errmsg = "input interface limitation not valid in POSTROUTING 
and OUTPUT";
+   goto err;
}
 
 #if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
if (par->family == NFPROTO_IPV6) {
if ((info->source | info->dest) & XT_ADDRTYPE_BLACKHOLE) {
-   pr_err("ipv6 BLACKHOLE matching not supported\n");
-   return -EINVAL;
+   errmsg = "ipv6 BLACKHOLE matching not supported";
+   goto err;
}
if ((info->source | info->dest) >= XT_ADDRTYPE_PROHIBIT) {
-   pr_err("ipv6 PROHIBIT (THROW, NAT ..) matching not 
supported\n");
-   return -EINVAL;
+   errmsg = "ipv6 PROHIBIT (THROW, NAT ..) matching not 
supported";
+

[PATCH nf 5/7] netfilter: x_tables: rate-limit table mismatch warnings

2018-02-07 Thread Florian Westphal
Signed-off-by: Florian Westphal 
---
 net/ipv4/netfilter/ipt_rpfilter.c  | 4 ++--
 net/ipv6/netfilter/ip6t_rpfilter.c | 4 ++--
 net/netfilter/xt_CONNSECMARK.c | 4 ++--
 net/netfilter/xt_SECMARK.c | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_rpfilter.c 
b/net/ipv4/netfilter/ipt_rpfilter.c
index ffd1cf65af3a..afea7aff80c1 100644
--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -111,8 +111,8 @@ static int rpfilter_check(const struct xt_mtchk_param *par)
 
if (strcmp(par->table, "mangle") != 0 &&
strcmp(par->table, "raw") != 0) {
-   pr_info("match only valid in the \'raw\' "
-   "or \'mangle\' tables, not \'%s\'.\n", par->table);
+   pr_info_ratelimited("only valid in \'raw\' or \'mangle\' table, 
not \'%s\'\n",
+   par->table);
return -EINVAL;
}
 
diff --git a/net/ipv6/netfilter/ip6t_rpfilter.c 
b/net/ipv6/netfilter/ip6t_rpfilter.c
index c9e27d4687a2..9740f9c6428c 100644
--- a/net/ipv6/netfilter/ip6t_rpfilter.c
+++ b/net/ipv6/netfilter/ip6t_rpfilter.c
@@ -109,8 +109,8 @@ static int rpfilter_check(const struct xt_mtchk_param *par)
 
if (strcmp(par->table, "mangle") != 0 &&
strcmp(par->table, "raw") != 0) {
-   pr_info("match only valid in the \'raw\' "
-   "or \'mangle\' tables, not \'%s\'.\n", par->table);
+   pr_info_ratelimited("only valid in \'raw\' or \'mangle\' table, 
not \'%s\'\n",
+   par->table);
return -EINVAL;
}
 
diff --git a/net/netfilter/xt_CONNSECMARK.c b/net/netfilter/xt_CONNSECMARK.c
index da56c06a443c..6f30cd399e42 100644
--- a/net/netfilter/xt_CONNSECMARK.c
+++ b/net/netfilter/xt_CONNSECMARK.c
@@ -91,8 +91,8 @@ static int connsecmark_tg_check(const struct xt_tgchk_param 
*par)
 
if (strcmp(par->table, "mangle") != 0 &&
strcmp(par->table, "security") != 0) {
-   pr_info("target only valid in the \'mangle\' "
-   "or \'security\' tables, not \'%s\'.\n", par->table);
+   pr_info_ratelimited("only valid in \'mangle\' or \'security\' 
table, not \'%s\'\n",
+   par->table);
return -EINVAL;
}
 
diff --git a/net/netfilter/xt_SECMARK.c b/net/netfilter/xt_SECMARK.c
index 36f7ad881a7e..1f2a0627478a 100644
--- a/net/netfilter/xt_SECMARK.c
+++ b/net/netfilter/xt_SECMARK.c
@@ -86,8 +86,8 @@ static int secmark_tg_check(const struct xt_tgchk_param *par)
 
if (strcmp(par->table, "mangle") != 0 &&
strcmp(par->table, "security") != 0) {
-   pr_info("target only valid in the \'mangle\' "
-   "or \'security\' tables, not \'%s\'.\n", par->table);
+   pr_info_ratelimited("only valid in \'mangle\' or \'security\' 
table, not \'%s\'\n",
+   par->table);
return -EINVAL;
}
 
-- 
2.13.6

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf 2/7] netfilter: x_tables: prefer pr_debug where possible

2018-02-07 Thread Florian Westphal
prefer pr_debug for cases where error is usually not seen by users.
checkpatch complains due to lines > 80 but adding a newline doesn't
make things any more readable.

Signed-off-by: Florian Westphal 
---
 net/ipv4/netfilter/ipt_rpfilter.c  | 2 +-
 net/ipv6/netfilter/ip6t_rpfilter.c | 2 +-
 net/netfilter/xt_SECMARK.c | 2 +-
 net/netfilter/xt_bpf.c | 2 +-
 net/netfilter/xt_connlabel.c   | 2 +-
 net/netfilter/xt_ipcomp.c  | 2 +-
 net/netfilter/xt_ipvs.c| 2 +-
 net/netfilter/xt_l2tp.c| 2 +-
 net/netfilter/xt_recent.c  | 4 ++--
 net/netfilter/xt_socket.c  | 8 
 net/netfilter/xt_time.c| 2 +-
 11 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_rpfilter.c 
b/net/ipv4/netfilter/ipt_rpfilter.c
index 37fb9552e858..ffd1cf65af3a 100644
--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -105,7 +105,7 @@ static int rpfilter_check(const struct xt_mtchk_param *par)
const struct xt_rpfilter_info *info = par->matchinfo;
unsigned int options = ~XT_RPFILTER_OPTION_MASK;
if (info->flags & options) {
-   pr_info("unknown options encountered");
+   pr_debug("unknown options");
return -EINVAL;
}
 
diff --git a/net/ipv6/netfilter/ip6t_rpfilter.c 
b/net/ipv6/netfilter/ip6t_rpfilter.c
index b12e61b7b16c..c9e27d4687a2 100644
--- a/net/ipv6/netfilter/ip6t_rpfilter.c
+++ b/net/ipv6/netfilter/ip6t_rpfilter.c
@@ -103,7 +103,7 @@ static int rpfilter_check(const struct xt_mtchk_param *par)
unsigned int options = ~XT_RPFILTER_OPTION_MASK;
 
if (info->flags & options) {
-   pr_info("unknown options encountered");
+   pr_debug("unknown options");
return -EINVAL;
}
 
diff --git a/net/netfilter/xt_SECMARK.c b/net/netfilter/xt_SECMARK.c
index 9faf5e050b79..36f7ad881a7e 100644
--- a/net/netfilter/xt_SECMARK.c
+++ b/net/netfilter/xt_SECMARK.c
@@ -101,7 +101,7 @@ static int secmark_tg_check(const struct xt_tgchk_param 
*par)
case SECMARK_MODE_SEL:
break;
default:
-   pr_info("invalid mode: %hu\n", info->mode);
+   pr_debug("invalid mode: %hu\n", info->mode);
return -EINVAL;
}
 
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index 06b090d8e901..77a12ef9e11e 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -34,7 +34,7 @@ static int __bpf_mt_check_bytecode(struct sock_filter *insns, 
__u16 len,
program.filter = insns;
 
if (bpf_prog_create(ret, )) {
-   pr_info("bpf: check failed: parse error\n");
+   pr_debug("bpf: check failed: parse error\n");
return -EINVAL;
}
 
diff --git a/net/netfilter/xt_connlabel.c b/net/netfilter/xt_connlabel.c
index 23372879e6e3..cf3031e4ff61 100644
--- a/net/netfilter/xt_connlabel.c
+++ b/net/netfilter/xt_connlabel.c
@@ -57,7 +57,7 @@ static int connlabel_mt_check(const struct xt_mtchk_param 
*par)
int ret;
 
if (info->options & ~options) {
-   pr_err("Unknown options in mask %x\n", info->options);
+   pr_debug("Unknown options in mask %x\n", info->options);
return -EINVAL;
}
 
diff --git a/net/netfilter/xt_ipcomp.c b/net/netfilter/xt_ipcomp.c
index 7ca64a50db04..1ecde0efe879 100644
--- a/net/netfilter/xt_ipcomp.c
+++ b/net/netfilter/xt_ipcomp.c
@@ -72,7 +72,7 @@ static int comp_mt_check(const struct xt_mtchk_param *par)
 
/* Must specify no unknown invflags */
if (compinfo->invflags & ~XT_IPCOMP_INV_MASK) {
-   pr_err("unknown flags %X\n", compinfo->invflags);
+   pr_debug("unknown flags %X\n", compinfo->invflags);
return -EINVAL;
}
return 0;
diff --git a/net/netfilter/xt_ipvs.c b/net/netfilter/xt_ipvs.c
index 42540d26c2b8..e5ffc2f1424c 100644
--- a/net/netfilter/xt_ipvs.c
+++ b/net/netfilter/xt_ipvs.c
@@ -158,7 +158,7 @@ static int ipvs_mt_check(const struct xt_mtchk_param *par)
&& par->family != NFPROTO_IPV6
 #endif
) {
-   pr_info("protocol family %u not supported\n", par->family);
+   pr_debug("protocol family %u not supported\n", par->family);
return -EINVAL;
}
 
diff --git a/net/netfilter/xt_l2tp.c b/net/netfilter/xt_l2tp.c
index 8aee572771f2..54ac58b309e5 100644
--- a/net/netfilter/xt_l2tp.c
+++ b/net/netfilter/xt_l2tp.c
@@ -216,7 +216,7 @@ static int l2tp_mt_check(const struct xt_mtchk_param *par)
/* Check for invalid flags */
if (info->flags & ~(XT_L2TP_TID | XT_L2TP_SID | XT_L2TP_VERSION |
XT_L2TP_TYPE)) {
-   pr_info("unknown flags: %x\n", info->flags);
+   pr_debug("unknown flags: %x\n", info->flags);
return -EINVAL;
}
 
diff 

[PATCH nf 3/7] netfilter: xt_CT: use pr ratelimiting

2018-02-07 Thread Florian Westphal
checkpatch complains about line > 80 but this would require splitting
"literal" over two lines which is worse.

Signed-off-by: Florian Westphal 
---
 net/netfilter/xt_CT.c | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index 5a152e2acfd5..8790190c6feb 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -82,15 +82,14 @@ xt_ct_set_helper(struct nf_conn *ct, const char 
*helper_name,
 
proto = xt_ct_find_proto(par);
if (!proto) {
-   pr_info("You must specify a L4 protocol, and not use "
-   "inversions on it.\n");
+   pr_info_ratelimited("You must specify a L4 protocol and not use 
inversions on it\n");
return -ENOENT;
}
 
helper = nf_conntrack_helper_try_module_get(helper_name, par->family,
proto);
if (helper == NULL) {
-   pr_info("No such helper \"%s\"\n", helper_name);
+   pr_info_ratelimited("No such helper \"%s\"\n", helper_name);
return -ENOENT;
}
 
@@ -124,6 +123,7 @@ xt_ct_set_timeout(struct nf_conn *ct, const struct 
xt_tgchk_param *par,
const struct nf_conntrack_l4proto *l4proto;
struct ctnl_timeout *timeout;
struct nf_conn_timeout *timeout_ext;
+   const char *errmsg = NULL;
int ret = 0;
u8 proto;
 
@@ -131,29 +131,29 @@ xt_ct_set_timeout(struct nf_conn *ct, const struct 
xt_tgchk_param *par,
timeout_find_get = rcu_dereference(nf_ct_timeout_find_get_hook);
if (timeout_find_get == NULL) {
ret = -ENOENT;
-   pr_info("Timeout policy base is empty\n");
+   errmsg = "Timeout policy base is empty";
goto out;
}
 
proto = xt_ct_find_proto(par);
if (!proto) {
ret = -EINVAL;
-   pr_info("You must specify a L4 protocol, and not use "
-   "inversions on it.\n");
+   errmsg = "You must specify a L4 protocol and not use inversions 
on it";
goto out;
}
 
timeout = timeout_find_get(par->net, timeout_name);
if (timeout == NULL) {
ret = -ENOENT;
-   pr_info("No such timeout policy \"%s\"\n", timeout_name);
+   pr_info_ratelimited("No such timeout policy \"%s\"\n",
+   timeout_name);
goto out;
}
 
if (timeout->l3num != par->family) {
ret = -EINVAL;
-   pr_info("Timeout policy `%s' can only be used by L3 protocol "
-   "number %d\n", timeout_name, timeout->l3num);
+   pr_info_ratelimited("Timeout policy `%s' can only be used by 
L%d protocol number %d\n",
+   timeout_name, 3, timeout->l3num);
goto err_put_timeout;
}
/* Make sure the timeout policy matches any existing protocol tracker,
@@ -162,9 +162,8 @@ xt_ct_set_timeout(struct nf_conn *ct, const struct 
xt_tgchk_param *par,
l4proto = __nf_ct_l4proto_find(par->family, proto);
if (timeout->l4proto->l4proto != l4proto->l4proto) {
ret = -EINVAL;
-   pr_info("Timeout policy `%s' can only be used by L4 protocol "
-   "number %d\n",
-   timeout_name, timeout->l4proto->l4proto);
+   pr_info_ratelimited("Timeout policy `%s' can only be used by 
L%d protocol number %d\n",
+   timeout_name, 4, timeout->l4proto->l4proto);
goto err_put_timeout;
}
timeout_ext = nf_ct_timeout_ext_add(ct, timeout, GFP_ATOMIC);
@@ -180,6 +179,8 @@ xt_ct_set_timeout(struct nf_conn *ct, const struct 
xt_tgchk_param *par,
__xt_ct_tg_timeout_put(timeout);
 out:
rcu_read_unlock();
+   if (errmsg)
+   pr_info_ratelimited("%s\n", errmsg);
return ret;
 #else
return -EOPNOTSUPP;
-- 
2.13.6

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf 7/7] netfilter: x_tables: use pr ratelimiting in all remaining spots

2018-02-07 Thread Florian Westphal
Signed-off-by: Florian Westphal 
---
 net/bridge/netfilter/ebt_among.c | 10 
 net/bridge/netfilter/ebt_limit.c |  4 ++--
 net/ipv4/netfilter/ipt_ECN.c |  2 +-
 net/ipv4/netfilter/ipt_REJECT.c  |  4 ++--
 net/ipv6/netfilter/ip6t_REJECT.c |  4 ++--
 net/ipv6/netfilter/ip6t_srh.c|  6 +++--
 net/netfilter/xt_AUDIT.c |  4 ++--
 net/netfilter/xt_CHECKSUM.c  |  5 ++--
 net/netfilter/xt_CONNSECMARK.c   |  6 ++---
 net/netfilter/xt_DSCP.c  |  2 +-
 net/netfilter/xt_LED.c   |  2 +-
 net/netfilter/xt_NFQUEUE.c   |  6 ++---
 net/netfilter/xt_SECMARK.c   | 12 ++
 net/netfilter/xt_TCPMSS.c| 10 
 net/netfilter/xt_TPROXY.c|  6 ++---
 net/netfilter/xt_cgroup.c|  8 ---
 net/netfilter/xt_cluster.c   |  8 +++
 net/netfilter/xt_connbytes.c |  4 ++--
 net/netfilter/xt_connlabel.c |  4 ++--
 net/netfilter/xt_connmark.c  |  8 +++
 net/netfilter/xt_conntrack.c |  4 ++--
 net/netfilter/xt_dscp.c  |  2 +-
 net/netfilter/xt_ecn.c   |  4 ++--
 net/netfilter/xt_hashlimit.c | 24 ++-
 net/netfilter/xt_helper.c|  4 ++--
 net/netfilter/xt_l2tp.c  | 20 +---
 net/netfilter/xt_limit.c |  4 ++--
 net/netfilter/xt_nat.c   |  5 ++--
 net/netfilter/xt_nfacct.c|  6 +++--
 net/netfilter/xt_physdev.c   |  4 +---
 net/netfilter/xt_recent.c| 10 
 net/netfilter/xt_set.c   | 50 
 net/netfilter/xt_state.c |  4 ++--
 net/netfilter/xt_time.c  |  3 +--
 34 files changed, 132 insertions(+), 127 deletions(-)

diff --git a/net/bridge/netfilter/ebt_among.c b/net/bridge/netfilter/ebt_among.c
index 279527f8b1fe..12d850a3ea68 100644
--- a/net/bridge/netfilter/ebt_among.c
+++ b/net/bridge/netfilter/ebt_among.c
@@ -187,17 +187,17 @@ static int ebt_among_mt_check(const struct xt_mtchk_param 
*par)
expected_length += ebt_mac_wormhash_size(wh_src);
 
if (em->match_size != EBT_ALIGN(expected_length)) {
-   pr_info("wrong size: %d against expected %d, rounded to %zd\n",
-   em->match_size, expected_length,
-   EBT_ALIGN(expected_length));
+   pr_info_ratelimited("wrong size: %d against expected %d, 
rounded to %zd\n",
+   em->match_size, expected_length,
+   EBT_ALIGN(expected_length));
return -EINVAL;
}
if (wh_dst && (err = ebt_mac_wormhash_check_integrity(wh_dst))) {
-   pr_info("dst integrity fail: %x\n", -err);
+   pr_info_ratelimited("dst integrity fail: %x\n", -err);
return -EINVAL;
}
if (wh_src && (err = ebt_mac_wormhash_check_integrity(wh_src))) {
-   pr_info("src integrity fail: %x\n", -err);
+   pr_info_ratelimited("src integrity fail: %x\n", -err);
return -EINVAL;
}
return 0;
diff --git a/net/bridge/netfilter/ebt_limit.c b/net/bridge/netfilter/ebt_limit.c
index 61a9f1be1263..165b9d678cf1 100644
--- a/net/bridge/netfilter/ebt_limit.c
+++ b/net/bridge/netfilter/ebt_limit.c
@@ -72,8 +72,8 @@ static int ebt_limit_mt_check(const struct xt_mtchk_param 
*par)
/* Check for overflow. */
if (info->burst == 0 ||
user2credits(info->avg * info->burst) < user2credits(info->avg)) {
-   pr_info("overflow, try lower: %u/%u\n",
-   info->avg, info->burst);
+   pr_info_ratelimited("overflow, try lower: %u/%u\n",
+   info->avg, info->burst);
return -EINVAL;
}
 
diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c
index 39ff167e6d86..aaaf9a81fbc9 100644
--- a/net/ipv4/netfilter/ipt_ECN.c
+++ b/net/ipv4/netfilter/ipt_ECN.c
@@ -106,7 +106,7 @@ static int ecn_tg_check(const struct xt_tgchk_param *par)
 
if ((einfo->operation & (IPT_ECN_OP_SET_ECE|IPT_ECN_OP_SET_CWR)) &&
(e->ip.proto != IPPROTO_TCP || (e->ip.invflags & XT_INV_PROTO))) {
-   pr_info("cannot use TCP operations on a non-tcp rule\n");
+   pr_info_ratelimited("cannot use operation on non-tcp rule\n");
return -EINVAL;
}
return 0;
diff --git a/net/ipv4/netfilter/ipt_REJECT.c b/net/ipv4/netfilter/ipt_REJECT.c
index 8bd0d7b26632..e8bed3390e58 100644
--- a/net/ipv4/netfilter/ipt_REJECT.c
+++ b/net/ipv4/netfilter/ipt_REJECT.c
@@ -74,13 +74,13 @@ static int reject_tg_check(const struct xt_tgchk_param *par)
const struct ipt_entry *e = par->entryinfo;
 
if (rejinfo->with == IPT_ICMP_ECHOREPLY) {
-   pr_info("ECHOREPLY no longer supported.\n");
+   pr_info_ratelimited("ECHOREPLY no longer supported.\n");
return -EINVAL;
} else if (rejinfo->with == 

[PATCH nf 1/7] netfilter: x_tables: remove pr_info where possible

2018-02-07 Thread Florian Westphal
remove several pr_info messages that cannot be triggered with iptables.

Signed-off-by: Florian Westphal 
---
 net/ipv4/netfilter/ipt_ECN.c | 10 --
 net/netfilter/xt_HL.c| 13 +++--
 net/netfilter/xt_LED.c   |  4 +---
 net/netfilter/xt_cgroup.c|  4 +---
 4 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_ECN.c b/net/ipv4/netfilter/ipt_ECN.c
index 270765236f5e..39ff167e6d86 100644
--- a/net/ipv4/netfilter/ipt_ECN.c
+++ b/net/ipv4/netfilter/ipt_ECN.c
@@ -98,14 +98,12 @@ static int ecn_tg_check(const struct xt_tgchk_param *par)
const struct ipt_ECN_info *einfo = par->targinfo;
const struct ipt_entry *e = par->entryinfo;
 
-   if (einfo->operation & IPT_ECN_OP_MASK) {
-   pr_info("unsupported ECN operation %x\n", einfo->operation);
+   if (einfo->operation & IPT_ECN_OP_MASK)
return -EINVAL;
-   }
-   if (einfo->ip_ect & ~IPT_ECN_IP_MASK) {
-   pr_info("new ECT codepoint %x out of mask\n", einfo->ip_ect);
+
+   if (einfo->ip_ect & ~IPT_ECN_IP_MASK)
return -EINVAL;
-   }
+
if ((einfo->operation & (IPT_ECN_OP_SET_ECE|IPT_ECN_OP_SET_CWR)) &&
(e->ip.proto != IPPROTO_TCP || (e->ip.invflags & XT_INV_PROTO))) {
pr_info("cannot use TCP operations on a non-tcp rule\n");
diff --git a/net/netfilter/xt_HL.c b/net/netfilter/xt_HL.c
index 1535e87ed9bd..4653b071bed4 100644
--- a/net/netfilter/xt_HL.c
+++ b/net/netfilter/xt_HL.c
@@ -105,10 +105,8 @@ static int ttl_tg_check(const struct xt_tgchk_param *par)
 {
const struct ipt_TTL_info *info = par->targinfo;
 
-   if (info->mode > IPT_TTL_MAXMODE) {
-   pr_info("TTL: invalid or unknown mode %u\n", info->mode);
+   if (info->mode > IPT_TTL_MAXMODE)
return -EINVAL;
-   }
if (info->mode != IPT_TTL_SET && info->ttl == 0)
return -EINVAL;
return 0;
@@ -118,15 +116,10 @@ static int hl_tg6_check(const struct xt_tgchk_param *par)
 {
const struct ip6t_HL_info *info = par->targinfo;
 
-   if (info->mode > IP6T_HL_MAXMODE) {
-   pr_info("invalid or unknown mode %u\n", info->mode);
+   if (info->mode > IP6T_HL_MAXMODE)
return -EINVAL;
-   }
-   if (info->mode != IP6T_HL_SET && info->hop_limit == 0) {
-   pr_info("increment/decrement does not "
-   "make sense with value 0\n");
+   if (info->mode != IP6T_HL_SET && info->hop_limit == 0)
return -EINVAL;
-   }
return 0;
 }
 
diff --git a/net/netfilter/xt_LED.c b/net/netfilter/xt_LED.c
index 1dcad893df78..ece311c11fdc 100644
--- a/net/netfilter/xt_LED.c
+++ b/net/netfilter/xt_LED.c
@@ -111,10 +111,8 @@ static int led_tg_check(const struct xt_tgchk_param *par)
struct xt_led_info_internal *ledinternal;
int err;
 
-   if (ledinfo->id[0] == '\0') {
-   pr_info("No 'id' parameter given.\n");
+   if (ledinfo->id[0] == '\0')
return -EINVAL;
-   }
 
mutex_lock(_led_mutex);
 
diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
index 891f4e7e8ea7..556530db7dbb 100644
--- a/net/netfilter/xt_cgroup.c
+++ b/net/netfilter/xt_cgroup.c
@@ -42,10 +42,8 @@ static int cgroup_mt_check_v1(const struct xt_mtchk_param 
*par)
if ((info->invert_path & ~1) || (info->invert_classid & ~1))
return -EINVAL;
 
-   if (!info->has_path && !info->has_classid) {
-   pr_info("xt_cgroup: no path or classid specified\n");
+   if (!info->has_path && !info->has_classid)
return -EINVAL;
-   }
 
if (info->has_path && info->has_classid) {
pr_info("xt_cgroup: both path and classid specified\n");
-- 
2.13.6

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


netfilter: x_tables: ratelimit most printks

2018-02-07 Thread Florian Westphal
Aeons ago, before namespaces, there was no need to ratelimit this:
all of these error messages got triggered in response to iptables
commands, which need CAP_NET_ADMIN.

Nowadays we have namespaces, so its better to ratelimit these.
This should also help fuzzing (syzkaller), as it can generate a large
volume of error messages (which are useless there).

The patches are split as follows:
- first get rid of printks that should never be triggered, as userland
  doesn't generate such malformed rules anyway.
- second, switch some printks to pr_debug.  This is mostly for messages
  where it might make sense for developers to see what exactly went
  wrong.

Rest of the patches swap remaining pr_foo with pr_foo_ratelimited().

Note that most patches introduce overly long lines, but splitting these
would make it necessary to split the error messages which is worse.

46 files changed, 254 insertions(+), 257 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


WARNING: proc registration bug in clusterip_tg_check

2018-02-07 Thread syzbot

Hello,

syzbot tried to test the proposed patch but build/boot failed:

kernel build failed: failed to run /usr/bin/make [make bzImage -j 32  
CC=/syzkaller/gcc/bin/gcc]: exit status 2

scripts/kconfig/conf  --silentoldconfig Kconfig
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  CHK scripts/mod/devicetable-offsets.h
  CHK include/generated/bounds.h
  CHK include/generated/timeconst.h
  CHK include/generated/asm-offsets.h
  CALLscripts/checksyscalls.sh
  CHK include/generated/compile.h
  CC  net/ipv4/netfilter/ipt_CLUSTERIP.o
  CC  net/ipv4/netfilter/ipt_MASQUERADE.o
  CC  net/ipv4/netfilter/ipt_REJECT.o
  CC  net/ipv4/netfilter/ipt_SYNPROXY.o
  CC  net/ipv4/netfilter/arp_tables.o
  CC  net/ipv4/netfilter/arpt_mangle.o
  CC  net/ipv4/netfilter/arptable_filter.o
  CC  net/ipv4/netfilter/nf_dup_ipv4.o
net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_config_init’:
net/ipv4/netfilter/ipt_CLUSTERIP.c:253:22: error: expected ‘;’ before ‘:’  
token

   goto err_remove_pte:
  ^
scripts/Makefile.build:316: recipe for  
target 'net/ipv4/netfilter/ipt_CLUSTERIP.o' failed

make[3]: *** [net/ipv4/netfilter/ipt_CLUSTERIP.o] Error 1
make[3]: *** Waiting for unfinished jobs
scripts/Makefile.build:575: recipe for target 'net/ipv4/netfilter' failed
make[2]: *** [net/ipv4/netfilter] Error 2
scripts/Makefile.build:575: recipe for target 'net/ipv4' failed
make[1]: *** [net/ipv4] Error 2
Makefile:1020: recipe for target 'net' failed
make: *** [net] Error 2



Tested on net commit
176bfb406d735655f9a69d868a7af0c3da959d51 (Tue Feb 6 16:48:40 2018 +)
Merge branch 'be2net-patch-set'

compiler: gcc (GCC) 7.1.1 20170620
Patch is attached.



--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -230,17 +230,6 @@ clusterip_config_init(struct net *net, const struct 
ipt_clusterip_tgt_info *i,
refcount_set(>refcount, 1);
refcount_set(>entries, 1);
 
-   spin_lock_bh(>lock);
-   if (__clusterip_config_find(net, ip)) {
-   spin_unlock_bh(>lock);
-   kfree(c);
-
-   return ERR_PTR(-EBUSY);
-   }
-
-   list_add_rcu(>list, >configs);
-   spin_unlock_bh(>lock);
-
 #ifdef CONFIG_PROC_FS
{
char buffer[16];
@@ -257,20 +246,31 @@ clusterip_config_init(struct net *net, const struct 
ipt_clusterip_tgt_info *i,
}
 #endif
 
+   spin_lock_bh(>lock);
+   if (__clusterip_config_find(net, ip)) {
+   spin_unlock_bh(>lock);
+   err = -EBUSY;
+   goto err_remove_pte:
+   }
+
+   list_add_rcu(>list, >configs);
+   spin_unlock_bh(>lock);
+
c->notifier.notifier_call = clusterip_netdev_event;
err = register_netdevice_notifier(>notifier);
if (!err)
return c;
 
+   spin_lock_bh(>lock);
+   list_del_rcu(>list);
+   spin_unlock_bh(>lock);
+
+err_remove_pte:
 #ifdef CONFIG_PROC_FS
proc_remove(c->pde);
 err:
 #endif
-   spin_lock_bh(>lock);
-   list_del_rcu(>list);
-   spin_unlock_bh(>lock);
kfree(c);
-
return ERR_PTR(err);
 }
 


[PATCH nf RFC] netfilter: x_tables: only allow jumps to user-defined chains

2018-02-07 Thread Florian Westphal
This rejects rulesets where a jump occurs to a non-user defined chain.
This isn't limited in any way in the binary format (you can jump to
any rule you want within the blob structure), but iptables tools
do not offset such a feature.

Sending as RFC as this limits features that might be used by programs
that don't call xtables(-restore) tools.

This change also prevents the syzkaller reported crash as
ruleset gets rejected.

Reported-by: syzbot+e783f671527912cd9...@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal 
---
 net/ipv4/netfilter/arp_tables.c | 20 ++--
 net/ipv4/netfilter/ip_tables.c  | 21 +++--
 net/ipv6/netfilter/ip6_tables.c | 22 --
 3 files changed, 45 insertions(+), 18 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index e3e420f3ba7b..2df708e5cf42 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -368,10 +368,14 @@ static int mark_source_chains(const struct xt_table_info 
*newinfo,
if (strcmp(t->target.u.user.name,
   XT_STANDARD_TARGET) == 0 &&
newpos >= 0) {
-   /* This a jump; chase it. */
-   if (!xt_find_jump_offset(offsets, 
newpos,
-
newinfo->number))
+   if (newpos >= newinfo->size)
return 0;
+   if (entry0 + newpos != 
ipt_next_entry(e)) {
+   /* This a jump; chase it. */
+   if 
(!xt_find_jump_offset(offsets, newpos,
+
newinfo->stacksize))
+   return 0;
+   }
} else {
/* ... this is a fallthru */
newpos = pos + e->next_offset;
@@ -523,6 +527,7 @@ static int translate_table(struct xt_table_info *newinfo, 
void *entry0,
struct arpt_entry *iter;
unsigned int *offsets;
unsigned int i;
+   bool userchain;
int ret = 0;
 
newinfo->size = repl->size;
@@ -548,12 +553,15 @@ static int translate_table(struct xt_table_info *newinfo, 
void *entry0,
 repl->valid_hooks);
if (ret != 0)
goto out_free;
-   if (i < repl->num_entries)
-   offsets[i] = (void *)iter - entry0;
++i;
+   if (userchain) {
+   offsets[newinfo->stacksize] = (void *)iter - entry0;
+   ++newinfo->stacksize;
+   userchain = false;
+   }
if (strcmp(arpt_get_target(iter)->u.user.name,
XT_ERROR_TARGET) == 0)
-   ++newinfo->stacksize;
+   userchain = true;
}
 
ret = -EINVAL;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index e38395a8dcf2..c8adab24a883 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -435,10 +435,14 @@ mark_source_chains(const struct xt_table_info *newinfo,
if (strcmp(t->target.u.user.name,
   XT_STANDARD_TARGET) == 0 &&
newpos >= 0) {
-   /* This a jump; chase it. */
-   if (!xt_find_jump_offset(offsets, 
newpos,
-
newinfo->number))
+   if (newpos >= newinfo->size)
return 0;
+   if (entry0 + newpos != 
ipt_next_entry(e)) {
+   /* This a jump; chase it. */
+   if 
(!xt_find_jump_offset(offsets, newpos,
+
newinfo->stacksize))
+   return 0;
+   }
} else {
/* ... this is a fallthru */
newpos = pos + e->next_offset;
@@ -671,6 +675,7 @@ translate_table(struct net *net, struct xt_table_info 
*newinfo, void *entry0,
struct ipt_entry *iter;
unsigned int *offsets;
unsigned int i;
+   bool userchain;

[PATCH nf] netfilter: add back stackpointer size checks

2018-02-07 Thread Florian Westphal
The rationale for removing the check is only correct for rulesets
generated by ip(6)tables.

In iptables, a jump can only occur to a user-defined chain, i.e.
because we size the stack based on number of user-defined chains we
cannot exceed stack size.

However, the underlying binary format has no such restriction,
and the validation step only ensures that the jump target is a
valid rule start point.

IOW, its possible to build a rule blob that has no user-defined
chains but does contain a jump.

If this happens, no jump stack gets allocated and crash occurs
because no jumpstack was allocated.

Fixes: 7814b6ec6d0d6 ("netfilter: xtables: don't save/restore jumpstack offset")
Reported-by: syzbot+e783f671527912cd9...@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal 
---
 net/ipv4/netfilter/arp_tables.c | 4 
 net/ipv4/netfilter/ip_tables.c  | 7 ++-
 net/ipv6/netfilter/ip6_tables.c | 4 
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 4ffe302f9b82..e3e420f3ba7b 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -252,6 +252,10 @@ unsigned int arpt_do_table(struct sk_buff *skb,
}
if (table_base + v
!= arpt_next_entry(e)) {
+   if (unlikely(stackidx >= private->stacksize)) {
+   verdict = NF_DROP;
+   break;
+   }
jumpstack[stackidx++] = e;
}
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 9a71f3149507..e38395a8dcf2 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -330,8 +330,13 @@ ipt_do_table(struct sk_buff *skb,
continue;
}
if (table_base + v != ipt_next_entry(e) &&
-   !(e->ip.flags & IPT_F_GOTO))
+   !(e->ip.flags & IPT_F_GOTO)) {
+   if (unlikely(stackidx >= private->stacksize)) {
+   verdict = NF_DROP;
+   break;
+   }
jumpstack[stackidx++] = e;
+   }
 
e = get_entry(table_base, v);
continue;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index af4c917e0836..62358b93bbac 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -352,6 +352,10 @@ ip6t_do_table(struct sk_buff *skb,
}
if (table_base + v != ip6t_next_entry(e) &&
!(e->ipv6.flags & IP6T_F_GOTO)) {
+   if (unlikely(stackidx >= private->stacksize)) {
+   verdict = NF_DROP;
+   break;
+   }
jumpstack[stackidx++] = e;
}
 
-- 
2.13.6

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: proc registration bug in clusterip_tg_check

2018-02-07 Thread Florian Westphal
Paolo Abeni  wrote:

[ pruning CC list ]

> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
> 
> I can't reproduce the issue locally, so asking the syzbot to test the 
> tentive fix for me (and hoping I did not mess with the tag/format) 

I can reproduce it.

CLUSTERIP has multiple other bugs that need to be fixed, I'll look into
this asap.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


WARNING: proc registration bug in clusterip_tg_check

2018-02-07 Thread syzbot

Hello,

syzbot tried to test the proposed patch but build/boot failed:

kernel build failed: failed to run /usr/bin/make [make bzImage -j 32  
CC=/syzkaller/gcc/bin/gcc]: exit status 2

scripts/kconfig/conf  --silentoldconfig Kconfig
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  CHK scripts/mod/devicetable-offsets.h
  CHK include/generated/bounds.h
  CHK include/generated/timeconst.h
  CHK include/generated/asm-offsets.h
  CALLscripts/checksyscalls.sh
  CHK include/generated/compile.h
  CC  net/psample/psample.o
  CC  net/packet/af_packet.o
  CC  net/strparser/strparser.o
  CC  net/switchdev/switchdev.o
  CC  net/rfkill/core.o
  CC  net/compat.o
  CC  net/rfkill/input.o
  CC  net/sysctl_net.o
  CC  net/tls/tls_main.o
  CC  net/tls/tls_sw.o
  CC  net/unix/af_unix.o
  CC  net/wimax/id-table.o
  CC  net/unix/garbage.o
  CC  net/wimax/op-msg.o
  CC  net/unix/sysctl_net_unix.o
  CC  net/wimax/op-reset.o
  CC  net/wimax/op-rfkill.o
  CC  net/wimax/op-state-get.o
  AR  net/ipv4/netfilter/nf_conntrack_ipv4.o
  AR  net/ipv4/netfilter/nf_nat_ipv4.o
  CC  net/vmw_vsock/af_vsock.o
  AR  net/ipv4/netfilter/nf_nat_snmp_basic.o
  CC  net/tipc/addr.o
  CC  net/xfrm/xfrm_policy.o
  CC  net/ipv4/netfilter/ipt_CLUSTERIP.o
  CC  net/tipc/bcast.o
  CC  net/wireless/core.o
  CC  net/sunrpc/clnt.o
  CC  net/rds/af_rds.o
  CC  net/ipv4/netfilter/ipt_ECN.o
  CC  net/sched/sch_generic.o
  CC  net/sched/sch_mq.o
  CC  net/sctp/sm_statetable.o
  CC  net/sctp/sm_statefuns.o
  CC  net/wimax/stack.o
  CC  net/wimax/debugfs.o
  CC  net/sctp/sm_sideeffect.o
  CC  net/sctp/protocol.o
  AR  net/psample/built-in.o
  CC  net/sctp/endpointola.o
  CC  net/rds/bind.o
  CC  net/sunrpc/xprt.o
  CC  net/rds/cong.o
  AR  net/switchdev/built-in.o
  CC  net/sched/sch_api.o
  AR  net/rfkill/rfkill.o
net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_config_init’:
net/ipv4/netfilter/ipt_CLUSTERIP.c:253:22: error: expected ‘;’ before ‘:’  
token

   goto err_remove_pte:
  ^
  AR  net/rfkill/built-in.o
  CC  net/rds/connection.o
scripts/Makefile.build:316: recipe for  
target 'net/ipv4/netfilter/ipt_CLUSTERIP.o' failed

make[3]: *** [net/ipv4/netfilter/ipt_CLUSTERIP.o] Error 1
make[3]: *** Waiting for unfinished jobs
  CC  net/rds/info.o
  CC  net/sctp/associola.o
  CC  net/sctp/transport.o
  AR  net/strparser/built-in.o
  CC  net/sctp/chunk.o
  CC  net/sunrpc/socklib.o
  CC  net/tipc/bearer.o
scripts/Makefile.build:575: recipe for target 'net/ipv4/netfilter' failed
make[2]: *** [net/ipv4/netfilter] Error 2
scripts/Makefile.build:575: recipe for target 'net/ipv4' failed
make[1]: *** [net/ipv4] Error 2
make[1]: *** Waiting for unfinished jobs
  CC  net/tipc/core.o
  CC  net/sctp/sm_make_chunk.o
  CC  net/wireless/sysfs.o
  CC  net/wireless/radiotap.o
  AR  net/tls/tls.o
  AR  net/tls/built-in.o
  CC  net/sched/sch_blackhole.o
  CC  net/sched/cls_api.o
  CC  net/sctp/ulpevent.o
  AR  net/wimax/wimax.o
  AR  net/wimax/built-in.o
  CC  net/wireless/util.o
  CC  net/rds/message.o
  CC  net/sunrpc/xprtsock.o
  CC  net/sctp/inqueue.o
  CC  net/vmw_vsock/af_vsock_tap.o
  CC  net/rds/recv.o
  CC  net/wireless/reg.o
  CC  net/rds/send.o
  CC  net/tipc/link.o
  CC  net/sctp/outqueue.o
  CC  net/sched/act_api.o
  CC  net/sched/act_police.o
  CC  net/sctp/ulpqueue.o
  CC  net/rds/stats.o
  CC  net/rds/sysctl.o
  CC  net/sctp/tsnmap.o
  CC  net/sched/act_sample.o
  CC  net/vmw_vsock/vsock_addr.o
  CC  net/wireless/scan.o
  CC  net/sctp/bind_addr.o
  CC  net/rds/threads.o
  CC  net/rds/transport.o
  CC  net/sctp/socket.o
  CC  net/sunrpc/sched.o
  AR  net/unix/unix.o
  CC  net/tipc/discover.o
  AR  net/unix/built-in.o
  CC  net/tipc/msg.o
  CC  net/wireless/nl80211.o
  CC  net/sched/act_nat.o
  CC  net/rds/loop.o
  CC  net/rds/page.o
  CC  net/sctp/primitive.o
  CC  net/sched/act_pedit.o
  CC  net/sctp/output.o
  CC  net/vmw_vsock/diag.o
  CC  net/sctp/input.o
  CC  net/vmw_vsock/virtio_transport.o
  CC  net/sched/act_simple.o
  CC  net/rds/rdma.o
  CC  net/sched/act_bpf.o
  CC  net/tipc/name_distr.o
  CC  net/rds/tcp.o
  CC  net/sctp/debug.o
net/sctp/outqueue.c: In function ‘sctp_outq_flush’:
net/sctp/outqueue.c:1205:1: warning: the frame size of 2144 bytes is larger  
than 2048 bytes [-Wframe-larger-than=]

 }
 ^
  CC  net/sctp/stream.o
  CC  net/rds/tcp_connect.o
  CC  net/sctp/auth.o
  CC  net/rds/tcp_listen.o
  CC  net/sched/sch_fifo.o
  CC  net/rds/tcp_recv.o
  CC 

Re: [PATCH] netfilter: remove useless prototype

2018-02-07 Thread Pablo Neira Ayuso
On Wed, Feb 07, 2018 at 11:50:41AM +0900, Taehee Yoo wrote:
> prototype nf_ct_nat_offset is not used anymore.

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netfilter: nf_flow_offload: fix use-after-free and a resource leak

2018-02-07 Thread Pablo Neira Ayuso
On Wed, Feb 07, 2018 at 09:49:02AM +0100, Felix Fietkau wrote:
> flow_offload_del frees the flow, so all associated resource must be
> freed before.
> 
> Since the ct entry in struct flow_offload_entry was allocated by
> flow_offload_alloc, it should be freed by flow_offload_free to take care
> of the error handling path when flow_offload_add fails.
> 
> While at it, make flow_offload_del static, since it should never be
> called directly, only from the gc step

Applied, thanks Felix!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: proc registration bug in clusterip_tg_check

2018-02-07 Thread Dmitry Vyukov
You dropped syzbot from CC ;)
Add syzbot+03218bcdba6aa7644...@syzkaller.appspotmail.com to To or CC.


On Wed, Feb 7, 2018 at 11:42 AM, syzbot
 wrote:
>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
>> master
>
>
> Can't find the corresponding bug.
>
>
>
>> I can't reproduce the issue locally, so asking the syzbot to test the
>> tentive fix for me (and hoping I did not mess with the tag/format)
>
>
>> ---
>>   net/ipv4/netfilter/ipt_CLUSTERIP.c | 30 +++---
>>   1 file changed, 15 insertions(+), 15 deletions(-)
>
>
>> diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c
>> b/net/ipv4/netfilter/ipt_CLUSTERIP.c
>> index 3a84a60f6b39..db103cd971a9 100644
>> --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
>> +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
>> @@ -230,17 +230,6 @@ clusterip_config_init(struct net *net, const struct
>> ipt_clusterip_tgt_info *i,
>> refcount_set(>refcount, 1);
>> refcount_set(>entries, 1);
>
>
>> -   spin_lock_bh(>lock);
>> -   if (__clusterip_config_find(net, ip)) {
>> -   spin_unlock_bh(>lock);
>> -   kfree(c);
>> -
>> -   return ERR_PTR(-EBUSY);
>> -   }
>> -
>> -   list_add_rcu(>list, >configs);
>> -   spin_unlock_bh(>lock);
>> -
>>   #ifdef CONFIG_PROC_FS
>> {
>> char buffer[16];
>> @@ -257,20 +246,31 @@ clusterip_config_init(struct net *net, const struct
>> ipt_clusterip_tgt_info *i,
>> }
>>   #endif
>
>
>> +   spin_lock_bh(>lock);
>> +   if (__clusterip_config_find(net, ip)) {
>> +   spin_unlock_bh(>lock);
>> +   err = -EBUSY;
>> +   goto err_remove_pte:
>> +   }
>> +
>> +   list_add_rcu(>list, >configs);
>> +   spin_unlock_bh(>lock);
>> +
>> c->notifier.notifier_call = clusterip_netdev_event;
>> err = register_netdevice_notifier(>notifier);
>> if (!err)
>> return c;
>
>
>> +   spin_lock_bh(>lock);
>> +   list_del_rcu(>list);
>> +   spin_unlock_bh(>lock);
>> +
>> +err_remove_pte:
>>   #ifdef CONFIG_PROC_FS
>> proc_remove(c->pde);
>>   err:
>>   #endif
>> -   spin_lock_bh(>lock);
>> -   list_del_rcu(>list);
>> -   spin_unlock_bh(>lock);
>> kfree(c);
>> -
>> return ERR_PTR(err);
>>   }
>
>
>> --
>> 2.14.3
>
>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "syzkaller-bugs" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to syzkaller-bugs+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/syzkaller-bugs/945c8517a87c671825b61223088064ea2ad0a8cb.1517999262.git.pabeni%40redhat.com.
>> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to syzkaller-bugs+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/syzkaller-bugs/001a114372a68e749405649cf352%40google.com.
>
> For more options, visit https://groups.google.com/d/optout.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: proc registration bug in clusterip_tg_check

2018-02-07 Thread Paolo Abeni
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

I can't reproduce the issue locally, so asking the syzbot to test the 
tentive fix for me (and hoping I did not mess with the tag/format) 

---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 3a84a60f6b39..db103cd971a9 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -230,17 +230,6 @@ clusterip_config_init(struct net *net, const struct 
ipt_clusterip_tgt_info *i,
refcount_set(>refcount, 1);
refcount_set(>entries, 1);
 
-   spin_lock_bh(>lock);
-   if (__clusterip_config_find(net, ip)) {
-   spin_unlock_bh(>lock);
-   kfree(c);
-
-   return ERR_PTR(-EBUSY);
-   }
-
-   list_add_rcu(>list, >configs);
-   spin_unlock_bh(>lock);
-
 #ifdef CONFIG_PROC_FS
{
char buffer[16];
@@ -257,20 +246,31 @@ clusterip_config_init(struct net *net, const struct 
ipt_clusterip_tgt_info *i,
}
 #endif
 
+   spin_lock_bh(>lock);
+   if (__clusterip_config_find(net, ip)) {
+   spin_unlock_bh(>lock);
+   err = -EBUSY;
+   goto err_remove_pte:
+   }
+
+   list_add_rcu(>list, >configs);
+   spin_unlock_bh(>lock);
+
c->notifier.notifier_call = clusterip_netdev_event;
err = register_netdevice_notifier(>notifier);
if (!err)
return c;
 
+   spin_lock_bh(>lock);
+   list_del_rcu(>list);
+   spin_unlock_bh(>lock);
+
+err_remove_pte:
 #ifdef CONFIG_PROC_FS
proc_remove(c->pde);
 err:
 #endif
-   spin_lock_bh(>lock);
-   list_del_rcu(>list);
-   spin_unlock_bh(>lock);
kfree(c);
-
return ERR_PTR(err);
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: proc registration bug in clusterip_tg_check

2018-02-07 Thread syzbot
#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git  
master


Can't find the corresponding bug.



I can't reproduce the issue locally, so asking the syzbot to test the
tentive fix for me (and hoping I did not mess with the tag/format)



---
  net/ipv4/netfilter/ipt_CLUSTERIP.c | 30 +++---
  1 file changed, 15 insertions(+), 15 deletions(-)


diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c  
b/net/ipv4/netfilter/ipt_CLUSTERIP.c

index 3a84a60f6b39..db103cd971a9 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -230,17 +230,6 @@ clusterip_config_init(struct net *net, const struct  
ipt_clusterip_tgt_info *i,

refcount_set(>refcount, 1);
refcount_set(>entries, 1);



-   spin_lock_bh(>lock);
-   if (__clusterip_config_find(net, ip)) {
-   spin_unlock_bh(>lock);
-   kfree(c);
-
-   return ERR_PTR(-EBUSY);
-   }
-
-   list_add_rcu(>list, >configs);
-   spin_unlock_bh(>lock);
-
  #ifdef CONFIG_PROC_FS
{
char buffer[16];
@@ -257,20 +246,31 @@ clusterip_config_init(struct net *net, const struct  
ipt_clusterip_tgt_info *i,

}
  #endif



+   spin_lock_bh(>lock);
+   if (__clusterip_config_find(net, ip)) {
+   spin_unlock_bh(>lock);
+   err = -EBUSY;
+   goto err_remove_pte:
+   }
+
+   list_add_rcu(>list, >configs);
+   spin_unlock_bh(>lock);
+
c->notifier.notifier_call = clusterip_netdev_event;
err = register_netdevice_notifier(>notifier);
if (!err)
return c;



+   spin_lock_bh(>lock);
+   list_del_rcu(>list);
+   spin_unlock_bh(>lock);
+
+err_remove_pte:
  #ifdef CONFIG_PROC_FS
proc_remove(c->pde);
  err:
  #endif
-   spin_lock_bh(>lock);
-   list_del_rcu(>list);
-   spin_unlock_bh(>lock);
kfree(c);
-
return ERR_PTR(err);
  }



--
2.14.3



--
You received this message because you are subscribed to the Google  
Groups "syzkaller-bugs" group.
To unsubscribe from this group and stop receiving emails from it, send an  
email to syzkaller-bugs+unsubscr...@googlegroups.com.
To view this discussion on the web visit  
https://groups.google.com/d/msgid/syzkaller-bugs/945c8517a87c671825b61223088064ea2ad0a8cb.1517999262.git.pabeni%40redhat.com.

For more options, visit https://groups.google.com/d/optout.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [nft PATCH] Enable automerge feature for anonymous sets

2018-02-07 Thread Phil Sutter
Hi Pablo,

On Wed, Feb 07, 2018 at 12:39:43AM +0100, Pablo Neira Ayuso wrote:
> On Tue, Feb 06, 2018 at 07:18:47PM +0100, Phil Sutter wrote:
> > Automatic merging of adjacent/overlapping ranges upon insertion has
> > clear benefits performance- and readability-wise. The drawbacks which
> > led to disabling it by default don't apply to anonymous sets since they
> > are read-only anyway, so enable this feature for them again.
> 
> Question is, why someone would be adding elements with overlapping
> ranges in an anonymous set? Then, when listing the ruleset you will
> get something different to what you've added. This would also be
> inconsistent with regards to the existing behaviour in named sets,
> where this is turned off by default.

Recapping the pros and cons of automatic merging the discussion brought
up (I probably miss some):

+ Overlaps may be hard to spot (and avoid) if they sit in different
  defines used to fill a single set.

- Items added to a set can't be removed again if they were merged with
  others added at the same time.

>From those, I assume that if one uses defines to manage e.g. white/black
lists, a named set is probably not used either. The con above applies
only to named sets since anonymous ones can't be changed. Therefore I
considered automatic merging happening in anonymous sets to be
desirable.

> For named sets, that are useful to maintain white/blacklists, I
> understand this simplifies complexity for people dealing with them.
> But not sure for anonymous sets.

I'm not entirely sure about that since there are multiple ways to see
things. AFAIR, Jeff's point was when he combines set elements from
different sources he doesn't want to check them manually for potential
overlaps (or even can't). So there automatic merging is a good thing. If
my fail2ban script can't delete a range previously added after it
timeouts, it fails and recovery is not trivial (find the merged element
and divide it again). Here, merging is a bad thing. Overall though, I
wonder whether using interval sets for white/black lists isn't a bad
idea at all. :)

> @Jeff: Is this also useful to you in the anonymous set use-case? IIRC
> we agreed that this was good for named sets, but not for anonymous
> sets.

I'm curious to hearing his thoughts as well.

Cheers, Phil
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [nft PATCH] Enable automerge feature for anonymous sets

2018-02-07 Thread Jozsef Kadlecsik
On Wed, 7 Feb 2018, Pablo Neira Ayuso wrote:

> On Tue, Feb 06, 2018 at 07:18:47PM +0100, Phil Sutter wrote:
> > Automatic merging of adjacent/overlapping ranges upon insertion has
> > clear benefits performance- and readability-wise. The drawbacks which
> > led to disabling it by default don't apply to anonymous sets since they
> > are read-only anyway, so enable this feature for them again.
> 
> Question is, why someone would be adding elements with overlapping 
> ranges in an anonymous set? Then, when listing the ruleset you will get 
> something different to what you've added. This would also be 
> inconsistent with regards to the existing behaviour in named sets, where 
> this is turned off by default.
> 
> For named sets, that are useful to maintain white/blacklists, I 
> understand this simplifies complexity for people dealing with them. But 
> not sure for anonymous sets.
> 
> @Jeff: Is this also useful to you in the anonymous set use-case? IIRC we 
> agreed that this was good for named sets, but not for anonymous sets.

In my opinion the consistent behaviour is the most desired one. Such 
subleties that by default there's no automerge in named sets but it's on 
for anonymous sets are easily overlooked by users. Better have a flag, 
option to turn it on explicitly for a given anonymous set.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Spelling fixes

2018-02-07 Thread Ville Skyttä
Signed-off-by: Ville Skyttä 
---
 INSTALL | 2 +-
 doc/nft.xml | 2 +-
 include/datatype.h  | 2 +-
 src/exthdr.c| 4 ++--
 src/netlink_delinearize.c   | 2 +-
 src/payload.c   | 4 ++--
 tests/py/nft-test.py| 2 +-
 tests/shell/README  | 2 +-
 tests/shell/testcases/include/0012glob_dependency_1 | 2 +-
 tests/shell/testcases/listing/0002ruleset_0 | 2 +-
 10 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/INSTALL b/INSTALL
index e212bac..9475711 100644
--- a/INSTALL
+++ b/INSTALL
@@ -43,7 +43,7 @@ Installation instructions for nftables
 
  --datarootdir=
 
-   The base directory for arch-independant files. Defaults to
+   The base directory for arch-independent files. Defaults to
$prefix/share.
 
  --disable-debug
diff --git a/doc/nft.xml b/doc/nft.xml
index d5b9c27..807c836 100644
--- a/doc/nft.xml
+++ b/doc/nft.xml
@@ -3568,7 +3568,7 @@ inet filter output rt ip6 nexthop fd00::1



vtag
-   
Verfication Tag
+   
Verification Tag
integer 
(32 bit)


diff --git a/include/datatype.h b/include/datatype.h
index e9f6079..cc4cb07 100644
--- a/include/datatype.h
+++ b/include/datatype.h
@@ -129,7 +129,7 @@ enum datatype_flags {
  * @subtypes:  number of subtypes (concat type)
  * @name:  type name
  * @desc:  type description
- * @basetype:  basetype for subtypes, determines type compatibilty
+ * @basetype:  basetype for subtypes, determines type compatibility
  * @basefmt:   format string for basetype
  * @print: function to print a constant of this type
  * @parse: function to parse a symbol and return an expression
diff --git a/src/exthdr.c b/src/exthdr.c
index ac3e163..f5c20ac 100644
--- a/src/exthdr.c
+++ b/src/exthdr.c
@@ -26,8 +26,8 @@
 static void exthdr_expr_print(const struct expr *expr, struct output_ctx *octx)
 {
if (expr->exthdr.op == NFT_EXTHDR_OP_TCPOPT) {
-   /* Offset calcualtion is a bit hacky at this point.
-* There might be an tcp option one day with another
+   /* Offset calculation is a bit hacky at this point.
+* There might be a tcp option one day with another
 * multiplicator
 */
unsigned int offset = expr->exthdr.offset / 64;
diff --git a/src/netlink_delinearize.c b/src/netlink_delinearize.c
index 256552b..6b66257 100644
--- a/src/netlink_delinearize.c
+++ b/src/netlink_delinearize.c
@@ -2173,7 +2173,7 @@ static void stmt_payload_postprocess(struct rule_pp_ctx 
*ctx)
 
 /*
  * We can only remove payload dependencies if they occur without
- * a statment with side effects in between.
+ * a statement with side effects in between.
  *
  * For instance:
  * 'ip protocol tcp tcp dport 22 counter' is same as
diff --git a/src/payload.c b/src/payload.c
index 60090ac..c036da6 100644
--- a/src/payload.c
+++ b/src/payload.c
@@ -429,13 +429,13 @@ void payload_dependency_store(struct payload_dep_ctx *ctx,
 }
 
 /**
- * __payload_dependency_kill - kill a redundant payload depedency
+ * __payload_dependency_kill - kill a redundant payload dependency
  *
  * @ctx: payload dependency context
  * @expr: higher layer payload expression
  *
  * Kill a redundant payload expression if a higher layer payload expression
- * implies its existance.
+ * implies its existence.
  */
 void __payload_dependency_kill(struct payload_dep_ctx *ctx,
   enum proto_bases base)
diff --git a/tests/py/nft-test.py b/tests/py/nft-test.py
index 6aedfb0..87e3d51 100755
--- a/tests/py/nft-test.py
+++ b/tests/py/nft-test.py
@@ -384,7 +384,7 @@ def set_add_elements(set_element, set_name, state, 
filename, lineno):
 print_error(reason, filename, lineno)
 return -1
 
-# Add element into a all_set.
+# Add element into all_set.
 if ret == 0 and state == "ok":
 for e in set_element:
 all_set[set_name].add(e)
diff --git a/tests/shell/README b/tests/shell/README
index 2d8681f..3ffe642 100644
--- a/tests/shell/README
+++ b/tests/shell/README
@@ -10,7 +10,7 @@ To run the test suite (as root):
 
 Test files are executables files with the pattern <>, where N is the
 expected return code of the executable. Since they are located 

[PATCH] netfilter: nf_flow_offload: fix use-after-free and a resource leak

2018-02-07 Thread Felix Fietkau
flow_offload_del frees the flow, so all associated resource must be
freed before.

Since the ct entry in struct flow_offload_entry was allocated by
flow_offload_alloc, it should be freed by flow_offload_free to take care
of the error handling path when flow_offload_add fails.

While at it, make flow_offload_del static, since it should never be
called directly, only from the gc step

Signed-off-by: Felix Fietkau 
---
 include/net/netfilter/nf_flow_table.h |  1 -
 net/netfilter/nf_flow_table.c | 27 +++
 2 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h 
b/include/net/netfilter/nf_flow_table.h
index 020ae903066f..833752dd0c58 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -90,7 +90,6 @@ struct flow_offload *flow_offload_alloc(struct nf_conn *ct,
 void flow_offload_free(struct flow_offload *flow);
 
 int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload 
*flow);
-void flow_offload_del(struct nf_flowtable *flow_table, struct flow_offload 
*flow);
 struct flow_offload_tuple_rhash *flow_offload_lookup(struct nf_flowtable 
*flow_table,
 struct flow_offload_tuple 
*tuple);
 int nf_flow_table_iterate(struct nf_flowtable *flow_table,
diff --git a/net/netfilter/nf_flow_table.c b/net/netfilter/nf_flow_table.c
index c17f1af42daa..ec410cae9307 100644
--- a/net/netfilter/nf_flow_table.c
+++ b/net/netfilter/nf_flow_table.c
@@ -125,7 +125,9 @@ void flow_offload_free(struct flow_offload *flow)
dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_cache);
dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_cache);
e = container_of(flow, struct flow_offload_entry, flow);
-   kfree(e);
+   nf_ct_delete(e->ct, 0, 0);
+   nf_ct_put(e->ct);
+   kfree_rcu(e, rcu_head);
 }
 EXPORT_SYMBOL_GPL(flow_offload_free);
 
@@ -149,11 +151,9 @@ int flow_offload_add(struct nf_flowtable *flow_table, 
struct flow_offload *flow)
 }
 EXPORT_SYMBOL_GPL(flow_offload_add);
 
-void flow_offload_del(struct nf_flowtable *flow_table,
- struct flow_offload *flow)
+static void flow_offload_del(struct nf_flowtable *flow_table,
+struct flow_offload *flow)
 {
-   struct flow_offload_entry *e;
-
rhashtable_remove_fast(_table->rhashtable,
   >tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
   *flow_table->type->params);
@@ -161,10 +161,8 @@ void flow_offload_del(struct nf_flowtable *flow_table,
   >tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
   *flow_table->type->params);
 
-   e = container_of(flow, struct flow_offload_entry, flow);
-   kfree_rcu(e, rcu_head);
+   flow_offload_free(flow);
 }
-EXPORT_SYMBOL_GPL(flow_offload_del);
 
 struct flow_offload_tuple_rhash *
 flow_offload_lookup(struct nf_flowtable *flow_table,
@@ -175,15 +173,6 @@ flow_offload_lookup(struct nf_flowtable *flow_table,
 }
 EXPORT_SYMBOL_GPL(flow_offload_lookup);
 
-static void nf_flow_release_ct(const struct flow_offload *flow)
-{
-   struct flow_offload_entry *e;
-
-   e = container_of(flow, struct flow_offload_entry, flow);
-   nf_ct_delete(e->ct, 0, 0);
-   nf_ct_put(e->ct);
-}
-
 int nf_flow_table_iterate(struct nf_flowtable *flow_table,
  void (*iter)(struct flow_offload *flow, void *data),
  void *data)
@@ -259,10 +248,8 @@ static int nf_flow_offload_gc_step(struct nf_flowtable 
*flow_table)
flow = container_of(tuplehash, struct flow_offload, 
tuplehash[0]);
 
if (nf_flow_has_expired(flow) ||
-   nf_flow_is_dying(flow)) {
+   nf_flow_is_dying(flow))
flow_offload_del(flow_table, flow);
-   nf_flow_release_ct(flow);
-   }
}
 out:
rhashtable_walk_stop();
-- 
2.14.2

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING: proc registration bug in clusterip_tg_check

2018-02-07 Thread Paolo Abeni
On Tue, 2018-02-06 at 22:42 -0800, Cong Wang wrote:
> On Tue, Feb 6, 2018 at 6:27 AM, syzbot
>  wrote:
> > Hello,
> > 
> > syzbot hit the following crash on net-next commit
> > 617aebe6a97efa539cc4b8a52adccd89596e6be0 (Sun Feb 4 00:25:42 2018 +)
> > Merge tag 'usercopy-v4.16-rc1' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
> > 
> > So far this crash happened 5 times on net-next, upstream.
> > C reproducer is attached.
> > syzkaller reproducer is attached.
> > Raw console output is attached.
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached.
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+03218bcdba6aa7644...@syzkaller.appspotmail.com
> > It will help syzbot understand when the bug is fixed. See footer for
> > details.
> > If you forward the report, please keep this part and the footer.
> > 
> > x_tables: ip_tables: osf match: only valid for protocol 6
> > x_tables: ip_tables: osf match: only valid for protocol 6
> > x_tables: ip_tables: osf match: only valid for protocol 6
> > [ cut here ]
> > proc_dir_entry 'ipt_CLUSTERIP/172.20.0.170' already registered
> > WARNING: CPU: 1 PID: 4152 at fs/proc/generic.c:330 proc_register+0x2a4/0x370
> > fs/proc/generic.c:329
> > Kernel panic - not syncing: panic_on_warn set ...
> > 
> > CPU: 1 PID: 4152 Comm: syzkaller851476 Not tainted 4.15.0+ #221
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:17 [inline]
> >  dump_stack+0x194/0x257 lib/dump_stack.c:53
> >  panic+0x1e4/0x41c kernel/panic.c:183
> >  __warn+0x1dc/0x200 kernel/panic.c:547
> >  report_bug+0x211/0x2d0 lib/bug.c:184
> >  fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
> >  fixup_bug arch/x86/kernel/traps.c:247 [inline]
> >  do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
> >  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
> >  invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1097
> > RIP: 0010:proc_register+0x2a4/0x370 fs/proc/generic.c:329
> > RSP: 0018:8801cbd6ee20 EFLAGS: 00010286
> > RAX: dc08 RBX: 8801d2181038 RCX: 815a57ae
> > RDX:  RSI: 1100397add74 RDI: 1100397add49
> > RBP: 8801cbd6ee70 R08: 1100397add0b R09: 
> > R10: 8801cbd6ecd8 R11:  R12: 8801b2bb1cc0
> > R13: dc00 R14: 8801b0d8dbc8 R15: 8801b2bb1d81
> >  proc_create_data+0xf8/0x180 fs/proc/generic.c:494
> >  clusterip_config_init net/ipv4/netfilter/ipt_CLUSTERIP.c:250 [inline]
> 
> I think there is probably a race condition between 
> clusterip_config_entry_put()
> and clusterip_config_init(), after we release the spinlock, a new proc
> with the same IP could be created therefore triggers this warning
> 
> I am not sure if it is enough to just move the proc_remove() under
> spinlock...

I *think* we should change the order on proc fs entry creation, 
because clusterip_config_init() can race with itself,
clusterip_config_init() returns NULL if the clusterip_config_init has
no pte, and currently such entry is inserted into the list with NULL
pte and the list lock itself is released before creating the PTE.

I'll try to test something the following:
---
diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 3a84a60f6b39..d8807c44cc61 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -230,17 +230,6 @@ clusterip_config_init(struct net *net, const struct 
ipt_clusterip_tgt_info *i,
refcount_set(>refcount, 1);
refcount_set(>entries, 1);
 
-   spin_lock_bh(>lock);
-   if (__clusterip_config_find(net, ip)) {
-   spin_unlock_bh(>lock);
-   kfree(c);
-
-   return ERR_PTR(-EBUSY);
-   }
-
-   list_add_rcu(>list, >configs);
-   spin_unlock_bh(>lock);
-
 #ifdef CONFIG_PROC_FS
{
char buffer[16];
@@ -257,6 +246,18 @@ clusterip_config_init(struct net *net, const struct 
ipt_clusterip_tgt_info *i,
}
 #endif
 
+   spin_lock_bh(>lock);
+   if (__clusterip_config_find(net, ip)) {
+   spin_unlock_bh(>lock);
+   kfree(c);
+
+   proc_remove(c->pde);
+   return ERR_PTR(-EBUSY);
+   }
+
+   list_add_rcu(>list, >configs);
+   spin_unlock_bh(>lock);
+
c->notifier.notifier_call = clusterip_netdev_event;
err = register_netdevice_notifier(>notifier);
if (!err)
---

Cheers,

Paolo
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html