Re: [PATCH nf-next v2 1/3] netfilter: bridge: add and use br_nf_hook_thresh

2016-07-14 Thread Aaron Conole
Pablo Neira Ayuso  writes:

> On Tue, Jul 12, 2016 at 11:32:19AM -0400, Aaron Conole wrote:
>> +/* recursively invokes nf_hook_slow (again), skipping already-called
>> + * hooks (< NF_BR_PRI_BRNF).
>> + *
>> + * Called with rcu read lock held.
>> + */
>> +int br_nf_hook_thresh(unsigned int hook, struct net *net,
>> +  struct sock *sk, struct sk_buff *skb,
>> +  struct net_device *indev,
>> +  struct net_device *outdev,
>> +  int (*okfn)(struct net *, struct sock *,
>> +  struct sk_buf *))
>  ^^
>
> This doesn't compile. Please, make sure patches compile independently
> so we make sure git bisectability doesn't break. Thanks.

Okay, I will make sure to do this for all the patches.  Apologies.

Thanks for the review.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/3] netfilter: replace list_head with single linked list

2016-07-14 Thread Pablo Neira Ayuso
On Tue, Jul 12, 2016 at 11:32:21AM -0400, Aaron Conole wrote:
> The netfilter hook list never uses the prev pointer, and so can be
> trimmed to be a smaller singly-linked list.
> 
> In addition to having a more light weight structure for hook traversal,
> struct net becomes 5568 bytes (down from 6400) and struct net_device
> becomes 2176 bytes (down from 2240).
> 
> Signed-off-by: Aaron Conole 
> Signed-off-by: Florian Westphal 
> ---
> v2:
> * Adjusted the hook list head function, and retested with rcu and lockdep
>   debugging enabled.
> 
>  include/linux/netdevice.h |   2 +-
>  include/linux/netfilter.h |  18 +++---
>  include/linux/netfilter_ingress.h |  14 +++--
>  include/net/netfilter/nf_queue.h  |   9 ++-
>  include/net/netns/netfilter.h |   2 +-
>  net/bridge/br_netfilter_hooks.c   |  21 +++
>  net/netfilter/core.c  | 127 
> --
>  net/netfilter/nf_internals.h  |  10 +--
>  net/netfilter/nf_queue.c  |  15 +++--
>  net/netfilter/nfnetlink_queue.c   |   5 +-
>  10 files changed, 130 insertions(+), 93 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 49736a3..9da07ad 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1749,7 +1749,7 @@ struct net_device {
>  #endif
>   struct netdev_queue __rcu *ingress_queue;
>  #ifdef CONFIG_NETFILTER_INGRESS
> - struct list_headnf_hooks_ingress;
> + struct nf_hook_entry __rcu *nf_hooks_ingress;
>  #endif
>  
>   unsigned char   broadcast[MAX_ADDR_LEN];
> diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
> index ad444f0..3390a84 100644
> --- a/include/linux/netfilter.h
> +++ b/include/linux/netfilter.h
> @@ -55,12 +55,12 @@ struct nf_hook_state {
>   struct net_device *out;
>   struct sock *sk;
>   struct net *net;
> - struct list_head *hook_list;
> + struct nf_hook_entry *hook_list;
>   int (*okfn)(struct net *, struct sock *, struct sk_buff *);
>  };
>  
>  static inline void nf_hook_state_init(struct nf_hook_state *p,
> -   struct list_head *hook_list,
> +   struct nf_hook_entry *hook_list,
> unsigned int hook,
> int thresh, u_int8_t pf,
> struct net_device *indev,
> @@ -97,6 +97,12 @@ struct nf_hook_ops {
>   int priority;
>  };
>  
> +struct nf_hook_entry {
> + struct nf_hook_entry __rcu  *next;
> + struct nf_hook_ops  ops;
> + const struct nf_hook_ops*orig_ops;
> +};
> +
>  struct nf_sockopt_ops {
>   struct list_head list;
>  
> @@ -161,8 +167,6 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned 
> int hook,
>int (*okfn)(struct net *, struct sock *, 
> struct sk_buff *),
>int thresh)
>  {
> - struct list_head *hook_list;
> -
>  #ifdef HAVE_JUMP_LABEL
>   if (__builtin_constant_p(pf) &&
>   __builtin_constant_p(hook) &&
> @@ -170,14 +174,14 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned 
> int hook,
>   return 1;
>  #endif
>  
> - hook_list = >nf.hooks[pf][hook];
> -

You have to place rcu_read_lock() here, see below.

> - if (!list_empty(hook_list)) {
> + if (rcu_access_pointer(net->nf.hooks[pf][hook])) {

This check above is out of the rcu read-side section, here this may
evaluate true...

> + struct nf_hook_entry *hook_list;
>   struct nf_hook_state state;
>   int ret;
>  
>   /* We may already have this, but read-locks nest anyway */
>   rcu_read_lock();
> + hook_list = rcu_dereference(net->nf.hooks[pf][hook]);

... but then, net->nf.hooks[pf][hook]) may have become NULL, I guess
this race will result in a crash.

General note on this patchset: With linked-lists, it was always true
that net->nf.hooks[pf][hook] is non-NULL since this was pointing to
the list head. After this patch this no longer true, that means we
have to be more careful ;).

>   nf_hook_state_init(, hook_list, hook, thresh,
>  pf, indev, outdev, sk, net, okfn);
>  
> diff --git a/include/linux/netfilter_ingress.h 
> b/include/linux/netfilter_ingress.h
> index 6965ba0..e3e3f6d 100644
> --- a/include/linux/netfilter_ingress.h
> +++ b/include/linux/netfilter_ingress.h
> @@ -11,23 +11,27 @@ static inline bool nf_hook_ingress_active(const struct 
> sk_buff *skb)
>   if 
> (!static_key_false(_hooks_needed[NFPROTO_NETDEV][NF_NETDEV_INGRESS]))
>   return false;
>  #endif
> - return !list_empty(>dev->nf_hooks_ingress);
> + return rcu_access_pointer(skb->dev->nf_hooks_ingress) != NULL;
>  }
>  
>  /* caller must hold rcu_read_lock */
>  static 

Re: [PATCH nf-next v2 2/3] netfilter: call nf_hook_state_init with rcu_read_lock held

2016-07-14 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> > diff --git a/net/bridge/netfilter/ebt_redirect.c 
> > b/net/bridge/netfilter/ebt_redirect.c
> > index 20396499..2e7c4f9 100644
> > --- a/net/bridge/netfilter/ebt_redirect.c
> > +++ b/net/bridge/netfilter/ebt_redirect.c
> > @@ -24,7 +24,7 @@ ebt_redirect_tg(struct sk_buff *skb, const struct 
> > xt_action_param *par)
> > return EBT_DROP;
> >  
> > if (par->hooknum != NF_BR_BROUTING)
> > -   /* rcu_read_lock()ed by nf_hook_slow */
> > +   /* rcu_read_lock()ed by nf_hook_thresh */
> 
> Why are all these comments being renamed in this patch? This patch
> description doesn't say anything about this.

Aaron, I'm sorry about this.

I suggest to do this renaming in a different patch, i.e.
keep the rcu_read_lock in nf_hook_slow in this patch.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf-next v2 2/3] netfilter: call nf_hook_state_init with rcu_read_lock held

2016-07-14 Thread Pablo Neira Ayuso
On Tue, Jul 12, 2016 at 11:32:20AM -0400, Aaron Conole wrote:
> From: Florian Westphal 
> 
> This makes things simpler because we can store the head of the list
> in the nf_state structure without worrying about concurrent add/delete
> of hook elements from the list.

This is something that you need for your follow up patch, right? Then
it would be good to document this here.

More comments below.

> diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
> index 9230f9a..ad444f0 100644
> --- a/include/linux/netfilter.h
> +++ b/include/linux/netfilter.h
> @@ -174,10 +174,16 @@ static inline int nf_hook_thresh(u_int8_t pf, unsigned 
> int hook,
>  
>   if (!list_empty(hook_list)) {
>   struct nf_hook_state state;
> + int ret;
>  
> + /* We may already have this, but read-locks nest anyway */
> + rcu_read_lock();
>   nf_hook_state_init(, hook_list, hook, thresh,
>  pf, indev, outdev, sk, net, okfn);
> - return nf_hook_slow(skb, );
> +
> + ret = nf_hook_slow(skb, );
> + rcu_read_unlock();
> + return ret;
>   }
>   return 1;
>  }
> diff --git a/include/linux/netfilter_ingress.h 
> b/include/linux/netfilter_ingress.h
> index 5fcd375..6965ba0 100644
> --- a/include/linux/netfilter_ingress.h
> +++ b/include/linux/netfilter_ingress.h
> @@ -14,6 +14,7 @@ static inline bool nf_hook_ingress_active(const struct 
> sk_buff *skb)
>   return !list_empty(>dev->nf_hooks_ingress);
>  }
>  
> +/* caller must hold rcu_read_lock */
>  static inline int nf_hook_ingress(struct sk_buff *skb)
>  {
>   struct nf_hook_state state;
> diff --git a/net/bridge/netfilter/ebt_redirect.c 
> b/net/bridge/netfilter/ebt_redirect.c
> index 20396499..2e7c4f9 100644
> --- a/net/bridge/netfilter/ebt_redirect.c
> +++ b/net/bridge/netfilter/ebt_redirect.c
> @@ -24,7 +24,7 @@ ebt_redirect_tg(struct sk_buff *skb, const struct 
> xt_action_param *par)
>   return EBT_DROP;
>  
>   if (par->hooknum != NF_BR_BROUTING)
> - /* rcu_read_lock()ed by nf_hook_slow */
> + /* rcu_read_lock()ed by nf_hook_thresh */

Why are all these comments being renamed in this patch? This patch
description doesn't say anything about this.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nf-next v2 1/3] netfilter: bridge: add and use br_nf_hook_thresh

2016-07-14 Thread Pablo Neira Ayuso
On Tue, Jul 12, 2016 at 11:32:19AM -0400, Aaron Conole wrote:
> +/* recursively invokes nf_hook_slow (again), skipping already-called
> + * hooks (< NF_BR_PRI_BRNF).
> + *
> + * Called with rcu read lock held.
> + */
> +int br_nf_hook_thresh(unsigned int hook, struct net *net,
> +   struct sock *sk, struct sk_buff *skb,
> +   struct net_device *indev,
> +   struct net_device *outdev,
> +   int (*okfn)(struct net *, struct sock *,
> +   struct sk_buf *))
 ^^

This doesn't compile. Please, make sure patches compile independently
so we make sure git bisectability doesn't break. Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 nf] netfilter: x_tables: speed up jump target validation

2016-07-14 Thread Florian Westphal
Florian Westphal  wrote:
> The dummy ruleset I used to test the original validation change was broken,
> most rules were unreachable and were not tested by mark_source_chains().

...

I will send a v3 to also include arptables.
I thought arptables was irrelevant since arptable rulesets are usually
very small but I forgot about DoS angle (we use single mutext for all
net nsamespaces).
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nft 2/3] meta: add short-hand mnemonic for probalistic matching

2016-07-14 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> But if the user introduces a meta random value that can be mapped to
> probability datatype, we would still hit this asymmetry, right? So the
> guess game would fail and the user would get confused.

Yes, but thats not really different from what we do with dependency
removal, e.g. with 'ip protocol tcp tcp dport 22', the 'ip protocol tcp'
is still elided from list output since its redundant.

> > Nothing, but the meta random might be interesting to e.g. set random
> > (ct)mark for load balancing purposes.
> 
> Could you have a look at the libnftnl userdata tlv infrastructure? We
> can probably place this information the RULE_USERDATA so we provide an
> explicit indication to userspace of how to interpret this.  Currently
> this is only used for rule comments, but we can stash this
> how-to-interpret-this information there.

Sure, I will have a look.  It might take a while though.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nft 2/3] meta: add short-hand mnemonic for probalistic matching

2016-07-14 Thread Pablo Neira Ayuso
On Thu, Jul 14, 2016 at 12:52:18PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso  wrote:
> > On Tue, Jul 05, 2016 at 09:35:34AM +0200, Florian Westphal wrote:
> > > Allow users to use a simpler way to specify probalistic matching, e. g.:
> > > 
> > > meta probability 0.5  (match approx. every 2nd packet)
> > > meta probability 0.001(match approx. once every 1000 packets)
> > > 
> > > nft list will still show
> > > meta random <= 2147483647
> > > meta random <= 4294967
> > 
> > I don't like this asymmetry.
> 
> Its changed in patch #3 when adding the shorthand reverse
> translation.

But if the user introduces a meta random value that can be mapped to
probability datatype, we would still hit this asymmetry, right? So the
guess game would fail and the user would get confused.

> > What is the usecase for 'meta random' out of this probability case that
> > maps to what xt_statistics offers?
> 
> Nothing, but the meta random might be interesting to e.g. set random
> (ct)mark for load balancing purposes.

Could you have a look at the libnftnl userdata tlv infrastructure? We
can probably place this information the RULE_USERDATA so we provide an
explicit indication to userspace of how to interpret this.  Currently
this is only used for rule comments, but we can stash this
how-to-interpret-this information there.

The idea is to keep this information around as context in the
delinearize step, so we can replace the default datatype that is
assigned to the one that displays this as a probability from the
rule_parse_postprocess() phase.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nft 2/3] meta: add short-hand mnemonic for probalistic matching

2016-07-14 Thread Florian Westphal
Pablo Neira Ayuso  wrote:
> On Tue, Jul 05, 2016 at 09:35:34AM +0200, Florian Westphal wrote:
> > Allow users to use a simpler way to specify probalistic matching, e. g.:
> > 
> > meta probability 0.5(match approx. every 2nd packet)
> > meta probability 0.001  (match approx. once every 1000 packets)
> > 
> > nft list will still show
> > meta random <= 2147483647
> > meta random <= 4294967
> 
> I don't like this asymmetry.

Its changed in patch #3 when adding the shorthand reverse translation.

> What is the usecase for 'meta random' out of this probability case that
> maps to what xt_statistics offers?

Nothing, but the meta random might be interesting to e.g. set random
(ct)mark for load balancing purposes.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nft 2/3] meta: add short-hand mnemonic for probalistic matching

2016-07-14 Thread Pablo Neira Ayuso
On Tue, Jul 05, 2016 at 09:35:34AM +0200, Florian Westphal wrote:
> Allow users to use a simpler way to specify probalistic matching, e. g.:
> 
> meta probability 0.5  (match approx. every 2nd packet)
> meta probability 0.001(match approx. once every 1000 packets)
> 
> nft list will still show
> meta random <= 2147483647
> meta random <= 4294967

I don't like this asymmetry.

What is the usecase for 'meta random' out of this probability case that
maps to what xt_statistics offers?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html