Re: [PATCH] netfilter: nf_nat: Support fullcone NAT

2020-11-06 Thread Florian Westphal
Paul Menzel  wrote:
> From: Kiran Kella 
> 
> Changes done in the kernel to ensure 3-tuple uniqueness of the conntrack
> entries for the fullcone nat functionality.
> 
> *   Hashlist is maintained for the 3-tuple unique keys (Protocol/Source
> IP/Port) for all the conntrack entries.
> 
> *   When NAT table rules are created with the fullcone option, the
> SNAT/POSTROUTING stage ensures the ports from the pool are picked up in
> such a way that the 3-tuple is uniquely assigned.
> 
> *   In the DNAT/POSTROUTING stage, the fullcone behavior is ensured by 
> checking
> and reusing the 3-tuple for the Source IP/Port in the original direction.
> 
> *   When the pool is exhausted of the 3-tuple assignments, the packets are
> dropped, else, they will be going out of the router they being 5-tuple
> unique (which is not intended).
> 
> *   Passing fullcone option using iptables is part of another PR (in
> sonic-buildimage repo).

These are way too many changes for a single patch.
Please consider splitting this ino multiple chunks, e.g. at least
separate functional fullcone from the boilerplate changes.

> The kernel changes mentioned above are done to counter the challenges
> explained in the section *3.4.2.1 Handling NAT model mismatch between
> the ASIC and the Kernel* in the NAT HLD [1].

And please add the relevant explanations from this

> [1]: 
> https://github.com/kirankella/SONiC/blob/nat_doc_changes/doc/nat/nat_design_spec.md

... to the commit message.

> This is taken from switch network operating system (NOS) SONiC’s Linux
> repository, where the support was added in September 2019 [1], and
> forwarded ported to Linux 4.19 by Akhilesh in June 2020 [2].

> I am sending it upstream as a request for comments, before effort
> is put into forward porting it to Linux master.

I don't see any huge problems from a technical pov.
But I don't see why this functionality is needed from a pure SW
point of view.

AFAICS SONiC uses a propritary (or at least, custom) offload mechanism
to place nat entries into HW.

Netfilter already has a forwarding offload mechanism, described in
Documentation/networking/nf_flowtable.rst , so I'm not sure it makes
sense to accept this without patches to support the needed offload
support as well.

AFAIU passing fullcone makes no sense unless using offload HW that
doesn't support the current nat port allocation scheme.

And current kernel doesn't support any such HW.

Nevertheless, some comments below.

> +/* Is this 3-tuple already taken? (not by us)*/
> +int
> +nf_nat_used_3_tuple(const struct nf_conntrack_tuple *tuple,
> + const struct nf_conn *ignored_conntrack,
> + enum nf_nat_manip_type maniptype);
> +
>*/
> - void (*unique_tuple)(const struct nf_nat_l3proto *l3proto,
> + int  (*unique_tuple)(const struct nf_nat_l3proto *l3proto,
>struct nf_conntrack_tuple *tuple,
>const struct nf_nat_range2 *range,
>enum nf_nat_manip_type maniptype,

The above change should be done in a separate patch, so its
in a isolated change.  This will ease review of the fullcone part.

>  /* generate unique tuple ... */
> -static void
> +static int
>  gre_unique_tuple(const struct nf_nat_l3proto *l3proto,
>struct nf_conntrack_tuple *tuple,
>const struct nf_nat_range2 *range,
> @@ -52,7 +52,7 @@ gre_unique_tuple(const struct nf_nat_l3proto *l3proto,
>   /* If there is no master conntrack we are not PPTP,
>  do not change tuples */
>   if (!ct->master)
> - return;
> + return 0;
>  
>   if (maniptype == NF_NAT_MANIP_SRC)
>   keyptr = >src.u.gre.key;
> @@ -73,11 +73,11 @@ gre_unique_tuple(const struct nf_nat_l3proto *l3proto,
>   for (i = 0; ; ++key) {
>   *keyptr = htons(min + key % range_size);
>   if (++i == range_size || !nf_nat_used_tuple(tuple, ct))
> - return;
> + return 1;
>   }

I suggest you use 'bool' type for this rather than int, unless you plan
to use errno codes here in some future change.

> @@ -155,6 +156,31 @@ hash_by_src(const struct net *n, const struct 
> nf_conntrack_tuple *tuple)
>   return reciprocal_scale(hash, nf_nat_htable_size);
>  }
>  
> +static inline unsigned int
> +hash_by_dst(const struct net *n, const struct nf_conntrack_tuple *tuple)

please avoid inline keyword in .c files for new submissions.

> +{
> + unsigned int hash;
> +
> + get_random_once(_nat_hash_rnd, sizeof(nf_nat_hash_rnd));

get_random_once can't be called from multiple places for the same random
value.

[ I did not check if thats the case since patch doesn't apply to current
nf ].

> +static inline int
> +same_reply_dst(const struct nf_conn *ct,
> +const struct nf_conntrack_tuple *tuple)
> +{
> + const struct nf_conntrack_tuple *t;
> +
> + t = 

[PATCH] netfilter: nf_nat: Support fullcone NAT

2020-11-06 Thread Paul Menzel
From: Kiran Kella 

Changes done in the kernel to ensure 3-tuple uniqueness of the conntrack
entries for the fullcone nat functionality.

*   Hashlist is maintained for the 3-tuple unique keys (Protocol/Source
IP/Port) for all the conntrack entries.

*   When NAT table rules are created with the fullcone option, the
SNAT/POSTROUTING stage ensures the ports from the pool are picked up in
such a way that the 3-tuple is uniquely assigned.

*   In the DNAT/POSTROUTING stage, the fullcone behavior is ensured by checking
and reusing the 3-tuple for the Source IP/Port in the original direction.

*   When the pool is exhausted of the 3-tuple assignments, the packets are
dropped, else, they will be going out of the router they being 5-tuple
unique (which is not intended).

*   Passing fullcone option using iptables is part of another PR (in
sonic-buildimage repo).

The kernel changes mentioned above are done to counter the challenges
explained in the section *3.4.2.1 Handling NAT model mismatch between
the ASIC and the Kernel* in the NAT HLD [1].

[1]: 
https://github.com/kirankella/SONiC/blob/nat_doc_changes/doc/nat/nat_design_spec.md

[Add to SONiC in https://github.com/Azure/sonic-linux-kernel/pull/100]
Signed-off-by: Kiran Kella 
[forward port to Linux v4.19, 
https://github.com/Azure/sonic-linux-kernel/pull/147]
Signed-off-by: Akhilesh Samineni 
Signed-off-by: Paul Menzel 
---
Dear Linux folks,


This is taken from switch network operating system (NOS) SONiC’s Linux
repository, where the support was added in September 2019 [1], and
forwarded ported to Linux 4.19 by Akhilesh in June 2020 [2].

I am sending it upstream as a request for comments, before effort is put
into forward porting it to Linux master.


Kind regards,

Paul 


[1]: https://github.com/Azure/sonic-linux-kernel/pull/100
[2]: https://github.com/Azure/sonic-linux-kernel/pull/147

 include/net/netfilter/nf_conntrack.h |   3 +
 include/net/netfilter/nf_nat.h   |   6 +
 include/net/netfilter/nf_nat_l4proto.h   |  12 +-
 include/uapi/linux/netfilter/nf_nat.h|   1 +
 net/ipv4/netfilter/nf_nat_proto_gre.c|   8 +-
 net/ipv4/netfilter/nf_nat_proto_icmp.c   |   6 +-
 net/ipv6/netfilter/nf_nat_proto_icmpv6.c |   5 +-
 net/netfilter/nf_nat_core.c  | 173 ---
 net/netfilter/nf_nat_proto_common.c  |  32 +++--
 net/netfilter/nf_nat_proto_dccp.c|   6 +-
 net/netfilter/nf_nat_proto_sctp.c|   6 +-
 net/netfilter/nf_nat_proto_tcp.c |   6 +-
 net/netfilter/nf_nat_proto_udp.c |  12 +-
 net/netfilter/nf_nat_proto_unknown.c |   4 +-
 14 files changed, 220 insertions(+), 60 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index f45141bdbb83..64b9293a31f6 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -84,6 +84,9 @@ struct nf_conn {
 #if IS_ENABLED(CONFIG_NF_NAT)
struct hlist_node   nat_bysource;
 #endif
+/* To optionally ensure 3-tuple uniqueness on the translated source */
+struct hlist_node   nat_by_manip_src;
+
/* all members below initialized via memset */
u8 __nfct_init_offset[0];
 
diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index a17eb2f8d40e..7c3cc3c7b35f 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -51,6 +51,12 @@ struct nf_conn_nat *nf_ct_nat_ext_add(struct nf_conn *ct);
 int nf_nat_used_tuple(const struct nf_conntrack_tuple *tuple,
  const struct nf_conn *ignored_conntrack);
 
+/* Is this 3-tuple already taken? (not by us)*/
+int
+nf_nat_used_3_tuple(const struct nf_conntrack_tuple *tuple,
+   const struct nf_conn *ignored_conntrack,
+   enum nf_nat_manip_type maniptype);
+
 static inline struct nf_conn_nat *nfct_nat(const struct nf_conn *ct)
 {
 #if defined(CONFIG_NF_NAT) || defined(CONFIG_NF_NAT_MODULE)
diff --git a/include/net/netfilter/nf_nat_l4proto.h 
b/include/net/netfilter/nf_nat_l4proto.h
index b4d6b29bca62..fbcbb9ad9e4b 100644
--- a/include/net/netfilter/nf_nat_l4proto.h
+++ b/include/net/netfilter/nf_nat_l4proto.h
@@ -32,7 +32,7 @@ struct nf_nat_l4proto {
 * possible.  Per-protocol part of tuple is initialized to the
 * incoming packet.
 */
-   void (*unique_tuple)(const struct nf_nat_l3proto *l3proto,
+   int  (*unique_tuple)(const struct nf_nat_l3proto *l3proto,
 struct nf_conntrack_tuple *tuple,
 const struct nf_nat_range2 *range,
 enum nf_nat_manip_type maniptype,
@@ -70,11 +70,11 @@ bool nf_nat_l4proto_in_range(const struct 
nf_conntrack_tuple *tuple,
 const union nf_conntrack_man_proto *min,
 const union nf_conntrack_man_proto *max);
 
-void nf_nat_l4proto_unique_tuple(const struct