Re: [PATCH] netfilter: nf_nat: Support fullcone NAT
Paul Menzel wrote: > From: Kiran Kella > > Changes done in the kernel to ensure 3-tuple uniqueness of the conntrack > entries for the fullcone nat functionality. > > * Hashlist is maintained for the 3-tuple unique keys (Protocol/Source > IP/Port) for all the conntrack entries. > > * When NAT table rules are created with the fullcone option, the > SNAT/POSTROUTING stage ensures the ports from the pool are picked up in > such a way that the 3-tuple is uniquely assigned. > > * In the DNAT/POSTROUTING stage, the fullcone behavior is ensured by > checking > and reusing the 3-tuple for the Source IP/Port in the original direction. > > * When the pool is exhausted of the 3-tuple assignments, the packets are > dropped, else, they will be going out of the router they being 5-tuple > unique (which is not intended). > > * Passing fullcone option using iptables is part of another PR (in > sonic-buildimage repo). These are way too many changes for a single patch. Please consider splitting this ino multiple chunks, e.g. at least separate functional fullcone from the boilerplate changes. > The kernel changes mentioned above are done to counter the challenges > explained in the section *3.4.2.1 Handling NAT model mismatch between > the ASIC and the Kernel* in the NAT HLD [1]. And please add the relevant explanations from this > [1]: > https://github.com/kirankella/SONiC/blob/nat_doc_changes/doc/nat/nat_design_spec.md ... to the commit message. > This is taken from switch network operating system (NOS) SONiC’s Linux > repository, where the support was added in September 2019 [1], and > forwarded ported to Linux 4.19 by Akhilesh in June 2020 [2]. > I am sending it upstream as a request for comments, before effort > is put into forward porting it to Linux master. I don't see any huge problems from a technical pov. But I don't see why this functionality is needed from a pure SW point of view. AFAICS SONiC uses a propritary (or at least, custom) offload mechanism to place nat entries into HW. Netfilter already has a forwarding offload mechanism, described in Documentation/networking/nf_flowtable.rst , so I'm not sure it makes sense to accept this without patches to support the needed offload support as well. AFAIU passing fullcone makes no sense unless using offload HW that doesn't support the current nat port allocation scheme. And current kernel doesn't support any such HW. Nevertheless, some comments below. > +/* Is this 3-tuple already taken? (not by us)*/ > +int > +nf_nat_used_3_tuple(const struct nf_conntrack_tuple *tuple, > + const struct nf_conn *ignored_conntrack, > + enum nf_nat_manip_type maniptype); > + >*/ > - void (*unique_tuple)(const struct nf_nat_l3proto *l3proto, > + int (*unique_tuple)(const struct nf_nat_l3proto *l3proto, >struct nf_conntrack_tuple *tuple, >const struct nf_nat_range2 *range, >enum nf_nat_manip_type maniptype, The above change should be done in a separate patch, so its in a isolated change. This will ease review of the fullcone part. > /* generate unique tuple ... */ > -static void > +static int > gre_unique_tuple(const struct nf_nat_l3proto *l3proto, >struct nf_conntrack_tuple *tuple, >const struct nf_nat_range2 *range, > @@ -52,7 +52,7 @@ gre_unique_tuple(const struct nf_nat_l3proto *l3proto, > /* If there is no master conntrack we are not PPTP, > do not change tuples */ > if (!ct->master) > - return; > + return 0; > > if (maniptype == NF_NAT_MANIP_SRC) > keyptr = >src.u.gre.key; > @@ -73,11 +73,11 @@ gre_unique_tuple(const struct nf_nat_l3proto *l3proto, > for (i = 0; ; ++key) { > *keyptr = htons(min + key % range_size); > if (++i == range_size || !nf_nat_used_tuple(tuple, ct)) > - return; > + return 1; > } I suggest you use 'bool' type for this rather than int, unless you plan to use errno codes here in some future change. > @@ -155,6 +156,31 @@ hash_by_src(const struct net *n, const struct > nf_conntrack_tuple *tuple) > return reciprocal_scale(hash, nf_nat_htable_size); > } > > +static inline unsigned int > +hash_by_dst(const struct net *n, const struct nf_conntrack_tuple *tuple) please avoid inline keyword in .c files for new submissions. > +{ > + unsigned int hash; > + > + get_random_once(_nat_hash_rnd, sizeof(nf_nat_hash_rnd)); get_random_once can't be called from multiple places for the same random value. [ I did not check if thats the case since patch doesn't apply to current nf ]. > +static inline int > +same_reply_dst(const struct nf_conn *ct, > +const struct nf_conntrack_tuple *tuple) > +{ > + const struct nf_conntrack_tuple *t; > + > + t =
[PATCH] netfilter: nf_nat: Support fullcone NAT
From: Kiran Kella Changes done in the kernel to ensure 3-tuple uniqueness of the conntrack entries for the fullcone nat functionality. * Hashlist is maintained for the 3-tuple unique keys (Protocol/Source IP/Port) for all the conntrack entries. * When NAT table rules are created with the fullcone option, the SNAT/POSTROUTING stage ensures the ports from the pool are picked up in such a way that the 3-tuple is uniquely assigned. * In the DNAT/POSTROUTING stage, the fullcone behavior is ensured by checking and reusing the 3-tuple for the Source IP/Port in the original direction. * When the pool is exhausted of the 3-tuple assignments, the packets are dropped, else, they will be going out of the router they being 5-tuple unique (which is not intended). * Passing fullcone option using iptables is part of another PR (in sonic-buildimage repo). The kernel changes mentioned above are done to counter the challenges explained in the section *3.4.2.1 Handling NAT model mismatch between the ASIC and the Kernel* in the NAT HLD [1]. [1]: https://github.com/kirankella/SONiC/blob/nat_doc_changes/doc/nat/nat_design_spec.md [Add to SONiC in https://github.com/Azure/sonic-linux-kernel/pull/100] Signed-off-by: Kiran Kella [forward port to Linux v4.19, https://github.com/Azure/sonic-linux-kernel/pull/147] Signed-off-by: Akhilesh Samineni Signed-off-by: Paul Menzel --- Dear Linux folks, This is taken from switch network operating system (NOS) SONiCâs Linux repository, where the support was added in September 2019 [1], and forwarded ported to Linux 4.19 by Akhilesh in June 2020 [2]. I am sending it upstream as a request for comments, before effort is put into forward porting it to Linux master. Kind regards, Paul [1]: https://github.com/Azure/sonic-linux-kernel/pull/100 [2]: https://github.com/Azure/sonic-linux-kernel/pull/147 include/net/netfilter/nf_conntrack.h | 3 + include/net/netfilter/nf_nat.h | 6 + include/net/netfilter/nf_nat_l4proto.h | 12 +- include/uapi/linux/netfilter/nf_nat.h| 1 + net/ipv4/netfilter/nf_nat_proto_gre.c| 8 +- net/ipv4/netfilter/nf_nat_proto_icmp.c | 6 +- net/ipv6/netfilter/nf_nat_proto_icmpv6.c | 5 +- net/netfilter/nf_nat_core.c | 173 --- net/netfilter/nf_nat_proto_common.c | 32 +++-- net/netfilter/nf_nat_proto_dccp.c| 6 +- net/netfilter/nf_nat_proto_sctp.c| 6 +- net/netfilter/nf_nat_proto_tcp.c | 6 +- net/netfilter/nf_nat_proto_udp.c | 12 +- net/netfilter/nf_nat_proto_unknown.c | 4 +- 14 files changed, 220 insertions(+), 60 deletions(-) diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index f45141bdbb83..64b9293a31f6 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -84,6 +84,9 @@ struct nf_conn { #if IS_ENABLED(CONFIG_NF_NAT) struct hlist_node nat_bysource; #endif +/* To optionally ensure 3-tuple uniqueness on the translated source */ +struct hlist_node nat_by_manip_src; + /* all members below initialized via memset */ u8 __nfct_init_offset[0]; diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h index a17eb2f8d40e..7c3cc3c7b35f 100644 --- a/include/net/netfilter/nf_nat.h +++ b/include/net/netfilter/nf_nat.h @@ -51,6 +51,12 @@ struct nf_conn_nat *nf_ct_nat_ext_add(struct nf_conn *ct); int nf_nat_used_tuple(const struct nf_conntrack_tuple *tuple, const struct nf_conn *ignored_conntrack); +/* Is this 3-tuple already taken? (not by us)*/ +int +nf_nat_used_3_tuple(const struct nf_conntrack_tuple *tuple, + const struct nf_conn *ignored_conntrack, + enum nf_nat_manip_type maniptype); + static inline struct nf_conn_nat *nfct_nat(const struct nf_conn *ct) { #if defined(CONFIG_NF_NAT) || defined(CONFIG_NF_NAT_MODULE) diff --git a/include/net/netfilter/nf_nat_l4proto.h b/include/net/netfilter/nf_nat_l4proto.h index b4d6b29bca62..fbcbb9ad9e4b 100644 --- a/include/net/netfilter/nf_nat_l4proto.h +++ b/include/net/netfilter/nf_nat_l4proto.h @@ -32,7 +32,7 @@ struct nf_nat_l4proto { * possible. Per-protocol part of tuple is initialized to the * incoming packet. */ - void (*unique_tuple)(const struct nf_nat_l3proto *l3proto, + int (*unique_tuple)(const struct nf_nat_l3proto *l3proto, struct nf_conntrack_tuple *tuple, const struct nf_nat_range2 *range, enum nf_nat_manip_type maniptype, @@ -70,11 +70,11 @@ bool nf_nat_l4proto_in_range(const struct nf_conntrack_tuple *tuple, const union nf_conntrack_man_proto *min, const union nf_conntrack_man_proto *max); -void nf_nat_l4proto_unique_tuple(const struct