On Wed, Nov 23, 2022 at 04:34:45PM -0500, Xin Long wrote:
> On Wed, Nov 23, 2022 at 4:21 PM Marcelo Ricardo Leitner
> <marcelo.leit...@gmail.com> wrote:
> >
> > On Wed, Nov 23, 2022 at 02:55:05PM -0500, Xin Long wrote:
> > > On Wed, Nov 23, 2022 at 2:17 PM Marcelo Ricardo Leitner
> > > <marcelo.leit...@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 23, 2022 at 01:54:41PM -0500, Xin Long wrote:
> > > > > On Wed, Nov 23, 2022 at 1:48 PM Marcelo Ricardo Leitner
> > > > > <marcelo.leit...@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Nov 23, 2022 at 12:31:38PM -0500, Xin Long wrote:
> > > > > > > On Wed, Nov 23, 2022 at 10:13 AM Marcelo Ricardo Leitner
> > > > > > > <marcelo.leit...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Nov 23, 2022 at 12:09:55PM -0300, Marcelo Ricardo Leitner wrote:
> > > > > > > > > On Tue, Nov 22, 2022 at 12:32:21PM -0500, Xin Long wrote:
> > > > > > > > > > +int nf_ct_nat(struct sk_buff *skb, struct nf_conn *ct,
> > > > > > > > > > +	      enum ip_conntrack_info ctinfo, int *action,
> > > > > > > > > > +	      const struct nf_nat_range2 *range, bool commit)
> > > > > > > > > > +{
> > > > > > > > > > +	enum nf_nat_manip_type maniptype;
> > > > > > > > > > +	int err, ct_action = *action;
> > > > > > > > > > +
> > > > > > > > > > +	*action = 0;
> > > > > > > > > > +
> > > > > > > > > > +	/* Add NAT extension if not confirmed yet. */
> > > > > > > > > > +	if (!nf_ct_is_confirmed(ct) && !nf_ct_nat_ext_add(ct))
> > > > > > > > > > +		return NF_ACCEPT;   /* Can't NAT. */
> > > > > > > > > > +
> > > > > > > > > > +	if (ctinfo != IP_CT_NEW && (ct->status & IPS_NAT_MASK) &&
> > > > > > > > > > +	    (ctinfo != IP_CT_RELATED || commit)) {
> > > > > > > > > > +		/* NAT an established or related connection like before. */
> > > > > > > > > > +		if (CTINFO2DIR(ctinfo) == IP_CT_DIR_REPLY)
> > > > > > > > > > +			/* This is the REPLY direction for a connection
> > > > > > > > > > +			 * for which NAT was applied in the forward
> > > > > > > > > > +			 * direction. Do the reverse NAT.
> > > > > > > > > > +			 */
> > > > > > > > > > +			maniptype = ct->status & IPS_SRC_NAT
> > > > > > > > > > +				? NF_NAT_MANIP_DST : NF_NAT_MANIP_SRC;
> > > > > > > > > > +		else
> > > > > > > > > > +			maniptype = ct->status & IPS_SRC_NAT
> > > > > > > > > > +				? NF_NAT_MANIP_SRC : NF_NAT_MANIP_DST;
> > > > > > > > > > +	} else if (ct_action & (1 << NF_NAT_MANIP_SRC)) {
> > > > > > > > > > +		maniptype = NF_NAT_MANIP_SRC;
> > > > > > > > > > +	} else if (ct_action & (1 << NF_NAT_MANIP_DST)) {
> > > > > > > > > > +		maniptype = NF_NAT_MANIP_DST;
> > > > > > > > > > +	} else {
> > > > > > > > > > +		return NF_ACCEPT;
> > > > > > > > > > +	}
> > > > > > > > > > +
> > > > > > > > > > +	err = nf_ct_nat_execute(skb, ct, ctinfo, action, range, maniptype);
> > > > > > > > > > +	if (err == NF_ACCEPT && ct->status & IPS_DST_NAT) {
> > > > > > > > > > +		if (ct->status & IPS_SRC_NAT) {
> > > > > > > > > > +			if (maniptype == NF_NAT_MANIP_SRC)
> > > > > > > > > > +				maniptype = NF_NAT_MANIP_DST;
> > > > > > > > > > +			else
> > > > > > > > > > +				maniptype = NF_NAT_MANIP_SRC;
> > > > > > > > > > +
> > > > > > > > > > +			err = nf_ct_nat_execute(skb, ct, ctinfo, action, range,
> > > > > > > > > > +						maniptype);
> > > > > > > > > > +		} else if (CTINFO2DIR(ctinfo) == IP_CT_DIR_ORIGINAL) {
> > > > > > > > > > +			err = nf_ct_nat_execute(skb, ct, ctinfo, action, NULL,
> > > > > > > > > > +						NF_NAT_MANIP_SRC);
> > > > > > > > > > +		}
> > > > > > > > > > +	}
> > > > > > > > > > +	return err;
> > > > > > > > > > +}
> > > > > > > > > > +EXPORT_SYMBOL_GPL(nf_ct_nat);
> > > > > > > > > > diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
> > > > > > > > > > index cc643a556ea1..d03c75165663 100644
> > > > > > > > > > --- a/net/openvswitch/conntrack.c
> > > > > > > > > > +++ b/net/openvswitch/conntrack.c
> > > > > > > > > > @@ -726,144 +726,27 @@ static void ovs_nat_update_key(struct sw_flow_key *key,
> > > > > > > > > >  	}
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > -/* Modelled after nf_nat_ipv[46]_fn().
> > > > > > > > > > - * range is only used for new, uninitialized NAT state.
> > > > > > > > > > - * Returns either NF_ACCEPT or NF_DROP.
> > > > > > > > > > - */
> > > > > > > > > > -static int ovs_ct_nat_execute(struct sk_buff *skb, struct nf_conn *ct,
> > > > > > > > > > -			      enum ip_conntrack_info ctinfo,
> > > > > > > > > > -			      const struct nf_nat_range2 *range,
> > > > > > > > > > -			      enum nf_nat_manip_type maniptype, struct sw_flow_key *key)
> > > > > > > > > > -{
> > > > > > > > > > -	int hooknum, err = NF_ACCEPT;
> > > > > > > > > > -
> > > > > > > > > > -	/* See HOOK2MANIP(). */
> > > > > > > > > > -	if (maniptype == NF_NAT_MANIP_SRC)
> > > > > > > > > > -		hooknum = NF_INET_LOCAL_IN; /* Source NAT */
> > > > > > > > > > -	else
> > > > > > > > > > -		hooknum = NF_INET_LOCAL_OUT; /* Destination NAT */
> > > > > > > > > > -
> > > > > > > > > > -	switch (ctinfo) {
> > > > > > > > > > -	case IP_CT_RELATED:
> > > > > > > > > > -	case IP_CT_RELATED_REPLY:
> > > > > > > > > > -		if (IS_ENABLED(CONFIG_NF_NAT) &&
> > > > > > > > > > -		    skb->protocol == htons(ETH_P_IP) &&
> > > > > > > > > > -		    ip_hdr(skb)->protocol == IPPROTO_ICMP) {
> > > > > > > > > > -			if (!nf_nat_icmp_reply_translation(skb, ct, ctinfo,
> > > > > > > > > > -							   hooknum))
> > > > > > > > > > -				err = NF_DROP;
> > > > > > > > > > -			goto out;
> > > > > > > > > > -		} else if (IS_ENABLED(CONFIG_IPV6) &&
> > > > > > > > > > -			   skb->protocol == htons(ETH_P_IPV6)) {
> > > > > > > > > > -			__be16 frag_off;
> > > > > > > > > > -			u8 nexthdr = ipv6_hdr(skb)->nexthdr;
> > > > > > > > > > -			int hdrlen = ipv6_skip_exthdr(skb,
> > > > > > > > > > -						      sizeof(struct ipv6hdr),
> > > > > > > > > > -						      &nexthdr, &frag_off);
> > > > > > > > > > -
> > > > > > > > > > -			if (hdrlen >= 0 && nexthdr == IPPROTO_ICMPV6) {
> > > > > > > > > > -				if (!nf_nat_icmpv6_reply_translation(skb, ct,
> > > > > > > > > > -								     ctinfo,
> > > > > > > > > > -								     hooknum,
> > > > > > > > > > -								     hdrlen))
> > > > > > > > > > -					err = NF_DROP;
> > > > > > > > > > -				goto out;
> > > > > > > > > > -			}
> > > > > > > > > > -		}
> > > > > > > > > > -		/* Non-ICMP, fall thru to initialize if needed. */
> > > > > > > > > > -		fallthrough;
> > > > > > > > > > -	case IP_CT_NEW:
> > > > > > > > > > -		/* Seen it before? This can happen for loopback, retrans,
> > > > > > > > > > -		 * or local packets.
> > > > > > > > > > -		 */
> > > > > > > > > > -		if (!nf_nat_initialized(ct, maniptype)) {
> > > > > > > > > > -			/* Initialize according to the NAT action. */
> > > > > > > > > > -			err = (range && range->flags & NF_NAT_RANGE_MAP_IPS)
> > > > > > > > > > -				/* Action is set up to establish a new
> > > > > > > > > > -				 * mapping.
> > > > > > > > > > -				 */
> > > > > > > > > > -				? nf_nat_setup_info(ct, range, maniptype)
> > > > > > > > > > -				: nf_nat_alloc_null_binding(ct, hooknum);
> > > > > > > > > > -			if (err != NF_ACCEPT)
> > > > > > > > > > -				goto out;
> > > > > > > > > > -		}
> > > > > > > > > > -		break;
> > > > > > > > > > -
> > > > > > > > > > -	case IP_CT_ESTABLISHED:
> > > > > > > > > > -	case IP_CT_ESTABLISHED_REPLY:
> > > > > > > > > > -		break;
> > > > > > > > > > -
> > > > > > > > > > -	default:
> > > > > > > > > > -		err = NF_DROP;
> > > > > > > > > > -		goto out;
> > > > > > > > > > -	}
> > > > > > > > > > -
> > > > > > > > > > -	err = nf_nat_packet(ct, ctinfo, hooknum, skb);
> > > > > > > > > > -out:
> > > > > > > > > > -	/* Update the flow key if NAT successful. */
> > > > > > > > > > -	if (err == NF_ACCEPT)
> > > > > > > > > > -		ovs_nat_update_key(key, skb, maniptype);
> > > > > > > > > > -
> > > > > > > > > > -	return err;
> > > > > > > > > > -}
> > > > > > > > > > -
> > > > > > > > > >  /* Returns NF_DROP if the packet should be dropped, NF_ACCEPT otherwise. */
> > > > > > > > > >  static int ovs_ct_nat(struct net *net, struct sw_flow_key *key,
> > > > > > > > > >  		      const struct ovs_conntrack_info *info,
> > > > > > > > > >  		      struct sk_buff *skb, struct nf_conn *ct,
> > > > > > > > > >  		      enum ip_conntrack_info ctinfo)
> > > > > > > > > >  {
> > > > > > > > > > -	enum nf_nat_manip_type maniptype;
> > > > > > > > > > -	int err;
> > > > > > > > > > +	int err, action = 0;
> > > > > > > > > >
> > > > > > > > > >  	if (!(info->nat & OVS_CT_NAT))
> > > > > > > > > >  		return NF_ACCEPT;
> > > > > > > > > > +	if (info->nat & OVS_CT_SRC_NAT)
> > > > > > > > > > +		action |= (1 << NF_NAT_MANIP_SRC);
> > > > > > > > > > +	if (info->nat & OVS_CT_DST_NAT)
> > > > > > > > > > +		action |= (1 << NF_NAT_MANIP_DST);
> > > > > > > > >
> > > > > > > > > I'm wondering why this dance at this level with supporting multiple
> > > > > > > > > MANIPs while actually only one can be used at a time.
> > > > > > > > >
> > > > > > > > > act_ct will reject an action using both:
> > > > > > > > >         if ((p->ct_action & TCA_CT_ACT_NAT_SRC) &&
> > > > > > > > >             (p->ct_action & TCA_CT_ACT_NAT_DST)) {
> > > > > > > > >                 NL_SET_ERR_MSG_MOD(extack, "dnat and snat can't be enabled at the same time");
> > > > > > > > >                 return -EOPNOTSUPP;
> > > > > > > > >
> > > > > > > > > I couldn't find this kind of check in ovs code right now (didn't look much, I
> > > > > > > > > confess :)), but even the code here was already doing:
> > > > > > > > >
> > > > > > > > > -       } else if (info->nat & OVS_CT_SRC_NAT) {
> > > > > > > > > -               maniptype = NF_NAT_MANIP_SRC;
> > > > > > > > > -       } else if (info->nat & OVS_CT_DST_NAT) {
> > > > > > > > > -               maniptype = NF_NAT_MANIP_DST;
> > > > > > > > >
> > > > > > > > > And in case of tuple conflict, maniptype will be forcibly updated in
> > > > > > > > > [*] below.
> > > > > > > >
> > > > > > > > Ahh.. just found why..
> > > > > > > > in case of tuple conflict, nf_ct_nat() now may
> > > > > > > > set both.
> > > > > > > Right.
> > > > > > > BTW. The tuple conflict processing actually has provided the
> > > > > > > code to do full nat (snat and dnat at the same time), which I
> > > > > > > think is more efficient than doing full nat in two zones. All
> > > > > > > we have to do is adjust a few lines of code for that.
> > > > > >
> > > > > > In this part, yes. But it needs some surgery all around for full
> > > > > > support. The code in ovs kernel for using only one type starts here:
> > > > > >
> > > > > > static int parse_nat(const struct nlattr *attr,
> > > > > >                      struct ovs_conntrack_info *info, bool log)
> > > > > > {
> > > > > > ...
> > > > > >         switch (type) {
> > > > > >         case OVS_NAT_ATTR_SRC:
> > > > > >         case OVS_NAT_ATTR_DST:
> > > > > >                 if (info->nat) {
> > > > > >                         OVS_NLERR(log, "Only one type of NAT may be specified");
> > > > > >                         return -ERANGE;
> > > > > >                 }
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > So vswitchd also doesn't support it. And then tc, iproute and drivers
> > > > > > that offload it as well. Not sure if it has impacts on openflow too.
> > > > > >
> > > > > not in one single action, but two actions:
> > > >
> > > > Oh.
> > > >
> > > > >
> > > > > "table=1, in_port=veth1,tcp,tcp_dst=2121,ct_state=+trk+new
> > > > > actions=ct(nat(dst=7.7.16.3)),ct(commit, nat(src=7.7.16.1),
> > > > > alg=ftp),veth2"
> > > > >
> > > > > as long as it allows the 1st one doesn't commit, which is a simple
> > > > > check in parse_nat().
> > > > > I tested it, TC already supports it. I'm not sure about drivers, but I
> > > >
> > > > There's an outstanding issue with act_ct that it may reuse an old
> > > > CT cache. Fixing it could (I'm not sure) impact this use case:
> > > >
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=2099220
> > > > same issue in ovs was fixed in
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2061ecfdf2350994e5b61c43e50e98a7a70e95ee
> > > >
> > > > (please don't ask me who would NAT and then overwrite IP addresses and
> > > > then NAT it again :D)
> > > I thought only traditional NAT would change IP, I'm too naive.
> > >
> > > nftables names this as "stateless NAT."
> > > With two CTs in the same zone for full nat is more close to the
> > > netfilter's NAT processing (the same CT goes from prerouting to
> > > postrouting).
> > > Now I'm wondering how nftables handles the stateful NAT and stateless
> > > NAT at the same time.
> >
> > Me too.
> >
> > >
> > > >
> > > > > think it should be, as with two actions.
> > > >
> > > > mlx5 at least currently doesn't support it. Good that tc already
> > > > supports it, then. Maybe drivers can get up to speed later on.
> > > So because mlx5 currently only supports one ct action in one tc rule
> > > or something?
> >
> > Not sure what you meant here?
> Sorry. that might be a silly question, I don't know much about TC
> HW offload, and just curious why such a rule can not be offloaded
> in mlx5 driver:
>
> "actions=ct(nat(dst=7.7.16.3)),ct(commit, nat(src=7.7.16.1)),veth2"
>
> it doesn't support any tc NAT or just does not support tc full NAT HW offload?

AFAIK it just doesn't support 2 ct() actions in the same filter.
It can do full NAT HW offload, as long as it's going through 2 filters.
I don't know where the limitation comes from. I was hoping that Nvidia
could comment on it, with at least a "yep, that's not supported" or
otherwise. Maybe this is old stuff already.
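
For anyone following the thread without the kernel tree at hand, here is a minimal userspace sketch of the two decisions discussed above: how the quoted ovs_ct_nat() hunk packs the requested NAT types into an action bitmask, and how the quoted nf_ct_nat() picks a manip type from either existing NAT state or that bitmask. The enum names, flag values and function names below are hypothetical stand-ins chosen for illustration, not the kernel's definitions, and the sketch leaves out the tuple-conflict path entirely.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel enums and status bits;
 * the numeric values are illustrative assumptions only.
 */
enum manip_type { MANIP_SRC = 0, MANIP_DST = 1 };
enum ct_dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

#define STATUS_SRC_NAT (1u << 0)
#define STATUS_DST_NAT (1u << 1)

/* Pack the requested NAT types into a bitmask, one bit per manip
 * type, mirroring the "action |= (1 << ...)" lines in the quoted hunk.
 */
static unsigned int build_action(bool want_snat, bool want_dnat)
{
	unsigned int action = 0;

	if (want_snat)
		action |= 1u << MANIP_SRC;
	if (want_dnat)
		action |= 1u << MANIP_DST;
	return action;
}

/* For a connection that already carries NAT state: apply the same
 * manip in the original direction and the reverse one for replies.
 */
static enum manip_type established_manip(unsigned int status, enum ct_dir dir)
{
	bool snat = (status & STATUS_SRC_NAT) != 0;

	if (dir == DIR_REPLY)
		return snat ? MANIP_DST : MANIP_SRC;
	return snat ? MANIP_SRC : MANIP_DST;
}

/* For a new connection: take the manip type from the bitmask,
 * checking SNAT first as the quoted code does. Returns -1 when no
 * NAT was requested (the quoted code returns NF_ACCEPT there).
 */
static int new_manip(unsigned int action)
{
	if (action & (1u << MANIP_SRC))
		return MANIP_SRC;
	if (action & (1u << MANIP_DST))
		return MANIP_DST;
	return -1;
}
```

Note that even when both bits are set in the mask, this selection only ever yields one manip type per pass, which is the "only one can be used at a time" point raised earlier in the thread.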