Thanks for working on the patch. I have some discussions with Yi-Hung and Toms.
Add comments inline.

On Tue, Dec 15, 2020 at 3:41 PM Yi-Hung Wei <yihung....@gmail.com> wrote:
>
> On userspace datapath, geneve option flag FLOW_TNL_F_UDPIF should only
> be set when the geneve option is available.  However, currently, we always
> set FLOW_TNL_F_UDPIF in the metadata flag, and that may result in megaflow
> revalidation issue when the geneve option length is zero due to the datapath
> flow key mis-match.  That is the original flow key always has FLOW_TNL_F_UDPIF
> set, while the regenerated flow key during revalidation process  only has
> FLOW_TNL_F_UDPIF set if geneve option length is zero.  For reference, check
> tun_metadata_to_geneve_nlattr() to see how the flow key of geneve
> tunnel metadatafor are generated from netlink attributes.
>

The issue shows up when using Geneve without geneve TLV metadata.
That's why it does not show up in OVN use case or VMware's internal case.

> The following are the reproducible steps reported by Antonin.
> After step 6, OvS reports a warning about failing to update the datapath
> flow.
>
> 2020-12-15T01:56:21.324Z|00007|dpif(revalidator4)|WARN|netdev@ovs-netdev:
> failed to put[modify] (No such file or directory)
> ufid:d252fe62-5b2a-44e6-846a-190328190e09 skb_priority(0/0),
> tunnel(tun_id=0x0,src=192.168.77.1,dst=192.168.77.2,ttl=64/0,tp_src=57391/0,
> tp_dst=6081/0,flags(-df-csum+key)),skb_mark(0/0),ct_state(0x21/0x21),
> ct_zone(0xfff0/0),ct_mark(0/0),ct_label(0/0),ct_tuple4(src=10.10.0.1/0.0.0.0,
> dst=10.10.0.2/0.0.0.0,proto=1/0,tp_src=8/0,tp_dst=0/0),recirc_id(0xa),
> dp_hash(0/0),in_port(5),packet_type(ns=0,id=0),
> eth(src=1e:9c:f0:fb:18:9c/00:00:00:00:00:00,dst=ee:9f:60:8a:c0:a5/00:00:00:00:00:00),
> eth_type(0x0800),ipv4(src=10.10.0.1/0.0.0.0,dst=10.10.0.2,proto=1/0,tos=0/0,
> ttl=64/0,frag=no),icmp(type=8/0,code=0/0), actions:drop

So the reason revalidator has error is due to revalidator creates a
flow without setting
the FLOW_TNL_F_UDPIF (due to length zero).
While the existing megaflow cache has FLOW_TNL_F_UDPIF set?

>
> Setup:
> 2 Nodes on the same subnet (192.168.77.101/24 - 192.168.77.102/24)
>
> Step 1 – Creating bridges (run on each Node):
>
> iface="enp0s8"
>
> hwaddr=$(ip link show $iface | grep link/ether | awk '{print $2}')
> inet=$(ip addr show $iface | grep "inet " | awk '{ print $2 }')
>
> ovs-vsctl add-br br-phy \
>           -- set Bridge br-phy datapath_type=netdev \
>           -- br-set-external-id br-phy bridge-id br-phy \
>           -- set bridge br-phy fail-mode=standalone \
>           other_config:hwaddr="$hwaddr"
> ovs-vsctl --timeout 10 add-port br-phy "$iface"
> ip addr add "$inet" dev br-phy
> ip link set br-phy up
> ip addr flush dev enp0s8 2>/dev/null
> ip link set enp0s8 up
> iptables -F
>
> ovs-vsctl add-br br-int \
>           -- set Bridge br-int datapath_type=netdev \
>           -- set bridge br-phy fail-mode=standalone
>
> Step 2 – Creating netns to for overlay endpoints:
>
> On first Node (192.168.77.101):
> ip netns add ns0
> ip link add veth0 type veth peer name veth1
> ip link set veth0 netns ns0
> ovs-vsctl add-port br-int veth1
> ip link set dev veth1 up
> ip netns exec ns0 ip addr add 10.10.0.1/24 dev veth0
> ip netns exec ns0 ip link set dev veth0 up
>
> On second Node (192.168.77.102):
> ip netns add ns0
> ip link add veth0 type veth peer name veth1
> ip link set veth0 netns ns0
> ovs-vsctl add-port br-int veth1
> ip link set dev veth1 up
> ip netns exec ns0 ip addr add 10.10.0.2/24 dev veth0
> ip netns exec ns0 ip link set dev veth0 up
>
> Step 3 – Create tunnel (run on each Node):
>
> ovs-vsctl add-port br-int tun0 -- set interface tun0 type=geneve \
> ofport_request=100 options:remote_ip=flow options:key=flow
>
> Step 4 – Create the following flows:
>
> On first Node (192.168.77.101):
> root@ovs-test-node-1:/home/vagrant# ovs-ofctl dump-flows br-int --no-stats
> priority=100,ip actions=resubmit(,10)
> priority=0 actions=NORMAL
> priority=50 actions=resubmit(,40)
> table=10, priority=100,ip actions=ct(table=20,zone=65520)
> table=20, priority=200,ct_state=-new+trk,ip actions=resubmit(,30)
> table=20, priority=100,ip,nw_dst=10.10.0.2 actions=resubmit(,30)
> table=20, priority=0,ip actions=drop
> table=30, priority=100,ip actions=ct(commit,table=40,zone=65520)
> table=40, priority=100,in_port=veth1 
> actions=load:0xc0a84d66->NXM_NX_TUN_IPV4_DST[],output:tun0
> table=40, priority=100,in_port=tun0 actions=output:veth1
> table=40, priority=0 actions=drop
>
> On second Node (192.168.77.102):
> root@ovs-test-node-2:/home/vagrant# ovs-ofctl dump-flows br-int --no-stats
> priority=100,ip actions=resubmit(,10)
> priority=0 actions=NORMAL
> priority=50 actions=resubmit(,40)
> table=10, priority=100,ip actions=ct(table=20,zone=65520)
> table=20, priority=200,ct_state=-new+trk,ip actions=resubmit(,30)
> table=20, priority=100,ip,nw_dst=10.10.0.2 actions=resubmit(,30)
> table=20, priority=0,ip actions=drop
> table=30, priority=100,ip actions=ct(commit,table=40,zone=65520)
> table=40, priority=100,in_port=veth1 
> actions=load:0xc0a84d65->NXM_NX_TUN_IPV4_DST[],output:tun0
> table=40, priority=100,in_port=tun0 actions=output:veth1
> table=40, priority=0 actions=drop
>
> Step 5 – Check that ping works:
> From the first Node: ip netns exec ns0 ping 10.10.0.2
>
> Step 6 – The actual issue:
> From the first Node: ip netns exec ns0 ping -c 3 10.10.0.2
> From the second Node: ovs-ofctl del-flows br-int 
> 'table=20,ip,nw_dst=10.10.0.2'
> From the first Node: ip netns exec ns0 ping -c 3 10.10.0.2

Thanks for the steps.
The bug should be reproducible without using conntrack.

>
> Execute these instructions in order, as soon as the previous one
> completes.  If you follow these steps, you should see the ping in
> the last step succeed.  This is not expected because of the deleted
> flow. It also does not happen with VXLAN.
Make sense, because it's the inconsistency in FLOW_TNL_F_UDPIF field
in megaflow matching.
VXLAN does not use it.

> Reported-by: Antonin Bas <a...@vmware.com>
> Reported-at: https://github.com/vmware-tanzu/antrea/issues/897
> Fixes: 6b241d645291 ("netdev-vport: Factor-out tunnel Push-pop code into 
> separate module.")
> Signed-off-by: Yi-Hung Wei <yihung....@gmail.com>

VMWare-BZ: #2623444

> ---
>  lib/netdev-native-tnl.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> index b89dfdd52a86..09cff3a1fc7d 100644
> --- a/lib/netdev-native-tnl.c
> +++ b/lib/netdev-native-tnl.c
> @@ -1015,9 +1015,11 @@ netdev_geneve_pop_header(struct dp_packet *packet)
>      tnl->tun_id = htonll(ntohl(get_16aligned_be32(&gnh->vni)) >> 8);
>      tnl->flags |= FLOW_TNL_F_KEY;
>
> -    memcpy(tnl->metadata.opts.gnv, gnh->options, opts_len);
> -    tnl->metadata.present.len = opts_len;
> -    tnl->flags |= FLOW_TNL_F_UDPIF;
> +    if (opts_len > 0) {
> +        memcpy(tnl->metadata.opts.gnv, gnh->options, opts_len);
> +        tnl->metadata.present.len = opts_len;
> +        tnl->flags |= FLOW_TNL_F_UDPIF;
> +    }
>
>      packet->packet_type = htonl(PT_ETH);
>      dp_packet_reset_packet(packet, hlen);
> --
LGTM.
Acked-by: William Tu <u9012...@gmail.com>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to