Hi:

The data plane recovers after flushing the datapath flows with
ovs-dpctl del-flows, and eventually all the datapath flows catch up, since
the flows added by OVN are intact. However, I am not sure which flow caused
this: the issue comes back whenever ovs-vswitchd restarts and has to be
worked around with ovs-dpctl del-flows each time. I am also not sure whether
it is due to a version incompatibility between OVN 2.11 and OVS 2.16, or
whether a particular patch in OVS/OVN already fixes this. I will keep
looking in parallel, since the workaround unblocks us for now. Any pointers
toward a proper fix, rather than this workaround, would be appreciated.
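
For reference, here is a minimal sketch of how the workaround could be
scripted so that the flush only happens when the symptom is visible. It
assumes the symptom is a kernel datapath flow that still matches
vlan(vid=20...), as in the ovs-dpctl dump-flows output quoted below, and that
there is a single datapath on the host. The function names count_vlan_flows
and flush_if_stale are hypothetical helpers, not part of OVS, and a flow
matching on the VLAN is not necessarily a stale one, so treat this as a
stopgap rather than a fix.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OVS); a sketch of the workaround only.

# Count datapath flows read from stdin that match VLAN id $1.
# Accepts "vlan(vid=20,...", "vlan(vid=20/0x14)" and "vlan(vid=20)",
# but not other ids that merely start with the same digits (e.g. 200).
count_vlan_flows() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the
    # printed count and force a zero exit status.
    grep -c "vlan(vid=${1}[,/)]" || true
}

# Flush the datapath flow cache only when flows matching the VLAN exist.
# ovs-vswitchd reinstalls datapath flows from the OpenFlow tables on demand.
flush_if_stale() {
    vid="$1"
    n=$(ovs-dpctl dump-flows | count_vlan_flows "$vid")
    if [ "$n" -gt 0 ]; then
        echo "found $n datapath flow(s) matching vlan $vid; flushing"
        ovs-dpctl del-flows
    fi
}
```

After an ovs-vswitchd restart on the gateway, this would be run as
flush_if_stale 20 (20 being the provider network tag here).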

Regards,
Aliasgar


On Fri, Apr 19, 2024 at 4:24 PM aginwala <aginw...@asu.edu> wrote:

> Hi All:
>
> As part of upgrading the OVN north-south gateway to the new 5.4 kernel, VM
> connectivity is lost when the chassis for the provider network lrp is set
> to this new gateway. For interconnection gateways and hypervisors it is not
> an issue.
>
> lrp
> _uuid               : 387a735d-fc11-4e90-8655-07785aa024af
> chassis             : b80a285b-586a-42d9-b189-69d641f143b1
> datapath            : d9219b69-5961-4f24-8414-1d4054b23169
> external_ids        : {}
> gateway_chassis     : [728adc6d-3236-4637-86e3-0f6745cf1b50,
> 7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf,
> d1b42374-c475-4745-abdb-36e72140c5b5]
> logical_port        : "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"
> mac                 : ["74:db:d1:80:d3:af 10.169.247.140/24"]
> nat_addresses       : []
> options             :
> {distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"}
> parent_port         : []
> tag                 : []
> tunnel_key          : 2
> type                : chassisredirect
>
> provider network
> port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90
>         type: localnet
>         tag: 20
>         addresses: ["unknown"]
> ## encap ip for ovn is on eth0
>
> ## gw interfaces brens2f0 hosts uplink provider network
> ovs-vsctl list-br
> br-int
> brens2f0
> ovs-vsctl list-ports brens2f0
> ens2f0
> patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int
> ## fail mode secure
> ovs-vsctl get-fail-mode br-int
> secure
> ## set chassis
> ovn-nbctl lrp-set-gateway-chassis lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e
> cee81be9-f782-4c82-800e-c5c5327531e4 101
>
> ovn-controller is running as a container on the new gateway
> ovn-controller --version
> ovn-controller (Open vSwitch) 2.11.1-13
> OpenFlow versions 0x4:0x4
>
> ## ovs on the host 5.4 kernel
> ovs-vsctl --version
> ovs-vsctl (Open vSwitch) 2.16.0
> DB Schema 8.3.0
>
> ovs-ofctl --version
> ovs-ofctl (Open vSwitch) 2.16.0
> OpenFlow versions 0x1:0x6
>
>
> Digging further, tcpdump on the destination VM interface shows the VLAN tag
> still present, which causes the connectivity failure; no reply packet is
> sent:
> 20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
> 53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
> 10.78.8.42: ICMP echo request, id 7765, seq 791, length 64
> 20:26:07.375960 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
> (0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
> 36269, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
> 10.78.8.42: ICMP echo request, id 7765, seq 792, length 64
>
> The OpenFlow rules that strip VLAN 20, as programmed by OVN, are correct on
> both the new and old gw:
> ovs-ofctl dump-flows br-int | grep strip_vlan | grep 20
> cookie=0x0, duration=27.894s, table=65, n_packets=136, n_bytes=19198,
> idle_age=0, priority=100,reg15=0x1,metadata=0x1
> actions=mod_vlan_vid:20,output:161,strip_vlan
> cookie=0x0, duration=30.055s, table=0, n_packets=1592, n_bytes=130783,
> idle_age=0, priority=150,in_port=161,dl_vlan=20
> actions=strip_vlan,load:0xe1->NXM_NX_REG13[],load:0x36->NXM_NX_REG11[],load:0xd7->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)
>
>
> Checking the OVS datapath flows shows the VLAN still present:
> ovs-dpctl dump-flows  | grep vlan
> recirc_id(0x422),tunnel(tun_id=0x10066000005,src=10.172.66.144,dst=10.173.84.83,flags(-df+csum+key)),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(dst=74:db:d1:80:0a:15),eth_type(0x8100),vlan(vid=20/0x14),encap(eth_type(0x0800),ipv4(frag=no)),
> packets:1713, bytes:174726, used:0.145s, actions:5
>
> Couldn't find much drift between the old and new gw with ofproto/trace
> (replace in_port with the local port number on each):
> ovs-appctl ofproto/trace br-int in_port=2321,dl_vlan=20
>
>
> Tried stripping the VLAN on the hypervisor/compute and the data plane is ok,
> but that's not the right approach:
> ovs-ofctl add-flow br-int "priority=65535,dl_vlan=20
> actions=strip_vlan,output:4597"
>
> Downgrading the kernel to 4.15 and pinning OVS to 2.11 restores the data
> plane, with no VLAN tag or 802.1Q header in the tcpdump on the destination
> workload tap interface.
>
>
> Is this a bug or a known issue in OVS versions after 2.11 when a tagged VLAN
> is present on the provider network?
>
> Tried pinning the OpenFlow version to 1.4 too, but it didn't help, since the
> strip_vlan flows are already good. Any further pointers would be great as we
> continue to debug.
>
>
> Regards,
> Aliasgar
>
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
