Re: [ovs-discuss] OVN/OVS no ARP response for internal router interface from external port
After investigating further, I believe I am hitting the following issue: https://bugs.launchpad.net/neutron/+bug/1995078 Essentially the external port and the LRP are being scheduled separately and without coordination. Because of this, if these ports are scheduled on different chassis the ARP request is dropped. Will need to build/test this fix and will follow up with a conclusion. Juniper Business Use Only From: Austin Cormier Date: Thursday, February 1, 2024 at 5:39 PM To: ovs-discuss@openvswitch.org Subject: OVN/OVS no ARP response for internal router interface from external port Looking for some help troubleshooting why OVS is not generating a response for my internal router port on a VLAN tenant network. I’ve dug down as far as I am reasonably able to but need a quick boost here. The ARP request is coming from an external system which is on the appropriate VLAN for the tenant network. East/West traffic is working as expected as I am able to communicate successfully with another VM on that vlan tenant network. It APPEARS that the flow never gets generated on br-int in the appropriate controller. I am going to walk through my debugging steps starting from the NB database to OVS on the appropriate controller that should be generating the ARP response: In OVN NB, I have a router with a port. 
This is my internal gateway interface of 192.168.5.1:

router 21cd6ac3-4804-4c68-a683-9bba07d97967 (neutron-5d87debf-cf0b-4fac-ba49-01b7680368aa) (aka vlan_test)
    port lrp-8c020ac1-ae54-4aa7-a143-4440067e9f42
        mac: "fa:16:3e:37:43:b3"
        networks: ["192.168.5.1/24"]
    port lrp-e18d09db-19d8-4362-8252-751e6974ef5e
        mac: "fa:16:3e:37:67:a3"
        networks: ["10.27.14.50/23"]
        gateway chassis: [infra-prod-controller-02 infra-prod-controller-01 infra-prod-controller-03]
    nat 40605ce2-3f93-4877-ac26-47e4b257fa5f
        external ip: "10.27.14.50"
        logical ip: "192.168.5.0/24"
        type: "snat"

The logical switch for the tenant network is connected to a localnet with tag
1106, and has the external port for my baremetal device and the appropriate
router port:

switch be7c870d-6d9c-471f-8996-e48a551068a0 (neutron-fcc39c5e-d3d8-4400-b5cf-b10f76d0112b) (aka vlan_test)
    port provnet-5ddf8b20-c849-484a-ad8f-a86a9856c2b2
        type: localnet
        tag: 1106
        addresses: ["unknown"]
    port 1f01c94e-f32f-4e94-b02a-813bb1ad4a47
        addresses: ["unknown"]
    port 85aa9a1e-cb84-4137-97ce-85958a948390
        addresses: ["fa:16:3e:ca:6e:3b 192.168.5.188"]
    port 8c020ac1-ae54-4aa7-a143-4440067e9f42
        type: router
        router-port: lrp-8c020ac1-ae54-4aa7-a143-4440067e9f42
    port 19279d0c-7c9a-498e-a3e5-269933c49df6
        type: localport
        addresses: ["fa:16:3e:4a:14:28 192.168.5.2"]
    port 4c9047c4-a6c7-4b27-9cfa-58b5d30ce964
        type: external
        addresses: ["90:ec:77:32:e6:6e 192.168.5.56"]

In OVN SB, I can see that the external port (4c9047c4-a6c7-4b27-9cfa-58b5d30ce964)
has been scheduled on infra-prod-controller-02. This is important because the
ARP response would only get generated from a single HA chassis.
---
Chassis infra-prod-controller-02
    hostname: infra-prod-controller-02
    Encap geneve
        ip: "10.27.12.24"
        options: {csum="true"}
    Port_Binding cr-lrp-20ba6028-7220-4c8d-a20f-9e4c416da3f7
    Port_Binding "71e436bb-7121-473e-a024-e34d4d7f4a4f"
    Port_Binding cr-lrp-c03b5dd9-92e1-4046-be1c-a953c0fab238
    Port_Binding "f8eb9e30-e65f-44c4-94b6-a67700790880"
    Port_Binding cr-lrp-ae2f5dbb-2cd0-44d0-9061-71c8186440be
    Port_Binding "4c9047c4-a6c7-4b27-9cfa-58b5d30ce964"
    Port_Binding "eb4435ad-37f2-44f9-a786-470b18bb9f0d"
    Port_Binding cr-lrp-950eec85-b785-474b-837b-4ecbbcf080c9
    Port_Binding "f3bbbe9a-a1a7-44b5-b6bc-b00a351ca1a5"
    Port_Binding "79430028-7ae3-448c-bc12-c9d7d44d218b"
---

In OVN SB again, I can issue a trace command to verify that the logical flow
exists to generate the ARP response:

---
# ovn-trace neutron-fcc39c5e-d3d8-4400-b5cf-b10f76d0112b 'inport == "provnet-5ddf8b20-c849-484a-ad8f-a86a9856c2b2" && eth.src == 90:ec:77:32:e6:6e && eth.dst == ff:ff:ff:ff:ff:ff && arp.tpa == 192.168.5.1 && arp.spa == 192.168.5.178 && arp.op == 1 && arp.tha == ff:ff:ff:ff:ff:ff && arp.sha == 90:ec:77:32:e6:6e'
…
ingress(dp="vlan_test", inport="lrp-8c020a")
 0. lr_in_admission (northd.c:12885): eth.mcast && inport == "lrp-8c020a", priority 50, uuid faac4787
    xreg0[0..47] = fa:16:3e:37:43:b3;
    next;
 1. lr_in_lookup_neighbor (northd.c:13142): inport == "lrp-8c020a" && arp.spa == 192.168.5.0/24 && arp.tpa == 192.168.5.1 && arp.op == 1, priority 110, uuid d447a962
    reg9[2] = lookup_arp(inport, arp.spa, arp.sha);
    /* MAC binding to ff:ff:ff:ff:ff:ff found. */
    reg9[3] = 1;
    next;
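To make the failure mode concrete: the ARP responder acting on behalf of an external port only runs on the chassis where that port is bound, so that binding must coincide with the router port's active gateway chassis. A minimal sketch of the invariant that the referenced fix enforces (illustrative Python with hypothetical names, not Neutron/OVN code):

```python
# Toy model of launchpad bug 1995078: the external port and the LRP
# gateway are scheduled independently, but an ARP request coming in via
# the external port is only answered on the chassis hosting both.
# All names here are illustrative, not actual OVN identifiers.

def arp_answered(external_port_chassis: str, gateway_chassis: str) -> bool:
    """ARP from the external device reaches the router's responder only
    when the external port and the gateway are bound to one chassis."""
    return external_port_chassis == gateway_chassis

# Coordinated scheduling -> ARP reply is generated.
assert arp_answered("infra-prod-controller-02", "infra-prod-controller-02")
# Uncoordinated scheduling may split them -> ARP request is dropped.
assert not arp_answered("infra-prod-controller-02", "infra-prod-controller-01")
```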
Re: [ovs-discuss] OVS with or without DPDK Throughput issue
On Wed, 31 Jan 2024 17:45:30 -0800, Michael Miller via discuss wrote:

> Hi All,
>
> Not sure if this is the right forum for this question but figured I would
> start here.
>
> I have implemented several dedicated KVM-based private virtualization
> servers in several outside data centers (I have no access to router/switch
> configs). The base host is Rocky Linux 9 with OVS 3.2.1 and DPDK 22.11
> compiled from source. From that, I have created two OVS bridges: "br-ex" is
> attached to a 10 Gbps server NIC and, via vhost-net, to a KVM guest running
> a firewall/router/VPN appliance (its "WAN" port). The second bridge is
> "br-int", which forms a LAN network without a physical NIC; it is attached
> to the same appliance as its "LAN" port and to the other KVM guests, also
> via a vhost-net style config.
>
> When I test from the base host via speedtest or iperf3, results vary but
> are typically 8+/8+ Gbps; the same test from a KVM guest gives 4/4 Gbps,
> which sounds great in theory. In practice, throughput between KVM guests,
> or between a KVM guest and the outside world, varies widely: I have seen
> 31 GB take 15 minutes and the same 31 GB take several hours.

Look at the CPU usage while testing it. Most probably there are threads
scheduled on the same CPU, causing the throughput to go down. See this
recommendation as an example:

https://access.redhat.com/documentation/pt-br/red_hat_openstack_platform/10/html/ovs-dpdk_end_to_end_troubleshooting_guide/using_virsh_emulatorpin_in_virtual_environments_with_nfv

fbl

> I have gone down the rabbit hole of "maybe the firewall/router/VPN
> appliance can't handle it", but I have tried two different vendor
> products, SonicWall and Mikrotik CHR, with similar results. I have also
> tested a Rocky Linux KVM guest attached directly to "br-ex", but the
> results also show degraded performance, which leads me to believe
> something might have been missed in the OVS setup.
> br-ex/br-int KVM NIC config (the difference is the source bridge line and
> the slot= line):
>
> [libvirt interface XML stripped by the list archive; surviving fragments:
> event_idx="off" queues="8", function="0x0"]

--
fbl

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
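fbl's pinning advice can be illustrated generically: inspect and restrict the CPUs a process may run on, which is what `virsh emulatorpin` and the OVS `pmd-cpu-mask` do for emulator and PMD threads so they stop competing for the same cores. A Linux-only Python sketch of the affinity mechanics (not an OVS or libvirt API):

```python
# Check which CPUs the current process may run on, pin it to a single
# core, then restore the original mask. This is the same underlying
# sched_{get,set}affinity mechanism that emulatorpin/PMD masks use.
# Linux-only; requires no special privileges for one's own process.
import os

allowed = os.sched_getaffinity(0)        # CPUs this process may use
print(f"may run on CPUs: {sorted(allowed)}")

if len(allowed) > 1:
    one_cpu = {min(allowed)}
    os.sched_setaffinity(0, one_cpu)     # pin to a single core
    assert os.sched_getaffinity(0) == one_cpu
    os.sched_setaffinity(0, allowed)     # restore the original mask
```

When two busy threads end up pinned to (or scheduled onto) the same core, each gets at most half of it, which matches the wildly varying transfer times reported above.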
Re: [ovs-discuss] Encapsulate VXLAN and then process other flows
On 2/2/24 08:58, Lim, Derrick via discuss wrote:
> Hi Ilya Maximets,
>
>> The rules look mostly fine. I think the main problem you have is priority.
>> Default priority for OF rules (if not specified) is 32768, so your new rules
>> with priority 50 are too low in the priority list and not getting hit.
>
> I tried this again with the default flow at priority 50 and mine at 499, but
> I still couldn't get the flow to hit.
>
> However, I observed that if the source address is set to anything other than
> `2403:400:31da:::18:6`, which is an address that exists on the `br-phy`
> interface, the lookup is a hit and the action is taken.
>
> Is there anything that prevents the address from being set to that of
> something that is already configured on an interface?
>
> For example:
>
> $ ip addr
> 35: br-phy: mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
>     link/ether de:03:37:e2:1f:ef brd ff:ff:ff:ff:ff:ff
>     inet6 2403:400:31da:::18:6/128 scope global
>        valid_lft forever preferred_lft forever
>     inet6 fe80::dc03:37ff:fee2:1fef/64 scope link
>        valid_lft forever preferred_lft forever
>
> Set the source address to `2403:400:31da:::18:5`. In the flow entry,
> `set(ipv6(src=2403:400:31da:::18:5))` is applied in actions.
> ===
> $ /usr/bin/ovs-ofctl add-flow br-phy \
>   "priority=499,ipv6,ipv6_dst=2403:400:31da:::18:3,actions=set_field:2403:400:31da:::18:5->ipv6_src,normal"
>
> $ /usr/bin/ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev | grep 192.168.1.33
> ufid:acc4b3bc-4958-412d-90c2-9bc4b3fbfac7,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
> recirc_id(0),dp_hash(0/0),in_port(dpdk-vm101),packet_type(ns=0,id=0),
> eth(src=52:54:00:3d:cd:0c/00:00:00:00:00:00,dst=00:00:00:00:00:01/00:00:00:00:00:00),eth_type(0x0800),
> ipv4(src=100.87.18.60/0.0.0.0,dst=192.168.1.33/0.0.0.0,proto=1/0,tos=0/0x3,ttl=64/0,frag=no),
> icmp(type=8/0,code=0/0), packets:407, bytes:39886, used:0.661s, dp:ovs,
> actions:
>   tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,
>     eth(dst=90:0a:84:9e:95:70,src=de:03:37:e2:1f:ef,dl_type=0x86dd),
>     ipv6(src=fe80::dc03:37ff:fee2:1fef,dst=2403:400:31da:::18:3,label=0,proto=17,tclass=0x0,hlimit=64),
>     udp(src=0,dst=4789,csum=0x),vxlan(flags=0x800,vni=0x1)),out_port(br-phy)),
>   set(ipv6(src=2403:400:31da:::18:5)),
>   push_vlan(vid=304,pcp=0),
>   exit_p0,
>   dp-extra-info:miniflow_bits(4,1)
> ===
>
> Set the source address to `2403:400:31da:::18:6`. This is a configured
> address on `br-phy`. The `set(ipv6(src=))` part is no longer applied.
> ===
> $ /usr/bin/ovs-ofctl add-flow br-phy \
>   "priority=499,ipv6,ipv6_dst=2403:400:31da:::18:3,actions=set_field:2403:400:31da:::18:6->ipv6_src,normal"
>
> $ /usr/bin/ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev | grep 192.168.1.33
> ufid:acc4b3bc-4958-412d-90c2-9bc4b3fbfac7,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),
> recirc_id(0),dp_hash(0/0),in_port(dpdk-vm101),packet_type(ns=0,id=0),
> eth(src=52:54:00:3d:cd:0c/00:00:00:00:00:00,dst=00:00:00:00:00:01/00:00:00:00:00:00),eth_type(0x0800),
> ipv4(src=100.87.18.60/0.0.0.0,dst=192.168.1.33/0.0.0.0,proto=1/0,tos=0/0x3,ttl=64/0,frag=no),
> icmp(type=8/0,code=0/0), packets:423, bytes:41454, used:0.803s, dp:ovs,
> actions:
>   tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,
>     eth(dst=90:0a:84:9e:95:70,src=de:03:37:e2:1f:ef,dl_type=0x86dd),
>     ipv6(src=fe80::dc03:37ff:fee2:1fef,dst=2403:400:31da:::18:3,label=0,proto=17,tclass=0x0,hlimit=64),
>     udp(src=0,dst=4789,csum=0x),vxlan(flags=0x800,vni=0x1)),out_port(br-phy)),
>   push_vlan(vid=304,pcp=0),
>   exit_p0,
>   dp-extra-info:miniflow_bits(4,1)
>
> $ /usr/bin/ovs-ofctl dump-flows br-phy
> cookie=0x0, duration=170.787s, table=0, n_packets=251, n_bytes=40328,
>   priority=499,ipv6,ipv6_dst=2403:400:31da:::18:3
>   actions=load:0x180006->NXM_NX_IPV6_SRC[0..63],load:0x2403040031da->NXM_NX_IPV6_SRC[64..127],NORMAL
> cookie=0x0, duration=1136.132s, table=0, n_packets=10175, n_bytes=1116852,
>   priority=10 actions=NORMAL
> ===

Hmm. This is interesting. We can see that some packets do actually hit the
OpenFlow rule (n_packets=251). The decision to not include the
set(ipv6(src=)) action is likely made because it is the same as one already
in the packet. But we can see from the datapath flow dump that it is
supposed to be different:

  tnl_push( ... ipv6(src=fe80::dc03:37ff:fee2:1fef, ... )

This is a link-local IP of that interface.
I suspect that a mishap happened somewhere and 2403:400:31da:::18:6 was used
for the actual tunnel header, or it was used to update the local flow
structure during the flow
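Ilya's hypothesis — the set() action being elided because the requested value matches what the translation layer believes the packet already carries — can be sketched as follows. This is a simplified model of the idea behind the commit helpers in lib/odp-util.c, not the actual OVS implementation:

```python
# Simplified model of how OVS commits header rewrites to the datapath:
# an action is emitted only when the desired value differs from the
# value that translation tracks for the packet. If that tracked state
# is wrong (e.g. clobbered while building the tunnel header), a needed
# set() can be silently skipped, as suspected in this thread.

def commit_set_ipv6_src(tracked_src: str, desired_src: str,
                        odp_actions: list) -> list:
    if desired_src != tracked_src:
        odp_actions.append(f"set(ipv6(src={desired_src}))")
    return odp_actions

# Desired value differs from the tracked one -> action is emitted.
assert commit_set_ipv6_src("fe80::1", "2403::18:5", []) \
    == ["set(ipv6(src=2403::18:5))"]
# Tracked value (possibly incorrectly) equals the desired one -> elided.
assert commit_set_ipv6_src("2403::18:6", "2403::18:6", []) == []
```

This matches the observed symptom: the rewrite works for any address except the one already configured on `br-phy`, i.e. the one most likely to collide with the tracked flow state.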
Re: [ovs-discuss] bond: bond/show next balance time is a negative value.
On 2/1/24 04:42, Huangzhidong via discuss wrote:
> Hi
>
> When I use bond/show to get the next balance time, it sometimes gets a
> negative value.
>
> It can easily be reproduced by running a shell script:
>
> while true; do
>     ovs-appctl bond/show | grep next
> done
>
> and it can be easily fixed by:
>
> ofproto/bond.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ofproto/bond.c b/ofproto/bond.c
> index cfdf44f85..384ffdb08 100644
> --- a/ofproto/bond.c
> +++ b/ofproto/bond.c
> @@ -1539,7 +1539,7 @@ bond_print_details(struct ds *ds, const struct bond *bond)
>      if (bond_is_balanced(bond)) {
>          ds_put_format(ds, "next rebalance: %lld ms\n",
> -                      bond->next_rebalance - time_msec());
> +                      (bond->next_rebalance + bond->rebalance_interval - time_msec()) % bond->rebalance_interval);
>      }
>      ds_put_format(ds, "lacp_status: %s\n",

Hi,

My understanding is that we print out a negative value because rebalancing
is already overdue, i.e. we should have rebalanced X ms ago. That indicates
that rebalancing will be performed as soon as possible.

Your suggested change will make the value positive, but it will no longer be
correct in this case. We could print out a zero, I guess, instead of a
negative value, but I think the negative value is somewhat useful, because
we can tell how far behind OVS is on the rebalancing.

Does that make sense?

Best regards, Ilya Maximets.
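The disagreement above is easy to make concrete. A small sketch (not the actual bond.c code) showing that the modulo arithmetic turns an informative negative value into a misleading positive one:

```python
# next_rebalance is an absolute deadline in ms; bond/show prints the
# time remaining until it. A negative result means the rebalance is
# overdue by that many ms. The proposed modulo fix always yields a
# value in [0, interval), which hides how far behind OVS is.

def remaining_current(next_rebalance: int, now: int) -> int:
    return next_rebalance - now            # may be negative: overdue

def remaining_patched(next_rebalance: int, now: int, interval: int) -> int:
    return (next_rebalance + interval - now) % interval

# Deadline passed 100 ms ago, with a 10 s rebalance interval:
assert remaining_current(10_000, 10_100) == -100     # "100 ms overdue"
assert remaining_patched(10_000, 10_100, 10_000) == 9_900  # looks fine, isn't

# When the deadline is still in the future, both agree:
assert remaining_current(10_000, 9_500) == 500
assert remaining_patched(10_000, 9_500, 10_000) == 500
```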
[ovs-discuss] lacp: Unable to restore static aggregation after configuring LACP
Hi

I configured static aggregation on the physical switch and also configured a
static-aggregation bond port on the latest version of OVS. When I set the
LACP mode of the bond port to active, the port goes down, which is reasonable
because the protocols on both ends do not match. However, when I restore the
LACP setting to off, the port status cannot be recovered.

This situation occurs because setting LACP to off does not trigger
seq_change(connectivity_seq_get()), so the port status is not updated in the
port_run function. A simple solution is to update the connectivity_seq when
the LACP status changes; this ensures that the port status is refreshed
during the next port_run. A simple patch is provided below.

Index: bond.c

 bool
 bond_run(struct bond *bond, enum lacp_status lacp_status)
 {
     struct bond_member *member, *primary;
     bool revalidate;

     ovs_rwlock_wrlock(&rwlock);
     if (bond->lacp_status != lacp_status) {
         bond->lacp_status = lacp_status;
         bond->bond_revalidate = true;
+        seq_change(connectivity_seq_get());

         /* Change in LACP status can affect whether the bond is falling back
          * to active-backup.  Make sure to create or destroy buckets if
          * necessary. */

Best regards,

This e-mail and its attachments contain confidential information from New
H3C, which is intended only for the person or entity whose address is listed
above. Any use of the information contained herein in any way (including,
but not limited to, total or partial disclosure, reproduction, or
dissemination) by persons other than the intended recipient(s) is
prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!
[ovs-discuss] bond: bond/show next balance time is a negative value.
Hi

When I use bond/show to get the next balance time, it sometimes gets a
negative value.

It can easily be reproduced by running a shell script:

while true; do
    ovs-appctl bond/show | grep next
done

and it can be easily fixed by:

ofproto/bond.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/bond.c b/ofproto/bond.c
index cfdf44f85..384ffdb08 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -1539,7 +1539,7 @@ bond_print_details(struct ds *ds, const struct bond *bond)
     if (bond_is_balanced(bond)) {
         ds_put_format(ds, "next rebalance: %lld ms\n",
-                      bond->next_rebalance - time_msec());
+                      (bond->next_rebalance + bond->rebalance_interval - time_msec()) % bond->rebalance_interval);
     }
     ds_put_format(ds, "lacp_status: %s\n",

Best Regards
[ovs-discuss] RSTP cannot be configured on both bridges that are connected by patch ports.
Hi

I have two bridges, vds1-br and vds1-br-ex, connected by patch ports, with
OVS 3.2. When I configure RSTP on both bridges, it causes the OVS process to
crash and restart repeatedly.

[root@localhost ~]# ovs-vsctl set Bridge vds1-br-ex rstp_enable=true
[root@localhost ~]# ovs-vsctl set Bridge vds1-br rstp_enable=true
  -> at this time, ovs crashes
2024-02-01T02:35:00Z|2|fatal_signal|WARN|terminating with signal 2 (Interrupt)
[root@localhost ~]# ps aux | grep ovs
root 4173731 0.0 0.0 13712 2572 ? S ...

From the core dump, I can get:

#0  rstp_port_received_bpdu (rp=0x562c718d0ed0, bpdu=0x562c718dbaa1, bpdu_size=36) at lib/rstp.c:238
#1  0x562c703aa415 in rstp_process_packet (packet=, xport=) at ./lib/dp-packet.h:619
#2  0x562c703aad22 in process_special (ctx=ctx@entry=0x7fff448169c0, xport=xport@entry=0x562c719192f0) at ofproto/ofproto-dpif-xlate.c:3443
#3  0x562c703ad4b3 in patch_port_output (ctx=ctx@entry=0x7fff448169c0, in_dev=in_dev@entry=0x562c7191c190, out_dev=0x562c719192f0, is_last_action=is_last_action@entry=true) at ofproto/ofproto-dpif-xlate.c:3951
#4  0x562c703af17a in compose_output_action__ (ctx=ctx@entry=0x7fff448169c0, ofp_port=1, xr=xr@entry=0x0, check_stp=check_stp@entry=true, is_last_action=, truncate=truncate@entry=false) at ofproto/ofproto-dpif-xlate.c:4274
#5  0x562c703b0f61 in compose_output_action (truncate=false, is_last_action=, xr=0x0, ofp_port=, ctx=0x7fff448169c0) at ofproto/ofproto-dpif-xlate.c:5374
...
#17 0x562c704d27ae in rstp_port_set_mac_operational (p=0x562c718df850, new_mac_operational=) at lib/rstp.c:1042
#18 0x562c703778d7 in ofproto_port_set_rstp (ofproto=0x562c718974c0, ofp_port=1, s=s@entry=0x7fff44818450) at ofproto/ofproto.c:1271

It is caused by rstp_mutex being initialized with PTHREAD_MUTEX_INITIALIZER,
which does not allow recursive locking. About stp_mutex, the code says:

    /* We need a recursive mutex because stp_send_bpdu() could loop back
     * into the stp module through a patch port.  This happens
     * intentionally as part of the unit tests.  Ideally we'd ditch
     * the call back function, but for now this is what we have. */
    ovs_mutex_init_recursive(&mutex);

Early versions of RSTP also used a recursive mutex. With commit
6b90bc57e7a2 ("lib/rstp: Remove lock recursion."), which changed the RSTP
send_bpdu interface so that a recursive mutex is not needed, the recursive
lock was removed. Do we still need a recursive lock for RSTP?

Best regards
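The underlying issue is generic re-entrancy: rstp_port_set_mac_operational() takes rstp_mutex and then, via the patch-port loopback visible in the backtrace, rstp_port_received_bpdu() tries to take it again on the same thread. A plain (non-recursive) mutex self-deadlocks, or aborts under error checking, on such a re-acquisition, while a recursive one simply nests and unwinds. An illustration using Python's threading primitives (not OVS code):

```python
# Plain vs. recursive locks under same-thread re-acquisition, mirroring
# the rstp_mutex situation: the module is entered again via a callback
# (patch-port loopback) while its lock is already held.
import threading

plain = threading.Lock()
recursive = threading.RLock()

# A plain lock cannot be re-acquired by the thread that holds it; a
# blocking re-acquire would deadlock, which is the hazard with a
# PTHREAD_MUTEX_INITIALIZER (non-recursive) mutex on this call path.
plain.acquire()
assert plain.acquire(blocking=False) is False
plain.release()

# A recursive lock counts acquisitions per owning thread, so the
# loopback through the "patch port" callback would nest and unwind.
recursive.acquire()
assert recursive.acquire(blocking=False)
recursive.release()
recursive.release()
```

This is why the stp.c comment quoted above asks for ovs_mutex_init_recursive(): either the send_bpdu callback must be unable to re-enter the module, or the lock must tolerate recursion.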