On 12/2/22 18:31, Vladislav Odintsov wrote:
> Hi,
>
> we’ve run into an issue where it was possible to create multiple similar
> routes within an LR (same ip_prefix, nexthop, and route table).
>
> Initially the problem started after an OVN upgrade. We use the python
> ovsdbapp library, and we found a problem in python-ovs, which is described
> here
> https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html
> by my colleague Anton. @Terry Wilson, please take a look at this.
>
> The problem itself touches OVN and OVS. Sorry for the long read, but it seems
> that there are a couple of bugs in different places, some of which this RFC
> is meant to cover.
>
> How the issue was initially reproduced:
>
> 1. assume we have an OVN deployment with (at least) 2 Availability Zones
>    (utilising ovn-ic infrastructure).
> 2. create a transit switch in IC NB
> 3. create an LR in each AZ, connect them to the transit switch
> 4. create one logical switch with a VIF port attached to local OVS &
>    connect this logical switch to the LR (e.g. 192.168.0.1/24)
> 5. install 2 static routes in the LR in one AZ with a create command
>    (invoke the next command twice):
>
>    ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 \
>        nexthop=192.168.0.10 -- add logical_router lr1 static_routes @id
>
> From this point on, a couple of strange behaviours/bugs appear:
>
> 1. [possible problem] There is a duplicated route in the NB within a
>    single LR.
>    The lflow is computed to have an ECMP group with two similar
>    routes:
>
>    table=11(lr_in_ip_routing ), priority=97 , match=(reg7 == 0 && ip4.dst
>      == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1;
>      reg8[16..31] = select(1, 2);
>    table=12(lr_in_ip_routing_ecmp), priority=100 , match=(reg8[0..15] == 1
>      && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1;
>      eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>    table=12(lr_in_ip_routing_ecmp), priority=100 , match=(reg8[0..15] == 2
>      && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1;
>      eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>
>    Maybe it would be better to handle such routes somehow, e.g. with an
>    ovsdb index or some logic in ovn-northd?
>
> 2. [bug] There is a duplicated route advertisement in the
>    OVN_IC_Southbound:Route table. IMO, this should be fixed by adding a
>    new index to this table for availability_zone, transit_switch,
>    ip_prefix, nexthop and route_table, and by adding logic to check whether
>    the route was already advertised (covered in Patch #7).
>
> 3. [bug] The same route is learned over and over. Each ovn-ic iteration
>    on the opposite availability zone adds one more copy of the route. It
>    creates thousands of identical routes each second. This bug is covered
>    by Patch #7.
>
> 4. [possible problem] After multiple routes are learned to NB on the
>    opposite availability zone, ovn-northd generates ecmp lflows. Same as
>    in #1: one in lr_in_ip_routing with select(<thousands of elements>)
>    and thousands of identical records in lr_in_ip_routing_ecmp. OVN allows
>    installing UINT_MAX routes within an ECMP group.
>
> 5. [OVS bug?] I'd like someone from the OVS team to look at this.
>    ovn-controller installed a very long openflow group rule
>    (group #3):
>
>    # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
>    797824
>
>    When I try to dump groups with ovs-ofctl dump-groups br-int, I get
>    the following error in the console:
>
>    # ovs-ofctl dump-groups br-int
>    ovs-ofctl: OpenFlow packet receive failed (End of file)
>
>    In ovs-vswitchd I see the following error in the logs, and after this
>    line ovs is restarted:
>
>    2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion
>      start_ofs <= UINT16_MAX failed in ofpmp_postappend()
This looks like an OVS bug to me.  Ilya, what do you think the best way
to fix this is?

>
>    If I issue the command again, sometimes it prints the same error, but
>    sometimes this one (I had another OVN LB on the dev machine, so there
>    are excess groups):
>
>    # ovs-ofctl dump-groups br-int
>    NXST_GROUP_DESC reply (xid=0x2): flags=[more]
>    group_id=3,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
>    group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
>    2022-11-17T17:53:41Z|00001|ofp_group|WARN|OpenFlow message bucket length 56 exceeds remaining buckets data size 40
>    NXST_GROUP_DESC reply (xid=0x2): ***decode error: OFPGMFC_BAD_BUCKET***
>    00000000  01 11 a9 58 00 00 00 02-ff ff 00 00 00 00 23 20 |...X..........# |
>    00000010  00 00 00 08 00 00 00 00-a9 40 01 00 00 00 00 02 |.........@......|
>    00000020  a9 08 00 00 00 00 00 00-00 38 00 28 00 00 00 00 |.........8.(....|
>    00000030  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    00000040  00 00 00 00 00 00 00 01-ff ff 00 10 00 00 23 20 |..............# |
>    00000050  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>    00000060  00 38 00 28 00 00 00 01-ff ff 00 18 00 00 23 20 |.8.(..........# |
>    00000070  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 02 |................|
>    00000080  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
>    00000090  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 02 |.....d...8.(....|
>    000000a0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    000000b0  00 00 00 00 00 00 00 03-ff ff 00 10 00 00 23 20 |..............# |
>    000000c0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>    000000d0  00 38 00 28 00 00 00 03-ff ff 00 18 00 00 23 20 |.8.(..........# |
>    000000e0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 04 |................|
>    000000f0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
>    00000100  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 04 |.....d...8.(....|
>    00000110  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    00000120  00 00 00 00 00 00 00 05-ff ff 00 10 00 00 23 20 |..............# |
>    00000130  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>    00000140  00 38 00 28 00 00 00 05-ff ff 00 18 00 00 23 20 |.8.(..........# |
>    00000150  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 06 |................|
>    00000160  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
>    00000170  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 06 |.....d...8.(....|
>    00000180  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    00000190  00 00 00 00 00 00 00 07-ff ff 00 10 00 00 23 20 |..............# |
>    000001a0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>    000001b0  00 38 00 28 00 00 00 07-ff ff 00 18 00 00 23 20 |.8.(..........# |
>    000001c0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 08 |................|
>    000001d0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
>    000001e0  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 08 |.....d...8.(....|
>    000001f0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    00000200  00 00 00 00 00 00 00 09-ff ff 00 10 00 00 23 20 |..............# |
>    00000210  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>
> 7. This problem with dumping groups raises some questions:
>    1. Is there a limit on the bucket count in a group? Or a limit on the
>       group string length?
>    2. If yes, should OVN limit the count of buckets in a group on its
>       side? (Patches #4 && #6).
>
> 8. I’ve also tried to find out at which values these problems with
>    dump-groups begin. I created multiple ECMP routes in OVN in a for-loop
>    and saw that starting from 1200 items in a group the error from the
>    last example appears. I tried to create 10k buckets, and while it was
>    configuring on my machine there were also the following lines in the
>    logfile:
>
>    2022-11-17T18:23:30.992Z|00554|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
>    2022-11-17T18:23:31.992Z|00555|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
>    2022-11-17T18:23:33.993Z|00556|ovs_rcu(urcu6)|WARN|blocked 4001 ms waiting for main to quiesce
>
>    When the routes finished creating, I issued ovs-ofctl dump-groups br-int
>    and there was just an error:
>
>    # ovs-ofctl dump-groups br-int
>    ovs-ofctl: OpenFlow packet receive failed (End of file)
>
>    And OVS crashed. OVS 2.17.3 is used.
>
> My script:
>
> # cat ./repro.sh
> #!/bin/bash
>
> count=$1
>
> echo "Creating ${count} same routes..."
>
> ovn-nbctl lr-route-del lr1 1.2.3.4/32
>
> for i in $(seq 1 "${count}"); do
>     echo "$i"
>     ovn-nbctl --id=@id create logical-router-static-route \
>         ip_prefix=1.2.3.4/32 nexthop=172.31.32.4 policy=dst-ip \
>         -- add logical-router vpc-FC7D6A54 static_routes @id
> done
>
> Thanks for reading this, I'm ready to provide any additional information to
> help investigate this.
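
As a rough sanity check on the "starts failing around 1200 items" observation: the WARN line above reports a bucket length of 56 bytes, and the failed assertion compares an offset against UINT16_MAX. Assuming (this is a guess, not something confirmed in the OVS code here) that the whole group description has to fit within a 16-bit length, the arithmetic can be sketched like this. `max_buckets` and its `overhead` parameter are names made up for illustration.

```python
UINT16_MAX = 0xFFFF  # largest value a 16-bit length/offset field can hold

# Taken from the log above ("bucket length 56"); other groups may use
# different actions and therefore a different bucket size.
BUCKET_LEN = 56


def max_buckets(bucket_len: int, overhead: int = 0) -> int:
    """Largest bucket count whose total size still fits in 16 bits.

    `overhead` stands for any fixed header bytes in front of the buckets;
    its real value is not derived here, so it defaults to 0 (assumption).
    """
    if bucket_len <= 0:
        raise ValueError("bucket_len must be positive")
    return max(0, (UINT16_MAX - overhead) // bucket_len)


if __name__ == "__main__":
    print(max_buckets(BUCKET_LEN))    # upper bound with no header overhead
    print(1200 * BUCKET_LEN)          # bytes needed for 1200 such buckets
    print(1200 * BUCKET_LEN > UINT16_MAX)
```

Under these assumptions the limit comes out a little below 1200 buckets, which is at least consistent with the reported threshold. It says nothing about where exactly OVS should split or reject such a reply.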
>
> Vladislav Odintsov (7):
>   ic: move routes_ad hmap insert to separate function
>   ic: remove orphan ovn interconnection routes
>   ic: lookup southbound port_binding only if needed
>   actions: limit possible OF group bucket count
>   ic: minor code improvements
>   northd: limit ECMP group by 1024 members
>   ic: prevent advertising/learning multiple same routes
>
>  ic/ovn-ic.c         | 123 ++++++++++++++++++++++++++++------------
>  lib/actions.c       |  40 ++++++++++++-
>  northd/northd.c     |   2 +-
>  ovn-ic-sb.ovsschema |   6 +-
>  tests/ovn-ic.at     | 133 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 263 insertions(+), 41 deletions(-)
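
For anyone who wants to check an existing deployment for the duplicates described in the quoted report, here is a minimal sketch of the idea. It assumes the static routes have already been exported into plain Python dicts with `ip_prefix`, `nexthop` and `route_table` keys; it does not use the real ovsdbapp or python-ovs API, and `route_key` and `find_duplicates` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

RouteKey = Tuple[str, str, str]


def route_key(route: Mapping[str, str]) -> RouteKey:
    """Identity of a route as described in the report:
    same ip_prefix, nexthop and route table."""
    return (route["ip_prefix"], route["nexthop"], route.get("route_table", ""))


def find_duplicates(routes: Iterable[Mapping[str, str]]) -> Dict[RouteKey, int]:
    """Return every route key that occurs more than once, with its count."""
    counts = Counter(route_key(r) for r in routes)
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical data mirroring the reproduction: the same route added twice.
    routes = [
        {"ip_prefix": "1.2.3.4/32", "nexthop": "192.168.0.10"},
        {"ip_prefix": "1.2.3.4/32", "nexthop": "192.168.0.10"},
        {"ip_prefix": "10.0.0.0/24", "nexthop": "192.168.0.20"},
    ]
    for key, n in find_duplicates(routes).items():
        print(key, n)
```

A database index or a check in ovn-ic, as proposed in the series, would reject such rows up front; the sketch only helps to spot rows that already exist.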