On 12/2/22 18:31, Vladislav Odintsov wrote:
> Hi,
> 
> we ran into an issue where it was possible to create multiple identical
> routes within an LR (same ip_prefix, nexthop, and route table).
> 
> The problem first showed up after an OVN upgrade.  We use the Python
> ovsdbapp library, and we found a problem in python-ovs, which my colleague
> Anton describes here:
> https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/399722.html
> @Terry Wilson, please take a look at this.
> 
> The problem itself touches both OVN and OVS.  Sorry for the long read, but
> there seem to be a couple of bugs in different places, some of which this
> RFC is meant to cover.
> 
> How the issue was initially reproduced:
> 
> 1. assume we have an OVN deployment with at least 2 Availability Zones
>    (utilising the ovn-ic infrastructure).
> 2. create a transit switch in IC NB
> 3. create an LR in each AZ, connect them to the transit switch
> 4. create one logical switch with a VIF port attached to the local OVS &
>    connect this logical switch to the LR (e.g. 192.168.0.1/24)
> 5. in one AZ, install 2 static routes in the LR with a create command
>    (invoke the following command twice):
> 
>    ovn-nbctl --id=@id create logical-router-static-route ip_prefix=1.2.3.4/32 nexthop=192.168.0.10 -- add logical_router lr1 static_routes @id
> 
> From this point on, several strange behaviours/bugs appear:
> 
> 1. [possible problem] There is a duplicated route in the NB within a
>    single LR.  The lflows are computed as an ECMP group with two identical
>    routes:
> 
>    table=11(lr_in_ip_routing   ), priority=97   , match=(reg7 == 0 && ip4.dst == 1.2.3.4/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 1; reg8[16..31] = select(1, 2);
>    table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 1 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
>    table=12(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 2 && reg8[16..31] == 1), action=(reg0 = 192.168.0.10; reg1 = 192.168.0.1; eth.src = d0:fe:00:00:00:04; outport = "subnet-45661000"; next;)
> 
>    Maybe it would be better to handle such routes in some way, e.g. with
>    an ovsdb index or some logic in ovn-northd?
> 
> 2. [bug] There is a duplicated route advertisement in the
>    OVN_IC_Southbound:Route table.  IMO, this should be fixed by adding a
>    new index to this table on availability_zone, transit_switch,
>    ip_prefix, nexthop and route_table, and by adding logic to check
>    whether the route was already advertised (covered in Patch #7).
> 
> 3. [bug] The same route is learned over and over.  Each ovn-ic iteration
>    on the opposite availability zone adds one more copy of the route,
>    which creates thousands of identical routes each second.  This bug is
>    covered by Patch #7.
> 
> 4. [possible problem] After multiple routes are learned into the NB on the
>    opposite availability zone, ovn-northd generates ECMP lflows, same as
>    in #1: one in lr_in_ip_routing with select(<thousands of elements>)
>    and thousands of identical records in lr_in_ip_routing_ecmp.  OVN
>    allows installing UINT_MAX routes within an ECMP group.
> 
> 5. [OVS bug?] I'd like someone from the OVS team to take a look at this.
>    ovn-controller installed a very long OpenFlow group rule
>    (group #3):
> 
>    # ovn-appctl -t ovn-controller group-table-list | grep :3 | wc -c
>    797824
> 
>    When I try to dump groups with ovs-ofctl dump-groups br-int, I get
>    the following error in the console:
> 
>    # ovs-ofctl dump-groups br-int
>    ovs-ofctl: OpenFlow packet receive failed (End of file)
> 
>    In the ovs-vswitchd logs I see the following error, after which OVS is
>    restarted:
> 
>    2022-11-16T15:21:29.898Z|00145|util|EMER|lib/ofp-msgs.c:995: assertion start_ofs <= UINT16_MAX failed in ofpmp_postappend()

This looks like an OVS bug to me.  Ilya, what do you think the best way
to fix this is?

> 
>    If I issue the command again, sometimes it prints the same error, but
>    sometimes this one (I had another OVN LB on the dev machine, so there
>    are extra groups):
> 
>    # ovs-ofctl dump-groups br-int
>    NXST_GROUP_DESC reply (xid=0x2): flags=[more]
>    group_id=3,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
>    group_id=1,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=ct(commit,table=20,zone=NXM_NX_REG13[0..15],nat(dst=...),exec(load:0x1->NXM_NX_CT_LABEL[1]))
>    2022-11-17T17:53:41Z|00001|ofp_group|WARN|OpenFlow message bucket length 56 exceeds remaining buckets data size 40
>    NXST_GROUP_DESC reply (xid=0x2): ***decode error: OFPGMFC_BAD_BUCKET***
>    00000000  01 11 a9 58 00 00 00 02-ff ff 00 00 00 00 23 20 |...X..........# |
>    00000010  00 00 00 08 00 00 00 00-a9 40 01 00 00 00 00 02 |.........@......|
>    00000020  a9 08 00 00 00 00 00 00-00 38 00 28 00 00 00 00 |.........8.(....|
>    00000030  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    00000040  00 00 00 00 00 00 00 01-ff ff 00 10 00 00 23 20 |..............# |
>    00000050  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>    00000060  00 38 00 28 00 00 00 01-ff ff 00 18 00 00 23 20 |.8.(..........# |
>    00000070  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 02 |................|
>    00000080  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
>    00000090  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 02 |.....d...8.(....|
>    000000a0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    000000b0  00 00 00 00 00 00 00 03-ff ff 00 10 00 00 23 20 |..............# |
>    000000c0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>    000000d0  00 38 00 28 00 00 00 03-ff ff 00 18 00 00 23 20 |.8.(..........# |
>    000000e0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 04 |................|
>    000000f0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
>    00000100  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 04 |.....d...8.(....|
>    00000110  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    00000120  00 00 00 00 00 00 00 05-ff ff 00 10 00 00 23 20 |..............# |
>    00000130  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>    00000140  00 38 00 28 00 00 00 05-ff ff 00 18 00 00 23 20 |.8.(..........# |
>    00000150  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 06 |................|
>    00000160  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
>    00000170  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 06 |.....d...8.(....|
>    00000180  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    00000190  00 00 00 00 00 00 00 07-ff ff 00 10 00 00 23 20 |..............# |
>    000001a0  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
>    000001b0  00 38 00 28 00 00 00 07-ff ff 00 18 00 00 23 20 |.8.(..........# |
>    000001c0  00 07 0c 0f 80 01 08 08-00 00 00 00 00 00 00 08 |................|
>    000001d0  ff ff 00 10 00 00 23 20-00 0e ff f8 14 00 00 00 |......# ........|
>    000001e0  00 00 00 08 00 64 00 00-00 38 00 28 00 00 00 08 |.....d...8.(....|
>    000001f0  ff ff 00 18 00 00 23 20-00 07 0c 0f 80 01 08 08 |......# ........|
>    00000200  00 00 00 00 00 00 00 09-ff ff 00 10 00 00 23 20 |..............# |
>    00000210  00 0e ff f8 14 00 00 00-00 00 00 08 00 64 00 00 |.............d..|
> 
> 7. This problem with dumping groups raises some questions:
>    1. Is there a limit on the number of buckets in a group?  Or a limit on
>       the length of the group string?
>    2. If yes, should OVN limit the number of buckets in a group on its
>       side? (Patches #4 && #6).
> 
> 8. I also tried to find out at which values these problems with
>    dump-groups begin.  I created multiple ECMP routes in OVN in a for-loop
>    and saw that the error from the last example appears starting from
>    1200 items in a group.  I tried to create 10k buckets, and while they
>    were being configured on my machine the following lines also showed up
>    in the logfile:
> 
>    2022-11-17T18:23:30.992Z|00554|ovs_rcu(urcu6)|WARN|blocked 1000 ms waiting for main to quiesce
>    2022-11-17T18:23:31.992Z|00555|ovs_rcu(urcu6)|WARN|blocked 2000 ms waiting for main to quiesce
>    2022-11-17T18:23:33.993Z|00556|ovs_rcu(urcu6)|WARN|blocked 4001 ms waiting for main to quiesce
> 
>    When the routes finished creating, I issued ovs-ofctl dump-groups br-int
>    and got just an error:
> 
>    # ovs-ofctl dump-groups br-int
>    ovs-ofctl: OpenFlow packet receive failed (End of file)
> 
>    And OVS crashed. OVS 2.17.3 is used.
> 
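
A back-of-the-envelope check on where the threshold of roughly 1200 buckets
might come from.  This is only a sketch under two assumptions that I have not
verified against the OVS source: that every bucket encodes to 56 bytes (the
0x0038 bucket length visible in the hex dump above, and the "bucket length 56"
in the WARN line), and that the limit being hit is a 16-bit length, which is
what the UINT16_MAX in the assertion suggests.  The function names are made up
for illustration.

```shell
#!/bin/sh
# Rough estimate only.  Assumptions (not confirmed against the OVS source):
#  - every bucket encodes to 56 bytes (0x0038 in the hex dump)
#  - the limit that matters is a 16-bit length, i.e. 65535 bytes
# Fixed per-group header overhead is ignored, so the real limit is a bit lower.
UINT16_MAX=65535
BUCKET_LEN=56

# Largest bucket count whose encoding still fits in a 16-bit length.
max_buckets() {
    echo $(( UINT16_MAX / BUCKET_LEN ))
}

# Encoded size in bytes of the bucket list for a group with $1 buckets.
bucket_bytes() {
    echo $(( $1 * BUCKET_LEN ))
}

echo "max buckets under the 16-bit limit: $(max_buckets)"
for n in 1000 1200 10000; do
    echo "$n buckets -> $(bucket_bytes "$n") bytes"
done
```

Under these assumptions the cut-off lands at 1170 buckets, which is at least
in the same range as the 1200 reported above; 1200 buckets already need 67200
bytes.
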
>    My script:
> 
> # cat ./repro.sh
> #!/bin/bash
> 
> # Number of identical routes to create.
> count=$1
> 
> echo "Creating ${count} same routes..."
> 
> ovn-nbctl lr-route-del lr1 1.2.3.4/32
> 
> for i in $(seq 1 "${count}"); do
>     echo "$i"
>     # Each iteration adds one more row with the same prefix and nexthop.
>     ovn-nbctl --id=@id create logical-router-static-route \
>         ip_prefix=1.2.3.4/32 nexthop=172.31.32.4 policy=dst-ip \
>         -- add logical-router vpc-FC7D6A54 static_routes @id
> done
> 
> Thanks for reading this.  I'm ready to provide any additional information
> to help investigate this.
> 
> Vladislav Odintsov (7):
>   ic: move routes_ad hmap insert to separate function
>   ic: remove orphan ovn interconnection routes
>   ic: lookup southbound port_binding only if needed
>   actions: limit possible OF group bucket count
>   ic: minor code improvements
>   northd: limit ECMP group by 1024 members
>   ic: prevent advertising/learning multiple same routes
> 
>  ic/ovn-ic.c         | 123 ++++++++++++++++++++++++++++------------
>  lib/actions.c       |  40 ++++++++++++-
>  northd/northd.c     |   2 +-
>  ovn-ic-sb.ovsschema |   6 +-
>  tests/ovn-ic.at     | 133 ++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 263 insertions(+), 41 deletions(-)
> 
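
For items 2 and 3, the check described for Patch #7 can be pictured as
de-duplicating on the columns proposed for the new index (availability_zone,
transit_switch, ip_prefix, nexthop, route_table).  The sketch below only
illustrates that idea on made-up rows; it is not the C change in ic/ovn-ic.c,
and the function name, row layout and the "main" route table value are
hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration: each input row is
#   "availability_zone transit_switch ip_prefix nexthop route_table"
# Keeps the first row seen for each distinct 5-column key, drops later repeats.
dedup_routes() {
    awk '!seen[$1 FS $2 FS $3 FS $4 FS $5]++'
}

# Two identical advertisements from az1 plus one from az2: only the
# repeated az1 row is dropped.
printf '%s\n' \
    'az1 ts1 1.2.3.4/32 192.168.0.10 main' \
    'az1 ts1 1.2.3.4/32 192.168.0.10 main' \
    'az2 ts1 1.2.3.4/32 192.168.0.10 main' | dedup_routes
```

An ovsdb index on the same columns would reject the second row at commit time,
so the two approaches complement each other: the index guards the database,
the lookup avoids attempting the insert at all.
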

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
