On Fri, Mar 8, 2024 at 12:17 AM Tao Liu <taoliu...@163.com> wrote: > > > On 3/7/24 5:39 AM, Ilya Maximets wrote: > > On 2/27/24 20:14, Han Zhou wrote: > >> For kernel datapath, when a new tunnel port is created in the same > >> transaction in which an old tunnel port with the same tunnel > >> configuration is deleted, the new tunnel port creation will fail and > >> left in an error state. This can be easily reproduced in OVN by > >> triggering a chassis deletion and addition with the same encap in the > >> same SB DB transaction, such as: > >> > >> ovn-sbctl chassis-add aaaaaa geneve 1.2.3.4 > >> ovn-sbctl chassis-del aaaaaa -- chassis-add bbbbbb 1.2.3.4 > >> > >> ovs-vsctl show | grep error > >> error: "could not add network device ovn-bbbbbb-0 to ofproto (File exists)" > >> > >> Related logs in OVS: > >> — > >> 2024-02-23T05:41:49.978Z|405933|bridge|INFO|bridge br-int: deleted interface ovn-aaaaaa-0 on port 113 > >> 2024-02-23T05:41:49.989Z|405935|tunnel|WARN|ovn-bbbbbb-0: attempting to add tunnel port with same config as port 'ovn-aaaaaa-0' (::->1.2.3.4, key=flow, legacy_l2, dp port=9) > >> 2024-02-23T05:41:49.989Z|405936|ofproto|WARN|br-int: could not add port ovn-bbbbbb-0 (File exists) > >> 2024-02-23T05:41:49.989Z|405937|bridge|WARN|could not add network device ovn-bbbbbb-0 to ofproto (File exists) > >> — > > > > Hi, Han. Thanks for the patch! > > > >> > >> Depending on when there are other OVSDB changes, it may take a long time > >> for the device to be added successfully, triggered by the next OVS > >> iteration. > >> > >> (note: native tunnel ports do not have this problem) > > > > I don't think this is correct. The code path is common for both system > > and native tunnels. I can reproduce the issues in a sandbox with: > > > > $ make -j8 sandbox SANDBOXFLAGS="\-\-dummy='system'" > > [tutorial]$ ovs-vsctl add-port br0 tunnel_port \ > > -- set Interface tunnel_port \ > > type=geneve options:remote_ip=flow options:key=123 > > [tutorial]$ ovs-vsctl del-port tunnel_port \ > > -- add-port br0 tunnel_port2 \ > > -- set Interface tunnel_port2 \ > > type=geneve options:remote_ip=flow options:key=123 > > ovs-vsctl: Error detected while setting up 'tunnel_port2': > > could not add network device tunnel_port2 to ofproto (File exists). > > See ovs-vswitchd log for details. > > > > The same should work in a testsuite as well, i.e. we should be able to > > create a test for this scenario. > > > > Note: The --dummy=system prevents OVS from replacing tunnel ports with > > dummy ones. > >
Thanks Ilya for the correction! --dummy=system is very helpful. > >> > >> The problem is how the tunnel port deletion and creation are handled. In > >> bridge_reconfigure(), port deletion is handled before creation, to avoid > >> such resource conflict. However, for kernel tunnel ports, the real clean > >> up is performed at the end of the bridge_reconfigure() in the: > >> bridge_run__()->ofproto_run()->...->ofproto_dpif:port_destruct() > >> > >> We cannot call bridge_run__() at an earlier point before all > >> reconfigurations are done, so this patch tries a generic approach to > >> just re-run the bridge_reconfigure() when there are any port creations > >> encountered "File exists" error, which indicates a possible resource > >> conflict may have happened due to a later deleted port, and retry may > >> succeed. > >> > >> Signed-off-by: Han Zhou <hz...@ovn.org> > >> --- > >> This is RFC because I am not sure if there is a better way to solve the problem > >> more specifically by executing the port_destruct for the old port before trying > >> to create the new port. The fix may be more complex though. > > > > I don't think re-trying is a good approach in general. We should likely > > just destroy the tnl_port structure right away, similarly to how we clean > > up stp, lldp and bundles in ofproto_port_unregister(). Maybe we can add > > a new callback similar to bundle_remove() and call tnl_port_del() from it? > > (I didn't try, so I'm not 100% sure this will not cause any issues.) > > > > What do you think? > > > > Best regards, Ilya Maximets. > > _______________________________________________ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > Hi Ilya and Han, > > We hit this issue some days ago. As our analysis, it was introduced by > commit fe83f81df977 ("netdev: Remove netdev from global shash when the > user is changing interface configuration.") > > Reproduce: > ovs-vsctl add-port br-int vxlan0 \ > -- set interface vxlan0 type=vxlan options:remote_ip=10.10.10.1 > > sleep 2 > > ovs-vsctl --if-exists del-port vxlan0 \ > -- add-port br-int vxlan1 \ > -- set interface vxlan1 type=vxlan options:remote_ip=10.10.10.1 > ovs-vsctl: Error detected while setting up 'vxlan1': could not add > network device vxlan1 to ofproto (File exists). > > vswitchd log: > bridge|INFO|bridge br-int: added interface vxlan0 on port 1106 > bridge|INFO|bridge br-int: deleted interface vxlan0 on port 1106 > tunnel|WARN|vxlan1: attempting to add tunnel port with same config as > port 'vxlan0' (::->10.10.10.1, key=0, legacy_l2, dp port=122) > ofproto|WARN|br-int: could not add port vxlan1 (File exists) > bridge|WARN|could not add network device vxlan1 to ofproto (File exists) > > CallTrace: > bridge_reconfigure > bridge_del_ports > port_destroy > iface_destroy__ > netdev_remove <------ netdev vxlan0 removed > bridge_delete_or_reconfigure_ports > OFPROTO_PORT_FOR_EACH > ofproto_port_dump_next > port_dump_next > port_query_by_name <------ netdev_shash do not contain vxlan0 > ofproto_port_del <------ vxlan0 do not del in ofproto > bridge_add_ports > bridge_add_ports__ > iface_create > iface_do_create > ofproto_port_add <------ vxlan1 add failed > > > We tried to fix this by the following diff: > > --- > ofproto/ofproto-dpif.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > index d7544ef..654468e 100644 > --- a/ofproto/ofproto-dpif.c > +++ b/ofproto/ofproto-dpif.c > @@ -3907,14 +3907,16 @@ port_query_by_name(const struct ofproto > *ofproto_, const char *devname, > > if (sset_contains(&ofproto->ghost_ports, devname)) { > const char *type = netdev_get_type_from_name(devname); > + const struct ofport *ofport = > + shash_find_data(&ofproto->up.port_by_name, > devname); > + if (!type && ofport && ofport->netdev) { > + type = netdev_get_type(ofport->netdev); > + } > > /* We may be called before ofproto->up.port_by_name is > populated with > * the appropriate ofport. For this reason, we must get the > name and > * type from the netdev layer directly. */ > if (type) { > - const struct ofport *ofport; > - > - ofport = shash_find_data(&ofproto->up.port_by_name, devname); > ofproto_port->ofp_port = ofport ? ofport->ofp_port : > OFPP_NONE; > ofproto_port->name = xstrdup(devname); > ofproto_port->type = xstrdup(type); > > Best regards, Tao > Thanks Tao for the patch. Your patch is much cleaner. Would you like to send a formal patch? Ilya, what do you think? Han _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev