On Fri, Mar 8, 2024 at 12:17 AM Tao Liu <taoliu...@163.com> wrote:
>
>
> On 3/7/24 5:39 AM, Ilya Maximets wrote:
> > On 2/27/24 20:14, Han Zhou wrote:
> >> For kernel datapath, when a new tunnel port is created in the same
> >> transaction in which an old tunnel port with the same tunnel
> >> configuration is deleted, the new tunnel port creation will fail and
> >> left in an error state. This can be easily reproduced in OVN by
> >> triggering a chassis deletion and addition with the same encap in the
> >> same SB DB transaction, such as:
> >>
> >> ovn-sbctl chassis-add aaaaaa geneve 1.2.3.4
> >> ovn-sbctl chassis-del aaaaaa -- chassis-add bbbbbb 1.2.3.4
> >>
> >> ovs-vsctl show | grep error
> >> error: "could not add network device ovn-bbbbbb-0 to ofproto (File
exists)"
> >>
> >> Related logs in OVS:
> >> —
> >> 2024-02-23T05:41:49.978Z|405933|bridge|INFO|bridge br-int: deleted
interface ovn-aaaaaa-0 on port 113
> >> 2024-02-23T05:41:49.989Z|405935|tunnel|WARN|ovn-bbbbbb-0: attempting
to add tunnel port with same config as port 'ovn-aaaaaa-0' (::->1.2.3.4,
key=flow, legacy_l2, dp port=9)
> >> 2024-02-23T05:41:49.989Z|405936|ofproto|WARN|br-int: could not add
port ovn-bbbbbb-0 (File exists)
> >> 2024-02-23T05:41:49.989Z|405937|bridge|WARN|could not add network
device ovn-bbbbbb-0 to ofproto (File exists)
> >> —
> >
> > Hi, Han.  Thanks for the patch!
> >
> >>
> >> Depending on when there are other OVSDB changes, it may take a long
time
> >> for the device to be added successfully, triggered by the next OVS
> >> iteration.
> >>
> >> (note: native tunnel ports do not have this problem)
> >
> > I don't think this is correct.  The code path is common for both system
> > and native tunnels.  I can reproduce the issues in a sandbox with:
> >
> > $ make -j8 sandbox SANDBOXFLAGS="\-\-dummy='system'"
> > [tutorial]$ ovs-vsctl add-port br0 tunnel_port \
> >                  -- set Interface tunnel_port \
> >                         type=geneve options:remote_ip=flow
options:key=123
> > [tutorial]$ ovs-vsctl del-port tunnel_port \
> >                  -- add-port br0 tunnel_port2 \
> >                  -- set Interface tunnel_port2 \
> >                         type=geneve options:remote_ip=flow
options:key=123
> > ovs-vsctl: Error detected while setting up 'tunnel_port2':
> > could not add network device tunnel_port2 to ofproto (File exists).
> > See ovs-vswitchd log for details.
> >
> > The same should work in a testsuite as well, i.e. we should be able to
> > create a test for this scenario.
> >
> > Note: The --dummy=system prevents OVS from replacing tunnel ports with
> >        dummy ones.
> >

Thanks Ilya for the correction! --dummy=system is very helpful.

> >>
> >> The problem is how the tunnel port deletion and creation are handled.
In
> >> bridge_reconfigure(), port deletion is handled before creation, to
avoid
> >> such resource conflict. However, for kernel tunnel ports, the real
clean
> >> up is performed at the end of the bridge_reconfigure() in the:
> >> bridge_run__()->ofproto_run()->...->ofproto_dpif:port_destruct()
> >>
> >> We cannot call bridge_run__() at an earlier point before all
> >> reconfigurations are done, so this patch tries a generic approach to
> >> just re-run the bridge_reconfigure() when there are any port creations
> >> encountered "File exists" error, which indicates a possible resource
> >> conflict may have happened due to a later deleted port, and retry may
> >> succeed.
> >>
> >> Signed-off-by: Han Zhou <hz...@ovn.org>
> >> ---
> >> This is RFC because I am not sure if there is a better way to solve
the problem
> >> more specifically by executing the port_destruct for the old port
before trying
> >> to create the new port. The fix may be more complex though.
> >
> > I don't think re-trying is a good approach in general.  We should likely
> > just destroy the tnl_port structure right away, similarly to how we
clean
> > up stp, lldp and bundles in ofproto_port_unregister().  Maybe we can add
> > a new callback similar to bundle_remove() and call tnl_port_del() from
it?
> > (I didn't try, so I'm not 100% sure this will not cause any issues.)
> >
> > What do you think?
> >
> > Best regards, Ilya Maximets.
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
> Hi Ilya and Han,
>
> We hit this issue some days ago. As our analysis, it was introduced by
> commit fe83f81df977 ("netdev: Remove netdev from global shash when the
> user is changing interface configuration.")
>
> Reproduce:
>    ovs-vsctl add-port br-int vxlan0 \
>      -- set interface vxlan0 type=vxlan options:remote_ip=10.10.10.1
>
>    sleep 2
>
>    ovs-vsctl --if-exists del-port vxlan0 \
>      -- add-port br-int vxlan1 \
>      -- set interface vxlan1 type=vxlan options:remote_ip=10.10.10.1
>    ovs-vsctl: Error detected while setting up 'vxlan1': could not add
>    network device vxlan1 to ofproto (File exists).
>
> vswitchd log:
>    bridge|INFO|bridge br-int: added interface vxlan0 on port 1106
>    bridge|INFO|bridge br-int: deleted interface vxlan0 on port 1106
>    tunnel|WARN|vxlan1: attempting to add tunnel port with same config as
> port 'vxlan0' (::->10.10.10.1, key=0, legacy_l2, dp port=122)
>    ofproto|WARN|br-int: could not add port vxlan1 (File exists)
>    bridge|WARN|could not add network device vxlan1 to ofproto (File
exists)
>
> CallTrace:
>    bridge_reconfigure
>      bridge_del_ports
>        port_destroy
>          iface_destroy__
>            netdev_remove         <------ netdev vxlan0 removed
>      bridge_delete_or_reconfigure_ports
>        OFPROTO_PORT_FOR_EACH
>          ofproto_port_dump_next
>            port_dump_next
>            port_query_by_name    <------ netdev_shash do not contain
vxlan0
>          ofproto_port_del        <------ vxlan0 do not del in ofproto
>      bridge_add_ports
>        bridge_add_ports__
>          iface_create
>            iface_do_create
>              ofproto_port_add    <------ vxlan1 add failed
>
>
> We tried to fix this by the following diff:
>
> ---
>   ofproto/ofproto-dpif.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index d7544ef..654468e 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -3907,14 +3907,16 @@ port_query_by_name(const struct ofproto
> *ofproto_, const char *devname,
>
>       if (sset_contains(&ofproto->ghost_ports, devname)) {
>           const char *type = netdev_get_type_from_name(devname);
> +        const struct ofport *ofport =
> +                        shash_find_data(&ofproto->up.port_by_name,
> devname);
> +        if (!type && ofport && ofport->netdev) {
> +            type = netdev_get_type(ofport->netdev);
> +        }
>
>           /* We may be called before ofproto->up.port_by_name is
> populated with
>            * the appropriate ofport.  For this reason, we must get the
> name and
>            * type from the netdev layer directly. */
>           if (type) {
> -            const struct ofport *ofport;
> -
> -            ofport = shash_find_data(&ofproto->up.port_by_name, devname);
>               ofproto_port->ofp_port = ofport ? ofport->ofp_port :
> OFPP_NONE;
>               ofproto_port->name = xstrdup(devname);
>               ofproto_port->type = xstrdup(type);
>
> Best regards, Tao
>

Thanks Tao for the patch. Your patch is much cleaner. Would you like to
send a formal patch?
Ilya, what do you think?

Han
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to