On 2/27/24 20:14, Han Zhou wrote:
> For kernel datapath, when a new tunnel port is created in the same
> transaction in which an old tunnel port with the same tunnel
> configuration is deleted, the new tunnel port creation will fail and
> left in an error state. This can be easily reproduced in OVN by
> triggering a chassis deletion and addition with the same encap in the
> same SB DB transaction, such as:
> 
> ovn-sbctl chassis-add aaaaaa geneve 1.2.3.4
> ovn-sbctl chassis-del aaaaaa -- chassis-add bbbbbb 1.2.3.4
> 
> ovs-vsctl show | grep error
> error: "could not add network device ovn-bbbbbb-0 to ofproto (File exists)"
> 
> Related logs in OVS:
> —
> 2024-02-23T05:41:49.978Z|405933|bridge|INFO|bridge br-int: deleted interface 
> ovn-aaaaaa-0 on port 113
> 2024-02-23T05:41:49.989Z|405935|tunnel|WARN|ovn-bbbbbb-0: attempting to add 
> tunnel port with same config as port 'ovn-aaaaaa-0' (::->1.2.3.4, key=flow, 
> legacy_l2, dp port=9)
> 2024-02-23T05:41:49.989Z|405936|ofproto|WARN|br-int: could not add port 
> ovn-bbbbbb-0 (File exists)
> 2024-02-23T05:41:49.989Z|405937|bridge|WARN|could not add network device 
> ovn-bbbbbb-0 to ofproto (File exists)
> —

Hi, Han.  Thanks for the patch!

> 
> Depending on when there are other OVSDB changes, it may take a long time
> for the device to be added successfully, triggered by the next OVS
> iteration.
> 
> (note: native tunnel ports do not have this problem)

I don't think this is correct.  The code path is common for both system
and native tunnels.  I can reproduce the issues in a sandbox with:

$ make -j8 sandbox SANDBOXFLAGS="\-\-dummy='system'"
[tutorial]$ ovs-vsctl add-port br0 tunnel_port \
                -- set Interface tunnel_port \
                       type=geneve options:remote_ip=flow options:key=123
[tutorial]$ ovs-vsctl del-port tunnel_port \
                -- add-port br0 tunnel_port2 \
                -- set Interface tunnel_port2 \
                       type=geneve options:remote_ip=flow options:key=123
ovs-vsctl: Error detected while setting up 'tunnel_port2':
could not add network device tunnel_port2 to ofproto (File exists).
See ovs-vswitchd log for details.

The same should work in a testsuite as well, i.e. we should be able to
create a test for this scenario.

Note: The --dummy=system prevents OVS from replacing tunnel ports with
      dummy ones.

> 
> The problem is how the tunnel port deletion and creation are handled. In
> bridge_reconfigure(), port deletion is handled before creation, to avoid
> such resource conflict. However, for kernel tunnel ports, the real clean
> up is performed at the end of the bridge_reconfigure() in the:
> bridge_run__()->ofproto_run()->...->ofproto_dpif:port_destruct()
> 
> We cannot call bridge_run__() at an earlier point before all
> reconfigurations are done, so this patch tries a generic approach to
> just re-run the bridge_reconfigure() when there are any port creations
> encountered "File exists" error, which indicates a possible resource
> conflict may have happened due to a later deleted port, and retry may
> succeed.
> 
> Signed-off-by: Han Zhou <hz...@ovn.org>
> ---
> This is RFC because I am not sure if there is a better way to solve the 
> problem
> more specifically by executing the port_destruct for the old port before 
> trying
> to create the new port. The fix may be more complex though.

I don't think re-trying is a good approach in general.  We should likely
just destroy the tnl_port structure right away, similarly to how we clean
up stp, lldp and bundles in ofproto_port_unregister().  Maybe we can add
a new callback similar to bundle_remove() and call tnl_port_del() from it?
(I didn't try, so I'm not 100% sure this will not cause any issues.)

What do you think?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to