On Thu, Sep 11, 2025 at 12:11 PM Ilya Maximets <[email protected]> wrote:
>
> On 9/2/25 9:29 AM, Han Zhou wrote:
> > When adding ports to bridges with many existing ports, bridge
> > reconfiguration was spending significant CPU time in repeated calls to
> > ofproto_port_open_type() during MTU calculations.
> >
> > For N existing ports, this resulted in O(N) calls to
> > ofproto_port_open_type() per port addition.
> >
> > perf shows:
> > ...
> > +   75.47%     0.00%  ovs-vswitchd   ovs-vswitchd      [.] ofproto_run
> > +   66.17%     0.00%  ovs-vswitchd   ovs-vswitchd      [.] update_mtu
> > +   65.77%     7.27%  ovs-vswitchd   ovs-vswitchd      [.] find_min_mtu
> > +   65.77%     0.00%  ovs-vswitchd   ovs-vswitchd      [.] update_mtu_ofproto (inlined)
> > +   48.75%     1.38%  ovs-vswitchd   ovs-vswitchd      [.] ofport_is_mtu_overridden
> > +   48.56%     0.00%  ovs-vswitchd   ovs-vswitchd      [.] ofport_is_internal_or_patch (inlined)
> > +   37.19%     2.21%  ovs-vswitchd   ovs-vswitchd      [.] dpif_port_open_type
> > +   37.19%     0.00%  ovs-vswitchd   ovs-vswitchd      [.] ofproto_port_open_type (inlined)
> > ...
> >
> > With 5k existing ports, adding a new port took 5.149s.
> > After this patch, it takes 3.201s for the same.
>
> Hi, Han.  Thanks for the patch!
>
> I think it's an interesting change on its own, but I wonder: what were the
> datapath and the port types in your test?
>

Thanks, Ilya! Yes, you are right: I was using the sandbox without specifying
any port type. I didn't realize it was a performance bug specific to the
dummy datapath.

> I see a similar performance with the dummy datapath, but the reason for
> that is the incorrect behavior of the netdev_dummy_set_config() function.
> It unconditionally calls netdev_change_seq_changed(), and this is actually
> what causes quadratic complexity on port addition.  In ofproto_run() we
> collect all the ports that changed into the 'devnames' sset, and then
> update_port() is called for each of them, which calls update_mtu() and
> then update_mtu_ofproto(), and that iterates over all the ports again.
> So, we always do a nested iteration over all the ports with N^2
> complexity.  For real ports this should not normally happen, as they will
> not be marked as changed and will not be added to the 'devnames' set.
>
> If I comment out the netdev_change_seq_changed() call, then it takes
> about 1 second to add a new dummy port at 5000.  The time complexity is
> still quadratic here because of the very inefficient get_port_by_name()
> implementation in lib/dpif-netdev.c.  We don't have a hash map there
> that would index by name, only by the port number, so this function
> just iterates over all the ports and compares names.  If we add a new
> hash map there that indexes by name, then the port addition in the
> dummy datapath (and the userspace datapath in general) should be a lot
> faster.
>

I tried the sandbox with "--dummy=system" and then created ports with
type=internal. It took around 1.2s at 5000 ports, and perf shows
get_port_by_name() as the bottleneck, exactly as you pointed out.
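
For 1), I was thinking of something along these lines (completely untested
sketch; the 'ports_by_name' map and the helper struct are made up for
illustration, and a real patch would also have to add/remove entries
wherever lib/dpif-netdev.c creates and destroys ports):

#include <string.h>
#include "hash.h"
#include "hmap.h"
#include "netdev.h"
#include "util.h"

/* Hypothetical second index over the datapath ports, keyed by name, so that
 * get_port_by_name() becomes a hash lookup instead of a scan over all
 * ports. */
struct port_name_idx_entry {
    struct hmap_node name_node;     /* In 'ports_by_name'. */
    struct dp_netdev_port *port;    /* The indexed port. */
};

static void
port_name_idx_add(struct hmap *ports_by_name, struct dp_netdev_port *port,
                  const char *devname)
{
    struct port_name_idx_entry *e = xmalloc(sizeof *e);

    e->port = port;
    hmap_insert(ports_by_name, &e->name_node, hash_string(devname, 0));
}

static struct dp_netdev_port *
port_name_idx_find(const struct hmap *ports_by_name, const char *devname)
{
    struct port_name_idx_entry *e;

    /* Only the entries whose names hash the same are visited. */
    HMAP_FOR_EACH_WITH_HASH (e, name_node, hash_string(devname, 0),
                             ports_by_name) {
        if (!strcmp(netdev_get_name(e->port->netdev), devname)) {
            return e->port;
        }
    }
    return NULL;
}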

> If I do the same test, but adding tunnel ports into a linux kernel
> datapath, then I do not see the same issue as with dummy ports.  One
> tunnel port at 5000 can be added in about 350ms.  This is relatively
> fast, because we're not doing a lot of syscalls for tunnel ports, since
> there is only one shared device for all of them in the kernel.
>

I tried the sandbox with "--dummy=system" and geneve ports. It matches your
observation: ~350ms for the 5000th port.
Then I tried 16k ports in the same sandbox env, and port-add took ~1.2s,
which shows linear growth, so I'd guess that for 64k ports it would take
~5s. However, in a scale-test environment running the kernel datapath with
real geneve ports, it took 5s to add a new port at the 16k-port level. We
couldn't do perf profiling in that environment yet, so it may just be due
to VM/CPU differences.

> The situation is a bit worse for kernel ports that are actually backed
> by separate kernel interfaces.  The main problem here is all the
> OVS_VPORT_CMD_GET syscalls that are made for each port separately as part
> of the OFPROTO_PORT_FOR_EACH() macro inside the main iteration in
> vswitchd/bridge.c:bridge_delete_or_reconfigure_ports().
> Addition of one internal port into a linux kernel datapath takes about
> 500ms at 5000 preexisting ports.  I suspect that if we change the logic
> to dump all the ports once instead of doing a separate syscall per port,
> this may be done noticeably faster, but I didn't try.
>

This is less of a concern, because it is not reasonable to have that many
kernel interfaces. The only use case I can imagine with that many ports is
tunnel ports.
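
That said, if someone does want to pursue 2) later, the dump-once idea would
roughly be to collect all datapath ports with a single dump and answer the
per-port queries from a local map. A hypothetical helper, not tied to the
actual bridge_delete_or_reconfigure_ports() code:

#include "dpif.h"
#include "hash.h"
#include "hmap.h"
#include "util.h"

/* One cached entry per datapath port, filled from a single port dump. */
struct cached_dpif_port {
    struct hmap_node node;     /* In a map keyed by hash of 'port.name'. */
    struct dpif_port port;     /* Cloned copy of the dumped port. */
};

/* Dumps every port from 'dpif' once, instead of issuing a separate
 * OVS_VPORT_CMD_GET per port; lookups by name then become local hash
 * lookups, same pattern as in the dpif-netdev sketch above. */
static void
cache_all_dpif_ports(const struct dpif *dpif, struct hmap *ports_by_name)
{
    struct dpif_port_dump dump;
    struct dpif_port dpif_port;

    hmap_init(ports_by_name);
    DPIF_PORT_FOR_EACH (&dpif_port, &dump, dpif) {
        struct cached_dpif_port *c = xmalloc(sizeof *c);

        dpif_port_clone(&c->port, &dpif_port);
        hmap_insert(ports_by_name, &c->node, hash_string(c->port.name, 0));
    }
}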

> All in all, the mtu-related code should not be a real issue: it is only a
> problem if an interface is reported as changed every time you touch it,
> which is only the case for dummy ports.  This should never happen for any
> type of real port.
>
> What we actually need to optimize:
>
> 1. Optimize lib/dpif-netdev.c:get_port_by_name() to avoid the linear
>    port lookup.
>
> 2. Re-work OFPROTO_PORT_FOR_EACH() macro to dump all the ports from
>    the datapath once and then look them up in a local hash map, instead
>    of looking up every port using a separate syscall.
>
> 3. Fix the netdev_dummy_set_config() to avoid flagging a change if
>    nothing actually changed.
>
> What do you think?

I agree, please ignore this patch.
Of the three, 1) is the most important, 3) is good to have, and 2) is not
important for real-world use cases.
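
For 3), the fix could be as simple as only bumping the change sequence when
the parsed options actually differ from the current state. A rough,
untested sketch for lib/netdev-dummy.c (only one option is shown; the real
function parses several more, and 'requested_numa_id' is used here just as
an example of existing state to compare against):

static int
netdev_dummy_set_config(struct netdev *netdev_, const struct smap *args,
                        char **errp OVS_UNUSED)
{
    struct netdev_dummy *netdev = netdev_dummy_cast(netdev_);
    bool changed = false;
    int numa_id;

    ovs_mutex_lock(&netdev->mutex);

    numa_id = smap_get_int(args, "numa_id", 0);
    if (numa_id != netdev->requested_numa_id) {
        netdev->requested_numa_id = numa_id;
        changed = true;
    }
    /* ... same compare-before-set pattern for the remaining options ... */

    ovs_mutex_unlock(&netdev->mutex);

    if (changed) {
        /* Today this is called unconditionally, which marks the port as
         * changed on every bridge reconfiguration and triggers the
         * quadratic update_mtu() path described above. */
        netdev_change_seq_changed(netdev_);
    }
    return 0;
}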

Thanks,
Han

>
> Best regards, Ilya Maximets.