On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
On 11/16/21 19:31, Ilya Maximets wrote:
On 10/25/21 19:45, David Christensen wrote:
In certain high load situations, such as when creating a large number of
ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
as null, resulting in a segmentation fault when 'sock' is later
dereferenced, such as when calling recvmsg().

Hi, David.  Thanks for the patch.

It's OK to check for a NULL pointer there, I guess.  However,
do you know from where it was actually called?  This function,
in general, should not be called without the actual socket,
so we, probably, should fix the caller instead.

Best regards, Ilya Maximets.

Hi, Ilya Maximets.

When I looked at the coredump file, ch->sock was NULL and had been passed to nl_sock_recv():

(gdb) l
2701
2702        while (handler->event_offset < handler->n_events) {
2703            int idx = handler->epoll_events[handler->event_offset].data.u32;
2704            struct dpif_channel *ch = &dpif->channels[idx];

(gdb) p idx
$26 = 4
(gdb) p *dpif->channels@5
$27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock = 0x0,
     last_poll = 61340749}}


The above snippet is from lib/dpif-netlink.c and the caller is dpif_netlink_recv_vport_dispatch().

The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() as the ch->sock argument. That function then tried to access sock->fd when calling recvmsg(), causing the segfault.
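
To make the two options discussed in this thread concrete (a NULL check inside the receive function versus a guard in the caller), here is a minimal, self-contained sketch. It is not the actual OVS code: the sketch_* types and functions are hypothetical stand-ins for struct nl_sock, struct dpif_channel, nl_sock_recv() and the dispatch loop, and the returned error codes are assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OVS types; NOT the real definitions. */
struct sketch_sock {
    int fd;
};

struct sketch_channel {
    struct sketch_sock *sock;   /* May be NULL, as at idx=4 in the coredump. */
    long long last_poll;
};

/* Stand-in for nl_sock_recv(): reject a NULL socket instead of
 * dereferencing it (the callee-side check). */
static int
sketch_sock_recv(const struct sketch_sock *sock)
{
    if (!sock) {
        return EINVAL;
    }
    return sock->fd >= 0 ? 0 : EBADF;
}

/* Stand-in for the dispatch loop: skip events whose channel index is out
 * of range or whose channel has no socket (the caller-side guard).
 * Returns the number of channels that were actually polled. */
static size_t
sketch_poll_channels(const struct sketch_channel *channels,
                     const uint32_t *event_idx,
                     size_t n_events, size_t n_channels)
{
    size_t polled = 0;

    assert(channels != NULL && event_idx != NULL);
    for (size_t i = 0; i < n_events; i++) {
        uint32_t idx = event_idx[i];

        if (idx >= n_channels || !channels[idx].sock) {
            continue;           /* Stale event: no socket to read from. */
        }
        if (sketch_sock_recv(channels[idx].sock) == 0) {
            polled++;
        }
    }
    return polled;
}
```

Either guard avoids the crash, but neither explains why an epoll event still referred to a channel whose socket had already been cleared.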

I'm not experienced enough with Open vSwitch to explain why sock was NULL at that index.
The fix seems worthwhile, though.

A few other points of note:

- Test system was a very large configuration (2K CPUs, > 1TB RAM)
- OVS was configured with 6K ports as follows:

# b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- set interface p$i type=internal"; done
# time sudo ovs-vsctl $cmds

- OVS was installed from the RHEL RPM. A build from source did not exhibit the same problem.
- Unable to reproduce on a different system (128 CPUs, 256GB RAM), even with 10K ports.

Dave