On 4/11/26 4:14 PM, Weiming Shi wrote:
> The vport netlink reply helpers allocate a fixed-size skb with
> nlmsg_new(NLMSG_DEFAULT_SIZE, ...) but serialize the full upcall PID
> array via ovs_vport_get_upcall_portids(). Since
> ovs_vport_set_upcall_portids() accepts any non-zero multiple of
> sizeof(u32) with no upper bound, a CAP_NET_ADMIN user can install a PID
> array large enough to overflow the reply buffer. On systems with
> unprivileged user namespaces enabled (e.g., Ubuntu default), this is
> reachable via unshare -Urn since all OVS vport genl operations use
> GENL_UNS_ADMIN_PERM.
> 
> When the subsequent nla_put() fails with -EMSGSIZE, five BUG_ON(err < 0)
> sites fire and panic the kernel:
> 
>  kernel BUG at net/openvswitch/datapath.c:2414!
>  Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
>  CPU: 1 UID: 0 PID: 65 Comm: poc Not tainted 7.0.0-rc7-00195-geb216e422044 #1
>  RIP: 0010:ovs_vport_cmd_set+0x34c/0x400
>  Call Trace:
>   <TASK>
>   genl_family_rcv_msg_doit (net/netlink/genetlink.c:1116)
>   genl_rcv_msg (net/netlink/genetlink.c:1194)
>   netlink_rcv_skb (net/netlink/af_netlink.c:2550)
>   genl_rcv (net/netlink/genetlink.c:1219)
>   netlink_unicast (net/netlink/af_netlink.c:1344)
>   netlink_sendmsg (net/netlink/af_netlink.c:1894)
>   __sys_sendto (net/socket.c:2206)
>   __x64_sys_sendto (net/socket.c:2209)
>   do_syscall_64 (arch/x86/entry/syscall_64.c:63)
>   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>   </TASK>
>  Kernel panic - not syncing: Fatal exception
> 
> Fix this by dynamically sizing the reply skb to account for the actual
> PID array length, and replace the BUG_ON() calls with graceful error
> returns.

Hi, Weiming.  Thanks for working on this!  The earlier attempt to fix this
problem was here:
  https://lore.kernel.org/netdev/[email protected]/
CC: sunichi, maybe you can cooperate on the fix somehow.

A few problems with the solution here:

- You're locking to count the number of pids, then unlocking and then
  re-locking to fill the output.  This is racy and may still cause the
  inability to put all the data into allocated space, since the number can
  theoretically change.

- Failing the del command is very unfriendly to the userspace and also it
  becomes impossible for the user to delete the port without clearing the
  upcall pids first, which they will not know to do.

- Failing the new command as done in this change is very confusing as the
  port is actually created while the user gets the error.

- NLMSG_DEFAULT_SIZE + portids_size is not the actual size of the message,
  nothing in the code guarantees that the rest fits into NLMSG_DEFAULT_SIZE.

- The number of pids being unbounded is a problem in itself.

So, we need to approach this issue differently.  As I suggested in the thread
linked above, we should limit the maximum number of the pid to the number of
CPUs, as it is done for the per-cpu dispatch socket array.  There is no point
in more sockets than CPUs and the userspace never creates that many.  This way
we can:
- Fail the attempts to set up more pids than CPUs in the first place.
- Know beforehand the maximum size to allocate - no need to check the actual
  number and re-lock.  Create a function similar to ovs_dp_cmd_msg_size()
  to allocate based on what can be included in the worst case, which shouldn't
  be a lot.

And all the BUG_ON() calls must remain BUG_ON()s as it is a bug if we're not
counting correctly.

> Fixes: b83d23a2a38b ("openvswitch: Introduce per-cpu upcall dispatch")

This is not the right commit.  The change you're making has nothing to do with
the per-cpu dispatch.  The actual commit you're looking for is much older:
  5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's.")

Couple more notes:

- Wait at least 24 hours between versions.  Otherwise, people have no enough
  time to look at your patch.

- Please, CC all maintainers including the ovs-dev list.  It is moderated for
  new senders (the only reliable way to keep it out of spam) but we approve
  fast, so your next emails should go right through.

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to