On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> Hi Benjamin and Paolo,
> I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> handle SCSI Persistent Reservations in QEMU without privileged code.
>
> SCSI Persistent Reservations support in QEMU is built on the
> qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
> PERSISTENT RESERVATION OUT commands on behalf of the guest. The
> qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
> process should not have those privileges.
>
> There are issues with the current approach:
> - Privileged code is a security attack surface.
> - A bunch of code is required for privilege separation and for management
>   tools to set up qemu-pr-helper with access to multipathd.
> - The interface is SCSI-specific and does not support NVMe.
>
> Several of us have pondered a different approach that I will summarize
> here. The <linux/pr.h> ioctl interface provides an alternative to
> ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> SCSI and NVMe. Since privileges are not required, there would be no need
> for the qemu-pr-helper daemon anymore.
>
> The blocker is that <linux/pr.h> is not usable in multipath
> environments. The Linux DM-Multipath driver has an incomplete ioctl
> implementation that falls short of what libmpathpersist and multipathd
> do in userspace. Kernel changes are necessary to fix this.
>
> My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> to multipathd. That way applications like QEMU can consistently use
> <linux/pr.h> across block device types and no longer have to go through
> the privileged libmpathpersist interface.
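
For concreteness, here is a rough sketch of what the application side of
the <linux/pr.h> path could look like. It is only an illustration, not
QEMU code: the helper names pr_register_key() and
pr_reserve_write_exclusive() are made up for this example, and fd is
assumed to be an already-open descriptor for the block device.

```c
/* Sketch only: register a key and take a reservation via <linux/pr.h>.
 * The helper names below are hypothetical, not taken from QEMU. */
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/pr.h>

/* Register new_key on the device behind fd. Pass old_key = 0 for a
 * first-time registration. Returns 0 on success and -1 with errno set
 * if the ioctl itself fails; the device may also report a positive
 * status value. */
static int pr_register_key(int fd, uint64_t old_key, uint64_t new_key)
{
    struct pr_registration reg;

    memset(&reg, 0, sizeof(reg));   /* flags and padding stay zero */
    reg.old_key = old_key;
    reg.new_key = new_key;
    return ioctl(fd, IOC_PR_REGISTER, &reg);
}

/* Take a Write Exclusive reservation using a previously registered key. */
static int pr_reserve_write_exclusive(int fd, uint64_t key)
{
    struct pr_reservation rsv;

    memset(&rsv, 0, sizeof(rsv));
    rsv.key = key;
    rsv.type = PR_WRITE_EXCLUSIVE;
    return ioctl(fd, IOC_PR_RESERVE, &rsv);
}
```

A caller would register a key first and then reserve with the same key.
Nothing in these calls is SCSI-specific, so the same code would apply to
any block device type that implements the ioctls.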

This would mean intercepting the PR commands sent to multipath devices
right at the start of dm_call_pr(). In order to make some persistent
reservation commands seem atomic, libmpathpersist needs to suspend the
multipath device in certain situations. So device-mapper cannot call
dm_get_live_table(), since this will block suspends. This should be OK.
Libmpathpersist is designed to handle the possibility that the multipath
device gets reloaded with different paths while it is running. And since
the multipath target is an immutable singleton target, there is no
possibility of it turning into another target type because of a table
reload during suspend.

Also, just to clarify, the kernel code can't interface directly with
multipathd. Most of the code for handling persistent reservations is in
libmpathpersist, which just needs multipathd to do things like make sure
that paths that are added in the future get registered properly. There
would likely need to be some new program (that is just a thin wrapper
around libmpathpersist) which can be called with call_usermodehelper().

> Once DM-Multipath support <linux/pr.h> is functional, the main QEMU
> process can directly invoke the ioctls. qemu-pr-helper will no longer be
> needed, eliminating privileged code and simplifying the setup required
> by management tools such as libvirt and KubeVirt.
>
> The only loss in functionality that I have identified when switching to
> <linux/pr.h> is that qemu-pr-helper supports SCSI TransportIDs for the
> PERSISTENT RESERVATION OUT command. This is not supported by
> <linux/pr.h>, but I'm not sure how this even works today since the guest
> sees a virtual SCSI bus and is unaware of the physical bus or HBA. So
> maybe that was never used in the first place?

This is fine. Like you said, TransportIDs don't really make sense on a
virtual SCSI device on top of a multipath device.

> Does this plan sound good to you?

I'm not sure how well this would go over upstream, but it does seem like
a reasonable plan. Mikulas, do you have any thoughts about this idea?

-Ben

> Benjamin: I can work on the DM-Multipath upcalls if you are busy.
>
> Thanks,
> Stefan
