Re: [libvirt] New QEMU daemon for persistent reservations
On 27/11/2017 14:44, Michal Privoznik wrote: > On 11/27/2017 12:09 PM, Paolo Bonzini wrote: >> On 24/11/2017 19:13, Michal Privoznik wrote: >>> On 11/24/2017 06:18 PM, Paolo Bonzini wrote: On 24/11/2017 18:07, Michal Privoznik wrote: > The helper is going to be daemonized (for the same reason > that qemu process is) => there's no SIGCHLD for libvirtd to receive. Couldn't (or shouldn't!) libvirt register itself as a subreaper instead? >>> >>> Whether libvirt can't have a >>> separate thread where nothing waitpid() would run? That could hardly >>> work - not only would libvirt need to spawn separate thread for every >>> helper, it would not work across restarts of libvirtd. We would have no >>> guarantee that PID we recorded is still the same process as it was when >>> we were dying. >> >> Oh, you're right. :( Even if libvirtd were a subreaper (see prctl(2) >> manpage), the daemonized qemu-pr-helper would be reparented to init when >> libvirtd restarts. >> >> libvirtd could test whether the helper is alive by connecting to its >> socket. If that's not enough, I'll add an event. > > Well, connecting and staying connected? Otherwise mere connect followed > by immediate disconnect would work only when libvirtd is starting up and > reconnecting to already running domains. Not if the helper dies some > time after libvirt started the domain and the helper. [1] Ok, let's add the event and state command (to QEMU 2.12 only). Thanks, Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/27/2017 12:09 PM, Paolo Bonzini wrote: > On 24/11/2017 19:13, Michal Privoznik wrote: >> On 11/24/2017 06:18 PM, Paolo Bonzini wrote: >>> On 24/11/2017 18:07, Michal Privoznik wrote: The helper is going to be daemonized (for the same reason that qemu process is) => there's no SIGCHLD for libvirtd to receive. >>> >>> Couldn't (or shouldn't!) libvirt register itself as a subreaper instead? >> >> Whether libvirt can't have a >> separate thread where nothing waitpid() would run? That could hardly >> work - not only would libvirt need to spawn separate thread for every >> helper, it would not work across restarts of libvirtd. We would have no >> guarantee that PID we recorded is still the same process as it was when >> we were dying. > > Oh, you're right. :( Even if libvirtd were a subreaper (see prctl(2) > manpage), the daemonized qemu-pr-helper would be reparented to init when > libvirtd restarts. > > libvirtd could test whether the helper is alive by connecting to its > socket. If that's not enough, I'll add an event. Well, connecting and staying connected? Otherwise mere connect followed by immediate disconnect would work only when libvirtd is starting up and reconnecting to already running domains. Not if the helper dies some time after libvirt started the domain and the helper. [1] > > Another thing: we have > > > > which represents "one daemon per QEMU" in privileged libvirtd and > "well-known socket path" in session libvirtd. What would the > live-domain XML look like? Maybe: > > > > mode='server'/> > > > > > mode='server'/> > This is a good idea. We can explicitly state if the helper is supposed to be managed by libvirtd or not. However, I'd prefer not exposing path in case managed='yes' because it's something that libvirt should manage. We don't expose monitor path either, for instance. With managed='yes' libvirt would: a) spawn daemon on per domain basis b) generate unique path for each helper it spawns c) label the socket (so that qemu can access it) d) put the path onto qemu's cmd line However, with managed='no' libvirt would: a) require path to be provided in the XML b) merely just put the path onto qemu's cmd line > > And then, would it be an error to specify the "wrong" managed setting in > the domain-creation XML? For now yes. For now specifying both path and managed='yes' would be considered an error. It gives us enough room to allow that in the future should anybody need it. > Or would it be possible to use a managed='no' > helper in privileged libvirtd? I take it as a "no" given your remark below. You're right. This is going to be forbidden too. Again, we can allow this if somebody needs it. 1: Also, it's worth nothing that libvirtd would restart the helper iff managed='yes'. Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 27/11/2017 14:35, Michal Privoznik wrote: >>> But can you also test _more_ permissions if you want? So if QEMU passed >>> to the helper a file for which it has "lock" but not "ioctl" access, >>> would it make sense for the helper to let it through? Together with the >>> socket MAC, this should be precise enough. >> Sure, you can check any of the permissions which are defined for the >> type of object you've got. The goal is to check permissions which >> correspond to actions you're taking on the object. So if you do >> something other than just ioctl() on the passed in FD, it would make >> sense to check further permissions as appropriate. > Just to make sure I understand correctly: the PD passing is done by qemu > and not libvirt, right? Technically, we don't open the disks. Correct. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/27/2017 02:29 PM, Daniel P. Berrange wrote: > On Mon, Nov 27, 2017 at 02:01:06PM +0100, Paolo Bonzini wrote: >> On 27/11/2017 13:57, Daniel P. Berrange wrote: Got it. My problem here is that ioctl permission might be too strict. One use case for the helper is to bypass the ioctl permission, and only grant SCSI passthrough access for the specific case of persistent reservation commands. Would it make sense to also allow e.g. "lock" (and would it pass the SELinux policy)? >>> When I write "...ioctl perm..." I use that just as a short cut to represent >>> whatever SELinux access vector would be checked if QEMU were to do the SCSI >>> PR calls itself. The access vector permission bits are defined in the >>> policy >>> file, and in the auto-generated header file >>> /usr/include/selinux/av_permissions.h >>> >>> AFAICT, there's only a coarse COMMON_FILE_IOCTL access vector defined, >>> nothing >>> on the level of individual ioctl commands. So, yes, the MAC policy check >>> would allow other ioctl commands to be run, aside from those required for >>> persistent reservations. We just have to rely on the PR helper code for >>> that restriction. >> >> But can you also test _more_ permissions if you want? So if QEMU passed >> to the helper a file for which it has "lock" but not "ioctl" access, >> would it make sense for the helper to let it through? Together with the >> socket MAC, this should be precise enough. > > Sure, you can check any of the permissions which are defined for the > type of object you've got. The goal is to check permissions which > correspond to actions you're taking on the object. So if you do > something other than just ioctl() on the passed in FD, it would make > sense to check further permissions as appropriate. Just to make sure I understand correctly: the PD passing is done by qemu and not libvirt, right? Technically, we don't open the disks. Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Nov 27, 2017 at 02:01:06PM +0100, Paolo Bonzini wrote: > On 27/11/2017 13:57, Daniel P. Berrange wrote: > >> Got it. My problem here is that ioctl permission might be too strict. > >> One use case for the helper is to bypass the ioctl permission, and only > >> grant SCSI passthrough access for the specific case of persistent > >> reservation commands. Would it make sense to also allow e.g. "lock" > >> (and would it pass the SELinux policy)? > > When I write "...ioctl perm..." I use that just as a short cut to represent > > whatever SELinux access vector would be checked if QEMU were to do the SCSI > > PR calls itself. The access vector permission bits are defined in the > > policy > > file, and in the auto-generated header file > > /usr/include/selinux/av_permissions.h > > > > AFAICT, there's only a coarse COMMON_FILE_IOCTL access vector defined, > > nothing > > on the level of individual ioctl commands. So, yes, the MAC policy check > > would allow other ioctl commands to be run, aside from those required for > > persistent reservations. We just have to rely on the PR helper code for > > that restriction. > > But can you also test _more_ permissions if you want? So if QEMU passed > to the helper a file for which it has "lock" but not "ioctl" access, > would it make sense for the helper to let it through? Together with the > socket MAC, this should be precise enough. Sure, you can check any of the permissions which are defined for the type of object you've got. The goal is to check permissions which correspond to actions you're taking on the object. So if you do something other than just ioctl() on the passed in FD, it would make sense to check further permissions as appropriate. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 27/11/2017 13:57, Daniel P. Berrange wrote: >> Got it. My problem here is that ioctl permission might be too strict. >> One use case for the helper is to bypass the ioctl permission, and only >> grant SCSI passthrough access for the specific case of persistent >> reservation commands. Would it make sense to also allow e.g. "lock" >> (and would it pass the SELinux policy)? > When I write "...ioctl perm..." I use that just as a short cut to represent > whatever SELinux access vector would be checked if QEMU were to do the SCSI > PR calls itself. The access vector permission bits are defined in the policy > file, and in the auto-generated header file > /usr/include/selinux/av_permissions.h > > AFAICT, there's only a coarse COMMON_FILE_IOCTL access vector defined, nothing > on the level of individual ioctl commands. So, yes, the MAC policy check > would allow other ioctl commands to be run, aside from those required for > persistent reservations. We just have to rely on the PR helper code for > that restriction. But can you also test _more_ permissions if you want? So if QEMU passed to the helper a file for which it has "lock" but not "ioctl" access, would it make sense for the helper to let it through? Together with the socket MAC, this should be precise enough. Thanks, Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 27/11/2017 13:18, Daniel P. Berrange wrote: > On Mon, Nov 27, 2017 at 12:51:33PM +0100, Paolo Bonzini wrote: >> On 27/11/2017 12:37, Daniel P. Berrange wrote: >>> On Mon, Nov 27, 2017 at 12:13:24PM +0100, Paolo Bonzini wrote: Hm, I see what you mean now. But it would be "just" a qemu-pr-helper bug that it trusts the caller to have "ioctl" permissions on the file descriptor, wouldn't it? And it could be a feature even, since the remote QEMU process also has to have "connect" permissions on the qemu-pr-helper socket. So you could give it ioctl access *limited to persistent reservations* by granting the appropriate permissions on the socket. >>> >>> We can't grant access to the persistent reservation helper's socket on a >>> per QEMU basis. Permissions are granted on the domain type svirt_t, and >>> we don't want to invent a new domain type just for having access to the >>> PR helper. >> >> You can do so via DAC and MAC on the socket itself, or is that not enough? > > Yes, you can do MAC on the socket, but that merely gives you protection > betweeen the host & guest. The goal of sVirt is to give protection > between VMs, and MAC on the socket alone doesn't guarantee that. > In the shared-PR helper the PR helper context would not have an > MAC level applied, so the MAC check is now > >process context == qemu_pr_helper_t:s0 >file context == svirt_image_t:s0,c340,c524 > > This is still a MAC check, but it is a weaker check than what sVirt promises, > because a QEMU with MCS c340,c524 could pass a file descriptor with a MCS > level c123,c422. QEMU would be denied to use this bogus FD, but the PR > helper would happily allow this bad access. So the issue could be that QEMU could have received this bogus FD from an attacker. Got it. > So to satisfy sVirt separation of QEMU instances, we need the PR helper to > be able to validate the MCS levels, as the kernel no longer has enough info > todo this automatically. > > In the PR helper this is achieved with something like > > getpeercon(socketfd, &qemucon) => returns svirt_t:s0,c340,c524 > fgetfilecont(filefd, &diskcon) => returns svirt_image_t:s0,c340,c524 > > Then it would call something like > >security_compute_av(qemucon, diskcon, ...file class..., ...ioctl perm...) > > to get a MAC decision from the kernel. This preserves the sVirt isolation > of individual QEMU instances. Got it. My problem here is that ioctl permission might be too strict. One use case for the helper is to bypass the ioctl permission, and only grant SCSI passthrough access for the specific case of persistent reservation commands. Would it make sense to also allow e.g. "lock" (and would it pass the SELinux policy)? Also, what about AppArmor? Thanks, Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Nov 27, 2017 at 01:49:41PM +0100, Paolo Bonzini wrote: > On 27/11/2017 13:18, Daniel P. Berrange wrote: > > On Mon, Nov 27, 2017 at 12:51:33PM +0100, Paolo Bonzini wrote: > >> On 27/11/2017 12:37, Daniel P. Berrange wrote: > >>> On Mon, Nov 27, 2017 at 12:13:24PM +0100, Paolo Bonzini wrote: > Hm, I see what you mean now. But it would be "just" a qemu-pr-helper > bug that it trusts the caller to have "ioctl" permissions on the file > descriptor, wouldn't it? > > And it could be a feature even, since the remote QEMU process also has > to have "connect" permissions on the qemu-pr-helper socket. So you > could give it ioctl access *limited to persistent reservations* by > granting the appropriate permissions on the socket. > >>> > >>> We can't grant access to the persistent reservation helper's socket on a > >>> per QEMU basis. Permissions are granted on the domain type svirt_t, and > >>> we don't want to invent a new domain type just for having access to the > >>> PR helper. > >> > >> You can do so via DAC and MAC on the socket itself, or is that not enough? > > > > Yes, you can do MAC on the socket, but that merely gives you protection > > betweeen the host & guest. The goal of sVirt is to give protection > > between VMs, and MAC on the socket alone doesn't guarantee that. > > In the shared-PR helper the PR helper context would not have an > > MAC level applied, so the MAC check is now > > > >process context == qemu_pr_helper_t:s0 > >file context == svirt_image_t:s0,c340,c524 > > > > This is still a MAC check, but it is a weaker check than what sVirt > > promises, > > because a QEMU with MCS c340,c524 could pass a file descriptor with a MCS > > level c123,c422. QEMU would be denied to use this bogus FD, but the PR > > helper would happily allow this bad access. > > So the issue could be that QEMU could have received this bogus FD from > an attacker. Got it. > > > So to satisfy sVirt separation of QEMU instances, we need the PR helper to > > be able to validate the MCS levels, as the kernel no longer has enough info > > todo this automatically. > > > > In the PR helper this is achieved with something like > > > > getpeercon(socketfd, &qemucon) => returns svirt_t:s0,c340,c524 > > fgetfilecont(filefd, &diskcon) => returns svirt_image_t:s0,c340,c524 > > > > Then it would call something like > > > >security_compute_av(qemucon, diskcon, ...file class..., ...ioctl perm...) > > > > to get a MAC decision from the kernel. This preserves the sVirt isolation > > of individual QEMU instances. > > Got it. My problem here is that ioctl permission might be too strict. > One use case for the helper is to bypass the ioctl permission, and only > grant SCSI passthrough access for the specific case of persistent > reservation commands. Would it make sense to also allow e.g. "lock" > (and would it pass the SELinux policy)? When I write "...ioctl perm..." I use that just as a short cut to represent whatever SELinux access vector would be checked if QEMU were to do the SCSI PR calls itself. The access vector permission bits are defined in the policy file, and in the auto-generated header file /usr/include/selinux/av_permissions.h AFAICT, there's only a coarse COMMON_FILE_IOCTL access vector defined, nothing on the level of individual ioctl commands. So, yes, the MAC policy check would allow other ioctl commands to be run, aside from those required for persistent reservations. We just have to rely on the PR helper code for that restriction. > Also, what about AppArmor? I don't think it has a way to do the delegated permission checks in the way I describe for SELinux. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Nov 27, 2017 at 12:51:33PM +0100, Paolo Bonzini wrote: > On 27/11/2017 12:37, Daniel P. Berrange wrote: > > On Mon, Nov 27, 2017 at 12:13:24PM +0100, Paolo Bonzini wrote: > >> Hm, I see what you mean now. But it would be "just" a qemu-pr-helper > >> bug that it trusts the caller to have "ioctl" permissions on the file > >> descriptor, wouldn't it? > >> > >> And it could be a feature even, since the remote QEMU process also has > >> to have "connect" permissions on the qemu-pr-helper socket. So you > >> could give it ioctl access *limited to persistent reservations* by > >> granting the appropriate permissions on the socket. > > > > We can't grant access to the persistent reservation helper's socket on a > > per QEMU basis. Permissions are granted on the domain type svirt_t, and > > we don't want to invent a new domain type just for having access to the > > PR helper. > > You can do so via DAC and MAC on the socket itself, or is that not enough? Whatever is done with DAC is irrelevant when considering the MAC policy. Yes, you can do MAC on the socket, but that merely gives you protection betweeen the host & guest. The goal of sVirt is to give protection between VMs, and MAC on the socket alone doesn't guarantee that. Consider what the kernel would enforce if the SCSI PR was done inside QEMU. It would check allowed(process context, file context, ...file class.., ...ioctl perm...) where process context == svirt_t:s0,c340,c524 file context == svirt_image_t:s0,c340,c524 In the case of a per-QEMU SCSI PR, we can get the same effect because when this is checked by the kernel, the PR helper has a process context still containing a MCS level, eg process context == qemu_pr_helper_t:s0,c340,c524 file context == svirt_image_t:s0,c340,c524 In the shared-PR helper though, the PR helper context would not have an MAC level applied, so the MAC check is now process context == qemu_pr_helper_t:s0 file context == svirt_image_t:s0,c340,c524 This is still a MAC check, but it is a weaker check than what sVirt promises, because a QEMU with MCS c340,c524 could pass a file descriptor with a MCS level c123,c422. QEMU would be denied to use this bogus FD, but the PR helper would happily allow this bad access. So to satisfy sVirt separation of QEMU instances, we need the PR helper to be able to validate the MCS levels, as the kernel no longer has enough info todo this automatically. In the PR helper this is achieved with something like getpeercon(socketfd, &qemucon) => returns svirt_t:s0,c340,c524 fgetfilecont(filefd, &diskcon) => returns svirt_image_t:s0,c340,c524 Then it would call something like security_compute_av(qemucon, diskcon, ...file class..., ...ioctl perm...) to get a MAC decision from the kernel. This preserves the sVirt isolation of individual QEMU instances. > In other words, what are the SELinux best practices when you don't want > a process to have blanket access to a permission, but you may be fine > with a subset of those? If you think of unpriv_sgio=0 as a very simple > MAC, this is actually the very case that the PR helper is designed for. It depends on the security goal of the MAC policy. The original SELinux policy was just protecting the host from QEMUs. For sVirt the goal is stronger, to have isolation between individual QEMUs Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 27/11/2017 12:37, Daniel P. Berrange wrote: > On Mon, Nov 27, 2017 at 12:13:24PM +0100, Paolo Bonzini wrote: >> Hm, I see what you mean now. But it would be "just" a qemu-pr-helper >> bug that it trusts the caller to have "ioctl" permissions on the file >> descriptor, wouldn't it? >> >> And it could be a feature even, since the remote QEMU process also has >> to have "connect" permissions on the qemu-pr-helper socket. So you >> could give it ioctl access *limited to persistent reservations* by >> granting the appropriate permissions on the socket. > > We can't grant access to the persistent reservation helper's socket on a > per QEMU basis. Permissions are granted on the domain type svirt_t, and > we don't want to invent a new domain type just for having access to the > PR helper. You can do so via DAC and MAC on the socket itself, or is that not enough? In other words, what are the SELinux best practices when you don't want a process to have blanket access to a permission, but you may be fine with a subset of those? If you think of unpriv_sgio=0 as a very simple MAC, this is actually the very case that the PR helper is designed for. Thanks, Paolo > So if we grant access to a global PR helper, we must have that helper > do MAC checks. Without it, QEMU has delegated actions it can't do itself > to a separate process thus escaping its MAC confinement in that area. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Nov 27, 2017 at 12:13:24PM +0100, Paolo Bonzini wrote: > On 27/11/2017 11:59, Daniel P. Berrange wrote: > > On Mon, Nov 27, 2017 at 11:57:56AM +0100, Paolo Bonzini wrote: > >> On 27/11/2017 10:40, Daniel P. Berrange wrote: > >>> > >>> If we had one daemon per QEMU, then we would give the daemon the same > >>> MCS label as QEMU. The kernel will thus enforce this label matches the > >>> label on the QEMU process when it connects to the UNIX socket. The kernel > >>> will also validate the label on the disk file descriptor passed to the > >>> daemon by QEMU. > >>> > >>> If we had one daemon per host, then that daemon will need a generic > >>> label that lets all QEMUs connect to it. When QEMU passes in a disk > >>> FD, the daemon will need to query the SELinux context of the remote > >>> QEMU process, and then perform a userspace ACL check of that against > >>> the FD that is passed in. > >>> > >>> The latter case means the QEMU helper will need to link to libselinux > >>> and performs checks itself. > >> > >> Then it seems much better to use one daemon per QEMU, indeed. > > > > That would rule out using persistent reservations for unprivileged QEMU > > though, because we still need sVirt protection for that. > > Hm, I see what you mean now. But it would be "just" a qemu-pr-helper > bug that it trusts the caller to have "ioctl" permissions on the file > descriptor, wouldn't it? > > And it could be a feature even, since the remote QEMU process also has > to have "connect" permissions on the qemu-pr-helper socket. So you > could give it ioctl access *limited to persistent reservations* by > granting the appropriate permissions on the socket. We can't grant access to the persistent reservation helper's socket on a per QEMU basis. Permissions are granted on the domain type svirt_t, and we don't want to invent a new domain type just for having access to the PR helper. So if we grant access to a global PR helper, we must have that helper do MAC checks. Without it, QEMU has delegated actions it can't do itself to a separate process thus escaping its MAC confinement in that area. Userspace MAC checks are not very hard to do, so I don't see a significant burden to adding this support to the helper. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 27/11/2017 11:59, Daniel P. Berrange wrote: > On Mon, Nov 27, 2017 at 11:57:56AM +0100, Paolo Bonzini wrote: >> On 27/11/2017 10:40, Daniel P. Berrange wrote: >>> >>> If we had one daemon per QEMU, then we would give the daemon the same >>> MCS label as QEMU. The kernel will thus enforce this label matches the >>> label on the QEMU process when it connects to the UNIX socket. The kernel >>> will also validate the label on the disk file descriptor passed to the >>> daemon by QEMU. >>> >>> If we had one daemon per host, then that daemon will need a generic >>> label that lets all QEMUs connect to it. When QEMU passes in a disk >>> FD, the daemon will need to query the SELinux context of the remote >>> QEMU process, and then perform a userspace ACL check of that against >>> the FD that is passed in. >>> >>> The latter case means the QEMU helper will need to link to libselinux >>> and performs checks itself. >> >> Then it seems much better to use one daemon per QEMU, indeed. > > That would rule out using persistent reservations for unprivileged QEMU > though, because we still need sVirt protection for that. Hm, I see what you mean now. But it would be "just" a qemu-pr-helper bug that it trusts the caller to have "ioctl" permissions on the file descriptor, wouldn't it? And it could be a feature even, since the remote QEMU process also has to have "connect" permissions on the qemu-pr-helper socket. So you could give it ioctl access *limited to persistent reservations* by granting the appropriate permissions on the socket. Thanks, Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 24/11/2017 19:13, Michal Privoznik wrote: > On 11/24/2017 06:18 PM, Paolo Bonzini wrote: >> On 24/11/2017 18:07, Michal Privoznik wrote: >>> The helper is going to be daemonized (for the same reason >>> that qemu process is) => there's no SIGCHLD for libvirtd to receive. >> >> Couldn't (or shouldn't!) libvirt register itself as a subreaper instead? > > Whether libvirt can't have a > separate thread where nothing waitpid() would run? That could hardly > work - not only would libvirt need to spawn separate thread for every > helper, it would not work across restarts of libvirtd. We would have no > guarantee that PID we recorded is still the same process as it was when > we were dying. Oh, you're right. :( Even if libvirtd were a subreaper (see prctl(2) manpage), the daemonized qemu-pr-helper would be reparented to init when libvirtd restarts. libvirtd could test whether the helper is alive by connecting to its socket. If that's not enough, I'll add an event. Another thing: we have which represents "one daemon per QEMU" in privileged libvirtd and "well-known socket path" in session libvirtd. What would the live-domain XML look like? Maybe: And then, would it be an error to specify the "wrong" managed setting in the domain-creation XML? Or would it be possible to use a managed='no' helper in privileged libvirtd? I take it as a "no" given your remark below. Thanks, Paolo > If qemu has to connect to the helper's socket the socket must have a > label that allows that. However, libvirt allows running qemu under > pretty much any label that users configure. So if there were just one > daemon per host (which there is not nor it will be), we wouldn't be able > to make any MCS (or at least that's how I understand it) - thus the only > thing we could do is just to allow everybody to connect. > > This is more of affirmation that one helper daemon per host is a bad > idea than anything really. > > Michal > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Nov 27, 2017 at 11:57:56AM +0100, Paolo Bonzini wrote: > On 27/11/2017 10:40, Daniel P. Berrange wrote: > > > > If we had one daemon per QEMU, then we would give the daemon the same > > MCS label as QEMU. The kernel will thus enforce this label matches the > > label on the QEMU process when it connects to the UNIX socket. The kernel > > will also validate the label on the disk file descriptor passed to the > > daemon by QEMU. > > > > If we had one daemon per host, then that daemon will need a generic > > label that lets all QEMUs connect to it. When QEMU passes in a disk > > FD, the daemon will need to query the SELinux context of the remote > > QEMU process, and then perform a userspace ACL check of that against > > the FD that is passed in. > > > > The latter case means the QEMU helper will need to link to libselinux > > and performs checks itself. > > Then it seems much better to use one daemon per QEMU, indeed. That would rule out using persistent reservations for unprivileged QEMU though, because we still need sVirt protection for that. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 27/11/2017 10:40, Daniel P. Berrange wrote: > > If we had one daemon per QEMU, then we would give the daemon the same > MCS label as QEMU. The kernel will thus enforce this label matches the > label on the QEMU process when it connects to the UNIX socket. The kernel > will also validate the label on the disk file descriptor passed to the > daemon by QEMU. > > If we had one daemon per host, then that daemon will need a generic > label that lets all QEMUs connect to it. When QEMU passes in a disk > FD, the daemon will need to query the SELinux context of the remote > QEMU process, and then perform a userspace ACL check of that against > the FD that is passed in. > > The latter case means the QEMU helper will need to link to libselinux > and performs checks itself. Then it seems much better to use one daemon per QEMU, indeed. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Fri, Nov 24, 2017 at 04:42:00PM +0100, Paolo Bonzini wrote: > On 24/11/2017 15:52, Daniel P. Berrange wrote: > >> So what has been suggested so far is: > >> > >> > >> > >> > >> > >> > > without an inner element leaves libvirtd with > the choice of a daemon per QEMU, or a daemon per host in a well-known > location. Unprivileged libvirtd would always use the latter; for > privileged libvirtd it is implementation-defined. I like it. > > with an inner always gives a daemon per host in > a custom location. It can be used by either unprivileged or privileged > libvirtd. > > >> Now, my question is, in the first case - how should libvirt chose the > >> path? Should it be different for each disk/domain? How is the daemon > >> started in the first place - should libvirt start it? And when should > >> libvirt kill it? > > > > The core question is one daemon per QEMU, or one daemon per host. I'd be > > more inclined to have one daemon per QEMU so we always have isolation > > and thus do't have to worry about a shared daemon being a potential > > attack point between distinct QEMU's. > > One daemon per QEMU is nicer IMO because it lets us do MCS. Of course > one daemon per QEMU can only apply to system libvirtd; session must use > one daemon per host. > > One daemon per host is easy, because it's just a couple new command-line > options as far as libvirtd is concerned, but we need to check that it > works well with MCS. In both cases we can do MCS, but it would work differently. If we had one daemon per QEMU, then we would give the daemon the same MCS label as QEMU. The kernel will thus enforce this label matches the label on the QEMU process when it connects to the UNIX socket. The kernel will also validate the label on the disk file descriptor passed to the daemon by QEMU. If we had one daemon per host, then that daemon will need a generic label that lets all QEMUs connect to it. When QEMU passes in a disk FD, the daemon will need to query the SELinux context of the remote QEMU process, and then perform a userspace ACL check of that against the FD that is passed in. The latter case means the QEMU helper will need to link to libselinux and performs checks itself. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Nov 27, 2017 at 07:27:17 +0100, Michal Privoznik wrote: > On 11/26/2017 09:32 AM, Peter Krempa wrote: > > On Fri, Nov 24, 2017 at 15:38:54 +0100, Michal Privoznik wrote: > >> On 08/22/2017 06:27 PM, Paolo Bonzini wrote: [...] > >> However, we want to be able to turn it on/off or not mention it at all > >> on per-disk basis. So what has been suggested so far is: > >> > >> > >> > >> > >> > >> > > > > Note that if the reservation is applied to the image and not the disk > > the element should become child of the elememt. > > > Especially if > > there is only a ever so slight chance that it will work with formats > > supporting backing store (qcow2) and thus there would be a need to > > configure it per-image. > > The way I understand PRs is that they are format agnostic, so there's > not (nor there will be) per image configuration. Nevertheless, In qemu you have two objects representing a disk image (or member of the backing chain). One is the format driver, and the second is the driver which accesses the storage itself. So even if it concerns the access to the physical storage, it may be necessary to track it in the backing chain and thus under . Only if we are sure that no member of the backing chain can have different config, only then it can be made per-disk. > concern disk image so I will place the element under > . Good point. signature.asc Description: PGP signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/26/2017 09:32 AM, Peter Krempa wrote: > On Fri, Nov 24, 2017 at 15:38:54 +0100, Michal Privoznik wrote: >> On 08/22/2017 06:27 PM, Paolo Bonzini wrote: >>> Hi all, >>> >> >> Sorry for resurrecting old thread but seems like there was no agreement >> reached. >> >> We don't want to expose any paths because the fact that PR helper is a >> separate binary that uses a UNIX socket to talk to qemu is a >> implementation detail of qemu. Other HVs may have it differently. >> >> However, we want to be able to turn it on/off or not mention it at all >> on per-disk basis. So what has been suggested so far is: >> >> >> >> >> >> > > Note that if the reservation is applied to the image and not the disk > the element should become child of the elememt. > Especially if > there is only a ever so slight chance that it will work with formats > supporting backing store (qcow2) and thus there would be a need to > configure it per-image. The way I understand PRs is that they are format agnostic, so there's not (nor there will be) per image configuration. Nevertheless, concern disk image so I will place the element under . Good point. What is still unclear is what should happen when the helper daemon dies while domain is running. I was thinking about this during the weekend and realized that if we go with the event delivered to libvirtd, we also need a monitor command to query the state (we need to cover case when event is fired but daemon is not runnning, it has to query the state once started). Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Fri, Nov 24, 2017 at 15:38:54 +0100, Michal Privoznik wrote: > On 08/22/2017 06:27 PM, Paolo Bonzini wrote: > > Hi all, > > > > Sorry for resurrecting old thread but seems like there was no agreement > reached. > > We don't want to expose any paths because the fact that PR helper is a > separate binary that uses a UNIX socket to talk to qemu is a > implementation detail of qemu. Other HVs may have it differently. > > However, we want to be able to turn it on/off or not mention it at all > on per-disk basis. So what has been suggested so far is: > > > > > > Note that if the reservation is applied to the image and not the disk the element should become child of the elememt. Especially if there is only a ever so slight chance that it will work with formats supporting backing store (qcow2) and thus there would be a need to configure it per-image. signature.asc Description: PGP signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/24/2017 06:18 PM, Paolo Bonzini wrote: > On 24/11/2017 18:07, Michal Privoznik wrote: >> On 11/24/2017 04:42 PM, Paolo Bonzini wrote: >>> One daemon per QEMU is nicer IMO because it lets us do MCS. Of course >>> one daemon per QEMU can only apply to system libvirtd; session must use >>> one daemon per host. >> >> Agreed. One daemon per QEMU it is then. Just to make sure whether I> >> understand correctly - libvirtd would start it *before* exec()-ing qemu >> (so that qemu can connect) and kill it after qemu dies. But what should >> happen if the helper dies while qemu is running? Should qemu pause >> itself, fire up an event on the monitor so that libvirtd can spawn the >> helper again? > > QEMU currently tries to reconnect five times every second, and then > fails the I/O command. This seemed okay to me because PRs are generally > used in a HA environment where the failure will be broadcast and another > node will pick up the work. > >> The helper is going to be daemonized (for the same reason >> that qemu process is) => there's no SIGCHLD for libvirtd to receive. > > Couldn't (or shouldn't!) libvirt register itself as a subreaper instead? I don't know what you mean by registering. Whether libvirt can't have a separate thread where nothing waitpid() would run? That could hardly work - not only would libvirt need to spawn separate thread for every helper, it would not work across restarts of libvirtd. We would have no guarantee that PID we recorded is still the same process as it was when we were dying. > >> Also, I don't really expect anything special when it comes to migration, >> just want to make sure I'm not mislead. > > No, everything is fine for migration. > >>> One daemon per host is easy, because it's just a couple new command-line >>> options as far as libvirtd is concerned, but we need to check that it >>> works well with MCS. >> >> It's very likely that it wouldn't. Users can chose arbitrary DAC/SELinux >> labels for their qemus. In general we will not find any intersection >> that the helper socket can have. > > I know some of those words. :) Can you explain to an SELinux layman? If qemu has to connect to the helper's socket the socket must have a label that allows that. However, libvirt allows running qemu under pretty much any label that users configure. So if there were just one daemon per host (which there is not nor it will be), we wouldn't be able to make any MCS (or at least that's how I understand it) - thus the only thing we could do is just to allow everybody to connect. This is more of affirmation that one helper daemon per host is a bad idea than anything really. Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 24/11/2017 18:07, Michal Privoznik wrote: > On 11/24/2017 04:42 PM, Paolo Bonzini wrote: >> One daemon per QEMU is nicer IMO because it lets us do MCS. Of course >> one daemon per QEMU can only apply to system libvirtd; session must use >> one daemon per host. > > Agreed. One daemon per QEMU it is then. Just to make sure whether I> > understand correctly - libvirtd would start it *before* exec()-ing qemu > (so that qemu can connect) and kill it after qemu dies. But what should > happen if the helper dies while qemu is running? Should qemu pause > itself, fire up an event on the monitor so that libvirtd can spawn the > helper again? QEMU currently tries to reconnect five times every second, and then fails the I/O command. This seemed okay to me because PRs are generally used in a HA environment where the failure will be broadcast and another node will pick up the work. > The helper is going to be daemonized (for the same reason > that qemu process is) => there's no SIGCHLD for libvirtd to receive. Couldn't (or shouldn't!) libvirt register itself as a subreaper instead? > Also, I don't really expect anything special when it comes to migration, > just want to make sure I'm not mislead. No, everything is fine for migration. >> One daemon per host is easy, because it's just a couple new command-line >> options as far as libvirtd is concerned, but we need to check that it >> works well with MCS. > > It's very likely that it wouldn't. Users can chose arbitrary DAC/SELinux > labels for their qemus. In general we will not find any intersection > that the helper socket can have. I know some of those words. :) Can you explain to an SELinux layman? > Yup. In case of a unprivileged libvirtd there'll be a system-wide helper > daemon that: > a) will not be managed by libvirtd > b) has wide permissions so that any user can connect to it > > This basically follows what we have for vhostuser. Except that vhostuser > is not an implementation detail of qemu :-) Okay, thanks. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/24/2017 04:42 PM, Paolo Bonzini wrote: > On 24/11/2017 15:52, Daniel P. Berrange wrote: >>> So what has been suggested so far is: >>> >>> >>> >>> >>> >>> > > without an inner element leaves libvirtd with > the choice of a daemon per QEMU, or a daemon per host in a well-known > location. Unprivileged libvirtd would always use the latter; for > privileged libvirtd it is implementation-defined. I like it. > > with an inner always gives a daemon per host in > a custom location. It can be used by either unprivileged or privileged > libvirtd. > >>> Now, my question is, in the first case - how should libvirt chose the >>> path? Should it be different for each disk/domain? How is the daemon >>> started in the first place - should libvirt start it? And when should >>> libvirt kill it? >> >> The core question is one daemon per QEMU, or one daemon per host. I'd be >> more inclined to have one daemon per QEMU so we always have isolation >> and thus do't have to worry about a shared daemon being a potential >> attack point between distinct QEMU's. > > One daemon per QEMU is nicer IMO because it lets us do MCS. Of course > one daemon per QEMU can only apply to system libvirtd; session must use > one daemon per host. Agreed. One daemon per QEMU it is then. Just to make sure whether I understand correctly - libvirtd would start it *before* exec()-ing qemu (so that qemu can connect) and kill it after qemu dies. But what should happen if the helper dies while qemu is running? Should qemu pause itself, fire up an event on the monitor so that libvirtd can spawn the helper again? The helper is going to be daemonized (for the same reason that qemu process is) => there's no SIGCHLD for libvirtd to receive. Also, I don't really expect anything special when it comes to migration, just want to make sure I'm not mislead. > > One daemon per host is easy, because it's just a couple new command-line > options as far as libvirtd is concerned, but we need to check that it > works well with MCS. It's very likely that it wouldn't. Users can chose arbitrary DAC/SELinux labels for their qemus. In general we will not find any intersection that the helper socket can have. > >> If one daemon per host, then for privileged libvirtd, we should make sure >> the daemon ships with a systemd unit file + socket activation file, then >> we have a well-known cross-distro standardized socket path. > > Ok, then I will send a patch for upstream QEMU that adds the Fedora > systemd unit files to contrib/. They are useful anyway. Yup. In case of a unprivileged libvirtd there'll be a system-wide helper daemon that: a) will not be managed by libvirtd b) has wide permissions so that any user can connect to it This basically follows what we have for vhostuser. Except that vhostuser is not an implementation detail of qemu :-) Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 24/11/2017 15:52, Daniel P. Berrange wrote: >> So what has been suggested so far is: >> >> >> >> >> >> without an inner element leaves libvirtd with the choice of a daemon per QEMU, or a daemon per host in a well-known location. Unprivileged libvirtd would always use the latter; for privileged libvirtd it is implementation-defined. I like it. with an inner always gives a daemon per host in a custom location. It can be used by either unprivileged or privileged libvirtd. >> Now, my question is, in the first case - how should libvirt chose the >> path? Should it be different for each disk/domain? How is the daemon >> started in the first place - should libvirt start it? And when should >> libvirt kill it? > > The core question is one daemon per QEMU, or one daemon per host. I'd be > more inclined to have one daemon per QEMU so we always have isolation > and thus do't have to worry about a shared daemon being a potential > attack point between distinct QEMU's. One daemon per QEMU is nicer IMO because it lets us do MCS. Of course one daemon per QEMU can only apply to system libvirtd; session must use one daemon per host. One daemon per host is easy, because it's just a couple new command-line options as far as libvirtd is concerned, but we need to check that it works well with MCS. > If one daemon per host, then for privileged libvirtd, we should make sure > the daemon ships with a systemd unit file + socket activation file, then > we have a well-known cross-distro standardized socket path. Ok, then I will send a patch for upstream QEMU that adds the Fedora systemd unit files to contrib/. They are useful anyway. Thanks, Paolo > If one daemon per QEMU, then we should just put the socket in the VM's > private dir under /var/run/libvirt/qemu/$GUEST/ -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Fri, Nov 24, 2017 at 03:38:54PM +0100, Michal Privoznik wrote: > On 08/22/2017 06:27 PM, Paolo Bonzini wrote: > > Hi all, > > > > Sorry for resurrecting old thread but seems like there was no agreement > reached. > > We don't want to expose any paths because the fact that PR helper is a > separate binary that uses a UNIX socket to talk to qemu is a > implementation detail of qemu. Other HVs may have it differently. > > However, we want to be able to turn it on/off or not mention it at all > on per-disk basis. So what has been suggested so far is: > > > > > > > > for privileged qemu, then: > > > > > > > > > > for unprivileged qemu, or: > > > > > > > > for PR feature turned off (equivalent to leaving it out completely). > > Now, my question is, in the first case - how should libvirt chose the > path? Should it be different for each disk/domain? How is the daemon > started in the first place - should libvirt start it? And when should > libvirt kill it? The core question is one daemon per QEMU, or one daemon per host. I'd be more inclined to have one daemon per QEMU so we always have isolation and thus do't have to worry about a shared daemon being a potential attack point between distinct QEMU's. If one daemon per host, then for privileged libvirtd, we should make sure the daemon ships with a systemd unit file + socket activation file, then we have a well-known cross-distro standardized socket path. If one daemon per QEMU, then we should just put the socket in the VM's private dir under /var/run/libvirt/qemu/$GUEST/. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 08/22/2017 06:27 PM, Paolo Bonzini wrote: > Hi all, > Sorry for resurrecting old thread but seems like there was no agreement reached. We don't want to expose any paths because the fact that PR helper is a separate binary that uses a UNIX socket to talk to qemu is a implementation detail of qemu. Other HVs may have it differently. However, we want to be able to turn it on/off or not mention it at all on per-disk basis. So what has been suggested so far is: for privileged qemu, then: for unprivileged qemu, or: for PR feature turned off (equivalent to leaving it out completely). Now, my question is, in the first case - how should libvirt chose the path? Should it be different for each disk/domain? How is the daemon started in the first place - should libvirt start it? And when should libvirt kill it? Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 04/10/2017 10:28, Daniel P. Berrange wrote: > On Tue, Oct 03, 2017 at 06:53:56PM +0200, Paolo Bonzini wrote: >> On 03/10/2017 18:39, Daniel P. Berrange wrote: >>> On Tue, Oct 03, 2017 at 06:35:03PM +0200, Paolo Bonzini wrote: And later on we might have other ways to implement persistent reservations in QEMU. So while I'm not a big fan(*) of the driver='helper' moniker, I don't think an attribute is enough. Maybe driver='external'?... >>> >>> Yes, if there's a choice of ways to manage reservations, we could >>> reflect that as 'reservations=passthrough|emulated' or something >>> like that. >>> >>> I just don't think the concept of a helper program should be visible >>> in the XML, as it is an impl detail that is totally QEMU specific and >>> could conceivably change eg not even needed with unpriv_sgio, >> >> Not sure about that, the mpathpersist behavior is somewhat magic and I >> am not really sure we should enable it by default, even with unpriv_sgio. >> >>> and if >>> kernel were enhanced could be usable without needing a helper elsewhere >>> too. >> >> It's an implementation detail for system mode, but not for user mode >> (where ACLs on the socket are used to allow access to a privileged >> operation). >> >> So: >> >> >> >> (uses helper from global configuration if not a libiscsi drive) vs. >> >> >> >> > > What do you mean by here ? If that's the chardev source I would > really prefer not to have that visible. Yes, it's the chardev source. See above: in user mode, ACLs on the socket are used to allow access to a privileged operation, so the source has to be there. It's not the most common mode, but it's the only one that works for user mode and hardcoding a (possibly distro-specific) socket path in libvirtd is worse IMO. Paolo >> >> (for user mode) vs. >> >> >> >> (fails if libiscsi || CAP_SYS_RAWIO || unpriv_sgio)? > > Regards, > Daniel > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Tue, Oct 03, 2017 at 06:53:56PM +0200, Paolo Bonzini wrote: > On 03/10/2017 18:39, Daniel P. Berrange wrote: > > On Tue, Oct 03, 2017 at 06:35:03PM +0200, Paolo Bonzini wrote: > >> And later on we might have other ways to implement persistent > >> reservations in QEMU. So while I'm not a big fan(*) of the > >> driver='helper' moniker, I don't think an attribute is enough. Maybe > >> driver='external'?... > > > > Yes, if there's a choice of ways to manage reservations, we could > > reflect that as 'reservations=passthrough|emulated' or something > > like that. > > > > I just don't think the concept of a helper program should be visible > > in the XML, as it is an impl detail that is totally QEMU specific and > > could conceivably change eg not even needed with unpriv_sgio, > > Not sure about that, the mpathpersist behavior is somewhat magic and I > am not really sure we should enable it by default, even with unpriv_sgio. > > > and if > > kernel were enhanced could be usable without needing a helper elsewhere > > too. > > It's an implementation detail for system mode, but not for user mode > (where ACLs on the socket are used to allow access to a privileged > operation). > > So: > > > > (uses helper from global configuration if not a libiscsi drive) vs. > > > > What do you mean by here ? If that's the chardev source I would really prefer not to have that visible. > > (for user mode) vs. > > > > (fails if libiscsi || CAP_SYS_RAWIO || unpriv_sgio)? Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 03/10/2017 18:39, Daniel P. Berrange wrote: > On Tue, Oct 03, 2017 at 06:35:03PM +0200, Paolo Bonzini wrote: >> And later on we might have other ways to implement persistent >> reservations in QEMU. So while I'm not a big fan(*) of the >> driver='helper' moniker, I don't think an attribute is enough. Maybe >> driver='external'?... > > Yes, if there's a choice of ways to manage reservations, we could > reflect that as 'reservations=passthrough|emulated' or something > like that. > > I just don't think the concept of a helper program should be visible > in the XML, as it is an impl detail that is totally QEMU specific and > could conceivably change eg not even needed with unpriv_sgio, Not sure about that, the mpathpersist behavior is somewhat magic and I am not really sure we should enable it by default, even with unpriv_sgio. > and if > kernel were enhanced could be usable without needing a helper elsewhere > too. It's an implementation detail for system mode, but not for user mode (where ACLs on the socket are used to allow access to a privileged operation). So: (uses helper from global configuration if not a libiscsi drive) vs. (for user mode) vs. (fails if libiscsi || CAP_SYS_RAWIO || unpriv_sgio)? Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Tue, Oct 03, 2017 at 06:35:03PM +0200, Paolo Bonzini wrote: > On 03/10/2017 18:17, Daniel P. Berrange wrote: > > On Tue, Oct 03, 2017 at 06:07:53PM +0200, Paolo Bonzini wrote: > >> Yes, but OTOH if libvirtd starts the daemon, nobody cares about the > >> source type, so perhaps > >> > >> > >> > >> > >> > >> (mandatory source) vs. > >> > >> > >> /path/to/qemu-pr-helper > >> > >> > >> (optional path, default from global configuration) vs. > >> > >> > >> > >> (no helper)? > > > > I tend to think we should not expose the concept of a 'helper' at all in > > the XML. The fact that QEMU has a separate binary to do this is really an > > internal implementation detail due to the need for privilege separation/ > > elevation. > > > > Libvirt should just do the right thing running the helper with a suitable > > configuration when needed, just like we run the TAP device helper when > > needed. > > I agree we don't need the helper path. I am not sure about the socket > path, but maybe that could also be in qemu.conf. However, > reservations='off' doesn't always make sense: > > 1) if you have appropriate privileges (via unpriv_sgio on distros that > have it, or capabilities, or just because you're using the libiscsi > driver) you will be able to access them anyway; > > 2) if you're on a multipath device, you need to use the helper anyway to > get the right semantics; > > And later on we might have other ways to implement persistent > reservations in QEMU. So while I'm not a big fan(*) of the > driver='helper' moniker, I don't think an attribute is enough. Maybe > driver='external'?... Yes, if there's a choice of ways to manage reservations, we could reflect that as 'reservations=passthrough|emulated' or something like that. I just don't think the concept of a helper program should be visible in the XML, as it is an impl detail that is totally QEMU specific and could conceivably change eg not even needed with unpriv_sgio, and if kernel were enhanced could be usable without needing a helper elsewhere too. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 03/10/2017 18:17, Daniel P. Berrange wrote: > On Tue, Oct 03, 2017 at 06:07:53PM +0200, Paolo Bonzini wrote: >> Yes, but OTOH if libvirtd starts the daemon, nobody cares about the >> source type, so perhaps >> >> >> >> >> >> (mandatory source) vs. >> >> >> /path/to/qemu-pr-helper >> >> >> (optional path, default from global configuration) vs. >> >> >> >> (no helper)? > > I tend to think we should not expose the concept of a 'helper' at all in > the XML. The fact that QEMU has a separate binary to do this is really an > internal implementation detail due to the need for privilege separation/ > elevation. > > Libvirt should just do the right thing running the helper with a suitable > configuration when needed, just like we run the TAP device helper when > needed. I agree we don't need the helper path. I am not sure about the socket path, but maybe that could also be in qemu.conf. However, reservations='off' doesn't always make sense: 1) if you have appropriate privileges (via unpriv_sgio on distros that have it, or capabilities, or just because you're using the libiscsi driver) you will be able to access them anyway; 2) if you're on a multipath device, you need to use the helper anyway to get the right semantics; And later on we might have other ways to implement persistent reservations in QEMU. So while I'm not a big fan(*) of the driver='helper' moniker, I don't think an attribute is enough. Maybe driver='external'?... Paolo (*) I might say, I have some reservations about it > IOW, just be as simple as an attribute "reservations=on|off" and we'll > decide the UNIX socket path, daemon forking, etc ourselves > > Regards, > Daniel > -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 03/10/2017 17:59, Michal Privoznik wrote: > Ah, this breaks my design. I guess > > > > > > > > is pure madness, isn't it? Yes, but OTOH if libvirtd starts the daemon, nobody cares about the source type, so perhaps (mandatory source) vs. /path/to/qemu-pr-helper (optional path, default from global configuration) vs. (no helper)? Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Tue, Oct 03, 2017 at 06:07:53PM +0200, Paolo Bonzini wrote: > On 03/10/2017 17:59, Michal Privoznik wrote: > > Ah, this breaks my design. I guess > > > > > > > > > > > > > > > > is pure madness, isn't it? > > Yes, but OTOH if libvirtd starts the daemon, nobody cares about the > source type, so perhaps > > > > > > (mandatory source) vs. > > > /path/to/qemu-pr-helper > > > (optional path, default from global configuration) vs. > > > > (no helper)? I tend to think we should not expose the concept of a 'helper' at all in the XML. The fact that QEMU has a separate binary to do this is really an internal implementation detail due to the need for privilege separation/ elevation. Libvirt should just do the right thing running the helper with a suitable configuration when needed, just like we run the TAP device helper when needed. IOW, just be as simple as an attribute "reservations=on|off" and we'll decide the UNIX socket path, daemon forking, etc ourselves Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 09/10/2017 11:38 AM, Paolo Bonzini wrote: > On 28/08/2017 13:11, Michal Privoznik wrote: >> On 08/25/2017 12:41 AM, Paolo Bonzini wrote: >>> On 22/08/2017 18:27, Paolo Bonzini wrote: Hi all, >> >>> >>> The XML to use the helper with a predefined socket could be: >>> >>> >>>/path/to/unix.socket' >>> This looks okay-ish. But from future proof POV, I'd have the path as an attribute, or as a separate sub-element of so that we can stick more elements into it if we need to. >> >> Do we want to/need to expose the path here? I mean, is user expected to >> do something with it? We don't expose monitor path anywhere but keep it >> private (of course we store it in so called status XML which is a >> persistent storage solely for purpose of reloading the internal state >> when daemon is restarted). > > In this case, yes. This is for the case of a global daemon. I'm thinking if we should just use virDomainChrSourceDefPtr for this. I mean, if you take look at the vhost-user XML format it looks something like this: http://libvirt.org/formatdomain.html#elementVhostuser So we could have: The advantage of this is that we can express other connection modes if the pr-helper is ever going to support them (although, if you need to pass FDs around, UNIX socket is the only way for you). Then again, I think it follows what we have elsewhere. Also, if we go this way, we can forbid any type='' but unix and mode='server'. > >>> >>> while to use it with a dedicated daemon >>> >>> >>>/path/to/qemu-pr-helper >>> Ah, this breaks my design. I guess is pure madness, isn't it? Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/09/2017 17:53, Daniel P. Berrange wrote: > On Mon, Sep 11, 2017 at 05:47:28PM +0200, Paolo Bonzini wrote: >> On 11/09/2017 17:33, Daniel P. Berrange wrote: >>> On Mon, Sep 11, 2017 at 05:27:20PM +0200, Paolo Bonzini wrote: On 11/09/2017 17:23, Daniel P. Berrange wrote: >> On the other hand, the daemon has CAP_SYS_RAWIO and CAP_SYS_ADMIN, so if >> you get memory corruption all bets are probably off anyway. > That's where the benefit of strict selinux labelling comes in. If we had > strict labelling of the individual paths below the device, then even if > the daemon got corrupted, the policy would prevent it from doing any > damage to the system beyond calling ioctl() the individual paths it had > been granted. It wouldn't be able to access devices associated with > the host OS mounts, or other non-VM related or non-multipath related > block devices. Sure, but those capabilities let you do a lot of nasty things indirectly, even within the constraints of the SELinux policy. For example, if you are able to reconfigure device mapper, you can convince the kernel to write to any block device---even if you cannot open it. IDWEFAL (I don't write exploits for a living) but I'm sure that's just scraping the surface. >>> >>> Surely we would not write an SELinux policy that allows this daemon >>> to reconfigure device mapper. >>> >>> IIUC, all this daemon should need is the ability to request persistent >>> reservations on the individual paths associated with the mpath device. >>> >>> Is it not possible to write a SElinux policy which allows that, without >>> also allowing reconfiguration of device mapper. >> >> As far as I know, querying and reconfiguring the device mapper are both >> done with ioctls on /dev/mapper/control, and both require CAP_SYS_ADMIN. >> >> Maybe future versions of Linux could change it to require CAP_SYS_ADMIN >> only for reconfiguration, so that the PR helper daemon does not require >> the capability anymore. However, that would be independent from >> SELinux, which only controls "ioctl" access without finer-grain choice >> of which ioctls to allow. > > I don't suppose that the LUNS behind a mpath device end up being > surface in /sys/block anywhere do they ? The LUNs actually can be identified via /sys/block/dm-NN/slaves (I think), but you cannot find out if it's a mpath device in the first place without a /dev/mapper/control ioctl. >> I understand that you want to protect in depth, but unfortunately this >> only works if all layers are aware of SELinux. Luckily the daemon is >> much, much smaller than QEMU, and so is the attack surface. > > It would be ok if we think its possible to lock it down later (once any > needed kernel enhancements are developed), without having to change the > interaction between QEMU / libvirt / helper daemon. I'm beginning to > feel that is ok. Yes, that's my reasoning as well. We could (and perhaps should) still look at MCS to block unwanted connections to the socket, though. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Sep 11, 2017 at 05:47:28PM +0200, Paolo Bonzini wrote: > On 11/09/2017 17:33, Daniel P. Berrange wrote: > > On Mon, Sep 11, 2017 at 05:27:20PM +0200, Paolo Bonzini wrote: > >> On 11/09/2017 17:23, Daniel P. Berrange wrote: > On the other hand, the daemon has CAP_SYS_RAWIO and CAP_SYS_ADMIN, so if > you get memory corruption all bets are probably off anyway. > >>> That's where the benefit of strict selinux labelling comes in. If we had > >>> strict labelling of the individual paths below the device, then even if > >>> the daemon got corrupted, the policy would prevent it from doing any > >>> damage to the system beyond calling ioctl() the individual paths it had > >>> been granted. It wouldn't be able to access devices associated with > >>> the host OS mounts, or other non-VM related or non-multipath related > >>> block devices. > >> > >> Sure, but those capabilities let you do a lot of nasty things > >> indirectly, even within the constraints of the SELinux policy. > >> > >> For example, if you are able to reconfigure device mapper, you can > >> convince the kernel to write to any block device---even if you cannot > >> open it. IDWEFAL (I don't write exploits for a living) but I'm sure > >> that's just scraping the surface. > > > > Surely we would not write an SELinux policy that allows this daemon > > to reconfigure device mapper. > > > > IIUC, all this daemon should need is the ability to request persistent > > reservations on the individual paths associated with the mpath device. > > > > Is it not possible to write a SElinux policy which allows that, without > > also allowing reconfiguration of device mapper. > > As far as I know, querying and reconfiguring the device mapper are both > done with ioctls on /dev/mapper/control, and both require CAP_SYS_ADMIN. > > Maybe future versions of Linux could change it to require CAP_SYS_ADMIN > only for reconfiguration, so that the PR helper daemon does not require > the capability anymore. However, that would be independent from > SELinux, which only controls "ioctl" access without finer-grain choice > of which ioctls to allow. I don't suppose that the LUNS behind a mpath device end up being surface in /sys/block anywhere do they ? > I understand that you want to protect in depth, but unfortunately this > only works if all layers are aware of SELinux. Luckily the daemon is > much, much smaller than QEMU, and so is the attack surface. It would be ok if we think its possible to lock it down later (once any needed kernel enhancements are developed), without having to change the interaction between QEMU / libvirt / helper daemon. I'm beginning to feel that is ok. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/09/2017 17:33, Daniel P. Berrange wrote: > On Mon, Sep 11, 2017 at 05:27:20PM +0200, Paolo Bonzini wrote: >> On 11/09/2017 17:23, Daniel P. Berrange wrote: On the other hand, the daemon has CAP_SYS_RAWIO and CAP_SYS_ADMIN, so if you get memory corruption all bets are probably off anyway. >>> That's where the benefit of strict selinux labelling comes in. If we had >>> strict labelling of the individual paths below the device, then even if >>> the daemon got corrupted, the policy would prevent it from doing any >>> damage to the system beyond calling ioctl() the individual paths it had >>> been granted. It wouldn't be able to access devices associated with >>> the host OS mounts, or other non-VM related or non-multipath related >>> block devices. >> >> Sure, but those capabilities let you do a lot of nasty things >> indirectly, even within the constraints of the SELinux policy. >> >> For example, if you are able to reconfigure device mapper, you can >> convince the kernel to write to any block device---even if you cannot >> open it. IDWEFAL (I don't write exploits for a living) but I'm sure >> that's just scraping the surface. > > Surely we would not write an SELinux policy that allows this daemon > to reconfigure device mapper. > > IIUC, all this daemon should need is the ability to request persistent > reservations on the individual paths associated with the mpath device. > > Is it not possible to write a SElinux policy which allows that, without > also allowing reconfiguration of device mapper. As far as I know, querying and reconfiguring the device mapper are both done with ioctls on /dev/mapper/control, and both require CAP_SYS_ADMIN. Maybe future versions of Linux could change it to require CAP_SYS_ADMIN only for reconfiguration, so that the PR helper daemon does not require the capability anymore. However, that would be independent from SELinux, which only controls "ioctl" access without finer-grain choice of which ioctls to allow. I understand that you want to protect in depth, but unfortunately this only works if all layers are aware of SELinux. Luckily the daemon is much, much smaller than QEMU, and so is the attack surface. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/09/2017 17:23, Daniel P. Berrange wrote: >> On the other hand, the daemon has CAP_SYS_RAWIO and CAP_SYS_ADMIN, so if >> you get memory corruption all bets are probably off anyway. > That's where the benefit of strict selinux labelling comes in. If we had > strict labelling of the individual paths below the device, then even if > the daemon got corrupted, the policy would prevent it from doing any > damage to the system beyond calling ioctl() the individual paths it had > been granted. It wouldn't be able to access devices associated with > the host OS mounts, or other non-VM related or non-multipath related > block devices. Sure, but those capabilities let you do a lot of nasty things indirectly, even within the constraints of the SELinux policy. For example, if you are able to reconfigure device mapper, you can convince the kernel to write to any block device---even if you cannot open it. IDWEFAL (I don't write exploits for a living) but I'm sure that's just scraping the surface. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Sep 11, 2017 at 05:20:10PM +0200, Paolo Bonzini wrote: > On 11/09/2017 16:32, Daniel P. Berrange wrote: > >> This is already handled via SCM_RIGHTS and is part of the design of the > >> helper daemon. QEMU cannot even open mpath devices which are not > >> accessible according to its SELinux category or device cgroup. > > > > Ah so the daemon relies on the fact that the client was not permitted > > to open another file. So the only FD it can receive from the client is > > one that was associated with a permitted mpath device. > > > > That would be sufficient to protect against a malicious qemu process > > trying to explicitly pass it invalid FD data. It wouldn't be sufficient > > to be safe against a QEMU process that somehow managed to trigger a bug > > that caused it to corrupt memory and thus trick into opening a different > > file. For the latter we would need to have some stricter policy about > > the helper daemon and labelling on files. > > Exactly. The passed file descriptor acts as a "capability"; the daemon > goes from there to the paths through fstat on the file descriptor > followed by /dev/mapper/control APIs (mostly issued by libmpathpersist). > > On the other hand, the daemon has CAP_SYS_RAWIO and CAP_SYS_ADMIN, so if > you get memory corruption all bets are probably off anyway. That's where the benefit of strict selinux labelling comes in. If we had strict labelling of the individual paths below the device, then even if the daemon got corrupted, the policy would prevent it from doing any damage to the system beyond calling ioctl() the individual paths it had been granted. It wouldn't be able to access devices associated with the host OS mounts, or other non-VM related or non-multipath related block devices. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Sep 11, 2017 at 05:27:20PM +0200, Paolo Bonzini wrote: > On 11/09/2017 17:23, Daniel P. Berrange wrote: > >> On the other hand, the daemon has CAP_SYS_RAWIO and CAP_SYS_ADMIN, so if > >> you get memory corruption all bets are probably off anyway. > > That's where the benefit of strict selinux labelling comes in. If we had > > strict labelling of the individual paths below the device, then even if > > the daemon got corrupted, the policy would prevent it from doing any > > damage to the system beyond calling ioctl() the individual paths it had > > been granted. It wouldn't be able to access devices associated with > > the host OS mounts, or other non-VM related or non-multipath related > > block devices. > > Sure, but those capabilities let you do a lot of nasty things > indirectly, even within the constraints of the SELinux policy. > > For example, if you are able to reconfigure device mapper, you can > convince the kernel to write to any block device---even if you cannot > open it. IDWEFAL (I don't write exploits for a living) but I'm sure > that's just scraping the surface. Surely we would not write an SELinux policy that allows this daemon to reconfigure device mapper. IIUC, all this daemon should need is the ability to request persistent reservations on the individual paths associated with the mpath device. Is it not possible to write a SElinux policy which allows that, without also allowing reconfiguration of device mapper. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/09/2017 16:32, Daniel P. Berrange wrote: >> This is already handled via SCM_RIGHTS and is part of the design of the >> helper daemon. QEMU cannot even open mpath devices which are not >> accessible according to its SELinux category or device cgroup. > > Ah so the daemon relies on the fact that the client was not permitted > to open another file. So the only FD it can receive from the client is > one that was associated with a permitted mpath device. > > That would be sufficient to protect against a malicious qemu process > trying to explicitly pass it invalid FD data. It wouldn't be sufficient > to be safe against a QEMU process that somehow managed to trigger a bug > that caused it to corrupt memory and thus trick into opening a different > file. For the latter we would need to have some stricter policy about > the helper daemon and labelling on files. Exactly. The passed file descriptor acts as a "capability"; the daemon goes from there to the paths through fstat on the file descriptor followed by /dev/mapper/control APIs (mostly issued by libmpathpersist). On the other hand, the daemon has CAP_SYS_RAWIO and CAP_SYS_ADMIN, so if you get memory corruption all bets are probably off anyway. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Sep 11, 2017 at 04:23:27PM +0200, Paolo Bonzini wrote: > On 11/09/2017 14:09, Daniel P. Berrange wrote: > > On Mon, Sep 11, 2017 at 01:44:59PM +0200, Paolo Bonzini wrote: > >> On 11/09/2017 12:46, Daniel P. Berrange wrote: > So the helper is a bit different from QEMU with respect to both SELinux > MCS labeling and the devices cgroup. Access limitation comes from only > ever operating on devices that have been passed on the socket. SELinux > MCS could be used so that only the "right" QEMU can connect to each > instance of the helper, though. > >>> I wonder how this interacts with SELinux. IIUC if we grant access to > >>> the multipath device file, the kernel won't normally require us to > >>> grant access the underlying paths. I/O is just allowed because they > >>> are a member of the mpath device. Hopefully this applies to the > >>> persistent reservations too ? > >> > >> No, persistent reservations are special; they have to be registered > >> independently on each path. (As I said, this was the original reason to > >> have a separate daemon: implementing it in QEMU would be not just a bad > >> idea because you need CAP_SYS_ADMIN, it would be impossible for > >> libvirt-started guests because of sVirt and device cgroups). > >> > >> So I think the helper daemon needs to be granted blanket ioctl access to > >> devices, without using either device cgroups or MCS. > > > > If it was a single helper daemon for all guests, that would be ok to be > > granted access to all devices. If we did that though, the daemon would > > have to be SELinux aware. ie, libvirt would have to talk to it and say > > that svirt_t:s0:c236,c660 is permitted /dev/mpath/foo, and it would > > have to validate the requests from the socket to QEMU to make sure that > > QEMU is not requesting access to other mpath devices. > > This is already handled via SCM_RIGHTS and is part of the design of the > helper daemon. QEMU cannot even open mpath devices which are not > accessible according to its SELinux category or device cgroup. Ah so the daemon relies on the fact that the client was not permitted to open another file. So the only FD it can receive from the client is one that was associated with a permitted mpath device. That would be sufficient to protect against a malicious qemu process trying to explicitly pass it invalid FD data. It wouldn't be sufficient to be safe against a QEMU process that somehow managed to trigger a bug that caused it to corrupt memory and thus trick into opening a different file. For the latter we would need to have some stricter policy about the helper daemon and labelling on files. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/09/2017 14:09, Daniel P. Berrange wrote: > On Mon, Sep 11, 2017 at 01:44:59PM +0200, Paolo Bonzini wrote: >> On 11/09/2017 12:46, Daniel P. Berrange wrote: So the helper is a bit different from QEMU with respect to both SELinux MCS labeling and the devices cgroup. Access limitation comes from only ever operating on devices that have been passed on the socket. SELinux MCS could be used so that only the "right" QEMU can connect to each instance of the helper, though. >>> I wonder how this interacts with SELinux. IIUC if we grant access to >>> the multipath device file, the kernel won't normally require us to >>> grant access the underlying paths. I/O is just allowed because they >>> are a member of the mpath device. Hopefully this applies to the >>> persistent reservations too ? >> >> No, persistent reservations are special; they have to be registered >> independently on each path. (As I said, this was the original reason to >> have a separate daemon: implementing it in QEMU would be not just a bad >> idea because you need CAP_SYS_ADMIN, it would be impossible for >> libvirt-started guests because of sVirt and device cgroups). >> >> So I think the helper daemon needs to be granted blanket ioctl access to >> devices, without using either device cgroups or MCS. > > If it was a single helper daemon for all guests, that would be ok to be > granted access to all devices. If we did that though, the daemon would > have to be SELinux aware. ie, libvirt would have to talk to it and say > that svirt_t:s0:c236,c660 is permitted /dev/mpath/foo, and it would > have to validate the requests from the socket to QEMU to make sure that > QEMU is not requesting access to other mpath devices. This is already handled via SCM_RIGHTS and is part of the design of the helper daemon. QEMU cannot even open mpath devices which are not accessible according to its SELinux category or device cgroup. > If it was a one helper daemon per QEMU instance, then we would want to > directly confine it to just the underlying devices associated with the > mpath device allowed for that QEMU instance. This would imply that we > needed to label the underlying devices with the MCS label to match the > VM. This would be nice but not necessary, right? Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Mon, Sep 11, 2017 at 01:44:59PM +0200, Paolo Bonzini wrote: > On 11/09/2017 12:46, Daniel P. Berrange wrote: > >> So the helper is a bit different from QEMU with respect to both SELinux > >> MCS labeling and the devices cgroup. Access limitation comes from only > >> ever operating on devices that have been passed on the socket. SELinux > >> MCS could be used so that only the "right" QEMU can connect to each > >> instance of the helper, though. > > I wonder how this interacts with SELinux. IIUC if we grant access to > > the multipath device file, the kernel won't normally require us to > > grant access the underlying paths. I/O is just allowed because they > > are a member of the mpath device. Hopefully this applies to the > > persistent reservations too ? > > No, persistent reservations are special; they have to be registered > independently on each path. (As I said, this was the original reason to > have a separate daemon: implementing it in QEMU would be not just a bad > idea because you need CAP_SYS_ADMIN, it would be impossible for > libvirt-started guests because of sVirt and device cgroups). > > So I think the helper daemon needs to be granted blanket ioctl access to > devices, without using either device cgroups or MCS. If it was a single helper daemon for all guests, that would be ok to be granted access to all devices. If we did that though, the daemon would have to be SELinux aware. ie, libvirt would have to talk to it and say that svirt_t:s0:c236,c660 is permitted /dev/mpath/foo, and it would have to validate the requests from the socket to QEMU to make sure that QEMU is not requesting access to other mpath devices. If it was a one helper daemon per QEMU instance, then we would want to directly confine it to just the underlying devices associated with the mpath device allowed for that QEMU instance. This would imply that we needed to label the underlying devices with the MCS label to match the VM. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 11/09/2017 12:46, Daniel P. Berrange wrote: >> So the helper is a bit different from QEMU with respect to both SELinux >> MCS labeling and the devices cgroup. Access limitation comes from only >> ever operating on devices that have been passed on the socket. SELinux >> MCS could be used so that only the "right" QEMU can connect to each >> instance of the helper, though. > I wonder how this interacts with SELinux. IIUC if we grant access to > the multipath device file, the kernel won't normally require us to > grant access the underlying paths. I/O is just allowed because they > are a member of the mpath device. Hopefully this applies to the > persistent reservations too ? No, persistent reservations are special; they have to be registered independently on each path. (As I said, this was the original reason to have a separate daemon: implementing it in QEMU would be not just a bad idea because you need CAP_SYS_ADMIN, it would be impossible for libvirt-started guests because of sVirt and device cgroups). So I think the helper daemon needs to be granted blanket ioctl access to devices, without using either device cgroups or MCS. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Sun, Sep 10, 2017 at 11:52:24AM +0200, Paolo Bonzini wrote: > On 29/08/2017 14:41, Daniel P. Berrange wrote: > > On Tue, Aug 22, 2017 at 06:27:40PM +0200, Paolo Bonzini wrote: > >> Hi all, > >> > >> I am adding a new daemon to QEMU, that QEMU can connect to in order to > >> issue persistent reservation commands. > > > > Can you elaborate on what this daemon does ? > > > > IIUC, by 'persistent reservation' you are referring to SCSI LUN > > reservations ? If so, the security model needs to involve more > > than just policy on the socket that QEMU uses to talk to the > > PR daemon. We need to be able to control what device nodes this > > daemon is permitted to access. > > > > For iSCSI backed disks, the daemon might need to use the libiscsi > > driver, instead of assuming there is a device node on the local > > host too. > > libiscsi-backed disks can already issue persistent reservations. The > daemon is only needed for physical disks, as an alternative to > sgio='unfiltered' or (yuck) running QEMU with CAP_SYS_RAWIO. > > > Libvirt has a generic lock manager facility and I've previously > > though might be able to be extended to acquire SCSI reservations > > for disks which are backed by SCSI/iSCSI, but never tried to > > work on this. > > This is an interesting idea, but it is unrelated to this work, which is > about guests who manage the locking on their own (through peristent > reservations). Ah ok, I missed that this was about allowing the guest todo reservations itself, rather than QEMU using it for disk locking. > > I'm not sure if want to have QEMU spawning this daemon itself or not. > > Definitely not. The daemon is not suid root. > > > On the one hand if QEMU spawns the daemon, then it means the daemon > > inherits the SELinux policy of QEMU by default, so is restricted to > > only access the devices that QEMU has been granted. > > Libvirt could also give a confined policy to the helper, but there are > some complications because of multipath. When passed a multipath > device, the daemon forwards the PR command to _all_ paths, not just the > currently active one, similar to the mpathpersist(1) command. (In fact > this was the original usecase; support for non-multipath devices was > just an easy and useful extension.) > > So the helper is a bit different from QEMU with respect to both SELinux > MCS labeling and the devices cgroup. Access limitation comes from only > ever operating on devices that have been passed on the socket. SELinux > MCS could be used so that only the "right" QEMU can connect to each > instance of the helper, though. I wonder how this interacts with SELinux. IIUC if we grant access to the multipath device file, the kernel won't normally require us to grant access the underlying paths. I/O is just allowed because they are a member of the mpath device. Hopefully this applies to the persistent reservations too ? Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 29/08/2017 14:41, Daniel P. Berrange wrote: > On Tue, Aug 22, 2017 at 06:27:40PM +0200, Paolo Bonzini wrote: >> Hi all, >> >> I am adding a new daemon to QEMU, that QEMU can connect to in order to >> issue persistent reservation commands. > > Can you elaborate on what this daemon does ? > > IIUC, by 'persistent reservation' you are referring to SCSI LUN > reservations ? If so, the security model needs to involve more > than just policy on the socket that QEMU uses to talk to the > PR daemon. We need to be able to control what device nodes this > daemon is permitted to access. > > For iSCSI backed disks, the daemon might need to use the libiscsi > driver, instead of assuming there is a device node on the local > host too. libiscsi-backed disks can already issue persistent reservations. The daemon is only needed for physical disks, as an alternative to sgio='unfiltered' or (yuck) running QEMU with CAP_SYS_RAWIO. > Libvirt has a generic lock manager facility and I've previously > though might be able to be extended to acquire SCSI reservations > for disks which are backed by SCSI/iSCSI, but never tried to > work on this. This is an interesting idea, but it is unrelated to this work, which is about guests who manage the locking on their own (through peristent reservations). > I'm not sure if want to have QEMU spawning this daemon itself or not. Definitely not. The daemon is not suid root. > On the one hand if QEMU spawns the daemon, then it means the daemon > inherits the SELinux policy of QEMU by default, so is restricted to > only access the devices that QEMU has been granted. Libvirt could also give a confined policy to the helper, but there are some complications because of multipath. When passed a multipath device, the daemon forwards the PR command to _all_ paths, not just the currently active one, similar to the mpathpersist(1) command. (In fact this was the original usecase; support for non-multipath devices was just an easy and useful extension.) So the helper is a bit different from QEMU with respect to both SELinux MCS labeling and the devices cgroup. Access limitation comes from only ever operating on devices that have been passed on the socket. SELinux MCS could be used so that only the "right" QEMU can connect to each instance of the helper, though. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 10/09/2017 11:38, Paolo Bonzini wrote: > The daemon can then be > placed in the same devices cgroup and SELinux MCS category as QEMU. At least regarding the devices cgroup, this is wrong, sorry (the socket can be given an MCS category to restrict who connects to it, but not the daemon). More details in the reply to Daniel's message. Thanks, Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 28/08/2017 13:11, Michal Privoznik wrote: > On 08/25/2017 12:41 AM, Paolo Bonzini wrote: >> On 22/08/2017 18:27, Paolo Bonzini wrote: >>> Hi all, > > Hey, sorry for late reply. I was enjoying my PTO by not reading e-mails :-) Me too! >>> >>> I am adding a new daemon to QEMU, that QEMU can connect to in order to >>> issue persistent reservation commands. > > Persistent reservation of what? SCSI persistent reservation commands are commands that support multiple initiators accessing the same device. >> Thinking more about it, Libvirt could start the daemon on its own, >> assigning a socket per virtual machine. SELinux MCS should then just >> work, because the same category is assigned to the daemon instance and QEMU. > > Whoa, libvirt is not really prepared for qemu spawning processes, or > having more than one process per qemu domain. But it shouldn't be that > hard to prepare internal structs for that. As Daniel remarked, QEMU should not be spawning processes with elevated privileges. In fact, it definitely should not be doing that in this case. The two usecases are: 1) for user libvirtd, the daemon is started by init and uses a global socket. To talk to the daemon, the user must have been given access to the socket by the administrator via Unix groups or ACLs or whatever. Once you have access, you can send PRs to whatever device you have access (unless the administrator hasn't also configured e.g. devices cgroups for the daemon---but this is outside libvirt's scope). While this mode can be used for system libvirtd too, it would be less secure and it would be hard to make it work with SELinux, so... 2) ... for system libvirtd, a copy of the daemon is started for each VM by libvirtd, together with QEMU. This mode cannot be used for user libvirtd because the helper is _not_ suid root. The daemon can then be placed in the same devices cgroup and SELinux MCS category as QEMU. >> In particular, Libvirt could create the socket itself, label it, and >> pass it to the daemon through the systemd socket activation protocol >> (which qemu-pr-helper supports). > > We can pass FDs to qemu (in fact any process). Even if its running. So > that shouldn't be a problem. Yeah, the systemd socket activation protocol should be trivial to support in libvirtd (compared to other changes to use >1 process per VM). >> >> The XML to use the helper with a predefined socket could be: >> >> >> /path/to/unix.socket' >> > > Do we want to/need to expose the path here? I mean, is user expected to > do something with it? We don't expose monitor path anywhere but keep it > private (of course we store it in so called status XML which is a > persistent storage solely for purpose of reloading the internal state > when daemon is restarted). In this case, yes. This is for the case of a global daemon. >> >> while to use it with a dedicated daemon >> >> >> /path/to/qemu-pr-helper >> > > > Ah, so there isn't 1:1 relationship with qemu process and the daemon > helper. One daemon can serve multiple qemu processes, am I right? Yes, but it need not be the case. > Also, how would libvirt know if the daemon helper dies? The daemon shouldn't die. Last famous words, sure, but even if the daemon isn't respawned, persistent reservations commands fail but everything else works normally. > I mean, if libvirt is > to start it again (there are some systemd-less distros), we have to know > that such situation happened. For instance, we can get an event on the > monitor to which we start the daemon and pass new FD to its socket to > qemu? Although, this would mean a significant work on libvirt side in > case there's 1:N relationship. Because on delivery of the event from two > domains we have to figure out if the domains are supposed to have their > own daemons or one shared. Yeah, this is why for the 1:N model (user libvirtd) I prefer to just leave it to systemd and systemd-less distros will rely on the last famous words above. For the 1:1 model (system libvirtd) libvirtd _could_ respawn, but it is not big deal if it doesn't. > Also, what happens when the daemon dies? What's the qemu state at that > point? Is the guest still running or is it paused (e.g. like on ENOSPC > error on disks)? See above. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On Tue, Aug 22, 2017 at 06:27:40PM +0200, Paolo Bonzini wrote: > Hi all, > > I am adding a new daemon to QEMU, that QEMU can connect to in order to > issue persistent reservation commands. Can you elaborate on what this daemon does ? IIUC, by 'persistent reservation' you are referring to SCSI LUN reservations ? If so, the security model needs to involve more than just policy on the socket that QEMU uses to talk to the PR daemon. We need to be able to control what device nodes this daemon is permitted to access. For iSCSI backed disks, the daemon might need to use the libiscsi driver, instead of assuming there is a device node on the local host too. Libvirt has a generic lock manager facility and I've previously though might be able to be extended to acquire SCSI reservations for disks which are backed by SCSI/iSCSI, but never tried to work on this. > The daemon can only issue the commands on file descriptor that QEMU > already has. In addition normal users shouldn't have access to the > daemon's Unix socket in /run, so the daemon is protected against misuse. > > My question is what is the best way to handle the connection to the > daemon socket. Currently, the path to the socket is passed to QEMU on > the command line: > > -object pr-manager-helper,id=mgr,path=/run/qemu-pr-helper.sock \ > -drive if=none,id=hd,driver=raw,filename=/dev/sdb,file.pr-manager=mgr \ > -device scsi-block,drive=hd > > (the new parts are "-object pr-manager-helper" and "file.pr-manager"). I'm not sure if want to have QEMU spawning this daemon itself or not. On the one hand if QEMU spawns the daemon, then it means the daemon inherits the SELinux policy of QEMU by default, so is restricted to only access the devices that QEMU has been granted. On the other hand, this means we need to give QEMU permission to execve(), which is something we've been striving to remove by default. > I could just make it root:root and pass a file descriptor from libvirt > to QEMU, but this would make it impossible for QEMU to reconnect to the > daemon in case someone does a "systemctl restart" or even just kills it > inadvertently. The daemon is stateless, so transparent reconnection > would be a nice feature to have. > > The alternative is to somehow label the daemon socket so that it can be > accessed by QEMU, but I'm not very well versed in SELinux. > > Any ideas? Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 08/25/2017 12:41 AM, Paolo Bonzini wrote: > On 22/08/2017 18:27, Paolo Bonzini wrote: >> Hi all, Hey, sorry for late reply. I was enjoying my PTO by not reading e-mails :-) >> >> I am adding a new daemon to QEMU, that QEMU can connect to in order to >> issue persistent reservation commands. Persistent reservation of what? >> >> The daemon can only issue the commands on file descriptor that QEMU >> already has. In addition normal users shouldn't have access to the >> daemon's Unix socket in /run, so the daemon is protected against misuse. >> >> My question is what is the best way to handle the connection to the >> daemon socket. Currently, the path to the socket is passed to QEMU on >> the command line: >> >> -object pr-manager-helper,id=mgr,path=/run/qemu-pr-helper.sock \ >> -drive if=none,id=hd,driver=raw,filename=/dev/sdb,file.pr-manager=mgr \ >> -device scsi-block,drive=hd >> >> (the new parts are "-object pr-manager-helper" and "file.pr-manager"). >> >> I could just make it root:root and pass a file descriptor from libvirt >> to QEMU, but this would make it impossible for QEMU to reconnect to the >> daemon in case someone does a "systemctl restart" or even just kills it >> inadvertently. The daemon is stateless, so transparent reconnection >> would be a nice feature to have. >> >> The alternative is to somehow label the daemon socket so that it can be >> accessed by QEMU, but I'm not very well versed in SELinux. > > Thinking more about it, Libvirt could start the daemon on its own, > assigning a socket per virtual machine. SELinux MCS should then just > work, because the same category is assigned to the daemon instance and QEMU. Whoa, libvirt is not really prepared for qemu spawning processes, or having more than one process per qemu domain. But it shouldn't be that hard to prepare internal structs for that. > > In particular, Libvirt could create the socket itself, label it, and > pass it to the daemon through the systemd socket activation protocol > (which qemu-pr-helper supports). We can pass FDs to qemu (in fact any process). Even if its running. So that shouldn't be a problem. > > The XML to use the helper with a predefined socket could be: > > > /path/to/unix.socket' > Do we want to/need to expose the path here? I mean, is user expected to do something with it? We don't expose monitor path anywhere but keep it private (of course we store it in so called status XML which is a persistent storage solely for purpose of reloading the internal state when daemon is restarted). > > while to use it with a dedicated daemon > > > /path/to/qemu-pr-helper > Ah, so there isn't 1:1 relationship with qemu process and the daemon helper. One daemon can serve multiple qemu processes, am I right? Also, how would libvirt know if the daemon helper dies? I mean, if libvirt is to start it again (there are some systemd-less distros), we have to know that such situation happened. For instance, we can get an event on the monitor to which we start the daemon and pass new FD to its socket to qemu? Although, this would mean a significant work on libvirt side in case there's 1:N relationship. Because on delivery of the event from two domains we have to figure out if the domains are supposed to have their own daemons or one shared. Also, what happens when the daemon dies? What's the qemu state at that point? Is the guest still running or is it paused (e.g. like on ENOSPC error on disks)? Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] New QEMU daemon for persistent reservations
On 22/08/2017 18:27, Paolo Bonzini wrote: > Hi all, > > I am adding a new daemon to QEMU, that QEMU can connect to in order to > issue persistent reservation commands. > > The daemon can only issue the commands on file descriptor that QEMU > already has. In addition normal users shouldn't have access to the > daemon's Unix socket in /run, so the daemon is protected against misuse. > > My question is what is the best way to handle the connection to the > daemon socket. Currently, the path to the socket is passed to QEMU on > the command line: > > -object pr-manager-helper,id=mgr,path=/run/qemu-pr-helper.sock \ > -drive if=none,id=hd,driver=raw,filename=/dev/sdb,file.pr-manager=mgr \ > -device scsi-block,drive=hd > > (the new parts are "-object pr-manager-helper" and "file.pr-manager"). > > I could just make it root:root and pass a file descriptor from libvirt > to QEMU, but this would make it impossible for QEMU to reconnect to the > daemon in case someone does a "systemctl restart" or even just kills it > inadvertently. The daemon is stateless, so transparent reconnection > would be a nice feature to have. > > The alternative is to somehow label the daemon socket so that it can be > accessed by QEMU, but I'm not very well versed in SELinux. Thinking more about it, Libvirt could start the daemon on its own, assigning a socket per virtual machine. SELinux MCS should then just work, because the same category is assigned to the daemon instance and QEMU. In particular, Libvirt could create the socket itself, label it, and pass it to the daemon through the systemd socket activation protocol (which qemu-pr-helper supports). The XML to use the helper with a predefined socket could be: /path/to/unix.socket' while to use it with a dedicated daemon /path/to/qemu-pr-helper An empty element can use a standard helper program declared in qemu.conf (default /usr/libexec/qemu-pr-helper). Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list