Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Wednesday, September 18, 2013 05:32:17 PM Daniel P. Berrange wrote: > On Wed, Sep 18, 2013 at 12:19:44PM -0400, Paul Moore wrote: > > On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote: > > > On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote: > > > > On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote: > > > > > Libvirt does not want to be in the business of creating seccomp > > > > > syscall filters for QEMU. As mentioned before, IMHO that places an > > > > > unacceptable burden on libvirt to know about the syscalls each a > > > > > particular version of QEMU requires for its operation. > > > > > > > > At a high level, I don't see how libvirt configuring and installing a > > > > syscall filter is substantially different from libvirt configuring and > > > > installing a network filter. > > > > > > The rules created for a network filter have no bearing or relation to > > > internal QEMU implementation details, as you have with syscalls, so > > > this isn't really a relevant comparison. > > > > The rules created for a network filter are directly related to the details > > of the guest running inside of QEMU. From a practical point of view I > > see both network and syscall filtering as being dependent on the guest; > > the network filtering configuration can change as the guest's services > > change, the syscall filtering configuration can change as the QEMU > > functionality can change. > > You're talking about two very different things here. Seccomp syscall > filtering affects QEMU itself while network filter affects the guest > OS apps inside QEMU. From a security standpoint I'm not entirely convinced the distinction is important. > Network filtering still does not depend on the implementation details of the > guest OS apps - it depends on the services that those apps are using. Once again, I'm not entirely sure that worrying about the distinction between guest apps/services is important - it is just the "guest". 
> Thus configuring network filters does not require the admin to have > knowledge of the apps internal impl details in the way that seccomp does. Network filters require the admin to have knowledge of what apps/services the guest is providing. Syscall filters require the admin to have knowledge of what version of QEMU is deployed on the host. I think it is reasonable to expect that the admin has more knowledge of, and more control over, the QEMU version they are using than they do over what is being run in the hosted guests. I don't dispute that arriving at the correct syscall filter configuration is more difficult than configuring a network filter, but I don't see why that means we can't offer it as an option for the more savvy admins. Also, the libvirt patches I'm currently working on allow the syscall filter to be defined either as a whitelist or a blacklist; the blacklist approach should provide a much more gradual learning curve ... and in the case of containers, I suspect it might also be the better solution. > > > > Also, and I recognize this is diverting away from a topic most of > > > > qemu-devel is not interested in, what about libvirt-lxc? What about > > > > all of the other virtualization drivers supported by libvirt (granted, > > > > not all would be candidates for syscall filtering, but you get the > > > > idea). > > > > > > It isn't clear to me that syscall filtering is something that's relevant > > > for inclusion in libvirt-lxc. It seems like something that would be used > > > by apps running inside LXC containers directly. > > > > For all the same reasons that it makes sense to filter syscalls in QEMU, I > > think it makes sense to filter syscalls in libvirt-lxc. 
The fundamental > > concern is that the kernel presents are large attack surface in the way of > > syscalls, and it is extremely likely that any given container does not > > have a legitimate need to call into all of the syscalls the kernel > > presents to userspace; especially if you consider the recent approaches > > of using containers to ship/deploy single applications. > > > > Also, just in case there are some misconceptions floating around, loading > > a syscall filter in libvirt doesn't mean the individual container > > applications can't also load their own filter. When multiple syscall > > filters are present for a given process, all of the filters are evaluated > > and the most restrictive decision for a given syscall request "wins". > > > > > Libvirt has no knowledge of such apps or what rules they might require, > > > so can't make any kind of intelligent decision about syscall filtering > > > for LXC. > > > > A perfectly valid point, but I also think of syscall filtering as allowing > > the host administrator the ability to reduce the attack surface of the > > host system/kernel from potentially malicious containers/applications > > without having to rely on these containers/applications to police > > themselves. > > > > > I really view seccom
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On 09/18/2013 12:32 PM, Daniel P. Berrange wrote: On Wed, Sep 18, 2013 at 12:19:44PM -0400, Paul Moore wrote: On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote: On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote: On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote: Libvirt does not want to be in the business of creating seccomp syscall filters for QEMU. As mentioned before, IMHO that places an unacceptable burden on libvirt to know about the syscalls each a particular version of QEMU requires for its operation. At a high level, I don't see how libvirt configuring and installing a syscall filter is substantially different from libvirt configuring and installing a network filter. The rules created for a network filter have no bearing or relation to internal QEMU implementation details, as you have with syscalls, so this isn't really a relevant comparison. The rules created for a network filter are directly related to the details of the guest running inside of QEMU. From a practical point of view I see both network and syscall filtering as being dependent on the guest; the network filtering configuration can change as the guest's services change, the syscall filtering configuration can change as the QEMU functionality can change. You're talking about two very different things here. Seccomp syscall filtering affects QEMU itself, while network filter affects the guest OS apps inside QEMU. Network filtering still does not depend on the implementation details of the guest OS apps - it depends on the services that those apps are using. Thus configuring network filters does not require the admin to have knowledge of the apps internal impl details in the way that seccomp does. Also, and I recognize this is diverting away from a topic most of qemu-devel is not interested in, what about libvirt-lxc? 
What about all of the other virtualization drivers supported by libvirt (granted, not all would be candidates for syscall filtering, but you get the idea). It isn't clear to me that syscall filtering is something that's relevant for inclusion in libvirt-lxc. It seems like something that would be used by apps running inside LXC containers directly. For all the same reasons that it makes sense to filter syscalls in QEMU, I think it makes sense to filter syscalls in libvirt-lxc. The fundamental concern is that the kernel presents are large attack surface in the way of syscalls, and it is extremely likely that any given container does not have a legitimate need to call into all of the syscalls the kernel presents to userspace; especially if you consider the recent approaches of using containers to ship/deploy single applications. Also, just in case there are some misconceptions floating around, loading a syscall filter in libvirt doesn't mean the individual container applications can't also load their own filter. When multiple syscall filters are present for a given process, all of the filters are evaluated and the most restrictive decision for a given syscall request "wins". Libvirt has no knowledge of such apps or what rules they might require, so can't make any kind of intelligent decision about syscall filtering for LXC. A perfectly valid point, but I also think of syscall filtering as allowing the host administrator the ability to reduce the attack surface of the host system/kernel from potentially malicious containers/applications without having to rely on these containers/applications to police themselves. I really view seccomp as something that apps use directly themselves, not something that a 3rd party process applies prior to launching the apps, since the latter has far too much administrative burden IMHO. 
The seccomp filter functionality is definitely something that apps can use themselves, but to limit syscall filtering to just that use case is to miss out on other valid uses as well. As far as the burden is concerned, is users/administrators find it too difficult, there is nothing requiring them to use it, however, for those who are facing serious security risks in their deployments providing syscall filtering in libvirt might be a very welcome addition. I'm not debating the usefulness of secomp technology, I just really don't see it as something that is practical or sensible to encourage end users/ admins to make use of. It is hard enough for app developers themselves to make use of it properly and they have a tonne of domain knowledge about the internals of their application implementation. When you have uninformed users/admins using it by trial and error I just see a support disaster coming straight at us. That small minority who really are skilful enough to use it can still do so by launching the app in question via a 'runseccomp' like too which would just install a filter & then exec the real binary. Regards, Daniel An added benefit of allowing an admin to configure a seccomp filter is that they could potentially "patch
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Wed, Sep 18, 2013 at 12:19:44PM -0400, Paul Moore wrote: > On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote: > > On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote: > > > On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote: > > > > Libvirt does not want to be in the business of creating seccomp syscall > > > > filters for QEMU. As mentioned before, IMHO that places an unacceptable > > > > burden on libvirt to know about the syscalls each a particular version > > > > of QEMU requires for its operation. > > > > > > At a high level, I don't see how libvirt configuring and installing a > > > syscall filter is substantially different from libvirt configuring and > > > installing a network filter. > > > > The rules created for a network filter have no bearing or relation to > > internal QEMU implementation details, as you have with syscalls, so > > this isn't really a relevant comparison. > > The rules created for a network filter are directly related to the details of > the guest running inside of QEMU. From a practical point of view I see both > network and syscall filtering as being dependent on the guest; the network > filtering configuration can change as the guest's services change, the > syscall > filtering configuration can change as the QEMU functionality can change. You're talking about two very different things here. Seccomp syscall filtering affects QEMU itself, while network filter affects the guest OS apps inside QEMU. Network filtering still does not depend on the implementation details of the guest OS apps - it depends on the services that those apps are using. Thus configuring network filters does not require the admin to have knowledge of the apps internal impl details in the way that seccomp does. > > > Also, and I recognize this is diverting away from a topic most of > > > qemu-devel is not interested in, what about libvirt-lxc? 
What about all > > > of the other virtualization drivers supported by libvirt (granted, not > > > all would be candidates for syscall filtering, but you get the idea). > > > > It isn't clear to me that syscall filtering is something that's relevant > > for inclusion in libvirt-lxc. It seems like something that would be used > > by apps running inside LXC containers directly. > > For all the same reasons that it makes sense to filter syscalls in QEMU, I > think it makes sense to filter syscalls in libvirt-lxc. The fundamental > concern is that the kernel presents are large attack surface in the way of > syscalls, and it is extremely likely that any given container does not have a > legitimate need to call into all of the syscalls the kernel presents to > userspace; especially if you consider the recent approaches of using > containers to ship/deploy single applications. > > Also, just in case there are some misconceptions floating around, loading a > syscall filter in libvirt doesn't mean the individual container applications > can't also load their own filter. When multiple syscall filters are present > for a given process, all of the filters are evaluated and the most > restrictive > decision for a given syscall request "wins". > > > Libvirt has no knowledge of such apps or what rules they might require, so > > can't make any kind of intelligent decision about syscall filtering for LXC. > > A perfectly valid point, but I also think of syscall filtering as allowing > the > host administrator the ability to reduce the attack surface of the host > system/kernel from potentially malicious containers/applications without > having to rely on these containers/applications to police themselves. > > > I really view seccomp as something that apps use directly themselves, not > > something that a 3rd party process applies prior to launching the apps, > > since the latter has far too much administrative burden IMHO. 
> > The seccomp filter functionality is definitely something that apps can use > themselves, but to limit syscall filtering to just that use case is to miss > out on other valid uses as well. As far as the burden is concerned, is > users/administrators find it too difficult, there is nothing requiring them > to > use it, however, for those who are facing serious security risks in their > deployments providing syscall filtering in libvirt might be a very welcome > addition. I'm not debating the usefulness of seccomp technology; I just really don't see it as something that is practical or sensible to encourage end users/admins to make use of. It is hard enough for app developers themselves to make use of it properly, and they have a tonne of domain knowledge about the internals of their application implementation. When you have uninformed users/admins using it by trial and error I just see a support disaster coming straight at us. That small minority who really are skilful enough to use it can still do so by launching the app in question via a 'runseccomp'-like tool which would
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote:
> On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote:
> > Libvirt does not want to be in the business of creating seccomp syscall
> > filters for QEMU. As mentioned before, IMHO that places an unacceptable
> > burden on libvirt to know about the syscalls each a particular version
> > of QEMU requires for its operation.
>
> At a high level, I don't see how libvirt configuring and installing a syscall
> filter is substantially different from libvirt configuring and installing a
> network filter.

The rules created for a network filter have no bearing or relation to
internal QEMU implementation details, as you have with syscalls, so
this isn't really a relevant comparison.

> Also, and I recognize this is diverting away from a topic most of qemu-devel
> is not interested in, what about libvirt-lxc? What about all of the other
> virtualization drivers supported by libvirt (granted, not all would be
> candidates for syscall filtering, but you get the idea).

It isn't clear to me that syscall filtering is something that's relevant
for inclusion in libvirt-lxc. It seems like something that would be used
by apps running inside LXC containers directly. Libvirt has no knowledge
of such apps or what rules they might require, so can't make any kind of
intelligent decision about syscall filtering for LXC.

I really view seccomp as something that apps use directly themselves, not
something that a 3rd party process applies prior to launching the apps,
since the latter has far too much administrative burden IMHO.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote:
> Libvirt does not want to be in the business of creating seccomp syscall
> filters for QEMU. As mentioned before, IMHO that places an unacceptable
> burden on libvirt to know about the syscalls each a particular version
> of QEMU requires for its operation.

At a high level, I don't see how libvirt configuring and installing a syscall
filter is substantially different from libvirt configuring and installing a
network filter.

Also, and I recognize this is diverting away from a topic most of qemu-devel
is not interested in, what about libvirt-lxc? What about all of the other
virtualization drivers supported by libvirt (granted, not all would be
candidates for syscall filtering, but you get the idea).

--
paul moore
security and virtualization @ redhat
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote: > On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote: > > On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote: > > > Libvirt does not want to be in the business of creating seccomp syscall > > > filters for QEMU. As mentioned before, IMHO that places an unacceptable > > > burden on libvirt to know about the syscalls each a particular version > > > of QEMU requires for its operation. > > > > At a high level, I don't see how libvirt configuring and installing a > > syscall filter is substantially different from libvirt configuring and > > installing a network filter. > > The rules created for a network filter have no bearing or relation to > internal QEMU implementation details, as you have with syscalls, so > this isn't really a relevant comparison. The rules created for a network filter are directly related to the details of the guest running inside of QEMU. From a practical point of view I see both network and syscall filtering as being dependent on the guest; the network filtering configuration can change as the guest's services change, the syscall filtering configuration can change as the QEMU functionality can change. > > Also, and I recognize this is diverting away from a topic most of > > qemu-devel is not interested in, what about libvirt-lxc? What about all > > of the other virtualization drivers supported by libvirt (granted, not > > all would be candidates for syscall filtering, but you get the idea). > > It isn't clear to me that syscall filtering is something that's relevant > for inclusion in libvirt-lxc. It seems like something that would be used > by apps running inside LXC containers directly. For all the same reasons that it makes sense to filter syscalls in QEMU, I think it makes sense to filter syscalls in libvirt-lxc. 
The fundamental concern is that the kernel presents a large attack surface in the way of syscalls, and it is extremely likely that any given container does not have a legitimate need to call into all of the syscalls the kernel presents to userspace; especially if you consider the recent approaches of using containers to ship/deploy single applications. Also, just in case there are some misconceptions floating around, loading a syscall filter in libvirt doesn't mean the individual container applications can't also load their own filter. When multiple syscall filters are present for a given process, all of the filters are evaluated and the most restrictive decision for a given syscall request "wins". > Libvirt has no knowledge of such apps or what rules they might require, so > can't make any kind of intelligent decision about syscall filtering for LXC. A perfectly valid point, but I also think of syscall filtering as allowing the host administrator the ability to reduce the attack surface of the host system/kernel from potentially malicious containers/applications without having to rely on these containers/applications to police themselves. > I really view seccomp as something that apps use directly themselves, not > something that a 3rd party process applies prior to launching the apps, > since the latter has far too much administrative burden IMHO. The seccomp filter functionality is definitely something that apps can use themselves, but to limit syscall filtering to just that use case is to miss out on other valid uses as well. As far as the burden is concerned, if users/administrators find it too difficult, there is nothing requiring them to use it; however, for those who are facing serious security risks in their deployments, providing syscall filtering in libvirt might be a very welcome addition. -- paul moore security and virtualization @ redhat
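The stacking rule described above (every loaded filter is evaluated and the most restrictive verdict wins) can be illustrated with a small model. This is only a sketch in plain Python, not kernel or libseccomp code: the `Action` ordering follows the precedence the kernel documents for seccomp return actions (KILL over TRAP over ERRNO over TRACE over ALLOW), while `make_blacklist`, `make_whitelist`, `evaluate`, and the example syscall sets are hypothetical names and data chosen for illustration.

```python
from enum import IntEnum


class Action(IntEnum):
    # Lower value = more restrictive, mirroring the documented precedence
    # of seccomp return actions: KILL > TRAP > ERRNO > TRACE > ALLOW.
    KILL = 0
    TRAP = 1
    ERRNO = 2
    TRACE = 3
    ALLOW = 4


def make_blacklist(denied, action=Action.KILL):
    """Build a filter that allows everything except the named syscalls."""
    denied = frozenset(denied)
    return lambda syscall: action if syscall in denied else Action.ALLOW


def make_whitelist(allowed, action=Action.KILL):
    """Build a filter that rejects everything except the named syscalls."""
    allowed = frozenset(allowed)
    return lambda syscall: Action.ALLOW if syscall in allowed else action


def evaluate(filters, syscall):
    """Run every filter and keep the most restrictive verdict."""
    return min((f(syscall) for f in filters), default=Action.ALLOW)


# Hypothetical example: a host-side blacklist stacked with an
# application-side whitelist (the syscall sets are illustrative only).
host_filter = make_blacklist({"ptrace", "kexec_load"})
app_filter = make_whitelist({"read", "write", "exit_group", "ptrace"},
                            action=Action.ERRNO)
```

With both filters stacked, `evaluate([host_filter, app_filter], "ptrace")` yields `Action.KILL` even though the application's own whitelist permits `ptrace`, which is the property that lets a host-installed filter coexist with one the application loads itself.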
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Tue, Sep 17, 2013 at 03:17:28PM -0400, Corey Bryant wrote: > > > On 09/17/2013 01:14 PM, Eduardo Otubo wrote: > > > > > >On 09/17/2013 11:43 AM, Paul Moore wrote: > >>On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote: > >>>On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote: > >>> > Paul, what exactly are you planning to add to libvirt? I'm not a big > fan of using qemu command line to pass syscalls for blacklist as > arguments, but I can't see other way to avoid problems (like -net > bridge / -net tap) from happening. > >> > >>At present, and as far as I'm concerned pretty much everything is open > >>for > >>discussion, the code works similar to the libvirt network filters. > >>You create > >>a separate XML configuration file which defines the filter and you > >>reference > >>that filter from the domain's XML configuration. When a QEMU/KVM or > >>LXC based > >>domain starts it uses libseccomp to create the seccomp filter and then > >>loads > >>it into the kernel after the fork but before the domain is exec'd. > > > >Clever approach. I tihnk a possible way to do this is something like: > > > > -sandbox > >-on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf] > > > > > > where: > > > >[,whitelist=qemu_whitelist.conf] will override default whitelist filter > >[,blacklist=blacklist.conf] will override default blacklist filter > > > >But when we add seccomp support for qemu on libvirt, we make sure to > >just add -sandbox off and use Paul's approach. > > > >Is that a reasonable approach? What do you think? > > > > QEMU wouldn't require any changes for the approach Paul describes. > The QEMU process that is exec'd by libvirt would be constrained by > the filter that libvirt installed. Libvirt does not want to be in the business of creating seccomp syscall filters for QEMU. 
As mentioned before, IMHO that places an unacceptable burden on libvirt to know about the syscalls each particular version of QEMU requires for its operation. Daniel
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Tue, Sep 17, 2013 at 02:14:25PM -0300, Eduardo Otubo wrote: > > > On 09/17/2013 11:43 AM, Paul Moore wrote: > >On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote: > >>On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote: > >> > >>>Paul, what exactly are you planning to add to libvirt? I'm not a big > >>>fan of using qemu command line to pass syscalls for blacklist as > >>>arguments, but I can't see other way to avoid problems (like -net > >>>bridge / -net tap) from happening. > > > >At present, and as far as I'm concerned pretty much everything is open for > >discussion, the code works similar to the libvirt network filters. You > >create > >a separate XML configuration file which defines the filter and you reference > >that filter from the domain's XML configuration. When a QEMU/KVM or LXC > >based > >domain starts it uses libseccomp to create the seccomp filter and then loads > >it into the kernel after the fork but before the domain is exec'd. > > Clever approach. I tihnk a possible way to do this is something like: > > -sandbox > -on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf] > > where: > > [,whitelist=qemu_whitelist.conf] will override default whitelist filter > [,blacklist=blacklist.conf] will override default blacklist filter > > But when we add seccomp support for qemu on libvirt, we make sure to > just add -sandbox off and use Paul's approach. > > Is that a reasonable approach? What do you think? IMHO the same problem exists for non-libvirt apps using QEMU. Exposing lists of syscalls as a config option requires applications using QEMU to know far too much about QEMU's internal implementation details. With this syntax either apps have to read the source to find out which syscalls to allow, or they have to use trial & error launching QEMU repeatedly to see what breaks. Neither of these are nice to applications. IMHO any configuration of syscalls lists should be exclusively QEMU's responsibility. 
What is your actual goal here? If the goal is to make it possible to use arbitrary command line arguments, then IMHO, QEMU should just look at the args given and automatically "do the right thing" with the syscall whitelists. Of course, per my previous message, I think making all possible args work under seccomp should be a non-goal. > >There are no command line arguments passed to QEMU. This work can co-exist > >with the QEMU seccomp filters without problem. > > > >The original goal of this effort wasn't to add libvirt syscall filtering for > >QEMU, but rather for LXC; adding QEMU support just happened to be a trivial > >patch once the LXC support was added. > > > >(I also apologize for the delays, I hit a snag with an existing problem on > >libvirt which stopped work and then some other BZs grabbed my attention...) > > > >>IMHO, if libvirt is enabling seccomp, then making all possible cli > >>args work is a non-goal. If there are things which require privileges > >>seccomp is blocking, then libvirt should avoid using them. eg by making > >>use of FD passing where appropriate to reduce privileges qemu needs. > > > >I agree. Daniel
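The idea of QEMU looking at its own arguments and deriving the whitelist itself could look roughly like the sketch below. Everything in it is an assumption made for illustration: the feature names, the `BASE_SYSCALLS` and `FEATURE_SYSCALLS` tables, and both helper functions are hypothetical, and the syscall sets are placeholders that have not been checked against any QEMU version.

```python
# Placeholder data: these sets are illustrative only and are NOT the real
# syscall requirements of any QEMU version.
BASE_SYSCALLS = frozenset({"read", "write", "mmap", "exit_group"})
FEATURE_SYSCALLS = {
    "net-tap": frozenset({"ioctl"}),
    "net-bridge": frozenset({"fork", "execve"}),
}


def features_from_args(argv):
    """Map '-net tap' / '-net bridge' style arguments to feature names."""
    features = set()
    # Walk adjacent (flag, value) pairs of the command line.
    for flag, value in zip(argv, argv[1:]):
        if flag == "-net":
            kind = value.split(",", 1)[0]
            if f"net-{kind}" in FEATURE_SYSCALLS:
                features.add(f"net-{kind}")
    return features


def whitelist_for(argv):
    """Return the base whitelist plus whatever the requested features add."""
    allowed = set(BASE_SYSCALLS)
    for feature in features_from_args(argv):
        allowed |= FEATURE_SYSCALLS[feature]
    return allowed
```

The point of the design is that the table lives with the code that knows which syscalls each feature needs, so a management application only passes ordinary arguments and never has to name a syscall.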
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On 09/17/2013 04:17 PM, Corey Bryant wrote: On 09/17/2013 01:14 PM, Eduardo Otubo wrote: On 09/17/2013 11:43 AM, Paul Moore wrote: On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote: On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote: Paul, what exactly are you planning to add to libvirt? I'm not a big fan of using qemu command line to pass syscalls for blacklist as arguments, but I can't see other way to avoid problems (like -net bridge / -net tap) from happening. At present, and as far as I'm concerned pretty much everything is open for discussion, the code works similar to the libvirt network filters. You create a separate XML configuration file which defines the filter and you reference that filter from the domain's XML configuration. When a QEMU/KVM or LXC based domain starts it uses libseccomp to create the seccomp filter and then loads it into the kernel after the fork but before the domain is exec'd. Clever approach. I tihnk a possible way to do this is something like: -sandbox -on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf] where: [,whitelist=qemu_whitelist.conf] will override default whitelist filter [,blacklist=blacklist.conf] will override default blacklist filter But when we add seccomp support for qemu on libvirt, we make sure to just add -sandbox off and use Paul's approach. Is that a reasonable approach? What do you think? QEMU wouldn't require any changes for the approach Paul describes. The QEMU process that is exec'd by libvirt would be constrained by the filter that libvirt installed. Yes, that is correct. But I'm thinking about the case when Qemu is run stand-alone, without libvirt. There must be a way to configure it without using a pre configured filter from libvirt. -- Eduardo Otubo IBM Linux Technology Center
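For the stand-alone case, the suboption syntax proposed in this thread could be parsed along the following lines. This is a sketch of the proposal as quoted above, not of any option QEMU actually accepts: `parse_sandbox_option` is a hypothetical helper, and the `strict`, `whitelist`, and `blacklist` keys are taken from the proposed syntax only.

```python
def parse_sandbox_option(value):
    """Parse 'on[,key=val...]' into (enabled, options).

    Hypothetical parser for the suboptions proposed in this thread.
    """
    head, *rest = value.split(",")
    if head not in ("on", "off"):
        raise ValueError(f"expected 'on' or 'off', got {head!r}")
    allowed_keys = {"strict", "whitelist", "blacklist"}
    options = {}
    for item in rest:
        key, sep, val = item.partition("=")
        # Reject bare words and keys the proposal does not mention.
        if not sep or key not in allowed_keys:
            raise ValueError(f"unrecognised sandbox suboption {item!r}")
        options[key] = val
    return head == "on", options
```

For example, `parse_sandbox_option("on,blacklist=qemu_blacklist.conf")` returns `(True, {"blacklist": "qemu_blacklist.conf"})`, and an unknown key raises `ValueError` rather than being silently ignored.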
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On 09/17/2013 01:14 PM, Eduardo Otubo wrote: On 09/17/2013 11:43 AM, Paul Moore wrote: On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote: On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote: Paul, what exactly are you planning to add to libvirt? I'm not a big fan of using qemu command line to pass syscalls for blacklist as arguments, but I can't see other way to avoid problems (like -net bridge / -net tap) from happening. At present, and as far as I'm concerned pretty much everything is open for discussion, the code works similar to the libvirt network filters. You create a separate XML configuration file which defines the filter and you reference that filter from the domain's XML configuration. When a QEMU/KVM or LXC based domain starts it uses libseccomp to create the seccomp filter and then loads it into the kernel after the fork but before the domain is exec'd. Clever approach. I tihnk a possible way to do this is something like: -sandbox -on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf] where: [,whitelist=qemu_whitelist.conf] will override default whitelist filter [,blacklist=blacklist.conf] will override default blacklist filter But when we add seccomp support for qemu on libvirt, we make sure to just add -sandbox off and use Paul's approach. Is that a reasonable approach? What do you think? QEMU wouldn't require any changes for the approach Paul describes. The QEMU process that is exec'd by libvirt would be constrained by the filter that libvirt installed. -- Regards, Corey Bryant There are no command line arguments passed to QEMU. This work can co-exist with the QEMU seccomp filters without problem. The original goal of this effort wasn't to add libvirt syscall filtering for QEMU, but rather for LXC; adding QEMU support just happened to be a trivial patch once the LXC support was added. 
(I also apologize for the delays, I hit a snag with an existing problem on libvirt which stopped work and then some other BZs grabbed my attention...) IMHO, if libvirt is enabling seccomp, then making all possible cli args work is a non-goal. If there are things which require privileges seccomp is blocking, then libvirt should avoid using them. eg by making use of FD passing where appropriate to reduce privileges qemu needs. I agree.
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On 09/17/2013 02:14 PM, Eduardo Otubo wrote:
> On 09/17/2013 11:43 AM, Paul Moore wrote:
>> On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:
>>> On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:
>>>> Paul, what exactly are you planning to add to libvirt? I'm not a big fan of using qemu command line to pass syscalls for blacklist as arguments, but I can't see other way to avoid problems (like -net bridge / -net tap) from happening.
>>
>> At present, and as far as I'm concerned pretty much everything is open for discussion, the code works similar to the libvirt network filters. You create a separate XML configuration file which defines the filter and you reference that filter from the domain's XML configuration. When a QEMU/KVM or LXC based domain starts it uses libseccomp to create the seccomp filter and then loads it into the kernel after the fork but before the domain is exec'd.
>
> Clever approach. I think a possible way to do this is something like:
>
>   -sandbox -on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf]
>
> where:
>
>   [,whitelist=qemu_whitelist.conf] will override default whitelist filter
>   [,blacklist=blacklist.conf] will override default blacklist filter
>
> But when we add seccomp support for qemu on libvirt, we make sure to just add -sandbox off and use Paul's approach.
>
> Is that a reasonable approach? What do you think?

This approach is also interesting from the test point of view. I'll be able to write more complex tests on virt-test. General tests like "remove one syscall at a time from whitelist and test" -- without the need of sed'ing the code and recompiling every time, or even include new syscalls to the blacklist.

>> There are no command line arguments passed to QEMU. This work can co-exist with the QEMU seccomp filters without problem. The original goal of this effort wasn't to add libvirt syscall filtering for QEMU, but rather for LXC; adding QEMU support just happened to be a trivial patch once the LXC support was added.
>>
>> (I also apologize for the delays, I hit a snag with an existing problem on libvirt which stopped work and then some other BZs grabbed my attention...)
>>
>>> IMHO, if libvirt is enabling seccomp, then making all possible cli args work is a non-goal. If there are things which require privileges seccomp is blocking, then libvirt should avoid using them. eg by making use of FD passing where appropriate to reduce privileges qemu needs.
>>
>> I agree.

--
Eduardo Otubo
IBM Linux Technology Center
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On 09/17/2013 11:43 AM, Paul Moore wrote:
> On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:
>> On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:
>>> Paul, what exactly are you planning to add to libvirt? I'm not a big fan of using qemu command line to pass syscalls for blacklist as arguments, but I can't see other way to avoid problems (like -net bridge / -net tap) from happening.
>
> At present, and as far as I'm concerned pretty much everything is open for discussion, the code works similar to the libvirt network filters. You create a separate XML configuration file which defines the filter and you reference that filter from the domain's XML configuration. When a QEMU/KVM or LXC based domain starts it uses libseccomp to create the seccomp filter and then loads it into the kernel after the fork but before the domain is exec'd.

Clever approach. I think a possible way to do this is something like:

  -sandbox -on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf]

where:

  [,whitelist=qemu_whitelist.conf] will override default whitelist filter
  [,blacklist=blacklist.conf] will override default blacklist filter

But when we add seccomp support for qemu on libvirt, we make sure to just add -sandbox off and use Paul's approach.

Is that a reasonable approach? What do you think?

> There are no command line arguments passed to QEMU. This work can co-exist with the QEMU seccomp filters without problem. The original goal of this effort wasn't to add libvirt syscall filtering for QEMU, but rather for LXC; adding QEMU support just happened to be a trivial patch once the LXC support was added.
>
> (I also apologize for the delays, I hit a snag with an existing problem on libvirt which stopped work and then some other BZs grabbed my attention...)
>
>> IMHO, if libvirt is enabling seccomp, then making all possible cli args work is a non-goal. If there are things which require privileges seccomp is blocking, then libvirt should avoid using them. eg by making use of FD passing where appropriate to reduce privileges qemu needs.
>
> I agree.

--
Eduardo Otubo
IBM Linux Technology Center
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:
> On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:
> >
> > Paul, what exactly are you planning to add to libvirt? I'm not a big
> > fan of using qemu command line to pass syscalls for blacklist as
> > arguments, but I can't see other way to avoid problems (like -net
> > bridge / -net tap) from happening.

At present, and as far as I'm concerned pretty much everything is open for discussion, the code works similar to the libvirt network filters. You create a separate XML configuration file which defines the filter and you reference that filter from the domain's XML configuration. When a QEMU/KVM or LXC based domain starts it uses libseccomp to create the seccomp filter and then loads it into the kernel after the fork but before the domain is exec'd.

There are no command line arguments passed to QEMU. This work can co-exist with the QEMU seccomp filters without problem. The original goal of this effort wasn't to add libvirt syscall filtering for QEMU, but rather for LXC; adding QEMU support just happened to be a trivial patch once the LXC support was added.

(I also apologize for the delays, I hit a snag with an existing problem on libvirt which stopped work and then some other BZs grabbed my attention...)

> IMHO, if libvirt is enabling seccomp, then making all possible cli
> args work is a non-goal. If there are things which require privileges
> seccomp is blocking, then libvirt should avoid using them. eg by making
> use of FD passing where appropriate to reduce privileges qemu needs.

I agree.

--
paul moore
security and virtualization @ redhat
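The ordering Paul describes (fork, load the filter in the child, then exec) can be sketched roughly as below. This is only an illustration, not libvirt code: `pre_exec_hook`, `install_filter_stub` and `spawn_with_hook` are hypothetical names, and the stub does not filter anything. A real hook would presumably build and load the filter at that point, for example with libseccomp's `seccomp_init()`, `seccomp_rule_add()` and `seccomp_load()`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook type: runs in the child, after fork() and before exec. */
typedef int (*pre_exec_hook)(void *opaque);

/* Placeholder hook (assumption for this sketch). A real hook would create
 * and load the seccomp filter here; this one installs nothing. */
static int install_filter_stub(void *opaque)
{
    (void)opaque;
    return 0;
}

/* Fork, run the hook in the child, then exec argv[0].
 * Returns the child's exit status, or -1 if fork/wait failed or the
 * child was terminated by a signal. */
static int spawn_with_hook(pre_exec_hook hook, void *opaque,
                           char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: the filter goes in here, so it constrains the exec'd
         * program without that program taking any extra arguments. */
        if (hook != NULL && hook(opaque) != 0) {
            _exit(126);                 /* hook failed, do not exec */
        }
        execvp(argv[0], argv);
        _exit(127);                     /* exec itself failed */
    }
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

One consequence of this ordering is that a filter loaded before the exec has to permit `execve` itself, plus whatever the new program needs during startup.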
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:
>
> On 09/11/2013 01:49 PM, Daniel P. Berrange wrote:
> >On Wed, Sep 11, 2013 at 12:45:54PM -0400, Corey Bryant wrote:
> >>
> >>On 09/06/2013 03:21 PM, Eduardo Otubo wrote:
> >>>New command line options for the seccomp blacklist feature:
> >>>
> >>> $ qemu -sandbox on[,strict=]
> >>>
> >>>The strict parameter will turn on or off the new system call blacklist
> >>
> >>I mentioned this before but I'll say it again since I think it needs
> >>to be discussed. Since this regresses support (it'll prevent -net
> >>bridge and -net tap from using execv) the concern I have with the
> >>strict=on|off option is whether or not we will have the flexibility
> >>to modify the blacklist once QEMU is released with this support. Of
> >>course we should be able to add more syscalls to the blacklist as
> >>long as they don't regress QEMU functionality. But if we want to
> >>add a syscall that does regress QEMU functionality, I think we'd
> >>have to add a new command line option, which doesn't seem desirable.
> >>
> >>So a more flexible approach may be necessary. Maybe the blacklist
> >>should be passed on the command line, which would enable it to be
> >>defined by libvirt and passed to QEMU. I know Paul is working on
> >>something for libvirt so maybe that answers this question.
>
> Paul, what exactly are you planning to add to libvirt? I'm not a big
> fan of using qemu command line to pass syscalls for blacklist as
> arguments, but I can't see other way to avoid problems (like -net
> bridge / -net tap) from happening.

IMHO, if libvirt is enabling seccomp, then making all possible cli args work is a non-goal. If there are things which require privileges seccomp is blocking, then libvirt should avoid using them. eg by making use of FD passing where appropriate to reduce privileges qemu needs.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
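For readers unfamiliar with the FD passing Daniel mentions, here is a minimal sketch of the generic mechanism: a privileged process opens the resource and hands the already-open descriptor to the confined process over an AF_UNIX socket using `SCM_RIGHTS`. The helper names `send_fd` and `recv_fd` are made up for this example, and nothing here is taken from libvirt or QEMU. It only shows why the receiver no longer needs the privilege (or the syscalls) required to open the resource itself.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctrl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send one open descriptor over a connected
 * AF_UNIX socket. Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';    /* one data byte travels with the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;                      /* no descriptor attached */
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file, so it can read or write without ever calling `open()` on the underlying resource.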
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On 09/11/2013 01:49 PM, Daniel P. Berrange wrote:
> On Wed, Sep 11, 2013 at 12:45:54PM -0400, Corey Bryant wrote:
>> On 09/06/2013 03:21 PM, Eduardo Otubo wrote:
>>> New command line options for the seccomp blacklist feature:
>>>
>>>  $ qemu -sandbox on[,strict=]
>>>
>>> The strict parameter will turn on or off the new system call blacklist
>>
>> I mentioned this before but I'll say it again since I think it needs to be discussed. Since this regresses support (it'll prevent -net bridge and -net tap from using execv) the concern I have with the strict=on|off option is whether or not we will have the flexibility to modify the blacklist once QEMU is released with this support. Of course we should be able to add more syscalls to the blacklist as long as they don't regress QEMU functionality. But if we want to add a syscall that does regress QEMU functionality, I think we'd have to add a new command line option, which doesn't seem desirable.
>>
>> So a more flexible approach may be necessary. Maybe the blacklist should be passed on the command line, which would enable it to be defined by libvirt and passed to QEMU. I know Paul is working on something for libvirt so maybe that answers this question.

Paul, what exactly are you planning to add to libvirt? I'm not a big fan of using qemu command line to pass syscalls for blacklist as arguments, but I can't see other way to avoid problems (like -net bridge / -net tap) from happening.

> On the face of it, I'm not at all a fan of the idea of libvirt having to pass a syscall whitelist/blacklist to QEMU. IMHO that would be exposing too much knowledge of QEMU impl details to libvirt.
>
> Daniel

--
Eduardo Otubo
IBM Linux Technology Center
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On Wed, Sep 11, 2013 at 12:45:54PM -0400, Corey Bryant wrote:
>
> On 09/06/2013 03:21 PM, Eduardo Otubo wrote:
> >New command line options for the seccomp blacklist feature:
> >
> > $ qemu -sandbox on[,strict=]
> >
> >The strict parameter will turn on or off the new system call blacklist
>
> I mentioned this before but I'll say it again since I think it needs
> to be discussed. Since this regresses support (it'll prevent -net
> bridge and -net tap from using execv) the concern I have with the
> strict=on|off option is whether or not we will have the flexibility
> to modify the blacklist once QEMU is released with this support. Of
> course we should be able to add more syscalls to the blacklist as
> long as they don't regress QEMU functionality. But if we want to
> add a syscall that does regress QEMU functionality, I think we'd
> have to add a new command line option, which doesn't seem desirable.
>
> So a more flexible approach may be necessary. Maybe the blacklist
> should be passed on the command line, which would enable it to be
> defined by libvirt and passed to QEMU. I know Paul is working on
> something for libvirt so maybe that answers this question.

On the face of it, I'm not at all a fan of the idea of libvirt having to pass a syscall whitelist/blacklist to QEMU. IMHO that would be exposing too much knowledge of QEMU impl details to libvirt.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
On 09/06/2013 03:21 PM, Eduardo Otubo wrote:
> New command line options for the seccomp blacklist feature:
>
>  $ qemu -sandbox on[,strict=]
>
> The strict parameter will turn on or off the new system call blacklist

I mentioned this before but I'll say it again since I think it needs to be discussed. Since this regresses support (it'll prevent -net bridge and -net tap from using execv) the concern I have with the strict=on|off option is whether or not we will have the flexibility to modify the blacklist once QEMU is released with this support. Of course we should be able to add more syscalls to the blacklist as long as they don't regress QEMU functionality. But if we want to add a syscall that does regress QEMU functionality, I think we'd have to add a new command line option, which doesn't seem desirable.

So a more flexible approach may be necessary. Maybe the blacklist should be passed on the command line, which would enable it to be defined by libvirt and passed to QEMU. I know Paul is working on something for libvirt so maybe that answers this question.

> Signed-off-by: Eduardo Otubo
> ---
>  qemu-options.hx | 8 +---
>  vl.c            | 11 ++-
>  2 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/qemu-options.hx b/qemu-options.hx
> index d15338e..05485e1 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2978,13 +2978,15 @@ Old param mode (ARM only).
>  ETEXI
>
>  DEF("sandbox", HAS_ARG, QEMU_OPTION_sandbox, \
> -    "-sandbox Enable seccomp mode 2 system call filter (default 'off').\n",
> +    "-sandbox Enable seccomp mode 2 system call filter (default 'off').\n"
> +    "-sandbox on[,strict=]\n"
> +    "Enable seccomp mode 2 system call second level filter (default 'off').\n",

Does this need to mention the QEMU features restricted by the blacklist?

>      QEMU_ARCH_ALL)
>  STEXI
> -@item -sandbox @var{arg}
> +@item -sandbox @var{arg}[,strict=@var{value}]
>  @findex -sandbox
>  Enable Seccomp mode 2 system call filter. 'on' will enable syscall filtering and 'off' will
> -disable it. The default is 'off'.
> +disable it. The default is 'off'. 'strict=on' will enable second level filter (default is 'off').

And here too?

>  ETEXI
>
>  DEF("readconfig", HAS_ARG, QEMU_OPTION_readconfig,
> diff --git a/vl.c b/vl.c
> index 02f7486..909f685 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -329,6 +329,9 @@ static QemuOptsList qemu_sandbox_opts = {
>          {
>              .name = "enable",
>              .type = QEMU_OPT_BOOL,
> +        },{
> +            .name = "strict",
> +            .type = QEMU_OPT_STRING,
>          },
>          { /* end of list */ }
>      },
> @@ -1031,6 +1034,7 @@ static int bt_parse(const char *opt)
>
>  static int parse_sandbox(QemuOpts *opts, void *opaque)
>  {
> +    const char * strict_value = NULL;
>      /* FIXME: change this to true for 1.3 */
>      if (qemu_opt_get_bool(opts, "enable", false)) {
>  #ifdef CONFIG_SECCOMP
> @@ -1040,7 +1044,12 @@ static int parse_sandbox(QemuOpts *opts, void *opaque)
>              return -1;
>          }
>
> -        enable_blacklist = true;
> +        strict_value = qemu_opt_get(opts, "strict");
> +        if (strict_value) {
> +            if (!strcmp(strict_value, "on")) {
> +                enable_blacklist = true;
> +            }
> +        }
>  #else
>          qerror_report(ERROR_CLASS_GENERIC_ERROR,
>                        "sandboxing request but seccomp is not compiled into this build");

--
Regards,
Corey Bryant
[Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist
New command line options for the seccomp blacklist feature:

 $ qemu -sandbox on[,strict=]

The strict parameter will turn on or off the new system call blacklist

Signed-off-by: Eduardo Otubo
---
 qemu-options.hx | 8 +---
 vl.c            | 11 ++-
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index d15338e..05485e1 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2978,13 +2978,15 @@ Old param mode (ARM only).
 ETEXI

 DEF("sandbox", HAS_ARG, QEMU_OPTION_sandbox, \
-    "-sandbox Enable seccomp mode 2 system call filter (default 'off').\n",
+    "-sandbox Enable seccomp mode 2 system call filter (default 'off').\n"
+    "-sandbox on[,strict=]\n"
+    "Enable seccomp mode 2 system call second level filter (default 'off').\n",
     QEMU_ARCH_ALL)
 STEXI
-@item -sandbox @var{arg}
+@item -sandbox @var{arg}[,strict=@var{value}]
 @findex -sandbox
 Enable Seccomp mode 2 system call filter. 'on' will enable syscall filtering and 'off' will
-disable it. The default is 'off'.
+disable it. The default is 'off'. 'strict=on' will enable second level filter (default is 'off').
 ETEXI

 DEF("readconfig", HAS_ARG, QEMU_OPTION_readconfig,
diff --git a/vl.c b/vl.c
index 02f7486..909f685 100644
--- a/vl.c
+++ b/vl.c
@@ -329,6 +329,9 @@ static QemuOptsList qemu_sandbox_opts = {
         {
             .name = "enable",
             .type = QEMU_OPT_BOOL,
+        },{
+            .name = "strict",
+            .type = QEMU_OPT_STRING,
         },
         { /* end of list */ }
     },
@@ -1031,6 +1034,7 @@ static int bt_parse(const char *opt)

 static int parse_sandbox(QemuOpts *opts, void *opaque)
 {
+    const char * strict_value = NULL;
     /* FIXME: change this to true for 1.3 */
     if (qemu_opt_get_bool(opts, "enable", false)) {
 #ifdef CONFIG_SECCOMP
@@ -1040,7 +1044,12 @@ static int parse_sandbox(QemuOpts *opts, void *opaque)
             return -1;
         }

-        enable_blacklist = true;
+        strict_value = qemu_opt_get(opts, "strict");
+        if (strict_value) {
+            if (!strcmp(strict_value, "on")) {
+                enable_blacklist = true;
+            }
+        }
 #else
         qerror_report(ERROR_CLASS_GENERIC_ERROR,
                       "sandboxing request but seccomp is not compiled into this build");
--
1.8.3.1
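As posted, the patch declares `strict` as a string option and only compares it against "on", so any other value (a typo such as `strict=yes`, for instance) is silently treated the same as leaving it off. A stricter variant could reject unknown values. The sketch below is a standalone illustration of that idea, not proposed QEMU code: `parse_strict` is a hypothetical helper, and the choice to default to "off" when the option is absent simply mirrors the patch's documented default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): map the value of the
 * "strict" sub-option to a flag.
 * Returns 0 on success, -1 if the value is neither "on" nor "off".
 * A missing value (NULL) keeps the default, which is off. */
static int parse_strict(const char *value, bool *enable_blacklist)
{
    assert(enable_blacklist != NULL);

    *enable_blacklist = false;          /* default: blacklist disabled */
    if (value == NULL) {
        return 0;                       /* option not given */
    }
    if (strcmp(value, "on") == 0) {
        *enable_blacklist = true;
        return 0;
    }
    if (strcmp(value, "off") == 0) {
        return 0;
    }
    return -1;                          /* e.g. "strict=yes" */
}
```

A caller in the spirit of `parse_sandbox()` would pass the string it got for "strict" and report an error on -1 instead of continuing with the blacklist disabled.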