Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-18 Thread Paul Moore
On Wednesday, September 18, 2013 05:32:17 PM Daniel P. Berrange wrote:
> On Wed, Sep 18, 2013 at 12:19:44PM -0400, Paul Moore wrote:
> > On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote:
> > > On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote:
> > > > On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote:
> > > > > Libvirt does not want to be in the business of creating seccomp
> > > > > syscall filters for QEMU. As mentioned before, IMHO that places an
> > > > > unacceptable burden on libvirt to know about the syscalls each
> > > > > particular version of QEMU requires for its operation.
> > > > 
> > > > At a high level, I don't see how libvirt configuring and installing a
> > > > syscall filter is substantially different from libvirt configuring and
> > > > installing a network filter.
> > > 
> > > The rules created for a network filter have no bearing or relation to
> > > internal QEMU implementation details, as you have with syscalls, so
> > > this isn't really a relevant comparison.
> > 
> > The rules created for a network filter are directly related to the details
> > of the guest running inside of QEMU.  From a practical point of view I
> > see both network and syscall filtering as being dependent on the guest;
> > the network filtering configuration can change as the guest's services
> > change, the syscall filtering configuration can change as the QEMU
> > functionality can change.
>
> You're talking about two very different things here. Seccomp syscall
> filtering affects QEMU itself while network filter affects the guest
> OS apps inside QEMU.

From a security standpoint I'm not entirely convinced the distinction is
important.

> Network filtering still does not depend on the implementation details of the
> guest OS apps - it depends on the services that those apps are using.

Once again, I'm not entirely sure that worrying about the distinction between 
guest apps/services is important - it is just the "guest".

> Thus configuring network filters does not require the admin to have
> knowledge of the apps' internal impl details in the way that seccomp does.

Network filters require the admin to have knowledge of what apps/services the 
guest is providing.  Syscall filters require the admin to have knowledge of what
version of QEMU is deployed on the host.  I think it is reasonable to expect 
that the admin has more knowledge, and more control, over the QEMU version 
they are using than they do over what is being run in the hosted guests.

I don't argue that arriving at the correct syscall filter configuration is 
more difficult than a network filter, but I don't see why that means we can't
offer it as an option for the more savvy admins.  Also, the libvirt patches I'm
currently working on allow the syscall filter to be defined either as a 
whitelist or a blacklist; the blacklist approach should provide a much more 
gradual learning curve ... and in the case of containers, I suspect it might 
also be the better solution.
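
To make the whitelist/blacklist distinction concrete, here is a minimal
Python sketch of the two default behaviours. It is only a toy model: the
class name, the mode strings, and the syscall names are assumptions chosen
for illustration, and it does not reflect the format of the libvirt patches
or any real libseccomp API.

```python
from dataclasses import dataclass

ALLOW = "allow"
DENY = "deny"


@dataclass(frozen=True)
class SyscallFilter:
    """Toy model of a syscall filter; all names here are hypothetical."""
    mode: str             # "whitelist" or "blacklist"
    syscalls: frozenset   # syscall names listed in the filter

    def decide(self, syscall: str) -> str:
        listed = syscall in self.syscalls
        if self.mode == "whitelist":
            # Default deny: only the listed syscalls pass.
            return ALLOW if listed else DENY
        if self.mode == "blacklist":
            # Default allow: only the listed syscalls are blocked.
            return DENY if listed else ALLOW
        raise ValueError(f"unknown filter mode: {self.mode!r}")


# Example syscall names only; this is not a recommended policy.
whitelist = SyscallFilter("whitelist", frozenset({"read", "write", "exit_group"}))
blacklist = SyscallFilter("blacklist", frozenset({"mount", "reboot"}))
```

With a blacklist the admin only has to name the calls they want to block,
which is why the learning curve should be gentler; with a whitelist every
syscall the process needs has to be known up front.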

> > > > Also, and I recognize this is diverting away from a topic most of
> > > > qemu-devel is not interested in, what about libvirt-lxc?  What about
> > > > all of the other virtualization drivers supported by libvirt (granted,
> > > > not all would be candidates for syscall filtering, but you get the
> > > > idea).
> > > 
> > > It isn't clear to me that syscall filtering is something that's relevant
> > > for inclusion in libvirt-lxc. It seems like something that would be used
> > > by apps running inside LXC containers directly.
> > 
> > For all the same reasons that it makes sense to filter syscalls in QEMU, I
> > think it makes sense to filter syscalls in libvirt-lxc.  The fundamental
> > concern is that the kernel presents a large attack surface in the way of
> > syscalls, and it is extremely likely that any given container does not
> > have a legitimate need to call into all of the syscalls the kernel
> > presents to userspace; especially if you consider the recent approaches
> > of using containers to ship/deploy single applications.
> > 
> > Also, just in case there are some misconceptions floating around, loading
> > a syscall filter in libvirt doesn't mean the individual container
> > applications can't also load their own filter.  When multiple syscall
> > filters are present for a given process, all of the filters are evaluated
> > and the most restrictive decision for a given syscall request "wins".
> > 
> > > Libvirt has no knowledge of such apps or what rules they might require,
> > > so can't make any kind of intelligent decision about syscall filtering
> > > for LXC.
> >
> > A perfectly valid point, but I also think of syscall filtering as allowing
> > the host administrator the ability to reduce the attack surface of the
> > host system/kernel from potentially malicious containers/applications
> > without having to rely on these containers/applications to police
> > themselves.
> > 
> > > I really view seccom

Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-18 Thread Corey Bryant



On 09/18/2013 12:32 PM, Daniel P. Berrange wrote:

On Wed, Sep 18, 2013 at 12:19:44PM -0400, Paul Moore wrote:

On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote:

On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote:

On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote:

Libvirt does not want to be in the business of creating seccomp syscall
filters for QEMU. As mentioned before, IMHO that places an unacceptable
burden on libvirt to know about the syscalls each particular version
of QEMU requires for its operation.


At a high level, I don't see how libvirt configuring and installing a
syscall filter is substantially different from libvirt configuring and
installing a network filter.


The rules created for a network filter have no bearing or relation to
internal QEMU implementation details, as you have with syscalls, so
this isn't really a relevant comparison.


The rules created for a network filter are directly related to the details of
the guest running inside of QEMU.  From a practical point of view I see both
network and syscall filtering as being dependent on the guest; the network
filtering configuration can change as the guest's services change, the syscall
filtering configuration can change as the QEMU functionality can change.


You're talking about two very different things here. Seccomp syscall
filtering affects QEMU itself, while network filter affects the guest
OS apps inside QEMU. Network filtering still does not depend on the
implementation details of the guest OS apps - it depends on the services
that those apps are using. Thus configuring network filters does not
require the admin to have knowledge of the apps' internal impl details
in the way that seccomp does.


Also, and I recognize this is diverting away from a topic most of
qemu-devel is not interested in, what about libvirt-lxc?  What about all
of the other virtualization drivers supported by libvirt (granted, not
all would be candidates for syscall filtering, but you get the idea).


It isn't clear to me that syscall filtering is something that's relevant
for inclusion in libvirt-lxc. It seems like something that would be used
by apps running inside LXC containers directly.


For all the same reasons that it makes sense to filter syscalls in QEMU, I
think it makes sense to filter syscalls in libvirt-lxc.  The fundamental
concern is that the kernel presents a large attack surface in the way of
syscalls, and it is extremely likely that any given container does not have a
legitimate need to call into all of the syscalls the kernel presents to
userspace; especially if you consider the recent approaches of using
containers to ship/deploy single applications.

Also, just in case there are some misconceptions floating around, loading a
syscall filter in libvirt doesn't mean the individual container applications
can't also load their own filter.  When multiple syscall filters are present
for a given process, all of the filters are evaluated and the most restrictive
decision for a given syscall request "wins".


Libvirt has no knowledge of such apps or what rules they might require, so
can't make any kind of intelligent decision about syscall filtering for LXC.


A perfectly valid point, but I also think of syscall filtering as allowing the
host administrator the ability to reduce the attack surface of the host
system/kernel from potentially malicious containers/applications without
having to rely on these containers/applications to police themselves.


I really view seccomp as something that apps use directly themselves, not
something that a 3rd party process applies prior to launching the apps,
since the latter has far too much administrative burden IMHO.


The seccomp filter functionality is definitely something that apps can use
themselves, but to limit syscall filtering to just that use case is to miss
out on other valid uses as well.  As far as the burden is concerned, if
users/administrators find it too difficult, there is nothing requiring them to
use it; however, for those who are facing serious security risks in their
deployments, providing syscall filtering in libvirt might be a very welcome
addition.


I'm not debating the usefulness of seccomp technology, I just really don't
see it as something that is practical or sensible to encourage end users/
admins to make use of. It is hard enough for app developers themselves to
make use of it properly and they have a tonne of domain knowledge about
the internals of their application implementation. When you have uninformed
users/admins using it by trial and error I just see a support disaster
coming straight at us. That small minority who really are skilful enough
to use it can still do so by launching the app in question via a 'runseccomp'
like tool which would just install a filter & then exec the real binary.

Regards,
Daniel



An added benefit of allowing an admin to configure a seccomp filter is 
that they could potentially "patch

Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-18 Thread Daniel P. Berrange
On Wed, Sep 18, 2013 at 12:19:44PM -0400, Paul Moore wrote:
> On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote:
> > On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote:
> > > On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote:
> > > > Libvirt does not want to be in the business of creating seccomp syscall
> > > > filters for QEMU. As mentioned before, IMHO that places an unacceptable
> > > > burden on libvirt to know about the syscalls each particular version
> > > > of QEMU requires for its operation.
> > > 
> > > At a high level, I don't see how libvirt configuring and installing a
> > > syscall filter is substantially different from libvirt configuring and
> > > installing a network filter.
> > 
> > The rules created for a network filter have no bearing or relation to
> > internal QEMU implementation details, as you have with syscalls, so
> > this isn't really a relevant comparison.
> 
> The rules created for a network filter are directly related to the details of 
> the guest running inside of QEMU.  From a practical point of view I see both 
> network and syscall filtering as being dependent on the guest; the network 
> filtering configuration can change as the guest's services change, the
> syscall filtering configuration can change as the QEMU functionality can
> change.

You're talking about two very different things here. Seccomp syscall
filtering affects QEMU itself, while network filter affects the guest
OS apps inside QEMU. Network filtering still does not depend on the
implementation details of the guest OS apps - it depends on the services
that those apps are using. Thus configuring network filters does not
require the admin to have knowledge of the apps' internal impl details
in the way that seccomp does.

> > > Also, and I recognize this is diverting away from a topic most of
> > > qemu-devel is not interested in, what about libvirt-lxc?  What about all
> > > of the other virtualization drivers supported by libvirt (granted, not
> > > all would be candidates for syscall filtering, but you get the idea).
> > 
> > It isn't clear to me that syscall filtering is something that's relevant
> > for inclusion in libvirt-lxc. It seems like something that would be used
> > by apps running inside LXC containers directly.
> 
> For all the same reasons that it makes sense to filter syscalls in QEMU, I 
> think it makes sense to filter syscalls in libvirt-lxc.  The fundamental 
> concern is that the kernel presents a large attack surface in the way of
> syscalls, and it is extremely likely that any given container does not have a 
> legitimate need to call into all of the syscalls the kernel presents to 
> userspace; especially if you consider the recent approaches of using 
> containers to ship/deploy single applications.
>
> Also, just in case there are some misconceptions floating around, loading a 
> syscall filter in libvirt doesn't mean the individual container applications 
> can't also load their own filter.  When multiple syscall filters are present 
> for a given process, all of the filters are evaluated and the most
> restrictive decision for a given syscall request "wins".
> 
> > Libvirt has no knowledge of such apps or what rules they might require, so
> > can't make any kind of intelligent decision about syscall filtering for LXC.
> 
> A perfectly valid point, but I also think of syscall filtering as allowing
> the host administrator the ability to reduce the attack surface of the host
> system/kernel from potentially malicious containers/applications without
> having to rely on these containers/applications to police themselves.
> 
> > I really view seccomp as something that apps use directly themselves, not
> > something that a 3rd party process applies prior to launching the apps,
> > since the latter has far too much administrative burden IMHO.
> 
> The seccomp filter functionality is definitely something that apps can use 
> themselves, but to limit syscall filtering to just that use case is to miss 
> out on other valid uses as well.  As far as the burden is concerned, if
> users/administrators find it too difficult, there is nothing requiring them
> to use it; however, for those who are facing serious security risks in their
> deployments, providing syscall filtering in libvirt might be a very welcome
> addition.

I'm not debating the usefulness of seccomp technology, I just really don't
see it as something that is practical or sensible to encourage end users/
admins to make use of. It is hard enough for app developers themselves to
make use of it properly and they have a tonne of domain knowledge about
the internals of their application implementation. When you have uninformed
users/admins using it by trial and error I just see a support disaster
coming straight at us. That small minority who really are skilful enough
to use it can still do so by launching the app in question via a 'runseccomp'
like tool which would

Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-18 Thread Daniel P. Berrange
On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote:
> On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote:
> > Libvirt does not want to be in the business of creating seccomp syscall
> > filters for QEMU. As mentioned before, IMHO that places an unacceptable
> > burden on libvirt to know about the syscalls each particular version
> > of QEMU requires for its operation.
> 
> At a high level, I don't see how libvirt configuring and installing a syscall 
> filter is substantially different from libvirt configuring and installing a 
> network filter.

The rules created for a network filter have no bearing or relation to
internal QEMU implementation details, as you have with syscalls, so
this isn't really a relevant comparison.

> Also, and I recognize this is diverting away from a topic most of qemu-devel 
> is not interested in, what about libvirt-lxc?  What about all of the other 
> virtualization drivers supported by libvirt (granted, not all would be 
> candidates for syscall filtering, but you get the idea).

It isn't clear to me that syscall filtering is something that's relevant
for inclusion in libvirt-lxc. It seems like something that would be used
by apps running inside LXC containers directly. Libvirt has no knowledge
of such apps or what rules they might require, so can't make any kind of
intelligent decision about syscall filtering for LXC. I really view
seccomp as something that apps use directly themselves, not something
that a 3rd party process applies prior to launching the apps, since the
latter has far too much administrative burden IMHO.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-18 Thread Paul Moore
On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote:
> Libvirt does not want to be in the business of creating seccomp syscall
> filters for QEMU. As mentioned before, IMHO that places an unacceptable
> burden on libvirt to know about the syscalls each particular version
> of QEMU requires for its operation.

At a high level, I don't see how libvirt configuring and installing a syscall 
filter is substantially different from libvirt configuring and installing a 
network filter.

Also, and I recognize this is diverting away from a topic most of qemu-devel 
is not interested in, what about libvirt-lxc?  What about all of the other 
virtualization drivers supported by libvirt (granted, not all would be 
candidates for syscall filtering, but you get the idea).

-- 
paul moore
security and virtualization @ redhat




Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-18 Thread Paul Moore
On Wednesday, September 18, 2013 04:59:10 PM Daniel P. Berrange wrote:
> On Wed, Sep 18, 2013 at 11:53:09AM -0400, Paul Moore wrote:
> > On Wednesday, September 18, 2013 08:38:17 AM Daniel P. Berrange wrote:
> > > Libvirt does not want to be in the business of creating seccomp syscall
> > > filters for QEMU. As mentioned before, IMHO that places an unacceptable
> > > burden on libvirt to know about the syscalls each particular version
> > > of QEMU requires for its operation.
> > 
> > At a high level, I don't see how libvirt configuring and installing a
> > syscall filter is substantially different from libvirt configuring and
> > installing a network filter.
> 
> The rules created for a network filter have no bearing or relation to
> internal QEMU implementation details, as you have with syscalls, so
> this isn't really a relevant comparison.

The rules created for a network filter are directly related to the details of 
the guest running inside of QEMU.  From a practical point of view I see both 
network and syscall filtering as being dependent on the guest; the network 
filtering configuration can change as the guest's services change, the syscall 
filtering configuration can change as the QEMU functionality can change.

> > Also, and I recognize this is diverting away from a topic most of
> > qemu-devel is not interested in, what about libvirt-lxc?  What about all
> > of the other virtualization drivers supported by libvirt (granted, not
> > all would be candidates for syscall filtering, but you get the idea).
> 
> It isn't clear to me that syscall filtering is something that's relevant
> for inclusion in libvirt-lxc. It seems like something that would be used
> by apps running inside LXC containers directly.

For all the same reasons that it makes sense to filter syscalls in QEMU, I 
think it makes sense to filter syscalls in libvirt-lxc.  The fundamental 
concern is that the kernel presents a large attack surface in the way of
syscalls, and it is extremely likely that any given container does not have a 
legitimate need to call into all of the syscalls the kernel presents to 
userspace; especially if you consider the recent approaches of using 
containers to ship/deploy single applications.

Also, just in case there are some misconceptions floating around, loading a 
syscall filter in libvirt doesn't mean the individual container applications 
can't also load their own filter.  When multiple syscall filters are present 
for a given process, all of the filters are evaluated and the most restrictive 
decision for a given syscall request "wins".
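
A small Python model of the "most restrictive wins" rule may help here. The
precedence order below (kill, trap, errno, trace, allow) follows the order
the kernel documents for the SECCOMP_RET_* actions; everything else, including
the two example filters and the function names, is a hypothetical sketch and
not how libvirt or the kernel implements it.

```python
# Seccomp actions ordered from most to least restrictive. The strings are
# shorthand for the kernel's SECCOMP_RET_* constants.
PRECEDENCE = ["kill", "trap", "errno", "trace", "allow"]


def combine(decisions):
    """Return the most restrictive decision among all loaded filters."""
    if not decisions:
        return "allow"  # no filter loaded, so nothing restricts the call
    return min(decisions, key=PRECEDENCE.index)


def evaluate(filters, syscall):
    # Every loaded filter is evaluated; a later filter cannot relax
    # what an earlier one has restricted.
    return combine([f(syscall) for f in filters])


# Hypothetical filters: one loaded by the host-side tool before exec,
# one loaded later by the application itself.
def host_filter(name):
    return "errno" if name in {"mount", "reboot"} else "allow"


def app_filter(name):
    return "allow" if name in {"read", "write", "mount"} else "kill"
```

In this model `evaluate([host_filter, app_filter], "mount")` yields "errno":
the application filter allows the call, but the host filter's stricter
decision takes priority.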

> Libvirt has no knowledge of such apps or what rules they might require, so
> can't make any kind of intelligent decision about syscall filtering for LXC.

A perfectly valid point, but I also think of syscall filtering as allowing the 
host administrator the ability to reduce the attack surface of the host 
system/kernel from potentially malicious containers/applications without 
having to rely on these containers/applications to police themselves.

> I really view seccomp as something that apps use directly themselves, not
> something that a 3rd party process applies prior to launching the apps,
> since the latter has far too much administrative burden IMHO.

The seccomp filter functionality is definitely something that apps can use 
themselves, but to limit syscall filtering to just that use case is to miss 
out on other valid uses as well.  As far as the burden is concerned, if
users/administrators find it too difficult, there is nothing requiring them to
use it; however, for those who are facing serious security risks in their
deployments, providing syscall filtering in libvirt might be a very welcome
addition.

-- 
paul moore
security and virtualization @ redhat




Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-18 Thread Daniel P. Berrange
On Tue, Sep 17, 2013 at 03:17:28PM -0400, Corey Bryant wrote:
> 
> 
> On 09/17/2013 01:14 PM, Eduardo Otubo wrote:
> >
> >
> >On 09/17/2013 11:43 AM, Paul Moore wrote:
> >>On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:
> >>>On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:
> >>>
> Paul, what exactly are you planning to add to libvirt? I'm not a big
> fan of using qemu command line to pass syscalls for blacklist as
> arguments, but I can't see other way to avoid problems (like -net
> bridge / -net tap) from happening.
> >>
> >>At present, and as far as I'm concerned pretty much everything is open
> >>for discussion, the code works similar to the libvirt network filters.
> >>You create a separate XML configuration file which defines the filter and
> >>you reference that filter from the domain's XML configuration.  When a
> >>QEMU/KVM or LXC based domain starts it uses libseccomp to create the
> >>seccomp filter and then loads it into the kernel after the fork but before
> >>the domain is exec'd.
> >
> >Clever approach. I think a possible way to do this is something like:
> >
> >  -sandbox
> >-on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf]
> >
> >
> > where:
> >
> >[,whitelist=qemu_whitelist.conf] will override default whitelist filter
> >[,blacklist=blacklist.conf] will override default blacklist filter
> >
> >But when we add seccomp support for qemu on libvirt, we make sure to
> >just add -sandbox off and use Paul's approach.
> >
> >Is that a reasonable approach? What do you think?
> >
> 
> QEMU wouldn't require any changes for the approach Paul describes.
> The QEMU process that is exec'd by libvirt would be constrained by
> the filter that libvirt installed.

Libvirt does not want to be in the business of creating seccomp syscall
filters for QEMU. As mentioned before, IMHO that places an unacceptable
burden on libvirt to know about the syscalls each particular version
of QEMU requires for its operation.

Daniel



Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-18 Thread Daniel P. Berrange
On Tue, Sep 17, 2013 at 02:14:25PM -0300, Eduardo Otubo wrote:
> 
> 
> On 09/17/2013 11:43 AM, Paul Moore wrote:
> >On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:
> >>On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:
> >>
> >>>Paul, what exactly are you planning to add to libvirt? I'm not a big
> >>>fan of using qemu command line to pass syscalls for blacklist as
> >>>arguments, but I can't see other way to avoid problems (like -net
> >>>bridge / -net tap) from happening.
> >
> >At present, and as far as I'm concerned pretty much everything is open for
> >discussion, the code works similar to the libvirt network filters.  You
> >create a separate XML configuration file which defines the filter and you
> >reference that filter from the domain's XML configuration.  When a QEMU/KVM
> >or LXC based domain starts it uses libseccomp to create the seccomp filter
> >and then loads it into the kernel after the fork but before the domain is
> >exec'd.
> 
> Clever approach. I think a possible way to do this is something like:
> 
>  -sandbox 
> -on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf]
> 
>   where:
> 
> [,whitelist=qemu_whitelist.conf] will override default whitelist filter
> [,blacklist=blacklist.conf] will override default blacklist filter
> 
> But when we add seccomp support for qemu on libvirt, we make sure to
> just add -sandbox off and use Paul's approach.
> 
> Is that a reasonable approach? What do you think?

IMHO the same problem exists for non-libvirt apps using QEMU. Exposing
lists of syscalls as a config option requires applications using QEMU
to know far too much about QEMU's internal implementation details. With
this syntax either apps have to read the source to find out which syscalls
to allow, or they have to use trial & error launching QEMU repeatedly
to see what breaks. Neither of these is nice to applications. IMHO any
configuration of syscalls lists should be exclusively QEMU's responsibility.

What is your actual goal here? If the goal is to make it possible to
use arbitrary command line arguments, then IMHO, QEMU should just look
at the args given and automatically just "do the right thing" with the
syscall whitelists. Of course per my previous message, I think making
all possible args work under seccomp should be a non-goal.

> >There are no command line arguments passed to QEMU.  This work can co-exist
> >with the QEMU seccomp filters without problem.
> >
> >The original goal of this effort wasn't to add libvirt syscall filtering for
> >QEMU, but rather for LXC; adding QEMU support just happened to be a trivial
> >patch once the LXC support was added.
> >
> >(I also apologize for the delays, I hit a snag with an existing problem on
> >libvirt which stopped work and then some other BZs grabbed my attention...)
> >
> >>IMHO, if libvirt is enabling seccomp, then making all possible cli
> >>args work is a non-goal. If there are things which require privileges
> >>seccomp is blocking, then libvirt should avoid using them. eg by making
> >>use of FD passing where appropriate to reduce privileges qemu needs.
> >
> >I agree.


Daniel



Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-17 Thread Eduardo Otubo



On 09/17/2013 04:17 PM, Corey Bryant wrote:



On 09/17/2013 01:14 PM, Eduardo Otubo wrote:



On 09/17/2013 11:43 AM, Paul Moore wrote:

On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:

On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:


Paul, what exactly are you planning to add to libvirt? I'm not a big
fan of using qemu command line to pass syscalls for blacklist as
arguments, but I can't see other way to avoid problems (like -net
bridge / -net tap) from happening.


At present, and as far as I'm concerned pretty much everything is open
for discussion, the code works similar to the libvirt network filters.
You create a separate XML configuration file which defines the filter and
you reference that filter from the domain's XML configuration.  When a
QEMU/KVM or LXC based domain starts it uses libseccomp to create the
seccomp filter and then loads it into the kernel after the fork but before
the domain is exec'd.


Clever approach. I think a possible way to do this is something like:

  -sandbox
-on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf]



 where:

[,whitelist=qemu_whitelist.conf] will override default whitelist filter
[,blacklist=blacklist.conf] will override default blacklist filter

But when we add seccomp support for qemu on libvirt, we make sure to
just add -sandbox off and use Paul's approach.

Is that a reasonable approach? What do you think?



QEMU wouldn't require any changes for the approach Paul describes. The
QEMU process that is exec'd by libvirt would be constrained by the
filter that libvirt installed.



Yes, that is correct. But I'm thinking about the case when Qemu is run 
stand-alone, without libvirt. There must be a way to configure it 
without using a pre configured filter from libvirt.


--
Eduardo Otubo
IBM Linux Technology Center




Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-17 Thread Corey Bryant



On 09/17/2013 01:14 PM, Eduardo Otubo wrote:



On 09/17/2013 11:43 AM, Paul Moore wrote:

On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:

On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:


Paul, what exactly are you planning to add to libvirt? I'm not a big
fan of using qemu command line to pass syscalls for blacklist as
arguments, but I can't see other way to avoid problems (like -net
bridge / -net tap) from happening.


At present, and as far as I'm concerned pretty much everything is open
for discussion, the code works similar to the libvirt network filters.
You create a separate XML configuration file which defines the filter and
you reference that filter from the domain's XML configuration.  When a
QEMU/KVM or LXC based domain starts it uses libseccomp to create the
seccomp filter and then loads it into the kernel after the fork but before
the domain is exec'd.


Clever approach. I think a possible way to do this is something like:

  -sandbox
-on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf]


 where:

[,whitelist=qemu_whitelist.conf] will override default whitelist filter
[,blacklist=blacklist.conf] will override default blacklist filter

But when we add seccomp support for qemu on libvirt, we make sure to
just add -sandbox off and use Paul's approach.

Is that a reasonable approach? What do you think?



QEMU wouldn't require any changes for the approach Paul describes. The 
QEMU process that is exec'd by libvirt would be constrained by the 
filter that libvirt installed.
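
The ordering described above (fork, install the filter in the child, then
exec) can be sketched in Python as follows. This is only an outline of the
control flow: `launch_with_filter` and the `install_filter` callback are
hypothetical names, and the callback is a placeholder for wherever a real
launcher would build and load a seccomp filter (for example via libseccomp).

```python
import os


def launch_with_filter(argv, install_filter):
    """Fork, install a filter in the child, then exec the real binary.

    install_filter is a caller-supplied placeholder; this sketch does not
    load an actual seccomp filter itself.
    """
    pid = os.fork()
    if pid == 0:
        # Child: the filter has to be in place before exec so that the
        # launched program never runs unfiltered.
        try:
            install_filter()
            os.execvp(argv[0], argv)
        finally:
            # Reached only if install_filter or the exec itself failed.
            os._exit(127)
    # Parent: wait for the child and translate its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

Because the filter is loaded before the exec, it is inherited by the new
program image, which is why the launched binary needs no changes. One
practical consequence is that such a filter has to permit the exec call
itself.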


--
Regards,
Corey Bryant



There are no command line arguments passed to QEMU.  This work can
co-exist with the QEMU seccomp filters without problem.

The original goal of this effort wasn't to add libvirt syscall
filtering for QEMU, but rather for LXC; adding QEMU support just
happened to be a trivial patch once the LXC support was added.

(I also apologize for the delays, I hit a snag with an existing
problem on libvirt which stopped work and then some other BZs grabbed
my attention...)


IMHO, if libvirt is enabling seccomp, then making all possible cli
args work is a non-goal. If there are things which require privileges
seccomp is blocking, then libvirt should avoid using them. eg by making
use of FD passing where appropriate to reduce privileges qemu needs.


I agree.








Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-17 Thread Eduardo Otubo



On 09/17/2013 02:14 PM, Eduardo Otubo wrote:



On 09/17/2013 11:43 AM, Paul Moore wrote:

On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:

On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:


Paul, what exactly are you planning to add to libvirt? I'm not a big
fan of using qemu command line to pass syscalls for blacklist as
arguments, but I can't see other way to avoid problems (like -net
bridge / -net tap) from happening.


At present, and as far as I'm concerned pretty much everything is open for
discussion, the code works similar to the libvirt network filters.  You create
a separate XML configuration file which defines the filter and you reference
that filter from the domain's XML configuration.  When a QEMU/KVM or LXC based
domain starts it uses libseccomp to create the seccomp filter and then loads
it into the kernel after the fork but before the domain is exec'd.


Clever approach. I think a possible way to do this is something like:

  -sandbox on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf]

where:

[,whitelist=qemu_whitelist.conf] will override the default whitelist filter
[,blacklist=qemu_blacklist.conf] will override the default blacklist filter

But when we add seccomp support for qemu on libvirt, we make sure to
just add -sandbox off and use Paul's approach.

Is that a reasonable approach? What do you think?


This approach is also interesting from a testing point of view. I'll be
able to write more complex tests on virt-test, for example general tests
like "remove one syscall at a time from the whitelist and test", or tests
that add new syscalls to the blacklist, without having to sed the code
and recompile every time.
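
As a rough illustration of the "remove one syscall at a time" test idea, the sketch below builds the contents of a whitelist file with one entry left out. The function name `whitelist_without` and the one-name-per-line file layout are assumptions made for this example only; the thread does not define a file format.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write every name except names[skip] into `out`, one per line.
 * Returns the number of names written, or -1 if `out` is too small.
 * The one-name-per-line layout is an assumption of this sketch.
 */
static int whitelist_without(const char *const *names, size_t count,
                             size_t skip, char *out, size_t out_len)
{
    size_t used = 0;
    int written = 0;

    if (out_len == 0) {
        return -1;
    }
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        int n;

        if (i == skip) {
            continue;           /* this is the syscall under test */
        }
        n = snprintf(out + used, out_len - used, "%s\n", names[i]);
        if (n < 0 || (size_t)n >= out_len - used) {
            return -1;          /* output would have been truncated */
        }
        used += (size_t)n;
        written++;
    }
    return written;
}
```

A test driver could call this once per index, write the buffer to a temporary file, and start QEMU with that file as its whitelist.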






There are no command line arguments passed to QEMU.  This work can co-exist
with the QEMU seccomp filters without problem.

The original goal of this effort wasn't to add libvirt syscall filtering for
QEMU, but rather for LXC; adding QEMU support just happened to be a trivial
patch once the LXC support was added.

(I also apologize for the delays, I hit a snag with an existing problem on
libvirt which stopped work and then some other BZs grabbed my attention...)


IMHO, if libvirt is enabling seccomp, then making all possible cli
args work is a non-goal. If there are things which require privileges
seccomp is blocking, then libvirt should avoid using them. eg by making
use of FD passing where appropriate to reduce privileges qemu needs.


I agree.





--
Eduardo Otubo
IBM Linux Technology Center




Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-17 Thread Eduardo Otubo



On 09/17/2013 11:43 AM, Paul Moore wrote:

On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:

On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:


Paul, what exactly are you planning to add to libvirt? I'm not a big
fan of using qemu command line to pass syscalls for blacklist as
arguments, but I can't see other way to avoid problems (like -net
bridge / -net tap) from happening.


At present, and as far as I'm concerned pretty much everything is open for
discussion, the code works similar to the libvirt network filters.  You create
a separate XML configuration file which defines the filter and you reference
that filter from the domain's XML configuration.  When a QEMU/KVM or LXC based
domain starts it uses libseccomp to create the seccomp filter and then loads
it into the kernel after the fork but before the domain is exec'd.


Clever approach. I think a possible way to do this is something like:

  -sandbox on[,strict=][,whitelist=qemu_whitelist.conf][,blacklist=qemu_blacklist.conf]

where:

[,whitelist=qemu_whitelist.conf] will override the default whitelist filter
[,blacklist=qemu_blacklist.conf] will override the default blacklist filter

But when we add seccomp support for qemu on libvirt, we make sure to 
just add -sandbox off and use Paul's approach.


Is that a reasonable approach? What do you think?
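
To make the proposed syntax concrete, here is a minimal sketch of how such an argument string could be split into its parts. It is not QEMU's option parser (QEMU uses QemuOpts); `struct sandbox_cfg`, `parse_on_off`, `copy_value` and `parse_sandbox_arg` are hypothetical names, and the sketch assumes `strict` takes `on` or `off`, as the "strict=on|off" wording elsewhere in this thread suggests.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical holder for the proposed options; not a QEMU structure. */
struct sandbox_cfg {
    bool enable;
    bool strict;
    char whitelist[256];    /* empty: keep the built-in whitelist */
    char blacklist[256];    /* empty: keep the built-in blacklist */
};

/* Accept only the literal strings "on" and "off". */
static int parse_on_off(const char *s, bool *out)
{
    if (strcmp(s, "on") == 0) {
        *out = true;
        return 0;
    }
    if (strcmp(s, "off") == 0) {
        *out = false;
        return 0;
    }
    return -1;
}

/* Copy a non-empty value into a fixed-size field. */
static int copy_value(char *dst, size_t dst_len, const char *src)
{
    if (src[0] == '\0' || strlen(src) >= dst_len) {
        return -1;
    }
    strcpy(dst, src);
    return 0;
}

/*
 * Parse "on|off[,strict=on|off][,whitelist=FILE][,blacklist=FILE]".
 * Returns 0 on success, -1 on an unknown key or a malformed value.
 */
static int parse_sandbox_arg(const char *arg, struct sandbox_cfg *cfg)
{
    char buf[1024];
    char *p;
    bool first = true;

    memset(cfg, 0, sizeof(*cfg));
    if (strlen(arg) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, arg);

    for (p = buf; p != NULL; ) {
        char *next = strchr(p, ',');
        int rc;

        if (next != NULL) {
            *next++ = '\0';     /* terminate this field */
        }
        if (first) {
            /* The first field is the bare on/off switch. */
            rc = parse_on_off(p, &cfg->enable);
            first = false;
        } else {
            char *eq = strchr(p, '=');

            if (eq == NULL) {
                return -1;
            }
            *eq++ = '\0';
            if (strcmp(p, "strict") == 0) {
                rc = parse_on_off(eq, &cfg->strict);
            } else if (strcmp(p, "whitelist") == 0) {
                rc = copy_value(cfg->whitelist, sizeof(cfg->whitelist), eq);
            } else if (strcmp(p, "blacklist") == 0) {
                rc = copy_value(cfg->blacklist, sizeof(cfg->blacklist), eq);
            } else {
                rc = -1;        /* unknown key */
            }
        }
        if (rc != 0) {
            return -1;
        }
        p = next;
    }
    return 0;
}
```

Rejecting unknown keys and values, rather than ignoring them, means a mistyped option cannot silently leave the sandbox weaker than intended.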



There are no command line arguments passed to QEMU.  This work can co-exist
with the QEMU seccomp filters without problem.

The original goal of this effort wasn't to add libvirt syscall filtering for
QEMU, but rather for LXC; adding QEMU support just happened to be a trivial
patch once the LXC support was added.

(I also apologize for the delays, I hit a snag with an existing problem on
libvirt which stopped work and then some other BZs grabbed my attention...)


IMHO, if libvirt is enabling seccomp, then making all possible cli
args work is a non-goal. If there are things which require privileges
seccomp is blocking, then libvirt should avoid using them. eg by making
use of FD passing where appropriate to reduce privileges qemu needs.


I agree.



--
Eduardo Otubo
IBM Linux Technology Center




Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-17 Thread Paul Moore
On Tuesday, September 17, 2013 02:06:06 PM Daniel P. Berrange wrote:
> On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:
>
> > Paul, what exactly are you planning to add to libvirt? I'm not a big
> > fan of using qemu command line to pass syscalls for blacklist as
> > arguments, but I can't see other way to avoid problems (like -net
> > bridge / -net tap) from happening.

At present, and as far as I'm concerned pretty much everything is open for 
discussion, the code works similar to the libvirt network filters.  You create 
a separate XML configuration file which defines the filter and you reference 
that filter from the domain's XML configuration.  When a QEMU/KVM or LXC based 
domain starts it uses libseccomp to create the seccomp filter and then loads 
it into the kernel after the fork but before the domain is exec'd.

There are no command line arguments passed to QEMU.  This work can co-exist 
with the QEMU seccomp filters without problem.
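
The ordering described above (fork, load the filter, then exec) can be sketched roughly as follows. This is not libvirt code: `spawn_filtered` and the `filter_hook` type are hypothetical, and the filter step is a stand-in callback so the example builds without libseccomp. In a real implementation the callback is where the libseccomp-built filter would be loaded into the kernel.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical hook: the place where a filter built with libseccomp
 * would be loaded into the kernel. Returns 0 on success.
 */
typedef int (*filter_hook)(void);

/*
 * Fork, install the filter in the child, then exec the target.
 * Returns the child's exit status, or -1 if fork/wait failed or the
 * child did not exit normally.
 */
static int spawn_filtered(const char *path, char *const argv[],
                          filter_hook install_filter)
{
    pid_t pid = fork();
    int status;

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: the filter goes in after fork() and before exec(). */
        if (install_filter() != 0) {
            _exit(126);     /* refuse to run the target unfiltered */
        }
        execv(path, argv);
        _exit(127);         /* only reached if execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Stand-in hooks so the sketch builds without libseccomp. */
static int filter_ok(void)   { return 0; }
static int filter_fail(void) { return -1; }
```

Because a seccomp filter stays in place across exec, the exec'd program needs no extra command line arguments; the filter does have to allow the exec call itself, or the exec will fail.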

The original goal of this effort wasn't to add libvirt syscall filtering for 
QEMU, but rather for LXC; adding QEMU support just happened to be a trivial 
patch once the LXC support was added.

(I also apologize for the delays, I hit a snag with an existing problem on 
libvirt which stopped work and then some other BZs grabbed my attention...)

> IMHO, if libvirt is enabling seccomp, then making all possible cli
> args work is a non-goal. If there are things which require privileges
> seccomp is blocking, then libvirt should avoid using them. eg by making
> use of FD passing where appropriate to reduce privileges qemu needs.

I agree.

-- 
paul moore
security and virtualization @ redhat




Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-17 Thread Daniel P. Berrange
On Tue, Sep 17, 2013 at 10:01:23AM -0300, Eduardo Otubo wrote:
> 
> 
> On 09/11/2013 01:49 PM, Daniel P. Berrange wrote:
> >On Wed, Sep 11, 2013 at 12:45:54PM -0400, Corey Bryant wrote:
> >>
> >>
> >>On 09/06/2013 03:21 PM, Eduardo Otubo wrote:
> >>>New command line options for the seccomp blacklist feature:
> >>>
> >>>  $ qemu -sandbox on[,strict=]
> >>>
> >>>The strict parameter will turn on or off the new system call blacklist
> >>
> >>I mentioned this before but I'll say it again since I think it needs
> >>to be discussed.  Since this regresses support (it'll prevent -net
> >>bridge and -net tap from using execv) the concern I have with the
> >>strict=on|off option is whether or not we will have the flexibility
> >>to modify the blacklist once QEMU is released with this support.  Of
> >>course we should be able to add more syscalls to the blacklist as
> >>long as they don't regress QEMU functionality.  But if we want to
> >>add a syscall that does regress QEMU functionality, I think we'd
> >>have to add a new command line option, which doesn't seem desirable.
> >>
> >>So a more flexible approach may be necessary.  Maybe the blacklist
> >>should be passed on the command line, which would enable it to be
> >>defined by libvirt and passed to QEMU.  I know Paul is working on
> >>something for libvirt so maybe that answers this question.
> 
> Paul, what exactly are you planning to add to libvirt? I'm not a big
> fan of using qemu command line to pass syscalls for blacklist as
> arguments, but I can't see other way to avoid problems (like -net
> bridge / -net tap) from happening.

IMHO, if libvirt is enabling seccomp, then making all possible cli
args work is a non-goal. If there are things which require privileges
seccomp is blocking, then libvirt should avoid using them. eg by making
use of FD passing where appropriate to reduce privileges qemu needs.
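
For readers unfamiliar with FD passing: one common mechanism is to send an already-open descriptor over a Unix domain socket with SCM_RIGHTS (letting the child inherit the descriptor across exec is another). The sketch below shows the SCM_RIGHTS variant in isolation; `send_fd` and `recv_fd` are names chosen for this example, not libvirt or QEMU functions.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send one open descriptor over a Unix domain socket. Returns 0 or -1. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';        /* one data byte carries the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The privileged side opens the resource and hands over only the descriptor, so the receiving process never needs the privilege (or the syscalls) required to open it.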

Daniel
-- 
|: http://berrange.com       -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org        -o- http://virt-manager.org :|
|: http://autobuild.org      -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-17 Thread Eduardo Otubo



On 09/11/2013 01:49 PM, Daniel P. Berrange wrote:

On Wed, Sep 11, 2013 at 12:45:54PM -0400, Corey Bryant wrote:



On 09/06/2013 03:21 PM, Eduardo Otubo wrote:

New command line options for the seccomp blacklist feature:

  $ qemu -sandbox on[,strict=]

The strict parameter will turn on or off the new system call blacklist


I mentioned this before but I'll say it again since I think it needs
to be discussed.  Since this regresses support (it'll prevent -net
bridge and -net tap from using execv) the concern I have with the
strict=on|off option is whether or not we will have the flexibility
to modify the blacklist once QEMU is released with this support.  Of
course we should be able to add more syscalls to the blacklist as
long as they don't regress QEMU functionality.  But if we want to
add a syscall that does regress QEMU functionality, I think we'd
have to add a new command line option, which doesn't seem desirable.

So a more flexible approach may be necessary.  Maybe the blacklist
should be passed on the command line, which would enable it to be
defined by libvirt and passed to QEMU.  I know Paul is working on
something for libvirt so maybe that answers this question.


Paul, what exactly are you planning to add to libvirt? I'm not a big fan 
of using qemu command line to pass syscalls for blacklist as arguments, 
but I can't see other way to avoid problems (like -net bridge / -net 
tap) from happening.




On the face of it, I'm not at all a fan of the idea of libvirt having
to pass a syscall whitelist/blacklist to QEMU. IMHO that would be
exposing too much knowledge of QEMU impl details to libvirt.

Daniel



--
Eduardo Otubo
IBM Linux Technology Center




Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-11 Thread Daniel P. Berrange
On Wed, Sep 11, 2013 at 12:45:54PM -0400, Corey Bryant wrote:
> 
> 
> On 09/06/2013 03:21 PM, Eduardo Otubo wrote:
> >New command line options for the seccomp blacklist feature:
> >
> >  $ qemu -sandbox on[,strict=]
> >
> >The strict parameter will turn on or off the new system call blacklist
> 
> I mentioned this before but I'll say it again since I think it needs
> to be discussed.  Since this regresses support (it'll prevent -net
> bridge and -net tap from using execv) the concern I have with the
> strict=on|off option is whether or not we will have the flexibility
> to modify the blacklist once QEMU is released with this support.  Of
> course we should be able to add more syscalls to the blacklist as
> long as they don't regress QEMU functionality.  But if we want to
> add a syscall that does regress QEMU functionality, I think we'd
> have to add a new command line option, which doesn't seem desirable.
> 
> So a more flexible approach may be necessary.  Maybe the blacklist
> should be passed on the command line, which would enable it to be
> defined by libvirt and passed to QEMU.  I know Paul is working on
> something for libvirt so maybe that answers this question.

On the face of it, I'm not at all a fan of the idea of libvirt having
to pass a syscall whitelist/blacklist to QEMU. IMHO that would be
exposing too much knowledge of QEMU impl details to libvirt.

Daniel
-- 
|: http://berrange.com       -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org        -o- http://virt-manager.org :|
|: http://autobuild.org      -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-11 Thread Corey Bryant



On 09/06/2013 03:21 PM, Eduardo Otubo wrote:

New command line options for the seccomp blacklist feature:

  $ qemu -sandbox on[,strict=]

The strict parameter will turn on or off the new system call blacklist


I mentioned this before but I'll say it again since I think it needs to 
be discussed.  Since this regresses support (it'll prevent -net bridge 
and -net tap from using execv) the concern I have with the strict=on|off 
option is whether or not we will have the flexibility to modify the 
blacklist once QEMU is released with this support.  Of course we should 
be able to add more syscalls to the blacklist as long as they don't 
regress QEMU functionality.  But if we want to add a syscall that does 
regress QEMU functionality, I think we'd have to add a new command line 
option, which doesn't seem desirable.


So a more flexible approach may be necessary.  Maybe the blacklist 
should be passed on the command line, which would enable it to be 
defined by libvirt and passed to QEMU.  I know Paul is working on 
something for libvirt so maybe that answers this question.




Signed-off-by: Eduardo Otubo 
---
  qemu-options.hx |  8 +++++---
  vl.c            | 11 ++++++++++-
  2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index d15338e..05485e1 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2978,13 +2978,15 @@ Old param mode (ARM only).
  ETEXI

  DEF("sandbox", HAS_ARG, QEMU_OPTION_sandbox, \
-"-sandbox   Enable seccomp mode 2 system call filter (default 'off').\n",
+"-sandbox   Enable seccomp mode 2 system call filter (default 'off').\n"
+"-sandbox on[,strict=]\n"
+"Enable seccomp mode 2 system call second level filter (default 'off').\n",


Does this need to mention the QEMU features restricted by the blacklist?


  QEMU_ARCH_ALL)
  STEXI
-@item -sandbox @var{arg}
+@item -sandbox @var{arg}[,strict=@var{value}]
  @findex -sandbox
  Enable Seccomp mode 2 system call filter. 'on' will enable syscall filtering and 'off' will
-disable it.  The default is 'off'.
+disable it.  The default is 'off'. 'strict=on' will enable second level filter (default is 'off').


And here too?


  ETEXI

  DEF("readconfig", HAS_ARG, QEMU_OPTION_readconfig,
diff --git a/vl.c b/vl.c
index 02f7486..909f685 100644
--- a/vl.c
+++ b/vl.c
@@ -329,6 +329,9 @@ static QemuOptsList qemu_sandbox_opts = {
  {
  .name = "enable",
  .type = QEMU_OPT_BOOL,
+},{
+.name = "strict",
+.type = QEMU_OPT_STRING,
  },
  { /* end of list */ }
  },
@@ -1031,6 +1034,7 @@ static int bt_parse(const char *opt)

  static int parse_sandbox(QemuOpts *opts, void *opaque)
  {
+const char * strict_value = NULL;
  /* FIXME: change this to true for 1.3 */
  if (qemu_opt_get_bool(opts, "enable", false)) {
  #ifdef CONFIG_SECCOMP
@@ -1040,7 +1044,12 @@ static int parse_sandbox(QemuOpts *opts, void *opaque)
  return -1;
  }

-enable_blacklist = true;
+strict_value = qemu_opt_get(opts, "strict");
+if (strict_value) {
+if (!strcmp(strict_value, "on")) {
+enable_blacklist = true;
+}
+}
  #else
  qerror_report(ERROR_CLASS_GENERIC_ERROR,
"sandboxing request but seccomp is not compiled into this build");



--
Regards,
Corey Bryant




[Qemu-devel] [PATCHv2 2/3] seccomp: adding command line support for blacklist

2013-09-06 Thread Eduardo Otubo
New command line options for the seccomp blacklist feature:

 $ qemu -sandbox on[,strict=]

The strict parameter will turn on or off the new system call blacklist

Signed-off-by: Eduardo Otubo 
---
 qemu-options.hx |  8 +++++---
 vl.c            | 11 ++++++++++-
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index d15338e..05485e1 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2978,13 +2978,15 @@ Old param mode (ARM only).
 ETEXI
 
 DEF("sandbox", HAS_ARG, QEMU_OPTION_sandbox, \
-    "-sandbox   Enable seccomp mode 2 system call filter (default 'off').\n",
+    "-sandbox   Enable seccomp mode 2 system call filter (default 'off').\n"
+    "-sandbox on[,strict=]\n"
+    "Enable seccomp mode 2 system call second level filter (default 'off').\n",
     QEMU_ARCH_ALL)
 STEXI
-@item -sandbox @var{arg}
+@item -sandbox @var{arg}[,strict=@var{value}]
 @findex -sandbox
 Enable Seccomp mode 2 system call filter. 'on' will enable syscall filtering and 'off' will
-disable it.  The default is 'off'.
+disable it.  The default is 'off'. 'strict=on' will enable second level filter (default is 'off').
 ETEXI
 
 DEF("readconfig", HAS_ARG, QEMU_OPTION_readconfig,
diff --git a/vl.c b/vl.c
index 02f7486..909f685 100644
--- a/vl.c
+++ b/vl.c
@@ -329,6 +329,9 @@ static QemuOptsList qemu_sandbox_opts = {
         {
             .name = "enable",
             .type = QEMU_OPT_BOOL,
+        },{
+            .name = "strict",
+            .type = QEMU_OPT_STRING,
         },
         { /* end of list */ }
     },
@@ -1031,6 +1034,7 @@ static int bt_parse(const char *opt)
 
 static int parse_sandbox(QemuOpts *opts, void *opaque)
 {
+    const char * strict_value = NULL;
     /* FIXME: change this to true for 1.3 */
     if (qemu_opt_get_bool(opts, "enable", false)) {
 #ifdef CONFIG_SECCOMP
@@ -1040,7 +1044,12 @@ static int parse_sandbox(QemuOpts *opts, void *opaque)
             return -1;
         }
 
-        enable_blacklist = true;
+        strict_value = qemu_opt_get(opts, "strict");
+        if (strict_value) {
+            if (!strcmp(strict_value, "on")) {
+                enable_blacklist = true;
+            }
+        }
 #else
         qerror_report(ERROR_CLASS_GENERIC_ERROR,
                       "sandboxing request but seccomp is not compiled into this build");
-- 
1.8.3.1
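
For reference, the check that the vl.c hunk adds to parse_sandbox() can be restated as a standalone function. This is only an illustration of the patch's logic outside QEMU: `strict_enables_blacklist` is a hypothetical name, its argument stands in for the result of `qemu_opt_get(opts, "strict")`, and it assumes `enable_blacklist` starts out false, which the diff context does not show.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Mirrors the check in the patch: the blacklist is enabled only when
 * the option is present and its value is exactly "on".
 */
static bool strict_enables_blacklist(const char *strict_value)
{
    return strict_value != NULL && strcmp(strict_value, "on") == 0;
}
```

Note that with this logic any value other than the exact string "on" (for example "ON" or "yes") behaves like "off" rather than producing an error.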