Re: [lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

2014-06-19 Thread Fajar A. Nugraha
On Thu, Jun 19, 2014 at 9:01 PM, Michael H. Warfield m...@wittsend.com
wrote:

 All concerned participants:

 Was there any further update on this problem?  I'd like to know if we
 (I) should be updating the templates for either this aa_profile thing or
 for the mount sets.



IIRC Christian was going to try something?

So far, none of the suggested values of lxc.mount.auto that I tested
(including cgroup-full:mixed) is enough to get the f20 container running
under the default apparmor profile. I have to either:
- use the unconfined profile. Works, but vulnerable to most known lxc exploits.
- use lxc.hook.mount and lxc.hook.post-stop scripts that create and
  bind-mount a new, empty systemd cgroup hierarchy to the container's
  /sys/fs/cgroup/systemd.
  Kinda messy, but this way it's still protected by the apparmor profile.

The second approach would be ideal if it could be turned into a setting
such as lxc.mount.auto=cgroup:systemd-new, but that's way beyond what I'm
capable of.

For the next lxc release, speaking as a user, I suggest just uncommenting
the aa_profile line.

-- 
Fajar


 Regards,
 Mike

 On Fri, 2014-05-30 at 01:00 +0200, Christian Seiler wrote:
  Hi,
 
   # lxc-attach -n f20 -- mount | grep cgroup
   cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,size=12k,mode=755)
   none on /sys/fs/cgroup/cgmanager type tmpfs (rw,relatime,size=4k,mode=755)
   tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
 
  :-( This appears to be a rather nasty bug...
 
   lxc does read the file /etc/lxc/lxc.conf that I created, verified by
   the fact that lxc.cgroup.pattern works correctly. It does not,
   however, create the directory /sys/fs/cgroup/systemd/lxc-all/f20
   (which, if I understand correctly, it should, since I use
   lxc.cgroup.use = @all)
  
   # ls -d /sys/fs/cgroup/*/lxc-all/f20
   /sys/fs/cgroup/blkio/lxc-all/f20    /sys/fs/cgroup/cpuset/lxc-all/f20   /sys/fs/cgroup/hugetlb/lxc-all/f20
   /sys/fs/cgroup/cpuacct/lxc-all/f20  /sys/fs/cgroup/devices/lxc-all/f20  /sys/fs/cgroup/memory/lxc-all/f20
   /sys/fs/cgroup/cpu/lxc-all/f20      /sys/fs/cgroup/freezer/lxc-all/f20  /sys/fs/cgroup/perf_event/lxc-all/f20
  
   # mount | grep cgroup
   none on /sys/fs/cgroup type tmpfs (rw,relatime,size=4k,mode=755)
   cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset,release_agent=/run/cgmanager/agents/cgm-release-agent.cpuset,clone_children)
   cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu,release_agent=/run/cgmanager/agents/cgm-release-agent.cpu)
   cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct,release_agent=/run/cgmanager/agents/cgm-release-agent.cpuacct)
   cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory,release_agent=/run/cgmanager/agents/cgm-release-agent.memory)
   cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices,release_agent=/run/cgmanager/agents/cgm-release-agent.devices)
   cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer,release_agent=/run/cgmanager/agents/cgm-release-agent.freezer)
   cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio,release_agent=/run/cgmanager/agents/cgm-release-agent.blkio)
   cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event,release_agent=/run/cgmanager/agents/cgm-release-agent.perf_event)
   cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb)
   systemd on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/run/cgmanager/agents/cgm-release-agent.systemd,name=systemd)
 
  Hmm, are you running cgmanager at the same time as systemd? I think this
  might be a problem with the intersection of cgmanager with the cgroup
  mounting code, i.e. the cgroup mounting code uses the cgfs stuff (which
  was originally just cgroup before Serge implemented multiple drivers)
  while the "put the container into cgroup" code uses cgmanager, which may
  have some weird side effect in this case. I have to confess that so far
  I haven't tried cgmanager myself (it's on my todo list), so I never
  tested the interaction between Serge's cgmanager code and my cgroup
  mounting code...
 
  If you are running cgmanager, could you try the same with cgmanager
  stopped? Then LXC should fall back to the cgfs code, which
  *should* work in this case, unless something else broke this logic.
 
  Anyway, I'll have a chance to look at this more closely on Saturday (I'm
  busy with other things tomorrow).
 
  Regards,
  Christian


 --
 Michael H. Warfield (AI4NB) | (770) 978-7061 |  m...@wittsend.com
    /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
    NIC whois: MHW9          | An optimist believes we live in the best of all
  PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!


___
lxc-users mailing list

Re: [lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

2014-05-30 Thread Christian Seiler
Hi,

 # lxc-attach -n f20 -- mount | grep cgroup
 cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,size=12k,mode=755)
 none on /sys/fs/cgroup/cgmanager type tmpfs (rw,relatime,size=4k,mode=755)
 tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)

:-( This appears to be a rather nasty bug...

 lxc does read the file /etc/lxc/lxc.conf that I created, verified by
 the fact that lxc.cgroup.pattern works correctly. It does not,
 however, create the directory /sys/fs/cgroup/systemd/lxc-all/f20
 (which, if I understand correctly, it should, since I use
 lxc.cgroup.use = @all)
 
 # ls -d /sys/fs/cgroup/*/lxc-all/f20
 /sys/fs/cgroup/blkio/lxc-all/f20    /sys/fs/cgroup/cpuset/lxc-all/f20   /sys/fs/cgroup/hugetlb/lxc-all/f20
 /sys/fs/cgroup/cpuacct/lxc-all/f20  /sys/fs/cgroup/devices/lxc-all/f20  /sys/fs/cgroup/memory/lxc-all/f20
 /sys/fs/cgroup/cpu/lxc-all/f20      /sys/fs/cgroup/freezer/lxc-all/f20  /sys/fs/cgroup/perf_event/lxc-all/f20
 
 # mount | grep cgroup
 none on /sys/fs/cgroup type tmpfs (rw,relatime,size=4k,mode=755)
 cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset,release_agent=/run/cgmanager/agents/cgm-release-agent.cpuset,clone_children)
 cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu,release_agent=/run/cgmanager/agents/cgm-release-agent.cpu)
 cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct,release_agent=/run/cgmanager/agents/cgm-release-agent.cpuacct)
 cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory,release_agent=/run/cgmanager/agents/cgm-release-agent.memory)
 cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices,release_agent=/run/cgmanager/agents/cgm-release-agent.devices)
 cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer,release_agent=/run/cgmanager/agents/cgm-release-agent.freezer)
 cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio,release_agent=/run/cgmanager/agents/cgm-release-agent.blkio)
 cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event,release_agent=/run/cgmanager/agents/cgm-release-agent.perf_event)
 cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb)
 systemd on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/run/cgmanager/agents/cgm-release-agent.systemd,name=systemd)

Hmm, are you running cgmanager at the same time as systemd? I think this
might be a problem with the intersection of cgmanager with the cgroup
mounting code, i.e. the cgroup mounting code uses the cgfs stuff (which
was originally just cgroup before Serge implemented multiple drivers)
while the "put the container into cgroup" code uses cgmanager, which may
have some weird side effect in this case. I have to confess that so far
I haven't tried cgmanager myself (it's on my todo list), so I never
tested the interaction between Serge's cgmanager code and my cgroup
mounting code...

If you are running cgmanager, could you try the same with cgmanager
stopped? Then LXC should fall back to the cgfs code, which
*should* work in this case, unless something else broke this logic.

Anyway, I'll have a chance to look at this more closely on Saturday (I'm
busy with other things tomorrow).

Regards,
Christian

___
lxc-users mailing list
lxc-users@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-users

Re: [lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

2014-05-30 Thread Christian Seiler
Hi,

as I said before, I'll have a chance of looking at the whole thing
tomorrow myself, but just two quick things:

 First it turns out I also needed to add lxc.mount.auto = sys before
 lxc.mount.auto = cgroup:mixed (otherwise I'd get double /sys/fs/cgroup
 tmpfs mount).

Huh? So lxc.mount.auto = sys has to be there, obviously (otherwise /sys
is not mounted), but what exactly do you mean by double?

 What happens is:
 - the container still hits "Freezing execution" while starting the root slice
 - /sys/fs/cgroup/cpuset (and friends) are bind-mounted (there's an
 additional user/0.user/13.session directory, but I assume it's an
 effect of the Ubuntu host's systemd, and is okay)
 - the systemd mount in the container happens at
 /sys/fs/cgroup/systemd/user/0.user/13.session/lxc-all/f20 , but the
 container expects /sys/fs/cgroup/systemd/ to be writable
 
 So lxc.mount.auto = cgroup:mixed and lxc.cgroup.use = @all work, but
 that's not enough for Fedora (and other systemd-based containers) to work
 properly.

Could you try the following?
lxc.mount.auto = sys cgroup-full:mixed

That will mount the whole cgroup tree, but the parts outside of the
container read-only.

In any case, I'll take a close look myself tomorrow.

Regards,
Christian


Re: [lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

2014-05-29 Thread Serge Hallyn
Quoting Fajar A. Nugraha (l...@fajar.net):
 On Thu, May 29, 2014 at 10:58 AM, Serge Hallyn serge.hal...@ubuntu.com wrote:
 
  Quoting Fajar A. Nugraha (l...@fajar.net):
   On Thu, May 29, 2014 at 5:08 AM, Serge Hallyn serge.hal...@ubuntu.com
  wrote:
would systemd be happy with it being mounted by lxc using an
lxc.mount.entry?  I think that would be preferable to relaxing the
apparmor policy.  i.e.
   
lxc.mount.entry = /sys/fs/cgroup/systemd sys/fs/cgroup/systemd none bind,create=dir,optional 0 0
   
   
   Wouldn't that be shadowed by the container mounting its own /sys?
 
  If lxc mounts /sys then systemd will leave it be.
 
 
 Apparently that line alone doesn't work for me. I also had to add before
 that:
 
 lxc.mount.entry = sysfs sys sysfs default 0 0
 lxc.mount.entry = none sys/fs/cgroup tmpfs rw 0 0

or lxc.mount.auto = sys

That's what I meant by 'if lxc mounts /sys' :)

   Stephane also pointed out in my (closed) pull request that it would also
   allow the container to mess with the host's resource allocation.
 
  Yes, that's why lxc.mount.auto = cgroup:mixed is better.  But the above
  mount entry is no worse than letting the container do it through
  apparmor.
 
 
 That does not work, apparently.
 
 ### in config
 lxc.mount.auto = cgroup:mixed
 ###
 
 ### lxc-start output
 <30>systemd[1]: Starting Root Slice.
 <27>systemd[1]: Caught <SEGV>, dumped core as pid 12.
 <30>systemd[1]: Freezing execution.
 ###

Hm, that's unfortunate.  I thought lxc.mount.auto = cgroup:mixed
with cgfs would mount named subsystems?  Christian?

 ###
 # lxc-attach -n f20 -- mount
 rpool/lxc on / type zfs (rw,noatime,xattr,noacl)
 udev on /dev type devtmpfs (rw,relatime,size=2473540k,nr_inodes=618385,mode=755)
 cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,size=12k,mode=755)
 none on /sys/fs/cgroup/cgmanager type tmpfs (rw,relatime,size=4k,mode=755)
 devpts on /dev/lxc/console type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
 devpts on /dev/lxc/tty1 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
 devpts on /dev/lxc/tty2 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
 devpts on /dev/lxc/tty3 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
 devpts on /dev/lxc/tty4 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
 devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=666)
 proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
 sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
 tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
 tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
 tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)
 
 # lxc-attach -n f20 -- ls /sys/fs/cgroup/
 blkio  cpu,cpuacct  cpuset  devices  freezer  hugetlb  memory  perf_event  systemd
 
 # lxc-attach -n f20 -- ls /sys/fs/cgroup/systemd
 (no output)
 ###
 
 It looks like there are two lines for /sys/fs/cgroup? I'm using trusty's
 lxc-1.0.3.
 
 
 
 
 
   This works (at least, tested with console and ssh login), and should be
   secure-enough (bind-mount the container subdir, instead of the whole
   systemd cgroup), but complicated.
  
   ### snippet of config
   lxc.hook.mount = /var/lib/lxc/f20/bin/create_container_systemd_cgroup
   lxc.hook.post-stop = /var/lib/lxc/f20/bin/remove_container_systemd_cgroup
   ###
  
   ### cat create_container_systemd_cgroup
   #!/bin/bash
   mkdir -p /sys/fs/cgroup/systemd/lxc/$LXC_NAME
   mount -t sysfs sysfs $LXC_ROOTFS_MOUNT/sys
   mount -t tmpfs none $LXC_ROOTFS_MOUNT/sys/fs/cgroup
   mkdir $LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd
   mount --bind /sys/fs/cgroup/systemd/lxc/$LXC_NAME $LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd
   ###
  
   ### cat remove_container_systemd_cgroup
   #!/bin/bash
   [ -n "$LXC_NAME" ] && find /sys/fs/cgroup/systemd/lxc/$LXC_NAME -type d |
   tac | xargs rmdir
   ###
  
   Is there a way to simplify this somehow for it to be more suitable in the
   template?
 
  I suppose we could add a new lxc.mount.auto = cgroup:systemd option which
  only mounts name=systemd, read-only except for the container's own cgroup
  which is rw?  But when I say we I don't really mean we :)
 
 
 
 Will that work?
 
 The systemd cgroup mount is weird in the sense that there are no
 /lxc/CONTAINER_NAME subdirs under /sys/fs/cgroup/systemd, while there is one
 under /sys/fs/cgroup/{blkio,cpu,etc}. So for the systemd cgroup I don't see
 which parts should be mounted ro and which rw.
 
 The workaround hook I wrote earlier creates the directory
 /sys/fs/cgroup/systemd/lxc/CONTAINER_NAME on the host, and bind-mounts it as
 the container's /sys/fs/cgroup/systemd.
 
 -- 
 Fajar

Re: [lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

2014-05-29 Thread Serge Hallyn
Quoting Christian Seiler (christ...@iwakd.de):
 Hi,
 
  ### lxc-start output
  <30>systemd[1]: Starting Root Slice.
  <27>systemd[1]: Caught <SEGV>, dumped core as pid 12.
  <30>systemd[1]: Freezing execution.
  ###
  
  Hm, that's unfortunate.  I thought lxc.mount.auto = cgroup:mixed
  with cgfs would mount named subsystems?  Christian?
 
 Yes, but this is actually controlled by lxc.cgroup.use (in
 lxc.system.conf(5), *not* lxc.container.conf(5)). Basically, we were
 conservative back then and decided to only touch cgroups (both for
 putting the container into and also for bind-mounting) that were either
 kernel cgroups or that the user explicitly specified.

Ah, thanks.

Fajar, does that fix it for you?

 BUT I think for the auto-mounting hook we should maybe change that to
 use *all* hierarchies. It's just that auto-mounting came a bit later and
 I just reused the existing code at that point and didn't properly think
 through the implications. I can provide a patch for changing this to all
 hierarchies for the auto-mounting case, but not today.
 
 In the mean time, you can just create a /etc/lxc/lxc.conf (or whatever
 LXC looks for on your system) with the following setting:
 
 lxc.cgroup.use = @all
 
 That will resort to using *all* named hierarchies. Or, alternatively,
 you can use something like
 
 lxc.cgroup.use = @kernel systemd
 
 to include all kernel hierarchies and the systemd hierarchy, but not
 other named ones.
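
 To see which named hierarchies a host actually has before choosing between
 the two settings, something like the following sketch may help. It assumes
 the cgroup v1 format of /proc/<pid>/cgroup (hierarchy-ID:controllers:path,
 with named hierarchies shown as name=...); the function name is made up for
 illustration and is not part of LXC.

```shell
# Hypothetical helper: list named cgroup v1 hierarchies.
# Reads lines in /proc/<pid>/cgroup format on stdin, for example
#   10:name=systemd:/user/0.user/13.session
# and prints each hierarchy name (the part after "name=") once.
named_hierarchies() {
    sed -n 's/^[0-9]*:name=\([^:]*\):.*/\1/p' | sort -u
}
```

 On a host it could be fed with "named_hierarchies < /proc/self/cgroup".
 Kernel controller lines (cpu, memory, ...) are skipped, since those are
 already covered by @kernel.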
 
 Note btw. that including the systemd hierarchy here actually has some
 weird side-effects, since the lxc.cgroup.use setting applies to both the
 auto-mounting feature and the "let's move the container into cgroup"
 logic, thus directly modifying the systemd cgroup tree, something that
 systemd strongly discourages.
 
 I was actually working on an additional cgroup backend for LXC (in
 addition to cgfs and cgmanager) that interfaces with systemd's dbus
 interface, but I'm not nearly done yet.

Oh, great.  Clearly finding a good place for cgmanager and systemd to
intersect is on my todo list, maybe your driver will be inspiration.

(My primary goal is to continue to support unprivileged nested containers
as well with systemd as we do with upstart+cgmanager)

-serge

Re: [lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

2014-05-28 Thread Serge Hallyn
Quoting Fajar A. Nugraha (l...@fajar.net):
 (changed subject to match content)
 
 On Tue, May 27, 2014 at 11:10 PM, Michael H. Warfield m...@wittsend.com 
 wrote:
  On Tue, 2014-05-27 at 15:33 +0700, Fajar A. Nugraha wrote:
  On further test, this seems enough
 
  ###
  # cat lxc-default-with-systemd
  profile lxc-container-default-with-systemd flags=(attach_disconnected,mediate_deleted) {
    #include <abstractions/lxc/container-base>
    deny mount fstype=devpts,
    mount options=(none,name=systemd) fstype=cgroup -> /sys/fs/cgroup/systemd/,
  }
  ###
 
  This sounds excellent.  It sounds like this should be incorporated into
  the lxc package for any host distros supporting app armour and we could
  then add that default to all the systemd based containers such as
  Fedora, Suse, eventually Oracle, and eventually CentOS.
 
  I agree it does seem to make more sense to use a restrictive profile
  that covers the minimal set of requirements as opposed to unconfined.
 
  That should be submitted as a patch over on the lxc-devel list then, for
  Serge and Stéphane to review.  I see where the file would need to be
  added in the config/apparmour/profiles directory but I'm not familiar
  enough with the packaging for Ubuntu to know what changes would be
  needed to add them there.
 
 I'll let Serge comment on this one.
 
 
 As a side note, I've tested opensuse 13.1 (using the squashfs root
 from rescue ISO) and it has two additional complaints with the previous
 apparmor profile:
 
 May 27 17:12:50 trusty kernel: [66563.219898] type=1400
 audit(1401185570.578:9249): apparmor=DENIED operation=mount
 info=failed type match error=-13
 profile=lxc-container-default-with-systemd name=/var/run/
 pid=30648 comm=mount srcname=/run/ flags=rw, bind

Hm.  In Debian/Ubuntu this is done with a /var/run -> /run
symlink...

 May 27 17:21:20 trusty kernel: [67073.932892] type=1400
 audit(1401186080.906:9846): apparmor=DENIED operation=mount
 info=failed flags match error=-13 profile=lxc-container-opensuse
 name=/proc/ pid=4158 comm=mount flags=rw, remount
 
 the second one (/proc) is pretty harmless, so I ignored it. The first
 one (/var/run) produced lots of errors
 
 [FAILED] Failed to mount Runtime Directory.
 See 'systemctl status var-run.mount' for details.
 [DEPEND] Dependency failed for System Logging Service.
  Mounting Runtime Directory...
 
 
 ... and made syslog (and possibly other services) fail to start, so
 for opensuse I had to adjust the profile even further
 
 ###
 profile lxc-container-opensuse flags=(attach_disconnected,mediate_deleted) {
   #include <abstractions/lxc/container-base>
   deny mount fstype=devpts,
   mount options=(none,name=systemd) fstype=cgroup -> /sys/fs/cgroup/systemd/,
   mount options=(rw,bind),
 }
 ###
 
 Bind mounts inside a container should be safe, right? While there are
 still some problems with opensuse container (e.g. shutdown takes a
 long time on systemctl stop network@eth0.service), it is at least
 usable for testing purposes.

would systemd be happy with it being mounted by lxc using an
lxc.mount.entry?  I think that would be preferable to relaxing the
apparmor policy.  i.e.

lxc.mount.entry = /sys/fs/cgroup/systemd sys/fs/cgroup/systemd none bind,create=dir,optional 0 0

Or, of course, you can just do

lxc.mount.auto = cgroup:mixed

which should give you /sys/fs/cgroup/systemd if it exists on the host, and in a
safer way.  Now if /sys/fs/cgroup/systemd does not exist on the host, these
won't work...

As you say the bind mounts should be ok - although some of the mount options
stuff doesn't work right in many apparmor parsers.  So we'd want to make
sure that 'mount options=(rw,bind)' does in fact only allow that, instead
of suddenly allowing all mounts, as I've unfortunately seen happen when I tried
to selectively allow some other mount options.
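
One way to check what a relaxed profile still denies is to summarize the
AppArmor mount denials from the kernel log while exercising the container.
The following is only a sketch: it assumes audit lines in the usual
key="value" form (the list archive appears to have dropped the quotes in the
log excerpts in this thread), and the function name is hypothetical.

```shell
# Hypothetical helper: print "<profile> <mount target>" for every AppArmor
# mount denial found on stdin (kernel log / audit log lines).
denied_mounts() {
    grep 'apparmor="DENIED"' | grep 'operation="mount"' |
        # The leading space before name= keeps srcname="..." from matching.
        sed -n 's/.*profile="\([^"]*\)".* name="\([^"]*\)".*/\1 \2/p'
}
```

It could be used as "dmesg | denied_mounts" on the host after trying a few
non-bind mounts inside the container; if those attempts do not show up as
denied, the rule is probably allowing more than intended.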

-serge

Re: [lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

2014-05-28 Thread Fajar A. Nugraha
On Thu, May 29, 2014 at 5:08 AM, Serge Hallyn serge.hal...@ubuntu.com wrote:

 Quoting Fajar A. Nugraha (l...@fajar.net):
  (changed subject to match content)
 
  On Tue, May 27, 2014 at 11:10 PM, Michael H. Warfield m...@wittsend.com
 wrote:
   On Tue, 2014-05-27 at 15:33 +0700, Fajar A. Nugraha wrote:
   On further test, this seems enough
  
   ###
   # cat lxc-default-with-systemd
   profile lxc-container-default-with-systemd flags=(attach_disconnected,mediate_deleted) {
     #include <abstractions/lxc/container-base>
     deny mount fstype=devpts,
     mount options=(none,name=systemd) fstype=cgroup -> /sys/fs/cgroup/systemd/,
   }
   ###
  
   This sounds excellent.  It sounds like this should be incorporated into
   the lxc package for any host distros supporting app armour and we could
   then add that default to all the systemd based containers such as
   Fedora, Suse, eventually Oracle, and eventually CentOS.
  
   I agree it does seem to make more sense to use a restrictive profile
   that covers the minimal set of requirements as opposed to unconfined.
  
   That should be submitted as a patch over on the lxc-devel list then,
 for
   Serge and Stéphane to review.  I see where the file would need to be
   added in the config/apparmour/profiles directory but I'm not familiar
   enough with the packaging for Ubuntu to know what changes would be
   needed to add them there.
 
  I'll let Serge comment on this one.
 
 
  As a side note, I've tested opensuse 13.1 (using the squashfs root
  from rescue ISO) and it has two additional complaints with the previous
  apparmor profile:
 
  May 27 17:12:50 trusty kernel: [66563.219898] type=1400
  audit(1401185570.578:9249): apparmor=DENIED operation=mount
  info=failed type match error=-13
  profile=lxc-container-default-with-systemd name=/var/run/
  pid=30648 comm=mount srcname=/run/ flags=rw, bind

 Hm.  In Debian/Ubuntu this is done with a /var/run -> /run
 symlink...


something like that could probably be added to the opensuse template,
modifying the current mount service.
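
A rough sketch of what such a template step could look like, run once
against the container rootfs at creation time. The function name and the
choice to move a non-empty /var/run aside are my own assumptions, not what
the opensuse template currently does.

```shell
# Hypothetical helper: make <rootfs>/var/run a symlink to /run, as
# Debian/Ubuntu do, so the container has no need to bind-mount it.
link_var_run() {
    rootfs=$1
    # Refuse an empty rootfs so the host's own /var/run is never touched.
    [ -n "$rootfs" ] && [ -d "$rootfs/var" ] || return 1
    # Already a symlink: nothing to do.
    [ -L "$rootfs/var/run" ] && return 0
    mkdir -p "$rootfs/run"
    if [ -d "$rootfs/var/run" ]; then
        # Remove it if empty, otherwise keep its content under another name.
        rmdir "$rootfs/var/run" 2>/dev/null ||
            mv "$rootfs/var/run" "$rootfs/var/run.orig"
    fi
    # Absolute target: inside the container it resolves to the container's /run.
    ln -s /run "$rootfs/var/run"
}
```

Called as "link_var_run /var/lib/lxc/NAME/rootfs" (path shown only as an
example). The var-run.mount unit inside the container would presumably still
need to be masked or removed.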



  May 27 17:21:20 trusty kernel: [67073.932892] type=1400
  audit(1401186080.906:9846): apparmor=DENIED operation=mount
  info=failed flags match error=-13 profile=lxc-container-opensuse
  name=/proc/ pid=4158 comm=mount flags=rw, remount
 
  the second one (/proc) is pretty harmless, so I ignored it. The first
  one (/var/run) produced lots of errors
 
  [FAILED] Failed to mount Runtime Directory.
  See 'systemctl status var-run.mount' for details.
  [DEPEND] Dependency failed for System Logging Service.
   Mounting Runtime Directory...
 
 
  ... and made syslog (and possibly other services) fail to start, so
  for opensuse I had to adjust the profile even further
 
  ###
  profile lxc-container-opensuse flags=(attach_disconnected,mediate_deleted) {
    #include <abstractions/lxc/container-base>
    deny mount fstype=devpts,
    mount options=(none,name=systemd) fstype=cgroup -> /sys/fs/cgroup/systemd/,
    mount options=(rw,bind),
  }
  ###
 
  Bind mounts inside a container should be safe, right? While there are
  still some problems with opensuse container (e.g. shutdown takes a
  long time on systemctl stop network@eth0.service), it is at least
  usable for testing purposes.

 would systemd be happy with it being mounted by lxc using an
 lxc.mount.entry?  I think that would be preferable to relaxing the
 apparmor policy.  i.e.

 lxc.mount.entry = /sys/fs/cgroup/systemd sys/fs/cgroup/systemd none bind,create=dir,optional 0 0


Wouldn't that be shadowed by the container mounting its own /sys?
Stephane also pointed out in my (closed) pull request that it would also
allow the container to mess with the host's resource allocation.

This works (at least, tested with console and ssh login), and should be
secure-enough (bind-mount the container subdir, instead of the whole
systemd cgroup), but complicated.

### snippet of config
lxc.hook.mount = /var/lib/lxc/f20/bin/create_container_systemd_cgroup
lxc.hook.post-stop = /var/lib/lxc/f20/bin/remove_container_systemd_cgroup
###

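### cat create_container_systemd_cgroup
#!/bin/bash
# Create a per-container cgroup in the host's systemd hierarchy
mkdir -p "/sys/fs/cgroup/systemd/lxc/$LXC_NAME"
# Mount /sys and an empty /sys/fs/cgroup tmpfs inside the container rootfs
mount -t sysfs sysfs "$LXC_ROOTFS_MOUNT/sys"
mount -t tmpfs none "$LXC_ROOTFS_MOUNT/sys/fs/cgroup"
mkdir "$LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd"
# Expose only the container's own subdirectory as its systemd cgroup
mount --bind "/sys/fs/cgroup/systemd/lxc/$LXC_NAME" \
    "$LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd"
###

### cat remove_container_systemd_cgroup
#!/bin/bash
# Remove the container's cgroup directories, deepest first
[ -n "$LXC_NAME" ] && find "/sys/fs/cgroup/systemd/lxc/$LXC_NAME" -type d |
tac | xargs rmdir
###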
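
The cleanup hook relies on tac to reverse find's output so that children are
removed before their parents. A possible variant, sketched here with a
made-up function name, lets find do the ordering itself with -depth and
refuses to run when the container name is empty. I have only reasoned about
this on ordinary directories, not on a live cgroup mount.

```shell
# Hypothetical helper: remove the directory tree <base>/<name>, deepest
# directories first. rmdir only removes empty directories, so anything
# still in use makes it fail instead of being deleted.
remove_cgroup_tree() {
    base=$1
    name=$2
    # An empty name would make the target the whole base directory.
    [ -n "$base" ] && [ -n "$name" ] || return 1
    # Nothing to clean up.
    [ -d "$base/$name" ] || return 0
    find "$base/$name" -depth -type d -exec rmdir {} +
}
```

In the post-stop hook this would be called as
remove_cgroup_tree /sys/fs/cgroup/systemd/lxc "$LXC_NAME".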

Is there a way to simplify this somehow for it to be more suitable in the
template?

-- 
Fajar

Re: [lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

2014-05-28 Thread Serge Hallyn
Quoting Fajar A. Nugraha (l...@fajar.net):
 On Thu, May 29, 2014 at 5:08 AM, Serge Hallyn serge.hal...@ubuntu.com wrote:
  would systemd be happy with it being mounted by lxc using an
  lxc.mount.entry?  I think that would be preferable to relaxing the
  apparmor policy.  i.e.
 
  lxc.mount.entry = /sys/fs/cgroup/systemd sys/fs/cgroup/systemd none bind,create=dir,optional 0 0
 
 
 Wouldn't that be shadowed by the container mounting its own /sys?

If lxc mounts /sys then systemd will leave it be.

 Stephane also pointed out in my (closed) pull request that it would also
 allow the container to mess with the host's resource allocation.

Yes, that's why lxc.mount.auto = cgroup:mixed is better.  But the above
mount entry is no worse than letting the container do it through
apparmor.

 This works (at least, tested with console and ssh login), and should be
 secure-enough (bind-mount the container subdir, instead of the whole
 systemd cgroup), but complicated.
 
 ### snippet of config
 lxc.hook.mount = /var/lib/lxc/f20/bin/create_container_systemd_cgroup
 lxc.hook.post-stop = /var/lib/lxc/f20/bin/remove_container_systemd_cgroup
 ###
 
 ### cat create_container_systemd_cgroup
 #!/bin/bash
 mkdir -p /sys/fs/cgroup/systemd/lxc/$LXC_NAME
 mount -t sysfs sysfs $LXC_ROOTFS_MOUNT/sys
 mount -t tmpfs none $LXC_ROOTFS_MOUNT/sys/fs/cgroup
 mkdir $LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd
 mount --bind /sys/fs/cgroup/systemd/lxc/$LXC_NAME $LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd
 ###
 
 ### cat remove_container_systemd_cgroup
 #!/bin/bash
 [ -n "$LXC_NAME" ] && find /sys/fs/cgroup/systemd/lxc/$LXC_NAME -type d |
 tac | xargs rmdir
 ###
 
 Is there a way to simplify this somehow for it to be more suitable in the
 template?

I suppose we could add a new lxc.mount.auto = cgroup:systemd option which
only mounts name=systemd, read-only except for the container's own cgroup
which is rw?  But when I say we I don't really mean we :)

Re: [lxc-users] apparmor profile for systemd containers (WAS: Fedora container thinks it is not running)

2014-05-28 Thread Fajar A. Nugraha
On Thu, May 29, 2014 at 10:58 AM, Serge Hallyn serge.hal...@ubuntu.com wrote:

 Quoting Fajar A. Nugraha (l...@fajar.net):
  On Thu, May 29, 2014 at 5:08 AM, Serge Hallyn serge.hal...@ubuntu.com
 wrote:
   would systemd be happy with it being mounted by lxc using an
   lxc.mount.entry?  I think that would be preferable to relaxing the
   apparmor policy.  i.e.
  
   lxc.mount.entry = /sys/fs/cgroup/systemd sys/fs/cgroup/systemd none bind,create=dir,optional 0 0
  
  
  Wouldn't that be shadowed by the container mounting its own /sys?

 If lxc mounts /sys then systemd will leave it be.


Apparently that line alone doesn't work for me. I also had to add before
that:

lxc.mount.entry = sysfs sys sysfs default 0 0
lxc.mount.entry = none sys/fs/cgroup tmpfs rw 0 0



  Stephane also pointed out in my (closed) pull request that it would also
  allow the container to mess with the host's resource allocation.

 Yes, that's why lxc.mount.auto = cgroup:mixed is better.  But the above
 mount entry is no worse than letting the container do it through
 apparmor.


That does not work, apparently.

### in config
lxc.mount.auto = cgroup:mixed
###

### lxc-start output
<30>systemd[1]: Starting Root Slice.
<27>systemd[1]: Caught <SEGV>, dumped core as pid 12.
<30>systemd[1]: Freezing execution.
###

###
# lxc-attach -n f20 -- mount
rpool/lxc on / type zfs (rw,noatime,xattr,noacl)
udev on /dev type devtmpfs (rw,relatime,size=2473540k,nr_inodes=618385,mode=755)
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,size=12k,mode=755)
none on /sys/fs/cgroup/cgmanager type tmpfs (rw,relatime,size=4k,mode=755)
devpts on /dev/lxc/console type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/lxc/tty1 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/lxc/tty2 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/lxc/tty3 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/lxc/tty4 type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=666)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,mode=755)

# lxc-attach -n f20 -- ls /sys/fs/cgroup/
blkio  cpu,cpuacct  cpuset  devices  freezer  hugetlb  memory  perf_event  systemd

# lxc-attach -n f20 -- ls /sys/fs/cgroup/systemd
(no output)
###

It looks like there are two lines for /sys/fs/cgroup? I'm using trusty's
lxc-1.0.3.
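
Spotting such repeats by eye in a long listing is error-prone, so here is a
small sketch that picks them out of mount output. It assumes the usual
"SOURCE on MOUNTPOINT type FSTYPE (OPTIONS)" line format with no spaces in
the source or mount point; the function name is just for illustration.

```shell
# Hypothetical helper: read "mount" output on stdin and print every mount
# point that appears more than once, i.e. something is stacked on it.
dup_mountpoints() {
    awk '$2 == "on" { seen[$3]++ }
         END { for (m in seen) if (seen[m] > 1) print m }' | sort
}
```

Usage would be "lxc-attach -n f20 -- mount | dup_mountpoints". Lines that do
not have "on" as their second field, such as wrapped option lists, are
ignored.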





  This works (at least, tested with console and ssh login), and should be
  secure-enough (bind-mount the container subdir, instead of the whole
  systemd cgroup), but complicated.
 
  ### snippet of config
  lxc.hook.mount = /var/lib/lxc/f20/bin/create_container_systemd_cgroup
  lxc.hook.post-stop = /var/lib/lxc/f20/bin/remove_container_systemd_cgroup
  ###
 
  ### cat create_container_systemd_cgroup
  #!/bin/bash
  mkdir -p /sys/fs/cgroup/systemd/lxc/$LXC_NAME
  mount -t sysfs sysfs $LXC_ROOTFS_MOUNT/sys
  mount -t tmpfs none $LXC_ROOTFS_MOUNT/sys/fs/cgroup
  mkdir $LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd
  mount --bind /sys/fs/cgroup/systemd/lxc/$LXC_NAME $LXC_ROOTFS_MOUNT/sys/fs/cgroup/systemd
  ###
 
  ### cat remove_container_systemd_cgroup
  #!/bin/bash
  [ -n "$LXC_NAME" ] && find /sys/fs/cgroup/systemd/lxc/$LXC_NAME -type d |
  tac | xargs rmdir
  ###
 
  Is there a way to simplify this somehow for it to be more suitable in the
  template?

 I suppose we could add a new lxc.mount.auto = cgroup:systemd option which
 only mounts name=systemd, read-only except for the container's own cgroup
 which is rw?  But when I say we I don't really mean we :)



Will that work?

The systemd cgroup mount is weird in the sense that there are no
/lxc/CONTAINER_NAME subdirs under /sys/fs/cgroup/systemd, while there is one
under /sys/fs/cgroup/{blkio,cpu,etc}. So for the systemd cgroup I don't see
which parts should be mounted ro and which rw.

The workaround hook I wrote earlier creates the directory
/sys/fs/cgroup/systemd/lxc/CONTAINER_NAME on the host, and bind-mounts it as
the container's /sys/fs/cgroup/systemd.

-- 
Fajar