Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]
On Wed, 20.05.15 22:40, Martin Pitt (martin.p...@ubuntu.com) wrote: Hey Lennart, Lennart Poettering [2015-05-20 17:49 +0200]: Nope, ConditionSecurity=audit is only a simple boolean check that holds when audit is enabled at all. It doesn't tell you anything about the precise audit feature set of the kernel. Ah, thanks for the clarification. I have now conditionalized the unit on CAP_ADMIN_READ, which is the cap that you need to read the audit multicast stuff. You container manager hence should simply drop that cap fro, the cap set it passes and all should be good. Wonderful! Now it works perfectly in nspawn. (This needs to be fixed in unprivileged LXC containers, but that's not a systemd problem; I'll talk to LXC upstream about that). With these two fixes, should we now remove the scary warning in README? AFAICS there is no need to turn auditing off on the host any more. As mentioned before: unless you turn auditing off in the kernel, you cannot even log into any Fedora system running in a container (unless you have the seccomp trick on and are on x86-64). The message hence really should stay. Note that Debian/Ubuntu are not as restrictive regarding audit as Fedora is. In Fedora due to government craziness failing audit will result in refused logins, and that's the issue here. Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]
Quoting Lennart Poettering (lenn...@poettering.net): On Wed, 20.05.15 22:40, Martin Pitt (martin.p...@ubuntu.com) wrote: Hey Lennart, Lennart Poettering [2015-05-20 17:49 +0200]: Nope, ConditionSecurity=audit is only a simple boolean check that holds when audit is enabled at all. It doesn't tell you anything about the precise audit feature set of the kernel. Ah, thanks for the clarification. I have now conditionalized the unit on CAP_ADMIN_READ, which is the cap that you need to read the audit multicast stuff. You container manager hence should simply drop that cap fro, the cap set it passes and all should be good. I want to clarify this point. Dropping CAP_ADMIN_READ from the bounding set means dropping it from the capabilities targeted at your own user namespace. The only check in the kernel for CAP_ADMIN_READ currently is against the initial user namespace. One day of course (maybe soon) this may change so that you only need CAP_ADMIN_READ against your own user_ns. Following the above, container managers could then again keep CAP_ADMIN_READ in the bounding set. But I'm claiming that checking for CAP_ADMIN_READ in your bounding set is the wrong check here. It simply has nothing to do with what you actually want to be able to do. One could argue that the right answer is a new kernel facility to check for caps against init_user_ns, but no that will have the same problem after audit ns becomes possible. I think the right check for systemd to perform to check whether this is allowed is to actuallly try the bind(). That will return the right answer both now and when namespaced audit is possible, without taking a probably-wrong unrelated cue from the container manager. It's not earth-shatteringly important and what you've got is workable, but I think it may set a better precedent to do it the other way. -serge (One might almost think that we should have a new kernel facility to answer questions such questions. CAP_MAC_ADMIN is similar.) ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]
Patchset imported to github. Pull request: https://github.com/systemd-devs/systemd/compare/master...systemd-mailing-devs:20150520115326.GA32127%40piware.de -- Generated by https://github.com/haraldh/mail2git ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]
On Wed, 20.05.15 13:53, Martin Pitt (martin.p...@ubuntu.com) wrote: -cached_use = true; +/* bind() fails in namespaces (containers), so check that too */ +static const union sockaddr_union sa = { +.nl.nl_family = AF_NETLINK, +.nl.nl_pid= 0, +.nl.nl_groups = AUDIT_NLGRP_READLOG, +}; +cached_use = (bind(fd, sa.sa, sizeof(sa.nl)) = 0); safe_close(fd); This check is simply not right. With that you check whether the multicast audit API is available. But given that it has been added only one or two kernel releases ago, and is protected by its own capabality the check is definitely too broad. The fact is simply that the kernel audit subsystem is borked in the kernel when it comes to namespacing, and there's no nice way to detect whether it is borked I am aware of. And it's not really about this multicast journald feature only. Sooner or later you will run into other problems: any fedora-based distro will not allow you to even log in in the container if you leave audit on in the kernel, and don#t use the seccomp hack we have in place (for example, because you are on 32bit x86, or because your distro turned it off). We could of course add a detect_container() check now to journald. But I think that would be a big mistake, since there was work on fixing audit in the kernel for containers (by adding audit namespacing or so). And we should try to write our code so that things will start working when the kernel is fixed, but a detect_container() check would make that impossible. Anyway, I think people are mostly concerned about bind() failing here, hence I have now added some code to handle that gracefully. Right now it will still log a message with LOG_WARNING. I'd be willing to downgrade this to LOG_DEBUG for select error codes, if you tell me the ones you run into. EINVAL? Also, please convince your distro to enable seccomp support! Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]
Lennart Poettering [2015-05-20 14:57 +0200]: On Wed, 20.05.15 13:53, Martin Pitt (martin.p...@ubuntu.com) wrote: -cached_use = true; +/* bind() fails in namespaces (containers), so check that too */ +static const union sockaddr_union sa = { +.nl.nl_family = AF_NETLINK, +.nl.nl_pid= 0, +.nl.nl_groups = AUDIT_NLGRP_READLOG, +}; +cached_use = (bind(fd, sa.sa, sizeof(sa.nl)) = 0); safe_close(fd); This check is simply not right. With that you check whether the multicast audit API is available. But given that it has been added only one or two kernel releases ago, and is protected by its own capabality the check is definitely too broad. OK. I thought the intention of ConditionSecurity=audit was exactly that, as this condition was added together with adding it to systemd-journald-audit.socket. The fact is simply that the kernel audit subsystem is borked in the kernel when it comes to namespacing, and there's no nice way to detect whether it is borked I am aware of. Right, hence my thinking was that we check for the things we actually want to do with it. We could of course add a detect_container() check now to journald. But I think that would be a big mistake, since there was work on fixing audit in the kernel for containers (by adding audit namespacing or so). Right, fully agreed. That's why I wrote that I didn't like Frank's original patch. Anyway, I think people are mostly concerned about bind() failing here, hence I have now added some code to handle that gracefully. Right now it will still log a message with LOG_WARNING. I'd be willing to downgrade this to LOG_DEBUG for select error codes, if you tell me the ones you run into. EINVAL? bind(7, {sa_family=AF_NETLINK, pid=0, groups=0001}, 12) = -1 EPERM (Operation not permitted) With commit 417a7fdc journald now starts working, but systemd-journald-audit.socket still fails: Active: failed (Result: resources) systemd[1]: systemd-journald-audit.socket: Socket service systemd-journald.service already active, refusing. systemd[1]: Failed to listen on Journal Audit Socket. That's why I thought tightening the ConditionSecurity=audit check would make more sense, as systemd-journald-audit.socket is the only unit which actually uses it. We could add ConditionVirtualization=!container to it as a distro-level workaround, but I don't like that for the reasons above. I don't just want to leave it like that as it makes the system stay in degraded mode. Also, please convince your distro to enable seccomp support! Fair enough, but that hack doesn't work on all platforms we support (i386, powerpc, ppc64el, etc.), and quite frankly that's an even worse hack: You'd need to disable that filter once the kernel gets fixed, and/or conditionalize it based on the running kernel version. I'd like the same code to work everywhere :-) Thanks, Martin -- Martin Pitt| http://www.piware.de Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org) signature.asc Description: Digital signature ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]
On Wed, 20.05.15 15:48, Martin Pitt (martin.p...@ubuntu.com) wrote: Lennart Poettering [2015-05-20 14:57 +0200]: On Wed, 20.05.15 13:53, Martin Pitt (martin.p...@ubuntu.com) wrote: -cached_use = true; +/* bind() fails in namespaces (containers), so check that too */ +static const union sockaddr_union sa = { +.nl.nl_family = AF_NETLINK, +.nl.nl_pid= 0, +.nl.nl_groups = AUDIT_NLGRP_READLOG, +}; +cached_use = (bind(fd, sa.sa, sizeof(sa.nl)) = 0); safe_close(fd); This check is simply not right. With that you check whether the multicast audit API is available. But given that it has been added only one or two kernel releases ago, and is protected by its own capabality the check is definitely too broad. OK. I thought the intention of ConditionSecurity=audit was exactly that, as this condition was added together with adding it to systemd-journald-audit.socket. Nope, ConditionSecurity=audit is only a simple boolean check that holds when audit is enabled at all. It doesn't tell you anything about the precise audit feature set of the kernel. Anyway, I think people are mostly concerned about bind() failing here, hence I have now added some code to handle that gracefully. Right now it will still log a message with LOG_WARNING. I'd be willing to downgrade this to LOG_DEBUG for select error codes, if you tell me the ones you run into. EINVAL? bind(7, {sa_family=AF_NETLINK, pid=0, groups=0001}, 12) = -1 EPERM (Operation not permitted) With commit 417a7fdc journald now starts working, but systemd-journald-audit.socket still fails: Active: failed (Result: resources) systemd[1]: systemd-journald-audit.socket: Socket service systemd-journald.service already active, refusing. systemd[1]: Failed to listen on Journal Audit Socket. I have now conditionalized the unit on CAP_ADMIN_READ, which is the cap that you need to read the audit multicast stuff. You container manager hence should simply drop that cap fro, the cap set it passes and all should be good. I didn't test this though, hence please check if current git fixes that for you now. That's why I thought tightening the ConditionSecurity=audit check would make more sense, as systemd-journald-audit.socket is the only unit which actually uses it. We could add ConditionVirtualization=!container to it as a distro-level workaround, but I don't like that for the reasons above. I don't just want to leave it like that as it makes the system stay in degraded mode. Both conditions are now in place, and we need both: one can have the cap without auditing being enabled, and auditing can be enabled without the cap available, only if one has both the audit suff in journal can work. Also, please convince your distro to enable seccomp support! Fair enough, but that hack doesn't work on all platforms we support (i386, powerpc, ppc64el, etc.), and quite frankly that's an even worse hack: You'd need to disable that filter once the kernel gets fixed, and/or conditionalize it based on the running kernel version. I'd like the same code to work everywhere :-) Well, it's relatively simply fixing one container manager than all userspaces that can run within it... But anyway, please check if git works for you now. Lennart -- Lennart Poettering, Red Hat ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] audit: Fix journal failing on unsupported audit in containers [was: journal: don't complain about audit socket errors in a container.]
Hey Lennart, Lennart Poettering [2015-05-20 17:49 +0200]: Nope, ConditionSecurity=audit is only a simple boolean check that holds when audit is enabled at all. It doesn't tell you anything about the precise audit feature set of the kernel. Ah, thanks for the clarification. I have now conditionalized the unit on CAP_ADMIN_READ, which is the cap that you need to read the audit multicast stuff. You container manager hence should simply drop that cap fro, the cap set it passes and all should be good. Wonderful! Now it works perfectly in nspawn. (This needs to be fixed in unprivileged LXC containers, but that's not a systemd problem; I'll talk to LXC upstream about that). With these two fixes, should we now remove the scary warning in README? AFAICS there is no need to turn auditing off on the host any more. Thanks! Martin -- Martin Pitt| http://www.piware.de Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org) ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel