Re: [PATCH] [RFC] proc connector: add namespace events
On Tue, 13 Sep 2016 16:42:43 +0200, Alban Crequy wrote:
> Note that I will probably not have the chance to spend more time on
> this patch soon because Iago will explore other methods with
> eBPF+kprobes instead. eBPF+kprobes would not have the same API
> stability though. I was curious to see if anyone would find the
> namespace addition to proc connector interesting for other projects.

Yes, this is a sorely missing feature. I don't care how this is done
(proc connector or something else) but the feature itself is quite
important for system management daemons.

In particular, we need an application that monitors network
configuration changes on the machine, displays the current
configuration and records the history of the changes. This is currently
impossible to do reliably if net name spaces are in use - which they
are with OpenStack and Docker and similar things in place on those
machines. The current tools try to do things like monitoring
/var/run/netns, which is obviously unreliable and broken.

There are actually two (orthogonal) problems here: apart from the one
described above, there is also the startup of such a daemon. There's
currently no way to find all current name spaces from user space.
We'll need an API for this, too.

And no, eBPF is not the answer. This should just work like any other
system daemon. I can't imagine that we would need the llvm compiler and
kernel sources/debuginfo/whatever on every machine that runs such a
daemon.

Thanks,

 Jiri
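[Editor's sketch, not part of the original mail] To make the startup
problem concrete, here is roughly what a daemon can do today: walk /proc
and stat() each /proc/<pid>/ns/net, collecting the distinct inode
numbers. The names ns_set, ns_set_add and scan_net_namespaces are
hypothetical helpers invented for this illustration. The sketch only
finds namespaces that currently have at least one visible process; a
namespace kept alive solely by a bind mount or an open file descriptor
is missed, which is the gap described above.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_NS 1024

/* Small fixed-size set of namespace inode numbers (hypothetical helper). */
struct ns_set {
	ino_t inum[MAX_NS];
	size_t count;
};

/* Add inum to the set; returns 1 if newly added, 0 if present or set full. */
static int ns_set_add(struct ns_set *set, ino_t inum)
{
	for (size_t i = 0; i < set->count; i++)
		if (set->inum[i] == inum)
			return 0;
	if (set->count == MAX_NS)
		return 0;
	set->inum[set->count++] = inum;
	return 1;
}

/* Walk /proc and record the net namespace of every visible process. */
static int scan_net_namespaces(struct ns_set *set)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;

	if (!proc)
		return -1;
	while ((de = readdir(proc)) != NULL) {
		char path[288];
		struct stat st;

		/* Only numeric directory names are processes. */
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;
		snprintf(path, sizeof(path), "/proc/%s/ns/net", de->d_name);
		/* stat() follows the magic link; st_ino identifies the namespace. */
		if (stat(path, &st) == 0)
			ns_set_add(set, st.st_ino);
	}
	closedir(proc);
	return 0;
}
```

A daemon would call scan_net_namespaces() once at startup on a
zero-initialized struct ns_set and would still need some event source
afterwards to learn about changes; processes can also exit or switch
namespaces while the scan is running, so the result is only a snapshot.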
Re: [PATCH] [RFC] proc connector: add namespace events
On 12 September 2016 at 23:39, Evgeniy Polyakov wrote:
> Hi everyone
>
> 08.09.2016, 18:39, "Alban Crequy":
>> The act of a process creating or joining a namespace via clone(),
>> unshare() or setns() is a useful signal for monitoring applications.
>
>> +	if (old_ns->mnt_ns != new_ns->mnt_ns)
>> +		proc_ns_connector(tsk, CLONE_NEWNS, PROC_NM_REASON_CLONE, old_mntns_inum, new_mntns_inum);
>> +
>> +	if (old_ns->uts_ns != new_ns->uts_ns)
>> +		proc_ns_connector(tsk, CLONE_NEWUTS, PROC_NM_REASON_CLONE, old_ns->uts_ns->ns.inum, new_ns->uts_ns->ns.inum);
>> +
>> +	if (old_ns->ipc_ns != new_ns->ipc_ns)
>> +		proc_ns_connector(tsk, CLONE_NEWIPC, PROC_NM_REASON_CLONE, old_ns->ipc_ns->ns.inum, new_ns->ipc_ns->ns.inum);
>> +
>> +	if (old_ns->net_ns != new_ns->net_ns)
>> +		proc_ns_connector(tsk, CLONE_NEWNET, PROC_NM_REASON_CLONE, old_ns->net_ns->ns.inum, new_ns->net_ns->ns.inum);
>> +
>> +	if (old_ns->cgroup_ns != new_ns->cgroup_ns)
>> +		proc_ns_connector(tsk, CLONE_NEWCGROUP, PROC_NM_REASON_CLONE, old_ns->cgroup_ns->ns.inum, new_ns->cgroup_ns->ns.inum);
>> +
>> +	if (old_ns->pid_ns_for_children != new_ns->pid_ns_for_children)
>> +		proc_ns_connector(tsk, CLONE_NEWPID, PROC_NM_REASON_CLONE, old_ns->pid_ns_for_children->ns.inum, new_ns->pid_ns_for_children->ns.inum);
>> +	}
>> +
>
> Patch looks good to me from technical/connector point of view, but this
> event multiplication is a bit weird imho.
>
> I'm not against it, but did you consider sending just 2 serialized ns
> structures via single message, and client
> would check all ns bits himself?

I have not considered it, thanks for the suggestion. Should we offer
the guarantee to userspace that it will always be sent in one single
message?

If we want to give the information about the userns change too, it
will be a bit more complicated to write the patch because it is not
done in the same function.

Note that I will probably not have the chance to spend more time on
this patch soon because Iago will explore other methods with
eBPF+kprobes instead. eBPF+kprobes would not have the same API
stability though. I was curious to see if anyone would find the
namespace addition to proc connector interesting for other projects.
Re: [PATCH] [RFC] proc connector: add namespace events
Hi everyone

08.09.2016, 18:39, "Alban Crequy":
> The act of a process creating or joining a namespace via clone(),
> unshare() or setns() is a useful signal for monitoring applications.

> +	if (old_ns->mnt_ns != new_ns->mnt_ns)
> +		proc_ns_connector(tsk, CLONE_NEWNS, PROC_NM_REASON_CLONE, old_mntns_inum, new_mntns_inum);
> +
> +	if (old_ns->uts_ns != new_ns->uts_ns)
> +		proc_ns_connector(tsk, CLONE_NEWUTS, PROC_NM_REASON_CLONE, old_ns->uts_ns->ns.inum, new_ns->uts_ns->ns.inum);
> +
> +	if (old_ns->ipc_ns != new_ns->ipc_ns)
> +		proc_ns_connector(tsk, CLONE_NEWIPC, PROC_NM_REASON_CLONE, old_ns->ipc_ns->ns.inum, new_ns->ipc_ns->ns.inum);
> +
> +	if (old_ns->net_ns != new_ns->net_ns)
> +		proc_ns_connector(tsk, CLONE_NEWNET, PROC_NM_REASON_CLONE, old_ns->net_ns->ns.inum, new_ns->net_ns->ns.inum);
> +
> +	if (old_ns->cgroup_ns != new_ns->cgroup_ns)
> +		proc_ns_connector(tsk, CLONE_NEWCGROUP, PROC_NM_REASON_CLONE, old_ns->cgroup_ns->ns.inum, new_ns->cgroup_ns->ns.inum);
> +
> +	if (old_ns->pid_ns_for_children != new_ns->pid_ns_for_children)
> +		proc_ns_connector(tsk, CLONE_NEWPID, PROC_NM_REASON_CLONE, old_ns->pid_ns_for_children->ns.inum, new_ns->pid_ns_for_children->ns.inum);
> +	}
> +

Patch looks good to me from technical/connector point of view, but this
event multiplication is a bit weird imho.

I'm not against it, but did you consider sending just 2 serialized ns
structures via single message, and client would check all ns bits
himself?
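[Editor's sketch, not part of the original mail] One way to picture the
single-message idea: the event carries the old and the new set of
namespace inode numbers, and the client works out which ones changed.
The structs ns_snapshot and ns_change_event and the function
ns_changed_mask are hypothetical names made up for this illustration;
they are not part of the posted patch or of any kernel UAPI. User
namespaces are left out, since the patch notes they are not managed
through nsproxy.

```c
#include <linux/sched.h>	/* CLONE_NEW* flags */
#include <stdint.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000	/* fallback for older kernel headers */
#endif

/* Hypothetical payload: one inode number per nsproxy-managed namespace. */
struct ns_snapshot {
	uint64_t mnt, uts, ipc, net, cgroup, pid_for_children;
};

/* Hypothetical single event carrying both snapshots. */
struct ns_change_event {
	struct ns_snapshot old_ns;
	struct ns_snapshot new_ns;
};

/* Client side: return a mask of CLONE_NEW* flags for each namespace that differs. */
static unsigned long ns_changed_mask(const struct ns_change_event *ev)
{
	unsigned long mask = 0;

	if (ev->old_ns.mnt != ev->new_ns.mnt)
		mask |= CLONE_NEWNS;
	if (ev->old_ns.uts != ev->new_ns.uts)
		mask |= CLONE_NEWUTS;
	if (ev->old_ns.ipc != ev->new_ns.ipc)
		mask |= CLONE_NEWIPC;
	if (ev->old_ns.net != ev->new_ns.net)
		mask |= CLONE_NEWNET;
	if (ev->old_ns.cgroup != ev->new_ns.cgroup)
		mask |= CLONE_NEWCGROUP;
	if (ev->old_ns.pid_for_children != ev->new_ns.pid_for_children)
		mask |= CLONE_NEWPID;
	return mask;
}
```

With this shape, an "unshare -n" would show up as one event whose mask is
just CLONE_NEWNET, instead of one event per changed namespace. The cost
is a larger payload on every event, even when only one namespace changes.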
[PATCH] [RFC] proc connector: add namespace events
From: Alban Crequy

The act of a process creating or joining a namespace via clone(),
unshare() or setns() is a useful signal for monitoring applications.

I am working on a monitoring application that keeps track of all the
containers and all processes inside each container. The current way of
doing it is by polling regularly in /proc for the list of processes and
in /proc/*/ns/* to know which namespaces they belong to. This is
inefficient on systems with a large number of containers and a large
number of processes.

Instead, I would inspect /proc only one time and get the updates with
the proc connector. Unfortunately, the proc connector gives me the list
of processes but does not notify me when a process changes namespaces.
So I would still need to inspect /proc/*/ns/*.

This patch adds namespace events for processes. It generates a
namespace event each time a process changes namespace via clone(),
unshare() or setns().

For example, the following command:
| # unshare -n -f ls -l /proc/self/ns/net
| lrwxrwxrwx 1 root root 0 Sep 6 05:35 /proc/self/ns/net -> 'net:[4026532142]'

causes the proc connector to generate the following events:
| fork: ppid=696 pid=858
| exec: pid=858
| ns: pid=858 type=net reason=set old_inum=4026531957 inum=4026532142
| fork: ppid=858 pid=859
| exec: pid=859
| exit: pid=859
| exit: pid=858

Note: this patch is just an RFC, we are exploring other ways to achieve
the same feature.

The current implementation has the following limitations:

- Ideally, I want to know whether the event is caused by clone(),
  unshare() or setns(). At the moment, the reason field only
  distinguishes between clone() and non-clone.

- The event for pid namespaces is generated when pid_ns_for_children
  changes. I think that's ok, and it just needs to be documented for
  userspace in the same way it is already documented in
  pid_namespaces(7). Userspace really needs to know whether the event
  is caused by clone() or non-clone to interpret the event correctly.

- Events for userns are not implemented yet. I skipped it for now
  because user namespaces are not managed with nsproxy like other
  namespaces.

- The mnt namespace struct is more private than the others, so the code
  is a bit different for this. I don't know if there is a better way to
  do this.

- Userspace needs a way to know whether namespace events are
  implemented in the proc connector. If not implemented, userspace
  needs to fall back to polling changes in /proc/*/ns/*. I am not sure
  whether to add a Netlink message to query the kernel if the feature
  is implemented or otherwise.

- There is no granularity when subscribing for proc connector events. I
  figured it might not be a problem since namespace events are rarer
  than fork/exec events. It will probably not flood existing users of
  the proc connector.

Signed-off-by: Alban Crequy
---
 drivers/connector/cn_proc.c  | 28 +
 include/linux/cn_proc.h      |  4 +++
 include/uapi/linux/cn_proc.h | 16 +-
 kernel/nsproxy.c             | 71
 4 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index a782ce8..69e6815 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -246,6 +246,34 @@ void proc_comm_connector(struct task_struct *task)
 	send_msg(msg);
 }
 
+void proc_ns_connector(struct task_struct *task, int type, int reason, u64 old_inum, u64 inum)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->timestamp_ns = ktime_get_ns();
+	ev->what = PROC_EVENT_NM;
+	ev->event_data.nm.process_pid = task->pid;
+	ev->event_data.nm.process_tgid = task->tgid;
+	ev->event_data.nm.type = type;
+	ev->event_data.nm.reason = reason;
+	ev->event_data.nm.old_inum = old_inum;
+	ev->event_data.nm.inum = inum;
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
 void proc_coredump_connector(struct task_struct *task)
 {
 	struct cn_msg *msg;
diff --git a/include/linux/cn_proc.h b/include/linux/cn_proc.h
index 1d5b02a..2e6915e 100644
--- a/include/linux/cn_proc.h
+++ b/include/linux/cn_proc.h
@@ -26,6 +26,7 @@ void proc_id_connector(struct task_struct *task, int which_id);
 void proc_sid_connector(struct task_struct *task);
 void proc_ptrace_connector(struct task_struct *task, int which_id);
 void proc_comm_connector(struct task_struct *task);
+void proc_ns_connector(struct task_struct *task, int type, int change, u64 old_inum, u64 inum);
 void