Re: [PATCH] [RFC] proc connector: add namespace events

2016-09-14 Thread Jiri Benc
On Tue, 13 Sep 2016 16:42:43 +0200, Alban Crequy wrote:
> Note that I will probably not have the chance to spend more time on
> this patch soon because Iago will explore other methods with
> eBPF+kprobes instead. eBPF+kprobes would not have the same API
> stability though. I was curious to see if anyone would find the
> namespace addition to proc connector interesting for other projects.

Yes, this is a sorely missing feature. I don't care how this is done
(proc connector or something else) but the feature itself is quite
important for system management daemons. In particular, we need an
application that monitors network configuration changes on the machine,
displays the current configuration and records history of the changes.
This is currently impossible to do reliably if net name spaces are in
use - which they are with OpenStack and Docker and similar things in
place on those machines. The current tools try to do things like
monitoring /var/run/netns which is obviously unreliable and broken.

There are actually two (orthogonal) problems here: apart from the one
described above, there is also the startup of such a daemon. There's
currently no way to find all current name spaces from user space. We'll
need an API for this, too.
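
The startup problem can be illustrated with the closest thing user space
has to an enumeration today: walking /proc and collecting the distinct
inode numbers behind /proc/<pid>/ns/net. The C sketch below does that; the
function name collect_netns_inums is made up for this example. It only
finds namespaces that contain at least one visible process, so a namespace
kept alive solely by a bind mount or an open file descriptor is missed,
which is the gap described above.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Collect the distinct net namespace inode numbers of all visible
 * processes. Stores at most cap entries in out and returns how many
 * were stored, or -1 if /proc cannot be opened.
 * (Hypothetical helper, not an existing API.)
 */
static int collect_netns_inums(ino_t *out, int cap)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	int n = 0;

	assert(out != NULL && cap > 0);
	if (!proc)
		return -1;

	while ((de = readdir(proc)) != NULL) {
		char path[300];
		struct stat st;
		int i, seen = 0;

		/* Only numeric directory names are processes. */
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;

		snprintf(path, sizeof(path), "/proc/%s/ns/net", de->d_name);
		if (stat(path, &st) != 0)
			continue; /* process exited or access denied */

		for (i = 0; i < n; i++)
			if (out[i] == st.st_ino)
				seen = 1;
		if (!seen && n < cap)
			out[n++] = st.st_ino;
	}
	closedir(proc);
	return n;
}
```

A daemon would call this once at startup and then has to rescan
periodically, because nothing tells it when the set changes.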

And no, eBPF is not the answer. This should just work like any other
system daemon. I can't imagine that we would need an LLVM compiler and
kernel sources/debuginfo/whatever on every machine that runs such a
daemon.

Thanks,

 Jiri


Re: [PATCH] [RFC] proc connector: add namespace events

2016-09-13 Thread Alban Crequy
On 12 September 2016 at 23:39, Evgeniy Polyakov  wrote:
> Hi everyone
>
> 08.09.2016, 18:39, "Alban Crequy" :
>> The act of a process creating or joining a namespace via clone(),
>> unshare() or setns() is a useful signal for monitoring applications.
>
>> + if (old_ns->mnt_ns != new_ns->mnt_ns)
>> + proc_ns_connector(tsk, CLONE_NEWNS, PROC_NM_REASON_CLONE, old_mntns_inum, 
>> new_mntns_inum);
>> +
>> + if (old_ns->uts_ns != new_ns->uts_ns)
>> + proc_ns_connector(tsk, CLONE_NEWUTS, PROC_NM_REASON_CLONE, 
>> old_ns->uts_ns->ns.inum, new_ns->uts_ns->ns.inum);
>> +
>> + if (old_ns->ipc_ns != new_ns->ipc_ns)
>> + proc_ns_connector(tsk, CLONE_NEWIPC, PROC_NM_REASON_CLONE, 
>> old_ns->ipc_ns->ns.inum, new_ns->ipc_ns->ns.inum);
>> +
>> + if (old_ns->net_ns != new_ns->net_ns)
>> + proc_ns_connector(tsk, CLONE_NEWNET, PROC_NM_REASON_CLONE, 
>> old_ns->net_ns->ns.inum, new_ns->net_ns->ns.inum);
>> +
>> + if (old_ns->cgroup_ns != new_ns->cgroup_ns)
>> + proc_ns_connector(tsk, CLONE_NEWCGROUP, PROC_NM_REASON_CLONE, 
>> old_ns->cgroup_ns->ns.inum, new_ns->cgroup_ns->ns.inum);
>> +
>> + if (old_ns->pid_ns_for_children != new_ns->pid_ns_for_children)
>> + proc_ns_connector(tsk, CLONE_NEWPID, PROC_NM_REASON_CLONE, 
>> old_ns->pid_ns_for_children->ns.inum, new_ns->pid_ns_for_children->ns.inum);
>> + }
>> +
>
> Patch looks good to me from technical/connector point of view, but this
> event multiplication is a bit weird imho.
>
> I'm not against it, but did you consider sending just 2 serialized ns
> structures via a single message, and the client would check all ns bits
> itself?

I have not considered it, thanks for the suggestion. Should we offer
the guarantee to userspace that it will always be sent in one single
message? If we want to give the information about the userns change
too, it will be a bit more complicated to write the patch because it
is not done in the same function.

Note that I will probably not have the chance to spend more time on
this patch soon because Iago will explore other methods with
eBPF+kprobes instead. eBPF+kprobes would not have the same API
stability though. I was curious to see if anyone would find the
namespace addition to proc connector interesting for other projects.


Re: [PATCH] [RFC] proc connector: add namespace events

2016-09-12 Thread Evgeniy Polyakov
Hi everyone

08.09.2016, 18:39, "Alban Crequy" :
> The act of a process creating or joining a namespace via clone(),
> unshare() or setns() is a useful signal for monitoring applications.

> + if (old_ns->mnt_ns != new_ns->mnt_ns)
> + proc_ns_connector(tsk, CLONE_NEWNS, PROC_NM_REASON_CLONE, old_mntns_inum, 
> new_mntns_inum);
> +
> + if (old_ns->uts_ns != new_ns->uts_ns)
> + proc_ns_connector(tsk, CLONE_NEWUTS, PROC_NM_REASON_CLONE, 
> old_ns->uts_ns->ns.inum, new_ns->uts_ns->ns.inum);
> +
> + if (old_ns->ipc_ns != new_ns->ipc_ns)
> + proc_ns_connector(tsk, CLONE_NEWIPC, PROC_NM_REASON_CLONE, 
> old_ns->ipc_ns->ns.inum, new_ns->ipc_ns->ns.inum);
> +
> + if (old_ns->net_ns != new_ns->net_ns)
> + proc_ns_connector(tsk, CLONE_NEWNET, PROC_NM_REASON_CLONE, 
> old_ns->net_ns->ns.inum, new_ns->net_ns->ns.inum);
> +
> + if (old_ns->cgroup_ns != new_ns->cgroup_ns)
> + proc_ns_connector(tsk, CLONE_NEWCGROUP, PROC_NM_REASON_CLONE, 
> old_ns->cgroup_ns->ns.inum, new_ns->cgroup_ns->ns.inum);
> +
> + if (old_ns->pid_ns_for_children != new_ns->pid_ns_for_children)
> + proc_ns_connector(tsk, CLONE_NEWPID, PROC_NM_REASON_CLONE, 
> old_ns->pid_ns_for_children->ns.inum, new_ns->pid_ns_for_children->ns.inum);
> + }
> +

Patch looks good to me from technical/connector point of view, but this
event multiplication is a bit weird imho.

I'm not against it, but did you consider sending just 2 serialized ns
structures via a single message, and the client would check all ns bits
itself?


[PATCH] [RFC] proc connector: add namespace events

2016-09-08 Thread Alban Crequy
From: Alban Crequy 

The act of a process creating or joining a namespace via clone(),
unshare() or setns() is a useful signal for monitoring applications.

I am working on a monitoring application that keeps track of all the
containers and all processes inside each container. The current way of
doing it is by polling regularly in /proc for the list of processes and
in /proc/*/ns/* to know which namespaces they belong to. This is
inefficient on systems with a large number of containers and a large
number of processes.
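
For reference, the per-process part of that polling looks roughly like
the C sketch below: read the /proc/<pid>/ns/<type> symlink and parse a
target such as "net:[4026532142]". The helper names parse_ns_link and
read_ns_inum are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Parse a namespace link target such as "net:[4026532142]".
 * On success, stores the type name and inode number and returns 0.
 */
static int parse_ns_link(const char *target, char type[16],
			 unsigned long long *inum)
{
	char closing = 0;

	assert(target != NULL && type != NULL && inum != NULL);
	if (sscanf(target, "%15[a-z_]:[%llu%c", type, inum, &closing) != 3)
		return -1;
	return closing == ']' ? 0 : -1;
}

/* Read the inode number of one namespace of one process. */
static int read_ns_inum(const char *pid, const char *ns,
			unsigned long long *inum)
{
	char path[128], target[64], type[16];
	ssize_t len;

	snprintf(path, sizeof(path), "/proc/%s/ns/%s", pid, ns);
	len = readlink(path, target, sizeof(target) - 1);
	if (len < 0)
		return -1; /* process gone or access denied */
	target[len] = '\0';
	return parse_ns_link(target, type, inum);
}
```

Doing this for every process and every namespace type on each polling
round is where the cost comes from.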

Instead, I would inspect /proc only one time and get the updates with
the proc connector. Unfortunately, the proc connector gives me the list
of processes but does not notify me when a process changes namespaces.
So I would still need to inspect /proc/*/ns/*.

This patch adds namespace events for processes. It generates a namespace
event each time a process changes namespace via clone(), unshare() or
setns().

For example, the following command:
| # unshare -n -f ls -l /proc/self/ns/net
| lrwxrwxrwx 1 root root 0 Sep  6 05:35 /proc/self/ns/net -> 'net:[4026532142]'

causes the proc connector to generate the following events:
| fork: ppid=696 pid=858
| exec: pid=858
| ns: pid=858 type=net reason=set old_inum=4026531957 inum=4026532142
| fork: ppid=858 pid=859
| exec: pid=859
| exit: pid=859
| exit: pid=858
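
A listener could render the "ns:" line above with something like the
following C sketch. struct ns_event mirrors the fields that
proc_ns_connector() fills in, but its exact layout here, the ns_reason
values and both function names are assumptions for illustration only.

```c
#include <assert.h>
#include <inttypes.h>
#include <linux/sched.h>	/* CLONE_NEW* flags */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#ifndef CLONE_NEWCGROUP
#define CLONE_NEWCGROUP 0x02000000	/* missing from older headers */
#endif

/* Hypothetical reason codes; the real values live in the patch. */
enum ns_reason { NS_REASON_CLONE = 1, NS_REASON_SET = 2 };

/* Hypothetical userspace copy of the event payload. */
struct ns_event {
	int32_t pid;
	int32_t tgid;
	uint32_t type;		/* one CLONE_NEW* flag */
	uint32_t reason;	/* enum ns_reason */
	uint64_t old_inum;
	uint64_t inum;
};

static const char *ns_type_name(uint32_t type)
{
	switch (type) {
	case CLONE_NEWNS:	return "mnt";
	case CLONE_NEWUTS:	return "uts";
	case CLONE_NEWIPC:	return "ipc";
	case CLONE_NEWNET:	return "net";
	case CLONE_NEWCGROUP:	return "cgroup";
	case CLONE_NEWPID:	return "pid";
	default:		return "unknown";
	}
}

/* Format one event in the same style as the listing above. */
static int format_ns_event(char *buf, size_t len, const struct ns_event *ev)
{
	assert(buf != NULL && ev != NULL);
	return snprintf(buf, len,
			"ns: pid=%" PRId32 " type=%s reason=%s old_inum=%"
			PRIu64 " inum=%" PRIu64,
			ev->pid, ns_type_name(ev->type),
			ev->reason == NS_REASON_CLONE ? "clone" : "set",
			ev->old_inum, ev->inum);
}
```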

Note: this patch is just an RFC, we are exploring other ways to achieve
  the same feature.

The current implementation has the following limitations:

- Ideally, I want to know whether the event is caused by clone(),
  unshare() or setns(). At the moment, the reason field only
  distinguishes between clone() and non-clone.

- The event for pid namespaces is generated when pid_ns_for_children
  changes. I think that's ok, and it just needs to be documented for
  userspace in the same way it is already documented in
  pid_namespaces(7). Userspace really needs to know whether the event is
  caused by clone() or non-clone to interpret the event correctly.

- Events for userns are not implemented yet. I skipped it for now
  because user namespaces are not managed with nsproxy like the other
  namespaces.

- The mnt namespace struct is more private than the others so the code
  is a bit different for this. I don't know if there is a better way to
  do this.

- Userspace needs a way to know whether namespace events are implemented
  in the proc connector. If they are not implemented, userspace needs to
  fall back to polling changes in /proc/*/ns/*. I am not sure whether to
  add a Netlink message to query the kernel if the feature is implemented
  or to detect it some other way.

- There is no granularity when subscribing for proc connector events. I
  figured it might not be a problem since namespace events are rarer
  than fork/exec events. It will probably not flood existing users
  of the proc connector.
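
To make the granularity point concrete, this is roughly what a
subscription request looks like from user space, as a C sketch built on
the uapi headers: the only payload is a listen/ignore operation, with no
field to select event types. The function name build_listen_msg is made
up for this example, and actually sending the message (an AF_NETLINK
socket on NETLINK_CONNECTOR, bound to the CN_IDX_PROC group) is left out.

```c
#include <assert.h>
#include <linux/cn_proc.h>
#include <linux/connector.h>
#include <linux/netlink.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill buf with a PROC_CN_MCAST_LISTEN request for the proc connector.
 * buf must be suitably aligned for struct nlmsghdr.
 * Returns the netlink message length, or 0 if buf is too small.
 */
static size_t build_listen_msg(void *buf, size_t cap, __u32 pid)
{
	enum proc_cn_mcast_op op = PROC_CN_MCAST_LISTEN;
	size_t payload = sizeof(struct cn_msg) + sizeof(op);
	struct nlmsghdr *nlh = buf;
	struct cn_msg *cn;

	assert(buf != NULL);
	if (cap < NLMSG_SPACE(payload))
		return 0;
	memset(buf, 0, NLMSG_SPACE(payload));

	nlh->nlmsg_len = NLMSG_LENGTH(payload);
	nlh->nlmsg_type = NLMSG_DONE;
	nlh->nlmsg_pid = pid;

	cn = NLMSG_DATA(nlh);
	cn->id.idx = CN_IDX_PROC;
	cn->id.val = CN_VAL_PROC;
	cn->len = sizeof(op);
	memcpy(cn->data, &op, sizeof(op));	/* the whole request body */

	return nlh->nlmsg_len;
}
```

A listener that receives no namespace events after subscribing cannot
tell from this exchange whether the kernel lacks the feature or nothing
has happened yet, which is why a separate capability query is raised in
the list above.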

Signed-off-by: Alban Crequy 
---
 drivers/connector/cn_proc.c  | 28 +
 include/linux/cn_proc.h  |  4 +++
 include/uapi/linux/cn_proc.h | 16 +-
 kernel/nsproxy.c | 71 
 4 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index a782ce8..69e6815 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -246,6 +246,34 @@ void proc_comm_connector(struct task_struct *task)
 	send_msg(msg);
 }
 
+void proc_ns_connector(struct task_struct *task, int type, int reason, u64 old_inum, u64 inum)
+{
+	struct cn_msg *msg;
+	struct proc_event *ev;
+	__u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
+
+	if (atomic_read(&proc_event_num_listeners) < 1)
+		return;
+
+	msg = buffer_to_cn_msg(buffer);
+	ev = (struct proc_event *)msg->data;
+	memset(&ev->event_data, 0, sizeof(ev->event_data));
+	ev->timestamp_ns = ktime_get_ns();
+	ev->what = PROC_EVENT_NM;
+	ev->event_data.nm.process_pid  = task->pid;
+	ev->event_data.nm.process_tgid = task->tgid;
+	ev->event_data.nm.type = type;
+	ev->event_data.nm.reason = reason;
+	ev->event_data.nm.old_inum = old_inum;
+	ev->event_data.nm.inum = inum;
+
+	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
+	msg->ack = 0; /* not used */
+	msg->len = sizeof(*ev);
+	msg->flags = 0; /* not used */
+	send_msg(msg);
+}
+
 void proc_coredump_connector(struct task_struct *task)
 {
 	struct cn_msg *msg;
diff --git a/include/linux/cn_proc.h b/include/linux/cn_proc.h
index 1d5b02a..2e6915e 100644
--- a/include/linux/cn_proc.h
+++ b/include/linux/cn_proc.h
@@ -26,6 +26,7 @@ void proc_id_connector(struct task_struct *task, int which_id);
 void proc_sid_connector(struct task_struct *task);
 void proc_ptrace_connector(struct task_struct *task, int which_id);
 void proc_comm_connector(struct task_struct *task);
+void proc_ns_connector(struct task_struct *task, int type, int change, u64 
old_inum, u64 inum);
 void