On 01/09/2014 03:55 PM, Victor Porton wrote:
> In Fedora there is a bin/sandbox command which runs a specified command in a
> so-called 'sandbox'. A program running in the sandbox cannot open new files
> (it is commonly used with pre-opened stdin and stdout), and its network
> access may be limited. It is intended for running potentially malicious
> software safely.
> 
> This Fedora sandbox is not perfect, however.
> 
> One problem is:
> 
> Suppose the sandboxed program spawns some child processes and then exits.
> 
> Suppose we want to kill the sandboxed program after 30 seconds if it has not
> exited voluntarily.
> 
> The trouble is that the software cannot figure out which processes descend
> from the sandboxed binary, so we are unable to kill them automatically. This
> means a hacker can create thousands (or more) of processes this way and
> overload the system.
> 
> Also note that the sandboxed program may call setsid(), and thus its identity
> may be lost completely.
> 
> I propose adding a sandbox_id parameter to each process in the kernel. It would
> be 0 for normal processes and allocated like a PID or GID for processes we
> create in a sandbox. Children inherit sandbox_id. There should be an API call
> by which a process makes its sandbox_id non-zero (returning EPERM if it is
> already non-zero).
> 
> Then there should be an API to enumerate all processes with a given
> sandbox_id, so that we can kill them (-TERM or -KILL). Or perhaps there should
> also be a function that sends a given signal to all processes with a given
> sandbox_id (otherwise we would be racing a hacker who could create new
> children faster than we kill them).

I think you need to think bigger :)

I've occasionally pondered how to do real tracking of process trees
(sandbox could use it, but I was thinking of systemd and other service
managers).  cgroups* suck for this purpose.

One approach would be to have another subreaper mode (subreaper mode 2)
that does three things:
 - Subreaper mode 2 zombies do not send SIGCHLD and cannot be reaped
until they have no descendants left.
 - Direct zombie children of subreaper mode 2 zombies are automatically
reaped.
 - Descendants that need to be reparented are reparented to the
subreaper, just like in subreaper mode 1.
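Subreaper mode 1 already exists: since Linux 3.4, `prctl(PR_SET_CHILD_SUBREAPER, 1)` makes orphaned descendants reparent to the caller instead of init. The minimal sketch below (the function name is hypothetical) demonstrates only that reparenting step. The two mode 2 rules, delayed reaping and automatic reaping of the zombie's children, have no userspace equivalent today.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if an orphaned grandchild is reparented to us (the
 * subreaper), 0 if it went elsewhere, -1 on setup failure. */
int orphan_reparents_to_subreaper(void)
{
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0)
        return -1;

    pid_t subreaper = getpid();
    pid_t middle = fork();
    if (middle < 0) {
        prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);
        return -1;
    }

    if (middle == 0) {
        pid_t me = getpid();
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(2);
        if (grandchild == 0) {
            /* Wait until our original parent has exited... */
            struct timespec ms = { 0, 1000000 };
            while (getppid() == me)
                nanosleep(&ms, NULL);
            /* ...then report whether the subreaper adopted us. */
            _exit(getppid() == subreaper ? 0 : 1);
        }
        _exit(0);           /* orphan the grandchild */
    }

    int status;
    if (waitpid(middle, &status, 0) != middle ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);
        return -1;
    }

    /* If reparenting worked, the grandchild is now our child. */
    int result = 0;
    if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        result = 1;

    prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);
    return result;
}
```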

Then you'd add an API that takes the PID of a mode 2 subreaper and kills
its entire process subtree.  (Optionally, tgkill could do that
automatically.)
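Without such an API, a rough userspace approximation combines mode 1 with a kill loop. Because every orphan in the subtree is reparented to the subreaper, repeatedly SIGKILLing and reaping the subreaper's direct children empties the subtree in practice. The sketch below rests on several assumptions: the function names are hypothetical, `/proc` must be mounted and show the caller's children, and the loop also kills any unrelated children of the caller. Each pass is a fresh, racy `/proc` scan, which is exactly the polling the proposed API would replace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Parent PID (field 4) from /proc/<pid>/stat, or -1 on error. */
static pid_t read_ppid(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *p = strrchr(buf, ')');    /* comm may contain ')' and spaces */
    char state;
    int ppid;
    if (!p || sscanf(p + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}

/* Send sig to every direct child of the caller; returns how many. */
static int signal_direct_children(int sig)
{
    DIR *d = opendir("/proc");
    if (!d)
        return -1;
    pid_t self = getpid();
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;
        pid_t pid = (pid_t)atoi(e->d_name);
        if (read_ppid(pid) == self && kill(pid, sig) == 0)
            count++;
    }
    closedir(d);
    return count;
}

/* Kill and reap the caller's whole subtree. Assumes the caller has set
 * PR_SET_CHILD_SUBREAPER, so orphans keep coming back to it. Returns 0
 * once no children remain, -1 on error. */
int kill_subtree(void)
{
    struct timespec ms = { 0, 1000000 };
    for (;;) {
        if (signal_direct_children(SIGKILL) < 0)
            return -1;
        pid_t p;
        while ((p = waitpid(-1, NULL, WNOHANG)) > 0)
            ;                       /* reap whatever has died so far */
        if (p < 0) {
            if (errno == EINTR)
                continue;
            return errno == ECHILD ? 0 : -1;
        }
        nanosleep(&ms, NULL);       /* children still dying; go again */
    }
}
```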

To use this for sandbox, sandbox would set subreaper mode 2 and then
fork.  The initial sandbox process would exit and the child would exec
into the sandbox.  The parent would stick around as a zombie until the
whole tree went away.
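With today's mode 1, the launcher cannot become a zombie placeholder. It has to stay alive and keep reaping until the whole tree is gone. The minimal sketch below shows that approximation. `run_and_wait_tree` is a hypothetical name, and it also reaps any unrelated children the caller already had.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv as a sandboxed child and return only after the child and all
 * of its descendants have exited. Returns the child's exit status
 * (128 + signal number if it was killed), or -1 on error. */
int run_and_wait_tree(char *const argv[])
{
    /* Mode 1: orphans anywhere below us are reparented to us. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);
        return -1;
    }
    if (child == 0) {
        execv(argv[0], argv);       /* the "exec into the sandbox" step */
        _exit(127);
    }

    int result = -1;
    for (;;) {
        int status;
        pid_t p = waitpid(-1, &status, 0);
        if (p < 0) {
            if (errno == EINTR)
                continue;
            break;                  /* ECHILD: nothing left in the tree */
        }
        if (p == child)
            result = WIFEXITED(status) ? WEXITSTATUS(status)
                                       : 128 + WTERMSIG(status);
    }

    prctl(PR_SET_CHILD_SUBREAPER, 0, 0, 0, 0);
    return result;
}
```

If this launcher is itself killed early, its orphans fall back to the next subreaper up the tree (or init) and the grouping is lost. A zombie placeholder as described above would not have that weakness.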

To use this for an init-like program, the service manager would
fork/clone a dummy PID, set subreaper mode 2, fork again, and exec the
service.  That dummy PID would serve as a persistent reference to the
subtree.
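For the service-manager case, a mode 1 stand-in for the dummy PID might look like the sketch below. The dummy marks itself as a subreaper, forks and execs the service, reaps everything reparented to it, and finally exits with the service's status. The manager can then `waitpid()` on the dummy as a handle for the whole subtree. `spawn_with_dummy` and `dummy_main` are hypothetical names. Forwarding a signal death by re-raising the signal is an assumption about how one would mirror the exit state.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Body of the dummy process; it never returns. */
static void dummy_main(char *const argv[])
{
    if (prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0)
        _exit(126);

    pid_t service = fork();
    if (service < 0)
        _exit(126);
    if (service == 0) {
        execv(argv[0], argv);
        _exit(127);
    }

    int service_status = 0;
    for (;;) {
        int status;
        pid_t p = waitpid(-1, &status, 0);
        if (p < 0) {
            if (errno == EINTR)
                continue;
            break;                  /* ECHILD: the whole subtree is gone */
        }
        if (p == service)
            service_status = status;
    }

    /* Forward the service's exit state as our own. */
    if (WIFSIGNALED(service_status)) {
        signal(WTERMSIG(service_status), SIG_DFL);
        raise(WTERMSIG(service_status));
    }
    _exit(WIFEXITED(service_status) ? WEXITSTATUS(service_status) : 125);
}

/* Returns the dummy's PID (or -1), which stays alive until the service
 * and all of its descendants have exited. */
pid_t spawn_with_dummy(char *const argv[])
{
    pid_t dummy = fork();
    if (dummy == 0)
        dummy_main(argv);
    return dummy;
}
```

The structure is the same as the double-fork with a middle process that appears in the manage_pid_subtree formulation later in this message. There, the kernel would do the waiting and status copying instead of a full userspace process.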

For added fun, there should be a way to efficiently find the mode 2
subreaper that owns a given pid/tid.  That way systemd / journald could
map PIDs to service names without mucking with cgroups.
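Today a manager can only approximate that lookup by walking the parent chain in `/proc`. This works only if every service root is a subreaper (so escaped daemons are reparented to it rather than to init), and it is racy while processes exit. The sketch below is illustrative: `find_owner` and its `roots` array are hypothetical, and the cost is one `/proc` read per ancestor rather than an efficient kernel lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Parent PID from /proc/<pid>/stat, or -1 if it cannot be read. */
static pid_t parent_of(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *p = strrchr(buf, ')');    /* comm may contain ')' and spaces */
    char state;
    int ppid;
    if (!p || sscanf(p + 1, " %c %d", &state, &ppid) != 2)
        return -1;
    return (pid_t)ppid;
}

/* Walk from pid towards init and return the first ancestor (or pid
 * itself) listed in roots, or -1 if none is found. */
pid_t find_owner(pid_t pid, const pid_t *roots, size_t nroots)
{
    while (pid > 0) {
        for (size_t i = 0; i < nroots; i++)
            if (roots[i] == pid)
                return pid;
        pid = parent_of(pid);       /* 0 above init, -1 on read error */
    }
    return -1;
}
```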

An alternative formulation of more or less the same thing would be a
syscall manage_pid_subtree(pid_t pid) that does, roughly:

  if (pid->real_parent != current) return -EINVAL;
  set subreaper mode;
  exit current mm, signal set, etc to conserve resources;
  /* at this point, current is essentially a kernel thread. */
  wait for pid to exit;
  exit, copying pid's return code and other exit siginfo state;

To manage a subreaper, you double-fork, and then the middle process
would call manage_pid_subtree on its child.

Thoughts?

* Goddamnit, systemd, I want a way to turn *off* your control of the One
True Cgroup Hierarchy (TM).  I consider the lack of such a mechanism to
be a serious upcoming regression.  Maybe if the kernel gives systemd a
way to do this, systemd will use it.

--Andy