On 26/07/2014 23:04, Eric W. Biederman wrote:
>> The most significant aspect of Capsicum is associating *rights* with
>> (some) file descriptors, so that the kernel only allows operations on an
>> FD if the rights permit it.  This allows userspace applications to
>> sandbox themselves by tightly constraining what's allowed on both
>> inputs and outputs; for example, tcpdump might restrict itself so it can
>> only read from the network FD, and only write to stdout.
>>
>> The kernel thus needs to police the rights checks for these file
>> descriptors (referred to as 'Capsicum capabilities', completely
>> different from POSIX.1e capabilities), and the best place to do this is
>> at the points where a file descriptor from userspace is converted to a
>> struct file * within the kernel.
>>
>>   [Policing the rights checks anywhere else, for example at the system
>>   call boundary, isn't a good idea because it opens up the possibility
>>   of time-of-check/time-of-use (TOCTOU) attacks [2] where FDs are
>>   changed (as openat/close/dup2 are allowed in capability mode) between
>>   the 'check' at syscall entry and the 'use' at fget() invocation.]
>>
>> However, this does lead to quite an invasive change to the kernel --
>> every invocation of fget() or similar functions (fdget(),
>> sockfd_lookup(), user_path_at(),...) needs to be annotated with the
>> rights associated with the specific operations that will be performed on
>> the struct file.  There are ~100 such invocations that need
>> annotation.
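To make "check the rights where the fd is converted to a struct file" concrete, here is a minimal userspace model in C. It is an illustration only: `struct fd_slot`, `fget_rights()`, the `LOOKUP_*` results and the `RIGHT_*` bits are hypothetical names invented for this sketch, not the kernel's or the patch series' API. Each descriptor slot carries its own rights mask, and the lookup refuses to hand out the file pointer unless every right the caller asks for is present, so the check and the use happen in the same step (no window for the dup2/close race described above).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rights bits; FreeBSD's real cap_rights_t is much richer. */
#define RIGHT_READ  (UINT64_C(1) << 0)
#define RIGHT_WRITE (UINT64_C(1) << 1)

#define MAX_FDS 16

/* Stand-in for struct file: the object that descriptors refer to. */
struct file {
    const char *name;
};

/* One descriptor-table entry: the file plus the rights of *this* fd. */
struct fd_slot {
    struct file *file;   /* NULL if the slot is unused */
    uint64_t rights;
};

struct fd_table {
    struct fd_slot slots[MAX_FDS];
};

/* Hypothetical lookup results, loosely mirroring EBADF / ENOTCAPABLE. */
enum lookup_err { LOOKUP_OK = 0, LOOKUP_EBADF, LOOKUP_ENOTCAPABLE };

/*
 * Model of an "annotated" fget(): the caller states which rights the
 * upcoming operation needs, and the fd-to-file conversion fails unless
 * this descriptor holds all of them.
 */
struct file *fget_rights(const struct fd_table *t, int fd, uint64_t needed,
                         enum lookup_err *err)
{
    assert(t != NULL && err != NULL);
    if (fd < 0 || fd >= MAX_FDS || t->slots[fd].file == NULL) {
        *err = LOOKUP_EBADF;
        return NULL;
    }
    if ((t->slots[fd].rights & needed) != needed) {
        *err = LOOKUP_ENOTCAPABLE;
        return NULL;
    }
    *err = LOOKUP_OK;
    return t->slots[fd].file;
}
```

In the tcpdump example, the network fd's slot would hold only `RIGHT_READ` and stdout's only `RIGHT_WRITE`; a read path would call `fget_rights(t, fd, RIGHT_READ, &err)`, which is the per-call-site annotation the ~100 fget() users would need.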
> 
> And it is silly.  Roughly you just need a locking version of
> fcntl(F_SETFL).
> 
> That is, make the restriction in the struct file, not in the fd-to-file
> lookup.

No, the rights have to be in the file descriptor.  The same open file
can be dup'ed and the copies passed with different capabilities to
different processes.

Say you pass an eventfd to another process with SCM_RIGHTS, and you want
to allow that process only to write to it.

>> 4) New System Calls
>> -------------------
>>
>> To allow userspace applications to access the Capsicum capability
>> functionality, I'm proposing two new system calls: cap_rights_limit(2)
>> and cap_rights_get(2).  I guess these could potentially be implemented
>> elsewhere (e.g. as fcntl(2) operations?) but the changes seem
>> significant enough that new syscalls are warranted.
>>
>>   [FreeBSD 10.x actually includes six new syscalls for manipulating the
>>   rights associated with a Capsicum capability -- the capability rights
>>   can police that only specific fcntl(2) or ioctl(2) commands are
>>   allowed, and FreeBSD sets these with distinct syscalls.]
> 
> ioctls?  In a sandbox?  Ick.

KVM?  X11?  Both of them use loads of ioctls.  I'm less sure of the
benefit of picking which fcntls to allow.
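For the ioctl case, FreeBSD provides `cap_ioctls_limit(2)` to restrict a descriptor to a list of ioctl commands, so a KVM or X11 client could in principle keep only the handful of commands it actually uses. The model below is a hypothetical sketch of that idea, not FreeBSD's implementation: `struct ioctl_limit`, `ioctl_limit_set()` and `ioctl_permitted()` are invented names, and the command numbers in the test are placeholders rather than real KVM or DRM ioctls. It assumes that a descriptor with no list installed permits any command as long as it holds the ioctl right, and that a new list may only drop commands from an existing one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RIGHT_IOCTL (UINT64_C(1) << 2)   /* hypothetical rights bit */
#define MAX_IOCTL_CMDS 8

struct ioctl_limit {
    uint64_t rights;                     /* per-descriptor rights mask */
    bool limited;                        /* false: no allow-list installed */
    size_t ncmds;
    unsigned long cmds[MAX_IOCTL_CMDS];  /* permitted command numbers */
};

/* Install an allow-list; if one exists, the new list must be a subset. */
bool ioctl_limit_set(struct ioctl_limit *l, const unsigned long *cmds, size_t n)
{
    assert(l != NULL && (n == 0 || cmds != NULL));
    if (n > MAX_IOCTL_CMDS)
        return false;
    if (l->limited) {
        for (size_t i = 0; i < n; i++) {
            bool found = false;
            for (size_t j = 0; j < l->ncmds; j++)
                if (l->cmds[j] == cmds[i])
                    found = true;
            if (!found)
                return false;            /* would widen the list */
        }
    }
    for (size_t i = 0; i < n; i++)
        l->cmds[i] = cmds[i];
    l->ncmds = n;
    l->limited = true;
    return true;
}

/* Check performed when an ioctl arrives on this descriptor. */
bool ioctl_permitted(const struct ioctl_limit *l, unsigned long cmd)
{
    if ((l->rights & RIGHT_IOCTL) == 0)
        return false;                    /* no ioctl right at all */
    if (!l->limited)
        return true;                     /* right held, no list installed */
    for (size_t i = 0; i < l->ncmds; i++)
        if (l->cmds[i] == cmd)
            return true;
    return false;
}
```

Keeping the list per descriptor follows the same reasoning as the eventfd example: two copies of one device fd can expose different ioctl subsets.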

Paolo