Re: syscall: introduce sendfd() syscall (v.2)

2014-12-07 Thread Pavel Machek
On Fri 2014-12-05 13:22:50, One Thousand Gnomes wrote:
> 
> > 2.a. If task A has sufficient capabilities to send signals to task B, then
> > task A is already in position to do anything it wants with task B, including
> > killing it outright.
> 
> Not entirely true.
> 
> - We have security models like SELinux
> - We have namespaces and being able to send an fd between namespaces is
>   not quite as flexible as you would make it
> 
> I suspect therefore it needs security hooks but otherwise looks more sane
> than the current AF_UNIX approach.

The right test for "can do anything" is "can_ptrace()"...
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: syscall: introduce sendfd() syscall (v.2)

2014-12-05 Thread Alex Dubov
On Sat, Dec 6, 2014 at 12:22 AM, One Thousand Gnomes
<gno...@lxorguk.ukuu.org.uk> wrote:
>
>> 2.a. If task A has sufficient capabilities to send signals to task B, then
>> task A is already in position to do anything it wants with task B, including
>> killing it outright.
>
> Not entirely true.
>
> - We have security models like SELinux
> - We have namespaces and being able to send an fd between namespaces is
>   not quite as flexible as you would make it
>
> I suspect therefore it needs security hooks but otherwise looks more sane
> than the current AF_UNIX approach.
>

The best part about signal transport compared to anything in net/ is
that it adheres to a very straightforward and simple API contract. That
is, you can tweak it here and there and still keep everything working.

1. Adding an additional capability flag to SELinux does not appear to
be that complicated (it already has 4 capabilities related to signal
handling; a fifth is not going to make much difference).

2. Sending fds between namespaces may be prohibited outright; this
would not be an unreasonable prohibition. A more flexible model may
also be feasible, but I wonder if it is necessary.


Re: [PATCH] syscall: introduce sendfd() syscall (v.2)

2014-12-05 Thread Alex Dubov
On Sat, Dec 6, 2014 at 6:23 AM, Bastien ROUCARIES
<roucaries.bast...@gmail.com> wrote:
>
>
> See sendfd/recvfd in gnulib. It works even under Solaris
>

What's so special about a thin wrapper around domain sockets/named
fifos (Solaris style)?
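
For context, a wrapper of this kind over AF_UNIX sockets usually boils down to one SCM_RIGHTS control message in each direction. The following is only a minimal sketch of that existing mechanism, not gnulib's actual code; the function names (send_fd_over_unix, recv_fd_over_unix, passfd_roundtrip) are hypothetical and chosen for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_ctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send `fd` over the connected AF_UNIX socket `sock` as SCM_RIGHTS. */
static int send_fd_over_unix(int sock, int fd)
{
	char byte = 0;	/* one byte of ordinary data carries the message */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(sock >= 0 && fd >= 0);
	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`; returns the new fd or -1. */
static int recv_fd_over_unix(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctl ctl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&ctl, 0, sizeof(ctl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass the write end of a pipe across a socketpair and check that the
 * received descriptor refers to the same pipe. Returns 0 on success. */
static int passfd_roundtrip(void)
{
	int sv[2], p[2], dup_fd, ok = -1;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(p) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (send_fd_over_unix(sv[0], p[1]) == 0 &&
	    (dup_fd = recv_fd_over_unix(sv[1])) >= 0) {
		if (write(dup_fd, "x", 1) == 1 &&
		    read(p[0], &c, 1) == 1 && c == 'x')
			ok = 0;
		close(dup_fd);
	}
	close(p[0]);
	close(p[1]);
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```

Note that both ends need an already connected socket before any descriptor can move, which is the setup cost the signal-based proposal is trying to avoid.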


Re: syscall: introduce sendfd() syscall (v.2)

2014-12-05 Thread One Thousand Gnomes

> 2.a. If task A has sufficient capabilities to send signals to task B, then
> task A is already in position to do anything it wants with task B, including
> killing it outright.

Not entirely true.

- We have security models like SELinux
- We have namespaces and being able to send an fd between namespaces is
  not quite as flexible as you would make it

I suspect therefore it needs security hooks but otherwise looks more sane
than the current AF_UNIX approach.



syscall: introduce sendfd() syscall (v.2)

2014-12-03 Thread Alex Dubov
I would like to present my second attempt at file descriptor duplication over
Posix.1b real-time signal transport. I believe all the constructive points
raised in the previous discussion have been addressed.

Below, I respond to specific concerns raised in the preceding
discussion:

1. Claim: signals as a transport would not scale

Each task_struct allocated by the kernel has its own signal queue, which is
reliable where Posix.1b signals are concerned. This queue essentially serves
as a per-task mailbox, enabling complex applications to send signals from any
thread to any other thread directly, with very low overhead, and thus avoid
any shared contention points outright (the originating task's pid is passed
along with the siginfo data, so source-based dispatching is perfectly possible).

Also, signals can be trivially integrated with other communication mechanisms,
as the signalfd() syscall is perfectly compatible with epoll.
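
As a rough illustration of that integration, here is a sketch that uses only interfaces already in the kernel (it does not involve sendfd() at all). It queues a real-time signal with an integer payload to the calling process and collects it through signalfd() and epoll. The helper name rt_signal_roundtrip is hypothetical, and a single-threaded caller is assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Queue one real-time signal to ourselves and collect it through
 * signalfd + epoll. Returns the integer payload, or -1 on error. */
static int rt_signal_roundtrip(int sig, int payload)
{
	sigset_t mask;
	struct epoll_event ev = { .events = EPOLLIN }, out;
	struct signalfd_siginfo ssi;
	union sigval val = { .sival_int = payload };
	int sfd, efd, result = -1;

	assert(sig >= SIGRTMIN && sig <= SIGRTMAX);

	/* The signal must be blocked so it stays queued for signalfd. */
	sigemptyset(&mask);
	sigaddset(&mask, sig);
	if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
		return -1;

	sfd = signalfd(-1, &mask, SFD_CLOEXEC);
	if (sfd < 0)
		return -1;
	efd = epoll_create1(EPOLL_CLOEXEC);
	if (efd < 0) {
		close(sfd);
		return -1;
	}

	/* The signalfd becomes readable once a matching signal is pending,
	 * so it can sit in the same epoll set as sockets or pipes. */
	ev.data.fd = sfd;
	if (epoll_ctl(efd, EPOLL_CTL_ADD, sfd, &ev) == 0 &&
	    sigqueue(getpid(), sig, val) == 0 &&
	    epoll_wait(efd, &out, 1, 1000) == 1 &&
	    read(sfd, &ssi, sizeof(ssi)) == (ssize_t)sizeof(ssi) &&
	    ssi.ssi_signo == (uint32_t)sig)
		result = ssi.ssi_int;

	close(efd);
	close(sfd);
	return result;
}
```

In a real event loop the signalfd would simply be one more descriptor in the set; ssi_pid in the same structure identifies the sender.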

2. Claim: adding new functionality to the signal transport will create new
attack/DoS vectors.

Nothing could be further from the truth.

2.a. If task A has sufficient capabilities to send signals to task B, then
task A is already in a position to do anything it wants with task B, including
killing it outright.

2.b. Flood attacks on signal queues are not dangerous to the system, as signal
queues are relatively shallow and consume little memory even when full. Compare
with the infamous "recursive fd" attack against the AF_UNIX fd transport, which
plagues application development to this day (due to the safeguards introduced
to alleviate it).

2.c. The natural decoupling of signal transport from vfs internals prevents any
sort of "recursive fd" attack altogether (it is even safe to send the
signalfd() fd through - this can be considered a convenient feature to
replicate signal delivery masks around; of course, the receiving task will only
receive its own signals through it, peeking at other tasks' signals will not be
possible).
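
The bound mentioned in 2.b can be observed from user space. The sketch below assumes that the relevant bound is the per-user RLIMIT_SIGPENDING limit; it temporarily lowers the soft limit, queues a real-time signal to the calling process until sigqueue() fails with EAGAIN, then drains the queue. The helper name count_queueable_signals is hypothetical, and a single-threaded caller is assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/resource.h>
#include <time.h>
#include <unistd.h>

/* Queue `sig` to ourselves until the kernel refuses with EAGAIN, under a
 * temporarily lowered RLIMIT_SIGPENDING. Returns how many signals were
 * accepted, or -1 on an unexpected error. */
static int count_queueable_signals(int sig, rlim_t cap)
{
	struct rlimit saved, lowered;
	struct timespec zero = { 0, 0 };
	union sigval val = { .sival_int = 0 };
	sigset_t mask;
	int queued = 0, failed = 0;

	assert(sig >= SIGRTMIN && sig <= SIGRTMAX);

	/* Keep the signal blocked so queued instances stay pending. */
	sigemptyset(&mask);
	sigaddset(&mask, sig);
	if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0 ||
	    getrlimit(RLIMIT_SIGPENDING, &saved) < 0)
		return -1;

	lowered = saved;
	if (cap < lowered.rlim_cur)
		lowered.rlim_cur = cap;
	if (setrlimit(RLIMIT_SIGPENDING, &lowered) < 0)
		return -1;

	/* Bounded loop: stop at the first refusal or shortly past the cap. */
	while ((rlim_t)queued < cap + 8) {
		if (sigqueue(getpid(), sig, val) < 0) {
			failed = (errno != EAGAIN);
			break;
		}
		queued++;
	}

	/* Drain whatever was queued, then restore the original soft limit. */
	while (sigtimedwait(&mask, NULL, &zero) == sig)
		;
	setrlimit(RLIMIT_SIGPENDING, &saved);
	return failed ? -1 : queued;
}
```

The count is charged per user rather than per process, so the number accepted may be lower than the cap if other processes of the same user already have signals pending.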

3. Suggestion: new file descriptors should not appear in destination processes
out of the blue.

3.a. To receive the signal, a process must make non-trivial preparations
(manipulate signal masks, etc.), which would only happen if certain signals
are expected.

3.b. In the present implementation, a file descriptor is only created at the
destination when the destination task explicitly elects to receive the
associated signal info with sigtimedwait/signalfd. In the absence of destination
task cooperation, the only overhead on the kernel side will be a single pair
of ref_count increment/decrement operations, that is, completely negligible.

3.c. Due to the nature of siginfo delivery, operations on the file descriptor
table are completely safe and indistinguishable from a normal dup() system call.
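
To make the preparation described in 3.a and 3.b concrete, here is a sketch of the receiving side using existing interfaces only. It carries a plain integer queued with sigqueue(), since the sendfd() payload exists only in this patch. The helper name collect_queued_value is hypothetical, and a single-threaded caller is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Block `sig`, queue it to ourselves with an integer payload, and then
 * explicitly collect it with sigtimedwait(). Returns the payload, or -1
 * on error. If `sender` is non-NULL it receives the originating pid. */
static int collect_queued_value(int sig, int payload, pid_t *sender)
{
	sigset_t mask;
	siginfo_t info;
	struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
	union sigval val = { .sival_int = payload };

	assert(sig >= SIGRTMIN && sig <= SIGRTMAX);

	/* Preparation step: without this the default action would run. */
	sigemptyset(&mask);
	sigaddset(&mask, sig);
	if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
		return -1;

	if (sigqueue(getpid(), sig, val) < 0)
		return -1;

	/* Nothing is delivered until the task elects to collect it. */
	if (sigtimedwait(&mask, &info, &timeout) != sig)
		return -1;
	if (info.si_code != SI_QUEUE)
		return -1;

	if (sender)
		*sender = info.si_pid;
	return info.si_value.sival_int;
}
```

Under the proposal, the integer collected this way would be the number of the newly installed descriptor; that behaviour is specific to the patch and is not exercised here.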

I would appreciate any additional constructive criticism, as it is in my
interest as well to end up with a safe and simple solution. However, I would
prefer the criticism to target particular technical shortcomings, and not be
derived from personal preferences, if possible.



[PATCH] syscall: introduce sendfd() syscall (v.2)

2014-12-03 Thread Alex Dubov
This patch introduces an exceptionally easy to use, low-latency and
low-overhead mechanism for transferring file descriptors between cooperating
processes:

int sendfd(pid_t pid, int sig, int fd)

Given a target process pid, sendfd() will queue a real-time signal for
delivery to the task referenced by pid. If the signal can be delivered to the
destination task and that task chooses to collect the associated signal info,
a new file descriptor will be created on its behalf, pointing to the file
originally referred to by fd (the value of the newly created file descriptor
will be communicated as an integer payload within the siginfo data).

Signed-off-by: Alex Dubov <oa...@yahoo.com>
---
 arch/x86/syscalls/syscall_32.tbl   |  2 +
 arch/x86/syscalls/syscall_64.tbl   |  1 +
 include/asm-generic/siginfo.h      |  1 +
 include/linux/syscalls.h           |  1 +
 include/uapi/asm-generic/siginfo.h |  1 +
 init/Kconfig                       | 11 +
 kernel/signal.c                    | 89 ++
 kernel/sys_ni.c                    |  3 ++
 8 files changed, 109 insertions(+)

diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 9fe1b5d..e2782bd 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -364,3 +364,5 @@
 355	i386	getrandom		sys_getrandom
 356	i386	memfd_create		sys_memfd_create
 357	i386	bpf			sys_bpf
+358	i386	sendfd			sys_sendfd
+
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 281150b..4d6b55d 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -328,6 +328,7 @@
 319	common	memfd_create		sys_memfd_create
 320	common	kexec_file_load		sys_kexec_file_load
 321	common	bpf			sys_bpf
+322	common	sendfd			sys_sendfd
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/asm-generic/siginfo.h b/include/asm-generic/siginfo.h
index 3d1a3af..c8af06f 100644
--- a/include/asm-generic/siginfo.h
+++ b/include/asm-generic/siginfo.h
@@ -12,6 +12,7 @@
 #define __SI_RT		(5 << 16)
 #define __SI_MESGQ	(6 << 16)
 #define __SI_SYS	(7 << 16)
+#define __SI_FILEP	(8 << 16)
 #define __SI_CODE(T,N)	((T) | ((N) & 0xffff))
 
 struct siginfo;
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index bda9b81..1871b72f 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -877,4 +877,5 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
 asmlinkage long sys_getrandom(char __user *buf, size_t count,
 			      unsigned int flags);
 asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
+asmlinkage long sys_sendfd(pid_t pid, int sig, int fd);
 #endif
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index ba5be7f..a92e38e 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -148,6 +148,7 @@ typedef struct siginfo {
 #define __SI_RT		0
 #define __SI_MESGQ	0
 #define __SI_SYS	0
+#define __SI_FILEP	0
 #define __SI_CODE(T,N)	(N)
 #endif
 
diff --git a/init/Kconfig b/init/Kconfig
index 2081a4d..6a62a44 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1505,6 +1505,17 @@ config SIGNALFD
 
 	  If unsure, say Y.
 
+config SENDFD
+	bool "Enable sendfd() system call" if EXPERT
+	default y
+	help
+	  Enable the sendfd() system call that allows rapid duplication
+	  of file descriptor across process boundaries. The target process
+	  will receive a duplicate file descriptor delivered with one of
+	  Posix.1b real-time signals.
+
+	  If unsure, say Y.
+
 config TIMERFD
 	bool "Enable timerfd() system call" if EXPERT
 	select ANON_INODES
diff --git a/kernel/signal.c b/kernel/signal.c
index 8f0876f..299ee9c 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -35,6 +35,11 @@
 #include <linux/cn_proc.h>
 #include <linux/compiler.h>
 
+#ifdef CONFIG_SENDFD
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#endif
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/signal.h>
 
@@ -394,8 +399,15 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t flags, int override_rlimi
 
 static void __sigqueue_free(struct sigqueue *q)
 {
+	if (q->info.si_code == __SI_FILEP) {
+		fput((struct file *)q->info.si_ptr);
+		q->info.si_code = 0;
+		q->info.si_ptr = NULL;
+	}
+
 	if (q->flags & SIGQUEUE_PREALLOC)
 		return;
+
 	atomic_dec(&q->user->sigpending);
 	free_uid(q->user);
 	kmem_cache_free(sigqueue_cachep, q);
@@ -543,6 +555,44 @@ unblock_all_signals(void)
 	spin_unlock_irqrestore(&current->sighand->siglock, flags);
 }
 
+#ifdef CONFIG_SENDFD
+
+/*
+ * sendfd_copy_install can only be reached from collect_signal(), that is from
+ * signalfd_read or sigtimedwait. 
