Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-05 Thread One Thousand Gnomes
On Wed, 3 Dec 2014 11:41:44 +0100
Richard Cochran wrote:

> On Wed, Dec 03, 2014 at 09:17:37AM +0100, Richard Weinberger wrote:
> > Come on guys, get a cup of coffee and relax a bit...
> 
> I am relaxed, especially after I had a good laugh reading this:
> 
>    On a less related note, I hope you will agree that the simpler
>    mechanism for this very in-demand feature is long overdue on Linux
>    (every man and his dog are passing fds around these days).
> 
> Really, in years and years of unix programming, I have not yet felt
> the need to pass a file descriptor. That goes double for my dogs.

It's underused in part because you need a pointy hat to do it in Unix, but
it's a very common model elsewhere.

Whether you need the syscall, or just need to write sendfd() and acceptfd() in
terms of AF_UNIX sockets in a library and bury the icky bits is another
question. I think the reality is you'd probably end up doing the library
*anyway* to deal with the fact it'll be 5 or more years before sendfd
percolated everywhere even if it was merged today.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
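
For illustration, the library approach Alan describes could look roughly like the sketch below. It wraps the AF_UNIX SCM_RIGHTS ancillary-data mechanism in two helpers. The names send_fd() and recv_fd() are hypothetical (they are not the proposed syscall and do not come from the thread), and the sketch assumes an already connected AF_UNIX socket between the two parties.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor; the union forces
 * the alignment that struct cmsghdr needs. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: pass descriptor fd over the connected AF_UNIX
 * socket sock. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0; /* one byte of ordinary data carries the ancillary message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from sock.
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a fresh descriptor number that refers to the same open file, much as dup() would produce within one process. A real library would probably also want to handle multiple descriptors per message and truncated control data (MSG_CTRUNC), which this sketch leaves out.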


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-03 Thread Alex Dubov
On Wed, Dec 3, 2014 at 9:41 PM, Richard Cochran wrote:
> In any case, I find it hard to believe that the traditional method is
> really so bad. The explanation of why this new way is needed boils
> down to: "unix programming is so hard to get right."


Surely, this can be said about any new feature proposed. Why do we
need this new thing called wheel? We lived 50k years without it just
fine! It all boils down to: "walking with legs is so hard to get
right". :-)


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-03 Thread Richard Cochran
On Wed, Dec 03, 2014 at 09:17:37AM +0100, Richard Weinberger wrote:
> Come on guys, get a cup of coffee and relax a bit...

I am relaxed, especially after I had a good laugh reading this:

   On a less related note, I hope you will agree that the simpler
   mechanism for this very in-demand feature is long overdue on Linux
   (every man and his dog are passing fds around these days).

Really, in years and years of unix programming, I have not yet felt
the need to pass a file descriptor. That goes double for my dogs.

In any case, I find it hard to believe that the traditional method is
really so bad. The explanation of why this new way is needed boils
down to: "unix programming is so hard to get right."

Thanks,
Richard


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-03 Thread Richard Weinberger
On Wed, Dec 3, 2014 at 9:08 AM, Richard Cochran wrote:
> On Tue, Dec 02, 2014 at 10:50:46PM -0800, Eric Dumazet wrote:
>> I think I will ignore your future mails.
>
> And I won't have time to read them either, because I will be too busy
> passing fds to my two collies.

Come on guys, get a cup of coffee and relax a bit...

-- 
Thanks,
//richard


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-03 Thread Richard Cochran
On Tue, Dec 02, 2014 at 10:50:46PM -0800, Eric Dumazet wrote:
> I think I will ignore your future mails.

And I won't have time to read them either, because I will be too busy
passing fds to my two collies.

Cheers,
Richard


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Wed, 2014-12-03 at 13:22 +1100, Alex Dubov wrote:

> Yours is the first insightful message in this thread. Some of the
> other commenters exhibited an unfortunate lack of understanding,
> regarding what signals are and what they can be useful for.

Oh nice.

I think I will ignore your future mails.




Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Wed, 2014-12-03 at 13:11 +1100, Alex Dubov wrote:

> Kindly enlighten me, how are you going to use any file descriptor in a
> 128 threads program in a scalable way (socket and all)? How this
> approach will be different when using signalfd()?

That's the point: use a separate channel (AF_UNIX socket, or AF_INET
listener...) per thread.

Each thread uses epoll() on a private epoll fd, and a dedicated channel
to get fds from other processes.

Sharing a signalfd() would be terrible, like using accept() on a single
listener socket :(

Your proposed interface, being tied to legacy signal(s), does not allow
for multiple channels.

Sorry, but using signals is simply a no go for me.
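
For illustration, the per-thread model Eric describes (a private epoll fd plus a dedicated channel for incoming fds) might be sketched as follows. The struct worker type and the worker_init(), worker_feed() and worker_wait_fd() helpers are hypothetical names chosen for this sketch, and an AF_UNIX socketpair stands in for whatever channel a real program would use. Each thread would own one struct worker and call worker_wait_fd() from its own loop; thread creation is left out to keep the example short.

```c
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical per-thread state: a private epoll instance plus a dedicated
 * AF_UNIX socketpair on which only this thread receives descriptors. */
struct worker {
    int epfd; /* private epoll fd, not shared with other threads */
    int rx;   /* this thread's receiving end */
    int tx;   /* sending end, handed to whoever feeds this thread */
};

/* One-descriptor control buffer, aligned for struct cmsghdr. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

int worker_init(struct worker *w)
{
    int sv[2];
    struct epoll_event ev;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    w->rx = sv[0];
    w->tx = sv[1];
    w->epfd = epoll_create1(0);
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = w->rx;
    if (w->epfd < 0 || epoll_ctl(w->epfd, EPOLL_CTL_ADD, w->rx, &ev) < 0) {
        if (w->epfd >= 0)
            close(w->epfd);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    return 0;
}

/* Sender side: pass fd to the worker over its dedicated channel. */
int worker_feed(const struct worker *w, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(w->tx, &msg, 0) == 1 ? 0 : -1;
}

/* Worker side: wait on the private epoll fd, then take one descriptor.
 * Returns the new fd, or -1 on timeout or error. */
int worker_wait_fd(struct worker *w, int timeout_ms)
{
    struct epoll_event ev;
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    if (epoll_wait(w->epfd, &ev, 1, timeout_ms) != 1)
        return -1;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);
    if (recvmsg(w->rx, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Because every worker has its own socket and its own epoll instance, a sender picks the destination thread simply by choosing which worker's tx end to write to, and no two threads contend on a shared receive queue.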




Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
On Wed, Dec 3, 2014 at 2:40 PM, Al Viro wrote:
> On Wed, Dec 03, 2014 at 01:22:33PM +1100, Alex Dubov wrote:
>
>> On a less related note, I hope you will agree that the simpler
>> mechanism for this very in-demand feature is long overdue on Linux
>> (every man and his dog are passing fds around these days).
>
> ... and I'm less than sure that it's a good thing.  If nothing else,
> once the pieces of your program are passing descriptors around freely,
> you have created a barfball that will be impossible to split between
> several boxen if you run into scalability issues.  Descriptor-passing
> is limited to a single system; you *can't* do that between e.g. components
> of a cluster.  So it's not an unmixed blessing, just as overuse of
> shared memory segments, etc.  They do have their uses, but that needs
> to be carefully considered every time, or you'll create a major headache
> a few years down the road.

Well, if you try hard enough, you can pass fds around the components
of the cluster - Mosix was doing just that some 10 years ago.
Conceptually, it's even easier than doing distributed shared memory,
as long as mmap is not concerned. :-)

I was, however, looking at it from a different standpoint. Abundance
of cores in the contemporary CPUs calls for locally parallel
applications (and those are still the majority - clearly 90% of the
applications and their workloads fit just fine on a single node).

Thus, any modern application developer faces the usual dilemma:

1. Go multi-threaded - easy inter-thread IPC, lousy reliability with
minor errors in secondary tasks crashing the whole application.

2. Go multi-process - circus hoop jumping when IPC is concerned, great
reliability through OS provided fault isolation (so even really broken
stuff, like PHP plugin for apache manages to perform most of the time
:-)

Memfd (on its own) and eventfd are great steps in the right direction,
as managing persistent shmem and sem objects was always a pain in the
arse. If there was an alternative to AF_UNIX fd passing, with its
arcane API, fs persistence and mind-boggling fd recursion bugs, then
option 2 would become much more attractive for developers, leading to
an overall increase in application reliability and security.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Al Viro
On Wed, Dec 03, 2014 at 01:22:33PM +1100, Alex Dubov wrote:

> On a less related note, I hope you will agree that the simpler
> mechanism for this very in-demand feature is long overdue on Linux
> (every man and his dog are passing fds around these days).

... and I'm less than sure that it's a good thing.  If nothing else,
once the pieces of your program are passing descriptors around freely,
you have created a barfball that will be impossible to split between
several boxen if you run into scalability issues.  Descriptor-passing
is limited to a single system; you *can't* do that between e.g. components
of a cluster.  So it's not an unmixed blessing, just as overuse of
shared memory segments, etc.  They do have their uses, but that needs
to be carefully considered every time, or you'll create a major headache
a few years down the road.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
On Wed, Dec 3, 2014 at 4:00 AM, Al Viro wrote:
> On Tue, Dec 02, 2014 at 03:35:18PM +1100, Alex Dubov wrote:
>> +
>> + if (rc < 0)
>> + __close_fd(dst_files, s_info.si_int);
>
> Oh, lovely...  And we are guaranteed that it is still the same file, because...?
>
> Not to mention anything else, this stuff violates the assumption used in a lot
> of places - that the *only* way for a process to modify a descriptor table is
> to have a reference to it obtained by something that had it as its current
> descriptor table and not dropped since then.  The way you do it might actually
> turn out to be OK, but there's no way I'll take that without detailed analysis;
> start with refcounting of struct file, for one thing - it does rely on the
> assumption above in non-trivial ways.

Ok, I see the problem here. This indeed requires further thought.

> And that's aside of the points other folks had brought up.

Yours is the first insightful message in this thread. Some of the
other commenters exhibited an unfortunate lack of understanding,
regarding what signals are and what they can be useful for.

Unless, of course, I have missed something important.

On a less related note, I hope you will agree that the simpler
mechanism for this very in-demand feature is long overdue on Linux
(every man and his dog are passing fds around these days).


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
On Wed, Dec 3, 2014 at 3:42 AM, Eric Dumazet wrote:
> On Wed, 2014-12-03 at 03:23 +1100, Alex Dubov wrote:
>
> Tell me how a 128 threads program can use this new mechanism in a
> scalable way.
>
> One signal per thread ?

What for?

Kernel will deliver the signal only to the thread/threads which has
the relevant signal unblocked (they are blocked by default).

>
> I guess we'll keep AF_UNIX then, thank you.

Kindly enlighten me, how are you going to use any file descriptor in a
128 threads program in a scalable way (socket and all)? How this
approach will be different when using signalfd()?

And no, I'm not proposing to take your favorite toys away.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Al Viro
On Tue, Dec 02, 2014 at 03:35:18PM +1100, Alex Dubov wrote:
> + dst_files = get_files_struct(dst_task);
> + if (!dst_files) {
> + rc = -EMFILE;
> + goto out_put_dst_task;
> + }
> +
> + if (!lock_task_sighand(dst_task, )) {
> + rc = -EMFILE;
> + goto out_put_dst_files;
> + }
> +
> + rlim = task_rlimit(dst_task, RLIMIT_NOFILE);
> +
> + unlock_task_sighand(dst_task, );
> +
> + rc = __alloc_fd(dst_task->files, 0, rlim, O_CLOEXEC);
> + if (rc < 0)
> + goto out_put_dst_files;
> +
> + s_info.si_int = rc;
> +
> + get_file(src_file);
> + __fd_install(dst_files, rc, src_file);
> + rc = kill_pid_info(sig, &s_info, task_pid(dst_task));
> +
> + if (rc < 0)
> + __close_fd(dst_files, s_info.si_int);

Oh, lovely...  And we are guaranteed that it is still the same file, because...?

Not to mention anything else, this stuff violates the assumption used in a lot
of places - that the *only* way for a process to modify a descriptor table is
to have a reference to it obtained by something that had it as its current
descriptor table and not dropped since then.  The way you do it might actually
turn out to be OK, but there's no way I'll take that without detailed analysis;
start with refcounting of struct file, for one thing - it does rely on the
assumption above in non-trivial ways.

Binder, shite as it is, satisfies that assumption.  Your "simpler" variant
does not.  Which means that you get to prove that you won't open any races
around fs/file.c.

And that's aside of the points other folks had brought up.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Wed, 2014-12-03 at 03:23 +1100, Alex Dubov wrote:

> Same as SIGKILL. And yet, our machines are still working fine.
> 
> If process A has sufficient capability to send signals to process B,
> then process B is already at its mercy, fds or not fds.

Tell me how a 128 threads program can use this new mechanism in a
scalable way.

One signal per thread ?

I guess we'll keep AF_UNIX then, thank you.




Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
On Wed, Dec 3, 2014 at 2:33 AM, Eric Dumazet wrote:
> On Wed, 2014-12-03 at 01:47 +1100, Alex Dubov wrote:
>> > User A can send fd(s) to processes belonging to user B, even if user B
>> > does (probably) not want this to happen ?
>>
>> 1. Process A must have sufficient permissions to signal process B.
>> This will only happen if process A belongs to the same user as process
>> B or has elevated capabilities, which can not appear by themselves
>> (and if root on some machine can not be trusted, then all is lost
>> anyway).
>>
>
> I do not see this enforced in your patch.
>
> Allowing a process to hold many times the lock protecting my file
> descriptor table is very scary.
>
> Reserving a slot, then undo this if the signal failed is a nice way to
> slow down critical programs and eventually block them from doing
> progress when using file descriptors (most system calls afaik)

Yes, this is an omission. I already promised to tighten the security
in my last post. :)

>> 2. If process B has not specified explicitly how it wants the
>> particular signal to be handled, it will be killed by the default
>> handler. End of story, nothing else is going to happen.
>
> So it seems possible for an arbitrary program to send fds to innocent
> programs, that will likely fill their fd table and won't be able to open
> a new file.
>
> This opens interesting security issues and attack vectors.

Same as SIGKILL. And yet, our machines are still working fine.

If process A has sufficient capability to send signals to process B,
then process B is already at its mercy, fds or not fds.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Wed, 2014-12-03 at 01:47 +1100, Alex Dubov wrote:
> > User A can send fd(s) to processes belonging to user B, even if user B
> > does (probably) not want this to happen ?
> 
> 1. Process A must have sufficient permissions to signal process B.
> This will only happen if process A belongs to the same user as process
> B or has elevated capabilities, which can not appear by themselves
> (and if root on some machine can not be trusted, then all is lost
> anyway).
> 

I do not see this enforced in your patch.

Allowing a process to hold many times the lock protecting my file
descriptor table is very scary.

Reserving a slot, then undo this if the signal failed is a nice way to
slow down critical programs and eventually block them from doing
progress when using file descriptors (most system calls afaik)


> 2. If process B has not specified explicitly how it wants the
> particular signal to be handled, it will be killed by the default
> handler. End of story, nothing else is going to happen.

So it seems possible for an arbitrary program to send fds to innocent
programs, that will likely fill their fd table and won't be able to open
a new file.

This opens interesting security issues and attack vectors.




Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
> User A can send fd(s) to processes belonging to user B, even if user B
> does (probably) not want this to happen ?

1. Process A must have sufficient permissions to signal process B.
This will only happen if process A belongs to the same user as process
B or has elevated capabilities, which can not appear by themselves
(and if root on some machine can not be trusted, then all is lost
anyway).

2. If process B has not specified explicitly how it wants the
particular signal to be handled, it will be killed by the default
handler. End of story, nothing else is going to happen.

I suppose, I can add an extra permissions check prior to creating the
new file descriptor in the first place.

> Also, relying on signals seems quite old-fashioned these days. How about
> multi-threaded programs wanting separate channels to receive fds ?

Most multi-threaded programs share the same file table between all
threads (unless some fancy clone() magic is involved), so the issue is
rather mundane. At any rate, each thread has its own pid and the usual
signal routing applies.

At a more generic level, Posix real-time signals are anything but
old-fashioned. The sigqueue()/signalfd() pair provides a very convenient,
low overhead micro-messaging facility with ordered, reliable delivery.
I fail to see what's wrong with making a worthy use of it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Tue, 2014-12-02 at 15:35 +1100, Alex Dubov wrote:
> Present patch introduces exceptionally easy to use, low latency and low
> overhead mechanism for transferring file descriptors between cooperating
> processes:
> 
> int sendfd(pid_t pid, int sig, int fd)
> 
> Given a target process pid, the sendfd() syscall will create a duplicate
> file descriptor in a target task's (referred by pid) file table pointing to
> the file referenced by descriptor fd. Then, it will attempt to notify the
> target task by issuing a Posix.1b real-time signal (sig), carrying the new
> file descriptor as integer payload. If real-time signal can not be enqueued
> at the destination signal queue, the newly created file descriptor will be
> promptly closed.
> 
> Signed-off-by: Alex Dubov <oa...@yahoo.com>
> ---

User A can send fd(s) to processes belonging to user B, even if user B
does (probably) not want this to happen ?

Also, relying on signals seems quite old-fashioned these days. How about
multi-threaded programs wanting separate channels to receive fds ?

Ability to flood fds and fill target file descriptors table looks very
dangerous to me. Some programs could break as they expect they control
fd allocations.

I like the idea of not having to use AF_UNIX and stick to a well defined
interface, but I do not like this asynchronous model.

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Tue, 2014-12-02 at 15:35 +1100, Alex Dubov wrote:
 Present patch introduces exceptionally easy to use, low latency and low
 overhead mechanism for transferring file descriptors between cooperating
 processes:
 
 int sendfd(pid_t pid, int sig, int fd)
 
 Given a target process pid, the sendfd() syscall will create a duplicate
 file descriptor in a target task's (referred by pid) file table pointing to
 the file references by descriptor fd. Then, it will attempt to notify the
 target task by issuing a Posix.1b real-time signal (sig), carrying the new
 file descriptor as integer payload. If real-time signal can not be enqueued
 at the destination signal queue, the newly created file descriptor will be
 promptly closed.
 
 Signed-off-by: Alex Dubov oa...@yahoo.com
 ---

User A can send fd(s) to processes belonging to user B, even if user B
does (probably) not want this to happen ?

Also, relying on signals seems quite old fashion these days. How about
multi-threaded programs wanting separate channels to receive fds ?

Ability to flood fds and fill target file descriptors table looks very
dangerous to me. Some programs could break as they expect they control
fd allocations.

I like the idea of not having to use AF_UNIX and stick to a well defined
interface, but I do not like this asynchronous model.

Thanks.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
 User A can send fd(s) to processes belonging to user B, even if user B
 does (probably) not want this to happen ?

1. Process A must have sufficient permissions to signal process B.
This will only happen if process A belongs to the same user as process
B or has elevated capabilities, which can not appear by themselves
(and if root on some machine can not be trusted, then all is lost
anyway).

2. If process B has not specified explicitly how it wants the
particular signal to be handled, it will be killed by the default
handler. End of story, nothing else is going to happen.

I suppose, I can add an extra permissions check prior to creating the
new file descriptor in the first place.

 Also, relying on signals seems quite old fashion these days. How about
 multi-threaded programs wanting separate channels to receive fds ?

Most multi-threaded programs share the same file table between all
threads (unless some fancy clone() magic is involved), so the issue is
rather mundane. At any rate, each thread has its own pid and the usual
signal routing applies.

At a more generic level Posix real-time signals are anything, but
old-fashioned. sigqueue()/signalfd() pair provides a very convenient,
low overhead micro-messaging facility with ordered, reliably delivery.
I fail to see what's wrong with making a worthy use of it.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Wed, 2014-12-03 at 01:47 +1100, Alex Dubov wrote:
  User A can send fd(s) to processes belonging to user B, even if user B
  does (probably) not want this to happen ?
 
 1. Process A must have sufficient permissions to signal process B.
 This will only happen if process A belongs to the same user as process
 B or has elevated capabilities, which can not appear by themselves
 (and if root on some machine can not be trusted, then all is lost
 anyway).
 

I do not see this enforced in your patch.

Allowing a process to hold many times the lock protecting my file
descriptor table is very scary.

Reserving a slot, then undo this if the signal failed is a nice way to
slow down critical programs and eventually block them from doing
progress when using file descriptors (most system calls afaik)


 2. If process B has not specified explicitly how it wants the
 particular signal to be handled, it will be killed by the default
 handler. End of story, nothing else is going to happen.

So it seems possible for an arbitrary program to send fds to innocent
programs, that will likely fill their fd table and wont be able to open
a new file.

This opens interesting security issues and attack vectors.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
On Wed, Dec 3, 2014 at 2:33 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2014-12-03 at 01:47 +1100, Alex Dubov wrote:
> > > User A can send fd(s) to processes belonging to user B, even if user B
> > > does (probably) not want this to happen ?
> >
> > 1. Process A must have sufficient permissions to signal process B.
> > This will only happen if process A belongs to the same user as process
> > B or has elevated capabilities, which can not appear by themselves
> > (and if root on some machine can not be trusted, then all is lost
> > anyway).
> >
>
> I do not see this enforced in your patch.
>
> Allowing a process to hold many times the lock protecting my file
> descriptor table is very scary.
>
> Reserving a slot, then undo this if the signal failed is a nice way to
> slow down critical programs and eventually block them from doing
> progress when using file descriptors (most system calls afaik)

Yes, this is an omission. I already promised to tighten the security
in my last post. :)

> > 2. If process B has not specified explicitly how it wants the
> > particular signal to be handled, it will be killed by the default
> > handler. End of story, nothing else is going to happen.
>
> So it seems possible for an arbitrary program to send fds to innocent
> programs, that will likely fill their fd table and won't be able to open
> a new file.
>
> This opens interesting security issues and attack vectors.

Same as SIGKILL. And yet, our machines are still working fine.

If process A has sufficient capability to send signals to process B,
then process B is already at its mercy, fds or not fds.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Wed, 2014-12-03 at 03:23 +1100, Alex Dubov wrote:

> Same as SIGKILL. And yet, our machines are still working fine.
>
> If process A has sufficient capability to send signals to process B,
> then process B is already at its mercy, fds or not fds.

Tell me how a 128 threads program can use this new mechanism in a
scalable way.

One signal per thread ?

I guess we'll keep AF_UNIX then, thank you.




Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Al Viro
On Tue, Dec 02, 2014 at 03:35:18PM +1100, Alex Dubov wrote:
> +	dst_files = get_files_struct(dst_task);
> +	if (!dst_files) {
> +		rc = -EMFILE;
> +		goto out_put_dst_task;
> +	}
> +
> +	if (!lock_task_sighand(dst_task, &flags)) {
> +		rc = -EMFILE;
> +		goto out_put_dst_files;
> +	}
> +
> +	rlim = task_rlimit(dst_task, RLIMIT_NOFILE);
> +
> +	unlock_task_sighand(dst_task, &flags);
> +
> +	rc = __alloc_fd(dst_task->files, 0, rlim, O_CLOEXEC);
> +	if (rc < 0)
> +		goto out_put_dst_files;
> +
> +	s_info.si_int = rc;
> +
> +	get_file(src_file);
> +	__fd_install(dst_files, rc, src_file);
> +	rc = kill_pid_info(sig, &s_info, task_pid(dst_task));
> +
> +	if (rc < 0)
> +		__close_fd(dst_files, s_info.si_int);

Oh, lovely...  And we are guaranteed that it's still the same file, because...?

Not to mention anything else, this stuff violates the assumption used in a lot
of places - that the *only* way for a process to modify a descriptor table is
to have a reference to it obtained by something that had it as its current
descriptor table and not dropped since then.  The way you do it might actually
turn out to be OK, but there's no way I'll take that without detailed analysis;
start with refcounting of struct file, for one thing - it does rely on the
assumption above in non-trivial ways.

Binder, shite as it is, satisfies that assumption.  Your simpler variant
does not.  Which means that you get to prove that you won't open any races
around fs/file.c.

And that's aside of the points other folks had brought up.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
On Wed, Dec 3, 2014 at 3:42 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2014-12-03 at 03:23 +1100, Alex Dubov wrote:
>
> Tell me how a 128 threads program can use this new mechanism in a
> scalable way.
>
> One signal per thread ?

What for?

Kernel will deliver the signal only to the thread/threads which has
the relevant signal unblocked (they are blocked by default).


> I guess we'll keep AF_UNIX then, thank you.

Kindly enlighten me, how are you going to use any file descriptor in a
128 threads program in a scalable way (socket and all)? How this
approach will be different when using signalfd()?

And no, I'm not proposing to take your favorite toys away.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
On Wed, Dec 3, 2014 at 4:00 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> On Tue, Dec 02, 2014 at 03:35:18PM +1100, Alex Dubov wrote:
> > +
> > +	if (rc < 0)
> > +		__close_fd(dst_files, s_info.si_int);
>
> Oh, lovely...  And we are guaranteed that it's still the same file, because...?
>
> Not to mention anything else, this stuff violates the assumption used in a lot
> of places - that the *only* way for a process to modify a descriptor table is
> to have a reference to it obtained by something that had it as its current
> descriptor table and not dropped since then.  The way you do it might actually
> turn out to be OK, but there's no way I'll take that without detailed
> analysis;
> start with refcounting of struct file, for one thing - it does rely on the
> assumption above in non-trivial ways.

Ok, I see the problem here. This indeed requires further thought.

> And that's aside of the points other folks had brought up.

Yours is the first insightful message in this thread. Some of the
other commenters exhibited an unfortunate lack of understanding,
regarding what signals are and what they can be useful for.

Unless, of course, I have missed something important.

On a less related note, I hope you will agree that the simpler
mechanism for this very in-demand feature is long overdue on Linux
(every man and his dog are passing fds around these days).


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Al Viro
On Wed, Dec 03, 2014 at 01:22:33PM +1100, Alex Dubov wrote:

> On a less related note, I hope you will agree that the simpler
> mechanism for this very in-demand feature is long overdue on Linux
> (every man and his dog are passing fds around these days).

... and I'm less than sure that it's a good thing.  If nothing else,
once the pieces of your program are passing descriptors around freely,
you have created a barfball that will be impossible to split between
several boxen if you run into scalability issues.  Descriptor-passing
is limited to a single system; you *can't* do that between e.g. components
of a cluster.  So it's not an unmixed blessing, just as overuse of
shared memory segments, etc.  They do have their uses, but that needs
to be carefully considered every time, or you'll create a major headache
a few years down the road.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Alex Dubov
On Wed, Dec 3, 2014 at 2:40 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> On Wed, Dec 03, 2014 at 01:22:33PM +1100, Alex Dubov wrote:
>
> > On a less related note, I hope you will agree that the simpler
> > mechanism for this very in-demand feature is long overdue on Linux
> > (every man and his dog are passing fds around these days).
>
> ... and I'm less than sure that it's a good thing.  If nothing else,
> once the pieces of your program are passing descriptors around freely,
> you have created a barfball that will be impossible to split between
> several boxen if you run into scalability issues.  Descriptor-passing
> is limited to a single system; you *can't* do that between e.g. components
> of a cluster.  So it's not an unmixed blessing, just as overuse of
> shared memory segments, etc.  They do have their uses, but that needs
> to be carefully considered every time, or you'll create a major headache
> a few years down the road.

Well, if you try hard enough, you can pass fds around the components
of the cluster - Mosix was doing just that some 10 years ago.
Conceptually, it's even easier than doing distributed shared memory,
as long as mmap is not concerned. :-)

I was, however, looking at it from a different standpoint. Abundance
of cores in the contemporary CPUs calls for locally parallel
applications (and those are still the majority - clearly 90% of the
applications and their workloads fit just fine on a single node).

Thus, any modern application developer faces the usual dilemma:

1. Go multi-threaded - easy inter-thread IPC, lousy reliability with
minor errors in secondary tasks crashing the whole application.

2. Go multi-process - circus hoop jumping when IPC is concerned, great
reliability through OS provided fault isolation (so even really broken
stuff, like PHP plugin for apache manages to perform most of the time
:-)

Memfd (on its own) and eventfd are great steps in the right direction,
as managing persistent shmem and sem objects was always pain in the
arse. If there was an alternative to AF_UNIX fd passing, with its
arcane API, fs persistence and mind boggling fd recursion bugs, then
option 2 would become much more attractive for developers, leading to
an overall increase in application reliability and security.


Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Wed, 2014-12-03 at 13:11 +1100, Alex Dubov wrote:

> Kindly enlighten me, how are you going to use any file descriptor in a
> 128 threads program in a scalable way (socket and all)? How this
> approach will be different when using signalfd()?

That's the point : use one different channel (AF_UNIX socket, or AF_INET
listener...) per thread.

Each thread uses epoll() on a private epoll fd, and a dedicated channel
to get fds from other processes.

Sharing a signalfd() would be terrible, like using accept() on a single
listener socket :(

Your proposed interface, being tied to legacy signal(s), does not allow
for many multiple channels.

Sorry, but using signals is simply a no go for me.
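
For readers unfamiliar with the AF_UNIX mechanism referred to above, the
descriptor-passing step on one such channel can be sketched as follows. This
is a minimal illustration using SCM_RIGHTS ancillary data, not code from
this thread; the helper names send_fd() and recv_fd() are made up here, and
the per-thread epoll loop is left out.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one descriptor, suitably aligned. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send descriptor `fd` over the connected AF_UNIX socket `sock`. 0 or -1. */
static int send_fd(int sock, int fd)
{
	char dummy = 'F';	/* one byte of ordinary data carries the cmsg */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`. Returns the new fd, or -1. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Each thread would own one end of its own socket and register it with its
private epoll fd; the receiver explicitly calls recv_fd(), so nothing lands
in its descriptor table unless it asks for it.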




Re: [PATCH 1/2] fs: introduce sendfd() syscall

2014-12-02 Thread Eric Dumazet
On Wed, 2014-12-03 at 13:22 +1100, Alex Dubov wrote:

> Yours is the first insightful message in this thread. Some of the
> other commenters exhibited an unfortunate lack of understanding,
> regarding what signals are and what they can be useful for.

Oh nice.

I think I will ignore your future mails.




[PATCH 1/2] fs: introduce sendfd() syscall

2014-12-01 Thread Alex Dubov
Present patch introduces exceptionally easy to use, low latency and low
overhead mechanism for transferring file descriptors between cooperating
processes:

int sendfd(pid_t pid, int sig, int fd)

Given a target process pid, the sendfd() syscall will create a duplicate
file descriptor in a target task's (referred by pid) file table pointing to
the file referenced by descriptor fd. Then, it will attempt to notify the
target task by issuing a Posix.1b real-time signal (sig), carrying the new
file descriptor as integer payload. If real-time signal can not be enqueued
at the destination signal queue, the newly created file descriptor will be
promptly closed.
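
The receiving side of such a notification can be sketched in user space
today. Since sendfd() is only proposed here, the sketch below substitutes
sigqueue() for the sender; what it shows is how a real-time signal's integer
payload (si_int, which would be the new descriptor number) is collected
synchronously. The function name queue_and_wait() is illustrative, and a
single-threaded caller is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/*
 * Block `sig`, queue it to ourselves with an integer payload, then collect
 * it with sigwaitinfo(). Returns the payload, or -1 on error.
 * With the proposed sendfd(), the payload would be the descriptor number
 * installed in the receiver's table.
 */
static int queue_and_wait(int sig, int payload)
{
	sigset_t set;
	siginfo_t info;
	union sigval val;

	sigemptyset(&set);
	sigaddset(&set, sig);
	/* Keep the signal pending; its default action would kill us. */
	if (sigprocmask(SIG_BLOCK, &set, NULL) < 0)
		return -1;

	val.sival_int = payload;
	if (sigqueue(getpid(), sig, val) < 0)
		return -1;

	if (sigwaitinfo(&set, &info) != sig)
		return -1;

	return info.si_value.sival_int;
}
```

For example, queue_and_wait(SIGRTMIN, 42) hands back the queued value 42.
A receiver that has neither blocked the signal nor installed a handler is
terminated by the default action instead.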

Signed-off-by: Alex Dubov <oa...@yahoo.com>
---
 fs/Makefile  |  1 +
 fs/sendfd.c  | 82 
 init/Kconfig | 11 
 3 files changed, 94 insertions(+)
 create mode 100644 fs/sendfd.c

diff --git a/fs/Makefile b/fs/Makefile
index da0bbb4..bed05a8 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -27,6 +27,7 @@ obj-$(CONFIG_ANON_INODES) += anon_inodes.o
 obj-$(CONFIG_SIGNALFD) += signalfd.o
 obj-$(CONFIG_TIMERFD)  += timerfd.o
 obj-$(CONFIG_EVENTFD)  += eventfd.o
+obj-$(CONFIG_SENDFD)   += sendfd.o
 obj-$(CONFIG_AIO)   += aio.o
 obj-$(CONFIG_FILE_LOCKING)  += locks.o
 obj-$(CONFIG_COMPAT)   += compat.o compat_ioctl.o
diff --git a/fs/sendfd.c b/fs/sendfd.c
new file mode 100644
index 0000000..1e85484
--- /dev/null
+++ b/fs/sendfd.c
@@ -0,0 +1,82 @@
+/*
+ *  fs/sendfd.c
+ *
+ *  Copyright (C) 2014 Alex Dubov <oa...@yahoo.com>
+ *
+ */
+
+#include <linux/file.h>
+#include <linux/fdtable.h>
+#include <linux/syscalls.h>
+
+SYSCALL_DEFINE3(sendfd, pid_t, pid, int, sig, int, fd)
+{
+	struct siginfo s_info = {
+		.si_signo = sig,
+		.si_errno = 0,
+		.si_code = __SI_RT
+	};
+	struct file *src_file = NULL;
+	struct task_struct *dst_task = NULL;
+	struct files_struct *dst_files  = NULL;
+	unsigned long rlim = 0;
+	unsigned long flags = 0;
+	int rc = 0;
+
+	if ((sig < SIGRTMIN) || (sig > SIGRTMAX))
+		return -EINVAL;
+
+	s_info.si_pid = task_pid_vnr(current);
+	s_info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+	s_info.si_int = -1;
+
+	src_file = fget(fd);
+	if (!src_file)
+		return -EBADF;
+
+	rcu_read_lock();
+	dst_task = find_task_by_vpid(pid);
+
+	if (!dst_task) {
+		rc = -ESRCH;
+		goto out_put_src_file;
+	}
+	get_task_struct(dst_task);
+	rcu_read_unlock();
+
+	dst_files = get_files_struct(dst_task);
+	if (!dst_files) {
+		rc = -EMFILE;
+		goto out_put_dst_task;
+	}
+
+	if (!lock_task_sighand(dst_task, &flags)) {
+		rc = -EMFILE;
+		goto out_put_dst_files;
+	}
+
+	rlim = task_rlimit(dst_task, RLIMIT_NOFILE);
+
+	unlock_task_sighand(dst_task, &flags);
+
+	rc = __alloc_fd(dst_task->files, 0, rlim, O_CLOEXEC);
+	if (rc < 0)
+		goto out_put_dst_files;
+
+	s_info.si_int = rc;
+
+	get_file(src_file);
+	__fd_install(dst_files, rc, src_file);
+	rc = kill_pid_info(sig, &s_info, task_pid(dst_task));
+
+	if (rc < 0)
+		__close_fd(dst_files, s_info.si_int);
+
+out_put_dst_files:
+	put_files_struct(dst_files);
+out_put_dst_task:
+	put_task_struct(dst_task);
+out_put_src_file:
+	fput(src_file);
+	return rc;
+}
diff --git a/init/Kconfig b/init/Kconfig
index 2081a4d..dfe8b6f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1525,6 +1525,17 @@ config EVENTFD
 
  If unsure, say Y.
 
+config SENDFD
+   bool "Enable sendfd() system call" if EXPERT
+   default y
+   help
+ Enable the sendfd() system call that allows rapid duplication
+ of file descriptor across process boundaries. The target process
+ will receive a duplicate file descriptor delivered with one of
+ Posix.1b real-time signals.
+
+ If unsure, say Y.
+
 # syscall, maps, verifier
 config BPF_SYSCALL
bool "Enable bpf() system call" if EXPERT
-- 
1.8.3.2


