Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, 3 Dec 2014 11:41:44 +0100 Richard Cochran wrote:
> On Wed, Dec 03, 2014 at 09:17:37AM +0100, Richard Weinberger wrote:
> > Come on guys, get a cup of coffee and relax a bit...
>
> I am relaxed, especially after I had a good laugh reading this:
>
> > On a less related note, I hope you will agree that the simpler
> > mechanism for this very in-demand feature is long overdue on Linux
> > (every man and his dog are passing fds around these days).
>
> Really, in years and years of unix programming, I have not yet felt
> the need to pass a file descriptor. That goes double for my dogs.

It's underused in part because you need a pointy hat to do it in Unix, but it's a very common model elsewhere.

Whether you need the syscall, or just need to write sendfd() and acceptfd() in terms of AF_UNIX sockets in a library and bury the icky bits, is another question.

I think the reality is you'd probably end up doing the library *anyway* to deal with the fact it'll be 5 or more years before sendfd percolates everywhere, even if it was merged today.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
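The library route Alan describes can be sketched with the standard SCM_RIGHTS ancillary-data mechanism over an AF_UNIX socket. The names sendfd_unix() and acceptfd_unix() are hypothetical (they are neither the proposed syscall nor an existing libc API), and MSG_CMSG_CLOEXEC is Linux-specific; it is used here so the received descriptor is close-on-exec, much like the O_CLOEXEC the patch applies.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: pass descriptor `fd` over the connected AF_UNIX
 * socket `sock` as SCM_RIGHTS ancillary data. Returns 0 on success, -1 on error. */
int sendfd_unix(int sock, int fd)
{
    char byte = 0;  /* stream sockets need at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {         /* control buffer, correctly aligned for cmsghdr */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    ssize_t n;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    do {
        n = sendmsg(sock, &msg, 0);
    } while (n < 0 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Hypothetical counterpart: receive one descriptor from `sock`.
 * Returns the new (close-on-exec) descriptor, or -1 on error. */
int acceptfd_unix(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    ssize_t n;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    do {
        n = recvmsg(sock, &msg, MSG_CMSG_CLOEXEC);
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return -1;

    /* Walk the control messages looking for the passed descriptor. */
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) {
            int fd;
            memcpy(&fd, CMSG_DATA(c), sizeof(int));
            return fd;
        }
    }
    return -1;  /* data arrived without a descriptor attached */
}
```

Typical use: create the channel with socketpair(AF_UNIX, SOCK_STREAM, 0, sv), call sendfd_unix(sv[0], fd) on one side and acceptfd_unix(sv[1]) on the other. The dummy data byte is there because unix(7) does not allow ancillary data on a stream socket without at least one byte of regular data.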
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, Dec 3, 2014 at 9:41 PM, Richard Cochran wrote:
> In any case, I find it hard to believe that the traditional method is
> really so bad. The explanation of why this new way is needed boils
> down to: "unix programming is so hard to get right."

Surely, the same can be said about any newly proposed feature.

Why do we need this new thing called the wheel? We lived 50k years without it just fine! It all boils down to: "walking with legs is so hard to get right".

:-)
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, Dec 03, 2014 at 09:17:37AM +0100, Richard Weinberger wrote:
> Come on guys, get a cup of coffee and relax a bit...

I am relaxed, especially after I had a good laugh reading this:

> On a less related note, I hope you will agree that the simpler
> mechanism for this very in-demand feature is long overdue on Linux
> (every man and his dog are passing fds around these days).

Really, in years and years of unix programming, I have not yet felt the need to pass a file descriptor. That goes double for my dogs.

In any case, I find it hard to believe that the traditional method is really so bad. The explanation of why this new way is needed boils down to: "unix programming is so hard to get right."

Thanks,
Richard
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, Dec 3, 2014 at 9:08 AM, Richard Cochran wrote:
> On Tue, Dec 02, 2014 at 10:50:46PM -0800, Eric Dumazet wrote:
>> I think I will ignore your future mails.
>
> And I won't have time to read them either, because I will be too busy
> passing fds to my two collies.

Come on guys, get a cup of coffee and relax a bit...

--
Thanks,
//richard
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Tue, Dec 02, 2014 at 10:50:46PM -0800, Eric Dumazet wrote:
> I think I will ignore your future mails.

And I won't have time to read them either, because I will be too busy passing fds to my two collies.

Cheers,
Richard
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, 2014-12-03 at 13:22 +1100, Alex Dubov wrote:

> Yours is the first insightful message in this thread. Some of the
> other commenters exhibited an unfortunate lack of understanding
> regarding what signals are and what they can be useful for.

Oh nice.

I think I will ignore your future mails.
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, 2014-12-03 at 13:11 +1100, Alex Dubov wrote:

> Kindly enlighten me: how are you going to use any file descriptor in a
> 128-thread program in a scalable way (socket and all)? How will this
> approach be different when using signalfd()?

That's the point: use a different channel (AF_UNIX socket, or AF_INET listener...) per thread.

Each thread uses epoll() on a private epoll fd, and a dedicated channel to get fds from other processes.

Sharing a signalfd() would be terrible, like using accept() on a single listener socket :(

Your proposed interface, being tied to legacy signal(s), does not allow for multiple channels.

Sorry, but using signals is simply a no-go for me.
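Eric's per-thread model can be sketched as one private channel per worker: an AF_UNIX socketpair whose receiving end is registered in that worker's own epoll instance. The struct and function names are hypothetical, SOCK_SEQPACKET is an assumed choice of socket type, and threads are omitted to keep the sketch single-threaded. In a real program each worker thread would call worker_channel_wait() on its own channel and then pull descriptors off recv_end with an SCM_RIGHTS receive.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-worker channel: nothing here is shared between workers,
 * so there is no single hot socket or queue for them to contend on. */
struct worker_channel {
    int send_end;  /* producers write (or pass fds) on this end */
    int recv_end;  /* only the owning worker reads this end */
    int epfd;      /* the worker's private epoll instance */
};

int worker_channel_init(struct worker_channel *ch)
{
    int sv[2];
    struct epoll_event ev = { .events = EPOLLIN };

    if (socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0, sv) < 0)
        return -1;
    ch->send_end = sv[0];
    ch->recv_end = sv[1];
    ev.data.fd = ch->recv_end;

    ch->epfd = epoll_create1(EPOLL_CLOEXEC);
    if (ch->epfd >= 0 &&
        epoll_ctl(ch->epfd, EPOLL_CTL_ADD, ch->recv_end, &ev) == 0)
        return 0;

    /* Roll back whatever was created before the failure. */
    if (ch->epfd >= 0)
        close(ch->epfd);
    close(sv[0]);
    close(sv[1]);
    return -1;
}

/* Wait on this worker's epoll set only.
 * Returns 1 if the channel is readable, 0 on timeout, -1 on error. */
int worker_channel_wait(struct worker_channel *ch, int timeout_ms)
{
    struct epoll_event ev;
    return epoll_wait(ch->epfd, &ev, 1, timeout_ms);
}

void worker_channel_destroy(struct worker_channel *ch)
{
    close(ch->epfd);
    close(ch->send_end);
    close(ch->recv_end);
}
```

The design choice is the one Eric contrasts with a shared signalfd(): a message for worker 2 wakes only worker 2's epoll set, so adding workers adds channels instead of contention.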
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, Dec 3, 2014 at 2:40 PM, Al Viro wrote:
> On Wed, Dec 03, 2014 at 01:22:33PM +1100, Alex Dubov wrote:
>
>> On a less related note, I hope you will agree that the simpler
>> mechanism for this very in-demand feature is long overdue on Linux
>> (every man and his dog are passing fds around these days).
>
> ... and I'm less than sure that it's a good thing. If nothing else,
> once the pieces of your program are passing descriptors around freely,
> you have created a barfball that will be impossible to split between
> several boxen if you run into scalability issues. Descriptor-passing
> is limited to a single system; you *can't* do that between e.g. components
> of a cluster. So it's not an unmixed blessing, just as overuse of
> shared memory segments, etc. They do have their uses, but that needs
> to be carefully considered every time, or you'll create a major headache
> a few years down the road.

Well, if you try hard enough, you can pass fds around the components of a cluster; Mosix was doing just that some 10 years ago. Conceptually, it's even easier than distributed shared memory, as long as mmap is not involved. :-)

I was, however, looking at it from a different standpoint. The abundance of cores in contemporary CPUs calls for locally parallel applications (and those are still the majority: clearly 90% of applications and their workloads fit just fine on a single node). Thus, any modern application developer faces the usual dilemma:

1. Go multi-threaded: easy inter-thread IPC, but lousy reliability, with minor errors in secondary tasks crashing the whole application.

2. Go multi-process: circus hoop jumping where IPC is concerned, but great reliability through OS-provided fault isolation (so even really broken stuff, like the PHP plugin for Apache, manages to perform most of the time :-)

Memfd (on its own) and eventfd are great steps in the right direction, as managing persistent shmem and sem objects was always a pain in the arse.

If there were an alternative to AF_UNIX fd passing, with its arcane API, fs persistence and mind-boggling fd recursion bugs, then option 2 would become much more attractive to developers, leading to an overall increase in application reliability and security.
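Option 2 built on one of the primitives Alex mentions might look like the minimal sketch below, assuming Linux eventfd(2). The function name and its fail_before_report switch are hypothetical; the switch simulates a secondary task dying before it reports, which terminates only the child while the parent carries on. The eventfd is anonymous and goes away with its last descriptor, so unlike a named semaphore there is no persistent object to clean up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a worker that reports `value` by adding it to an eventfd counter.
 * Returns the value the parent read, or 0 if the worker never reported
 * (so a successful report must use a non-zero value). */
uint64_t run_isolated_worker(uint64_t value, int fail_before_report)
{
    int efd = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
    if (efd < 0)
        return 0;

    pid_t pid = fork();
    if (pid < 0) {
        close(efd);
        return 0;
    }
    if (pid == 0) {
        /* Child: the "secondary task". A failure here ends only this
         * process; the parent's address space is untouched. */
        if (fail_before_report)
            _exit(1);
        ssize_t w = write(efd, &value, sizeof(value));
        _exit(w == (ssize_t)sizeof(value) ? 0 : 1);
    }

    int status;
    uint64_t got = 0;
    ssize_t r = -1;
    /* Reap the child first, then read without blocking: if it died before
     * reporting, the counter is still zero and read() fails with EAGAIN. */
    if (waitpid(pid, &status, 0) == pid)
        r = read(efd, &got, sizeof(got));
    close(efd);
    return r == (ssize_t)sizeof(got) ? got : 0;
}
```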
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, Dec 03, 2014 at 01:22:33PM +1100, Alex Dubov wrote:
> On a less related note, I hope you will agree that the simpler
> mechanism for this very in-demand feature is long overdue on Linux
> (every man and his dog are passing fds around these days).

... and I'm less than sure that it's a good thing. If nothing else, once the pieces of your program are passing descriptors around freely, you have created a barfball that will be impossible to split between several boxen if you run into scalability issues. Descriptor-passing is limited to a single system; you *can't* do that between e.g. components of a cluster. So it's not an unmixed blessing, just as overuse of shared memory segments, etc. They do have their uses, but that needs to be carefully considered every time, or you'll create a major headache a few years down the road.
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, Dec 3, 2014 at 4:00 AM, Al Viro wrote:
> On Tue, Dec 02, 2014 at 03:35:18PM +1100, Alex Dubov wrote:
>> +
>> +	if (rc < 0)
>> +		__close_fd(dst_files, s_info.si_int);
>
> Oh, lovely... And we are guaranteed that it's still the same file, because...?
>
> Not to mention anything else, this stuff violates the assumption used in a lot
> of places: that the *only* way for a process to modify a descriptor table is
> to have a reference to it obtained by something that had it as its current
> descriptor table and not dropped since then. The way you do it might actually
> turn out to be OK, but there's no way I'll take that without detailed
> analysis; start with refcounting of struct file, for one thing: it does rely
> on the assumption above in non-trivial ways.

OK, I see the problem here. This indeed requires further thought.

> And that's aside from the points other folks had brought up.

Yours is the first insightful message in this thread. Some of the other commenters exhibited an unfortunate lack of understanding regarding what signals are and what they can be useful for. Unless, of course, I have missed something important.

On a less related note, I hope you will agree that the simpler mechanism for this very in-demand feature is long overdue on Linux (every man and his dog are passing fds around these days).
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, Dec 3, 2014 at 3:42 AM, Eric Dumazet wrote:
> On Wed, 2014-12-03 at 03:23 +1100, Alex Dubov wrote:
>
> Tell me how a 128-thread program can use this new mechanism in a
> scalable way.
>
> One signal per thread?

What for? The kernel will deliver the signal only to the thread(s) that have the relevant signal unblocked (they are blocked by default).

> I guess we'll keep AF_UNIX then, thank you.

Kindly enlighten me: how are you going to use any file descriptor in a 128-thread program in a scalable way (socket and all)? How will this approach be different when using signalfd()?

And no, I'm not proposing to take your favorite toys away.
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Tue, Dec 02, 2014 at 03:35:18PM +1100, Alex Dubov wrote:
> +	dst_files = get_files_struct(dst_task);
> +	if (!dst_files) {
> +		rc = -EMFILE;
> +		goto out_put_dst_task;
> +	}
> +
> +	if (!lock_task_sighand(dst_task, &flags)) {
> +		rc = -EMFILE;
> +		goto out_put_dst_files;
> +	}
> +
> +	rlim = task_rlimit(dst_task, RLIMIT_NOFILE);
> +
> +	unlock_task_sighand(dst_task, &flags);
> +
> +	rc = __alloc_fd(dst_task->files, 0, rlim, O_CLOEXEC);
> +	if (rc < 0)
> +		goto out_put_dst_files;
> +
> +	s_info.si_int = rc;
> +
> +	get_file(src_file);
> +	__fd_install(dst_files, rc, src_file);
> +	rc = kill_pid_info(sig, &s_info, task_pid(dst_task));
> +
> +	if (rc < 0)
> +		__close_fd(dst_files, s_info.si_int);

Oh, lovely... And we are guaranteed that it's still the same file, because...?

Not to mention anything else, this stuff violates the assumption used in a lot of places: that the *only* way for a process to modify a descriptor table is to have a reference to it obtained by something that had it as its current descriptor table and not dropped since then. The way you do it might actually turn out to be OK, but there's no way I'll take that without detailed analysis; start with refcounting of struct file, for one thing: it does rely on the assumption above in non-trivial ways.

Binder, shite as it is, satisfies that assumption. Your "simpler" variant does not. Which means that you get to prove that you won't open any races around fs/file.c.

And that's aside from the points other folks had brought up.
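The hazard behind Al's "still the same file" question can be replayed in userspace as an analogy (this is not the kernel code path). A descriptor number is only a slot index, and POSIX hands out the lowest free slot, so a number remembered across a window in which the table's owner may close and reopen can name an unrelated file by the time it is used. The function below is a hypothetical, single-threaded replay of that interleaving.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>

/* Replay: the "sender" remembers slot number `slot`; the owner closes it
 * and opens something else, which lands in that same lowest-free slot;
 * then the sender's rollback close(slot) hits the owner's new file.
 * Returns the slot number the owner received for its new file. */
int replay_stale_close(int slot, int unrelated_src)
{
    close(slot);                      /* owner closes the injected fd      */
    int reused = dup(unrelated_src);  /* owner's next allocation...        */
                                      /* ...typically reuses `slot` itself */
    close(slot);                      /* sender's stale rollback by number */
    return reused;
}
```

In the test below the owner's freshly opened descriptor ends up closed underneath it, which is the kind of outcome that makes closing by slot number after the fact unsafe.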
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, 2014-12-03 at 03:23 +1100, Alex Dubov wrote:

> Same as SIGKILL. And yet, our machines are still working fine.
>
> If process A has sufficient capability to send signals to process B,
> then process B is already at its mercy, fds or not fds.

Tell me how a 128-thread program can use this new mechanism in a scalable way.

One signal per thread?

I guess we'll keep AF_UNIX then, thank you.
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, Dec 3, 2014 at 2:33 AM, Eric Dumazet wrote:
> On Wed, 2014-12-03 at 01:47 +1100, Alex Dubov wrote:
>> > User A can send fd(s) to processes belonging to user B, even if user B
>> > (probably) does not want this to happen?
>>
>> 1. Process A must have sufficient permissions to signal process B.
>> This will only happen if process A belongs to the same user as process
>> B or has elevated capabilities, which cannot appear by themselves
>> (and if root on some machine cannot be trusted, then all is lost
>> anyway).
>
> I do not see this enforced in your patch.
>
> Allowing a process to repeatedly take the lock protecting my file
> descriptor table is very scary.
>
> Reserving a slot, then undoing this if the signal failed, is a nice way to
> slow down critical programs and eventually block them from making
> progress when using file descriptors (most system calls, afaik).

Yes, this is an omission. I already promised to tighten the security in my last post. :)

>> 2. If process B has not specified explicitly how it wants the
>> particular signal to be handled, it will be killed by the default
>> handler. End of story, nothing else is going to happen.
>
> So it seems possible for an arbitrary program to send fds to innocent
> programs, which will likely fill their fd table so they won't be able to
> open a new file.
>
> This opens interesting security issues and attack vectors.

Same as SIGKILL. And yet, our machines are still working fine.

If process A has sufficient capability to send signals to process B, then process B is already at its mercy, fds or not fds.
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Wed, 2014-12-03 at 01:47 +1100, Alex Dubov wrote:
> > User A can send fd(s) to processes belonging to user B, even if user B
> > (probably) does not want this to happen?
>
> 1. Process A must have sufficient permissions to signal process B.
> This will only happen if process A belongs to the same user as process
> B or has elevated capabilities, which cannot appear by themselves
> (and if root on some machine cannot be trusted, then all is lost
> anyway).

I do not see this enforced in your patch.

Allowing a process to repeatedly take the lock protecting my file descriptor table is very scary.

Reserving a slot, then undoing this if the signal failed, is a nice way to slow down critical programs and eventually block them from making progress when using file descriptors (most system calls, afaik).

> 2. If process B has not specified explicitly how it wants the
> particular signal to be handled, it will be killed by the default
> handler. End of story, nothing else is going to happen.

So it seems possible for an arbitrary program to send fds to innocent programs, which will likely fill their fd table so they won't be able to open a new file.

This opens interesting security issues and attack vectors.
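The end state of the attack Eric describes is a full descriptor table. The sketch below does not call the proposed syscall, which exists only as the patch under review in this thread; instead a dup() loop in a forked child, with a deliberately low RLIMIT_NOFILE, stands in for injected descriptors and shows the victim's next allocation failing with EMFILE. The function name and the limit of 16 used in the test are arbitrary choices for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* In a child process, lower the soft RLIMIT_NOFILE to `limit`, then keep
 * allocating descriptors (standing in for ones injected by an unwanted
 * sender) until the table is full. Returns the child's exit status:
 * 0 if allocation finally failed with EMFILE, non-zero otherwise, or -1
 * if fork/wait failed. */
int flood_fd_table(rlim_t limit)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int p[2];
        struct rlimit rl;
        if (pipe(p) < 0 || getrlimit(RLIMIT_NOFILE, &rl) < 0)
            _exit(2);
        rl.rlim_cur = limit;  /* any soft value up to the hard limit is allowed */
        if (setrlimit(RLIMIT_NOFILE, &rl) < 0)
            _exit(2);
        for (;;) {
            if (dup(p[0]) < 0)
                _exit(errno == EMFILE ? 0 : 1);
        }
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running the flood in a child keeps the caller's own limit and table untouched; in the scenario Eric worries about, the victim has no such separation.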
Re: [PATCH 1/2] fs: introduce sendfd() syscall
> User A can send fd(s) to processes belonging to user B, even if user B
> (probably) does not want this to happen?

1. Process A must have sufficient permissions to signal process B. This will only happen if process A belongs to the same user as process B or has elevated capabilities, which cannot appear by themselves (and if root on some machine cannot be trusted, then all is lost anyway).

2. If process B has not specified explicitly how it wants the particular signal to be handled, it will be killed by the default handler. End of story, nothing else is going to happen.

I suppose I can add an extra permissions check prior to creating the new file descriptor in the first place.

> Also, relying on signals seems quite old-fashioned these days. How about
> multi-threaded programs wanting separate channels to receive fds?

Most multi-threaded programs share the same file table between all threads (unless some fancy clone() magic is involved), so the issue is rather mundane. At any rate, each thread has its own pid and the usual signal routing applies.

At a more generic level, Posix real-time signals are anything but old-fashioned. The sigqueue()/signalfd() pair provides a very convenient, low-overhead micro-messaging facility with ordered, reliable delivery. I fail to see what's wrong with making worthy use of it.
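The sigqueue()/signalfd() pairing Alex refers to can be exercised from userspace as below: a real-time signal carrying an integer payload is queued, then collected synchronously through a signalfd. This mirrors the delivery path the patch relies on, where the new descriptor number travels in si_int (ssi_int on the signalfd side). The function name is hypothetical and SIGRTMIN is an arbitrary choice of real-time signal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Queue SIGRTMIN with `payload` to this process and read it back through
 * a signalfd. Returns the received payload, or -1 on error (so -1 is not
 * usable as a payload in this sketch). */
int roundtrip_rt_payload(int payload)
{
    const int sig = SIGRTMIN;
    sigset_t mask, old;
    struct signalfd_siginfo si;
    union sigval val;
    int result = -1;

    /* Block the signal so it stays queued for the signalfd instead of
     * running a handler or the default action (termination). */
    sigemptyset(&mask);
    sigaddset(&mask, sig);
    if (sigprocmask(SIG_BLOCK, &mask, &old) < 0)
        return -1;

    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (sfd >= 0) {
        val.sival_int = payload;
        /* read() returns once the queued signal is pending. */
        if (sigqueue(getpid(), sig, val) == 0 &&
            read(sfd, &si, sizeof(si)) == (ssize_t)sizeof(si) &&
            (int)si.ssi_signo == sig && si.ssi_code == SI_QUEUE)
            result = si.ssi_int;
        close(sfd);
    }
    sigprocmask(SIG_SETMASK, &old, NULL);  /* restore the caller's mask */
    return result;
}
```

Note that the signal has to be blocked before it arrives for signalfd() to see it. As Alex points out, a process that has not arranged this is simply terminated by the default action for real-time signals.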
Re: [PATCH 1/2] fs: introduce sendfd() syscall
On Tue, 2014-12-02 at 15:35 +1100, Alex Dubov wrote:
> The present patch introduces an exceptionally easy to use, low latency and
> low overhead mechanism for transferring file descriptors between cooperating
> processes:
>
>     int sendfd(pid_t pid, int sig, int fd)
>
> Given a target process pid, the sendfd() syscall will create a duplicate
> file descriptor in the target task's (referred to by pid) file table pointing
> to the file referenced by descriptor fd. Then, it will attempt to notify the
> target task by issuing a Posix.1b real-time signal (sig), carrying the new
> file descriptor as integer payload. If the real-time signal cannot be
> enqueued at the destination signal queue, the newly created file descriptor
> will be promptly closed.
>
> Signed-off-by: Alex Dubov
> ---

User A can send fd(s) to processes belonging to user B, even if user B (probably) does not want this to happen?

Also, relying on signals seems quite old-fashioned these days. How about multi-threaded programs wanting separate channels to receive fds?

The ability to flood fds and fill the target's file descriptor table looks very dangerous to me. Some programs could break, as they expect to control fd allocation.

I like the idea of not having to use AF_UNIX and sticking to a well-defined interface, but I do not like this asynchronous model.

Thanks.