[PATCH 000/109] remove in-kernel calls to syscalls
[ While most parts of this patch set have been sent out already at least once, I send out *all* patches to lkml once again as this whole series touches several different subsystems in sensitive areas. ]

System calls are interaction points between userspace and the kernel. Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy() should only be called from userspace via the syscall table, but not from elsewhere in the kernel.

At least on 64-bit x86, it will likely be a hard requirement from v4.17 onwards to not call system call functions in the kernel: It is better to use a different calling convention for system calls there, where struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands processing over to the actual syscall function. This means that only those parameters which are actually needed for a specific syscall are passed on during syscall entry, instead of filling in six CPU registers with random user space content all the time (which may cause serious trouble down the call chain).[*]

Moreover, rules on how data may be accessed may differ between kernel data and user data. This is another reason why calling sys_xyzzy() is generally a bad idea, and -- at most -- acceptable in arch-specific code.

This patchset removes all in-kernel calls to syscall functions in the kernel with the exception of arch/. On top of this, it cleans up the three places where many syscalls are referenced or prototyped, namely kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h.

Patches 1 to 101 have been sent out earlier, namely
- part 1 ( http://lkml.kernel.org/r/20180315190529.20943-1-li...@dominikbrodowski.net )
- part 2 ( http://lkml.kernel.org/r/20180316170614.5392-1-li...@dominikbrodowski.net )
- part 3 ( http://lkml.kernel.org/r/20180322090059.19361-1-li...@dominikbrodowski.net ).
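As a rough illustration of the calling convention described above, here is a minimal userspace sketch. It is not the actual kernel code: struct pt_regs_sketch, do_xyzzy() and xyzzy_entry() are hypothetical names made up for this example, and the struct only models the six argument registers of the x86-64 syscall ABI. The point is merely that the wrapper hands on the two arguments the syscall needs and leaves the remaining register contents untouched.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's struct pt_regs: only the six
 * registers that carry syscall arguments on x86-64 are modelled.
 */
struct pt_regs_sketch {
	uint64_t di, si, dx, r10, r8, r9;
};

/* The actual syscall body: it takes exactly the two parameters it needs. */
long do_xyzzy(int fd, unsigned long flags)
{
	return (long)fd + (long)flags;
}

/*
 * Wrapper that would be reached via the syscall table: it decodes only the
 * registers that carry real arguments and never reads the other four.
 */
long xyzzy_entry(const struct pt_regs_sketch *regs)
{
	return do_xyzzy((int)regs->di, (unsigned long)regs->si);
}
```

With this shape, whatever userspace left in dx, r10, r8 and r9 never reaches do_xyzzy(), which is the property the paragraph on the new calling convention is after.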

Changes since these earlier versions are:

- I have added a lot more documentation and improved the commit messages, namely to explain the naming convention and the rationale of these patches.

- ACKs/Reviewed-by (thanks!) were added.

- Shuffled the patches around to have them grouped together systematically:

First goes a patch which defines the goal and explains the rationale:

  syscalls: define and explain goal to not call syscalls in the kernel

A few codepaths can trivially be converted to existing in-kernel interfaces:

  kernel: use kernel_wait4() instead of sys_wait4()
  kernel: open-code sys_rt_sigpending() in sys_sigpending()
  kexec: call do_kexec_load() in compat syscall directly
  mm: use do_futex() instead of sys_futex() in mm_release()
  x86: use _do_fork() in compat_sys_x86_clone()
  x86: remove compat_sys_x86_waitpid()

Then follow many patches which only affect specific subsystems each, and replace sys_*() with internal helpers named __sys_*() or do_sys_*(). Let's start with net/:

  net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscall
  net: socket: add __sys_sendto() helper; remove in-kernel call to syscall
  net: socket: add __sys_accept4() helper; remove in-kernel call to syscall
  net: socket: add __sys_socket() helper; remove in-kernel call to syscall
  net: socket: add __sys_bind() helper; remove in-kernel call to syscall
  net: socket: add __sys_connect() helper; remove in-kernel call to syscall
  net: socket: add __sys_listen() helper; remove in-kernel call to syscall
  net: socket: add __sys_getsockname() helper; remove in-kernel call to syscall
  net: socket: add __sys_getpeername() helper; remove in-kernel call to syscall
  net: socket: add __sys_socketpair() helper; remove in-kernel call to syscall
  net: socket: add __sys_shutdown() helper; remove in-kernel call to syscall
  net: socket: add __sys_setsockopt() helper; remove in-kernel call to syscall
  net: socket: add __sys_getsockopt() helper; remove in-kernel call to syscall
  net: socket: add do_sys_recvmmsg() helper; remove in-kernel call to syscall
  net: socket: move check for forbid_cmsg_compat to __sys_...msg()
  net: socket: replace calls to sys_send() with __sys_sendto()
  net: socket: replace call to sys_recv() with __sys_recvfrom()
  net: socket: add __compat_sys_recvfrom() helper; remove in-kernel call to compat syscall
  net: socket: add __compat_sys_setsockopt() helper; remove in-kernel call to compat syscall
  net: socket: add __compat_sys_getsockopt() helper; remove in-kernel call to compat syscall
  net: socket: add __compat_sys_recvmmsg() helper; remove in-kernel call to compat syscall
  net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to compat syscalls

The changes in ipc/ are limited to this specific subsystem. The wrappers are named ksys_*() to denote that these functions are meant as a drop-in replacement for the syscalls.

  ipc: add semtimedop syscall/compat_syscall wrappers
  ipc: add semget syscall wrapper
  ipc: add semctl syscall/compat_syscall wrappers
  ipc: add msgget syscall wrap
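
The helper pattern used for net/ above can be sketched roughly as follows. This is a userspace illustration under stated assumptions, not code from the series: all sketch_*() names are hypothetical, the helper body is a placeholder that simply reports the length as sent, and the only fact relied upon is that send() is equivalent to sendto() with no destination address.

```c
#include <stddef.h>

/*
 * Hypothetical internal helper, modelled on the __sys_sendto() naming
 * convention. Placeholder body: a real helper would talk to the socket
 * layer; this one just reports the whole buffer as sent.
 */
long sketch_sys_sendto(int fd, const void *buf, size_t len,
		       unsigned int flags, const void *addr, int addr_len)
{
	(void)fd; (void)buf; (void)flags; (void)addr; (void)addr_len;
	return (long)len;
}

/*
 * Syscall entry point: reduced to a thin wrapper around the helper and
 * meant to be reached from userspace only.
 */
long sketch_syscall_sendto(int fd, const void *buf, size_t len,
			   unsigned int flags, const void *addr, int addr_len)
{
	return sketch_sys_sendto(fd, buf, len, flags, addr, addr_len);
}

/*
 * In-kernel user that previously went through the syscall function: it now
 * calls the helper directly, passing no destination address.
 */
long sketch_send(int fd, const void *buf, size_t len, unsigned int flags)
{
	return sketch_sys_sendto(fd, buf, len, flags, NULL, 0);
}
```

Both the syscall entry and the in-kernel user end up in the same helper, so behaviour stays identical while the syscall function itself is no longer called from kernel code.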
Re: [PATCH 000/109] remove in-kernel calls to syscalls
On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> At least on 64-bit x86, it will likely be a hard requirement from v4.17
> onwards to not call system call functions in the kernel: It is better to
> use a different calling convention for system calls there, where
> struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> processing over to the actual syscall function. This means that only those
> parameters which are actually needed for a specific syscall are passed on
> during syscall entry, instead of filling in six CPU registers with random
> user space content all the time (which may cause serious trouble down the
> call chain).[*]

How do we stop new ones from springing up? Some kind of linker trick
like was used to, er, "dissuade" people from using gets()?
Re: [PATCH 000/109] remove in-kernel calls to syscalls
On Thu, Mar 29, 2018 at 07:20:27AM -0700, Matthew Wilcox wrote:
> On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> > At least on 64-bit x86, it will likely be a hard requirement from v4.17
> > onwards to not call system call functions in the kernel: It is better to
> > use a different calling convention for system calls there, where
> > struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> > processing over to the actual syscall function. This means that only those
> > parameters which are actually needed for a specific syscall are passed on
> > during syscall entry, instead of filling in six CPU registers with random
> > user space content all the time (which may cause serious trouble down the
> > call chain).[*]
>
> How do we stop new ones from springing up? Some kind of linker trick
> like was used to, er, "dissuade" people from using gets()?

Once the patches which modify the syscall calling convention are merged,
it won't compile on 64-bit x86, but bark loudly. That should frighten anyone.
Meow.

Thanks,
	Dominik
RE: [PATCH 000/109] remove in-kernel calls to syscalls
From: Dominik Brodowski
> Sent: 29 March 2018 15:42
> On Thu, Mar 29, 2018 at 07:20:27AM -0700, Matthew Wilcox wrote:
> > On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> > > At least on 64-bit x86, it will likely be a hard requirement from v4.17
> > > onwards to not call system call functions in the kernel: It is better to
> > > use a different calling convention for system calls there, where
> > > struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> > > processing over to the actual syscall function. This means that only those
> > > parameters which are actually needed for a specific syscall are passed on
> > > during syscall entry, instead of filling in six CPU registers with random
> > > user space content all the time (which may cause serious trouble down the
> > > call chain).[*]
> >
> > How do we stop new ones from springing up? Some kind of linker trick
> > like was used to, er, "dissuade" people from using gets()?
>
> Once the patches which modify the syscall calling convention are merged,
> it won't compile on 64-bit x86, but bark loudly. That should frighten anyone.
> Meow.

Should be pretty easy to ensure the prototypes aren't in any normal header.
Renaming the global symbols (to not match the function name) will make it
much harder to call them as well.

	David
Re: [PATCH 000/109] remove in-kernel calls to syscalls
On Thu, Mar 29, 2018 at 02:46:44PM +, David Laight wrote:
> From: Dominik Brodowski
> > Sent: 29 March 2018 15:42
> > On Thu, Mar 29, 2018 at 07:20:27AM -0700, Matthew Wilcox wrote:
> > > On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> > > > At least on 64-bit x86, it will likely be a hard requirement from v4.17
> > > > onwards to not call system call functions in the kernel: It is better to
> > > > use a different calling convention for system calls there, where
> > > > struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> > > > processing over to the actual syscall function. This means that only those
> > > > parameters which are actually needed for a specific syscall are passed on
> > > > during syscall entry, instead of filling in six CPU registers with random
> > > > user space content all the time (which may cause serious trouble down the
> > > > call chain).[*]
> > >
> > > How do we stop new ones from springing up? Some kind of linker trick
> > > like was used to, er, "dissuade" people from using gets()?
> >
> > Once the patches which modify the syscall calling convention are merged,
> > it won't compile on 64-bit x86, but bark loudly. That should frighten anyone.
> > Meow.
>
> Should be pretty easy to ensure the prototypes aren't in any normal header.

That's exactly why the compile will fail.

> Renaming the global symbols (to not match the function name) will make it
> much harder to call them as well.

That still depends on the exact design of the patchset, which is still under
review.

Thanks,
	Dominik
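
To make the two ideas in this exchange concrete (no prototype in any normal header, and a global symbol that does not match the function name), here is one possible shape as a userspace sketch. It is an assumption for illustration only and not the design under review: SKETCH_SYSCALL_DEFINE1(), struct regs_sketch and the sketch_impl_/sketch_entry_ prefixes are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical register block; only the first argument register is modelled. */
struct regs_sketch {
	uint64_t di;
};

/*
 * Hypothetical definition macro for a one-argument syscall. The body ends up
 * in a static function, so it is invisible outside this file, and the only
 * global symbol is sketch_entry_<name>(), which takes the register block
 * rather than the decoded argument.
 */
#define SKETCH_SYSCALL_DEFINE1(name, type, arg)				\
	static long sketch_impl_##name(type arg);			\
	long sketch_entry_##name(const struct regs_sketch *regs)	\
	{								\
		return sketch_impl_##name((type)regs->di);		\
	}								\
	static long sketch_impl_##name(type arg)

/* Example syscall: no function called sys_xyzzy() is defined anywhere. */
SKETCH_SYSCALL_DEFINE1(xyzzy, int, value)
{
	return (long)value * 2;
}
```

Under these assumptions, code in another file that tries to call sys_xyzzy(21) finds neither a prototype nor a symbol of that name, so the build fails instead of silently calling into the syscall.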