[PATCH 000/109] remove in-kernel calls to syscalls

2018-03-29 Thread Dominik Brodowski
[ While most parts of this patch set have been sent out already at least
  once, I send out *all* patches to lkml once again as this whole series
  touches several different subsystems in sensitive areas. ]

System calls are interaction points between userspace and the kernel.
Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy()
should only be called from userspace via the syscall table, but not from
elsewhere in the kernel.

At least on 64-bit x86, it will likely be a hard requirement from v4.17
onwards to not call system call functions in the kernel: It is better to
use a different calling convention for system calls there, where
struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
processing over to the actual syscall function. This means that only those
parameters which are actually needed for a specific syscall are passed on
during syscall entry, instead of filling in six CPU registers with random
user space content all the time (which may cause serious trouble down the
call chain).[*]
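
As a rough userspace illustration of the wrapper scheme described above (not
code from this series): struct mock_pt_regs, do_xyzzy() and mock_sys_xyzzy()
are hypothetical names, and the register layout is simplified. The wrapper
decodes only the two arguments this particular syscall needs and never
forwards the remaining registers.

```c
/* Hypothetical stand-in for the saved user registers; the real
 * struct pt_regs is arch-specific and has more fields. */
struct mock_pt_regs {
	unsigned long di, si, dx, r10, r8, r9;
};

/* The actual syscall logic: takes only the typed parameters it needs. */
static long do_xyzzy(int fd, unsigned long len)
{
	return (long)fd + (long)len;
}

/* Wrapper reached via the syscall table: decodes the registers on the
 * fly and passes on just the first two; dx, r10, r8 and r9 may hold
 * arbitrary user space content and are never looked at. */
static long mock_sys_xyzzy(const struct mock_pt_regs *regs)
{
	return do_xyzzy((int)regs->di, regs->si);
}
```

With such a layout, the result of mock_sys_xyzzy() does not depend on whatever
happens to sit in the unused registers.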

Moreover, rules on how data may be accessed may differ between kernel data
and user data.  This is another reason why calling sys_xyzzy() is
generally a bad idea, and -- at most -- acceptable in arch-specific code.


This patchset removes all in-kernel calls to syscall functions, with the
exception of those in arch/. On top of this, it cleans up the
three places where many syscalls are referenced or prototyped, namely
kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h.
Patches 1 to 101 have been sent out earlier, namely
- part 1 ( http://lkml.kernel.org/r/20180315190529.20943-1-li...@dominikbrodowski.net )
- part 2 ( http://lkml.kernel.org/r/20180316170614.5392-1-li...@dominikbrodowski.net )
- part 3 ( http://lkml.kernel.org/r/20180322090059.19361-1-li...@dominikbrodowski.net ).

Changes since these earlier versions are:

- I have added a lot more documentation and improved the commit messages,
  namely to explain the naming convention and the rationale of these
  patches.

- ACKs/Reviewed-by (thanks!) were added.

- The patches were shuffled around so that they are grouped systematically:

First goes a patch which defines the goal and explains the rationale:

  syscalls: define and explain goal to not call syscalls in the kernel

A few codepaths can trivially be converted to existing in-kernel interfaces:

  kernel: use kernel_wait4() instead of sys_wait4()
  kernel: open-code sys_rt_sigpending() in sys_sigpending()
  kexec: call do_kexec_load() in compat syscall directly
  mm: use do_futex() instead of sys_futex() in mm_release()
  x86: use _do_fork() in compat_sys_x86_clone()
  x86: remove compat_sys_x86_waitpid()

Then follow many patches which only affect specific subsystems each, and
replace sys_*() with internal helpers named __sys_*() or do_sys_*(). Let's
start with net/:

  net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscall
  net: socket: add __sys_sendto() helper; remove in-kernel call to syscall
  net: socket: add __sys_accept4() helper; remove in-kernel call to syscall
  net: socket: add __sys_socket() helper; remove in-kernel call to syscall
  net: socket: add __sys_bind() helper; remove in-kernel call to syscall
  net: socket: add __sys_connect() helper; remove in-kernel call to syscall
  net: socket: add __sys_listen() helper; remove in-kernel call to syscall
  net: socket: add __sys_getsockname() helper; remove in-kernel call to syscall
  net: socket: add __sys_getpeername() helper; remove in-kernel call to syscall
  net: socket: add __sys_socketpair() helper; remove in-kernel call to syscall
  net: socket: add __sys_shutdown() helper; remove in-kernel call to syscall
  net: socket: add __sys_setsockopt() helper; remove in-kernel call to syscall
  net: socket: add __sys_getsockopt() helper; remove in-kernel call to syscall
  net: socket: add do_sys_recvmmsg() helper; remove in-kernel call to syscall
  net: socket: move check for forbid_cmsg_compat to __sys_...msg()
  net: socket: replace calls to sys_send() with __sys_sendto()
  net: socket: replace call to sys_recv() with __sys_recvfrom()
  net: socket: add __compat_sys_recvfrom() helper; remove in-kernel call to compat syscall
  net: socket: add __compat_sys_setsockopt() helper; remove in-kernel call to compat syscall
  net: socket: add __compat_sys_getsockopt() helper; remove in-kernel call to compat syscall
  net: socket: add __compat_sys_recvmmsg() helper; remove in-kernel call to compat syscall
  net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to compat syscalls
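
The pattern behind the net/ patches listed above can be sketched in userspace
roughly like this. All mock_* names and the tiny fd table are hypothetical
and only stand in for the real __sys_listen()/sys_listen() pair; the point is
that the syscall function becomes a thin wrapper, while other kernel code
calls the typed helper directly.

```c
#include <errno.h>

#define MOCK_NR_FDS 4

/* Hypothetical per-fd state, standing in for real socket objects. */
static int mock_backlog[MOCK_NR_FDS];

/* Internal helper (the role played by __sys_listen()): an ordinary C
 * function that other kernel code is allowed to call. */
static int mock_listen_helper(int fd, int backlog)
{
	if (fd < 0 || fd >= MOCK_NR_FDS)
		return -EBADF;
	mock_backlog[fd] = backlog;
	return 0;
}

/* Syscall entry point (the role played by sys_listen()): meant to be
 * reached only through the syscall table, so it merely forwards. */
static long mock_sys_listen(int fd, int backlog)
{
	return mock_listen_helper(fd, backlog);
}

/* In-kernel caller, e.g. a multiplexing compat path: it uses the
 * helper instead of calling mock_sys_listen(). */
static long mock_compat_listen(unsigned int a0, unsigned int a1)
{
	return mock_listen_helper((int)a0, (int)a1);
}
```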

The changes in ipc/ are limited to this specific subsystem. The wrappers are
named ksys_*() to denote that these functions are meant as a drop-in replacement
for the syscalls.

  ipc: add semtimedop syscall/compat_syscall wrappers
  ipc: add semget syscall wrapper
  ipc: add semctl syscall/compat_syscall wrappers
  ipc: add msgget syscall wrap

Re: [PATCH 000/109] remove in-kernel calls to syscalls

2018-03-29 Thread Matthew Wilcox
On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> At least on 64-bit x86, it will likely be a hard requirement from v4.17
> onwards to not call system call functions in the kernel: It is better to
> use a different calling convention for system calls there, where
> struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> processing over to the actual syscall function. This means that only those
> parameters which are actually needed for a specific syscall are passed on
> during syscall entry, instead of filling in six CPU registers with random
> user space content all the time (which may cause serious trouble down the
> call chain).[*]

How do we stop new ones from springing up?  Some kind of linker trick
like was used to, er, "dissuade" people from using gets()?


Re: [PATCH 000/109] remove in-kernel calls to syscalls

2018-03-29 Thread Dominik Brodowski
On Thu, Mar 29, 2018 at 07:20:27AM -0700, Matthew Wilcox wrote:
> On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> > At least on 64-bit x86, it will likely be a hard requirement from v4.17
> > onwards to not call system call functions in the kernel: It is better to
> > use a different calling convention for system calls there, where
> > struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> > processing over to the actual syscall function. This means that only those
> > parameters which are actually needed for a specific syscall are passed on
> > during syscall entry, instead of filling in six CPU registers with random
> > user space content all the time (which may cause serious trouble down the
> > call chain).[*]
> 
> How do we stop new ones from springing up?  Some kind of linker trick
> like was used to, er, "dissuade" people from using gets()?

Once the patches which modify the syscall calling convention are merged,
it won't compile on 64-bit x86, but bark loudly. That should frighten anyone.
Meow.

Thanks,
Dominik


RE: [PATCH 000/109] remove in-kernel calls to syscalls

2018-03-29 Thread David Laight
From: Dominik Brodowski
> Sent: 29 March 2018 15:42
> On Thu, Mar 29, 2018 at 07:20:27AM -0700, Matthew Wilcox wrote:
> > On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> > > At least on 64-bit x86, it will likely be a hard requirement from v4.17
> > > onwards to not call system call functions in the kernel: It is better to
> > > > use a different calling convention for system calls there, where
> > > struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> > > processing over to the actual syscall function. This means that only those
> > > parameters which are actually needed for a specific syscall are passed on
> > > during syscall entry, instead of filling in six CPU registers with random
> > > user space content all the time (which may cause serious trouble down the
> > > call chain).[*]
> >
> > How do we stop new ones from springing up?  Some kind of linker trick
> > like was used to, er, "dissuade" people from using gets()?
> 
> Once the patches which modify the syscall calling convention are merged,
> it won't compile on 64-bit x86, but bark loudly. That should frighten anyone.
> Meow.

Should be pretty easy to ensure the prototypes aren't in any normal header.
Renaming the global symbols (to not match the function name) will make it
much harder to call them as well.

David



Re: [PATCH 000/109] remove in-kernel calls to syscalls

2018-03-29 Thread Dominik Brodowski
On Thu, Mar 29, 2018 at 02:46:44PM +0000, David Laight wrote:
> From: Dominik Brodowski
> > Sent: 29 March 2018 15:42
> > On Thu, Mar 29, 2018 at 07:20:27AM -0700, Matthew Wilcox wrote:
> > > On Thu, Mar 29, 2018 at 01:22:37PM +0200, Dominik Brodowski wrote:
> > > > At least on 64-bit x86, it will likely be a hard requirement from v4.17
> > > > onwards to not call system call functions in the kernel: It is better to
> > > > > use a different calling convention for system calls there, where
> > > > > struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands
> > > > > processing over to the actual syscall function. This means that only those
> > > > > parameters which are actually needed for a specific syscall are passed on
> > > > > during syscall entry, instead of filling in six CPU registers with random
> > > > > user space content all the time (which may cause serious trouble down the
> > > > > call chain).[*]
> > >
> > > How do we stop new ones from springing up?  Some kind of linker trick
> > > like was used to, er, "dissuade" people from using gets()?
> > 
> > Once the patches which modify the syscall calling convention are merged,
> > it won't compile on 64-bit x86, but bark loudly. That should frighten anyone.
> > Meow.
> 
> Should be pretty easy to ensure the prototypes aren't in any normal header.

That's exactly why the compile will fail.

> Renaming the global symbols (to not match the function name) will make it
> much harder to call them as well.

That still depends on the exact design of the patchset, which is still under
review.

Thanks,
Dominik