On Wed, Mar 25, 2015 at 03:03:47PM -0400, Mathieu Desnoyers wrote:
> Here is an implementation of a new system call, sys_membarrier(), which
> executes a memory barrier on all threads running on the system. It is
> implemented by calling synchronize_sched(). It can be used to distribute
> the cost of user-space memory barriers asymmetrically by transforming
> pairs of memory barriers into pairs consisting of sys_membarrier() and a
> compiler barrier. For synchronization primitives that distinguish
> between read-side and write-side (e.g. userspace RCU [1], rwlocks), the
> read-side can be accelerated significantly by moving the bulk of the
> memory barrier overhead to the write-side.
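The patch's include/uapi/linux/membarrier.h contains an ordering table that captures this asymmetry. A pair of barriers is ordered when either side is sys_membarrier(), or when both sides are smp_mb(). Any pair that involves barrier() and something other than sys_membarrier() is not ordered. The C sketch below encodes that table, states the rule in closed form, and checks the rule against every cell. The enum, array and function names are illustrative only and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* The three kinds of barrier compared in the header's ordering table
 * (illustrative names, not part of the patch). */
enum barrier_kind {
	KIND_BARRIER,		/* compiler barrier, barrier() */
	KIND_SMP_MB,		/* full memory barrier, smp_mb() */
	KIND_SYS_MEMBARRIER,	/* the proposed sys_membarrier() */
};

/* The header's table, indexed [row][column]; true means "O" (ordered). */
static const bool ordering_table[3][3] = {
	/*                        barrier  smp_mb  sys_membarrier */
	[KIND_BARRIER]        = { false,   false,  true },
	[KIND_SMP_MB]         = { false,   true,   true },
	[KIND_SYS_MEMBARRIER] = { true,    true,   true },
};

/* Closed-form reading of the same table: a pair is ordered if either
 * side is sys_membarrier(), or if both sides are smp_mb(). */
static bool pair_is_ordered(enum barrier_kind a, enum barrier_kind b)
{
	if (a == KIND_SYS_MEMBARRIER || b == KIND_SYS_MEMBARRIER)
		return true;
	return a == KIND_SMP_MB && b == KIND_SMP_MB;
}

/* Returns true if the closed-form rule agrees with every table cell. */
static bool rule_matches_table(void)
{
	for (int a = 0; a < 3; a++)
		for (int b = 0; b < 3; b++)
			if (pair_is_ordered(a, b) != ordering_table[a][b])
				return false;
	return true;
}
```

The table shows that the read side can drop to a plain compiler barrier only if the write side is guaranteed to issue sys_membarrier(). When the system call is unavailable, both sides need smp_mb().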
[ . . . ]

> Signed-off-by: Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
> CC: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> CC: Josh Triplett <j...@joshtriplett.org>
> CC: KOSAKI Motohiro <kosaki.motoh...@jp.fujitsu.com>
> CC: Steven Rostedt <rost...@goodmis.org>
> CC: Nicholas Miell <nmi...@comcast.net>
> CC: Linus Torvalds <torva...@linux-foundation.org>
> CC: Ingo Molnar <mi...@redhat.com>
> CC: Alan Cox <gno...@lxorguk.ukuu.org.uk>
> CC: Lai Jiangshan <la...@cn.fujitsu.com>
> CC: Stephen Hemminger <step...@networkplumber.org>
> CC: Andrew Morton <a...@linux-foundation.org>
> CC: Thomas Gleixner <t...@linutronix.de>
> CC: Peter Zijlstra <pet...@infradead.org>
> CC: David Howells <dhowe...@redhat.com>

Reviewed-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>

> ---
>  MAINTAINERS                       |    8 ++++
>  arch/x86/syscalls/syscall_32.tbl  |    1 +
>  arch/x86/syscalls/syscall_64.tbl  |    1 +
>  include/linux/syscalls.h          |    2 +
>  include/uapi/asm-generic/unistd.h |    4 +-
>  include/uapi/linux/Kbuild         |    1 +
>  include/uapi/linux/membarrier.h   |   57 ++++++++++++++++++++++++++++
>  init/Kconfig                      |   12 ++++++
>  kernel/Makefile                   |    1 +
>  kernel/membarrier.c               |   75 +++++++++++++++++++++++++++++++++++++
>  kernel/sys_ni.c                   |    3 +
>  11 files changed, 164 insertions(+), 1 deletions(-)
>  create mode 100644 include/uapi/linux/membarrier.h
>  create mode 100644 kernel/membarrier.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d66a97d..7fbb698 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -6206,6 +6206,14 @@ W:	http://www.mellanox.com
>  Q:	http://patchwork.ozlabs.org/project/netdev/list/
>  F:	drivers/net/ethernet/mellanox/mlx4/en_*
>
> +MEMBARRIER SUPPORT
> +M:	Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
> +M:	"Paul E. McKenney" <paul...@linux.vnet.ibm.com>
> +L:	linux-kernel@vger.kernel.org
> +S:	Supported
> +F:	kernel/membarrier.c
> +F:	include/uapi/linux/membarrier.h
> +
>  MEMORY MANAGEMENT
>  L:	linux...@kvack.org
>  W:	http://www.linux-mm.org
> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
> index b3560ec..439415f 100644
> --- a/arch/x86/syscalls/syscall_32.tbl
> +++ b/arch/x86/syscalls/syscall_32.tbl
> @@ -365,3 +365,4 @@
>  356	i386	memfd_create		sys_memfd_create
>  357	i386	bpf			sys_bpf
>  358	i386	execveat		sys_execveat			stub32_execveat
> +359	i386	membarrier		sys_membarrier
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index 8d656fb..823130d 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -329,6 +329,7 @@
>  320	common	kexec_file_load		sys_kexec_file_load
>  321	common	bpf			sys_bpf
>  322	64	execveat		stub_execveat
> +323	common	membarrier		sys_membarrier
>
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index 85893d7..058ec0a 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -882,4 +882,6 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
>  			const char __user *const __user *argv,
>  			const char __user *const __user *envp, int flags);
>
> +asmlinkage long sys_membarrier(int flags);
> +
>  #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index e016bd9..8da542a 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
>  __SYSCALL(__NR_bpf, sys_bpf)
>  #define __NR_execveat 281
>  __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> +#define __NR_membarrier 282
> +__SYSCALL(__NR_membarrier, sys_membarrier)
>
>  #undef __NR_syscalls
> -#define __NR_syscalls 282
> +#define __NR_syscalls 283
>
>  /*
>   * All syscalls below here should go away really,
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index 00b10002..c5b0dbf 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -248,6 +248,7 @@ header-y += mdio.h
>  header-y += media.h
>  header-y += media-bus-format.h
>  header-y += mei.h
> +header-y += membarrier.h
>  header-y += memfd.h
>  header-y += mempolicy.h
>  header-y += meye.h
> diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
> new file mode 100644
> index 0000000..b6f8f40
> --- /dev/null
> +++ b/include/uapi/linux/membarrier.h
> @@ -0,0 +1,57 @@
> +#ifndef _UAPI_LINUX_MEMBARRIER_H
> +#define _UAPI_LINUX_MEMBARRIER_H
> +
> +/*
> + * linux/membarrier.h
> + *
> + * membarrier system call API
> + *
> + * Copyright (c) 2010, 2015 Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + */
> +
> +/*
> + * All memory accesses performed in program order from each thread on
> + * the system is guaranteed to be ordered with respect to sys_membarrier().
> + * If we use the semantic "barrier()" to represent a compiler barrier
> + * forcing memory accesses to be performed in program order across the
> + * barrier, and smp_mb() to represent explicit memory barriers forcing
> + * full memory ordering across the barrier, we have the following
> + * ordering table for each pair of barrier(), sys_membarrier() and
> + * smp_mb() :
> + *
> + * The pair ordering is detailed as (O: ordered, X: not ordered):
> + *
> + *                        barrier()   smp_mb() sys_membarrier()
> + *        barrier()          X           X            O
> + *        smp_mb()           X           O            O
> + *        sys_membarrier()   O           O            O
> + */
> +
> +/* System call membarrier "flags" argument. */
> +enum {
> +	/*
> +	 * Query whether the rest of the specified flags are supported,
> +	 * without performing synchronization.
> +	 */
> +	MEMBARRIER_QUERY = (1 << 31),
> +};
> +
> +#endif /* _UAPI_LINUX_MEMBARRIER_H */
> diff --git a/init/Kconfig b/init/Kconfig
> index 9afb971..2452f3c 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1568,6 +1568,18 @@ config PCI_QUIRKS
>  	  bugs/quirks. Disable this only if your target machine is
>  	  unaffected by PCI quirks.
>
> +config MEMBARRIER
> +	bool "Enable membarrier() system call" if EXPERT
> +	default y
> +	help
> +	  Enable the membarrier() system call that allows issuing memory
> +	  barriers across all running threads, which can be used to distribute
> +	  the cost of user-space memory barriers asymmetrically by transforming
> +	  pairs of memory barriers into pairs consisting of membarrier() and a
> +	  compiler barrier.
> +
> +	  If unsure, say Y.
> +
>  config EMBEDDED
>  	bool "Embedded system"
>  	option allnoconfig_y
> diff --git a/kernel/Makefile b/kernel/Makefile
> index a59481a..b572ced 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -95,6 +95,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o
>  obj-$(CONFIG_JUMP_LABEL) += jump_label.o
>  obj-$(CONFIG_CONTEXT_TRACKING) += context_tracking.o
>  obj-$(CONFIG_TORTURE_TEST) += torture.o
> +obj-$(CONFIG_MEMBARRIER) += membarrier.o
>
>  $(obj)/configs.o: $(obj)/config_data.h
>
> diff --git a/kernel/membarrier.c b/kernel/membarrier.c
> new file mode 100644
> index 0000000..3077e94
> --- /dev/null
> +++ b/kernel/membarrier.c
> @@ -0,0 +1,75 @@
> +/*
> + * Copyright (C) 2010, 2015 Mathieu Desnoyers <mathieu.desnoy...@efficios.com>
> + *
> + * membarrier system call
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/syscalls.h>
> +#include <linux/membarrier.h>
> +
> +static int membarrier_validate_flags(int flags)
> +{
> +	/* Check for unrecognized flags. */
> +	if (flags & ~MEMBARRIER_QUERY)
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +#ifdef CONFIG_SMP
> +
> +/*
> + * sys_membarrier - issue memory barrier on all running threads
> + * @flags: MEMBARRIER_QUERY:
> + *   Query whether the rest of the specified flags are supported,
> + *   without performing synchronization.
> + *
> + * return values: Returns -EINVAL if the flags are incorrect. Testing
> + * for kernel sys_membarrier support can be done by checking for -ENOSYS
> + * return value. Return value of 0 indicates success. For a given set
> + * of flags on a given kernel, this system call will always return the
> + * same value. It is therefore correct to check the return value only
> + * once during a process lifetime, setting MEMBARRIER_QUERY to only
> + * check if the flags are supported, without performing any
> + * synchronization.
> + *
> + * This system call executes a memory barrier on all running threads.
> + * Upon completion, the caller thread is ensured that all running
> + * threads have passed through a state where all memory accesses to
> + * user-space addresses match program order. (non-running threads are de
> + * facto in such a state.)
> + *
> + * On uniprocessor systems, this system call simply returns 0 after
> + * validating the arguments, so user-space knows it is implemented.
> + */
> +SYSCALL_DEFINE1(membarrier, int, flags)
> +{
> +	int retval;
> +
> +	retval = membarrier_validate_flags(flags);
> +	if (retval)
> +		goto end;
> +	if (unlikely(flags & MEMBARRIER_QUERY) || num_online_cpus() == 1)
> +		goto end;
> +	synchronize_sched();
> +end:
> +	return retval;
> +}
> +
> +#else /* !CONFIG_SMP */
> +
> +SYSCALL_DEFINE1(membarrier, int, flags)
> +{
> +	return membarrier_validate_flags(flags);
> +}
> +
> +#endif /* CONFIG_SMP */
> diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
> index 5adcb0a..5913b84 100644
> --- a/kernel/sys_ni.c
> +++ b/kernel/sys_ni.c
> @@ -229,3 +229,6 @@ cond_syscall(sys_bpf);
>
>  /* execveat */
>  cond_syscall(sys_execveat);
> +
> +/* membarrier */
> +cond_syscall(sys_membarrier);
> -- 
> 1.7.7.3
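The return-value comment in kernel/membarrier.c says it is correct to check for support once per process lifetime, using MEMBARRIER_QUERY. Below is a minimal user-space sketch of that query-once pattern, built on several assumptions.

* The raw system call is injected through a hypothetical function pointer, membarrier_call_fn, instead of being invoked via syscall(2). The pointer returns 0 or a negative errno, as the kernel-side code in the patch does. This keeps the example tied to the flags encoding proposed in this patch rather than to whatever a particular kernel implements.
* stub_present and stub_absent are hypothetical stand-ins for a kernel with and without the call.
* When the call is missing, the sketch falls back to a full fence on both sides, following the ordering table in the patch's header.
* __asm__ and __atomic_thread_fence are GCC/Clang extensions.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Same bit as the patch's MEMBARRIER_QUERY = (1 << 31) on a 32-bit
 * two's-complement int, spelled INT_MIN to avoid shifting into the sign bit. */
#define MEMBARRIER_QUERY INT_MIN

/* Hypothetical hook for the raw call: returns 0 or -errno, like the
 * kernel-side SYSCALL_DEFINE1(membarrier, ...) in the patch. */
typedef long (*membarrier_call_fn)(int flags);

struct membarrier_state {
	membarrier_call_fn call;
	int cached;	/* -1: not queried yet, 0: unavailable, 1: available */
};

/* Query once; the patch states the answer is fixed for a given kernel and
 * flag set, so later calls reuse the cached result. */
static bool membarrier_supported(struct membarrier_state *s)
{
	if (s->cached < 0)
		s->cached = (s->call(MEMBARRIER_QUERY) == 0);
	return s->cached == 1;
}

/* Read side: a compiler barrier suffices when the writer can use
 * sys_membarrier(); otherwise fall back to a full (smp_mb()-like) fence. */
static void reader_barrier(struct membarrier_state *s)
{
	if (membarrier_supported(s))
		__asm__ __volatile__("" ::: "memory");
	else
		__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

/* Write side: flags == 0 requests the barrier on all running threads. */
static void writer_barrier(struct membarrier_state *s)
{
	if (membarrier_supported(s))
		(void)s->call(0);
	else
		__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

/* Hypothetical stand-ins: the first mirrors the patch's flag validation,
 * the second models a kernel without the system call (-ENOSYS). */
static int stub_present_calls;

static long stub_present(int flags)
{
	stub_present_calls++;
	if (flags & ~MEMBARRIER_QUERY)
		return -EINVAL;
	return 0;
}

static long stub_absent(int flags)
{
	(void)flags;
	return -ENOSYS;
}
```

In this sketch the support check costs one call per process. After that, reader_barrier() stays a compiler barrier on every read-side pass, and the expensive work moves to writer_barrier(). This is the asymmetric split the cover letter describes.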