Re: [RFC PATCH] thread_local_abi system call: caching current CPU number (x86)
On Fri, Jul 17, 2015 at 10:49:19AM, Ben Maurer wrote:
> Mathieu Desnoyers wrote:
> > Expose a new system call allowing threads to register a userspace memory
> > area where to store the current CPU number. Scheduler migration sets the
>
> I really like that this approach makes it easier to add a per-thread
> interaction between userspace and the kernel in the future.
>
> >+ if (!tlap || t->thread_local_abi_len <
> >+     offsetof(struct thread_local_abi, cpu)
> >+     + sizeof(tlap->cpu))
>
> Could you save a branch here by enforcing that thread_local_abi_len = 0 if
> thread_local_abi = null?

"Saving a branch" doesn't seem like a good reason to do that; however,
it *is* the convention across other calls: if you pass 0, the pointer is
ignored, but if you pass non-zero, the pointer must be valid or you get
-EFAULT (or an actual segfault).

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
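For readers following the length/pointer discussion, here is a minimal userspace sketch of the invariant being proposed: if a NULL area always implies a stored length of 0, the "is the cpu field usable" test reduces to a single length comparison. Everything here is an assumption for illustration: `struct tla_model`, `struct reg_state`, `reg_set()` and `cpu_field_active()` are hypothetical names, the `int32_t` type of `cpu` is a guess, and none of this is the kernel code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's struct; only "cpu" is named in the thread. */
struct tla_model {
    int32_t cpu; /* field type is an assumption */
};

/* Hypothetical model of the two per-task fields added by the patch. */
struct reg_state {
    struct tla_model *area;
    size_t len;
};

/* Register or unregister an area; a NULL area forces len to 0 (the invariant). */
static void reg_set(struct reg_state *st, struct tla_model *area, size_t len)
{
    assert(st != NULL);
    st->area = area;
    st->len = area ? len : 0;
}

/* With the invariant in place, no separate NULL test is needed here. */
static int cpu_field_active(const struct reg_state *st)
{
    /* sizeof does not evaluate its operand, so a NULL area is harmless. */
    return st->len >= offsetof(struct tla_model, cpu) + sizeof(st->area->cpu);
}
```

Calling `reg_set(&st, NULL, n)` for any `n` leaves `cpu_field_active(&st)` false, which is the property that lets the `!tlap ||` half of the quoted condition go away.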
Re: [RFC PATCH] thread_local_abi system call: caching current CPU number (x86)
On Jul 17, 2015, at 8:48 AM, Nikolay Borisov n.bori...@siteground.com wrote:

> On 07/16/2015 11:00 PM, Mathieu Desnoyers wrote:
>> Expose a new system call allowing threads to register a userspace memory
>> area where to store the current CPU number. Scheduler migration sets the
>> TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space,
>> a notify-resume handler updates the current CPU value within that
>> user-space memory area.
>>
>> This getcpu cache is an alternative to the sched_getcpu() vdso which has
>> a few benefits:
>> - It is faster to do a memory read than to call a vDSO,
>> - This cache value can be read from within an inline assembly, which
>>   makes it a useful building block for restartable sequences.
>>
>> This approach is inspired by Paul Turner and Andrew Hunter's work
>> on percpu atomics, which lets the kernel handle restart of critical
>> sections:
>> Ref.:
>> * https://lkml.org/lkml/2015/6/24/665
>> * https://lwn.net/Articles/650333/
>> * http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
>>
>> Benchmarking sched_getcpu() vs tls cache approach. Getting the
>> current CPU number:
>>
>> - With Linux vdso:            12.7 ns
>> - With TLS-cached cpu number:  0.3 ns
>>
>> The system call can be extended by registering a larger structure in
>> the future.
>> [...]
>> +/*
>> + * sys_thread_local_abi - setup thread-local ABI for caller thread
>> + */
>> +SYSCALL_DEFINE3(thread_local_abi, struct thread_local_abi __user *, tlap,
>> +		size_t, len, int, flags)
>> +{
>> +	size_t minlen;
>> +
>> +	if (flags)
>> +		return -EINVAL;
>> +	if (current->thread_local_abi && tlap)
>> +		return -EBUSY;
>> +	/* Agree on the intersection of userspace and kernel features */
>> +	minlen = min_t(size_t, len, sizeof(struct thread_local_abi));
>> +	current->thread_local_abi_len = minlen;
>> +	current->thread_local_abi = tlap;
>> +	if (!tlap)
>> +		return 0;
>> +	/*
>> +	 * Migration checks ->thread_local_abi to see if notify_resume
>> +	 * flag should be set. Therefore, we need to ensure that
>> +	 * the scheduler sees ->thread_local_abi before we update its content.
>> +	 */
>> +	barrier();	/* Store thread_local_abi before update content */
>> +	if (getcpu_cache_active(current)) {
>
> Just checking whether my understanding of the code is correct, but this
> 'if' is necessary in case we have been moved to a different CPU after
> the store of the thread_local_abi?

No, this is not correct. Currently, only the getcpu_cache feature is
implemented, but if struct thread_local_abi eventually grows with more
fields, userspace could call the kernel with a "len" argument that does
not cover some of the features. Therefore, the generic way to check
whether getcpu_cache is enabled for the current thread is to call
"getcpu_cache_active()". If it is enabled, then we need to update the
getcpu_cache content for the current thread.

The barrier() above is required because thread_local_abi (and
thread_local_abi_len) must be stored before we read the current CPU
number and store it into the getcpu_cache: with CONFIG_PREEMPT=y, the
scheduler could migrate us at any point between the moment we read the
current CPU number within getcpu_cache_update() and the return to
userspace.
Having thread_local_abi and thread_local_abi_len set before fetching the
current CPU number ensures that the scheduler's own
getcpu_cache_active() check succeeds, so it raises the resume notifier
flag upon migration, which then fixes the CPU number before resuming
userspace.

Thanks,

Mathieu

>
>> +		if (getcpu_cache_update(current))
>> +			return -EFAULT;
>> +	}
>> +	return minlen;
>> +}

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
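The length negotiation described in this reply can be sketched in userspace as follows. This is only a model under stated assumptions: `struct tla_v1`, `struct tla_v2`, the field `hypothetical_extra`, and the helpers `negotiate_len()` and `field_covered()` are invented for illustration, and the `int32_t` field types are guesses. The patch itself only defines the `cpu` field and uses `min_t()` in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts: v1 carries only cpu; v2 appends a made-up field. */
struct tla_v1 {
    int32_t cpu;
};

struct tla_v2 {
    int32_t cpu;
    int32_t hypothetical_extra; /* not part of the patch */
};

/* Model of "agree on the intersection": the shorter of the two lengths wins. */
static size_t negotiate_len(size_t user_len, size_t kernel_len)
{
    return user_len < kernel_len ? user_len : kernel_len;
}

/* A feature is usable only if the agreed length covers its whole field. */
static int field_covered(size_t agreed_len, size_t offset, size_t size)
{
    assert(size > 0);
    return agreed_len >= offset + size;
}
```

In this model, a program built against `struct tla_v2` talking to a kernel that only knows `struct tla_v1` gets back the v1 length, so the `cpu` check passes and the check for the later field fails. That is the case the `getcpu_cache_active()` test is meant to handle generically.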
Re: [RFC PATCH] thread_local_abi system call: caching current CPU number (x86)
On Jul 17, 2015, at 6:49 AM, Ben Maurer bmau...@fb.com wrote:

> Mathieu Desnoyers wrote:
>> Expose a new system call allowing threads to register a userspace memory
>> area where to store the current CPU number. Scheduler migration sets the
>
> I really like that this approach makes it easier to add a per-thread
> interaction between userspace and the kernel in the future.
>
>>+ if (!tlap || t->thread_local_abi_len <
>>+     offsetof(struct thread_local_abi, cpu)
>>+     + sizeof(tlap->cpu))
>
> Could you save a branch here by enforcing that thread_local_abi_len = 0 if
> thread_local_abi = null?

Yes, good idea! Will do.

Thanks!

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Re: [RFC PATCH] thread_local_abi system call: caching current CPU number (x86)
On 07/16/2015 11:00 PM, Mathieu Desnoyers wrote:
> Expose a new system call allowing threads to register a userspace memory
> area where to store the current CPU number. Scheduler migration sets the
> TIF_NOTIFY_RESUME flag on the current thread. Upon return to user-space,
> a notify-resume handler updates the current CPU value within that
> user-space memory area.
>
> This getcpu cache is an alternative to the sched_getcpu() vdso which has
> a few benefits:
> - It is faster to do a memory read than to call a vDSO,
> - This cache value can be read from within an inline assembly, which
>   makes it a useful building block for restartable sequences.
>
> This approach is inspired by Paul Turner and Andrew Hunter's work
> on percpu atomics, which lets the kernel handle restart of critical
> sections:
> Ref.:
> * https://lkml.org/lkml/2015/6/24/665
> * https://lwn.net/Articles/650333/
> * http://www.linuxplumbersconf.org/2013/ocw/system/presentations/1695/original/LPC%20-%20PerCpu%20Atomics.pdf
>
> Benchmarking sched_getcpu() vs tls cache approach. Getting the
> current CPU number:
>
> - With Linux vdso:            12.7 ns
> - With TLS-cached cpu number:  0.3 ns
>
> The system call can be extended by registering a larger structure in
> the future.
>
> Signed-off-by: Mathieu Desnoyers
> CC: Paul Turner
> CC: Andrew Hunter
> CC: Peter Zijlstra
> CC: Ingo Molnar
> CC: Ben Maurer
> CC: Steven Rostedt
> CC: "Paul E. McKenney"
> CC: Josh Triplett
> CC: Linus Torvalds
> CC: Andrew Morton
> CC: linux-...@vger.kernel.org
> ---
>  arch/x86/kernel/signal.c              |  2 +
>  arch/x86/syscalls/syscall_64.tbl      |  1 +
>  fs/exec.c                             |  1 +
>  include/linux/sched.h                 | 35 ++
>  include/uapi/asm-generic/unistd.h     |  4 +-
>  include/uapi/linux/Kbuild             |  1 +
>  include/uapi/linux/thread_local_abi.h | 37 ++
>  init/Kconfig                          |  9
>  kernel/Makefile                       |  1 +
>  kernel/fork.c                         |  2 +
>  kernel/sched/core.c                   |  4 ++
>  kernel/sched/sched.h                  |  2 +
>  kernel/sys_ni.c                       |  3 ++
>  kernel/thread_local_abi.c             | 90 +++
>  14 files changed, 191 insertions(+), 1 deletion(-)
>  create mode 100644 include/uapi/linux/thread_local_abi.h
>  create mode 100644 kernel/thread_local_abi.c
>
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index e504246..157cec0 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -750,6 +750,8 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags)
>  	if (thread_info_flags & _TIF_NOTIFY_RESUME) {
>  		clear_thread_flag(TIF_NOTIFY_RESUME);
>  		tracehook_notify_resume(regs);
> +		if (getcpu_cache_active(current))
> +			getcpu_cache_handle_notify_resume(current);
>  	}
>  	if (thread_info_flags & _TIF_USER_RETURN_NOTIFY)
>  		fire_user_return_notifiers();
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index 8d656fb..0eb2fc2 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -329,6 +329,7 @@
>  320	common	kexec_file_load		sys_kexec_file_load
>  321	common	bpf			sys_bpf
>  322	64	execveat		stub_execveat
> +323	common	thread_local_abi	sys_thread_local_abi
>
>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/exec.c b/fs/exec.c
> index c7f9b73..e5acf80 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1555,6 +1555,7 @@ static int do_execveat_common(int fd, struct filename *filename,
>  	/* execve succeeded */
>  	current->fs->in_exec = 0;
>  	current->in_execve = 0;
> +	thread_local_abi_execve(current);
>  	acct_update_integrals(current);
>  	task_numa_free(current);
>  	free_bprm(bprm);
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a419b65..4a3fc52 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2,6 +2,7 @@
>  #define _LINUX_SCHED_H
>
>  #include <uapi/linux/sched.h>
> +#include <uapi/linux/thread_local_abi.h>
>
>  #include <linux/sched/prio.h>
>
> @@ -1710,6 +1711,10 @@ struct task_struct {
>  #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
>  	unsigned long	task_state_change;
>  #endif
> +#ifdef CONFIG_THREAD_LOCAL_ABI
> +	size_t thread_local_abi_len;
> +	struct thread_local_abi __user *thread_local_abi;
> +#endif
>  };
>
>  /* Future-safe accessor for struct task_struct's cpus_allowed. */
> @@ -3090,4 +3095,34 @@ static inline unsigned long rlimit_max(unsigned int limit)
>  	return task_rlimit_max(current, limit);
>  }
>
> +#ifdef CONFIG_THREAD_LOCAL_ABI
> +void thread_local_abi_fork(struct task_struct *t);
> +void thread_local_abi_execve(struct task_struct *t);
> +void
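To make the intended userspace fast path concrete, here is a hedged sketch of how a program might read the cached CPU number, falling back to a slower query when no value has been published. It does not invoke the proposed system call: the thread-local `tla_area`, the `-1` "not filled in" marker, `struct tla_model`, `current_cpu()` and `fake_slow_path()` are all assumptions made for this example, and in the RFC the kernel, not the program, would write the `cpu` field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the registered area; field type is a guess. */
struct tla_model {
    int32_t cpu;
};

/* One area per thread; -1 means "nothing published yet" in this sketch only. */
static _Thread_local struct tla_model tla_area = { .cpu = -1 };

typedef int (*cpu_fallback_fn)(void);

/* Stand-in slow path so the sketch builds anywhere; real code would pass sched_getcpu. */
static int fake_slow_path(void)
{
    return 7;
}

/* Fast path: a single memory read. Slow path: only when no value is cached. */
static int current_cpu(cpu_fallback_fn fallback)
{
    int32_t cached;

    assert(fallback != NULL);
    /* The volatile read models a value another agent (here, the kernel) may update. */
    cached = *(volatile int32_t *)&tla_area.cpu;
    return cached >= 0 ? cached : fallback();
}
```

On Linux with glibc, the fallback argument would typically be `sched_getcpu` (declared in `<sched.h>` under `_GNU_SOURCE`). It is passed in as a parameter here only to keep the example portable.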
RE: [RFC PATCH] thread_local_abi system call: caching current CPU number (x86)
Mathieu Desnoyers wrote:
> Expose a new system call allowing threads to register a userspace memory
> area where to store the current CPU number. Scheduler migration sets the

I really like that this approach makes it easier to add a per-thread
interaction between userspace and the kernel in the future.

>+ if (!tlap || t->thread_local_abi_len <
>+     offsetof(struct thread_local_abi, cpu)
>+     + sizeof(tlap->cpu))

Could you save a branch here by enforcing that thread_local_abi_len = 0 if
thread_local_abi = null?

-b