On Wed, Aug 10, 2016 at 05:33:44PM +0000, Mathieu Desnoyers wrote: > ----- On Aug 9, 2016, at 12:13 PM, Boqun Feng [email protected] wrote: > > <snip> > > > > > However, I'm thinking maybe we can use some tricks to avoid unnecessary > > aborts-on-preemption. > > > > First of all, I notice we haven't make any constraint on what kind of > > memory objects could be "protected" by rseq critical sections yet. And I > > think this is something we should decide before adding this feature into > > kernel. > > > > We can do some optimization if we have some constraints. For example, if > > the memory objects inside the rseq critical sections could only be > > modified by userspace programs, we therefore don't need to abort > > immediately when userspace task -> kernel task context switch. > > The rseq_owner per-cpu variable and rseq_cpu field in task_struct you > propose below would indeed take care of this scenario. > > > > > Further more, if the memory objects inside the rseq critical sections > > could only be modified by userspace programs that have registered their > > rseq structures, we don't need to abort immediately between the context > > switches between two rseq-unregistered tasks or one rseq-registered > > task and one rseq-unregistered task. > > > > Instead, we do tricks as follow: > > > > defining a percpu pointer in kernel: > > > > DEFINE_PER_CPU(struct task_struct *, rseq_owner); > > > > and a cpu field in struct task_struct: > > > > struct task_struct { > > ... > > #ifdef CONFIG_RSEQ > > struct rseq __user *rseq; > > uint32_t rseq_event_counter; > > int rseq_cpu; > > #endif > > ... > > }; > > > > (task_struct::rseq_cpu should be initialized as -1.) > > > > each time at sched out(in rseq_sched_out()), we do something like: > > > > if (prev->rseq) { > > raw_cpu_write(rseq_owner, prev); > > prev->rseq_cpu = smp_processor_id(); > > } > > > > each time sched in(in rseq_handle_notify_resume()), we do something > > like: > > > > if (current->rseq && > > (this_cpu_read(rseq_owner) != current || > > current->rseq_cpu != smp_processor_id())) > > __rseq_handle_notify_resume(regs); > > > > (Also need to modify rseq_signal_deliver() to call > > __rseq_handle_notify_resume() directly). > > > > > > I think this could save some unnecessary aborts-on-preemption, however, > > TBH, I'm too sleepy to verify every corner case. Will recheck this > > tomorrow. > > This adds extra fields to the task struct, per-cpu rseq_owner pointers, > and hooks into sched_in which are not needed otherwise, all this to > eliminate unneeded abort-on-preemption. > > If we look at the single-stepping use-case, this means that gdb would > only be able to single-step applications as long as neither itself, nor > any of its libraries, use rseq. This seems to be quite fragile. I prefer > requiring rseq users to implement a fallback to locking which progresses > in every situation rather than adding complexity and overhead trying > lessen the odds of triggering the restart. > > Simply lessening the odds of triggering the restart without a design that > ensures progress even in restart cases seems to make the lack-of-progress > problem just harder to debug when it will surface in real life. >
Fair enough.
I did my own research of the mechanism I proposed. The patch is attached
at the end of the email. Unfortunately, there is no noticeable
performance gain for the current benchmark. One possible reason may be:
The rseq critical sections in current benchmark are quite small, which
makes retrying is not that expensive.
From another angle, this may imply that in current senarios,
abort-on-preemption doesn't hurt the performance much. But these are
only my two cents.
> Thanks,
>
> Mathieu
>
---
include/linux/sched.h | 18 +++++++++++++++---
kernel/rseq.c | 2 ++
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5875fdd6edc8..c23e5dee9c60 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1922,6 +1922,7 @@ struct task_struct {
#ifdef CONFIG_RSEQ
struct rseq __user *rseq;
u32 rseq_event_counter;
+ int rseq_cpu;
#endif
/* CPU-specific state of this task */
struct thread_struct thread;
@@ -3393,6 +3394,8 @@ void cpufreq_remove_update_util_hook(int cpu);
#endif /* CONFIG_CPU_FREQ */
#ifdef CONFIG_RSEQ
+DECLARE_PER_CPU(struct task_struct *, rseq_owner);
+
static inline void rseq_set_notify_resume(struct task_struct *t)
{
if (t->rseq)
@@ -3401,7 +3404,9 @@ static inline void rseq_set_notify_resume(struct
task_struct *t)
void __rseq_handle_notify_resume(struct pt_regs *regs);
static inline void rseq_handle_notify_resume(struct pt_regs *regs)
{
- if (current->rseq)
+ if (current->rseq &&
+ (current != raw_cpu_read(rseq_owner) ||
+ current->rseq_cpu != smp_processor_id()))
__rseq_handle_notify_resume(regs);
}
/*
@@ -3415,9 +3420,11 @@ static inline void rseq_fork(struct task_struct *t,
unsigned long clone_flags)
if (clone_flags & CLONE_THREAD) {
t->rseq = NULL;
t->rseq_event_counter = 0;
+ t->rseq_cpu = -1;
} else {
t->rseq = current->rseq;
t->rseq_event_counter = current->rseq_event_counter;
+ t->rseq_cpu = -1;
rseq_set_notify_resume(t);
}
}
@@ -3428,11 +3435,16 @@ static inline void rseq_execve(struct task_struct *t)
}
static inline void rseq_sched_out(struct task_struct *t)
{
- rseq_set_notify_resume(t);
+ if (t->rseq) {
+ raw_cpu_write(rseq_owner, t);
+ t->rseq_cpu = smp_processor_id();
+ rseq_set_notify_resume(t);
+ }
}
static inline void rseq_signal_deliver(struct pt_regs *regs)
{
- rseq_handle_notify_resume(regs);
+ if (current->rseq)
+ __rseq_handle_notify_resume(regs);
}
#else
static inline void rseq_set_notify_resume(struct task_struct *t)
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 7e4d1d0e9520..0390a57ef0e5 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -100,6 +100,8 @@
* F2. Return false.
*/
+DEFINE_PER_CPU(struct task_struct *, rseq_owner);
+
/*
* The rseq_event_counter allow user-space to detect preemption and
* signal delivery. It increments at least once before returning to
--
2.9.0
signature.asc
Description: PGP signature

