[PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction
Suppose that we want to detach the engine and free engine-data. To avoid the races with our calbacks which can use -data we should do either err = utrace_set_events(0); if (err == ...) utrace_barrier(); utrace_control(DETACH); or err = utrace_control(DETACH); if (err = ...) utrace_barrier(); Neither variant works if we should care about report_quiesce() or report_death(). Let's discuss the 2nd case, the 1st one have the similar problems. The problem is, utrace_control(DETACH) does nothing and returns -EALREADY if utrace-death is set, this is not right. We can and should detach in this case, we only should skip utrace_reset() to avoid the race with utrace_report_death()-REPORT_CALLBACKS(). This was the main problem I hit during the testing. Consider the exiting tracee with the valid engine-ops, suppose that utrace_control(DETACH) is called right after utrace_report_death() unlocks utrace-lock and before it does REPORT_CALLBACKS(). utrace_control() fails, but utrace_barrier() does not help after that. It takes get_utrace_lock() successfully and returns 0. The caller frees engine-data, but utrace_report_death() calls -report_death() after that. Kill utrace_control_dead(). Change utrace_control() to always allow UTRACE_DETACH if engine has the valid -ops. This means that mark_engine_detached() can race with REPORT_CALLBACKS() but there is nothing new. utrace_do_stop() is unnecessary and pointless in this case, but it doesn't hurt. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c | 49 ++--- 1 file changed, 10 insertions(+), 39 deletions(-) --- kstub/kernel/utrace.c~6_fix_control_dead2010-08-16 11:17:05.0 +0200 +++ kstub/kernel/utrace.c 2010-08-16 11:17:51.0 +0200 @@ -928,37 +928,6 @@ void utrace_maybe_reap(struct task_struc } } -/* - * You can't do anything to a dead task but detach it. - * If release_task() has been called, you can't do that. - * - * On the exit path, DEATH and QUIESCE event bits are set only - * before utrace_report_death() has taken the lock. At that point, - * the death report will come soon, so disallow detach until it's - * done. This prevents us from racing with it detaching itself. - * - * Called only when @target-exit_state is nonzero. - */ -static inline int utrace_control_dead(struct task_struct *target, - struct utrace *utrace, - enum utrace_resume_action action) -{ - lockdep_assert_held(utrace-lock); - - if (action != UTRACE_DETACH || unlikely(utrace-reap)) - return -ESRCH; - - if (unlikely(utrace-death)) - /* -* We have already started the death report. We can't -* prevent the report_death and report_reap callbacks, -* so tell the caller they will happen. -*/ - return -EALREADY; - - return 0; -} - /** * utrace_control - control a thread being traced by a tracing engine * @target:thread to affect @@ -1115,18 +1084,20 @@ int utrace_control(struct task_struct *t ret = 0; /* -* -exit_state can change under us, this doesn't matter. -* We do not care about -exit_state in fact, but we do -* care about -reap and -death. If either flag is set, -* we must also see -exit_state != 0. +* -exit_state can change under us, this doesn't matter. But, +* if utrace-death is set we must also see -exit_state != 0, +* this is what we care about. */ if (unlikely(target-exit_state)) { - ret = utrace_control_dead(target, utrace, action); - if (ret) { + /* You can't do anything to a dead task but detach it */ + if (action != UTRACE_DETACH) { spin_unlock(utrace-lock); - return ret; + return -ESRCH; } - reset = true; + + /* If it is not set, we can safely do utrace_reset() */ + if (likely(!utrace-death)) + reset = true; } switch (action) {
[PATCH 3/3] utrace_set_events: kill now unnecessary exit_state/reap checks
Now that get_utrace_lock() can never succeed if utrace-reap is true, we can kill this check in utrace_set_events(). This also means we can kill the target-exit_state check, it buys nothing now. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c | 13 - 1 file changed, 4 insertions(+), 9 deletions(-) --- kstub/kernel/utrace.c~7_set_events_kill_reap2010-08-16 11:17:51.0 +0200 +++ kstub/kernel/utrace.c 2010-08-16 11:28:00.0 +0200 @@ -539,15 +539,10 @@ int utrace_set_events(struct task_struct old_utrace_flags = target-utrace_flags; old_flags = engine-flags ~ENGINE_STOP; - /* If -death or -reap is true we must see exit_state != 0. */ - if (target-exit_state) { - if (utrace-death) { - if ((old_flags ~events) _UTRACE_DEATH_EVENTS) - goto unlock; - } else if (utrace-reap) { - if ((old_flags ^ events) UTRACE_EVENT(REAP)) - goto unlock; - } + if ((old_flags ~events) _UTRACE_DEATH_EVENTS) { + /* Too late if utrace_report_death() is in progress */ + if (utrace-death) + goto unlock; } /*
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 08/16, Oleg Nesterov wrote: Well, I tried to test these changes, seems to work. But... From 2/3 changelog: This means that mark_engine_detached() can race with REPORT_CALLBACKS() but there is nothing new. Yes, there is nothing new. But, Roland, this is WRONG! With or without these fixes utrace_control()-mark_engine_detached() was always wrong if it races with REPORT/REPORT_CALLBACKS. Not sure what we were thinking about :/ Look. Suppose that utrace_control(DETACH) is called right after start_callback() returns ops != NULL. After that finish_callback(..., ops-callback(...)) can hit -callback == NULL! Cough. I guess I spent too much time with utrace and testing today ;) Let me try again. mark_engine_detached() does engine-ops = utrace_detached_ops; smp_wmb(); engine-flags = UTRACE_EVENT(QUIESCE); Whatever we do, start_callback() can see the old engine-flags but the new -ops = utrace_detached_ops. Just suppose that the caller of UTRACE_DETACH is interrupted right after setting engine-ops. The obvious and simplest solution: utrace_detached_ops{} should have the dummy handler for every callback. But I'd like to think a bit more. Yes. Perhaps (unlikely) we can reverse the order, but I can no longer think properly today. Or do you think I miss something and this is false alarm? Oleg.
Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks
Hmm. For unknown reason I do not see this 2/4 patch on utrace-devel, strange. So I am attaching it in case my email was really lost and you didn't get it. Indeed, it did not come through to me or the list. Thanks, Roland
Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks
utrace_set_events() checks the setting _UTRACE_DEATH_EVENTS case twice, and it is not immediately obvious why the first check is needed, and why it is not racy (we are checking exit_state without tasklist). The code is correct, but looks confusing. More comments are always good. In short: this double check allows to avoid tasklist when utrace_flags already has these bits while engine-flags doesn't. But multitracing is unlikely case, in the likely case if we add _UTRACE_DEATH_EVENTS to engine-flags we set these bits in utrace_flags too. I think it makes sense to consolidate these checks to make the code a bit more understandable. I don't really agree about unlikely case. In many uses, the systemtap task-finder will have a utrace engine on every task in the system, for example. Moreover, this is an importantly distinct particular kind of micro-optimization to be doing here: avoiding a system-wide lock. Any place that we take tasklist_lock at all, we are introducing a system-wide slowdown or limitation on scaling, just because of our tracing of one task. So optimizing the minimize that is qualitatively different and really much more important than avoiding taking the utrace lock, for example. But with that note in your mind, I am happy to take this patch now as a simplification and let reoptimization come back later. Thanks, Roland
Re: [PATCH 1/3] get_utrace_lock() must not succeed if utrace-reap == T
get_utrace_lock() must threat utrace-reap == T as engine-ops == NULL. Yes, I think you're right. This requires some changes to the kerneldoc and utrace.tmpl, because it now says that you get EALREADY if report_reap is already running. Now it will be consistent with utrace_control, where you get ESRCH either if report_reap might already be running or if it's entirely detached and/or reaped. Thanks, Roland
Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction
The problem is, utrace_control(DETACH) does nothing and returns -EALREADY if utrace-death is set, this is not right. We can and should detach in this case, we only should skip utrace_reset() to avoid the race with utrace_report_death()-REPORT_CALLBACKS(). This behavior is the original (minimal) synchronization scheme from before we had utrace_barrier. See Interlock with final callbacks in Documentation/DocBook/utrace.tmpl. The expectation is that your report_death is going to clean up your -data stuff and then return UTRACE_DETACH. If utrace_control returns -EALREADY, then you know that report_death is taking care of things. I'd imagined that you'd do something like: mutex_lock(stuff engine-data points to); mark in there that we are detaching; ret = utrace_control(task, engine, UTRACE_DETACH); if (ret == 0) { /* detached. tear our stuff down. */ ... return DONE; } else if (ret == -EALREADY) { /* report_death is running, i.e. waiting for our lock */ mutex_unlock(...); return DONE; } else ... Put another way, you can have told your report_death beforehand that it should return UTRACE_DETACH if it runs before your utrace_control call. Thanks, Roland
Re: [PATCH 0/3] UTRACE_DETACH fixes
Whatever we do, start_callback() can see the old engine-flags but the new -ops = utrace_detached_ops. Just suppose that the caller of UTRACE_DETACH is interrupted right after setting engine-ops. * it can check the old flags before using the old ops, or check the old * flags before using the new ops, or check the new flags before using the * new ops, but can never check the new flags before using the old ops. Or do you think I miss something and this is false alarm? * Hence, utrace_detached_ops might be used with any old flags in place. * It has report_quiesce() and report_reap() callbacks to handle all cases. report_reap is covered with the utrace_detached_reap() stub. Every other callback uses start_callback(), i.e. calls report_quiesce first. utrace_detached_quiesce() returns UTRACE_DETACH, so start_callback() will return NULL. [...] but I can no longer think properly today. Sleep well. :-) Thanks, Roland