[PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-16 Thread Oleg Nesterov
Suppose that we want to detach the engine and free engine-data. To
avoid the races with our calbacks which can use -data we should do
either

err = utrace_set_events(0);
if (err == ...)
utrace_barrier();
utrace_control(DETACH);

or

err = utrace_control(DETACH);
if (err = ...)
utrace_barrier();

Neither variant works if we should care about report_quiesce() or
report_death(). Let's discuss the 2nd case, the 1st one have the
similar problems.

The problem is, utrace_control(DETACH) does nothing and returns
-EALREADY if utrace-death is set, this is not right. We can and
should detach in this case, we only should skip utrace_reset() to
avoid the race with utrace_report_death()-REPORT_CALLBACKS().

This was the main problem I hit during the testing. Consider the
exiting tracee with the valid engine-ops, suppose that
utrace_control(DETACH) is called right after utrace_report_death()
unlocks utrace-lock and before it does REPORT_CALLBACKS().

utrace_control() fails, but utrace_barrier() does not help after
that. It takes get_utrace_lock() successfully and returns 0.
The caller frees engine-data, but utrace_report_death() calls
-report_death() after that.

Kill utrace_control_dead(). Change utrace_control() to always
allow UTRACE_DETACH if engine has the valid -ops. This means that
mark_engine_detached() can race with REPORT_CALLBACKS() but there
is nothing new. utrace_do_stop() is unnecessary and pointless in
this case, but it doesn't hurt.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/utrace.c |   49 ++---
 1 file changed, 10 insertions(+), 39 deletions(-)

--- kstub/kernel/utrace.c~6_fix_control_dead2010-08-16 11:17:05.0 
+0200
+++ kstub/kernel/utrace.c   2010-08-16 11:17:51.0 +0200
@@ -928,37 +928,6 @@ void utrace_maybe_reap(struct task_struc
}
 }
 
-/*
- * You can't do anything to a dead task but detach it.
- * If release_task() has been called, you can't do that.
- *
- * On the exit path, DEATH and QUIESCE event bits are set only
- * before utrace_report_death() has taken the lock.  At that point,
- * the death report will come soon, so disallow detach until it's
- * done.  This prevents us from racing with it detaching itself.
- *
- * Called only when @target-exit_state is nonzero.
- */
-static inline int utrace_control_dead(struct task_struct *target,
- struct utrace *utrace,
- enum utrace_resume_action action)
-{
-   lockdep_assert_held(utrace-lock);
-
-   if (action != UTRACE_DETACH || unlikely(utrace-reap))
-   return -ESRCH;
-
-   if (unlikely(utrace-death))
-   /*
-* We have already started the death report.  We can't
-* prevent the report_death and report_reap callbacks,
-* so tell the caller they will happen.
-*/
-   return -EALREADY;
-
-   return 0;
-}
-
 /**
  * utrace_control - control a thread being traced by a tracing engine
  * @target:thread to affect
@@ -1115,18 +1084,20 @@ int utrace_control(struct task_struct *t
ret = 0;
 
/*
-* -exit_state can change under us, this doesn't matter.
-* We do not care about -exit_state in fact, but we do
-* care about -reap and -death. If either flag is set,
-* we must also see -exit_state != 0.
+* -exit_state can change under us, this doesn't matter. But,
+* if utrace-death is set we must also see -exit_state != 0,
+* this is what we care about.
 */
if (unlikely(target-exit_state)) {
-   ret = utrace_control_dead(target, utrace, action);
-   if (ret) {
+   /* You can't do anything to a dead task but detach it */
+   if (action != UTRACE_DETACH) {
spin_unlock(utrace-lock);
-   return ret;
+   return -ESRCH;
}
-   reset = true;
+
+   /* If it is not set, we can safely do utrace_reset() */
+   if (likely(!utrace-death))
+   reset = true;
}
 
switch (action) {



[PATCH 3/3] utrace_set_events: kill now unnecessary exit_state/reap checks

2010-08-16 Thread Oleg Nesterov
Now that get_utrace_lock() can never succeed if utrace-reap is true,
we can kill this check in utrace_set_events(). This also means we can
kill the target-exit_state check, it buys nothing now.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/utrace.c |   13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

--- kstub/kernel/utrace.c~7_set_events_kill_reap2010-08-16 
11:17:51.0 +0200
+++ kstub/kernel/utrace.c   2010-08-16 11:28:00.0 +0200
@@ -539,15 +539,10 @@ int utrace_set_events(struct task_struct
old_utrace_flags = target-utrace_flags;
old_flags = engine-flags  ~ENGINE_STOP;
 
-   /* If -death or -reap is true we must see exit_state != 0. */
-   if (target-exit_state) {
-   if (utrace-death) {
-   if ((old_flags  ~events)   _UTRACE_DEATH_EVENTS)
-   goto unlock;
-   } else if (utrace-reap) {
-   if ((old_flags ^ events)  UTRACE_EVENT(REAP))
-   goto unlock;
-   }
+   if ((old_flags  ~events)   _UTRACE_DEATH_EVENTS) {
+   /* Too late if utrace_report_death() is in progress */
+   if (utrace-death)
+   goto unlock;
}
 
/*



Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-08-16 Thread Oleg Nesterov
On 08/16, Oleg Nesterov wrote:

 Well, I tried to test these changes, seems to work.

 But... From 2/3 changelog:

   This means that
   mark_engine_detached() can race with REPORT_CALLBACKS() but there
   is nothing new.

 Yes, there is nothing new. But, Roland, this is WRONG!

 With or without these fixes utrace_control()-mark_engine_detached()
 was always wrong if it races with REPORT/REPORT_CALLBACKS. Not sure
 what we were thinking about :/

 Look. Suppose that utrace_control(DETACH) is called right after
 start_callback() returns ops != NULL. After that
 finish_callback(...,  ops-callback(...)) can hit -callback == NULL!

Cough. I guess I spent too much time with utrace and testing today ;)

Let me try again. mark_engine_detached() does

engine-ops = utrace_detached_ops;
smp_wmb();
engine-flags = UTRACE_EVENT(QUIESCE);

Whatever we do, start_callback() can see the old engine-flags but
the new -ops = utrace_detached_ops. Just suppose that the caller
of UTRACE_DETACH is interrupted right after setting engine-ops.

 The obvious and simplest solution: utrace_detached_ops{} should
 have the dummy handler for every callback. But I'd like to think
 a bit more.

Yes. Perhaps (unlikely) we can reverse the order, but I can no longer
think properly today.

Or do you think I miss something and this is false alarm?

Oleg.



Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks

2010-08-16 Thread Roland McGrath
 Hmm. For unknown reason I do not see this 2/4 patch on utrace-devel,
 strange.
 
 So I am attaching it in case my email was really lost and you didn't
 get it.

Indeed, it did not come through to me or the list.

Thanks,
Roland



Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks

2010-08-16 Thread Roland McGrath
 utrace_set_events() checks the setting _UTRACE_DEATH_EVENTS case twice,
 and it is not immediately obvious why the first check is needed, and why
 it is not racy (we are checking exit_state without tasklist). The code is
 correct, but looks confusing.

More comments are always good.

 In short: this double check allows to avoid tasklist when utrace_flags
 already has these bits while engine-flags doesn't.
 
 But multitracing is unlikely case, in the likely case if we add
 _UTRACE_DEATH_EVENTS to engine-flags we set these bits in utrace_flags
 too. I think it makes sense to consolidate these checks to make the
 code a bit more understandable.

I don't really agree about unlikely case.  In many uses, the systemtap
task-finder will have a utrace engine on every task in the system, for
example.  Moreover, this is an importantly distinct particular kind of
micro-optimization to be doing here: avoiding a system-wide lock.  Any
place that we take tasklist_lock at all, we are introducing a system-wide
slowdown or limitation on scaling, just because of our tracing of one task.
So optimizing the minimize that is qualitatively different and really much
more important than avoiding taking the utrace lock, for example.

But with that note in your mind, I am happy to take this patch now as a
simplification and let reoptimization come back later.


Thanks,
Roland



Re: [PATCH 1/3] get_utrace_lock() must not succeed if utrace-reap == T

2010-08-16 Thread Roland McGrath
 get_utrace_lock() must threat utrace-reap == T as engine-ops == NULL.

Yes, I think you're right.  This requires some changes to the kerneldoc and
utrace.tmpl, because it now says that you get EALREADY if report_reap is
already running.  Now it will be consistent with utrace_control, where you
get ESRCH either if report_reap might already be running or if it's
entirely detached and/or reaped.


Thanks,
Roland



Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-16 Thread Roland McGrath
 The problem is, utrace_control(DETACH) does nothing and returns
 -EALREADY if utrace-death is set, this is not right. We can and
 should detach in this case, we only should skip utrace_reset() to
 avoid the race with utrace_report_death()-REPORT_CALLBACKS().

This behavior is the original (minimal) synchronization scheme from before
we had utrace_barrier.  See Interlock with final callbacks in
Documentation/DocBook/utrace.tmpl.

The expectation is that your report_death is going to clean up your -data
stuff and then return UTRACE_DETACH.  If utrace_control returns -EALREADY,
then you know that report_death is taking care of things.  I'd imagined
that you'd do something like:

mutex_lock(stuff engine-data points to);
mark in there that we are detaching;
ret = utrace_control(task, engine, UTRACE_DETACH);
if (ret == 0) {
/* detached.  tear our stuff down. */
...
return DONE;
} else if (ret == -EALREADY) {
/* report_death is running, i.e. waiting for our lock */
mutex_unlock(...);
return DONE;
} else ...

Put another way, you can have told your report_death beforehand that it
should return UTRACE_DETACH if it runs before your utrace_control call.


Thanks,
Roland



Re: [PATCH 0/3] UTRACE_DETACH fixes

2010-08-16 Thread Roland McGrath
 Whatever we do, start_callback() can see the old engine-flags but
 the new -ops = utrace_detached_ops. Just suppose that the caller
 of UTRACE_DETACH is interrupted right after setting engine-ops.

 * it can check the old flags before using the old ops, or check the old
 * flags before using the new ops, or check the new flags before using the
 * new ops, but can never check the new flags before using the old ops.

 Or do you think I miss something and this is false alarm?

 * Hence, utrace_detached_ops might be used with any old flags in place.
 * It has report_quiesce() and report_reap() callbacks to handle all cases.

report_reap is covered with the utrace_detached_reap() stub.  Every other
callback uses start_callback(), i.e. calls report_quiesce first.
utrace_detached_quiesce() returns UTRACE_DETACH, so start_callback() will
return NULL.

 [...] but I can no longer think properly today.

Sleep well. :-)


Thanks,
Roland