Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-26 Thread Oleg Nesterov
On 08/19, Roland McGrath wrote:

  Wait. It doesn't break this. It only breaks -EALREADY contract. And
  I don't understand why this is bad.

 The -EALREADY contract lets you have a report_death callback that does all
 your cleanup and then returns UTRACE_DETACH.

I think this is possible without the current -EALREADY contract.

 To have that plan, you need a
 way for an asynchronous detach attempt to be excluded once report_death has
 begun, and for that asynchronous caller to know that's the situation.

The problem is, with the current conract detach is never simple if you
have report_death. You always have to synchronize with this callback.

But see below.

 It breaks the documented API.  I am of course open to changing the API.
 But we need to get completely clear on what the new range of plans we'll
 support is, and make the documentation and the implementation match.

Yes. I thought it makes sense to change the API and docs if we can improve
things, before utrace is widely used.

But OK, lets's forget about this.

  Why? What is the point? What makes UTRACE_EVENT(DEATH) so special?
  I do not see the logic at all.

 It is special because it is the last interesting event to a traditional
 user such as ptrace.  Hence it is attractive to write a report_death
 callback that returns UTRACE_DETACH spontaneously, i.e. without an
 asynchronous caller (i.e. debugger thread) having requested the detach.

Sure. And this is still possible.

  If -report_death() does something which the caller of utrace_control()
  should know, then they should take care of synchronization anyway.
  With or without this patch, utrace_control(DEATH) can return 0 after
  -report_death() was already called.

 If your engine's report_death always returns UTRACE_DETACH, then
 utrace_control(UTRACE_DETACH) can never return 0 once report_death
 is about to be called nor any time thereafter.

I don't undestand this. OK, probably I missed something, this doesn't
matter.

 But, the current state of play from the broader perspective is that we
 really have no idea about the future viability of the utrace API.  I don't
 think it makes a lot of sense to put a great deal of effort into perfecting
 the corners of the API much further when we don't really have any good
 reasons to think that this API will be the basis of anything that survives
 in the long run.

Yes. I have to agree, this is good point.

So, lets forget about these changes, at least for now.

 The current focus of work is your ugdb prototype.  If some utrace changes
 make it greatly easier to get that to kind-of-working status, then they are
 worthwhile.

No. I didn't think about ugdb at all. I'll find the workaround for ugdb.
If nothing else, I can use utrace_engine_ops-release(). Or something else.

 If it
 works in practice but has disturbing robustness holes, it is OK to just
 comment that loudly now and move on to the next battle, IMHO.

OK, agreed.

Oleg.



Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-18 Thread Oleg Nesterov
On 08/17, Oleg Nesterov wrote:

 On 08/16, Roland McGrath wrote:
 
   The problem is, utrace_control(DETACH) does nothing and returns
   -EALREADY if utrace-death is set, this is not right. We can and
   should detach in this case, we only should skip utrace_reset() to
   avoid the race with utrace_report_death()-REPORT_CALLBACKS().
 
  This behavior is the original (minimal) synchronization scheme from before
  we had utrace_barrier.  See Interlock with final callbacks in
  Documentation/DocBook/utrace.tmpl.

 OK, I agree my patch breaks this part

   Normally
   functionutrace_control/function called with
   constantUTRACE_DETACH/constant returns zero, and this means that no
   more callbacks will be made.

 of documentation (which I never read ;)

Wait. It doesn't break this. It only breaks -EALREADY contract. And
I don't understand why this is bad.

 But in any case, personally I dislike the current behaviour anyway,
 I think this certainly complicates the life for module writers.
 Instead of simple detach + barrier, you always need the nontrivial
 code if report_quiesce/death ever touches engine-data! This can't
 be good.

Yes.

Roland, could you explain once again why do you dislike this patch?


Once again. Currently utrace_control(DETACH) refuses to even try to
detach the engine if utrace-death is set.

Why? What is the point? What makes UTRACE_EVENT(DEATH) so special?
I do not see the logic at all.

If -report_death() does something which the caller of utrace_control()
should know, then they should take care of synchronization anyway.
With or without this patch, utrace_control(DEATH) can return 0 after
-report_death() was already called.


Currently, utrace_control(DETACH) returns -EALREADY when this callback
was alredy called, or it can be called later, or it may be running. And
utrace_barrier() can't help.

With this patch utrace_control(DETACH) returns either 0 with the same
meaning (will not be called later, but probably was already called
before utrace_control), or -EINPROGRESS which suggests to use
utrace_barrier().


Please correct me, but I think this certainly makes things much simpler.
Otherwise UTRACE_DETACH is never trivial (and iiuc it is better to
avoid utrace_engine_ops-release() hook).

What do you think?

Oleg.



[PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-16 Thread Oleg Nesterov
Suppose that we want to detach the engine and free engine-data. To
avoid the races with our calbacks which can use -data we should do
either

err = utrace_set_events(0);
if (err == ...)
utrace_barrier();
utrace_control(DETACH);

or

err = utrace_control(DETACH);
if (err = ...)
utrace_barrier();

Neither variant works if we should care about report_quiesce() or
report_death(). Let's discuss the 2nd case, the 1st one have the
similar problems.

The problem is, utrace_control(DETACH) does nothing and returns
-EALREADY if utrace-death is set, this is not right. We can and
should detach in this case, we only should skip utrace_reset() to
avoid the race with utrace_report_death()-REPORT_CALLBACKS().

This was the main problem I hit during the testing. Consider the
exiting tracee with the valid engine-ops, suppose that
utrace_control(DETACH) is called right after utrace_report_death()
unlocks utrace-lock and before it does REPORT_CALLBACKS().

utrace_control() fails, but utrace_barrier() does not help after
that. It takes get_utrace_lock() successfully and returns 0.
The caller frees engine-data, but utrace_report_death() calls
-report_death() after that.

Kill utrace_control_dead(). Change utrace_control() to always
allow UTRACE_DETACH if engine has the valid -ops. This means that
mark_engine_detached() can race with REPORT_CALLBACKS() but there
is nothing new. utrace_do_stop() is unnecessary and pointless in
this case, but it doesn't hurt.

Signed-off-by: Oleg Nesterov o...@redhat.com
---

 kernel/utrace.c |   49 ++---
 1 file changed, 10 insertions(+), 39 deletions(-)

--- kstub/kernel/utrace.c~6_fix_control_dead2010-08-16 11:17:05.0 
+0200
+++ kstub/kernel/utrace.c   2010-08-16 11:17:51.0 +0200
@@ -928,37 +928,6 @@ void utrace_maybe_reap(struct task_struc
}
 }
 
-/*
- * You can't do anything to a dead task but detach it.
- * If release_task() has been called, you can't do that.
- *
- * On the exit path, DEATH and QUIESCE event bits are set only
- * before utrace_report_death() has taken the lock.  At that point,
- * the death report will come soon, so disallow detach until it's
- * done.  This prevents us from racing with it detaching itself.
- *
- * Called only when @target-exit_state is nonzero.
- */
-static inline int utrace_control_dead(struct task_struct *target,
- struct utrace *utrace,
- enum utrace_resume_action action)
-{
-   lockdep_assert_held(utrace-lock);
-
-   if (action != UTRACE_DETACH || unlikely(utrace-reap))
-   return -ESRCH;
-
-   if (unlikely(utrace-death))
-   /*
-* We have already started the death report.  We can't
-* prevent the report_death and report_reap callbacks,
-* so tell the caller they will happen.
-*/
-   return -EALREADY;
-
-   return 0;
-}
-
 /**
  * utrace_control - control a thread being traced by a tracing engine
  * @target:thread to affect
@@ -1115,18 +1084,20 @@ int utrace_control(struct task_struct *t
ret = 0;
 
/*
-* -exit_state can change under us, this doesn't matter.
-* We do not care about -exit_state in fact, but we do
-* care about -reap and -death. If either flag is set,
-* we must also see -exit_state != 0.
+* -exit_state can change under us, this doesn't matter. But,
+* if utrace-death is set we must also see -exit_state != 0,
+* this is what we care about.
 */
if (unlikely(target-exit_state)) {
-   ret = utrace_control_dead(target, utrace, action);
-   if (ret) {
+   /* You can't do anything to a dead task but detach it */
+   if (action != UTRACE_DETACH) {
spin_unlock(utrace-lock);
-   return ret;
+   return -ESRCH;
}
-   reset = true;
+
+   /* If it is not set, we can safely do utrace_reset() */
+   if (likely(!utrace-death))
+   reset = true;
}
 
switch (action) {



Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction

2010-08-16 Thread Roland McGrath
 The problem is, utrace_control(DETACH) does nothing and returns
 -EALREADY if utrace-death is set, this is not right. We can and
 should detach in this case, we only should skip utrace_reset() to
 avoid the race with utrace_report_death()-REPORT_CALLBACKS().

This behavior is the original (minimal) synchronization scheme from before
we had utrace_barrier.  See Interlock with final callbacks in
Documentation/DocBook/utrace.tmpl.

The expectation is that your report_death is going to clean up your -data
stuff and then return UTRACE_DETACH.  If utrace_control returns -EALREADY,
then you know that report_death is taking care of things.  I'd imagined
that you'd do something like:

mutex_lock(stuff engine-data points to);
mark in there that we are detaching;
ret = utrace_control(task, engine, UTRACE_DETACH);
if (ret == 0) {
/* detached.  tear our stuff down. */
...
return DONE;
} else if (ret == -EALREADY) {
/* report_death is running, i.e. waiting for our lock */
mutex_unlock(...);
return DONE;
} else ...

Put another way, you can have told your report_death beforehand that it
should return UTRACE_DETACH if it runs before your utrace_control call.


Thanks,
Roland