Re: Possible problem with utrace_control

2010-06-23 Thread Oleg Nesterov
On 06/22, Roland McGrath wrote:

  OTOH, I am wondering if we can give more power to utrace_control(STOP).
  Currently debugger can't stop the tracee unless it cooperates. However, if
  the tracee doesn't have UTRACE_EVENT(QUIESCE), then utrace_control(STOP)
  works. This is strange.

 Why is it strange?  If you don't have any callback, then all your engine
 does is get stopped.  If you have callbacks, then you always get callbacks
 before the thread does anything (including stop).  Since your callback's
 return value always replaces any other resume action

OK. But then perhaps UTRACE_STICKY_STOP makes sense even without the
races we discussed. It can simplify the cooperation in this case.

  Stupid question: what if we just remove the clear_engine_wants_stop()
  code in finish_callback_report() ?

 That would change the API so that a callback's resume action is always
 overridden by a utrace_control(,,UTRACE_STOP).  Today the practice from a
 third-party debugger thread is to do:

   ret = utrace_control(task, engine, UTRACE_STOP);
   if (ret == 0) {
 // Already stopped, e.g. was in jctl or another engine's stop.
 // This means we can do our asynchronous examination here.
   } else if (ret == -EINPROGRESS) {
 // it will get to report_quiesce soonish (modulo blocked in kernel)
   }

 If the examination you wanted to do was some predetermined sequence of
 fetching and fiddling before resuming, then the report_* callback can just
 do all that itself.  For example, fetch the registers for a sample and keep
 going; insert some breakpoints and keep going; etc.

OK, agreed.

 With this
 change, it would have to explicitly request not stopping by calling
 utrace_control(current, engine, UTRACE_RESUME) inside the callback.

Ah, indeed, this can't be good. Thanks.

  Btw, before I forgot about this. shouldn't utrace_control(UTRACE_RESUME)
  remove ENGINE_STOP from -utrace_flags like DETACH does?

 Perhaps so.  My memory is dim on the use of that bit in utrace_flags.

(don't assume I was able to quickly recall why DETACH does this ;)

 It's
 to close certain races, but we'll have to remind each other of the details.

Yes. Let's consider the concrete example. The tracee is going to
stop and calls utrace_stop(). Before it takes utrace-lock and
sets state = TASK_TRACED, the debugger does utrace_control(DETACH).

In this case utrace_stop() shouldn't stop, otherwise nobody will
ever wake it up. That is why we clear this bit in -utrace_flags
to provoke utrace_reset() which will check carefully the tracee
has an engine with ENGINE_STOP.

The original motivation for this patch was ptrace-utrace, the
implicit DETACH if the tracer exits.

UTRACE_RESUME should probably do the same, otherwise the tracee
can stop right after utrace_control(RESUME). This case is less
important compared to UTRACE_DETACH (and just in case, ptrace
doesn't need this) but still.

  Hmm. Not sure... at least I don't immediately see something simple.
 
  Except, just add wait_queue_head into struct utrace. This way 
  utrace_barrier()
  can wait longer than needed if it notices utrace-reporting == engine. Or we
  can add it into utrace_engine, in this case utrace_stop() needs
  list_for_each(engine, utrace-attached).

 You're talking about two different things here.  Whatever implementation
 details change in utrace_barrier() is not directly apropos.

Yes, agreed.

 For the new API idea, I was talking about something like a utrace_wake_up()
 call.

Now I don't understand you again...

The 1st question: should the new API help us to kill ptrace_notify_stop()
in utrace_stop() ?

 What I'm most interested in right now is thinking through the utility of
 particular new API details for engine writers.

Hmm... After the previous email I thought that this utrace_wake_up()
or whatever shouldn't be visible outside of utrace.c.

Ignoring the ptrace_notify_stop() problem, could you give us any
example of how it can be used?

Or. Do you literally mean something like

utrace_wait_for_event();// for debugger
utrace_wake_up();   // for tracee

which should (at least) cover all races with exit/detach/etc ?

Oleg.



Re: Possible problem with utrace_control

2010-06-23 Thread Roland McGrath
 OK. But then perhaps UTRACE_STICKY_STOP makes sense even without the
 races we discussed. It can simplify the cooperation in this case.

The only cooperation methods we should consider are those that cover all
their races.

 Yes. Let's consider the concrete example. The tracee is going to
 stop and calls utrace_stop(). Before it takes utrace-lock and
 sets state = TASK_TRACED, the debugger does utrace_control(DETACH).
 
 In this case utrace_stop() shouldn't stop, otherwise nobody will
 ever wake it up. That is why we clear this bit in -utrace_flags
 to provoke utrace_reset() which will check carefully the tracee
 has an engine with ENGINE_STOP.

Right.  I agree this race seems to apply to all kinds of resumption, not
just detach.  The question for the non-detach cases is what kind of
guarantee the utrace API is claiming to provide.  For detach it matters
strongly because the loser of the race is gone and there is no way for
it to be responsible for cleaning up the fallout.

For non-detach cases, the picture is more hazy.  The conclusions on what
matters might be different if we have utrace_wake_up() or equivalent.

Off hand I think it does matter today insofar as it violates the
documented guarantees of the utrace_barrier API.  If utrace_barrier
returns 0 it is said to mean that, Any effect of its return value (such
as %UTRACE_STOP) has already been applied to @engine.  So e.g. if you
get a wake-up sent by your callback before it returns UTRACE_STOP, and
then call utrace_barrier followed by utrace_control(,,UTRACE_RESUME),
then you should be guaranteed that your resumption cleared the
callback's stop.

Another example, you call utrace_set_events and disable some events,
including all the ones for which your callbacks ever return UTRACE_STOP.
If that returns 0, or then you call utrace_barrier and it returns 0,
then you know that either it was already stopped before you called
utrace_set_events, or that it's not stopped now.  In either case, you
know that utrace_control(,,UTRACE_RESUME) ensures that it's no longer
stopped.  (That is, stopped by your engine--it of course might still
be stopped by a different engine.)

  For the new API idea, I was talking about something like a utrace_wake_up()
  call.
 
 Now I don't understand you again...
 
 The 1st question: should the new API help us to kill ptrace_notify_stop()
 in utrace_stop() ?

Ideally yes.  This might still have to be some other kind of special case,
we'll have to figure that out.  That would be survivable if it's necessary.
The wait_chldexit wakeup would be covered by what we're discussing, it's
just another wait_queue_head_t.  The signal sending is more problematic.

But I wanted first to contemplate what API would cover new code cleanly
written.  The dismal old ptrace signal/wait synchronization might always be
a difficult outlying case.

 Hmm... After the previous email I thought that this utrace_wake_up()
 or whatever shouldn't be visible outside of utrace.c.

I'm sorry if I was not clear.  I have been talking entirely about new
utrace API features, not internals.  This is a discussion for your
engine-writer hat more than your utrace-maintainer hat.  I'm only
mentioning any utrace implementation details to elucidate what is
realistic or problematic to consider for the new API features.

 Or. Do you literally mean something like
 
   utrace_wait_for_event();// for debugger
   utrace_wake_up();   // for tracee
 
 which should (at least) cover all races with exit/detach/etc ?

I meant utrace_wake_up(wait_queue_head_t *) used in place of wake_up()
inside a utrace callback.  The waiter (tracer) side would use normal
linux/wait.h interfaces (wait_event macro, etc.).


Thanks,
Roland