Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-13 Thread Roland McGrath
Please feel free to ignore this if it's no longer relevant to your work.
I'm trying to catch up on the backlog of replies I owe you.  I chose
this one as the arbitrary cut-off before which I think I have already
neglected things too long to still matter.

 If engine is detached (has utrace_detached_ops), utrace_barrier(engine)
 spins until engine-ops becomes NULL. This is just wrong.

Yes, this happens because get_utrace_lock returns -ERESTARTSYS and
utrace_barrier checks for this and loops.  I agree these long-spin
scenarios would be wrong.

The reason it tries to wait for fully detached state is that after
utrace_control(task,engine,UTRACE_DETACH), @task could still be in the
middle of a callback for @engine.

 Suppose that utrace_control(DETACH) returns -EINPROGRESS, now we should
 call utrace_barrier(). However, it is possible that -EINPROGRESS means
 we raced with sys_sleep(A_LOT) doing report_syscall_entry(). 

Right.  Perhaps utrace_barrier could do some different variant of the
(utrace-reporting != engine) check.

 Change get_utrace_lock() to succeed if the caller is utrace_barrier()
 and ops == utrace_detached_ops. I do not see any reason why this case
 should be special from utrace_barrier's pov. It can just check
 -reporting and return 0 or do another iteration.
[...]
 Also, it is not clear why utrace_barrier() needs utrace-lock,
 except to ensure it is safe to dereference target/utrace.

Well, wouldn't that be reason enough?  The comment in utrace_barrier
talks about needing the lock.  This corresponds to the comment in the
UTRACE_DETACH case of finish_callback_report.  Do you think those
comments are inaccurate about what's required?

 Note: we should also reconsider() utrace_barrier()-signal_pending() check.

IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait
(even moreso since it's really a spin).  If a buggy callback gets stuck
blocking or spinning and fails to return promptly, then you wedge any
debugger thread trying to synchronize with it via utrace_barrier.  If
you can't even interrupt that debugger thread, then there will really be
no chance to recover from the deadlock.


Thanks,
Roland



Re: gdbstub initial code, v8 ptrace

2010-10-13 Thread Roland McGrath
 But. Suppose we have to attached engines. The first engine gets
 UTRACE_SIGNAL_REPORT and returns UTRACE_STOP | UTRACE_SIGNAL_IGN.

Right, that's what it would do.  I see, you're saying that the
report.result passed on to the next engine would appear like it had
been a real signal that the previous engine decided to ignore (or
whose sa_handler==SIG_IGN or default action was to ignore).  Hmm.

Well, it's also distinguished by having orig_ka==NULL in the callback.
Any real signal has a non-null orig_ka argument.

 or yes, it returns UTRACE_SIGNAL_DELIVER after gdb does signal XX.
 
 Now. The second engine gets UTRACE_SIGNAL_DELIVER or UTRACE_SIGNAL_IGN,
 not UTRACE_SIGNAL_REPORT.

At least in the UTRACE_SIGNAL_DELIVER case, that's really the true
thing for it to see.  The previous engine decided to do a signal
delivery (as directed by return_ka), so the next engine needs to see
this to know what the prevailing state of play is now.

 That is why ugdb_signal_report(UTRACE_SIGNAL_REPORT) tries to return
 UTRACE_STOP | utrace_signal_action(action) to not change report-result
 (passed to the next tracee) inside the reporting loop.

Well, it *can* do that.  If UTRACE_SIGNAL_REPORT (or anything else
random) is the ultimate report.result from the whole callback loop, we
treat it just like UTRACE_SIGNAL_IGN.

 The worst case is UTRACE_SIGNAL_HANDLER. Single-stepping should know
 about this case. This means that any engine should always return
 UTRACE_resume_action | UTRACE_SIGNAL_HANDLER.

I see.  This is indeed the only way for any engine to know that it's
getting the UTRACE_SIGNAL_HANDLER specific notification rather than
some random fallout of someone else's UTRACE_REPORT request or whatnot.

   Probably we can check orig_ka != NULL and treat orig_ka == NULL as
   UTRACE_SIGNAL_REPORT. Hopefully this can't be confused with
   UTRACE_SIGNAL_HANDLER.
 
  I'm not really sure what you mean here.
 
 If -report_signal(orig_ka) was calles with orig_ka == NULL, then,
 whatever utrace_signal_action(action) we see, originally it was
 either UTRACE_SIGNAL_HANDLER or UTRACE_SIGNAL_REPORT but another engine
 returned, say, UTRACE_SIGNAL_DELIVER/UTRACE_SIGNAL_IGN to deliver/stop.

Right.  But this is in fact just the same for either
UTRACE_SIGNAL_REPORT or UTRACE_SIGNAL_HANDLER.

   Not sure about UTRACE_SIGNAL_HOLD, but this is very unlikely race.
 
  You probably don't want to use that, but I'm not entirely sure.  ptrace
  doesn't use it, and the signal interception model is pretty much the same.
 
 Yes, agreed. But there is one corner case. Until we generalize
 utrace_stop()-ptrace_notify_stop() hook, the tracee can be reported as
 stopped to gdb, but before it actually stops it can recieve a signal.

I don't follow this.  If we're stopping, then we don't receive
(dequeue) any other signal until we get resumed.  That should be fine
from gdb's point of view.  The next signal is just pending for later.
The signal that we just reported was a) in fact reported to gdb and
b) is still sitting in the siginfo_t on stack and the engine can
examine/modify that before letting the thread resume.  (The kerneldoc
paragraph about @report_signal in utrace.h makes this guarantee.)


Thanks,
Roland