Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-15 Thread Oleg Nesterov
On 10/12, Roland McGrath wrote:

  If engine is detached (has utrace_detached_ops), utrace_barrier(engine)
  spins until engine-ops becomes NULL. This is just wrong.

 Yes, this happens because get_utrace_lock returns -ERESTARTSYS and
 utrace_barrier checks for this and loops.  I agree these long-spin
 scenarios would be wrong.

 The reason it tries to wait for fully detached state is that after
 utrace_control(task,engine,UTRACE_DETACH), @task could still be in the
 middle of a callback for @engine.

Yes. But, with this patch, in this case
utrace_barrier()-get_utrace_lock(attached = false) returns success
and then we check utrace-reporting == engine.

This patch only makes a difference if ops == utrace_detached_ops,

Before this patch

spin until -ops goes away

After this patch

spin until -ops goes away OR -reporting != engine

(Hmm. Probably utrace-reap = T case needs more attention,
 utrace_maybe_reap() doesn't set utrace-reporting).

  Also, it is not clear why utrace_barrier() needs utrace-lock,
  except to ensure it is safe to dereference target/utrace.

 Well, wouldn't that be reason enough?  The comment in utrace_barrier
 talks about needing the lock.  This corresponds to the comment in the
 UTRACE_DETACH case of finish_callback_report.

Yes agreed. Honestly, I can't even recall what I meant.

  Note: we should also reconsider() utrace_barrier()-signal_pending() check.

 IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait
 (even moreso since it's really a spin).  If a buggy callback gets stuck
 blocking or spinning and fails to return promptly, then you wedge any
 debugger thread trying to synchronize with it via utrace_barrier.  If
 you can't even interrupt that debugger thread, then there will really be
 no chance to recover from the deadlock.

Completely agreed.

But, unfortunately, this signal_pending() check assumes the caller can
always handle this ERESTARTSYS. But this is not true, one example is
the exiting debugger doing exit_ptrace().

Oleg.



Re: gdbstub initial code, v7

2010-10-15 Thread Oleg Nesterov
On 10/13, Roland McGrath wrote:

  On 09/10, Roland McGrath wrote:
  
ugdb sets please stop flag and does utrace_control(INTERRUPT). 
However,
in unlikely case the tracee can stop before -report_signal() reporting
  
   I don't think this is the right thing to do.  When the intent is 
   explicitly
   to interrupt, there is no reason to stop before the interruption is
   complete, i.e. report_signal.
 
  This means that ugdb_report_quiesce() should never return UTRACE_STOP,
  and that is all.

 I'm not sure about this.

Ignoring the problems below, why?

  But what about multitracing? Suppose that (gdb) interrupt happens
  just before, say, do_report_syscall_entry() and another engine wants
  to stop. If ugdb_report_quiesce() doesn't return UTRACE_STOP, then
  gdb will wait until another debugger resumes the tracee.

 Yes, I do think that's a problem.  We want gdb to report back promptly.
 One possibility is to have report_quiesce notice its argument is
 UTRACE_EVENT(SYSCALL_ENTRY) and roll back to before the syscall.
 That is, it enables SYSCALL_ENTRY and SYSCALL_EXIT reporting, then
 its report_syscall_entry uses UTRACE_SIGNAL_ABORT, report_syscall_exit
 does syscall_set_return_value(-ERESTARTNOHAND, 0) and then returns
 UTRACE_INTERRUPT.  Now, we'll reenter a UTRACE_SIGNAL_REPORT callback
 before the system call, and we can stop there without being in any
 sticky situation.

Well, but this doesn't look friendly to other engines...

And at first glance this looks a bit too hairy. And, this doesn't
cover another case: gdb asks to stop the tracee when it is already
stopped by another engine and sleeps in utrace_resume() path.

So, I think ugdb should be changed so that signal SIG always works
(without reporing this signal) even when the stopped tracee doesn't
have the signal context.

As for $_siginfo (qXfer:siginfo:read::), I do not know what ugdb
should do. Probably it can just report the all-zeroes siginfo or
report si_signo = SIGSTOP.

Oleg.