Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-25 Thread Oleg Nesterov
On 10/22, Oleg Nesterov wrote:

 On 10/19, Roland McGrath wrote:
 
   Yes. But, with this patch, in this case
   utrace_barrier()-get_utrace_lock(attached = false) returns success
   and then we check utrace-reporting == engine.
 
  I guess that's OK if -reporting == engine is always set when any kind of
  callback might be in progress.

 I think utrace_maybe_reap() is the only exception,

   (Hmm. Probably utrace-reap = T case needs more attention,
utrace_maybe_reap() doesn't set utrace-reporting).
 
  That would be a problem.

 Yes, but the current code has the same problem? Hmm, I need to
 recheck this all once again tomorrow. Suddenly I started to suspect
 that even -ops == NULL case is not right. And the more I read this
 code now, the more I confused.

Yes. And get_utrace_lock() must not succeed if utrace-reap == T
c93fecc9 makes things worse. With this patch utrace_barrier() can
return -ESRCH while -report_death() or report_reap() is in progress.

But even before that comment, I do not think that !engine-ops check
can prevent the race with -report_reap(), we need at least mb()
in utrace_maybe_reap() before engine-ops = NULL.

 But in any case. If engine-ops == utrace_detached_ops and
 utrace-reporting != engine, I do not see why utrace_barrier() can't
 return success, even if utrace-reap is set.

No! it must not return success! I still think it should not spin if
-reporting != engine, but it should not return zero. Somehow I forgot
that utrace_barrier() != reporting != engine.

This means that my patch is very wrong.

Oleg.



Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-19 Thread Roland McGrath
 Yes. But, with this patch, in this case
 utrace_barrier()-get_utrace_lock(attached = false) returns success
 and then we check utrace-reporting == engine.

I guess that's OK if -reporting == engine is always set when any kind of
callback might be in progress.

 (Hmm. Probably utrace-reap = T case needs more attention,
  utrace_maybe_reap() doesn't set utrace-reporting).

That would be a problem.

 But, unfortunately, this signal_pending() check assumes the caller can
 always handle this ERESTARTSYS. But this is not true, one example is
 the exiting debugger doing exit_ptrace().

I understand it's a problem.  Perhaps we need to have a flag argument or
(probably better) a new utrace_barrier_uninterruptible call.  But, for
this particular situation with exit_ptrace, it could be kludged around.
The caller is never going to dequeue another signal once it's as far
into death as exit_ptrace anyway.  So, it could do a loop of:

ret = utrace_barrier(task, engine);
if (ret == -ERESTARTSYS) {
clear_thread_flag(TIF_SIGPENDING);
continue;
}

Not that I'm suggesting this isn't a dismal kludge.  But it seems like
it would close the hole we have today in practice.


Thanks,
Roland



Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-15 Thread Oleg Nesterov
On 10/12, Roland McGrath wrote:

  If engine is detached (has utrace_detached_ops), utrace_barrier(engine)
  spins until engine-ops becomes NULL. This is just wrong.

 Yes, this happens because get_utrace_lock returns -ERESTARTSYS and
 utrace_barrier checks for this and loops.  I agree these long-spin
 scenarios would be wrong.

 The reason it tries to wait for fully detached state is that after
 utrace_control(task,engine,UTRACE_DETACH), @task could still be in the
 middle of a callback for @engine.

Yes. But, with this patch, in this case
utrace_barrier()-get_utrace_lock(attached = false) returns success
and then we check utrace-reporting == engine.

This patch only makes a difference if ops == utrace_detached_ops,

Before this patch

spin until -ops goes away

After this patch

spin until -ops goes away OR -reporting != engine

(Hmm. Probably utrace-reap = T case needs more attention,
 utrace_maybe_reap() doesn't set utrace-reporting).

  Also, it is not clear why utrace_barrier() needs utrace-lock,
  except to ensure it is safe to dereference target/utrace.

 Well, wouldn't that be reason enough?  The comment in utrace_barrier
 talks about needing the lock.  This corresponds to the comment in the
 UTRACE_DETACH case of finish_callback_report.

Yes agreed. Honestly, I can't even recall what I meant.

  Note: we should also reconsider() utrace_barrier()-signal_pending() check.

 IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait
 (even moreso since it's really a spin).  If a buggy callback gets stuck
 blocking or spinning and fails to return promptly, then you wedge any
 debugger thread trying to synchronize with it via utrace_barrier.  If
 you can't even interrupt that debugger thread, then there will really be
 no chance to recover from the deadlock.

Completely agreed.

But, unfortunately, this signal_pending() check assumes the caller can
always handle this ERESTARTSYS. But this is not true, one example is
the exiting debugger doing exit_ptrace().

Oleg.



Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting

2010-10-13 Thread Roland McGrath
Please feel free to ignore this if it's no longer relevant to your work.
I'm trying to catch up on the backlog of replies I owe you.  I chose
this one as the arbitrary cut-off before which I think I have already
neglected things too long to still matter.

 If engine is detached (has utrace_detached_ops), utrace_barrier(engine)
 spins until engine-ops becomes NULL. This is just wrong.

Yes, this happens because get_utrace_lock returns -ERESTARTSYS and
utrace_barrier checks for this and loops.  I agree these long-spin
scenarios would be wrong.

The reason it tries to wait for fully detached state is that after
utrace_control(task,engine,UTRACE_DETACH), @task could still be in the
middle of a callback for @engine.

 Suppose that utrace_control(DETACH) returns -EINPROGRESS, now we should
 call utrace_barrier(). However, it is possible that -EINPROGRESS means
 we raced with sys_sleep(A_LOT) doing report_syscall_entry(). 

Right.  Perhaps utrace_barrier could do some different variant of the
(utrace-reporting != engine) check.

 Change get_utrace_lock() to succeed if the caller is utrace_barrier()
 and ops == utrace_detached_ops. I do not see any reason why this case
 should be special from utrace_barrier's pov. It can just check
 -reporting and return 0 or do another iteration.
[...]
 Also, it is not clear why utrace_barrier() needs utrace-lock,
 except to ensure it is safe to dereference target/utrace.

Well, wouldn't that be reason enough?  The comment in utrace_barrier
talks about needing the lock.  This corresponds to the comment in the
UTRACE_DETACH case of finish_callback_report.  Do you think those
comments are inaccurate about what's required?

 Note: we should also reconsider() utrace_barrier()-signal_pending() check.

IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait
(even moreso since it's really a spin).  If a buggy callback gets stuck
blocking or spinning and fails to return promptly, then you wedge any
debugger thread trying to synchronize with it via utrace_barrier.  If
you can't even interrupt that debugger thread, then there will really be
no chance to recover from the deadlock.


Thanks,
Roland