Re: utrace-cleanup branch

2009-10-29 Thread Oleg Nesterov
Btw, why does utrace_set_events() check utrace-stopped?

If a tracee is stopped then -reporting == engine is not possible,
-reporting must be NULL.

To optimize out other checks + mb() in the likely stopped case?

Oleg.



Re: utrace-cleanup branch

2009-10-29 Thread Roland McGrath
 To optimize out other checks + mb() in the likely stopped case?

Yes.


Thanks,
Roland



Re: utrace-cleanup branch

2009-10-29 Thread Roland McGrath
 Can't comment right now, need to read the code.

Such gentlemanly restraint.

 Afaics, we can't just remove utrace_finish_jctl() and the similar code in
 utrace_stop(). We need
 
   void utrace_finish_jctl(void)
   {
   struct utrace *utrace = task_utrace_struct(current);
   /*
* While in TASK_STOPPED, we can be considered safely stopped by
* utrace_do_stop(). Make sure we can do nothing until the 
 tracer
* drops utrace-lock
*/
   if (unlikely(__fatal_signal_pending()))
   spin_unlock_wait(utrace-lock);
   }
 
 and utrace_stop() should do the same.

I see.  The comments should be more clear that SIGKILL breaking
TASK_TRACED is the only way this can arise.  In the jctl case, that is
in particular after utrace_do_stop() changed TASK_STOPPED into TASK_TRACED.

 Otherwise, the killed tracee can start another reporting loop and
 list_for_each() can race with, say, utrace_reset(DETACH)-utrace_reset().
 More generally, if the tracer sees it is stopped under utrace-lock,
 the tracee must be really stopped until we drop utrace-lock(), it
 must not escape from utrace_stop() or do_signal_stop().

Correct.

So today -stopped really does still mean one other thing.  It means
that it's either still in TASK_TRACED or is just waking up for SIGKILL.

I think we've discussed the related stuff before.  A contrary approach
would be to ensure that in the SIGKILL-breaks-UTRACE_STOP case we never
make any more normal reports at all before utrace_report_death.  It
seems like a good API choice on its face: if you used UTRACE_STOP, you
didn't expect to see any SYSCALL_EXIT, SIGNAL, etc. events--the special
case is the instant-death scenario, so straight to report_death makes
some sense.  I was attracted by this until I started going through it.
It's all good until you consider report_clone.  If a SIGKILL is pending
when you get to utrace_report_clone(), the clone has already happened.
If you skip that callback, the engine doesn't find out about the new
child, and that matters if it's not a CLONE_THREAD.


Thanks,
Roland



Re: utrace-cleanup branch

2009-10-29 Thread Oleg Nesterov
On 10/29, Roland McGrath wrote:

  void utrace_finish_jctl(void)
  {
  struct utrace *utrace = task_utrace_struct(current);
  /*
   * While in TASK_STOPPED, we can be considered safely stopped by
   * utrace_do_stop(). Make sure we can do nothing until the 
  tracer
   * drops utrace-lock
   */
  if (unlikely(__fatal_signal_pending()))
  spin_unlock_wait(utrace-lock);
  }
 
  and utrace_stop() should do the same.

 I see.  The comments should be more clear that SIGKILL breaking
 TASK_TRACED is the only way this can arise.  In the jctl case, that is
 in particular after utrace_do_stop() changed TASK_STOPPED into TASK_TRACED.

Agreed.

 So today -stopped really does still mean one other thing.  It means
 that it's either still in TASK_TRACED or is just waking up for SIGKILL.

 I think we've discussed the related stuff before.  A contrary approach
 would be to ensure that in the SIGKILL-breaks-UTRACE_STOP case we never
 make any more normal reports at all before utrace_report_death.

Yes, I agree.

 I was attracted by this until I started going through it.
 It's all good until you consider report_clone.  If a SIGKILL is pending
 when you get to utrace_report_clone(), the clone has already happened.
 If you skip that callback, the engine doesn't find out about the new
 child, and that matters if it's not a CLONE_THREAD.

another nasty case is report_exit(). Even if we fix the discussed problems
with explicit/implicit SIGKILL's. We shouldn't afaics skip this report if
the tracee was killed by execing sub-thread.

Both can be fixed if we add spin_unlock_wait() before REPORT(). But imho
it would be safer if we start with spin_unlock_wait() in utrace_stop(),
perhaps there is something else to consider.

Oleg.