Re: Q: utrace-stopped utrace_report_jctl()

2009-03-16 Thread Oleg Nesterov
On 03/15, Roland McGrath wrote:

  Then we re-do this (well, almost) check under -siglock,
 
  } else if (task_is_stopped(target)) {
  if (!(target-utrace_flags  UTRACE_EVENT(JCTL)))
  utrace-stopped = stopped = true;
  }
 
  But this is not nice. Let's suppose the task is already stopped, we do
  UTRACE_ATTACH + utrace_set_events(JCTL).

 This is exactly why utrace_set_events() sets -stopped preemptively for
 that case.

Yes, thanks. I saw this code in utrace_set_events(), but then forgot.

  REPORT(task, utrace, report, UTRACE_EVENT(JCTL),
 report_jctl, what, notify);
 
  instead.

 There is a bug, but your fix changes a key API choice.
 I've put in this fix:

 -report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
 +report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED,
 +notify ? what : 0);

 The @type argument shows the state we will be in after the callback.
 If the state changes, there will be another callback.  That's what a
 state-tracking tracer needs, e.g. to keep a little light on the screen red
 while the thread is stopped and green while it's running.

 The @notify argument shows what SIGCHLD the parent sees (if it were
 dequeuing all possible SIGCHLD postings as quickly as they come).  That's
 what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs
 are expected in the parent.

I see, thanks.

  With the first patch, we call utrace_report_jctl() before we actually
  stop. do_signal_stop() can fail then, but I think this is OK, we can
  pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete,
  and with this patch we always call -report_jctl with notify == 0. Just for
  discussion.

 I think I sort of understand the intent of your patch.  If we change the
 calling convention for tracehook_notify_jctl, I think we want to preserve
 the aspect that the hook decides about sending the notification.  That's
 how the ptrace quirks can be reimplemented differently later without
 changing the tracehook layer again.  Also, we certainly don't want one
 tracehook call with two different locking conditions.

Agreed, bool sig_locked is awful. But we can avoid it. The real problem
is how to figure out the correct notify argument. I'll try to think more,
but I am not sure I will find the clean solution :(

Just in case. We can introduce PF_SIGCONTED flag and change
prepare_signal(SIGCONT) and signal_wake_up(resume = 1) to set this flag.
Since the task never changes its -flags in finish_stop() path, it is safe
to do this under -siglock. This way utrace_report_jctl() can drop
TASK_STOPPED before REPORT() and then check !PF_SIGCONTED before restoring
the -state. But yes sure, this is very, very ugly.

Oleg.



Re: Q: utrace-stopped utrace_report_jctl()

2009-03-15 Thread Roland McGrath
 I was wrong, I forgot that tracehook_get_signal() doesn't need JCTL.

Right, that is key.

 OK, let's look at utrace_do_stop:
 
   if (task_is_stopped(target) 
   !(target-utrace_flags  UTRACE_EVENT(JCTL))) {
   utrace-stopped = 1;
   return true;
   }
 
 This doesn't look correct. We don't hold -siglock, the task can be
 SIGCONT'ed and return from get_signal_to_deliver(), and then we set
 -stopped. Or I missed something again?

I think you're right.  The logic there was supposed to be, TASK_STOPPED
means it will get into utrace_get_signal().  That much is true, but
nothing inside utrace_get_signal() actually synchronizes with this to make
that matter.

All this check does is try to optimize the TASK_STOPPED case not to take
the siglock.  That doesn't seem worth much, so we can just drop it.

 Then we re-do this (well, almost) check under -siglock,
 
   } else if (task_is_stopped(target)) {
   if (!(target-utrace_flags  UTRACE_EVENT(JCTL)))
   utrace-stopped = stopped = true;
   }
 
 But this is not nice. Let's suppose the task is already stopped, we do
 UTRACE_ATTACH + utrace_set_events(JCTL).

This is exactly why utrace_set_events() sets -stopped preemptively for
that case.

 Now, utrace_control(UTRACE_STOP) can do nothing until SIGCONT. We don't
 even set -report. Yes, we can't set -stopped if JCTL, we can race with
 utrace_report_jctl() which does REPORT().

Setting JCTL while in TASK_STOPPED made it set -stopped, so
utrace_control() succeeds without calling utrace_do_stop().

 BTW, afaics utrace_report_jctl() has another bug,
 
   REPORT(task, utrace, report, UTRACE_EVENT(JCTL),
  report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
 
 I think it should do
 
   REPORT(task, utrace, report, UTRACE_EVENT(JCTL),
  report_jctl, what, notify);
 
 instead.

There is a bug, but your fix changes a key API choice.
I've put in this fix:

-  report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what);
+  report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED,
+  notify ? what : 0);

There are two things a tracer might be tracking: state or events.
The state is whether the thread is in job control stop or is running.
The events are the SIGCHLD notifications that the thread tries to post to
its parent.

The @type argument shows the state we will be in after the callback.
If the state changes, there will be another callback.  That's what a
state-tracking tracer needs, e.g. to keep a little light on the screen red
while the thread is stopped and green while it's running.

The @notify argument shows what SIGCHLD the parent sees (if it were
dequeuing all possible SIGCHLD postings as quickly as they come).  That's
what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs
are expected in the parent.

Your change to @type would break state-trackers in the case where
tracehook_notify_jctl() is called from get_signal_to_deliver() with
CLD_STOPPED.

 With the first patch, we call utrace_report_jctl() before we actually
 stop. do_signal_stop() can fail then, but I think this is OK, we can
 pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete,
 and with this patch we always call -report_jctl with notify == 0. Just for
 discussion.

I think I sort of understand the intent of your patch.  If we change the
calling convention for tracehook_notify_jctl, I think we want to preserve
the aspect that the hook decides about sending the notification.  That's
how the ptrace quirks can be reimplemented differently later without
changing the tracehook layer again.  Also, we certainly don't want one
tracehook call with two different locking conditions.

It seems right in principle to do the reporting before we change -state,
given that we have to allow for it changing during the callbacks.  And
indeed, that avoids the JCTL special case mess entirely.


Thanks,
Roland



Re: Q: utrace-stopped utrace_report_jctl()

2009-03-13 Thread Oleg Nesterov
On 03/12, Roland McGrath wrote:

  Yep. And utrace_reset() can be called because -stopped == 1.

 Right.

  Let me explain. Again, let's suppose D attaches engine E to the target T.
 
  T enters utrace_report_jctl() with -stopped == 1.
 
  D calls utrace_set_events(events = 0), this removes JCTL from E-flags.
 
  D calls, say, utrace_control(UTRACE_RESUME). Since -stopped == 1, this
  calls utrace_reset() and removes JCTL from T-utrace_flags.

 Right.  In the utrace-indirect code this would have reset the utrace
 pointer too.

  T takes utrace-lock, clears -stopped, and drops the lock.

 In the utrace-indirect code, this part would have been harmless even in the
 race case where it happened (the more likely case being that task-utrace
 was cleared already before utrace_report_jctl looked at it).  (That code
 just had the dangling utrace pointer issue I noticed yesterday, at the end
 of the function.)

 But, yes, this is a problem.  I think this ought to cover it:

 @@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what)
* longer considered stopped while we run callbacks.
*/
   spin_lock(utrace-lock);
 + /*
 +  * Now that we have the lock, check in case utrace_reset() has
 +  * just now cleared UTRACE_EVENT(JCTL) while it considered us
 +  * safely stopped.  In that case, we should not touch -stopped
 +  * and have nothing else to do.
 +  */
 + if (unlikely(!(task-utrace_flags  UTRACE_EVENT(JCTL {
 + spin_unlock(utrace-lock);
 + return;

I don't think this can help, even if we clear -stopped before return.
It is still possible to set -stopped after that, and since we don't
have JCTL we return from get_signal_to_deliver() bypassing tracehook
calls.

From the previous message:

 That suggests we must preemptively go back to TASK_RUNNING before making
 the callbacks, just in case they would do the transition.
 ...

I thought about this too. But this not easy and not nice.

Roland, I _seem_ to have the vague idea, will return tomorrow.

Oleg.



Re: Q: utrace-stopped utrace_report_jctl()

2009-03-12 Thread Roland McGrath
 I'd like to ask you to clarify what utrace-stopped means...

I'm very glad you are looking into this area!

 My understanding is: if we see -stopped == true under utrace-lock, then
 the target can do nothing interesting from the utrace's pov. The target
 should take utrace-lock at least once. Either in finish_utrace_stop(), or,
 if -stopped was set by do_signal_stop() path, the target will call
 tracehook_get_signal()-utrace_get_signal(). So we can assume the target
 is quiescent and we can do, for example, UTRACE_DETACH safely.

Correct.

 But utrace_report_jctl() doesn't look right to me,
 
   spin_lock(utrace-lock);
   utrace-stopped = 0;
   utrace-report = 0;
   spin_unlock(utrace-lock);
 
 I must admit, I dont't understand the comment above, but obviously this is
 right, we should clear -stopped. If nothing else, REPORT()-start_report()
 won't be happy if -stopped.

The comment mentions utrace being removed, which is a bit of old text
referring to an indirect struct utrace.  Aside from that, please tell me
what is not clear about that comment.

 But -stopped can be restored right after we clear it! Yes, utrace_do_stop()
 and utrace_set_events() set -stopped == 1 only if -utrace_flags has no JCTL,
 and since we are here we must have JCTL.

That's indeed the logic intended to prevent -stopped being set again here.

 But, if we enter utrace_report_jctl() with -stopped == 1, JCTL can be
 already removed from -utrace_flags, exactly because -stopped was true.

I don't follow this.  JCTL is never removed from -utrace_flags, except
as all event bits are, by utrace_reset().

 This leads to another minor question, how it is possible to enter enter
 utrace_report_jctl() with -stopped == 1 ? I think the only possibility
 it was previously set by another call to utrace_report_jctl(), see below.

There are two ways to enter utrace_report_jctl with -stopped set.

1. utrace_report_jctl was called when entering TASK_STOPPED, and set it then.
   Now utrace_report_jctl is called for the CLD_CONTINUED case, and
   -stopped remains set.

2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was
   already in TASK_STOPPED (and really stopped, or at least got past
   tracehook_notify_jctl before JCTL was set).  It sets -stopped before
   adding JCTL to -utrace_flags, so that utrace_control() will consider
   the target stopped.

 SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop in
 progress and it is not finished yet. 

SIGNAL_STOP_STOPPED should be reliable, as far as it goes.  It will only be
set if the group stop is complete.  If then a SIGCONT+stop signal come,
SIGCONT will clear SIGNAL_STOP_STOPPED before the stop signal starts
another group stop.  (We have no bad old PTRACE_CONT implementation to
conflict with here.)

 But -group_stop_count is not reliable too. It it possible that we
 recieved SIGCONT and then another SIGSTOP. If another thread has already
 dequeued this SIGSTOP and initiated the new group stop, we can't just set
 TASK_STOPPED, we must participate in the -group_stop_count accounting.

It's worse than that!  If we came out of TASK_STOPPED, we did it implicitly
and without holding the siglock.  

We participated in group_stop_count accounting for the first stop before we
got here.  If we stayed in TASK_STOPPED throughout the callbacks, then that
bookkeeping is still correct.

If the initiation of the new group stop happened while we were in
TASK_STOPPED, we were omitted from the count but we should stop again.  
In that case we should stop either if SIGNAL_STOP_STOPPED is set or if
group_stop_count  0.  Since we weren't counted, if group_stop_count==0
then SIGNAL_STOP_STOPPED will be set (again).

If that initiation happened while a callback (e.g.) blocked in kmalloc or
after (i.e. we were not in TASK_STOPPED), we were included in that count.
In that case we need to decrement group_stop_count and stop again, but
possibly also need to call do_notify_parent_cldstop again if it was 1.  For
that we'd do the right thing just by returning in TASK_RUNNING.  We'll just
come right back around in get_signal_to_deliver and handle group_stop_count
normally.

The trouble is that we have no way to distinguish these two cases, i.e. to
know whether or not we were counted in group_stop_count.  Am I missing a
way?  (The one piece of information we are not using is the @notify
argument: it tells us whether we were the thread responsible for setting
SIGNAL_STOP_STOPPED just before we got here.  But I don't see how that helps.)

I think the bottom line is that we can't ever allow any transition to or
from TASK_STOPPED when we don't hold the siglock.  Every such transition
must hold that lock to manage group_stop_count and SIGNAL_STOP_STOPPED.

That suggests we must preemptively go back to TASK_RUNNING before making
the callbacks, just in case they would do the transition.  We'd take the
siglock and manage the bookkeeping.  But I'm not sure yet how best to do
that.  

Re: Q: utrace-stopped utrace_report_jctl()

2009-03-12 Thread Oleg Nesterov
Roland, I left some parts of your message unanswered because I need to think
more about them...

On 03/12, Roland McGrath wrote:

  But, if we enter utrace_report_jctl() with -stopped == 1, JCTL can be
  already removed from -utrace_flags, exactly because -stopped was true.

 I don't follow this.  JCTL is never removed from -utrace_flags, except
 as all event bits are, by utrace_reset().

Yep. And utrace_reset() can be called because -stopped == 1.

Let me explain. Again, let's suppose D attaches engine E to the target T.

T enters utrace_report_jctl() with -stopped == 1.

D calls utrace_set_events(events = 0), this removes JCTL from E-flags.

D calls, say, utrace_control(UTRACE_RESUME). Since -stopped == 1, this
calls utrace_reset() and removes JCTL from T-utrace_flags.

T takes utrace-lock, clears -stopped, and drops the lock.

D does utrace_control(UTRACE_STOP). This calls utrace_do_stop() which
sees task_is_stopped()  !JCTL, so it sets -stopped = true.

T calls REPORT() and start_report() hits the (correct) BUG_ON(stopped).

No?

  This leads to another minor question, how it is possible to enter enter
  utrace_report_jctl() with -stopped == 1 ? I think the only possibility
  it was previously set by another call to utrace_report_jctl(), see below.

 There are two ways to enter utrace_report_jctl with -stopped set.

 1. utrace_report_jctl was called when entering TASK_STOPPED, and set it then.
Now utrace_report_jctl is called for the CLD_CONTINUED case, and
-stopped remains set.

this is covered by my guess above,

 2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was
already in TASK_STOPPED (and really stopped, or at least got past
tracehook_notify_jctl before JCTL was set).  It sets -stopped before
adding JCTL to -utrace_flags,

Yes, thanks. I missed this.

  SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop in
  progress and it is not finished yet.

 SIGNAL_STOP_STOPPED should be reliable, as far as it goes.  It will only be
 set if the group stop is complete.

Yes sure. I wasn't clear. I meant, what if SIGNAL_STOP_STOPPED is not set?
This doesn't mean we don't need  __set_current_state(TASK_STOPPED), it is
possible that the group-stop is in progress and -group_stop_count != 0.

  But! can't we miss utrace_wakeup() ?

 I think you've found something (though not quite the scenario you describe).

  Let's suppose the debugger D attaches the single engine E to the target T.
 
  D does utrace_set_events(JCTL).
 
  T calls do_signal_stop(), tracehook_notify_jctl() sees JCTL and starts
  utrace_report_jctl().
 
  D does utrace_set_events(events = 0), this clears E-flags.
 
  T calls REPORT(), nobody needs needs JCTL, finish_report() sees 
  !-takers
  and calls utrace_reset(). It sets -utrace_flags = 0.

 Nope:

   flags |= engine-flags | UTRACE_EVENT(REAP);

Ah, thanks. Can't understand how I didn't notice this, I checked the
code several times ;)

But as you pointed out,

 So, change your scenario to:

   D does utrace_control(UTRACE_DETACH).

 and then this will happen.

Yes.

Oleg.



Re: Q: utrace-stopped utrace_report_jctl()

2009-03-12 Thread Roland McGrath
 Yep. And utrace_reset() can be called because -stopped == 1.

Right.

 Let me explain. Again, let's suppose D attaches engine E to the target T.
 
 T enters utrace_report_jctl() with -stopped == 1.
 
 D calls utrace_set_events(events = 0), this removes JCTL from E-flags.
 
 D calls, say, utrace_control(UTRACE_RESUME). Since -stopped == 1, this
 calls utrace_reset() and removes JCTL from T-utrace_flags.

Right.  In the utrace-indirect code this would have reset the utrace
pointer too.

 T takes utrace-lock, clears -stopped, and drops the lock.

In the utrace-indirect code, this part would have been harmless even in the
race case where it happened (the more likely case being that task-utrace
was cleared already before utrace_report_jctl looked at it).  (That code
just had the dangling utrace pointer issue I noticed yesterday, at the end
of the function.)

But, yes, this is a problem.  I think this ought to cover it:

@@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what)
 * longer considered stopped while we run callbacks.
 */
spin_lock(utrace-lock);
+   /*
+* Now that we have the lock, check in case utrace_reset() has
+* just now cleared UTRACE_EVENT(JCTL) while it considered us
+* safely stopped.  In that case, we should not touch -stopped
+* and have nothing else to do.
+*/
+   if (unlikely(!(task-utrace_flags  UTRACE_EVENT(JCTL {
+   spin_unlock(utrace-lock);
+   return;
+   }
utrace-stopped = 0;
utrace-report = 0;
spin_unlock(utrace-lock);

  2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was
 already in TASK_STOPPED (and really stopped, or at least got past
 tracehook_notify_jctl before JCTL was set).  It sets -stopped before
 adding JCTL to -utrace_flags,
 
 Yes, thanks. I missed this.

I feel I should also point out the case where exit_signals() calls
tracehook_notify_jctl, because I just noticed it.  I don't think that path
existed the last time I thought seriously about the utrace_report_jctl
logic.  (This is not a #3 in that list, but in general is another path we
need to keep in mind here.)

 Yes sure. I wasn't clear. I meant, what if SIGNAL_STOP_STOPPED is not set?
 This doesn't mean we don't need  __set_current_state(TASK_STOPPED), it is
 possible that the group-stop is in progress and -group_stop_count != 0.

Right.


Thanks,
Roland