Re: Q: utrace-stopped utrace_report_jctl()
On 03/15, Roland McGrath wrote: Then we re-do this (well, almost) check under -siglock, } else if (task_is_stopped(target)) { if (!(target-utrace_flags UTRACE_EVENT(JCTL))) utrace-stopped = stopped = true; } But this is not nice. Let's suppose the task is already stopped, we do UTRACE_ATTACH + utrace_set_events(JCTL). This is exactly why utrace_set_events() sets -stopped preemptively for that case. Yes, thanks. I saw this code in utrace_set_events(), but then forgot. REPORT(task, utrace, report, UTRACE_EVENT(JCTL), report_jctl, what, notify); instead. There is a bug, but your fix changes a key API choice. I've put in this fix: -report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); +report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, +notify ? what : 0); The @type argument shows the state we will be in after the callback. If the state changes, there will be another callback. That's what a state-tracking tracer needs, e.g. to keep a little light on the screen red while the thread is stopped and green while it's running. The @notify argument shows what SIGCHLD the parent sees (if it were dequeuing all possible SIGCHLD postings as quickly as they come). That's what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs are expected in the parent. I see, thanks. With the first patch, we call utrace_report_jctl() before we actually stop. do_signal_stop() can fail then, but I think this is OK, we can pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete, and with this patch we always call -report_jctl with notify == 0. Just for discussion. I think I sort of understand the intent of your patch. If we change the calling convention for tracehook_notify_jctl, I think we want to preserve the aspect that the hook decides about sending the notification. That's how the ptrace quirks can be reimplemented differently later without changing the tracehook layer again. Also, we certainly don't want one tracehook call with two different locking conditions. Agreed, bool sig_locked is awful. But we can avoid it. The real problem is how to figure out the correct notify argument. I'll try to think more, but I am not sure I will find the clean solution :( Just in case. We can introduce PF_SIGCONTED flag and change prepare_signal(SIGCONT) and signal_wake_up(resume = 1) to set this flag. Since the task never changes its -flags in finish_stop() path, it is safe to do this under -siglock. This way utrace_report_jctl() can drop TASK_STOPPED before REPORT() and then check !PF_SIGCONTED before restoring the -state. But yes sure, this is very, very ugly. Oleg.
Re: Q: utrace-stopped utrace_report_jctl()
I was wrong, I forgot that tracehook_get_signal() doesn't need JCTL. Right, that is key. OK, let's look at utrace_do_stop: if (task_is_stopped(target) !(target-utrace_flags UTRACE_EVENT(JCTL))) { utrace-stopped = 1; return true; } This doesn't look correct. We don't hold -siglock, the task can be SIGCONT'ed and return from get_signal_to_deliver(), and then we set -stopped. Or I missed something again? I think you're right. The logic there was supposed to be, TASK_STOPPED means it will get into utrace_get_signal(). That much is true, but nothing inside utrace_get_signal() actually synchronizes with this to make that matter. All this check does is try to optimize the TASK_STOPPED case not to take the siglock. That doesn't seem worth much, so we can just drop it. Then we re-do this (well, almost) check under -siglock, } else if (task_is_stopped(target)) { if (!(target-utrace_flags UTRACE_EVENT(JCTL))) utrace-stopped = stopped = true; } But this is not nice. Let's suppose the task is already stopped, we do UTRACE_ATTACH + utrace_set_events(JCTL). This is exactly why utrace_set_events() sets -stopped preemptively for that case. Now, utrace_control(UTRACE_STOP) can do nothing until SIGCONT. We don't even set -report. Yes, we can't set -stopped if JCTL, we can race with utrace_report_jctl() which does REPORT(). Setting JCTL while in TASK_STOPPED made it set -stopped, so utrace_control() succeeds without calling utrace_do_stop(). BTW, afaics utrace_report_jctl() has another bug, REPORT(task, utrace, report, UTRACE_EVENT(JCTL), report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); I think it should do REPORT(task, utrace, report, UTRACE_EVENT(JCTL), report_jctl, what, notify); instead. There is a bug, but your fix changes a key API choice. I've put in this fix: - report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, what); + report_jctl, was_stopped ? CLD_STOPPED : CLD_CONTINUED, + notify ? what : 0); There are two things a tracer might be tracking: state or events. The state is whether the thread is in job control stop or is running. The events are the SIGCHLD notifications that the thread tries to post to its parent. The @type argument shows the state we will be in after the callback. If the state changes, there will be another callback. That's what a state-tracking tracer needs, e.g. to keep a little light on the screen red while the thread is stopped and green while it's running. The @notify argument shows what SIGCHLD the parent sees (if it were dequeuing all possible SIGCHLD postings as quickly as they come). That's what an event-tracking tracer needs, e.g. to match up with what SIGCHLDs are expected in the parent. Your change to @type would break state-trackers in the case where tracehook_notify_jctl() is called from get_signal_to_deliver() with CLD_STOPPED. With the first patch, we call utrace_report_jctl() before we actually stop. do_signal_stop() can fail then, but I think this is OK, we can pretend that SIGCONT/SIGKILL happened after we stopped. It is not complete, and with this patch we always call -report_jctl with notify == 0. Just for discussion. I think I sort of understand the intent of your patch. If we change the calling convention for tracehook_notify_jctl, I think we want to preserve the aspect that the hook decides about sending the notification. That's how the ptrace quirks can be reimplemented differently later without changing the tracehook layer again. Also, we certainly don't want one tracehook call with two different locking conditions. It seems right in principle to do the reporting before we change -state, given that we have to allow for it changing during the callbacks. And indeed, that avoids the JCTL special case mess entirely. Thanks, Roland
Re: Q: utrace-stopped utrace_report_jctl()
On 03/12, Roland McGrath wrote: Yep. And utrace_reset() can be called because -stopped == 1. Right. Let me explain. Again, let's suppose D attaches engine E to the target T. T enters utrace_report_jctl() with -stopped == 1. D calls utrace_set_events(events = 0), this removes JCTL from E-flags. D calls, say, utrace_control(UTRACE_RESUME). Since -stopped == 1, this calls utrace_reset() and removes JCTL from T-utrace_flags. Right. In the utrace-indirect code this would have reset the utrace pointer too. T takes utrace-lock, clears -stopped, and drops the lock. In the utrace-indirect code, this part would have been harmless even in the race case where it happened (the more likely case being that task-utrace was cleared already before utrace_report_jctl looked at it). (That code just had the dangling utrace pointer issue I noticed yesterday, at the end of the function.) But, yes, this is a problem. I think this ought to cover it: @@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what) * longer considered stopped while we run callbacks. */ spin_lock(utrace-lock); + /* + * Now that we have the lock, check in case utrace_reset() has + * just now cleared UTRACE_EVENT(JCTL) while it considered us + * safely stopped. In that case, we should not touch -stopped + * and have nothing else to do. + */ + if (unlikely(!(task-utrace_flags UTRACE_EVENT(JCTL { + spin_unlock(utrace-lock); + return; I don't think this can help, even if we clear -stopped before return. It is still possible to set -stopped after that, and since we don't have JCTL we return from get_signal_to_deliver() bypassing tracehook calls. From the previous message: That suggests we must preemptively go back to TASK_RUNNING before making the callbacks, just in case they would do the transition. ... I thought about this too. But this not easy and not nice. Roland, I _seem_ to have the vague idea, will return tomorrow. Oleg.
Re: Q: utrace-stopped utrace_report_jctl()
I'd like to ask you to clarify what utrace-stopped means... I'm very glad you are looking into this area! My understanding is: if we see -stopped == true under utrace-lock, then the target can do nothing interesting from the utrace's pov. The target should take utrace-lock at least once. Either in finish_utrace_stop(), or, if -stopped was set by do_signal_stop() path, the target will call tracehook_get_signal()-utrace_get_signal(). So we can assume the target is quiescent and we can do, for example, UTRACE_DETACH safely. Correct. But utrace_report_jctl() doesn't look right to me, spin_lock(utrace-lock); utrace-stopped = 0; utrace-report = 0; spin_unlock(utrace-lock); I must admit, I dont't understand the comment above, but obviously this is right, we should clear -stopped. If nothing else, REPORT()-start_report() won't be happy if -stopped. The comment mentions utrace being removed, which is a bit of old text referring to an indirect struct utrace. Aside from that, please tell me what is not clear about that comment. But -stopped can be restored right after we clear it! Yes, utrace_do_stop() and utrace_set_events() set -stopped == 1 only if -utrace_flags has no JCTL, and since we are here we must have JCTL. That's indeed the logic intended to prevent -stopped being set again here. But, if we enter utrace_report_jctl() with -stopped == 1, JCTL can be already removed from -utrace_flags, exactly because -stopped was true. I don't follow this. JCTL is never removed from -utrace_flags, except as all event bits are, by utrace_reset(). This leads to another minor question, how it is possible to enter enter utrace_report_jctl() with -stopped == 1 ? I think the only possibility it was previously set by another call to utrace_report_jctl(), see below. There are two ways to enter utrace_report_jctl with -stopped set. 1. utrace_report_jctl was called when entering TASK_STOPPED, and set it then. Now utrace_report_jctl is called for the CLD_CONTINUED case, and -stopped remains set. 2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was already in TASK_STOPPED (and really stopped, or at least got past tracehook_notify_jctl before JCTL was set). It sets -stopped before adding JCTL to -utrace_flags, so that utrace_control() will consider the target stopped. SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop in progress and it is not finished yet. SIGNAL_STOP_STOPPED should be reliable, as far as it goes. It will only be set if the group stop is complete. If then a SIGCONT+stop signal come, SIGCONT will clear SIGNAL_STOP_STOPPED before the stop signal starts another group stop. (We have no bad old PTRACE_CONT implementation to conflict with here.) But -group_stop_count is not reliable too. It it possible that we recieved SIGCONT and then another SIGSTOP. If another thread has already dequeued this SIGSTOP and initiated the new group stop, we can't just set TASK_STOPPED, we must participate in the -group_stop_count accounting. It's worse than that! If we came out of TASK_STOPPED, we did it implicitly and without holding the siglock. We participated in group_stop_count accounting for the first stop before we got here. If we stayed in TASK_STOPPED throughout the callbacks, then that bookkeeping is still correct. If the initiation of the new group stop happened while we were in TASK_STOPPED, we were omitted from the count but we should stop again. In that case we should stop either if SIGNAL_STOP_STOPPED is set or if group_stop_count 0. Since we weren't counted, if group_stop_count==0 then SIGNAL_STOP_STOPPED will be set (again). If that initiation happened while a callback (e.g.) blocked in kmalloc or after (i.e. we were not in TASK_STOPPED), we were included in that count. In that case we need to decrement group_stop_count and stop again, but possibly also need to call do_notify_parent_cldstop again if it was 1. For that we'd do the right thing just by returning in TASK_RUNNING. We'll just come right back around in get_signal_to_deliver and handle group_stop_count normally. The trouble is that we have no way to distinguish these two cases, i.e. to know whether or not we were counted in group_stop_count. Am I missing a way? (The one piece of information we are not using is the @notify argument: it tells us whether we were the thread responsible for setting SIGNAL_STOP_STOPPED just before we got here. But I don't see how that helps.) I think the bottom line is that we can't ever allow any transition to or from TASK_STOPPED when we don't hold the siglock. Every such transition must hold that lock to manage group_stop_count and SIGNAL_STOP_STOPPED. That suggests we must preemptively go back to TASK_RUNNING before making the callbacks, just in case they would do the transition. We'd take the siglock and manage the bookkeeping. But I'm not sure yet how best to do that.
Re: Q: utrace-stopped utrace_report_jctl()
Roland, I left some parts of your message unanswered because I need to think more about them... On 03/12, Roland McGrath wrote: But, if we enter utrace_report_jctl() with -stopped == 1, JCTL can be already removed from -utrace_flags, exactly because -stopped was true. I don't follow this. JCTL is never removed from -utrace_flags, except as all event bits are, by utrace_reset(). Yep. And utrace_reset() can be called because -stopped == 1. Let me explain. Again, let's suppose D attaches engine E to the target T. T enters utrace_report_jctl() with -stopped == 1. D calls utrace_set_events(events = 0), this removes JCTL from E-flags. D calls, say, utrace_control(UTRACE_RESUME). Since -stopped == 1, this calls utrace_reset() and removes JCTL from T-utrace_flags. T takes utrace-lock, clears -stopped, and drops the lock. D does utrace_control(UTRACE_STOP). This calls utrace_do_stop() which sees task_is_stopped() !JCTL, so it sets -stopped = true. T calls REPORT() and start_report() hits the (correct) BUG_ON(stopped). No? This leads to another minor question, how it is possible to enter enter utrace_report_jctl() with -stopped == 1 ? I think the only possibility it was previously set by another call to utrace_report_jctl(), see below. There are two ways to enter utrace_report_jctl with -stopped set. 1. utrace_report_jctl was called when entering TASK_STOPPED, and set it then. Now utrace_report_jctl is called for the CLD_CONTINUED case, and -stopped remains set. this is covered by my guess above, 2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was already in TASK_STOPPED (and really stopped, or at least got past tracehook_notify_jctl before JCTL was set). It sets -stopped before adding JCTL to -utrace_flags, Yes, thanks. I missed this. SIGNAL_STOP_STOPPED is not reliable, it is possible that the group stop in progress and it is not finished yet. SIGNAL_STOP_STOPPED should be reliable, as far as it goes. It will only be set if the group stop is complete. Yes sure. I wasn't clear. I meant, what if SIGNAL_STOP_STOPPED is not set? This doesn't mean we don't need __set_current_state(TASK_STOPPED), it is possible that the group-stop is in progress and -group_stop_count != 0. But! can't we miss utrace_wakeup() ? I think you've found something (though not quite the scenario you describe). Let's suppose the debugger D attaches the single engine E to the target T. D does utrace_set_events(JCTL). T calls do_signal_stop(), tracehook_notify_jctl() sees JCTL and starts utrace_report_jctl(). D does utrace_set_events(events = 0), this clears E-flags. T calls REPORT(), nobody needs needs JCTL, finish_report() sees !-takers and calls utrace_reset(). It sets -utrace_flags = 0. Nope: flags |= engine-flags | UTRACE_EVENT(REAP); Ah, thanks. Can't understand how I didn't notice this, I checked the code several times ;) But as you pointed out, So, change your scenario to: D does utrace_control(UTRACE_DETACH). and then this will happen. Yes. Oleg.
Re: Q: utrace-stopped utrace_report_jctl()
Yep. And utrace_reset() can be called because -stopped == 1. Right. Let me explain. Again, let's suppose D attaches engine E to the target T. T enters utrace_report_jctl() with -stopped == 1. D calls utrace_set_events(events = 0), this removes JCTL from E-flags. D calls, say, utrace_control(UTRACE_RESUME). Since -stopped == 1, this calls utrace_reset() and removes JCTL from T-utrace_flags. Right. In the utrace-indirect code this would have reset the utrace pointer too. T takes utrace-lock, clears -stopped, and drops the lock. In the utrace-indirect code, this part would have been harmless even in the race case where it happened (the more likely case being that task-utrace was cleared already before utrace_report_jctl looked at it). (That code just had the dangling utrace pointer issue I noticed yesterday, at the end of the function.) But, yes, this is a problem. I think this ought to cover it: @@ -1659,6 +1659,16 @@ void utrace_report_jctl(int notify, int what) * longer considered stopped while we run callbacks. */ spin_lock(utrace-lock); + /* +* Now that we have the lock, check in case utrace_reset() has +* just now cleared UTRACE_EVENT(JCTL) while it considered us +* safely stopped. In that case, we should not touch -stopped +* and have nothing else to do. +*/ + if (unlikely(!(task-utrace_flags UTRACE_EVENT(JCTL { + spin_unlock(utrace-lock); + return; + } utrace-stopped = 0; utrace-report = 0; spin_unlock(utrace-lock); 2. utrace_set_events(target, UTRACE_EVENT(JCTL)) was called when target was already in TASK_STOPPED (and really stopped, or at least got past tracehook_notify_jctl before JCTL was set). It sets -stopped before adding JCTL to -utrace_flags, Yes, thanks. I missed this. I feel I should also point out the case where exit_signals() calls tracehook_notify_jctl, because I just noticed it. I don't think that path existed the last time I thought seriously about the utrace_report_jctl logic. (This is not a #3 in that list, but in general is another path we need to keep in mind here.) Yes sure. I wasn't clear. I meant, what if SIGNAL_STOP_STOPPED is not set? This doesn't mean we don't need __set_current_state(TASK_STOPPED), it is possible that the group-stop is in progress and -group_stop_count != 0. Right. Thanks, Roland