Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 10/22, Roland McGrath wrote: Hmm. If task is not stopped then it is current (except utrace_control(DETACH) can play with the dying task). Right, asynchronous detach was the problematic case I was concerned with. but asynchronous detach doesn't do utrace_reset(), unless the tracee is stopped or exiting (-exit_state != 0). But the whole problem we have is that we aren't getting to that path when we've done a detach, right? Confused. We already discussed this before, OK, I'll do some testing and resend right now. In UTRACE_DETACH case reset can be true but the tracee is running. But I don't think it makes sense to check target-exit_state == 0, correct? I think it's OK if it's running with -exit_state set (or even with just PF_EXITING set), i.e. already in kernel mode and never going back to user mode. (Then it's essentially equivalent to calling user_*_step while racing with a SIGKILL death, which has to be OK.) Any other kind of running would not be OK. Probably we misunderstood each other. In any case I agree, that patch was not good. Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
The 1st patch I sent initially: --- kstub/kernel/utrace.c~10_utrace_reset_should_clear_ss 2010-09-20 03:55:12.0 +0200 +++ kstub/kernel/utrace.c 2010-09-20 21:33:47.0 +0200 @@ -709,6 +709,7 @@ static bool utrace_reset(struct task_str */ utrace-resume = UTRACE_RESUME; utrace-signal_handler = 0; + user_disable_single_step(task); } /* It clears TIF_SINGLESTEP when the last engine detaches. This we didn't want because utrace_reset can be called when the tracee is not stopped, and it's not safe to call user_*_step then (with the one caveat that the SIGKILL race is OK). The 2nd one which changes utrace_control(DETACH), @@ -1139,7 +1139,9 @@ int utrace_control(struct task_struct *t target-utrace_flags = ~ENGINE_STOP; mark_engine_detached(engine); reset = reset || utrace_do_stop(target, utrace); - if (!reset) { + if (reset) { + user_disable_single_step(target); + } else { /* * As in utrace_set_events(), this barrier ensures * that our engine-flags changes have hit before we It doesn't cover the case when the TIF_SINGLESTEP tracee detaches itself, but a) currently the are no such engines, and b) it looks more clean. If we're trying to meet the advertised API--how it's documented now is the only reasonable thing if we are actually fixing things properly at all--we do need to cover the case where the only engine to have requested single-step earlier then later returns UTRACE_DETACH from a callback. That is, if that engine could actually know that the tracee never reached user mode between its use of UTRACE_SINGLESTEP and its later UTRACE_DETACH. However, I am starting to think that if you prefer this change we have a better option. Instead of duplicating UTRACE_RESUME logic, perhaps the patch below makes more sense? Hmm, that does seem like it might be pretty reasonable. If your engine had ever permitted the tracee to get back to user mode after requesting UTRACE_*STEP, then it's just desserts to cause that SIGTRAP regardless of whether other engines might have kept it stopped in actuality. But, there might be plausible corners where an engine really could know reliably (without outside knowledge of what other engines might be doing) that the tracee can't really have been back to user mode yet. For example, if it checked signal_pending(tracee) before it used UTRACE_*STEP, then it could know for sure that its report_signal callback will have been called before it ever got to user mode, so if that callback returns UTRACE_DETACH, it better not be left with stepping enabled. In that particular case, it shouldn't be a problem, since after its report_signal we will always hit finish_resume_report later. But we should think about if there are any holes of a similar nature. Please tell me which fix you prefer? Or just commit the fix you like more. I'm only going to merge a specific commit of yours that you have tested. Thanks, Roland
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
I think we should start with changing utrace_control(DETACH) anyway, then try to improve. I'll ressend the one-liner I already showed. Ok. Hmm. I'll try to think more. Right now I don't really understand how to do this correctly. I wasn't immediately sure either. OK, finish_callback_report() and utrace_control(DETACH) can set TIF_NOTIFY_RESUME. Right. Those utrace_resume has the report.action==UTRACE_RESUME bail-out case. So either that would change or detach would also do UTRACE_REPORT. But what if there are no more attached engines? Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise utrace_resume/utrace_get_signal won't be called. Right. Or else tracehook_notify_resume could call utrace_resume unconditionally, but I'm not at all sure that is not worse. The original theory was that it should always be OK to have some utrace_flags bits stay set when they are stale, because any kind of reporting pass that got enabled would hit the report-spurious case and clean the state up synchronously when it's safe. So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset() should do user_disable_single_step() too if no more engines. Confused. If there are no more engines but the tracee is still running, we still shouldn't do it there because it still might not be entirely safe. If the tracee is not stopped, it's only safe to call in current. And in fact I don't understand why this is important. When it comes to multiracing, any engine can hit the unwanted/unexpected trap because another engine can ask for UTRACE_*STEP. Right. An engine earlier in the list could swallow the signal so the next engines' callbacks didn't see it. But it doesn't know that some later engines didn't also ask for stepping. So there would have to be some understood convention between engines. For example, a later engine could see info-si_signo==SIGTRAP et al and act on that even when the incoming utrace_signal_action(action)==UTRACE_SIGNAL_IGN. Of course, that doesn't help a non-stepping engine that is earlier in the list to know that a later engine will be swallowing the signal. The original theory on this was that we'd one day stop overloading user signals for debugger-induced traps. In some past TODO lists and postings I referred to extension events. The idea (in part) was that things like hardware stepping would generate a special new flavor of utrace event rather than a real signal that has to be intercepted. Then engines' callbacks would easily see that this was a debugging event induced by some engine and ignore it (or more likely, just never get any callback unless your engine registered interest in stepping). This would also address the case of asynchronous engine detach just after a trap has actually hit, where today the SIGTRAP is queued and then later won't be intercepted by a debugger, and instead kills the user process. But we don't have any of that now, and don't yet know if we will really pursue any big improvements at this API level. The only really important (I think) case is when the last engine detaches. That's the most important case, sure. But in any case that is not actually racy, we should avoid later spurious traps. IOW. Suppose that eninge E does utrace_control(STEP) or its callback returns UTRACE_*STEP. If we do not detach this engine, other engines will see the trap. That's only so if the tracee actually gets back to user mode before we have another reporting pass. So why it is so important to clear X86_EFLAGS_TF if we detach E ? Perhaps I am worrying too much about it. The worst thing is if it could really get stuck. But that shouldn't be possible if there are any engines at all, perhaps only any with UTRACE_EVENT(QUIESCE) set. Worst case, one spurious SIGTRAP will get to a report_signal pass, but nobody will return UTRACE_*STEP again, so there won't be another. (Of course, nobody will swallow that SIGTRAP and so it will terminate the process first anyway, but that's another problem.) What seems important is any non-racy scenarios. That is, where perhaps it wasn't properly stopped and E detached asynchronously, but in practice the tracee was known to be otherwise blocked elsewhere or something, so the detach should have full effect before it returns to user mode. But that is just vague theory off hand. Thanks, Roland
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/21, Roland McGrath wrote: OK, so what should we do for now? If we can't come to a straightforward answer for the general case fairly quickly, then we can do the simple change to start with. I think we should start with changing utrace_control(DETACH) anyway, then try to improve. I'll ressend the one-liner I already showed. Without the multitracing utrace_resume()-user_disable_single_step() fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP. No, no, we don't want that. We don't need more state. And, remember, we really don't need to microoptimize cases when single-step was used recently--the cost of taking one more single-step trap and percolating through the signal paths was going to be pretty large anyway. OK, It's better to have a spurious report (preferably just spurious TIF_NOTIFY_RESUME with no actual callbacks) following any detach, or whatever it takes to ensure user_disable_single_step() always runs if user_enable_*_step did before and there is no engine around to see a report_quiesce callback pass. If there is a report_quiesce or report_signal callback pass where nobody returns UTRACE_*STEP, then we should never leave stepping enabled when we return to user mode. Hmm. I'll try to think more. Right now I don't really understand how to do this correctly. OK, finish_callback_report() and utrace_control(DETACH) can set TIF_NOTIFY_RESUME. But what if there are no more attached engines? Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise utrace_resume/utrace_get_signal won't be called. So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset() should do user_disable_single_step() too if no more engines. Confused. And in fact I don't understand why this is important. When it comes to multiracing, any engine can hit the unwanted/unexpected trap because another engine can ask for UTRACE_*STEP. The only really important (I think) case is when the last engine detaches. IOW. Suppose that eninge E does utrace_control(STEP) or its callback returns UTRACE_*STEP. If we do not detach this engine, other engines will see the trap. So why it is so important to clear X86_EFLAGS_TF if we detach E ? Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/23, Oleg Nesterov wrote: On 09/21, Roland McGrath wrote: It's better to have a spurious report (preferably just spurious TIF_NOTIFY_RESUME with no actual callbacks) following any detach, or whatever it takes to ensure user_disable_single_step() always runs if user_enable_*_step did before and there is no engine around to see a report_quiesce callback pass. If there is a report_quiesce or report_signal callback pass where nobody returns UTRACE_*STEP, then we should never leave stepping enabled when we return to user mode. Hmm. I'll try to think more. Right now I don't really understand how to do this correctly. OK, finish_callback_report() and utrace_control(DETACH) can set TIF_NOTIFY_RESUME. But what if there are no more attached engines? Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise utrace_resume/utrace_get_signal won't be called. So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset() should do user_disable_single_step() too if no more engines. Confused. Or, detach can always do user_disable_single_step() and set TIF_NOTIFY_RESUME. If another engine wants stepping its report_quiesce() should re-ack UTRACE_SINGLESTEP anyway, otherwise it is buggy. And in fact I don't understand why this is important. When it comes to multiracing, any engine can hit the unwanted/unexpected trap because another engine can ask for UTRACE_*STEP. The only really important (I think) case is when the last engine detaches. Aaah. please ignore. Somehow I assumed every engine should hook SIGTRAP. Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
Change utrace_reset() to do user_disable_single_step(). Alternatively we could change ptrace layer, but I think this is not ptrace specific. Yes, it's right to fix this in the utrace layer. But I'm not sure that's the right place in the code to fix it. The expectation is that either we'll get to finish_resume_report, which calls user_*_step, or that utrace_control is resuming us without any expectation of a report. For UTRACE_RESUME in that case, utrace_control calls user_disable_single_step. So probably the UTRACE_DETACH case there should just do it to. Thanks, Roland
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/21, Roland McGrath wrote: Change utrace_reset() to do user_disable_single_step(). Alternatively we could change ptrace layer, but I think this is not ptrace specific. Yes, it's right to fix this in the utrace layer. But I'm not sure that's the right place in the code to fix it. The expectation is that either we'll get to finish_resume_report, which calls user_*_step, or that utrace_control is resuming us without any expectation of a report. For UTRACE_RESUME in that case, utrace_control calls user_disable_single_step. So probably the UTRACE_DETACH case there should just do it to. Yes, initially I was going to change utrace_control(DETACH), and this certainly looks more clean. I was worried about the case when the TIF_SINGLESTEP tracee detaches itself and escapes finish_resume_report()-user_disable_single_step(), say, utrace_report_exec(). But probably we can ignore this. Another concern was implicit detach, but thinking more I do not see why utrace_resume() is better. OK, I'll do some testing and resend right now. In UTRACE_DETACH case reset can be true but the tracee is running. But I don't think it makes sense to check target-exit_state == 0, correct? Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/21, Roland McGrath wrote: I was worried about the case when the TIF_SINGLESTEP tracee detaches itself and escapes finish_resume_report()-user_disable_single_step(), say, utrace_report_exec(). But probably we can ignore this. No, I think that is indeed a problem. If no engine is still attached whose last callback returned UTRACE_SINGLESTEP, we should never return to user mode with single-step enabled. OK, so what should we do for now? Without the multitracing utrace_resume()-user_disable_single_step() fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP. Another concern was implicit detach, but thinking more I do not see why utrace_resume() is better. OK, I'll do some testing and resend right now. In UTRACE_DETACH case reset can be true but the tracee is running. But I don't think it makes sense to check target-exit_state == 0, correct? I think it's OK if it's running with -exit_state set (or even with just PF_EXITING set), i.e. already in kernel mode and never going back to user mode. (Then it's essentially equivalent to calling user_*_step while racing with a SIGKILL death, which has to be OK.) Any other kind of running would not be OK. Yes, my concern was running and !current, not exiting. This is OK on x86 but user_disable_single_step() is arch specific. Oleg.
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On 09/22, Oleg Nesterov wrote: On 09/21, Roland McGrath wrote: I was worried about the case when the TIF_SINGLESTEP tracee detaches itself and escapes finish_resume_report()-user_disable_single_step(), say, utrace_report_exec(). But probably we can ignore this. No, I think that is indeed a problem. If no engine is still attached whose last callback returned UTRACE_SINGLESTEP, we should never return to user mode with single-step enabled. OK, so what should we do for now? Without the multitracing utrace_resume()-user_disable_single_step() fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP. As expected, the patch below equally fixes the problem and passes ptrace-tests. Which one do you prefer? Note that, since we are going to change UTRACE_RESUME to remove ENGINE_STOP from utrace_flags, we can probably unify DETACH/RESUME case UTRACE_DETACH: mark_engine_detached(engine); reset = reset || utrace_do_stop(target, utrace); if (!reset) { /* * As in utrace_set_events(), this barrier ensures * that our engine-flags changes have hit before we * examine utrace-reporting, pairing with the barrier * in start_callback(). If @target has not yet hit * finish_callback() to clear utrace-reporting, we * might be in the middle of a callback to @engine. */ smp_mb(); if (utrace-reporting == engine) ret = -EINPROGRESS; } /* fall */ case UTRACE_RESUME: clear_engine_wants_stop(engine); target-utrace_flags = ~ENGINE_STOP; /* * This and all other cases imply resuming if stopped. * There might not be another report before it just * resumes, so make sure single-step is not left set. */ if (likely(reset)) user_disable_single_step(target); break; --- kstub/kernel/utrace.c~10_utrace_reset_should_clear_ss 2010-09-20 03:55:12.0 +0200 +++ kstub/kernel/utrace.c 2010-09-22 01:54:24.0 +0200 @@ -1139,7 +1139,9 @@ int utrace_control(struct task_struct *t target-utrace_flags = ~ENGINE_STOP; mark_engine_detached(engine); reset = reset || utrace_do_stop(target, utrace); - if (!reset) { + if (reset) { + user_disable_single_step(target); + } else { /* * As in utrace_set_events(), this barrier ensures * that our engine-flags changes have hit before we
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
OK, so what should we do for now? If we can't come to a straightforward answer for the general case fairly quickly, then we can do the simple change to start with. Without the multitracing utrace_resume()-user_disable_single_step() fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP. No, no, we don't want that. We don't need more state. And, remember, we really don't need to microoptimize cases when single-step was used recently--the cost of taking one more single-step trap and percolating through the signal paths was going to be pretty large anyway. It's better to have a spurious report (preferably just spurious TIF_NOTIFY_RESUME with no actual callbacks) following any detach, or whatever it takes to ensure user_disable_single_step() always runs if user_enable_*_step did before and there is no engine around to see a report_quiesce callback pass. If there is a report_quiesce or report_signal callback pass where nobody returns UTRACE_*STEP, then we should never leave stepping enabled when we return to user mode. Yes, my concern was running and !current, not exiting. This is OK on x86 but user_disable_single_step() is arch specific. It's not really OK on x86 either, with either SMP or preemption. Thanks, Roland
[PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
Test-case: #include stdio.h #include signal.h #include unistd.h #include sys/ptrace.h #include sys/wait.h #include assert.h int main(void) { int pid, status; pid = fork(); if (!pid) { assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); kill(getpid(), SIGSTOP); return 0x23; } assert(wait(status) == pid); assert(WIFSTOPPED(status) WSTOPSIG(status) == SIGSTOP); assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0); assert(wait(status) == pid); assert(WIFSTOPPED(status) WSTOPSIG(status) == SIGTRAP); assert(ptrace(PTRACE_DETACH, pid, 0,0) == 0); assert(wait(status) == pid); if (WIFEXITED(status) WEXITSTATUS(status) == 0x23) return 0; printf(ERR!! exit status: %04x\n, status); return 1; } It fails because TIF_SINGLESTEP is not cleared after PTRACE_DETACH and the tracee gets another trap after resume. Change utrace_reset() to do user_disable_single_step(). Alternatively we could change ptrace layer, but I think this is not ptrace specific. Signed-off-by: Oleg Nesterov o...@redhat.com --- kernel/utrace.c |1 + 1 file changed, 1 insertion(+) --- kstub/kernel/utrace.c~10_utrace_reset_should_clear_ss 2010-09-20 03:55:12.0 +0200 +++ kstub/kernel/utrace.c 2010-09-20 21:33:47.0 +0200 @@ -709,6 +709,7 @@ static bool utrace_reset(struct task_str */ utrace-resume = UTRACE_RESUME; utrace-signal_handler = 0; + user_disable_single_step(task); } /*
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
On Mon, 20 Sep 2010 21:42:19 +0200, Oleg Nesterov wrote: Test-case: Checked in http://sourceware.org/systemtap/wiki/utrace/tests as step-detach. Thanks, Jan