Re: utrace-cleanup branch
Btw, why does utrace_set_events() check utrace-stopped? If a tracee is stopped then -reporting == engine is not possible, -reporting must be NULL. To optimize out other checks + mb() in the likely stopped case? Oleg.
Re: utrace-cleanup branch
To optimize out other checks + mb() in the likely stopped case? Yes. Thanks, Roland
Re: utrace-cleanup branch
Can't comment right now, need to read the code. Such gentlemanly restraint. Afaics, we can't just remove utrace_finish_jctl() and the similar code in utrace_stop(). We need void utrace_finish_jctl(void) { struct utrace *utrace = task_utrace_struct(current); /* * While in TASK_STOPPED, we can be considered safely stopped by * utrace_do_stop(). Make sure we can do nothing until the tracer * drops utrace-lock */ if (unlikely(__fatal_signal_pending())) spin_unlock_wait(utrace-lock); } and utrace_stop() should do the same. I see. The comments should be more clear that SIGKILL breaking TASK_TRACED is the only way this can arise. In the jctl case, that is in particular after utrace_do_stop() changed TASK_STOPPED into TASK_TRACED. Otherwise, the killed tracee can start another reporting loop and list_for_each() can race with, say, utrace_reset(DETACH)-utrace_reset(). More generally, if the tracer sees it is stopped under utrace-lock, the tracee must be really stopped until we drop utrace-lock(), it must not escape from utrace_stop() or do_signal_stop(). Correct. So today -stopped really does still mean one other thing. It means that it's either still in TASK_TRACED or is just waking up for SIGKILL. I think we've discussed the related stuff before. A contrary approach would be to ensure that in the SIGKILL-breaks-UTRACE_STOP case we never make any more normal reports at all before utrace_report_death. It seems like a good API choice on its face: if you used UTRACE_STOP, you didn't expect to see any SYSCALL_EXIT, SIGNAL, etc. events--the special case is the instant-death scenario, so straight to report_death makes some sense. I was attracted by this until I started going through it. It's all good until you consider report_clone. If a SIGKILL is pending when you get to utrace_report_clone(), the clone has already happened. If you skip that callback, the engine doesn't find out about the new child, and that matters if it's not a CLONE_THREAD. Thanks, Roland
Re: utrace-cleanup branch
On 10/29, Roland McGrath wrote: void utrace_finish_jctl(void) { struct utrace *utrace = task_utrace_struct(current); /* * While in TASK_STOPPED, we can be considered safely stopped by * utrace_do_stop(). Make sure we can do nothing until the tracer * drops utrace-lock */ if (unlikely(__fatal_signal_pending())) spin_unlock_wait(utrace-lock); } and utrace_stop() should do the same. I see. The comments should be more clear that SIGKILL breaking TASK_TRACED is the only way this can arise. In the jctl case, that is in particular after utrace_do_stop() changed TASK_STOPPED into TASK_TRACED. Agreed. So today -stopped really does still mean one other thing. It means that it's either still in TASK_TRACED or is just waking up for SIGKILL. I think we've discussed the related stuff before. A contrary approach would be to ensure that in the SIGKILL-breaks-UTRACE_STOP case we never make any more normal reports at all before utrace_report_death. Yes, I agree. I was attracted by this until I started going through it. It's all good until you consider report_clone. If a SIGKILL is pending when you get to utrace_report_clone(), the clone has already happened. If you skip that callback, the engine doesn't find out about the new child, and that matters if it's not a CLONE_THREAD. another nasty case is report_exit(). Even if we fix the discussed problems with explicit/implicit SIGKILL's. We shouldn't afaics skip this report if the tracee was killed by execing sub-thread. Both can be fixed if we add spin_unlock_wait() before REPORT(). But imho it would be safer if we start with spin_unlock_wait() in utrace_stop(), perhaps there is something else to consider. Oleg.