Re: [PATCH 0/3] UTRACE_DETACH fixes
I misunderstood the UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) above as if you mean it is possible to get UTRACE_SIGNAL_REPORT without QUIESCE. No, I meant the opposite: since you won't get UTRACE_SIGNAL_REPORT without UTRACE_EVENT(QUIESCE), it might seem that UTRACE_INTERRUPT should not be permitted, as UTRACE_REPORT is not. But because that is not it's *only* effect, we don't do permit it even though you won't get UTRACE_SIGNAL_REPORT. OK. This means ugdb should set QUIESCE, just to ensure its -report_signal() will be called. If you ever want UTRACE_SIGNAL_REPORT calls, yes. OK, please forget. I must admit, this still looks a bit, well, unnatural to me. May be it makes sense to add _UTRACE_EVENT_SIGNAL_REPORT, _UTRACE_EVENT_SIGNAL_HANDLER, into utrace_events. Then ugdb could ask for UTRACE_SIGNAL_REPORT and avoid the unnecessary -report_quiesce() calls. If anything, I think UTRACE_EVENT(RESUME) would make sense, perhaps with a separate -report_resume callback. That would request only the cases that now get report_quiesce(0). But, it really does not cost much to have if (event) return UTRACE_RESUME; or similar in your report_quiesce callback. It's beyond refinement of the utrace API (that we're not doing now), it's just micro-optimization. Roland, could you also comment this patch? https://www.redhat.com/archives/utrace-devel/2010-August/msg00108.html Again, this looks like a bug to me, but I won't insist if it is not. Sorry, that is still on my queue. I haven't forgotten it. I just haven't gotten to reviewing it yet because of other work and it needing a lot of hard thinking. Thanks, Roland
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 09/03, Roland McGrath wrote: So, it is technically kosher enough to use UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure about all those caveats above. How? see below. I mentioned the examples. I misunderstood the UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) above as if you mean it is possible to get UTRACE_SIGNAL_REPORT without QUIESCE. But, it's only UTRACE_EVENT(QUIESCE) that gives you any expectation of an extra callback for no event from either UTRACE_REPORT or UTRACE_INTERRUPT. Yes, this is clear too. Apparently not. ;-) Perhaps ;) At least I certainly missed your point. QUIESCE is the only event type for no event. If you only asked for signal events, then you only get a callback when there is an actual signal event. If you want an extra callback before dequeuing any signal, then that is what QUIESCE is for. OK. This means ugdb should set QUIESCE, just to ensure its -report_signal() will be called. Once again, I am not asking to change utrace now, but could you explain what is wrong with the patch below? That gives an extra unrequested report_signal callback to every engine that is only interested in some signal event. Consider a crash-catcher engine. It would only track UTRACE_EVENT(SIGNAL_CORE) (and perhaps CLONE for its bookkeeping). This engine never asked for a callback before each and every attempt to dequeue any signal, I see your point (I hope). which is what you would give it with your change. No ;) My change was wrong. event == 0 in this case, and then later utrace_get_signal() checks if ((want (event | UTRACE_EVENT(QUIESCE))) == 0) continue; OK, please forget. I must admit, this still looks a bit, well, unnatural to me. May be it makes sense to add _UTRACE_EVENT_SIGNAL_REPORT, _UTRACE_EVENT_SIGNAL_HANDLER, into utrace_events. Then ugdb could ask for UTRACE_SIGNAL_REPORT and avoid the unnecessary -report_quiesce() calls. But at least I can see the logic now, thanks. Roland, could you also comment this patch? https://www.redhat.com/archives/utrace-devel/2010-August/msg00108.html Again, this looks like a bug to me, but I won't insist if it is not. Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
Hmm. I am not sure I understand. If -report_signal != NULL, then you can't expect -report_quiesce() after utrace_control(INTERRUPT) ? Sorry, I used report_quiesce as shorthand for either report_quiesce, or report_signal with UTRACE_SIGNAL_REPORT. So, it is technically kosher enough to use UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure about all those caveats above. How? see below. I mentioned the examples. But, it's only UTRACE_EVENT(QUIESCE) that gives you any expectation of an extra callback for no event from either UTRACE_REPORT or UTRACE_INTERRUPT. Yes, this is clear too. Apparently not. ;-) So. Now I am even more confused. First of all, ugdb (at least currently) does not need report_quiesce() and UTRACE_EVENT(QUIESCE) at all. OK, this is not 100% true due to multitracing/etc, lets ignore this. Anyway it makes sense to clear QUIESCE after $c to avoid the unnecessary callbacks. All it needs is -report_signal(). But, the problem is, it is not called after utrace_control(INTERRUPT) ! You need QUIESCE to ensure report_signal() will be called, and this looks at least strange to me. QUIESCE is the only event type for no event. If you only asked for signal events, then you only get a callback when there is an actual signal event. If you want an extra callback before dequeuing any signal, then that is what QUIESCE is for. Once again, I am not asking to change utrace now, but could you explain what is wrong with the patch below? That gives an extra unrequested report_signal callback to every engine that is only interested in some signal event. Consider a crash-catcher engine. It would only track UTRACE_EVENT(SIGNAL_CORE) (and perhaps CLONE for its bookkeeping). This engine never asked for a callback before each and every attempt to dequeue any signal, which is what you would give it with your change. Thanks, Roland
Re: [PATCH 0/3] UTRACE_DETACH fixes
But probably we need utrace_detached_ops-report_reap() anyway. -report_death can return UTRACE_DETACH, and after that utrace_report_death() can skip utrace_reset() and do -report_reap(). Yeah, that's a good point. There's no good reason to do a special case check for detached_ops there rather than just having the no-op report_reap hook. If an engine used UTRACE_INTERRUPT without having first set QUIESCE, then it has no rightful expectation of getting any callback. Hmm. This is certainly new to me. Could you confirm? Well, I didn't say it precisely correctly. UTRACE_INTERRUPT serves two purposes. First, it just interrupts things like a signal would. You could use that on its own without even tracing any events at all, just do achieve fault injection or similar. Second, it's like UTRACE_REPORT. If you do have some other event bits set, then you can expect/rely on getting those normal events soon if they would otherwise have happened--i.e., if you know it's blocked in a syscall, and you have UTRACE_EVENT(SYSCALL_EXIT) set, then you can expect to get a report_syscall_exit soon. (But, it's always possible to race with the syscall finishing on its own, or being interrupted by an outside signal, so that exit might have come before your utrace_control call if you were not otherwise synchronizing with your established report_syscall_exit callback.) As with UTRACE_REPORT, after UTRACE_INTERRUPT you can expect to get some report_quiesce callback even if there is no other event you were tracing that happens. For this to actually happen, you need to have UTRACE_EVENT(QUIESCE) set. So, it is technically kosher enough to use UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure about all those caveats above. But, it's only UTRACE_EVENT(QUIESCE) that gives you any expectation of an extra callback for no event from either UTRACE_REPORT or UTRACE_INTERRUPT. (And since that is the sole possible purpose of UTRACE_REPORT, we diagnose a utrace_control call like that.) Thanks, Roland
Re: [PATCH 0/3] UTRACE_DETACH fixes
I was going to ping you ;) This is connected to the problem I hit today. Despite the fact I already complained about utrace_get_signal() QUIESCE, I forgot about this and figured out it doesn't work in practice. On 09/02, Roland McGrath wrote: If an engine used UTRACE_INTERRUPT without having first set QUIESCE, then it has no rightful expectation of getting any callback. Hmm. This is certainly new to me. Could you confirm? Well, I didn't say it precisely correctly. UTRACE_INTERRUPT serves two purposes. First, it just interrupts things like a signal would. You could use that on its own without even tracing any events at all, just do achieve fault injection or similar. Second, it's like UTRACE_REPORT. Yes, this is clear. As with UTRACE_REPORT, after UTRACE_INTERRUPT you can expect to get some report_quiesce callback even if there is no other event you were tracing that happens. For this to actually happen, you need to have UTRACE_EVENT(QUIESCE) set. Hmm. I am not sure I understand. If -report_signal != NULL, then you can't expect -report_quiesce() after utrace_control(INTERRUPT) ? So, it is technically kosher enough to use UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure about all those caveats above. How? see below. But, it's only UTRACE_EVENT(QUIESCE) that gives you any expectation of an extra callback for no event from either UTRACE_REPORT or UTRACE_INTERRUPT. Yes, this is clear too. So. Now I am even more confused. First of all, ugdb (at least currently) does not need report_quiesce() and UTRACE_EVENT(QUIESCE) at all. OK, this is not 100% true due to multitracing/etc, lets ignore this. Anyway it makes sense to clear QUIESCE after $c to avoid the unnecessary callbacks. All it needs is -report_signal(). But, the problem is, it is not called after utrace_control(INTERRUPT) ! You need QUIESCE to ensure report_signal() will be called, and this looks at least strange to me. Once again, I am not asking to change utrace now, but could you explain what is wrong with the patch below? Oleg. --- x/kernel/utrace.c +++ x/kernel/utrace.c @@ -2029,7 +2029,8 @@ int utrace_get_signal(struct task_struct report.spurious = false; finish_resume_report(task, utrace, report); return -1; - } else if (!(task-utrace_flags UTRACE_EVENT(QUIESCE))) { + } else if (!(task-utrace_flags + (UTRACE_EVENT(QUIESCE) | UTRACE_EVENT_SIGNAL_ALL))) { /* * We only got here to clear utrace-signal_handler. */
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 08/19, Roland McGrath wrote: OK, instead of filling utrace_detached_ops{} we can change start_callback() - if (want UTRACE_EVENT(QUIESCE)) { + if ((want UTRACE_EVENT(QUIESCE)) || ops == detached_ops) { If we're going to have a special-case check for utrace_detached_ops, then it can just short-circuit entirely and there is no need for any callbacks in utrace_detached_ops. (Then we might even use NULL or (void *)-1l instead of utrace_detached_ops.) All it needs to do is: report-detaches = true; utrace-reporting = NULL; return NULL; Yes, I thought about this too, see the changelog. But probably we need utrace_detached_ops-report_reap() anyway. -report_death can return UTRACE_DETACH, and after that utrace_report_death() can skip utrace_reset() and do -report_reap(). And, in particular, I don't understand this code in utrace_get_signal: } else if (!(task-utrace_flags UTRACE_EVENT(QUIESCE))) { /* * We only got here to clear utrace-signal_handler. */ return -1; } How so? What if we were called because of utrace_control(INTERRUPT) and QUIESCE is not set? If an engine used UTRACE_INTERRUPT without having first set QUIESCE, then it has no rightful expectation of getting any callback. Hmm. This is certainly new to me. Could you confirm? If yes, shouldn't we change utrace_control() - if (action = UTRACE_REPORT action UTRACE_RESUME + if (action = UTRACE_INTERRUPT action UTRACE_RESUME unlikely(!(engine-flags UTRACE_EVENT(QUIESCE { then? I must admit, I do not understand why INTERRUPT needs QUIESCE. utrace_get_signal() should respect QUIESCE, this is clear. But why it is needed? Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 08/17, Oleg Nesterov wrote: Suppose that, say, engine-flags == UTRACE_EVENT(EXEC) (no QUIESCE). 1. start_callback() reads want = engine-flags (== EXEC) 2. mark_engine_detached() sets engine-ops = utrace_detached_ops 3. start_callback() gets ops = utrace_detached_ops After that start_callback() skips if (want UTRACE_EVENT(QUIESCE)) block and returns utrace_detached_ops, utrace_detached_ops-report_exec == NULL will be called. OK, instead of filling utrace_detached_ops{} we can change start_callback() - if (want UTRACE_EVENT(QUIESCE)) { + if ((want UTRACE_EVENT(QUIESCE)) || ops == detached_ops) { But this is minor. Roland, I can't explain this, but I have the strong gut feeling there is something else we should worry about. I am still reading this code... Or not... I'll recheck once again tomorrow and try to make the patch. In particular, I'd like to think if we can simplify this somehow. Say, avoid the barriers somewhere, or avoid changing engine-flags in mark_engine_detached(). I am worried that mark_engine_detached() sets engine-flags, but it doesn't add QUIESCE to utrace_flags. I can't convince myself that utrace_do_stop() closes all possible races. And, in particular, I don't understand this code in utrace_get_signal: } else if (!(task-utrace_flags UTRACE_EVENT(QUIESCE))) { /* * We only got here to clear utrace-signal_handler. */ return -1; } How so? What if we were called because of utrace_control(INTERRUPT) and QUIESCE is not set? Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 08/16, Oleg Nesterov wrote: Well, I tried to test these changes, seems to work. But... From 2/3 changelog: This means that mark_engine_detached() can race with REPORT_CALLBACKS() but there is nothing new. Yes, there is nothing new. But, Roland, this is WRONG! With or without these fixes utrace_control()-mark_engine_detached() was always wrong if it races with REPORT/REPORT_CALLBACKS. Not sure what we were thinking about :/ Look. Suppose that utrace_control(DETACH) is called right after start_callback() returns ops != NULL. After that finish_callback(..., ops-callback(...)) can hit -callback == NULL! Cough. I guess I spent too much time with utrace and testing today ;) Let me try again. mark_engine_detached() does engine-ops = utrace_detached_ops; smp_wmb(); engine-flags = UTRACE_EVENT(QUIESCE); Whatever we do, start_callback() can see the old engine-flags but the new -ops = utrace_detached_ops. Just suppose that the caller of UTRACE_DETACH is interrupted right after setting engine-ops. The obvious and simplest solution: utrace_detached_ops{} should have the dummy handler for every callback. But I'd like to think a bit more. Yes. Perhaps (unlikely) we can reverse the order, but I can no longer think properly today. Or do you think I miss something and this is false alarm? Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
Whatever we do, start_callback() can see the old engine-flags but the new -ops = utrace_detached_ops. Just suppose that the caller of UTRACE_DETACH is interrupted right after setting engine-ops. * it can check the old flags before using the old ops, or check the old * flags before using the new ops, or check the new flags before using the * new ops, but can never check the new flags before using the old ops. Or do you think I miss something and this is false alarm? * Hence, utrace_detached_ops might be used with any old flags in place. * It has report_quiesce() and report_reap() callbacks to handle all cases. report_reap is covered with the utrace_detached_reap() stub. Every other callback uses start_callback(), i.e. calls report_quiesce first. utrace_detached_quiesce() returns UTRACE_DETACH, so start_callback() will return NULL. [...] but I can no longer think properly today. Sleep well. :-) Thanks, Roland