Re: Possible problem with utrace_control
On 06/21, Roland McGrath wrote: I am wondering if there is another way to handle such a race... Suppose we - add UTRACE_STOP_XXX - modify utrace_control(UTRACE_STOP_XXX) to do mark_engine_wants_stop() and nothing more. utrace_control(current,,UTRACE_STOP) doesn't really do anything more. It does utrace_do_stop() which sets UTRACE_REPORT/TIF_NOTIFY_RESUME. Harmless, but makes sense to avoid. Then, probably producer could do utrace_control(current, UTRACE_STOP_XXX); slot = book_message_in_buffer(buf); if (slot 0) return UTRACE_STOP_XXX; ... return UTRACE_RESUME; Even better, UTRACE_STOP_XXX can live outside of UTRACE_RESUME_MASK. Right, it doesn't belong in enum utrace_resume_action--it doesn't make sense as an argument for utrace_control, for example. So it could be something like a return-value flag bit saying stop if marked for stop, e.g. UTRACE_SINGLESTEP|UTRACE_STICKY_STOP for single-step, unless marked to stop, in which case stay stopped. Yes, this is what I meant. And yes, if UTRACE_STICKY_STOP doesn't fall into UTRACE_RESUME_MASK utrace_control(enum utrace_resume_action) shouldn't be used. OTOH, I am wondering if we can give more power to utrace_control(STOP). Currently debugger can't stop the tracee unless it cooperates. However, if the tracee doesn't have UTRACE_EVENT(QUIESCE), then utrace_control(STOP) works. This is strange. Stupid question: what if we just remove the clear_engine_wants_stop() code in finish_callback_report() ? So this would fix that race. ... (But note if there is another engine that decides to stop anyway, then we'll have used a second CPU and/or ping-pong scheduled to run the tracer thread in parallel with a very short amount of work in the tracee before it just blocks anyway.) Hmm. Not sure I understand... Yes, the tracee still can stop if another tracer demands, but this is fine? Btw, before I forgot about this. shouldn't utrace_control(UTRACE_RESUME) remove ENGINE_STOP from -utrace_flags like DETACH does? Here what the callback does is wake up the tracer, and return UTRACE_STOP. For this sort of scenario, it is clearly better by far to have the wake_up_tracer() done as late as possible in the sequence. In the past, Oleg has suggested something very general, like a utrace_engine_ops hook that gets called right before schedule(). I shy away from something like that, since I think it degrades the intended property of the API that it make it harder to write truly arcane bugs with terrible effects on the system, not easier. It's just too big a sledgehammer to wield above the thinly-protected toes of the utrace engine writer. To avoid something dangerously general, the only real option is to give something safely constrained and specific. It's straightforward enough to come up with something for any single synchronization method. For example, if every need for this kind of synchronization with tracees is based on linux/wait.h facilities, then we can have a utrace API feature whereby a callback can queue up a wait_queue_head_t pointer to be passed to wake_up_interruptible_sync() after all the callbacks are done and UTRACE_STOP is being performed. It's probably sufficient to allow one pending wake-up per engine, which makes it pretty cheap and simple to implement. Heh. it turns out that I misunderstood you in the past, I thought you dislike the whole idea of wake_up from utrace_stop(). I agree, the utrace_engine_ops-notify() hook is dangerous. But, it is simple and allows us to kill utrace_stop()-ptrace_notify_stop(). (yes I understand, this is not only about utrace_stop(), utrace_barrier wants notification too). It's straightforward enough to come up with something for any single synchronization method. Hmm. Not sure... at least I don't immediately see something simple. Except, just add wait_queue_head into struct utrace. This way utrace_barrier() can wait longer than needed if it notices utrace-reporting == engine. Or we can add it into utrace_engine, in this case utrace_stop() needs list_for_each(engine, utrace-attached). Or I misunderstood you again? Oleg.
Re: Possible problem with utrace_control
utrace_control(current,,UTRACE_STOP) doesn't really do anything more. It does utrace_do_stop() which sets UTRACE_REPORT/TIF_NOTIFY_RESUME. Harmless, but makes sense to avoid. Right. I was talking about the API perspective. Those internal details are more or less invisible to a given engine. OTOH, I am wondering if we can give more power to utrace_control(STOP). Currently debugger can't stop the tracee unless it cooperates. However, if the tracee doesn't have UTRACE_EVENT(QUIESCE), then utrace_control(STOP) works. This is strange. Why is it strange? If you don't have any callback, then all your engine does is get stopped. If you have callbacks, then you always get callbacks before the thread does anything (including stop). Since your callback's return value always replaces any other resume action for your engine, you have to cooperate with yourself in not changing your mind the next time your code is in charge of the tracee. Stupid question: what if we just remove the clear_engine_wants_stop() code in finish_callback_report() ? That would change the API so that a callback's resume action is always overridden by a utrace_control(,,UTRACE_STOP). Today the practice from a third-party debugger thread is to do: ret = utrace_control(task, engine, UTRACE_STOP); if (ret == 0) { // Already stopped, e.g. was in jctl or another engine's stop. // This means we can do our asynchronous examination here. } else if (ret == -EINPROGRESS) { // it will get to report_quiesce soonish (modulo blocked in kernel) } If the examination you wanted to do was some predetermined sequence of fetching and fiddling before resuming, then the report_* callback can just do all that itself. For example, fetch the registers for a sample and keep going; insert some breakpoints and keep going; etc. Today, the callback that goes with that asynchronous-side code has the choice to do something and stop or to do something and not stop. With this change, it would have to explicitly request not stopping by calling utrace_control(current, engine, UTRACE_RESUME) inside the callback. It would need to use its own bookkeeping above the utrace layer to know that there was a stop it was supposed to clear (i.e. something the code sample above would set before calling utrace_control). This changes the set of races with asynchronous stop/resume scenarios. But it doesn't remove the asynchronous-resume vs callback-stop race. In concert with wake_up_tracer-after-stop perhaps it gets better. We'd have to really think through the races with the new API. So this would fix that race. ... (But note if there is another engine that decides to stop anyway, then we'll have used a second CPU and/or ping-pong scheduled to run the tracer thread in parallel with a very short amount of work in the tracee before it just blocks anyway.) Hmm. Not sure I understand... Yes, the tracee still can stop if another tracer demands, but this is fine? Correct. When that is what's going to happen (e.g. there is another engine attached that is a slow stopper like ptrace), then it's suboptimal to have the scheduler decide to yield the CPU immediately on waking your first tracer. Btw, before I forgot about this. shouldn't utrace_control(UTRACE_RESUME) remove ENGINE_STOP from -utrace_flags like DETACH does? Perhaps so. My memory is dim on the use of that bit in utrace_flags. It's to close certain races, but we'll have to remind each other of the details. Heh. it turns out that I misunderstood you in the past, I thought you dislike the whole idea of wake_up from utrace_stop(). No, just the over-general hook or the over-specific ptrace hack. I talked about about a utrace_wake_up() that would work this way. Hmm. Not sure... at least I don't immediately see something simple. Except, just add wait_queue_head into struct utrace. This way utrace_barrier() can wait longer than needed if it notices utrace-reporting == engine. Or we can add it into utrace_engine, in this case utrace_stop() needs list_for_each(engine, utrace-attached). You're talking about two different things here. Whatever implementation details change in utrace_barrier() is not directly apropos. That would not change the API-visible details at all, just make utrace_barrier() perform better. Let's not get into that before we hash out the new API ideas. For the new API idea, I was talking about something like a utrace_wake_up() call. That might be implemented by a wait_queue_head_t * in utrace_engine, for example. The implementation internals could involve a simple iteration, or involve more hidden bookkeeping. I'm only talking about the API idea for the moment. I only mentioned implementation details at all to indicate that a restricted API (one wake-up per callback per engine) could be done with constant storage overhead and thus avoid a whole potential can of worms--which is the only real reason to