Re: asynchronous detach
On Thu, Jul 31, 2008 at 03:38:02PM -0700, Roland McGrath wrote: ... I think this can be a good model to use for non-perturbing quiesce for cases as breakpoint insertion and removal or any applicaton text modification, where one needs to ensure no thread is executing in the vicinity of the said modification. So, even on an -EINPROGRESS on a utrace_control, if the thread is in kernel, manipulating application text is still safe, right? For this particular case, maybe safe enough. I wouldn't like to encourage thinking of it as safe in general. Whatever the kernel is doing might be touching user memory, or unmapping it or whatever. For breakpoint insertion/removal, you are pretty much assuming nothing else would ever touch the text anyway. You still need an interlock against munmap text + mmap reusing addrs in quick succession. I suppose you could hold mmap_sem or something. Not sure if a similar model can be used to address the teardown races problem. Hmm. I mentioned before having a non-synchronizing record of the callback in progress. This lets UTRACE_DETACH return -EINPROGRESS, which is after the horse has left the barn, as it were. There is a related extension of this idea I had vaguely in mind but didn't mention. A utrace_set_events call that disables event bits could also return -EINPROGRESS when some callback might be in progress. As with detach, that means a zero return tells you all is well. Unlike detach, if you get -EINPROGRESS from utrace_set_events, you still have the engine. So, the intention is that -EINPROGRESS can be returned only on a utrace_set_events(task, engine, 0), right? Actually, I think it would also be useful to just have a utrace_clear_events_sync() that encapsulates the above call. I can see use for it outside of an asynchronous detach, for cases when one needs to just turn tracing events off on a thread. (It is definitely not recommended for debugger/probes usage with breakpoints already set in user code, but would be useful for someone doing non-invasive syscall and process lifetime tracing). If your callback never blocks, that is enough to know that it's safe to free your data, unload your module, etc. (Or if your callback can block but what you want to synchronize with is just what it does before it blocks. That's when you don't care about being able to unload the module, just juggling your own data structures.) For the general case of some unknown callback code that might include blocking, you could instead do: ret = utrace_set_events(task, engine, 0); while (ret == -EINPROGRESS) { ret = utrace_prepare_examine(task, engine, exam); if (ret == -EAGAIN) yield(); ret = utrace_set_events(task, engine, 0); if (ret == -EINPROGRESS) yield(); } That is: when it is blocked, check again if a callback still might be in progress, and loop. Assuming your callback is well-behaved and only ever blocks for a short time (like kmalloc), this will return soon. This can cetainly work! yield(); should actually be schedule_timeout_interruptible(1); probably, and it should be able to return -ERESTARTSYS. If this is a good plan, then we should encapsulate it in a new call utrace_set_events_sync() or something like that. So then the safe detach procedure would be just: ret = utrace_set_events_sync(task, engine, UTRACE_EVENT(REAP)); if (!ret) ret = utrace_control(task, engine, UTRACE_DETACH); This can only return zero or -ESRCH. Zero says that all callbacks are definitely finished. -ESRCH says that your report_reap callback is definitely happening (right now, modulo preemption). We could encapsulate that into utrace_detach_sync() too. Yes, this does take care of the thread about to die case too. Ananth
Re: asynchronous detach
So, the intention is that -EINPROGRESS can be returned only on a utrace_set_events(task, engine, 0), right? No, for any call when some bits had been enabled before and there hasn't been an intervening safe point. My idea was that it would return 0 only when it's sure that no disabled callback might be made. You'd get -EINPROGRESS any time we aren't 100% positive of that guarantee. Actually, I think it would also be useful to just have a utrace_clear_events_sync() that encapsulates the above call. I can see use for it outside of an asynchronous detach, for cases when one needs to just turn tracing events off on a thread. (It is definitely not recommended for debugger/probes usage with breakpoints already set in user code, but would be useful for someone doing non-invasive syscall and process lifetime tracing). I mentioned utrace_set_events_sync. Yes, this does take care of the thread about to die case too. I've written code to implement most of this. In working through it, I came across a couple of other wrinkles I hadn't thought about before. These come up in thinking about the possible interactions with callbacks that return UTRACE_DETACH. Firstly, you can't call *_sync freely if any of your callbacks ever use UTRACE_DETACH. When any callback returns UTRACE_DETACH, immediately the engine pointer is only kept alive by RCU. You can't make a blocking call with rcu_read_lock() held, so you can't be keeping it alive via RCU when you all *_sync. You can still use it, just not freely. What that means is that you need some other synchronization with your callbacks. If your callback safely updates a data structure saying already doing the detach, then you can synchronize with that update and check it before calling utrace_detach_sync. It takes a little thought to do that without deadlock, but it's doable. There are already special cases for the report_death callback using UTRACE_DETACH. We can see that a death callback might be in progress just from the task/utrace data structures, without actually looking at the engine pointer passed in. So we can relax that rule to allow using UTRACE_DETACH in report_death without making *_sync unsafe. The same sort of problems makes me unsure we can use interruptible blocks. I'll think about this some more. Thanks, Roland
Re: asynchronous detach, global tracing
Roland McGrath [EMAIL PROTECTED] writes: [...] What the utrace interface has always said about this is, So don't do that. [...] What I overlooked is that not just your data structures, but your callbacks too might be going away, i.e. unloading the kernel module. I don't think the module-unloading case is so special. If there exist races involving utrace detach, then they will affect long-lasting modules too that may want to do some utracing then some other stuff, then perhaps return to utracing again. In this scenario, the data too is volatile or could be repurposed between utrace sessions. Such a module would need to know positively when no further callbacks will arrive. [...] For global engines' detach, one option is to offer no help with your own data structures but to solve the module-unload problem using the module refcount. [...] If having a per-cpu counter vector is sufficiently low weight for utrace to update it around every callback, how about letting a utrace engine specify an (optional?) percpu-integer vector? Then, the utrace client could use a similar synchronization algorithm as that of module/refcount unloading to assure itself of a complete and final utrace detach. It could even opt to reuse the counters between engines, or between utracing sessions, if it knows that its data/code lifetimes can work with that. - FChE