Re: [PATCH 3/4] utrace_set_events: fix UTRACE_EVENT(REAP) case
On 08/16, Roland McGrath wrote: - It is possible that both -death and -reap are true. In this case it is OK to clear UTRACE_EVENT(REAP), but set_events fails. No, it's not OK to clear it. Once -reap is set, then the engine's ops-report_reap might or might not have been called already. Afaics - no. If utrace-death is set (and we check it under utrace-lock) we can ignore utrace-reap. In short, if ops-report_reap can be called before -death is cleared, then 2 possible callers of utrace_maybe_reap() can race with each other, but this can't happen. utrace-death == T means: - (utrace_flags _UTRACE_DEATH_EVENTS) == T - utrace-death was set by utrace_report_death() which will take utrace-lock later and clear -death, only then it may call ops-report_reap(). - until utrace_report_death() clears -death, _UTRACE_DEATH_EVENTS must be set in -utrace_flags, otherwise utrace_maybe_reap(true) is buggy. Note that both utrace-death and _UTRACE_DEATH_EVENTS are cleared atomically from utrace-lock pov. IOW. utrace-death is true, then IF (utrace-reap) tracehook_prepare_releas()-utrace_maybe_reap(true) was already called, this is how utrace-reap was set. But utrace_maybe_reap() did nothing and returned. We rely on the subsequent utrace_maybe_reap(false) from utrace_report_death() - but this can't happen until we drop utrace-lock ELSE utrace-reap can't be set until we drop utrace-lock. Now that you merged c93fecc925ea7567168f0c94414b9021de2708c5 get_utrace_lock() must not succeed if utrace-reap == T, this becomes a bit off-topic. However, I thought about relaxing the dead check in get_utrace_lock(), instead of utrace-reap we could check utrace-reap !utrace-death. In fact, initially I was going to do this, but then decided to make the simpler patch for now. Oleg.
Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks
On 08/16, Roland McGrath wrote: In short: this double check allows to avoid tasklist when utrace_flags already has these bits while engine-flags doesn't. But multitracing is unlikely case, in the likely case if we add _UTRACE_DEATH_EVENTS to engine-flags we set these bits in utrace_flags too. I think it makes sense to consolidate these checks to make the code a bit more understandable. I don't really agree about unlikely case. In many uses, the systemtap task-finder will have a utrace engine on every task in the system, for example. Moreover, this is an importantly distinct particular kind of micro-optimization to be doing here: avoiding a system-wide lock. Any place that we take tasklist_lock at all, we are introducing a system-wide slowdown or limitation on scaling, just because of our tracing of one task. So optimizing the minimize that is qualitatively different and really much more important than avoiding taking the utrace lock, for example. But with that note in your mind, I am happy to take this patch now as a simplification and let reoptimization come back later. OK, agreed. Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 08/17, Oleg Nesterov wrote: Suppose that, say, engine-flags == UTRACE_EVENT(EXEC) (no QUIESCE). 1. start_callback() reads want = engine-flags (== EXEC) 2. mark_engine_detached() sets engine-ops = utrace_detached_ops 3. start_callback() gets ops = utrace_detached_ops After that start_callback() skips if (want UTRACE_EVENT(QUIESCE)) block and returns utrace_detached_ops, utrace_detached_ops-report_exec == NULL will be called. OK, instead of filling utrace_detached_ops{} we can change start_callback() - if (want UTRACE_EVENT(QUIESCE)) { + if ((want UTRACE_EVENT(QUIESCE)) || ops == detached_ops) { But this is minor. Roland, I can't explain this, but I have the strong gut feeling there is something else we should worry about. I am still reading this code... Or not... I'll recheck once again tomorrow and try to make the patch. In particular, I'd like to think if we can simplify this somehow. Say, avoid the barriers somewhere, or avoid changing engine-flags in mark_engine_detached(). I am worried that mark_engine_detached() sets engine-flags, but it doesn't add QUIESCE to utrace_flags. I can't convince myself that utrace_do_stop() closes all possible races. And, in particular, I don't understand this code in utrace_get_signal: } else if (!(task-utrace_flags UTRACE_EVENT(QUIESCE))) { /* * We only got here to clear utrace-signal_handler. */ return -1; } How so? What if we were called because of utrace_control(INTERRUPT) and QUIESCE is not set? Oleg.