Re: [PATCH 3/4] utrace_set_events: fix UTRACE_EVENT(REAP) case
On 08/19, Roland McGrath wrote: On 08/16, Roland McGrath wrote: Now that you merged c93fecc925ea7567168f0c94414b9021de2708c5 get_utrace_lock() must not succeed if utrace-reap == T, this becomes a bit off-topic. However, I thought about relaxing the dead check in get_utrace_lock(), instead of utrace-reap we could check utrace-reap !utrace-death. In fact, initially I was going to do this, but then decided to make the simpler patch for now. When utrace-reap is set, it means release_task() has been called. The API caller has no way to know reaping hasn't already begun--except if its report_death callback has not returned yet. No! the task can be already reaped even before -report_death() was called. The task_struct itself can't go away, but that is all (perhaps you meant this). And this is the problem for ugdb. Damn, I knew this, that is why ugdb_process_exit() doesn't try to read -group_exit_code. Still I managed to forget that this has other implications. But I think that the API definition of once release_task() has been called (i.e. entered, maybe not returned yet), any utrace call will get -ESRCH, is a clear and comprehensible constraint. We should not open any holes in that. I am not sure. I had another reason in mind to check reap !death in get_utrace_lock(), synchronization with -report_death() callback. But OK. Let's forget about more utrace changes for now. Oleg.
Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction
On 08/19, Roland McGrath wrote: Wait. It doesn't break this. It only breaks -EALREADY contract. And I don't understand why this is bad. The -EALREADY contract lets you have a report_death callback that does all your cleanup and then returns UTRACE_DETACH. I think this is possible without the current -EALREADY contract. To have that plan, you need a way for an asynchronous detach attempt to be excluded once report_death has begun, and for that asynchronous caller to know that's the situation. The problem is, with the current conract detach is never simple if you have report_death. You always have to synchronize with this callback. But see below. It breaks the documented API. I am of course open to changing the API. But we need to get completely clear on what the new range of plans we'll support is, and make the documentation and the implementation match. Yes. I thought it makes sense to change the API and docs if we can improve things, before utrace is widely used. But OK, lets's forget about this. Why? What is the point? What makes UTRACE_EVENT(DEATH) so special? I do not see the logic at all. It is special because it is the last interesting event to a traditional user such as ptrace. Hence it is attractive to write a report_death callback that returns UTRACE_DETACH spontaneously, i.e. without an asynchronous caller (i.e. debugger thread) having requested the detach. Sure. And this is still possible. If -report_death() does something which the caller of utrace_control() should know, then they should take care of synchronization anyway. With or without this patch, utrace_control(DEATH) can return 0 after -report_death() was already called. If your engine's report_death always returns UTRACE_DETACH, then utrace_control(UTRACE_DETACH) can never return 0 once report_death is about to be called nor any time thereafter. I don't undestand this. OK, probably I missed something, this doesn't matter. But, the current state of play from the broader perspective is that we really have no idea about the future viability of the utrace API. I don't think it makes a lot of sense to put a great deal of effort into perfecting the corners of the API much further when we don't really have any good reasons to think that this API will be the basis of anything that survives in the long run. Yes. I have to agree, this is good point. So, lets forget about these changes, at least for now. The current focus of work is your ugdb prototype. If some utrace changes make it greatly easier to get that to kind-of-working status, then they are worthwhile. No. I didn't think about ugdb at all. I'll find the workaround for ugdb. If nothing else, I can use utrace_engine_ops-release(). Or something else. If it works in practice but has disturbing robustness holes, it is OK to just comment that loudly now and move on to the next battle, IMHO. OK, agreed. Oleg.
Re: [PATCH 0/3] UTRACE_DETACH fixes
On 08/19, Roland McGrath wrote: OK, instead of filling utrace_detached_ops{} we can change start_callback() - if (want UTRACE_EVENT(QUIESCE)) { + if ((want UTRACE_EVENT(QUIESCE)) || ops == detached_ops) { If we're going to have a special-case check for utrace_detached_ops, then it can just short-circuit entirely and there is no need for any callbacks in utrace_detached_ops. (Then we might even use NULL or (void *)-1l instead of utrace_detached_ops.) All it needs to do is: report-detaches = true; utrace-reporting = NULL; return NULL; Yes, I thought about this too, see the changelog. But probably we need utrace_detached_ops-report_reap() anyway. -report_death can return UTRACE_DETACH, and after that utrace_report_death() can skip utrace_reset() and do -report_reap(). And, in particular, I don't understand this code in utrace_get_signal: } else if (!(task-utrace_flags UTRACE_EVENT(QUIESCE))) { /* * We only got here to clear utrace-signal_handler. */ return -1; } How so? What if we were called because of utrace_control(INTERRUPT) and QUIESCE is not set? If an engine used UTRACE_INTERRUPT without having first set QUIESCE, then it has no rightful expectation of getting any callback. Hmm. This is certainly new to me. Could you confirm? If yes, shouldn't we change utrace_control() - if (action = UTRACE_REPORT action UTRACE_RESUME + if (action = UTRACE_INTERRUPT action UTRACE_RESUME unlikely(!(engine-flags UTRACE_EVENT(QUIESCE { then? I must admit, I do not understand why INTERRUPT needs QUIESCE. utrace_get_signal() should respect QUIESCE, this is clear. But why it is needed? Oleg.
Re: [PATCH] fix mark_engine_detached() vs start_callback() race
On 08/19, Roland McGrath wrote: I think in the longer term mark_engine_detached() should not change engine-flags at all but add QUIESCE to -utrace_flags. However, this breaks utrace_maybe_reap(reap = true) and we should avoid the race with finish_callback() which clears -reporting after report_quiesce(). What's the benefit to adding QUIESCE? If any utrace code gets entered at all, then any callback run will be able to do the special-case ops check for detached engines. Well yes, but otoh it could be the last engine. We shouldn't miss, say, start_callback() from utrace_resume() (although currently -spurious helps). Anyway, this needs more thinking, and probably you are right and QUIESCE is not needed. Oleg.