Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting
On 10/22, Oleg Nesterov wrote: On 10/19, Roland McGrath wrote: Yes. But, with this patch, in this case utrace_barrier()-get_utrace_lock(attached = false) returns success and then we check utrace-reporting == engine. I guess that's OK if -reporting == engine is always set when any kind of callback might be in progress. I think utrace_maybe_reap() is the only exception, (Hmm. Probably utrace-reap = T case needs more attention, utrace_maybe_reap() doesn't set utrace-reporting). That would be a problem. Yes, but the current code has the same problem? Hmm, I need to recheck this all once again tomorrow. Suddenly I started to suspect that even -ops == NULL case is not right. And the more I read this code now, the more I confused. Yes. And get_utrace_lock() must not succeed if utrace-reap == T c93fecc9 makes things worse. With this patch utrace_barrier() can return -ESRCH while -report_death() or report_reap() is in progress. But even before that comment, I do not think that !engine-ops check can prevent the race with -report_reap(), we need at least mb() in utrace_maybe_reap() before engine-ops = NULL. But in any case. If engine-ops == utrace_detached_ops and utrace-reporting != engine, I do not see why utrace_barrier() can't return success, even if utrace-reap is set. No! it must not return success! I still think it should not spin if -reporting != engine, but it should not return zero. Somehow I forgot that utrace_barrier() != reporting != engine. This means that my patch is very wrong. Oleg.
Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting
Yes. But, with this patch, in this case utrace_barrier()-get_utrace_lock(attached = false) returns success and then we check utrace-reporting == engine. I guess that's OK if -reporting == engine is always set when any kind of callback might be in progress. (Hmm. Probably utrace-reap = T case needs more attention, utrace_maybe_reap() doesn't set utrace-reporting). That would be a problem. But, unfortunately, this signal_pending() check assumes the caller can always handle this ERESTARTSYS. But this is not true, one example is the exiting debugger doing exit_ptrace(). I understand it's a problem. Perhaps we need to have a flag argument or (probably better) a new utrace_barrier_uninterruptible call. But, for this particular situation with exit_ptrace, it could be kludged around. The caller is never going to dequeue another signal once it's as far into death as exit_ptrace anyway. So, it could do a loop of: ret = utrace_barrier(task, engine); if (ret == -ERESTARTSYS) { clear_thread_flag(TIF_SIGPENDING); continue; } Not that I'm suggesting this isn't a dismal kludge. But it seems like it would close the hole we have today in practice. Thanks, Roland
Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting
On 10/12, Roland McGrath wrote: If engine is detached (has utrace_detached_ops), utrace_barrier(engine) spins until engine-ops becomes NULL. This is just wrong. Yes, this happens because get_utrace_lock returns -ERESTARTSYS and utrace_barrier checks for this and loops. I agree these long-spin scenarios would be wrong. The reason it tries to wait for fully detached state is that after utrace_control(task,engine,UTRACE_DETACH), @task could still be in the middle of a callback for @engine. Yes. But, with this patch, in this case utrace_barrier()-get_utrace_lock(attached = false) returns success and then we check utrace-reporting == engine. This patch only makes a difference if ops == utrace_detached_ops, Before this patch spin until -ops goes away After this patch spin until -ops goes away OR -reporting != engine (Hmm. Probably utrace-reap = T case needs more attention, utrace_maybe_reap() doesn't set utrace-reporting). Also, it is not clear why utrace_barrier() needs utrace-lock, except to ensure it is safe to dereference target/utrace. Well, wouldn't that be reason enough? The comment in utrace_barrier talks about needing the lock. This corresponds to the comment in the UTRACE_DETACH case of finish_callback_report. Yes agreed. Honestly, I can't even recall what I meant. Note: we should also reconsider() utrace_barrier()-signal_pending() check. IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait (even moreso since it's really a spin). If a buggy callback gets stuck blocking or spinning and fails to return promptly, then you wedge any debugger thread trying to synchronize with it via utrace_barrier. If you can't even interrupt that debugger thread, then there will really be no chance to recover from the deadlock. Completely agreed. But, unfortunately, this signal_pending() check assumes the caller can always handle this ERESTARTSYS. But this is not true, one example is the exiting debugger doing exit_ptrace(). Oleg.
Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting
Please feel free to ignore this if it's no longer relevant to your work. I'm trying to catch up on the backlog of replies I owe you. I chose this one as the arbitrary cut-off before which I think I have already neglected things too long to still matter. If engine is detached (has utrace_detached_ops), utrace_barrier(engine) spins until engine-ops becomes NULL. This is just wrong. Yes, this happens because get_utrace_lock returns -ERESTARTSYS and utrace_barrier checks for this and loops. I agree these long-spin scenarios would be wrong. The reason it tries to wait for fully detached state is that after utrace_control(task,engine,UTRACE_DETACH), @task could still be in the middle of a callback for @engine. Suppose that utrace_control(DETACH) returns -EINPROGRESS, now we should call utrace_barrier(). However, it is possible that -EINPROGRESS means we raced with sys_sleep(A_LOT) doing report_syscall_entry(). Right. Perhaps utrace_barrier could do some different variant of the (utrace-reporting != engine) check. Change get_utrace_lock() to succeed if the caller is utrace_barrier() and ops == utrace_detached_ops. I do not see any reason why this case should be special from utrace_barrier's pov. It can just check -reporting and return 0 or do another iteration. [...] Also, it is not clear why utrace_barrier() needs utrace-lock, except to ensure it is safe to dereference target/utrace. Well, wouldn't that be reason enough? The comment in utrace_barrier talks about needing the lock. This corresponds to the comment in the UTRACE_DETACH case of finish_callback_report. Do you think those comments are inaccurate about what's required? Note: we should also reconsider() utrace_barrier()-signal_pending() check. IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait (even moreso since it's really a spin). If a buggy callback gets stuck blocking or spinning and fails to return promptly, then you wedge any debugger thread trying to synchronize with it via utrace_barrier. If you can't even interrupt that debugger thread, then there will really be no chance to recover from the deadlock. Thanks, Roland