Re: [PATCH 1/4] ptrace: temporary revert the recent ptrace/jobctl rework
OK, so here's my (hacky) idea: (1) Forget ptrace-via-utrace. Have utrace be a separate thing. This way the recent ptrace changes won't matter. (2) But, what about ptrace co-existing well with utrace? Make them mutually exclusive - a ptraced-process can't be utraced and a utraced-process can't be ptraced. We had this situation before for a while. It has the substantial downside that e.g. you cannot do any system-wide systemtap tracing without making all strace and gdb use impossible. Assuming the above is a semi-reasonable idea, it might be a lot less work than updating the ptrace-via-utrace code to handle the new ptrace changes. That's for Oleg to say. (Sorry, Oleg. ;-) Thanks, Roland
RE: utrace support on ARM
I will see if I can interest people in TI and Linaro. I will need a good story... ;-) It is kernel port modernization work that nearly every other platform has done by now.
Re: [PATCH 0/5] ptrace-utrace: fix exit_ptrace() vs ptrace_report_signal() races
I've merged these patches to the utrace-ptrace branch, now merged up to 2.6.37-rc5, and also the 2.6.34 and 2.6.35 backport branches. The 2.6.33 backport branch is no longer being maintained. I didn't update the 2.6.36 backport branch and probably won't maintain it unless some Fedora release starts using that kernel version (rawhide is already on 2.6.37-rc5 now). Thanks, Roland
Re: Utrace .report_jctl serialization issue ?
Is a utrace engine with .report_jctl enabled suppose to handle do_notify_parent_cldstop(current, notify) processing for the last stopping task ? Or should it muck with task-ptrace to force tracehook_notify_jctl() to return a non-zero value ? I can't figure out exactly how to construe this as a question about the utrace API. The documented API is that each thread gets a report_jctl callback, and the @notify argument is zero in all threads but one. I ask because I have a simple multi-threaded process with a utrace engine attached to the process group leader; .report_jctl is enabled. Do you mean the thread_group leader of one process in the process group? Or do you mean multiple utrace engines, one per thread, all in the process whose pid==pgid (that's what process group leader means in POSIX)? If I SIGTSTP the process, occasionally control is not returned to the shell. That sounds like a kernel bug. There should be nothing your report_jctl callback could do (assuming it doesn't send more signals itself) that affects the normal job control semantics. The kernels that are appropriate to discuss here are upstream kernels with the current utrace patches applied (i.e. what you get in the current utrace-ptrace git branch), or the most recent Fedora kernels that should include that same code. The code in question might well be the same in RHEL6 kernels, but RHEL issues need to be addressed through the proper RHEL support channels. What we will help you with here is the current utrace development code. Perhaps Oleg and/or I will get time soon to look into this issue. Chances are better if you provide a test case in the form of a simple utrace module and a test scenario using it. Thanks, Roland
Re: utrace unwanted traps
Probably, I am not sure. I can't recall the previous discussions. Me either (except hazily the one notion I mentioned), but that's why we have mailing list archives. Afaics, this was introduced by 26fefca955cc7c3c83014be2e10b773209d32eea utrace: sticky resume action. Before that patch the stopped tracee could only remember the pending UTRACE_REPORT or UTRACE_INTERRUPT. Correct. But that's how it used to be, and then we changed it. So it merits digging up our old discussion threads that led to doing that and see what all we had in mind then. I tried to grep my mail archives and google, but failed to find anything related. Based on the date of the commit you just mentioned, I browsed in: https://www.redhat.com/archives/utrace-devel/2009-October/thread.html and saw: https://www.redhat.com/archives/utrace-devel/2009-October/msg00287.html which is the day before that commit. I didn't read through all that to refamiliarize myself with all the details. Thanks, Roland
Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting
Yes. But, with this patch, in this case utrace_barrier()-get_utrace_lock(attached = false) returns success and then we check utrace-reporting == engine. I guess that's OK if -reporting == engine is always set when any kind of callback might be in progress. (Hmm. Probably utrace-reap = T case needs more attention, utrace_maybe_reap() doesn't set utrace-reporting). That would be a problem. But, unfortunately, this signal_pending() check assumes the caller can always handle this ERESTARTSYS. But this is not true, one example is the exiting debugger doing exit_ptrace(). I understand it's a problem. Perhaps we need to have a flag argument or (probably better) a new utrace_barrier_uninterruptible call. But, for this particular situation with exit_ptrace, it could be kludged around. The caller is never going to dequeue another signal once it's as far into death as exit_ptrace anyway. So, it could do a loop of: ret = utrace_barrier(task, engine); if (ret == -ERESTARTSYS) { clear_thread_flag(TIF_SIGPENDING); continue; } Not that I'm suggesting this isn't a dismal kludge. But it seems like it would close the hole we have today in practice. Thanks, Roland
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
The 1st patch I sent initially: --- kstub/kernel/utrace.c~10_utrace_reset_should_clear_ss 2010-09-20 03:55:12.0 +0200 +++ kstub/kernel/utrace.c 2010-09-20 21:33:47.0 +0200 @@ -709,6 +709,7 @@ static bool utrace_reset(struct task_str */ utrace-resume = UTRACE_RESUME; utrace-signal_handler = 0; + user_disable_single_step(task); } /* It clears TIF_SINGLESTEP when the last engine detaches. This we didn't want because utrace_reset can be called when the tracee is not stopped, and it's not safe to call user_*_step then (with the one caveat that the SIGKILL race is OK). The 2nd one which changes utrace_control(DETACH), @@ -1139,7 +1139,9 @@ int utrace_control(struct task_struct *t target-utrace_flags = ~ENGINE_STOP; mark_engine_detached(engine); reset = reset || utrace_do_stop(target, utrace); - if (!reset) { + if (reset) { + user_disable_single_step(target); + } else { /* * As in utrace_set_events(), this barrier ensures * that our engine-flags changes have hit before we It doesn't cover the case when the TIF_SINGLESTEP tracee detaches itself, but a) currently the are no such engines, and b) it looks more clean. If we're trying to meet the advertised API--how it's documented now is the only reasonable thing if we are actually fixing things properly at all--we do need to cover the case where the only engine to have requested single-step earlier then later returns UTRACE_DETACH from a callback. That is, if that engine could actually know that the tracee never reached user mode between its use of UTRACE_SINGLESTEP and its later UTRACE_DETACH. However, I am starting to think that if you prefer this change we have a better option. Instead of duplicating UTRACE_RESUME logic, perhaps the patch below makes more sense? Hmm, that does seem like it might be pretty reasonable. If your engine had ever permitted the tracee to get back to user mode after requesting UTRACE_*STEP, then it's just desserts to cause that SIGTRAP regardless of whether other engines might have kept it stopped in actuality. But, there might be plausible corners where an engine really could know reliably (without outside knowledge of what other engines might be doing) that the tracee can't really have been back to user mode yet. For example, if it checked signal_pending(tracee) before it used UTRACE_*STEP, then it could know for sure that its report_signal callback will have been called before it ever got to user mode, so if that callback returns UTRACE_DETACH, it better not be left with stepping enabled. In that particular case, it shouldn't be a problem, since after its report_signal we will always hit finish_resume_report later. But we should think about if there are any holes of a similar nature. Please tell me which fix you prefer? Or just commit the fix you like more. I'm only going to merge a specific commit of yours that you have tested. Thanks, Roland
Re: [PATCH] utrace_barrier(detached_engine) must not sping without checking -reporting
Please feel free to ignore this if it's no longer relevant to your work. I'm trying to catch up on the backlog of replies I owe you. I chose this one as the arbitrary cut-off before which I think I have already neglected things too long to still matter. If engine is detached (has utrace_detached_ops), utrace_barrier(engine) spins until engine-ops becomes NULL. This is just wrong. Yes, this happens because get_utrace_lock returns -ERESTARTSYS and utrace_barrier checks for this and loops. I agree these long-spin scenarios would be wrong. The reason it tries to wait for fully detached state is that after utrace_control(task,engine,UTRACE_DETACH), @task could still be in the middle of a callback for @engine. Suppose that utrace_control(DETACH) returns -EINPROGRESS, now we should call utrace_barrier(). However, it is possible that -EINPROGRESS means we raced with sys_sleep(A_LOT) doing report_syscall_entry(). Right. Perhaps utrace_barrier could do some different variant of the (utrace-reporting != engine) check. Change get_utrace_lock() to succeed if the caller is utrace_barrier() and ops == utrace_detached_ops. I do not see any reason why this case should be special from utrace_barrier's pov. It can just check -reporting and return 0 or do another iteration. [...] Also, it is not clear why utrace_barrier() needs utrace-lock, except to ensure it is safe to dereference target/utrace. Well, wouldn't that be reason enough? The comment in utrace_barrier talks about needing the lock. This corresponds to the comment in the UTRACE_DETACH case of finish_callback_report. Do you think those comments are inaccurate about what's required? Note: we should also reconsider() utrace_barrier()-signal_pending() check. IMHO it is badly wrong to have utrace_barrier do an uninterruptible wait (even moreso since it's really a spin). If a buggy callback gets stuck blocking or spinning and fails to return promptly, then you wedge any debugger thread trying to synchronize with it via utrace_barrier. If you can't even interrupt that debugger thread, then there will really be no chance to recover from the deadlock. Thanks, Roland
Re: gdbstub initial code, v8 ptrace
But. Suppose we have to attached engines. The first engine gets UTRACE_SIGNAL_REPORT and returns UTRACE_STOP | UTRACE_SIGNAL_IGN. Right, that's what it would do. I see, you're saying that the report.result passed on to the next engine would appear like it had been a real signal that the previous engine decided to ignore (or whose sa_handler==SIG_IGN or default action was to ignore). Hmm. Well, it's also distinguished by having orig_ka==NULL in the callback. Any real signal has a non-null orig_ka argument. or yes, it returns UTRACE_SIGNAL_DELIVER after gdb does signal XX. Now. The second engine gets UTRACE_SIGNAL_DELIVER or UTRACE_SIGNAL_IGN, not UTRACE_SIGNAL_REPORT. At least in the UTRACE_SIGNAL_DELIVER case, that's really the true thing for it to see. The previous engine decided to do a signal delivery (as directed by return_ka), so the next engine needs to see this to know what the prevailing state of play is now. That is why ugdb_signal_report(UTRACE_SIGNAL_REPORT) tries to return UTRACE_STOP | utrace_signal_action(action) to not change report-result (passed to the next tracee) inside the reporting loop. Well, it *can* do that. If UTRACE_SIGNAL_REPORT (or anything else random) is the ultimate report.result from the whole callback loop, we treat it just like UTRACE_SIGNAL_IGN. The worst case is UTRACE_SIGNAL_HANDLER. Single-stepping should know about this case. This means that any engine should always return UTRACE_resume_action | UTRACE_SIGNAL_HANDLER. I see. This is indeed the only way for any engine to know that it's getting the UTRACE_SIGNAL_HANDLER specific notification rather than some random fallout of someone else's UTRACE_REPORT request or whatnot. Probably we can check orig_ka != NULL and treat orig_ka == NULL as UTRACE_SIGNAL_REPORT. Hopefully this can't be confused with UTRACE_SIGNAL_HANDLER. I'm not really sure what you mean here. If -report_signal(orig_ka) was calles with orig_ka == NULL, then, whatever utrace_signal_action(action) we see, originally it was either UTRACE_SIGNAL_HANDLER or UTRACE_SIGNAL_REPORT but another engine returned, say, UTRACE_SIGNAL_DELIVER/UTRACE_SIGNAL_IGN to deliver/stop. Right. But this is in fact just the same for either UTRACE_SIGNAL_REPORT or UTRACE_SIGNAL_HANDLER. Not sure about UTRACE_SIGNAL_HOLD, but this is very unlikely race. You probably don't want to use that, but I'm not entirely sure. ptrace doesn't use it, and the signal interception model is pretty much the same. Yes, agreed. But there is one corner case. Until we generalize utrace_stop()-ptrace_notify_stop() hook, the tracee can be reported as stopped to gdb, but before it actually stops it can recieve a signal. I don't follow this. If we're stopping, then we don't receive (dequeue) any other signal until we get resumed. That should be fine from gdb's point of view. The next signal is just pending for later. The signal that we just reported was a) in fact reported to gdb and b) is still sitting in the siginfo_t on stack and the engine can examine/modify that before letting the thread resume. (The kerneldoc paragraph about @report_signal in utrace.h makes this guarantee.) Thanks, Roland
Re: Tracing with utrace, some questions
1. From what I've read in the DocBook pages I've figured out that I have to have two report entries. One for syscall_entry and one for syscall_exit. On syscall_entry I should use syscall_get_nr() and check if this number is equal to __NR_socket and return UTRACE_SYSCALL_ABORT in that case and on syscall_exit, I need to call syscall_rollback() to rollback the registers if utrace_syscall_action(action) returns UTRACE_SYSCALL_ABORT. Is this correct? The report_syscall_entry hook is the only one you need to prevent the system call from running. If it returns UTRACE_SYSCALL_ABORT, then the system call will fail with the ENOSYS error code. You only need to use a report_syscall_exit hook if you want the registers the user process sees after attempting the system call to be different from that. 2. First I've read the documentation from Roland's page and figured out it's out of date. report_syscall_entry callback used to have a struct task_struct argument but now it doesn't. How should I get a struct task_struct to pass to syscall_get_nr? Am I supposed to keep a reference to the struct pid I used in init_module() and use pid_task(pid, PIDTYPE_PID) or should I use find_get_pid() just as I used in the init_module()? Those callbacks are made in the affected thread itself. So you can just use the magic variable 'current'. I've updated the web copy of the documentation. If you install the kernel-doc rpm on a Fedora system, that contains the version of this same documentation that goes with the kernel you have. 3. In the report_syscall_exit callback, is the struct pt_regs argument there so that the user can directly pass it to syscall_rollback() or do I have to save the registers I had in report_syscall_entry() callback and use them instead? You can just pass that pointer directly. In fact, the same pointer to 'struct pt_regs' will be passed to both callbacks. This is always the same pointer that 'task_pt_regs(current)' would return, just made quicker to access by having the argument to the callbacks. 4. __NR_socket is available on some architectures and it's implemented on top of __NR_socketcall on others. I'm running this example on x86_64. How should my module figure out which mode the target process is running in? I suppose this is related to the CS register. There are several ways to go about this, depending on how arch-specific you want to make it. On x86-64 it's possible for 32-bit processes to make 64-bit system calls and vice versa, though normal user programs will never actually do this. Perhaps the easiest and cleanest way, that both covers this arcane possibility and also works on other architectures, is to call is_compat_task() (under #ifdef CONFIG_COMPAT). Having figured that how am I supposed to include system call numbers from both architectures and use __NR_socketcall for 32bit mode and __NR_socket for 64bit? Unfortunately, there is no clean way to refer to the 32-bit syscall numbers in code inside the 64-bit kernel. socketcall is not one of the few listed in asm/ia32_unistd.h, so off hand I think you'd just have to hard-code the i386 __NR_socketcall value in your module. 5. Is there any project in the outer-space that does something like this, sandboxing or monitoring system calls, from which I can learn more? Renzo Davoli's umview/kmview is just such an animal. See http://wiki.virtualsquare.org for details. Thanks, Roland
Re: gdbstub initial code, v11
The ones I'm talking about are Z2/Z3 for (data) watchpoints. Ah, OK, thanks. I'll try to understand how this works. In theory these will map to uses of the hw_breakpoint interface. Thanks, Roland
Re: gdbstub initial code, v11
I think it would be good to implement a feature that shows how this approach is an improvement over the current state of gdb+ptrace or gdb+gdbserver. Exactly what feature this should be... I don't know :-) I would imagine something performance-related. My vague notion was that we'd get it working well enough to have basic parity with native or gdbserver on some realish test cases, and then just look at the protocol interaction log to see cases where we could reduce round-trips between gdb and the stub. Some of those are bound to entail protocol extensions and gdb changes to exploit them. Off hand, the Z cases might be the only things that won't need that. Thanks, Roland
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
I think we should start with changing utrace_control(DETACH) anyway, then try to improve. I'll ressend the one-liner I already showed. Ok. Hmm. I'll try to think more. Right now I don't really understand how to do this correctly. I wasn't immediately sure either. OK, finish_callback_report() and utrace_control(DETACH) can set TIF_NOTIFY_RESUME. Right. Those utrace_resume has the report.action==UTRACE_RESUME bail-out case. So either that would change or detach would also do UTRACE_REPORT. But what if there are no more attached engines? Looks like, utrace_resume(UTRACE_RESUME) needs to handle this special case. And utrace_reset() shouldn't clear task-utrace_flags, otherwise utrace_resume/utrace_get_signal won't be called. Right. Or else tracehook_notify_resume could call utrace_resume unconditionally, but I'm not at all sure that is not worse. The original theory was that it should always be OK to have some utrace_flags bits stay set when they are stale, because any kind of reporting pass that got enabled would hit the report-spurious case and clean the state up synchronously when it's safe. So, probably detach should set TIF_NOTIFY_RESUME, but utrace_reset() should do user_disable_single_step() too if no more engines. Confused. If there are no more engines but the tracee is still running, we still shouldn't do it there because it still might not be entirely safe. If the tracee is not stopped, it's only safe to call in current. And in fact I don't understand why this is important. When it comes to multiracing, any engine can hit the unwanted/unexpected trap because another engine can ask for UTRACE_*STEP. Right. An engine earlier in the list could swallow the signal so the next engines' callbacks didn't see it. But it doesn't know that some later engines didn't also ask for stepping. So there would have to be some understood convention between engines. For example, a later engine could see info-si_signo==SIGTRAP et al and act on that even when the incoming utrace_signal_action(action)==UTRACE_SIGNAL_IGN. Of course, that doesn't help a non-stepping engine that is earlier in the list to know that a later engine will be swallowing the signal. The original theory on this was that we'd one day stop overloading user signals for debugger-induced traps. In some past TODO lists and postings I referred to extension events. The idea (in part) was that things like hardware stepping would generate a special new flavor of utrace event rather than a real signal that has to be intercepted. Then engines' callbacks would easily see that this was a debugging event induced by some engine and ignore it (or more likely, just never get any callback unless your engine registered interest in stepping). This would also address the case of asynchronous engine detach just after a trap has actually hit, where today the SIGTRAP is queued and then later won't be intercepted by a debugger, and instead kills the user process. But we don't have any of that now, and don't yet know if we will really pursue any big improvements at this API level. The only really important (I think) case is when the last engine detaches. That's the most important case, sure. But in any case that is not actually racy, we should avoid later spurious traps. IOW. Suppose that eninge E does utrace_control(STEP) or its callback returns UTRACE_*STEP. If we do not detach this engine, other engines will see the trap. That's only so if the tracee actually gets back to user mode before we have another reporting pass. So why it is so important to clear X86_EFLAGS_TF if we detach E ? Perhaps I am worrying too much about it. The worst thing is if it could really get stuck. But that shouldn't be possible if there are any engines at all, perhaps only any with UTRACE_EVENT(QUIESCE) set. Worst case, one spurious SIGTRAP will get to a report_signal pass, but nobody will return UTRACE_*STEP again, so there won't be another. (Of course, nobody will swallow that SIGTRAP and so it will terminate the process first anyway, but that's another problem.) What seems important is any non-racy scenarios. That is, where perhaps it wasn't properly stopped and E detached asynchronously, but in practice the tracee was known to be otherwise blocked elsewhere or something, so the detach should have full effect before it returns to user mode. But that is just vague theory off hand. Thanks, Roland
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
Change utrace_reset() to do user_disable_single_step(). Alternatively we could change ptrace layer, but I think this is not ptrace specific. Yes, it's right to fix this in the utrace layer. But I'm not sure that's the right place in the code to fix it. The expectation is that either we'll get to finish_resume_report, which calls user_*_step, or that utrace_control is resuming us without any expectation of a report. For UTRACE_RESUME in that case, utrace_control calls user_disable_single_step. So probably the UTRACE_DETACH case there should just do it to. Thanks, Roland
Re: [PATCH] utrace: utrace_reset() should clear TIF_SINGLESTEP if no more engines
OK, so what should we do for now? If we can't come to a straightforward answer for the general case fairly quickly, then we can do the simple change to start with. Without the multitracing utrace_resume()-user_disable_single_step() fixes the problem. Otherwise we probably need ENGINE_WANTS_SINGLESTEP. No, no, we don't want that. We don't need more state. And, remember, we really don't need to microoptimize cases when single-step was used recently--the cost of taking one more single-step trap and percolating through the signal paths was going to be pretty large anyway. It's better to have a spurious report (preferably just spurious TIF_NOTIFY_RESUME with no actual callbacks) following any detach, or whatever it takes to ensure user_disable_single_step() always runs if user_enable_*_step did before and there is no engine around to see a report_quiesce callback pass. If there is a report_quiesce or report_signal callback pass where nobody returns UTRACE_*STEP, then we should never leave stepping enabled when we return to user mode. Yes, my concern was running and !current, not exiting. This is OK on x86 but user_disable_single_step() is arch specific. It's not really OK on x86 either, with either SMP or preemption. Thanks, Roland
Re: exit_ptrace() ptrace_report_signal() problems
The problem is, utrace_control(UTRACE_RESUME) can't prevent the stop if the tracee has already returned UTRACE_STOP, but utrace_stop() didn't take utrace-lock yet. So you are saying that utrace_barrier does not meet its documented API. Right? It says effect ... has been applied. But that's not true if a UTRACE_STOP return value will not be cleared by an immediate subsequent utrace_control(,,UTRACE_RESUME). Basically, ptrace_detach_task(sig = -1) should do: - if we are going to do utrace_control(UTRACE_DETACH), we should first instruct the (running) tracee to not report a signal, otherwise that signal will be lost. Right. - if the tracee has already reported a signal, we should set -resume = UTRACE_DETACH and resume the tracee, like explicit detach does. Right. We can check ctx-siginfo != NULL to detect this case. Ok. Especially if it's as I suspect, that we can do that without changing the utrace layer. No, this problem is orthogonal, or I missed something. Please look at this message https://www.redhat.com/archives/utrace-devel/2010-June/msg00075.html Yes, I'd forgotten about that. We do need to fix utrace_barrier to match its documented guarantee, or else we cannot rely on it for ptrace. In particualar: Off hand I think it does matter today insofar as it violates the documented guarantees of the utrace_barrier API. If utrace_barrier returns 0 it is said to mean that, Any effect of its return value (such as %UTRACE_STOP) has already been applied to @engine. So e.g. if you get a wake-up sent by your callback before it returns UTRACE_STOP, and then call utrace_barrier followed by utrace_control(,,UTRACE_RESUME), then you should be guaranteed that your resumption cleared the callback's stop. Yes, but currently UTRACE_RESUME can't guarantee this. From the API perspective I had been thinking in, it's not utrace_control that's supposed to guarantee it. It's utrace_barrier that's not supposed to return yet. But, that is indeed a sort of inside-out way of looking at it, really. What utrace_barrier guarantees is that the callback bookkeeping is done, and it's not supposed to wait for e.g. the next engine's callback to run. utrace_control(RESUME) should remove ENGINE_STOP like UTRACE_DETACH does I think you've now talked me into this. There is no other way that utrace_barrier can keep its guarantee about the return value effect without also delaying while other engines' callbacks might run, which seems much worse. Thanks, Roland
Re: ugdb breakpoints
The traditional method is to restore the original instruction replaced by the breakpoint in text, single-step over that instruction, then restore the breakpoint in text, then continue. That method requires all-stop so that while you are stepping the thread that just hit the breakpoint, you can't have another thread run past that instruction and miss the breakpoint. Both this traditional in-place method, and the instruction-copying method, depend on using single-step. So stepi has to work before break can work. Thanks, Roland
Re: gdbstub initial code, v8
Please note that last year's gdbstub prototype used kernel uprobes as an optional gdb breakpoint implementation (i.e., a backend for the Z packets). When/if the lkml uprobes patches actually get merged, ugdb should also use them. That's something for later, and it's not quite so simple. If a utrace engine ever uses uprobes, it probably would need to use the utrace-based version of uprobes. If something different goes in upstream, it remains to be seen how it would interact with utrace, and there would be specific work required for that. There are many more issues about that too. At any rate, this is all a distraction at the moment, and Oleg doesn't need any more of those! Thanks, Roland
Re: gdbstub initial code, v8 ptrace
I am a bit confused... OK, ugdb is wrong wrt multitracing. UTRACE_SIGNAL_REPORT case shouldn't return UTRACE_STOP | UTRACE_SIGNAL_IGN, it should return UTRACE_STOP | UTRACE_SIGNAL_REPORT to keep report-result. No, UTRACE_SIGNAL_REPORT is not meant for a return value. Its only use is in the incoming argument to tell you that a given report_signal call is standing in for a report_quiesce(0) call. But it needs to return UTRACE_SIGNAL_DELIVER? That's what you return when you want the signal delivered. When you are stopping the tracee to decide about the signal, that's not what you want. UTRACE_STOP | UTRACE_SIGNAL_IGN is correct to not deliver the signal right now, and stop instead. If you want to deliver the signal later, then you'll resume with UTRACE_INTERRUPT to ensure you get back to report_signal and that can fill in the details and return UTRACE_SIGNAL_DELIVER. Probably we can check orig_ka != NULL and treat orig_ka == NULL as UTRACE_SIGNAL_REPORT. Hopefully this can't be confused with UTRACE_SIGNAL_HANDLER. I'm not really sure what you mean here. Not sure about UTRACE_SIGNAL_HOLD, but this is very unlikely race. You probably don't want to use that, but I'm not entirely sure. ptrace doesn't use it, and the signal interception model is pretty much the same. Thanks, Roland
Re: gdbstub initial code, v7
But I meant another case, when the stopped tracee doesn't have siginfo. Currently ugdb just sends this signal to tracee, and then it will be reported to gdb. Not sure if this is right or not, I can change this. (or perhap this doesn't matter, I dunno). What do you mean by doesn't have siginfo? You mean non-signal stops? What non-signal stops does ugdb report? Thanks, Roland
Re: [PATCH 0/3] UTRACE_DETACH fixes
I misunderstood the UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) above as if you mean it is possible to get UTRACE_SIGNAL_REPORT without QUIESCE. No, I meant the opposite: since you won't get UTRACE_SIGNAL_REPORT without UTRACE_EVENT(QUIESCE), it might seem that UTRACE_INTERRUPT should not be permitted, as UTRACE_REPORT is not. But because that is not it's *only* effect, we don't do permit it even though you won't get UTRACE_SIGNAL_REPORT. OK. This means ugdb should set QUIESCE, just to ensure its -report_signal() will be called. If you ever want UTRACE_SIGNAL_REPORT calls, yes. OK, please forget. I must admit, this still looks a bit, well, unnatural to me. May be it makes sense to add _UTRACE_EVENT_SIGNAL_REPORT, _UTRACE_EVENT_SIGNAL_HANDLER, into utrace_events. Then ugdb could ask for UTRACE_SIGNAL_REPORT and avoid the unnecessary -report_quiesce() calls. If anything, I think UTRACE_EVENT(RESUME) would make sense, perhaps with a separate -report_resume callback. That would request only the cases that now get report_quiesce(0). But, it really does not cost much to have if (event) return UTRACE_RESUME; or similar in your report_quiesce callback. It's beyond refinement of the utrace API (that we're not doing now), it's just micro-optimization. Roland, could you also comment this patch? https://www.redhat.com/archives/utrace-devel/2010-August/msg00108.html Again, this looks like a bug to me, but I won't insist if it is not. Sorry, that is still on my queue. I haven't forgotten it. I just haven't gotten to reviewing it yet because of other work and it needing a lot of hard thinking. Thanks, Roland
Re: gdbstub initial code, v7
ugdb sets please stop flag and does utrace_control(INTERRUPT). However, in unlikely case the tracee can stop before -report_signal() reporting I don't think this is the right thing to do. When the intent is explicitly to interrupt, there is no reason to stop before the interruption is complete, i.e. report_signal. If you only stop there, then you can always process a signal injection with complete flexibility. The gdb model and the remote protocol doesn't currently have any concept of requesting a stop that is not an interruption. Thanks, Roland
Re: gdbstub initial code, v7
Currently GDB does not do anything special, that is if there is siginfo for signal SIGUSR1 but one does $C0B (SIGSEGV) does ptrace reset the siginfo or is left the SIGUSR1 siginfo for SIGSEGV? The kernel considers this sloppy behavior on the debugger's part. If you inject a different signal, we expect you should PTRACE_SETSIGINFO to something appropriate, or else that you really didn't care about the bits being accurate. If the resumption signal does not match the siginfo_t.si_signo, then the kernel resets the siginfo as if the debugger had just used kill with the new signal (i.e. si_pid, si_uid point to the ptracer). Thanks, Roland
Re: [PATCH 0/3] UTRACE_DETACH fixes
Hmm. I am not sure I understand. If -report_signal != NULL, then you can't expect -report_quiesce() after utrace_control(INTERRUPT) ? Sorry, I used report_quiesce as shorthand for either report_quiesce, or report_signal with UTRACE_SIGNAL_REPORT. So, it is technically kosher enough to use UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure about all those caveats above. How? see below. I mentioned the examples. But, it's only UTRACE_EVENT(QUIESCE) that gives you any expectation of an extra callback for no event from either UTRACE_REPORT or UTRACE_INTERRUPT. Yes, this is clear too. Apparently not. ;-) So. Now I am even more confused. First of all, ugdb (at least currently) does not need report_quiesce() and UTRACE_EVENT(QUIESCE) at all. OK, this is not 100% true due to multitracing/etc, lets ignore this. Anyway it makes sense to clear QUIESCE after $c to avoid the unnecessary callbacks. All it needs is -report_signal(). But, the problem is, it is not called after utrace_control(INTERRUPT) ! You need QUIESCE to ensure report_signal() will be called, and this looks at least strange to me. QUIESCE is the only event type for no event. If you only asked for signal events, then you only get a callback when there is an actual signal event. If you want an extra callback before dequeuing any signal, then that is what QUIESCE is for. Once again, I am not asking to change utrace now, but could you explain what is wrong with the patch below? That gives an extra unrequested report_signal callback to every engine that is only interested in some signal event. Consider a crash-catcher engine. It would only track UTRACE_EVENT(SIGNAL_CORE) (and perhaps CLONE for its bookkeeping). This engine never asked for a callback before each and every attempt to dequeue any signal, which is what you would give it with your change. Thanks, Roland
Re: [PATCH 0/3] UTRACE_DETACH fixes
But probably we need utrace_detached_ops-report_reap() anyway. -report_death can return UTRACE_DETACH, and after that utrace_report_death() can skip utrace_reset() and do -report_reap(). Yeah, that's a good point. There's no good reason to do a special case check for detached_ops there rather than just having the no-op report_reap hook. If an engine used UTRACE_INTERRUPT without having first set QUIESCE, then it has no rightful expectation of getting any callback. Hmm. This is certainly new to me. Could you confirm? Well, I didn't say it precisely correctly. UTRACE_INTERRUPT serves two purposes. First, it just interrupts things like a signal would. You could use that on its own without even tracing any events at all, just do achieve fault injection or similar. Second, it's like UTRACE_REPORT. If you do have some other event bits set, then you can expect/rely on getting those normal events soon if they would otherwise have happened--i.e., if you know it's blocked in a syscall, and you have UTRACE_EVENT(SYSCALL_EXIT) set, then you can expect to get a report_syscall_exit soon. (But, it's always possible to race with the syscall finishing on its own, or being interrupted by an outside signal, so that exit might have come before your utrace_control call if you were not otherwise synchronizing with your established report_syscall_exit callback.) As with UTRACE_REPORT, after UTRACE_INTERRUPT you can expect to get some report_quiesce callback even if there is no other event you were tracing that happens. For this to actually happen, you need to have UTRACE_EVENT(QUIESCE) set. So, it is technically kosher enough to use UTRACE_INTERRUPT without UTRACE_EVENT(QUIESCE) set, if you really know the situation and are sure about all those caveats above. But, it's only UTRACE_EVENT(QUIESCE) that gives you any expectation of an extra callback for no event from either UTRACE_REPORT or UTRACE_INTERRUPT. (And since that is the sole possible purpose of UTRACE_REPORT, we diagnose a utrace_control call like that.) Thanks, Roland
Re: gdbstub initial code, v5
When the main thread exits, gdbserver still exposes it to gdb as a running process. It is visible via info threads, you can switch to this thread, $Tp or $Hx result in OK as if this thread is alive. gdbserver even pretends that $vCont;x:DEAD_THEAD works, although this thread obviously can never report something. This is sort of consistent with the kernel treatment. The main thread stays around as a zombie, acting as a moniker for the whole process. But indeed that is not actually useful for any thread-granularity control or information (well, there is the dead thread's usage stats, but that's all). I don't think this is really right. This just confuses the user, and imho this should be considered like the minor bug. I tend to agree, but don't think it's a big issue either way, really. ugdb doesn't do this. If the main thread exits - it exits like any other thread. I played with gdb, it seems to handle this case fine. Sounds good to me! - The exit code (Wxx) can be wrong in mt-case. The problem is, -report_death can't safely access -group_exit_code with kernel 2.6.35. This is solveable. Don't even worry about it. If there is something trivial to do that makes it better for earlier kernels, then go ahead. But if the easy thing to do gives correct results on =2.6.35 and racily wrong or random results on older kernels, then we can just live with that. Thanks, Roland
Re: [PATCH 3/4] utrace_set_events: fix UTRACE_EVENT(REAP) case
On 08/16, Roland McGrath wrote: - It is possible that both -death and -reap are true. In this case it is OK to clear UTRACE_EVENT(REAP), but set_events fails. No, it's not OK to clear it. Once -reap is set, then the engine's ops-report_reap might or might not have been called already. Afaics - no. If utrace-death is set (and we check it under utrace-lock) we can ignore utrace-reap. In short, if ops-report_reap can be called before -death is cleared, then 2 possible callers of utrace_maybe_reap() can race with each other, but this can't happen. Right. If -death is still set, it means utrace_report_death is still running. So the utrace internals know that the report_reap callbacks haven't started yet. But from the utrace API perspective, all you can know is that release_task() has already been called. So I think it's right for the API not to let you clear UTRACE_EVENT(REAP) at that point. utrace-death == T means: - (utrace_flags _UTRACE_DEATH_EVENTS) == T Yes. - utrace-death was set by utrace_report_death() which will take utrace-lock later and clear -death, only then it may call ops-report_reap(). Yes. - until utrace_report_death() clears -death, _UTRACE_DEATH_EVENTS must be set in -utrace_flags, otherwise utrace_maybe_reap(true) is buggy. Yes. Note that both utrace-death and _UTRACE_DEATH_EVENTS are cleared atomically from utrace-lock pov. Yes. IOW. utrace-death is true, then IF (utrace-reap) tracehook_prepare_releas()-utrace_maybe_reap(true) was already called, this is how utrace-reap was set. Meaning release_task() was called--semantically reaping may have begun. But utrace_maybe_reap() did nothing and returned. We rely on the subsequent utrace_maybe_reap(false) from utrace_report_death() - but this can't happen until we drop utrace-lock Correct. ELSE utrace-reap can't be set until we drop utrace-lock. Correct. Now that you merged c93fecc925ea7567168f0c94414b9021de2708c5 get_utrace_lock() must not succeed if utrace-reap == T, this becomes a bit off-topic. However, I thought about relaxing the dead check in get_utrace_lock(), instead of utrace-reap we could check utrace-reap !utrace-death. In fact, initially I was going to do this, but then decided to make the simpler patch for now. When utrace-reap is set, it means release_task() has been called. The API caller has no way to know reaping hasn't already begun--except if its report_death callback has not returned yet. But I think that the API definition of once release_task() has been called (i.e. entered, maybe not returned yet), any utrace call will get -ESRCH, is a clear and comprehensible constraint. We should not open any holes in that. Thanks, Roland
Re: [PATCH] fix mark_engine_detached() vs start_callback() race
This is the minimal temporary ugly fix for now, we should certainly cleanup and simplify this logic. The barriers in mark_engine_detached() and in start_callback() can't help and should be removed. If we ignore utrace_get_signal() we do not even need utrace_detached_quiesce(), start_callback() could just do Agreed. I applied the patch for now. I think in the longer term mark_engine_detached() should not change engine-flags at all but add QUIESCE to -utrace_flags. However, this breaks utrace_maybe_reap(reap = true) and we should avoid the race with finish_callback() which clears -reporting after report_quiesce(). What's the benefit to adding QUIESCE? If any utrace code gets entered at all, then any callback run will be able to do the special-case ops check for detached engines. A bit off-topic, but I don't think finish_callback() should check engine-ops == utrace_detached_ops before return. Instead we should change finish_callback_report() to return the boolean. We shouldn't return true without setting report-detaches. Ok. Thanks, Roland
Re: [PATCH?] avoid the unnecessary utrace_resume()-utrace_reset()
utrace_resume(UTRACE_REPORT) always calls utrace_reset() because start_callback() obviously can't clear report-spurious when event == 0. Change start_callback() to correctly clear -spurious in this case. Ok. Note: utrace_control(DETACH) does utrace_do_stop() and sets UTRACE_REPORT if the tracee is not stopped. It also does mark_engine_detached() which does not set QUIESCE in target-utrace_flags. This means we rely on report.spurious which should provoke utrace_reset() from utrace_resume() if target-utrace_flags doesn't have QUIESCE. A bit too subtle, imho. Agreed. There is no reason utrace_control can't set it in utrace_flags in its !reset case. Also, UTRACE_REPORT can be lost because of UTRACE_INTERRUPT or normal signal: utrace_get_signal() checks utrace_flags UTRACE_EVENT(QUIESCE) and returns otherwise. This should be fixed somehow. This check is wrong anyway, but it is not clear how we can fix the race with DETACH. I see. That would be fixed by utrace_control setting it. Thanks, Roland
Re: gdbstub initial code, v4
Note the second attachment, GDBCAT. It is just the simple perl script which connects /proc/ugdb to tcp port. It can be used for remote debugging via tcp, or with (unpatched) gdb which can't do select() on /proc/ugdb. bash$ nc -l 2000 /proc/ugdb Actually, it is more convenient to use it in any case, at least for logging purposes. (gdb) set remote debug 1 Thanks, Roland
Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks
Hmm. For unknown reason I do not see this 2/4 patch on utrace-devel, strange. So I am attaching it in case my email was really lost and you didn't get it. Indeed, it did not come through to me or the list. Thanks, Roland
Re: [PATCH 2/4] utrace_set_events: consolidate setting _UTRACE_DEATH_EVENTS checks
utrace_set_events() checks the setting _UTRACE_DEATH_EVENTS case twice, and it is not immediately obvious why the first check is needed, and why it is not racy (we are checking exit_state without tasklist). The code is correct, but looks confusing. More comments are always good. In short: this double check allows to avoid tasklist when utrace_flags already has these bits while engine-flags doesn't. But multitracing is unlikely case, in the likely case if we add _UTRACE_DEATH_EVENTS to engine-flags we set these bits in utrace_flags too. I think it makes sense to consolidate these checks to make the code a bit more understandable. I don't really agree about unlikely case. In many uses, the systemtap task-finder will have a utrace engine on every task in the system, for example. Moreover, this is an importantly distinct particular kind of micro-optimization to be doing here: avoiding a system-wide lock. Any place that we take tasklist_lock at all, we are introducing a system-wide slowdown or limitation on scaling, just because of our tracing of one task. So optimizing the minimize that is qualitatively different and really much more important than avoiding taking the utrace lock, for example. But with that note in your mind, I am happy to take this patch now as a simplification and let reoptimization come back later. Thanks, Roland
Re: [PATCH 1/3] get_utrace_lock() must not succeed if utrace-reap == T
get_utrace_lock() must threat utrace-reap == T as engine-ops == NULL. Yes, I think you're right. This requires some changes to the kerneldoc and utrace.tmpl, because it now says that you get EALREADY if report_reap is already running. Now it will be consistent with utrace_control, where you get ESRCH either if report_reap might already be running or if it's entirely detached and/or reaped. Thanks, Roland
Re: [PATCH 2/3] fix utrace_control(DETACH) utrace-death interaction
The problem is, utrace_control(DETACH) does nothing and returns -EALREADY if utrace-death is set, this is not right. We can and should detach in this case, we only should skip utrace_reset() to avoid the race with utrace_report_death()-REPORT_CALLBACKS(). This behavior is the original (minimal) synchronization scheme from before we had utrace_barrier. See Interlock with final callbacks in Documentation/DocBook/utrace.tmpl. The expectation is that your report_death is going to clean up your -data stuff and then return UTRACE_DETACH. If utrace_control returns -EALREADY, then you know that report_death is taking care of things. I'd imagined that you'd do something like: mutex_lock(stuff engine-data points to); mark in there that we are detaching; ret = utrace_control(task, engine, UTRACE_DETACH); if (ret == 0) { /* detached. tear our stuff down. */ ... return DONE; } else if (ret == -EALREADY) { /* report_death is running, i.e. waiting for our lock */ mutex_unlock(...); return DONE; } else ... Put another way, you can have told your report_death beforehand that it should return UTRACE_DETACH if it runs before your utrace_control call. Thanks, Roland
Re: [PATCH 0/3] UTRACE_DETACH fixes
Whatever we do, start_callback() can see the old engine-flags but the new -ops = utrace_detached_ops. Just suppose that the caller of UTRACE_DETACH is interrupted right after setting engine-ops. * it can check the old flags before using the old ops, or check the old * flags before using the new ops, or check the new flags before using the * new ops, but can never check the new flags before using the old ops. Or do you think I miss something and this is false alarm? * Hence, utrace_detached_ops might be used with any old flags in place. * It has report_quiesce() and report_reap() callbacks to handle all cases. report_reap is covered with the utrace_detached_reap() stub. Every other callback uses start_callback(), i.e. calls report_quiesce first. utrace_detached_quiesce() returns UTRACE_DETACH, so start_callback() will return NULL. [...] but I can no longer think properly today. Sleep well. :-) Thanks, Roland
Re: problems with v3
I don't know this area well, but considering that ser-unix.c is just chock full of tty-related goo, I think it is probably important for something. My impression is that this API is not just used for target communication but also for manipulating gdb's own terminal. Ah, I see. I've appended the patch I came up with. I have not tried it at all, but it should all be pretty obvious. Yeah, that seems like it should be reasonable, i.e. more or less like what I'd figured it would do if it didn't have all these different backends. However, I think the test you probably want is !S_ISCHR. I believe that /proc/ugdb as is stats as S_ISREG, not S_ISFIFO. Actually, better yet, just make it !isatty (fd). Another pseudodevice that also behaves more like a socket than like a tty might be S_ISCHR, but only a tty isatty. Thanks, Roland
Re: problems with v3 (Was: gdbstub initial code, v3)
I seem to understand the problem(s). I am a bit surprized. Basically I have the questions about almost every utrace_ function. I'll try to recheck and summarize my concerns tomorrow with a fresh head, I hope I missed something. Ok. That stuff about pure kernel implementation issues doesn't need to go to the archer list and Tom, only to utrace-devel. Thanks, Roland
Re: gdbstub initial code, v3
Indeed, gdb sees that this fd is not pipe/tcp and uses the hardwire serial_ops, but hardwire_readchar() doesn't play well with select(). Please teach gdb to use poll/select ? If it makes it easier, could use: bash$ nc -l -U /tmp/socket /proc/ugdb (gdb) target remote |nc -U /tmp/socket for the moment. Silly of course, but just not to be blocked on cleaning up gdb's serial-device handling to be more nicely nonblocking. Thanks, Roland
Re: problems with v3
I don't really know the gdb code, but I'm surprised it really has multiple different serial backends. I would have thought that after opening the fd, you would treat them all about the same. If tcsetattr et al work on it, then you set the baud rate and whatever, if they say ENOTTY then fine. It might make sense to default to a short read timeout for an actual serial port (or UDP port), which might be unplugged or the other end dead, or whatever; and to an infinite timeout for a non-tty, which should more presumably have its fd shut down and read EOF or EPIPE or whatever when the stub goes away, and otherwise perhaps the stub is just taking that long. But aside from that, I don't know why you wouldn't treat all kinds of remote protocol fd's the same wrt select/poll and blocking/nonblocking reads/writes, be they serial-port fds, pipe socketpair sockets, network sockets, or fds to a new magic file that pretends to be a tty or a socket or whatever (or even a disk file of canned response playback!). Thanks, Roland
Re: gdbstub initial code
At first I tried to support both multiprocess and !multiprocess modes, but this needs too many unnecessary code which I'd like to avoid, at least now. So this version requires multiprocess+ in qSupported, otherwise target extended-remote fails. Please let me know if we need both modes. No, that is fine. We aren't trying to make it a universal implementation of the gdb remote protocol in all variants to work with all gdb versions of the past. We're just trying to make it work well with current/future gdb. Thanks, Roland
Re: gdbstub initial code
Note that currently I am not even trying to do something meaningful with utrace. My only goal for now is to implement the very basic things like attach/detach/stop/cont/exit correctly from the remote protocol pov. And I want to do this rightly, then we will discuss utrace issues. Ok. I think you might need select to work on the pseudofile for gdb to be happy. I'm not sure on the situation about that. Thanks, Roland
Re: gdbstub initial code
Given these requirements, and given that the 'new' uprobes is close to -tip, would it be more useful to pursue an alternate syscall approach rather than gdbstub? Feel free to pursue whatever you like. For our own time allocation, we see an effort along those lines now as a distraction from work that will really make a qualitative difference in the debugging experience. Any new interface is an instant source of endless discussions (at best) about many details that are ultimately trivial in the greater scheme of things. Aside from flaming its details a priori, no new interface is of any interest to anyone unless its use is integrated into real, non-toy userland debugging tools and it enables their delivery of qualitatively significant, new or better aspects of the debugging experience. Starting with a whole new interface inevitably involves spending most of the time on the combination of LKML flames about the interface trivia and work on toy userland libraries and utilities to demonstrate using the new kernel features. That's a whole lot more time and effort and friction to come around to doing the toy version of what real userland debugging tools do today, and maybe then start on doing anything that's actually new or different beyond cleaning up pet-peeve interface trivia, if you don't get too side-tracked filling in practical holes in your toy tools along the way first. What Oleg is embarking on now is a prototyping exercise. We're not trying to find a new kind of backwards to bend over to have upstream people like any new interface layers. We're trying to get quickly to the experimental baseline from which we can try to come up with some of those qualitatively significant new things. That means having a full, adult-sized, real-world debugging tool plugged into new and unencumbered kernel code paths and doing approximately its normal thing at least as well (approximately) as normal. In the end, more of the work is on the userland side (or in fritter about what the user-kernel interface details should be) than the actual guts of the kernel-side work. We've chosen the gdb remote protocol as a prototype vehicle because we start with about 95% complete support in our closest-to-hand real-world tool (gdb) for that baseline. (We also happen to have some gdb-hacking colleagues nearby to help us experiment with anything that might rely on that remaining 5% or otherwise on teaching gdb new tricks.) Hence we hope to get very quickly to that baseline for experimentation. From there we can start trying out the things that could really make a big difference in what a debugger tool can do (or how well/fast it can do them). What matters for that isn't the little details of encodings or syscall interfaces, but the large-granularity issues about how the interface is used, like how many context switches back and forth to the debugger it takes to get a task done, etc. IMHO it would be pointless to try to design any new interface before knowing concretely what kinds of things on the big-idea scale make an important difference to the actual debugging experience with a real-world tool like gdb. When we've shown what key features and characteristics deliver a big tangible payoff, we can worry about how to formulate new interfaces or extensions that both achieve those essential goods and meet with upstream tastes. Thanks, Roland
Re: gdbstub initial code
What is the reasoning for selecting /proc/ugdb instead of something like /proc/pid/ugdb? The protocol, and gdb, support dealing with many processes over one control channel. For normal the debugger model to work where it can attach to a process on demand, with your model it would have to open another fd for every process (or perhaps thread), maintain perhaps thousands of fds, do a select dance, never be able to pipeline notifications from multiple processes into a single receiving call in the debugger, etc. Moreover, that's just not what the protocol is and we already have a protocol with a client that we can just support as is. Thanks, Roland
Re: gdbstub initial code
Please see the attachment. Don't take this code seriously, this is the early prototype and everything should be rewritten. It barely uses utrace, only to stop the target. You've got to start somewhere! Thanks, Oleg. It's great to see this get underway. This seems to work, but I had to export access_process_vm(). Yeah, that's a known issue. We can discuss how to either work around or change it at some point, but it's just a distraction at the moment. I still can't understand what utrace_xxx_pid() buys us, and I still think that utrace_prepare_examine() can't protect the task even for regset calls. Please start a separate thread here about each of those issues. The first one is fairly simple, but what that means in practice depends on the resolution of the second question, which is a more complex subject. Thanks, Roland
Re: gdbstub initial code
When I had posted a prototype of a gdbstub which Frank and I had worked on. http://lkml.org/lkml/2009/11/30/173, Peter and Ingo showed a preference for a combined gdbstub in kernel, i.e kgdb and the newer stub should use only one stub in kernel. Do we have plans to handle that? Their actual idea there largely represents a misunderstanding of the problem space. But regardless, it's a distraction from the prototype work that Oleg is doing now. The actual possibilities for code sharing between kgdb and something at all like what we're doing now are quite small. It's just not a problem at all to get prototyping progress done with a new implementation of the fairly trivial gdb remote protocol decoder, and contemplate consolidation later on. Thanks, Roland
Re: [PATCH 0/6] utrace: security problems
As to the unsafe_exec stuff, I'd long figured we would have something just about like that. (You might recall that an earlier utrace API had an unsafe_exec engine callback, which had its own unresolved complications.) For exec transitions (set-id, file caps, selinux), I'd originally figured an engine's report_exec could check for changes and decide to detach itself if appropriate. We will figure out when we come to it whether that can really cover all the exec angles or not. setprocattr is the one other troublesome wrinkle, which I haven't thought all that much about. I don't think we need to, nor should, try to tie down the security-related stuff at the outset. We can work on prototype engines with the proviso that they are for root only or for experimentation when one doesn't care about security issues yet. When we have at least a couple of different engines with different access models, we will be better placed to figure out how to tie in the security issues. In the long run, the security_ptrace() granularity of hook is probably too blunt an instrument. We'll want to contemplate the different kinds of engines with their different kinds of security-relevant interactions and decide on security checking models that give the appropriate flexibility. But it's premature to get into that before we have a bit of an ecosystem of different sorts of modules to consider concretely. Thanks, Roland
Re: [PATCH 0/6] utrace: security problems
For exec transitions (set-id, file caps, selinux), I'd originally figured an engine's report_exec could check for changes and decide to detach itself if appropriate. No, it can't. At this point S_ISUID/S_ISGID exid's were already dropped, or exec can fail before before tracehook_report_exec(). If an exec fails, nothing changes and there is no security-relevant event to take notice of. I don't really follow your other comment. But ... Yes, agreed, let's forget this for now. Indeed. The only question: do you think the trivial 1st patch is correct? The one that just adds a macro defined to another existing macro? Any change that preprocesses out to the same code is correct, sure... Thanks, Roland
Re: Possible problem with utrace_control
OK. But then perhaps UTRACE_STICKY_STOP makes sense even without the races we discussed. It can simplify the cooperation in this case. The only cooperation methods we should consider are those that cover all their races. Yes. Let's consider the concrete example. The tracee is going to stop and calls utrace_stop(). Before it takes utrace-lock and sets state = TASK_TRACED, the debugger does utrace_control(DETACH). In this case utrace_stop() shouldn't stop, otherwise nobody will ever wake it up. That is why we clear this bit in -utrace_flags to provoke utrace_reset() which will check carefully the tracee has an engine with ENGINE_STOP. Right. I agree this race seems to apply to all kinds of resumption, not just detach. The question for the non-detach cases is what kind of guarantee the utrace API is claiming to provide. For detach it matters strongly because the loser of the race is gone and there is no way for it to be responsible for cleaning up the fallout. For non-detach cases, the picture is more hazy. The conclusions on what matters might be different if we have utrace_wake_up() or equivalent. Off hand I think it does matter today insofar as it violates the documented guarantees of the utrace_barrier API. If utrace_barrier returns 0 it is said to mean that, Any effect of its return value (such as %UTRACE_STOP) has already been applied to @engine. So e.g. if you get a wake-up sent by your callback before it returns UTRACE_STOP, and then call utrace_barrier followed by utrace_control(,,UTRACE_RESUME), then you should be guaranteed that your resumption cleared the callback's stop. Another example, you call utrace_set_events and disable some events, including all the ones for which your callbacks ever return UTRACE_STOP. If that returns 0, or then you call utrace_barrier and it returns 0, then you know that either it was already stopped before you called utrace_set_events, or that it's not stopped now. In either case, you know that utrace_control(,,UTRACE_RESUME) ensures that it's no longer stopped. (That is, stopped by your engine--it of course might still be stopped by a different engine.) For the new API idea, I was talking about something like a utrace_wake_up() call. Now I don't understand you again... The 1st question: should the new API help us to kill ptrace_notify_stop() in utrace_stop() ? Ideally yes. This might still have to be some other kind of special case, we'll have to figure that out. That would be survivable if it's necessary. The wait_chldexit wakeup would be covered by what we're discussing, it's just another wait_queue_head_t. The signal sending is more problematic. But I wanted first to contemplate what API would cover new code cleanly written. The dismal old ptrace signal/wait synchronization might always be a difficult outlying case. Hmm... After the previous email I thought that this utrace_wake_up() or whatever shouldn't be visible outside of utrace.c. I'm sorry if I was not clear. I have been talking entirely about new utrace API features, not internals. This is a discussion for your engine-writer hat more than your utrace-maintainer hat. I'm only mentioning any utrace implementation details to elucidate what is realistic or problematic to consider for the new API features. Or. Do you literally mean something like utrace_wait_for_event();// for debugger utrace_wake_up(); // for tracee which should (at least) cover all races with exit/detach/etc ? I meant utrace_wake_up(wait_queue_head_t *) used in place of wake_up() inside a utrace callback. The waiter (tracer) side would use normal linux/wait.h interfaces (wait_event macro, etc.). Thanks, Roland
Re: Possible problem with utrace_control
utrace_control(current,,UTRACE_STOP) doesn't really do anything more. It does utrace_do_stop() which sets UTRACE_REPORT/TIF_NOTIFY_RESUME. Harmless, but makes sense to avoid. Right. I was talking about the API perspective. Those internal details are more or less invisible to a given engine. OTOH, I am wondering if we can give more power to utrace_control(STOP). Currently debugger can't stop the tracee unless it cooperates. However, if the tracee doesn't have UTRACE_EVENT(QUIESCE), then utrace_control(STOP) works. This is strange. Why is it strange? If you don't have any callback, then all your engine does is get stopped. If you have callbacks, then you always get callbacks before the thread does anything (including stop). Since your callback's return value always replaces any other resume action for your engine, you have to cooperate with yourself in not changing your mind the next time your code is in charge of the tracee. Stupid question: what if we just remove the clear_engine_wants_stop() code in finish_callback_report() ? That would change the API so that a callback's resume action is always overridden by a utrace_control(,,UTRACE_STOP). Today the practice from a third-party debugger thread is to do: ret = utrace_control(task, engine, UTRACE_STOP); if (ret == 0) { // Already stopped, e.g. was in jctl or another engine's stop. // This means we can do our asynchronous examination here. } else if (ret == -EINPROGRESS) { // it will get to report_quiesce soonish (modulo blocked in kernel) } If the examination you wanted to do was some predetermined sequence of fetching and fiddling before resuming, then the report_* callback can just do all that itself. For example, fetch the registers for a sample and keep going; insert some breakpoints and keep going; etc. Today, the callback that goes with that asynchronous-side code has the choice to do something and stop or to do something and not stop. With this change, it would have to explicitly request not stopping by calling utrace_control(current, engine, UTRACE_RESUME) inside the callback. It would need to use its own bookkeeping above the utrace layer to know that there was a stop it was supposed to clear (i.e. something the code sample above would set before calling utrace_control). This changes the set of races with asynchronous stop/resume scenarios. But it doesn't remove the asynchronous-resume vs callback-stop race. In concert with wake_up_tracer-after-stop perhaps it gets better. We'd have to really think through the races with the new API. So this would fix that race. ... (But note if there is another engine that decides to stop anyway, then we'll have used a second CPU and/or ping-pong scheduled to run the tracer thread in parallel with a very short amount of work in the tracee before it just blocks anyway.) Hmm. Not sure I understand... Yes, the tracee still can stop if another tracer demands, but this is fine? Correct. When that is what's going to happen (e.g. there is another engine attached that is a slow stopper like ptrace), then it's suboptimal to have the scheduler decide to yield the CPU immediately on waking your first tracer. Btw, before I forgot about this. shouldn't utrace_control(UTRACE_RESUME) remove ENGINE_STOP from -utrace_flags like DETACH does? Perhaps so. My memory is dim on the use of that bit in utrace_flags. It's to close certain races, but we'll have to remind each other of the details. Heh. it turns out that I misunderstood you in the past, I thought you dislike the whole idea of wake_up from utrace_stop(). No, just the over-general hook or the over-specific ptrace hack. I talked about about a utrace_wake_up() that would work this way. Hmm. Not sure... at least I don't immediately see something simple. Except, just add wait_queue_head into struct utrace. This way utrace_barrier() can wait longer than needed if it notices utrace-reporting == engine. Or we can add it into utrace_engine, in this case utrace_stop() needs list_for_each(engine, utrace-attached). You're talking about two different things here. Whatever implementation details change in utrace_barrier() is not directly apropos. That would not change the API-visible details at all, just make utrace_barrier() perform better. Let's not get into that before we hash out the new API ideas. For the new API idea, I was talking about something like a utrace_wake_up() call. That might be implemented by a wait_queue_head_t * in utrace_engine, for example. The implementation internals could involve a simple iteration, or involve more hidden bookkeeping. I'm only talking about the API idea for the moment. I only mentioned implementation details at all to indicate that a restricted API (one wake-up per callback per engine) could be done with constant storage overhead and thus avoid a whole potential can of worms--which is the only real reason to
Re: Possible problem with utrace_control
I think Roland is right. Let's forget about utrace for the moment, this code looks like if (!CONDITION) { set_task_state(TASK_XXX); schedule(); } and it can obviously race with CONDITION = true; wake_up_xxx(); This is Linux in-kernel synchronization 101. The correct blocks of this sort do: set_current_state(TASK_UNINTERRUPTIBLE); while (!CONDITION) { schedule(); } set_current_state(TASK_RUNNING); This is entirely orthogonal to anything having to do with utrace per se. This pattern race-free because wake_up_* will reset the waiter to TASK_RUNNING after making CONDITION true. So in the race where the while (!CONDITION) clause lets it get into schedule(), that just returns immediately, after which the second iteration sees CONDITION true. This is why I recommended using higher-level things like linux/wait.h facilities, where the correct usage patterns are more obvious. But the race we're actually talking about here is on the other side of the coin. I am wondering if there is another way to handle such a race... Suppose we - add UTRACE_STOP_XXX - modify utrace_control(UTRACE_STOP_XXX) to do mark_engine_wants_stop() and nothing more. utrace_control(current,,UTRACE_STOP) doesn't really do anything more. So a callback doing this closes the old race: UTRACE_STOP is in force when the tracer side wakes up, since utrace_control-on-current was done before wake_up_tracer(). But... - finish_callback_report(UTRACE_STOP_XXX) does spin_lock(utrace-lock); if (engine_wants_stop(engine)) action = UTRACE_STOP; else action = UTRACE_REUME; spin_unlock(utrace-lock); Let's clarify what this means for the people following the utrace API but not its implementation internals: currently, the callback's return value overrides the engine's last-set state such as UTRACE_STOP, whether that came from a previous utrace_control before the callback, a utrace_control call inside the callback, or from the return value of a previous callback. So now we've moved to a different race (or a different instance of the original race, if you like): after wake_up_tracer(), the tracer can come and clear our engine's UTRACE_STOP state with its utrace_control call; but then our callback return value just reestablishes UTRACE_STOP and we still stop, having lost the asynchronous resumption from the tracer. So Oleg has an idea for a new kind of callback return value, which I'd call perhaps UTRACE_NOP. This would mean to stay in UTRACE_STOP if the last utrace_control call on this engine used UTRACE_STOP, or the last callback return value left the engine in that state, otherwise UTRACE_RESUME. Then, probably producer could do utrace_control(current, UTRACE_STOP_XXX); slot = book_message_in_buffer(buf); if (slot 0) return UTRACE_STOP_XXX; ... return UTRACE_RESUME; Even better, UTRACE_STOP_XXX can live outside of UTRACE_RESUME_MASK. Right, it doesn't belong in enum utrace_resume_action--it doesn't make sense as an argument for utrace_control, for example. So it could be something like a return-value flag bit saying stop if marked for stop, e.g. UTRACE_SINGLESTEP|UTRACE_STICKY_STOP for single-step, unless marked to stop, in which case stay stopped. So this would fix that race. For the scenario we're discussing here, that's probably just fine. The only reason to stop was for the consumer to run. If it woke up quickly and ran enough to call utrace_control while we were still finishing the return of the callback and utrace bookkeeping, then we just don't stop at all. (But note if there is another engine that decides to stop anyway, then we'll have used a second CPU and/or ping-pong scheduled to run the tracer thread in parallel with a very short amount of work in the tracee before it just blocks anyway.) Now let's consider another scenario. Take the normal debugger scenario, i.e. a ptrace implementation done cleanly on the utrace API without the magic kludges we have today, or a ptrace replacement built cleanly from scratch in the ideal way, or the pending gdbstub implementation. Here what the callback does is wake up the tracer, and return UTRACE_STOP. What the tracer thread does is block in some fashion until the tracee's wake_up_tracer(), then do whatever logic (in ptrace, that means return to user mode and let it make another ptrace system call, but that all might be quite fast), and decide to examine the thread. To make sure its own callback is finished and UTRACE_STOP will be in force, it has to synchronize with utrace_barrier. So in maximum parallelism, this means, having preempted the tracee just now by waking up, we now yield back to the tracee to finish our callback, other engines' callbacks, and utrace bookkeeping.
Re: Possible problem with utrace_control
You should use linux/wait.h or similar facilities rather than calling schedule() and wake_up_process() directly. This doesn't have anything to do with utrace, it's just the clean practice for normal kinds of blocking in the kernel, for a variety of reasons. That has to do with how your control thread blocks, which is just the normal set of issues for any thread in kernel code. You are using the lowest-level way of doing things, which is not recommended--that's why various higher-level facilities exist. The admonition to use utrace facilities for all blocks has to do with when you make a traced thread block. You are on the correct path in that regard. The relevant parts of your scenario match what I described. The race I talked about between your control thread's utrace_control() and tracee callbacks returning UTRACE_STOP explains what you are seeing. Thanks, Roland
Re: Possible problem with utrace_control
Thanks for your interest in utrace. It's correct that passing UTRACE_REPORT to utrace_control should always ensure you get some report_* callback before the tracee returns to user mode. However, I think your use may be susceptible to a race between resumption by utrace_control, and the UTRACE_STOP done by the callback. If your callback looks something like: wake_up_consumer(); return UTRACE_STOP; then there is a window when the consumer can wake up and call utrace_control before that callback has actually returned UTRACE_STOP and had it processed. In that event, your UTRACE_REPORT comes before the UTRACE_STOP, and then you do nothing afterwards to resume it. If it ever did resume, it would indeed report (the UTRACE_REPORT is still pending). But that doesn't help you. Off hand, I think there is only one way to address this race. When you get the notification to consume, before calling utrace_control, call utrace_barrier first. This will block the consumer until the producer's callback has actually finished and had UTRACE_STOP recorded for your engine. At that point, you can be sure that your utrace_control call will indeed wake up the stopped tracee. Oleg or others here can double-check my reasoning and help explain the situation if you share your code for them to see. Because of this possibility, it might make sense to have utrace_control return -EINPROGRESS for this situation. That at least would alert you that something might be amiss. Unlike other cases of -EINPROGRESS, this would not just mean that you need to call utrace_barrier to be sure the effect has completed. Rather, it means there is a danger of the scenario I described above, where the effect you asked for is going to be trumped by the callback's UTRACE_STOP. So it doesn't quite fit in with the other situations--it's not really an indication that you have to follow up with a synchronization, it's an indication that your synchronization logic is already wrong. But at least it would relieve the mystery of such a situation, assuming you are checking return values and noticing in debugging. I'm certainly open to opinions on this API change. The need to use utrace_barrier as a second synchronization after the main wake-up (done by whatever normal means you are using) is rather clunky, and when it matters (as we suspect it does in the occasional race in your case) it leads to ping-pong scheduling. So we would like to improve the utrace API in this area at some point. One idea we have discussed before is a utrace_wake_up call. (This assumes that some variant of linux/wait.h wake_up* is what you are using in the guts of wake_up_consumer.) This would replace wake_up* for use in a callback, and simply do the wake-up after all the callbacks are finished (or perhaps just yours, but probably all), so your return value action is already recorded. In fact, it might do the wakeup only after the UTRACE_STOP is not only recorded but actually performed, so that by the time your consumer wakes up, the tracee is actually stopped and available for third-party examination. (You don't actually care about that for your case--for your purposes, it would be optimal just to delay the wakeup until your own callback's return value is recorded. That way, if your consumer acts quickly enoug on another CPU, it might even resume the tracee before it actually yields the processor.) As you might tell from the complexity of that paragraph, there are many fuzzy nuances of exactly what that better API might be. (And I didn't even delve into the possibility that you're using some synchronization mechanism other than wake_up* calls on a wait_queue_head_t.) Hence we haven't tried to work it out exactly yet. We hope that seeing a variety of uses such as yours will help us figure out how best to improve the utrace API for tracee-callback-to-tracer synchronization handshakes. Thanks, Roland
Re: No rule to make target
obj-m := crash_suspend.o s/_/-/
Re: ptrace crash on PREEMPT 2.6.18-128.7.1.el5 kernel
But I'm happy to have tracked it down to the utrace-based ptrace emulation, and was mostly just interested in knowing if preempt and utrace are fundamentally incompatible on x86_64, or something like that. I'll fight through the 2.6.18-164 issues instead, since the ptrace problem doesn't seem to be happening on that version. It is probably the case that the RHEL5 utrace code cannot easily be made to work reliably with CONFIG_PREEMPT. Thanks, Roland
Re: linux-next: add utrace tree
Frank, please be clear as to which branch you want included (master or utrace-ptrace). Also note that neither of those branches matches what was posted in the sense that they both have lots of history and merges not represented in the patches. (I assume that they do produce the same final source tree, though). Yes, the trees do match. I certainly never expected our ancient git history to get merged in directly upstream. I've made a new branch on: git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git called: next/master (Actually it's on master.kernel.org and the public mirror is being a little slow as I write this.) This starts from v2.6.33-rc4 and then has commits for the 7 patches that Oleg posted in December. Beyond that, we've added one follow-on patch to fix a bug Oleg just tracked down (Oleg will post that patch soon). And I've added one more commit with a MAINTAINERS update, shown below. You can also find the same stuff from the series file and patch files in: http://people.redhat.com/utrace/2.6-next/ If it makes things easier for linux-next to have this git branch either rebased or merged from a different fork point, please let me know. Thanks, Roland --- [PATCH] MAINTAINERS: add utrace This updates the ptrace entry to cover utrace too. They are part of the same maintenance effort. Also add the utrace mailing list. Signed-off-by: Roland McGrath rol...@redhat.com --- MAINTAINERS |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index c8f47bf..8da2a0a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4375,15 +4375,18 @@ M: Jim Paris j...@jtan.com L: cbe-oss-...@ozlabs.org S: Maintained -PTRACE SUPPORT +PTRACE AND UTRACE SUPPORT M: Roland McGrath rol...@redhat.com M: Oleg Nesterov o...@redhat.com +L: utrace-devel@redhat.com S: Maintained F: include/asm-generic/syscall.h F: include/linux/ptrace.h F: include/linux/regset.h F: include/linux/tracehook.h -F: kernel/ptrace.c +F: include/linux/utrace.h +F: kernel/ptrace* +F: kernel/utrace* PVRUSB2 VIDEO4LINUX DRIVER M: Mike Isely is...@pobox.com
Re: PTRACE_SYSCALL_ENTRY/EXIT
We don't have any particular plans to extend the ptrace interface. I strongly doubt we would even try to do anything like that until the utrace-based ptrace interface is merged into Linux and the old ptrace implementation gone. In general, we are not looking for extensions to the ptrace interface. It is an ugly hairball already and we are more interested in having the utrace API layer available inside the kernel and then embarking on new and sane userland interfaces instead of shoehorning more into ptrace. That said, some particular kinds of simple enhancements to ptrace are really quite trivial to implement in the new utrace-based implementation. The particular area you suggest is one of these. What I would expect is not new variants of the one-shot interface like PTRACE_SYSCALL. Rather, I would envision new PTRACE_O_* options to enable syscall entry and exit tracing analogous to the PTRACE_EVENT_* events you can now enable. This means that you make one PTRACE_SETOPTIONS call to enable the set of events you want, and then use plain PTRACE_CONT (or whatever). If you really want exactly the one-shot behavior instead, then we could consider that. But, like I said, we are not looking to add much in the way of new wrinkles to the dismal ptrace userland interface. Thanks, Roland
Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)
Ok, I changed the wording slightly: Clear the TIF_SINGLE_STEP bit in copy_thread. The new process did not get a PER event of its own. It is wrong deliver a SIGTRAP that was meant for the parent process. Very good! Thanks, Roland
Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)
Right. That means we should leave the status quo of not clearing TIF_SINGLE_STEP in user_disable_single_step. Ok, although it seems a bit strange not to do it. Perhaps I should add a comment about it. It doesn't seem strange to me, but then I've just been through all this. user_*_step is about what the task will do next. TIF_SINGLE_STEP is about what the task has done recently. Of course more good comments always help. I might be inclined to change the name of TIF_SINGLE_STEP so that its true purpose is more obvious. AFAICT, in fact it is not even about single-step per se. It means some PER trap happened and should produce SIGTRAP. Don't you get the same thing if you haven't used single-step, but instead used PTRACE_POKEUSR to set up per_struct with bits that say to trigger for some other reason? How about calling it TIF_PER_PENDING? So after everthing has been converted to utrace we always will load the control registers in FixPerRegisters. Right. (This could well still change in the future. But that's how it is in utrace now. And regardless of possible future implementation changes it will always be the case that sometimes it will be called on current.) We could use the same compare of the control registers as the code in __switch_to. See below. Yes, sounds good. The PSW_MASK_PER is the global PER enablement switch, the PER_EM_MASK bits enable the different PER events. We check for the PER_EM_MASK bits because it is easier to access in __switch_to. The return PSW is stored in the pt_regs structure, we would have to get a pointer to it (what regs = task_pt_regs(taks) does in FixPerRegisters). In FixPerRegisters we can as well use the PSW_MASK_PER bit. I see. Thanks for the explanation. Thanks, Roland
Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)
Hmm, command for tracehook_signal_handler say this for stepping: @stepping: nonzero if debugger single-step or block-step in use Are you saying you would like me to clarify that wording somehow? It's meant to be implicit that the arch code is not doing any special fakery about single-step for signal handlers, only processing real single-step traps (and faking them for a syscall instruction if the arch requires that). No other arch does it, so it didn't occur to me that s390 would. Before tracehook some had ptrace_notify calls there, and the call to tracehook_signal_handler replaced that call. In ptrace (including utrace-based ptrace), this winds up with sending a SIGTRAP. So when we finally do get out of do_signal and TIF_SINGLE_STEP causes a second SIGTRAP, it's already pending and the second one makes no difference. So we have been lucky so far. Actually, Oleg rightly points out: Confused again, perhaps I just misunderstood what you mean... Without utrace, tracehook_signal_handler() doesn't send SIGTRAP, it merely does ptrace_notify(SIGTRAP), this means that [...] even without utrace we can have unexpected SIGTRAP. That is quite true, and I just misremembered when writing that paragraph. So indeed we have been lucky, but it's not the luck of the problem not happening on s390, but the luck of nobody ever caring. :-) Ok, so with the full utrace the semantics of tracehook_signal_handler is more than just causing a SIGTRAP. It is an indication for a signal AND a SIGTRAP if single-stepping is active. In short, it is the indication of a signal handler having been set up, just like its kerneldoc description says. Whatever that should mean to tracing (SIGTRAP or otherwise) is in the purview of the generic tracing layer, not the arch layer. To make both cases work we should stop setting TIF_SINGLE_STEP in do_signal and pass current-thread.per_info.single_step to tracehook_signal_handler instead of test_thread_flag(TIF_SINGLE_STEP). Correct. Thanks, Roland
Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)
Clear the TIF_SINGLE_STEP bit in copy_thread. If the new process is not auto-attached by the tracer it is wrong to delivere SIGTRAP to the new process. The change is right, but this log entry is confusing. auto-attached has nothing to do with it, nor does anything about tracing the new process or not. The new process has not experienced a PER trap of its own, so it is wrong to deliver a SIGTRAP that is meant for its creator. Thanks, Roland
Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)
I am confused as well. Yes, I thought about regs-psw.mask change too, but I don't understand why it helps.. [...] But. Acoording to the testing I did (unless I did something wrong again) this patch doesn't make any difference in this particular case. 6580807da14c423f0d0a708108e6df6ebc8bc83d does. Those results are quite mysterious to me. I think we'll have to get Martin to sort it out definitively. Thanks, Roland
Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)
However, with or without CONFIG_UTRACE, 6580807da14c423f0d0a708108e6df6ebc8bc83d is needed on s390 too, otherwise the child gets unnecessary traps. This confuses me. user_disable_single_step on non-current doesn't do anything not already done by the memset in copy_thread. Ooh, except perhaps it does not clear PSW_MASK_PER. Maybe that matters. That's the only thing I can think of. Maybe Martin can make sense of it. Thanks, Roland
Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)
The PER control registers only get reloaded on task switch. Can you test if this patch fixes your problem? Long ago when I first worked with David Wilder on s390 arch code, I remember we made this change. It seems to have been forgotten in the later rounds of reworking and merging. Thanks, Roland
Re: s390 user_enable_single_step() (Was: odd utrace testing results on s390x)
This probably means that copy_process()-user_disable_single_step() is not enough to clear the this task wants single-stepping copied from parent. I would suspect s390's TIF_SINGLE_STEP flag here. That flag means a single-step trap occurred. This is what causes do_single_step to be called before returning to user mode, rather than the machine trap doing it directly as is done in the other arch implementations. If I'm right, then this task wants single-stepping is not the problem, and that really is fully cleared. In fact, looking at s390's copy_thread (arch/s390/kernel/process.c) it clears out all the state that is actually touched by user_enable_single_step and user_disable_single_step. So for s390 the new fork.c call is actually superfluous AFAICT. The problem is that the copied parent state includes the this task has a pending single-step to report flag. IMHO it clearly makes sense for s390's copy_thread to clear this flag in a new task, which it does not do now. An alternative to that would be just to make its user_disable_single_step clear the flag. That could in theory also have an effect on e.g. the (authentic) pending step report of a tracee that was stopped with TIF_SINGLE_STEP set when its tracer detached. This might be considered a good thing, but since every other arch posts the SIGTRAP immediately they all have the equivalent issue and s390 doesn't need to be any better than they are before we have a generic resolution to the whole subject of tracer-induced signals (which we won't get into now). I'm not even sure from my insufficient reading of the s390 assembly code whether this path is even possible, i.e. do_signal called before do_single_step. Martin, I suggest having copy_thread clear TIF_SINGLE_STEP. That bit is always task-private state that should not be copied. Btw, given the complexity of FixPerRegisters (and its new additional cost on task==current), you might want to make user_*_single_step bail out if per_info.single_step is already set/clear on entry. Thanks, Roland
Re: [PATCH 0/7] utrace/ptrace
Well. I had a lot of technical discussions with Roland about utrace, but I never asked him why he created this thing ;) To me, utrace looks like vfs. Currently we have the single and very poor filesystem, ptrace. Until we add the appropriate layer, we can't expect the further improvements is this area. I think that is an excellent analogy, and it's one I've used before. Oleg and I have had our hands pretty full just with the infrastructure layer and with ptrace. Having this layer in the kernel is what makes it tractable for a lot of other people to collaborate on new features in this space, and that's what we want to enable and accelerate. Some of those on the CC list have worked and are working on such things, and I hope they will pipe up about those. Given the date, I suspect we might not see much from anybody on this (or anything) until January. Myself, I expect to be largely offline for the rest of the year. As Oleg mentioned, I have a cleanup/reimplementation of seccomp using utrace. That is quite a trivial use--it demonstrates how easy the utrace API makes it to do things like that, in contrast to previous solutions with arch-specific assembly hacking and so forth. I can dust that patch off and post it if anybody cares. Some other features based on utrace have been floating around for some time, posted here before. Those include uprobes, kmview, and the gdb stub. I don't which of those are quite ready for merging, but honing and polishing them gets quite a lot more doable with utrace in the tree instead of out. Thanks, Roland
Re: [PATCH 0/7] utrace/ptrace
Do you have an estimate or better numbers how the overhead of seccomp-over-utrace compares to the current in-tree seccomp? I never measured it. I would estimate that any difference one way or another is in the noise. The point of seccomp is to run a process that almost never makes any system calls. The only effects of utrace for that use are on the system call path itself, and the essential effects there (i.e. taking the tracing path vs the hot path) are the same as what the old seccomp implementation does. If you have some example uses of seccomp or something that can serve as a benchmark for it, I would be glad to measure the difference. Thanks, Roland
Re: odd utrace testing results on s390x
Damn, my fault. I forgot to cc you when I sent the fix for s390 (attached below), and I forgot to remind you about this fix when we discussed the testing on s390. That change is upstream for 2.6.33 now. I'll cherry-picked it into the 2.6.32/tracehook branch so it will be in the backport patchset too. Thanks, Roland
Re: odd utrace testing results on s390x
Oh. I am still trying to parse arch/s390/kernel/entry.S to understand how can we fix these test-cases. I think I need the help, will continue tomorrow. Martin Schwidefsky schwidef...@de.ibm.com is the s390 arch maintainer. He is friendly and helpful. You can ask him for help both with understanding the intended s390 behavior before, and with understanding the code paths. He won't expect any of us to grok s390 assembly. :-) Thanks, Roland
Re: new Fedora 13 utrace kernel
The utrace patch looks suspicious in utrace.h, which cause the compilation failure without CONFIG_UTRACE. I have confirmed that the git tree looks sane. +static inline void utrace_init_task(struct task_struct *child) +{ +} +{ +} Oops! That snafu got fixed on the main branch but I forgot to put the fix on the 2.6.32/utrace branch too. Fixed now. Thanks, Roland
Re: x86: do_debug PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f
Please find the trivial test-case below. It hangs, because PTRACE_SINGLESTEP doesn't trigger the trap. 2.6.33-rc1 x86-64 works for me with either -m64 or -m32 version of that test. (not sure this matters, but I did the testing under kvm) Apparently it does. You should hack some printks into do_debug() and see how kvm is differing from real hardware. (Actually you can probably do this with a notifier added by a module, not that you are shy about recompiling!) Probably kvm's emulation of the hardware behavior wrt the DR6 bits is not sufficiently faithful. Conceivably, kvm is being consistent with some older hardware and we have encoded assumptions that only newer hardware meets. But I'd guess it's just a plain kvm bug. Thanks, Roland
Re: x86: do_debug PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f
Comparing to the old (2.6.32) logic, I think it might be this (untested). I also note this is the sole use of get_si_code, seems like it should just be rolled in here. Thanks, Roland diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 3339917..16a88f5 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -530,7 +530,6 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code) { struct task_struct *tsk = current; unsigned long dr6; - int si_code; get_debugreg(dr6, 6); @@ -569,14 +568,15 @@ dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code) * We already checked v86 mode above, so we can check for kernel mode * by just checking the CPL of CS. */ + dr6 = tsk-thread.debugreg6; if ((dr6 DR_STEP) !user_mode(regs)) { tsk-thread.debugreg6 = ~DR_STEP; set_tsk_thread_flag(tsk, TIF_SINGLESTEP); regs-flags = ~X86_EFLAGS_TF; + } else if (dr6 (DR_STEP | DR_TRAP_BITS)) { + send_sigtrap(tsk, regs, error_code, get_si_code(dr6)); } - si_code = get_si_code(tsk-thread.debugreg6); - if (tsk-thread.debugreg6 (DR_STEP | DR_TRAP_BITS)) - send_sigtrap(tsk, regs, error_code, si_code); + preempt_conditional_cli(regs); return;
Re: x86: do_debug PTRACE_SINGLESTEP broken by 08d68323d1f0c34452e614263b212ca556dae47f
+ dr6 = tsk-thread.debugreg6; why? we have tsk-thread.debugreg6 = dr6 above Yeah, it's a little superfluous. Except that the existing code uses tsk-thread.debugreg6 and dr6 inconsistently. It only matters either way if some notifier function might change thread.debugreg6, which I wasn't sure that none might. i.e., does/should hw_breakpoint hide/remap the watchpoint-fired bits when they are not for the same-numbered, ptrace-installed virtual debugreg? And does/should kprobes, kgdb, and whatnot, hide DR_STEP from thread.debugreg6 for a step that's not from user_enable_single_step? if ((dr6 DR_STEP) !user_mode(regs)) { tsk-thread.debugreg6 = ~DR_STEP; set_tsk_thread_flag(tsk, TIF_SINGLESTEP); regs-flags = ~X86_EFLAGS_TF; this looks strange... we set TIF_SINGLESTEP but clear X86_EFLAGS_TF This was the original purpose of TIF_SINGLESTEP from long, long ago. This happens when TF was set in user mode and then it did syscall/sysenter so TF is still set at the first instruction in kernel mode. TF is cleared from the interrupted kernel registers so the kernel can resume normally. In the original logic, TIF_SINGLESTEP served just to make it turn TF back on when going to user mode. Since then we grew the complicated step.c stuff and it all fits together slightly differently than it did when the original traps.c path was written. can't understand how this change can fix the problem. We should always send SIGTRAP if the task returns to user-mode with X86_EFLAGS_TF? If the debug exception happened in user mode, then we should send SIGTRAP. In the old (2.6.32) code with its goto-heavy logic the !user_mode(regs) was goto clear_TF_reenable; and that is: clear_TF_reenable: set_tsk_thread_flag(tsk, TIF_SINGLESTEP); regs-flags = ~X86_EFLAGS_TF; preempt_conditional_cli(regs); return; I thought the new logic was falling through to the send_sigtrap case after if ((dr6 DR_STEP) !user_mode(regs)). But now I see that the subtle use of dr6 vs tsk-thread.debugreg6 (without comments about it!) meant that DR_STEP is cleared from tsk-thread.debugreg6 before we test it. So I guess the idea there is that the !user_mode case would swallow the step indication but still leave some DR_TRAP_BITS set and so you'd generate a user SIGTRAP in honor of those (i.e. watchpoint hits). But I thought the hardware behavior was that a step will set DR_STEP in DR6 but not clear any DR_TRAP_BITS set from before, so I'm not sure this can't sometimes send a SIGTRAP twice for a combination of a watchpoint hit and a delayed step. OK. I blindly applied this patch, step-simple still fails. Yeah, it was a quick reaction to the funny-looking control flow. But I didn't really investigate what is actually happening. Thanks, Roland
Re: [RFC,PATCH 14/14] utrace core
C does implicit casts from enum to integer types, right? So always using u32 here would not impose any extra typing on the user, or am I missing something subtle here? No, that's right. I had just been thinking of the documentation / convenience issue. I figured with u32 people might tend to think they always had to write utrace_resume_action(action) like you do in report_signal, or would want it to get the enum so e.g. you can write a switch and have gcc give those warnings about covering all the enum cases. But you have convinced me that the consistency of using u32 everywhere is the what's really easiest to understand. I don't mind the sharing of the argument, it just looked odd to have the u32/unsigned int/enum thing intermixed, since you care about typing length (as good a criteria as any) I'd just be lazy and make everything u32 ;-) That's good enough for me. As to the content, can't you accomplish the same thing by processing such exclusive parent registration before exposing the child in the pid-hash? Right before cgroup_fork_callback() in kernel/fork.c:copy_process() seems like the ideal site. As Oleg mentioned, that would add some complications. The task is not really fully set up at that point and the clone might actually still fail. The final stages where the clone can fail are necessarily inside some important locks held while making the new task visible for lookup. One of the key features of the utrace API is that callbacks are called in uncomplicated contexts so you just don't have to worry about a lot of strangeness and arcane constraints. That means making such a callback while holding any spin lock or suchlike is really out of the question. So, either we have to make a callback where you suggest, before the task really exists, or where we do now, after the task is visible to others. An unfinished task that doesn't yet have all its pointers in place, and still might be freed before it could ever run, would add a whole lot of hair to what the utrace API semantics would be. If you get the callback that early, then you can attach to it that early, and then you expect some callbacks about it actually failing ever to exist. And meanwhile, you might have to know what you can and can't try to do with a task that is not really set up yet. It just seems like a really bad idea. Hence, we are stuck with the post-clone callback being really post-clone so that it's possible for other things in the system to see the new task and try to utrace_attach it. But, as I said, we are not actually relying on having a way to rule that out at the utrace level today. So I think we can just take out this hack and reconsider it later when we have an active need. Best would be to fix up the utrace-ptrace implementation and get rid of those other kludges I think, not sure.. its all a bit involved and I'm not at all sure I'm fully aware of all the ptrace bits. We haven't figured out all the best ways to clean up ptrace the rest of the way yet. We'd like to keep improving that incrementally after the basics of utrace are in the tree. [...] Surely you are not suggesting that all these callbacks should be made with a spin lock held, because that would obviously be quite insane. Because there can be many engines attached? Because it's a callback API. A callback is allowed to use mutex_lock(), is expected to be preemptible, etc. Having an interface where you let somebody's function unwittingly run with spin locks held, preemption disabled, and so forth, is just obviously a terrible interface. Or in case of utrace_reap() maybe push the spin_lock() into it? I did a restructuring to make that possible. IMHO it makes the control flow a bit less clear. But it came out a bit smaller in the text, so I'll go with it. I'm well aware that ptrace had some quirky bits in, and this might well be one of the crazier parts of it, but to the un-initiated reviewer (me) this could have done with a comment explaining exactly why this particular site is not worth properly abstracting etc.. We'll add more comments there. Not if the comment right above the WARN_ON() says that its a clueless caller.. but if you're really worried about it, use something like: WARN(cond, Dumb-ass caller\n); Oh, that's much better. I've made all the WARN_ON's give some text. Thanks, Roland
post-2.6.32 utrace
Up until now, the utrace git trees were relative to 2.6.32. Now Linus has merged several of Oleg's small patches that were prerequisites for utrace, those populating my tracehook branch. So, I've now merged the latest tree from Linus in, and split out 2.6.32/* branches. We still have a tracehook branch. These patches (which might be in -mm?) are on the tracehook branch and still not merged (listed last to first): a8f782e export __ptrace_detach() and do_notify_parent_cldstop() c3473e1 ptrace_signal: check PT_PTRACED before reporting a signal b396f5e tracehooks: check PT_PTRACED before reporting the single-step 45667dd tracehooks: kill some PT_PTRACED checks These ones have been merged upstream now, but remain on 2.6.32/tracehook (the basis for the backport branches 2.6.32/utrace*): e8a2f23 ptrace: cleanup ptrace_init_task()-ptrace_link() path a88b467 ptrace: x86: change syscall_trace_leave() to rely on tracehook when stepping e01acf4 ptrace: x86: implement user_single_step_siginfo() 462a57b ptrace: change tracehook_report_syscall_exit() to handle stepping 172590d ptrace: powerpc: implement user_single_step_siginfo() d63b43d ptrace: introduce user_single_step_siginfo() helper d4ef551 (upstream) signals: check -group_stop_count after tracehook_get_signal() There is so far no drift between the trunk utrace branch and the 2.6.32/utrace branch. If we have more tweaks to utrace as part of the next round of upstream submission, we will try to keep the 2.6.32/utrace backport branch identically updated pretty quickly. Note this is after various changes discussed in the recent LKML review thread (CC'd here) that I trust you have all been reading. This includes some callback API changes that are trivial but will break all your utrace engine code today. Thanks, Roland
new Fedora 13 utrace kernel
In http://kojipkgs.fedoraproject.org/packages/kernel/2.6.32.1/10.fc13/ right now you can find a Fedora kernel with the latest utrace up to the minute. Coming soon to a rawhide near you (should be tomorrow). This is a 2.6.32.1-based kernel with exactly the 2.6.32/* utrace patches as you see them today in git and on people.redhat.com, including Oleg's utrace-based ptrace. If we have any more tweaks to the utrace code soon, we will build an updated rawhide kernel shortly thereafter. Thanks, Roland
Re: [RFC,PATCH 14/14] utrace core
On 12/14, Oleg Nesterov wrote: IOW, we must ensure that if ever clear TIF_NOTIFY_RESUME we must not miss -pending_attach, correct? and for this case we have mb() after clear_thread_flag(). Perhaps instead we should add mb__after_clear_bit() into arch/ hooks, but this needs a lot of arch-dependent changes. Since it's utrace/tracehook code that relies on the barrier I think it makes sense to have it in tracehook_notify_resume() or utrace_resume(). The arch requirement is having done clear_thread_flag() beforehand, so the arch-independent code can reasonably assume whatever semantics that is guaranteed to have. Cough. And I always read this rmb as mb. Even when I changed the comment to explain that we need a barrier between clear bit and read flags, I didn't notice it is in fact rmb... I guess we need the trivial fix, Roland? You're the barrier man, send me what changes it should get. Thanks, Roland
Re: [RFC,PATCH 14/14] utrace core
Yes, I think this is correct. It is fine to miss -pending_attach == T, and in any case the new attacher can come right after the check, even if it was checked under utrace-lock. Right. It is important that the tracee can't miss, say, UTRACE_REPORT request (as you already explained), and every time the tracee clears -resume it calls splice_attaching(). Right. In the stopped cases, there are lots of locks and barriers and things after resuming. (Oleg?) Every time the tracee resumes after TASK_TRACED it uses utrace-lock to synchronize with utrace_control/etc, it must see any changes. And TASK_STOPPED? Please send me patches to add whatever comments would make all this clear enough to Peter when reading the code. Thanks, Roland
Re: Tests Failures on PPC64
Yes. I straced gdb to be sure it really does PTRACE_SET_DEBUGREF to use the hardware watchpoint. There is something strange though. gdb does PTRACE_SINGLESTEP and only then PTRACE_CONT after watch xxx. powerpc's data breakpoints are before-access, whereas x86's are after-access. In x86-speak, it's a fault-type exception rather than a trap-type. The only way to actually get the caught load or store to complete is to clear the DABR, single-step, and then restore it. Thanks, Roland
Re: [RFC,PATCH 14/14] utrace core
All that seems to do is call -release() and kmem_cache_free()s the utrace_engine thing, why can't that be done with utrace-lock held? Calling -release with a lock held is clearly insane, sorry. It is true that any engine-writer who does anything like utrace_* calls inside their release callback is doing things the wrong way. But guaranteeing that simple mistakes result in spin-lock deadlocks just seems clearly wrong to me. A main point of the utrace API is to make it easier to work in this space and help you avoid writing the pathological bugs. Adding picayune gotchas like this just does not help anyone. No other API callback is made holding some internal implementation lock, and making this one the sole exception seems just obviously ill-advised on its face. I can't really imagine what a justification for such an obfuscation would be. But yeah, passing that list along does seem like a better solution. So you find it cleaner to have each caller of utrace_reset take another output parameter and be followed with the same exact source code duplicated in each call site, than to have utrace_reset() do the unlock and then the common code itself. I guess there is no accounting for taste. We try not to get excited about trivia, so on matters like this one we will do whatever the consensus of gate-keeping reviewers wants. My patch to implement your suggestion adds 13 lines of source and 134 bytes of compiled text (x86-64). Is that what you prefer? I'll note that the code as it stands uses the __releases annotation for sparse, as well as thoroughly documenting the locking details in comments. I gather that clear explanation of the code is in your eyes no excuse for ever writing code that requires one to actually read the comments. I'm not sure that attitude can ever be satisfied by any code that is nontrivial or makes any attempts at optimization. Thanks, Roland
Re: [RFC,PATCH 14/14] utrace core
I'm sorry for the delay. I'm picking up with responding to the parts of your review that I did not include in my first reply. I appreciate very much the discussion you've had with Oleg about the issues that I did not address myself. I look forward to your replies to my comments in that first reply, which we have yet to see. Seems inconsistent on u32 vs enum utrace_resume_action. Why force enum utrace_resume_action into a u32? The first argument to almost all callbacks (all the ones made when the task is alive) is called action and it's consistent that its low bits contain an enum utrace_resume_action. The argument is u32 action in the report_signal and report_syscall_entry callbacks, where it combines an enum utrace_resume_action with an enum utrace_{signal,syscall}_action (respectively). It would indeed be more consistent to use u32 action throughout, but it seemed nicer not to have engine writers always writing utrace_resume_action(action) for all the cases where there are no other bits in there. The return value is a mirror of the u32 action argument. In many calls, it has only an enum utrace_resume_action in it. But in some it combines that with another enum utrace_*_action. There I went for consistency (and less typing) in the return type, since it doesn't make any difference to how you have to write the code in your callback function. The main reason I used u32 at all instead of unsigned int is just its shortness for less typing. I don't really mind changing these details to whatever people think is best. The other people writing code to use the utrace API may have more opinions than I do. I guess it could even be OK to make the return value and action argument always a plain enum utrace_resume_action, and use a second in/out enum utrace_{signal,syscall}_action * parameter in those particular calls. But that does put some more register pressure and loads/stores into this API. Seems inconsistent in the bitfield type, also it feels like that 3 the 7 and the enum should be more tightly integrated, maybe add: UTRACE_RESUME_MAX #define UTRACE_RESUME_BITS (ilog2(UTRACE_RESUME_MAX)) #define UTRACE_RESUME_MASK ((1 UTRACE_RESUME_BITS) - 1) Yes, that's a good cleanup. Thanks. (ilog2(7) is 2, so ilog2() + 1 is what you meant.) +static struct utrace_engine *matching_engine( [...] The function does a search, suggesting the function name ought to have something like find or search in it. Ok, I'll make it find_matching_engine. Quite gross this.. can't we key off the tracehoook_report_clone_complete() and use a wakeup there? Yes, we intended to clean this up at some point later. But doing that is just a distraction and complication right now so we put it off. For this case, however, I suppose we could just punt for the initial version. This code exists to support the special semantics of calling utrace_attach_task() from inside the parent's report_clone() callback. That guarantees the parent that it wins any race with any third thread calling utrace_attach_task(). This guarantees it will be first attacher in the callback order, but the general subject of callback order is already something we think we will want to revisit in the future after we have more experience with lots of different engines trying to work together. It's most meaningful when using the UTRACE_ATTACH_EXCLUSIVE flag--then you can use UTRACE_ATTACH_EXCLUSIVE|UTRACE_ATTACH_MATCH_OPS to synchronize with other threads trying to attach the same kind of engine to a task, and give special priority in that to the engine that attaches in report_clone() from tracing the parent. Really I came up with this special semantics originally just with ptrace in mind. ptrace is an engine that wants to exclude other tracer threads attaching asynchronously with the same kind of engine, and that wants to give special priority on a child to the parent's tracer (to implement PTRACE_O_TRACECLONE et al). However, Oleg's current ptrace code still relies on the old hard-coded locking kludges to exclude the separate attachers and to magically privilege the auto-attach from the parent's tracer. So we are not relying on this special semantics yet. We could just punt utrace_attach_delay() and remove the associated documentation about the special rights of report_clone() calling utrace_attach_task(). If we decide it helps clean things up later when we finish more cleanup of the ptrace world, then we can add the fancy semantics back in later. Does this really need the inline? It has one caller and that call is unconditional. Asymmetric locking like this is really better not done, and looking at the callsites its really no bother to clean that up, arguably even makes them saner. By assymetric you mean that utrace_reap releases a lock (as the __releases annotation indicates). As should be obvious from the code, the unlock is done before the loop that does -report_reap callbacks and utrace_engine_put() (which can make
Re: s390: stepping PT_PTRACED (Was: utrace failed to compile for s390x)
But... I tried to understand these check and failed. Why do we need them? They look unneeded to me, but of course I know nothing about s390. It's not specific to s390. Other arch's have equivalent logic. As with all things ptrace, I strongly suspect that they just blindly copied the logic from some old i386 code and never contemplated the actual intent. AFAIK this is just part of some old defensive programming or partial mitigation of the general problem that we overload the user signal queue as a queue of debugger-induced hardware traps. This is some very incomplete mitigation--the general problem is a known issue we want to address in the future--but it's the status quo, so better not to tweak it now in case it might be closing an observable hole today in some usage pattern that someone might actually experience. The general problem has many corners and races and we still have those with or without this check. But consider e.g. the scenario where the debugger PTRACE_SINGLESTEP'd into a long-blocking syscall (read on a dangling pipe, etc.), then the debugger suddenly exits without doing a proper PTRACE_DETACH. That is an entirely non-racey case where TIF_SINGLE_STEP was left set but -ptrace is clear, so without the check you could get the spurious SIGTRAP killing the tracee (the once-was-tracee-and-no-longer, that is). Perhaps nowadays exit_ptrace() leads to ptrace_disable() - user_disable_single_step() and so the TIF_* bit is clear before resuming and getting here (or at least with utrace, it leads to UTRACE_DETACH and eventually to user_disable_single_step()). But at least before that (and it looks like with the 2.6.32 ptrace code still), it makes a difference in this scenario. So it is still an open can of worms I don't want us to dive into this week, but at least this one worm-escape hole should stay as plugged as it has been these last 10+ years. (For extra credit, write up that case in ptrace-tests and make it a KFAIL.) Thanks, Roland
Re: step-into-handler.c compilation failure on ppc64
How about this? --- step-into-handler.c 10 Dec 2008 04:42:43 -0800 1.8 +++ step-into-handler.c 05 Dec 2009 09:18:54 -0800 @@ -35,6 +35,7 @@ #include sys/time.h #include string.h #include stddef.h +#include stdint.h #if defined __x86_64__ #define REGISTER_IP regs.rip @@ -113,11 +114,11 @@ handler_alrm_get (void) { #if defined __powerpc64__ /* ppc64 `handler_alrm' resolves to the function descriptor. */ - return *(void **) handler_alrm; + return *(void **) (uintptr_t) handler_alrm; /* __s390x__ defines both the symbols. */ #elif defined __s390__ !defined __s390x__ /* s390 bit 31 is zero here but I am not sure if it cannot be arbitrary. */ - return (void *) (0x7fff (unsigned long) handler_alrm); + return (void *) (0x7fff (uintptr_t) handler_alrm); #else return handler_alrm; #endif
Re: [PATCH] utrace: don't set -ops = utrace_detached_ops lockless
I forgot that there is another issue (iirc a bit discussed too). finish_callback_report() sets -ops = utrace_detached_ops lockless! You'll have to remind me why this is a problem. Re: [PATCH 85] ptrace_attach_task: rely on utrace_barrier(), don't check -ops https://www.redhat.com/archives/utrace-devel/2009-October/msg00180.html We already discussed this, but forgot to finish. Ah, yes. I had that message still sitting in my folder to think about again and reply. Do you agree with the patch? I think so, yes. It could use some more comments about the importance of the lock. I added a comment and merged it in. Thanks, Roland
Re: powerpc: syscall_dotrace() retcode (Was: powerpc: fork stepping)
We don't really intend to impose any new requirements on the arch behavior here. It's up to the arch folks to decide. It does simplify life somewhat on x86 that you can change the registers at the syscall-entry stop and then after skipping the syscall, those registers will be unchanged from what you set. But it's never been that way on every other arch, and it doesn't need to be. The arch requirement on the tracehook_report_syscall_entry() return value handling is that it make the syscall not be run, and that the register state then left be compatible with using syscall_rollback() at the tracehook_report_syscall_exit() spot. As to what you get from ptrace explicitly fiddling with registers at syscall entry, that has always been arch-specific before and we haven't done anything to change that situation. On every arch, there is some way to change the syscall number to be run, and changing it to a known-bogus number (e.g. -1) makes sure no syscall runs. On every arch, it's possible at the tracehook_report_syscall_exit() spot to change the registers to exactly whatever you want userland to see. That's enough as it stands to make it possible to do whatever you want, some way or other. If the powerpc maintainers want to change the behavior here, that is fine by me. But there is no need for that just to satisfy general ptrace cleanups (or utrace). Normal concerns require that no such change break the ptrace behavior that userland could have relied on in the past. So off hand I don't see a reason to change at all. If every arch were to change so that registers changed at syscall-entry were left unmolested by aborting the syscall, then that might be a new consistency worth having. But short of that, I don't really see a benefit. All this implies that the ptrace-tests case relating to this needs to be tailored differently for powerpc and each other arch so it expects and verifies exactly the arch-specific behavior that's been seen in the past. Thanks, Roland
Re: clone bug (glibc?) (Was: clone-multi-ptrace test failure)
So. Any ptrace test which uses clone() is broken, at least on x86_64. If you use clone() directly then you need to have the code run in that child be purely under your control. You can't use miscellaneous libc calls nor any libpthread calls, only ones you are sure do not require any thread setup. Given TLS, that means even using errno or anything that might set it. It also means any libc function that might use any TLS you don't know about, i.e. really anything beyond the pure computation calls like str*/mem*. It also means running any dynamic linker code, such as relying on dynamic linking without LD_BIND_NOW (or -Wl,-z,now at compile time). The only thing that changed about this recently in glibc is that even more code paths through the dynamic linker now happen to depend on thread setup. Jan, Roland, how should we fix this? We can rewrite the code to use pthread_create(), this should be trivial. Unfortunately, libpthread is not trivial, it can shadow the problem and complicate the testing. We should avoid library code more thoroughly, not use more of it. As well as being complex, it also varies a lot across systems and interferes with using the same sources to translate to exact kernel-level testing across various people's development environments. I think the best bet is to link with -Wl,-z,now and then minimize the library code you rely on. (It really only matters to be extra careful about that for the code running in the clone child.) So you can use syscall() if you are not relying on its error-case behavior--if the system call fails, the function will set errno, which can rely on the TLS setup. And the stupid question. If I create the subthread via pthread_create(), how can I know its tid? I grepped glibc-2.11, and afaics pthread_create returns the pointer to struct pthread which has pid_t tid but I can not find the helper which returns -tid and struct pthread is not exported. There is no official exported way. You can use syscall(__NR_gettid). That kernel concept of a global thread ID number does not exist in pthreads, it is a detail of the Linux implementation. Thanks, Roland
Re: [RFC,PATCH 14/14] utrace core
Could we just drop the tracehook layer if this finally merged and call the low level functions directly? We can certainly do appropriate streamlining cleanups later, by all means. The original purpose of the tracehook layer is simply this: A person hacking on core kernel code or arch code should not have to think about all the innards of tracing (ptrace, utrace, or anything else). If he comes across a tracehook_* call, he can just read its kernel-doc for explanation of its parameters, return value, what it expects about its context (locking et al), and what the semantic significance of making that particular call is. If changes to the core/arch code in question do not require changing any of those aspects, then said person need not consider tracing issues further. If a change to a function's calling interface or contextual assumptions looks warranted, then he knows he should discuss the details with some tracing-related folks (i.e. find tracehook.h in MAINTAINERS and thus get to me and Oleg). Likewise, a person hacking on tracing code should not have to think about every corner of interaction with the core code or arch code. Each tracehook_* call's kernel-doc comments say everything that matters about how and where it's called. If some of those details are problematic for what we want to do inside the tracing code, then we know we have to hash out the details with the maintainers of the core or arch code that makes those calls. Otherwise we can keep our focus on tracing infrastructure without spending time getting lost in arcane details of the arch or core kernel code. These two things seem permanently worthwhile to me: that the core and arch source code use simple function calls without open-coding any innards of the tracing infrastructure; and that these functions have clear and complete documentation about their calling interfaces and context (locking et al). I find it natural and convenient that such calls have a common prefix that makes it obvious to any reader of the core code what subsystem the call relates to. Beyond those ideas, I certainly don't care at all what the names of these functions are, what common prefix is used, or in which source files those declarations and kernel-doc comments appear. Thanks, Roland
Re: [RFC,PATCH 0/14] utrace/ptrace
Note to all: I'm on the road this week (having had a holiday last week) and will be somewhat slow in replying on these threads, but I will be sure to get to them all. Yes, nobody likes 2 implementations. I guess Roland and me hate CONFIG_UTRACE much more than anybody else. :-) We both hate maintaining the old ptrace implementation, that is true! The other thing we hate is having our work delayed arbitrarily again and again to wait for the arch maintainers of arch's we don't use ourselves. Oleg and I have been trying to follow the advice we get on how to get this work merged in. When multiple gate-keepers give conflicting advice, we really don't know what to do next. Our interest is in whatever path smooths the merging of our work. We'd greatly prefer to spend our time hacking on our new code to make it better or different in cool and useful ways than to spend more time guessing what order of patches and rejuggling of the work will fit the changing whims of the next round of review. We don't want to rush anyone, like uninterested arch maintainers. We don't want to break anyone who doesn't care about our work (we do test for ptrace regressions but of course new code will always have new bugs so some instances of instability in the transition to a new ptrace implementation have to be expected no matter how hard we try). We just don't want to keep working out of tree. The advantage of making the new code optional is, obviously, that you can turn it off. That is, lagging arch's won't be broken, just unable to turn it on. And, anyone who doesn't want to try anything new yet can just turn it off and not be exposed to new code. The advantage of making the new code nonoptional is, obviously, that you can't turn it off. The old implementation will be gone and we won't have to maintain it any more (outside some -stable streams for a while). The kernel source will be cleaner with no optional old cruft code duplicating functionality. All ptrace users will be testing the new code, and so we'll find its bugs and gain confidence that it works solidly. Like I've said, we want to do whatever the people want. My concern about what Christoph has suggested is that it really seems like an open-ended delay depending on some arch people who are not even in the conversation. For anyone either who likes utrace or who is concerned about its details, the sooner we are working in-tree the sooner we can really hash it out thoroughly and get to having merged code that works well. As long as there is anything unfinished like arch work that we've decided is a gating event, then the review and hashing out of the new code itself does not seem to get very serious. (To wit, this thread is still talking about merge order and such, another long thread was about incidental trivia, and we've only just started to see a tiny bit of actual code review today.) Thanks, Roland
Re: [RFC,PATCH 14/14] utrace core
This above text seems inconsistent. First you say report_reap() is the final callback and should be used for cleanup, then you say report_death() is the penultimate callback but might not always happen and people would normally use that as cleanup. If we cannot rely on report_death() then I would suggest to simply recommend report_reap() as cleanup callback and leave it at that. I'm sorry the explanation was not clear to you. I'll try to make it clear now and then I'd appreciate your suggestions on how the documentation could be worded better. (I do appreciate your suggestion here but it's one based on an idea that's not factual, so I don't think we should follow it.) There is no unreliable aspect to it. If you call utrace_set_events() to ask for report_death() callbacks then you will get that callback for that task--guaranteed--unless utrace_set_events() returned an error code that tells you unambiguously that you could not get that callback. What's true is that report_reap() is the last callback you can ever get for a given task. If you had asked for report_death() callbacks, then you always get report_death() and if you've asked for both these callbacks then report_death() is always before report_reap(). (This is a special ordering guarantee, because in actuality the release_task() call that constitutes reap can happen in the parent simultaneously with the task's own exit path.) The one situation in which you cannot get report_death() is when the task was already dead when you called utrace_set_events(). In that case, it gives an error code as I mentioned above. Even in that situation, you can still ask to enable just the report_reap() callback. With either of these, if you enabled it successfully, then you are guaranteed to get it. When you get report_death() and are not interested in getting report_reap() afterwards, then you can return UTRACE_DETACH from report_death(). If you don't detach, then you will get report_reap() later too. The documentation mentions using report_death() to detach and clean up because many kinds of uses will have no interest in report_reap() at all. The only reason to get a report_reap() callback is if you are interested in tracking when a task's (real) parent reaps it with wait* calls, but usually people are only really interested in a task until it dies. + para +There is nothing a kernel module can do to keep a structnamestruct +task_struct/structname alive outside of +functionrcu_read_lock/function. Sure there is, get_task_struct() comes to mind. __put_task_struct() is not exported to modules and so put_task_struct() cannot be used by modules. +still valid until functionrcu_read_unlock/function. The +infrastructure never holds task references of its own. And here you imply its existence. [I take it this refers to the next quoted bit:] Though neither +functionrcu_read_lock/function nor any other lock is held while +making a callback, it's always guaranteed that the structnamestruct +task_struct/structname and the structnamestruct +utrace_engine/structname passed as arguments remain valid +until the callback function returns. No, we do not imply that the utrace infrastructure holds any task reference. The current task_struct is always live while it's running and until it's passed to release_task(). The task_struct passed to callbacks is current. That's all it means. The true facts are that utrace callbacks are all made in the current task except for report_reap(), which is made at the start of release_task(). So the kernel's core logic is holding a task reference at all times that utrace callbacks are made. If you really think it is clearer to explain that set of facts as utrace holds a reference, then please suggest the exact wording you have in mind. The above seems weird to me at best... why hold a pid around when you can keep a ref on the task struct? A pid might get reused leaving you with a completely different task than you thought you had. The *_pid interfaces are only there because put_pid() can be used by modules while put_task_struct() cannot. If we can make __put_task_struct() an exported symbol again, I would see no reason for these *_pid calls. Right, except you cannot generally rely on this report_death() thing, so a trace engine needs to deal with the report_reap() thing anyway. This is a false antecedent. I hope the explanation above made this subject clear to you. + sect2 id=interlocktitleInterlock with final callbacks/title [...] Hrmm, better mention this earlier, or at least refer to this when describing the above cleanup bits. This explanation is somewhat long and has its own whole section so as to be thoroughly explicit and clear. Do you think there should just be a cross reference here in the earlier mention of report_reap() and report_death()? Or do you think
Re: utrace-ptrace gdb testsuite tesults
In general everything where is a word thread has unstable results and nonstop tests are also a bit unstable. So where exactly is the problem in these cases? Are the tests overly timing-sensitive where there is no actual behavior bug? Or is gdb overly timing-sensitive where there is no actual kernel bug? Or is it just unknown, and might be a kernel bug after all (even an undiagnosed one in vanilla kernels)? There are IMO/hopefully very few cases tested by the gdb testsuite and still not covered by the ptrace-testsuite, I even do not much expect we will see again a new utrace regression caught by the gdb testsuite uncaught by the ptrace-testsuite. That's certainly good to hear. If you are pretty confident about that, then I am quite happy to consider nonregression on all of ptrace-tests the sole gating test for kernel changes. We just don't want to wind up having other upstream reviewers notice a regression using gdb that we didn't notice before we submitted a kernel change. Please point at some built or easily buildable kernel .rpm first. http://kojipkgs.fedoraproject.org/scratch/roland/task_1825649/ Thanks, Roland
http://koji.fedoraproject.org/scratch/roland/task_1825649/
At this URL find built rpms (x86_64 and i686 only) that you can install on a Fedora 12 (or rawhide, probably) system. These are the upstream kernel du jour with the current utrace-ptrace branch code (see rpm changelog for commit id). (I tried an f12-flavored build too, but it looks like the current upstream code does not build on ppc.) I tried ptrace-tests on this kernel. step-fork fails as expected, this kernel doesn't have that (utrace-unrelated) upstream fix. On i686, I get no other ptrace-tests failures. On x86_64, detach-sigkill-race consistently fails in tests/ but succeeds in biarch-tests/. Hmm, looks like that fails on stock F-12 kernel (i.e. vanilla ptrace) too, so not a regression. Thanks, Roland
Re: Q: UTRACE_SYSCALL_RESUMED logic
On 11/18, Roland McGrath wrote: In any case, what is the rationality? The rationale is that if you see utrace_resume_action(action)==UTRACE_STOP in your callback, then you know another engine asked for stop Yes, but engine can't know if the next one is going to return UTRACE_STOP. You can't know if the next one is going to change the registers and resume, either. That's just the general engine order issue. The only point of UTRACE_SYSCALL_RESUMED is to bring make it as possible to cooperate with a stop-modify-resume engine as it already is with a modify-in-callback engine. OK, thanks. But shouldn't utrace_report_syscall_entry() reset report-action = UTRACE_RESUME after do_report_syscall_entry()? If we stop, report-action remains UTRACE_STOP when we do the reporting loop. Yes, fixed. It is not clear to me why ptrace_report_syscall_entry() uses utrace_syscall_action() under if (UTRACE_SYSCALL_RESUMED). That is what a callback should do if it doesn't have a specific intent. Otherwise you clobber the choice made by a previous engine. This looks a bit strange because it returns the unconditional UTRACE_SYSCALL_RUN below. IOW, if ptrace should obey to another engine's request to abort this syscall, the code should use utrace_syscall_action() consistently. Yes, it should not change the incoming action for a ptrace syscall report. OTOH, PTRACE_O_SYSEMU always aborts. Not sure I understand how different engines can be friendly to each other. The friendly idea can only apply when an engine intends to be noninvasive to userland behavior. i.e., as the lone engine you would not perturb the default behavior, so as a second engine you should not perturb what the last engine chose. When the ptrace engine is doing SYSEMU, it intends to break the normal userland behavior. There is just inherently no way to be noninvasive about it. Thanks, Roland
Re: [PATCH 3] ptrace: introduce user_single_step_siginfo() helper
Is it possible to add si_code and si_addr info info-si_code = TRAP_TRACE; info-si_addr = instruction_pointer(regs); This is exactly what arch-specific versions should do here. The choice of TRAP_TRACE is an arch detail, not a common default. Thanks, Roland
Re: [PATCH 133] stepping, accommodate to utrace-cleanup changes
The former is e.g. PTRACE_SINGLESTEP while an unrelated engine uses UTRACE_EVENT(SYSCALL_ENTRY). The ptrace engine's report_quiesce returns UTRACE_SINGLESTEP. finish_resume_report() calls user_enable_single_step(). That seems fine. Right? In this case ptrace_report_quiesce(event) returns UTRACE_RESUME, note that it does return event ? UTRACE_RESUME : ctx-resume; and event == SYSCALL_ENTRY. This looks correct. With the utrace layer changes we're discussing, we need it to return UTRACE_SINGLESTEP (i.e. ctx-resume) here. AFAICT that never hurts even with today's utrace layer. I see. finish_resume_report() will only do utrace_stop() and then we'll go directly into the syscall unless someone used UTRACE_REPORT. Yes, utrace_stop() will only set TIF_NOTIFY_RESUME and utrace-resume. In the real resume-to-user cases, that's fine because we'll handle that in utrace_resume() or utrace_get_signal() before doing anything else. Not sure I understand. utrace_stop() will not set TIF_NOTIFY_RESUME? We are talking about the case when the tracee stops in SYSCALL_ENTER, and then the tracer does utrace_control(UTRACE_SINGLESTEP) to resume. In this case utrace_control() sets -resume/TIF_NOTIFY_RESUME, yes. I think you do understand fine. Yes, that's what it will do. I was just recognizing that this isn't enough in the syscall-entry case. this seems like the right idea for how to get tracehook_report_syscall_exit called in the cases where it is in old ptrace. Afaics yes. But, what if another engine does utrace_control(UTRACE_INTERRUPT) ? Hmm. Yes, I think this is the one case that is unlike all the others. That is, UTRACE_INTERRUPT at syscall-entry. In all other cases, nothing would care about utrace-resume until we get to get_signal_to_deliver anyway, where the UTRACE_SIGNAL_REPORT pass will trump any pending resume action anyway so you don't care that you lost track of it when UTRACE_INTERRUPT came in. Hmm. So what does UTRACE_INTERRUPT mean here anyway? It means that we should report soon and that signal_pending() should be true until that report. So today that means you can get into the syscall with signal_pending() and depending on the particular call that might cause a normal blocking path not to block and/or a -EINTR/-ERESTART* return, but some syscalls will just complete normally. How about if we say that UTRACE_INTERRUPT at syscall-entry means that the syscall really doesn't happen at all? That is, instead, you force an -ERESTARTNOHAND return and the normal roll-back such that you get your report_signal callback before the syscall is entered. That even seems like a useful utrace API feature, as well as conveniently smoothing over this quirk in the resume action handling. My idea here is that this makes it OK to lose UTRACE_SINGLESTEP here because from the user-mode-centric perspective no instruction has happened yet. The original guy who did UTRACE_SINGLESTEP (and perhaps never cared about syscall tracing) will get a generic report_quiesce or report_signal at which it needs to reassert UTRACE_SINGLESTEP. This merits more thought. For now, let's just leave an XXX comment about a UTRACE_INTERRUPT effectively swallowing the UTRACE_SINGLESTEP that should cause utrace_report_syscall_exit to be called. I think we can revisit this corner after we have merged up all the rest of it. Argh. I meant, from the caller pov, utrace_control(UTRACE_SINGLESTEP) does enable_step() immediately, like it did before utrace-cleanup changes. I see. From the utrace API perspective, I don't think anything changes here. In today's code, the resume action can take effect without a subsequent utrace callback. So that's the same as it ever was, and where this actually happens is not something that a utrace caller should know or can tell. IOW, suppose the tracee is stopped, and ptrace does UTRACE_SINGLESTEP. The tracee resumes, processes -resume, and does enable_step(). From the ptrace pov, it looks as if utrace_control() does enable_step() before utrace_wakeup(). Sure. I meant, every time the tracee stops, it should process -resume after wakeup. Looks like, the patch you sent in another thread (which adds apply_resume_action()) does something like this, right? Right. Thanks, Roland
Re: [PATCH 134] mv kernel/ptrace.c kernel/ptrace-utrace.c
I am not sure what is the best way to do these renames but I hope this doesn't really matter, utrace-ptrace branch is only for us. Sure, whatever you want to try is fine. The goal is to make it testable with and without CONFIG_UTRACE. Ok. We need to get upstream feedback on what they do or don't want like that. Some people have definitely said they don't want to see two implementations in the tree at the same time. But I can never guess what they will say next week. You'll get the joy of hashing that out with them. :-) Thanks, Roland
Re: [PATCH 0/4] utrace-resume fixes
Whatever we do, perhaps it makes sense to apply your patch in https://www.redhat.com/archives/utrace-devel/2009-November/msg00109.html first and then do further changes? Ok. v2.6.32-rc8-245-g3d4f9cf has that. I'll shelve this 4-patch series while we keep discussing (one more reply to come from me as of now). Then you send me a fresh series done however you think is best given that discussion. (So if you want to do more relative to these 4 patches, just resend them as part of the new series.) Thanks, Roland
Re: [PATCH 0/4] utrace-resume fixes
Somehow I can't really understand this patch. I hope more or less I can see what it does, but the resulting code looks even more subtle to me. Well, it was an untested draft and probably needed more comments. With this patch, apply_resume_action() is always called after utrace_stop(). Well, except for utrace_report_exit(), but I guess it could do apply_resume_action() too. It could, but it just never matters at all. The only thing that can possibly be meaningful is UTRACE_INTERRUPT, and for that finish_report has set TIF_SIGPENDING already anyway. Now it is not clear why utrace_control(SINGLESTEP) sets TIF_NOTIFY_RESUME, it is not needed after apply_resume_action() processes -resume. Yes, we have jctl stops, but we can move this code into utrace_finish_stop(). Yes, it is not really needed. But note that it is actually now possible to use UTRACE_SINGLESTEP without UTRACE_STOP first--not that it's really useful, but we don't really have any reason to disallow it. So in that case it does make a difference. We could just do: @@ -1183,7 +1183,8 @@ int utrace_control(struct task_struct *target, clear_engine_wants_stop(engine); if (action utrace-resume) { utrace-resume = action; - set_notify_resume(target); + if (reset) + set_notify_resume(target); } break; Since TIF_NOTIFY_RESUME will now always be superfluous when stopped. It is not clear to me why apply_resume_action(UTRACE_INTERRUPT) does set_tsk_thread_flag(TIF_SIGPENDING). In fact I don't understand why apply_resume_action() checks UTRACE_INTERRUPT at all. If it is called after utrace_stop(), action == start_report() which never resets UTRACE_INTERRUPT. It's only because this code path is shared with the tail of finish_resume_report, where it is the only thing processing UTRACE_INTERRUPT in the real return-to-user cases. i.e., in paths that don't call finish_report(), TIF_SIGPENDING won't ever have been set by a callback return value. utrace_control does always set it up front, so it is superfluous when that's how UTRACE_INTERRUPT got set. And since we process -resume after stop, it is not clear why we set -resume = report-resume_action before stop, apply_resume_action() could use min(report-resume_action, utrace-resume). Yes, yes, we have the nasty utrace-vfork_stop case, I mean it is not easy to understand the logic behind all these -resume changes. Ok. Feel free to clean it up if you think it makes things clearer. To me, it's just natural to do that MIN operation as soon as you know about it. utrace_stop() always takes the lock anyway, so it's relatively free. If the ambient state is stored early, then e.g. utrace_control() won't bother to set TIF_SIGPENDING or TIF_NOTIFY_RESUME. I guess I think the logic is simple: apply a new minimum to -resume as soon as you know about it. And afaics, this can't help to fix the tracehook_report_syscall_exit() TIF_SINGLESTEP problems in the multitracing case, UTRACE_INTERRUPT destroys UTRACE_SINGLESTEP. Ptrace can reassert it later, but this will be too late, the trace has already passed this tracehook. See the other thread, but what I said there is let's take this case up in its own thread later. I don't understand this skip_notify, utrace_stop() is always called with skip_notify == true? Hmm. Yes, this patch was unfinished when I sent it because your patches were crossing paths with what I was doing. We can skip TIF_NOTIFY_RESUME when we'll always call apply_resume_action() after utrace_stop() before user mode (i.e. report_exit doesn't matter). Since report_exit doesn't matter one way or the other, perhaps it will be cleaner to roll apply_resume_action() into utrace_stop() in some fashion. I notice we're now actually using the REPORT() macro only four times, and one of those is report_exit. So maybe that and finish_report() could change somehow too to refactor things. I'll leave it to you to rework all those and finish_* to the most useful organization of subroutines. I think we're mutually clear now on the idea of when to do user_*_step() calls (i.e. apply_resume_action). Thanks, Roland
Re: copy_process utrace_init_task (Was: [PATCH 133] stepping, accommodate to utrace-cleanup changes)
Found the trivial but nasty problem. Ah! Good catch. I added tracehook_init_task() in my tree. I don't see much benefit in sending any tracehook patch upstream for this. tracehook_init_task() corresponds to tracehook_free_task(), which is only added by utrace (and both would just be empty in a separate preparatory patch). I don't see any reason to fiddle the ptrace_init_task() call. The setup it does is all stuff that only matters for teardown done by release_task() or even earlier in the exit sequence. So it makes enough sense that the setup side should happen later as it does now. In the long run, the ptrace init stuff will all just go away. Before then I can't see any rationale for touching it. Thanks, Roland
Re: tracehook_report_syscall_exit() PT_PTRACED (Was: [PATCH 133] stepping, accommodate to utrace-cleanup changes)
but now I think perhaps it would be better to send ptrace-change-tracehook_report_syscall_exit-to-handle-stepping_fix to akpm right now: --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -134,7 +134,7 @@ static inline __must_check int tracehook */ static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step) { - if (step) { + if (step (task_ptrace(current) PT_PTRACED)) { siginfo_t info; user_single_step_siginfo(current, regs, info); force_sig_info(SIGTRAP, info, current); What do you think? Yes, this makes it consistent with the x86 behavior before the change, which used tracehook_consider_fatal_signal(current, SIGTRAP) in its test. Thanks, Roland
Re: [PATCH] utrace: finish_report() must never set -resume = UTRACE_STOP
Forgot to mention, your tree lacks these patches we sent upstream: Right. I'll merge these into some requirements branch (or maybe just the existing tracehook branch) and merge that into utrace. Thanks, Roland
Re: [PATCH] utrace: utrace_attach_task() forgets to return when -utrace == NULL
Now, if we race with another task doing utrace_task_alloc() and see -utrace != NULL, why should we see the correctly initialized *utrace? utrace_task_alloc() needs wmb(), and the code like above read_barrier_depends(). Ok. Please review what I put in (esp. commit c575558) and send patches relative to the latest tree if that's not right. Perhaps it is overkill to do read_barrier_depends() in task_utrace_struct(). But AFAICT if we don't actually need it all those places, that is only because of some less-obvious barrier effect with checks on -utrace_flags or something. Do you think it's not really needed in all those places we extract the pointer before spin_lock et al? Thanks, Roland